Information Publication for Systems Engineers – making engineering outputs more accessible

John Welford

02/11/18

Setting the scene

What don’t we cover?

What do we cover?

Contents.

Why do we care…

…as engineers?

…as Systems Engineers?

Pedantry, or professionalism?

Who am I?

Not a data visualisation expert or a typographer, just a practitioner and an enthusiast.

Dr John Welford MEng PhD CEng IET MINCOSE

WSP New Zealand, Technical Principal Systems Engineer


General concepts

Data and information

The [DIKW](https://en.wikipedia.org/wiki/DIKW_pyramid) model.

Think about your audience

Presentation of abstraction

As engineers we are often working with an abstraction of the real system.

When we publish information we are always presenting an abstraction of the real system.

It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to remove.

Antoine de Saint Exupéry

It is therefore up to us to choose what to emphasize, for example…

[xkcd](https://xkcd.com/977/) indicating what your chosen abstraction says about you.

Separating content from presentation

Familiar to web-authors: HTML = content (+ structure),
CSS = presentation.

In DIKW terms: information = content (+ structure),
publication = presentation.

Ideally — first develop the content, then later develop the presentation.
Practically — development is often in parallel; however, content should always be prioritised.

Some tools provide a clear separation between content and presentation; others are more WYSIWYG. In either case, it pays to at least conceptually make the separation.
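As a rough illustration of that separation, the sketch below holds the content (and structure) once and applies two interchangeable sets of styling decisions to it. The `content` list, the two style dictionaries and the `render` function are all hypothetical names invented for this example; they do not come from any tool mentioned here.

```python
# Content (+ structure): what is said, and what role each part plays.
content = [
    ("h1", "Example document"),
    ("p", "Demonstrating content and structure."),
]

# Presentation: two alternative sets of styling decisions for the same content.
style_print = {"h1": "font-weight: bold", "p": "font-family: serif"}
style_screen = {"h1": "text-transform: uppercase", "p": "font-family: sans-serif"}

def render(parts, style):
    """Combine content with a chosen style; the content itself is never edited."""
    lines = []
    for tag, text in parts:
        css = style.get(tag, "")
        lines.append(f'<{tag} style="{css}">{text}</{tag}>')
    return "\n".join(lines)

print(render(content, style_print))
print(render(content, style_screen))
```

Changing the look of the document means swapping the style dictionary; the content is untouched.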

Dangers of conflating content with presentation

This content is written in Markdown; the formatting uses tufte-css styling.

  1. Distraction from the process of working with information
  2. Reduced portability and compactness
  3. Lack of proper information structure

Additional considerations

Be conscious & intentional: Every aspect of presentation represents a decision. Every decision should be justifiable.

Be consistent: The same decision should have the same outcome each time it’s made. There should be uniformity in the resulting publication.

Beauty vs. practicality: Ideally both! But (for engineering): \(\text{practicality} \gg \text{beauty}\).

Science, art, opinion

Be aware that most advice on the topic of information publication falls into one of three categories (including this tutorial):

  1. Science (researched and peer-reviewed)
  2. Art (general expert consensus)
  3. Opinion (my own)

References

Other authors to read or follow: Edward Tufte, Naomi Robbins, Mike Bostock, Bret Victor.

References cited or linked throughout, but recommended reference texts are:

The Elements of Typographic Style — Robert Bringhurst

Visualization Analysis & Design — Tamara Munzner

Show Me the Numbers: Designing Tables and Graphs to Enlighten — Stephen Few


Data/information management

Data structure

Likely to be out of your control:

The process of producing information from the data will involve some degree of analysis, but it is also where sensible choices can be made about the information structure.

Variations in data structure

Data can have many different structures. Part of data analysis is cleaning and tidying the data.

Consider…

|              | Treatment A | Treatment B |
| ------------ | ----------- | ----------- |
| John Smith   |             | 2           |
| Jane Doe     | 16          | 11          |
| Mary Johnson | 3           | 1           |

Transposed as…

|             | John Smith | Jane Doe | Mary Johnson |
| ----------- | ---------- | -------- | ------------ |
| Treatment A |            | 16       | 3            |
| Treatment B | 2          | 11       | 1            |

Or tidied as…

| Name         | Treatment | Result |
| ------------ | --------- | ------ |
| John Smith   | A         |        |
| Jane Doe     | A         | 16     |
| Mary Johnson | A         | 3      |
| John Smith   | B         | 2      |
| Jane Doe     | B         | 11     |
| Mary Johnson | B         | 1      |

Don’t be afraid to change your data structure to support analysis and information presentation.
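The restructuring from the first layout to the tidied one can be sketched in a few lines. This is only an illustration using plain Python: the `wide` dictionary and the `tidy` function are hypothetical names, the values are taken from the first table above, and `None` stands in for the missing result.

```python
# Wide form: one entry per person, one value per treatment (None = missing).
wide = {
    "John Smith": {"A": None, "B": 2},
    "Jane Doe": {"A": 16, "B": 11},
    "Mary Johnson": {"A": 3, "B": 1},
}

def tidy(wide_table):
    """Return one (name, treatment, result) row per observation."""
    treatments = sorted({t for results in wide_table.values() for t in results})
    return [
        (name, treatment, wide_table[name].get(treatment))
        for treatment in treatments
        for name in wide_table
    ]

for row in tidy(wide):
    print(row)
```

In the tidy form every row is a single observation, which tends to be the easier shape to filter, group and plot.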

Information structure – focus on content

The output of analysis should yield information. At this stage it can be very tempting to start presenting the information; indeed, you may wish to start considering the final published form prior to the information being complete.

However, it requires discipline to keep the concept of structuring the content separate from presenting it.

For example, setting up the structure of your report is separate from deciding which levels of heading are going to be in bold.

Configuration control

Both data and information should be under some form of configuration management, ideally supporting:

Also consider maintaining an auditable trail of how the information was generated from the data.
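One lightweight way to keep such a trail is to record a fingerprint of the data, a fingerprint of the resulting information, and a note of the method that links them. The sketch below is an assumption about what such a record could contain, not a prescribed scheme; `provenance_record` and its field names are made up for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(data_bytes, info_bytes, method):
    """Describe which data produced which information, and how."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "information_sha256": hashlib.sha256(info_bytes).hexdigest(),
        "method": method,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical raw data and the information derived from it.
raw = b"name,treatment,result\nJane Doe,A,16\n"
summary = b"mean result: 16\n"

record = provenance_record(raw, summary, "mean of result column")
print(json.dumps(record, indent=2))
```

If either the data or the information changes, its hash changes, so a stale record is easy to spot.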

Data/Information tools

Spreadsheets: Excel, Calc, Numbers, Google Sheets

Databases: Access, DOORS, SQL, MBSE tools

Formats: CSV, JSON, XML

Languages: Matlab, Python, R, LaTeX, Markdown


Document presentation

Assume the information we wish to publish is best presented in the form of a document.

The definition of ‘best’ here will depend on your audience and what the information is – as discussed previously.

We’ll also assume that the information is already developed in terms of both content and structure. The format that this is captured in might vary depending on the tool that we choose to use.

For reference let’s have a look at some examples of content and structure…

Content and structure – LibreOffice Writer

NB: Structure is less explicit here, as it is a WYSIWYG tool.

An example document in LibreOffice.

Content and structure – Markdown

The same document in Markdown.

Example document
================

Demonstrating the information *content* for a document, and a bunch of different aspects of document *structure* (the funny-looking parts).

## Lists

Could be:
* Enumerated
* Unordered
  * Sub-lists

## Links

To websites such as [Google](https://www.google.com), or to other sections of the document such as the [lists](#lists) section.

## Tables

| Heading | Attributes |
| ------- | ---------- |
| Content | Can include text or numbers |
| Rows    | 3.142 |

## Images

![WSP local logo](graphics/wsp.png)
![SESA web logo](https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png)

Content and structure – HTML

The same document in HTML.

<h1 id="example-document">Example document</h1>
<p>Demonstrating the information <em>content</em> for a document, and a bunch of different aspects of document <em>structure</em> (the funny-looking parts).</p>

<h2 id="lists">Lists</h2>
<p>Could be:</p>
<ul>
  <li>Enumerated</li>
  <li>Unordered
    <ul>
      <li>Sub-lists</li>
    </ul>
  </li>
</ul>

<h2 id="links">Links</h2>
<p>To websites such as <a href="https://www.google.com">Google</a>, or to other sections of the document such as the <a href="#lists">lists</a> section.</p>

<h2 id="tables">Tables</h2>
<table>
  <thead>
    <tr>
      <th>Heading</th>
      <th>Attributes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Content</td>
      <td>Can include text or numbers</td>
    </tr>
    <tr>
      <td>Rows</td>
      <td>3.142</td>
    </tr>
  </tbody>
</table>

<h2 id="images">Images</h2>
<p><img src="graphics/wsp.png" alt="WSP local logo">
<img src="https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png" alt="SESA web logo"></p>

Content and structure – LaTeX

The same document in LaTeX.

\section{Example document}\label{example-document}

Demonstrating the information \emph{content} for a document, and a bunch of different aspects of document \emph{structure} (the funny-looking parts).

\subsection{Lists}\label{lists}
Could be:
\begin{itemize}
  \item Enumerated
  \item Unordered
  \begin{itemize}
    \item Sub-lists
  \end{itemize}
\end{itemize}

\subsection{Links}\label{links}
To websites such as \href{https://www.google.com}{Google}, or to other sections of the document such as the \hyperlink{lists}{lists} section.

\subsection{Tables}\label{tables}
\begin{longtable}[]{@{}ll@{}}
\toprule
Heading & Attributes\tabularnewline
\midrule
\endhead
Content & Can include text or numbers\tabularnewline
Rows & 3.142\tabularnewline
\bottomrule
\end{longtable}

\subsection{Images}\label{images}
\includegraphics{graphics/wsp.png}
\includegraphics{https://www.sesa.org.au/templates/js_simplepro_red/images/sesa_logo.png}

Presenting the document

In all the preceding examples the content was the same, although the syntax used to provide the structure was different. However, presentation of that content depends on the styling that is added based on the structure.

Styling will also have its own syntax that is tool/language specific. The capability of different tools to provide styling also varies significantly.

Next we will run through a bunch of different document presentation areas where we should make conscious decisions about styling; this is better known as typography.

Typeface

This document is predominantly set in the open source ETBook font, similar to the proprietary Monotype Bembo font.

A typeface is a general ‘font family’: it includes many different fonts with variations in size and emphasis (bold, italic, etc.).

Typefaces (and fonts) may be classified as either sans-serif, or serif.

Serifed fonts are considered easier to read in print than sans-serif. However, the science on this appears to be inconclusive.

Conversely, sans-serif fonts are sometimes preferred for on-screen reading, as they scale better at low resolutions.

Some typefaces have been designed to assist dyslexic readers; however, their efficacy is disputed.

Sans Forgetica is designed to be intentionally difficult to read, as this prompts your brain to engage in deeper processing.

Headings

Typically a document may have many levels of heading.

Too many levels and you will lose the reader!

ALL CAPS, ‘Title Case’, and ‘Sentence case’ can be used at different levels of heading, along with changes in size, colour and font.

Text size and spacing

Size: choose for legibility and to match the page (see later).

Letter spacing: best to stick with the default!

Setting: Either flush-left ragged-right, or justified. Justified text is achieved by the tool modifying the inter-word spacing, so choose justified text only when the line is long enough; hyphenation may still be necessary to avoid sloppy spacing (if your tool supports this).

Advanced tools may support microtypography, which subtly adjusts other aspects of the text to improve readability and appearance.

Justified text is considered to be a problem for dyslexics, due to the uneven spacing and distracting ‘rivers’ of white space.

Kerning:

[xkcd.com](https://xkcd.com/1015/) again demonstrating what not to do.

Spaces between sentences

Double spacing is an artefact of Victorian typewriter usage and is no longer relevant.

Paragraphs

Provide a pause in reading, and may be shown by either an initial indent or a slight space between blocks of text.

Indents are more common in printed literature, whilst spacing is more prevalent on the web (where there are fewer space constraints).

If you’re working in a WYSIWYG tool then you shouldn’t be inserting an extra carriage return between paragraphs. This is mixing up content with presentation.

Weighting and emphasis

Change one parameter at a time.

A roadmap of fonts in a family of type, originally from [Bringhurst](https://en.wikipedia.org/wiki/The_Elements_of_Typographic_Style), reproduced by [Boulton](https://markboulton.co.uk/journal/five-simple-steps-to-better-typography-part-5).

Page layout

Line length: 66 characters is considered ideal, but anything 45 to 75 characters is ok (including punctuation and spaces). Longer might be ok for discontinuous texts (e.g. bibliographies).

Line spacing: leading is usually slightly more than character height, giving a small gap between lines. Sometimes much larger spacing (1.5 or double-space) may be requested to allow for handwritten review comments.

Margins: textblock width should be defined to achieve the right line length based on the typeface size and page size. Textblock height depends only on what size margins you leave — don’t be stingy on the margins or your page will look ugly! Also worth considering are binding and on-screen reading.

Headers and footers: may carry information about the section and the page, or about the publication itself. The latter only seems necessary if there is a danger that pages may be reproduced out-of-context.
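The line-length guideline above is easy to check mechanically for plain text. The sketch below uses Python's standard `textwrap` module to wrap a paragraph at 66 characters and then reports any line outside the 45 to 75 range; the helper names and the sample paragraph are invented for illustration, and a real typesetting tool would measure proportional glyph widths rather than character counts.

```python
import textwrap

IDEAL, SHORTEST, LONGEST = 66, 45, 75  # characters per line, from the guideline

def wrap_to_measure(text, width=IDEAL):
    """Break a paragraph into lines no longer than `width` characters."""
    return textwrap.wrap(text, width=width)

def lines_outside_range(lines):
    """Return lines outside 45-75 characters (the last line may be short, so skip it)."""
    return [line for line in lines[:-1] if not SHORTEST <= len(line) <= LONGEST]

paragraph = (
    "Textblock width should be defined to achieve the right line length "
    "based on the typeface size and page size. Textblock height depends "
    "only on what size margins you leave."
)

lines = wrap_to_measure(paragraph)
for line in lines:
    print(f"{len(line):2d} | {line}")
print("Lines outside the range:", lines_outside_range(lines))
```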

Lists

Avoid over-punctuating lists.

Be consistent in list structure and punctuation.

Enumerate lists when items have an order, or where they need to be referenced later (although be aware that this may imply a priority).

Special characters

NB: These are content not presentation!

Use non-breaking spaces where words should not be separated.
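A common case is keeping a number together with the word or unit that follows it. The sketch below is an illustrative assumption about how that might be automated for plain text: `bind_number_to_unit` is a hypothetical helper, and the pattern is deliberately simple (a digit, one space, then a letter, micro sign or degree sign).

```python
import re

NBSP = "\u00a0"  # Unicode NO-BREAK SPACE

def bind_number_to_unit(text):
    """Replace the space between a digit and the following word with a no-break space."""
    pattern = re.compile(r"(\d) (?=[A-Za-zµ°])")
    return pattern.sub(lambda match: match.group(1) + NBSP, text)

sample = "The beam carries a 10 kg mass over 3 m."
bound = bind_number_to_unit(sample)
print(repr(bound))
```

Because the no-break space is part of the text itself, it travels with the content into whatever presentation is applied later.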

Dashes – come in various lengths, choose the correct one for your purpose:

Notation, quantities and units

Try to stick with standard choices for symbols representing variables (but don’t forget to also define them!).

Consider using a different typeface or italic font for variables.

Use SI units as far as possible.

Take care when typesetting numbers and units, some tips:

Units in tables and figures

In tables and figures consider using a slash to denote the units. This is recommended by the BIPM as the correct method of expressing values for multiple quantities.

An example figure showing the use of a slash for units.
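If labels are generated programmatically, the slash convention can be applied in one place so that every table heading and axis is consistent. A minimal sketch, with `quantity_label` as a hypothetical helper name:

```python
def quantity_label(quantity, unit):
    """Build a 'quantity / unit' label for a table heading or figure axis."""
    return f"{quantity} / {unit}"

# The values shown under such a label are then plain numbers.
print(quantity_label("Temperature", "°C"))
print(quantity_label("Time", "s"))
```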

Equations

Equations are usually set centred, with a reference number on the right-hand side of the page.

\[\theta_b = \max\bigl(\widehat{\theta}_b,\min(-\widehat{\theta}_b,\int \omega_b \text{ d} t)\bigr) \tag{3.32}\]

Use a proper equation editor!

Abbreviations (acronyms & initialisms)

Very rarely do readers wish that an author had used more abbreviations!

Define abbreviations both the first time they are used, and within an abbreviations list.

Notes and references

Notes: use end-notes or side-notes for digressions that do not belong in the main text. References are a subset of notes.

References: Use proper reference management software; autogenerate bibliographies.

Consider hyperlinks if your document will be presented in digital format.

Don’t cite fake references!

Cross-references

Where appropriate, cross-references within the document should be made, including:

Always use your tool’s cross-referencing functionality for these.

Figures

See the Graphic Presentation section for more.

Figures should appear as soon as possible after, but not before, they are referenced in the text.

Text in figures should be horizontal (or at least oblique).

Text should be legible.

All figures should have captions below them.

Tables

Tables should appear as soon as possible after, but not before, they are referenced in the text.

Numbers in columns should be aligned on the decimal.

Heavy gridlines are not necessary; a few horizontal rules are ok, but white space is usually better.

Quoted numeric precision should reflect accuracy of measurement.

All tables should have captions above them.
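When a table is generated from data, decimal alignment and a consistent quoted precision can be handled together. The sketch below is one possible approach for monospaced output; `align_on_decimal` is a hypothetical helper, and the choice of two decimal places is an assumption that should be replaced by whatever the measurement accuracy justifies.

```python
values = [3.142, 16.0, 0.5, 120.25]

def align_on_decimal(numbers, decimals=2):
    """Format numbers to a fixed precision, right-aligned so the decimal points line up."""
    texts = [f"{number:.{decimals}f}" for number in numbers]
    width = max(len(text) for text in texts)
    return [text.rjust(width) for text in texts]

for cell in align_on_decimal(values):
    print(f"| {cell} |")
```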

Making decisions on style

Start with prescribed document templates.

Check whether your organisation has a house style or style guide.

Choose another organisation’s manual of style:

Document tools

Word processing: Word, Writer, LyX, Pages, Google Docs

Typesetting: InDesign, Scribus, Publisher, LaTeX

Content editing (any text editor): Notepad, Notepad++, Emacs, vim

For converting content and structure between tools try Pandoc.


Graphic presentation

Is a diagram worth a thousand words?

Sometimes no!

But sometimes yes!

Always consider text before tables, and tables before graphics.

Anscombe’s Quartet

For another fun example check out the Anscombosaurus!

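| Set I x | Set I y | Set II x | Set II y | Set III x | Set III y | Set IV x | Set IV y |
| ------: | ------: | -------: | -------: | --------: | --------: | -------: | -------: |
| 10.00 | 8.04 | 10.00 | 9.14 | 10.00 | 7.46 | 8.00 | 6.58 |
| 8.00 | 6.95 | 8.00 | 8.14 | 8.00 | 6.77 | 8.00 | 5.76 |
| 13.00 | 7.58 | 13.00 | 8.74 | 13.00 | 12.74 | 8.00 | 7.71 |
| 9.00 | 8.81 | 9.00 | 8.77 | 9.00 | 7.11 | 8.00 | 8.84 |
| 11.00 | 8.33 | 11.00 | 9.26 | 11.00 | 7.81 | 8.00 | 8.47 |
| 14.00 | 9.96 | 14.00 | 8.10 | 14.00 | 8.84 | 8.00 | 7.04 |
| 6.00 | 7.24 | 6.00 | 6.13 | 6.00 | 6.08 | 8.00 | 5.25 |
| 4.00 | 4.26 | 4.00 | 3.10 | 4.00 | 5.39 | 19.00 | 12.50 |
| 12.00 | 10.84 | 12.00 | 9.13 | 12.00 | 8.15 | 8.00 | 5.56 |
| 7.00 | 4.82 | 7.00 | 7.26 | 7.00 | 6.42 | 8.00 | 7.91 |
| 5.00 | 5.68 | 5.00 | 4.74 | 5.00 | 5.73 | 8.00 | 6.89 |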

All sets have the same:

Anscombe’s Quartet reveals very different results when graphed.
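The near-identical summary statistics can be checked directly from the data above. The sketch below uses only the Python standard library; the particular statistics computed (means, sample variances, correlation and a least-squares line) are the ones usually quoted for this dataset, and are an assumption about which shared properties are meant here.

```python
from statistics import mean, variance

x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8] * 7 + [19] + [8] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(xs, ys):
    """Return means, sample variances, correlation and the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }

for name, (xs, ys) in quartet.items():
    s = summarise(xs, ys)
    print(name, {key: round(value, 2) for key, value in s.items()})
```

The numbers agree to about two decimal places across all four sets, which is exactly why the summary alone is not enough and the data needs to be graphed.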

Two motivations for visualisation

Discovery:

Presentation:

The graphic may need to display the data, but the message should be the information.

Dataset types

[Munzner](https://www.cs.ubc.ca/~tmm/vadbook/) defines three types of data sets.

Attributes

[Munzner](https://www.cs.ubc.ca/~tmm/vadbook/) also defines three types of attributes.

Encoding channels

Attributes may be encoded on different channels, which vary in effectiveness.

Why are some channels more effective?

[Stevens’ Psychophysical Power Law](https://simple.wikipedia.org/wiki/Stevens%27_power_law): \(S=I^n\), defines why certain channels are more effective than others.
[Crowdsourced results](http://vis.stanford.edu/files/2010-MTurk-CHI.pdf) show how graphical perception varies for different encodings.
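To get a feel for the power law, the sketch below evaluates \(S=I^n\) for a doubling of physical intensity on a few channels. The exponents are approximate values often quoted for each channel and should be treated as illustrative assumptions rather than definitive figures; `EXPONENTS` and `perceived` are names invented for this example.

```python
# Approximate, commonly quoted exponents; assumptions for illustration only.
EXPONENTS = {
    "length": 1.0,
    "area": 0.7,
    "brightness": 0.5,
    "electric shock": 3.5,
}

def perceived(intensity, n):
    """Perceived sensation S for a physical intensity I, using S = I ** n."""
    return intensity ** n

for channel, n in EXPONENTS.items():
    print(f"{channel}: doubling the intensity is perceived as x{perceived(2, n):.2f}")
```

An exponent of one means the channel is perceived linearly; below one, differences are under-estimated, and above one they are exaggerated.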

Keys and values

Keys: Independent attributes (categorical or ordinal)

Values: Dependent attributes (categorical, ordinal or quantitative)

Zero keys: Scatterplot

One key: Bar chart, Line chart, Dot charts, Coloured scatterplot

Two keys: Heatmap, Stacked bar chart, Coloured bar/line/dot charts

Three or more keys: ‘Small multiples’ of the above
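The mapping from number of keys to candidate chart types is simple enough to write down as a lookup. The sketch below just restates the list above in code; `CHARTS_BY_KEY_COUNT` and `suggest_charts` are hypothetical names, and the result is a starting point for a decision, not a rule.

```python
CHARTS_BY_KEY_COUNT = {
    0: ["scatterplot"],
    1: ["bar chart", "line chart", "dot chart", "coloured scatterplot"],
    2: ["heatmap", "stacked bar chart", "coloured bar/line/dot chart"],
}

def suggest_charts(n_keys):
    """Return candidate chart types for a dataset with `n_keys` key attributes."""
    if n_keys < 0:
        raise ValueError("number of keys cannot be negative")
    if n_keys in CHARTS_BY_KEY_COUNT:
        return list(CHARTS_BY_KEY_COUNT[n_keys])
    # Three or more keys: fall back to small multiples of the simpler charts.
    return [
        f"small multiples of {chart}"
        for charts in CHARTS_BY_KEY_COUNT.values()
        for chart in charts
    ]

print(suggest_charts(1))
print(suggest_charts(3))
```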

Small multiples

Small multiples of [plots](http://extuitive.co.uk/papers/jpsPilotStudy.pdf) allow inclusion of more keys, without resorting to colours or symbols.

Categorical keys

Choose a sensible order for the key.

Don’t connect points between categorical data.

Line graphs make sense for quantitative and some ordinal data, but are not suitable for categorical data – they [have been shown](https://pdfs.semanticscholar.org/bc64/34389ee4533735151c07f564dfd647b02e1d.pdf) to suggest trends where they are not meaningful.

Networks and trees

Both are highly relevant to Systems Engineering.

Networks:

Trees:

Intuitive, but limited in network size (consider interactivity or separate views).

Link density = number of links per node. Trees have link density of one. Maximum link density for effectiveness is around three or four.
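Link density is quick to compute before committing to a node-link diagram. The sketch below uses a small, made-up system breakdown; `link_density` and the example node names are hypothetical. Note that a tree of n nodes has n - 1 links, so its density is slightly below one and approaches one as the tree grows.

```python
def link_density(nodes, links):
    """Number of links per node."""
    if not nodes:
        raise ValueError("need at least one node")
    return len(links) / len(nodes)

# A hypothetical four-node breakdown structure (a tree).
tree_nodes = ["system", "subsystem A", "subsystem B", "component A1"]
tree_links = [
    ("system", "subsystem A"),
    ("system", "subsystem B"),
    ("subsystem A", "component A1"),
]

density = link_density(tree_nodes, tree_links)
print(f"Link density: {density:.2f}")
```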

Consider layout:

An example node-link diagram. Credit: [Mike Bostock](https://beta.observablehq.com/@mbostock/d3-force-directed-graph).
An example edge bundling diagram. Credit: [Mike Bostock](https://beta.observablehq.com/@mbostock/d3-hierarchical-edge-bundling).

Matrix views (networks and trees)

In graphs of more than 20 vertices, matrix views typically perform better; the exception being where path finding is important.

An example adjacency matrix diagram. Credit: [Brian Staats](https://beta.observablehq.com/@bstaats/matrix-diagram).
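A matrix view is built from the same node and link data as a node-link diagram. The sketch below converts a list of undirected links into an adjacency matrix; the function name and the three interface names are hypothetical and chosen only for illustration.

```python
def adjacency_matrix(nodes, links):
    """Return a square 0/1 matrix with a 1 wherever two nodes are linked."""
    index = {node: position for position, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in links:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the link
    return matrix

nodes = ["power", "control", "comms"]
links = [("power", "control"), ("control", "comms")]

for node, row in zip(nodes, adjacency_matrix(nodes, links)):
    print(f"{node:>8} {row}")
```

The order of `nodes` sets the order of rows and columns, so sorting or grouping the nodes sensibly is part of the presentation decision.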

Enclosure (trees)

Show hierarchical structure through containment rather than connection.

Note: these are not appropriate for networks.

An example treemap diagram. Credit: [Mike Bostock](https://beta.observablehq.com/@mbostock/d3-treemap).
An example bubblemap diagram. Credit: [Mike Bostock](https://beta.observablehq.com/@mbostock/d3-circle-packing).

Spatial data

Where data is spatial, it is usually beneficial to present it spatially.

Geometry: Choropleth maps

Fields: Isocontours, Vector fields

Colour

Categorical: There are a limited number of discriminable bins (max 6–12); ensure that there is clear separation between them. Use variations in colour hue.

Ordered: Colour scales should be perceptually linear. Use variations in colour luminance, saturation and/or hue.

Rainbow colourmaps (e.g. ‘Jet’) are a poor default, as they are perceptually unordered and nonlinear. Try Viridis or Cividis.
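One way to see the problem with rainbow scales is to compute the luminance along the scale. The sketch below compares a grey ramp with a plain HSV hue sweep from blue to red; the hue sweep is a stand-in assumption for a rainbow colourmap, not the actual definition of Jet, and the luminance formula is the standard sRGB relative luminance. All helper names are invented for this example.

```python
import colorsys

def relative_luminance(rgb):
    """Relative luminance of an sRGB colour with channels in the range 0..1."""
    def linearise(channel):
        if channel <= 0.04045:
            return channel / 12.92
        return ((channel + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if the values never change direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

steps = 9
grey_ramp = [(i / (steps - 1),) * 3 for i in range(steps)]
rainbow = [
    colorsys.hsv_to_rgb((240 - 240 * i / (steps - 1)) / 360, 1.0, 1.0)
    for i in range(steps)
]

for name, scale in [("grey ramp", grey_ramp), ("hue sweep", rainbow)]:
    luminances = [relative_luminance(colour) for colour in scale]
    print(name, [round(value, 2) for value in luminances],
          "monotonic" if is_monotonic(luminances) else "not monotonic")
```

A scale whose luminance rises and falls along its length gives the reader no consistent cue for which end is "more".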

Consider colour-blind users (Cividis is better).

Use online tools such as colorbrewer to choose a palette.

Jet is [not perceived as accurately](https://arxiv.org/pdf/1712.01662.pdf) as greyscale or Viridis, particularly by Colour Vision Deficient (CVD) readers.

Or, taken from a tweet:

The discontinuity is not obvious in the upper (Jet) representation, but is much clearer in the lower (Viridis). Credit: [Matthias Bussonnier](https://twitter.com/Mbussonn).

Information density

Also called data-ink ratio.

Above all else show the data.

Edward Tufte

Amount of information encoded, vs empty space in the graphic.

Generally higher information density is preferred.

Labelling graphics

Always label graphics! (items and attributes)

Ensure labels are legible (axis, font, size). Avoid abbreviations.

Use horizontal labels wherever possible. Rotating the graphic can make this easier. Oblique labels are a compromise.

Refer back to Stevens’ Power Law for a good example.

Make sure all encoding is labelled. Legends are ok, directly labelling the data is better.

Producing graphics

  1. For each item, choose the attribute(s) you wish to display
  2. For each attribute to display, choose an appropriate channel to encode it
  3. Choose a layout and labelling structure
  4. If appropriate, choose how your item keys are sorted
  5. Review and iterate

Graphics tools

Drawing: Visio, Inkscape

Data drawing: Tableau, Excel, Google charts

Data linking: D3 (JavaScript), Shiny (R), Matplotlib (Python), Visio


Publication

Remember the basics!

Work in a suitable format

Raster drawing tools manipulate pixels in an image. Except in rare cases, they do not embed the data.

Vector drawing tools manipulate shapes in a coordinate system. They allow data to be directly represented and embedded.
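To make the vector idea concrete, the sketch below writes a tiny bar chart as SVG, where each data value becomes a rectangle in a coordinate system and the value itself is carried along in a `<title>` element. It is a minimal illustration that assumes positive values; `bar_chart_svg` and its parameters are hypothetical, and a real chart would also need axes and labels.

```python
def bar_chart_svg(values, bar_width=20, gap=5, height=100):
    """Return an SVG document containing one <rect> per (positive) data value."""
    if not values:
        raise ValueError("need at least one value")
    tallest = max(values)
    rects = []
    for position, value in enumerate(values):
        bar_height = height * value / tallest
        x = position * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points downwards
        rects.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}"><title>{value}</title></rect>'
        )
    width = len(values) * (bar_width + gap) - gap
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )

svg = bar_chart_svg([16, 11, 3])
print(svg)
```

Because the output is shapes and text rather than pixels, it scales to any size and the underlying values remain recoverable from the file.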

Benefits of vector formats:

Deliver in a suitable format

Word and Excel are editing tools, with reading modes bolted on.

Due to proprietary formats they also require the recipient to own a copy of the tool.

PDF is a better alternative. Delivering in PDF will:

HTML is emerging as an even better standard.

Image formats

Different image formats have different uses (except bitmap, which has been superseded).



Advanced techniques

Interactive graphics

Interactivity is useful to handle complexity, it:

Avoid unnecessary interactivity, and do not rely on it.

Multiple views

A presentation of the data is only one possible view of it. Different views may be useful for:

Levels of abstraction

Present to reflect Systems Thinking – take the reader through Bret Victor’s Ladder of abstraction.

Transitioning graphics

Animated transitions between views of the same data can be used to help aid understanding.

Research has shown that animated transitions between related data graphics significantly improve visual perception.

These benefits are greater if the animations are staged.

Explorable explanations

Explorable explanations are reactive and explorable documents that allow a reader to:

There are a large number of examples available.

Most are built in HTML; however, the Computable Document Format does exist as well.

There are lots of examples of explorable explanations and interactive visualisations on Observable.

A recent development is the Observable coding system, which provides a reactive programming environment to support explorable explanations, visualisations and active reading.

Speed reading

There are a few technologies around such as Spritz to facilitate speed reading.

E-books

E-books are arguably the future of reading:

There are a variety of formats, readers and creation tools available, and formats can be converted easily.


Homework exercise

Improve upon one of these publications.

Taken from the [Transport for New South Wales Future Transport Strategy](https://future.transport.nsw.gov.au/plans/future-transport-strategy/a-vision-for-transport).
Taken from the seminal work of [Eric Honour](http://www.honourcode.com/seroi/documents/SE-ROI%20Slides-distrib.ppt).
  1. What do you think the overall message is? (information)
  2. What is the background data? (items and attributes)
  3. What type are the attributes? (ordinal/quantitative/categorical) (keys/values)
  4. What are the channel encodings that have been used?
  5. What different (/better) encodings could be used?
  6. Sketch out a new version and label it.

Feedback

Email: john.welford@wsp.com

Twitter: extuitive