Charting missing data

I was studying this serious infographic in the multimedia section of the Eyewitness News (EWN) website, which charts civilian casualties in the Afghanistan conflict from 2009 to 2016, and was baffled that the design fails to make clear that the data covers only an interim reporting period (January to June of each year).

To anyone who wants to study the data critically, those intervals are important.

Also, the bar chart misrepresents the data: some bars make the casualty totals look higher than they actually are.

EWN bar graph: Afghanistan’s mounting civilian casualties

EWN infographic - record level of civilian casualties sustained in first half of 2016
Civilian Deaths and Injured: January to June 2009 - 2016, UN report

EWN bar graph misrepresents some data

EWN infographic - the Afghanistan civilian casualties graphic misrepresents data
These three values are less than 4,000, but the graphic indicates otherwise

How could one chart gaps in the data?

I wondered why someone would report on ‘missing data’. But if that’s the task, then the graph design needs a different approach.

Bar graphs are appropriate for charting a time series like this one. However, this design makes the bars look like a continuous sequence, which they are not: each value covers only January to June of its year.

Graphs in a series of small multiples

One possible solution is to combine graphs in a series of small multiples. Edward Tufte, the man the New York Times calls “the da Vinci of data”, describes them:

“At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.”
Edward Tufte, in Envisioning Information (page 67)

Using a series of smaller graphs has two advantages:

  • It makes clear that data is missing (July to December)
  • It makes the different years easier to compare

To me, the exact number labels can be eliminated (unnecessary data ink): what matters most is showing the trend, and removing the labels gives a simpler presentation.
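As a rough illustration of the small-multiples idea, here is a minimal Python sketch using matplotlib. It is only one way such a chart might be built, not how any chart in this article was produced. The function names, the output file name and the numbers are all assumptions made for illustration, and the values are placeholders, not the UN figures. Each year gets its own small panel on a shared, zero-based vertical axis, with an empty slot for the unreported July to December period and no value labels.

```python
from typing import Dict, List, Optional, Tuple

HALVES = ("Jan-Jun", "Jul-Dec")
Panel = Tuple[int, List[Optional[float]]]

# Placeholder values for illustration only; these are NOT the UN figures.
placeholder_totals: Dict[int, float] = {2009: 10, 2010: 12, 2011: 15, 2012: 13}


def build_panels(jan_jun_totals: Dict[int, float]) -> List[Panel]:
    """Return one (year, [first_half, second_half]) entry per year, in order.

    The second half is always None, so the reporting gap is stored
    explicitly instead of being silently dropped.
    """
    return [(year, [jan_jun_totals[year], None]) for year in sorted(jan_jun_totals)]


def plot_small_multiples(panels: List[Panel], path: str = "small_multiples.png") -> str:
    """Draw one small bar panel per year and save the figure to `path`."""
    # Imported here so build_panels() can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    # One common upper limit so every panel uses the same scale.
    top = max(values[0] for _, values in panels) * 1.1
    fig, axes = plt.subplots(1, len(panels), sharey=True, squeeze=False,
                             figsize=(1.6 * len(panels), 2.5))
    for ax, (year, values) in zip(axes[0], panels):
        # Plot only the halves that have data; the Jul-Dec slot stays empty.
        slots = [i for i, v in enumerate(values) if v is not None]
        ax.bar(slots, [values[i] for i in slots], width=0.8, color="0.3")
        ax.set_xlim(-0.5, len(HALVES) - 0.5)  # keep room for the empty slot
        ax.set_xticks(list(range(len(HALVES))))
        ax.set_xticklabels(HALVES, fontsize=7)
        ax.set_ylim(0, top)  # zero baseline, identical in every panel
        ax.set_title(str(year), fontsize=9)
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path
```

Calling `plot_small_multiples(build_panels(placeholder_totals))` writes the figure to `small_multiples.png`. Starting every panel at zero on a shared scale keeps bar heights proportional to the values, and the empty right-hand slot keeps the missing half of each year visible.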

Afghanistan: Record level of civilian casualties sustained in first half of 2016

Chart showing Afghanistan's mounting civilian casualties using a series of smaller graphs
Civilian Deaths and Injured: January to June 2009 - 2016, small multiples