1 Invitation to Data Visualization
1.1 Goals
In this lesson I will provide some examples of interesting and influential data visualizations.
1.2 Atmospheric carbon dioxide concentration
Climate change over the coming century will be controlled by human-driven emission of carbon dioxide from fossil fuels into the atmosphere, and possibly by our ability to remove it from the atmosphere. The amount of carbon dioxide in the atmosphere (in parts per million) has been measured regularly since the 1950s. Subsequently, methods for analyzing gases trapped in ice were used to extend this record back about 800,000 years. There is a direct physical link between the atmospheric concentration of carbon dioxide and the loss of heat from Earth to space, resulting in a mechanistic link between increasing carbon dioxide concentration in the atmosphere and the mean temperature of the surface of the Earth. Visualizations of these data and the associated global mean temperature data have been extremely influential, forming the cornerstone of books, a documentary movie, and countless educational and environmental change movements.
The two graphs below show carbon dioxide concentrations (in parts per million) in the atmosphere measured from a high-altitude observatory on the island of Hawai’i. The first graph shows two years of data. The second graph shows the monthly record since regular observations began.
Two signals are immediately clear:
- there is an annual cycle in the CO\(_2\) concentration and
- there is a yearly increase in CO\(_2\) concentration, and the size of this increase has grown gradually over the past 60 years.
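One way to see how these two signals can be separated is a small sketch on synthetic data. The Python code below does not use the observatory record. It builds a made-up monthly series from a quadratic trend plus a sinusoidal annual cycle (the function names and all parameter values are illustrative assumptions), averages each year to cancel the cycle, and then looks at the year-to-year changes.

```python
import math


def synthetic_co2(n_years=60, base=315.0, rate=0.8, accel=0.012, amplitude=3.0):
    """Return made-up monthly values: quadratic trend plus an annual sine cycle.

    All parameter values are illustrative assumptions, not fitted to real data.
    """
    series = []
    for month in range(n_years * 12):
        t = month / 12.0  # time in years since the start of the record
        trend = base + rate * t + accel * t * t
        cycle = amplitude * math.sin(2 * math.pi * t)
        series.append(trend + cycle)
    return series


def annual_means(series):
    """Average each block of 12 months, which cancels a 12-month cycle."""
    return [sum(series[i:i + 12]) / 12 for i in range(0, len(series) - 11, 12)]


series = synthetic_co2()
means = annual_means(series)

# Signal 1: the within-year swing caused by the annual cycle.
first_year = series[:12]
seasonal_range = max(first_year) - min(first_year)

# Signal 2: the year-to-year increase, which itself grows over time.
increases = [later - earlier for earlier, later in zip(means, means[1:])]

print(f"range within the first year: {seasonal_range:.2f}")
print(f"first yearly increase: {increases[0]:.2f}")
print(f"last yearly increase: {increases[-1]:.2f}")
```

With real measurements, a moving average or a seasonal decomposition would play the same role as the annual means used here.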
A related graph, with a contrasting format, shows the annual cycle of sea ice in the Arctic over the past 40 years.
Regularly updated visualizations of atmospheric carbon dioxide are available from the institute that has been collecting these data for decades.
The closely related data of estimated global mean temperature over time are available from NASA.
Many other sites provide information on these data, usually in visual form, which is a testament to the importance of visualizations.
1.3 Human health and development
Hans Rosling was a physician who popularized the use of data visualizations to build understanding of human health and economic development over time and across countries. His public presentations illustrate his view of how dynamic charts can help us see the trajectory of global development, particularly the connections between health and economic development. I strongly encourage you to watch one of his presentations. He was especially well known for his efforts to dispel misunderstandings about differences across countries in health and human development. He popularized a style of scatterplot that combines colour, symbol size, and animation to show changes over time.
1.4 Weather
Many people are strongly interested in their local weather conditions. As a result of this interest and the complexity of the data, many visualizations have been developed. Forecasts, such as those produced by Environment Canada, and historical retrospectives, such as those produced by Weatherspark, are examples that leverage familiarity with the data, broad-scale human interest, and data-rich but not overly complicated displays. Two examples are shown below.
Note that each image contains a wealth of data:
- current weather conditions, plus forecasts of weather conditions and high temperatures for 12 days,
- average historical daily highs and lows throughout the year, plus information on the quantiles of the temperature distribution on each day (shaded regions), some annotations showing high and low temperatures on selected days, and comparative descriptive labels (cold, warm, and neither).
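To make the idea of shaded quantile regions concrete, here is a minimal Python sketch of how such bands could be computed. It uses synthetic daily high temperatures rather than real observations, and the choice of the 10th and 90th percentiles is an assumption for illustration, not necessarily what the displays above use. The function names and parameter values are also hypothetical.

```python
import math
import random
import statistics


def synthetic_daily_highs(n_years=30, seed=1):
    """Return made-up daily highs (one list of 365 values per year).

    A cosine seasonal cycle plus random day-to-day noise; values are illustrative only.
    """
    rng = random.Random(seed)
    years = []
    for _ in range(n_years):
        year = [
            10.0 - 15.0 * math.cos(2 * math.pi * day / 365) + rng.gauss(0.0, 4.0)
            for day in range(365)
        ]
        years.append(year)
    return years


def daily_bands(years):
    """For each day of the year, return (10th percentile, median, 90th percentile)."""
    bands = []
    for day in range(365):
        values = [year[day] for year in years]
        deciles = statistics.quantiles(values, n=10)  # nine cut points
        bands.append((deciles[0], statistics.median(values), deciles[8]))
    return bands


bands = daily_bands(synthetic_daily_highs())
low, middle, high = bands[182]  # a day in the middle of the year
print(f"day 182: 10% = {low:.1f}, median = {middle:.1f}, 90% = {high:.1f}")
```

In a plot, the region between the low and high values for each day would be shaded, with the median drawn as a line through it.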
1.5 Journalism
In the past decade there has been a resurgence of interest in data visualizations, stimulated in part by journalists emphasising visualizations in their publications. This example in the New York Times shows projected earnings for college graduates in a range of fields of study and is accompanied by notes and discussion questions. The New York Times has a series of educational materials on both visualizations and their stories.
1.6 Historically important visualizations
Many ideas in contemporary data visualizations can be traced back to the 18th and 19th centuries, as represented by several impactful examples. In the late 18th century William Playfair introduced a wide range of visualizations including bar charts, time series line plots, and pie charts. In 1869, Charles Minard produced a map of Napoleon’s Russian campaign of 1812. Florence Nightingale was a pioneering user of data visualizations to communicate messages about sanitation and public health, famously in a polar histogram showing causes of mortality of soldiers. Also in public health, John Snow mapped a cholera outbreak in London, visually linking deaths to a water source. All of these visualizations were great advances over the bills of mortality produced a few centuries earlier.
1.7 Stories
A common observation is that humans learn from stories. What is the role of data and its visualization in storytelling? A graph does not tell a story by itself, but a story can be woven from a combination of words and some data visualizations.
Wilke’s book makes an excellent argument in favour of storytelling with data, which he also presents in a video (starting at time 6:42). His essential elements of a story are an arc including an opening, challenge, action, and resolution, which results in an emotional reaction such as excitement, curiosity or surprise. The principle is that the emotional response from the resolution of the challenge gets your audience engaged and helps them retain your message.
It may seem to you that a graph is far removed from a story. In fact, a single graph rarely tells a story on its own. However, a pair of graphs, a dynamic graph, or even just an original graph and an updated graph can be used to tell a story. For example, return to the carbon dioxide figures at the top of this lesson. Two years of data show a seasonal cycle in atmospheric carbon dioxide with a modest year-over-year trend. Suppose that was all you knew about carbon dioxide. It would be hard to know why there was a problem. Now look at the record since 1958. It’s now clear that there is a long-term increase and that the seasonal cycle is small in comparison. The 800,000-year record from ice cores shown below provides even more context. Current atmospheric carbon dioxide concentrations are far outside the range of documented variability for the past 800,000 years.
We will return to the theme of storytelling frequently in the course, particularly in assignments.
1.8 Exercises
- Compare the presentation of data in the carbon dioxide and sea ice volume graphs. Which is easier for you to understand? Why?
- Predict how CO\(_2\) over time would look if plotted in the style of the sea ice volume graph. Do the reverse prediction too. Return to this task later in the course once you have some more data manipulation and plotting skills.
- Which visualization from this lesson is most appealing? Most easily understood? Least appealing? Hardest to understand? Try to draw some conclusions about what you think “works” in a visualization.
1.9 Further reading
- Kurt Vonnegut’s summary of story arcs
- Wilke (2019), in particular Chapter 29.
- A course about Data Visualization in R by Claus Wilke.