by George Taniwaki
Time series data are hard to display in a way that shows other relationships. That’s because one generally uses a static 2-dimensional chart. The x-axis is used to display time, leaving only one other axis to show another variable. The display can be greatly enhanced by using animation. Time can flow while the x and y axes can display relationship data or geographic data. This can aid in understanding how relationships vary over time.
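As a rough sketch of what "letting time flow" means in data terms, an animated scatterplot is a sequence of frames, one per year, each holding the x and y positions of every item. The record layout, country names, and values below are invented for illustration and do not come from any dataset mentioned in this post.

```python
from collections import defaultdict

# Hypothetical records: (country, year, life_expectancy, fertility_rate).
# The values are made up for illustration, not real statistics.
records = [
    ("A", 1960, 50.0, 6.0),
    ("B", 1960, 70.0, 2.5),
    ("A", 1961, 51.0, 5.8),
    ("B", 1961, 70.5, 2.4),
]

def build_frames(rows):
    """Group rows by year so that each year becomes one animation frame."""
    by_year = defaultdict(dict)
    for country, year, x, y in rows:
        by_year[year][country] = (x, y)
    # Return the frames in chronological order.
    return [(year, by_year[year]) for year in sorted(by_year)]

frames = build_frames(records)
for year, points in frames:
    print(year, points)
```

A charting tool would then draw one frame at a time, moving each country's point from its position in one year to its position in the next.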
Last week, Google announced the release of Google Public Data Explorer. This web-based data visualization tool (because all Google tools are web-based) takes time series data from public sources and provides a way to create animations. Data Explorer allows you to create line charts, bar charts, maps and scatterplots from web data. It has a pretty clear interface to change the axes. For scatterplots you can also change the color and size of the data points.
Google’s animated trend charts are based on Trendalyzer, a technology it acquired in 2007 from Gapminder, a nonprofit foundation in Sweden. I mentioned Gapminder in a 2008 blog post that has since been accidentally deleted.
Below is an example scatterplot I created using World Bank data for fertility rate by life expectancy worldwide by country. The data covers 1960 to 2007. First note that in 1960, there was a strong correlation between high fertility rate and low life expectancy. Also note the strong correlation between high income and high life expectancy. Many researchers have shown that women are likely to have more children if they fear that some of the children will not grow to adulthood. Note the one outlier among the high income countries that in 1960 had low life expectancy (54 years) and high fertility (5.7). That country is South Korea. If you play the animation, you can see Korea move down and to the right over time.
Country fertility rate by life expectancy. Image by George Taniwaki
I’ve marked a few countries that display unusual behavior over time. China is an interesting outlier because in 1960 it was poor with a low life expectancy, but also with a low fertility rate. But if you play the animation, you see that this may have been a statistical anomaly because in 1962, the fertility rate jumps to 7.5. The fertility rate rapidly falls to under 3 as the one-child policy takes effect and continues to fall below the 2.1 replacement level in the 1990s. Rwanda, Timor-Leste, and Cambodia show short, horrifying drops in both life expectancy and fertility during genocide campaigns. Guinea-Bissau shows large fluctuations in fertility rate without a corresponding change in life expectancy. I can’t explain it. The country is very poor and perhaps its health statistics are unreliable, though this is true for many other countries. Lesotho and Zimbabwe are poor countries with high levels of HIV infection and AIDS that are causing both fertility rates and life expectancy to fall. The AIDS epidemic is taking a toll even on wealthier countries like South Africa.
Below is another scatterplot I created showing population by income in the U.S. by state. I would have liked to have shown income by education, city/rural ratio, or some other correlated variable, but no such variable was available in the dataset. I use a log scale for population because state sizes span such a large range. I also use a log scale for income because the data is not adjusted for inflation. On a log scale, equal ratios appear as equal distances, so the chart shows how the relative dispersion of income changes over time rather than how much the dollar amounts have grown.
State population by income. Image by George Taniwaki
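The reason a log scale helps with data that is not adjusted for inflation can be checked with a little arithmetic. In the minimal sketch below, the incomes and the inflation factor of 5 are assumed numbers chosen only to make the point. Multiplying every income by the same factor widens the gap in dollars but leaves the spread on a log scale unchanged.

```python
import math

def log_spread(incomes):
    """Distance between the largest and smallest value on a log10 scale."""
    logs = [math.log10(v) for v in incomes]
    return max(logs) - min(logs)

# Hypothetical per-capita incomes for three states in an early year.
early = [2000.0, 3000.0, 4000.0]
# The same incomes after prices rise uniformly by an assumed factor of 5.
inflated = [v * 5 for v in early]

# On a linear scale the gap between richest and poorest grows with inflation.
linear_early = max(early) - min(early)
linear_inflated = max(inflated) - min(inflated)

# On a log scale the gap depends only on the ratio, so it stays the same.
print(linear_early, linear_inflated)
print(log_spread(early), log_spread(inflated))
```

Any change in the log-scale spread between two years therefore reflects a real change in relative income dispersion, not just a rise in the overall price level.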
One of the most surprising things about the chart is that over the 40 years the ranking of state incomes doesn’t vary much. In 1969 the poorest state was Mississippi, and all the poorest states were in the southeast census region. If you play the animation, you’ll see some horizontal reordering, but not much. Even fast-growing states like Nevada and Arizona don’t change order. They shoot up, but their income ranking doesn’t move much. The same is true of states with falling populations like Michigan and Ohio. The only exceptions are small states and districts like Hawaii, Alaska, and Washington DC. They zip around like flies. And maybe that explains why the data doesn’t move much. State-level data is too coarse, and it would be better to see data for the 100 largest cities or some other finer geographic region.
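One way to put a number on how little a ranking moves is a rank correlation between the first and last year. The sketch below uses Spearman's rank correlation, which is a technique brought in here for illustration and not something the chart itself computes. The income figures are hypothetical, and the simple formula assumes no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    rank_a, rank_b = ranks(a), ranks(b)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical incomes for five states in a first and a last year.
first_year = [10, 20, 30, 40, 50]
last_year = [40, 90, 80, 150, 200]

# Only two states swap places, so the result is close to 1.
rho = spearman(first_year, last_year)
print(rho)
```

A value near 1 means the ordering barely changed, which is the pattern described above. A value near 0 would mean the final ranking is unrelated to the starting one.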
About a year ago, my friend Steve Duenser pointed me to a couple of nice animations by FlowingData that show the growth of Target and Wal-Mart. The animations use Modest Maps, a Flash-based mapping tool.
By an odd coincidence, the three biggest discount store chains, Wal-Mart, Target, and Kmart, all started in 1962, which makes a time series comparison of store openings easier. Target, headquartered in Minneapolis, expanded slowly throughout the U.S., jumping from one large metropolitan area to another. Wal-Mart grew much more quickly, but was invisible to most people because it concentrated on rural areas near its headquarters in Bentonville, Arkansas. It blanketed the southeastern U.S. before its steady expansion to the rest of the country. The videos showing geographic distribution make the different growth strategies easier to observe than a tabular list would. The difference would be even more obvious if the two animations were combined as overlays on a single map.
Target vs Wal-Mart. Images from FlowingData
Finally, my friend Carol Borthwick pointed out an interesting way of visualizing Twitter feeds across both time and space. An interactive map was shown in The New York Times after the Super Bowl last year. It’s a fun chart and makes you realize that there’s a lot of data on the web that’s just waiting to be mined. Unfortunately, there is no story indicating how the data was collected and organized, what tools were used, or how you can make your own Twitter charts.
Map of popular Twitter comments in real time. Image from The New York Times
[Update: I added a paragraph stating that Google’s animated trend charts are based on technology it acquired from Gapminder.]