by George Taniwaki

Did you watch the debate on Monday night? I did. But I am also very interested in the post-debate media coverage and analysis. This morning, two articles that combine big data and the debate caught my eye. Both are novel and much more interesting than the tired stories that simply show changes in polls after a debate.

First, the New York Times reports that during the presidential debate (between 9:00 and 10:30 PM EDT) there was a high correlation between the Betfair prediction market for who will win the presidential election and after-hours S&P 500 futures prices (see chart 1).

Chart 1. Betfair prediction market for Mrs. Clinton compared to S&P 500 futures. Courtesy of New York Times

Correlation between markets is not a new phenomenon. For several decades, financial analysts have measured the covariance between commodity prices, especially crude oil, and equity indices. But this is the first time I have seen an article illustrating the covariance between a “fun” market for guessing who will become president and a “real” market. Check out the two graphs above. The similarity in shape is striking, including the fact that both continued to rise for about an hour after the debate ended.

In real time, while the debate was being broadcast, players on Betfair raised their estimate of the chance that Mrs. Clinton would win the election by 5 percentage points. Meanwhile, the price of S&P 500 futures rose by 0.6%, meaning investors (who may be the same speculators who play on Betfair) believed stock prices in November were likely to be higher than they had expected before the debate started. There was no other surprising economic news that evening, so the debate is the most likely explanation for the surge. Pretty cool.

If the two markets are perfectly correlated (they aren’t) and markets are perfectly efficient (they aren’t), then one can estimate the difference in equity futures market value between the two candidates. If a 5 percentage point decrease in the likelihood of a Trump win translates to a 0.6% increase in equity futures values, then the difference between Mr. Trump and Mrs. Clinton being elected (a 100-point change in probability) works out to about a 12% change in market value (0.6% × 100/5), or about $1.2 trillion, since the total market cap of the S&P 500 is about $10 trillion. (Note that I assume perfect correlation between the S&P 500 futures market and the actual market for the stocks used to calculate the index.)

Further, nearly all capital assets (stocks, bonds, commodities, real estate) in the US are now highly correlated. So, assuming total assets in the US are $200 trillion, the total difference is about $24 trillion (12% of $200 trillion). Ironically, this probably means Donald Trump would be financially better off if he were to lose the election.
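The back-of-the-envelope extrapolation in the last two paragraphs can be written out as a short Python sketch. It assumes, as the text does, that the price response is linear in the win probability and that the 5 percent move means 5 percentage points. The function name `implied_value_gap` is hypothetical, and the market values are the rough $10 trillion and $200 trillion figures quoted above.

```python
def implied_value_gap(prob_change_pts, price_change_pct, market_value):
    """Linearly extrapolate a market's response to a full 0-to-100-point swing
    in a candidate's win probability (hypothetical helper; assumes perfect
    correlation, perfectly efficient markets, and a linear response)."""
    # Percent price move per percentage point of win probability.
    sensitivity = price_change_pct / prob_change_pts
    full_swing_pct = sensitivity * 100.0
    return full_swing_pct, market_value * full_swing_pct / 100.0


# Inputs from the text: a 5-point probability move came with a 0.6% futures move.
pct, sp500_gap = implied_value_gap(5.0, 0.6, 10e12)        # S&P 500 cap ~ $10 trillion
_, all_assets_gap = implied_value_gap(5.0, 0.6, 200e12)    # all US assets ~ $200 trillion
print(f"{pct:.0f}% swing: ${sp500_gap / 1e12:.1f}T for the S&P 500, "
      f"${all_assets_gap / 1e12:.0f}T for all US assets")
```

Running it reproduces the 12%, $1.2 trillion, and $24 trillion figures. The linearity assumption is doing all the work here; real markets are unlikely to respond proportionally across the whole probability range.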

****

The other article that caught my eye involves Google Trends data. According to the Washington Post, the phrase “registrarse para votar” (Spanish for “register to vote”) was the third highest trending search term the day after the debate was broadcast. The number of searches was about four times higher than in the days before the debate (see chart 2). Notice that the spike in searches matches a spike in Sep 2012 after the first Obama-Romney debate.

The article says it is not clear whether the increase was caused by the debate itself or by Google’s recent addition of Spanish-language voting guides to its automated Knowledge Box, which presumably led to more searches for “registrarse para votar”. (This is the problem with confounding events.)

After a bit of research, I discovered an even more interesting fact. The spike in searches did not stop on Sep 27. Today, Sep 30, four days after the debate, the volume of searches is 10 times higher than on Sep 27, or a total of 40 times higher than before the debate (see chart 3). The two charts are scaled to make the data comparable.

Chart 2. Searches for “registrarse para votar” past 5 years to Sep 27. Courtesy of Washington Post and Google Trends

Chart 3. Searches for “registrarse para votar” past 5 years to Sep 30. Courtesy of Google Trends
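Because Google Trends scales every query window so that its own peak equals 100, two charts with different end dates are not directly comparable. Below is a minimal Python sketch of one way to put them on a common scale, assuming both windows share at least one date. The helper name and the index values are hypothetical, chosen only to mirror the 4x and 10x ratios described above; they are not actual Google Trends numbers.

```python
def rescale_to_common_point(series, ref_date, ref_value_in_target):
    """Rescale a Google Trends-style series (0-100, own peak = 100) onto the
    scale of another window, using a date that appears in both windows."""
    factor = ref_value_in_target / series[ref_date]
    return {date: value * factor for date, value in series.items()}


# Hypothetical index values, not real Google Trends output.
window_to_sep27 = {"2016-09-20": 25, "2016-09-27": 100}   # chart 2: Sep 27 is the peak
window_to_sep30 = {"2016-09-27": 10, "2016-09-30": 100}   # chart 3: Sep 30 is the peak

# Express chart 2 on chart 3's scale via the shared date, Sep 27.
rescaled = rescale_to_common_point(window_to_sep27, "2016-09-27",
                                   window_to_sep30["2016-09-27"])
ratio = window_to_sep30["2016-09-30"] / rescaled["2016-09-20"]
print(f"Sep 30 vs. pre-debate: {ratio:.0f}x")
```

With these made-up values the pre-debate week becomes 2.5 on chart 3's scale, giving the 40x overall ratio. Since Google Trends reports whole numbers, small values lose precision when rescaled, so a reference date with a reasonably large value works best.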

I wanted to see whether the spike was due to the debate or to the addition of Spanish voter information to the Knowledge Box. To do this, I compared “registrarse para votar” to “register to vote”. The red line in chart 4 shows Google Trends data for “register to vote”, scaled so that the bump in Sep 2012 is the same height as in the charts above. I’d say the debate really had an unprecedented effect on interest in voting, and the effect was probably bigger for Spanish-speaking web users.

Chart 4. Searches for “register to vote” past 5 years to Sep 30. Courtesy of Google Trends
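The comparison with “register to vote” amounts to measuring each term’s current spike against its own Sep 2012 bump. A small Python sketch of that calculation follows. The function names and the values are hypothetical placeholders, not data read off charts 3 and 4.

```python
def scale_to_reference(series, ref_index, target_height):
    """Scale a series so the value at ref_index equals target_height."""
    factor = target_height / series[ref_index]
    return [value * factor for value in series]


def spike_vs_reference(series, ref_index):
    """How many times larger the series' peak is than its reference bump."""
    return max(series) / series[ref_index]


# Hypothetical values: index 1 is the Sep 2012 bump, the last entry is Sep 30.
spanish = [2, 10, 3, 100]    # "registrarse para votar"
english = [5, 40, 6, 80]     # "register to vote"

# Draw the English series so its 2012 bump matches the Spanish one (as in chart 4).
english_scaled = scale_to_reference(english, 1, spanish[1])
print(english_scaled)                                                    # [1.25, 10.0, 1.5, 20.0]
print(spike_vs_reference(spanish, 1), spike_vs_reference(english, 1))    # 10.0 2.0
```

Scaling changes only the height of the plotted line, not the spike-to-reference ratio, which is why matching the 2012 bumps makes the two terms visually comparable.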

Finally, I wanted to see how the search requests were distributed geographically. The key here is that most Hispanic communities vote Democratic, and many states with large Hispanic populations are already blue (such as California, Washington, New Mexico, New Jersey, and New York). The exception is Florida, with its large population of Cuban immigrants who tend to vote Republican.

Chart 5. Searches for “registrarse para votar” past 5 years to Sep 30 by county. Courtesy of Google Trends

If you are a supporter of Democrats like Mrs. Clinton, the good news is that a large number of queries are coming from Arizona and Texas, two states where changing demographics are slowly turning voting preferences from red to blue.

In Florida, it is not clear which candidate gains from an increased number of Spanish-speaking voters. However, since the increase is a result of the debate (during which it was revealed that Mr. Trump had insulted and berated a beauty pageant winner from Venezuela, calling her “Miss Housekeeping”), I will speculate that many newly registered voters are going to be Clinton supporters.

If the Google search trend continues, it may be driven by new reports that Mr. Trump may have violated the US sanctions forbidding business transactions in Cuba. Cuban-Americans searching for information on voter registration after hearing this story are more likely to favor Mrs. Clinton.

by George Taniwaki

I was driving on NE 8th Street in Bellevue last Saturday. As I was approaching 156th Ave. NE, I saw two of those ubiquitous sign spinners on opposite corners. I missed the light, so I had time to see that they were both advertising that the nearby Haggen Northwest Fresh was closing and everything was 30% off.

This is an unfortunate end to a misguided attempt by Haggen to compete with Whole Foods. Here’s the story.

When I moved to Bellevue in 2000, there were several supermarkets in my neighborhood. The closest is a small QFC, part of the Kroger chain, located inside the Crossroads Shopping Center, just north of the food court. Cater-corner to it was another small supermarket, an Albertsons. (I say “was” because it closed within a few months after I moved in and was eventually replaced by an Ace Hardware and a Bartell Drugs.)

A few blocks north of that is a Trader Joe’s, a chain that specializes in a small selection of fancy foods at low prices. Another few blocks north of that was an Uwajimaya, a local Asian food chain. (It moved a few years ago. More on that later.) A few blocks west of the Trader Joe’s is a Fred Meyer, another Kroger chain that sells a variety of department store goods and groceries. A few blocks north of the Fred Meyer (on the same street as Uwajimaya) is a Safeway. In addition, there are lots of small convenience stores and ethnic food stores in the neighborhood.

Figure 1. Map of east Bellevue showing locations of largest grocery stores from 2000 to 2013. Map image courtesy of Microsoft, logos courtesy of respective firms

There’s no shortage of places to buy food in east Bellevue. That’s why I was very surprised in 2001 when Haggen announced it was opening a big TOP Food and Drug just north of the Crossroads Shopping Center. Haggen is a regional supermarket chain headquartered in Bellingham, WA.

Prior to the opening of the TOP store, I bought most of my groceries at Fred Meyer or Safeway. They were farther from my home than the QFC or Albertsons, but had a better selection of private label goods (aka store brands) and a bigger selection of fresh produce.

When the TOP store opened, I immediately switched to it. But I am not a loyal shopper. I shop for groceries every few days and will pull into a store whenever it is convenient (based on the direction I am driving, time of day, etc.). Thus, I continued to occasionally shop at the other stores.

After a few years, I noticed that prices were higher at TOP than at the other stores. But since it was close to home, it remained my primary store. Meanwhile, the parking lot at TOP was not as full as in the past. Price-sensitive shoppers were abandoning the store.

****

The competitive landscape got tougher in 2004, when Whole Foods Market, a purveyor of high-quality, high-price groceries, built a large store near downtown Bellevue. This caused a lot of people to switch their purchases of meats, cheeses, and wines, all high-margin items, to Whole Foods. This especially hurt a nearby competitor, Larry’s Market, and ultimately led the entire chain to fail.

The space where Larry’s Market was became a Big 5 Sporting Goods store for a while, then was empty, and finally in 2011, Uwajimaya moved into part of the space. The remaining space was taken up by Total Wine, the biggest liquor store you have ever seen.

Uwajimaya realized that it was not a direct competitor to Whole Foods and could safely open a store across the street from it. In fact, the two stores are somewhat complementary. Also, even though Whole Foods is on a busy street, it is hard to get to. You can’t make a left turn out of the parking lot, forcing extra driving regardless of which direction you come from. Uwajimaya is easier to get to. Finally, Uwajimaya is one block from the Home Depot, a store I visit a lot. So I drive past the Uwajimaya frequently and thus shop there often.

****

In 2011, Haggen received a large capital infusion from Comvest Group, a private equity firm. Comvest decided that the TOP location in Bellevue would not be successful unless it switched from competing on price (apparently, TOP stands for Tough On Prices) to competing on quality. Thus, they changed the name of the store from TOP to Haggen Northwest Fresh and spent what I estimate was over $200,000 to remodel the store.

The remodeling effort took about three weeks. During that time, the store was open but was a chaotic mess. So I stopped going to TOP and instead shopped at QFC and Trader Joe’s, two stores that I had rarely visited in the past.

I came back to the new Haggen store after the remodel. The biggest changes were to increase the spacing between stands in the produce aisle to give it the look of a faux farmers’ market, expand the wine section, and add a big exhibit stand in the center of the store. To make space for all this, they cut back on the private label brands, the very items that I visited the store for.

The other change they made was an effort to brand each section of the store with its own name. For instance, the fish counter was renamed Lummi Bay Market and the deli counter was renamed Dot’s Kitchen. The idea was developed by the Hartman Group, a local consulting firm, which called it a “store within a store” concept. Unfortunately, there didn’t appear to be any budget to change the look of the counters themselves, the offerings, or the training of the personnel. So other than the new signage, nothing appeared to have changed. It seemed a waste.

Apparently, sales at the remodeled store did not meet expectations, so they began mailing $5-off coupons to customers. This must have been very expensive, and it ran counter to the new high-price, high-quality image they were seeking. Ultimately, I decided to continue to patronize QFC and Trader Joe’s and only rarely stepped into the Haggen.

****

The remodel was a disaster for Haggen and Comvest and led to the store’s closure. Hopefully, the company can find its footing and succeed in its other stores. I liked the TOP store when it first opened and am sad it failed. The retail grocery business is tough and is undergoing a revolution, with companies like Trader Joe’s and Whole Foods taking the high end of the market, ethnic food stores catering to immigrant populations, and services like Amazon Fresh making inroads in grocery delivery.

****

Several months ago a Walmart Neighborhood Market opened west of our house. It occupies the space of a failed Kmart store that had been empty for years. Then last month, a new Grocery Outlet opened just across the street from Crossroads Mall in what previously had been a high-end appliance store that closed during the housing crisis. Apparently, there are still people who think the grocery business is worth investing in.

Below are two maps I created to indicate the locations of events described in my father’s memoir, which I reproduced in three earlier blog posts. These are live maps, so you can click on a map to see the actual data on Bing Maps.

The first map is of southern Japan. It shows the cities (numbers 1 through 5) that my father traveled through by train and ferry from his farm in Susaki-shi, Kochi-ken, on the island of Shikoku to get to Hiroshima-shi on the main island of Honshu. It also shows the two cities (6 and 7) mentioned later in the memoir. The short link to this map is http://binged.it/13db12G.

Map of southern Japan showing the cities mentioned in my father’s memoirs. Courtesy of Bing Maps

The second map is of the city of Hiroshima. It shows the location of the army camp (1), the bomb shelters (2), ground zero (3), Niho Elementary School (4), Hiroshima train station (5), East training ground (6), sentry post west of city (7), temporary crematory (8), and the army headquarters (9). The short link to this map is http://binged.it/WrMi3q.

Map of Hiroshima-shi showing the landmarks mentioned in my father’s memoirs. Courtesy of Bing Maps

The bomb shelter where my father was working when the atomic bomb detonated no longer exists. It was closed off and covered over. There is now a service road in the park that passes near the site. Using Bing Maps, you can create a bird’s-eye view looking west that shows the topography of Hiji-yama-koen (Hiji hill park).

Bird’s-eye view looking west shows the topography near the army camp (1), bomb shelters (2), and a temporary crematory (8). Courtesy of Bing Maps

I have updated the original blog posts to include links to the maps. Links to the memoir are: Part 1, Part 2, and Part 3.

Many thanks to Greg Huber and other readers for feedback on how they used online maps, and the location of ground zero relative to the places mentioned in the memoir, to help them understand the extent of damage caused by the bomb.

I love looking at maps. Good maps have high data density, and if you are already familiar with the region shown, looking at the map evokes memories of passing through the area. A good map is a piece of art.

The current (May 2011) issue of the University of Chicago Magazine describes the map work of Eric Fischer, an engineer at Google.

One of his projects was to create a Flickr photostream with maps of various cities, with colored lines showing people’s inferred paths based on the timestamps of the geotagged photos they uploaded to Flickr or Picasa. The project is called the Geotaggers’ World Atlas. The map of the Seattle area is shown below.

Map showing paths of people who shared photos taken in Seattle. Image from Eric Fischer in the Geotaggers’ World Atlas

The color of the lines shows the estimated speed of the photographer: black is on foot, red is by bike, and green is by car. You’ll notice that the densest set of lines is in downtown Seattle and most of them are black. The fewest lines are in the suburbs, and most of them are green. Even though the populations on either side of Lake Washington are similar, most of the photos are taken on the Seattle side. More proof that the Eastside is boring.
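The inferred paths and their colors can be approximated by sorting each photographer’s geotagged photos by timestamp, connecting consecutive points, and classifying each segment by its implied speed. The Python sketch below does this with the haversine great-circle distance. The speed cutoffs (7 and 20 km/h), the one-hour gap limit, and all function names are hypothetical assumptions; the source does not say what thresholds Mr. Fischer used.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_speed(kmh, walk_max=7.0, bike_max=20.0):
    """Map a speed to the map's line colors (cutoffs are assumptions)."""
    if kmh <= walk_max:
        return "black (on foot)"
    if kmh <= bike_max:
        return "red (bike)"
    return "green (car)"


def infer_segments(photos, max_gap_hours=1.0):
    """photos: (timestamp, lat, lon) tuples for one user, in any order.
    Returns (start, end, km/h, color) for each pair of consecutive photos."""
    ordered = sorted(photos)
    segments = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(ordered, ordered[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        if hours <= 0 or hours > max_gap_hours:
            continue  # skip simultaneous shots and long breaks between outings
        kmh = haversine_km(la1, lo1, la2, lo2) / hours
        segments.append(((la1, lo1), (la2, lo2), kmh, classify_speed(kmh)))
    return segments


# Hypothetical photos from one user in Seattle, deliberately out of order.
photos = [
    (datetime(2010, 7, 4, 12, 40), 47.71, -122.33),
    (datetime(2010, 7, 4, 12, 0), 47.60, -122.33),
    (datetime(2010, 7, 4, 12, 20), 47.61, -122.33),
]
for start, end, kmh, color in infer_segments(photos):
    print(f"{start} -> {end}: {kmh:.1f} km/h, {color}")
```

The first hop (about 1.1 km in 20 minutes) comes out as walking and the second (about 11 km in 20 minutes) as driving. Straight lines between photos understate the true distance traveled, so real speeds are likely somewhat higher than these estimates.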

Using the same data, Mr. Fischer created another set of maps that teases out whether the photographs were taken by tourists or locals. He does this by first checking whether there is a single city in which the user’s pictures span more than a month. If so, the user is considered a local in that city. All other pictures taken by the same user outside that city (whose timestamps span less than a month) are considered tourist pictures. This provides two interesting bits of data. First, do locals and tourists take pictures of different things? Second, do some cities attract more photo-happy tourists than others?
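The local-versus-tourist rule described above is simple enough to express directly. Here is a Python sketch, assuming “a month” means 30 days and that each photo has already been assigned to a city. The function name is hypothetical, and photos the description does not cover (for example, from users with no single long-stay city) are labeled “unknown” rather than guessed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MONTH = timedelta(days=30)  # assumption: "a month" is treated as 30 days


def classify_user_photos(photos):
    """photos: (city, timestamp) pairs for one user. Returns {city: label}."""
    times = defaultdict(list)
    for city, ts in photos:
        times[city].append(ts)
    span = {city: max(ts) - min(ts) for city, ts in times.items()}

    # The user is a local only if exactly one city has photos spanning > 1 month.
    long_stays = [city for city, s in span.items() if s > MONTH]
    home = long_stays[0] if len(long_stays) == 1 else None

    labels = {}
    for city, s in span.items():
        if city == home:
            labels[city] = "local"
        elif home is not None and s < MONTH:
            labels[city] = "tourist"
        else:
            labels[city] = "unknown"  # case the rule above does not address
    return labels


# Hypothetical user: months of Seattle photos, one short weekend in Portland.
user_photos = [
    ("Seattle", datetime(2010, 1, 2)),
    ("Seattle", datetime(2010, 3, 15)),
    ("Portland", datetime(2010, 2, 6)),
    ("Portland", datetime(2010, 2, 7)),
]
print(classify_user_photos(user_photos))  # {'Seattle': 'local', 'Portland': 'tourist'}
```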

In another project, Mr. Fischer created a set of maps of major U.S. cities showing the level of racial segregation in 2010. Normally, maps like these start with neighborhood boundaries and then use a choropleth technique to indicate the population density of each race in each neighborhood. However, with the power of today’s computers, it is possible to reverse the process: display households on the map and let the colors define the boundaries of neighborhoods.

Mr. Fischer’s maps do this by placing a colored dot on the map to represent 25 households of a particular race (African in blue, Caucasian in red, Asian in green, and Other in yellow) or of Hispanic origin regardless of race (orange). He uses self-reported race and ethnicity data from both the 2000 and 2010 U.S. censuses at the census block level. A map of Seattle (it also includes most of King County and parts of Kitsap County) is shown below.

As can be seen in the map, Seattle has a smaller proportion of blacks than any other large urban area in the U.S., at 6.1%. The black population is concentrated in neighborhoods south of downtown. Asians make up a larger proportion of the population, at 13.2%. They are also concentrated south of downtown but are dispersed throughout the region as well, with pockets in Bellevue (east of Lake Washington), Renton, and Kent (both south of Bellevue).

Map showing Seattle’s racial distribution. Image from Eric Fischer
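A dot-density map like this can be generated by turning each census block’s counts into dots and scattering them inside the block. The Python sketch below assumes each block is approximated by a rectangular bounding box (real census blocks are polygons) and that one dot stands for 25 units, per the description above, rounding away any remainder. The function name and the sample block are hypothetical.

```python
import random

# Color scheme described above.
COLORS = {"African": "blue", "Caucasian": "red", "Asian": "green",
          "Other": "yellow", "Hispanic": "orange"}


def dot_density(blocks, per_dot=25, seed=0):
    """blocks: dicts with 'bbox' = (min_x, min_y, max_x, max_y) and 'counts' = {group: n}.
    Returns (x, y, color) dots, one per `per_dot` units, placed uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the map is reproducible
    dots = []
    for block in blocks:
        x0, y0, x1, y1 = block["bbox"]
        for group, n in block["counts"].items():
            for _ in range(round(n / per_dot)):
                dots.append((rng.uniform(x0, x1), rng.uniform(y0, y1), COLORS[group]))
    return dots


# Hypothetical census block: 50 Asian, 25 African, and 10 Hispanic units.
block = {"bbox": (0.0, 0.0, 1.0, 1.0),
         "counts": {"Asian": 50, "African": 25, "Hispanic": 10}}
dots = dot_density([block])
print([color for _, _, color in dots])  # ['green', 'green', 'blue']
```

Note that the 10 Hispanic units round to zero dots. With fine-grained blocks, small groups can disappear entirely, which is a known trade-off of dot-density maps.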

The idea for the maps comes from Bill Rankin’s map of Chicago that is available at a website called Radical Cartography. This site has a variety of other maps for cities, the United States, the world, and even the universe.

[Update: I corrected an error in the description of the race maps. Pacific Islanders are included in the Other race category, not in the Asian category. Pacific Islanders make up 0.5% of the Seattle population. I subtracted this from the population data for Asians.]

Dante Chinni and James Gimpel, the authors of Our Patchwork Nation: The Surprising Truth About the “Real” America, look at demographic data of the United States at the county level. There are 3,141 counties in the U.S. and though they vary considerably in geographical size and population, using county-level maps to display data provides a convenient way to compare and contrast demographic data.

By mapping demographic data at the county level, you can see how attributes like population density, income, education, attitudes, behaviors, and health are distributed across the U.S.

Unfortunately, the authors take the wonderfully detailed data available from various sources at the county level and use segmentation analysis to group the counties into twelve categories and give them cute names. Ugh.

Our Patchwork Nation. Image from Amazon

Luckily, the authors provide access to the raw county-level data at their website, patchworknation.org. You can view county-level choropleth maps for a wide variety of data. There is even a tool to overlay two maps for comparison. The tool doesn’t work very well since you cannot select the colors of the overlays. But overall, the Patchwork Nation site has some of the best U.S. data maps available.

An example of the problem with using segments rather than the raw data is illustrated in an article in the April 2010 issue of The Atlantic. The map below shows the 12 segments, but the reader has to flip back and forth between the legend and the map to determine what each color means.

The 12 states of America. Image from PatchworkNation

The raw data from the Patchwork Nation website are shown in the three maps below. The first map shows median household income in 1980. The colors show which quintile each county falls into. The second map shows quintiles for 2010. It is hard to compare the two maps to see how the distribution changed. Going to the Patchwork Nation website and toggling between the two maps makes it easier.

Distribution of median household income by county in 1980. From Patchwork Nation

Distribution of median household income by county in 2010. From Patchwork Nation
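Assigning counties to quintiles, as these two maps do, means ranking them by income and cutting the ranking into five equal-sized groups. A Python sketch follows. Ties are broken arbitrarily by sort order, the incomes are hypothetical, and the Patchwork Nation site may compute its breaks differently.

```python
def quintile_labels(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1  # ranks 0..n-1 split into five equal bins
    return labels


# Hypothetical median household incomes (thousands of dollars) for ten counties.
incomes = [31, 45, 28, 52, 39, 60, 33, 41, 47, 36]
print(quintile_labels(incomes))  # [1, 4, 1, 5, 3, 5, 2, 3, 4, 2]
```

Because quintiles are recomputed for each year, a county can keep the same color while its income changes a lot, which is one reason the 1980 and 2010 maps are hard to compare directly.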

The map below shows the change in median income for each county in inflation-adjusted 2010 dollars. Notice the counties with large and growing urban populations, mostly on the East and West Coasts. They had the highest median incomes in 1980 and in 2010. They also show the highest growth in median income, while the remaining counties show smaller gains or a loss. The gap in income distribution in the U.S. is growing.

For more on this story, read the article at the Patchwork Nation website.

Change in median household income by county from 1980 to 2010. From Patchwork Nation
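Converting 1980 incomes into 2010 dollars before computing the change is a simple ratio of price indexes. The Python sketch below assumes the CPI-U annual averages published by the Bureau of Labor Statistics (82.4 for 1980 and 218.056 for 2010, on a 1982-84 = 100 base). The source does not say which deflator Patchwork Nation used, and the county incomes are hypothetical.

```python
CPI_U_1980 = 82.4      # BLS CPI-U annual average, 1982-84 = 100
CPI_U_2010 = 218.056   # BLS CPI-U annual average, 1982-84 = 100


def real_change_pct(income_1980, income_2010, cpi_then=CPI_U_1980, cpi_now=CPI_U_2010):
    """Percent change in income after restating the 1980 figure in 2010 dollars."""
    income_1980_real = income_1980 * cpi_now / cpi_then
    return (income_2010 - income_1980_real) / income_1980_real * 100


# Hypothetical counties: nominal median household income in 1980 and 2010.
counties = {"County A": (17_000, 50_000), "County B": (20_000, 45_000)}
for name, (y1980, y2010) in counties.items():
    print(f"{name}: {real_change_pct(y1980, y2010):+.1f}%")
```

Both hypothetical counties roughly tripled their nominal income, yet after adjustment County A gained about 11% in real terms while County B lost about 15%. Without the adjustment, every county would appear to have grown.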

The U.S. Census Bureau has just started to release the 2010 census data at the county level. Until now, only state-level data was available. Data for 21 states is currently available, with the rest to be released over the next two months.

In addition to providing the data in CSV format, the U.S. Census Bureau has created several interactive maps that can be viewed at its site or embedded in your own site. (Embedding requires support for iframe tags.) The map widget includes three maps: population change, population density, and apportionment for each state for every decade of the past 100 years. It’s pretty nifty and an excellent example of how dense data can be made more approachable.

Image from U.S. Census

by George Taniwaki

Time series data are hard to display in a way that shows other relationships. That’s because one generally uses a static 2-dimensional chart. The x-axis is used to display time, leaving only one other axis to show another variable. The display can be greatly enhanced by using animation. Time can flow while the x and y axes can display relationship data or geographic data. This can aid in understanding how relationships vary over time.

Last week, Google announced the release of Google Public Data Explorer. This web-based data visualization tool (because all Google tools are web-based) takes time series data from public sources and lets you turn it into animations. Data Explorer allows you to create line charts, bar charts, maps, and scatterplots from web data. It has a fairly clear interface for changing the axes. For scatterplots, you can also change the color and size of the data points.

Google’s animated trend charts are based on a technology called Trendalyzer that it acquired in 2007 from Gapminder, a nonprofit foundation in Sweden. I mentioned Gapminder in a 2008 blog post that has been accidentally deleted.

Below is an example scatterplot I created using World Bank data, plotting fertility rate against life expectancy for each country. The data cover 1960 to 2007. First, note that in 1960 there was a strong correlation between high fertility rate and low life expectancy. Also note the strong correlation between high income and high life expectancy. Many researchers have shown that women are likely to have more children if they fear that some of the children will not survive to adulthood. Note the one outlier among the high-income countries that in 1960 had low life expectancy (54 years) and high fertility (5.7). That country is South Korea. If you play the animation, you can see Korea move down and to the right over time.

Country fertility rate by life expectancy. Image by George Taniwaki
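An animated scatterplot like the one Data Explorer produces is essentially a sequence of per-year frames. The Python sketch below builds those frames and computes the Pearson correlation between life expectancy and fertility in each one. The country labels and values are hypothetical stand-ins, not World Bank figures.

```python
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_by_year(rows):
    """rows: (country, year, life_expectancy, fertility). One frame per year."""
    frames = defaultdict(list)
    for _country, year, life, fert in rows:
        frames[year].append((life, fert))
    return {year: pearson([p[0] for p in pts], [p[1] for p in pts])
            for year, pts in sorted(frames.items())}


# Hypothetical data for three countries in two years.
rows = [
    ("A", 1960, 40, 7.0), ("B", 1960, 50, 6.0), ("C", 1960, 60, 5.0),
    ("A", 2007, 70, 2.0), ("B", 2007, 75, 2.5), ("C", 2007, 80, 1.5),
]
for year, r in correlation_by_year(rows).items():
    print(year, round(r, 2))  # 1960 -1.0, then 2007 -0.5
```

Tracking one summary number per frame is a useful complement to watching the animation, since it shows whether a relationship is weakening over time even when individual points are hard to follow.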

I’ve marked a few countries that display unusual behavior over time. China is an interesting outlier because in 1960 it was poor with a low life expectancy, but also with a low fertility rate. But if you play the animation, you see that this may have been a statistical anomaly because in 1962, the fertility rate jumps to 7.5. The fertility rate rapidly falls to under 3 as the one-child policy takes effect and continues to fall below the 2.1 replacement level in the 1990s. Rwanda, Timor-Leste, and Cambodia show short, horrifying drops in both life expectancy and fertility during genocide campaigns. Guinea-Bissau shows large fluctuations in fertility rate without a corresponding change in life expectancy. I can’t explain it. The country is very poor and perhaps its health statistics are unreliable, though this is true for many other countries. Lesotho and Zimbabwe are poor countries with high levels of HIV infection and AIDS that are causing both fertility rates and life expectancy to fall. The AIDS epidemic is taking a toll even on wealthier countries like South Africa.

Below is another scatterplot I created, showing population by income in the U.S. by state. I would have liked to show income by education, urban/rural ratio, or some other correlated variable, but no such variable was available in the dataset. I use a log scale for population since there is such a large range in state size. I also use a log scale for income because the data are not adjusted for inflation and I want the chart to show the change in income dispersion over time. On a log scale, uniform inflation shifts every state by the same distance, so only changes in relative dispersion are visible.

State population by income. Image by George Taniwaki
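The reason a log scale helps with unadjusted dollars can be checked numerically: multiplying every state’s income by the same inflation factor shifts every point by the same distance on a log axis, so the spread between states changes only if relative dispersion actually changes. A minimal Python sketch with hypothetical incomes:

```python
import math


def log_spread(values):
    """Distance between the highest and lowest value on a log10 axis."""
    logs = [math.log10(v) for v in values]
    return max(logs) - min(logs)


def linear_spread(values):
    """Distance between the highest and lowest value on a linear axis."""
    return max(values) - min(values)


incomes_early = [2500, 3500, 5000]               # hypothetical nominal per-capita incomes
incomes_late = [v * 4 for v in incomes_early]    # same distribution after uniform 4x inflation

print(linear_spread(incomes_early), linear_spread(incomes_late))                # 2500 10000
print(round(log_spread(incomes_early), 3), round(log_spread(incomes_late), 3))  # 0.301 0.301
```

On a linear axis the spread quadruples purely because of inflation; on a log axis it is unchanged.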

One of the most surprising things about the chart is that over the 40 years the ranking of state incomes doesn’t change much. In 1969 the poorest state was Mississippi, and all the poorest states were in the Southeast. If you play the animation, you’ll see some horizontal reordering, but not much. Even fast-growing states like Nevada and Arizona don’t change order. They shoot up, but their income ranking doesn’t move much. The same is true of states with falling populations like Michigan and Ohio. The only exceptions are small states and districts like Hawaii, Alaska, and Washington DC. They zip around like flies. And maybe that explains why the data doesn’t move much: state-level data is too coarse, and it would be better to see data for the 100 largest cities or some other finer geographic region.
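One way to put a number on how little the state rankings move is Spearman’s rank correlation between two years’ incomes: 1 means the order is identical, and values near 1 mean only minor reshuffling. Below is a Python sketch using the simple formula that assumes no tied values; the five incomes per year are hypothetical, not figures from the dataset.

```python
def ranks(values):
    """Rank positions (1 = lowest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical per-capita incomes (thousands of dollars) for five states.
income_1969 = [2.5, 3.0, 3.6, 4.1, 4.8]
income_2009 = [30, 34, 33, 41, 45]   # two middle states swap places
rho = spearman(income_1969, income_2009)
print(round(rho, 2))  # 0.9
```

With real data, a rho close to 1 between 1969 and 2009 would confirm the visual impression that the states mostly keep their order.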

****

About a year ago, my friend Steve Duenser pointed me to a couple of nice animations by FlowingData that show the growth of Target and Wal-Mart. The animations use Modest Maps, a Flash-based mapping tool.

By an odd coincidence, the three biggest discount store chains, Wal-Mart, Target, and Kmart, all started in 1962, which makes a time series comparison of store openings easier. Target is headquartered in Minneapolis and slowly began to expand throughout the U.S., jumping to large metropolitan areas. Wal-Mart grew much more quickly but was invisible to most people because it concentrated on rural areas near its headquarters in Bentonville, Arkansas. It blanketed the southeastern U.S. before steadily expanding to the rest of the country. The videos showing geographic distribution make the different growth strategies easier to observe than a tabular list would. It would be even more obvious if the two animations were combined into a single overlay.

Target vs Wal-Mart. Images from FlowingData

****

Finally, my friend Carol Borthwick pointed out an interesting way of visualizing Twitter feeds across both time and space. An interactive map was shown in The New York Times after the Super Bowl last year. It’s a fun chart and makes you realize that there’s a lot of data on the web that’s just waiting to be mined. Unfortunately, there is no story indicating how the data was collected and organized, what tools were used, or how you can make your own Twitter charts.

Map of popular Twitter comments in real time. Image from New York Times

[Update: I added a paragraph stating that Google’s animated trend charts are based on technology it acquired from Gapminder.]