IHME-Covid1 IHME-Covid2

Yesterday’s forecast (left) and today’s (right). Images from IHME

by George Taniwaki

The Covid-19 story moves very fast. Yesterday, I posted a blog entry with a chart showing that the Institute for Health Metrics and Evaluation (IHME) forecast 72,000 deaths in the U.S. by June 2020 with almost no new deaths between then and August 2020. The IHME forecast assumed that stay-at-home orders would remain in place until August (see chart above left).

Today, three interesting pieces of news were reported. First, the IHME abandoned its assumption that the population will stay at home and instead switched to using smartphone location data provided by mobile carriers to estimate population mobility. This boosted their estimate of deaths in August to 134,000, an increase of 62,000 (see chart above right).

Second, a group of data scientists led by the University of Sydney’s Centre for Translational Data Science reviewed the IHME forecasts and found that they underestimate the uncertainty associated with Covid-19 deaths. For 70% of the state-level forecasts, the actual number of deaths fell outside the 95% prediction interval (Arxiv May 2020). If the intervals were well calibrated, you would expect only about 5% of outcomes to fall outside them.
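To make the calibration check concrete, here is a minimal sketch of how one could measure the share of outcomes that land outside a forecast's prediction interval. The function name and the sample numbers are made up for illustration; they are not taken from the IHME forecasts or the Sydney review.

```python
def share_outside_interval(observed, lower, upper):
    """Fraction of observed values that fall outside [lower, upper].

    Each position in the three lists refers to the same forecast
    (for example, one state on one day).
    """
    if not observed:
        raise ValueError("need at least one observation")
    outside = sum(
        1 for y, lo, hi in zip(observed, lower, upper) if not lo <= y <= hi
    )
    return outside / len(observed)


# Illustrative, made-up values: three of the four outcomes miss the interval.
miss_rate = share_outside_interval(
    observed=[10, 25, 40, 5],
    lower=[8, 10, 20, 6],
    upper=[12, 20, 35, 9],
)
print(f"{miss_rate:.0%} of outcomes fell outside the interval")
```

For a well-calibrated 95% interval, this share should be close to 5% when computed over many forecasts.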

Finally, in yesterday’s blog post I discussed the ensemble forecast of Covid-19 deaths created by the Centers for Disease Control and Prevention (CDC). Until last week, the IHME forecast was included in the ensemble. On Friday, the CDC dropped the IHME forecast from its ensemble and replaced it with forecasts from Imperial College.

The IHME forecast was lower than most other forecasts and had been a favorite of the Trump administration (Politico Apr 2020) and of the CDC (Medium Apr 2020).

National-Forecast-2020-04-20 National-Forecast-2020-04-27-1280px

Last week’s CDC ensemble forecast (left) included the IHME data; this week’s (right) does not. Image from CDC


Animated Covid-19 map, screenshot from Domo

by George Taniwaki

In order to make predictions about the future trajectory of the spread of Covid-19, you need to be able to make sense of the currently available data. There are several steps to get good data.

Medical event data

First, you have to be able to collect data from multiple sources, clean them, and aggregate them based on standard criteria. Each data record could include the following elements:

  1. Event (what was counted, e.g., tests administered, positive test results, negative results, hospital admissions, ICU status, ventilation status, discharges, recoveries, deaths, etc.)
  2. Location ID (where the event occurred, see below)
  3. Date of incidence (when the event occurred)
  4. Date of reporting (sometimes data is reported days or even months after the event and can be updated many times as errors are corrected or missing data is estimated)
  5. Value (a count)
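The record layout above can be sketched as a small data structure. This is only an illustration: the class name, the field names, the `US-WA` style location IDs, and the `latest_reports` helper are all assumptions, not part of any of the datasets mentioned in this post. The helper shows one way to handle the fact that a count for a given day can be revised in later reports.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventRecord:
    event: str               # what was counted, e.g. "deaths"
    location_id: str         # where the event occurred (hypothetical ID scheme)
    date_of_incidence: date  # when the event occurred
    date_of_reporting: date  # when this value was reported or revised
    value: int               # the count


def latest_reports(records):
    """Keep only the most recently reported value for each
    (event, location, date of incidence) combination."""
    latest = {}
    for r in records:
        key = (r.event, r.location_id, r.date_of_incidence)
        if key not in latest or r.date_of_reporting > latest[key].date_of_reporting:
            latest[key] = r
    return list(latest.values())
```

Keeping the reporting date as a separate field means the original and revised counts can both be stored, and the choice of which one to use is deferred to analysis time.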

The best repository of Covid-19 data is maintained by the New York Times (on GitHub) with an interactive viewer. Johns Hopkins University Coronavirus Resource Center also has a dataset. The best source for counts of tests in the U.S. is available from the Covid Tracking Project sponsored by the Atlantic.


One of several graphics available from the New York Times

Public policy change data

In addition to medical events, there are public policy events that can be tracked, such as government orders to close nonessential businesses, travel restrictions, and so forth. These records could include the following elements:

  1. Event (what type of public policy change was made)
  2. Location ID (where the change applies to, see below)
  3. Date of incidence (when the change was implemented)
  4. Date of reporting (when change was reported, usually before the change is implemented)

Unfortunately, I could not find a centralized source of information on government restrictions and the dates they became effective. A different source of information that can help indicate how much contact there is between people is the amount of movement by people who carry smartphones. Smartphones contain a GPS antenna and can report their position. The position can be used to indicate what type of activity the person is engaging in. Google Health has a community mobility report that is updated regularly. An example report is shown below and the data in .csv format is available for download.


Among those who own Android smartphones and participate in tracking, trips have declined. Screenshot from Google Health

Demographic and geographic data

To analyze the data, you will want to append demographic and geographic data about the locations. Unlike events, demographic and geographic data changes slowly, so it only needs to be collected once during the model building process. The following data elements could be useful to prepare a model or forecast:

  1. Location ID (from above)
  2. Name or description
  3. Location hierarchy (continent > country > region > state > county > city > zip code, etc.)
  4. Latitude and longitude of centroid
  5. Latitude and longitude of center of largest city
  6. Surface area (km²)
  7. Total population
  8. Age distribution
  9. Gender distribution
  10. Income distribution
  11. Race distribution
  12. Political party affiliation distribution
  13. Health insurance coverage distribution
  14. Comorbidity distribution (smoking, diabetes, etc.)
  15. Number of hospitals
  16. Number of hospital beds
  17. Number of ICU beds
  18. Number of ventilators

Some good sources for this type of data are US Census, United Nations Demographic Year Book, United Nations Development Programme’s (UNDP) Human Development Report and the World Bank’s World Development Report, Gapminder, and ESRI.

Visualize the data

Once the data is aggregated, there are many ways to visualize it. Maps are an obvious way to display location data. Line charts are an obvious way to display time series data. Domo, a developer of business intelligence software, has a very nice animation that displays time series data on a map (screenshot at top of blog).

Two caveats about their display. First, the number of cases is underreported because testing for infection was not widespread early in the pandemic, and is still too low today.

Second, outside the U.S. the data is reported by country, not by state or other smaller region. A single marker is used to represent the location of events. This is probably fine for Europe or Africa, where countries tend to be small. However, it is misleading for larger countries like Canada, Russia, China, Indonesia, Australia, and Brazil. Even data for a state like California is distorted because one would expect separate markers for the Bay Area and for the LA Basin instead of a single one in the middle of the state.

Johns Hopkins Center for Systems Science and Engineering has produced a nice dashboard hosted on ArcGIS (screenshot below). It does a better job of dividing large countries into smaller geographic partitions, but the colors are dark. A description of the project was published in Lancet Infect Dis (Feb 2020) and in a press release (Jan 2020). All of the data and the dashboard are available in a GitHub repository.


Another example of a Covid-19 map. Screenshot from ArcGIS

A note about line charts. You often see Covid-19 growth charts by country that display time (either calendar date, or days since the nth event occurred) on the horizontal axis and count on the vertical axis. Both are scaled linearly. I find these charts hard to interpret and compare. I think a better way to display growth data is to display data on the vertical axis using logarithm of counts per 100,000 population and on the horizontal axis using days since the n*(population/100,000)th event occurred. Even better would be to divide large countries into smaller regions so that all the charts covered regions with similar populations.
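Here is a minimal sketch of the transformation described above, assuming the input is a list of cumulative counts with one entry per day that never drops back to zero. The function name and parameters are illustrative only. The horizontal value is days since the count first reached n per 100,000 population, and the vertical value is the base-10 logarithm of the count per 100,000.

```python
import math


def aligned_log_series(daily_cumulative, population, n=1.0, per=100_000):
    """Return (days_since_threshold, log10(count per 100k)) points.

    The series starts on the first day the cumulative count reaches
    n * (population / per) events.
    """
    scale = population / per   # population in units of 100,000
    threshold = n * scale      # the n*(population/100,000)th event
    points = []
    start = None
    for day, count in enumerate(daily_cumulative):
        if start is None and count >= threshold:
            start = day
        if start is not None:
            points.append((day - start, math.log10(count / scale)))
    return points


# A region of 1,000,000 people reaches 1 event per 100,000 on the third day.
print(aligned_log_series([1, 5, 10, 100, 1000], population=1_000_000))
```

Because every region is rescaled by its own population and starts at the same per-capita threshold, curves for large and small regions can be drawn on the same axes and compared by slope.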

Making Forecasts

There are many groups forecasting Covid-19 infection rates and death rates. The CDC has a summary of them along with its own ensemble forecast. It predicts under 100,000 deaths in the U.S. at the end of May. The Institute for Health Metrics and Evaluation (IHME) predicts about 72,000 total deaths at the end of May but with a range from 60,000 to 115,000. You can download the data from the Global Health Data Exchange.

In addition to forecasting deaths, the IHME forecasts hospital utilization. These forecasts are used by hospitals to schedule resources and plan for peak usage.


Individual forecasts of cumulative reported deaths in U.S. from Covid-19 (left) and CDC ensemble forecast (right). Image from CDC


Cumulative death forecast in U.S. Image from IHME.

One of the best forecasts I have seen was produced by the Economist. It synthesizes data from US Census, New York Times, Covid Tracking Project, IHME, Google Health, and Unacast. The choropleth map of the U.S. below shows risk factors for Covid-19 mortality at the county level. Green shows areas where the risk level is low (less than 1%) and red shows high (6% or above).


Dixie in the crosshairs. Image from Economist

* * * *

Update1: In just one day, the IHME forecast is obsolete. See my response at https://realnumeracy.wordpress.com/2020/05/04/tracking-the-growth-of-covid-19-redux/

Update2: Added a link to the New York Times dataset and interactive viewer


Not how the crow flies to get to work each day

by George Taniwaki

I really don’t like to drive to work. I’ll do almost anything to avoid a long commute. For most of my adult life I have either walked or ridden a bus to get to work. Yes, it’s possible. I always chose an apartment to rent or house to buy based on how close it is to my job. And once I’ve found a place to live, I usually reject a new job unless it’s within walking or riding distance. It helps that I like living in big cities.

But I just started a contract assignment in Mountlake Terrace, a suburb of Seattle about 20 miles from where I live as the crow flies (see image above, or not). This isn’t the longest commute in my life, but the first long one in a few decades. And it’s the first long commute where I’m driving alone rather than in a carpool.

Google Maps, initial attempt

The traffic in Seattle is awful. To reduce my commute time, I’ve decided to start my drive at 6:30 and work from 7:30 to 4:00. Before my first day at the job site, I pull out Google Maps and plot my route (see Fig 1).


Figure 1. My first route to the office, average 42 minutes every morning

For the analysis in this blog post, I split my drive into segments based on type of driving. Segment A consists of surface streets from my house to the highway; B is 17 miles at highway speed driving north, away from the city; C is a slow slog where I double back and join the commuters coming into town; and D is the final, short segment of surface streets to the office.

The table below shows details of my commute. Most of it is tolerable. But notice that segment C (the red zone) constitutes less than one-sixth of my commute distance but over one-third of my commute time.

Map  Description                           Elapsed Time
A    Surface streets from home to SR520
B    SR520 to I-405 to I-5
C    I-5 to Mountlake Terrace
D    Surface streets from I-5 to office

Google Maps, redux

After a couple weeks of following this route, I’ve learned which lanes to use on which segments to slice a few minutes off my commute. But I think I can still do better. I check Google Maps for some alternatives. This time it gives me a completely different, and unexpected, route. It tells me to go a few miles out of my way south to I-90 and drive north through the city on I-5 (see Fig 2).

I-5 in downtown Seattle is one of the most congested highways in the U.S. I get queasy every time I drive it worrying about getting stuck. But maybe it’s not so bad at 6:45 AM, which is about what time I will get there. So I trust Google Maps and try it.


Figure 2. Google Maps’ new suggestion, average time 44 minutes

It works. I try the route on two consecutive days. The table below shows the average results. There is congestion on I-5 between I-90 and Olive Way (Segment C red zone) but it is a shorter segment than my previous commute. However, the route is longer, so it doesn’t save much time. Further, I don’t like this route because it limits my options. If there is an accident or other delay, I will be stuck in traffic with no easy way to avoid it.

Map  Description                           Elapsed Time
A    Surface streets from home to I-90
B    I-90 to downtown
C    I-5 to Olive Way
D    I-5 to Mountlake Terrace
E    Surface streets from I-5 to office

Waze to the rescue

Waze is a GPS navigation app that was originally developed in Israel but quickly went global. It uses traditional digital map data and combines it with real-time location data from users including speed, route, reports of traffic jams, accidents, police speed traps, and gasoline prices at nearby stations. Thus, the more people who use it, the more accurate it becomes.

Waze also shows you the current toll price (Seattle uses variable toll pricing) and lets you avoid tolls, ferries, or highways, if desired, when choosing a route.

Google (now Alphabet) acquired Waze in 2013 but it remains a separate entity from Google Maps. Because Waze collects potentially personally identifiable information (PII), it has a less restrictive user agreement than Google Maps and warns users of that fact. (Though most people never read the agreement and just click “I accept”.)

Generating a route is a highly resource intensive calculation that often involves machine learning. To simplify the work, Google Maps generally limits routes to major arterial streets. Waze combines those calculations with the actual routes users are taking to find the minimum travel time. Thus, Waze often creates routes that run through residential neighborhoods. Of course, the neighbors sometimes complain or even fight back by generating fake route data (Wash Post, Jun 2016).

Figure 3 below shows the route Waze recommends for my commute. It looks just like the original route that Google Maps suggested, except for the last segment. I still take the I-5 cloverleaf, but instead of continuing onto I-5, it has me veer right and use side streets to get to the office.


Figure 3. My new favorite route to work

Map  Description                             Elapsed Time
A    Surface streets from home to SR520
B    SR520 to I-405
C    Surface streets from I-405 to office

The best part is that I can see I-5 from the I-405 off-ramp. When traffic is light (speed is 30 mph or more), I can veer to the left and take I-5 to the office. When I-5 is congested, I can veer to the right and take surface streets. While on the surface streets, I can continue to see I-5 and confirm whether I made the right decision and improve my choice for future days.

Waze leads me astraze

With my success with Waze in the morning, I decide to use it for my evening commute home as well. As I turn onto I-5, Waze tells me there is road kill ahead. I wonder where. Then suddenly I see a raccoon and am jolted by the thump. It saddens me to know that I’ve squashed an innocent animal under my tires, even if it is already dead.

The rest of the commute home is uneventful until Waze tells me to exit I-405 at NE 85th St in Kirkland (Fig 4b), 4 miles before my usual exit at SR520 (Fig 4a). Gee, that seems like a bad idea. Should I ignore Waze and keep going straight? Or should I take the exit? Maybe there is an accident on my regular route. Or maybe the crowd of Waze users knows a sneak route. Well, Waze has been pretty accurate so far, so I take the exit.

Ugh, what a mistake. Driving east on NE 85th St takes me straight into a huge traffic jam on Redmond Way. Also, there is a giant construction project on the Microsoft campus, so West Lake Sammamish Pkwy is overflowing with drivers avoiding lane closures on 156th Av NE. My commute today is more than 35 minutes longer than usual. I won’t do that again.

CommuteToHomeI405Label CommuteToHomeI405LocalLabel

Figures 4a, b. My normal commute home, Waze suggestion for 11/21/2019


Both Waze and Google Maps show you unexpected options and are likely to give better routes than you could find on your own. Overall, my experience with Waze was better than Google Maps, but both could use improvements.

* * * *

All this talk about commute time has me remembering a brain teaser from my childhood. Let’s say I want my average commute speed to be 40 mph. One day, I get stuck in traffic and cover the first half of the distance to work at an average speed of 20 mph. How fast do I have to drive on the second half to meet my goal? Hint: The answer is not 60 mph or even 80 mph.
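For readers who want to check their answer (spoiler ahead), here is a small sketch of the arithmetic. Average speed is total distance divided by total time, so if the target average is t and the first half is driven at v1, the second half must be driven at t × v1 / (2 × v1 − t). The function name is my own, and returning infinity for an unreachable target is a choice made for this illustration.

```python
import math


def required_second_half_speed(target_avg, first_half_speed):
    """Speed needed on the second half of the distance so that the
    average over the whole trip equals target_avg.

    Returns math.inf when the first half has already used up all of
    the time allowed for the whole trip (or more).
    """
    denominator = 2 * first_half_speed - target_avg
    if denominator <= 0:
        return math.inf
    return target_avg * first_half_speed / denominator


print(required_second_half_speed(40, 30))  # first half at 30 mph
print(required_second_half_speed(40, 20))  # the brain teaser
```

In the brain teaser, driving the first half at 20 mph takes exactly as long as driving the whole distance at 40 mph, so no finite speed on the second half can rescue the average.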

by George Taniwaki

Did you watch the debate on Monday night? I did. But I am also very interested in the post-debate media coverage and analysis. This morning, two articles that combine big data and the debate caught my eye. Both are novel and much more interesting than the tired stories that simply show changes in polls after a debate.

First, the New York Times reports that during the presidential debate (between 9:00 and 10:30 PM EDT) there is high correlation between the Betfair prediction market for who will win the presidential election and after-hours S&P 500 futures prices (see chart 1).


Chart 1. Betfair prediction market for Mrs. Clinton compared to S&P 500 futures. Courtesy of New York Times

Correlation between markets is not a new phenomenon. For several decades financial analysts have measured the covariance between commodity prices, especially crude oil, and equity indices. But this is the first time I have seen an article illustrating the covariance between a “fun” market for guessing who will become president and a “real” market. Check out the two graphs above. The similarity in shape is striking, including the fact that both continue to rise for about an hour after the debate ended.

In real-time, while the debate was being broadcast, players on Betfair believed the chance Mrs. Clinton will win the election rose by 5 percent. Meanwhile, the price of S&P 500 futures rose by 0.6%, meaning investors (who may be the same speculators who play on Betfair) believed the stock market prices in November were likely to be higher than before the debates started. There was no other surprise economic news that evening, so the debate is the most likely explanation for the surge. Pretty cool.

If the two markets are perfectly correlated (they aren’t) and markets are perfectly efficient (they aren’t), then one can estimate the difference in equity futures market value between the two candidates. If a 5% decrease in likelihood of a Trump win translates to a 0.6% increase in equity futures values, then the difference between Mr. Trump and Mrs. Clinton being elected (a 100% change in probability) is about 12%, or $1.2 trillion, in market value (assuming the total market cap of the S&P 500 is about $10 trillion). (Note that I assume perfect correlation between the S&P 500 futures market and the actual market for the stocks used to calculate the index.)

Further, nearly all capital assets (stocks, bonds, commodities, real estate) in the US are now highly correlated. So the total difference is about $24 trillion (assuming total assets in the US are $200 trillion). Ironically, this probably means Donald Trump would be financially better off if he were to lose the election.
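The back-of-the-envelope arithmetic above can be written out as a short sketch. It simply scales the observed price move linearly to a 100% change in probability, using the rough figures from the text ($10 trillion for the S&P 500 and $200 trillion for all U.S. assets). The function name is made up for illustration, and the linear scaling inherits the perfect-correlation and efficient-market assumptions already noted.

```python
def implied_value_change(prob_shift, price_shift, market_value):
    """Scale an observed price move linearly to a 100% probability change.

    prob_shift   -- change in win probability, e.g. 0.05 for 5%
    price_shift  -- fractional price change seen alongside it, e.g. 0.006
    market_value -- total value of the market, in dollars
    """
    return (price_shift / prob_shift) * market_value


sp500 = implied_value_change(0.05, 0.006, 10e12)
all_assets = implied_value_change(0.05, 0.006, 200e12)
print(f"S&P 500: about ${sp500 / 1e12:.1f} trillion")
print(f"All U.S. assets: about ${all_assets / 1e12:.0f} trillion")
```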


The other article that caught my eye involves Google Trends data. According to the Washington Post, the phrase “registrarse para votar” was the third highest trending search term the day after the debate was broadcast. The number of searches is about four times higher than in the days prior to the debates (see chart 2). Notice the spike in searches matches a spike in Sep 2012 after the first Obama-Romney debate.

The article says that it is not clear if it was the debate itself that caused the increase or the fact that Google recently introduced Spanish-language voting guides to its automated Knowledge Box, which presumably led to more searches for “registrarse para votar”. (This is the problem with confounding events.)

After a bit of research, I discovered an even more interesting fact. The spike in searches did not stop on Sep 27. Today, on Sep 30, four days after the debates, the volume of searches is 10 times higher than on Sep 27, or a total of 40x higher than before the debate (see chart 3). The two charts are scaled to make the data comparable.


Chart 2. Searches for “registrarse para votar” past 5 years to Sep 27. Courtesy of Washington Post and Google Trends


Chart 3. Searches for “registrarse para votar” past 5 years to Sep 30. Courtesy of Google Trends

I wanted to see if the spike was due to the debate or due to the addition of Spanish voter information to the Knowledge Box. To do this, I compared “registrarse para votar” to “register to vote”. The red line in chart 4 shows Google Trends data for “register to vote” scaled so that the bump in Sept 2012 is the same height as in the charts above. I’d say the debate really had an unprecedented effect on interest in voting and the effect was probably bigger for Spanish-speaking web users.


Chart 4. Searches for “register to vote” past 5 years to Sep 30. Courtesy of Google Trends

Finally, I wanted to see how the search requests were distributed geographically. The key here is that most Hispanic communities vote Democratic and many states with a large Hispanic population are already blue (such as California, Washington, New Mexico, New Jersey, and New York). The exception is Florida with a large population of Cuban immigrants who tend to vote Republican.


Chart 5. Searches for “registrarse para votar” past 5 years to Sep 30 by county. Courtesy of Google Trends

If you are a supporter of Democrats like Mrs. Clinton, the good news is that a large number of queries are coming from Arizona and Texas, two states where changes in demographics are slowly turning voting preferences from red to blue.

In Florida, it is not clear which candidate gains from increased number of Spanish-speaking voters. However, since the increase is a result of the debate (during which it was revealed that Mr. Trump had insulted and berated a beauty pageant winner from Venezuela, calling her “miss housekeeping”), I will speculate many newly registered voters are going to be Clinton supporters.

If the Google search trend continues, it may be driven by new reports that Mr. Trump may have violated the US sanctions forbidding business transactions in Cuba. Cuban-Americans searching for information on voter registration after hearing this story are more likely to favor Mrs. Clinton.

by George Taniwaki

I was driving on NE 8th Street in Bellevue last Saturday. As I was approaching 156th Ave. NE, I saw two of those ubiquitous sign spinners on opposite corners. I missed the light, so I had time to see that they were both advertising that the nearby Haggen Northwest Fresh was closing and everything was 30% off.

This is an unfortunate end to a misguided attempt by Haggen to compete with Whole Foods. Here’s the story.

When I moved to Bellevue in 2000, there were several supermarkets in my neighborhood. The closest is a small QFC, part of the Kroger chain, located inside the Crossroads Shopping Center, just north of the food court. Cater-corner to it was another small supermarket, an Albertsons. (I say “was” because it closed within a few months after I moved in and was eventually replaced by an Ace Hardware and a Bartell Drugs.)

A few blocks north of that is a Trader Joe’s, a chain that specializes in a small selection of fancy foods at low prices. Another few blocks north of that was an Uwajimaya, a local Asian food chain. (It moved a few years ago. More on that later.) A few blocks west of the Trader Joe’s is a Fred Meyer, another Kroger chain that sells a variety of department store goods and groceries. A few blocks north of the Fred Meyer (on the same street as Uwajimaya) is a Safeway. In addition, there are lots of small convenience stores and ethnic food stores in the neighborhood.


Figure 1. Map of east Bellevue showing locations of largest grocery stores from 2000 to 2013. Map image courtesy of Microsoft, logos courtesy of respective firms

There’s no shortage of places to buy food in east Bellevue. That’s why I was very surprised in 2001 when Haggen announced it was opening a big TOP Food and Drug just north of the Crossroads Shopping Center. Haggen is a regional supermarket, headquartered in Bellingham, WA.

Prior to the opening of the TOP store, I bought most of my groceries at Fred Meyer or Safeway. They were farther from my home than the QFC or Albertsons, but had a better selection of private label goods (aka store brands) and a bigger selection of fresh produce.

When the TOP store opened, I immediately switched to it. But I am not a loyal shopper. I shop for groceries every few days and will pull into a store whenever it is convenient (based on direction I am driving, time of day, etc.). Thus, I continued to occasionally shop at the other stores.

After a few years, I noticed that the prices were higher at TOP than the other stores. But since it was close to home, it remained my primary store. However, the parking lot at TOP was not as full as in the past. Price sensitive shoppers were abandoning the store.


The competitive landscape got tougher in 2004, when Whole Foods Market, a purveyor of high quality, high price groceries, built a large store near downtown Bellevue. This caused a lot of people to switch their purchases of meats, cheeses, and wines to Whole Foods. These are items that have high margins. The loss especially hurt a nearby competing store, Larry’s Market, and ultimately led to the failure of the entire chain.

The space where Larry’s Market was became a Big 5 Sporting Goods store for a while, then was empty, and finally in 2011, Uwajimaya moved into part of the space. The remaining space was taken up by Total Wine, the biggest liquor store you have ever seen.

Uwajimaya realized that it was not a direct competitor to Whole Foods and could safely open a store across the street from it. In fact, the two stores are somewhat complementary. Also, even though Whole Foods is on a busy street, it is hard to get to. You can’t make a left turn out of the parking lot, forcing extra driving regardless of which direction you come from. Uwajimaya is easier to get to. Finally, Uwajimaya is one block from the Home Depot, a store I visit a lot. So I drive past the Uwajimaya frequently and thus shop there often.


In 2011, Haggen received a large capital infusion from Comvest Group, a private equity firm. Comvest decided that the TOP location in Bellevue would not be successful unless it switched from competing on price (apparently, TOP stands for Tough On Prices) to competing on quality. Thus, they changed the name of the store from TOP to Haggen Northwest Fresh and spent what I estimate was over $200,000 to remodel the store.

The remodeling effort took about three weeks. During that time, the store was open, but was a chaotic mess. So I stopped going to TOP and instead shopped at QFC and Trader Joe’s, two stores that I rarely visited in the past.

I came back to the new Haggen store after the remodel. The biggest changes were to increase the spacing between stands in the produce aisle to give it the look of a faux farmers’ market, expand the wine section, and add a big exhibit stand in the center of the store. To make space for all this, they cut back on the private label brands, the very items that I visited the store for.

The other change they made was an effort to brand each of the sections of the store with names. For instance, the fish counter was renamed Lummi Bay Market and the deli counter was renamed Dot’s Kitchen. The idea was developed by the Hartman Group, a local consulting firm, which called it a “store within a store” concept. Unfortunately, there didn’t appear to be any budget to change the look of the counters themselves, the offerings, or training for the personnel. So other than the new signage, nothing appeared to have changed. It seemed a waste.

Apparently, sales at the remodeled store did not meet expectations, so they began mailing $5 off coupons to customers. This must have been very expensive, and counter to the new high price, high quality image they were seeking. Ultimately, I decided to continue to patronize QFC and Trader Joe’s and only rarely stepped into the Haggen.


The remodel has been a disaster for Haggen and Comvest and led to the store’s closure. Hopefully, the company can find its footing and achieve success in its other stores. I liked the TOP store when it first opened and am sad it failed. The retail business is tough and is undergoing a revolution, with companies like Trader Joe’s and Whole Foods taking the high end, ethnic food stores catering to the immigrant population, and services like Amazon Fresh making inroads in the grocery delivery business.


Several months ago a Walmart Neighborhood Market opened west of our house. It occupies the space of a failed Kmart store that had been empty for years. Then last month, a new Grocery Outlet opened just across the street from Crossroads Mall in what previously had been a high-end appliance store that closed during the housing crisis. Apparently, there are still people who think the grocery business is worth investing in.

Below are two maps I created to indicate the locations of events described in my father’s memoir reproduced in three earlier blog posts. These are live maps, so you can click on the map to see the actual data on Bing Maps.

The first map is of Southern Japan. It shows the cities (numbers 1 through 5) that my father traveled through by train and ferry from his farm in Susaki-shi, Kochi-ken on the island of Shikoku to get to Hiroshima-shi on the main island of Honshu. It also shows the two cities (6 and 7) mentioned later in the memoir. The short link to this map is http://binged.it/13db12G.


Map of southern Japan showing the cities mentioned in my father’s memoirs. Courtesy of Bing Maps

The second map is of the city of Hiroshima. It shows the location of the army camp (1), the bomb shelters (2), ground zero (3), Niho Elementary School (4), Hiroshima train station (5), East training ground (6), sentry post west of city (7), temporary crematory (8), and the army headquarters (9). The short link to this map is http://binged.it/WrMi3q.


Map of Hiroshima-shi showing the landmarks mentioned in my father’s memoirs. Courtesy of Bing Maps

The bomb shelter where my father was working when the atomic bomb detonated no longer exists. It was closed off and covered over. There is now a service road in the park that passes near the point. Using Bing Maps you can create a bird’s eye view looking west that shows the topography of Hiji-yama-koen (Hiji hill park).


Bird’s eye view looking west shows the topography near the army camp (1), bomb shelters (2), and a temporary crematory (8). Courtesy of Bing Maps

I have updated the original blog posts to include links to the maps. Links to the memoir are: Part 1, part 2, and part 3.

Many thanks to Greg Huber and other readers for feedback on how they used online maps and the relative location of ground zero to various locations mentioned in the memoir to help them understand the extent of damage caused by the bomb.

I love looking at maps. Good maps contain high data density and if you are already familiar with the region being described on the map, looking at the map evokes your memories of passing through the area defined. A good map is a piece of art.

The current issue of the Univ. Chicago Mag. (May 2011) describes the map work of Eric Fischer, an engineer at Google.

One of his projects was to create a Flickr photostream with maps of various cities with colored lines showing people’s inferred path based on the timestamps of their geocoded photos uploaded to Flickr or Picasa. The project is called the Geotaggers’ World Atlas. The map of Seattle area is shown below.


Map showing paths of people who shared photos taken in Seattle. Image from Eric Fischer in the Geotaggers’ World Atlas

The color of the lines shows the estimated speed of the photographer. Black is by foot, red is by bike, and green is by car. You’ll notice that the densest set of lines is in downtown Seattle and most of them are black. The fewest lines are in the suburbs, and most of them are green. Even though the populations on either side of Lake Washington are similar, most of the photos are taken on the Seattle side. More proof that the eastside is boring.

Using the same data, Mr Fischer creates another set of maps that teases out whether the photographs are taken by tourists or locals. He does this by first seeing if there is a single city in which the user’s pictures span more than a month. If so, the user is considered a local in that city. All other pictures taken by the same user outside that city (and which have timestamps that span less than a month) are considered tourist pictures. This provides two interesting bits of data. First, do locals and tourists take pictures of different things? Second, do some cities attract more photo-happy tourists than others?
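The local-versus-tourist rule described above can be sketched in a few lines. This is my own illustration, not Mr Fischer's code. It assumes each photo is a (user, city, timestamp) tuple, treats "more than a month" as more than 30 days, labels a user as local in every city that meets the threshold (the description speaks of a single city), and marks users with no qualifying city as "unknown".

```python
from datetime import datetime, timedelta

LOCAL_SPAN = timedelta(days=30)  # assumed meaning of "more than a month"


def classify_photos(photos):
    """Label each (user, city) pair as 'local', 'tourist', or 'unknown'.

    photos -- iterable of (user, city, timestamp) tuples
    """
    # Earliest and latest photo timestamp for each user in each city.
    first_last = {}
    for user, city, ts in photos:
        lo, hi = first_last.get((user, city), (ts, ts))
        first_last[(user, city)] = (min(lo, ts), max(hi, ts))

    spans = {key: hi - lo for key, (lo, hi) in first_last.items()}
    users_with_home = {user for (user, _), span in spans.items() if span > LOCAL_SPAN}

    labels = {}
    for (user, city), span in spans.items():
        if span > LOCAL_SPAN:
            labels[(user, city)] = "local"
        elif user in users_with_home:
            labels[(user, city)] = "tourist"
        else:
            labels[(user, city)] = "unknown"
    return labels


photos = [
    ("alice", "Seattle", datetime(2010, 1, 1)),
    ("alice", "Seattle", datetime(2010, 3, 1)),
    ("alice", "Paris", datetime(2010, 6, 1)),
    ("alice", "Paris", datetime(2010, 6, 5)),
    ("bob", "Paris", datetime(2010, 6, 1)),
]
print(classify_photos(photos))
```

In the sample data, alice is a local in Seattle and a tourist in Paris, while bob cannot be classified because none of his photos establish a home city.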

In another project, Mr Fischer has created a set of maps of major U.S. cities showing the level of racial segregation in 2010. Normally, maps like these start with neighborhood boundaries and then use a choropleth technique to indicate the population density of each race in the neighborhood. However, with the power of computers today, it is now possible to reverse the process. One can display households on the map and let the colors define the boundaries of neighborhoods.

Mr Fischer’s maps do this by placing a colored dot on a map to represent 25 households of a particular race (African in blue, Caucasian in red, Asian in green, and Other in yellow) or Hispanic origin regardless of race (orange). He uses U.S. Census self-reported race and ethnicity data from both the 2000 and 2010 censuses at the census block level. A map of Seattle (it also includes most of King County and parts of Kitsap County) is shown below.
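A dot map of this kind can be sketched as follows. This is an illustration under simplifying assumptions, not Mr Fischer's actual method: each census block is approximated by a bounding box rather than its true polygon, dots are scattered uniformly at random inside it, and any remainder of fewer than 25 households is dropped. The function name and the input layout are hypothetical; the colors follow the description above.

```python
import random

HOUSEHOLDS_PER_DOT = 25  # one dot per 25 households, as described above

COLORS = {
    "African": "blue",
    "Caucasian": "red",
    "Asian": "green",
    "Other": "yellow",
    "Hispanic": "orange",
}


def dots_for_block(bbox, households_by_group, rng):
    """Return a list of (x, y, color) dots for one census block.

    bbox                -- (min_x, min_y, max_x, max_y) of the block
    households_by_group -- mapping of group name to household count
    rng                 -- a random.Random instance, for repeatable output
    """
    min_x, min_y, max_x, max_y = bbox
    dots = []
    for group, households in households_by_group.items():
        for _ in range(households // HOUSEHOLDS_PER_DOT):
            x = rng.uniform(min_x, max_x)
            y = rng.uniform(min_y, max_y)
            dots.append((x, y, COLORS[group]))
    return dots


block_dots = dots_for_block((0.0, 0.0, 1.0, 1.0), {"Asian": 60, "Other": 10}, random.Random(0))
print(len(block_dots), "dots")
```

Plotting the dots for every block on one map lets the clusters of color, rather than pre-drawn boundaries, show where neighborhoods begin and end.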

As can be seen in the map, Seattle has a smaller proportion of blacks than any other large urban area in the U.S., at 6.1%. The black population is concentrated in neighborhoods south of downtown. Asians make up a larger proportion of the population at 13.2%. They are also concentrated south of downtown but also tend to be dispersed throughout the region with pockets in Bellevue (east of Lake Washington), Renton, and Kent (both south of Bellevue).


Map showing Seattle’s racial distribution. Image from Eric Fischer

The idea for the maps comes from Bill Rankin’s map of Chicago that is available at a website called Radical Cartography. This site has a variety of other maps for cities, the United States, the world, and even the universe.

[Update: I corrected an error in the description of the race maps. Pacific Islanders are included in the Other race category, not in the Asian category. Pacific Islanders make up 0.5% of the Seattle population. I subtracted this from the population data for Asians.]