Example of a polar area chart showing causes of mortality among soldiers by month during the Crimean War. Image from Wikimedia

by George Taniwaki

May 12 is International Nurses Day, held to recognize the contributions nurses make and to celebrate the birth of Florence Nightingale. Today marks the 200th anniversary of her birth. The World Health Organization named this year the Year of the Nurse and the Midwife in her honor. Certainly, with the Covid-19 pandemic in full force, 2020 will be remembered as the Year of the Nurse for many years to come.

Ms Nightingale, who was born in Florence, Italy, was the founder of the modern nursing profession. Prior to her efforts, nursing was a volunteer activity, most often undertaken by untrained family members, soldiers, or members of religious orders. Ms Nightingale trained nurses during the Crimean War. She later founded the first secular nursing school and published many nursing textbooks.

In addition to advancing nursing in a clinical setting, Ms Nightingale was a social activist who advocated for more government spending on healthcare for the poor. She helped develop the field of public health nursing to reach patients who were poor and sick at home.

Finally, Ms Nightingale was an incredible statistician and a pioneer in data visualization. She kept thorough notes and documented which treatments worked and which did not, making it possible for others to replicate her results. She popularized a type of pie chart that she called a coxcomb (see image above) and is now known as a polar area chart. She was the first woman elected to the Royal Statistical Society and became an honorary member of the American Statistical Association.


Yesterday’s forecast (left) and today’s (right). Images from IHME

by George Taniwaki

The Covid-19 story moves very fast. Yesterday, I posted a blog entry with a chart showing that the Institute for Health Metrics and Evaluation (IHME) forecast 72,000 deaths in the U.S. by June 2020, with almost no new deaths between June and August 2020. The IHME forecast assumed that stay-at-home orders would remain in place until August (see chart above left).

Today, three interesting pieces of news were reported. First, the IHME abandoned its assumption that the population will stay at home and instead switched to using smartphone location data provided by mobile carriers to estimate population mobility. This boosted their estimate of deaths in August to 134,000, an increase of 62,000 (see chart above right).

Second, a group of data scientists led by the University of Sydney’s Centre for Translational Data Science reviewed the IHME forecasts and found that they underestimate the uncertainty associated with Covid-19 deaths: 70% of the state-level forecasts fell outside the 95% prediction interval (arXiv, May 2020). You should expect only 5% of outcomes to fall outside a 95% prediction interval.

Finally, in yesterday’s blog post I discussed the ensemble forecast of Covid-19 deaths created by the Centers for Disease Control and Prevention (CDC). Until last week, the IHME forecast was included in the ensemble. On Friday, the CDC dropped the IHME forecast from its ensemble and replaced it with forecasts from Imperial College.

The IHME forecast was lower than most other forecasts and had been a favorite of the Trump administration (Politico Apr 2020) and of the CDC (Medium Apr 2020).


Last week’s CDC ensemble forecast (left) included the IHME data; this week’s (right) does not. Image from CDC


Animated Covid-19 map, screenshot from Domo

by George Taniwaki

In order to make predictions about the future trajectory of the spread of Covid-19, you need to be able to make sense of the currently available data. There are several steps to getting good data.

Medical event data

First, you have to collect data from multiple sources, clean them, and aggregate them based on standard criteria. Each data record could include the following elements:

  1. Event (what was counted, e.g., tests administered, positive test results, negative results, hospital admissions, ICU status, ventilation status, discharges, recoveries, deaths, etc.)
  2. Location ID (where the event occurred, see below)
  3. Date of incidence (when the event occurred)
  4. Date of reporting (sometimes data is reported days or even months after the event and can be updated many times as errors are corrected or missing data is estimated)
  5. Value (a count)
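As a sketch, the five elements above could be represented like this in Python (the field names, enum values, and location ID scheme are my own choices for illustration, not from any particular dataset):

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class EventType(Enum):
    TEST_ADMINISTERED = "test_administered"
    POSITIVE_RESULT = "positive_result"
    HOSPITAL_ADMISSION = "hospital_admission"
    DEATH = "death"

@dataclass
class MedicalEvent:
    event: EventType         # 1. what was counted
    location_id: str         # 2. where the event occurred
    date_of_incidence: date  # 3. when the event occurred
    date_of_reporting: date  # 4. when the count was reported (may lag, may be revised)
    value: int               # 5. the count itself

record = MedicalEvent(EventType.DEATH, "US-WA-King",
                      date(2020, 3, 1), date(2020, 3, 3), 2)
```

Keeping the date of incidence separate from the date of reporting is what makes later revisions and back-filled counts manageable.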

The best repository of Covid-19 data is maintained by the New York Times (on GitHub) with an interactive viewer. Johns Hopkins University Coronavirus Resource Center also has a dataset. The best source for counts of tests in the U.S. is available from the Covid Tracking Project sponsored by the Atlantic.


One of several graphics available from the New York Times

Public policy change data

In addition to medical events, there are public policy events that can be tracked, such as government orders to close nonessential businesses, travel restrictions, and so forth. These records could include the following elements:

  1. Event (what type of public policy change was made)
  2. Location ID (where the change applies to, see below)
  3. Date of incidence (when the change was implemented)
  4. Date of reporting (when change was reported, usually before the change is implemented)

Unfortunately, I could not find a centralized source of information on government restrictions and the dates they became effective. A different source of information that can help indicate how much contact there is between people is the amount of movement by people who carry smartphones. Smartphones contain a GPS receiver and can report their position, which can be used to infer what type of activity the person is engaging in. Google Health has a community mobility report that is updated regularly. An example report is shown below, and the data is available for download in .csv format.
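Working with that .csv is straightforward with pandas; a minimal sketch using a few rows shaped like Google’s mobility file (the column names follow the published file, but the values here are invented):

```python
import io
import pandas as pd

# A few rows shaped like Google's community mobility CSV
# (column names follow the published file; the values are made up)
sample = io.StringIO("""\
country_region,sub_region_1,date,retail_and_recreation_percent_change_from_baseline
United States,Washington,2020-04-01,-55
United States,Washington,2020-04-02,-58
United States,Oregon,2020-04-01,-49
""")

df = pd.read_csv(sample, parse_dates=["date"])

# Average mobility change for one state
wa = df[df["sub_region_1"] == "Washington"]
print(wa["retail_and_recreation_percent_change_from_baseline"].mean())  # -56.5
```

In practice you would point `read_csv` at the downloaded file instead of the inline sample.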


Among those who own Android smartphones and participate in tracking, trips have declined. Screenshot from Google Health

Demographic and geographic data

To analyze the data, you will want to append demographic and geographic data about the locations. Unlike events, demographic and geographic data change slowly, so they only need to be collected once during the model building process. The following data elements could be useful in preparing a forecast model:

  1. Location ID (from above)
  2. Name or description
  3. Location hierarchy (continent > country > region > state > county > city > zip code, etc.)
  4. Latitude and longitude of centroid
  5. Latitude and longitude of center of largest city
  6. Surface area (km2)
  7. Total population
  8. Age distribution
  9. Gender distribution
  10. Income distribution
  11. Race distribution
  12. Political party affiliation distribution
  13. Health insurance coverage distribution
  14. Comorbidity distribution (smoking, diabetes, etc.)
  15. Number of hospitals
  16. Number of hospital beds
  17. Number of ICU beds
  18. Number of ventilators

Some good sources for this type of data are US Census, United Nations Demographic Year Book, United Nations Development Programme’s (UNDP) Human Development Report and the World Bank’s World Development Report, Gapminder, and ESRI.
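Joining the event records to these slowly changing location attributes is a standard keyed merge on the location ID; a sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical event counts keyed by location ID
events = pd.DataFrame({
    "location_id": ["US-WA", "US-OR", "US-WA"],
    "event": ["death", "death", "death"],
    "value": [12, 3, 15],
})

# Slowly changing location attributes, collected once
demographics = pd.DataFrame({
    "location_id": ["US-WA", "US-OR"],
    "population": [7_600_000, 4_200_000],
})

# Left merge keeps every event even if a location is missing attributes
merged = events.merge(demographics, on="location_id", how="left")
merged["per_100k"] = merged["value"] / merged["population"] * 100_000
```

Normalizing counts per 100,000 population, as in the last line, is what makes locations of very different sizes comparable.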

Visualize the data

Once the data is aggregated, there are many ways to visualize it. Maps are an obvious way to display location data. Line charts are an obvious way to display time series data. Domo, a developer of business intelligence software, has a very nice animation that displays time series data on a map (screenshot at top of blog).

Two caveats about their display. First, the number of cases is underreported because testing for infection was not widespread early in the pandemic, and is still too low today.

Second, outside the U.S. the data is reported by country, not by state or other smaller region. A single marker is used to represent the location of events. This is probably fine for Europe or Africa, where countries tend to be small. However, it is misleading for larger countries like Canada, Russia, China, Indonesia, Australia, and Brazil. Even data for a state like California is distorted, because one would expect separate markers for the Bay Area and the LA Basin instead of a single one in the middle of the state.

Johns Hopkins Center for Systems Science and Engineering has produced a nice dashboard hosted on ArcGIS (screenshot below). It does a better job of dividing large countries into smaller geographic partitions, but the colors are dark. A description of the project was published in Lancet Infect Dis (Feb 2020) and in a press release (Jan 2020). All of the data and the dashboard are available in a GitHub repository.


Another example of a Covid-19 map. Screenshot from ArcGIS

A note about line charts. You often see Covid-19 growth charts by country that display time (either calendar date, or days since the nth event occurred) on the horizontal axis and count on the vertical axis. Both are scaled linearly. I find these charts hard to interpret and compare. I think a better way to display growth data is to display data on the vertical axis using logarithm of counts per 100,000 population and on the horizontal axis using days since the n*(population/100,000)th event occurred. Even better would be to divide large countries into smaller regions so that all the charts covered regions with similar populations.
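The rescaling described above can be sketched in a few lines (the population and death series here are invented for illustration):

```python
import numpy as np

population = 7_600_000                                        # hypothetical region
cumulative_deaths = np.array([0, 1, 3, 8, 20, 47, 90, 160])   # made-up daily series

# Vertical axis: log of counts per 100,000 population
per_100k = cumulative_deaths / population * 100_000

# Horizontal axis: days since the n*(population/100,000)th event,
# with n = 1, i.e. since counts first reached 1 per 100,000 residents
start = int(np.argmax(per_100k >= 1.0))
aligned_days = np.arange(len(per_100k) - start)
log_counts = np.log10(per_100k[start:])
```

Plotting `log_counts` against `aligned_days` puts regions with very different populations and outbreak start dates on directly comparable axes.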

Making Forecasts

There are many groups making forecasts of Covid-19 infection rates and death rates. The CDC has a summary of them along with its own ensemble forecast, which predicts under 100,000 deaths in the U.S. by the end of May. The Institute for Health Metrics and Evaluation (IHME) predicts about 72,000 total deaths by the end of May, with a range from 60,000 to 115,000. You can download the data from the Global Health Data Exchange.

In addition to forecasting deaths, the IHME forecasts hospital utilization. These forecasts are used by hospitals to schedule resources and plan for peak usage.


Individual forecasts of cumulative reported deaths in U.S. from Covid-19 (left) and CDC ensemble forecast (right). Image from CDC


Cumulative death forecast in U.S. Image from IHME.

One of the best forecasts I have seen was produced by the Economist. It synthesizes data from US Census, New York Times, Covid Tracking Project, IHME, Google Health, and Unacast. The choropleth map of the U.S. below shows risk factors for Covid-19 mortality at the county level. Green shows areas where the risk level is low (less than 1%) and red shows high (6% or above).


Dixie in the crosshairs. Image from Economist

* * * *

Update1: In just one day, the IHME forecast is obsolete. See my response at

Update2: Added links to the New York Times dataset and interactive viewer


Track this. Photo from Bloomberg Businessweek by Karen Ducey/Getty Images

by George Taniwaki

In a Bloomberg Businessweek editorial (Apr 2020), Cathy O’Neil (mathbabe) explains why a Covid-19 tracking app won’t work. It’s all about self-selection bias.

* * * *

Update: For a good non-technical description of how the Apple and Google contact tracing API works, including the encryption method, see Economist, Apr 2020. The article also suggests that even though using an app for contact tracing is imperfect, its low cost and passive nature make it worthwhile.


Can a partially effective vaccine flatten the curve?

by George Taniwaki

During this Covid-19 pandemic, we want to know when we can stop sheltering at home and go back into public spaces again. Further, we want to know which actions can speed up the time before that can happen.

One thing we do know is that when dealing with a novel disease (one that no human appears to have immunity to), the entire population cannot go back to pre-epidemic behavior at the same time before it is safe. Doing so will cause a spike in infections and deaths, which will terrorize the population, leading to another round of isolation. If the public loses faith that the government knows when it is safe to change behavior, then people will still be afraid even when it finally is safe, and time will be lost during the recovery, causing additional economic hardship.

So when can we go back to normal? I think that can happen only after herd immunity is achieved. This can take a very long time as a trickle of individuals become infected and either recover with resistance or die, a process called flattening the curve. Or it can happen pretty quickly after the widespread inoculation of individuals with a safe and effective vaccine.

An effective vaccine may take 18 to 24 months to develop. Many people, including President Donald Trump, think staying home this long is unrealistic. Is it possible to shorten that time by releasing a partially effective vaccine sooner? Doing so may help flatten the curve without requiring social distancing.

Partially effective vaccines

An intriguing paper by Eduard Talamàs & Rakesh Vohra, entitled “Free and perfectly safe but only partially effective vaccines can harm everyone” pretty much contains the answer in its title.

The idea is that a partially effective vaccine will cause people to change their behavior too much, too soon, causing the spike we want to avoid. The conclusion is similar to the analysis popularized by Sam Peltzman of the Univ. of Chicago (a microeconomics professor while I was a student there), who suggested that stricter automobile safety regulations could lead to increased deaths (of pedestrians) as drivers felt safer and became more reckless (J Polit Econ, Aug 1975).

The most important conclusion in Talamàs and Vohra is that with overlapping social networks, even those who do not increase the size of their networks after the introduction of the vaccine can be harmed by those who do. This conclusion differs slightly from those of most epidemiological models, which assume random contact between individuals rather than strategic networks. A good description of the paper is given by one of the authors, Vohra, at The Leisure of the Theory Class (Apr 2020).


Not how the crow flies to get to work each day

by George Taniwaki

I really don’t like to drive to work. I’ll do almost anything to avoid a long commute. For most of my adult life I have either walked or ridden a bus to get to work. Yes, it’s possible. I always chose an apartment to rent or house to buy based on how close it is to my job. And once I’ve found a place to live, I usually reject a new job unless it’s within walking or riding distance. It helps that I like living in big cities.

But I just started a contract assignment in Mountlake Terrace, a suburb of Seattle about 20 miles from where I live as the crow flies (see image above, or not). This isn’t the longest commute in my life, but the first long one in a few decades. And it’s the first long commute where I’m driving alone rather than in a carpool.

Google Maps, initial attempt

The traffic in Seattle is awful. To reduce my commute time, I’ve decided that starting my drive at 6:30 and working from 7:30 to 4:00 will help. Before my first day at the job site, I pull out Google Maps and plot my route (see Fig 1).


Figure 1. My first route to the office, average 42 minutes every morning

For the analysis in this blog post, I split my drive into segments based on type of driving. Segment A consists of surface streets from my house to the highway. B is 17 miles at highway speed driving north, away from the city, C is a slow slog where I double-back and join the commuters coming into town, and D is the final, short segment of surface streets to the office.

The table below shows details of my commute. Most of it is tolerable. But notice that segment C (the red zone) constitutes less than one-sixth of my commute distance but over one-third of my commute time.

Map | Description
A | Surface streets from home to SR520
B | SR520 to I-405 to I-5
C | I-5 to Mountlake Terrace
D | Surface streets from I-5 to office

Google Maps, redux

After a couple weeks of following this route, I’ve learned which lanes to use on which segments to slice a few minutes off my commute. But I think I can still do better. I check Google Maps for some alternatives. This time it gives me a completely different, and unexpected, route. It tells me to go a few miles out of my way south to I-90 and drive north through the city on I-5 (see Fig 2).

I-5 in downtown Seattle is one of the most congested highways in the U.S. I get queasy every time I drive it worrying about getting stuck. But maybe it’s not so bad at 6:45 AM, which is about what time I will get there. So I trust Google Maps and try it.


Figure 2. Google Maps’ new suggestion, average time 44 minutes

It works. I try the route on two consecutive days. The table below shows the average results. There is congestion on I-5 between I-90 and Olive Way (segment C, the red zone), but it is a shorter segment than on my previous commute. However, the route is longer, so it doesn’t save much time. Further, I don’t like this route because it limits my options. If there is an accident or other delay, I will be stuck in traffic with no easy way to avoid it.

Map | Description
A | Surface streets from home to I-90
B | I-90 to downtown
C | I-5 to Olive Way
D | I-5 to Mountlake Terrace
E | Surface streets from I-5 to office

Waze to the rescue

Waze is a GPS navigation app that was originally developed in Israel but quickly went global. It combines traditional digital map data with real-time data from users, including speed, route, reports of traffic jams, accidents, police speed traps, and gasoline prices at nearby stations. Thus, the more people who use it, the more accurate it becomes.

Waze also shows you the current toll price (Seattle uses variable toll pricing) and lets you avoid tolls, ferries, or highways, if desired, when choosing a route.

Google (now Alphabet) acquired Waze in 2013 but it remains a separate entity from Google Maps. Because Waze collects potentially personally identifiable information (PII), it has a less restrictive user agreement than Google Maps and warns users of that fact. (Though most people never read the agreement and just click “I accept”.)

Generating a route is a highly resource-intensive calculation that often involves machine learning. To simplify the work, Google Maps generally limits routes to major arterial streets. Waze combines those calculations with the actual routes users are taking to find the minimum travel time. Thus, Waze often creates routes that run through residential neighborhoods. Of course, the neighbors sometimes complain or even fight back by generating fake route data (Wash Post, Jun 2016).
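At its core, route generation is a shortest-path search over a weighted road graph, with edge weights derived from travel times; a service like Waze, in effect, supplies better edge weights. A minimal Dijkstra sketch on a made-up network:

```python
import heapq

def shortest_time(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> [(neighbor, minutes), ...]."""
    queue = [(0, start)]
    best = {start: 0}
    while queue:
        minutes, node = heapq.heappop(queue)
        if node == goal:
            return minutes
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph.get(node, []):
            new_time = minutes + cost
            if new_time < best.get(neighbor, float("inf")):
                best[neighbor] = new_time
                heapq.heappush(queue, (new_time, neighbor))
    return float("inf")

# Toy network: congestion on I-5 makes the surface-street detour faster
roads = {
    "home": [("I-405 offramp", 25)],
    "I-405 offramp": [("I-5", 5), ("surface streets", 7)],
    "I-5": [("office", 15)],              # congested segment
    "surface streets": [("office", 6)],
}
print(shortest_time(roads, "home", "office"))  # 38
```

With static weights the I-5 leg would win; updating the weights from live traffic is what flips the answer to the detour.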

Figure 3 below shows the route Waze recommends for my commute. It looks just like the original route that Google Maps suggested, except for the last segment. I still take the I-5 cloverleaf, but instead of continuing onto I-5, it has me veer right and use side streets to get to the office.


Figure 3. My new favorite route to work

Map | Description
A | Surface streets from home to SR520
B | SR520 to I-405
C | Surface streets from I-405 to office

The best part is that I can see I-5 from the I-405 off-ramp. When traffic is light (speed is 30 mph or more), I can veer to the left and take I-5 to the office. When I-5 is congested, I can veer to the right and take surface streets. While on the surface streets, I can continue to see I-5 and confirm whether I made the right decision and improve my choice for future days.

Waze leads me astraze

With my success using Waze in the morning, I decide to use it for my evening commute home as well. As I turn onto I-5, Waze tells me there is roadkill ahead. I wonder where. Then suddenly I see a raccoon and am jolted by the thump. It saddens me to know that I’ve squashed an innocent animal under my tires, even if it was already dead.

The rest of the commute home is uneventful until Waze tells me to exit I-405 at NE 85th St in Kirkland (Fig 4b), 4 miles before my usual exit at SR520 (Fig 4a). Gee, that seems like a bad idea. Should I ignore Waze and keep going straight? Or should I take the exit? Maybe there is an accident on my regular route. Or maybe the crowd of Waze users knows a sneak route. Well, Waze has been pretty accurate so far, so I take the exit.

Ugh, what a mistake. Driving east on NE 85th St takes me straight into a huge traffic jam on Redmond Way. Also, there is a giant construction project on the Microsoft campus, so West Lake Sammamish Pkwy is overflowing with drivers avoiding lane closures on 156th Av NE. My commute today is more than 35 minutes longer than usual. I won’t do that again.


Figures 4a, b. My normal commute home, Waze suggestion for 11/21/2019


Both Waze and Google Maps show you unexpected options and are likely to give better routes than you could find on your own. Overall, my experience with Waze was better than with Google Maps, but both could use improvement.

* * * *

All this talk about commute time has me remembering a brain teaser from my childhood. Let’s say I want my average commute speed to be 40 mph. One day, I get stuck in traffic and cover the first half of the distance to work at an average speed of 20 mph. How fast do I have to drive on the second half to meet my goal? Hint: The answer is not 60 mph or even 80 mph.
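In case the hint isn’t enough, the arithmetic works out the same for any distance:

```python
distance = 10.0           # total commute in miles; any value gives the same answer
goal_avg = 40.0           # target average speed, mph
first_half_speed = 20.0   # mph

time_budget = distance / goal_avg               # total time allowed for the trip
time_spent = (distance / 2) / first_half_speed  # time already used on the first half

time_left = time_budget - time_spent            # 0.0 hours remain
# The entire time budget is gone before the second half begins, so no
# finite speed on the second half can raise the average to 40 mph.
```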


by George Taniwaki

This post describes the final six classes I took to obtain the Microsoft Artificial Intelligence Certificate. Four were required and two were optional. For a description of the first six classes I took, see this May 2019 blog post.

DAT236x – Deep Learning Explained

Deep learning is machine learning using multi-layer neural networks, typically trained on large datasets. It is used in fields such as computer vision, speech recognition, and language processing (topics covered in more detail in later classes). Techniques include logistic regression, multilayer perceptrons, convolutional neural networks, recurrent neural networks, and long short-term memory.
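As a taste of the first technique on that list, here is a from-scratch logistic regression trained by gradient descent (my own toy example, not course code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification task: label is 1 when x1 + x2 > 1
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5

for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid activation
    w -= lr * (X.T @ (p - y)) / len(y)       # gradient of mean cross-entropy
    b -= lr * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = float(np.mean((p > 0.5) == (y == 1)))
```

A multilayer perceptron is, roughly, this same update applied layer by layer via backpropagation.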

Time: 10 hours on 6 modules

Score: Missed 4 homework questions and 2 knowledge check questions for a score of 92%

DAT236x Score  DAT236x Certificate

DAT257x – Reinforcement Learning Explained

Reinforcement learning assumes a problem can be modeled as a Markov decision process. There is a set of discrete states (S), an agent that can perform a set of actions (A), and a set of decision policies (Π) for selecting actions. Each possible action results in a reward (R) and a new state (S’). The goal is to find the optimum policy π(s) for all s in S, or to determine if a given policy is optimal.

Solutions to the reinforcement learning problem include use of multi-arm bandits, regret minimization, dynamic programming (Bellman equation), policy evaluation and optimization, linear function approximation, deep neural networks, and deep Q-learning. Advanced topics include likelihood ratio methods, variance reduction, and actor-critic.
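A minimal tabular Q-learning sketch on a toy chain MDP makes the Bellman-style update concrete (illustrative only, not course material):

```python
import random

random.seed(0)

# Chain of 5 states; action 0 moves left, action 1 moves right.
# Reaching state 4 ends the episode with reward 1.
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(1000):
    state = 0
    while state != 4:
        if random.random() < epsilon:                            # explore
            action = random.randrange(n_actions)
        else:                                                    # exploit
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == 4 else 0.0
        # Q-learning update toward r + gamma * max_a' Q(s', a')
        Q[state][action] += alpha * (
            reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

# The learned greedy policy moves right from every non-terminal state
policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(4)]
```

Deep Q-learning replaces the table `Q` with a neural network, which is where the course's later modules pick up.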

I have an undergraduate degree in chemical engineering where I learned about control theory and Markov chains. However, that coursework only covered analog PID controllers. The topics in this class were new to me and so it was slow going.

Time: 16 hours for 10 modules

Score: Missed 7 knowledge check questions early on, then slowed down and got the rest right. Missed 3 lab questions. Final score of 91%

DAT257x Score  DAT257x Certificate

DEV287x – Speech Recognition Systems

Speech recognition is an interdisciplinary activity that combines signal processing, acoustics, linguistics, and domain knowledge with computer science. The topics covered in this course include:

  1. Fundamental theory – Phonetics, words and syntax, performance metrics
  2. Speech signal processing – Feature extraction, mel filtering, log compression, feature normalization
  3. Acoustic modeling – Markov chains, feedforward deep neural networks, sequence based objective function
  4. Language modeling – N-gram models, language model evaluation (likelihood, entropy, perplexity), LM operations (n-gram pruning, interpolating probabilities, merging), class-based LMs, neural network LMs
  5. Speech decoding – Weighted finite state transducers, WFST and acceptors, graph composition
  6. Advanced topics – Improved objective functions, sequential objective function, sequence discriminative objective functions
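To make the LM evaluation metrics in item 4 concrete, here is a tiny bigram model with perplexity computed on a held-out sentence (my own toy corpus, with add-one smoothing):

```python
import math
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab_size = len(unigrams)

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing so unseen bigrams get nonzero probability
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

test_sentence = "the cat sat".split()
log2_prob = sum(math.log2(bigram_prob(a, b))
                for a, b in zip(test_sentence, test_sentence[1:]))

# Perplexity: 2 to the average negative log2 probability per bigram
perplexity = 2 ** (-log2_prob / (len(test_sentence) - 1))
```

Lower perplexity means the model finds the held-out text less surprising; real LMs differ mainly in scale and smoothing method, not in this arithmetic.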

This class was pretty awful and I’m glad I didn’t pay for it. It consists mostly of text displayed in the edX courseware. It would have been helpful if video or audio lectures had been included to show voice recognition in action. The text itself was split across multiple web pages containing embedded MathML equations, making it unsearchable. I ended up copying and pasting all the text into a Word document.

The lab assignments in this class are provided as Python files designed for Linux. Some labs require a Linux shell and would not run in Visual Studio on Windows. I would expect instructors in a Microsoft sponsored course to design lessons that could run on Windows. Simply putting the code in Jupyter notebooks would have made it easier to read and to work with.

Some labs require a Python package called OpenFST that does not compile with the latest build tools available from Microsoft. Again, I would expect instructors to design lessons that could run on Windows.

Time: 6 hours for 6 modules

Score: None, I did not take this class for credit

DEV287x Score

DEV288x – Natural Language Processing (NLP)

Natural language processing consists of many separate but related tasks. These include transcription, translation, conversation, and image captioning.

Machine translation has evolved from conventional statistical machine translation (SMT), which uses hand-coded phrase pairs, to neural machine translation, which uses deep neural networks to create end-to-end sequence probabilities and translate entire sentences at a time.

The deep semantic similarity model (DSSM) is a DNN model for representing text strings as vectors. It can be used for information retrieval and entity ranking tasks.
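The retrieval and ranking step works on those vectors with cosine similarity; for example, given hypothetical embeddings (the 4-dimensional vectors below are invented, real DSSM embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings from a trained DSSM-style model
query = np.array([0.9, 0.1, 0.0, 0.4])
documents = {
    "doc_a": np.array([0.8, 0.2, 0.1, 0.5]),
    "doc_b": np.array([0.0, 0.9, 0.8, 0.1]),
}

# Rank documents by similarity to the query
ranked = sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```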

Natural language understanding requires spoken language processing, continuous word representations, neural knowledgebase embedding, and KB-based question answering. NLP can be enhanced using deep reinforcement learning.

Finally, image captioning requires multimodal intelligence, combining image recognition and assigning labels to images in a natural language format.

Time: 10 hours for 6 modules

Score: None, I did not take this class for credit

DEV288x Score

DEV290x – Computer Vision and Image Analysis

This course is an excellent overview of the state-of-the-art in computer vision. It starts with a description of classical methods including thresholding, clustering, region growing, template matching, and feature detection (Sobel edges and Harris corners).

Next it covers object classification and detection algorithms such as Viola-Jones, histogram of oriented gradients (HOG), deep learning, extending classifiers into detectors, object proposals, and introduces convolutional neural networks (CNN).
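The core operation in a CNN is the 2-D convolution; a from-scratch sketch with a horizontal-difference kernel (similar in spirit to the Sobel edge detector mentioned above):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (deep-learning convention, no kernel flip)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a sharp vertical edge down the middle
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A simple difference kernel that responds at vertical edges
kernel = np.array([[-1.0, 1.0]])

response = convolve2d(image, kernel)
# response is nonzero only in the column where the edge sits
```

In a CNN the kernel values are not hand-chosen like this but learned from data, with many kernels per layer.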

Finally, the course introduces advanced topics such as super-pixels and conditional random fields, deep segmentation, and transfer learning.

Time: 10 hours for 20 modules

Score: Missed 4 quiz questions and 1 final exam question for a score of 94%

DEV290x Score  DEV290x Certificate

DAT264x – Microsoft Professional Capstone: Artificial Intelligence

This is the last class for the certificate. Similar to the capstone for the Microsoft Data Science certificate, it is a month-long project designed as a contest. Unlike the capstone class for the Data Science certificate, there is no report; the grade is based solely on the contest score. For a description of the April 2019 contest, see this [future date] blog post.

I used Microsoft’s Cognitive Toolkit (CNTK) package for Python for my solution. I had a hard time debugging my code. CNTK is not as popular as Google’s TensorFlow, so searching error messages on the web gives few results.

Time: 30 hours for single assignment

Score: Log-loss error of 0.22 for a final score of 95%
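For reference, the log-loss metric reported above can be computed like this (binary form shown, with toy predictions rather than my actual submission; the contest may have scored a multiclass variant):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.1, 0.8, 0.7]
print(round(log_loss(y_true, y_pred), 4))  # 0.1976
```

Confident correct predictions drive the loss toward 0, while confident wrong ones are punished heavily, which is why clipping the probabilities matters.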

DAT264x Score  DAT264x Certificate

Final Certificate

Below is my certificate of completion for the Microsoft Professional Program, Artificial Intelligence Certificate.


* * * *

As an aside, starting in January 2019, edX changed the way it handles students who audit courses. To encourage more students to pay for its courses, edX now limits access to course content to 30 days after enrollment. After 30 days, you lose access, even if you have posted items on the discussion board. Further, it has eliminated access to the assessment content (quizzes, labs, and exams) entirely unless you pay. This sucks.

I hope Microsoft will provide funding to edX to allow audit students to participate. Or Microsoft should stop working with edX and move its content to another MOOC platform that supports audit students. I’ve paid over $2000 to participate in the edX courses. But I always audit a course before paying for the content. I think the try-before-you-buy model is essential to get students to trust they will get their money’s worth. Preventing audit students from seeing the assessment content will make it difficult to gain their trust in the value of that content.