by George Taniwaki

Did you watch the debate on Monday night? I did. But I am also very interested in the post-debate media coverage and analysis. This morning, two articles that combine big data and the debate caught my eye. Both are novel and much more interesting than the tired stories that simply show changes in polls after a debate.

First, the New York Times reports that during the presidential debate (between 9:00 and 10:30 PM EDT) there was a high correlation between the Betfair prediction market for who will win the presidential election and after-hours S&P 500 futures prices (see chart 1).


Chart 1. Betfair prediction market for Mrs. Clinton compared to S&P 500 futures. Courtesy of New York Times

Correlation between markets is not a new phenomenon. For several decades financial analysts have measured the covariance between commodity prices, especially crude oil, and equity indices. But this is the first time I have seen an article illustrating the covariance between a “fun” market for guessing who will become president and a “real” market. Check out the two graphs above; the similarity in shape is striking, including the fact that both continue to rise for about an hour after the debate ended.

In real time, while the debate was being broadcast, players on Betfair came to believe the chance that Mrs. Clinton would win the election had risen by 5 percentage points. Meanwhile, the price of S&P 500 futures rose by 0.6%, meaning investors (who may be the same speculators who play on Betfair) believed stock prices in November were likely to be higher than they had expected before the debate started. There was no other surprising economic news that evening, so the debate is the most likely explanation for the surge. Pretty cool.

If the two markets were perfectly correlated (they aren’t) and markets were perfectly efficient (they aren’t), then one could estimate the difference in equity market value between the two candidates. If a 5% decrease in the likelihood of a Trump win translates to a 0.6% increase in equity futures values, then the difference between Mr. Trump and Mrs. Clinton being elected (a 100% change in probability) results in about a 12%, or $1.2 trillion (the total market cap of the S&P 500 is about $10 trillion), change in market value. (Note that I assume perfect correlation between the S&P 500 futures market and the actual market for the stocks used to calculate the index.)

Further, nearly all capital assets (stocks, bonds, commodities, real estate) in the US are now highly correlated. So the total difference is about $24 trillion (assuming total assets in the US are $200 trillion). Ironically, this probably means Donald Trump would be financially better off if he were to lose the election.
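The back-of-the-envelope arithmetic in the last two paragraphs can be sketched in a few lines (a rough sketch using the round numbers above; the perfect-correlation and perfect-efficiency caveats still apply):

```python
# All inputs are the article's round-number assumptions.
betfair_move = 0.05        # 5-point rise in Clinton's win probability
futures_move = 0.006       # 0.6% rise in S&P 500 futures
sp500_cap = 10e12          # ~$10 trillion S&P 500 market cap
us_assets = 200e12         # ~$200 trillion total US capital assets

# Scale the 5-point move up to a 100-point (certain) swing in the outcome.
implied_equity_move = futures_move * (1.0 / betfair_move)    # 12%
equity_value_diff = implied_equity_move * sp500_cap          # $1.2 trillion

# If all US capital assets move together, apply the same 12% swing.
total_value_diff = implied_equity_move * us_assets           # $24 trillion

print(f"Implied move: {implied_equity_move:.0%}")
print(f"S&P 500 value difference: ${equity_value_diff / 1e12:.1f} trillion")
print(f"Total asset difference: ${total_value_diff / 1e12:.0f} trillion")
```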


The other article that caught my eye involves Google Trend data. According to the Washington Post, the phrase “registrarse para votar” was the third highest trending search term the day after the debate was broadcast. The number of searches is about four times higher than in the days prior to the debates (see chart 2). Notice the spike in searches matches a spike in Sep 2012 after the first Obama-Romney debate.

The article says that it is not clear if it was the debate itself that caused the increase or the fact that Google recently introduced Spanish-language voting guides to its automated Knowledge Box, which presumably led to more searches for “registrarse para votar”. (This is the problem with confounding events.)

After a bit of research, I discovered an even more interesting fact. The spike in searches did not stop on Sep 27. Today, on Sep 30, four days after the debates, the volume of searches is 10 times higher than on Sep 27, or a total of 40x higher than before the debate (see chart 3). The two charts are scaled to make the data comparable.


Chart 2. Searches for “registrarse para votar” past 5 years to Sep 27. Courtesy of Washington Post and Google Trends


Chart 3. Searches for “registrarse para votar” past 5 years to Sep 30. Courtesy of Google Trends

I wanted to see if the spike was due to the debate or due to the addition of Spanish voter information to the Knowledge Box. To do this, I compared “registrarse para votar” to “register to vote”. The red line in chart 4 shows Google Trend data for “register to vote” scaled so that the bump in Sept 2012 is the same height as in the charts above. I’d say the debate really had an unprecedented effect on interest in voting and the effect was probably bigger for Spanish speaking web users.
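The scaling used for chart 4 is simple to sketch. Google Trends normalizes each query to its own 0-100 scale, so before two queries can be compared, one must be rescaled against a common reference point, here the Sep 2012 bump. The index values below are invented for illustration:

```python
# Hypothetical Google Trends index values for two queries (each query
# is independently normalized to 0-100 by Trends).
registrarse = {"2012-09": 25, "2016-09": 100}
register    = {"2012-09": 40, "2016-09": 55}

# Rescale "register to vote" so its Sep 2012 bump matches the
# "registrarse para votar" bump, making the two series comparable.
factor = registrarse["2012-09"] / register["2012-09"]
register_scaled = {month: v * factor for month, v in register.items()}
```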


Chart 4. Searches for “register to vote” past 5 years to Sep 30. Courtesy of Google Trends

Finally, I wanted to see how the search requests were distributed geographically. The key here is that most Hispanic communities vote Democratic and many states with a large Hispanic population are already blue (such as California, Washington, New Mexico, New Jersey, and New York). The exception is Florida with a large population of Cuban immigrants who tend to vote Republican.


Chart 5. Searches for “registrarse para votar” past 5 years to Sep 30 by county. Courtesy of Google Trends

If you are a supporter of Democrats like Mrs. Clinton, the good news is that a large number of queries are coming from Arizona and Texas, two states where changes in demographics are slowly turning voting preferences from red to blue.

In Florida, it is not clear which candidate gains from an increased number of Spanish-speaking voters. However, since the increase is a result of the debate (during which it was revealed that Mr. Trump had insulted and berated a beauty pageant winner from Venezuela, calling her “miss housekeeping”), I will speculate that many newly registered voters will be Clinton supporters.

If the Google search trend continues, it may be driven by news reports that Mr. Trump may have violated the US sanctions forbidding business transactions in Cuba. Cuban-Americans searching for information on voter registration after hearing this story are more likely to favor Mrs. Clinton.

Business Week

Ranking 8 cities by total growth. Image from Bloomberg BusinessWeek

by George Taniwaki

The graphic above from Bloomberg BusinessWeek Mar 2013 lists the four metro areas with the greatest economic growth over the five-year period 2007-2011, along with their population change during the same period. It also lists the four cities that had both negative population growth and negative GDP growth over the period.

This chart is a bit light on data, containing only 16 data points. And the changes to population and GDP are not directly comparable since the population change is reported cumulatively for the four years (total number of years minus one) while GDP is annualized. Let’s calculate the cumulative GDP change as follows:

Total change GDP = (1 + Annual change GDP)^Years – 1.
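In code, the conversion looks like this (a trivial sketch; the 5% example value is arbitrary):

```python
# Convert an annualized growth rate, compounded over a number of
# years, into a total (cumulative) change.
def total_change(annual_change, years):
    return (1 + annual_change) ** years - 1

# e.g. 5% annual growth over 4 years is about 21.6% total
print(f"{total_change(0.05, 4):.1%}")
```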

Also, notice that the data has differing numbers of significant digits. The annualized GDP changes are displayed with two digits. The population changes show one, except for Chicago and Providence which have several. I’m sure this was done to show that the populations of these two cities were falling rather than flat. Let’s get rid of those extra digits.

This chart ranks the best and worst performing metro areas. One could reasonably argue that the metro areas with the greatest absolute GDP growth are the best. (I will argue otherwise shortly.) But should the worst performing areas be defined as the four that had both declining population and declining GDP? For a counterexample, consider a city where the population is growing but GDP is falling. I would say it is actually in worse shape, based on the negative value of its per capita GDP growth. In fact, any city where the population is growing faster than GDP (or shrinking more slowly than GDP) would have negative per capita GDP growth. Perhaps GDP per capita is a better measure of performance than total GDP change.

To address this, let’s calculate the change in GDP per capita as follows:

Change GDP per capita = ((Change in GDP + 1) / (Change in pop. +1)) –1.
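As a quick check, here is the same formula applied to one row of the table below (New Orleans):

```python
# Per capita change: growth in GDP relative to growth in population,
# both expressed as fractional changes.
def per_capita_change(gdp_change, pop_change):
    return (1 + gdp_change) / (1 + pop_change) - 1

# New Orleans from the table: +8% GDP on +16% population
print(f"{per_capita_change(0.08, 0.16):.0%}")   # a roughly -7% per capita change
```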

The normalized data from the chart is summarized in the table below.

Metro area       Total change   Total change   Total change in
                 in pop         in GDP         GDP per capita
Portland           4%             22%             18%
San Jose           3%             19%             15%
Austin            12%             14%              2%
New Orleans       16%              8%             -7%
Detroit           -4%            -19%            -16%
Cleveland         -1%             -5%             -4%
Chicago           -0%             -2%             -2%
Providence        -0%             -1%             -1%


Notice now that New Orleans, despite having very high GDP growth, has large negative per capita GDP growth because its population is growing faster than its GDP. Austin’s performance now looks less impressive too. And while Cleveland, Chicago, and Providence all have negative per capita GDP growth, they are not doing as badly as it first appears.

Even when normalized, the data in the table is still lacking context. It doesn’t give the reader a feel for the big picture. For instance, how many metropolitan areas over 1 million are there in the U.S.?  What is the average population change and GDP change among those cities? Which cities had the greatest change, either positive or negative, in population, GDP, and per capita GDP?

Continuing that analysis, we would want to know if most cities were growing near the average rate or if there is a large dispersion. What is the shape of this dispersion? Are there geographic location, city size, or other factors that correlate with growth? Finally, are there time series trends? To answer these questions we need to go back to the source data and create our own charts.

Creating the metro population and GDP dataset

The footnote to the Bloomberg BusinessWeek chart says the data is from the Bureau of Economic Analysis and the Census Bureau. The BEA GDP data is available from an interactive website. I selected Table = GDP by metro area, Industry = All industry total, Area = All MSAs, Measures = Levels, and Year = 2007 to 2011.

The Census Bureau population estimates for the metropolitan statistical areas (MSAs) are available for download from the Census website. I downloaded the historical annual estimates for 2000 to 2009 and the current annual estimates covering 2010 to 2012. I merged the three data sets, keyed on the core-based statistical area (CBSA) code.

Note that several of the CBSAs changed in 2010, meaning the code changed too. The most significant is that Los Angeles-Long Beach-Santa Ana, CA (31100) changed to Los Angeles-Long Beach-Anaheim, CA (31080).
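A sketch of the merge, using plain Python dicts keyed on CBSA code (the field names and values are invented for illustration; the actual BEA and Census file layouts differ):

```python
# Three data sets keyed on CBSA code. Values are illustrative only.
pop_2000s = {31100: {"pop_2007": 12_873_000}}       # old Los Angeles code
pop_2010s = {31080: {"pop_2011": 12_944_000}}
gdp       = {31080: {"gdp_2007": 738_000, "gdp_2011": 748_000}}

# Recode CBSAs whose codes changed in 2010, e.g. Los Angeles 31100 -> 31080.
recode = {31100: 31080}
pop_2000s = {recode.get(k, k): v for k, v in pop_2000s.items()}

# Merge the three data sets on the (recoded) CBSA key.
merged = {}
for src in (pop_2000s, pop_2010s, gdp):
    for cbsa, fields in src.items():
        merged.setdefault(cbsa, {}).update(fields)
```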

In addition to the MSA records, I created two additional records. One contains the total population and GDP for all MSAs and the other for MSAs with population greater than one million.

Since the geographic names of the MSAs are often quite long, I want to find shorter labels that I can use on a scatterplot. I decide to use airport codes. These are short, unique, cover any big city with an airport worldwide, and if you travel a lot, you’ve possibly memorized quite a few, so you don’t need a legend to decode them. I append this to each record.

Finally, I calculate the following descriptive statistics for each MSA and append them to the records:

Change in population = (Pop on Jul 2011 / Pop on Jul 2007) – 1

Change in GDP = (GDP for 2011 / GDP for 2007) – 1

GDP per capita for year 20xx = GDP for year 20xx / Pop on Jul of year 20xx

Change in GDP per capita = (GDP per capita for 2011 / GDP per capita for 2007) – 1
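Here are those statistics computed for a single hypothetical MSA (input values invented for illustration):

```python
# One hypothetical MSA record; values are made up.
pop_2007, pop_2011 = 1_000_000, 1_040_000     # July population estimates
gdp_2007, gdp_2011 = 50_000, 55_000           # GDP, millions of dollars

# The four descriptive statistics defined above.
change_pop = pop_2011 / pop_2007 - 1
change_gdp = gdp_2011 / gdp_2007 - 1
per_capita_2007 = gdp_2007 / pop_2007
per_capita_2011 = gdp_2011 / pop_2011
change_per_capita = per_capita_2011 / per_capita_2007 - 1
```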

Comparing the data I collected with the Bloomberg BusinessWeek data, the ranking for the top four cities match, but the values for population change and GDP change do not. This could be because different data was used (historical population and GDP estimates are revised annually).

The data for the bottom four cities don’t match at all. The data I collected shows only one city, Detroit, that had both falling population and falling GDP during the period. The other three cities showed rising GDP, and two showed rising population as well. And all four cities showed rising GDP per capita.

The data for those 8 metro areas plus a few outliers are shown below. The means for all 51 MSAs with populations greater than one million are included for comparison.

Metro area        Total change   Total change   Total change in
                  in pop         in GDP         GDP per capita
Portland            4.5%           23.1%           17.8%
San Jose            5.0%           28.6%           12.9%
Austin             11.7%           18.4%            6.0%
New Orleans         9.4%           17.5%            7.4%
Salt Lake City      1.3%           14.9%           13.4%
Mean                4.2%            6.9%            2.6%
Detroit            -3.8%           -2.4%            1.4%
Cleveland          -1.5%            3.5%            5.0%
Chicago             0.5%            5.4%            4.9%
Providence          0.0%            6.7%            6.6%
Las Vegas           7.0%           -5.9%          -12.1%
Charlotte          36.7%            9.3%          -20.1%


Visualization and Analysis

I generated a simple scatterplot of change in GDP against change in population for all 51 MSAs. The cities from the table above are highlighted in green and red. I added a population-weighted trend line, shown in brown. The trend line passes through the mean (4.2%, 6.9%) and has a y-intercept at 2.6%.
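A population-weighted trend line is just a weighted least squares fit of GDP change on population change; by construction it passes through the weighted mean. A minimal sketch (the data passed in below is hypothetical; the real fit uses all 51 MSAs with their populations as weights):

```python
# Weighted least squares fit of y on x with weights ws.
def weighted_fit(xs, ys, ws):
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y
    num = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = num / den
    intercept = ybar - slope * xbar   # the y-intercept reported in the text
    return slope, intercept
```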

I could have made the chart fancy by adding information using the size, shape and color of the markers. For instance I could change the size of the markers based on the population of the MSA, change the shape of the marker based on whether the city was coastal or inland, and change the color of the marker based on which Census region it was in.


Figure 1. A simple scatterplot showing the top 51 MSAs

The four metro areas with the highest GDP growth are all above the trend line and have high per capita GDP growth. However, the Bloomberg BusinessWeek chart leaves off Salt Lake City which has lower GDP growth but because its population only grew 1.3% during the period, its per capita GDP growth is a very high 13.4%.

The fastest shrinking metro area is Detroit, which matches the Bloomberg BusinessWeek result. Note that it lies above the 45-degree diagonal running through the origin, meaning its GDP decline is smaller than its population decline, so it has positive GDP per capita growth. However, it is still below the trend line, meaning it is growing more slowly than average.

The other three metro areas in the Bloomberg BusinessWeek chart, Cleveland, Chicago, and Providence, all show slow or negative population growth, but are all above the trend line. They probably should not be considered bust-towns. The true bust-town in the scatterplot is Las Vegas. It is an outlier with a population growth of 7% but a GDP decline of 6% which results in a 12% drop in GDP per capita.

The final outlier is Charlotte. It shows a population gain of nearly 37%, which is more than double that of the next fastest-growing city. But it has only a 9% increase in GDP, leaving it with a 20% drop in GDP per capita. This is a sign that rapid growth can actually be very bad for a city.

Data error and bias

The statement in the paragraph above assumes that the data for change in economic activity and in population at the MSA level developed by two separate organizations for separate reasons is accurate and comparable. Neither of these assumptions is particularly sound. Specifically, there is a big discontinuity in the population estimate for Charlotte between Jul 2009 (the last estimate based on the 2000 census) and Jul 2010 (the first estimate based on the 2010 census) that accounts for most of the population gain. Thus, the annual population estimates may need to be smoothed before calculating the change between years.

I believe the BEA estimate of economic activity for an MSA is based partly on the population estimate for the MSA. Thus, if the population estimate changes (it is revised annually), then the GDP estimate will no longer be valid and will need to be updated.

Finally, you should be careful when combining data from different sources and comparing them. We do it all the time, but we have to be conscious of the consequences. This is an especially important point since everybody today is rapidly building giant data warehouses and running analytics on data that has never been combined before.

I love looking at maps. Good maps have high data density, and if you are already familiar with the region a map describes, looking at it evokes memories of passing through the area. A good map is a piece of art.

The current issue of the Univ. Chicago Mag. May 2011 describes the map work of Eric Fischer, an engineer at Google.

One of his projects was to create a Flickr photostream with maps of various cities with colored lines showing people’s inferred path based on the timestamps of their geocoded photos uploaded to Flickr or Picasa. The project is called the Geotaggers’ World Atlas. The map of Seattle area is shown below.


Map showing paths of people who shared photos taken in Seattle. Image from Eric Fischer in the Geotaggers’ World Atlas

The color of the lines shows the estimated speed of the photographer. Black is by foot, red is by bike, and green is by car. You’ll notice that the densest set of lines is in downtown Seattle and most of them are black. The fewest lines are in the suburbs, and most of those are green. Even though the populations on either side of Lake Washington are similar, most of the photos are taken on the Seattle side. More proof that the eastside is boring.

Using the same data, Mr Fischer creates another set of maps that teases out whether the photographs were taken by tourists or locals. He does this by first checking whether there is a single city in which a user’s pictures span more than a month. If so, the user is considered a local in that city. All other pictures taken by the same user outside that city (and whose timestamps span less than a month) are considered tourist pictures. This provides two interesting bits of data. First, do locals and tourists take pictures of different things? Second, do some cities attract more photo-happy tourists than others?

In another project, Mr Fischer has created a set of maps of major U.S. cities showing the level of racial segregation in 2010. Normally, maps like these start with neighborhood boundaries and then use a choropleth technique to indicate the population density of each race in the neighborhood. However, with the power of computers today, it is now possible to reverse the process. One can display households on the map and let the colors define the boundaries of neighborhoods.

Mr Fischer’s maps do this by placing a colored dot on a map to represent 25 households of a particular race (African in blue, Caucasian in red, Asian in green, and Other in yellow) or Hispanic origin regardless of race (orange). He uses U.S. Census self-reported race and ethnicity data from both the 2000 and 2010 censuses at the census block level. A map of Seattle (it also includes most of King County and parts of Kitsap County) is shown below.

As can be seen in the map,  Seattle has a smaller proportion of blacks than any other large urban area in the U.S., at 6.1%. The black population is concentrated in neighborhoods south of downtown. Asians make up a larger proportion of the population at 13.2%. They are also concentrated south of downtown but also tend to be dispersed throughout the region with pockets in Bellevue (east of Lake Washington), Renton, and Kent (both south of Bellevue).


Map showing Seattle’s racial distribution. Image from Eric Fischer

The idea for the maps comes from Bill Rankin’s map of Chicago that is available at a website called Radical Cartography. This site has a variety of other maps for cities, the United States, the world, and even the universe.

[Update: I corrected an error in the description of the race maps. Pacific Islanders are included in the Other race category, not in the Asian category. Pacific Islanders make up 0.5% of the Seattle population. I subtracted this from the population data for Asians.]

Dante Chinni and James Gimpel, the authors of Our Patchwork Nation: The Surprising Truth About the “Real” America, look at demographic data of the United States at the county level. There are 3,141 counties in the U.S. and though they vary considerably in geographical size and population, using county-level maps to display data provides a convenient way to compare and contrast demographic data.

By mapping demographic data at the county level, you can see how attributes like population density, income, education, attitudes, behaviors, and health are distributed across the U.S.

Unfortunately, the authors take the wonderfully detailed data available from various sources at the county level and use segmentation analysis to group the counties into twelve categories and give them cute names. Ugh.


Our Patchwork Nation. Image from Amazon

Luckily, the authors provide access to the raw county-level data at their website. You can view county-level choropleth maps for a wide variety of data. There is even a tool to overlay two maps to make comparisons. The tool doesn’t work very well since you cannot select the colors of the overlays. But overall, the patchworknation site has some of the best U.S. data maps available.

An example of the problem with using segments rather than the raw data is illustrated in an article that appears in The Atlantic Apr 2010. The map below shows the 12 segments. But the user has to flip back and forth between the legend and the map to determine what each color means.


The 12 states of America. Image from PatchworkNation

The raw data at the Patchwork Nation website is shown in the three maps below. The first map shows the median household income in 1980. The colors show which quintile each county falls into. The second map shows quintiles for 2010. It is hard to compare the two maps to see how the distribution changed. Going to the Patchwork Nation website and toggling between the two maps makes it easier.


Distribution of median household income by county in 1980. From Patchwork Nation


Distribution of median household income by county in 2010. From Patchwork Nation

The map below shows the change in median income for each county using 2010 adjusted dollars. Notice the counties with large and growing urban populations, mostly on the east and west coasts. They had the highest median incomes in 1980 and in 2010. They also show the highest growth in median income while the remaining counties show smaller gains or a loss. The gap in income distribution in the U.S. is growing.

For more on this story, read the article at the Patchwork Nation website.


Change in median household income by county from 1980 to 2010. From Patchwork Nation

The U.S. Census has just started to release the 2010 census data at the county level. Until now, only state-level data was available. Data for 21 states is currently available, with the rest to be released over the next two months.

In addition to providing the data in CSV format, the U.S. Census has created several nifty interactive maps that can be viewed at their site or embedded in your site. (Embedding requires support for iframe tags.) The map widget includes three maps: population change, population density, and apportionment for each state for every decade for the past 100 years. It’s an excellent example of how dense data can be made more approachable.


Image from U.S. Census

As mentioned in a Nov 2009 blog post, there isn’t very much data on the long-term outcomes for live kidney donors. That’s because they are not being tracked. Further, there is little data on what attributes (independent variables) may indicate which donors (cases) are more likely to suffer adverse outcomes (dependent variables).

Harvey Mysel of Living Kidney Donors Network recently posted a link on Facebook to an article that shows that medical outcomes for living kidney donors vary by race. The study in the New Engl. J. Med. Aug 2010  (subscription required) caught my attention because two of the authors, Connie Davis and Paolo Salvalaggio, are at the Univ Washington Medical Center where my donor surgery will be performed. Dr. Davis is a nephrologist and director of the kidney transplant program. Dr. Salvalaggio is a surgeon in the program and was originally assigned to be the surgeon for my nephrectomy. (A schedule change led to a change in surgeon.)

They used a clever technique called a retrospective study to find the outcomes of donors. Rather than ask donors as they enter a transplant program to participate in a longitudinal study (called a prospective study) they looked at historical medical data after the fact. They obtained the historical medical data by matching the ID of donors in the United Network for Organ Sharing (UNOS) database with the customer database of a cooperating health insurer (the insurer is not identified, but my guess is Kaiser Permanente). Retrospective studies are fast (no need to wait several years to collect data) and inexpensive (no need to track patients for years as they move, stop cooperating, change insurance plans, etc.). However, these studies are subject to many types of sampling bias, which are beyond the scope of this blog post.

The authors make two findings. First, some donors, both black and white, received treatment for hypertension, diabetes mellitus, and chronic kidney disease after their surgery. Second, black donors had a higher prevalence than white donors for all three conditions. On their own, these findings are not particularly surprising, since these three diseases are common chronic conditions and the black population as a whole has higher rates of them than the white population.

However, it does lead to two concerns. The first is that although kidney donors are healthier than the population at large, doctors must not assume they will remain so. They should be vigilant for signs of chronic diseases among their patients who were kidney donors. This study shows that even within a few years someone who was thoroughly tested (and kidney donors get an extremely detailed examination) may begin to show symptoms of chronic disease. Hypertension, diabetes mellitus, and chronic kidney disease are often called silent killers. This study shows just how silent.

Second, the article says prevalence of these diseases among certain groups of kidney donors were in some cases as high as or higher than expected for a similar subpopulation that were not donors. This deserves additional research. Using prevalence rate (proportion who have the diagnosis) rather than incidence rate (proportion who receive their first diagnosis) may understate the seriousness of the problem. That’s because within the general population, the prevalence of these three chronic conditions is higher than it was for the kidney donors during the year in which they underwent their donor surgery. Thus, if the prevalence of these chronic conditions is the same as the general population in later years, then the incidence rate each year among kidney donors must be higher than for the general population. This may indicate that the kidney donation itself may be a factor in the evolution of the disease.
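A toy example makes the prevalence vs. incidence argument concrete (all numbers below are invented, not from the study):

```python
# Donors are screened at donation, so they start near 0% prevalence;
# the general population does not. All figures are hypothetical.
donor_prev_at_donation = 0.00
donor_prev_5yr = 0.25            # hypothetical prevalence 5 years later
general_prev_baseline = 0.20     # hypothetical
general_prev_5yr = 0.25          # hypothetical: same endpoint as donors

# New diagnoses accumulated over the 5 years (cumulative incidence):
donor_incidence = donor_prev_5yr - donor_prev_at_donation      # 25 points
general_incidence = general_prev_5yr - general_prev_baseline   # 5 points

# Equal prevalence at year 5 implies much higher incidence among donors.
assert donor_incidence > general_incidence
```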

Or it could be a result of sampling bias. That is, kidney donors are more likely to have insurance and thus more likely to see a doctor who will diagnose the disease. The authors state,

“In our study, the increased prevalence of hypertension among Hispanic donors, as compared with the general population, may, in part, reflect underreporting of hypertension in this ethnic group, as compared with white respondents, in NHANES. We speculate that medical surveillance after kidney donation may mitigate barriers to the recognition of hypertension rather than differentially affect the risk of hypertension among Hispanic donors.”

by George Taniwaki

Every year Mattel introduces several new editions of its “I can be…” Barbie doll. They include the clothing and accessories necessary for this famous doll to be successful in a particular career. (Me thinks this may have more to do with selling toys than broadening the career sights of little girls, but I digress.) This year, the company decided to let girls vote for their favorite career. The top vote-getter would be featured as this fall’s new edition. Mattel heavily promoted the contest on social media sites like Facebook and Twitter. (Not sure how I missed it.)

On Feb 12, at the New York Toy Fair, Mattel announced that the winner of the popular vote for the 125th special edition Barbie was computer engineer. It’s hard to tell from the drawing below, but Barbie is wearing a Bluetooth headset, has a t-shirt with computer code written on it, and has a smartphone strapped on. As my friend Jim Reichle would say, “The heat, the heat.” (It’s an inside joke from Caltech.) But to compensate for that bit of nerdiness, she’s color coordinated with a hot pink laptop, glasses, and wristwatch (maybe it’s a revival of a Spot Watch).


Winner of the popular vote, computer engineer. Image from Mattel

You can preorder a computer engineer Barbie from the Mattel web site. I’m sure it will soon become a popular collectible among a certain crowd here in Redmond.


I thought it was rather odd that computer engineer would beat out the other careers that Mattel offered girls to choose from: architect, environmentalist, news anchor, and surgeon. I didn’t think many girls considered computer science an attractive career. After all, it isn’t very popular among women entering college. And it turns out that computer engineer wasn’t the first choice of the girls; news anchor was.

An article in Wall St. J. Apr 9. reveals that a viral campaign started by computer engineers hijacked the voting for Barbie’s new career. Computer engineer Barbie became a cause célèbre among the digerati. For instance, a writer for SQLblog encouraged his followers to vote. The influential GeekGirlCamp ran an appeal asking readers to “Please help us in getting Barbie to get her Geek On!”

In the end, Mattel realized the power of social media cuts both ways and decided to have two winners. News anchor was declared the winner of the girls’ vote while computer engineer was the winner of the popular vote. Mattel will release an anchorwoman Barbie in time for this year’s holiday season.


Winner of the girls’ vote, news anchor. Image from Mattel


Why did computer engineer Barbie attract so much attention? Well, I think part of it may be the odd sense of duty (or sense of humor) that geeks have toward promoting their culture. (Had I known about this contest, I certainly would have voted for computer engineer.) But part of it may have been to actually raise awareness of computer science as an attractive career for women.

Many formerly male dominated professions such as law, accounting, mathematics, medicine, and biological science are now much more gender balanced, or in some cases becoming female dominated. However, engineering, physical science, and computer science are not.

In fact, the proportion of men in many of these professions never fell below 70% and are actually on the rise again. A Wall St. J. blog post states that the number of women in computer science has been falling while the total number of workers has been growing, causing a steep rise in the male-to-female ratio. (And I don’t think it was caused by girls hearing Barbie say, “Math class is tough.” A study published in Science Jul 2008 shows that the gender gap in math achievement as measured by standardized tests has disappeared. So it is likely something else is causing it. The article points to a problem with standardized tests themselves and the pernicious effect of the No Child Left Behind legislation. But I digress again.)

Most of my own college education and work experience has been in heavily male dominated fields. My freshman year was spent at California Institute of Technology, where in 1977 fewer than 10% of the undergraduates were female. I transferred to the Colorado School of Mines where the proportion of females was about double that. At both schools there were almost no female graduate students or professors. Even after almost 30 years, an Amer. Assoc. Univ. Profes. 2006 report cites Caltech as the doctorate-level school with the lowest proportion of female full professors (14%) in the U.S. The next lowest school? Mines at 16%.

In its first 100 years (from 1874 to 1973), Mines graduated a total of 14 women. The percentage grew quickly thereafter and was still rising while I was attending. But then it stopped. A recent story in Mines Magazine shows that the proportion of women at the school has remained steady for the last twenty years at about 25%. However, the type of student may be changing, as women now hold about half of the student leadership positions.


Tracy Camp, a professor at Mines, authored a paper that appeared in Comm. ACM Oct 1997 highlighting the falling enrollment of women in computer science programs and warning of its consequences for the U.S. economy and global competitiveness. She urged action to identify and counteract the forces that were, and still are, leading fewer women to seek degrees in computer science and careers in the IT industry.

I wondered if Dr. Camp was one of the adults who voted for computer engineer Barbie. When asked, she said, “Yes, I voted for the computer engineer Barbie. I also sent an announcement out on my networks, which helped add a lot more votes.” She doesn’t feel bad at all about adults hijacking the vote, “Research has shown that we need to change the image of computing to get more girls interested. Barbie may help.” (One of the great things about writing a blog is that I can send impertinent emails to busy people and they respond, but I digress.)

So there you have it. Computer engineer Barbie is a child’s toy, a collectible, a role model for career-minded girls, an Internet meme that provides a lesson in how social media is changing marketing, a symbol of U.S. economic competitiveness, and a partial solution to the gender gap in engineering. Who knew?