Search engine | Real Numeracy

November 28, 2019

Google Maps vs Waze

Posted by gtaniwaki under Personal, Real Numeracy | Tags: Maps, Search engine, Statistics |

Not how the crow flies to get to work each day

by George Taniwaki

I really don’t like to drive to work. I’ll do almost anything to avoid a long commute. For most of my adult life I have either walked or ridden a bus to get to work. Yes, it’s possible. I have always chosen an apartment to rent or a house to buy based on how close it is to my job. And once I’ve found a place to live, I usually reject a new job unless it’s within walking or riding distance. It helps that I like living in big cities.

But I just started a contract assignment in Mountlake Terrace, a suburb of Seattle about 20 miles from where I live as the crow flies (see image above, or not). This isn’t the longest commute in my life, but it is the first long one in a few decades. And it’s the first long commute where I’m driving alone rather than in a carpool.

Google Maps, initial attempt

The traffic in Seattle is awful. To reduce my commute time, I’ve decided that starting my drive at 6:30 and working from 7:30 to 4:00 will help. Before my first day at the job site, I pull out Google Maps and plot my route (see Fig 1).

Figure 1. My first route to the office, average 42 minutes every morning

For the analysis in this blog post, I split my drive into segments based on type of driving. Segment A consists of surface streets from my house to the highway. B is 17 miles at highway speed driving north, away from the city. C is a slow slog where I double back and join the commuters coming into town. D is the final, short segment of surface streets to the office.

The table below shows details of my commute. Most of it is tolerable. But notice that segment C (the red zone) constitutes less than one-sixth of my commute distance but over one-third of my commute time.

Map	Description	Dist. (mi)	Speed (mph)	Elapsed Time (min)
A	Surface streets from home to SR520	3	30	6
B	SR520 to I-405 to I-5	17	60	17
C	I-5 to Mountlake Terrace	4	15	16
D	Surface streets from I-5 to office	1	20	3
	TOTAL	26	37	42
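The arithmetic behind that observation is easy to check. Here is a minimal Python sketch that recomputes segment C's share of distance and time from the table. The variable and function names are made up for illustration, and the distances are the rounded values from the table, so their sum comes out a mile short of the TOTAL row.

```python
# Segments from the table above: (label, miles, minutes).
# Distances are the rounded table values, so their sum (25) is
# slightly below the 26 miles shown in the TOTAL row.
segments = [
    ("A", 3, 6),
    ("B", 17, 17),
    ("C", 4, 16),
    ("D", 1, 3),
]

total_miles = sum(miles for _, miles, _ in segments)
total_minutes = sum(minutes for _, _, minutes in segments)

def shares(label):
    """Return (share of distance, share of time) for one segment."""
    for name, miles, minutes in segments:
        if name == label:
            return miles / total_miles, minutes / total_minutes
    raise KeyError(label)

dist_share, time_share = shares("C")
print(f"Segment C: {dist_share:.0%} of the distance, {time_share:.0%} of the time")
```

The same function works for any other segment label, which makes it easy to see that B is the opposite case: most of the distance for well under half of the time.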

Google Maps, redux

After a couple weeks of following this route, I’ve learned which lanes to use on which segments to slice a few minutes off my commute. But I think I can still do better. I check Google Maps for some alternatives. This time it gives me a completely different, and unexpected, route. It tells me to go a few miles out of my way south to I-90 and drive north through the city on I-5 (see Fig 2).

I-5 in downtown Seattle is one of the most congested highways in the U.S. I get queasy every time I drive it worrying about getting stuck. But maybe it’s not so bad at 6:45 AM, which is about what time I will get there. So I trust Google Maps and try it.

Figure 2. Google Maps’ new suggestion, average time 39 minutes

It works. I try the route on two consecutive days. The table below shows the average results. There is congestion on I-5 between I-90 and Olive Way (Segment C red zone) but it is a shorter segment than my previous commute. However, the route is longer, so it doesn’t save much time. Further, I don’t like this route because it limits my options. If there is an accident or other delay, I will be stuck in traffic with no easy way to avoid it.

Map	Description	Dist. (mi)	Speed (mph)	Elapsed Time (min)
A	Surface streets from home to I-90	4	30	8
B	I-90 to downtown	8	60	8
C	I-5 to Olive Way	2	20	6
D	I-5 to Mountlake Terrace	14	60	14
E	Surface streets from I-5 to office	1	20	3
	TOTAL	29	44	39

Waze to the rescue

Waze is a GPS navigation app that was originally developed in Israel but quickly went global. It uses traditional digital map data and combines it with real-time location data from users including speed, route, reports of traffic jams, accidents, police speed traps, and gasoline prices at nearby stations. Thus, the more people who use it, the more accurate it becomes.

Waze also shows you the current toll price (Seattle uses variable toll pricing) and lets you avoid tolls, ferries, or highways, if desired, when choosing a route.

Google (now Alphabet) acquired Waze in 2013 but it remains a separate entity from Google Maps. Because Waze collects potentially personally identifiable information (PII), it has a less restrictive user agreement than Google Maps and warns users of that fact. (Though most people never read the agreement and just click “I accept”.)

Generating a route is a highly resource-intensive calculation that often involves machine learning. To simplify the work, Google Maps generally limits routes to major arterial streets. Waze combines those calculations with the actual routes users are taking to find the minimum travel time. Thus, Waze often creates routes that run through residential neighborhoods. Of course, the neighbors sometimes complain or even fight back by generating fake route data (Wash Post, Jun 2016).
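To make the idea of a minimum-travel-time route concrete, here is a minimal sketch of Dijkstra's shortest-path algorithm on a toy road graph. It is not a description of how Waze or Google Maps actually compute routes. The node names, the `ROADS` dictionary, and the minutes on each edge are assumptions, loosely based on the commute described in this post.

```python
import heapq

# Toy road graph: travel times in minutes between hypothetical nodes.
ROADS = {
    "home": {"SR520": 6},
    "SR520": {"I-405/I-5 interchange": 17},
    "I-405/I-5 interchange": {"I-5 exit": 16, "office": 12},  # I-5 vs side streets
    "I-5 exit": {"office": 3},
    "office": {},
}

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path) or (inf, [])."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry, a faster way here was already found
        for neighbor, cost in graph[node].items():
            new_minutes = minutes + cost
            if new_minutes < best.get(neighbor, float("inf")):
                best[neighbor] = new_minutes
                heapq.heappush(queue, (new_minutes, neighbor, path + [neighbor]))
    return float("inf"), []

minutes, path = fastest_route(ROADS, "home", "office")
print(minutes, " -> ".join(path))
```

With real-time data, only the edge weights change. If the I-5 leg in this toy graph drops from 16 minutes to 8, the same function picks I-5 instead of the side streets.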

Figure 3 below shows the route Waze recommends for my commute. It looks just like the original route that Google Maps suggested, except for the last segment. I still take the I-5 cloverleaf, but instead of continuing onto I-5, it has me veer right and use side streets to get to the office.

Figure 3. My new favorite route to work

Map	Description	Dist. (mi)	Speed (mph)	Elapsed Time (min)
A	Surface streets from home to SR520	3	30	6
B	SR520 to I-405	17	60	17
C	Surface streets from I-405 to office	6	30	12
	TOTAL	26	44	35

The best part is that I can see I-5 from the I-405 off-ramp. When traffic is light (speed is 30 mph or more), I can veer to the left and take I-5 to the office. When I-5 is congested, I can veer to the right and take surface streets. While on the surface streets, I can continue to see I-5 and confirm whether I made the right decision and improve my choice for future days.
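The 30 mph rule of thumb can be sanity-checked with numbers from the tables. This is a rough sketch, assuming the I-5 option is 4 miles of highway plus 3 minutes of surface streets (segments C and D of the first route) and the side-street option takes a flat 12 minutes (segment C of the Waze route). The constant and function names are hypothetical.

```python
I5_MILES = 4            # segment C of the first route
OFFICE_STREETS_MIN = 3  # segment D of the first route
SIDE_STREETS_MIN = 12   # segment C of the Waze route

def minutes_via_i5(i5_speed_mph):
    """Minutes from the interchange to the office when I-5 moves at the given speed."""
    return I5_MILES / i5_speed_mph * 60 + OFFICE_STREETS_MIN

def better_choice(i5_speed_mph):
    """Return which option is faster for an observed I-5 speed."""
    return "I-5" if minutes_via_i5(i5_speed_mph) < SIDE_STREETS_MIN else "surface streets"

# Speed at which both options take the same time.
break_even_mph = I5_MILES / ((SIDE_STREETS_MIN - OFFICE_STREETS_MIN) / 60)

print(f"break-even I-5 speed: {break_even_mph:.1f} mph")
print(better_choice(15), "|", better_choice(30))
```

Under these assumptions the break-even speed is a little under 27 mph, so a 30 mph cutoff leaves a small margin for error.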

Waze leads me astraze

With my success with Waze in the morning, I decide to use it for my evening commute home as well. As I turn onto I-5, Waze tells me there is road kill ahead. I wonder where. Then suddenly I see a raccoon and am jolted by the thump. It saddens me to know that I’ve squashed an innocent animal under my tires, even if it is already dead.

The rest of the commute home is uneventful until Waze tells me to exit I-405 at NE 85th St in Kirkland (Fig 4b), 4 miles before my usual exit at SR520 (Fig 4a). Gee, that seems like a bad idea. Should I ignore Waze and keep going straight? Or should I take the exit? Maybe there is an accident on my regular route. Or maybe the crowd of Waze users knows a sneak route. Well, Waze has been pretty accurate so far, so I take the exit.

Ugh, what a mistake. Driving east on NE 85th St takes me straight into a huge traffic jam on Redmond Way. Also, there is a giant construction project on the Microsoft campus, so West Lake Sammamish Pkwy is overflowing with drivers avoiding lane closures on 156th Av NE. My commute today is more than 35 minutes longer than usual. I won’t do that again.

Figures 4a, b. My normal commute home, Waze suggestion for 11/21/2019

Conclusion

Both Waze and Google Maps show you unexpected options and are likely to give better routes than you could find on your own. Overall, my experience with Waze was better than Google Maps, but both could use improvements.

* * * *

All this talk about commute time has me remembering a brain teaser from my childhood. Let’s say I want my average commute speed to be 40 mph. One day, I get stuck in traffic and cover the first half of the distance to work at an average speed of 20 mph. How fast do I have to drive on the second half to meet my goal? Hint: The answer is not 60 mph or even 80 mph.
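If you want to test a guess without having the answer handed to you, here is a small Python sketch. It assumes the two halves cover equal distance, so the overall average is total distance divided by total time (the harmonic mean of the two speeds), not the simple average of the two speeds. The function name is made up for this example.

```python
def average_speed(first_half_mph, second_half_mph):
    """Average speed over a trip whose two halves cover equal distance.

    Average speed is total distance / total time, which for equal
    distances works out to the harmonic mean of the two speeds.
    """
    half = 1.0  # any distance works; it cancels out
    total_time = half / first_half_mph + half / second_half_mph
    return 2 * half / total_time

for guess in (60, 80):
    print(guess, "mph on the second half ->", round(average_speed(20, guess), 1), "mph overall")
```

Try larger and larger guesses and watch what happens to the overall average.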

June 12, 2019

Google news search is (probably) not biased

Posted by gtaniwaki under Real Numeracy | Tags: Search engine, Statistics, Time series |

Google rewards reputable reporting, not left-wing politics, from The Economist

by George Taniwaki

A few months ago The Economist added a new feature to its back section called Graphic detail. It’s a pleasure to read because it nearly always contains bivariate plots where the x-axis is something more interesting than the date.

This week’s entry does not disappoint. It is entitled, Seek and you shall find and contains two charts (see above) with interesting x-axes. The charts analyze the impact of Google news search on the traffic a news source receives. It uses two independent measures, Accuracy score and Ideology score, to rate different news sources. Accuracy and bias were determined using data from Adfontesmedia.com and Mediabiasfactcheck.com.

Many people claim that Google favors liberal news sources to the detriment of conservative views. Google claims it has a set of outside reviewers who check news sources for accuracy and reach. Point of view is not considered. However, one could imagine that a news source that has a strong point of view may report facts to match a point of view and that would reduce accuracy. As can be seen on the chart on the left, news sources with a strong ideological bias (darker red and blue dots) tend to have lower accuracy scores than less biased sources. I encourage you to go to the website because the data is interactive.

The dependent y-axis is the share of web traffic that comes from search engines. This is a bit problematic since if users believe that Google’s results are biased against their favorite news sources, they will visit them directly without using a search engine. Nonetheless, the data shows that search engine (mostly Google) share of web traffic increases with accuracy, not with ideology. That is, the plot on the left shows a linear relationship while the right plot does not.

Expected v. Actual

A separate experiment confirms the results. The Economist built a model to predict the number of news results that each of 37 publications should receive from Google’s search engine based on its accuracy and reach. It then compared the model results to actual search results on a “clean” computer using “a browser with no history, in a politically centrist part of Kansas.” (Why Kansas, you wonder? I’m guessing that is where the author lives.)

No bias detected, from The Economist

Again, no bias was detected. The differences between left and right are small and could be due to how they are defined, time of study, keywords searched, or other factors. The story is an excellent example of combining data from multiple sources, programming a bot to collect data, and visual display of statistical analysis.

August 4, 2011

Stoopid Internet tricks Part 2–Search engines

Posted by gtaniwaki under Real Numeracy | Tags: Internet Explorer, Search engine, Software security |

Does using Internet Explorer make you stupid? I think not, but sometimes it can trick you. (See part 1 of this story here.)

I use a variety of browsers and operating systems, but my favorite is Internet Explorer 9 running on Windows 7. I like the feature that combines the address bar with the search box into a single text edit field. It allows me to just type a company name in the search box and the browser will resolve it into a domain name for me. (Of course, not everyone likes this design.)

Anyway, a few minutes ago I was using Safari on my Mac and typed “Ikea” in the address bar. Naturally, what I really wanted was “www.ikea.com”. Safari doesn’t automatically send invalid URLs to the search engine like IE9 does. I have Comcast broadband at home. Comcast detects and captures any invalid URLs and displays its own custom DNS error page, a practice called DNS hijacking. A portion of the page is shown below.

Custom DNS error page. Image from Comcast
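One common way to check whether a resolver rewrites failed lookups is to ask it for a name that should not exist and see whether an address comes back anyway. Here is a minimal sketch, assuming a random label under the reserved `.invalid` top-level domain is a fair probe. Some ISPs may only rewrite names under real top-level domains, so a negative result is not conclusive. `probe_name` and `looks_hijacked` are hypothetical names.

```python
import secrets
import socket

def probe_name():
    """Build a hostname that should never resolve (.invalid is a reserved TLD)."""
    return f"{secrets.token_hex(8)}.invalid"

def looks_hijacked(resolve=socket.gethostbyname):
    """Return True if a nonexistent name still resolves to an address.

    `resolve` is injectable so the check can be exercised without a network.
    """
    try:
        resolve(probe_name())
    except OSError:  # socket.gaierror is a subclass of OSError
        return False  # lookup failed, as it should
    return True  # got an address for a made-up name
```

Calling `looks_hijacked()` with no arguments uses the system resolver, so the result reflects whatever DNS servers your connection is configured to use.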

Notice that the first item is a sponsored link that has the title “IKEA.com – Official Site” and has the URL www.ikea.com that I wanted highlighted in green. Naturally, I clicked on it. After a few redirections, this is what I see:

It sort of looks like an Ikea home page. Image from rewardsclub.com

This looks like it could be the official IKEA site, but it isn’t. The domain name displayed in the address bar is not for ikea.com but for rewardsclub.com, one of those credit card scam companies that is basically a phishing site. The top part of the page is designed to look like it is complete. But you will notice that the scroll bar indicates there is more content below the fold. If you are willing to scroll down, you’ll see the following disclaimer:

IKEA is a registered trademark of Inter IKEA Systems B.V. BigBrandRewards.com is not affiliated with IKEA®. All IKEA® trademarks are the property of IKEA® and BigBrandRewards.com does not, in any way, claim to represent or own any of the IKEA® trademarks or rights. IKEA® does not own, endorse, or promote BigBrandRewards.com or this promotion.

This Gift Program is not endorsed, sponsored by or affiliated with the manufacturers and retailers of the gift items listed above in anyway. All trademarks, service marks and logos are property of their respective owners.

Well, I guess that disclaimer may protect them from lawsuits by Ikea (trademark infringement) or from disgruntled customers and state attorneys general (fraud and deceptive trade practices). But I doubt it.

This sucks. Only a credulous rube would actually purchase a prepaid credit card. But everyone is forced to waste time figuring out that this is not the Ikea website and either manually typing in the correct URL to get there or going back to Comcast’s search page and clicking on a different link.

However, I don’t blame Comcast for this travesty, at least not directly. I believe the search results on the DNS server not found error page are provided by Yahoo (which uses Microsoft Bing as its search engine) and that Yahoo and Microsoft run the keyword auctions that populate the sponsored links. Thus, it is up to them to ensure that the green text in the sponsored link ads matches the domain that the user will be redirected to.
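The check itself is not complicated. Here is a minimal sketch of comparing the URL displayed in an ad with the final landing URL. It makes no claim about how Yahoo or Microsoft actually vet ads. The function names and the `/offer` path are invented for the example, and the "last two labels" rule is a simplification that a real system would replace with the Public Suffix List.

```python
from urllib.parse import urlsplit

def hostname(url):
    """Extract the lowercase hostname, tolerating URLs shown without a scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()

def site(url):
    """Naive 'registrable domain': the last two labels of the hostname.

    Assumption: good enough for .com names; a real check would consult
    the Public Suffix List to handle suffixes such as .co.uk.
    """
    return ".".join(hostname(url).split(".")[-2:])

def display_matches_destination(display_url, final_url):
    """True if the URL shown in the ad and the landing page share a site."""
    return site(display_url) == site(final_url)

# The ad showed www.ikea.com but landed on rewardsclub.com (path is hypothetical).
print(display_matches_destination("www.ikea.com", "https://www.rewardsclub.com/offer"))
```

Because the comparison runs on the final URL, it would have to be applied after following all of the redirections, not just to the first link in the chain.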

December 29, 2010

Bing offers free songs, scam artist jumps

Posted by gtaniwaki under Real Numeracy | Tags: Search engine, Software security |

Today, Microsoft Bing announced that it would offer a free music download to the first 500,000 customers who signed up on its website. I went to Softpedia to read about it (see the screenshot below).

Bing story. Image from Softpedia

Notice the big download button above the story. I clicked on it, which led me to the following download page.

I clicked on the Download Free button, which downloads the file and opens the IE security warning dialog.

Do you notice something unusual here? The file isn’t an mp3 and the publisher isn’t Microsoft Corporation. Instead, it is an executable from some company called ie.conduit-download.com.

Very clever. A company called SearchAle bought a Google display ad that leads people to believe that Microsoft Bing’s free music download is obtained by clicking a big button. I gotta quit clicking on big buttons like that. Though since I run on a Mac with a virtual Windows 7 machine, the worst that can happen is I ruin the VM and need to reimage it.
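A download that claims to be music but is really a program can often be spotted from its first few bytes, regardless of what the file is named. Here is a rough Python sketch of that kind of check. It is a heuristic only, the function names are hypothetical, and it knows about just two signatures: the `MZ` header of Windows executables and the usual openings of an MP3 file.

```python
def sniff_file_type(first_bytes):
    """Guess a file type from its leading bytes (a rough heuristic)."""
    if first_bytes.startswith(b"MZ"):
        return "windows-executable"  # DOS/PE header used by .exe files
    if first_bytes.startswith(b"ID3"):
        return "mp3"  # MP3 file that begins with an ID3v2 tag
    if len(first_bytes) >= 2 and first_bytes[0] == 0xFF and (first_bytes[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (MP3 and related layers)
    return "unknown"

def sniff_path(path):
    """Read the first few bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return sniff_file_type(handle.read(4))

print(sniff_file_type(b"MZ\x90\x00"))
print(sniff_file_type(b"ID3\x03"))
```

To check a real download, pass its location to `sniff_path` before opening it.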

December 25, 2010

Google Books Ngram viewer

Posted by gtaniwaki under Real Numeracy | Tags: Search engine, Software, Statistics, Time series |

Google Books is a project sponsored by search engine giant Google to scan the pages of every book available, convert the scans to text using OCR, and make the resulting text corpus searchable. Notwithstanding any remaining copyright disputes surrounding the project, Google has reached an agreement with most of the major copyright holders (authors and the publishers that represent them) around the world. So far, Google has scanned over 15 million books, most of which are no longer in print or commercially available. This database is a treasure trove for history scholars.

Last week, Google released a new statistics tool for the Google Books project called the Ngram Viewer. It is simple to use. You enter a list of words or phrases separated by commas (each is called an n-gram and is case-sensitive), the language (American English and British English can be searched separately), the date range (from 1500 to 2008, though the number of books is sparse before 1780, which makes the early data very spiky), and the amount of moving average smoothing to apply (long trends are easier to see with smoothing, but the individual yearly data is lost).
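For readers curious about what the smoothing setting does, here is a minimal sketch of a centered moving average. It assumes a smoothing of n averages each year with up to n years on either side, and that the window is simply truncated at the ends of the range. The function name and the sample values are made up.

```python
def smooth(values, n):
    """Centered moving average: each point is averaged with up to n
    neighbors on either side (the window shrinks at the edges)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - n): i + n + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Made-up yearly values with one spike in the middle.
raw = [1, 1, 10, 1, 1]
print(smooth(raw, 0))  # a smoothing of 0 leaves the values unchanged
print(smooth(raw, 1))  # the spike is spread across its neighbors
```

This shows the tradeoff directly: with a smoothing of 1, the single-year spike of 10 becomes a three-year plateau of 4, so the trend is easier to read but the exact year of the spike is lost.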

The Ngram Viewer is the best time series graphing toy I’ve seen.

There have been lots of stories in the press showing interesting trends in the popularity of certain words and phrases in books. For instance, Jennifer Valentino-DeVries of the Wall St. J. shows that Merry Christmas beats Happy Holidays by a big margin.

Merry Christmas vs Happy Holidays, Image from Google labs

Slate’s Tom Scocca has been posting an Ngram of the Day on the site comparing the frequency of words like shopping vs salvation and television vs the Bible. He even does a comparison between words and shows the year in which the two cross in popularity. For instance, anxiety passes shame in 1942.

anxiety vs. shame. Image from Google labs

Here are a couple of charts I created for the n-grams “independence” and “rebellion” in U.S. and British English. I have no idea what conclusions to draw from this data, but it just begs for an explanation based on unfounded speculation. There is a spike in both words in U.S. books in the 1770s, but no spike in British literature. Independence becomes more popular than rebellion in books from both countries after about 1820. The word rebellion has a spike again in the U.S. from 1860 to 1870. The absolute occurrence of both words is similar in the two countries starting around 1900. Independence shows a spike from 1940 to 1942 and another spike around 1968 to 1970.

independence vs. rebellion, British (top) vs. American (bottom). Images from Google labs.

For the truly hardcore programmer, the n-gram datasets are available for download from Google. Their use is covered under the Creative Commons Attribution 3.0 Unported license.

The next step is for Google to add its Ngram Viewer toolkit to its Public Data Explorer visualization tool (see Mar 2010 blog post) to allow animations and drill down. I can hardly wait.

[Update: I rescaled the first two graphs to normalize the time spans.]

July 4, 2010

Facebook’s new Questions application

Posted by gtaniwaki under Real Numeracy | Tags: Search engine, Social media, Software |

by George Taniwaki

Facebook has a new application (or widget) currently in beta release called Questions that allows users to post questions and wait for other users to answer them. The questions are categorized into groups and users are shown questions that other people who have similar interests have answered. If you know anything about search and recommendation, you realize that Facebook is trying to solve two really hard computing problems simultaneously.

First, how do you categorize the questions? What keywords and contexts do you use? For instance, what weight do you give to the interests of the person asking the question? And how do you categorize those interests? What weight do you give to the length of the question? How do you handle misspelled words? Do you give any weight to the fact that any words are misspelled?

Second, how do you decide which questions to show which user? Should you predict if the potential answerer is actually qualified to answer the question? Is it more important to generate lots of responses or to get the correct response quickly? Or is it actually more important to entertain users with a stream of interesting questions, regardless of whether they answer them? (This would be really hard to predict since Facebook will never get any feedback from users regarding the questions they don’t answer.)

Community-run Q&A sites are not new. Yahoo! Answers and Answers.com have both been around for years and are quite popular. However, I believe that most of the answers are written by a small group of dedicated users who vie for points and recognition. Facebook’s goal is to engage the entire community, since the longer you stay at their site, the more likely you are to click on some ads.

Anyway, I want to show some screenshots. This may violate some promise I made to Facebook. The first shows a few examples of questions from the Questions widget. The widget appears in the right column under the Sponsored links widget. Notice how many of the questions seem to be factual and could be more quickly (and correctly) answered using standard web-based research skills.

Two examples of the Questions widget. Image from Facebook

If you click on a question in Questions, you will be taken to a page showing all the responses for that question. You can then vote yea or nay for any response. The example below shows how introspective Facebook users are.

Question responses. Image from Facebook

Finally, if you click on the Asked about link, you will see a list of all the questions related to that category. Notice the example below for the category “Roots”. As I mentioned above, categorizing questions is tough. And was this question really asked by that Kristin Bell?

Category detail. Image from Facebook

June 16, 2010

Cognitive surplus at work

Posted by gtaniwaki under Real Numeracy | Tags: Probability, Search engine |

Here’s a funny email exchange between my wife and me. I reversed the thread so that you can read it from top to bottom.

From: George Taniwaki
Sent: Friday, June 11, 2010 10:48 AM
To: Susan Wolcott
Subject: Using search engines to pick stocks

Check out this story, http://www.technologyreview.com/blog/guest/25308/?nlid=3099
George

From: Susan Wolcott
Sent: Friday, June 11, 2010 10:58 AM
To: George Taniwaki
Subject: RE: Using search engines to pick stocks

And the posted comments are interesting, but in a completely different way…

From: George Taniwaki
Sent: Friday, June 11, 2010 11:40 AM
To: Susan Wolcott
Subject: RE: Using search engines to pick stocks

Yeah. Who are these people and how do they decide 1) to read Technology Review and 2) write political rants?

From: Susan Wolcott
Sent: Friday, June 11, 2010 1:44 PM
To: George Taniwaki
Subject: RE: Using search engines to pick stocks

It’s part of that valuable cognitive surplus.

****

Sue and I obviously have too much time, er… valuable cognitive surplus, on our hands. If you don’t get the reference to cognitive surplus, read this book.
