A common task that webmasters are asked to perform is to get bids for hosting a website. When gathering information to prepare a quote, the vendor will often ask what the peak load (in server requests per second) will be. As a webmaster you may well ask, how the heck do I do that?

Estimating page views per month

Estimating peak server requests per second is a four-step process. First, we estimate page views per month. Next, we estimate average page views per second during the heaviest, or prime, viewing period. Then we estimate peak page views per second during the prime viewing period. Finally, we estimate peak server requests per second during the prime viewing period.

Average page views per month can be obtained by looking at server logs. If server logs are not available or you are creating a new site, then it can be estimated using logs from similar sites. For this blog post, we will assume the website generates an estimated 2.6 million page views per month. (By comparison, this blog generates about 2,000 page views per month, mostly from bots, I think.)

Estimating average page views per prime viewing second

Assume your website that generates 2.6 million page views per month has traffic that is fairly steady all day and all night, every day. That is, of the 730 hours in a month (= 365 days per year / 12 months per year * 24 hours per day), all of them will be prime viewing hours. In that case, we can calculate the mean page views per second by doing some simple arithmetic.

page views per second = 2.6 million page views per month / 730 hours per month / 3600 seconds per hour = approx. 1 page view per second

But what if traffic to the website isn’t steady? What if people only visit it during work hours? Well, there are about 168 work hours per month compared to about 730 actual hours per month, a ratio of about 4.3 to 1. So during prime viewing hours there will be about 4.3 page views per second, and 0 page views per second during non-work hours. (This assumes everyone works the same days and same hours regardless of time zone.)

The prime viewing hours for a website can be even more compressed. Let’s say you run a website for NBC and it has a blog that contains a synopsis of the television show Grimm and an update is posted immediately after each new episode airs. In that case, perhaps all of the page views will occur during a 4 hour period starting at 10:00 pm every Friday. Thus, there will be 16 prime viewing hours per month during which there will be 45 page views per second and 0 page views per second during the rest of the month.
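The arithmetic for the three cases above can be sketched in a few lines of Python. This is only an illustration: the function name and the scenario labels are made up for this example, and it assumes, as the text does, that all traffic falls inside the prime viewing hours.

```python
SECONDS_PER_HOUR = 3600


def avg_views_per_prime_second(monthly_views, prime_hours_per_month):
    """Average page views per second, assuming every page view
    falls inside the prime viewing hours."""
    return monthly_views / (prime_hours_per_month * SECONDS_PER_HOUR)


MONTHLY_VIEWS = 2_600_000

# Prime viewing hours per month for the three cases in the text.
scenarios = {
    "steady all month": 730,
    "work hours only": 168,
    "four hours every Friday": 16,
}

for name, hours in scenarios.items():
    rate = avg_views_per_prime_second(MONTHLY_VIEWS, hours)
    print(f"{name}: {rate:.2f} page views per second")
```

The three results round to about 1, 4.3, and 45 page views per second, matching the figures used in this post.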

The chart below shows the page view distribution for the three cases described above. This model is quite simplified. It can obviously be made more complex by assuming that the prime viewing hours depend on time zone, that page views do not drop to zero during the non-prime viewing hours, and that multiple variables affect page views during a particular hour.

PrimeViewing

Three ways to achieve 2.6 million page views per month. Image by George Taniwaki

Let’s look at the distribution of page views in more detail. In the four-hour prime viewing period case we said that there were 0 page views per second before and after the prime viewing period and an average of 45 page views per second during the prime viewing period. If the number of page views is constant throughout the prime viewing period, then the distribution curve is rectangular as shown by the blue line in the chart below.

But it is unlikely that the change in page view rate is so abrupt. It is more likely that page views rise steadily to a peak and then fall. If the distribution is triangular and spread across four hours, then the average page views at the maximum point will be 90 per second (=45*2) as shown by the brown line below. If the distribution is bell shaped, called the normal distribution, then the average peak page views at the maximum point will be somewhere in between as shown by the green line below.
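To make the comparison concrete, here is a rough sketch of the rate at the maximum point for each shape. The rectangular and triangular cases follow directly from the text. For the bell-shaped case I have to pick a standard deviation, so I assume one hour (which puts roughly 95% of the views inside the four-hour window). That value is my assumption for illustration, not something read off the chart.

```python
import math

WINDOW_SECONDS = 4 * 3600            # four-hour prime viewing period
TOTAL_VIEWS = 45 * WINDOW_SECONDS    # 45 views/s on average over the window


def rectangular_peak(total_views, width_seconds):
    """Constant rate across the window: the peak equals the average."""
    return total_views / width_seconds


def triangular_peak(total_views, width_seconds):
    """Triangle area = width * height / 2, so height = 2 * total / width."""
    return 2 * total_views / width_seconds


def normal_peak(total_views, sigma_seconds):
    """Height of a normal curve at its mean, scaled by the total views."""
    return total_views / (sigma_seconds * math.sqrt(2 * math.pi))


ASSUMED_SIGMA_SECONDS = 3600  # hypothetical: one-hour standard deviation

print(rectangular_peak(TOTAL_VIEWS, WINDOW_SECONDS))
print(triangular_peak(TOTAL_VIEWS, WINDOW_SECONDS))
print(round(normal_peak(TOTAL_VIEWS, ASSUMED_SIGMA_SECONDS), 1))
```

With that assumed standard deviation, the bell-shaped peak lands between the rectangular and triangular values. A narrower bell would push it higher.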

ProbDist

Three ways to achieve 625,000 page views in an evening. Image by George Taniwaki

One caveat: I tried to draw the curves in the chart above so that all three would have similar variance, but I didn’t actually do the calculations to verify it.

Estimating peak page views per prime viewing second

All of the work above was to find the average number of page views per second during the prime viewing time. However, visitors to the website will arrive randomly. So we can expect that there will be some fluctuation in the number of page views during a second. Some seconds during the prime viewing time will have fewer than the average number of visitors and some seconds will have more. We can model this random arrival of visitors using the Poisson distribution.

Since the arrival of visitors will be random, we cannot estimate the maximum number of visitors the website will ever receive in a second. In principle, that number is unbounded. But we can estimate it for a variety of confidence levels, such as 90%, 99%, and even 99.999% (the so-called five 9s availability level). In this case, confidence level indicates the proportion of one-second intervals in which the number of page views will be at or below the peak.

Using Excel’s Poisson distribution function we can estimate the ratio of peak page views per second to average page views per second at various confidence levels. The results are shown in the three tables below. Notice that although the average page views per second can be a fraction, the peak page views per second is always an integer.
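For readers without Excel, the same kind of estimate can be sketched in plain Python. This is a stand-in, not the spreadsheet I used: the function name is made up, and it simply adds up Poisson probabilities until the running total reaches the confidence level. The probabilities are computed in log space so that large averages (such as 1,653 per second) don't underflow.

```python
import math


def poisson_quantile(mean_rate, confidence):
    """Smallest k such that P(X <= k) >= confidence, X ~ Poisson(mean_rate)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if mean_rate <= 0:
        return 0
    log_rate = math.log(mean_rate)
    # Generous upper bound on k so the loop cannot run forever.
    max_k = int(mean_rate + 40 * math.sqrt(mean_rate) + 100)
    cumulative = 0.0
    for k in range(max_k + 1):
        # P(X = k) = exp(k * ln(rate) - rate - ln(k!))
        cumulative += math.exp(k * log_rate - mean_rate - math.lgamma(k + 1))
        if cumulative >= confidence:
            return k
    raise RuntimeError("confidence level not reached within the search bound")


for confidence in (0.9, 0.99, 0.99999):
    peak = poisson_quantile(4.30, confidence)
    print(f"confidence {confidence}: peak {peak}, ratio {peak / 4.30:.2f}")
```

Note that for very low averages this function returns 0 where a table would list "0 or 1", because it reports only the smallest value that meets the confidence level.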

| Avg. page views per month | Avg. page views per second at max. point* | Peak page views per second at 0.9 confidence level | Ratio of peak to average at max. point |
|---|---|---|---|
| 1 | 0.00000165 | 0 or 1 | 0 or 604,800 |
| 1,000 | 0.00165 | 0 or 1 | 0 or 605 |
| 1,000,000 | 1.65 | 3 | 1.81 |
| 2,600,000 | 4.30 | 7 | 1.63 |
| 1,000,000,000 | 1653 | 1706 | 1.03 |

| Avg. page views per month | Avg. page views per second at max. point* | Peak page views per second at 0.99 confidence level | Ratio of peak to average at max. point |
|---|---|---|---|
| 1 | 0.00000165 | 0 or 1 | 0 or 604,800 |
| 1,000 | 0.00165 | 0 or 1 | 0 or 605 |
| 1,000,000 | 1.65 | 5 | 3.0 |
| 2,600,000 | 4.30 | 10 | 2.3 |
| 1,000,000,000 | 1653 | 1749 | 1.06 |

| Avg. page views per month | Avg. page views per second at max. point* | Peak page views per second at 0.99999 confidence level | Ratio of peak to average at max. point |
|---|---|---|---|
| 1 | 0.00000165 | 0 or 1 | 0 or 604,800 |
| 1,000 | 0.00165 | 1 | 605 |
| 1,000,000 | 1.65 | 9 | 5.4 |
| 2,600,000 | 4.30 | 16 | 3.7 |
| 1,000,000,000 | 1653 | 1830 | 1.11 |

*Assumes 168 prime viewing hours per month with uniform distribution

Also notice that when the average page views per second is low, the peak page views per second can have two solutions, 0 or 1. These cases occur when the average page views per second is below 1 - confidence level. For instance, if all you care about is that your web server can handle all the traffic 99% of the time, and your average traffic is less than 0.01 page views per second, you don’t need a web server at all! That’s because 99% of the time (during the prime viewing period), there is no traffic to your website.

However, if your goal is to be able to serve 99% of your visitors during the peak viewing time, then you need a web server that can deliver at least one page view per second. And if your average traffic is only one page view every 100 seconds, your ratio of peak to average will be 100. Providing a complete web server ready to serve the rare visitor results in tremendous overhead costs, which is why cloud computing, where resources are shared among many websites, is becoming so popular.

Finally, notice that when the average page views per second is high (say 1,000), the ratio of peak to average is close to 1 and does not grow very much even at high availability (or confidence) levels. At high levels of average page views, the error in estimating the average number of page views per second is likely to be much greater than the error introduced by ignoring the random distribution of page views per second.
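The cutoff for the "0 or 1" cases can be checked directly. Under the Poisson model, the chance of zero page views in a second is exp(-average), so the peak can be 0 whenever exp(-average) is at least the confidence level. A small sketch (the function name is made up for this example):

```python
import math


def zero_peak_threshold(confidence):
    """Largest average rate for which P(no page views in a second)
    is still at least the confidence level."""
    return -math.log(confidence)


for confidence in (0.9, 0.99, 0.99999):
    exact = zero_peak_threshold(confidence)
    rule_of_thumb = 1 - confidence
    print(f"{confidence}: exact {exact:.7f}, 1 - confidence {rule_of_thumb:.7f}")
```

The exact cutoff is slightly above 1 - confidence level, and the two are nearly identical at high confidence levels, so the rule of thumb is a safe approximation.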

Estimating peak server requests per prime viewing second

Our last step is to estimate the number of server calls generated by a single web page request from a user. A typical web page consists of a static HTML file plus one or more images, videos, ads, and JavaScript widgets displayed on the page. (In the case of dynamic pages, the content will be generated on the server, usually as a JSP file, or an ASPX file if you are using the Microsoft .NET Framework.) If the page is not cached, sending all of the page contents may take over 100 requests to the server.

Assume your website consists of a single page that contains 100 items and all of those items reside on a single server. Now assume a single user calls for that page and you don’t want the user to have to wait more than one second before being able to interact with any part of it. That means the web server will need to be able to handle at least 100 requests per second per page view. (There are other potential bottlenecks in rendering the web page including Internet traffic, ISP speed, and the speed of the client computer, but we’ll ignore these for purposes of this blog post.)

Using the five nines confidence level, the final results for page views and server calls are shown in the table below. For our website with an expected 2.6 million page views per month, we need a web server that can handle 1,600 requests per second.

| Avg. page views per month | Avg. server requests per month* | Peak page views per second at 0.99999 confidence level** | Peak server requests per second at 0.99999 confidence level*,** |
|---|---|---|---|
| 1 | 100 | 1*** | 100*** |
| 1,000 | 100,000 | 1 | 100 |
| 1,000,000 | 100,000,000 | 9 | 900 |
| 2,600,000 | 260,000,000 | 16 | 1,600 |
| 1,000,000,000 | 100,000,000,000 | 1830 | 183,000 |

*Assumes 100 server calls per page view
**Assumes 168 prime viewing hours per month with uniform distribution
***Assumes goal is to satisfy 99.999% of visitor requests, not 99.999% of time
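Putting the four steps together, the whole estimate can be sketched as one small Python function. The function names and default values are mine, chosen to mirror the assumptions in the footnotes above (168 prime viewing hours, 100 server calls per page view, five nines). The floor of one page view per second reflects the goal of serving visitors rather than covering a share of the time. Treat it as an illustration, not a sizing tool.

```python
import math

SECONDS_PER_HOUR = 3600


def poisson_quantile(mean_rate, confidence):
    """Smallest k such that P(X <= k) >= confidence, X ~ Poisson(mean_rate)."""
    if mean_rate <= 0:
        return 0
    log_rate = math.log(mean_rate)
    max_k = int(mean_rate + 40 * math.sqrt(mean_rate) + 100)  # safety bound
    cumulative = 0.0
    for k in range(max_k + 1):
        cumulative += math.exp(k * log_rate - mean_rate - math.lgamma(k + 1))
        if cumulative >= confidence:
            return k
    raise RuntimeError("confidence level not reached within the search bound")


def peak_server_requests(monthly_views, prime_hours=168,
                         confidence=0.99999, requests_per_page=100):
    """Peak server requests per second during the prime viewing period."""
    # Steps 1 and 2: monthly page views -> average per prime viewing second.
    mean_rate = monthly_views / (prime_hours * SECONDS_PER_HOUR)
    # Step 3: peak page views per second; at least 1 so visitors get served.
    peak_views = max(1, poisson_quantile(mean_rate, confidence))
    # Step 4: page views -> server requests.
    return peak_views * requests_per_page


for views in (1, 1_000, 1_000_000, 2_600_000):
    print(f"{views:>9,} page views per month -> "
          f"{peak_server_requests(views):,} requests per second")
```

Changing `prime_hours` or `requests_per_page` shows how sensitive the answer is to those two guesses, which are usually far less certain than the Poisson step.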

*****

Update: If your website is one of many hosted on a single server, then you should skip the calculations for estimating peak page views per prime viewing second. That’s because traffic to your site will be combined with traffic to other sites. In that case, it is up to the company hosting the sites to combine the average traffic from all the sites first, then calculate the peak page views based on their promised availability.
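Here is a rough sketch of why combining traffic first matters. It assumes ten hypothetical sites, each averaging 1.65 page views per second, with independent traffic and the same prime viewing hours. Under those assumptions the combined traffic is also Poisson, with an average equal to the sum of the individual averages.

```python
import math


def poisson_quantile(mean_rate, confidence):
    """Smallest k such that P(X <= k) >= confidence, X ~ Poisson(mean_rate)."""
    if mean_rate <= 0:
        return 0
    log_rate = math.log(mean_rate)
    max_k = int(mean_rate + 40 * math.sqrt(mean_rate) + 100)  # safety bound
    cumulative = 0.0
    for k in range(max_k + 1):
        cumulative += math.exp(k * log_rate - mean_rate - math.lgamma(k + 1))
        if cumulative >= confidence:
            return k
    raise RuntimeError("confidence level not reached within the search bound")


SITES = 10              # hypothetical number of sites on the shared server
RATE_PER_SITE = 1.65    # average page views per second for each site
CONFIDENCE = 0.99999

# Sizing each site separately, then adding up the peaks.
separate_total = SITES * poisson_quantile(RATE_PER_SITE, CONFIDENCE)

# Combining the averages first, then calculating one peak.
pooled_peak = poisson_quantile(SITES * RATE_PER_SITE, CONFIDENCE)

print("sum of separate peaks:", separate_total)
print("peak of pooled traffic:", pooled_peak)
```

The pooled peak comes out well below the sum of the separate peaks, which is the saving a hosting company captures by sharing one server among many sites.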