Mental Wanking and Premature Optimization

Optimization appeals to us geeks. I am sure there is some psychological reason for this; if you happen to know what it is, drop me a line. I promise I am not about to witter on about The Fallacy of Premature Optimization either. Optimization is important, but far too many of us get caught up in pointless arguments about performance. This article deals specifically with the hunt for the fastest web server, in particular for serving static pages.

There are a ton of articles on Apache vs Apache2 vs lighttpd vs thttpd vs mongrel vs nginx vs litespeed vs whatever-httpd. The vast majority of these articles cater for people not averse to a bit of mental wanking. These people have a perceived problem, i.e. performance. They want to get an extra few percent from their machine. Off they go on their merry way, Googling for a performance comparison chart that will show them some stats. These people are the unadulterated speed freaks of the computing world. A lot of them understand their… affliction, but quite a few don’t.

In some organizations performance is critical, e.g. Yahoo, Google, Wikipedia, LiveJournal, the BBC, dozens of universities, any bank, etc. I am missing hundreds here. For some it’s a competitive advantage, e.g. Google. For these organizations performance is talked about all the time. It’s brought up in meetings, at the bar, over lunch, out jogging and in their dreams. It has a direct effect on the bottom line.

My point is this: most people online discussing the relative merits of Apache vs Apache2 vs lighttpd are not in this select group. Most are running small websites.

Here are some idle thoughts that may help the more pragmatic developers out there.

A fairly common question is: “What hardware do we need to sustain N requests per second?” Assume a static page is 25KB. If we also assume an eCPM of $1.00 (a small amount), i.e. $1 of revenue per 1,000 pages served, and a 30-day month, we can put N in perspective:

  1. 1 rps    == 2,592,000 requests per month      == $2,592 per month
  2. 10 rps   == 25,920,000 requests per month     == $25,920 per month
  3. 50 rps   == 129,600,000 requests per month    == $129,600 per month
  4. 100 rps  == 259,200,000 requests per month    == $259,200 per month
  5. 500 rps  == 1,296,000,000 requests per month  == $1.296 million per month
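
The figures above are simple arithmetic, so they are easy to redo for your own numbers. Here is a rough Python sketch of the calculation. It assumes a 30-day month and that every request is one page paid at the eCPM rate; the function name and defaults are mine, purely for illustration.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumption: a 30-day month


def monthly_figures(rps, ecpm_dollars=1.00):
    """Return (requests per month, dollars per month) for a sustained request rate."""
    requests = rps * SECONDS_PER_MONTH
    revenue = requests / 1000 * ecpm_dollars  # eCPM is revenue per 1,000 pages
    return requests, revenue


if __name__ == "__main__":
    for rps in (1, 10, 50, 100, 500):
        requests, revenue = monthly_figures(rps)
        print(f"{rps:>4} rps == {requests:>13,} requests/month == ${revenue:,.0f} per month")
```

Plug in your real eCPM and your real traffic and you will probably find that N is a lot smaller than you feared.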

500 pages per second is nearly 1.3 billion pages per month. This is an awful lot of pages, and $1.3 million is certainly an awful lot of money. On hearing 500rps as a requirement, a common reaction is to jump to Google and start looking for a suitable configuration. It would be far simpler and more enlightening to do a few calculations and a small test to see where we stand.

For instance:

500 requests per second * 25KB == (500*25*1024*8) b/s == 102,400,000 b/s, or roughly 102Mb/s
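
The same back-of-envelope sum as a small Python sketch. The function name is made up for illustration, and it only counts page payload: HTTP headers and TCP/IP overhead are ignored, so treat the result as a floor. With 1KB taken as 1024 bytes it comes to about 102Mb/s, and with 1KB taken as 1000 bytes it is exactly 100Mb/s. Either way it is a full 100Mb line.

```python
def required_mbps(rps, page_kb, bytes_per_kb=1024):
    """Megabits per second (1 Mb = 1,000,000 bits) needed for page payloads alone."""
    bits_per_second = rps * page_kb * bytes_per_kb * 8
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    print(required_mbps(500, 25))        # 1 KB = 1024 bytes
    print(required_mbps(500, 25, 1000))  # 1 KB = 1000 bytes
```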

This is a serious requirement. A 100Mb dedicated line is not cheap and would likely set you back several thousand dollars per month, but hey, we are making $1.3 million per month, who cares!

Now, if I were to go out and spend $2000 on a server, just how many pages could it serve? Let’s assume I own the following machine:

  • Dell 2850
  • RAM: 2GB
  • model name: Intel(R) Xeon(TM) CPU 2.80GHz
  • cpu family: 15
  • 6 10K SCSI disks (2 mirrored, 4 in raid 10)

Basically a decent spec 2005/6 Single Processor Dual Core machine with the OS disks on a raid 1 array and the web server serving pages from the raid 10 array.

So, is it possible for this single machine to serve 500rps? Yes! It could, if we had enough bandwidth! The following results were taken from a base install Debian machine. I did not tune apache2 in any way whatsoever. This was done on a 100Mb Ethernet network.

debian:~# ab -n 10000 -c 50 http://farty.com/
This is ApacheBench, Version 2.0.40-dev
Completed 1000 requests
………
Finished 10000 requests

Server Software: Apache/2.2.3
Server Hostname:        farty.com
Server Port: 80

Document Path:   /
Document Length: 25000 bytes
Concurrency Level: 50
Time taken for tests: 21.877168 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 252929056 bytes

HTML transferred: 250243696 bytes

Requests per second: 457.10 [#/sec] (mean)
Time per request: 109.386 [ms] (mean)
Time per request: 2.188 [ms] (mean, across all concurrent requests)
Transfer rate: 11290.35 [Kbytes/sec] received

I know ab is not the best tool for running benchmarks, and that requesting a single file is different than requesting random files etc., but I am not trying to be precise.

So, can a cheap machine saturate a 100Mbit line? Yes, easily. At no point during the tests did the server go over 0.5 load. I also ran a longer-running test and the results were the same. It hardly broke a sweat, and this is running an unmodified, untweaked pre-forking Apache2 server.
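
If you want to sanity check that claim against the ab output, the numbers are all there. A quick sketch in Python, using only the figures ab reported above (the function and variable names are mine):

```python
# Figures copied from the ab run above.
COMPLETE_REQUESTS = 10000
TIME_TAKEN_SECONDS = 21.877168
TOTAL_TRANSFERRED_BYTES = 252929056


def throughput(requests, seconds, total_bytes):
    """Return (requests per second, megabits per second) for a finished run."""
    rps = requests / seconds
    mbps = total_bytes * 8 / seconds / 1_000_000  # 1 Mb = 1,000,000 bits
    return rps, mbps


if __name__ == "__main__":
    rps, mbps = throughput(COMPLETE_REQUESTS, TIME_TAKEN_SECONDS, TOTAL_TRANSFERRED_BYTES)
    print(f"{rps:.2f} requests/s, {mbps:.1f} Mb/s of HTTP traffic")
```

That works out to a little over 92Mb/s of HTTP traffic on a 100Mb network, before counting TCP/IP and Ethernet framing overhead, so the wire was the bottleneck and not the box.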


So what’s the conclusion? Is benchmarking Apache2 vs lighttpd pointless? I would say that 99% of the time, yes. If you ever have the problem where you are serving enough pages for the difference between Apache2 and lighttpd to really matter, then you can likely afford more hardware or staff. Either way, I wouldn’t choose lighttpd over Apache2 unless I really had to.

Here’s a quote describing a 20,000 concurrent connection Apache setup:

… HEAnet’s National Mirror Server for Ireland. Currently
mirroring over 50,000 projects ….. It regularly sustains over 20,000
concurrent connections on a single Apache instance and has served as
many as 27,000 with about 3.5 Terabytes of content per day. The
front-end system is a Dell 2650, with 2 2.4 Ghz Xeon processors, 12Gb
of memory and the usual 2 system disks and 15k RPM SCSI disks, running
Debian GNU/Linux and Apache 2.x.

So the next time you hear someone discussing how fast their httpd server is at serving static content, ask yourself whether they are just jerking off or actually know what they are talking about.