Better HTTP Benchmarking
When Mark Nottingham (a key figure behind HTTP over the years) read the comparison of various Web servers and their HTTP performance, he took the time to discuss HTTP testing and laid out some useful rules:
- Consistency: The most important thing to get right is to test the same way, every time. Any changes in the system — whether it's an OS upgrade or another app running and stealing bandwidth or CPU — can affect your test results, so you need to be aggressive about nailing down the test environment.
- One Machine, One Job: The most common mistake I see people making is benchmarking a server on the same box where the load is generated. This doesn’t just put your results out a little bit, it makes them completely unreliable.
- Check the Network: Before each test, you need to understand how much capacity your network has, so that you’ll know when it’s limiting your test, rather than the server you’re testing.
- Remove OS Limitations: You need to make sure that the operating system doesn’t impose artificial limits on your server’s performance.
- Don’t Test the Client: Modern, high-performance servers make it very easy to mistake limitations in your load generator for the capacity of the server you’re testing. So, check to make sure your client box isn’t maxed out on CPU.
- Overload is not Capacity: A better way to get an idea of capacity is to test your server at progressively higher loads, until it reaches capacity and then backs off.
- Thirty Seconds isn’t a Test: It takes a while for the various layers of buffers and caches in the applications, OS and network stacks to stabilise, so a 30 second test can be very misleading.
- Do More than Hello World: Finding out how quickly your implementation can serve a 4-byte response body is an interesting but extremely limited look at how it performs. What happens when the response body is 4k — or 100k — is often much more interesting, and more representative of how it’ll handle real-life load.
- Not Just Averages: If someone tells you that a server does 1,000 responses a second with an average latency of 5ms, that’s great. But what if some of those responses took 100ms?
- Publish it All: A result given without enough information to reproduce it is at best a useless statement that requires people to take it on faith (a bad idea), and at worst an intentional effort to mislead.
- Try Different Tools: If you got this far, you might think I’m championing httperf and autobench over other tools. While I’d like to have a single all-singing, all-dancing test tool, httperf is unfortunately not it; for modern servers, it’s simply too slow, mostly because it doesn’t implement an event loop. While that’s fine for testing PHP apps that can do 50 or 500 requests a second, it’s completely inadequate for testing modern Web servers that can do multiple tens of thousands of requests a second without breaking a sweat.
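The "Check the Network" rule can be made concrete with a back-of-the-envelope calculation. The 1 Gbit/s link speed and 100 KB response size below are illustrative assumptions, not measurements:

```python
# Rough upper bound on request rate imposed by the network itself,
# assuming a 1 Gbit/s link and 100 KB response bodies (both hypothetical).
link_bits_per_sec = 1_000_000_000
response_bytes = 100 * 1024

# Ignoring protocol overhead, the link saturates at roughly this rate:
max_rps = link_bits_per_sec / 8 / response_bytes
print(f"network ceiling: ~{max_rps:.0f} responses/s")
```

If your server appears to "max out" near a figure like this, you are probably measuring the network, not the server.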
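For "Remove OS Limitations", the per-process open-file limit is the classic offender. A minimal sketch using Python's standard `resource` module (Unix only):

```python
import resource

# The open-file-descriptor limit caps how many concurrent connections a
# process can hold, which silently throttles both servers and load tools.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

# Raise the soft limit to the hard limit; raising the hard limit itself
# normally requires root (or a system-level configuration change).
if hard != resource.RLIM_INFINITY and soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```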
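"Don't Test the Client" can be turned into an automated sanity check before trusting a run. A sketch using only the standard library; the 80% threshold is an arbitrary rule of thumb, not a prescribed value:

```python
import os

# If the load generator's own CPUs are saturated, the numbers it reports
# reflect the client's limits, not the server's.
load_1min, _, _ = os.getloadavg()  # Unix only
cores = os.cpu_count() or 1
utilisation = load_1min / cores

if utilisation > 0.8:  # arbitrary threshold: treat a hot client as suspect
    print(f"warning: client at {utilisation:.0%} of capacity; results suspect")
else:
    print(f"client load OK ({utilisation:.0%})")
```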
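The "Overload is not Capacity" rule amounts to stepping the offered load upward until achieved throughput stops keeping pace. The `measure_throughput` function below is a stand-in (a simulated server saturating at a hypothetical 12,000 req/s); in a real test it would drive your load generator at the given rate and report what was actually achieved:

```python
def measure_throughput(offered_rate: float) -> float:
    # Stand-in for a real measurement: a simulated server that keeps up
    # with the offered load until a hypothetical 12,000 req/s ceiling.
    server_capacity = 12_000.0
    return min(offered_rate, server_capacity)

def find_capacity(start: float = 1_000, step: float = 1_000,
                  tolerance: float = 0.05) -> float:
    """Increase offered load until achieved throughput stops keeping up."""
    rate = start
    last_achieved = 0.0
    while True:
        achieved = measure_throughput(rate)
        # Once the server falls behind the offered rate (beyond tolerance)
        # or stops improving, we've passed the knee: report the last
        # sustainable throughput, not the overloaded figure.
        if achieved < rate * (1 - tolerance) or achieved <= last_achieved:
            return last_achieved
        last_achieved = achieved
        rate += step

print(find_capacity())  # with the simulated server, prints 12000.0
```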
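The "Not Just Averages" point is easy to demonstrate: a hypothetical latency sample with a couple of slow outliers keeps a low mean while hiding a severe tail:

```python
import statistics

# 98 fast responses plus two slow outliers (all values hypothetical, in ms).
latencies = [5.0] * 98 + [100.0, 250.0]

mean = statistics.mean(latencies)
p50 = statistics.quantiles(latencies, n=100)[49]   # median
p99 = statistics.quantiles(latencies, n=100)[98]   # 99th percentile

# The mean is 8.4 ms, which looks fine; the 99th percentile is over 100 ms.
print(f"mean={mean:.1f}ms p50={p50:.1f}ms p99={p99:.1f}ms")
```

Reporting percentiles alongside the mean is what exposes those 100 ms stragglers.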