Quick Throughput Experiment
Posted June 21, 2010on:
One of the first things we performance engineers do with a new server application is to conduct a quick throughput experiment. The goal is to find the maximum throughput that the server can deliver. In many cases, it is important that the server be capable of delivering this throughput with a certain response time bound. Thus, we always qualify the throughput with an average and 90th percentile response time (i.e. we want 90% of the requests to execute within the stated time). Any decent workload should therefore measure both the throughput and response time.
Let us assume we have such a workload. How best to estimate the maximum throughput within the required response time bounds ? The easiest way to conduct such an experiment is to run a bunch of clients (emulated users, virtual users or vusers) to drive load against the target server without any think time. Here is how the flow from a vuser will look like :
Create Request ==> Send Request ==> Receive Response ==> Log statistics
This sequence of operations is executed repeatedly (without any pauses in between i.e. no think times) for a sufficient length of time to get statistically valid results. So, to find the maximum throughput, run a series of tests, each time increasing the number of clients. Simple, isn’t it ?
A little while ago, I realized that if one doesn’t have the proper training, this isn’t that simple. I came across such an experiment with the following results :
What is wrong with these results ?
Firstly, the throughput is roughly the same at all loads. This probably means that the system saturated even below the base load level of 5,000 Vusers. Recall, that the workload does not have a think time. When you have this many users repeatedly submitting requests, the server is certain to be overloaded. I must mention that the server in this case is a single system with 12 cores having hyper-threading enabled. A multi-threaded server application typically will use one or more threads to receive requests from the network, then hand the request to a worker thread for processing. Considering the context-switching, waiting, locking etc. one can assume that at most one can run 4x the number of cores or in this case about 96 server threads. Since each Vuser submits a request and waits for a response, it probably requires 2-2.5x the number of Vusers as the number of server threads to saturate a system. Using this rule of thumb, one would need to run a maximum of 200-250 Vusers.
Throughput with Correct Vusers
After I explained the above, the tests were re-run with the following results:
Notice that the maximum throughput is still nearly the same as from the previous set, but it has been achieved with a much lower number of Vusers (aka clients). So does it really matter ? Doesn’t it look better to say that the server could handle 35000 connections rather than 300 ? No, it doesn’t. The reason becomes obvious if we take a look at the response times.
The Impact of Response Times
The graph below shows how the 90% Response Time varied for both sets of experiments :
The response times for the first experiment with very large number of Vusers ranges in the hundreds of millisecs. When the number of Vusers was pared down to just reach saturation, the server responded hundred times faster ! Intuitively too, this makes sense. If the server is inundated with requests, they are just going to queue up. The longer a request waits for processing, the larger is it’s response time.
When doing throughput performance experiments, it is important to take into consideration the type of server application, the hardware characteristics etc. and run appropriate load levels. Otherwise, although you may be able to find out what the maximum throughput is, you will have no idea what the response time is.