Posts Tagged ‘throughput’
Going by the many posts in various LinkedIn groups and blogs, there seems to be some confusion about how to measure and analyze a web application’s performance. This article tries to clarify the different aspects of web performance and how to go about measuring it, explaining key terms and concepts along the way.
Web Application Architecture
The diagram below shows a high-level view of typical architectures of web applications.
The simplest applications have the web and app tiers combined while more complex ones may have multiple application tiers (called “middleware”) as well as multiple datastores.
The Front end refers to the web tier that generates the html response for the browser.
The Back end refers to the server components that are responsible for the business logic.
Note that in architectures where a single web/app server tier is responsible for both the front and back ends, it is still useful to think of them as logically separate for the purposes of performance analysis.
Front End Performance
When measuring front end performance, we are primarily concerned with understanding the response time that the user (sitting in front of a browser) experiences. This is typically measured as the time taken to load a web page. Performance of the front end depends on the following:
- Time taken to generate the base page
- Browser parse time
- Time to download all of the components on the page (css,js,images,etc.)
- Browser render time of the page
For most applications, the response time is dominated by the 3rd bullet above i.e. time spent by the browser in retrieving all of the components on a page. As pages have become increasingly complex, their sizes have mushroomed as well – it is not uncommon to see pages of 0.5 MB or more. Depending on where the user is located, it can take a significant amount of time for the browser to fetch components across the internet.
Front end Performance Tools
Front-end performance is typically viewed as waterfall charts produced by tools such as the Firebug Net Panel. During development, firebug is an invaluable tool to understand and fix client-side issues. However, to get a true measure of end user experience on production systems, performance needs to be measured from points on the internet where your customers typically are. Many tools are available to do this and they vary in price and functionality. Do your research to find a tool that fits your needs.
Back End Performance
The primary goal of measuring back end performance is to understand the maximum throughput that it can sustain.Traditionally, enterprises perform “load testing” of their applications to ensure they can scale. I prefer to call this “scalability testing“. Test clients drive load via bare-bones HTTP clients and measure the throughput of the application i.e. the number of requests per second they can handle. To increase the throughput, the number of client drivers need to be increased until the point where throughput stops to increase or worse stops to drop-off.
For complex multi-tier architectures, it is beneficial to break-up the back end analysis by testing the scalability of individual tiers. For example, database scalability can be measured by running a workload just on the database. This can greatly help identify problems and also provides developers and QA engineers with tests they can repeat during subsequent product releases.
Many applications are thrown into production before any scalability testing is done. Things may seem fine until the day the application gets hit with increased traffic (good for business!). If the application crashes and burns because it cannot handle the load, you may not get a second chance.
Back End Performance Tools
Numerous load testing tools exist with varying functionality and price. There are also a number of open source tools available. Depending on resources you have and your budget, you can also outsource your entire scalability testing.
Front end performance is primarily concerned with measuring end user response times while back end performance is concerned with measuring throughput and scalability.
One of the first things we performance engineers do with a new server application is to conduct a quick throughput experiment. The goal is to find the maximum throughput that the server can deliver. In many cases, it is important that the server be capable of delivering this throughput with a certain response time bound. Thus, we always qualify the throughput with an average and 90th percentile response time (i.e. we want 90% of the requests to execute within the stated time). Any decent workload should therefore measure both the throughput and response time.
Let us assume we have such a workload. How best to estimate the maximum throughput within the required response time bounds ? The easiest way to conduct such an experiment is to run a bunch of clients (emulated users, virtual users or vusers) to drive load against the target server without any think time. Here is how the flow from a vuser will look like :
Create Request ==> Send Request ==> Receive Response ==> Log statistics
This sequence of operations is executed repeatedly (without any pauses in between i.e. no think times) for a sufficient length of time to get statistically valid results. So, to find the maximum throughput, run a series of tests, each time increasing the number of clients. Simple, isn’t it ?
A little while ago, I realized that if one doesn’t have the proper training, this isn’t that simple. I came across such an experiment with the following results :
What is wrong with these results ?
Firstly, the throughput is roughly the same at all loads. This probably means that the system saturated even below the base load level of 5,000 Vusers. Recall, that the workload does not have a think time. When you have this many users repeatedly submitting requests, the server is certain to be overloaded. I must mention that the server in this case is a single system with 12 cores having hyper-threading enabled. A multi-threaded server application typically will use one or more threads to receive requests from the network, then hand the request to a worker thread for processing. Considering the context-switching, waiting, locking etc. one can assume that at most one can run 4x the number of cores or in this case about 96 server threads. Since each Vuser submits a request and waits for a response, it probably requires 2-2.5x the number of Vusers as the number of server threads to saturate a system. Using this rule of thumb, one would need to run a maximum of 200-250 Vusers.
Throughput with Correct Vusers
After I explained the above, the tests were re-run with the following results:
Notice that the maximum throughput is still nearly the same as from the previous set, but it has been achieved with a much lower number of Vusers (aka clients). So does it really matter ? Doesn’t it look better to say that the server could handle 35000 connections rather than 300 ? No, it doesn’t. The reason becomes obvious if we take a look at the response times.
The Impact of Response Times
The graph below shows how the 90% Response Time varied for both sets of experiments :
The response times for the first experiment with very large number of Vusers ranges in the hundreds of millisecs. When the number of Vusers was pared down to just reach saturation, the server responded hundred times faster ! Intuitively too, this makes sense. If the server is inundated with requests, they are just going to queue up. The longer a request waits for processing, the larger is it’s response time.
When doing throughput performance experiments, it is important to take into consideration the type of server application, the hardware characteristics etc. and run appropriate load levels. Otherwise, although you may be able to find out what the maximum throughput is, you will have no idea what the response time is.