Posts Tagged ‘response time’
Going by the many posts in various LinkedIn groups and blogs, there seems to be some confusion about how to measure and analyze a web application’s performance. This article tries to clarify the different aspects of web performance and how to go about measuring it, explaining key terms and concepts along the way.
Web Application Architecture
The diagram below shows a high-level view of typical architectures of web applications.
The simplest applications have the web and app tiers combined while more complex ones may have multiple application tiers (called “middleware”) as well as multiple datastores.
The Front end refers to the web tier that generates the html response for the browser.
The Back end refers to the server components that are responsible for the business logic.
Note that in architectures where a single web/app server tier is responsible for both the front and back ends, it is still useful to think of them as logically separate for the purposes of performance analysis.
Front End Performance
When measuring front end performance, we are primarily concerned with understanding the response time that the user (sitting in front of a browser) experiences. This is typically measured as the time taken to load a web page. Performance of the front end depends on the following:
- Time taken to generate the base page
- Browser parse time
- Time to download all of the components on the page (css,js,images,etc.)
- Browser render time of the page
For most applications, the response time is dominated by the 3rd bullet above i.e. time spent by the browser in retrieving all of the components on a page. As pages have become increasingly complex, their sizes have mushroomed as well – it is not uncommon to see pages of 0.5 MB or more. Depending on where the user is located, it can take a significant amount of time for the browser to fetch components across the internet.
Front end Performance Tools
Front-end performance is typically viewed as waterfall charts produced by tools such as the Firebug Net Panel. During development, firebug is an invaluable tool to understand and fix client-side issues. However, to get a true measure of end user experience on production systems, performance needs to be measured from points on the internet where your customers typically are. Many tools are available to do this and they vary in price and functionality. Do your research to find a tool that fits your needs.
Back End Performance
The primary goal of measuring back end performance is to understand the maximum throughput that it can sustain.Traditionally, enterprises perform “load testing” of their applications to ensure they can scale. I prefer to call this “scalability testing“. Test clients drive load via bare-bones HTTP clients and measure the throughput of the application i.e. the number of requests per second they can handle. To increase the throughput, the number of client drivers need to be increased until the point where throughput stops to increase or worse stops to drop-off.
For complex multi-tier architectures, it is beneficial to break-up the back end analysis by testing the scalability of individual tiers. For example, database scalability can be measured by running a workload just on the database. This can greatly help identify problems and also provides developers and QA engineers with tests they can repeat during subsequent product releases.
Many applications are thrown into production before any scalability testing is done. Things may seem fine until the day the application gets hit with increased traffic (good for business!). If the application crashes and burns because it cannot handle the load, you may not get a second chance.
Back End Performance Tools
Numerous load testing tools exist with varying functionality and price. There are also a number of open source tools available. Depending on resources you have and your budget, you can also outsource your entire scalability testing.
Front end performance is primarily concerned with measuring end user response times while back end performance is concerned with measuring throughput and scalability.
Service Level Agreements (SLAs) usually specify a response time criteria that must be met. Although SLAs can have a wide range of metrics like throughput, up time, availability etc., we will focus on response times in this article.
We often hear phrases like the following :
- “The response time was 5 seconds”
- “This product’s performance is much worse than slowpoke’s. It takes longer to respond.”
- “Our whizbang product can perform 100 transactions/sec with a response time of 10 seconds or less”
Do you see anything wrong in these statements? Although they sound fine for general conversation, anyone interested in performance should really be asking what exactly do they mean.
Let’s take the first statement above and make the assumption that it refers to a particular page in a web application. When someone says that the response time is 5 seconds, does it mean that when this user typed in the URL of this page, the browser took 5 seconds to respond? Or does it mean that in an automated test repeatedly accessing this page, the average response time was 5 seconds? Or perhaps, the median response time was 5 seconds?
You get the idea. For some reason, people tend to talk loosely about response times. Without going into details of how to measure the response time (that’s a separate topic), this article will focus on what is a meaningful response time metric.
For purposes of this discussion, let us assume we are measuring the response time of a transaction (which can be anything – web, database, cache etc.) What is the most meaningful measure for the response time of a transaction?
Mean Response Time
This is the most common measure of response time, but alas, usually is the most flawed as well. The mean or average response time simply adds up all the individual response times taken from multiple measurements and divides it by the number of samples to get an average. This may be fine if the measurements are fairly evenly distributed over a narrow range as in Figure 1.
- Figure 1: Steady Response Times
- Figure 2: Varying Response Times
But if the measurements vary quite a bit over a large range like in Figure 2, the average response time is not meaningful. Both figures have the same scale and show response times on the y axis for samples taken over a period of time (x axis).
Median Response Time
If the average is not a good representation of a distribution, perhaps the median is? After all, the median marks the 50th percentile of a distribution. The median is useful when the response times do have a normal distribution but have a few outliers. In this case, the median helps to weed out the outliers.The key here is few outliers. It is important to realize that if 50% of the transactions are within the specified time, that means the remaining 50% have a higher response time. Surely, a response time specification that leaves out half the population cannot be a good measure.
90th or 95th percentile Response Time
In standard benchmarks, it is common to see 90th percentile response times used. The benchmark may specify that the 90th percentile response time of a transaction should be within x seconds. This means that only 10% of the transactions have a response time higher than x seconds and can therefore be a meaningful measure. For web applications, the requirements are usually even higher – after all, if 10% of your users are dissatisfied with the site performance, that could be a significant number of users. Therefore, it is common to see 95th percentile used for SLAs in web applications.
A word of caution – web page response times can vary dramatically if measured at the last mile (i.e. real users computers that are connected via cable or DSL to the internet). Figure 3 shows the distribution of response times for such a measurement.
- Figure 3: Response Time Histogram
It uses the same data as in Figure 2. The mean response time for this data set is 12.9 secs and the median is even lower at 12.3 secs. Clearly neither of these measures covers any significant range of the actual response times. The 90th percentile is 17.3 and the 95th is 18.6. These are much better measures for the response time of this distribution and will work better as the SLA.
To summarize, it is important to look at the distribution of response times before attempting to define an SLA. Like many other metrics, a one size fits all approach does not work. Response time measurements on the server side tend to vary a lot less than on the client. A 90th or 95th percentile response time requirement is a good choice to ensure that the vast majority of clients are covered.
Web pages today are becoming increasingly complex and it is now well recognized that simply measuring page load time does not represent the response time of a page.
But what exactly do we mean by response time? Terms such as time to interactivity, perceived performance, above the fold performance etc. have come into vogue. Let’s examine these in turn.
Time To Interactivity
In last year’s Velocity Conference, Nicholas Zakas of Yahoo! gave an excellent presentation on how the Yahoo! Front Page responsiveness was improved in which he focused on time to interactivity. In other words, when can a user actually interact with a page (say click on a link) is more important than ensuring that the entire page gets rendered. With pages increasingly containing animated multi-media content which the user may not care about, this definition may make sense. However, imagine a page whose primary purpose is to serve up links to other pages (e.g. news) and loads lots of images after onload. All the links appear first which means the user can interact with the site, yet the page has lots of white space where the images go – can we truly just measure the time to interactivity and claim this to be the response time of a page?
Another popular term, perceived performance is defined loosely as the time that the user perceives it takes for the page. This means that all the major elements of the page that the user can easily see must be rendered. By definition, this measurement is highly subjective. Perceptions differ – for e.g. one user may not miss the Chat pane in gmail while for another this is a very important feature. Further, again by definition, this metric is application dependent. Nevertheless, for a particular web application, it is possible for developers and performance engineers to agree on a definition of what is perceived performance and work towards measuring/improving it.
Above The Fold Performance
In the Velocity Online conference last week, Jake Brutlag of google proposed “Above the Fold Time” (AFT) as a method of measuring a more meaningful response time. This was defined as the time taken to render all of the static elements in the visible portion of the browser window. Jake and others have put in some serious thought and defined the algorithm to distinguish between the static and dynamic (e.g. ads) content on a page.
Clearly, one part of the AFT proposal is valid – one doesn’t really care about the content on the page that is not visible initially and which the user can only get to by scrolling. But measuring everything above the fold has its issues as well. Take for instance the new Yahoo! Mail Beta. Y! Mail now has extensive social features to enable one to connect to Facebook, Twitter, Messenger and endless third-party applications. It is arguable whether the user will expect all of these third-party links to be on the page before he “perceives” that his request is complete. The page still looks finished without those links.
In my opinion, we need to distinguish between the essential parts of the page vs the optional ones (A caveat here – although no one would argue that an ad is essential, it is essential for the page to look complete). Looking at just static vs dynamic pixels misses this point. The difficulty of course is that there is no uniform way to define what is essential – it once again becomes application/page specific.
But for now, that’s the way I am going – defining “perceived performance” on a case by case basis.