Performance Blog

Posts Tagged ‘perceived performance’

Service Level Agreements (SLAs) usually specify response time criteria that must be met. Although SLAs can cover a wide range of metrics such as throughput, uptime and availability, we will focus on response times in this article.

We often hear phrases like the following:

  • “The response time was 5 seconds”
  • “This product’s performance is much worse than slowpoke’s. It takes longer to respond.”
  • “Our whizbang product can perform 100 transactions/sec with a response time of 10 seconds or less”

Do you see anything wrong in these statements? Although they sound fine for general conversation, anyone interested in performance should really be asking what exactly they mean.

Let’s take the first statement above and make the assumption that it refers to a particular page in a web application. When someone says that the response time is 5 seconds, does it mean that when this user typed in the URL of this page, the browser took 5 seconds to respond? Or does it mean that in an automated test repeatedly accessing this page, the average response time was 5 seconds? Or perhaps, the median response time was 5 seconds?

You get the idea. For some reason, people tend to talk loosely about response times. Without going into the details of how to measure response time (that’s a separate topic), this article will focus on what makes a response time metric meaningful.

For purposes of this discussion, let us assume we are measuring the response time of a transaction (which can be anything: web, database, cache, etc.). What is the most meaningful measure for the response time of a transaction?

Mean Response Time

This is the most common measure of response time, but alas, it is usually the most flawed as well. The mean or average response time simply adds up all the individual response times taken from multiple measurements and divides the sum by the number of samples. This may be fine if the measurements are fairly evenly distributed over a narrow range, as in Figure 1.

Figure 1: Steady Response Times
Figure 2: Varying Response Times

But if the measurements vary quite a bit over a large range, as in Figure 2, the average response time is not meaningful. Both figures have the same scale and show response times on the y axis for samples taken over a period of time (x axis).
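To make the point concrete, here is a minimal sketch in Python. The two sample lists are invented for illustration and are not the data behind Figures 1 and 2; they are chosen so that both have the same mean while their spread differs widely.

```python
from statistics import mean, pstdev

# Hypothetical response times in seconds (illustrative only,
# not the measurements plotted in the figures).
steady = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
varying = [1.0, 9.5, 2.0, 8.0, 0.5, 9.0]

for name, samples in (("steady", steady), ("varying", varying)):
    # The mean alone hides how far individual samples stray from it,
    # so print the spread and the extremes alongside it.
    print(
        f"{name}: mean={mean(samples):.2f}s "
        f"stdev={pstdev(samples):.2f}s "
        f"min={min(samples)}s max={max(samples)}s"
    )
```

Both lists average about 5 seconds, yet in the second one no single request actually took anywhere near 5 seconds.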

Median Response Time

If the average is not a good representation of a distribution, perhaps the median is? After all, the median marks the 50th percentile of a distribution. The median is useful when the response times are roughly normally distributed apart from a few outliers. In this case, the median helps to weed out the outliers. The key here is few outliers. It is important to realize that if 50% of the transactions are within the specified time, the remaining 50% have a higher response time. Surely, a response time specification that leaves out half the population cannot be a good measure.
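A small Python sketch of both cases follows. The sample values are made up for illustration: the first list has a single outlier, while in the second list half of the requests are slow.

```python
from statistics import mean, median

# Hypothetical response times in seconds (illustrative only).
few_outliers = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 1.8, 2.0, 30.0]
half_slow = [2.0, 2.1, 1.9, 2.0, 15.0, 18.0, 20.0, 25.0]

# One outlier drags the mean up, but the median stays with the bulk.
print(f"few outliers: mean={mean(few_outliers):.2f}s "
      f"median={median(few_outliers):.2f}s")

# When half the requests are slow, the median says nothing about them.
limit = median(half_slow)
slower = [s for s in half_slow if s > limit]
print(f"half slow: median={limit:.2f}s, "
      f"{len(slower)} of {len(half_slow)} samples are slower, "
      f"worst={max(slower)}s")
```

In the second case an SLA written against the median would be met even though half of the requests take 15 seconds or more.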

90th or 95th percentile Response Time

In standard benchmarks, it is common to see 90th percentile response times used. The benchmark may specify that the 90th percentile response time of a transaction should be within x seconds. This means that only 10% of the transactions have a response time higher than x seconds, which makes it a meaningful measure. For web applications, the requirements are usually even stricter. After all, if 10% of your users are dissatisfied with the site performance, that could be a significant number of users. Therefore, it is common to see the 95th percentile used for SLAs in web applications.

A word of caution: web page response times can vary dramatically if measured at the last mile (i.e. on real users’ computers that are connected via cable or DSL to the internet). Figure 3 shows the distribution of response times for such a measurement.
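As a rough sketch, a percentile-based requirement can be checked as follows. The code uses the nearest-rank definition of a percentile, which is only one of several definitions in use; the function names `percentile` and `meets_sla`, the sample data and the 4 second limit are all assumptions made for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(samples, pct, limit_seconds):
    """True if the pct-th percentile response time is within the limit."""
    return percentile(samples, pct) <= limit_seconds


# Hypothetical response times in seconds (illustrative only).
samples = [0.8, 0.9, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 1.6,
           1.7, 1.8, 1.9, 2.0, 2.2, 2.4, 2.7, 3.1, 4.5, 9.0]

for pct in (50, 90, 95):
    print(f"p{pct} = {percentile(samples, pct)}s, "
          f"within 4s: {meets_sla(samples, pct, 4.0)}")
```

With this sample set the same 4 second limit passes at the 90th percentile but fails at the 95th, which shows why the choice of percentile belongs in the SLA itself.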

Figure 3: Response Time Histogram

It uses the same data as in Figure 2. The mean response time for this data set is 12.9 secs and the median is even lower at 12.3 secs. Clearly, neither of these measures covers any significant range of the actual response times. The 90th percentile is 17.3 secs and the 95th is 18.6 secs. These are much better measures for the response time of this distribution and will work better as the SLA.

To summarize, it is important to look at the distribution of response times before attempting to define an SLA. As with many other metrics, a one-size-fits-all approach does not work. Response time measurements on the server side tend to vary a lot less than on the client. A 90th or 95th percentile response time requirement is a good choice to ensure that the vast majority of clients are covered.

Web pages today are becoming increasingly complex, and it is now well recognized that simply measuring page load time does not represent the response time of a page.

But what exactly do we mean by response time? Terms such as time to interactivity, perceived performance, above the fold performance etc. have come into vogue. Let’s examine these in turn.

Time To Interactivity

At last year’s Velocity Conference, Nicholas Zakas of Yahoo! gave an excellent presentation on how the responsiveness of the Yahoo! Front Page was improved, in which he focused on time to interactivity. In other words, when a user can actually interact with a page (say, click on a link) matters more than when the entire page has been rendered. With pages increasingly containing animated multi-media content which the user may not care about, this definition may make sense. However, imagine a page whose primary purpose is to serve up links to other pages (e.g. news) and which loads lots of images after onload. All the links appear first, which means the user can interact with the site, yet the page has lots of white space where the images go. Can we truly just measure the time to interactivity and claim this to be the response time of a page?

Perceived Performance

Another popular term, perceived performance, is defined loosely as the time that the user perceives the page takes to load. This means that all the major elements of the page that the user can easily see must be rendered. By definition, this measurement is highly subjective. Perceptions differ: for example, one user may not miss the Chat pane in Gmail, while for another it is a very important feature. Further, again by definition, this metric is application dependent. Nevertheless, for a particular web application, it is possible for developers and performance engineers to agree on a definition of perceived performance and work towards measuring and improving it.

Above The Fold Performance

At the Velocity Online conference last week, Jake Brutlag of Google proposed “Above the Fold Time” (AFT) as a more meaningful way of measuring response time. This was defined as the time taken to render all of the static elements in the visible portion of the browser window. Jake and others have put in some serious thought and defined an algorithm to distinguish between the static and dynamic (e.g. ads) content on a page.

Clearly, one part of the AFT proposal is valid: one doesn’t really care about content that is not visible initially and which the user can only get to by scrolling. But measuring everything above the fold has its issues as well. Take for instance the new Yahoo! Mail Beta. Y! Mail now has extensive social features to enable one to connect to Facebook, Twitter, Messenger and endless third-party applications. It is arguable whether the user will expect all of these third-party links to be on the page before he “perceives” that his request is complete. The page still looks finished without those links.

In my opinion, we need to distinguish between the essential parts of the page and the optional ones (a caveat here: although no one would argue that an ad is essential to the user, it is essential for the page to look complete). Looking at just static vs dynamic pixels misses this point. The difficulty, of course, is that there is no uniform way to define what is essential; it once again becomes application/page specific.
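One possible way to turn this idea into a number, purely as a sketch: if each page element's render time is recorded and the team agrees on which elements are essential, the perceived response time can be taken as the moment the last essential element is rendered. The function name, the element names and the timings below are hypothetical, and how the render times are collected is outside the scope of this example.

```python
def perceived_response_time(render_times, essential):
    """Time (in seconds) at which the last essential element rendered.

    render_times: dict mapping element name -> render time in seconds.
    essential: names of the elements agreed to be essential for this page.
    """
    if not essential:
        raise ValueError("at least one essential element is required")
    missing = [name for name in essential if name not in render_times]
    if missing:
        raise KeyError(f"no render time recorded for: {missing}")
    return max(render_times[name] for name in essential)


# Hypothetical timings for a mail page (illustrative only).
render_times = {
    "header": 0.4,
    "inbox_list": 1.2,
    "ad_banner": 1.8,     # needed for the page to look complete
    "chat_pane": 2.9,     # treated as optional in this example
    "social_links": 3.5,  # treated as optional in this example
}
essential = {"header", "inbox_list", "ad_banner"}

print("perceived:", perceived_response_time(render_times, essential), "s")
print("everything:", max(render_times.values()), "s")
```

The essential set is the application-specific part: a different team, or a different page, would draw the line elsewhere and get a different number from the same timings.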

But for now, that’s the way I am going – defining “perceived performance” on a case by case basis.

