Performance Blog

Posts Tagged ‘scalability

I just posted an article on how to performance load testing in production on the Box Tech blog – check it out.

The focus of that article is applying load to ensure that the system is stable and performs reasonably. This should be distinguished from Scalability Testing where the goal is to measure the scalability of the application for performance analysis as well as capacity planning.

To test scalability, it is important to increase the load methodically so that performance can be measured and plotted. For example, we can start with running 100 emulated users, then increment in terms of 100 until we reach 1000 users. This will give us 10 data points which is sufficient to give a good estimate of scalability. To compute a scalability metric, take a look at Gunther’s USL (Universal Scalability Law).


Going by the many posts in various LinkedIn groups and blogs, there seems to be some confusion about how to measure and analyze a web application’s performance. This article tries to clarify the different aspects of web performance and how to go about measuring it, explaining key terms and concepts along the way.

Web Application Architecture

The diagram below shows a high-level view of typical architectures of web applications.

The simplest applications have the web and app tiers combined while more complex ones may have multiple application tiers (called “middleware”) as well as multiple datastores.

The Front end refers to the web tier that generates the html response for the browser.

The Back end refers to the server components that are responsible for the business logic.

Note that in architectures where a single web/app server tier is responsible for both the front and back ends, it is still useful to think of them as logically separate for the purposes of performance analysis.

Front End Performance

When measuring front end performance, we are primarily concerned with understanding the response time that the user (sitting in front of a browser) experiences. This is typically measured as the time taken to load a web page. Performance of the front end depends on the following:

  • Time taken to generate the base page
  • Browser parse time
  • Time to download all of the components on the page (css,js,images,etc.)
  • Browser render time of the page

For most applications, the response time is dominated by the 3rd bullet above i.e. time spent by the browser in retrieving all of the components on a page. As pages have become increasingly complex, their sizes have mushroomed as well – it is not uncommon to see pages of 0.5 MB or more. Depending on where the user is located, it can take a significant amount of time for the browser to fetch components across the internet.

Front end Performance Tools

Front-end performance is typically viewed as waterfall charts produced by tools such as the Firebug Net Panel. During development, firebug is an invaluable tool to understand and fix client-side issues. However, to get a true measure of end user experience on production systems, performance needs to be measured from points on the internet where your customers typically are. Many tools are available to do this and they vary in price and functionality. Do your research to find a tool that fits your needs.

Back End Performance

The primary goal of measuring back end performance is to understand the maximum throughput that it can sustain.Traditionally, enterprises perform “load testing” of their applications to ensure they can scale. I prefer to call this “scalability testing“. Test clients drive load via bare-bones HTTP clients and measure the throughput of the application i.e. the number of requests per second they can handle. To increase the throughput, the number of client drivers need to be increased until the point where throughput stops to increase or worse stops to drop-off.

For complex multi-tier architectures, it is beneficial to break-up the back end analysis by testing the scalability of individual tiers. For example,  database scalability can be measured by running a workload just on the database. This can greatly help identify problems and also provides developers and QA engineers with tests they can repeat during subsequent product releases.

Many applications are thrown into production before any scalability testing is done. Things may seem fine until the day the application gets hit with increased traffic (good for business!). If the application crashes and burns because it cannot handle the load, you may not get a second chance.

Back End Performance Tools

Numerous load testing tools exist with varying functionality and price. There are also a number of open source tools available. Depending on resources you have and your budget, you can also outsource your entire scalability testing.


Front end performance is primarily concerned with measuring end user response times while back end performance is concerned with measuring throughput and scalability.


When load testing an application, the first set of tests should focus on measuring the maximum throughput. This is especially true of multi-user, interactive applications like web applications. The maximum throughput is best measured by running a few emulated users with zero think time. This means that each emulated user sends a request, receives a response and immediately loops back to send the next request. Although this is artificial, it is the best way to quickly determine the maximum performance of the server infrastructure.
<h2>Little’s Law</h2>
Once you have that throughput (say X), we can use Little’s Law to estimate the number of real simultaneous users that the application can support. In simple terms, Little’s Law states that :

N = X  / λ

where N is the number of concurrent users,  λ is the average arrival rate and X is the throughput. Note that the arrival rate is the inverse of the inter-arrival time i.e. the time between requests.

To understand this better, let’s take a concrete example from some tests I ran on a basic PHP script deployed in an apache server. The maximum throughput obtained was 2011.763 requests/sec with an average response time of 6.737 ms, an average think time of 0.003 secs when running 20 users. The arrival rate is the inverse of the inter-arrival time which is the sum of the response time and think time. In this case, X is .2011.763 and λ is 1/(0.006737 + 0.003). Therefore,

N = X / λ  = 2011.763 * 0.009737 = 19.5885

This is pretty close to the actual number of emulated users which is 20.


Estimating Concurrent Users

This is all well and good, but how does this help us in estimating the number of real concurrent users (with non-zero think time) that the system can support ? Using the same example as above, let us assume that if this were a real application, the average inter-arrival time is 5 seconds. Using Little’s Law, we can now compute N as :

N = X /λ  = 2011.763 *  5 = 10058 users.

In other words, this application running on this same infrastructure can support more than 10,000 concurrent users with an inter-arrival time of 5 seconds.

What does this say for think times ? If we assume that the application (and infrastructure) will continue to perform in the same manner as the number of connected users increase (i.e it maintains the average response time of 0.006737 seconds), the the average think time is 4.993 seconds. If the response time degrades as load goes up (which is usually the case after a certain point), then the number of users supported will also correspondingly decrease.

A well-designed application can scale linearly to support 10’s or 100’s of thousands of users. In the case of large websites like Facebook , Ebay and Flickr, the applications scale to handle millions of users. But obviously, these companies have invested tremendously to ensure that their applications and software infrastructure can scale.

Little’s Law can be used to estimate the maximum number of concurrent users that your application can support. As such, it is a handy tool to get a quick, rough idea. For example, if Little’s Law indicates that the application can only support 10,000 users but your target is really 20,000 users, you know you have work to do to improve basic performance.

Shanti's Photo


Latest Tweets