Performance Blog


I recently checked in a feature that allows fairly extensive comparisons of different runs in Faban. Although the ‘Compare’ button has been part of the Results list view for a while, it has been broken for a long time. It finally works!

When to use Compare

The purpose of this feature is to compare runs performed at the same load level (aka Scale in Faban) and on the same benchmark rig. Perhaps you are tuning certain configs and/or code and are doing runs to analyze the performance differences between these changes. The Compare feature lets you look at multiple runs at the same time across multiple dimensions: throughput, average and 90th percentile response times, average CPU utilization, etc. This gives a single-page view that can quickly point out where one run differs from another.
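The arithmetic behind such a comparison is simple. Here is a minimal Python sketch, not Faban's actual implementation: it assumes each run has already been reduced to a list of per-request response times collected over a steady-state interval, and the helper names summarize_run and compare_runs, as well as the sample numbers, are made up for illustration. It reports each run's throughput, average and 90th percentile response time as a percentage change from a baseline run.

```python
# Hypothetical helpers for illustration only; not part of Faban.

def summarize_run(response_times_s, duration_s):
    """Summarize one run: throughput (ops/s), average and 90th percentile response time (s)."""
    n = len(response_times_s)
    ordered = sorted(response_times_s)
    rank = -(-90 * n // 100)  # nearest-rank 90th percentile: integer ceiling of 0.9 * n
    return {
        "throughput": n / duration_s,
        "avg_rt": sum(ordered) / n,
        "p90_rt": ordered[rank - 1],
    }

def compare_runs(runs, baseline):
    """Express every metric of every run as a percentage change from the baseline run."""
    base = runs[baseline]
    return {
        name: {key: 100.0 * (value - base[key]) / base[key] for key, value in metrics.items()}
        for name, metrics in runs.items()
    }

# Two made-up runs at the same load level, each measured over a 10 s steady state.
runs = {
    "run-A": summarize_run([0.10] * 9 + [1.00], duration_s=10),
    "run-B": summarize_run([0.08] * 9 + [0.50], duration_s=10),
}
for name, delta in compare_runs(runs, baseline="run-A").items():
    print(name, {k: round(v, 1) for k, v in delta.items()})
```

The 90th percentile here uses the nearest-rank method; definitions that interpolate between samples can give slightly different values.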

How to use Compare

This is easy. On the results view in the dashboard, simply select the runs you want to compare using the check box at the left of each row. Then click the Compare button at the top of the screen.

The screenshot below shows this operation:

[Screenshot: selecting runs in the Results view and clicking Compare]

Comparison Results

The first part of the comparison report looks like the image below. The report combines tables with graphs to present each kind of data in its most useful form. For example, Run Information is a summary table that describes the runs, whereas Throughput is a graph that shows how the overall throughput varied over the length of the test for all runs.

[Screenshot: the first part of the comparison report]

How can I get this code?

The code is currently in the main branch of the Faban code on GitHub. Fork it and try it out. Once I get some feedback, I will fix any issues and cut a new binary release.


Enterprise applications are typically tested for load, performance and scalability using a driver that emulates a real client by sending requests to the application just as a real user would. For web applications, the client is usually a browser and the driver is a simple HTTP client. The emulated HTTP clients can be extremely lightweight, allowing one to run several hundred or even several thousand driver agents (depending on the think time) on a single box. Of course, tools vary widely – some can be quite heavyweight, so it’s important to do a thorough evaluation first. But I digress.
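Why think time matters so much follows from the interactive response time law, X = N / (R + Z), where N is the number of emulated users, R the response time and Z the think time. The short Python sketch below (function names and numbers are illustrative assumptions) shows that with a long think time each emulated user issues only a trickle of requests, so a single box can host many of them.

```python
def throughput_for_users(users, resp_time_s, think_time_s):
    """Interactive response time law: X = N / (R + Z) requests per second."""
    return users / (resp_time_s + think_time_s)

def users_for_throughput(target_rps, resp_time_s, think_time_s):
    """Rearranged law: N = X * (R + Z) emulated users are needed to offer X requests/s."""
    return target_rps * (resp_time_s + think_time_s)

# Illustrative numbers: with a 9.5 s think time and 0.5 s response time,
# 1000 emulated users offer only about 100 requests/s, so each agent
# spends most of its time sleeping and costs very little CPU.
print(throughput_for_users(1000, resp_time_s=0.5, think_time_s=9.5))
print(users_for_throughput(100, resp_time_s=0.5, think_time_s=9.5))
```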

As web pages got larger and incorporated more components, performance testing tools started including a recorder that could capture the HTTP requests as a user navigates the application in a regular browser. The driver agent would then “play back” the captured requests. Many tools also allow the requests to be modified (for unique logins, cookies, etc.) to correctly emulate multiple users. Such a “record-and-playback” methodology is part of most enterprise load testing tools.
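As an illustration of that modification step, here is a minimal Python sketch assuming a recorded script is just a list of (method, URL, body) templates. The step list, the placeholder names and the personalize helper are hypothetical and do not reflect any particular tool's recording format. Each emulated user gets its own values substituted before playback.

```python
from string import Template

# Hypothetical recorded session: (method, URL template, body template) for each step.
RECORDED_STEPS = [
    ("GET", "/login", ""),
    ("POST", "/login", "user=$user&password=$password"),
    ("GET", "/catalog?category=$category", ""),
]

def personalize(steps, params):
    """Substitute one user's values into every recorded request before playback."""
    return [
        (method, Template(url).substitute(params), Template(body).substitute(params))
        for method, url, body in steps
    ]

# One parameter set per emulated user, so each one logs in with a unique account.
user_params = [
    {"user": f"user{i}", "password": f"secret{i}", "category": "books"} for i in range(3)
]
for params in user_params:
    print(personalize(RECORDED_STEPS, params)[1])
```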

Today’s web applications are complex and sophisticated, with tons of JavaScript that tracks and handles all sorts of mouse movements, multiple XHR requests on the same page, persistent connections using the Comet model, etc. If JavaScript generates dynamic requests by composing the URLs on the fly, the recorded scripts will fail. Of course, if the performance testing tool provides a rich interface that gives the tester full flexibility to modify the load driver, it is still possible to create a driver that can drive these rich Web 2.0 applications.

Browser Agents

Increasingly, many in the performance community are abandoning the old-style HTTP client drivers in favor of browser agents, i.e., agents/drivers that run an actual full-featured browser. The obvious advantage of going this route is the dramatic simplification of test scripts – you can give it a single URL and the browser will automatically fetch all of the components on the page. If the page contains JavaScript that in turn generates more requests, no problem. The browser will handle it all.

But at what cost?

If you’re thinking that this sounds too easy and there must be a catch, you’re right. There is a price to pay for this ease of use in both CPU and memory resources on the test driver systems. A real browser can consume tens to hundreds of megabytes of memory and significant CPU resources as well. And this is just for driving a single user! Realistically, how many browsers can you run on a typical machine, especially considering that driver boxes are typically older, slower hardware?
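To put rough numbers on that question, here is a back-of-the-envelope Python sketch. All of the figures (an 8 GB, 4-core driver box with 1 GB left for the OS, 200 MB and 0.1 CPU per browser, 2 MB and 0.001 CPU per emulated HTTP client) are assumptions for illustration, not measurements; plug in your own.

```python
def max_agents(host_mem_mb, reserved_mem_mb, mem_per_agent_mb, host_millicores, millicores_per_agent):
    """Agents that fit on one driver box: the tighter of the memory and CPU limits."""
    by_memory = (host_mem_mb - reserved_mem_mb) // mem_per_agent_mb
    by_cpu = host_millicores // millicores_per_agent  # 1000 millicores = 1 CPU
    return min(by_memory, by_cpu)

# Assumed figures only: a 4-core (4000 millicore), 8 GB driver box.
real_browser = max_agents(8192, 1024, mem_per_agent_mb=200, host_millicores=4000, millicores_per_agent=100)
http_client = max_agents(8192, 1024, mem_per_agent_mb=2, host_millicores=4000, millicores_per_agent=1)
print(real_browser, http_client)  # browsers are memory-bound far sooner than lightweight clients
```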

So what can we do to mitigate this problem?

Emulated Browsers with Real Javascript Engine

A compromise solution is to use a thin browser that lacks much of the functionality of a real browser but does include a real JavaScript engine. An example is HtmlUnit, a Java library that is lighter-weight than a real browser like IE or Firefox. The caveat here is that your performance testing tool must be able to call arbitrary third-party libraries. Many tools have very simplistic scripting capabilities that may not allow using HtmlUnit.


Many people seem to think that just because their application uses JavaScript or XHR requests, they need a real browser for scalability testing. This is untrue – in all but the most complex cases, you can still use an emulated client (the exception is when requests are generated by JavaScript based on complex logic that is not easy to reproduce). Keep in mind that the purpose of load/scalability testing is to measure throughput. To do so, you want the lightest possible client, so you can run the maximum number of emulated users with the minimum amount of hardware. Using a real browser should be the last option to consider.
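For example, suppose (hypothetically) a page embeds item IDs and a token in its HTML and its JavaScript combines them into an XHR URL. An emulated client can reproduce that logic directly instead of running the JavaScript. The markup, the /api/items endpoint and the csrfToken variable below are invented for illustration; this is a sketch of the idea, not a general solution.

```python
import re
from urllib.parse import urlencode

def build_xhr_url(page_html, endpoint="/api/items"):
    """Re-create in the driver the URL that the (hypothetical) page JavaScript would compose."""
    item_ids = re.findall(r'data-item-id="(\d+)"', page_html)     # values the script would read from the DOM
    token = re.search(r'var csrfToken = "([^"]+)";', page_html)  # value the script would read from a variable
    if token is None:
        raise ValueError("token not found in page")
    return endpoint + "?" + urlencode({"ids": ",".join(item_ids), "token": token.group(1)})

# Invented page fragment standing in for the HTML fetched by the previous request.
page = ('<div data-item-id="7"></div><div data-item-id="42"></div>'
        '<script>var csrfToken = "abc123";</script>')
print(build_xhr_url(page))
```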

I finally found some time to re-design the Faban site. For those of you who haven’t heard of it, Faban is a performance test automation tool and framework.

The new site is obviously not the best in terms of design, aesthetics, consistency, etc. but at least it didn’t take a whole lot of time.

The one thing I have found hard to learn is CSS. That’s probably one of the reasons I put off doing the site for so long. But guess what? I still don’t know much CSS and yet was able to put together this site fairly quickly. Many thanks go to … I started with the CleanContent template and tweaked it a bit to incorporate the banner and colors. Beyond that, it was just a matter of editing the text. I used SeaMonkey for that, as I have found it to be pretty safe (unlike Microsoft Word) in not adding a whole lot of junk to the HTML. My goal was to keep the HTML really simple – check it out.

The documentation pages now look completely different from the rest of the site – I did not dare touch them. There are just too many pages and I don’t want to edit them all. If someone has a suggestion on how to improve the site without too much mucking around (or better yet, can provide the code!), I’d be happy to make improvements.


After 18 years at Sun (the last 6 at Oracle), I finally called it quits. Last week, I began anew at Yahoo! I will still be focused on performance, but this time on end-user performance.

At Sun, we were extremely focused on server-side performance. Our primary metric was throughput. We worried about scalability. Does Solaris scale to the maximum number of CPUs/threads that our largest system had? What about the JVM? And the app server, and the web server … you get the idea.

In the web world, things are quite different. The primary metric is response time. Hardly anyone cares what the throughput on a particular server is – tens of thousands of servers are being deployed anyway. This mindset and architecture fascinate me. How do these large internet sites handle performance? So I decided to find out. What better way than to be part of one of the sites that sees the most traffic on the internet (see the ComScore report)?

I am part of the Exceptional Performance Team at Yahoo! This is the team that first brought YSlow to the community and is responsible for a whole host of tools to measure and analyze performance. I hope to contribute to this effort as well, and of course I will continue to blog about interesting performance issues that I encounter. Please do let me know if there are particular topics you would like to see on the Exceptional Performance blog.



Velocity 2010 came to an end today. I attended all 3 days – it was a great conference. I did not attend last year, but the crowds this year must have been at least 3 times those of 2008, when I first presented at Velocity. Here are some of my thoughts on the conference.


Being a performance person, I am naturally biased towards performance topics, so I’ll cover those first. All of the performance sessions at the conference can be summed up thus:

The biggest performance issue is the time it takes for the browser to load a web page (aka page load times). Here is technique x and trick y and hack z to help you fix this problem.

I learned a lot about how to optimize CSS, JavaScript, HTTP headers, etc. But I was still disappointed that there was hardly a whisper about how to optimize the server side. The claim is that of the total response time, the server takes tens or at most hundreds of milliseconds, whereas the client takes several seconds. So where do you want to focus your energy? I can accept that. But it seems to presuppose that all web applications have a scalable architecture and have solved their server-side performance and scalability issues. I find that a little hard to believe.


As expected, the details of how Facebook, Yahoo and Twitter run their operations were of great interest to the audience. With so much media now being served, I was surprised to see only one session on optimizing video serving, and even that was not well attended. There was hardly any talk about optimizing infrastructure. I can’t help wondering why web operations folks wouldn’t be interested in optimizing their infrastructure costs. After all, we’ve been hearing a lot lately about the cost of power, how data centers are going green and becoming more efficient, etc. Aren’t these things applicable to the web world as well (not just enterprise IT)? Even more surprising, only a very small portion of the audience said they were deployed on the cloud.

Neil Gunther and I presented a session on Hidden Scalability Gotchas in Memcached and Friends.

We had a great audience, with some folks squatting on the floor in the front and a standing-room-only crowd in the back. There was tremendous interest in applying the Universal Scalability Law (USL) model to accurate measurement data to quantify scalability. If anyone has additional feedback or comments, I would love to hear them.
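For readers unfamiliar with the model: the USL expresses relative capacity as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α captures contention (serialization) and β captures coherency delay. Below is a minimal Python sketch of estimating α and β from load-test measurements with a least-squares fit of a linearized form of the model, plus the load N* = sqrt((1 − α)/β) at which the model predicts throughput will peak. The function names and the noise-free synthetic data are illustrative assumptions, not the data or code from our talk; real measurements are noisy, so check the fit before trusting the coefficients.

```python
import math

def usl_capacity(n, alpha, beta):
    """Universal Scalability Law: C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def fit_usl(loads, throughputs):
    """Estimate (alpha, beta) from measured (N, X(N)) pairs; the data must include N = 1.

    With C(N) = X(N)/X(1), x = N - 1 and y = N/C(N) - 1, the USL becomes
    y = beta*x**2 + (alpha + beta)*x, a quadratic through the origin.
    """
    x1 = dict(zip(loads, throughputs))[1]  # single-user throughput X(1)
    s2 = s3 = s4 = sxy = sx2y = 0.0
    for n, x_n in zip(loads, throughputs):
        x = n - 1
        y = n / (x_n / x1) - 1
        s2 += x ** 2
        s3 += x ** 3
        s4 += x ** 4
        sxy += x * y
        sx2y += x ** 2 * y
    # Normal equations for y = a*x**2 + b*x (no intercept), solved with Cramer's rule.
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return b - a, a  # alpha = b - a, beta = a

def usl_peak_load(alpha, beta):
    """Load N* at which USL throughput is maximal."""
    return math.sqrt((1 - alpha) / beta)

loads = [1, 2, 4, 8, 16, 32]
synthetic = [100 * usl_capacity(n, 0.05, 0.001) for n in loads]  # noise-free, made-up data
alpha, beta = fit_usl(loads, synthetic)
print(round(alpha, 4), round(beta, 4), round(usl_peak_load(alpha, beta), 1))
```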


I was blown away by the plethora of tools, a good many of which I had never heard of. Firebug with various add-ons (YSlow, PageSpeed) set the trend in browser-side monitoring, and now even commercial vendors offer versions of their products (available for free!) to do the same. This is great news for developers. If you haven’t heard of HttpWatch, …, webpagetest, check them out. DynaTrace announced a free end-user response time monitoring tool as well.


One really cool product I came across was Strangeloop – an appliance that sits in front of your web server and optimizes the response page. It’s amazing that it can do so much JavaScript optimization, resulting in a dramatic reduction in latency. I can’t help wondering why browsers don’t do this. Surely Mozilla and Google have enough smart engineers to come up with an optimized JavaScript interpreter. It will be interesting to watch.

The usual monitoring vendors were all there – Keynote, Gomez (now part of Compuware), Webmetrics, AppDynamics etc.


Tuesday was billed as “Workshop” day. However, there really weren’t any workshops – they were all just regular sessions, only longer. I guess it’s hard to do workshops with several hundred people in the room. If Velocity really wants to do workshops, it needs to schedule at least a dozen of them, and they need to be longer.

On the whole, the conference was a great success, with sold-out crowds, well-attended and well-delivered sessions and lots of new products. I hope I can make it to Velocity 2011.

Many web applications are now moving to the cloud, where configurations are difficult to understand; see, for example, the Amazon Web Services (AWS) definition of an EC2 Compute Unit. How does one determine how many instances of what type are required to run an application? Typical capacity planning exercises start with measurements. For example, one might test the target app by deploying it on, say, an EC2 m1.small instance (1 EC2 Compute Unit) and see how many users it can support. Based on performance metrics gathered during the test, one can estimate how many instances will be required (assuming, of course, that the application scales horizontally).
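The estimate itself is simple arithmetic. A minimal Python sketch, assuming linear horizontal scaling (as stated above) and a headroom margin so that no instance runs at saturation; the function name, the 25% default headroom and the numbers in the example are arbitrary assumptions.

```python
import math

def instances_needed(target_users, users_per_instance, headroom=0.25):
    """Instances required if each one is loaded to (1 - headroom) of its measured capacity.

    Assumes the application scales horizontally and linearly across instances.
    """
    usable_users = users_per_instance * (1 - headroom)
    return math.ceil(target_users / usable_users)

# Suppose (hypothetically) one instance was measured to support 20 users.
print(instances_needed(1000, users_per_instance=20))              # with 25% headroom
print(instances_needed(1000, users_per_instance=20, headroom=0))  # no headroom at all
```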

The Tests

To test this simplistic model, I fired up an EC2 m1.small instance running Ubuntu and started the Apache web server. I used another instance as the load driver to repeatedly fetch a single hello-world PHP page from the web server, and scaled the number of users from 1 to 50 in increments of 10. The test was written and driven by Faban, a versatile open-source performance testing tool.

The Measurements

On Unix systems, tools like vmstat and mpstat can be used to measure CPU utilization (among other things). Faban automatically runs these tools for the same duration as the test run, allowing one to monitor resource utilization during the test. (Note that on Ubuntu, mpstat is not installed by default but is available as part of the sysstat package.)

The Results

Here is the throughput graph as the number of virtual users was scaled.

[Graph: PHP throughput vs. number of users]

The throughput peaks at 20 users and then flattens out (actually falls a little). Looking at this graph, one would expect the CPU to have saturated at around 20 users (assuming no other bottleneck, which is a reasonable assumption for this extremely simple application).

Here is a snippet of the vmstat output captured during the run at 50 users:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
 2  0      0 661096  18596 961772    0    0    6  104 1108  826  3  2 87  0
 1  0      0 658116  18616 964028    0    0    0  237 2788 1966 26  9  9  0
 3  0      0 655856  18632 966296    0    0    0  244 2699 1959 24 12  6  0
 2  0      0 653376  18652 968700    0    0    0  236 2943 2069 24 11  9  0
 1  0      0 651020  18668 970972    0    0    0  240 2842 1963 25 10  6  0
 2  0      0 648680  18688 973224    0    0    0  241 2763 1954 24 11  9  0

Ignoring the first line (vmstat’s first report gives averages since boot), the user time (the us column) averages 24.6% and the system time (the sy column) averages 10.6%, for a total of 35.2%. But the idle time (the id column) is only around 8% – no wonder the throughput stopped increasing. So what explains the discrepancy? If user and system time add up to only 35%, where is the remaining time going?
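The averages above can be reproduced with a few lines of Python. This sketch parses the vmstat snippet, skips the since-boot first sample, and reports how much CPU time is not accounted for by the us, sy, id and wa columns. The function name is hypothetical; newer vmstat versions may also print an st (steal) column, which the sketch includes only when it is present.

```python
# The vmstat snippet from the run at 50 users.
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
 2  0      0 661096  18596 961772    0    0    6  104 1108  826  3  2 87  0
 1  0      0 658116  18616 964028    0    0    0  237 2788 1966 26  9  9  0
 3  0      0 655856  18632 966296    0    0    0  244 2699 1959 24 12  6  0
 2  0      0 653376  18652 968700    0    0    0  236 2943 2069 24 11  9  0
 1  0      0 651020  18668 970972    0    0    0  240 2842 1963 25 10  6  0
 2  0      0 648680  18688 973224    0    0    0  241 2763 1954 24 11  9  0
"""

def cpu_averages(vmstat_text):
    """Average vmstat's CPU columns, skipping the first (since-boot) sample."""
    lines = vmstat_text.strip().splitlines()
    header = lines[1].split()  # column names: r b swpd ... us sy id wa
    samples = [dict(zip(header, map(int, line.split()))) for line in lines[3:]]
    cpu_cols = [c for c in ("us", "sy", "id", "wa", "st") if c in header]
    avg = {c: sum(s[c] for s in samples) / len(samples) for c in cpu_cols}
    avg["unaccounted"] = 100 - sum(avg[c] for c in cpu_cols)  # time no printed column explains
    return avg

print(cpu_averages(VMSTAT_SAMPLE))
```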

To understand that, take a look at the mpstat output snippet below:

12:57:43 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
12:57:53 AM  all   25.73    0.00    8.71    0.00    0.00    0.30   55.96    9.31   2763.26
12:58:03 AM  all   24.21    0.00   11.31    0.00    0.00    0.40   58.93    5.16   2661.41
12:58:13 AM  all   24.09    0.00   10.07    0.00    0.00    0.89   55.58    9.38   2904.84
12:58:23 AM  all   25.15    0.00    9.34    0.00    0.00    0.89   59.05    5.57   2824.95
12:58:33 AM  all   23.78    0.00    9.99    0.00    0.00    1.20   56.14    8.89   2760.54
12:58:43 AM  all   22.26    0.00   11.58    0.00    0.00    0.40   54.79   10.98   2835.83

We can see that the %user, %sys and %idle values match those shown by vmstat. But there is an additional utilization column, %steal, which ranges from about 55% to 59%. If you add this value to %user, %sys and %idle (plus the small %soft value), you get 100%. So that’s where the missing time has gone: to %steal.
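A quick Python check of the mpstat rows above confirms that the columns, %steal included, add up to 100% within rounding, and computes the average steal over the interval.

```python
# %user %nice %sys %iowait %irq %soft %steal %idle, copied from the mpstat snippet.
MPSTAT_ROWS = [
    (25.73, 0.00, 8.71, 0.00, 0.00, 0.30, 55.96, 9.31),
    (24.21, 0.00, 11.31, 0.00, 0.00, 0.40, 58.93, 5.16),
    (24.09, 0.00, 10.07, 0.00, 0.00, 0.89, 55.58, 9.38),
    (25.15, 0.00, 9.34, 0.00, 0.00, 0.89, 59.05, 5.57),
    (23.78, 0.00, 9.99, 0.00, 0.00, 1.20, 56.14, 8.89),
    (22.26, 0.00, 11.58, 0.00, 0.00, 0.40, 54.79, 10.98),
]
STEAL = 6  # index of the %steal column

totals = [round(sum(row), 2) for row in MPSTAT_ROWS]  # each interval should sum to ~100%
avg_steal = sum(row[STEAL] for row in MPSTAT_ROWS) / len(MPSTAT_ROWS)
print(totals)
print(round(avg_steal, 1))
```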

Who is stealing my CPU?

But what exactly is %steal? It is the time during which your virtual CPU had something ready to run but the physical CPU was busy servicing some other instance. Clearly, this is not the only instance running on this CPU. The m1.small instance is defined as providing “1 EC2 Compute Unit”, not 1 CPU.

In this case, the instance got about 35% of the single physical CPU on this system (an Intel Xeon E5430 @ 2.66GHz).

When looking at CPU utilization on EC2 (or any virtualized environment based on Xen), keep this in mind. Always consider %steal in addition to user and system time.
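On Linux you can also compute steal yourself from the cumulative tick counters in /proc/stat, where steal is the eighth value on the aggregate cpu line. A minimal Python sketch, assuming a Linux guest; the helper names are hypothetical, and only the parsing and percentage logic is shown.

```python
# Order of the cumulative tick counters on the aggregate "cpu" line of /proc/stat.
# Later fields (guest, guest_nice) are already counted in user/nice, so they are skipped.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))

def read_cpu_snapshot(path="/proc/stat"):
    """Read the aggregate counters from /proc/stat (Linux only; its first line is 'cpu ...')."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two snapshots."""
    deltas = {f: after[f] - before[f] for f in FIELDS}
    total = sum(deltas.values())
    return {f: 100.0 * d / total for f, d in deltas.items()}
```

Take two snapshots a few seconds apart with read_cpu_snapshot() and pass them to cpu_percentages(); the steal entry is the counterpart of mpstat’s %steal.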

I attended the Bay Area Startup Weekend in Mountain View this past week-end. This was the first such event I have attended and it was an amazing experience, so I thought I’d share it.

The idea behind the event was that a bunch of folks would show up, some of them would pitch ideas for new startups and the others would join them if they liked the idea and/or had the necessary skills to build it. The goal was to build a working prototype over the course of the week-end.

This seemed like an impossible task to me – not the part about building a prototype, but the idea that random people could come together and actually form a startup. When I talked to one of the organizers, he confirmed that the goal was really to form a community and help people get to know each other – sometimes a team does gel and a successful startup is formed.

Nevertheless, there were probably 100 people at this event. And contrary to what I expected, a good chunk of them weren’t even developers. Some had ideas and were hoping that others would build a prototype for them. Others came to hang out and learn, and still others came simply to make connections.

In the spirit of things, I joined a team called “medilist”, a service aimed at helping caregivers monitor their loved ones. Steve Echman proposed the idea since a potential client wanted a system like this for the UK. He is a business/marketing type, but I found him to be quite knowledgeable. We also had Prashant Sachdev, an entrepreneur from India, on our team. Prashant ended up developing the front end, while I did the back end. We were going to use Twilio for getting input from the patient, but neither Prashant nor I knew anything about it. We hashed out the use case we would prototype on Friday night, and I decided to tackle Twilio first thing on Saturday. Prashant worked into the wee hours figuring out how to hook Flex to MySQL.

Luckily for me, on Saturday Kevin Morrill, who knew Twilio, joined our team remotely from SFO. That was great, as I could then focus on just the database and PHP application logic.

But as usual, things weren’t that simple. We lost half of Saturday to issues with the hosting site, versions of PHP, MySQL, etc. On the plus side, Nutan Panwar, a user interface designer, joined us later in the day. Steve and Nutan collaborated on the page designs, and by 10 or 10:30 PM we managed to get the dashboard working. It was pulling data from the database and displaying the relevant fields. At this point, I called it quits for the day, but Prashant and Nutan worked into the wee hours: Nutan recorded all the voice questions for Twilio and Prashant was once again fighting with Flex.

Sunday morning, things started moving. Kevin came down in person and quickly got the Twilio app hooked up to the database. Prashant and I finished up the details page and by 7:00 PM we had our demo ready to roll. You can see what we put together for the caregiver app here. I can’t post the link to the phone app, as it inserts into the database, but what it does is call the specified phone number, ask a series of questions, gather the responses and store them in the database.

There were about a dozen teams presenting, and most of what they put together was incredible. From a new Twitter search (1st prize) to a jazzy-looking demo for choosing what to wear, the apps ranged from useful and thoughtful to just plain fun. The evening wrapped up around 10:00 PM.

On the whole, it was one of the most intense week-ends I have ever experienced. I enjoyed every minute of it and may even consider doing it again. I learned about several products and companies, got some business tips, got to know my very nice team and hacked some code. Even if you are never going to do a startup, I highly recommend participating in an event like this – it really gets your adrenaline flowing and brings some passion to whatever you are doing. And who knows, perhaps you will do that startup!
