Performance Blog

Olio on github

Posted by: perfwork on: July 27, 2011

Olio was developed by Sun Microsystems as a way to compare, measure and analyze the performance of various web2.0 technology stacks. We had a great collaboration with the RADLab in UC Berkeley and gave this project to Apache. However, with the take-over by Oracle, Sun was no longer willing to support the project. Many users continued to find and use Olio but no one (including big-name companies like VMWare who used it for their own benchmark) was willing to contribute to it. I’ve always felt that open source works only when there is big corporation support, but I digress.

Anyway, I’ve asked for the Apache Olio project to be wound down. For those who may still be interested in using it, I have now copied over the repository to github - feel free to fork it. I have also moved some of the documentation to the wiki.

For anyone considering moving a svn repository to git, git-svn was mostly painless. It preserves the history of the edits which is really great.

Velocity 2011

Posted by: perfwork on: June 20, 2011

Yet another year has gone by marked by yet another Velocity conference last week. This year the crowds were even bigger – if this conference continues to grow in this manner, it will soon have to be held in Moscone!

I gave myself the week-end to sleep over this instead of rushing to publish ASAP so that I could gather my notes and reflect on the conference.

The high order bits

Mobile

For me, the best day was the workshop day on Tuesday, specifically the mobile workshops in the afternoon.  I did not attend Maximiliano’s session last year so I am very glad I did this year. I learned a ton and hope to put it to use as I increase my focus on the mobile web. It was clear from this as well as the earlier session by Ariya that the Velocity audience has not yet started to grapple with optimizing the mobile experience.  Lots of very useful, meaty information in both these sessions, so check them out.

Statistics

It was refreshing to see the emphasis on statistics with the two sessions by John Rauser of Amazon. John is obviously a very seasoned speaker and his workshop was very well received.  It would be great to see a workshop next year that takes this a step further into a practical workshop on  how to apply statistics in analyzing performance data, including a discussion of confidence intervals.

I would be amiss if I did not also mention the Ignite session on Little’s Law. It was a great way to present this topic for those who have never heard of it, so do check it out.

Dynamic Optimization

It seems the list of companies and products entering this market is growing day by day. These products optimize your site using a variety of technologies. Last year, Strangeloop led the pack but this year there were many more. I was particularly impressed by Cotendo. The company seems to have made a rapid rise in a very short time with advanced functionality that only very large sites have. Ditto for CloudFlare – I liked the CEO’s Ignite talk as well. If you are in the market for these type of products, I definitely recommend checking them out.

The low order bits

Sponsored keynotes

The myriad sponsored talks. It is one thing to have a sponsored session track (in fact many sessions in this track were well worth attending), but another to have them be part of the Keynote sessions. Considering that keynotes took up half the day on both days, I found a big chunk of them were worthless.

The language

This conference also gets a low score for language. From when did it become okay to use foul language in conferences and especially in keynotes that were being steamed live? It seemed to start with Artur Bergman and many speakers after that seemed to think it was okay to drop the f-word every few minutes.

The number of women

If you looked around the room, there were very few women – I would estimate the female audience to be well less than 10%. I counted exactly 3 women speakers. At the Velocity summit earlier this year, the claim was that they wanted to increase the participation of women and minorities; I can’t help wonder what steps were taken to do that. With the new standards for foul language, good luck in pulling more women in.

Measuring Page Performance

Posted by: perfwork on: March 24, 2011

Web pages today are becoming increasingly complex and it is now well recognized that simply measuring page load time does not represent the  response time of a page.

But what exactly do we mean by response time? Terms such as time to interactivity, perceived performance, above the fold performance etc. have come into vogue. Let’s examine these in turn.

Time To Interactivity

In last year’s Velocity Conference, Nicholas Zakas of Yahoo! gave an excellent presentation on how the Yahoo! Front Page responsiveness was improved in which he focused on time to interactivity. In other words, when can a user actually interact with a page (say click on a link) is more important than ensuring that the entire page gets rendered. With pages increasingly containing animated multi-media content which the user may not care about, this definition may make sense. However,  imagine a page whose primary purpose is to serve up links to other pages (e.g. news) and loads lots of images after onload. All the links appear first which means the user can interact with the site, yet the page has lots of white space where the images go – can we truly just measure the time to interactivity and claim this to be the response time of a page?

Perceived Performance

Another popular term, perceived performance is defined loosely as the time that the user perceives it takes for the page. This means that all the major elements of the page that the user can easily see must  be rendered. By definition, this measurement is highly subjective. Perceptions differ – for e.g. one user may not miss the Chat pane in gmail while for another this is a very important feature. Further, again by definition, this metric is application dependent. Nevertheless, for a particular web application, it is possible for developers and performance engineers to agree on a definition of what is perceived performance and work towards measuring/improving it.

Above The Fold Performance

In the Velocity Online conference last week, Jake Brutlag of google proposed  “Above the Fold Time” (AFT) as a method of measuring a more meaningful response time. This was defined as the time taken to render all of the static elements in the visible portion of the browser window. Jake and others have put in some serious thought and defined the algorithm to distinguish between the static and dynamic (e.g. ads) content on a page.

Clearly, one part of the AFT proposal is valid -  one doesn’t really care about the content on the page that is not visible initially and which the user can only get to by scrolling. But measuring everything above the fold has its issues as well. Take for instance the new Yahoo! Mail Beta. Y! Mail now has extensive social features to enable one to connect to Facebook, Twitter, Messenger and endless third-party applications. It is arguable whether the user will expect all of these third-party links to be on the page before he “perceives” that his request is complete.  The page still looks finished without those links.

In my opinion, we need to distinguish between the essential parts of the page vs the optional ones (A caveat here – although no one would argue that an ad is essential, it is essential for the page to look complete).  Looking at just static vs dynamic pixels misses this point. The difficulty of course is that there is no uniform way to define what is essential – it once again becomes application/page specific.

But for now, that’s the way I am going – defining “perceived performance” on a case by case basis.

Announcing faban.org

Posted by: perfwork on: February 15, 2011

As many of you know, Faban is a free and open source benchmark development and automation framework. It was originally developed at Sun Microsystems, Inc. which made it available to the community under the CDDL license.

With the architect and lead developer Akara Sucharitakul and myself no longer working at Oracle (not to mention the demise of the project website without notice), we decided to host it at http://www.faban.org. The website isn’t pretty but at least it hosts all the documentation,  a downloadable kit and a pointer to the source on github.  In the coming weeks, I will work on organizing the site. A big thanks to all the folks who expressed concern about the future of this project. With your help, we can continue to support it.

If you are a faban user, please do join the new Faban Users forum at http://groups.google.com/group/faban-users.

 

Tags:

A lesson in Analysis

Posted by: perfwork on: February 11, 2011

I thought I’d continue the theme of my last post “A lesson in Validation” with a lesson in analysis. This one is mostly focused on networking – one of my primary new focus areas but anyone can benefit from the process and lessons learned.

Recently, I was analyzing the results of  some tests run against Yahoo’s Malaysia Front Page. The response times were incredibly high and on digging down, it soon became apparent why. The time taken to retrieve the static objects was more than 100 times what it should have been.

CDN Magic

Like most websites, Yahoo! uses geographically distributed CDNs to serve static content. Malaysia gets served out of Singapore which is pretty darned close so there should be no reason for extra long hops. A large response time to retrieve a static object can mean one of two things: the object is not being cached or it is getting routed to the wrong location.

Sure enough, nslookup showed that the ip address being used to access all the static objects was located in the USA. It was therefore no surprise that it was taking so long. Satisfied that I had found a routing problem, I contacted the DNS team and they said … you guessed it. “It’s not our problem”. They ran some of their own tests and stated that Malaysia was getting routed correctly to Singapore, therefore the problem must be with my tests.

Dig(1) to the rescue

Since I was using a third-party tool, I contacted the vendor to see if they could help. The support engineer promptly ran dig and found that all the requests were being resolved (but didn’t care that they weren’t being resolved correctly!)  Here is the output of dig:

<<>> DiG 9.3.4 <<>> l1.yimg.com @GPNKULDB01 A +notrace +recurse
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1442
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 2, ADDITIONAL: 2 

;; QUESTION SECTION:
;l1.yimg.com.	 IN	A 

;; ANSWER SECTION:
l1.yimg.com.	 3068	IN	CNAME	geoycs-l.gy1.b.yahoodns.net.
geoycs-l.gy1.b.yahoodns.net. 3577 IN	CNAME	fo-anyycs-l.ay1.b.yahoodns.net.
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.34
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.84
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.83
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.37
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.36
fo-anyycs-l.ay1.b.yahoodns.net.	210 IN	A	98.137.88.35 

;; AUTHORITY SECTION:
ay1.b.yahoodns.net.	159379	IN	NS	yf1.yahoo.com.
ay1.b.yahoodns.net.	159379	IN	NS	yf2.yahoo.com. 

;; ADDITIONAL SECTION:
yf2.yahoo.com.	 1053	IN	A	68.180.130.15
yf1.yahoo.com.	 1053	IN	A	68.142.254.15 

;; Query time: 0 msec
;; SERVER: 172.16.37.138#53(172.16.37.138)
;; WHEN: Wed Jan 12 11:15:42 2011
;; MSG SIZE rcvd: 270

We now had the ip address of the DNS Resolver – 172.16.37.138. Where was this located? nslookup showed:

** server can't find 138.37.16.172.in-addr.arpa.: NXDOMAIN

No luck – this was a private IP address. Back I went to Tech Support:  “Can you please let me know what the public IP address is where this private IP is NATed to?”

And here’s what I got:  “I confirmed with our NOC team that the public IP Address of the DNS server located at “Kuala Lumpur, Malaysia – VNSL” site is a.b.c.d. Hope this is helpful for you.”

(I replaced the actual ip address above for security).  I promptly did another nslookup which showed the host name with KaulaLumpur in it. So it seemed that there was no problem on the testing side after all.  But not so fast … hostnames can be bogus!

Geo-Locating a Server

We had to find out where exactly this ip address was located in the world. So the ip address was plugged into http://whatismyipaddress.com/ip-lookup and it came back with:

Geolocation Information

Country: Canada ca flag
State/Region: Quebec
City: Montreal
Latitude: 45.5
Longitude: -73.5833
Area Code:
Postal Code: h3c6w2

The test server was actually located in Canada, while appearing to be from KaulaLampur!  No wonder our DNS servers were routing the requests to the US.

What seemed to be a problem with our routing, turned out to be a problem of the tool’s routing !

Moral of the story: Don’t trust any tool and validate, validate, validate!

A lesson in validation

Posted by: perfwork on: January 28, 2011

Those who have worked with me know how much  I stress the importance of validation: validate your tools, workloads and measurements. Recently, an incident brought home yet again the importance of this tenet and I hope my narrating this story will prove useful to you as well.

For the purposes of keeping things confidential, I will use fictional names. We were tasked with running some performance measurements on a new version of a product called X. The product team had made considerable investment into speeding things up and now wanted to see what the fruits of their labor had produced. The initial measurements seemed good but since performance is always relative, they wanted to see comparisons against another version Y. So similar tests were spun up on Y and sure enough X was faster than Y. The matter would have rested there, if it weren’t for the fact that the news quickly spread and we were soon asked for more details.

Firebug to the rescue

At this point, we took a pause and wondered: Are we absolutely sure X is faster than Y? I decided to do some manual validation. That night, connecting via my local ISP from home, I used firefox to do the same operations that were being performed in the automated performance tests. I launched firebug and started looking at waterfalls in the Net panel.

As you can probably guess, what I saw was surprising. The first request returned a page that caused a ton of object retrievals. The onload time reported by firebug was only a few seconds, yet there was no page complete time!

The page seemed complete and I could interact with it. But the fact that firebug could not determine Page Complete was a little disconcerting. I repeated the exercise using HttpWatch just to be certain and it reported exactly the same thing.

Time to dig down deeper. Taking a look at the individual object requests, one in particular was using the Comet model and it never completed. On waiting a little longer, there were other requests being sent by the browser periodically. Neither of these request types however had any visual impact on the page. Since requests were continuing to be made, firebug obviously thought that the page was not complete.

Page Complete or Not?

This begged the question: how did the automated tests run and how did they determine when the page was done? There was a timeout set for each request, but if the request was terminating because of the timeout, we surely would have noticed since the response times reported would have been the timeout value. In fact, the response time being reported was less than half the timeout value.

So we started digging into the waterfalls of some of the automated test results. Lo and behold – a significant component of the response time was the HTTP Push (also known as HTTP Streaming) one.  There were also several of the sporadic requests that were being made well after the page was complete. This resulted in arbitrary response times for Y being reported.

It turned out that the automated tool was actually quite sophisticated. It doesn’t just use a standard timeout for the entire request. Instead it monitors the network and if no activity is detected for 3 seconds, it considers the request complete. So it captured some of the streaming and other post-PageComplete requests and returned when the pause between them was more than 3 seconds. That is why we thought we were seeing “valid” response times which looked reasonable and had us fooled!

Of course, this leads to the big discussion of when exactly do we consider  a http request as complete? I don’t want to get into that now as my primary purpose of this article is to point out the importance of validation in performance testing. If we had taken the time to validate the results of the initial test runs, this problem would have been detected a long time ago and lots of cycles could have been saved (not to mention the embarrassment of admitting to others that the initial results reported were wrong !)

Tools for Web Performance Analysis

Posted by: perfwork on: January 21, 2011

At Yahoo!, I’m currently focused on analysis of end user performance. This is a little different than what I’ve worked on in the past, which was mainly server-side performance and scalability. The new focus requires a new list of tools so I thought I’d use this post to share the tools I’ve been learning and using in the past couple of months.

HttpWatch

This made the top of my list and I use it almost every day. Although it has features very similar to firebug, it has two features that I find very useful – the ability to save the waterfall data directly to a csv file and a stand-alone HttpWatch Studio tool that easily loads previously saved data and reconstructs the waterfall ( I know you can export Net data from firebug, but only in HAR format). And best of all, HttpWatch works with both IE and Firefox. The downside is that it works only on Windows and it’s not free.

Firebug

This is everyone’s favorite tool and I love it too. Great for debugging as well as performance analysis. It is a little buggy though – I get frustrated when after starting Firebug, I go to a URL expecting it to capture my requests, only to find that it disappears on me. I end up keeping firebug on all the time and this can get annoying.

HttpAnalyzer

This is also a Windows-only commercial tool – it is similar to Wireshark. It’s primary focus is http however, so it is easier to use than Wireshark. Since it sits at the OS level, it captures all traffic, irrespective of which browser or application is making the http request. As such, it’s a great tool for analyzing non-browser based http client applications.

Gomez

Yet another commercial tool, but considering our global presence and the dozens of websites that need to  be tested from different locations, we need a robust, commercial tool that can meet our synthetic testing and monitoring requirements. Gomez has pretty good coverage across the world.

I have a love-hate relationship with Gomez. I love the fact that I can do the testing I want at both backbone and  last mile, but I hate it’s user interface and limited data visualization. We have to resort to extracting the data using web services and do the analysis and visualization ourselves. I really can’t complain too much – since I didn’t have to build these tools myself !

Command-line tools

Last, but not the least, I rely heavily on standard Unix command-line tools like nslookup, DIG, curl, ifconfig, netstat, etc. And my favorite text processing tools remain sed and awk. Every time I say that, people shake their heads or roll their eyes. But we can agree to dis-agree without getting into language wars I think.

Tags:

Starting anew

Posted by: perfwork on: October 22, 2010

After 18 years at Sun (the last 6 at Oracle), I finally called it quits. Last week, I began anew at Yahoo!  I will still be focused on performance, but this time on end user performance.

At Sun, we were extremely focused on server-side performance. Our primary metric was throughput. We worried about scalability. Does Solaris scale on the maximum number of CPUs/threads that our largest system had ? What about the JVM ? And the appserver, and the webserver … you get the idea.

In the web world, things are quite different. The primary metric is response time. One could care less what the throughput on a particular server is – tens of thousands of servers are being deployed anyway. This mindset and architecture fascinate me. How do these large internet sites handle performance ? So I decided to find out. What better way, then to be part of one of the sites that sees the most traffic on the internet (see ComScore report).

I am part of the Exceptional Performance Team at Yahoo! This is the team that first brought YSlow to the community and is responsible for a whole host of tools to measure and analyze performance. I hope to contribute to this effort as well and of course, I will continue to blog about interesting performance issues that I encounter. Please do let me know if there are particular topics you would like to see on the Exceptional Performance blog.

 

Tags: ,

Velocity 2010 Impressions

Posted by: perfwork on: June 24, 2010

Velocity 2010 came to an end today. I attended all 3 days – it was a great conference. I did not attend last year, but the crowds this year must have been at least 3 times that of 2008, when I first presented at Velocity. Here are some of my thoughts on the conference.

Performance

Being a performance person, I am naturally biased towards performance topics. So, I’ll cover this first. All of the performance sessions at the conference can be summed up thus :

The biggest performance issue is the time it takes for the browser to load a web page (aka page load times). Here is technique x and trick y and hack z to help you fix this problem.

I learned a lot about how to optimize css, javascript, http headers etc. But I was still disappointed that there was hardly a whisper about how to optimize the server side. The claim is that of the total response time, the server takes tens or at most 100′s of milliseconds where as the client takes several seconds.  So where do you want to focus your energy on ? I can accept that. But that seems to pre-suppose that all web applications have a scalable architecture and have solved their server-side performance and scalability issues. I find that a little hard to believe.

Operations

As expected, the details of how Facebook, Yahoo and twitter run their operations was of great interest to the audience. With so much media now being served, I was surprised to see only one session on optimizing Video serving and even that was not well attended. There was hardly any talk about optimizing infrastructure. I can’t help wondering why web operations wouldn’t be interested in optimizing their infrastructure costs. After all, we’ve been hearing a lot lately about the cost of power, how data centers are going green, more efficient etc. Aren’t these things applicable to the web world as well (not just enterprise IT) ? Even more surprising, a very small portion of the audience said they were deployed on the cloud.

Neil Gunther and I presented a session  on Hidden Scalability Gotchas in Memcached and Friends.

We had a great audience with some folks squatting on the floor in the front and a standing-room only audience in the back. There was tremendous interest in applying the USL Model to accurate data to quantify scalability. If anyone has additional feedback or comments, I would love to hear them.

Tools

I was blown away by the plethora of tools, a good many of which I had never heard of. Firebug with various add-ons (YSlow, PageSpeed) set the trend on browser-side monitoring and now even commercial vendors have versions of their product (available for free !) to do the same. This is great news for developers. If you haven’t heard of HttpWatch, showslow.com, webpagetest, check them out.DynaTrace announced a free end user response time monitoring tool as well.

Products

One real cool product I came across was Strangeloop – this is an appliance that sits in front of your web server and optimizes the response page. It’s amazing that it can do so much javascript optimization resulting in dramatic reduction in latency. I can’t help wondering why browsers don’t do this ? Surely, Mozilla and Google have enough smart engineers to come up with a an optimized javascript interpreter. It will be interesting to watch.

The usual monitoring vendors were all there – Keynote, Gomez (now part of Compuware), Webmetrics, AppDynamics etc.

Misc.

Tuesday was billed as “Workshop” day. However, there really weren’t any workshops – they were all just regular sessions just longer. I guess it’s hard to do workshops with several hundred people in the room. If Velocity really wants to do workshops, they need to have at least a dozen of them scheduled and they need to be longer.

On the whole, the conference was a great success, with sold out crowds, well attended and delivered sessions and lots of new products. Hope I can make it to Velocity 2011.

Quick Throughput Experiment

Posted by: perfwork on: June 21, 2010

One of the first things we performance engineers do with a new server application is to conduct a quick throughput experiment. The goal is to find the maximum throughput that the server can deliver. In many cases, it is important that the server be capable of delivering this throughput with a certain response time bound. Thus, we always qualify the throughput with an average and 90th percentile response time (i.e. we want 90% of the requests to execute within the stated time). Any decent workload should therefore measure both the throughput and response time.

Let us assume we have such a workload. How best to estimate the maximum throughput within the required response time bounds ? The easiest way to conduct such an experiment is to run a bunch of clients (emulated users, virtual users or vusers) to drive load against the target server without any think time. Here is how the flow from a vuser will look like :

Create Request ==> Send Request ==> Receive Response ==> Log statistics

This sequence of operations is executed repeatedly (without any pauses in between i.e. no think times) for a sufficient length of time to get statistically valid results. So, to find the maximum throughput, run a series of tests, each time increasing the number of clients. Simple, isn’t it ?

A little while ago, I realized that if one doesn’t have the proper training, this isn’t that simple. I came across such an experiment with the following results :

VUsers Throughput

Requests/sec

5000 88318
10000 88407
20000 88309
25000 88429
30000 88392
35000 88440

What is wrong with these results ?
Firstly, the throughput is roughly the same at all loads. This probably means that the system saturated even below the base load level of 5,000 Vusers. Recall, that the workload does not have a think time. When you have this many users repeatedly submitting requests, the server is certain to be overloaded. I must mention that the server in this case is a single system with 12 cores having hyper-threading enabled. A multi-threaded server application typically will use one or more threads to receive requests from the network, then hand the request to a worker thread for processing. Considering the context-switching, waiting, locking etc. one can assume that at most one can run 4x the number of cores or in this case about 96 server threads. Since each Vuser submits a request and waits for a response, it probably requires 2-2.5x the number of Vusers as the number of server threads to saturate a system. Using this rule of thumb, one would need to run a maximum of 200-250 Vusers.

Throughput with Correct Vusers

After I explained the above, the tests were re-run with the following results:

VUsers Throughput
1 1833
10 18226
50 74684
100 86069
200 88455
300 88375

Notice that the maximum throughput is still nearly the same as from the previous set, but it has been achieved with a much lower number of Vusers (aka clients). So does it really matter ? Doesn’t it look better to say that the server could handle 35000 connections rather than 300 ? No, it doesn’t. The reason becomes obvious if we take a look at the response times.

The Impact of Response Times

The graph below shows how the 90% Response Time varied for both sets of experiments :

90% Response Time (Minimal Vusers)

The response times for the first experiment with very large number of Vusers ranges in the hundreds of millisecs. When the number of Vusers was pared down to just reach saturation, the server responded hundred times faster ! Intuitively too, this makes sense. If the server is inundated with requests, they are just going to queue up. The longer a request waits for processing, the larger is it’s response time.

Summary

When doing throughput performance experiments, it is important to take into consideration the type of server application, the hardware characteristics etc. and run appropriate load levels. Otherwise, although you may be able to find out what the maximum throughput is, you will have no idea what the response time is.

Pages

Latest Tweets

Categories

Archives

Follow

Get every new post delivered to your Inbox.