I finally found some time to re-design the Faban site. For those of you who haven’t heard of it, Faban is a performance test automation tool and framework.
The new site is obviously not the best in terms of design, aesthetics, consistency, etc. but at least it didn’t take a whole lot of time.
The one thing that I have found hard to learn is CSS. That’s probably one of the reasons I put off redoing the site for so long. But guess what? I still don’t know much CSS and yet was able to put this site together fairly quickly. Many thanks go to http://freehtml5templates.com. I started with the CleanContent template and tweaked it a bit to incorporate the banner and colors. Beyond that, it was just a matter of editing the text. I used SeaMonkey for that, as I have found it to be pretty safe (unlike Microsoft Word) in not adding a whole lot of junk to the HTML. My goal was to keep the HTML really simple – check it out.
The documentation style is now completely different from the rest of the site – I did not dare touch it. There are just too many pages and I don’t want to go edit them all. If someone has a suggestion on how to improve the site without too much mucking around (or better yet, provides the code!) I’d be happy to make the improvements.
When load testing an application, the first set of tests should focus on measuring the maximum throughput. This is especially true of multi-user, interactive applications like web applications. The maximum throughput is best measured by running a few emulated users with zero think time. This means that each emulated user sends a request, receives a response and immediately loops back to send the next request. Although this is artificial, it is the best way to quickly determine the maximum performance of the server infrastructure.
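That measurement loop is easy to prototype. Here is a minimal sketch of a zero-think-time load generator in Python – the URL, user count, and duration are placeholders, and a real harness like Faban adds ramp-up, timeouts, and error accounting that this sketch omits:

```python
import threading
import time
import urllib.request

def measure_peak_throughput(url, users=20, duration=10.0):
    """Hammer `url` with `users` threads and zero think time; return req/sec.
    Each emulated user sends a request, reads the response, and immediately
    loops back to send the next one."""
    counts = [0] * users
    deadline = time.time() + duration

    def worker(i):
        while time.time() < deadline:
            with urllib.request.urlopen(url) as resp:
                resp.read()          # consume the full response
            counts[i] += 1           # then immediately issue the next request

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration
```

Run it against an idle test deployment, not production: with zero think time, even a handful of threads generates the maximum load the server can absorb.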
Once you have that throughput (say X), you can use Little’s Law to estimate the number of real simultaneous users the application can support. In simple terms, Little’s Law states that:
N = X / λ
where N is the number of concurrent users, X is the throughput, and λ is the average arrival rate per user. Note that the arrival rate is the inverse of the inter-arrival time, i.e., the time between successive requests from a single user.
To understand this better, let’s take a concrete example from some tests I ran on a basic PHP script deployed on an Apache server. The maximum throughput obtained was 2011.763 requests/sec, with an average response time of 6.737 ms and an average think time of 0.003 seconds, when running 20 users. The arrival rate is the inverse of the inter-arrival time, which is the sum of the response time and the think time. In this case, X is 2011.763 and λ is 1/(0.006737 + 0.003). Therefore,
N = X / λ = 2011.763 * 0.009737 = 19.5885
This is pretty close to the actual number of emulated users which is 20.
Estimating Concurrent Users
This is all well and good, but how does it help us estimate the number of real concurrent users (with non-zero think time) that the system can support? Using the same example as above, let us assume that if this were a real application, the average inter-arrival time would be 5 seconds. Using Little’s Law, we can now compute N as:
N = X / λ = 2011.763 * 5 ≈ 10,059 users.
In other words, this application running on this same infrastructure can support more than 10,000 concurrent users with an inter-arrival time of 5 seconds.
What does this say about think times? If we assume that the application (and infrastructure) continues to perform in the same manner as the number of connected users increases (i.e., it maintains the average response time of 0.006737 seconds), then the average think time is 4.993 seconds. If the response time degrades as load goes up (which is usually the case after a certain point), then the number of users supported will correspondingly decrease.
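Both calculations above are one multiplication, so they are easy to reproduce with a tiny helper – the function below is just the formula, nothing more:

```python
def littles_law_users(throughput, response_time, think_time):
    """N = X * (R + Z): the concurrent-user population a system sustains at
    throughput X (req/sec) when each user's inter-arrival time is the
    response time R plus the think time Z, in seconds."""
    return throughput * (response_time + think_time)

# Zero-think-time test: 2011.763 req/s, 6.737 ms response, 3 ms think time
print(littles_law_users(2011.763, 0.006737, 0.003))      # ~19.59, close to the 20 emulated users

# Real workload: 5-second inter-arrival time (think time of ~4.993 s)
print(littles_law_users(2011.763, 0.006737, 4.993263))   # ~10059 users
```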
A well-designed application can scale linearly to support tens or hundreds of thousands of users. In the case of large websites like Facebook, eBay and Flickr, the applications scale to handle millions of users. But obviously, these companies have invested tremendously to ensure that their applications and software infrastructure can scale.
Little’s Law can be used to estimate the maximum number of concurrent users that your application can support. As such, it is a handy tool to get a quick, rough idea. For example, if Little’s Law indicates that the application can only support 10,000 users but your target is really 20,000 users, you know you have work to do to improve basic performance.
Olio was developed by Sun Microsystems as a way to compare, measure and analyze the performance of various Web 2.0 technology stacks. We had a great collaboration with the RAD Lab at UC Berkeley and gave the project to Apache. However, after the takeover by Oracle, Sun was no longer willing to support the project. Many users continued to find and use Olio, but no one (including big-name companies like VMware, which used it for their own benchmark) was willing to contribute to it. I’ve always felt that open source works only when there is big-corporation support, but I digress.
Anyway, I’ve asked for the Apache Olio project to be wound down. For those who may still be interested in using it, I have copied the repository over to GitHub – feel free to fork it. I have also moved some of the documentation to the wiki.
For anyone considering moving an svn repository to git, git-svn was mostly painless. It preserves the full history of edits, which is really great.
Yet another year has gone by marked by yet another Velocity conference last week. This year the crowds were even bigger – if this conference continues to grow in this manner, it will soon have to be held in Moscone!
I gave myself the weekend to sleep on it instead of rushing to publish ASAP, so that I could gather my notes and reflect on the conference.
The high order bits
For me, the best day was the workshop day on Tuesday, specifically the mobile workshops in the afternoon. I did not attend Maximiliano’s session last year so I am very glad I did this year. I learned a ton and hope to put it to use as I increase my focus on the mobile web. It was clear from this as well as the earlier session by Ariya that the Velocity audience has not yet started to grapple with optimizing the mobile experience. Lots of very useful, meaty information in both these sessions, so check them out.
It was refreshing to see the emphasis on statistics with the two sessions by John Rauser of Amazon. John is obviously a very seasoned speaker and his workshop was very well received. It would be great to see this taken a step further next year with a practical workshop on how to apply statistics to analyzing performance data, including a discussion of confidence intervals.
I would be remiss if I did not also mention the Ignite session on Little’s Law. It was a great way to present the topic to those who have never heard of it, so do check it out.
It seems the list of companies and products entering this market is growing day by day. These products optimize your site using a variety of technologies. Last year, Strangeloop led the pack, but this year there were many more. I was particularly impressed by Cotendo. The company seems to have made a rapid rise in a very short time, with advanced functionality that only very large sites have. Ditto for CloudFlare – I liked the CEO’s Ignite talk as well. If you are in the market for these types of products, I definitely recommend checking them out.
The low order bits
The myriad sponsored talks. It is one thing to have a sponsored session track (in fact, many sessions in this track were well worth attending), but quite another to make them part of the keynote sessions. Considering that keynotes took up half the day on both days, I found a big chunk of them worthless.
This conference also gets a low score for language. When did it become okay to use foul language at conferences, especially in keynotes that were being streamed live? It seemed to start with Artur Bergman, and many speakers after that seemed to think it was okay to drop the f-word every few minutes.
The number of women
If you looked around the room, there were very few women – I would estimate the female audience at well under 10%. I counted exactly 3 women speakers. At the Velocity summit earlier this year, the claim was that they wanted to increase the participation of women and minorities; I can’t help wondering what steps were taken to do that. With the new standard for foul language, good luck pulling more women in.
Web pages today are becoming increasingly complex and it is now well recognized that simply measuring page load time does not represent the response time of a page.
But what exactly do we mean by response time? Terms such as time to interactivity, perceived performance, and above-the-fold performance have come into vogue. Let’s examine these in turn.
Time To Interactivity
At last year’s Velocity conference, Nicholas Zakas of Yahoo! gave an excellent presentation on how the responsiveness of the Yahoo! front page was improved, in which he focused on time to interactivity. In other words, the point at which a user can actually interact with a page (say, click on a link) matters more than when the entire page finishes rendering. With pages increasingly containing animated multimedia content that the user may not care about, this definition makes sense. However, imagine a page whose primary purpose is to serve up links to other pages (e.g. news) and which loads lots of images after onload. All the links appear first, which means the user can interact with the site, yet the page has lots of white space where the images go – can we truly just measure the time to interactivity and claim it to be the response time of the page?
Perceived Performance
Another popular term, perceived performance, is defined loosely as the time the user perceives it takes for the page to load. This means that all the major elements of the page that the user can easily see must be rendered. By definition, this measurement is highly subjective. Perceptions differ – e.g., one user may not miss the Chat pane in Gmail while for another it is a very important feature. Further, again by definition, this metric is application dependent. Nevertheless, for a particular web application, it is possible for developers and performance engineers to agree on a definition of perceived performance and work towards measuring and improving it.
Above The Fold Performance
At the Velocity Online conference last week, Jake Brutlag of Google proposed “Above the Fold Time” (AFT) as a more meaningful way of measuring response time. It is defined as the time taken to render all of the static elements in the visible portion of the browser window. Jake and others have put serious thought into defining an algorithm to distinguish between the static and dynamic (e.g. ads) content on a page.
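To make the idea concrete, here is a toy sketch of that kind of computation, with the page load reduced to a list of timestamped snapshots of the visible viewport. The “a pixel that changes more than once is dynamic” rule is my own simplification for illustration, not the published algorithm:

```python
def above_the_fold_time(frames):
    """frames: list of (timestamp, grid) pairs ordered by time, where grid is
    a 2D list of pixel values for the visible (above-the-fold) viewport.
    Returns the last time any *static* pixel changed."""
    times = [t for t, _ in frames]
    grids = [g for _, g in frames]
    height, width = len(grids[0]), len(grids[0][0])
    aft = 0.0
    for y in range(height):
        for x in range(width):
            # timestamps at which this pixel differs from the previous frame
            changes = [times[i] for i in range(1, len(grids))
                       if grids[i][y][x] != grids[i - 1][y][x]]
            # a pixel that keeps changing is treated as dynamic content
            # (ads, animation) and excluded; static pixels settle once
            if 0 < len(changes) <= 1:
                aft = max(aft, changes[-1])
    return aft
```

With a 2x2 viewport where one pixel paints once at t=1 and another flickers at t=2 and t=3, the flickering pixel is classified as dynamic and ignored, so the AFT is 1.0.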
Clearly, one part of the AFT proposal is valid – one doesn’t really care about the content on the page that is not visible initially and which the user can only get to by scrolling. But measuring everything above the fold has its issues as well. Take for instance the new Yahoo! Mail Beta. Y! Mail now has extensive social features to enable one to connect to Facebook, Twitter, Messenger and endless third-party applications. It is arguable whether the user will expect all of these third-party links to be on the page before he “perceives” that his request is complete. The page still looks finished without those links.
In my opinion, we need to distinguish between the essential parts of the page and the optional ones (a caveat here – although no one would argue that an ad is essential content, it is essential for the page to look complete). Looking at just static vs. dynamic pixels misses this point. The difficulty, of course, is that there is no uniform way to define what is essential – it once again becomes application/page specific.
But for now, that’s the way I am going – defining “perceived performance” on a case by case basis.
As many of you know, Faban is a free and open source benchmark development and automation framework. It was originally developed at Sun Microsystems, Inc., which made it available to the community under the CDDL license.
With the architect and lead developer Akara Sucharitakul and myself no longer working at Oracle (not to mention the demise of the project website without notice), we decided to host it at http://www.faban.org. The website isn’t pretty, but at least it hosts all the documentation, a downloadable kit and a pointer to the source on GitHub. In the coming weeks, I will work on organizing the site. A big thanks to all the folks who expressed concern about the future of this project. With your help, we can continue to support it.
If you are a Faban user, please do join the new Faban Users forum at http://groups.google.com/group/faban-users.
I thought I’d continue the theme of my last post, “A Lesson in Validation”, with a lesson in analysis. This one is mostly focused on networking – one of my primary new focus areas – but anyone can benefit from the process and the lessons learned.
Recently, I was analyzing the results of some tests run against Yahoo’s Malaysia Front Page. The response times were incredibly high and on digging down, it soon became apparent why. The time taken to retrieve the static objects was more than 100 times what it should have been.
Like most websites, Yahoo! uses geographically distributed CDNs to serve static content. Malaysia gets served out of Singapore which is pretty darned close so there should be no reason for extra long hops. A large response time to retrieve a static object can mean one of two things: the object is not being cached or it is getting routed to the wrong location.
Sure enough, nslookup showed that the IP address being used to access all the static objects was located in the USA. It was therefore no surprise that retrieval was taking so long. Satisfied that I had found a routing problem, I contacted the DNS team, and they said … you guessed it: “It’s not our problem.” They ran some of their own tests and stated that Malaysia was getting routed correctly to Singapore, therefore the problem must be with my tests.
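For what it’s worth, this first check is trivial to script. A minimal sketch using Python’s stdlib resolver (the hostname is the one from this investigation; which address comes back depends entirely on the resolver you are behind):

```python
import socket

def resolve(hostname):
    """Return the IPv4 address the local resolver hands back for hostname --
    essentially what nslookup reports in its Address field."""
    return socket.gethostbyname(hostname)

# Example from the investigation (result depends on your resolver/location):
# resolve("l1.yimg.com")
```

Feeding the returned address into a geo-IP lookup then tells you roughly where the CDN thinks you are.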
Dig(1) to the rescue
Since I was using a third-party tool, I contacted the vendor to see if they could help. The support engineer promptly ran dig and found that all the requests were being resolved (but didn’t notice that they weren’t being resolved correctly!). Here is the output of dig:
; <<>> DiG 9.3.4 <<>> l1.yimg.com @GPNKULDB01 A +notrace +recurse
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1442
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;l1.yimg.com.                    IN      A

;; ANSWER SECTION:
l1.yimg.com.                    3068    IN      CNAME   geoycs-l.gy1.b.yahoodns.net.
geoycs-l.gy1.b.yahoodns.net.    3577    IN      CNAME   fo-anyycs-l.ay1.b.yahoodns.net.
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       184.108.40.206
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       220.127.116.11
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       18.104.22.168
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       22.214.171.124
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       126.96.36.199
fo-anyycs-l.ay1.b.yahoodns.net. 210     IN      A       188.8.131.52

;; AUTHORITY SECTION:
ay1.b.yahoodns.net.             159379  IN      NS      yf1.yahoo.com.
ay1.b.yahoodns.net.             159379  IN      NS      yf2.yahoo.com.

;; ADDITIONAL SECTION:
yf2.yahoo.com.                  1053    IN      A       184.108.40.206
yf1.yahoo.com.                  1053    IN      A       220.127.116.11

;; Query time: 0 msec
;; SERVER: 172.16.37.138#53(172.16.37.138)
;; WHEN: Wed Jan 12 11:15:42 2011
;; MSG SIZE  rcvd: 270
We now had the IP address of the DNS resolver – 172.16.37.138. Where was it located? nslookup showed:
** server can't find 18.104.22.168.in-addr.arpa.: NXDOMAIN
No luck – this was a private IP address. Back I went to tech support: “Can you please let me know the public IP address that this private IP is NATed to?”
And here’s what I got: “I confirmed with our NOC team that the public IP Address of the DNS server located at “Kuala Lumpur, Malaysia – VNSL” site is a.b.c.d. Hope this is helpful for you.”
(I have replaced the actual IP address above for security.) I promptly did another nslookup, which showed a host name with Kuala Lumpur in it. So it seemed that there was no problem on the testing side after all. But not so fast … hostnames can be bogus!
Geo-Locating a Server
We had to find out where exactly this IP address was located in the world. So we plugged it into http://whatismyipaddress.com/ip-lookup, and it came back with a location in Canada. The test server was actually located in Canada while appearing to be in Kuala Lumpur! No wonder our DNS servers were routing the requests to the US.
What seemed to be a problem with our routing turned out to be a problem with the tool’s routing!
Moral of the story: Don’t trust any tool and validate, validate, validate!