A lesson in validation
Posted January 28, 2011on:
Those who have worked with me know how much I stress the importance of validation: validate your tools, workloads and measurements. Recently, an incident brought home yet again the importance of this tenet and I hope my narrating this story will prove useful to you as well.
For the purposes of keeping things confidential, I will use fictional names. We were tasked with running some performance measurements on a new version of a product called X. The product team had made considerable investment into speeding things up and now wanted to see what the fruits of their labor had produced. The initial measurements seemed good but since performance is always relative, they wanted to see comparisons against another version Y. So similar tests were spun up on Y and sure enough X was faster than Y. The matter would have rested there, if it weren’t for the fact that the news quickly spread and we were soon asked for more details.
Firebug to the rescue
At this point, we took a pause and wondered: Are we absolutely sure X is faster than Y? I decided to do some manual validation. That night, connecting via my local ISP from home, I used firefox to do the same operations that were being performed in the automated performance tests. I launched firebug and started looking at waterfalls in the Net panel.
As you can probably guess, what I saw was surprising. The first request returned a page that caused a ton of object retrievals. The onload time reported by firebug was only a few seconds, yet there was no page complete time!
The page seemed complete and I could interact with it. But the fact that firebug could not determine Page Complete was a little disconcerting. I repeated the exercise using HttpWatch just to be certain and it reported exactly the same thing.
Time to dig down deeper. Taking a look at the individual object requests, one in particular was using the Comet model and it never completed. On waiting a little longer, there were other requests being sent by the browser periodically. Neither of these request types however had any visual impact on the page. Since requests were continuing to be made, firebug obviously thought that the page was not complete.
Page Complete or Not?
This begged the question: how did the automated tests run and how did they determine when the page was done? There was a timeout set for each request, but if the request was terminating because of the timeout, we surely would have noticed since the response times reported would have been the timeout value. In fact, the response time being reported was less than half the timeout value.
So we started digging into the waterfalls of some of the automated test results. Lo and behold – a significant component of the response time was the HTTP Push (also known as HTTP Streaming) one. There were also several of the sporadic requests that were being made well after the page was complete. This resulted in arbitrary response times for Y being reported.
It turned out that the automated tool was actually quite sophisticated. It doesn’t just use a standard timeout for the entire request. Instead it monitors the network and if no activity is detected for 3 seconds, it considers the request complete. So it captured some of the streaming and other post-PageComplete requests and returned when the pause between them was more than 3 seconds. That is why we thought we were seeing “valid” response times which looked reasonable and had us fooled!
Of course, this leads to the big discussion of when exactly do we consider a http request as complete? I don’t want to get into that now as my primary purpose of this article is to point out the importance of validation in performance testing. If we had taken the time to validate the results of the initial test runs, this problem would have been detected a long time ago and lots of cycles could have been saved (not to mention the embarrassment of admitting to others that the initial results reported were wrong !)