Performance Blog

Measuring CPU Utilization on EC2

Posted on: March 20, 2010

Many web applications are now moving to the cloud, where configurations are difficult to understand; see, for example, the Amazon Web Services (AWS) definition of an EC2 Compute Unit. How does one determine how many instances of what type are required to run an application? Capacity planning exercises typically start with measurements. For example, one might test the target application by deploying it on an EC2 m1.small instance (1 EC2 Compute Unit) and seeing how many users it can support. Based on the performance metrics gathered during the test, one can estimate how many instances will be required (assuming, of course, that the application scales horizontally).
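As a rough illustration of that estimate, the sketch below divides a target user count by the number of users a single instance sustained in a test. The function name, the headroom parameter and the example numbers are hypothetical, and the arithmetic only holds if the application really does scale horizontally.

```python
import math


def instances_needed(target_users, users_per_instance, headroom=0.0):
    """Estimate the instance count for a horizontally scaling application.

    headroom is the fraction of each instance's measured capacity to keep
    in reserve (a hypothetical safety margin, not something from the test).
    """
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_per_instance = users_per_instance * (1.0 - headroom)
    # Round up: a fractional instance still means one more instance.
    return math.ceil(target_users / usable_per_instance)


# Hypothetical numbers: one instance sustains 20 users, the target is 1000.
print(instances_needed(1000, 20))
print(instances_needed(1000, 20, headroom=0.25))
```

With no headroom this is simply 1000 / 20 = 50 instances; reserving a quarter of each instance's capacity raises the estimate to 67.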

The Tests

To test this simplistic model, I fired up an EC2 m1.small instance running Ubuntu and started the Apache web server. I used another instance as the load driver to repeatedly fetch a single helloworld PHP page from the web server, and scaled up the number of users from 1 to 50 in increments of 10. The test was written and driven by Faban, a versatile open-source performance testing tool.

The Measurements

On Unix systems, tools like vmstat and mpstat can be used to measure CPU utilization (among other things). Faban automatically runs these tools for the duration of each test run, allowing one to monitor resource utilization during the test. (Note that on Ubuntu, mpstat is not installed by default but is available as part of the sysstat package.)
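On Linux, these tools derive their CPU figures from the cumulative counters on the aggregate "cpu" line of /proc/stat. The sketch below shows one way to turn two samples of that line into percentages. It assumes the Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample counter values are made up for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer fields; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return each field's share (in percent) of the time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only; not called below)."""
    with open(path) as stat_file:
        return stat_file.readline()


# Made-up counter values standing in for two samples taken some seconds apart.
first = parse_cpu_line("cpu  100 0 50 800 0 0 0 50 0 0")
second = parse_cpu_line("cpu  125 0 60 809 0 0 0 106 0 0")
for name, pct in cpu_percentages(first, second).items():
    print(f"{name:8s} {pct:6.2f}")
```

On a live system, one would call read_cpu_line() twice with a pause in between and pass the two parsed samples to cpu_percentages().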

The Results

Here is the throughput graph as the number of virtual users was scaled.

[Figure: PHP Throughput]

The throughput peaks at 20 users and then flattens out (actually, it falls a little). Looking at this graph, one would expect that the CPU saturated at around 20 users (assuming no other bottleneck, which is a reasonable assumption for this extremely simple application).

Here is a snippet of the vmstat output captured during the run at 50 users:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 661096  18596 961772    0    0     6   104 1108  826  3  2 87  0
 1  0      0 658116  18616 964028    0    0     0   237 2788 1966 26  9  9  0
 3  0      0 655856  18632 966296    0    0     0   244 2699 1959 24 12  6  0
 2  0      0 653376  18652 968700    0    0     0   236 2943 2069 24 11  9  0
 1  0      0 651020  18668 970972    0    0     0   240 2842 1963 25 10  6  0
 2  0      0 648680  18688 973224    0    0     0   241 2763 1954 24 11  9  0

Ignoring the first row (vmstat's first report is an average since boot), the user time (the us column) averages 24.6% and the system time (the sy column) averages 10.6%, for a total of 35.2%. But the idle time (the id column) is only around 8%, so it is no wonder the throughput stopped increasing. Why the discrepancy? If the user and system time total only 35%, where is the remaining time going?
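The averages can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the rows are copied from the vmstat snippet above, the first row is skipped because vmstat's first report covers the time since boot rather than the sampling interval, and the helper name is my own.

```python
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# Rows copied from the vmstat snippet above.
VMSTAT_ROWS = """\
2 0 0 661096 18596 961772 0 0 6 104 1108 826 3 2 87 0
1 0 0 658116 18616 964028 0 0 0 237 2788 1966 26 9 9 0
3 0 0 655856 18632 966296 0 0 0 244 2699 1959 24 12 6 0
2 0 0 653376 18652 968700 0 0 0 236 2943 2069 24 11 9 0
1 0 0 651020 18668 970972 0 0 0 240 2842 1963 25 10 6 0
2 0 0 648680 18688 973224 0 0 0 241 2763 1954 24 11 9 0
"""


def average_cpu(rows_text, skip_first=True):
    """Average the us/sy/id/wa columns, optionally skipping the first report."""
    rows = [
        dict(zip(COLUMNS, (int(v) for v in line.split())))
        for line in rows_text.splitlines()
        if line.strip()
    ]
    if skip_first:
        rows = rows[1:]
    return {col: sum(r[col] for r in rows) / len(rows)
            for col in ("us", "sy", "id", "wa")}


avg = average_cpu(VMSTAT_ROWS)
# Whatever is left after user, system, idle and I/O wait is unexplained here.
unaccounted = 100.0 - sum(avg.values())
print(f"us={avg['us']:.1f} sy={avg['sy']:.1f} id={avg['id']:.1f} wa={avg['wa']:.1f}")
print(f"unaccounted={unaccounted:.1f}")
```

The four columns vmstat shows add up to only about 43%, leaving roughly 57% of the time unaccounted for.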

To understand that, take a look at the mpstat output snippet below:

12:57:43 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
12:57:53 AM  all   25.73    0.00    8.71    0.00    0.00    0.30   55.96    9.31   2763.26
12:58:03 AM  all   24.21    0.00   11.31    0.00    0.00    0.40   58.93    5.16   2661.41
12:58:13 AM  all   24.09    0.00   10.07    0.00    0.00    0.89   55.58    9.38   2904.84
12:58:23 AM  all   25.15    0.00    9.34    0.00    0.00    0.89   59.05    5.57   2824.95
12:58:33 AM  all   23.78    0.00    9.99    0.00    0.00    1.20   56.14    8.89   2760.54
12:58:43 AM  all   22.26    0.00   11.58    0.00    0.00    0.40   54.79   10.98   2835.83

We can see that the %user, %sys and %idle column values match those shown by vmstat. But there is an additional utilization column, %steal, which ranges from about 55% to 59%. Adding this value to the user, sys and idle values (plus the small %soft) gives 100%. So that's where the missing time has gone: to %steal.
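A quick way to check this is to add up the percentage columns in each mpstat row. The sketch below uses the values copied from the snippet above (the timestamp, CPU and intr/s columns are left out); the helper names are hypothetical.

```python
COLUMNS = ("user", "nice", "sys", "iowait", "irq", "soft", "steal", "idle")

# Percentage columns copied from the mpstat snippet above.
MPSTAT_ROWS = [
    (25.73, 0.00, 8.71, 0.00, 0.00, 0.30, 55.96, 9.31),
    (24.21, 0.00, 11.31, 0.00, 0.00, 0.40, 58.93, 5.16),
    (24.09, 0.00, 10.07, 0.00, 0.00, 0.89, 55.58, 9.38),
    (25.15, 0.00, 9.34, 0.00, 0.00, 0.89, 59.05, 5.57),
    (23.78, 0.00, 9.99, 0.00, 0.00, 1.20, 56.14, 8.89),
    (22.26, 0.00, 11.58, 0.00, 0.00, 0.40, 54.79, 10.98),
]


def row_totals(rows):
    """Sum the percentage columns of each row, rounded to two decimals."""
    return [round(sum(row), 2) for row in rows]


def column_average(rows, name):
    """Average one named column across all rows."""
    index = COLUMNS.index(name)
    return sum(row[index] for row in rows) / len(rows)


totals = row_totals(MPSTAT_ROWS)
steal_avg = column_average(MPSTAT_ROWS, "steal")
print("row totals:", totals)
print(f"average %steal: {steal_avg:.2f}")
```

Every row comes out within rounding error of 100, and the average %steal is a little under 57%, which lines up with the time that vmstat left unaccounted for.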

Who is stealing my CPU?

But what exactly is %steal? It is the time when your application had something ready to run but the CPU was busy servicing some other instance. Clearly, this is not the only application running on this CPU. The m1.small instance is defined as providing "1 EC2 Compute Unit", not 1 CPU.

In this case, the 1 instance was worth about 35% of the single CPU that was on this system (an Intel Xeon E5430 @2.66GHz).

When looking at CPU utilization on EC2 (or any virtualized environment based on Xen), keep this in mind. Always consider %steal in addition to the user and system time.


3 Responses to "Measuring CPU Utilization on EC2"

This is very helpful information.

Thank you.

Very useful article.
I want to create a similar test in order to capture CPU and RAM usage. I want to scale up the users and record the CPU and RAM usage at each level: for example, % CPU and % RAM usage for 100 users, for 200 users, etc. Your test seems to work like a charm; can you lend it to me?

Go download Faban from http://faban.org. It will automate the metric collection for your workload.
