Cloud computing is a significant shift in the way companies build and run IT resources. It promises pay-as-you-go economics and elastic capacity. Every major change in IT forces IT professionals to “rebalance” their application strategy: just look at client-server computing, the web, or mobile devices. Today, cloud computing is prompting a similar reconsideration of IT strategy.
But it’s still early days for clouds. Many enterprises are skeptical of on-demand computing, because it forces them to relinquish control over the underlying networks and architectures on which their applications run.
In late 2009, performance monitoring firm Webmetrics approached us to write a study on cloud performance. We decided to assess several cloud platforms, across several dimensions, using Webmetrics’ testing services to collect performance data. Over the course of several months, we created test agents for five leading cloud providers that would measure network, CPU, and I/O constraints. We also analyzed five companies’ sites running on each of the five clouds. As you might imagine, this resulted in a considerable amount of data, which we then processed and searched for informative patterns that would help us understand the performance and capacity of these platforms. This report is the result of that effort.
Testing across several platforms is by its very nature imprecise. Different clouds require different programming techniques, so no two test agents were alike. Some clouds use large-scale data storage that’s optimized for quick retrieval; others rely on traditional databases. As a result, the data in this report should serve only as a guideline for further testing: your mileage will vary greatly.
First and foremost: there’s a lot to watch. Clouds can fail in many unexpected ways; here are some of the lessons we’ve learned.
Watch your neighbours. We’ve seen good evidence that several cloud applications slow down at once, so you’ll definitely be affected by others using the same cloud as you.
Understand the profile of your cloud. Different clouds are good at different tasks. You’ll need to choose the size of your virtual machines—in terms of CPU, memory, and so on—in order to deliver good performance.
You need an agent on the inside. When you plan a monitoring strategy, you need custom code that exercises back-end functions so you can triage problems quickly.
Choose PaaS or IaaS. If you’re willing to re-code your application to take advantage of “big data” systems like Bigtable, you can scale well by choosing a PaaS cloud. On the other hand, if you need individual machines, you’ll have to build elasticity into your IaaS configuration.
Big data doesn’t come for free. Using large, shared data stores might seem attractive, but it takes time to put the data in there, and that delay may not suit your application’s usage patterns.
Monitor usage and governors. In PaaS, if you exceed your rate limits, your users will get errors.
Troubleshooting gets harder. You need data on the Internet as a whole, your cloud provider as a whole, and your individual application’s various tiers in order to properly triage a problem. When you were running on dedicated infrastructure, you didn’t have to spend as much time eliminating third-party issues (such as contention for shared bandwidth or I/O blocking).
PaaS means you’re in the same basket. We noticed that if you’re using a PaaS, when the cloud gets slow, everyone gets slow. With IaaS, there’s more separation of the CPU and the server’s responsiveness—but you’re still contending for shared storage and network bandwidth.
Watch several zones. When you rely on availability zones to distribute the risk of an outage, you’ll also need to deploy additional monitoring to compare those zones to one another.
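To make the zone comparison concrete, here is a minimal sketch in Python. It is not the tooling used in the study; the function name `slow_zones`, the zone labels, the sample response times, and the 1.5x threshold are all assumptions chosen for illustration. The idea is simply to compare each zone's median response time against the median of the other zones and flag the outliers.

```python
from statistics import median


def slow_zones(samples_by_zone, ratio=1.5):
    """Return zones whose median response time is more than `ratio`
    times the median of the other zones' medians.

    `samples_by_zone` maps a zone name to a list of response times
    in seconds. The names and the threshold are hypothetical.
    """
    # Median per zone; zones with no samples are skipped.
    medians = {
        zone: median(times)
        for zone, times in samples_by_zone.items()
        if times
    }
    flagged = []
    for zone, value in medians.items():
        # Baseline is built from every zone except the one under test.
        others = [m for z, m in medians.items() if z != zone]
        if others and value > ratio * median(others):
            flagged.append(zone)
    return sorted(flagged)


# Made-up sample data: three zones, one clearly slower than the rest.
samples = {
    "zone-a": [0.21, 0.19, 0.22, 0.20],
    "zone-b": [0.23, 0.20, 0.21, 0.24],
    "zone-c": [0.61, 0.58, 0.70, 0.65],
}
print(slow_zones(samples))
```

Note that a relative comparison like this flags nothing when every zone slows down together, which is the case where you also need data on the provider and the Internet as a whole.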