The document summarizes measurements and analysis of various cloud computing platforms. It finds that cloud uptime is generally good and improving. Location and proximity matter: response times increase with distance from data centers. Over time, some cloud providers such as Google have expanded their global coverage and reduced response times. Routing of requests can be location-dependent. Large cloud providers such as Amazon and Google have demonstrated the ability to scale infrastructure globally on demand.
1. Cloud Encounters Measuring the Computing Cloud Dr. Peter van Eijk Independent IT professional Session #616 CMG’09, see www.cmg.org
4. What clouds? Cloud model: fast deployment of virtual servers. Main cost unit: virtual server. Examples: Amazon.
5. What clouds? Cloud model: hosted applications. Main cost unit: user account / month. Examples: Google Sites, Google Apps, Salesforce.com, LinkedIn.
6. What clouds? Cloud model: (static) content distribution network. Main cost unit: delivery of a single image or movie. Examples: Akamai, Amazon CloudFront, Rackspace Cloud Files.
7. What clouds? Cloud model: distributed, scalable processing. Main cost unit: API call, chunk of code. Examples: Google App Engine, Windows Azure, (Facebook API).
9. Cloud uptime is pretty good, and getting better. Each bar represents > 7,000 test requests using world-wide monitoring stations by WatchMouse.
13. Connect time to a single hosted site increases with distance; connect time to the Google App Engine cloud does not. Clouds can be everywhere. [Chart: milliseconds to connect, App Engine '09 vs. a flat site hosted in .nl, per monitoring station (nl2, nl4, de, fr, dk, uk, fr2, us-ny, us-fl, us-sf, sg, hk, nz). Annotations: cloud has Asian presence; cloud has European presence; New Zealand is really far away from Europe.]
14. Clouds evolve over time. [Chart: milliseconds to connect to Google App Engine in Dec '08, Apr '09, and Sep '09, per monitoring station (Brazil, Chicago, Florida, New York, San Francisco, Denmark, France, Germany, Netherlands, UK, India, Hong Kong, Melbourne, Sydney). Annotations: Google's coverage has increased in '09; missed the real location of Denmark in '08; suboptimal routing in Australia; Europe has good Internet; Hong Kong had a routing issue in September; San Francisco is puzzling.]
20. Session #616 Thank you for your attention Any Questions? (see next slides) @petersgriddle www.petersgriddle.net
Editor's Notes
In recent years, a number of computing infrastructures have evolved that live "somewhere on the Internet". Because of their intangible nature, these infrastructures are collectively called computing clouds. This presentation reports on a number of measurements of their quality and location.
Cloud computing is a generic term, and signifies a number of distinct approaches to computing. Included in that spectrum are the following main approaches. Amazon's EC2 (Elastic Compute Cloud) is a platform for fast deployment of virtual servers. You basically get an operating system with an IP address. Beyond a single instance, customers need to take care of sizing issues themselves. A more traditional approach, in a way, is formed by specific hosted applications. The service is basically an account to log in to, and 'application dialtone'. The provider takes care of all scaling issues. Examples of these are Google Apps, Google Sites, and Salesforce.com. Arguably, most interactive websites, including social media websites, can be considered hosted applications. Also early in the game are content distribution networks. These store digital assets such as pictures and video at the edge of the network, so they can be delivered quickly and scalably to large numbers of clients. The emphasis is on data, not applications. If you want to call this application hosting, it is about static web content hosting. Sizing is typically a provider responsibility. More technically advanced are distributed processing platforms, of which Google App Engine is a prime example. This platform allows the customer to execute 'chunks of code'. The programmer has no control over the physical location where this code is executed. It could be in multiple locations, as we will see later. Interestingly, some social media applications also have some elements of this model, as they provide an API to 3rd-party developers.
Each vertical bar represents more than 7,000 probes as measured by more than 30 monitoring stations of WatchMouse. Each probe typically consists of an HTTP GET request for a specific data asset. Uptime is calculated by dividing the number of successful requests by the total number of requests. Requests are successful if they complete within a few seconds. Although these services are all accessed through a URL, their underlying software architecture can be very different. Uptimes are typically well above 99.5%, which is quite competitive in comparison to typical in-house application service level agreements.
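The uptime calculation described above can be sketched in a few lines. This is a minimal illustration, not WatchMouse's actual pipeline; the timeout value and probe data are made up, and the note above only says requests must "complete within a few seconds".

```python
# Sketch of the uptime metric: each probe is an HTTP GET that counts as
# successful only if it returned OK within a timeout. Probe data is invented.

TIMEOUT_SECONDS = 5.0  # assumed value for "within a few seconds"

def uptime(probe_results):
    """probe_results: list of (http_status, elapsed_seconds) tuples."""
    successes = sum(
        1 for status, elapsed in probe_results
        if status == 200 and elapsed <= TIMEOUT_SECONDS
    )
    return successes / len(probe_results)

# 7,000 probes: 10 returned errors, 10 timed out
probes = [(200, 0.3)] * 6980 + [(503, 0.1)] * 10 + [(200, 9.0)] * 10
print(f"uptime: {uptime(probes):.2%}")  # 99.71%
```

Note that a timed-out-but-successful response counts as a failure here, matching the note's definition of success as completion within the time limit.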
Cloud infrastructures do have physical locations. Even though Google is pretty secretive about a lot of details of its operations, there are known locations of a number of Google's datacenters. The map shown is derived from http://www.techcrunch.com/2008/04/11/where-are-all-the-google-data-centers/ The relevance of the location of datacenters is that the farther away one is, the longer information exchanges with it take. For example, the round-trip delay between Spain and New Zealand (opposites on the globe) is about 290 milliseconds. Proximity matters. Interactive applications are heavily dependent on quick information exchanges. Although 290 milliseconds is hardly noticeable to the human observer, many applications require dozens or even hundreds of round trips for meaningful user interactions. A hundred round trips of 100 milliseconds each add up to 10 seconds total delay, which is intolerable for most user interactions.
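The arithmetic in that last point is worth making explicit, because it is what turns a barely noticeable round-trip time into an intolerable user experience:

```python
# Per-interaction delay grows linearly with the number of round trips.
# The figures below are the examples from the note above.

def interaction_delay_s(round_trips, rtt_ms):
    """Total delay in seconds for an interaction needing N round trips."""
    return round_trips * rtt_ms / 1000.0

print(interaction_delay_s(100, 100))  # 10.0 s, as in the note
print(interaction_delay_s(100, 290))  # 29.0 s for an antipodal RTT
```

The practical conclusion matches the note: either the datacenter must be close, or the application must be designed to need far fewer round trips.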
Our Cloud Proximity Indicator is an aggregate measure, and is computed by averaging the distances to 35 WatchMouse monitoring stations, where distance is the minimum of observed ping delays or connect delays in milliseconds. Akamai clearly has an abundant presence around the world, and is for example the only provider with a presence in Australia and Israel. Google App Engine and the CNN CDN are improving the coverage of their networks, and their routing policies. Mosso on the other hand has lost some presence in Singapore. Google Sites seems to share a lot of infrastructure with Google App Engine. The fact that EC2, which is in a single location, has changed its proximity is probably attributable to generic backbone routing changes. Our results are obviously dependent on the location of the WatchMouse stations. However, these are located on all continents and are representative of major internet markets.
What we see here is that during August and September, Amazon Cloudfront has moved closer to the New York monitoring station. It is likely that this is the result of improved peering arrangements that have led to more direct routing. Nevertheless, routes continue to fluctuate.
Some clouds are in more locations than other clouds. In this diagram we see measurements of the distance to the Google App Engine cloud. The graph depicts the distance from the monitoring stations to a regular site, hosted in NL, and a site hosted on Google App Engine. Time to connect to the regular site increases with distance. We are measuring the speed of light here, sort of (around 100,000 km/sec; the speed of light is lower in fiber, and intermediate routers also add delay). The round-trip delay between Spain and New Zealand (opposites on the globe, 20,000 km apart) is around 290 milliseconds. The Google cloud is in more than one place. For example, it is close to the Netherlands, but it also has a presence in East Asia, near Hong Kong as well as Singapore. The regular site is closer to monitoring station NL2, whereas the cloud is closer to NL4.
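A back-of-the-envelope check of those numbers: light in fiber travels at roughly two thirds of c, about 200,000 km/s, which gives a hard lower bound on round-trip time; routing detours and router processing explain why the observed figure is higher (and why the note's effective "around 100,000 km/sec" is a reasonable rule of thumb). The constant below is an assumption, not a measured value.

```python
# Propagation-delay floor from fiber alone: RTT = 2 * distance / speed.
# 200,000 km/s (~2/3 c in fiber) is an assumed round-number constant.

FIBER_KM_PER_S = 200_000

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000

# Spain to New Zealand: roughly antipodal, ~20,000 km apart
print(min_rtt_ms(20_000))  # 200.0 ms floor; the observed ~290 ms includes routing overhead
```

The gap between the 200 ms floor and the observed 290 ms is consistent with real paths being longer than great-circle distance and with per-hop delays.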
Cloud infrastructures evolve over time. The graph depicts the proximity of Google App Engine as it progresses over a year. It definitely has made progress, although it keeps changing. Apparently new datacenters are brought online, but there are also routing changes. For example, in 2008 Google missed the real location of Denmark, and probably routed Danish traffic to North America rather than to one of the datacenters in Europe. Europe as a whole has pretty good Internet. The central function of the Amsterdam Internet Exchange is likely to contribute to that. Puzzling is the proximity of San Francisco (of all places). Hong Kong had deteriorated as a result of routing changes, but is back on track. Finally, there appears to be a new datacenter in Australia, but traffic from the Sydney station is not yet routed to it.
How does Google App Engine route traffic to its nearest data center? There are a number of tricks used. A hostname is mapped to a generic Google CNAME. The DNS lookup result of this name depends on the location of the resolver. It is also not stable over time. Even though a range of locations can map to the same IP address (which could be unique to a Google data center), these mappings vary over time. The ping delays measured indicate that both the IP addresses of the DNS servers and the addresses of the Google servers are anycast addresses: the same address is announced from multiple locations, and traffic reaches the nearest one. In conclusion, all these techniques make it quite a challenge to conduct detailed cloud measurements. Note the Google numbering plan: App Engine addresses always end in .141, and DNS servers always end in .9.
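The numbering-plan observation above (front-end addresses ending in .141, DNS servers in .9) can be applied mechanically when sifting through measurement logs. This is a small sketch over invented addresses; it reflects the convention the note observed at the time, not a documented or current Google policy.

```python
# Classify IPv4 addresses by the last-octet convention observed in the note:
# .141 -> App Engine front end, .9 -> Google DNS server. Addresses are examples.

def last_octet(ip):
    return int(ip.rsplit(".", 1)[1])

def classify(ip):
    octet = last_octet(ip)
    if octet == 141:
        return "app-engine front end"
    if octet == 9:
        return "DNS server"
    return "other"

for ip in ["216.239.59.141", "216.239.32.9", "10.0.0.7"]:
    print(ip, "->", classify(ip))
```

A heuristic like this only helps with data from the era it was observed in; address plans change, which is itself one of the measurement challenges the note describes.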
Amazon EC2 is a cloud infrastructure, but it runs in only two locations, with very limited migration functionality across them. Akamai and others replicate static content across a number of geographical locations. Google App Engine, on the other hand, makes it possible to replicate dynamic content across a global geography. This involves an object model for data storage that allows for consistent distributed updates. Because of this, design and programming for this infrastructure has to be done from the ground up, and it allows very little reuse of existing software assets. Yet once that is done, it can scale up to any number of servers, realizing near-infinite scalability. At the same time, Google App Engine realizes uptimes similar to other cloud operators.
If you run out of servers, you need to buy more. That takes anywhere from a few days to a few weeks. Then you need to find empty racks. If you run out of rack space, you need to find more datacenters. That can take months. In some areas, we are running out of datacenters. Construction of new datacenters can take years. In some areas, we are running out of space and power to construct datacenters.
One of the compelling reasons for looking at cloud computing is the speed with which it can be deployed. As Werner Vogels, CTO of Amazon, likes to say: "The only limit is your credit card". An additional benefit is the capability to scale down quickly, which implies that there will not be large sums of money 'sunk' or wasted.
Trivia question: what was the movie reference in this presentation?