Tuesday, February 24, 2009

Eucalyptus to be Included in Next Ubuntu Release

One of the obstacles we identified in our paper was API-based vendor lock-in: if you've written your application to run on a particular cloud, and especially if you have automated management and provisioning tools, moving to a different provider with different APIs can be a significant expense. Mark Shuttleworth recently announced on the Ubuntu Developers mailing list that the next release of Ubuntu, Karmic Koala, "aims to keep free software at the forefront of cloud computing by embracing the API's of Amazon EC2, and making it easy for anybody to set up their own cloud using entirely open tools." Specifically, Ubuntu will include support for Eucalyptus, the University of California, Santa Barbara project that allows operators to provide an EC2-compatible API on top of their own cluster. While there are a number of other efforts to create a standard, this is definitely a step towards the adoption of the EC2 API as a de facto standard. To the best of our knowledge, this makes EC2's API the first with a second implementation -- an open-source implementation, to boot.
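
To make the compatibility concrete, here is a minimal sketch of what "same API, different provider" looks like from a developer's seat, using the boto Python library. The hostname and credentials are placeholders for whatever a local Eucalyptus installation provides; only the connection setup changes, not the application code.

```python
# Sketch: pointing an existing EC2 client library (boto) at a Eucalyptus
# endpoint instead of Amazon. The host and credentials below are
# placeholders for a hypothetical local installation; 8773 and
# /services/Eucalyptus are Eucalyptus's customary API port and path.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="eucalyptus", endpoint="cloud.example.edu")
conn = boto.connect_ec2(
    aws_access_key_id="YOUR-ACCESS-KEY",
    aws_secret_access_key="YOUR-SECRET-KEY",
    is_secure=False,
    region=region,
    port=8773,
    path="/services/Eucalyptus",
)

# From here on, the calls are the same ones you would make against EC2.
for image in conn.get_all_images():
    print("%s\t%s" % (image.id, image.location))
```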

Monday, February 23, 2009

IBM Software Available Pay-As-You-Go on EC2

Just before we released our paper, IBM and Amazon made a very interesting announcement: IBM software will be available in the cloud with pay-as-you-go licensing. This makes IBM one of the first major enterprise software vendors to provide pay-as-you-go licensing in the cloud. (Another example is Red Hat, which provides supported versions of Red Hat Enterprise Linux, JBoss, and other software for a monthly fee plus a by-the-hour fee.) In our white paper, we identified software licensing models as an obstacle to cloud computing, so we are excited to see this development.

The announcement highlights some of the benefits that a software vendor like IBM can get from cloud computing. IBM is letting software developers use its products in the cloud for development purposes at no cost beyond a small monthly fee, essentially providing a very easy-to-use trial version of the software: spinning up an AMI with, for example, DB2 is much faster and more convenient than working out a trial arrangement with a sales representative and installing the software. This is good for developers because they can focus on integration with the software, and it's good for IBM because it lets more people try its products. For companies that want to do more than development, the software can be used in the cloud with either an existing (longer-term) IBM billing plan or an hourly pay-as-you-go billing plan. Details of the latter are not available yet, but it will be interesting to see what the setup, monthly, and hourly fees will have to look like for a supported pay-as-you-go offering to make sense.

At Berkeley we believe that, at least in the short term, one of the biggest advantages of cloud computing is ease of experimentation. Before today, one could use a cloud service like EC2 to test out multiple operating systems, machine images with pre-configured open-source software stacks, and large-scale experiments (will my software scale to 100 nodes?). The availability of software such as DB2 and WebSphere sMash from IBM means even more prototyping and experimentation is possible without negotiating long-term contracts or going through a complicated setup process. This potential for prototyping and experimentation helps both software users and commercial software vendors.
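
To illustrate how low the barrier to this kind of experimentation is, here is a hedged sketch of launching a pre-configured image with boto; the AMI ID and keypair name are placeholders, not IBM's actual identifiers, which would be found in Amazon's public AMI catalog.

```python
# Sketch: launching a pre-configured software image (say, a hypothetical
# DB2 AMI) on EC2. "ami-00000000" and "my-keypair" are placeholders.
import boto

conn = boto.connect_ec2()  # reads AWS credentials from the environment

reservation = conn.run_instances(
    "ami-00000000",           # placeholder for an IBM-provided AMI
    instance_type="m1.large",
    key_name="my-keypair",
)
instance = reservation.instances[0]
print("%s is %s" % (instance.id, instance.state))
# Minutes later the software is running -- no sales call, no installer.
```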

Thursday, February 12, 2009

YouTube Discussion of the Paper

Here's a video of professors Armando Fox, Anthony Joseph, Randy Katz, and David Patterson discussing some of the ideas in the paper:

Above the Clouds Released

We've just released our white paper: "Above the Clouds: A Berkeley View of Cloud Computing."

Executive summary:

Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new Internet services no longer require the large capital outlays in hardware to deploy their service or the human expense to operate it. They need not be concerned about over-provisioning for a service whose popularity does not meet their predictions, thus wasting costly resources, or under-provisioning for one that becomes wildly popular, thus missing potential customers and revenue. Moreover, companies with large batch-oriented tasks can get results as quickly as their programs can scale, since using 1000 servers for one hour costs no more than using one server for 1000 hours. This elasticity of resources, without paying a premium for large scale, is unprecedented in the history of IT.

Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS). The datacenter hardware and software is what we will call a Cloud. When a Cloud is made available in a pay-as-you-go manner to the general public, we call it a Public Cloud; the service being sold is Utility Computing. We use the term Private Cloud to refer to internal datacenters of a business or other organization, not made available to the general public. Thus, Cloud Computing is the sum of SaaS and Utility Computing, but does not include Private Clouds. People can be users or providers of SaaS, or users or providers of Utility Computing. We focus on SaaS Providers (Cloud Users) and Cloud Providers, which have received less attention than SaaS Users.

From a hardware point of view, three aspects are new in Cloud Computing:

  1. The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning.
  2. The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
  3. The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
We argue that the construction and operation of extremely large-scale, commodity-computer datacenters at low-cost locations was the key necessary enabler of Cloud Computing, for they uncovered factors of 5 to 7 decrease in the cost of electricity, network bandwidth, operations, software, and hardware available at these very large economies of scale. These factors, combined with statistical multiplexing to increase utilization compared to a private cloud, meant that Cloud Computing could offer services below the costs of a medium-sized datacenter and yet still make a good profit.

Any application needs a model of computation, a model of storage, and a model of communication. The statistical multiplexing necessary to achieve elasticity and the illusion of infinite capacity requires each of these resources to be virtualized to hide the implementation of how they are multiplexed and shared. Our view is that different utility computing offerings will be distinguished based on the level of abstraction presented to the programmer and the level of management of the resources.

Amazon EC2 is at one end of the spectrum. An EC2 instance looks much like physical hardware, and users can control nearly the entire software stack, from the kernel upwards. This low level makes it inherently difficult for Amazon to offer automatic scalability and failover, because the semantics associated with replication and other state management issues are highly application-dependent.

At the other extreme of the spectrum are application domain-specific platforms such as Google AppEngine. AppEngine is targeted exclusively at traditional web applications, enforcing an application structure of clean separation between a stateless computation tier and a stateful storage tier. AppEngine's impressive automatic scaling and high-availability mechanisms, and the proprietary MegaStore data storage available to AppEngine applications, all rely on these constraints. Applications for Microsoft's Azure are written using the .NET libraries and compiled to the Common Language Runtime, a language-independent managed environment. Thus, Azure is intermediate between application frameworks like AppEngine and hardware virtual machines like EC2.

When is Utility Computing preferable to running a Private Cloud? A first case is when demand for a service varies with time. For example, provisioning a datacenter for the peak load it must sustain a few days per month leads to underutilization at other times. Instead, Cloud Computing lets an organization pay by the hour for computing resources, potentially leading to cost savings even if the hourly rate to rent a machine from a cloud provider is higher than the rate to own one. A second case is when demand is unknown in advance. For example, a web startup will need to support a spike in demand when it becomes popular, followed potentially by a reduction once some of the visitors turn away. Finally, organizations that perform batch analytics can use the "cost associativity" of Cloud Computing to finish computations faster: using 1000 EC2 machines for 1 hour costs the same as using 1 machine for 1000 hours. For the first case of a web business with varying demand over time and revenue proportional to user hours, we have captured the tradeoff in the equation below.
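
The equation, as given in the paper, is:

\[
\text{UserHours}_{cloud} \times (\text{revenue} - \text{Cost}_{cloud}) \;\ge\; \text{UserHours}_{datacenter} \times \left( \text{revenue} - \frac{\text{Cost}_{datacenter}}{\text{Utilization}} \right)
\]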

The left-hand side multiplies the net revenue per user-hour by the number of user-hours, giving the expected profit from using Cloud Computing. The right-hand side performs the same calculation for a fixed-capacity datacenter by factoring in the average utilization, including nonpeak workloads, of the datacenter. Whichever side is greater represents the opportunity for higher profit.
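
To make the comparison concrete, here is a small worked instance of the equation; every number is hypothetical, chosen only to show how the two sides are computed, and for simplicity both options are assumed to serve the same number of user-hours.

```python
# Worked instance of the tradeoff equation. All numbers are hypothetical.
revenue = 1.00          # revenue per user-hour ($)
cost_cloud = 0.40       # pay-as-you-go cost per user-hour ($)
cost_datacenter = 0.25  # amortized datacenter cost per server-hour ($)
utilization = 0.3       # average utilization of the fixed-capacity datacenter
user_hours = 10000      # assume both options serve the same demand

profit_cloud = user_hours * (revenue - cost_cloud)
profit_datacenter = user_hours * (revenue - cost_datacenter / utilization)

print("cloud: $%.0f vs. datacenter: $%.0f" % (profit_cloud, profit_datacenter))
# cloud: $6000 vs. datacenter: $1667 -- at 30% utilization each datacenter
# user-hour effectively costs 0.25 / 0.3, about $0.83, eroding the margin.
```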

The table below previews our ranked list of critical obstacles to growth of Cloud Computing; the full discussion is in Section 7 of our paper. The first three concern adoption, the next five affect growth, and the last two are policy and business obstacles. Each obstacle is paired with an opportunity, ranging from product development to research projects, which can overcome that obstacle.

We predict Cloud Computing will grow, so developers should take it into account. All levels should aim at the horizontal scalability of virtual machines over efficiency on a single VM. In addition:

  • Applications Software needs to scale down rapidly as well as scale up, which is a new requirement (a toy sketch of such elastic provisioning appears after this list). Such software also needs a pay-for-use licensing model to match the needs of Cloud Computing.
  • Infrastructure Software needs to be aware that it is no longer running on bare metal but on VMs. Moreover, it needs to have billing built in from the beginning.
  • Hardware Systems should be designed at the scale of a container (at least a dozen racks), which will be the minimum purchase size. Cost of operation will match performance and cost of purchase in importance, rewarding energy proportionality, for example by putting idle portions of the memory, disk, and network into low-power mode. Processors should work well with VMs, flash memory should be added to the memory hierarchy, and LAN switches and WAN routers must improve in bandwidth and cost.
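
As a toy illustration of the scale-down requirement in the first bullet, here is a minimal provisioning-loop sketch; the function names, per-server capacity, and loads are all hypothetical stand-ins, not any real cloud API.

```python
# Toy sketch of elastic provisioning: capacity must be released when load
# falls, not just added when it rises. All names and numbers here are
# hypothetical; in practice the scale-up/scale-down branches would call a
# cloud API (e.g., EC2's run/terminate operations).

def desired_servers(load_rps, per_server_rps=100):
    """Smallest fleet that can serve the current load (ceiling division)."""
    return max(1, (load_rps + per_server_rps - 1) // per_server_rps)

def reconcile(fleet_size, load_rps):
    """Return the new fleet size; shrinking matters as much as growing."""
    target = desired_servers(load_rps)
    if target > fleet_size:
        print("scale up: +%d servers" % (target - fleet_size))
    elif target < fleet_size:
        print("scale down: -%d servers (stop paying)" % (fleet_size - target))
    return target

# Demand falls from 1000 req/s to 150 req/s: the fleet shrinks from 10 to 2.
fleet = reconcile(10, 1000)    # no change: 10 servers are still needed
fleet = reconcile(fleet, 150)  # scale down to 2, releasing 8 servers
```
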
Table: Quick Preview of Top 10 Obstacles to and Opportunities for Growth of Cloud Computing.
  # | Obstacle                               | Opportunity
 ---|----------------------------------------|------------------------------------------------------------------
  1 | Availability of Service                | Use Multiple Cloud Providers; Use Elasticity to Prevent DDoS
  2 | Data Lock-In                           | Standardize APIs; Compatible SW to enable Surge Computing
  3 | Data Confidentiality and Auditability  | Deploy Encryption, VLANs, Firewalls; Geographical Data Storage
  4 | Data Transfer Bottlenecks              | FedExing Disks; Data Backup/Archival; Higher BW Switches
  5 | Performance Unpredictability           | Improved VM Support; Flash Memory; Gang Schedule VMs
  6 | Scalable Storage                       | Invent Scalable Store
  7 | Bugs in Large Distributed Systems      | Invent Debugger that relies on Distributed VMs
  8 | Scaling Quickly                        | Invent Auto-Scaler that relies on ML; Snapshots for Conservation
  9 | Reputation Fate Sharing                | Offer reputation-guarding services like those for email
 10 | Software Licensing                     | Pay-for-use licenses; Bulk use sales