Tuesday, March 17, 2009
Cloud computing in education
Monday, March 2, 2009
Is Everything Cloud Computing?
- It will be very hard to come up with an easy-to-apply, widely adopted definition that clearly demarcates when an internal datacenter should be counted as Cloud Computing and when it should not.
- Even if you overcome drawback 1, many of the generalizations that apply to our definition of Cloud Computing will be incorrect under a more inclusive definition. Such inconsistency is one reason some think the claims for Cloud Computing are just hype.
Hence, to be more precise, in Above the Clouds, we defined Cloud Computing as follows:
Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS), so we use that term. The datacenter hardware and software is what we will call a Cloud. When a Cloud is made available in a pay-as-you-go manner to the public, we call it a Public Cloud; the service being sold is Utility Computing. … We use the term Private Cloud to refer to internal datacenters of a business or other organization that are not made available to the public. Thus, Cloud Computing is the sum of SaaS and Utility Computing, but does not normally include Private Clouds.

Regarding the first drawback to a more expansive definition, you can include nearly everything in IT involving server hardware or software, leaving yourself vulnerable to people like Larry Ellison calling you a marketing charlatan. Alternatively, you have to come up with further distinctions of which kinds of internal datacenters are Cloud Computing and which are not. Good luck coming up with a short, crisp definition and having it widely adopted.
Regarding the second drawback to including internal datacenters, an expanded definition would break most generalizations about Cloud Computing. Here are two examples. Virtually no internal datacenter is large enough to see the factor-of-5-to-7 cost advantages from economies of scale that huge datacenters enjoy, which, as we say in the report, we believe is a defining factor of Cloud Computing. Many internal datacenters also lack the fine-grained accounting needed to tell users what resources they are consuming, which makes it hard to inspire the resource conservation encouraged by the pay-as-you-go billing model, another characteristic we identify as unique to Cloud Computing.
Of course, it is possible to run an internal datacenter using exactly the same APIs and policies as Public Clouds; presumably, that is how Amazon and Google got started in this business. Running an internal datacenter this way yields some advantages of Cloud Computing, such as improved utilization and resource management, but not all of the advantages we identified, such as high elasticity, pay-as-you-go billing, and economies of scale. As we say in the report, we also think that using cloud APIs in an internal datacenter will enable what we call Surge Computing: in times of heavy load, you offload some tasks from the internal datacenter to a Cloud, thus mitigating the risk of under-provisioning.
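To make the Surge Computing pattern concrete, here is a minimal sketch in Python of a dispatcher that prefers internal capacity and overflows to a Public Cloud only under heavy load. The slot counts, task names, and dispatch strings are hypothetical placeholders, not anything from the report; a real system would invoke the same VM API against both targets.

```python
# Minimal sketch of Surge Computing: prefer internal datacenter capacity,
# and overflow ("surge") to a Public Cloud only when internal slots run
# out. Slot counts and task names are hypothetical.

class SurgeDispatcher:
    def __init__(self, internal_slots):
        self.internal_slots = internal_slots
        self.in_use = 0

    def dispatch(self, task):
        """Run on an internal VM if one is free, else surge to the Cloud."""
        if self.in_use < self.internal_slots:
            self.in_use += 1
            return f"internal VM runs {task}"
        return f"cloud surge runs {task}"

if __name__ == "__main__":
    d = SurgeDispatcher(internal_slots=2)
    for job in ["job-1", "job-2", "job-3"]:
        print(d.dispatch(job))  # job-3 overflows to the Public Cloud
```

The point of the sketch is that only the overflow runs in the Cloud, which is why we argue Surge Computing, not Cloud Computing, is the right label for such a datacenter.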
Returning to the analogy we used in Above the Clouds, the hardware world has largely separated into semiconductor foundries like TSMC and fabless chip design companies like NVIDIA. However, some larger companies do have internal fabs that are precisely matched to TSMC's design rules, so that they can do "Surge Chip Fabrication" at TSMC when chip demand exceeds their internal capacity. Indeed, "Surge Chip Fab" is a significant fraction of TSMC's business.
Just as those in the hardware world do not consider companies that use Surge Chip Fab to be semiconductor foundries, for the two reasons above we do not recommend including in Cloud Computing any private datacenter that merely imitates some characteristics of Public Clouds. Surge Computing is a more accurate label for such datacenters, even if it doesn't have the same sizzle as Cloud Computing.
Although our restricted definition may limit which products and services are labeled Cloud Computing, by being precise we aim to prevent allergic reactions and thereby enable a more meaningful and constructive discussion of the current state of Cloud Computing and its future.
Tuesday, February 24, 2009
Eucalyptus to be Included in Next Ubuntu Release
Monday, February 23, 2009
IBM Software Available Pay-As-You-Go on EC2
Thursday, February 12, 2009
YouTube Discussion of the Paper
Above the Clouds Released
Executive summary:
Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new Internet services no longer require the large capital outlays in hardware to deploy their service or the human expense to operate it. They need not be concerned about over-provisioning for a service whose popularity does not meet their predictions, thus wasting costly resources, or under-provisioning for one that becomes wildly popular, thus missing potential customers and revenue. Moreover, companies with large batch-oriented tasks can get results as quickly as their programs can scale, since using 1000 servers for one hour costs no more than using one server for 1000 hours. This elasticity of resources, without paying a premium for large scale, is unprecedented in the history of IT.

Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS). The datacenter hardware and software is what we will call a Cloud. When a Cloud is made available in a pay-as-you-go manner to the general public, we call it a Public Cloud; the service being sold is Utility Computing. We use the term Private Cloud to refer to internal datacenters of a business or other organization, not made available to the general public. Thus, Cloud Computing is the sum of SaaS and Utility Computing, but does not include Private Clouds. People can be users or providers of SaaS, or users or providers of Utility Computing. We focus on SaaS Providers (Cloud Users) and Cloud Providers, which have received less attention than SaaS Users.

From a hardware point of view, three aspects are new in Cloud Computing:
- The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning.
- The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
- The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
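The third point underlies what the report calls "cost associativity": with pure pay-as-you-go pricing, total cost depends only on the product of servers and hours. A trivial Python check makes the arithmetic explicit; the hourly rate here is invented for illustration:

```python
# Cost associativity: cost depends only on servers x hours, so 1000
# servers for 1 hour costs the same as 1 server for 1000 hours.
# The hourly rate is a made-up figure.

HOURLY_RATE = 0.10  # hypothetical $/server-hour

def cost(servers, hours):
    return servers * hours * HOURLY_RATE

assert cost(1000, 1) == cost(1, 1000)
print(cost(1000, 1))  # 100.0 -- same spend, results 1000x sooner
```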
Amazon EC2 is at one end of the spectrum. An EC2 instance looks much like physical hardware, and users can control nearly the entire software stack, from the kernel upwards. This low level makes it inherently difficult for Amazon to offer automatic scalability and failover, because the semantics associated with replication and other state management issues are highly application-dependent. At the other extreme of the spectrum are application domain-specific platforms such as Google AppEngine. AppEngine is targeted exclusively at traditional web applications, enforcing an application structure of clean separation between a stateless computation tier and a stateful storage tier. AppEngine's impressive automatic scaling and high-availability mechanisms, and the proprietary MegaStore data storage available to AppEngine applications, all rely on these constraints. Applications for Microsoft's Azure are written using the .NET libraries, and compiled to the Common Language Runtime, a language-independent managed environment. Thus, Azure is intermediate between application frameworks like AppEngine and hardware virtual machines like EC2.

When is Utility Computing preferable to running a Private Cloud? A first case is when demand for a service varies with time. Provisioning a datacenter for the peak load it must sustain a few days per month leads to underutilization at other times, for example. Instead, Cloud Computing lets an organization pay by the hour for computing resources, potentially leading to cost savings even if the hourly rate to rent a machine from a cloud provider is higher than the rate to own one. A second case is when demand is unknown in advance. For example, a web startup will need to support a spike in demand when it becomes popular, followed potentially by a reduction once some of the visitors turn away. Finally, organizations that perform batch analytics can use the "cost associativity" of cloud computing to finish computations faster: using 1000 EC2 machines for 1 hour costs the same as using 1 machine for 1000 hours. For the first case of a web business with varying demand over time and revenue proportional to user hours, we have captured the tradeoff in the equation below.
UserHours_cloud × (revenue − Cost_cloud) ≥ UserHours_datacenter × (revenue − Cost_datacenter / Utilization)
The left-hand side multiplies the net revenue per user-hour by the number of user-hours, giving the expected profit from using Cloud Computing. The right-hand side performs the same calculation for a fixed-capacity datacenter by factoring in the average utilization, including nonpeak workloads, of the datacenter. Whichever side is greater represents the opportunity for higher profit.
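As a worked example, the following Python snippet evaluates both sides of the equation. Every number is invented purely to illustrate the arithmetic; none comes from the report.

```python
# Both sides of the cloud-vs-datacenter profit comparison.
# All figures below are hypothetical.

revenue = 0.15           # net revenue per user-hour ($)
cost_cloud = 0.10        # cloud cost per user-hour ($)
cost_datacenter = 0.08   # amortized datacenter cost per user-hour ($)
utilization = 0.4        # average utilization of the peak-provisioned datacenter
user_hours = 1_000_000   # user-hours served in either scenario

profit_cloud = user_hours * (revenue - cost_cloud)
profit_datacenter = user_hours * (revenue - cost_datacenter / utilization)

print(f"cloud:      ${profit_cloud:,.0f}")       # $50,000
print(f"datacenter: ${profit_datacenter:,.0f}")  # $-50,000
```

Even though the cloud's hourly rate is higher in this scenario, dividing the datacenter's cost by its 40% average utilization makes owning the less profitable option, which is exactly the effect the equation is meant to expose.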
The table below previews our ranked list of critical obstacles to growth of Cloud Computing; the full discussion is in Section 7 of our paper. The first three concern adoption, the next five affect growth, and the last two are policy and business obstacles. Each obstacle is paired with an opportunity, ranging from product development to research projects, which can overcome that obstacle.
We predict Cloud Computing will grow, so developers should take it into account. Software at all levels should aim for horizontal scalability across many virtual machines over efficiency on a single VM. In addition:
- Applications Software needs to scale down rapidly as well as scale up, which is a new requirement (see the sketch after this list). Such software also needs a pay-for-use licensing model to match the needs of Cloud Computing.
- Infrastructure Software needs to be aware that it is no longer running on bare metal but on VMs. Moreover, it needs to have billing built in from the beginning.
- Hardware Systems should be designed at the scale of a container (at least a dozen racks), which will be the minimum purchase size. Cost of operation will match performance and cost of purchase in importance, rewarding energy proportionality, for example by putting idle portions of the memory, disk, and network into low-power mode. Processors should work well with VMs, flash memory should be added to the memory hierarchy, and LAN switches and WAN routers must improve in bandwidth and cost.
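Here is a minimal sketch of the rapid scale-up and scale-down behavior the first item asks application software to support. The capacity figure and headroom fraction are hypothetical, and the call to a real provisioning API is omitted.

```python
import math

def target_servers(load, capacity_per_server, headroom=0.25):
    """How many servers to run for the current load, with fixed
    fractional headroom to absorb spikes while new servers boot.
    Shrinking as readily as growing is what lets pay-as-you-go
    billing reward conservation."""
    needed = load / capacity_per_server
    return max(1, math.ceil(needed * (1 + headroom)))

if __name__ == "__main__":
    for load in [100, 5000, 300]:  # requests/sec over time
        n = target_servers(load, capacity_per_server=100)
        print(f"load={load:5d} req/s -> run {n} servers")
```

The key design choice is that the target follows load in both directions; software that only grows would keep paying for machines long after demand subsided.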
# | Obstacle | Opportunity |
1 | Availability of Service | Use Multiple Cloud Providers; Use Elasticity to Prevent DDOS |
2 | Data Lock-In | Standardize APIs; Compatible SW to enable Surge Computing |
3 | Data Confidentiality and Auditability | Deploy Encryption, VLANs, Firewalls; Geographical Data Storage |
4 | Data Transfer Bottlenecks | FedExing Disks; Data Backup/Archival; Higher BW Switches |
5 | Performance Unpredictability | Improved VM Support; Flash Memory; Gang Schedule VMs |
6 | Scalable Storage | Invent Scalable Store |
7 | Bugs in Large Distributed Systems | Invent Debugger that relies on Distributed VMs |
8 | Scaling Quickly | Invent Auto-Scaler that relies on ML; Snapshots for Conservation |
9 | Reputation Fate Sharing | Offer reputation-guarding services like those for email |
10 | Software Licensing | Pay-for-use licenses; Bulk use sales |