Above the Clouds

Friday, January 28, 2011

Cloud computing - what's in it for me as a scientist?

SCIENCE magazine asked me (Armando) to write a short "Perspectives" piece on how scientists could benefit from cloud computing, based on my presentation at National Academy of Engineering FRONTIERS last September. The piece is running in the January 28, 2011 issue of SCIENCE. A link to the article can be found on the RAD Lab home page.

Tuesday, October 12, 2010

The potential of Clouds for scientific computing

A couple of weeks ago Armando gave an invited talk at the Frontiers of Engineering symposium organized by the National Academy of Engineering. He talked about the potential of Cloud Computing for scientists and engineers who aren’t primarily programmers. It was well received overall and a short article based on it will appear in print soon (and be posted here). Meanwhile, here are the slides in case anyone’s interested. As with the techreport, feel free to use/cite as needed, but please point people at our copy rather than providing a separate internal copy, so we can count downloads etc.

Monday, October 4, 2010

a note on citation

We've received many requests for permission to redistribute or quote our Above The Clouds paper. You are free to redistribute inside your organization or cite, without asking us first, provided you cite us. The correct citation is as follows:

"Above the Clouds: A Berkeley View of Cloud Computing" by Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Technical Report EECS-2009-28, EECS Department, University of California, Berkeley.

Please do not redistribute this paper to the public at large; link them to our copy instead.

Thursday, June 11, 2009

clouds and peer-to-peer

We've been asked a few times about the relationship between clouds and peer-to-peer systems, and we wanted to take this opportunity to respond.

Definitions
We differentiate between peer-to-peer (p2p) techniques and p2p systems. The former refers to a set of techniques for building self-organizing distributed systems. These techniques are often useful in building datacenter-scale applications, including datacenter-scale applications that are hosted in the cloud. For instance, Amazon's Dynamo datastore relies on a structured peer-to-peer overlay, as do several other key-value stores.

People often use "P2P" to refer to systems that use these techniques to organize large numbers of cooperating end hosts (peers), such as personal computers and set-top boxes. In these systems, most peers necessarily communicate over the Internet rather than a local area network (LAN). To date, the most successful peer-to-peer applications have been file sharing (e.g., Napster, BitTorrent, eDonkey), communication (e.g., Skype), and embarrassingly parallel computations, such as SETI@home and BOINC projects.

Limitations
The main appeal of p2p systems is that their resources are often "free", coming from individuals who volunteer their machines' CPUs, storage, and bandwidth. Offsetting this, we see two key limitations of P2P systems.

First, p2p systems lack a centralized administrative entity that owns and controls the peer resources. This makes it hard to ensure high levels of availability and performance. Users are free to disable the peer-to-peer application or reboot their machine, so a great degree of redundancy is required. This makes p2p systems a poor fit for applications requiring reliability, such as web hosting, or other sorts of server applications.

This decentralized control also limits trust. Users can inspect the memory and storage of a running application, meaning that applications cannot safely store confidential information unencrypted on peers. Nor can the application developer count on any particular quantity of resources being dedicated on a machine, or on any particular reliability of storage. These obstacles have made it difficult to monetize p2p services. It should come as no surprise that, so far, the most successful p2p applications have been free, with Skype being a notable exception.

Second, the bandwidth between any two peers in the wide area is two or three orders of magnitude lower than between two nodes in a datacenter. Residential connectivity in the US is typically 1 Mbps or less, while in a datacenter a node can often push up to 1 Gbps. This makes p2p systems inappropriate for data-intensive applications (e.g., data mining, indexing, search), which account for a large chunk of the workload in today's datacenters.
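To make the gap concrete, here is a back-of-the-envelope sketch in Python. The 1 Mbps and 1 Gbps link speeds are the figures quoted above; the 100 GB dataset size and the function name are assumptions chosen purely for illustration, and the calculation ignores protocol overhead and contention.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Idealized time to push size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_sec


DATASET_BYTES = 100e9    # assumed 100 GB dataset (illustrative only)
RESIDENTIAL_BPS = 1e6    # 1 Mbps, the residential figure quoted above
DATACENTER_BPS = 1e9     # 1 Gbps, the datacenter figure quoted above

residential = transfer_seconds(DATASET_BYTES, RESIDENTIAL_BPS)
datacenter = transfer_seconds(DATASET_BYTES, DATACENTER_BPS)

print(f"residential peer: {residential / 86400:.1f} days")
print(f"datacenter node:  {datacenter / 60:.1f} minutes")
print(f"ratio: {residential / datacenter:.0f}x")
```

Under these assumptions, moving the same data takes days from a residential peer and minutes from a datacenter node, which is the three-orders-of-magnitude gap described above.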

Opportunities
Recently, there have been promising efforts to address some of the limitations of p2p systems by building hybrid systems. The most popular examples are data delivery systems, such as Pando and Abcast, in which p2p systems are complemented by traditional content distribution networks (CDNs). CDNs ensure availability and performance when the data is not found at peers, or when peers do not have enough aggregate bandwidth to sustain the demand.

In another development, cable operators and video distributors have started experimenting with turning set-top boxes into peers. The advantage of set-top boxes is that, unlike personal computers, they are always on, and they can be managed remotely much more easily. Examples in this category are Vudu and the European NanoDataCenter effort. However, to date, the applications of choice in these efforts are still file sharing and video delivery.

Datacenter clouds and p2p systems are not substitutes for each other. Widely distributed peers may have more aggregate resources, but they lack the reliability and high interconnection bandwidth offered by datacenters. As a result, cloud-hosting and p2p systems complement each other. We expect that in the future more and more applications will span both the cloud and the edge. Examples of such applications are:

Data and video delivery. For highly popular content, p2p distribution can eliminate network bottlenecks by pushing the distribution to the edge. As an example, consider a live event such as the presidential inauguration. With traditional CDNs, every viewer on a local area network would receive an independent stream, which could choke the incoming link. With p2p, only one viewer on the network needs to receive the stream; the stream can then be redistributed to other viewers using p2p techniques.
Distributed applications that require a high level of interactivity, such as massively multiplayer games, video conferences, and IP telephony. To minimize latency, peers in these applications communicate with each other directly, rather than through a central server.
Applications that require massive computation per user, such as video editing and real-time translation. Such applications may take advantage of the vast computational resources of the user's machine. Today, virtually every notebook and personal computer has a multi-core processor whose cores are mostly unused. Proposals such as Google's Native Client aim to tap into these resources.
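The live-event example in the first item above can be made concrete with a little arithmetic. The sketch below is illustrative only: the viewer count, the stream bitrate, and the function name are assumptions, and it models the idealized case in which exactly one viewer fetches the stream from outside the network.

```python
def incoming_link_mbps(viewers: int, stream_mbps: float, use_p2p: bool) -> float:
    """Load placed on a site's incoming link by one live stream.

    Without p2p, every viewer pulls an independent copy from the CDN.
    With p2p, one viewer pulls the stream and the rest receive it locally.
    """
    if viewers == 0:
        return 0.0
    external_copies = 1 if use_p2p else viewers
    return external_copies * stream_mbps


VIEWERS = 200        # assumed number of viewers on one local network
STREAM_MBPS = 2.0    # assumed bitrate of the live stream

print("CDN only:", incoming_link_mbps(VIEWERS, STREAM_MBPS, use_p2p=False), "Mbps")
print("with p2p:", incoming_link_mbps(VIEWERS, STREAM_MBPS, use_p2p=True), "Mbps")
```

With the CDN alone, the load on the incoming link grows linearly with the number of local viewers; with p2p redistribution, it stays at a single stream regardless of audience size.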

Wednesday, May 6, 2009

Surge Computing/Hybrid Computing

In an earlier blog post [March 2, 2009], we discussed why private clouds enjoy only a small subset of the benefits of public clouds. If common APIs allowed the same application to transition between a private cloud and a public cloud, we believe application operators could enjoy the full benefits of cloud computing. We referred to this capability as "surge computing" in our Above the Clouds white paper.

Surge computing would allow developers to push just enough (possibly sanitized) data into the cloud to perform a computation and obtain an acceptable result, or seamlessly pull in resources from a public cloud when local capacity is temporarily exceeded. They could even use either the private cloud or the public cloud as a "spare" in the event that one cloud environment becomes unavailable or fails.
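The overflow and "spare" behaviors described above can be sketched as a simple placement rule. This is a minimal illustration, not the API of Eucalyptus or of any cloud provider: the `Placement` type, the `place_load` function, and the notion of abstract "work units" are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    private_units: int   # work units served by the private cloud
    public_units: int    # overflow units sent to the public cloud


def place_load(demand: int, private_capacity: int, private_up: bool = True) -> Placement:
    """Fill the private cloud first; surge the remainder to the public cloud.

    If the private cloud is unavailable, the public cloud acts as the spare
    and takes the whole load.
    """
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    if not private_up:
        return Placement(private_units=0, public_units=demand)
    local = min(demand, private_capacity)
    return Placement(private_units=local, public_units=demand - local)


print(place_load(demand=80, private_capacity=100))    # fits locally
print(place_load(demand=130, private_capacity=100))   # 30 units surge out
print(place_load(demand=130, private_capacity=100, private_up=False))  # failover
```

A real deployment would also have to decide which data may leave the private cloud and how to move it, which this sketch deliberately leaves out.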

One early surge-computing tool available to SaaS developers is Eucalyptus, an open-source reimplementation of the Amazon Web Services EC2 APIs. The Eucalyptus software was originally developed at UC Santa Barbara, and Eucalyptus Systems recently raised $5.5M to provide consulting services and technical support for customers constructing private clouds. Canonical Ltd. announced that Eucalyptus will be the underlying technology used in the Ubuntu Enterprise Cloud, which was previewed in the latest version of Ubuntu (9.04, released April 23, 2009). Finally, companies like RightScale have committed to allowing their customers to register their private Ubuntu Enterprise Clouds and have them managed alongside applications deployed on Amazon EC2 via a single interface (the RightScale dashboard).

Wednesday, April 29, 2009

cloud security

Security is one of the most often-cited objections to cloud computing; analysts and skeptical companies ask "who would trust their essential data 'out there' somewhere?" We didn't focus on security extensively in our paper, and we wanted to offer our analysis of what the major security concerns are with cloud computing, and what might be done about them. These are preliminary thoughts; we welcome comments and criticism. Security is not our primary area of interest, and we'd love to hear from people with operational experience.

The security issues involved in protecting clouds from outside threats are similar to those already facing large datacenters, except that responsibility is divided between the cloud user and the cloud operator. The cloud user is responsible for application-level security. The cloud provider is responsible for physical security, and likely for enforcing external firewall policies. Security for intermediate layers of the software stack is shared between the user and the operator; the lower the level of abstraction exposed to the user, the more responsibility goes with it. Amazon EC2 users have more responsibility for their security than do Azure users, who in turn have more responsibilities than AppEngine customers. This user responsibility, in turn, can be outsourced to third parties who sell specialty security services. The homogeneity and standardized interfaces of platforms like EC2 make it possible for a company to offer, say, configuration management or firewall rule analysis as value-added services. Outsourced IT is familiar in the enterprise world; there is nothing intrinsically infeasible about trusting third parties with essential corporate infrastructure.

While cloud computing may make external-facing security easier, it does pose the new problem of internal-facing security. Cloud providers need to guard against theft or denial of service attacks by users. Users need to be protected against one another.

The primary security mechanism in today's clouds is virtualization. This is a powerful defense, and protects against most attempts by users to attack one another or the underlying cloud infrastructure. However, not all resources are virtualized and not all virtualization environments are bug-free. Virtualization software has been known to contain bugs that allow virtualized code to "break loose" to some extent. [1] Incorrect network virtualization may allow user code access to sensitive portions of the provider's infrastructure, or to the resources of other users. These challenges, though, are similar to those involved in managing large non-cloud datacenters, where different applications need to be protected from one another. Any large internet service will need to ensure that one buggy service doesn't take down the entire datacenter, or that a single security hole doesn't compromise everything else.

One last security concern is protecting the cloud user against the provider. The provider will by definition control the "bottom layer" of the software stack, which effectively circumvents most known security techniques. Absent radical changes in security technology, we expect that users will use contracts and courts, rather than clever security engineering, to guard against provider malfeasance. The one important exception is the risk of inadvertent data loss. It's hard to imagine Amazon spying on the contents of virtual machine memory; it's easy to imagine a hard disk being disposed of without being wiped, or a permissions bug making data visible improperly.

There's an obvious defense, namely user-level encryption of storage. This is already common for high-value data outside the cloud, and both tools and expertise are readily available. The catch is that key management is still challenging: users would need to be careful that the keys are never stored on permanent storage or handled improperly. Providers could make this simpler by exposing APIs for things like curtained memory or security-sensitive storage that should never be paged out.

[1] Indeed, even correct VM environments can allow the virtualized software to "escape" in the presence of hardware errors. See Sudhakar Govindavajhala and Andrew W. Appel, Using Memory Errors to Attack a Virtual Machine. 2003 IEEE Symposium on Security and Privacy, pp. 154-165, May 2003.

Monday, April 20, 2009

Cloud computing, law enforcement and business continuity

In our Above The Clouds white paper, we identified various obstacles to the growth of Cloud Computing, including data confidentiality and auditability as well as business continuity in the event of an outage at the cloud vendor.

Recently, a colocation facility owned by Core IP Networks LLC was raided by the FBI and the entire datacenter was shut down. "Millions of dollars' worth" of computers, many owned by other companies colocated in the datacenter that had no connection to the companies being investigated by the FBI, were confiscated and those sites went offline. Some of the companies subsequently went out of business. Spreading one's cloud application over multiple physical datacenters may protect against natural disasters, but if those datacenters are all operated by a single provider or in a single jurisdiction, customers might still be exposed to other business continuity disruptions such as this one.

Core IP Networks' CEO, Matthew Simpson, posted a letter to inform customers of the situation as well as to voice concern over the unfairness of the FBI's operation to many of the innocent "bystander" customers who suffered service outages as a result. His letter concludes: "If you run a datacenter, please be aware that in our great country, the FBI can come into your place of business at any time and take whatever they want, with no reason." Indeed, noted technologist and technology blogger James Urquhart wonders whether the U.S. legal system will be a hindrance to cloud computing adoption.

The problem is hardly unique to the United States. The massive government-initiated shutdowns of Swedish ISPs used by the Pirate Bay, a group being investigated for trafficking in copyrighted digital media, similarly resulted in unexpected downtime for many companies that were unrelated to the Pirate Bay but had the misfortune to be housed in the same facility.

These incidents also illustrate what we called reputation fate sharing in the paper: the behavior of a single cloud customer can affect the reputation of other customers, perhaps to the extreme degree that computers belonging to innocent bystanders are seized.