Thursday, June 11, 2009

clouds and peer-to-peer

We've been asked a few times about the relationship between clouds and peer-to-peer systems, and we wanted to take this opportunity to respond.

Definitions
We differentiate between peer-to-peer (p2p) techniques and p2p systems. The former refers to a set of techniques for building self-organizing distributed systems. These techniques are often useful in building datacenter-scale applications, including datacenter-scale applications that are hosted in the cloud. For instance, Amazon's Dynamo datastore relies on a structured peer-to-peer overlay, as do several other key-value stores.
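To make "structured overlay" a little more concrete, here is a minimal sketch of consistent hashing, the key-placement idea that Dynamo-style stores build on. The `HashRing` class, the node names, and the choice of MD5 are illustrative assumptions, not Dynamo's actual implementation.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: a key belongs to the first node clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (ring_position(node), node))

    def remove(self, node):
        self._ring.remove((ring_position(node), node))

    def owner(self, key):
        positions = [pos for pos, _ in self._ring]
        # Wrap around to the first node when the key hashes past the last one.
        index = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")
after = {k: ring.owner(k) for k in keys}

# Only keys claimed by the new node change owner; all others stay put.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The property that matters for self-organization is that a node joining or leaving only affects the keys adjacent to it on the ring, so the system can rebalance without any central coordinator.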

People often use "P2P" to refer to systems that use these techniques to organize large numbers of cooperating end hosts (peers), such as personal computers and set-top boxes. In these systems, most peers necessarily communicate over the Internet rather than a local area network (LAN). To date, the most successful peer-to-peer applications have been file sharing (e.g., Napster, BitTorrent, eDonkey), communication (e.g., Skype), and embarrassingly parallel computations, such as SETI@home and other BOINC projects.

Limitations
The main appeal of p2p systems is that their resources are often "free", coming from individuals who volunteer their machines' CPUs, storage, and bandwidth. Offsetting this, we see two key limitations of p2p systems.

First, p2p systems lack a centralized administrative entity that owns and controls the peer resources. This makes it hard to ensure high levels of availability and performance. Users are free to disable the peer-to-peer application or reboot their machine, so a great degree of redundancy is required. This makes p2p systems a poor fit for applications requiring reliability, such as web hosting, or other sorts of server applications.
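As a rough illustration of how much redundancy that implies, the sketch below computes how many replicas are needed to reach a target availability. It assumes that each peer is online independently with the same probability, which real peers only approximate; the function name and the sample probabilities are ours, not measurements.

```python
def replicas_needed(peer_availability: float, target: float) -> int:
    """Smallest k such that 1 - (1 - p)^k >= target, assuming independent peers."""
    if not 0.0 < peer_availability < 1.0:
        raise ValueError("peer_availability must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    chance_offline = 1.0 - peer_availability
    replicas = 1
    # The data is unavailable only if every replica is offline at once.
    while 1.0 - chance_offline ** replicas < target:
        replicas += 1
    return replicas


# Hypothetical peer availabilities; the target of 99.9% is also an assumption.
for p in (0.3, 0.5, 0.95):
    print(f"peer availability {p:.2f}: {replicas_needed(p, 0.999)} replicas")
```

Under this model, the replica count grows quickly as individual peers become less reliable, which is the cost the paragraph above refers to.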

This decentralized control also limits trust. Users can inspect the memory and storage of a running application, meaning that applications cannot safely store confidential information unencrypted on peers. Nor can the application developer count on any particular quantity of resources being dedicated on a machine, or on any particular reliability of storage. These obstacles have made it difficult to monetize p2p services. It should come as no surprise that, so far, the most successful p2p applications have been free, with Skype being a notable exception.

Second, the bandwidth between any two peers in the wide area is two to three orders of magnitude lower than between two nodes in a datacenter. Residential connectivity in the US is typically 1 Mbps or less, while in a datacenter a node can often push up to 1 Gbps. This makes p2p systems inappropriate for data-intensive applications (e.g., data mining, indexing, search), which account for a large chunk of the workload in today's datacenters.
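A quick back-of-the-envelope calculation shows what this gap means in practice. The link speeds are the ones quoted above; the 100 GB dataset size is an assumption chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


DATASET_BYTES = 100 * 10**9   # assumed dataset size: 100 GB
RESIDENTIAL_BPS = 1 * 10**6   # 1 Mbps, as quoted in the text
DATACENTER_BPS = 1 * 10**9    # 1 Gbps, as quoted in the text

residential = transfer_seconds(DATASET_BYTES, RESIDENTIAL_BPS)
datacenter = transfer_seconds(DATASET_BYTES, DATACENTER_BPS)

print(f"residential link: {residential / 86400:.1f} days")
print(f"datacenter link:  {datacenter / 60:.1f} minutes")
```

With these numbers the same transfer takes roughly nine days on the residential link and about thirteen minutes inside the datacenter.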

Opportunities
Recently, there have been promising efforts to address some of the limitations of p2p systems by building hybrid systems. The most popular examples are data delivery systems, such as Pando and Abacast, in which p2p distribution is complemented by traditional content distribution networks (CDNs). The CDN ensures availability and performance when the data is not found at the peers, or when the peers do not have enough aggregate bandwidth to sustain the demand.
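The fallback logic of such a hybrid can be sketched in a few lines. This is a simplified in-memory model, not how Pando or any other product works: `make_store` and `fetch_chunk` are hypothetical names, and the sketch covers only the "data not found at peers" case, not the bandwidth-shortfall case.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# A fetcher returns the chunk bytes, or None if it does not hold the chunk.
Fetcher = Callable[[str], Optional[bytes]]


def make_store(chunks: Dict[str, bytes]) -> Fetcher:
    """Build a hypothetical in-memory source (peer or CDN) from a dict."""
    return chunks.get


def fetch_chunk(chunk_id: str, peers: Iterable[Fetcher], cdn: Fetcher) -> Tuple[bytes, str]:
    """Try each peer in turn; fall back to the CDN if no peer has the chunk."""
    for peer in peers:
        data = peer(chunk_id)
        if data is not None:
            return data, "peer"
    data = cdn(chunk_id)
    if data is None:
        raise KeyError(f"chunk {chunk_id!r} is not available anywhere")
    return data, "cdn"


# Two peers each hold part of the content; the CDN holds all of it.
peers = [make_store({"chunk-1": b"aaa"}), make_store({"chunk-2": b"bbb"})]
cdn = make_store({"chunk-1": b"aaa", "chunk-2": b"bbb", "chunk-3": b"ccc"})

for chunk_id in ("chunk-1", "chunk-2", "chunk-3"):
    data, source = fetch_chunk(chunk_id, peers, cdn)
    print(f"{chunk_id}: {len(data)} bytes from {source}")
```

The CDN is consulted only after the peers have been tried, so it carries just the traffic that the peers cannot serve.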

In another development, cable operators and video distributors have started experimenting with turning set-top boxes into peers. The advantage of set-top boxes is that, unlike personal computers, they are always on and can be managed remotely much more easily. Examples in this category are Vudu and the European NanoDataCenter effort. To date, however, the applications of choice in these efforts have remained file sharing and video delivery.

Datacenter clouds and p2p systems are not a substitute for each other. Widely distributed peers may have more aggregate resources, but they lack the reliability and high interconnection bandwidth offered by datacenters. As a result, cloud-hosting and p2p systems complement each other. We expect that in the future more and more applications will span both the cloud and the edge. Examples of such applications are:

  • Data and video delivery. For highly popular content, p2p distribution can eliminate network bottlenecks by pushing the distribution to the edge. As an example, consider a live event such as the presidential inauguration. With traditional CDNs, every viewer on a local area network would receive an independent stream, which could choke the network's incoming link. With p2p, only one viewer on the network needs to receive the stream; the stream can then be redistributed to the other viewers using p2p techniques.
  • Distributed applications that require a high level of interactivity, such as massively multiplayer games, video conferences, and IP telephony. To minimize latency, peers in these applications communicate with each other directly, rather than through a central server.
  • Applications that require massive computation per user, such as video editing and real-time translation. Such applications may take advantage of the vast computational resources of the user's machine. Today, virtually every notebook and personal computer has a multi-core processor whose cores are mostly unused. Proposals such as Google's Native Client aim to tap into these resources.
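The live-streaming example in the first bullet comes down to simple arithmetic, sketched below. The viewer count, stream bitrate, and link capacity are made-up numbers for illustration, and the function name is ours.

```python
def incoming_link_load_mbps(viewers: int, stream_mbps: float, use_p2p: bool) -> float:
    """Load on the LAN's incoming link, in Mbps, for one live stream."""
    if viewers < 0 or stream_mbps < 0:
        raise ValueError("viewers and stream_mbps must be non-negative")
    if viewers == 0:
        return 0.0
    # With p2p, one copy crosses the link and is relayed inside the LAN.
    copies_crossing_link = 1 if use_p2p else viewers
    return copies_crossing_link * stream_mbps


VIEWERS = 200               # assumed number of viewers on the LAN
STREAM_MBPS = 2.0           # assumed bitrate of the live stream
LINK_CAPACITY_MBPS = 100.0  # assumed capacity of the incoming link

for label, use_p2p in (("CDN only", False), ("p2p relay", True)):
    load = incoming_link_load_mbps(VIEWERS, STREAM_MBPS, use_p2p)
    status = "saturated" if load > LINK_CAPACITY_MBPS else "ok"
    print(f"{label}: {load:.0f} Mbps on a {LINK_CAPACITY_MBPS:.0f} Mbps link ({status})")
```

In the CDN-only case the load grows linearly with the number of viewers, while with p2p redistribution it stays at one stream regardless of audience size.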

8 comments:

Tim said...

There was an interesting talk at the HotCloud workshop yesterday on bridging these two ideas to make "volunteer clouds".

Nebulas: Using Distributed Voluntary Resources to Build Clouds, Abhishek Chandra and Jon Weissman

Some other interesting talks:
Hot Cloud 09

97614a888580e98b65ea87f12895f9eadac67e0bf1e86e56786509a072ee0a58 said...

Identity management is also a potential significant application for P2P.

97614a888580e98b65ea87f12895f9eadac67e0bf1e86e56786509a072ee0a58 said...

In a P2P world a user retains control of their credentials and provides these directly to others as required.

KickStart Search said...

Didn't know about the limitations of residential connectivity. Interesting stuff.

bobthesciguy said...

I'm curious what the poster meant by "embarrassing parallel computations, such as SETI@home and BOINC projects." They seem successful.

Tim said...

Interesting insights.

Are those residential comms numbers real? Do they arise because of the distribution of performance (median << mean) or from asynchronous characteristics, or is the US just way behind the rest of the world in residential comms performance?

Also, surely if the main constraint is bandwidth between peers over the Internet, then the p2p apps will tend to migrate to the higher bandwidth hosts?

rnbresearch said...

Interesting text. You have a nice blog. Keep it up!

