This document discusses peer-to-peer systems and middleware for managing distributed resources at very large scale. It describes key characteristics of peer-to-peer systems, such as decentralized operation and each node contributing resources while all nodes have the same functional capabilities. Middleware systems such as Pastry and Tapestry implement routing overlays that locate distributed objects by routing each request through a sequence of nodes, using the routing knowledge held at each node. They offer simple programming interfaces and support global scalability, load balancing, and highly dynamic node availability.
GOAL: To enable the sharing of data and resources on a very large scale by
eliminating any requirement for separately managed servers and their
associated infrastructure.
Peer-to-peer systems aim to support useful distributed services and
applications using data and computing resources available in the personal
computers and workstations that are present in the Internet.
Peer-to-peer applications have been characterized as ‘applications that exploit resources
available at the edges of the Internet – storage, cycles, content, human presence’.
PEER-TO-PEER SYSTEMS
CHARACTERISTICS
• Their design ensures that each user contributes resources to the system.
• Although they may differ in the resources that they contribute, all the nodes in
a peer-to-peer system have the same functional capabilities and responsibilities.
• Their correct operation does not depend on the existence of any centrally
administered systems.
• They can be designed to offer a limited degree of anonymity to the providers and
users of resources.
• A key issue for their efficient operation is the choice of an algorithm for the
placement of data across many hosts and subsequent access to it.
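To make the placement issue concrete, structured peer-to-peer systems typically hash an object's name into a large, flat GUID space and assign the object to the node whose identifier is closest to that GUID. The sketch below illustrates this idea; the hash function, identifier width and the "numerically closest on a circular identifier space" rule are assumptions chosen for illustration, not the exact scheme of any particular system.

```python
import hashlib

def guid(name: str, bits: int = 128) -> int:
    """Derive a flat, uniformly distributed GUID by hashing the object's name."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def responsible_node(object_guid: int, node_ids: list[int], bits: int = 128) -> int:
    """Pick the node whose ID is numerically closest to the object's GUID
    (distance wraps around the circular identifier space)."""
    space = 1 << bits
    def distance(node_id: int) -> int:
        d = abs(node_id - object_guid)
        return min(d, space - d)
    return min(node_ids, key=distance)

# Example: three hypothetical nodes and one object
nodes = [guid("node-A"), guid("node-B"), guid("node-C")]
print(hex(responsible_node(guid("song.mp3"), nodes)))
```

Because hashed GUIDs are effectively random, this kind of placement spreads objects evenly across the participating hosts, which underlies the load-balancing property summarized in Figure (a) below.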
Figure (a): Distinctions between IP and overlay routing for
peer-to-peer applications
(IP = standard IP routing; Overlay = application-level routing overlay)

Scale
IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements.
Overlay: Peer-to-peer systems can address more objects. The GUID name space is very large and flat (>2^128), allowing it to be much more fully occupied.

Load balancing
IP: Loads on routers are determined by network topology and associated traffic patterns.
Overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
IP: IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
Overlay: Routing tables can be updated synchronously or asynchronously with fractions-of-a-second delays.

Fault tolerance
IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
Overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.

Target identification
IP: Each IP address maps to exactly one target node.
Overlay: Messages can be routed to the nearest replica of a target object.

Security and anonymity
IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
Overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.
NAPSTER
The first large-scale peer-to-peer network was Napster, set up in 1999 to share digital
music files over the Internet.
While Napster maintained centralized (and replicated) indices, the music files were
created and made available by individuals, usually with music copied from CDs to
computer files.
Music content owners sued Napster for copyright violations and succeeded in
shutting down the service. Figure (b) documents the process of requesting a music file
from Napster.
Figure (b): Napster: peer-to-peer file sharing with a
centralized, replicated index
[The figure shows a requesting peer, other peers, and the replicated Napster index servers, with the following steps:
1. File location request (peer to Napster server)
2. List of peers offering the file (server to peer)
3. File request (peer to peer)
4. File delivered (peer to peer)
5. Index update (new holder to Napster server)]
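To relate the steps in Figure (b) to code, here is a hypothetical, minimal sketch of a Napster-style interaction: a lookup against a central index, a direct transfer from one of the returned peers, and an index update afterwards. The names CentralIndex, lookup, register and fetch_from_peer are illustrative only; they are not Napster's actual protocol or API.

```python
class CentralIndex:
    """Centralized (and, in Napster, replicated) index mapping file names to peers."""

    def __init__(self):
        self.entries: dict[str, set[str]] = {}

    def lookup(self, file_name: str) -> set[str]:
        # Steps 1-2: file location request, list of peers offering the file
        return self.entries.get(file_name, set())

    def register(self, file_name: str, peer_addr: str) -> None:
        # Step 5: index update, advertising a new copy of the file
        self.entries.setdefault(file_name, set()).add(peer_addr)


def fetch_from_peer(peer_addr: str, file_name: str) -> bytes | None:
    """Placeholder for the direct peer-to-peer transfer (steps 3-4), e.g. over TCP."""
    return None


def download(index: CentralIndex, file_name: str, my_addr: str) -> bytes | None:
    for peer in index.lookup(file_name):          # steps 1-2: ask the central index
        data = fetch_from_peer(peer, file_name)   # steps 3-4: transfer directly between peers
        if data is not None:
            index.register(file_name, my_addr)    # step 5: this peer now also offers the file
            return data
    return None
```

The design point the sketch makes is that only the index is centralized; the files themselves never pass through the Napster servers.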
Peer-to-peer middleware
• The third generation is characterized by the emergence of middleware
layers for the application-independent management of distributed resources
on a global scale.
• Peer-to-peer middleware systems are designed specifically to meet the
need for the automatic placement and subsequent location of the
distributed objects managed by peer-to-peer systems and applications.
• The best-known and most fully developed examples include Pastry
[Rowstron and Druschel 2001], Tapestry [Zhao et al. 2004], CAN
[Ratnasamy et al. 2001], Chord [Stoica et al. 2001] and Kademlia
[Maymounkov and Mazieres 2002].
Functional requirements:
• The function of peer-to-peer middleware is to simplify the construction of
services that are implemented across many hosts in a widely distributed
network.
• Other important requirements include the ability to add new resources to the
service and remove them at will, and likewise to add and remove hosts.
• Like other middleware, peer-to-peer middleware should offer a simple
programming interface to application programmers that is independent of the
types of distributed resource that the application manipulates.
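As one way to picture such a resource-independent interface, the sketch below defines an abstract middleware API that deals only in GUIDs and opaque byte strings, so the same operations apply whatever kind of resource the application stores. The class and method names are hypothetical and simply mirror the DHT operations shown later in Figure (d).

```python
from abc import ABC, abstractmethod

class PeerToPeerMiddleware(ABC):
    """Hypothetical resource-independent middleware interface:
    applications see only GUIDs and opaque byte strings."""

    @abstractmethod
    def put(self, guid: bytes, data: bytes) -> None:
        """Place the data on whichever nodes are responsible for guid."""

    @abstractmethod
    def get(self, guid: bytes) -> bytes:
        """Retrieve the data from one of the responsible nodes."""

    @abstractmethod
    def remove(self, guid: bytes) -> None:
        """Withdraw the data from all responsible nodes."""
```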
Non-functional requirements:
• Global scalability
• Load balancing
• Optimization for local interactions between neighboring peers
• Accommodating to highly dynamic host availability
ROUTING OVERLAYS
• In peer-to-peer systems a distributed algorithm known as a routing overlay takes
responsibility for locating nodes and objects.
• The routing overlay ensures that any node can access any object by routing each
request through a sequence of nodes, exploiting knowledge at each of them to locate the
destination object.
• Peer-to-peer systems usually store multiple replicas of objects to ensure availability.
Main Task:
• Routing of requests to objects
• Insertion of objects
• Deletion of objects
• Node addition and removal
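The first of these tasks can be pictured as greedy forwarding: each node holds only partial knowledge of the overlay and hands a request on to a neighbour whose identifier is closer to the target GUID, stopping when no closer neighbour is known. The sketch below uses a simple "numerically closer" rule for illustration; real overlays such as Pastry maintain structured routing tables, so treat this as an assumption-laden simplification.

```python
class OverlayNode:
    """Simplified overlay node: knows its own ID and a few neighbours."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.neighbours: list["OverlayNode"] = []   # partial knowledge of the overlay

    def route(self, target_guid: int, hops: int = 0) -> tuple["OverlayNode", int]:
        """Forward the request to a neighbour strictly closer to the GUID;
        stop when no known neighbour is closer than this node."""
        my_distance = abs(self.node_id - target_guid)
        closer = [n for n in self.neighbours
                  if abs(n.node_id - target_guid) < my_distance]
        if not closer:
            return self, hops            # no closer neighbour: this node handles the request
        next_hop = min(closer, key=lambda n: abs(n.node_id - target_guid))
        return next_hop.route(target_guid, hops + 1)
```

Each hop strictly reduces the distance to the target GUID, so the request always terminates at some node that considers itself responsible for the object.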
Figure (c): Distribution of information in a routing overlay
[The figure shows objects and nodes spread across the overlay; nodes A, B, C and D each hold routing knowledge about the nodes and objects in their own portion of the overlay.]
Figure (d): Basic programming interface for a distributed
hash table (DHT) as implemented by the PAST API over
Pastry
put(GUID, data)
The data is stored in replicas at all nodes responsible for the object identified
by GUID.
remove(GUID)
Deletes all references to GUID and the associated data.
value = get(GUID)
The data associated with GUID is retrieved from one of the nodes responsible
for it.
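The following sketch mimics the semantics of this DHT interface with a toy in-memory "overlay": put stores the data at every node responsible for the GUID, get reads it from one of them, and remove deletes all copies. The replication factor and the node-selection helper are assumptions made for illustration; this is not the real PAST/Pastry implementation.

```python
import hashlib
import random

class ToyDHT:
    """Toy distributed hash table illustrating the put/get/remove semantics
    of Figure (d). Nodes are simulated as in-memory dictionaries."""

    def __init__(self, num_nodes: int = 8, replicas: int = 3):
        self.nodes = [dict() for _ in range(num_nodes)]   # each dict stands in for a node's local store
        self.replicas = replicas

    def _responsible_nodes(self, guid: bytes) -> list[dict]:
        # Assumption: pick `replicas` nodes deterministically from the GUID.
        start = int.from_bytes(guid, "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, guid: bytes, data: bytes) -> None:
        for node in self._responsible_nodes(guid):        # stored at ALL responsible nodes
            node[guid] = data

    def get(self, guid: bytes) -> bytes:
        node = random.choice(self._responsible_nodes(guid))   # retrieved from ONE of them
        return node[guid]

    def remove(self, guid: bytes) -> None:
        for node in self._responsible_nodes(guid):
            node.pop(guid, None)

# Usage: GUIDs are typically derived by hashing the data or its name.
dht = ToyDHT()
guid = hashlib.sha1(b"report.txt").digest()
dht.put(guid, b"file contents")
assert dht.get(guid) == b"file contents"
dht.remove(guid)
```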
Figure (e): Basic programming interface for distributed object
location and routing (DOLR) as implemented by Tapestry
publish(GUID )
GUID can be computed from the object (or some part of it, e.g. its
name). This function makes the node performing a publish operation
the host for the object corresponding to GUID.
unpublish(GUID)
Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
Following the object-oriented paradigm, an invocation message is sent
to an object in order to access it. This might be a request to open a TCP
connection for data transfer or to return a message containing all or
part of the object’s state. The final optional parameter [n], if present,
requests the delivery of the same message to n replicas of the object.
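In contrast to the DHT interface of Figure (d), a DOLR layer does not store the data itself; it only records which hosts have published an object and routes invocation messages to one or more of them. The toy sketch below illustrates that distinction; it is not Tapestry's implementation, and for simplicity publish takes an explicit host argument, whereas in the real API the publishing node is implicit.

```python
class ToyDOLR:
    """Toy distributed object location and routing layer, mirroring Figure (e):
    it maps GUIDs to hosts that have published the object and forwards
    invocation messages to one (or n) of those replicas."""

    def __init__(self):
        self.locations: dict[bytes, list[str]] = {}   # GUID -> hosts currently publishing it

    def publish(self, guid: bytes, host: str) -> None:
        # The host keeps the object; only its location is recorded by the overlay.
        self.locations.setdefault(guid, []).append(host)

    def unpublish(self, guid: bytes, host: str) -> None:
        hosts = self.locations.get(guid, [])
        if host in hosts:
            hosts.remove(host)

    def send_to_obj(self, msg: str, guid: bytes, n: int = 1) -> list[str]:
        """Deliver msg to n replicas of the object (here we just report the targets)."""
        hosts = self.locations.get(guid, [])
        return [f"delivered {msg!r} to {host}" for host in hosts[:n]]

# Usage
dolr = ToyDOLR()
dolr.publish(b"\x01\x02", "host-A")
dolr.publish(b"\x01\x02", "host-B")
print(dolr.send_to_obj("open TCP connection", b"\x01\x02", n=2))
```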
OVERLAY CASE STUDIES: PASTRY, TAPESTRY
• Pastry is the message routing infrastructure deployed in several applications
including PAST [Druschel and Rowstron 2001], an archival (immutable) file
storage system implemented as a distributed hash table with the API in Figure
(d).
• Pastry has a straightforward but effective design.
• Tapestry is the basis for the OceanStore storage system.
• It has a more complex architecture than Pastry because it aims to support a
wider range of locality approaches.
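Pastry's "straightforward but effective" routing is based on GUID prefixes: at each hop the message is forwarded to a known node whose identifier shares a longer hexadecimal prefix with the target GUID than the current node does, so each hop resolves at least one more digit and routing normally completes in O(log N) hops. The helper below only illustrates this prefix-matching step; it is a hypothetical fragment, not Pastry's routing-table code.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits two GUIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current_guid: str, target_guid: str, known_guids: list[str]) -> str | None:
    """Pick a known node sharing a strictly longer prefix with the target
    than the current node does; None means the current node is 'closest'."""
    current_len = shared_prefix_len(current_guid, target_guid)
    candidates = [g for g in known_guids
                  if shared_prefix_len(g, target_guid) > current_len]
    return max(candidates, key=lambda g: shared_prefix_len(g, target_guid), default=None)

# Example with 4-digit hex GUIDs (real Pastry GUIDs are 128 bits)
print(next_hop("65a1", "d46a", ["d13d", "d4c2", "d462", "9a76"]))  # -> "d462"
```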