BY
NANDAKUMAR P,
AP/CSE,
CAHCET
CONTENTS:
 Introduction
 Peer-to-Peer Systems
 Middleware, Routing Overlays
 Case Studies: Pastry, Tapestry
GOAL: To enable the sharing of data and resources on a very large scale by
eliminating any requirement for separately managed servers and their
associated infrastructure.
 Peer-to-peer systems aim to support useful distributed services and
applications using data and computing resources available in the personal
computers and workstations that are present in the Internet.
 Peer-to-peer applications have been characterized as ‘applications that exploit
resources available at the edges of the Internet – storage, cycles, content, human presence’.
PEER-TO-PEER SYSTEMS
CHARACTERISTICS
• Their design ensures that each user contributes resources to the system.
• Although they may differ in the resources that they contribute, all the nodes in
a peer-to-peer system have the same functional capabilities and responsibilities.
• Their correct operation does not depend on the existence of any centrally
administered systems.
• They can be designed to offer a limited degree of anonymity to the providers and
users of resources.
• A key issue for their efficient operation is the choice of an algorithm for the
placement of data across many hosts and subsequent access to it.
Figure (a): Distinctions between IP and overlay routing for
peer-to-peer applications

Scale
  IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much
  more generous (2^128), but addresses in both versions are hierarchically
  structured and much of the space is pre-allocated according to
  administrative requirements.
  Application-level routing overlay: Peer-to-peer systems can address more
  objects. The GUID name space is very large and flat (>2^128), allowing it
  to be much more fully occupied.

Load balancing
  IP: Loads on routers are determined by network topology and associated
  traffic patterns.
  Application-level routing overlay: Object locations can be randomized and
  hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
  IP: IP routing tables are updated asynchronously on a best-efforts basis
  with time constants on the order of 1 hour.
  Application-level routing overlay: Routing tables can be updated
  synchronously or asynchronously with fractions-of-a-second delays.

Fault tolerance
  IP: Redundancy is designed into the IP network by its managers, ensuring
  tolerance of a single router or network connectivity failure. n-fold
  replication is costly.
  Application-level routing overlay: Routes and object references can be
  replicated n-fold, ensuring tolerance of n failures of nodes or
  connections.

Target identification
  IP: Each IP address maps to exactly one target node.
  Application-level routing overlay: Messages can be routed to the nearest
  replica of a target object.

Security and anonymity
  IP: Addressing is only secure when all nodes are trusted. Anonymity for
  the owners of addresses is not achievable.
  Application-level routing overlay: Security can be achieved even in
  environments with limited trust. A limited degree of anonymity can be
  provided.
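The "Scale" and "Load balancing" rows can be illustrated with a minimal Python sketch. It assumes that GUIDs are derived by hashing an object's name and that an object is placed on the node whose ID is numerically closest to its GUID. The 128-bit width, the use of SHA-256, and the names guid_for and responsible_node are assumptions made for illustration, not part of any particular overlay.

```python
import hashlib

# Assumed GUID width for this sketch; the table only says the space is very large.
GUID_BITS = 128


def guid_for(name: str) -> int:
    """Map a name to a GUID by hashing it (SHA-256 truncated to GUID_BITS)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: GUID_BITS // 8], "big")


def responsible_node(guid: int, node_ids: list[int]) -> int:
    """Pick the node whose ID is numerically closest to the GUID (assumed rule)."""
    return min(node_ids, key=lambda node_id: abs(node_id - guid))


if __name__ == "__main__":
    # Size of the IPv4 address space versus a 128-bit flat name space.
    print(2**32, 2**GUID_BITS)

    # Hashing scatters objects over nodes regardless of network topology.
    nodes = [guid_for(f"node-{i}") for i in range(8)]
    counts = {node_id: 0 for node_id in nodes}
    for i in range(1000):
        counts[responsible_node(guid_for(f"song-{i}.mp3"), nodes)] += 1
    print(sorted(counts.values()))
```

Because the hash output is effectively random, object placement is unrelated to where nodes sit in the network, which is the sense in which traffic patterns are divorced from topology.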
NAPSTER
 The first large-scale peer-to-peer network was Napster, set up in 1999 to share digital
music files over the Internet.
 While Napster maintained centralized (and replicated) indices, the music files were
created and made available by individuals, usually with music copied from CDs to
computer files.
 Music content owners sued Napster for copyright violations and succeeded in
shutting down the service. Figure (b) documents the process of requesting a music file
from Napster.
Figure (b): Napster: peer-to-peer file sharing with a
centralized, replicated index
1. File location request (peer to Napster server index)
2. List of peers offering the file (Napster server to peer)
3. File request (peer to peer)
4. File delivered (peer to peer)
5. Index update (peer to Napster server index)
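The five steps of Figure (b) can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not Napster's actual protocol: the class names NapsterIndex and Peer, their methods, and the in-memory dictionaries standing in for network messages are all hypothetical.

```python
class NapsterIndex:
    """Centralized index: maps a file name to the addresses of peers offering it."""

    def __init__(self):
        self._peers_by_file = {}

    def register(self, filename, peer_address):
        # Step 5: index update.
        offers = self._peers_by_file.setdefault(filename, [])
        if peer_address not in offers:
            offers.append(peer_address)

    def locate(self, filename):
        # Steps 1 and 2: file location request, answered with a list of peers.
        return list(self._peers_by_file.get(filename, []))


class Peer:
    def __init__(self, address, index):
        self.address = address
        self.index = index
        self.files = {}

    def share(self, filename, content):
        """Store a file locally and announce it to the index."""
        self.files[filename] = content
        self.index.register(filename, self.address)

    def serve(self, filename):
        # Step 4: file delivered directly by the peer, not by the server.
        return self.files[filename]

    def download(self, filename, peers_by_address):
        """Locate a file via the index, then fetch it from another peer."""
        offers = [a for a in self.index.locate(filename) if a != self.address]
        if not offers:
            return None
        source = peers_by_address[offers[0]]
        content = source.serve(filename)  # Step 3: file request to the peer.
        self.share(filename, content)     # Step 5: this peer now offers it too.
        return content
```

The sketch makes the design point visible: only the index is centralized, while file content moves between peers.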
Peer-to-peer middleware
• The third generation of peer-to-peer systems is characterized by the emergence of middleware
layers for the application-independent management of distributed resources
on a global scale.
• Peer-to-peer middleware systems are designed specifically to meet the
need for the automatic placement and subsequent location of the
distributed objects managed by peer-to-peer systems and applications.
• The best-known and most fully developed examples include Pastry
[Rowstron and Druschel 2001], Tapestry [Zhao et al. 2004], CAN
[Ratnasamy et al. 2001], Chord [Stoica et al. 2001] and Kademlia
[Maymounkov and Mazieres 2002].
Functional requirements:
• The function of peer-to-peer middleware is to simplify the construction of
services that are implemented across many hosts in a widely distributed
network.
• Other important requirements include the ability to add new resources and to
remove them at will and to add hosts to the service and remove them.
• Like other middleware, peer-to-peer middleware should offer a simple
programming interface to application programmers that is independent of the
types of distributed resource that the application manipulates.
Non-functional requirements:
• Global scalability
• Load balancing
• Optimization for local interactions between neighboring peers
• Accommodation of highly dynamic host availability
ROUTING OVERLAYS
• In peer-to-peer systems a distributed algorithm known as a routing overlay takes
responsibility for locating nodes and objects.
• The routing overlay ensures that any node can access any object by routing each
request through a sequence of nodes, exploiting knowledge at each of them to locate the
destination object.
• Peer-to-peer systems usually store multiple replicas of objects to ensure availability.
Main tasks:
• Routing of requests to objects
• Insertion of objects
• Deletion of objects
• Node addition and removal
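The idea of routing a request through a sequence of nodes, each using only its own partial knowledge, can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are arranged in a line and know only their immediate neighbours, and the node numerically closest to a GUID is treated as responsible for it. The names OverlayNode, build_overlay, insert, lookup and delete are hypothetical, and real overlays use richer routing tables to reach the destination in far fewer hops.

```python
class OverlayNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbours = {}  # partial routing knowledge: node ID -> OverlayNode
        self.objects = {}     # GUID -> data held by this node

    def next_hop(self, guid):
        """Choose the known node closest to the GUID; this node wins ties."""
        candidates = [self] + list(self.neighbours.values())
        return min(candidates, key=lambda n: abs(n.node_id - guid))

    def route(self, guid):
        """Forward hop by hop; return (responsible node, list of IDs visited)."""
        node, path = self, [self.node_id]
        while True:
            nxt = node.next_hop(guid)
            if nxt is node:  # no known node is closer, so deliver here
                return node, path
            node = nxt
            path.append(node.node_id)


def build_overlay(ids):
    """Link nodes in ID order so that each knows only its immediate neighbours."""
    nodes = [OverlayNode(i) for i in sorted(ids)]
    for left, right in zip(nodes, nodes[1:]):
        left.neighbours[right.node_id] = right
        right.neighbours[left.node_id] = left
    return {n.node_id: n for n in nodes}


def insert(start, guid, data):
    dest, _ = start.route(guid)
    dest.objects[guid] = data


def lookup(start, guid):
    dest, _ = start.route(guid)
    return dest.objects.get(guid)


def delete(start, guid):
    dest, _ = start.route(guid)
    dest.objects.pop(guid, None)
```

Any node can serve as the starting point: insert, lookup and delete all route to the same responsible node, so an object stored from one node can be found from any other.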
Figure (c): Distribution of information in a routing overlay
(Diagram showing objects and nodes; nodes A, B, C and D each hold their own
partial routing knowledge: A's, B's, C's and D's routing knowledge.)
Figure (d): Basic programming interface for a distributed
hash table (DHT) as implemented by the PAST API over
Pastry
put(GUID, data)
The data is stored in replicas at all nodes responsible for the object identified
by GUID.
remove(GUID)
Deletes all references to GUID and the associated data.
value = get(GUID)
The data associated with GUID is retrieved from one of the nodes responsible
for it.
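The three operations of Figure (d) can be mimicked by a small in-memory Python sketch. It is not the PAST API itself: the class SimpleDHT, the replicas parameter, the failed set used to simulate unavailable nodes, and the rule that the numerically closest node IDs are responsible for a GUID are all assumptions made for illustration.

```python
class SimpleDHT:
    """In-memory stand-in for a DHT; each node is modelled as a dictionary."""

    def __init__(self, node_ids, replicas=3):
        self.stores = {node_id: {} for node_id in node_ids}
        self.replicas = replicas
        self.failed = set()  # node IDs treated as unavailable

    def _responsible(self, guid):
        """Assumed rule: the `replicas` node IDs numerically closest to the GUID."""
        ranked = sorted(self.stores, key=lambda node_id: abs(node_id - guid))
        return ranked[: self.replicas]

    def put(self, guid, data):
        # The data is stored in replicas at all nodes responsible for the GUID.
        for node_id in self._responsible(guid):
            self.stores[node_id][guid] = data

    def get(self, guid):
        # The data is retrieved from one of the responsible nodes that is available.
        for node_id in self._responsible(guid):
            if node_id not in self.failed and guid in self.stores[node_id]:
                return self.stores[node_id][guid]
        return None

    def remove(self, guid):
        # Deletes all references to the GUID and the associated data.
        for store in self.stores.values():
            store.pop(guid, None)
```

For example, with nodes 10 to 50 and two replicas, a value stored under GUID 24 lands on nodes 20 and 30, so get still succeeds if node 20 is marked as failed.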
Figure (e): Basic programming interface for distributed object
location and routing (DOLR) as implemented by Tapestry
publish(GUID)
GUID can be computed from the object (or some part of it, e.g. its
name). This function makes the node performing a publish operation
the host for the object corresponding to GUID.
unpublish(GUID)
Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
Following the object-oriented paradigm, an invocation message is sent
to an object in order to access it. This might be a request to open a TCP
connection for data transfer or to return a message containing all or
part of the object’s state. The final optional parameter [n], if present,
requests the delivery of the same message to n replicas of the object.
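The DOLR operations of Figure (e) can be sketched in the same style. This is an illustration under stated assumptions rather than Tapestry's implementation: the class DOLRNode is hypothetical, a shared dictionary stands in for the location mappings that a real overlay distributes across nodes, replicas are chosen in publication order rather than by network proximity, and unpublish here withdraws only the calling node's replica.

```python
class DOLRNode:
    def __init__(self, name, directory):
        self.name = name
        # Shared dict GUID -> list of hosting nodes; a stand-in for the
        # distributed location mappings of a real overlay.
        self.directory = directory
        self.inbox = []  # (GUID, message) pairs delivered to this host

    def publish(self, guid):
        """Make this node a host for the object identified by guid."""
        hosts = self.directory.setdefault(guid, [])
        if self not in hosts:
            hosts.append(self)

    def unpublish(self, guid):
        """Withdraw this node's replica; the object is gone once no host remains."""
        hosts = self.directory.get(guid, [])
        if self in hosts:
            hosts.remove(self)
        if not hosts:
            self.directory.pop(guid, None)

    def send_to_obj(self, msg, guid, n=1):
        """Deliver msg to up to n replicas of the object; return the host names."""
        replicas = self.directory.get(guid, [])[:n]
        for host in replicas:
            host.inbox.append((guid, msg))
        return [host.name for host in replicas]
```

Unlike the DHT sketch, the data stays on the nodes that publish it; the overlay only records where replicas live and routes messages to them.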
OVERLAY CASE STUDIES: PASTRY, TAPESTRY
• Pastry is the message routing infrastructure deployed in several applications
including PAST [Druschel and Rowstron 2001], an archival (immutable) file
storage system implemented as a distributed hash table with the API in Figure
(d).
• Pastry has a straightforward but effective design.
• Tapestry is the basis for the OceanStore storage system.
• It has a more complex architecture than Pastry because it aims to support a
wider range of locality approaches.
Unit 3 cs6601 Distributed Systems
