Searching in P2P networks
Mohamed Elsharnouby - Istanbul Sehir University
P2P networks
Structured:
- CAN
- Chord
- Tapestry
- Pastry
- Viceroy
Unstructured:
- Freenet
- Gnutella
- BitTorrent
Structured
Pros:
- Can find any resource, even a rare one
- Search is more efficient because it exploits the structure
Cons:
- Not as robust or resilient as unstructured networks
- Overhead of maintaining the structure as peers join and leave
Unstructured
Pros:
- More resilient to failures
- Better handling of joining/leaving peers
- Allow better optimization of routing by changing the overlay structure
Cons:
- Rare resources are harder to find, if they are found at all
- Searching can flood and overload the whole network
Search in Structured Networks
Content Addressable Network (CAN)
CAN
Multidimensional Cartesian coordinate space on a d-torus
Each peer has a neighbour list
Routing performance is O(d × N^(1/d)), where d is the number of dimensions and N the number of peers
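The routing cost can be illustrated with a small sketch. It assumes, for simplicity, that the N peers own equally sized zones arranged as a grid with `side` zones per axis (so N = side^d); real CAN zones are not guaranteed to be this regular. The function names are hypothetical.

```python
from itertools import product

def torus_delta(a, b, side):
    """Signed shortest number of steps from a to b along one wrapped axis."""
    diff = (b - a) % side
    return diff if diff <= side // 2 else diff - side

def greedy_route(src, dst, side):
    """Forward zone by zone, each hop moving one step closer to dst."""
    path = [src]
    cur = list(src)
    while tuple(cur) != dst:
        deltas = [torus_delta(c, t, side) for c, t in zip(cur, dst)]
        # Assumption: step along the axis with the largest remaining distance.
        axis = max(range(len(cur)), key=lambda i: abs(deltas[i]))
        cur[axis] = (cur[axis] + (1 if deltas[axis] > 0 else -1)) % side
        path.append(tuple(cur))
    return path

def average_hops(d, side):
    """Average hop count from one zone to every zone of a side^d grid."""
    nodes = list(product(range(side), repeat=d))
    total = sum(len(greedy_route(nodes[0], n, side)) - 1 for n in nodes)
    return total / len(nodes)
```

With d = 2 and side = 4 (N = 16), `average_hops(2, 4)` gives 2.0 hops, which grows with d × N^(1/d) as the bound above suggests.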
CAN
Joining: the new peer splits an existing peer's zone in half and takes over one half
Neighbour list: derived from the old peer's list, then updated on all neighbouring peers
Leaving: a neighbouring peer takes over the zone and the neighbour lists are updated
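A minimal sketch of the zone split on join. It assumes a zone is a list of (lo, hi) intervals, one per axis, and that the split is made along the longest axis; both are illustrative choices, and the neighbour-list update is left out.

```python
def split_zone(zone):
    """Split a zone in half along its longest axis (an assumed policy)."""
    axis = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    old_half = list(zone)
    new_half = list(zone)
    old_half[axis] = (lo, mid)
    new_half[axis] = (mid, hi)
    return old_half, new_half

def contains(zone, point):
    """True if the point lies inside the zone (upper bounds are exclusive)."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

def join(owner_zone, owner_items):
    """Return (old zone, old items, new zone, new items) after a peer joins."""
    old_zone, new_zone = split_zone(owner_zone)
    # Items whose coordinates fall into the new half move to the new peer.
    new_items = {p: v for p, v in owner_items.items() if contains(new_zone, p)}
    old_items = {p: v for p, v in owner_items.items() if p not in new_items}
    return old_zone, old_items, new_zone, new_items
```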
CAN improvements
Multiple coordinate spaces (realities): each peer owns a different zone in each reality, while data maps to the same point in all of them
Increasing dimensions: shortens routing paths more than adding realities does, but realities improve data availability, so both are needed
Overloading zones (several peers per zone): more data availability, fault tolerance, shorter routing
Topological awareness of the underlying IP network
Using multiple hash functions: increases data availability
Chord
Chord
Peers are organized around a circle
according to their ID which is an m-bit
ID assigned by a uniform hashing
function
Each data item is assigned an ID on
the same circle and assigned to its
successor peer
Routing takes O(log N) if peer
information is up to date
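A small sketch of how IDs and key ownership could be computed. SHA-1 as the hashing function and m = 8 are assumptions for illustration; `chord_id` and `successor` are hypothetical helper names.

```python
import hashlib
from bisect import bisect_left

M = 8  # assumed number of ID bits; the ring has 2**M positions

def chord_id(name, m=M):
    """Hash a peer address or data key onto the m-bit identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(sorted_peer_ids, ident):
    """First peer ID at or after ident, wrapping around the circle."""
    i = bisect_left(sorted_peer_ids, ident)
    return sorted_peer_ids[i % len(sorted_peer_ids)]
```

For peers with IDs 10, 80 and 200, a data item with ID 50 is stored on peer 80, and an item with ID 201 wraps around to peer 10.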
Chord
Each peer n keeps a finger table: entry i points to the successor of ID n + 2^(i-1), so the covered distances grow by powers of two [ hence the O(log N) routing ]
Resilience is increased by maintaining another list of length r of the peer's direct successors
Joining and leaving: successor pointers need to be updated, which is done by a stabilization protocol that runs periodically in the background
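The finger-table lookup can be sketched as below. This is a simulation over a global, sorted list of peer IDs with up-to-date tables (no joins, failures or stabilization), and m = 6 is an assumed ring size; the helper names are hypothetical.

```python
from bisect import bisect_left

M = 6            # assumed number of ID bits
RING = 2 ** M

def successor(peers, ident):
    """First peer ID at or after ident on the circle (peers must be sorted)."""
    i = bisect_left(peers, ident % RING)
    return peers[i % len(peers)]

def finger_table(peers, n):
    """Finger i (0-based) is the successor of n + 2**i."""
    return [successor(peers, n + 2 ** i) for i in range(M)]

def cw(a, b):
    """Clockwise distance from a to b, in the range 1..RING."""
    return (b - a) % RING or RING

def lookup(peers, start, key):
    """Return (peer responsible for key, number of forwarding hops)."""
    n, hops = start, 0
    while True:
        fingers = finger_table(peers, n)
        succ = fingers[0]
        if cw(n, key) <= cw(n, succ):      # key lies in (n, succ]
            return succ, hops
        # Forward to the closest finger that still precedes the key.
        preceding = [f for f in fingers if cw(n, f) < cw(n, key)]
        n = max(preceding, key=lambda f: cw(n, f))
        hops += 1
```

With peers 1, 8, 14, 21, 32, 38, 42, 48, 51, 56, a lookup for key 54 starting at peer 8 is forwarded via 42 and 51 and ends at peer 56.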
Chord
Routing needs O(log N) hops, much better than CAN
Joining or leaving needs O(log^2 N) messages, which is worse than CAN, which requires O(2 × d)
Could borrow some of CAN's improvement ideas, such as multiple realities
Cannot take into account IP topology
Tapestry
Tapestry
The nth peer that the message
reaches shares a suffix of at least
length n with destination ID
Routing takes O(log_b N), where b is the base of the IDs
Uses multiple roots for each data
object to avoid single points of failure
Robustness is increased by making
the neighbour map maintain two
backup peers in addition to the
primary ones
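The suffix-matching rule can be sketched as follows. The sketch assumes equal-length IDs, a globally known node set that contains the destination, and an arbitrary tie-break; a real node would consult only its own neighbour map.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits that two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(nodes, src, dst):
    """Each hop moves to a node matching at least one more suffix digit of dst."""
    path = [src]
    cur = src
    while cur != dst:
        need = shared_suffix_len(cur, dst) + 1
        candidates = [x for x in nodes if shared_suffix_len(x, dst) >= need]
        # Assumed tie-break: shortest sufficient match, then lowest ID.
        cur = min(candidates, key=lambda x: (shared_suffix_len(x, dst), x))
        path.append(cur)
    return path
```

Routing from 0325 to 4598 through nodes B4F8, 9098 and 7598 resolves the suffixes 8, 98, 598 and 4598 in turn, one more digit per hop.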
Pastry
Pastry
Same as Tapestry
Doesn’t have optimization for locality
of peers
Less efficient replication algorithm
Viceroy
Viceroy
- General Ring: every node is connected to its successor and predecessor
- Level Ring: every node is connected to the other nodes of its own level in a ring
- Butterfly: every node at level L has:
- A down-right edge to a long-range node at level L+1
- A down-left edge to a close node at level L+1
- An up edge to a close node at level L-1
Routing performance is O(log N)
Search in Unstructured Networks
Freenet
Freenet
It searches with steepest-ascent hill climbing with backtracking
It caches the found file on the peers along the path => improved routing
Anonymity is one of the main properties of the network
Least Recently Used (LRU) is the basic cache replacement algorithm
An enhanced cache replacement algorithm could be used instead of LRU
Freenet
Enhanced-clustering with Random Shortcut
It applies the small-world concept: when the cache is full, it picks the farthest node in the cache as the replacement candidate
If the newly added node is closer, it replaces the farthest node in the cache
If it is farther, it replaces the farthest node only with a certain probability
The choice of the optimum probability is still an open question
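A rough sketch of this replacement policy. It assumes that cache entries are numeric keys, that "farthest" is measured as circular distance from the peer's own key (called `seed` here), and that `p` is the replacement probability; these names and the distance measure are illustrative assumptions.

```python
import random

def distance(a, b, key_space):
    """Circular distance between two keys (an assumed distance measure)."""
    d = abs(a - b) % key_space
    return min(d, key_space - d)

def insert_key(cache, capacity, seed, new_key, p, key_space=2 ** 16, rng=random):
    """Add new_key to the cache set; return the evicted key, or None."""
    if new_key in cache:
        return None
    if len(cache) < capacity:
        cache.add(new_key)
        return None
    farthest = max(cache, key=lambda k: distance(k, seed, key_space))
    closer = distance(new_key, seed, key_space) < distance(farthest, seed, key_space)
    # Clustering: closer keys always replace the farthest entry.
    # Random shortcut: farther keys replace it with probability p.
    if closer or rng.random() < p:
        cache.remove(farthest)
        cache.add(new_key)
        return farthest
    return None
```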
Gnutella
Gnutella
Routing through the network is mainly done by flooding (BFS), with a TTL that limits the number of hops
This overloads the network when too many nodes join
To join, a client connects to one of the peers and broadcasts its content by flooding as well
Ultra peers with higher bandwidth were introduced to carry the network routing and search operations for their leaves
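TTL-limited flooding can be sketched as a breadth-first search over the overlay. The adjacency-dict graph, the `has_file` predicate and the message counting (one message per forwarded copy, including copies sent to peers that already saw the query) are simplifying assumptions.

```python
from collections import deque

def flood_search(graph, start, has_file, ttl):
    """Return (peers holding the file that were reached, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if has_file(peer):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL expired: the query is not forwarded further
        for neighbour in graph[peer]:
            messages += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, messages
```

The message count grows quickly with the TTL and the node degree, which is the overload problem described above.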
BitTorrent
BitTorrent
A centralized P2P system
It cuts files into pieces of fixed size (256 Kbytes each) and hashes them with SHA-1 to confirm the integrity of the data
A client needs to connect to a tracker, which gives the client a random set of peers that have the needed file
A downloaded piece can in turn be seeded to other peers
DHT support introduced trackerless BitTorrent
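A minimal sketch of the piece hashing, using the 256 Kbyte piece size from the slide. The function names are hypothetical, and this is not the torrent file format itself.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # piece size taken from the slide

def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def verify_piece(index, piece, hashes):
    """Check a downloaded piece against the expected digest."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A peer that receives a piece would call `verify_piece` before sharing it further, so corrupted data is not propagated.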
Questions?
Thank you

Peer to peer network schemes and finding algorithms
