Placing Files on the Nodes of Peer-to-Peer Systems [Elektronische Ressource] / Sunantha Sodsee
102 Pages
English

Placing Files on the Nodes of Peer-to-Peer Systems [Elektronische Ressource] / Sunantha Sodsee

-

Downloading requires you to have access to the YouScribe library
Learn all about the services we offer

Description

Placing Files on the Nodesof Peer-to-Peer SystemsDissertationzur Erlangung des akademischen GradesDOKTOR-INGENIEURINder Fakult¨at fu¨r Mathematik und Informatikder FernUniversit¨at in HagenvonSunantha SodseeNonthaburi, ThailandHagen 2011Betreuer :Prof. Dr.-Ing. habil. Herwig UngerGutachter :Prof. Dr.-Ing. habil. Herwig Unger; FernUniversit¨at in HagenAsst. Prof. Dr. Phayung Meesad; King Mongkut’s University of Technology NorthBangkok, BangkokIIIAcknowledgementsThere are many people without whom I would have never been able to complete this thesis.First of all, I have to say Thank you to my father and mother, Phayung and Chalai Sodsee,fortheirencouragementandsupport. Myworkcouldnotbefinishedwithouttheirinspirationgiven to me.I highly appreciate to have the financial support of a DAAD Matching Funds grant for threeyears, what gave me the possibility to study and work in Germany. Moreover, I am thankfulto both FernUniversit¨at in Hagen as well as King Mongkut’s University of Technology NorthBangkok,sincetheircooperationgavemethechancetowritemythesisasthefirstbinationalPhD student of both institutions.I am grateful to my supervisors, Prof. Dr.-Ing. habil. Herwig Unger and Prof. Dr. PhayungMeesadfortheirpermanentsupportandhelpthroughoutthetimeofmyresearchandduringthe writing of this thesis. Furthermore, I received many hints, suggestions and support fromProf. Dr. Dr. Wolfgang A. Halang, which enhanced my academic research work and alsomy private life.

Subjects

Informations

Published by
Published 01 January 2012
Reads 25
Language English
Document size 5 MB

Exrait

Placing Files on the Nodes
of Peer-to-Peer Systems
Dissertation
zur Erlangung des akademischen Grades
DOKTOR-INGENIEURIN
der Fakult¨at fu¨r Mathematik und Informatik
der FernUniversit¨at in Hagen
von
Sunantha Sodsee
Nonthaburi, Thailand
Hagen 2011Betreuer :
Prof. Dr.-Ing. habil. Herwig Unger
Gutachter :
Prof. Dr.-Ing. habil. Herwig Unger; FernUniversit¨at in Hagen
Asst. Prof. Dr. Phayung Meesad; King Mongkut’s University of Technology North
Bangkok, BangkokIII
Acknowledgements
There are many people without whom I would have never been able to complete this thesis.
First of all, I have to say Thank you to my father and mother, Phayung and Chalai Sodsee,
fortheirencouragementandsupport. Myworkcouldnotbefinishedwithouttheirinspiration
given to me.
I highly appreciate to have the financial support of a DAAD Matching Funds grant for three
years, what gave me the possibility to study and work in Germany. Moreover, I am thankful
to both FernUniversit¨at in Hagen as well as King Mongkut’s University of Technology North
Bangkok,sincetheircooperationgavemethechancetowritemythesisasthefirstbinational
PhD student of both institutions.
I am grateful to my supervisors, Prof. Dr.-Ing. habil. Herwig Unger and Prof. Dr. Phayung
Meesadfortheirpermanentsupportandhelpthroughoutthetimeofmyresearchandduring
the writing of this thesis. Furthermore, I received many hints, suggestions and support from
Prof. Dr. Dr. Wolfgang A. Halang, which enhanced my academic research work and also
my private life. Last but not least, I could count at all times on many friends and colleagues
who accompanied me and gave me passion throughout partially difficult times: in particular
my landlord and landlady, Hans and Anna Siering, who taught me to speak German, offer
me a good second home as well as a nice living in Germany.IVContents V
Contents
Abstract VII
1 Introduction 1
1.1 Towards an Internet-based Data Management . . . . . . . . . . . . . . . 1
1.2 Decentralisation of Services . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contribution of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Outline of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 State of the Art 7
2.1 Content Delivery Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 P2P Systems for Content Distribution . . . . . . . . . . . . . . . . . . . . 11
2.3 Searching for Information . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Evaluation of Search Results . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Hubs and Authorities . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.2 PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.3 PageReputation . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Coordination and Self-organisation . . . . . . . . . . . . . . . . . . . . . 20
2.6 P2PNetSim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 On Random Walkers 25
3.1 Random Walkers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.1 Definition and Related Work . . . . . . . . . . . . . . . . . . . . . 25
3.1.2 Population of Random Walkers . . . . . . . . . . . . . . . . . . . 26
3.1.3 Ants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 PageRank Calculation with Random Walks . . . . . . . . . . . . . . . . . 31
3.2.1 General Principles . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Estimation of Network Size . . . . . . . . . . . . . . . . . . . . . 34
3.2.3 Convergence in Real Systems . . . . . . . . . . . . . . . . . . . . 37
3.2.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 A Generalised Node Evaluation 43
4.1 Influence of Network Parameters . . . . . . . . . . . . . . . . . . . . . . 43
4.2 NodeRank: an Extension of PageRank . . . . . . . . . . . . . . . . . . . 44
4.3 Simulation of NodeRank and its Properties . . . . . . . . . . . . . . . . . 45VI Contents
5 Evaluation of User Activities 48
5.1 Characterisation of User Activities . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Measuring and Propagating Node Activities . . . . . . . . . . . . . . . . . 50
5.2.1 General Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.2 Representation of Peer Activities . . . . . . . . . . . . . . . . . . 51
5.2.3 Identifying the Utilisation of Network Areas . . . . . . . . . . . . . 52
5.3 Activity-based Clustering of Peers . . . . . . . . . . . . . . . . . . . . . . 61
5.3.1 Planar or Plane-embedded Environments . . . . . . . . . . . . . . 61
5.3.2 Generalisation of Clustering Method . . . . . . . . . . . . . . . . . 65
5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 Application to Video-on-Demand Systems 74
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 Probability Functions for Picking and Depositing Files . . . . . . . . . . . 76
6.3 Calculation of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7 Conclusion and Future Work 82
7.1 Contribution and Review of Results . . . . . . . . . . . . . . . . . . . . . 82
7.2 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 83
Bibliography 85VII
Abstract
Content Delivery Networks(CDNs) aretoolsto distributea fast growing amount ofinforma-
tion to users. Ensuring compliance with the required Quality of Service (QoS) parameters
becomes a key problem for providers. To cope with insufficient bandwidth and capabilities
of single-server architectures, more and more distributed and even (hybrid) Peer-to-Peer
(P2P) system architectures are used for data servicing. A P2P system consists of a set
of fully identical, uniform node computers, called peers. Since a peer can be both, client
and server, any peer can also act as a server hosting a set of information. The tedious
task is now to decide on which place which contents shall be located, to guarantee short
response times for all user requests. This multicriterial decision cannot be made by a single
or a group of human administrators, since they are not able to oversee or control complex
network systems in their entirety.
In this thesis, a new parameter NodeRank is defined, and used as a single value to compare
allcapabilitiesofanoderelevantinprovidingservicetoallpeersinaP2Psystem. Compared
to PageRank known from the Internet search engine Google, this parameter does not only
evaluate the topological position of a node, but also its network and hardware resources as
well as the impact of user activities to it. To handle user activities, special locally working
recognition and propagation algorithms are devised.
Random walking of a population of walkers appears to be an appropriate tool to compute
NodeRankvaluesintherequired, fullydecentralisedmanner. Themethodsproposedusethe
effect that all considered factors will influence the random walkers’ transition probabilities at
each node. To adjust their impact, the parameters are combined in a linear or exponential
way. After all peers have been assigned to a corresponding service node each, a weighted
short-distance tree of all links leaving the service nodes is established as a side effect. It is
shown that such trees are useful to improve message routing, and provide the information
neededtomovetheservicenodes’functionstepbysteptowellbalanced,optimisedpositions.
Last but not least, a new algorithm, which controls random walkers to pick up, move
and drop files on positions guaranteeing optimal response times, is able to manage all file
operations in a decentralised system.
The implementation of the results within a Video-on-Demand environment leads to the
development of a concept for the first, fully adaptive CDN, which can automatically fulfill
most administration tasks. In this approach, random walkers constitute the universal ad-
ministration tool substituting for humans. Simulations of all suggested methods in differentVIII Abstract
scenariosunderlineitsefficiencyaswellasthefactthatupto80%ofallhumanmanagement
effort for CDNs can be saved by it.1
1 Introduction
1.1 Towards an Internet-based Data Management
Before the broad introduction of the Internet it was quite difficult to find out in a short
time whether any information about a given problem exists, and/or to obtain all available
information in a reasonable time. With the World Wide Web (WWW) this situation has
drastically changed in the late 1990s. Voluminous libraries with their huge amount of books
and catalogues could be stored on hard disks or other storage devices within comparably
small spaces, and accessed from users form all over the world due to a set of transparent
protocols within few milliseconds.
Nowadays, the users face exactly the opposite problem: About any topic a huge amount of
information is available, and it becomes hard to evaluate whether this information is from
reliable sources, valid and correct and can be used for further work. In addition, no user is
able anymore to oversee the almost infinite number of data sources.
Consequently,searchengineshavebeendevelopedtofilteroutrequestedinformationbyaset
of keywords entered by the user. Naturally, the selection of such search queries is a key issue
forthequalityofinformationretrievedandsignificantlyinfluencestheselectionofdocuments
found. Another problem is that any new developments or events must be described with
well established keywords, since newly emerging terms are not automatically propagated.
Nevertheless, for most topics even extensive experience is not enough to limit the number
of search results to a few, manageable ones. Consequently, scientists and practitioners dealt
with the problem of ranking search results, which might today be strongly influenced by
political and especially economic interests.
Despite all technical and technological progress, the download of documents (as well as the
use of any other services in general) may still take significant time. Some servers may even
be overloaded and break down during peak access times. One reason for this effect is the
irregular and more or less uncoordinated development of the Internet. In addition, the user
behaviour and, therefore, the traffic to be expected is hard or impossible to predict, and
the interests of users are changing in a similar, unpredictable manner. Consequently, to
the single user the whole network appears as a huge, chaotic, almost non-understandable
system, in which data or services are considered to be not available, if
• data/services are not available fast enough,2 1 Introduction
• data are not updated within suitable intervals and, consequently, appear as not up to
date,
• users are frightened by huge numbers of available alternatives,
• servers are not known or only accessible with complex management procedures or
• users cannot understand or cope with access methods.
Also, due to technical reasons, there always exists a gap between
• the bandwidth needed and the existing one to bring contents to the users within a
reasonable response time or
• the place where content is stored and needed by the users and
• available software that allows users to access the resources needed in a transparent
manner.
Figure 1.1: Mutual influences in a service network
As shown in Figure 1.1, there is a permanent, mutual influence between content, user and
user activities, and the network with its parameters and configuration. Modern computers
and computer networks shall be able to learn from the behaviour of the users, how to
adapt their configuration and services to the users’ needs and their environments. Content
Delivery Networks (CDN) are one approach to deliver any information with a respected
Quality of Service (QoS) to users. In this context, QoS mostly means to limit package
transfer times, latency and jitter as well as error rates to a still tolerable or minimal amount
(mostly following some real-time conditions). Video-on-Demand (VoD) systems are just one
example, where a huge amount of data must be transferred in real time to a high number
of users in front of their monitors.