145 Pages
English

Top-k retrieval in peer to peer networks [Elektronische Ressource] / von Uwe Thaden

-

Gain access to the library to view online
Learn more

Description

TOP-K RETRIEVAL IN PEER TO PEER NETWORKSVon der Fakultat¨ fur¨ Elektrotechnik und Informatikder Universit¨ at Hannoverzur Erlangung des Grades einesDOKTORS DER NATURWISSENSCHAFTENDr.rer.nat.genehmigte Dissertation vonDipl.-Inform. Uwe Thadengeboren am 16. Juni 1973 in B¨ orger2005Referent: Prof. Dr. techn. Wolfgang NejdlKo-Referent: Prof. Dr.-Ing. Christian GrimmTag der Promotion: 12. Dezember 2005Im Lichte bereits erlangter Erkenntnis erscheint das gluc¨ klich Erreichte fast wie selbstverst¨andlich,und jeder intelligente Student erfaßt es ohne zu große Mu¨he. Aber das ahnungsvolle, Jahrew¨ ahrende Suchen im Dunkeln mit seiner gespannten Sehnsucht, seiner Abwechslung von Zuver-sicht und Ermattung und seinem endlichen Durchbrechen zur Klarheit, das kennt nur, wer es sel-ber erlebt hat.Albert Einstein, ‘Mein Weltbild’, 1934iACKNOWLEDGEMENTSScientific work and especially the creation of a PhD-thesis is not done isolated fromother people. I want to thank my dissertation advisor Prof. Dr. techn. WolfgangNejdl for his support and for giving me the needed freedom to work the way Iwanted. I wish to thank the co-referent Prof. Dr.-Ing. Christian Grimm for his timeand comments! A thank you goes to my colleagues at KBS and L3S; just to name afew who were great partners in discussions and always willing to help: Wolf Siberski,Dr. Martin Wolpers, Dr. Wolf-Tilo Balke, Prof. Dr. Friedrich Steimann, Dr.

Subjects

Informations

Published by
Published 01 January 2005
Reads 6
Language English
Document size 2 MB

TOP-K RETRIEVAL IN PEER TO PEER NETWORKS
Von der Fakultat¨ fur¨ Elektrotechnik und Informatik
der Universit¨ at Hannover
zur Erlangung des Grades eines
DOKTORS DER NATURWISSENSCHAFTEN
Dr.rer.nat.
genehmigte Dissertation von
Dipl.-Inform. Uwe Thaden
geboren am 16. Juni 1973 in B¨ orger
2005Referent: Prof. Dr. techn. Wolfgang Nejdl
Ko-Referent: Prof. Dr.-Ing. Christian Grimm
Tag der Promotion: 12. Dezember 2005Im Lichte bereits erlangter Erkenntnis erscheint das gluc¨ klich Erreichte fast wie selbstverst¨andlich,
und jeder intelligente Student erfaßt es ohne zu große Mu¨he. Aber das ahnungsvolle, Jahre
w¨ ahrende Suchen im Dunkeln mit seiner gespannten Sehnsucht, seiner Abwechslung von Zuver-
sicht und Ermattung und seinem endlichen Durchbrechen zur Klarheit, das kennt nur, wer es sel-
ber erlebt hat.
Albert Einstein, ‘Mein Weltbild’, 1934
iACKNOWLEDGEMENTS
Scientific work and especially the creation of a PhD-thesis is not done isolated from
other people. I want to thank my dissertation advisor Prof. Dr. techn. Wolfgang
Nejdl for his support and for giving me the needed freedom to work the way I
wanted. I wish to thank the co-referent Prof. Dr.-Ing. Christian Grimm for his time
and comments! A thank you goes to my colleagues at KBS and L3S; just to name a
few who were great partners in discussions and always willing to help: Wolf Siberski,
Dr. Martin Wolpers, Dr. Wolf-Tilo Balke, Prof. Dr. Friedrich Steimann, Dr. Jorg¨
Diederich, Fabian Leitritz, and Franziska Pfeffer and Katia Cappelli.
On the other hand I wish to thank my parents who always supported me in everything
I did. And last but not least a big kiss to my wife Carola and to my daughter Claire
Marie who gave me their love and help to encourage me every day — I love you!
iiABSTRACT
Top-k Retrieval in Peer to Peer Networks
in
English
Distributed information systems have become a very important technology, both in
research and industry. Especially peer to peer networks made their way from sim-
ple file sharing systems to matured systems in different contexts. When querying,
users are interested in only a few results which match best for their query. From
the information retrieval some techniques for ranked retrieval are well known. Most
of them rely on information which is gathered from parts of or the whole document
collection; this is called collection wide information. The first chapters of this thesis
present the background from peer to peer and information retrieval; state of the art
systems for peer to peer systems information retrieval systens are presented together
with their individual drawbacks, leading to the main contribution of this thesis: A
new algorithm ProToRaDo, which offers distributed ranking with respect to col-
lection wide information. The algorithm will be discussed, mathematically proved
and evaluated using an implemented simulation framework.
Keywords: Peer to Peer, Top-k, Information Retrieval
iiiABSTRACT
Top-k Retrieval in Peer to Peer Networks
in
Deutsch
Verteilte Informationssysteme, insbesondere Peer to Peer-Netzwerke, bekommen eine
immer st¨arker werdende Bedeutung. Diese Arbeit stellt die Entwicklung von verteil-
ten Retrieval- und Ranking-Methoden in Peer to Peer-Netzwerken dar. Ausgehend
von der Grundlagendarstellung von Peer to Peer werden aktuelle Peer to Peer-
Overlay Netzwerke diskutiert und im Kontext ‘Information Retrieval’ untersucht.
Eine wesentliche Herausforderung fur¨ das Ranking in Peer to Peer-Netzwerken ist
die Nutzung von bekannten Verfahren aus dem Information Retrieval, die Infor-
mationen erfordern, die aus großen Teilen oder der gesamten Menge von Informa-
tionsquellen hervorgehen. Solche Informationen werden collection wide information
genannt. Bekannte Information Retrieval-Techniken werden im Peer to Peer-Kontext
diskutiert; darauf aufbauend werden bestehende L¨ osungen aus dem Peer to Peer-
Bereich dargestellt. Der Schwerpunkt meiner Arbeit wird somit in der Darstellung
des entwickelten Algorithmus ProToRaDo f¨ur verteiltes Ranking unter Einbezug
von collection wide information in Peer to Peer-Netzwerken liegen. ProToRaDo wird
unter Peer to Peer- und Information Retrieval-Aspekten diskutiert, mathematisch
belegt und mit Simulationsergebnissen evaluiert.
Schlagworte: Peer to Peer, Top-k, Information Retrieval
ivTABLE OF CONTENTS
ACKNOWLEDGEMENTS .................................. ii
LIST OF FIGURES ...................................... viii
LIST OF TABLES ....................................... x
CHAPTER
I. Introduction ....................................... 1
1.1 Motivation..................................... 2
1.2 ProblemStatementandResearchChalenges.................. 3
1.3 ContributionofthisWork ............................ 4
1.4 ThesisOutline................................... 5
II. Peer to Peer Networks................................. 7
2.1 Introduction.................................... 8
2.1.1 History.................................. 10
2.1.2 NonFunctionalRequirements..................... 10
2.1.3 WhatisPertoPer?......................... 11
2.1.3.1 PertoPerandClient/Server............. 13
2.1.3.2 Peer to Peer and Distributed Databases . . . . . . . . . 14
2.1.3.3 Peer to Peer and Grids . . . . . . . . . . . . . . . . . . 15
2.1.4 OverlayNetworks............................ 16
2.2 ExistingSystems/Approaches.......................... 17
2.2.1 ClasificationofPertoPerNetworks................ 18
2.2.2 BasicTopologiesandRouting..................... 20
2.2.2.1 Unstructured Peer to Peer Networks . . . . . . . . . . . 21
2.2.2.2 Structured, DHT Networks . . . . . . . . . . . . . . . . 25
2.2.2.3 Structured, Non DHT Networks . . . . . . . . . . . . . 36
III. Search in Peer to Peer Networks .......................... 48
3.1 Background: Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.1 History.................................. 51
3.1.2 InformationRetrieval.......................... 53
3.1.3 Databases and Information Retrieval . . . . . . . . . . . . . . . . . 53
3.1.4 Modeling ................................ 55
3.1.4.1 BuildingtheIndex..................... 56
3.1.4.2 Working with the Index . . . . . . . . . . . . . . . . . . 59
3.1.4.3 Measurements/Evaluation................ 65
v3.1.5 SummaryandConclusion....................... 66
3.2 InformationRetrievalinPertoPerNetworks ................ 67
3.2.1 Chalenges................................ 67
3.2.2 ExistingSystems/Approaches.................... 67
3.2.2.1 PlanetP........................... 68
3.2.2.2 Rumorama......................... 69
3.2.2.3 Minerva........................... 70
3.2.2.4 Adlib............................ 71
3.2.2.5 PerSearch......................... 71
3.2.2.6 Pier............................. 72
3.2.2.7 PIRS............................ 73
3.2.2.8 OtherSystems....................... 74
3.2.3 OpenProblems............................. 76
IV. An Algorithm for Top-k Retrieval in Structured Peer to Peer Networks.77
4.1 Whatischalenging?............................... 78
4.1.1 Example................................. 79
4.1.2 ConcreteChalenges.......................... 80
4.2 Ingredients..................................... 81
4.2.1 Overview ................................ 81
4.2.1.1 Storage, Distribution, and Computation . . . . . . . . . 81
4.2.1.2 BasicStrategy....................... 83
4.3 AlgorithmDetails................................. 84
4.3.1 Top-kRetrievalAlgorithm....................... 84
4.3.2 CorectnesandOptimalityResults.................. 91
4.3.3 ExtendingtoContentandClassification............... 95
4.3.3.1 Example: News Corpora in Digital Libraries . . . . . . 95
4.3.3.2 QueryProcesing..................... 95
4.3.4 IDFIndexEntryExpiration...................... 98
4.3.5 QueryRoutingIndexes......................... 99
V. Evaluation ........................................100
5.1 SimulatorDesign .................................101
5.1.1 ExistingSimulators...........................101
5.1.2 SimulatorImplementation.......................102
5.1.2.1 Query- and Resource Description . . . . . . . . . . . . 103
5.1.2.2 Super Peer based Topology . . . . . . . . . . . . . . . . 103
5.1.2.3 Connections (Network Characteristics) . . . . . . . . . 103
5.2 Evaluation:Content................................104
5.2.1 DataSetup...............................104
5.2.1.1 Top-10...........................105
5.2.1.2 Top-1............................105
5.2.1.3 Staticscenario.......................105
5.2.1.4 Dynamic scenario . . . . . . . . . . . . . . . . . . . . . 105
5.2.2 Hypotheses...............................106
5.2.3 Results..................................106
5.2.3.1 Staticscenario.......................106
5.2.3.2 Dynamic scenario . . . . . . . . . . . . . . . . . . . . . 107
5.2.4 Discusion................................109
5.2.4.1 Estimated vs. measured contacted peers . . . . . . . . 109
vi5.3 Evaluation:ContentandClassification.....................110
5.3.1 DataSetup...............................110
5.3.2 Results..................................111
5.3.2.1 Indexsize..........................111
5.3.2.2 IndexEffectivity......................111
VI. Summary and Outlook.................................114
6.1 Summary......................................114
6.2 OutlookandFutureWork ............................115
APPENDICES.......................................... 119
A CuriculumVitæ .................................120
B Publications....................................121
BIBLIOGRAPHY........................................ 123
viiLIST OF FIGURES
Figure
1.1 Querying.......................................... 3
2.1 Overlaynetwork ..................................... 17
2.2 Classification of peer to peer networks (the colors depict the generation aspect) . . 19
2.3 UserinterfaceofNapsterclient............................. 23
2.4 QueryingNapster..................................... 24
2.5 FloodinginGnutela................................... 25
2.6 DistributedHashTable;adaptedfrom[83]....................... 26
2.7 IdentifiersinaChordring................................ 27
2.8 Chordfingertable .................................... 28
2.9 Pastry: Routing from peer 37A0F1 with key B57B2D . . . . . . . . . . . . . . . . . 31
2.10 ZonesplitinginCANonnodearival......................... 32
2.11 Route from node 1 to a key with coordinate (x, y) in a 2-dimensional CAN topology 33
2.12 CANasrandomtre................................... 33
2.13 A P-Grid tree. The arrow shows the mapping from a key to a peer. . . . . . . . . . 35
2.14 Asuperpeernetwork................................... 39
2.15 HyperCuP......................................... 41
2.16 Network topology construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.17 Networktopologyconstructioncontinued ....................... 44
2.18 Networktopologyconstructioncontinued ....................... 45
2.19 Networktopologyconstructioncontinued ....................... 46
2.20 AHypercubeanditsspanningtre ........................... 47
viii