Exploiting tag information for search and personalization [Elektronische Ressource] / Raluca Paiu
172 Pages
English


Published 01 January 2010

EXPLOITING TAG INFORMATION FOR SEARCH
AND PERSONALIZATION
Dissertation approved by the Faculty of Electrical Engineering and Computer Science
of Gottfried Wilhelm Leibniz Universität Hannover
for the degree of
DOCTOR OF NATURAL SCIENCES
Dr. rer. nat.
by
Dipl.-Ing. Raluca Paiu
born on 2 December 1980 in Bucharest, Romania
2009

Referee: Prof. Dr. Wolfgang Nejdl
Co-referee: Prof. Dr. Heribert Vollmer
Date of doctoral examination: 18 November 2009

ABSTRACT
With the increasing popularity of Web 2.0 sites, the amount of content available online
is multiplying at a rapid rate, at the same time becoming also more diverse in terms of
content types (pictures, music, Web pages, etc.) and quality. Professional and
user-generated content are quite tightly merged together, such that for users it becomes
difficult to spot only the high quality items perfectly matching their interests or current
needs. On the other hand, collaborative tagging has become an increasingly popular means for
sharing and organizing Web resources, leading to a huge amount of user generated metadata.
Some previous works started to make use of this metadata for various purposes, though for
improving users’ access to information it is not yet obvious whether and how these tags or
subsets of them can be used. In this thesis we investigate these questions in detail and, based
on the outcomes of this analysis, propose a number of applications of tags for supporting
search and personalization.
We start with an in-depth study of tagging behaviors and motivations for different kinds
of resources and systems: Web pages (Del.icio.us), music (Last.fm) and images (Flickr).
We are thus the first to present a thorough analysis of tag distributions and
characteristics across multiple tagging environments. We analyze the implications of tags
for search applications and show which types of tags are mostly employed for tagging and
searching and which are the most easily remembered by users. Based on these observations,
we propose a number of methods for automatically identifying the most valuable types of
tags for search; evaluation results indicate the high potential of these methods for
further improving systems that make use of social tags.
We continue by discussing the use of tags for personalization applications and tackle two
different aspects: personalized music recommendations and personalized Web search. For
the former, we make use of collaboratively created user tags, while for the latter we
employ expert annotations extracted from the ODP online catalog. Extensive experiments
analyzing both approaches show that they yield improved results over collaborative
filtering and regular Google search, respectively.
Finally, we exploit tags for automatically inferring valuable knowledge about the
resources the tags are attached to. We focus on the multimedia domain and propose three
algorithms that rely on tags and other social information and aim at identifying different
features of multimedia resources. The three scenarios we discuss aim to identify: (1)
songs’ moods and themes; (2) potential music hits; and (3) landmark pictures. The results
of the evaluations we performed are promising and provide new insights into the
potential such methods have in enabling easier access to content and improving multimedia
retrieval.
Keywords: Web 2.0, Information Retrieval, Personalization

ZUSAMMENFASSUNG (SUMMARY)
With the growing popularity of Web 2.0 sites, the amount of data available online is
multiplying rapidly. At the same time, Web data is becoming increasingly diverse in terms
of content (e.g. pictures, music, Web pages) and quality. Professionally and
non-professionally produced content are so tightly merged that it becomes difficult for
users to find only the high-quality items that match their interests or meet their current
needs. On the other hand, collaborative tagging has developed into an increasingly popular
means of sharing and organizing Web content, which has produced a very large amount of
metadata. Some earlier studies have started to use this metadata for various purposes.
However, it is still unclear to what extent these tags, or subsets of them, can be used to
improve users’ access to data. In this dissertation we examine these questions in detail
and, as a result of this analysis, propose improving search and personalization methods
through the use of tags.
We begin with a detailed study of the behavior and motivation of users in creating
metadata ("collaborative tagging") for different kinds of resources and systems, such as
Web pages (Del.icio.us), music (Last.fm) and images (Flickr). We are thus the first to
present a detailed analysis of the distribution and characteristics of tags across several
tagging environments. We analyze the implications of tags for search applications and show
which types of tags are used most frequently for annotation and search, and which tag
types are easiest for users to remember. Based on these observations, we propose a number
of methods that can automatically determine the best tags for search algorithms. The
evaluation results show the great potential of these methods to improve the functionality
of systems that use tags.
We discuss the use of tags for personalization applications and consider two different
aspects: personalized music recommendations and personalized Web search. For the first
aspect, we use collaboratively created user tags. For the second, in contrast, we use the
expert-created annotations from the ODP online catalog. Extensive experiments show that
both approaches yield improved results compared with collaborative filtering and Google
search, respectively.
Finally, we use tags to gain valuable insights about the resources the tags are associated
with. We focus on the multimedia domain and develop three different algorithms, based on
tags and other social information, that aim to identify different properties of multimedia
resources. The three scenarios we analyze attempt to identify moods and themes of songs,
to predict potential music hits, and to find pictures of landmarks. The evaluation results
of our algorithms are promising and provide new insights into the potential of such
methods for easing access to content and improving multimedia retrieval.
Keywords: Web 2.0, Information Retrieval, Personalization

ACKNOWLEDGMENTS
First, I would like to thank my supervisor, Prof. Dr. Wolfgang Nejdl, for
giving me the opportunity to be part of L3S Research Center and Gottfried
Wilhelm Leibniz University of Hannover. With his excellent guidance
he taught me the key points of how excellent research must be pursued. I
would also like to thank him for his continuous support, which allowed me to
attend many interesting conferences and project meetings and thus helped me
deepen my knowledge in this field.
I would also like to thank Prof. Dr. Heribert Vollmer, my second supervisor,
for providing very useful comments on the draft of my thesis, as well
as Prof. Dr. Gabriele von Voigt for agreeing to be part of my dissertation
committee.
I am very grateful to Prof. Dr. Valentin Cristea, one of the best Professors
I met at the Politehnica University of Bucharest, who believed in me and
supported my arrival at L3S Research Center and Gottfried Wilhelm Leibniz
University of Hannover.
I would also like to thank the many colleagues I cooperated with, either
from the Gottfried Wilhelm Leibniz University of Hannover or from other
universities and institutes, for their support and valuable comments, not limited
just to this thesis. Many thanks to the colleagues working in the administrative
departments, especially to Anca Vais, Marion Wicht and Iris Zieseniss, for their
support and help with many issues related to university administration.
I am also very grateful to Ionescu Aurelia for teaching me the German
language and thus contributing to a much easier adaptation to life in
Germany, and to Teodor Danet, a great Mathematics teacher, for believing in
my intellectual abilities and contributing to the development of my analytical
thinking.
Special thanks go to the European Commission for the IST work programme
and its frameworks, which supported the research within my thesis, in
particular the 6th Framework Programme and the PHAROS IP project (IST
Contract No. 045035).
Last, but definitely not least, I am forever grateful to my family. To my
parents, for enduring the distance, for always supporting me in my initiatives
and for the excellent guidance in my education and development. To my
grandparents, who shaped my way and my intellectual skills starting from the
very early years of my childhood. To my uncle, Titel, a great person and
exceptional Physics Professor, for influencing my decision to pursue this
Ph.D. study. To Thomas, for standing by me throughout this Ph.D., for his love
and understanding, support and good advice.

FOREWORD
The algorithms presented in this thesis have been published at various
conferences, as follows.
In Chapter 3 we describe contributions included in:
Can All Tags Be Used for Search? Kerstin Bischoff, Claudiu S. Firan,
Wolfgang Nejdl, Raluca Paiu. In: Proceedings of the 17th ACM Conference
on Information and Knowledge Management, CIKM ’08, Napa Valley,
California, USA, October 26-30, 2008, pp. 193-202, ACM, 978-1-59593-991-3.
[BFNP08]
Automatically Identifying Tag Types. Kerstin Bischoff, Claudiu S. Firan,
Cristina Kadar, Wolfgang Nejdl, Raluca Paiu. In: Proceedings of the 5th
International Conference on Advanced Data Mining and Applications.
ADMA ’09, Beijing, China, August 17-18, 2009. [BFK+09]
Chapter 4 presenting the use of tags for personalization applications is built
upon the work published in:
The Benefit of Using Tag-Based Profiles. Claudiu S. Firan, Wolfgang
Nejdl, Raluca Paiu. In: Proceedings of the 5th Latin American Web
Congress. LA-WEB ’07, October 31 - November 2 2007, Santiago de
Chile. [FNP]
Using ODP metadata to personalize search. Paul A. Chirita, Wolfgang
Nejdl, Raluca Paiu, Christian Kohlschütter. In: Proceedings of the 28th
Annual International ACM SIGIR Conference on Research and Development
in Information Retrieval. SIGIR ’05, Salvador, Brazil, August
2005. [CNPK05]
Finally, in Chapter 5 we structure the presentation around the following
papers:
Deriving Music Theme Annotations from User Tags. Kerstin Bischoff,
Claudiu S. Firan, Raluca Paiu. In: Proceedings of the 18th International
World Wide Web Conference. WWW ’09, Madrid, Spain, April 20-24,
2009. [BFP09]
How Do You Feel about "Dancing Queen"? Deriving Mood & Theme Annotations
from User Tags. Kerstin Bischoff, Claudiu S. Firan, Wolfgang
Nejdl, Raluca Paiu. In: Proceedings of the Joint Conference on Digital
Libraries. JCDL ’09, June 15-19, 2009, Austin, Texas, USA. [BFNP09]
Exploiting Flickr Tags and Groups for Finding Landmark Photos. Rabeeh
Abbasi, Sergey Chernov, Wolfgang Nejdl, Raluca Paiu, Steffen Staab. In:
Proceedings of the 31st European Conference on Information Retrieval.
ECIR ’09, April 6-9, Toulouse, France. [ACN+09]
Social Knowledge-Driven Music Hit Prediction. Kerstin Bischoff, Claudiu
S. Firan, Mihai Georgescu, Wolfgang Nejdl, Raluca Paiu. In: Proceedings
of the 5th International Conference on Advanced Data Mining and
Applications. ADMA ’09, Beijing, China, August 17-18, 2009. [BFG+09]
During the early stages of my Ph.D. studies I also published a number
of papers investigating the use of metadata for improving desktop search. This
aspect is not covered in this thesis due to space limitations, but the complete
list of publications follows:
Leveraging Personal Metadata for Desktop Search - The Beagle++ System.
Enrico Minack, Raluca Paiu, Stefania Costache, Gianluca Demartini,
Julien Gaugaz, Ekaterini Ioannou, Paul A. Chirita, Wolfgang Nejdl.
In: Journal of Web Semantics. To appear (2009). [MPC+09]
Personalizing PageRank-Based Ranking over Distributed Collections.
Stefania Costache, Wolfgang Nejdl, Raluca Paiu. In: Proceedings of the
19th International Conference on Advanced Information Systems
Engineering. CAiSE ’07, June 2007, Trondheim, Norway. [CNP07]
The Beagle++ Toolbox: Towards an Extendable Desktop Search Architecture.
Ingo Brunkhorst, Paul A. Chirita, Stefania Costache, Julien
Gaugaz, Ekaterini Ioannou, Tereza Iofciu, Enrico Minack, Wolfgang Nejdl,
Raluca Paiu. In: Proceedings of the Semantic Desktop and Social
Semantic Collaboration Workshop at the International Semantic Web
Conference, ISWC ’06, November 2006, Athens, GA, USA. [BCC+06]
Beagle++: Semantically Enhanced Searching and Ranking on the Desktop.
Paul A. Chirita, Stefania Costache, Wolfgang Nejdl, Raluca Paiu.
In: Proceedings of the 3rd European Semantic Web Conference. ESWC
’06, June 2006, Budva, Montenegro. [CCNP06]
Peer-Sensitive ObjectRank - Valuing Contextual Information in Social
Networks. Andrei Damian, Wolfgang Nejdl, Raluca Paiu. In: Proceedings
of the 6th International Conference on Web Information Systems
Engineering. WISE ’05, November 2005, New York, NY, USA. [DNP05]
Keywords and RDF Fragments: Integrating Metadata and Full-Text Search
in Beagle++. Tereza Iofciu, Christian Kohlschütter, Wolfgang Nejdl,
Raluca Paiu. In: Proceedings of the Workshop on the Semantic Desktop
- Next Generation Personal Information Management and Collaboration
Infrastructure at the International Semantic Web Conference. ISWC ’05,
Galway, Ireland, November 2005. [IKNP05]
Semantically Enhanced Searching and Ranking on the Desktop. Paul A.
Chirita, Stefania Costache, Wolfgang Nejdl, Raluca Paiu. In: Proceedings
of the International Semantic Web Conference Workshop on the
Semantic Desktop - Next Generation Personal Information Management
and Collaboration Infrastructure. ISWC ’05, Galway, Ireland, November
2005. [CCNP05]
Semantically Rich Recommendations in Social Networks for Sharing,
Exchanging and Ranking Semantic Context. Stefania Costache, Wolfgang
Nejdl, Raluca Paiu. In: Proceedings of the 4th International Semantic
Web Conference, ISWC ’05, Galway, Ireland, November 2005. [CNP05b]
Desktop Search - How Contextual Information Influences Search Results
& Rankings. Wolfgang Nejdl, Raluca Paiu. In: Proceedings of the 2nd
SIGIR Workshop on Information Retrieval in Context (IRiX), Salvador,
Brazil, August 2005. [NP05a]
Semantically Rich Recommendations in Social Networks for Sharing and
Exchanging Semantic Context. Stefania Costache, Wolfgang Nejdl, Raluca
Paiu. In: Proceedings of the 2nd European Semantic Web Conference
Workshop on Ontologies in P2P Communities, ESWC ’05, Greece, May
2005. [CNP05a]
Activity Based Metadata for Semantic Desktop Search. Paul A. Chirita,
Stefania Costache, Rita Gavriloaie, Wolfgang Nejdl, Raluca Paiu. In:
Proceedings of the 2nd European Semantic Web Conference, ESWC ’05,
Crete, Greece, May 2005. [CGG+05]
I know I stored it somewhere - Contextual Information and Ranking on
our Desktop. Wolfgang Nejdl, Raluca Paiu. In: Proceedings of the 8th
International Workshop of the DELOS Network of Excellence on Digital
Libraries on Future Digital Library Management Systems (System
Architecture & Information Access), March - April 2005, Dagstuhl, Germany.
[NP05b]