
RoSeS: A continuous content-based query engine for RSS feeds


Jordi Creus¹, Bernd Amann¹, Nicolas Travers², and Dan Vodislav³

¹ LIP6, CNRS – Université Pierre et Marie Curie, Paris, France
² Cedric/CNAM – Conservatoire National des Arts et Métiers, Paris, France
³ ETIS, CNRS – University of Cergy-Pontoise, Cergy, France

Abstract. In this paper we present RoSeS (Really Open Simple and Efficient Syndication), a generic framework for content-based RSS feed querying and aggregation. RoSeS is based on a data-centric approach, using a combination of standard database concepts like declarative query languages, views and multi-query optimization. Users create personalized feeds by defining and composing content-based filtering and aggregation queries on collections of RSS feeds. Publishing these queries corresponds to defining views which can then be used for building new queries / feeds. This naturally reflects the publish-subscribe nature of RSS applications. The contributions presented in this paper are a declarative RSS feed aggregation language, an extensible stream algebra for building efficient continuous multi-query execution plans for RSS aggregation views, a multi-query optimization strategy for these plans and a running prototype based on a multi-threaded asynchronous execution engine.

1 Introduction

In its origins the Web was a collection of semi-structured (HTML) documents connected by hypertext links.



This vision has been valid for many years, and the main effort for facilitating access to and publishing of web information was invested in the development of expressive and scalable search engines for retrieving pages relevant to user queries. More recently, new web content publishing and sharing applications that combine modern software infrastructures (AJAX, web services) and hardware technologies (handheld mobile user devices) have appeared on the scene. The web content published by these applications generally evolves very rapidly over time and can best be characterized as a stream of information entities. Online media, social networks and microblogging systems are among the most popular examples of such applications, but the list of web applications generating many different kinds of information streams is growing every day.

In our work we are interested in RSS⁴ and ATOM [?] as standard formats for publishing information streams. Both formats can be considered as the continuous counterpart of static HTML documents for encoding semi-structured data streams in form of

The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant ROSES (ANR-07-MDCO-011) “Really Open, Simple and Efficient Syndication”.

⁴ RSS stands for (1) Rich Site Summary, (2) RDF Site Summary and (3) Really Simple Syndication.