Opinion Mining & Summarization - Sentiment Analysis
Tutorial given at WWW-2008, April 21, 2008 in Beijing

Bing Liu
Department of Computer Science
University of Illinois at Chicago
liub@cs.uic.edu
http://www.cs.uic.edu/~liub



Introduction – facts and opinions
- Two main types of textual information:
  - Facts and opinions
- Most current information processing techniques (e.g., search engines) work with facts (assume they are true).
  - Facts can be expressed with topic keywords.
- E.g., search engines do not search for opinions.
  - Opinions are hard to express with a few keywords.
    - How do people think of Motorola cell phones?
  - Current search ranking strategy is not appropriate for opinion retrieval/search.
Bing Liu, UIC WWW-2008 Tutorial
Introduction – user-generated content
- Word-of-mouth on the Web
  - One can express personal experiences and opinions on almost anything, at review sites, forums, discussion groups, blogs, ... (called user-generated content).
  - They contain valuable information.
  - Web/global scale: no longer limited to one's circle of friends.
- Our interest: to mine opinions expressed in user-generated content.
  - An intellectually very challenging problem.
  - Practically very useful.
Introduction – Applications
- Businesses and organizations: product and service benchmarking.
  - Businesses spend a huge amount of money to find consumer sentiments and opinions.
- Individuals: interested in others' opinions when
  - purchasing a product or using a service,
  - finding opinions on political topics.
- Ad placement: placing ads in user-generated content.
  - Place an ad from a competitor if one criticizes a product.
- Opinion retrieval/search: providing general search for opinions.
Two types of evaluation
- Direct opinions: sentiment expressions on some objects, e.g., products, events, topics, persons.
  - Subjective.
- Comparisons: relations expressing similarities or differences of more than one object, usually expressing an ordering.
  - E.g., “car x is cheaper than car y.”
  - Objective or subjective.
Opinion search (Liu, Web Data Mining book, 2007)
- Can you search for opinions as conveniently as general Web search?
- Whenever you need to make a decision, you may want some opinions from others.
  - Wouldn’t it be nice if you could find them on a search system instantly, by issuing queries such as
    - Opinions: “Motorola cell phones”
    - Comparisons: “Motorola vs. Nokia”
- Cannot be done yet! Very hard!
Typical opinion search queries
- Find the opinion of a person or organization (opinion holder) on a particular object or a feature of the object.
  - E.g., what is Bill Clinton’s opinion on abortion?
- Find positive and/or negative opinions on a particular object (or some features of the object), e.g.,
  - public opinions on a political topic.
- Find how opinions on an object change over time.
- How does object A compare with object B?
  - Gmail vs. Hotmail
Find the opinion of a person on X
- In some cases, a general search engine can handle it, i.e., using suitable keywords.
- Reason:
  - A person or organization usually has only one opinion on a particular topic.
  - The opinion is likely contained in a single document.
  - Thus, a good keyword query may be sufficient.
Find opinions on an object
We use product reviews as an example:
- Searching for opinions is different from general Web search.
  - E.g., search for opinions on “Motorola RAZR V3”
- General Web search (for a fact): rank pages according to some authority and relevance scores.
  - The user views the first page (if the search is perfect).
  - One fact = Multiple facts
- Opinion search: ranking is desirable; however,
  - reading only the review ranked at the top is not appropriate because it is only the opinion of one person.
Search opinions (contd)
- Ranking:
  - Produce two rankings
    - Positive opinions and negative opinions
    - Some kind of summary of both, e.g., # of each
  - Or, one ranking, but
    - the top (say 30) reviews should reflect the natural distribution of the reviews, i.e., the right balance of positive and negative reviews.
- Questions:
  - Should the user read all the top reviews? OR
  - Should the system prepare a summary of the reviews?
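The single-ranking option can be sketched as follows. This is only an illustration under stated assumptions, not a method given in the tutorial: each review is assumed to already carry a polarity label and a relevance score (both hypothetical fields), and the top-k list is filled so that its positive/negative mix mirrors the proportion in the whole collection.

```python
from typing import List, NamedTuple


class Review(NamedTuple):
    text: str
    polarity: str   # "pos" or "neg"; assumed to be known already
    score: float    # hypothetical relevance score, higher is better


def balanced_top_k(reviews: List[Review], k: int) -> List[Review]:
    """Pick up to k reviews whose pos/neg mix mirrors the whole collection."""
    if not reviews or k <= 0:
        return []
    pos = sorted((r for r in reviews if r.polarity == "pos"),
                 key=lambda r: r.score, reverse=True)
    neg = sorted((r for r in reviews if r.polarity == "neg"),
                 key=lambda r: r.score, reverse=True)
    total = len(pos) + len(neg)
    if total == 0:
        return []
    k = min(k, total)
    # Number of positive slots, proportional to the overall positive share.
    n_pos = min(round(k * len(pos) / total), len(pos))
    n_neg = min(k - n_pos, len(neg))
    chosen = pos[:n_pos] + neg[:n_neg]
    # Present the selection as one ranking, best score first.
    return sorted(chosen, key=lambda r: r.score, reverse=True)
```

With six positive and two negative reviews, asking for the top four would yield three positive and one negative review, so a reader of the top list sees roughly the same balance as the full set.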
Reviews are similar to surveys
- Reviews can be regarded as traditional surveys.
  - In a traditional survey, returned survey forms are treated as raw data.
  - Analysis is performed to summarize the survey results.
    - E.g., % against or for a particular issue, etc.
- Can a summary be produced for reviews as well?
Roadmap
- Opinion mining – the abstraction
- Document level sentiment classification
- Feature-based opinion mining and summarization
- Comparative sentences and relations
- Opinion spam
Opinion mining – the abstraction (Hu and Liu, KDD-04; Liu, Web Data Mining book 2007)
- Basic components of an opinion:
  - Opinion holder: the person or organization that holds a specific opinion on a particular object.
  - Object: the entity on which an opinion is expressed.
  - Opinion: a view, attitude, or appraisal on an object from an opinion holder.
- Objectives of opinion mining: many ...
- Let us abstract the problem:
  - put existing research into a common framework.
- We use consumer reviews of products to develop the ideas.
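- Definition (object): An object O is an entity which can be a product, person, event, or topic. O is represented as
  - a hierarchy of components, sub-components, and so on.
  - Each node represents a component and is associated with a set of attributes of the component.
  - O is the root node (which also has a set of attributes).
- An opinion can be expressed on any node or on any attribute of the node.
- To simplify our discussion, we use “features” to represent both components and attributes.
  - The term “feature” should be understood in a broad sense:
    - product feature, topic or sub-topic, event or sub-event, etc.
- Note: the object O itself is also a feature.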
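The object definition can be made concrete with a small data structure. The sketch below is an assumption-laden illustration rather than anything specified in the tutorial: the class name `Component`, the dotted naming of attributes, and the example camera are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Component:
    """A node in the object hierarchy: a component with its attributes."""
    name: str
    attributes: List[str] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)

    def features(self) -> Iterator[str]:
        """Yield every feature: the node itself, its attributes, then sub-components."""
        yield self.name
        for attr in self.attributes:
            yield f"{self.name}.{attr}"
        for child in self.children:
            yield from child.features()


# Hypothetical example object; the names are illustrative only.
camera = Component(
    name="camera",
    attributes=["price", "size"],
    children=[
        Component("lens", attributes=["zoom"]),
        Component("battery", attributes=["life"]),
    ],
)
```

Calling `list(camera.features())` flattens components and attributes into one list of "features", with the root object itself appearing as the first feature, which matches the simplification made above.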
Model of a review
- An object O is represented with a finite set of features, F = {f1, f2, …, fn}.
  - Each feature fi in F can be expressed with a finite set of words or phrases Wi, which are synonyms.
  - That is, we have a set of corresponding synonym sets W = {W1, W2, …, Wn} for the features.
- Model of a review: an opinion holder j comments on a subset of the features Sj ⊆ F of object O.
  - For each feature fk ∈ Sj that j comments on, he/she
    - chooses a word or phrase from Wk to describe the feature, and
    - expresses a positive, negative or neutral opinion on fk.
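One way to read the review model is as a consistency check on a single review. The following sketch assumes illustrative values for F and W and a hypothetical representation of a review as a mapping from each commented feature to a (word, orientation) pair; none of these names come from the tutorial.

```python
from typing import Dict, Set, Tuple

# F: features of the object; W: one synonym set per feature (illustrative values).
F: Set[str] = {"picture quality", "battery life", "size"}
W: Dict[str, Set[str]] = {
    "picture quality": {"picture quality", "photo", "image"},
    "battery life": {"battery life", "battery"},
    "size": {"size", "dimensions"},
}

# A review by opinion holder j: feature f_k -> (word chosen from W_k, orientation).
Review = Dict[str, Tuple[str, str]]


def is_valid_review(review: Review) -> bool:
    """Check that S_j is a subset of F, each word is in W_k, and the orientation is allowed."""
    allowed = {"positive", "negative", "neutral"}
    return all(
        feature in F and word in W[feature] and orientation in allowed
        for feature, (word, orientation) in review.items()
    )
```

A review such as `{"picture quality": ("photo", "positive")}` satisfies the model, while one that mentions a feature outside F, or uses a word outside the feature's synonym set, does not.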
Opinion mining tasks
- At the document (or review) level:
  - Task: sentiment classification of reviews.
    - Classes: positive, negative, and neutral.
    - Assumption: each document (or review) focuses on a single object and contains opinion from a single opinion holder.
- At the sentence level:
  - Task 1: identifying subjective/opinionated sentences.
    - Classes: objective and subjective (opinionated).
  - Task 2: sentiment classification of sentences.
    - Classes: positive, negative, and neutral.
    - Assumption: a sentence contains only one opinion (not true in many cases).
    - Then we can also consider clauses or phrases.
Opinion mining tasks (contd)
- At the feature level:
  - Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer).
  - Task 2: Determine whether the opinions on the features are positive, negative or neutral.
  - Task 3: Group feature synonyms.
  - Produce a feature-based opinion summary of multiple reviews (more on this later).
- Opinion holders: identifying holders is also useful, e.g., in news articles, etc., but they are usually known in the user-generated content, i.e., the authors of the posts.
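To show how the three feature-level tasks feed a feature-based summary, here is a minimal sketch. It assumes Tasks 1 and 2 have already produced (feature word, orientation) pairs, and it stands in for Task 3 with a hand-written synonym map; the map contents and function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical synonym map (Task 3): surface word -> canonical feature.
SYNONYMS: Dict[str, str] = {
    "photo": "picture quality",
    "image": "picture quality",
    "picture quality": "picture quality",
    "battery": "battery life",
    "battery life": "battery life",
}


def summarize(opinions: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count orientations per canonical feature.

    `opinions` holds (feature word, orientation) pairs, assumed to be the
    output of Tasks 1 and 2 over many reviews.
    """
    summary: Dict[str, Counter] = defaultdict(Counter)
    for word, orientation in opinions:
        feature = SYNONYMS.get(word, word)  # unknown words stay as their own feature
        summary[feature][orientation] += 1
    return dict(summary)
```

The result maps each feature to counts of positive, negative and neutral opinions, which is one simple form a feature-based summary of multiple reviews could take.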
More at the feature level
- Problem 1: both F and W are unknown.
  - We need to perform all three tasks.
- Problem 2: F is known but W is unknown.
  - All three tasks are still needed. Task 3 is easier: it becomes the problem of matching the discovered features with the given features F.
- Problem 3: W is known (F is known too).
  - Only Task 2 is needed.
F: the set of features; W: the synonym sets of the features.
Sentiment classification
- Classify documents (e.g., reviews) based on the overall sentiment expressed by the opinion holders (authors):
  - positive, negative, and (possibly) neutral.
  - Since in our model an object O itself is also a feature, sentiment classification essentially determines the opinion expressed on O in each document (e.g., review).
- Similar to, but different from, topic-based text classification.
  - In topic-based text classification, topic words are important.
  - In sentiment classification, sentiment words are more important.
Unsupervised review classification (Turney, ACL-02)
- Data: reviews of automobiles, banks, movies, and travel destinations.
- The approach: three steps.
- Step 1:
  - Part-of-speech tagging.
  - Extract two consecutive words (two-word phrases) from reviews if their tags conform to some given patterns, e.g., first word JJ, second word NN.
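The phrase-extraction step can be sketched without committing to a particular tagger. The code below assumes the input is already a list of (word, tag) pairs with Penn Treebank tags; only the JJ + NN pattern is taken from the slide, and the second pattern in `PATTERNS` is an assumed variation added for illustration.

```python
from typing import List, Tuple

# Illustrative tag patterns (first tag, second tag). Only the JJ + NN pattern
# comes from the slide; the JJ + NNS entry is an assumed variation.
PATTERNS = {("JJ", "NN"), ("JJ", "NNS")}


def extract_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return two-word phrases whose POS tags match one of PATTERNS.

    `tagged` is a list of (word, tag) pairs, assumed to come from any
    part-of-speech tagger that emits Penn Treebank tags.
    """
    phrases = []
    # Slide a window of two consecutive tokens over the sentence.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            phrases.append(f"{w1} {w2}")
    return phrases
```

For a tagged sentence like "the/DT great/JJ camera/NN", the only extracted phrase would be "great camera"; the extracted phrases are what the next step scores.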
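- Step 2: estimate the semantic orientation (SO) of the extracted phrases.
  - Use pointwise mutual information (PMI):

    $$\mathrm{PMI}(\text{word}_1, \text{word}_2) = \log_2 \frac{P(\text{word}_1 \wedge \text{word}_2)}{P(\text{word}_1)\, P(\text{word}_2)}$$

  - Semantic orientation (SO):

    $$\mathrm{SO}(\text{phrase}) = \mathrm{PMI}(\text{phrase}, \text{``excellent''}) - \mathrm{PMI}(\text{phrase}, \text{``poor''})$$

  - Use the AltaVista NEAR operator to run searches and find the number of hits needed to compute PMI and SO.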
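When the probabilities in PMI are estimated from search hit counts, P(phrase) and the corpus size cancel out of the SO difference, leaving a ratio of four counts. The sketch below computes SO from such counts; the function name, the parameter names and the small `smoothing` constant are assumptions made for illustration, and the counts would have to come from whatever search service is available.

```python
import math


def semantic_orientation(hits_near_excellent: float,
                         hits_near_poor: float,
                         hits_excellent: float,
                         hits_poor: float,
                         smoothing: float = 0.01) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    hits_near_excellent: hits for the phrase NEAR "excellent"
    hits_near_poor:      hits for the phrase NEAR "poor"
    hits_excellent:      hits for "excellent" alone (must be > 0)
    hits_poor:           hits for "poor" alone (must be > 0)
    `smoothing` is an assumed small constant that avoids division by zero
    when a co-occurrence count is 0.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)
```

A positive value suggests the phrase co-occurs more with "excellent" than with "poor", and a negative value suggests the opposite.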