Concepts and Techniques Spring 2009
45 Pages
English


Description

http://en.wikipedia.org/wiki/Naive_Bayes_classifier ..... then P("soccer"|Sports) = 37/153. The word "soccer" occurs 1+1+1 = 3 times in politics docs.

CSE-590/634 Data Mining Concepts and Techniques, Spring 2009
Bayesian Classifications
Presented by: Muhammad A. Islam, 106506983; Moieed Ahmed, 106867769
Guided by: Prof. Anita Wasilewska

Bibliography
• DATA MINING Concepts and Techniques, Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers, 2nd Edition.
  – Chapter 6, Classification and Prediction, Section 6.4.
• Computer Science, Carnegie Mellon University
  – http://www.cs.cmu.edu/~awm/tutorials
  – http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/mlbook/ch6.pdf
  – http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html
• Wikipedia:
  – http://en.wikipedia.org/wiki/Bayesian_probability
  – http://en.wikipedia.org/wiki/Naive_Bayes_classifier

Outline
• Introduction to Bayesian Classification
  – Probability
  – Bayes Theorem
  – Naïve Bayes Classifier
  – Classification Example
• Text Classification – an Application
• Paper: "Text Mining: Finding Nuggets in Mountains of Textual Data"

Introduction to Bayesian Classification
By Muhammad A. Islam, 106506983

Bayesian Classification: What is it?
• Statistical method for classification.
• Supervised learning method.
• Assumes an underlying probabilistic model, the Bayes theorem.
• Can solve diagnostic and predictive problems.
• Can solve problems involving both categorical and continuous-valued attributes.
• Named after Thomas Bayes, who proposed the Bayes Theorem.

Basic Probability Concepts
• Sample space S is the set of all possible outcomes
  – S = {1,2,3,4,5,6} for a dice roll
  – S = {H,T} for a coin toss
• An event A is any subset of the sample space
  – Seeing a 1 on the dice roll
  – Getting a head on a coin toss

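The definitions above can be sketched in a few lines of Python. This is only an illustration: it assumes a fair die, so that every outcome is equally likely (the slide does not state this), and the helper name `probability` is hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# An event is any subset of the sample space.
see_one = {1}
see_even = {2, 4, 6}


def probability(event, space):
    """P(event), assuming all outcomes in `space` are equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


print(probability(see_one, sample_space))   # 1/6
print(probability(see_even, sample_space))  # 1/2
```

Using `Fraction` keeps the results exact, which makes them easy to compare with probabilities written as ratios.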
Random Variables
• A is a random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs.
  – A = The US president in 2016 will be male
  – A = You see a head on a coin toss
  – A = The weather will be sunny tomorrow
• Boolean random variables
  – A is either true or false.
• Discrete random variables
  – Weather is one of {sunny, rain, cloudy, snow}
• Continuous random variables
  – Temp = 21.6

Probability
• We write P(A) as "the fraction of possible worlds in which A is true"
[Figure: the event space of all possible worlds has area 1. A reddish oval marks the worlds in which A is true; the remaining area is the worlds in which A is false. P(A) = area of the reddish oval.]
Source: http://www.cs.cmu.edu/~awm/tutorials

The Axioms of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
• P(not A) = P(~A) = 1 - P(A)
• P(A) = P(A ^ B) + P(A ^ ~B)
Source: http://www.cs.cmu.edu/~awm/tutorials

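The axioms and the two derived identities can be checked numerically on a small example. The sketch below assumes a fair six-sided die with equally likely outcomes; the events A and B and the helper `p` are hypothetical choices made only for illustration. It is a sanity check on one example, not a proof.

```python
from fractions import Fraction

space = frozenset(range(1, 7))  # fair six-sided die (assumed)
A = {2, 4, 6}                   # "the roll is even"
B = {4, 5, 6}                   # "the roll is greater than 3"


def p(event):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event), len(space))


not_A = space - A
not_B = space - B

# Axioms
assert 0 <= p(A) <= 1
assert p(space) == 1   # P(True) = 1
assert p(set()) == 0   # P(False) = 0
assert p(A | B) == p(A) + p(B) - p(A & B)

# Derived identities
assert p(not_A) == 1 - p(A)
assert p(A) == p(A & B) + p(A & not_B)
```

Here set union `|` plays the role of "or", intersection `&` the role of "and" (written ^ on the slide), and set difference from the sample space the role of "not".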
Conditional Probability
• P(A|B) = Fraction of worlds in which B is true that also have A true
H = "Have a headache"
F = "Coming down with Flu"
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
Source: http://www.cs.cmu.edu/~awm/tutorials

Conditional Probability
H = "Have a headache"
F = "Coming down with Flu"
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
P(H|F) = Fraction of flu-inflicted worlds in which you have a headache
       = (#worlds with flu and headache) / (#worlds with flu)
       = (Area of "H and F" region) / (Area of "F" region)
       = P(H ^ F) / P(F)
Source: http://www.cs.cmu.edu/~awm/tutorials

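The "fraction of worlds" reading of P(H|F) can be made concrete with a small count. The sketch below builds a hypothetical population of 80 equally likely worlds. The counts are an assumption chosen only so that they reproduce P(H) = 1/10, P(F) = 1/40 and P(H|F) = 1/2; they do not come from the slides. The helper name `fraction_of` is likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical population: each world is a (headache, flu) pair.
worlds = (
    [(True, True)] * 1       # headache and flu
    + [(False, True)] * 1    # flu only
    + [(True, False)] * 7    # headache only
    + [(False, False)] * 71  # neither
)


def fraction_of(worlds, predicate):
    """Fraction of worlds in which the predicate holds."""
    return Fraction(sum(1 for w in worlds if predicate(w)), len(worlds))


p_h = fraction_of(worlds, lambda w: w[0])
p_f = fraction_of(worlds, lambda w: w[1])
p_h_and_f = fraction_of(worlds, lambda w: w[0] and w[1])

# P(H|F) = P(H ^ F) / P(F)
p_h_given_f = p_h_and_f / p_f

print(p_h, p_f, p_h_and_f, p_h_given_f)  # 1/10 1/40 1/80 1/2
```

Counting directly gives the same answer: of the 2 worlds with flu, 1 also has a headache, so P(H|F) = 1/2.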