Concepts and Techniques Spring 2009




CSE-590/634 Data Mining Concepts and Techniques, Spring 2009
Bayesian Classification
Presented by: Muhammad A. Islam, 106506983; Moieed Ahmed, 106867769
Guided by: Prof. Anita Wasilewska
Bibliography
• DATA MINING Concepts and Techniques, Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers, 2nd Edition.
  – Chapter 6, Classification and Prediction, Section 6.4.
• Computer Science, Carnegie Mellon University
• http://en.wikipedia.org/wiki/Bayesian_probability
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
Outline
• Introduction to Bayesian Classification
  – Probability
  – Bayes Theorem
  – Naïve Bayes Classifier
  – Classification Example
• Text Classification: an Application
• Paper: “Text Mining: Finding Nuggets in Mountains of Textual Data”
Introduction to Bayesian Classification
By Muhammad A. Islam, 106506983
Bayesian Classification
What is it?
• Statistical method for classification.
• Supervised learning method.
• Assumes an underlying probabilistic model, the Bayes theorem.
• Can solve diagnostic and predictive problems.
• Can solve problems involving both categorical and continuous valued attributes.
• Named after Thomas Bayes, who proposed the Bayes Theorem.
Basic Probability Concepts
• Sample space S is a set of all possible outcomes
  – S = {1,2,3,4,5,6} for a dice roll
  – S = {H,T} for a coin toss
• An Event A is any subset of the Sample Space
  – Seeing a 1 on the dice roll
  – Getting head on a coin toss
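The definitions of sample space and event map directly onto Python sets. The sketch below is only an illustration: the helper names `is_event` and `probability` are hypothetical, and the probability calculation assumes all outcomes are equally likely, which the slide does not state explicitly.

```python
from fractions import Fraction

# Sample spaces from the slide.
dice_space = {1, 2, 3, 4, 5, 6}
coin_space = {"H", "T"}

# Events are subsets of the sample space.
seeing_a_one = {1}
getting_head = {"H"}


def is_event(event, space):
    """An event is any subset of the sample space."""
    return event <= space


def probability(event, space):
    """Fraction of outcomes in the event (assumes equally likely outcomes)."""
    if not is_event(event, space):
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))


print(probability(seeing_a_one, dice_space))
print(probability(getting_head, coin_space))
```

Using `Fraction` keeps the results exact, so they can be compared against hand calculations without rounding error.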
Random Variables
• A is a random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs.
  – A = The US president in 2016 will be male
  – A = You see a head on a coin toss
  – A = The weather will be sunny tomorrow
• Boolean random variables
  – A is either true or false.
• Discrete random variables
  – Weather is one of {sunny, rain, cloudy, snow}
• Continuous random variables
  – Temp = 21.6
Probability
We write P(A) as “the fraction of possible worlds in which A is true”.
[Figure: the event space of all possible worlds, with area 1, divided into the worlds in which A is true (a reddish oval) and the worlds in which A is false. P(A) = area of the reddish oval.]
The Axioms of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
• P(not A) = P(~A) = 1 - P(A)
• P(A) = P(A ^ B) + P(A ^ ~B)
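As a sanity check, the axioms and the two derived identities can be verified numerically on a small example. The sketch below assumes a fair six-sided die with equally likely outcomes; the events A ("roll is even") and B ("roll is greater than 3") are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# All possible worlds: outcomes of one roll of a fair die (assumption).
space = frozenset(range(1, 7))


def p(event):
    """P(event) as the fraction of worlds in which the event is true."""
    return Fraction(len(event), len(space))


A = frozenset({2, 4, 6})  # hypothetical event: roll is even
B = frozenset({4, 5, 6})  # hypothetical event: roll is greater than 3
not_A = space - A
not_B = space - B

checks = {
    "0 <= P(A) <= 1": 0 <= p(A) <= 1,
    "P(True) = 1": p(space) == 1,
    "P(False) = 0": p(frozenset()) == 0,
    "P(A or B) = P(A) + P(B) - P(A and B)": p(A | B) == p(A) + p(B) - p(A & B),
    "P(~A) = 1 - P(A)": p(not_A) == 1 - p(A),
    "P(A) = P(A ^ B) + P(A ^ ~B)": p(A) == p(A & B) + p(A & not_B),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

A numerical check on one example is not a proof, but it makes the meaning of each identity concrete.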
Conditional Probability
• P(A|B) = fraction of worlds in which B is true that also have A true
[Figure: overlapping regions F and H]
H = “Have a headache”
F = “Coming down with flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
“Headaches are rare and flu is rarer, but if you’re coming down with flu there’s a 50-50 chance you’ll have a headache.”
Conditional Probability
H = “Have a headache”, F = “Coming down with flu”
P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2

P(H|F) = fraction of flu-inflicted worlds in which you have a headache
       = (#worlds with flu and headache) / (#worlds with flu)
       = (area of “H and F” region) / (area of “F” region)
       = P(H ^ F) / P(F)
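Rearranging the last line gives P(H ^ F) = P(H|F) * P(F), which can be worked out from the numbers on the slide. The sketch below also shows the "counting worlds" view. The population of 80 worlds is a hypothetical choice, picked only so that every count is a whole number.

```python
from fractions import Fraction

# Values given on the slide.
p_h = Fraction(1, 10)         # P(H): have a headache
p_f = Fraction(1, 40)         # P(F): coming down with flu
p_h_given_f = Fraction(1, 2)  # P(H|F)

# P(H|F) = P(H ^ F) / P(F), rearranged to solve for the joint probability.
p_h_and_f = p_h_given_f * p_f

# Counting view with a hypothetical population of 80 worlds.
n_worlds = 80
n_flu = n_worlds * p_f                      # worlds with flu
n_flu_and_headache = n_worlds * p_h_and_f   # worlds with flu and headache

# The ratio of counts recovers the conditional probability.
ratio = n_flu_and_headache / n_flu

print(p_h_and_f)  # 1/80
print(ratio)      # 1/2
```

In this hypothetical population, 2 of the 80 worlds have flu and 1 of those 2 also has a headache, which matches P(H|F) = 1/2.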