Classification. Modified by Donghui Zhang; integrated with slides from Prof. Andrew W. Moore, http://www.cs.cmu.edu/~awm/tutorials
Content: Overview of classification; Decision tree induction; Bayesian classification; Bayesian networks; Neural networks; Support vector machines (SVM)
Classification vs. Prediction. Classification predicts categorical class labels: it constructs a model from a training set whose records carry known class labels and uses the model to classify new data. Prediction models continuous-valued functions, i.e., predicts unknown or missing values. Typical applications: credit approval, target marketing, medical diagnosis, fraud detection.
Classification—A Two-Step Process. Step 1, model construction: each training tuple is assumed to belong to a predefined class given by the class label attribute; the learned model is represented as classification rules, a decision tree, or a mathematical formula. Step 2, model usage: classify future or unknown objects; estimate accuracy on a test set (compare the known label of each test sample with the model's answer); the test set must be independent of the training set, otherwise over-fitting inflates the estimate.
Classification Process (1): Model Construction. Training Data -> Classification Algorithm -> Classifier (Model), e.g., IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction. Testing Data / Unseen Data -> Classifier; e.g., (Jeff, Professor, 4) -> Tenured?
Supervised vs. Unsupervised Learning. Supervised learning (classification): the training data are accompanied by labels indicating the class of each observation, and new data are classified based on the training set. Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of measurements, the aim is to establish the existence of classes or clusters in the data.
Evaluating Classification Methods. Predictive accuracy; speed (time to construct and to use the model); robustness (handling noise and missing values); scalability (efficiency for disk-resident data); interpretability (understanding and insight provided by the model); goodness of rules (e.g., decision tree size, compactness of classification rules).
Content
Training Dataset. This follows an example from Quinlan's ID3:
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Output: A Decision Tree for "buys_computer"
age?
  <=30:   student?  (no -> no, yes -> yes)
  30..40: yes
  >40:    credit rating?  (excellent -> no, fair -> yes)
Extracting Classification Rules from Trees. Represent the knowledge as IF-THEN rules: one rule is created for each path from the root to a leaf; each attribute-value pair along the path forms a conjunct; the leaf holds the class prediction. Rules are easier for humans to understand than the tree itself. Examples: IF age = "<=30" AND student = "no" THEN buys_computer = "no"; IF age = "31..40" THEN buys_computer = "yes".
Algorithm for Decision Tree Induction. Basic greedy algorithm: the tree is constructed top-down, recursively, in divide-and-conquer manner. At the start, all training examples are at the root; attributes are assumed categorical (continuous ones are discretized in advance); examples are partitioned recursively on a selected attribute; test attributes are chosen with a heuristic or statistical measure such as information gain. Partitioning stops when all samples at a node belong to the same class, when no attributes remain (use majority voting), or when no samples are left.
Information gain slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials:  http://www.cs.cmu.edu/~awm/tutorials  . Comments and corrections gratefully received.
Bits. You are watching a stream of independent random samples of X, where X has four possible values, P(X=A) = P(X=B) = P(X=C) = P(X=D) = 1/4. You transmit the stream over a binary link; you can encode each reading with two bits, e.g., A = 00, B = 01, C = 10, D = 11.
Fewer Bits. Now someone tells you the probabilities are not equal: P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = P(X=D) = 1/8. It is possible to invent a coding that uses only 1.75 bits per symbol on average. How?
Fewer Bits. Here is one such code: A = 0, B = 10, C = 110, D = 111. Average bits per symbol = 1/2 * 1 + 1/4 * 2 + 1/8 * 3 + 1/8 * 3 = 1.75.
Fewer Bits. Suppose instead there are three equally likely values: P(X=A) = P(X=B) = P(X=C) = 1/3. A naive code (A = 00, B = 01, C = 10) uses 2 bits per symbol, but in theory 1.58496 bits per symbol (log2 3) suffice.
General Case. Suppose X can take one of m values V1..Vm with P(X=V1) = p1, ..., P(X=Vm) = pm. The smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X's distribution is H(X) = -p1 log2 p1 - p2 log2 p2 - ... - pm log2 pm = -Σj pj log2 pj. H(X) is called the entropy of X.
General Case. High entropy means X is from a uniform, boring distribution: a histogram of the frequency distribution of values of X would be flat, and the values sampled from it would be all over the place. Low entropy means X is from a varied distribution with peaks and valleys: the histogram would have many lows and one or two highs, and the sampled values would be more predictable.
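As a sanity check, a minimal Python sketch (mine, not the slides') of the entropy formula above, evaluated on the distributions from the preceding "Bits" examples:

```python
import math

def entropy(probs):
    """H(X) = -sum_j p_j * log2(p_j); terms with p_j = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits  (four equally likely values)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits (the skewed code A=0, B=10, ...)
print(entropy([1/3, 1/3, 1/3]))            # ~1.585 bits (log2 3)
```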
Entropy in a nutshell. Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl. High entropy: the values are unpredictable, almost uniformly sampled throughout our dining room.
Exercise:
Specific Conditional Entropy. Suppose I'm trying to predict output Y and I have input X. X = College Major, Y = Likes "Gladiator":
X (College Major)   Y (Likes "Gladiator")
Math                Yes
History             No
CS                  Yes
Math                No
Math                No
CS                  Yes
History             No
Math                Yes
From this data we can estimate, e.g., P(Y=Yes) = 0.5, P(X=Math) = 0.5, P(Y=Yes | X=History) = 0.
Specific Conditional Entropy. Definition: H(Y | X=v) = the entropy of Y among only those records in which X has value v.
Specific Conditional Entropy. Example, from the table above: H(Y | X=Math) = 1 (two Yes, two No), H(Y | X=History) = 0, H(Y | X=CS) = 0.
Conditional Entropy. Definition: H(Y | X) = the average specific conditional entropy of Y = if you choose a record at random, the expected conditional entropy of Y given that row's value of X = the expected number of bits to transmit Y if both sides will know the value of X = Σj P(X=vj) H(Y | X=vj).
Conditional Entropy. Example (X = College Major, Y = Likes "Gladiator"):
vj       P(X=vj)   H(Y | X=vj)
Math     0.5       1
History  0.25      0
CS       0.25      0
H(Y | X) = 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5
Information Gain. Definition: IG(Y | X) = the number of bits saved, on average, when transmitting Y if both ends of the line know X: IG(Y | X) = H(Y) - H(Y | X). Example (same table): H(Y) = 1, H(Y | X) = 0.5, so IG(Y | X) = 0.5.
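A short Python sketch (mine, not the slides') that reproduces these numbers from the Gladiator table:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = sum over v of P(X=v) * H(Y | X=v)."""
    n = len(xs)
    return sum(len([y for x, y in zip(xs, ys) if x == v]) / n *
               entropy([y for x, y in zip(xs, ys) if x == v])
               for v in set(xs))

X = ['Math', 'History', 'CS', 'Math', 'Math', 'CS', 'History', 'Math']
Y = ['Yes',  'No',      'Yes', 'No',  'No',   'Yes', 'No',     'Yes']

print(entropy(Y))                               # H(Y)    = 1.0
print(conditional_entropy(X, Y))                # H(Y|X)  = 0.5
print(entropy(Y) - conditional_entropy(X, Y))   # IG(Y|X) = 0.5
```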
What is Information Gain used for? It ranks how informative each attribute is about the quantity we want to predict: the higher IG(Y | X), the more transmitting (or knowing) X helps with Y. Decision tree induction uses it to pick the attribute to split on at each node.
Conditional entropy H(C|age). For the buys_computer training set: age <=30 has 5 records (2 yes, 3 no); age 30..40 has 4 records (all yes); age >40 has 5 records (3 yes, 2 no). So H(C|age) = 5/14 * H(2/5, 3/5) + 4/14 * 0 + 5/14 * H(3/5, 2/5) = (10/14) * 0.971 = 0.694.
Select the attribute with lowest conditional entropy, i.e., the highest information gain. Here H(C|age) = 0.694 beats H(C|income) = 0.911, H(C|student) = 0.789, and H(C|credit_rating) = 0.892, so age becomes the root; recursing on each branch yields the tree shown earlier (a code sketch of this recursion follows).
age?
  <=30:   student?  (no -> no, yes -> yes)
  30..40: yes
  >40:    credit rating?  (excellent -> no, fair -> yes)
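The selection rule above is the heart of ID3. A compact, hedged sketch under my own representation (each record a dict of attribute values, labels in a parallel list; not the deck's code), which should reproduce the buys_computer tree when fed Quinlan's table:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def cond_entropy(rows, labels, attr):
    n = len(rows)
    h = 0.0
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        h += len(sub)/n * entropy(sub)
    return h

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                 # all samples in one class: leaf
        return labels[0]
    if not attrs:                             # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = min(attrs, key=lambda a: cond_entropy(rows, labels, a))
    tree = {}
    for v in set(r[best] for r in rows):      # one subtree per attribute value
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[(best, v)] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return tree
```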
Goodness in Decision Tree Induction
Scalable Decision Tree Induction Methods in Data Mining Studies. SLIQ (Mehta et al., EDBT'96): builds an index for each attribute, keeping only the class list and the current attribute list in memory. SPRINT (Shafer et al., VLDB'96): constructs an attribute-list data structure. PUBLIC (Rastogi and Shim, VLDB'98): integrates tree splitting and tree pruning. RainForest (Gehrke, Ramakrishnan and Ganti, VLDB'98): separates the scalability aspects from the splitting criteria.
Visualization of a Decision Tree in SGI/MineSet 3.0
Content
Bayesian Classification: Why? Probabilistic learning: compute explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems. Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data. Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities. Standard: even when computationally intractable, Bayesian methods provide a standard of optimal decision making against which other methods can be measured.
Bayesian Classification. The classification problem may be formalized using a-posteriori probabilities: P(C|X) = the probability that the sample tuple X = <x1, ..., xk> belongs to class C. Idea: assign to X the class label C such that P(C|X) is maximal.
Bayesian Theorem. Given a data sample X, the posterior probability of a hypothesis h is P(h|X) = P(X|h) P(h) / P(X). The MAP (maximum a posteriori) hypothesis is argmax_h P(h|X) = argmax_h P(X|h) P(h), since P(X) is the same for every h. Practical difficulty: this requires initial knowledge of many probabilities, and so significant computational cost.
Basic Idea. To classify X, estimate the a-posteriori probability of each class: P(C|X) = P(X|C) P(C) / P(X). P(X) is constant for all classes, so it suffices to maximize P(X|C) P(C); P(C) is estimated as the relative frequency of class C in the training data. The problem: computing P(X|C) directly is infeasible, which motivates the naive assumption below.
Naïve Bayes Classifier. Naive assumption: the attributes are conditionally independent given the class, so P(X|C) = Πi P(xi|C). This greatly reduces the computation cost: only the class distribution and the per-attribute conditionals P(xi|C) need to be estimated, and X is assigned to the class maximizing P(X|C) P(C).
Sample quiz questions
Naïve Bayesian Classifier: Example. For the buys_computer data, classify the unseen sample X = (age <= 30, income = medium, student = yes, credit_rating = fair). From the 14 training records: P(yes) = 9/14, P(no) = 5/14; P(age<=30 | yes) = 2/9, P(income=medium | yes) = 4/9, P(student=yes | yes) = 6/9, P(credit=fair | yes) = 6/9; P(age<=30 | no) = 3/5, P(income=medium | no) = 2/5, P(student=yes | no) = 1/5, P(credit=fair | no) = 2/5. Then P(X | yes) P(yes) = 0.044 * 9/14 = 0.028 and P(X | no) P(no) = 0.019 * 5/14 = 0.007, so X is classified buys_computer = yes. Pitfall: forgetting to multiply by the prior P(Ci).
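A minimal naive Bayes sketch (my code, not the slides'): estimate P(Ci) and P(xk|Ci) by counting, then pick the class maximizing P(Ci) * prod_k P(xk|Ci).

```python
from collections import Counter, defaultdict

def train(rows, labels):
    priors = Counter(labels)
    conds = defaultdict(Counter)          # (class, attr) -> Counter of values
    for r, c in zip(rows, labels):
        for a, v in r.items():
            conds[(c, a)][v] += 1
    return priors, conds, len(labels)

def classify(x, priors, conds, n):
    best, best_p = None, -1.0
    for c, nc in priors.items():
        p = nc / n                        # the prior P(Ci): do not forget it!
        for a, v in x.items():
            p *= conds[(c, a)][v] / nc    # P(xk|Ci); real code would smooth zero counts
        if p > best_p:
            best, best_p = c, p
    return best
```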
Naïve Bayesian Classifier: Comments. Advantages: easy to implement; good results obtained in most of the cases. Disadvantages: the conditional-independence assumption is often violated in practice, causing a loss of accuracy, because dependencies do exist among attributes (e.g., a patient's symptoms depend jointly on age and disease). How to deal with these dependencies: Bayesian belief networks (next section), or classifiers that reason over attributes jointly such as decision trees.
Content
Bayesian Networks slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
What we'll discuss
Why this matters. Bayesian networks are one of the most important conceptual tools for reasoning under uncertainty: they compactly represent a joint distribution and support inference (computing the probability of some variables given others), anomaly detection, and active data collection (deciding which observation would be most informative).
Ways to deal with Uncertainty
Discrete Random Variables. A is a Boolean-valued random variable if A denotes an event and there is some degree of uncertainty as to whether A occurs. Examples: A = you wake up tomorrow with a headache; A = it rains during the lecture.
Probabilities
Visualizing A. Event space of all possible worlds; its area is 1. Worlds in which A is true form an oval; worlds in which A is false lie outside it. P(A) = area of the oval.
Interpreting the axioms. The axioms: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B). First: the area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
Interpreting the axioms. The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.
Interpreting the axioms. P(A or B) = P(A) + P(B) - P(A and B): the area of the union of ovals A and B follows by simple addition and subtraction, since P(A) + P(B) counts the overlap P(A and B) twice.
These Axioms are Not to be Trifled With
Theorems from the Axioms. From 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) - P(A and B), we can prove: P(not A) = P(~A) = 1 - P(A).
Another important theorem. From the same axioms we can also prove: P(A) = P(A ^ B) + P(A ^ ~B).
Conditional Probability. P(A|B) = the fraction of worlds in which B is true that also have A true. H = "Have a headache", F = "Coming down with Flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
Conditional Probability. H = "Have a headache", F = "Coming down with Flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. P(H|F) = fraction of flu-afflicted worlds in which you have a headache = (#worlds with flu and headache) / (#worlds with flu) = (area of "H and F" region) / (area of "F" region) = P(H ^ F) / P(F).
Definition of Conditional Probability: P(A|B) = P(A ^ B) / P(B). Corollary (the Chain Rule): P(A ^ B) = P(A|B) P(B).
Bayes Rule. P(A|B) = P(A ^ B) / P(B) = P(B|A) P(A) / P(B). Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418
Using Bayes Rule to Gamble. The "Win" envelope has a dollar and four beads in it (R R B B); the "Lose" envelope has three beads (R B B) and no money. Trivial question: someone draws an envelope at random and offers to sell it to you. How much should you pay? (Expected value: $0.50.)
Using Bayes Rule to Gamble. Interesting question: before deciding, you are allowed to see one bead drawn from the chosen envelope. Suppose it's black: how much should you pay? Suppose it's red: how much?
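Assuming the bead contents read off the previous slide (Win: two red and two black beads plus the dollar; Lose: one red and two black), Bayes rule gives:

```latex
P(\text{Win}\mid\text{black})
  = \frac{P(\text{black}\mid\text{Win})\,P(\text{Win})}
         {P(\text{black}\mid\text{Win})\,P(\text{Win})
          + P(\text{black}\mid\text{Lose})\,P(\text{Lose})}
  = \frac{\frac{2}{4}\cdot\frac{1}{2}}
         {\frac{2}{4}\cdot\frac{1}{2} + \frac{2}{3}\cdot\frac{1}{2}}
  = \frac{3}{7} \approx 0.43
\qquad
P(\text{Win}\mid\text{red})
  = \frac{\frac{2}{4}\cdot\frac{1}{2}}
         {\frac{2}{4}\cdot\frac{1}{2} + \frac{1}{3}\cdot\frac{1}{2}}
  = \frac{3}{5} = 0.60
```

So under these assumed bead counts you should pay up to about $0.43 after seeing a black bead, and up to $0.60 after seeing a red one.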
Another Example
Multivalued Random Variables. Suppose A can take on more than two values: A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus P(A=vi ^ A=vj) = 0 if i != j, and P(A=v1 or A=v2 or ... or A=vk) = 1.
An easy fact about Multivalued Random Variables. Using the axioms of probability and the facts above, it is easy to prove that P(A=v1 or A=v2 or ... or A=vi) = Σ_{j=1..i} P(A=vj), and therefore, taking i = k: Σ_{j=1..k} P(A=vj) = 1.
Another fact about Multivalued Random Variables. Using the same axioms, P(B ^ (A=v1 or A=v2 or ... or A=vi)) = Σ_{j=1..i} P(B ^ A=vj), and therefore: P(B) = Σ_{j=1..k} P(B ^ A=vj).
More General Forms of Bayes Rule
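The formulas on this slide were images lost in extraction; the standard general forms (reconstructed, not copied from the slide) are:

```latex
P(A \mid B \wedge X) = \frac{P(B \mid A \wedge X)\, P(A \mid X)}{P(B \mid X)}
\qquad
P(A{=}v_i \mid B) = \frac{P(B \mid A{=}v_i)\, P(A{=}v_i)}
                         {\sum_{k=1}^{n_A} P(B \mid A{=}v_k)\, P(A{=}v_k)}
```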
Useful Easy-to-prove facts. P(A|B) + P(~A|B) = 1, and Σ_{k=1..nA} P(A=vk | B) = 1.
From Probability to Bayesian Net
The Joint Distribution Recipe for making a joint distribution of M variables: Example: Boolean variables A, B, C
The Joint Distribution. Recipe step 1: make a truth table listing all combinations of values of your variables (M Boolean variables give 2^M rows). Example (Boolean A, B, C):
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
The Joint Distribution. Recipe step 2: for each combination of values, say how probable it is:
A B C  Prob
0 0 0  0.30
0 0 1  0.05
0 1 0  0.10
0 1 1  0.05
1 0 0  0.05
1 0 1  0.10
1 1 0  0.25
1 1 1  0.10
The Joint Distribution. Recipe step 3: if you subscribe to the axioms of probability, those numbers must sum to 1 (the eight above do). The same table can be drawn as a Venn diagram over A, B, C with one probability per region.
Using the Joint. Once you have the JD you can ask for the probability of any logical expression E involving your attributes: P(E) = Σ over rows matching E of P(row).
Using the Joint. P(Poor ^ Male) = 0.4654
Using the Joint P(Poor) = 0.7604
Inference with the Joint. P(E1 | E2) = P(E1 ^ E2) / P(E2) = (Σ over rows matching E1 and E2 of P(row)) / (Σ over rows matching E2 of P(row)).
Inference with the Joint P( Male  |  Poor ) = 0.4654 / 0.7604 = 0.612
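A sketch of this row-summing inference in Python (my encoding: the joint as a dict from assignments to probabilities). The male/rich and female splits below are hypothetical, chosen only so the two figures quoted on the slides (0.7604 and 0.612) come out right:

```python
def prob(joint, pred):
    """P(E) = sum of the probabilities of rows matching predicate E."""
    return sum(p for row, p in joint.items() if pred(row))

def cond_prob(joint, e1, e2):
    """P(E1|E2) as the ratio of two row sums, exactly as on the slide."""
    return prob(joint, lambda r: e1(r) and e2(r)) / prob(joint, e2)

joint = {('male', 'poor'): 0.4654, ('male', 'rich'): 0.1164,      # hypothetical
         ('female', 'poor'): 0.2950, ('female', 'rich'): 0.1232}  # hypothetical

print(prob(joint, lambda r: r[1] == 'poor'))        # P(Poor) = 0.7604
print(cond_prob(joint, lambda r: r[0] == 'male',
                lambda r: r[1] == 'poor'))          # P(Male | Poor) ~ 0.612
```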
Joint distributions. Good news: once you have a joint distribution, you can answer any probabilistic question about the domain. Bad news: for M Boolean variables you must write down 2^M numbers (2^M - 1 of them independent), so building the joint is impractical beyond a handful of attributes.
Using fewer numbers. Suppose there are two Boolean events, M and S. Their joint distribution has four entries; since the entries sum to 1, three numbers suffice. Can we get away with even fewer? What extra assumption can you make?
Independence. Assume P(M|S) = P(M): knowing S tells us nothing about M. This is the definition of independence: "M and S are independent."
Independence. Independence is symmetric: P(M|S) = P(M) implies P(S|M) = P(S), and also P(M ^ S) = P(M) P(S).
Independence. From P(M|S) = P(M) the axioms also give P(~M|S) = P(~M) and P(M|~S) = P(M). And in general: P(M=u ^ S=v) = P(M=u) P(S=v) for each of the four combinations of u=True/False, v=True/False.
Independence. From these statements, we can derive the full joint pdf:
M S  Prob
T T  P(M) P(S)
T F  P(M) P(~S)
F T  P(~M) P(S)
F F  P(~M) P(~S)
And since we now have the joint pdf, we can make any queries we like.
A more interesting case. Suppose we have three events: M, S, and a third event L. M and S are independent of each other, but L depends directly on both, so we must specify P(L | M ^ S), P(L | M ^ ~S), P(L | ~M ^ S), and P(L | ~M ^ ~S) along with P(M) and P(S).
A more interesting case. P(S | M) = P(S); P(S) = 0.3; P(M) = 0.6; P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2. Now we can derive a full joint p.d.f. with a "mere" six numbers instead of seven* (*savings are larger for larger numbers of variables).
A more interesting case. Question: express P(L=x ^ M=y ^ S=z) in terms that only need the above expressions, where x, y and z may each be True or False. (Answer: P(L=x ^ M=y ^ S=z) = P(L=x | M=y ^ S=z) P(M=y) P(S=z), by the chain rule and the independence of M and S.)
A bit of notation. Diagram: parentless nodes S and M, each with an arrow into L. P(S) = 0.3; P(M) = 0.6; P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2. P(S | M) = P(S).
A bit of notation. (Same diagram and tables.) Read the absence of an arrow between S and M to mean "it would not help me predict M if I knew the value of S". Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S. This kind of stuff will be thoroughly formalized later.
An even cuter trick
Conditional independence. Suppose variables L and R both depend on M but not directly on each other: given knowledge of M, knowing anything else in the diagram won't help us with L, etc. This is notated by the diagram L <- M -> R.
Conditional Independence formalized. R and L are conditionally independent given M if, for all x, y, z in {T, F}: P(R=x | M=y ^ L=z) = P(R=x | M=y). More generally: sets of variables S1 and S2 are conditionally independent given S3 if, for all assignments of values to the variables in the sets, P(S1's assignments | S2's assignments ^ S3's assignments) = P(S1's assignments | S3's assignments).
Example: "Shoe-size is conditionally independent of glove-size given height, weight and age" means: for all s, g, h, w, a: P(ShoeSize=s | Height=h, Weight=w, Age=a) = P(ShoeSize=s | Height=h, Weight=w, Age=a, GloveSize=g).
Example: "Shoe-size is conditionally independent of glove-size given height, weight and age" does not mean: for all s, g, h: P(ShoeSize=s | Height=h) = P(ShoeSize=s | Height=h, GloveSize=g).
Conditional independence. The diagram L <- M -> R asserts "R and L are conditionally independent given M"; to pin down the distribution we supply P(M), P(L|M), P(L|~M), P(R|M) and P(R|~M).
Conditional independence. L <- M -> R with conditional independence: P(R | M, L) = P(R | M) and P(R | ~M, L) = P(R | ~M). Again, we can obtain any member of the joint prob dist that we desire: P(L=x ^ R=y ^ M=z) = P(M=z) P(L=x | M=z) P(R=y | M=z).
Assume five variables: T, L, R, M, S. T is directly influenced only by L (T is conditionally independent of R, M, S given L); L is directly influenced only by M and S (L is conditionally independent of R given M and S); R is directly influenced only by M (R is conditionally independent of L, S given M); M and S are independent.
Making a Bayes net. Step one: add your variables as nodes: S, M, R, L, T.
Making a Bayes net. Step two: add links; a link from X to Y means X directly influences Y: S -> L, M -> L, M -> R, L -> T.
Making a Bayes net. Step three: add a conditional probability table for each node, giving P(node | each assignment of its parents): P(S) = 0.3; P(M) = 0.6; P(R | M) = 0.3; P(R | ~M) = 0.6; P(T | L) = 0.3; P(T | ~L) = 0.8; P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2.
Making a Bayes net. (Same net and tables.) Note: two unconnected variables may still be correlated; what the structure asserts is that each node is conditionally independent of its non-descendants, given its parents.
Bayes Nets Formalized. A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair (V, E) where: V is a set of vertices; E is a set of directed edges joining vertices, with no loops of any length allowed; and each vertex contains the name of a random variable plus a conditional probability table P(vertex | each assignment of its parents).
Building a Bayes Net. 1. Choose a set of relevant variables. 2. Choose an ordering for them, X1..Xn. 3. For i = 1 to n: add node Xi to the net; set Parents(Xi) to a minimal subset of {X1..Xi-1} such that Xi is conditionally independent of all other members of {X1..Xi-1} given Parents(Xi); define the probability table P(Xi | assignments of Parents(Xi)).
Computing a Joint Entry. How to compute an entry of the joint distribution, e.g. P(S ^ M ^ L ^ R ^ T)? Apply the chain rule in an order where parents come first, then use the conditional independences: every factor collapses to a CPT entry. (Tables: P(S) = 0.3; P(M) = 0.6; P(R | M) = 0.3; P(R | ~M) = 0.6; P(T | L) = 0.3; P(T | ~L) = 0.8; P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2.)
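A sketch (my code, using the CPTs from this slide) of the joint-entry computation: multiply each node's CPT entry given its parents' values, per the chain rule.

```python
def bern(q, x):
    """P(X=x) for a Boolean X with P(X=True) = q."""
    return q if x else 1 - q

def joint(s, m, l, r, t):
    return (bern(0.3, s) *                                 # P(S)
            bern(0.6, m) *                                 # P(M)
            bern({(1, 1): 0.05, (1, 0): 0.1,
                  (0, 1): 0.1,  (0, 0): 0.2}[(m, s)], l) * # P(L | M, S)
            bern(0.3 if m else 0.6, r) *                   # P(R | M)
            bern(0.3 if l else 0.8, t))                    # P(T | L)

print(joint(1, 1, 1, 1, 1))   # P(S^M^L^R^T) = 0.3*0.6*0.05*0.3*0.3 = 0.00081
```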
Computing with Bayes Net. (Tables as above.) Each joint entry factorizes over the net; e.g. P(T ^ ~R ^ L ^ ~M ^ S) = P(T | L) P(~R | ~M) P(L | ~M ^ S) P(~M) P(S) = 0.3 * 0.4 * 0.1 * 0.4 * 0.3.
The general case. P(X1=x1 ^ X2=x2 ^ ... ^ Xn=xn) = Π_{i=1..n} P(Xi=xi | assignments of Parents(Xi)). So any entry in the joint pdf table can be computed. And so any conditional probability can be computed.
Where are we now? We have a methodology for building Bayes nets; we need no exponential storage (each CPT is exponential only in its own node's number of parents); any joint entry can be computed in time linear in the number of variables; and any conditional query can be answered by summing joint entries. E.g., what could we do to compute P(R | T, ~S)? Step 1: compute P(R ^ T ^ ~S), the sum of the 4 joint entries (over L and M) matching R ^ T ^ ~S. Step 2: compute P(T ^ ~S), the sum of the 8 matching joint entries. Step 3: return P(R ^ T ^ ~S) / P(T ^ ~S). Each summand is obtained by the "computing a joint entry" method of the earlier slides.
The good news. We can do inference: any conditional probability P(E1 | E2) is a ratio of two sums of joint entries. Suppose you have m binary-valued variables in your Bayes net and expression E2 mentions k variables. How much work is the above computation? (The denominator alone sums over 2^(m-k) joint entries, so the cost is still exponential in the number of unmentioned variables.)
The sad, bad news. Inference by enumerating joint entries is exponential, and in fact the general case of Bayes net inference is NP-hard. There is partial relief: smarter algorithms exploit the network structure, and in important special cases, such as polytrees, exact inference takes only linear time.
Bayes nets inference algorithms. A poly tree is a directed acyclic graph in which no two nodes have more than one undirected path between them; the S, M, L, R, T net is a poly tree, and exact inference on poly trees runs in time linear in the network size. A network in which some pair of nodes has two undirected paths between them is not a poly tree (but is still a legal Bayes net), and inference is harder.
Sampling from the Joint Distribution. A Bayes net is also a recipe for generating random records drawn from its joint distribution. (Tables: P(S) = 0.3; P(M) = 0.6; P(R | M) = 0.3; P(R | ~M) = 0.6; P(T | L) = 0.3; P(T | ~L) = 0.8; P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2.)
Sampling from the Joint Distribution. (Tables as above.) 1. Randomly sample S according to P(S). 2. Sample M according to P(M). 3. Sample L according to P(L | the sampled values of M and S). 4. Sample R according to P(R | the sampled M). 5. Sample T according to P(T | the sampled L).
A general sampling algorithm. Number the nodes X1..Xn so that parents come before children (a topological order, always possible in a DAG). For i = 1 to n: sample xi from P(Xi | the already-sampled values of Parents(Xi)). The record (x1..xn) is an exact draw from the joint distribution; see the sketch below.
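A Python sketch of this procedure for the S, M, L, R, T net (my code; the order S, M, L, R, T is topological for this net):

```python
import random

def sample_net():
    x = {}
    x['S'] = random.random() < 0.3
    x['M'] = random.random() < 0.6
    p_l = {(True, True): 0.05, (True, False): 0.1,
           (False, True): 0.1, (False, False): 0.2}[(x['M'], x['S'])]
    x['L'] = random.random() < p_l                      # P(L | M, S)
    x['R'] = random.random() < (0.3 if x['M'] else 0.6) # P(R | M)
    x['T'] = random.random() < (0.3 if x['L'] else 0.8) # P(T | L)
    return x

# Estimate any probability by counting over many samples:
samples = [sample_net() for _ in range(100_000)]
print(sum(s['T'] for s in samples) / len(samples))      # Monte Carlo estimate of P(T)
```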
Stochastic Simulation Example
General Stochastic Simulation. To estimate P(E1 | E2): generate many random records from the net, discard the ones in which E2 does not hold, and return the fraction of survivors in which E1 holds. Problem: if P(E2) is small, almost all samples are rejected and the estimate is noisy.
Likelihood weighting. Idea to fix the rejection problem: do not sample the evidence variables at all. Generate each record with the evidence variables clamped to their observed values, and give the record a weight equal to the probability that the clamped values would have been sampled given the values of their parents.
Likelihood weighting. Estimate P(E1 | E2) as Σ(weights of records where E1 holds) / Σ(all weights); a sketch follows.
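A hedged sketch of likelihood weighting on the same net (my code, not the slides'), estimating P(R | T):

```python
import random

def weighted_sample(evidence):           # evidence: dict such as {'T': True}
    x, w = {}, 1.0
    x['S'] = random.random() < 0.3
    x['M'] = random.random() < 0.6
    p_l = {(True, True): 0.05, (True, False): 0.1,
           (False, True): 0.1, (False, False): 0.2}[(x['M'], x['S'])]
    x['L'] = random.random() < p_l
    x['R'] = random.random() < (0.3 if x['M'] else 0.6)
    p_t = 0.3 if x['L'] else 0.8
    if 'T' in evidence:                  # clamped: multiply in its likelihood
        x['T'] = evidence['T']
        w *= p_t if evidence['T'] else 1 - p_t
    else:
        x['T'] = random.random() < p_t
    return x, w

num = den = 0.0
for _ in range(100_000):
    x, w = weighted_sample({'T': True})
    den += w
    num += w * x['R']
print(num / den)                         # estimate of P(R | T)
```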
Case Study I
Questions
What you should know
Content
Neural Networks. Advantages: prediction accuracy is generally high; robust when training examples contain errors; output may be discrete or real-valued. Criticism: long training time; the learned weights are difficult to interpret; domain knowledge is hard to incorporate.
A Neuron. The n-dimensional input vector x = (x0..xn) is combined with the weight vector w = (w0..wn) in a weighted sum, a bias μk is subtracted, and a nonlinear activation function f produces the output: y = f(Σi wi xi - μk).
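A minimal sketch of this unit (mine; a sign activation is assumed, any nonlinearity could be substituted for f):

```python
def neuron(x, w, mu):
    s = sum(wi * xi for wi, xi in zip(w, x)) - mu   # weighted sum minus bias
    return 1 if s >= 0 else -1                      # activation function f

print(neuron([1.0, 0.5], [0.8, -0.2], 0.3))         # 1, since 0.8 - 0.1 - 0.3 = 0.4 >= 0
```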
Multi-Layer Perceptron. Diagram: the input vector xi feeds the input nodes; weighted connections wij lead to hidden nodes; the hidden nodes feed the output nodes, which produce the output vector.
Network Training. The ultimate objective of training is to obtain a set of weights that classifies almost all tuples in the training data correctly. Steps: initialize the weights with random values; feed the input tuples into the network one by one; for each unit, compute the net input as a linear combination of the unit's inputs and the output by applying the activation function; compute the error, then propagate it backwards, adjusting the weights and biases (backpropagation).
Network Pruning and Rule Extraction. Network pruning: a fully connected network is hard to articulate (n input, h hidden and m output nodes give h(m+n) weights); prune links and units whose removal does not hurt accuracy. Rule extraction: discretize the activation values, cluster the hidden-unit activations, and derive rules relating the discretized inputs to the outputs.
Content
Linear Support Vector Machines. Training points belong to two classes: value = -1 (e.g., does not buy computer) and value = +1 (e.g., buys computer). An SVM separates them with a hyperplane chosen to maximize the margin.
Linear Support Vector Machines. Among all separating hyperplanes, prefer the one with the large margin over the one with the small margin; the training points closest to the chosen hyperplane are the support vectors.
Linear Support Vector Machines. The separating hyperplane is w·x + b = 0; the margin is bounded by the two parallel hyperplanes w·x + b = -1 and w·x + b = +1, on which the support vectors lie. A point is classified by the sign of w·x + b.
Linear Support Vector Machines. The margin M is the distance between the hyperplanes w·x + b = -1 and w·x + b = +1; a little geometry gives M = 2 / ||w||.
Linear Support Vector Machines. Maximizing the margin 2/||w|| is equivalent to minimizing ||w||^2 / 2 subject to yi (w·xi + b) >= 1 for every training point (xi, yi); this quadratic program is determined entirely by the support vectors. A hedged training sketch follows.
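The slides name no library; a sketch assuming scikit-learn is available, fitting a maximum-margin linear SVM on toy data and reading off w, b, the support vectors, and the margin:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],      # class +1
              [0, 0], [1, 0], [0, 1]])     # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # very large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)                                   # hyperplane w.x + b = 0
print(clf.support_vectors_)                   # points lying on w.x + b = +/-1
print(2 / np.linalg.norm(w))                  # margin M = 2 / ||w||
```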
SVM – Cont. What if the data are not linearly separable? Example: points at -1, 0, +1 on a line labeled +, -, +. Idea: map the data into a higher-dimensional feature space in which it becomes linearly separable, then fit a linear SVM there.
Non-Linear SVM. Classification still uses (w, b): in the non-linear case the decision depends on the data only through dot products, so we replace the dot product with a kernel K(xi, xj), which can be thought of as doing the dot product in some high-dimensional feature space.
Example of Non-linear SVM
Results
SVM vs. Neural Network. SVM: deterministic training; generalization properties backed by margin theory; learning requires solving a quadratic program; kernels allow very complex functions to be learned. Neural network: nondeterministic training (results depend on initialization and local minima); generalizes well in practice but with weaker theory; easy to train incrementally; learns complex functions via multilayer architectures.
SVM Related Links
Content
Bagging and Boosting. Basic idea: instead of learning one classifier C from the data with a classification method (CM), apply the CM to different samples or weightings of the data to learn several classifiers C1, C2, ..., then combine their predictions into a single classifier C*.
Bagging. Given a training set S of s samples: generate a bootstrap sample T from S by drawing s examples uniformly with replacement (some examples appear several times, some not at all); build a classifier on T; repeat t times to obtain classifiers C1..Ct. To classify a new instance, let every Ci vote and return the majority label. A sketch follows.
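A minimal bagging sketch (my representation: a learner is a function taking rows and labels and returning a predict function; not the deck's code):

```python
import random
from collections import Counter

def bag(rows, labels, learner, t=25):
    models = []
    n = len(rows)
    for _ in range(t):
        idx = [random.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(learner([rows[i] for i in idx],
                              [labels[i] for i in idx]))
    return models

def vote(models, x):
    """Majority vote of the t bootstrap classifiers."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```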
Boosting Technique — Algorithm. Assign every training example an equal weight 1/N. For t = 1, 2, ..., T: build a classifier ht under the current weights; increase the weights of the examples ht misclassifies, so the next round concentrates on the hard cases. Output a combined classifier whose prediction is a weighted vote of h1..hT.
Summary. Classification is an extensively studied problem and probably one of the most widely used data mining techniques. Effective methods include decision trees, Bayesian classifiers and Bayesian networks, neural networks, support vector machines, and ensembles (bagging and boosting). Scalability to large databases remains an important issue.