Generative Artificial Intelligence: How generative AI works
5. Classification Process (1): Model Construction. Classification algorithms are applied to the training data to build a classifier (model); for example, the learned model might be the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
6. Classification Process (2): Use the Model in Prediction. The classifier is first evaluated on testing data and then applied to unseen data, e.g. (Jeff, Professor, 4): tenured?
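In code, the two steps might look like the following minimal sketch. The tiny tenure dataset, the numeric rank encoding, and the choice of scikit-learn's DecisionTreeClassifier are my own illustration, not from the slides; any classifier with a fit/predict interface would do.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training data: (rank, years) -> tenured.
    # rank encoded as 0 = assistant, 1 = associate, 2 = professor.
    X_train = [[2, 7], [2, 2], [1, 8], [0, 3], [1, 4], [2, 6]]
    y_train = ["yes", "yes", "yes", "no", "no", "yes"]

    # Step 1: model construction -- learn a classifier from the training data.
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Step 2: prediction -- apply the model to unseen data,
    # e.g. Jeff, a professor with 4 years of service.
    print(model.predict([[2, 4]]))  # the rule above would say 'yes'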
11. Output: A Decision Tree for "buys_computer". The root node tests age?: on the <=30 branch the tree tests student? (no → no, yes → yes); on the 30..40 branch it predicts yes; on the >40 branch it tests credit rating? (excellent → no, fair → yes).
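Written out as nested conditionals, the tree above becomes the following minimal sketch (the function name and argument types are mine):

    def buys_computer(age, student, credit_rating):
        """Decision tree from the slide, written as nested if/else."""
        if age <= 30:
            return "yes" if student == "yes" else "no"
        elif age <= 40:                # the 30..40 branch always predicts yes
            return "yes"
        else:                          # age > 40: decide on credit rating
            return "no" if credit_rating == "excellent" else "yes"

    print(buys_computer(25, "yes", "fair"))      # -> yes
    print(buys_computer(45, "no", "excellent"))  # -> no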
14. Information gain slides adapted from Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, www.cs.cmu.edu/~awm, [email_address], 412-268-7599. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.
23. Entropy in a nutshell. High entropy: the values (locations of soup) are unpredictable, almost uniformly sampled throughout our dining room. Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl.
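To make the intuition concrete, here is a minimal Python sketch; discretizing the room into a handful of possible soup locations is my own illustration, not from the slides.

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum(p * log2(p))."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # High entropy: the soup is almost uniformly likely to be at any
    # of 16 spots in the dining room.
    print(entropy([1/16] * 16))               # 4.0 bits
    # Low entropy: the soup is almost always in the bowl.
    print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits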
26. Specific Conditional Entropy. Definition of Specific Conditional Entropy: H(Y | X=v) = the entropy of Y among only those records in which X has value v. Example data, with X = College Major and Y = Likes "Gladiator":
    X        Y
    Math     Yes
    History  No
    CS       Yes
    Math     No
    Math     No
    CS       Yes
    History  No
    Math     Yes
28. Conditional Entropy. Definition of Conditional Entropy: H(Y | X) = the average specific conditional entropy of Y = if you choose a record at random, the conditional entropy of Y conditioned on that row's value of X = the expected number of bits to transmit Y if both sides will know the value of X = Σ_j Prob(X=v_j) H(Y | X=v_j). (Same X = College Major, Y = Likes "Gladiator" table as slide 26.)
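As a sanity check, the following minimal sketch (helper names are mine) computes these quantities for the table above: H(Y|X=Math) = 1 bit, H(Y|X=History) = H(Y|X=CS) = 0, so H(Y|X) = 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5 bits.

    import math
    from collections import Counter

    data = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"),
            ("Math", "No"), ("Math", "No"), ("CS", "Yes"),
            ("History", "No"), ("Math", "Yes")]

    def H(values):
        """Shannon entropy (bits) of a list of symbols."""
        n = len(values)
        return -sum(c/n * math.log2(c/n) for c in Counter(values).values())

    def H_given(data, v):
        """Specific conditional entropy H(Y | X = v)."""
        return H([y for x, y in data if x == v])

    # H(Y | X) = sum over v of P(X = v) * H(Y | X = v)
    xs = [x for x, _ in data]
    H_Y_given_X = sum(xs.count(v) / len(xs) * H_given(data, v) for v in set(xs))
    print(H_given(data, "Math"))  # 1.0
    print(H_Y_given_X)            # 0.5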
47. Bayesian Networks slides adapted from Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, www.cs.cmu.edu/~awm, [email_address], 412-268-7599. (The same note to other teachers and users from slide 14 applies: feel free to reuse or modify these slides, and please retain this attribution and the link to http://www.cs.cmu.edu/~awm/tutorials.)
53. Visualizing A. The event space of all possible worlds has area 1; it divides into worlds in which A is true and worlds in which A is false. P(A) = the area of the (reddish) oval of worlds in which A is true.
62. Conditional Probability. H = "Have a headache", F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. P(H|F) = fraction of flu-inflicted worlds in which you have a headache = (#worlds with flu and headache) / (#worlds with flu) = (area of "H and F" region) / (area of "F" region) = P(H ^ F) / P(F).
63. Definition of Conditional Probability: P(A|B) = P(A ^ B) / P(B). Corollary (the Chain Rule): P(A ^ B) = P(A|B) P(B).
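Plugging the headache/flu numbers from slide 62 into these formulas gives a quick worked check; a minimal sketch using exact fractions.

    from fractions import Fraction

    P_F = Fraction(1, 40)         # P(F): coming down with flu
    P_H_given_F = Fraction(1, 2)  # P(H|F): headache given flu

    # Chain Rule: P(H ^ F) = P(H|F) * P(F)
    P_H_and_F = P_H_given_F * P_F
    print(P_H_and_F)              # 1/80

    # Inverting the definition recovers the conditional probability:
    print(P_H_and_F / P_F)        # 1/2 = P(H|F)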
98. A bit of notation. (Diagram: nodes S, M, and L, with arrows S → L and M → L.) P(S) = 0.3, P(M) = 0.6, P(S | M) = P(S), P(L | M ^ S) = 0.05, P(L | M ^ ~S) = 0.1, P(L | ~M ^ S) = 0.1, P(L | ~M ^ ~S) = 0.2.
99. A bit of notation (continued; same network and probabilities as slide 98). Read the absence of an arrow between S and M to mean "it would not help me predict M if I knew the value of S". Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S. This kind of stuff will be thoroughly formalized later.
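Because S and M have no arrow between them (P(S | M) = P(S)), the joint distribution factorizes as P(S ^ M ^ L) = P(S) P(M) P(L | M ^ S). A minimal sketch that uses the table above to compute P(L) by summing out S and M:

    P_S = 0.3
    P_M = 0.6
    # P(L | M ^ S), indexed by the truth values (M, S).
    P_L_given = {(True, True): 0.05, (True, False): 0.1,
                 (False, True): 0.1, (False, False): 0.2}

    # Sum the factorized joint P(S) P(M) P(L | M ^ S) over all values of M, S.
    P_L = sum((P_M if m else 1 - P_M) * (P_S if s else 1 - P_S)
              * P_L_given[(m, s)]
              for m in (True, False) for s in (True, False))
    print(P_L)  # 0.009 + 0.042 + 0.012 + 0.056 = 0.119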
140. A Neuron. (Diagram: an input vector x = (x_0, x_1, ..., x_n) and a weight vector w = (w_0, w_1, ..., w_n) feed a weighted sum; a bias term μ_k is subtracted, and the result passes through an activation function f to produce the output y.)
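A minimal sketch of the diagram as code; the slide only names the pieces, so the subtracted bias term and the sigmoid activation here are conventions I am assuming.

    import math

    def neuron(x, w, bias, f):
        """Weighted sum of inputs minus a bias, passed through activation f."""
        return f(sum(wi * xi for wi, xi in zip(w, x)) - bias)

    # Example: a sigmoid neuron with a 3-dimensional input vector.
    sigmoid = lambda s: 1 / (1 + math.exp(-s))
    print(neuron([1.0, 0.5, -0.2], [0.4, 0.3, 0.9], bias=0.1, f=sigmoid))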
151. Non-Linear SVM. Classification using an SVM (w, b): in the non-linear case, the kernel can be thought of as doing a dot product in some high-dimensional feature space.
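A minimal sketch of that idea, assuming a degree-2 polynomial kernel k(x, z) = (x . z)^2: the kernel value equals an ordinary dot product after an explicit quadratic feature map, so the high-dimensional vectors never need to be built.

    import itertools

    def phi(x):
        """Explicit quadratic feature map: all pairwise products x_i * x_j."""
        return [xi * xj for xi, xj in itertools.product(x, repeat=2)]

    def kernel(x, z):
        """Degree-2 polynomial kernel: (x . z)^2."""
        return sum(xi * zi for xi, zi in zip(x, z)) ** 2

    x, z = [1.0, 2.0], [3.0, 0.5]
    explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
    print(explicit, kernel(x, z))  # both 16.0: same dot product, no phi needed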