IAC 2024 - IA Fast Track to Search Focused AI Solutions
Probability Review
1. Probability Review
Tomoki Tsuchida
Computational & Cognitive Neuroscience Lab
Department of Computer Science
University of California, San Diego
2. Lectures based on previous bootcamp lectures/
slides, first written by Tim K. Marks (Thanks Tim!!)
and then by David Groppe (Thanks David!!)
and then by Jake Olson (Thanks Jake!!)
3. Talk Outline
• Why study probability?
• Probability defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events
• Conditional Probability
9. Talk Outline
• Why is probability useful?
• Probability Defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events
• Conditional Probability
10. We now know Probability is useful...
But what does it mean?
Examples:
“There is a 47% chance of red.”
“There is a 47% chance of rain.”
13. Two of the ways to interpret probability:
Frequentist: the percentage of
the time some event will happen.
Bayesian: how strongly
you believe that something
is true.
(Philosophical debates ensue for either interpretation. Or, one can forget about the
interpretation and focus on the properties alone, as mathematicians do :P)
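The frequentist reading can be illustrated with a small simulation. This is only a sketch: the helper name relative_frequency, the fair six-sided die, and the fixed seed are assumptions made for the illustration, not part of the original slides.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Fraction of simulated fair-die rolls whose outcome lies in `event`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

odd = {1, 3, 5}
# The relative frequency should settle near 1/2 as the number of rolls grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(odd, n))
```

With more rolls, the printed fraction should drift toward 0.5, which is what "the percentage of the time the event happens" means under this interpretation.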
18. Probability based on Axiomatic
Theory
Random Experiment: An experiment with
stochastic (nondeterministic) result
(e.g. “two coin tosses”)
Outcome: Result of the experiment
(e.g. HH, TH etc.)
Sample Space (Ω): Set of all possible
outcomes of the random experiment
(Ω = {HH, TH, HT, TT} )
Event: A subset of the sample space that
satisfies certain constraints
(e.g. E={HH}, F={HH, TH}, G={HH, TT})
19. An Example Random Experiment:
Rolling a six-sided die once
Outcomes: 1, 2, 3, 4, 5, or 6
Sample Space: Ω={1,2,3,4,5,6}
Events: 2^6 = 64 total (why?)
note:
The sample space itself Ω (Omega)
and the empty set ϕ (phi) or {}
...are also events.
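One way to check the event count is to list every subset of the sample space. The sketch below assumes that every subset of a finite sample space counts as an event; the function name all_events is made up for this example.

```python
from itertools import combinations

omega = [1, 2, 3, 4, 5, 6]

def all_events(sample_space):
    """Return every subset of a finite sample space as a frozenset."""
    events = []
    for size in range(len(sample_space) + 1):
        for subset in combinations(sample_space, size):
            events.append(frozenset(subset))
    return events

events = all_events(omega)
print(len(events))                 # number of subsets of a 6-element set
print(frozenset() in events)       # the empty set is an event
print(frozenset(omega) in events)  # the whole sample space is an event
```

Each outcome is either in a given subset or not, which gives 2 choices for each of the 6 outcomes.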
20. Complement of an Event
Odd numbers on the die: E={1,3,5}
The complement of E, written E^c, is the set of all elements
that are not members of E:
E^c={2,4,6}
Draw Venn diagrams
(Remember: events are sets!
So E^c is also an event itself.)
21. Union of Two Events
Odd numbers on the die: E={1,3,5}
Numbers less than 4: F={1,2,3}
The union of E and F is the set of all elements that
are members of E or members of F:
E or F=E ∪ F={1,2,3,5}
22. Intersection of Two Events
Odd numbers on the die: E={1,3,5}
Numbers less than 4: F={1,2,3}
The intersection of E and F is the set of all
elements that are members of E and
members of F:
E & F=E ∩ F={1,3}
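Because events are sets, the complement, union, and intersection above map directly onto Python's built-in set operators. This is a minimal sketch; the variable names are chosen for the illustration.

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space for one die roll
E = {1, 3, 5}                # odd numbers
F = {1, 2, 3}                # numbers less than 4

complement_E = omega - E     # everything in omega that is not in E
union = E | F                # members of E or of F (or both)
intersection = E & F         # members of both E and F

print(sorted(complement_E))
print(sorted(union))
print(sorted(intersection))
```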
29. An Axiomatic Definition of
Probability
Probability is a set function P(E) that assigns to every event E
a number called the “probability of E” such that:
1. The probability of an event is greater than or equal to zero
P(E) ≥ 0
2. The probability of the sample space is one
P(Ω) = 1
3. If two events are disjoint, the probability of their union
equals the sum of their probabilities
P(E or F) = P(E) + P(F)
30. An Example Random Experiment:
Rolling a six-sided die once
A legal probability function:
P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=1/6
An illegal probability function:
P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=1/2
P(Ω)=P(1)+P(2)+P(3)+P(4)+P(5)+P(6)=3
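The legal and illegal assignments above can be checked mechanically. The sketch below assumes a finite sample space where probabilities are given per outcome and an event's probability is the sum over its outcomes (which is what axiom 3 implies for disjoint single outcomes); is_legal and prob are hypothetical helper names.

```python
from fractions import Fraction

def is_legal(pmf):
    """Check axioms 1 and 2 for per-outcome probabilities on a finite space."""
    nonnegative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return nonnegative and sums_to_one

def prob(event, pmf):
    """Probability of an event as the sum over its (disjoint) outcomes."""
    return sum(pmf[outcome] for outcome in event)

fair = {face: Fraction(1, 6) for face in range(1, 7)}
bad = {face: Fraction(1, 2) for face in range(1, 7)}

print(is_legal(fair), sum(fair.values()))
print(is_legal(bad), sum(bad.values()))
print(prob({1, 3, 5}, fair))
```

Exact fractions are used instead of floats so that the "sums to one" test is not affected by rounding.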
31. Talk Outline
• Probability Defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events
• Conditional Probability
‣ Bayes’ Rule
32. Probability of Events from other
Events: Complement of an event
P(E^c)=P(not E)=1-P(E)
E={1,3}, E^c={2,4,5,6}
P(E^c)=1-P(E)=1-1/3=2/3
Note: This is easy to visualize with Venn Diagrams
34. Probability of Events from other
Events: Intersection of two events
P(E ∩ F)=P(E & F)=P(E)P(F)
If and only if E and F are
independent events
E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3
P(E ∩ F)=1/3=P(E)P(F)
36. Probability of Events from other
Events: Union of two events
P(E ∪ F)=P(E or F)=P(E)+P(F)-P(E ∩ F)
E={1,3}, F={1,2,3}
E ∪ F={1,2,3}, E ∩ F={1,3}
P(E)=1/3, P(F)=1/2
P(E ∪ F)=P(E)+P(F)-P(E ∩ F)=1/3+1/2-1/3=1/2
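The union rule can be verified for this example by counting outcomes. The sketch assumes a fair die, so that P(E) is the number of outcomes in E divided by 6; the helper prob is a name made up for this illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

E = {1, 3}
F = {1, 2, 3}

lhs = prob(E | F)                          # direct count of the union
rhs = prob(E) + prob(F) - prob(E & F)      # inclusion-exclusion
print(lhs, rhs)
```

Subtracting P(E ∩ F) corrects for the outcomes 1 and 3, which would otherwise be counted twice.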
37. Probability of Events from other
Events: Intersection of two events
P(E ∩ F)=P(E & F)
E={1,3}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=2/6, P(F)=4/6
P(E ∩ F)=2/6
38. Probability of Events from other
Events: Intersection of two events
P(E ∩ F)=P(E)P(F)
If and only if E and F are
independent events
E={1,3}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=2/6, P(F)=4/6
P(E ∩ F)=2/6≠P(E)P(F)=(2/6)(4/6)=2/9, so E and F are not independent
39. Understanding Independence
P(E ∩ F)=P(E)P(F)
Independent events give no
information about each other.
E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3, P(E & F)=1/3
P(E)P(F)=(1/2)(2/3)=1/3=P(E & F)
note: this is different from disjoint events, for which P(E ∩ F) = 0 !
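A quick check of the product rule for independence, again assuming a fair die. The function names prob and independent are hypothetical helpers for this sketch.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(E, F):
    """True when P(E & F) equals P(E) * P(F)."""
    return prob(E & F) == prob(E) * prob(F)

print(independent({1, 3, 5}, {1, 2, 3, 4}))  # the example from this slide
print(independent({1, 3}, {1, 2, 3, 4}))     # the earlier counterexample
print(independent({1, 3, 5}, {2, 4, 6}))     # disjoint events
```

The last call shows that disjoint events with nonzero probabilities are not independent: knowing that one happened tells you the other did not.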
40. Talk Outline
• Probability Defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events
• Conditional Probability
‣ Bayes’ Rule
42. If I tell you that I rolled a number less than 4, what is the
probability that I rolled an odd number?
Conditional Probability: The probability of event E
given that event F has happened
P(E | F) = P(E & F) / P(F)
E={1,3,5}, F={1,2,3}
E ∩ F={1,3}
P(E & F)=1/3, P(F)=1/2
P(E|F)=(1/3)/(1/2)=2/3
44. Conditional Probability
What if E and F are independent events?
P(E | F) = P(E & F) / P(F) = P(E)P(F) / P(F) = P(E)
If I tell you that I rolled a number less than 5, what is the
probability that I rolled an odd number?
(same example of independent events that I used before)
E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3, P(E & F)=1/3
P(E | F)=(1/3)/(2/3)=1/2=P(E)
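Both conditional-probability examples (conditioning on "less than 4" and on "less than 5") can be reproduced straight from the definition. This sketch assumes a fair die; cond_prob is a hypothetical helper name.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(E, F):
    """P(E | F) = P(E & F) / P(F); undefined when P(F) is zero."""
    if prob(F) == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return prob(E & F) / prob(F)

odd = {1, 3, 5}
print(cond_prob(odd, {1, 2, 3}))     # given "less than 4"
print(cond_prob(odd, {1, 2, 3, 4}))  # given "less than 5"
print(prob(odd))                     # unconditional, for comparison
```

In the second case the conditional probability equals P(E), which is what independence looks like from the conditional point of view.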
49. The multiplication rule
Rearranging the definition of conditional probability,
we get:
P(E & F) = P(E|F)P(F)
If we have more events, say E, F, G...
(and change & to , for simpler notation):
P(E, F, G) = P(E, F|G)P(G)
= P(E|F, G)P(F|G)P(G)
If you forget everything else today about conditional
probability, just remember:
P(EF) = P(E|F)P(F)
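The three-event version of the multiplication rule can be checked numerically on the die example. The choice of G = {1,2,3} and the helper names are assumptions made for this sketch.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(A, B):
    """P(A | B) = P(A & B) / P(B), assuming P(B) > 0."""
    return prob(A & B) / prob(B)

E = {1, 3, 5}
F = {1, 2, 3, 4}
G = {1, 2, 3}   # hypothetical third event for the illustration

joint = prob(E & F & G)
chain = cond_prob(E, F & G) * cond_prob(F, G) * prob(G)
print(joint, chain)
```

Each factor conditions on everything to its right, so the denominators cancel in sequence and leave the joint probability.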
50. The Wrong Conditioning
Variable:
“The CASA (Center for Addiction and Substance Abuse at Columbia)
study establishes a clear progression that begins with gateway drugs and
leads to cocaine use: nearly 90% of people who have ever tried cocaine
used all three gateway substances [alcohol, marijuana, & cigarettes]
first.”
Source: http://www.columbia.edu/cu/record/archives/vol20/vol20_iss10/record2010.24.html
51. Extra: Monty Hall problem
Suppose you're on a game show and you're given the choice of three doors [and will win
what is behind the chosen door]. Behind one door is a car; behind the others, goats. The car
and the goats were placed randomly behind the doors before the show. The rules of the
game show are as follows: After you have chosen a door, the door remains closed for the
time being. The game show host, Monty Hall, who knows what is behind the doors, now has
to open one of the two remaining doors, and the door he opens must have a goat behind it. If
both remaining doors have goats behind them, he chooses one [uniformly] at random. After
Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with
your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and
the host opens Door 3, which has a goat. He then asks you "Do you want to switch to Door
Number 2?" Is it to your advantage to change your choice?
- Krauss and Wang 2003:10
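If the answer is hard to believe, a simulation of the rules as stated can help. The sketch below is one possible encoding of the game, with doors numbered 0 to 2 and a fixed random seed; play and win_rate are names invented for this example.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # car placed uniformly at random
    pick = rng.choice(doors)   # contestant's first choice
    # Host opens a goat door other than the pick, uniformly at random
    # when both remaining doors hide goats.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying wins only when the first pick was the car (probability 1/3), so switching wins in the remaining 2/3 of games; the simulated rates should land close to those values.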
52. Talk Outline
• Probability Defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events
• Conditional Probability
‣ Bayes’ Rule
53. Bayes’ Rule
(a.k.a. Bayes’ Theorem)
Note that the multiplication rule is symmetric, so
P(E, F) = P(E|F)P(F) = P(F|E)P(E)
Dividing one of them by P(F) yields
P(E|F) = P(F|E)P(E) / P(F)
54. Bayes’ Rule
(a.k.a. Bayes’ Theorem)
A way to infer one conditional probability
from another (probabilistic inference):
P(H|D) = P(D|H)P(H) / P(D)
Prior: P(H) (“prior” belief about H)
Likelihood: P(D|H) (under H, how likely it is to see D)
Posterior: P(H|D) (“updated” belief about H
after seeing D)
Note: Named after Rev. Thomas Bayes (1702-1761)
55. Marginalization
How do we calculate P(F)? The same decomposition works for any event; it is written here for E. Note that
E = (E ∩ F) ∪ (E ∩ F^c)
So that
P(E) = P(EF) + P(EF^c)
= P(E|F)P(F) + P(E|F^c)P(F^c)
= P(E|F)P(F) + P(E|F^c)(1 - P(F))
56. Marginalization
So Bayes’ rule for two hypotheses is
P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|H^c)P(H^c)]
(phew!)
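The two-hypothesis form translates into a few lines of code. This is a sketch, not a library API: the function name posterior, its parameter names, and the example numbers (prior 0.5, likelihoods 0.8 and 0.2) are all made up for the illustration.

```python
def posterior(prior, likelihood, likelihood_alt):
    """P(H|D) from P(H), P(D|H), and P(D|not H), using marginalization."""
    # Denominator: P(D) = P(D|H)P(H) + P(D|not H)P(not H)
    evidence = likelihood * prior + likelihood_alt * (1 - prior)
    if evidence == 0:
        raise ValueError("the data has probability zero under both hypotheses")
    return likelihood * prior / evidence

# Hypothetical numbers: equal prior, data four times likelier under H.
print(posterior(prior=0.5, likelihood=0.8, likelihood_alt=0.2))
```

Only three inputs are needed because P(not H) is 1 - P(H), so the denominator never has to be supplied separately.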
57. Exercise
• A laboratory blood test is 95% effective in
detecting a disease when it is, in fact,
present. However, the test also yields a “false
positive” result for 1 percent of the healthy
persons tested. If 0.5 percent of the
population actually has the disease, what is
the probability a person has the disease
given that the test result is positive?
(from Ross, Section 3.3 example 3d)
58. Solution
• D: the event that the tested person has the
disease.
• E: the event that the test result is positive.
• We know: P(E | D) = 0.95, P(E | D^c) = 0.01, P(D) = .005.
• We want to know P(D | E)
59. Solution
P(D|E) = P(D, E) / P(E)
= P(E|D)P(D) / [P(E|D)P(D) + P(E|D^c)P(D^c)]
= (.95)(.005) / [(.95)(.005) + (.01)(.995)]
≈ .323
(Why is it so low?)
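One way to see why the answer is low is to count people instead of multiplying probabilities. The sketch below applies the exercise's rates to a hypothetical population of 100,000; the population size and variable names are assumptions chosen so that every count is a whole number.

```python
population = 100_000                      # hypothetical population size

diseased = population * 5 // 1000         # 0.5 percent have the disease
healthy = population - diseased

true_positives = diseased * 95 // 100     # 95% of diseased test positive
false_positives = healthy * 1 // 100      # 1% of healthy test positive

all_positives = true_positives + false_positives
p_disease_given_positive = true_positives / all_positives

print(diseased, healthy)
print(true_positives, false_positives)
print(round(p_disease_given_positive, 3))
```

Healthy people vastly outnumber diseased people, so even a 1 percent false-positive rate produces about twice as many false positives as true positives.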
60. Does the Brain Perform Bayesian Inference?
Opinion, TRENDS in Neurosciences Vol.27 No.12 December 2004
The Bayesian brain: the role of uncertainty in neural coding and computation
David C. Knill and Alexandre Pouget
Center for Visual Science and the Department of Brain and Cognitive Science, University of Rochester, NY 14627, USA
To use sensory information efficiently to make judgments and guide action in the world, the brain must represent and use information about uncertainty in its computations for perception and action. Bayesian methods have proven successful in building computational theories for perception and sensorimotor control, and psychophysics is providing a growing body of evidence that human perceptual computations are ‘Bayes’ optimal’. This leads to the ‘Bayesian coding hypothesis’: that the brain represents sensory information probabilistically, in the form of probability distributions. Several computational schemes have recently been proposed for how this might be achieved in populations of neurons. Neurophysiological data on the hypothesis, however, is almost non-existent. A major challenge for neuroscientists is to test these ideas experimentally, and so determine whether and how neurons code information about sensory uncertainty.
Humans and other animals operate in a world of sensory uncertainty. Although introspection tells us that percep-
Bayesian inference and the Bayesian coding hypothesis
The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables – the posterior density function. A Bayesian perceptual system, therefore, would represent the perceived depth of an object, for example, not as a single number Z but as a conditional probability density function p(Z/I), where I is the available image information (e.g. stereo disparities). Loosely speaking, p(Z/I) would specify the relative probability that the object is at different depths Z, given the available sensory information.
More generally, the component computations that underlay Bayesian inferences [that give rise to p(Z/I)] are ideally performed on representations of conditional probability density functions rather than on unitary estimates of parameter values. Loosely speaking, a Bayes’ optimal system maintains, at each stage of local computation, a representation of all possible values of the
61. Does the Brain Perform Bayesian Inference?
• An appealing idea because:
‣ It could explain why the brain works as it
does (i.e., it is performing optimal Bayesian
inference)
• An unappealing idea because:
‣ For even some simple problems it can be
difficult to know what the optimal Bayesian
inference is
‣ Computation of probabilities can be difficult
Probability is a powerful tool for dealing with uncertainty

Important to Cog Sci because:
1) living in the world is fraught with uncertainty
2) Scientific data is noisy -
 A) Behavior and neural representations are noisy.
 B) our scientific perceptual systems are fraught with uncertainty

- Principled way of dealing with the uncertainty in our perception, experience in general.
- Maybe an idea as to how the brain deals with things?

Important to Cog Sci because:
1) living in the world is fraught with uncertainty
(e.g., any 2D image is consistent with multiple 3D scenes)

Understanding the mathematics of probability may help us to understand how the brain deals with uncertainty

Important to Cog Sci because:
2) our scientific perceptual systems are fraught with uncertainty

Probability is key to making sense of our data.
Examples:
frequentist:
chance of winning
gambles, repeatable stuff (but a lot of things are not repeatable! wouldn’t you get same results from the same initial conditions?)

Bayesian:
weather
existence of aliens on Saturn
subjective belief must follow prob. laws in order to be coherent.
(but how do you measure the belief? and priors?)
Stop, do some examples of sample spaces and events.

To build a probability model, we need at least three ingredients. We need to know:
• What are all the things that could possibly happen?
• What sensible yes-no questions can we ask about these things?
• For any such question, what is the probability that the answer is yes?
The first point on the agenda is formalized by specifying a set Ω. Every element ω ∈ Ω symbolizes one possible fate of the model.