Probability Review

Tomoki Tsuchida
Computational & Cognitive Neuroscience Lab
Department of Computer Science
University of California, San Diego

Lectures based on previous bootcamp lectures/slides,
first written by Tim K. Marks (Thanks Tim!!)
and then by David Groppe (Thanks David!!)
and then by Jake Olson (Thanks Jake!!)

Talk Outline
• Why study probability?
• Probability defined
• Probabilities of Events Formed from other
Events
‣ P(not E)
‣ P(E or F)
‣ P(E & F)
- Independent events

• Conditional Probability

Why do Cognitive Scientists
need probability?

The answer: Uncertainty!

Everyday Perceptual Uncertainty

Artist: Julian Beever

http://users.skynet.be/J.Beever/pave.htm

Uncertainty in Scientific Perception

We now know Probability is useful...

But what does it mean?

Examples:

“There is a 47% chance of red.”

“There is a 47% chance of rain.”

Two of the ways to interpret the probability:

(Philosophical debates ensue for either interpretation. Or, one can forget about the
interpretation and focus on the properties alone, as mathematicians do :P)


Frequentist-The percentage of
time some event will happen.



Bayesian-How strongly
you believe that something
is true.


Probability based on Axiomatic
Theory

Theory
Random Experiment: An experiment with a
stochastic (nondeterministic) result
(e.g. “two coin tosses”)
Outcome: Result of the experiment
(e.g. HH, TH etc.)
Sample Space (Ω): Set of all possible
outcomes of the random experiment
(Ω = {HH, TH, HT, TT})
Event: A subset of the sample space that
satisfies certain constraints
(e.g. E={HH}, F={HH, TH}, G={HH, TT})

An Example Random Experiment:
Rolling a six-sided die once
Outcomes: 1, 2, 3, 4, 5, or 6
Sample Space: Ω={1,2,3,4,5,6}
Events: 2^6 = 64 total (why?)

note:
The sample space itself Ω (Omega)
and the empty set ϕ (phi) or {}
...are also events.
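
To see where the count of events comes from, the following minimal sketch lists every subset of the die's sample space. The helper name all_events is made up for this illustration; it is not part of the original slides.

```python
from itertools import combinations

# Sample space for one roll of a six-sided die
omega = {1, 2, 3, 4, 5, 6}

def all_events(sample_space):
    """Return every subset of the sample space as a frozenset (hypothetical helper)."""
    outcomes = sorted(sample_space)
    return [frozenset(c)
            for size in range(len(outcomes) + 1)
            for c in combinations(outcomes, size)]

events = all_events(omega)
print(len(events))                 # 64, i.e. 2**6
print(frozenset() in events)       # True: the empty set is an event
print(frozenset(omega) in events)  # True: the sample space is an event
```

Each outcome is either in or out of a given event, which gives 2 choices for each of the 6 outcomes.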

Complement of an Event

Odd numbers on the die: E={1,3,5}
The complement of E, written E^c, is the set of all elements
that are not members of E:
E^c={2,4,6}
Draw Venn diagrams

(Remember: events are sets!
So E^c is also an event itself.)

Union of Two Events

Numbers less than 4: F={1,2,3}

The union of E and F is the set of all elements that
are members of E or members of F:
E or F=E ∪ F={1,2,3,5}

Intersection of Two Events

Numbers less than 4: F={1,2,3}

The intersection of E and F is the set of all
elements that are members of E and
members of F:
E & F=E ∩ F={1,3}
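
Because events are sets, the complement, union and intersection above map directly onto set operations. Here is a small sketch using Python's built-in set type; the variable names are chosen for this illustration only.

```python
omega = {1, 2, 3, 4, 5, 6}  # sample space for one die roll
E = {1, 3, 5}               # odd numbers
F = {1, 2, 3}               # numbers less than 4

E_complement = omega - E    # everything in omega that is not in E
E_or_F = E | F              # union
E_and_F = E & F             # intersection

print(sorted(E_complement))  # [2, 4, 6]
print(sorted(E_or_F))        # [1, 2, 3, 5]
print(sorted(E_and_F))       # [1, 3]
```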

An Axiomatic Definition of
Probability
Probability is a set function P(E) that assigns to every event E
a number called the “probability of E” such that:

1. The probability of an event is greater than or equal to zero
P(E) ≥ 0
2. The probability of the sample space is one
P(Ω) = 1
3. If two events are disjoint, the probability of their union
equals the sum of their probabilities
P(E or F) = P(E) + P(F)

An Example Random Experiment:
Rolling a six-sided die once

A legal probability function:
P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=1/6

An illegal probability function:
P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=1/2

P(Ω)=P(1)+P(2)+P(3)+P(4)+P(5)+P(6)=3
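
One way to check the two assignments above is the sketch below. It assumes the probability of an event is defined as the sum over its outcomes, so axiom 3 holds by construction and only axioms 1 and 2 need testing. The function name is_legal is hypothetical.

```python
from fractions import Fraction

def is_legal(pmf):
    """Check axiom 1 (non-negativity) and axiom 2 (P(omega) == 1) for outcome probabilities."""
    nonnegative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return nonnegative and sums_to_one

fair = {face: Fraction(1, 6) for face in range(1, 7)}
bad = {face: Fraction(1, 2) for face in range(1, 7)}

print(is_legal(fair))     # True
print(is_legal(bad))      # False
print(sum(bad.values()))  # 3, so P(omega) is not 1
```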

Probability of Events from other
Events: Complement of an event
P(E^c)=P(not E)=1-P(E)

E={1,3}, E^c={2,4,5,6}
P(E^c)=1-P(E)=1-1/3=2/3

Note: This is easy to visualize with Venn Diagrams

Probability of Events from other
Events: Intersection of two events
P(E ∩ F)=P(E & F)=P(E)P(F)
if and only if E and F are
independent events

E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3
P(E ∩ F)=1/3=P(E)P(F)

Probability of Events from other
Events: Union of two events
P(E ∪ F)=P(E or F)=P(E)+P(F)-P(E ∩ F)

E={1,3}, F={1,2,3}
E ∪ F={1,2,3}, E ∩ F={1,3}
P(E)=1/3, P(F)=1/2, P(E ∩ F)=1/3
P(E ∪ F)=P(E)+P(F)-P(E ∩ F)=1/3+1/2-1/3=1/2
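
The union rule can be checked by counting outcomes. This sketch assumes a fair die, so that every outcome has probability 1/6; the helper prob is made up for the example.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair six-sided die (assumed fair)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event), len(OMEGA))

E = {1, 3}
F = {1, 2, 3}

direct = prob(E | F)                          # count the union directly
by_rule = prob(E) + prob(F) - prob(E & F)     # P(E) + P(F) - P(E and F)
print(direct, by_rule)  # 1/2 1/2
```

Subtracting P(E ∩ F) removes the outcomes 1 and 3 that would otherwise be counted twice.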

P(E ∩ F)=P(E & F)=P(E)P(F)
if and only if E and F are
independent events

E={1,3}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=2/6, P(F)=4/6
P(E ∩ F)=2/6≠P(E)P(F)=8/36, so E and F are not independent

Understanding Independence

P(E ∩ F)=P(E)P(F)
Independent events give no
information about each other.

E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3, P(E & F)=1/3
P(E)P(F)=(1/2)(2/3)=1/3=P(E & F)

note: this is different from disjoint events, for which P(E ∩ F) = 0 !
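
A short sketch of the independence test, again assuming a fair die. The function independent is a hypothetical helper that simply compares P(E ∩ F) with P(E)P(F); the disjoint pair in the last line is an extra example, not one from the slides.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair six-sided die (assumed fair)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event), len(OMEGA))

def independent(e, f):
    """True when P(e and f) equals P(e) * P(f)."""
    return prob(e & f) == prob(e) * prob(f)

print(independent({1, 3, 5}, {1, 2, 3, 4}))  # True:  1/3 == (1/2)(2/3)
print(independent({1, 3}, {1, 2, 3, 4}))     # False: 1/3 != (1/3)(2/3)
print(independent({1, 3}, {2, 4}))           # False: disjoint, 0 != (1/3)(1/3)
```

The last line shows that disjoint events with nonzero probability are not independent: knowing one happened tells you the other did not.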

If I tell you that I rolled a number less than 4, what is the
probability that I rolled an odd number?

Conditional Probability: The probability of event E
given that event F has happened

P(E | F) = P(E & F) / P(F)

E={1,3,5}, F={1,2,3}
E ∩ F={1,3}
P(E & F)=1/3, P(F)=1/2
P(E | F)=(1/3)/(1/2)=2/3

Conditional Probability
What if E and F are independent events?

P(E | F) = P(E & F) / P(F) = P(E)P(F) / P(F) = P(E)

(same example of independent events that I used before)
E={1,3,5}, F={1,2,3,4}
E ∩ F={1,3}
P(E)=1/2, P(F)=2/3, P(E & F)=1/3
P(E | F)=(1/3)/(2/3)=1/2=P(E)
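
Both conditional-probability examples can be reproduced by counting, as in this sketch for a fair die. The helper names prob and conditional are assumptions made for the illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair six-sided die (assumed fair)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event), len(OMEGA))

def conditional(e, f):
    """P(e | f) = P(e and f) / P(f); requires P(f) > 0."""
    return prob(e & f) / prob(f)

odd = {1, 3, 5}
print(conditional(odd, {1, 2, 3}))     # 2/3: knowing "less than 4" changes the answer
print(conditional(odd, {1, 2, 3, 4}))  # 1/2: equals P(odd), the events are independent
```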

The multiplication rule
Rearranging the definition of conditional probability,
we get:

P(E & F) = P(E | F) P(F)

If we have more events, say E, F, G...
(and change & to , for simpler notation):

P(E, F, G) = P(E, F | G) P(G)
           = P(E | F, G) P(F | G) P(G)
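
As a sanity check on the three-event version of the rule, this sketch compares both sides on a fair die. The third event G = {1,2,3,4} is picked only for this illustration; the slides do not specify one.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair six-sided die (assumed fair)

def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event), len(OMEGA))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

E = frozenset({1, 3, 5})
F = frozenset({1, 2, 3})
G = frozenset({1, 2, 3, 4})  # hypothetical third event for the example

joint = prob(E & F & G)                         # P(E, F, G)
chain = cond(E, F & G) * cond(F, G) * prob(G)   # P(E|F,G) P(F|G) P(G)
print(joint, chain)  # 1/3 1/3
```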

If you forget everything else today about conditional
probability, just remember:

The Wrong Conditioning
Variable:

“The CASA (Center for Addiction and Substance Abuse at Columbia)
study establishes a clear progression that begins with gateway drugs and
leads to cocaine use: nearly 90% of people who have ever tried cocaine
used all three gateway substances [alcohol, marijuana, & cigarettes]
first.”

Source: http://www.columbia.edu/cu/record/archives/vol20/vol20_iss10/record2010.24.html


Extra: Monty Hall problem
Suppose you're on a game show and you're given the choice of three doors [and will win
what is behind the chosen door]. Behind one door is a car; behind the others, goats. The car
and the goats were placed randomly behind the doors before the show. The rules of the
game show are as follows: After you have chosen a door, the door remains closed for the
time being. The game show host, Monty Hall, who knows what is behind the doors, now has
to open one of the two remaining doors, and the door he opens must have a goat behind it. If
both remaining doors have goats behind them, he chooses one [uniformly] at random. After
Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with
your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and
the host opens Door 3, which has a goat. He then asks you "Do you want to switch to Door
Number 2?" Is it to your advantage to change your choice?

- Krauss and Wang 2003:10
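
If the answer is not obvious, a simulation of the rules as stated above can help. This is a rough Monte Carlo sketch, not a proof; the function names, the trial count and the seed are arbitrary choices made for the illustration.

```python
import random

def play(switch, rng):
    """Play one game under the stated rules; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # car placed uniformly at random
    pick = rng.choice(doors)   # contestant's first choice
    # Monty opens a goat door other than the contestant's pick;
    # when two such doors exist, he chooses uniformly at random.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one door that is neither the first pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability by repeated play."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))  # should land near 1/3
print("switch:", win_rate(switch=True))   # should land near 2/3
```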


Bayes’ Rule
(a.k.a. Bayes’ Theorem)

Note that the multiplication rule is symmetric, so

P(E, F) = P(E | F) P(F) = P(F | E) P(E)

Dividing both sides of the last equality by P(F) yields

P(E | F) = P(F | E) P(E) / P(F)

Bayes’ Rule
(a.k.a. Bayes’ Theorem)

A way to infer one conditional probability
from another (probabilistic inference):

P(H | D) = P(D | H) P(H) / P(D)

Prior: P(H) (“prior” belief about H)
Likelihood: P(D | H) (under H, how likely it is to see D)
Posterior: P(H | D) (“updated” belief about H
after seeing D)
Note: Named after Rev. Thomas Bayes (1702-1761)

Marginalization
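How do we calculate the denominator P(F)? The same decomposition works
for any event. Written for an event E split by F (swap the roles of
E and F to get P(F)), note that

E = (E ∩ F) ∪ (E ∩ F^c)

Since the two pieces are disjoint,

P(E) = P(E ∩ F) + P(E ∩ F^c)
     = P(E | F) P(F) + P(E | F^c) P(F^c)
     = P(E | F) P(F) + P(E | F^c) (1 - P(F))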

Marginalization
So Bayes’ rule for two hypotheses is

P(H | D) = P(D | H) P(H) / [P(D | H) P(H) + P(D | H^c) P(H^c)]

(phew!)

Exercise

• A laboratory blood test is 95% effective in
detecting a disease when it is, in fact,
present. However, the test also yields a “false
positive” result for 1 percent of the healthy
persons tested. If 0.5 percent of the
population actually has the disease, what is
the probability a person has the disease
given that the test result is positive?

(from Ross, Section 3.3 example 3d)

Solution

• D: the event that the tested person has the
disease.
• E: the event that the test result is positive.
• We know: P(E | D) = 0.95, P(E | no disease) = 0.01, P(D) = 0.005.
• We want to know P(D | E)

Solution

P(D | E) = P(D, E) / P(E)
         = P(E | D) P(D) / [P(E | D) P(D) + P(E | D^c) P(D^c)]
         = (.95)(.005) / [(.95)(.005) + (.01)(.995)]
         ≈ .323

(Why is it so low?)
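
The calculation can be reproduced with a few lines of code. In this sketch, posterior is a hypothetical helper for the two-hypothesis form of Bayes' rule, and the population of 100,000 people is an arbitrary number chosen to make the counts easy to read.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Two-hypothesis Bayes' rule: P(disease | positive test)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(positive)
    return likelihood * prior / evidence

p = posterior(prior=0.005, likelihood=0.95, false_positive_rate=0.01)
print(round(p, 3))  # 0.323

# The same numbers as head counts in an imagined population of 100,000:
population = 100_000
sick = population * 0.005                      # 500 people have the disease
true_positives = sick * 0.95                   # 475 of them test positive
false_positives = (population - sick) * 0.01   # 995 healthy people also test positive
print(round(true_positives / (true_positives + false_positives), 3))  # 0.323
```

The head counts suggest one answer to the question above: the disease is rare, so the healthy people who test positive outnumber the sick people who do.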

Does the Brain Perform Bayesian Inference?
[Slide shows the first page of an Opinion article in TRENDS in Neurosciences Vol.27 No.12, December 2004]

The Bayesian brain: the role of
uncertainty in neural coding and
computation
David C. Knill and Alexandre Pouget
Center for Visual Science and the Department of Brain and Cognitive Science, University of Rochester, NY 14627, USA

Abstract:
To use sensory information efficiently to make judgments and guide action in the world, the brain must represent and use information about uncertainty in its computations for perception and action. Bayesian methods have proven successful in building computational theories for perception and sensorimotor control, and psychophysics is providing a growing body of evidence that human perceptual computations are ‘Bayes’ optimal’. This leads to the ‘Bayesian coding hypothesis’: that the brain represents sensory information probabilistically, in the form of probability distributions. Several computational schemes have recently been proposed for how this might be achieved in populations of neurons. Neurophysiological data on the hypothesis, however, is almost non-existent. A major challenge for neuroscientists is to test these ideas experimentally, and so determine whether and how neurons code information about sensory uncertainty.

Humans and other animals operate in a world of sensory uncertainty. Although introspection tells us that percep- [cut off on the slide]

Bayesian inference and the Bayesian coding hypothesis
The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables – the posterior density function. A Bayesian perceptual system, therefore, would represent the perceived depth of an object, for example, not as a single number Z but as a conditional probability density function p(Z/I), where I is the available image information (e.g. stereo disparities). Loosely speaking, p(Z/I) would specify the relative probability that the object is at different depths Z, given the available sensory information.
More generally, the component computations that underlay Bayesian inferences [that give rise to p(Z/I)] are ideally performed on representations of conditional probability density functions rather than on unitary estimates of parameter values. Loosely speaking, a Bayes’ optimal system maintains, at each stage of local computation, a representation of all possible values of the [cut off on the slide]

Does the Brain Perform Bayesian Inference?

• An appealing idea because:
‣ It could explain why the brain works as it
does (i.e., it is performing optimal Bayesian
inference)
• An unappealing idea because:
‣ For even some simple problems it can be
difficult to know what the optimal Bayesian
inference is
‣ Computation of probabilities can be difficult

Talk Outline

‣ Next session: Random Variables etc.
