A gentle introduction to the
mathematics of biosurveillance:
Bayes Rule and Bayes Classifiers
Andrew W. Moore
Professor
The Auton Lab
School of Computer Science
Carnegie Mellon University
http://www.autonlab.org
awm@cs.cmu.edu
412-268-7599

Associate Member
The RODS Lab
University of Pittsburgh
Carnegie Mellon University
http://rods.health.pitt.edu
2
What we’re going to do
• We will review the concept of reasoning with
uncertainty
• Also known as probability
• This is a fundamental building block
• It’s really going to be worth it
3
Discrete Random Variables
• A is a Boolean-valued random variable if A
denotes an event, and there is some degree
of uncertainty as to whether A occurs.
• Examples
• A = The next patient you examine is suffering
from inhalational anthrax
• A = The next patient you examine has a cough
• A = There is an active terrorist cell in your city
4
Probabilities
• We write P(A) as “the fraction of possible
worlds in which A is true”
• We could at this point spend 2 hours on the
philosophy of this.
• But we won’t.
5
Visualizing A
[Venn diagram] Event space of all possible worlds; its area is 1.
The reddish oval contains the worlds in which A is true; all other worlds have A false.
P(A) = area of the reddish oval
7
The Axioms Of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get
any smaller than 0
And a zero area would
mean no world could
ever have A true
8
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get
any bigger than 1
And an area of 1 would
mean all worlds will have
A true
9
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: overlapping events A and B]
10
[Venn diagram: overlapping events A and B]
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
P(A or B) is the area of the union; P(A and B) is the area of the overlap
Simple addition and subtraction
12
Another important theorem
• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
P(A) = P(A and B) + P(A and not B)
[Venn diagram: A split into the part inside B and the part outside B]
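These identities are easy to check numerically. A minimal sketch in Python; the four world probabilities are invented for illustration:

```python
# Model the "event space of all possible worlds" with four kinds of world,
# one per truth assignment to A and B, and check the axioms' consequences.

# Hypothetical world probabilities; they must sum to 1 (the area of the space).
p = {
    (True, True): 0.10,   # A and B
    (True, False): 0.15,  # A and not B
    (False, True): 0.30,  # not A and B
    (False, False): 0.45, # neither
}
assert abs(sum(p.values()) - 1.0) < 1e-12

P_A = sum(v for (a, b), v in p.items() if a)
P_B = sum(v for (a, b), v in p.items() if b)
P_A_and_B = p[(True, True)]
P_A_or_B = sum(v for (a, b), v in p.items() if a or b)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(P_A_or_B - (P_A + P_B - P_A_and_B)) < 1e-12

# This slide's theorem: P(A) = P(A and B) + P(A and not B)
assert abs(P_A - (P_A_and_B + p[(True, False)])) < 1e-12
print(P_A, P_B, P_A_or_B)  # 0.25 0.4 0.55
```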
13
Conditional Probability
• P(A|B) = Fraction of worlds in which B is true
that also have A true
[Venn diagram: worlds with flu (F) overlapping worlds with headache (H)]
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
“Headaches are rare and flu
is rarer, but if you’re coming
down with ‘flu there’s a 50-
50 chance you’ll have a
headache.”
14
Conditional Probability
[Venn diagram: worlds with flu (F) overlapping worlds with headache (H)]
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
P(H|F) = Fraction of flu-inflicted
worlds in which you have a
headache
= #worlds with flu and headache
------------------------------------
#worlds with flu
= Area of “H and F” region
------------------------------
Area of “F” region
= P(H and F)
---------------
P(F)
15
Definition of Conditional Probability
P(A|B) = P(A and B) / P(B)
Corollary: The Chain Rule
P(A and B) = P(A|B) P(B)
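Both facts are easy to sanity-check in code. A minimal sketch; the `cond_prob` helper and the world encoding are mine, with probabilities chosen to match the flu/headache numbers above:

```python
# P(A|B) computed directly from its definition, given a joint over worlds.
def cond_prob(joint, A, B):
    """P(A|B), where joint maps worlds to probabilities and A, B are predicates."""
    p_B = sum(v for w, v in joint.items() if B(w))
    p_A_and_B = sum(v for w, v in joint.items() if A(w) and B(w))
    return p_A_and_B / p_B

# Worlds are (headache, flu) pairs, with probabilities matching the example:
# P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2.
joint = {
    (True, True): 1 / 80, (True, False): 7 / 80,
    (False, True): 1 / 80, (False, False): 71 / 80,
}
P_H_given_F = cond_prob(joint, A=lambda w: w[0], B=lambda w: w[1])
print(P_H_given_F)  # 0.5

# Chain rule: P(H and F) = P(H|F) * P(F), and P(F) = 2/80 here.
assert abs(joint[(True, True)] - P_H_given_F * (2 / 80)) < 1e-12
```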
16
Probabilistic Inference
[Venn diagram: worlds with flu (F) overlapping worlds with headache (H)]
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
One day you wake up with a headache. You think: “Drat!
50% of flus are associated with headaches so I must have a
50-50 chance of coming down with flu”
Is this reasoning good?
17
Probabilistic Inference
[Venn diagram: worlds with flu (F) overlapping worlds with headache (H)]
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
P(F and H) = …
P(F|H) = …
18
Probabilistic Inference
[Venn diagram: worlds with flu (F) overlapping worlds with headache (H)]
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
P(F and H) = P(H|F) × P(F) = 1/2 × 1/40 = 1/80

P(F|H) = P(F and H) / P(H) = (1/80) / (1/10) = 1/8
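The same inference as a runnable Python sketch, using exact fractions so nothing is lost to rounding:

```python
from fractions import Fraction

P_H = Fraction(1, 10)         # P(Headache)
P_F = Fraction(1, 40)         # P(Flu)
P_H_given_F = Fraction(1, 2)  # P(Headache | Flu)

P_F_and_H = P_H_given_F * P_F  # chain rule
P_F_given_H = P_F_and_H / P_H  # definition of conditional probability
print(P_F_and_H, P_F_given_H)  # 1/80 1/8
```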
19
What we just did…
P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A)
This is Bayes Rule
Bayes, Thomas (1763) An essay towards solving a problem in
the doctrine of chances. Philosophical Transactions of the
Royal Society of London, 53:370-418
20
[Figure: stacks of menus from a Bad Hygiene restaurant and a Good Hygiene restaurant]
• You are a health official, deciding whether to investigate a restaurant
• You lose a dollar if you get it wrong
• You win a dollar if you get it right
• Half of all restaurants have bad hygiene
• In a bad restaurant, 3/4 of the menus are smudged
• In a good restaurant, 1/3 of the menus are smudged
• You are allowed to see a randomly chosen menu
21
P(B|S) = P(B and S) / P(S)
       = P(B and S) / ( P(B and S) + P(S and not B) )
       = P(S|B) P(B) / ( P(S|B) P(B) + P(S|not B) P(not B) )
       = (3/4 × 1/2) / ( 3/4 × 1/2 + 1/3 × 1/2 )
       = 9/13
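A minimal sketch of the same smudged-menu calculation in Python, with exact fractions:

```python
from fractions import Fraction

P_B = Fraction(1, 2)             # prior: P(Bad hygiene)
P_S_given_B = Fraction(3, 4)     # P(Smudge | Bad)
P_S_given_notB = Fraction(1, 3)  # P(Smudge | not Bad)

numerator = P_S_given_B * P_B
denominator = numerator + P_S_given_notB * (1 - P_B)
print(numerator / denominator)   # 9/13
```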
22
[Figure: a spread of menus]
23
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
24
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
25
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
Evidence: Some symptom, or other thing you can observe.
  In our example: Smudge
26
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
Evidence: Some symptom, or other thing you can observe.
  In our example: Smudge
Conditional: Probability of seeing the evidence if you did know the true state.
  In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
27
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
Evidence: Some symptom, or other thing you can observe.
  In our example: Smudge
Conditional: Probability of seeing the evidence if you did know the true state.
  In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
Posterior: The Prob(true state = x | some evidence).
  In our example: P(Bad|Smudge) = 9/13
28
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
Evidence: Some symptom, or other thing you can observe.
  In our example: Smudge
Conditional: Probability of seeing the evidence if you did know the true state.
  In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
Posterior: The Prob(true state = x | some evidence).
  In our example: P(Bad|Smudge) = 9/13
Inference, Diagnosis, Bayesian Reasoning: Getting the posterior from the prior and the evidence.
29
Bayesian Diagnosis
True State: The true state of the world, which you would like to know.
  In our example: Is the restaurant bad?
Prior: Prob(true state = x).
  In our example: P(Bad) = 1/2
Evidence: Some symptom, or other thing you can observe.
  In our example: Smudge
Conditional: Probability of seeing the evidence if you did know the true state.
  In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
Posterior: The Prob(true state = x | some evidence).
  In our example: P(Bad|Smudge) = 9/13
Inference, Diagnosis, Bayesian Reasoning: Getting the posterior from the prior and the evidence.
Decision theory: Combining the posterior with known costs in order to decide what to do (see the sketch below).
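A minimal sketch of that last row; the expected-reward framing is an illustration, not the deck's. It combines the 9/13 posterior with the win/lose-a-dollar payoff:

```python
from fractions import Fraction

posterior_bad = Fraction(9, 13)  # P(Bad | Smudge) from the example

# Reward is +$1 for a correct call, -$1 for a wrong one.
expected_if_investigate = posterior_bad * 1 + (1 - posterior_bad) * (-1)  # +5/13
expected_if_ignore      = posterior_bad * (-1) + (1 - posterior_bad) * 1  # -5/13

action = "investigate" if expected_if_investigate > expected_if_ignore else "ignore"
print(action, float(expected_if_investigate))  # investigate ~0.385
```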
30
Many Pieces of Evidence
31
Many Pieces of Evidence
Pat walks into the surgery.
Pat is sore and has a headache but no cough.
32
Many Pieces of Evidence
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
Pat walks into the surgery.
Pat is sore and has a headache but no cough.
Priors
Conditionals
33
Many Pieces of Evidence
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
Pat walks into the surgery.
Pat is sore and has a headache but no cough.
What is P( F | H and not C and S ) ?
Priors
Conditionals
34
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
The Naïve Assumption
35
If I know Pat has Flu…
…and I want to know if Pat has a cough…
…it won’t help me to find out whether Pat is sore
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
The Naïve Assumption
36
If I know Pat has Flu…
…and I want to know if Pat has a cough…
…it won’t help me to find out whether Pat is sore
P(C | F and S) = P(C|F)
P(C | F and not S) = P(C|F)
Coughing is explained away by Flu
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
The Naïve Assumption
37
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
The Naïve Assumption: General Case
If I know the true state…
…and I want to know about one of the
symptoms…
…then it won’t help me to find out anything
about the other symptoms
P(Symptom | true state and other symptoms) = P(Symptom | true state)
Other symptoms are explained away by the true state
39
P(F | H and not C and S) = ?
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
40
P(F | H and not C and S)
  = P(H and not C and S and F) / P(H and not C and S)
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
41
P(F | H and not C and S)
  = P(H and not C and S and F) / P(H and not C and S)
  = P(H and not C and S and F)
    / ( P(H and not C and S and F) + P(H and not C and S and not F) )
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
42
P(F | H and not C and S)
  = P(H and not C and S and F) / P(H and not C and S)
  = P(H and not C and S and F)
    / ( P(H and not C and S and F) + P(H and not C and S and not F) )
How do I get P(H and not C and S and F)?
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
43
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
44
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )
45
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
  = P(H|F) × P(not C and S and F)

Naïve assumption: lack of cough and soreness have
no effect on headache if I am already assuming Flu
46
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
  = P(H|F) × P(not C and S and F)
  = P(H|F) × P(not C | S and F) × P(S and F)

Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )
47
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
  = P(H|F) × P(not C and S and F)
  = P(H|F) × P(not C | S and F) × P(S and F)
  = P(H|F) × P(not C|F) × P(S and F)

Naïve assumption: Sore has no effect on Cough if I
am already assuming Flu
48
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
  = P(H|F) × P(not C and S and F)
  = P(H|F) × P(not C | S and F) × P(S and F)
  = P(H|F) × P(not C|F) × P(S and F)
  = P(H|F) × P(not C|F) × P(S|F) × P(F)

Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )
49
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
P(H and not C and S and F)
  = P(H | not C and S and F) × P(not C and S and F)
  = P(H|F) × P(not C and S and F)
  = P(H|F) × P(not C | S and F) × P(S and F)
  = P(H|F) × P(not C|F) × P(S and F)
  = P(H|F) × P(not C|F) × P(S|F) × P(F)
  = 1/2 × 1/3 × 3/4 × 1/40 = 1/320
50
P(F | H and not C and S)
  = P(H and not C and S and F) / P(H and not C and S)
  = P(H and not C and S and F)
    / ( P(H and not C and S and F) + P(H and not C and S and not F) )
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
51
P(H and not C and S and not F)
  = P(H | not C and S and not F) × P(not C and S and not F)
  = P(H|not F) × P(not C | S and not F) × P(S and not F)
  = P(H|not F) × P(not C|not F) × P(S and not F)
  = P(H|not F) × P(not C|not F) × P(S|not F) × P(not F)
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
  = 7/78 × 5/6 × 1/3 × 39/40 = 7/288
52
P(F | H and not C and S)
  = P(H and not C and S and F) / P(H and not C and S)
  = P(H and not C and S and F)
    / ( P(H and not C and S and F) + P(H and not C and S and not F) )
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
  = (1/320) / ( 1/320 + 7/288 ) = 0.1139 (11% chance of Flu, given symptoms)
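The whole Pat calculation fits in a few lines. A minimal Python sketch with exact fractions; the dictionary encoding is mine:

```python
from fractions import Fraction

P_F = Fraction(1, 40)
cond = {  # P(symptom | Flu), P(symptom | not Flu)
    "H": (Fraction(1, 2), Fraction(7, 78)),
    "C": (Fraction(2, 3), Fraction(1, 6)),
    "S": (Fraction(3, 4), Fraction(1, 3)),
}
observed = {"H": True, "C": False, "S": True}  # headache, no cough, sore

def joint(flu: bool) -> Fraction:
    """Naive-Bayes joint: prior times the product of per-symptom conditionals."""
    prob = P_F if flu else 1 - P_F
    for sym, present in observed.items():
        p = cond[sym][0 if flu else 1]
        prob *= p if present else (1 - p)
    return prob

num, den = joint(True), joint(True) + joint(False)
print(num, float(num / den))  # 1/320, ~0.1139
```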
53
Building A Bayes Classifier
P(Flu) = 1/40 P(Not Flu) = 39/40
P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78
P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6
P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3
Priors
Conditionals
54
The General Case
55
Building a naïve Bayesian Classifier
P(State=1) = ___ P(State=2) = ___ … P(State=N) = ___
P( Sym1=1 | State=1 ) = ___ P( Sym1=1 | State=2 ) = ___ … P( Sym1=1 | State=N ) = ___
P( Sym1=2 | State=1 ) = ___ P( Sym1=2 | State=2 ) = ___ … P( Sym1=2 | State=N ) = ___
: : : : : : :
P( Sym1=M1 | State=1 ) = ___ P( Sym1=M1 | State=2 ) = ___ … P( Sym1=M1 | State=N ) = ___
P( Sym2=1 | State=1 ) = ___ P( Sym2=1 | State=2 ) = ___ … P( Sym2=1 | State=N ) = ___
P( Sym2=2 | State=1 ) = ___ P( Sym2=2 | State=2 ) = ___ … P( Sym2=2 | State=N ) = ___
: : : : : : :
P( Sym2=M2 | State=1 ) = ___ P( Sym2=M2 | State=2 ) = ___ … P( Sym2=M2 | State=N ) = ___
: : :
P( SymK=1 | State=1 ) = ___ P( SymK=1 | State=2 ) = ___ … P( SymK=1 | State=N ) = ___
P( SymK=2 | State=1 ) = ___ P( SymK=2 | State=2 ) = ___ … P( SymK=2 | State=N ) = ___
: : : : : : :
P( SymK=MK | State=1 ) = ___ P( SymK=MK | State=2 ) = ___ … P( SymK=MK | State=N ) = ___
Assume:
• True state has N possible values: 1, 2, 3 .. N
• There are K symptoms called Symptom1, Symptom2, … SymptomK
• Symptomi has Mi possible values: 1, 2, .. Mi
56
Building a naïve Bayesian Classifier
P(State=1) = ___ P(State=2) = ___ … P(State=N) = ___
P( Sym1=1 | State=1 ) = ___ P( Sym1=1 | State=2 ) = ___ … P( Sym1=1 | State=N ) = ___
P( Sym1=2 | State=1 ) = ___ P( Sym1=2 | State=2 ) = ___ … P( Sym1=2 | State=N ) = ___
: : : : : : :
P( Sym1=M1 | State=1 ) = ___ P( Sym1=M1 | State=2 ) = ___ … P( Sym1=M1 | State=N ) = ___
P( Sym2=1 | State=1 ) = ___ P( Sym2=1 | State=2 ) = ___ … P( Sym2=1 | State=N ) = ___
P( Sym2=2 | State=1 ) = ___ P( Sym2=2 | State=2 ) = ___ … P( Sym2=2 | State=N ) = ___
: : : : : : :
P( Sym2=M2 | State=1 ) = ___ P( Sym2=M2 | State=2 ) = ___ … P( Sym2=M2 | State=N ) = ___
: : :
P( SymK=1 | State=1 ) = ___ P( SymK=1 | State=2 ) = ___ … P( SymK=1 | State=N ) = ___
P( SymK=2 | State=1 ) = ___ P( SymK=2 | State=2 ) = ___ … P( SymK=2 | State=N ) = ___
: : : : : : :
P( SymK=MK | State=1 ) = ___ P( SymK=MK | State=2 ) = ___ … P( SymK=MK | State=N ) = ___
Assume:
• True state has N values: 1, 2, 3 .. N
• There are K symptoms called Symptom1, Symptom2, … SymptomK
• Symptomi has Mi values: 1, 2, .. Mi
Example:
P( Anemic | Liver Cancer) = 0.21
57
P(State=1) = ___ P(State=2) = ___ … P(State=N) = ___
P( Sym1=1 | State=1 ) = ___ P( Sym1=1 | State=2 ) = ___ … P( Sym1=1 | State=N ) = ___
P( Sym1=2 | State=1 ) = ___ P( Sym1=2 | State=2 ) = ___ … P( Sym1=2 | State=N ) = ___
: : : : : : :
P( Sym1=M1 | State=1 ) = ___ P( Sym1=M1 | State=2 ) = ___ … P( Sym1=M1 | State=N ) = ___
P( Sym2=1 | State=1 ) = ___ P( Sym2=1 | State=2 ) = ___ … P( Sym2=1 | State=N ) = ___
P( Sym2=2 | State=1 ) = ___ P( Sym2=2 | State=2 ) = ___ … P( Sym2=2 | State=N ) = ___
: : : : : : :
P( Sym2=M2 | State=1 ) = ___ P( Sym2=M2 | State=2 ) = ___ … P( Sym2=M2 | State=N ) = ___
: : :
P( SymK=1 | State=1 ) = ___ P( SymK=1 | State=2 ) = ___ … P( SymK=1 | State=N ) = ___
P( SymK=2 | State=1 ) = ___ P( SymK=2 | State=2 ) = ___ … P( SymK=2 | State=N ) = ___
: : : : : : :
P( SymK=MK | State=1 ) = ___ P( SymK=MK | State=2 ) = ___ … P( SymK=MK | State=N ) = ___
P(state = Y | symp1 = X1 and symp2 = X2 and … and sympn = Xn)

  = P(symp1 = X1 and … and sympn = Xn and state = Y)
    / P(symp1 = X1 and … and sympn = Xn)

  = P(symp1 = X1 and … and sympn = Xn and state = Y)
    / Σ_Z P(symp1 = X1 and … and sympn = Xn and state = Z)

  = P(state = Y) × Π_i P(sympi = Xi | state = Y)
    / Σ_Z [ P(state = Z) × Π_i P(sympi = Xi | state = Z) ]

(products over i = 1 … n; sums over all possible states Z)
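A minimal sketch of this formula as code; the function name and data layout are mine, not the deck's:

```python
def posterior(priors, conditionals, observed):
    """P(state | observed symptoms) for every state.

    priors:       {state: P(state)}
    conditionals: {state: {symptom: {value: P(symptom=value | state)}}}
    observed:     {symptom: value}
    """
    scores = {}
    for state, prior in priors.items():
        p = prior
        for sym, val in observed.items():
            p *= conditionals[state][sym][val]
        scores[state] = p
    total = sum(scores.values())              # the sum over Z above
    return {state: p / total for state, p in scores.items()}

# Tiny hypothetical example with two states and one binary symptom:
priors = {"Flu": 0.025, "NotFlu": 0.975}
conditionals = {
    "Flu":    {"Headache": {1: 0.50, 0: 0.50}},
    "NotFlu": {"Headache": {1: 7 / 78, 0: 71 / 78}},
}
print(posterior(priors, conditionals, {"Headache": 1}))  # Flu: 0.125 = 1/8
```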
59
Conclusion
• You will hear lots of “Bayesian” this and
“conditional probability” that this week.
• It’s simple: don’t let wooly academic types trick you
into thinking it is fancy.
• You should know:
• What are: Bayesian Reasoning, Conditional
Probabilities, Priors, Posteriors.
• Appreciate how conditional probabilities are manipulated.
• Why the Naïve Bayes Assumption is Good.
• Why the Naïve Bayes Assumption is Evil.
60
Text mining
• Motivation: an enormous (and growing!) supply
of rich data
• Most of the available text data is unstructured…
• Some of it is semi-structured:
• Header entries (title, authors’ names, section titles,
keyword lists, etc.)
• Running text bodies (main body, abstract, summary,
etc.)
• Natural Language Processing (NLP)
• Text Information Retrieval
61
Text processing
• Natural Language Processing:
• Automated understanding of text is a very very very
challenging Artificial Intelligence problem
• Aims at extracting the semantic content of the processed
documents
• Involves extensive research into semantics, grammar,
automated reasoning, …
• Several factors making it tough for a computer include:
• Polysemy (the same word having several different
meanings)
• Synonymy (several different ways to describe the
same thing)
62
Text processing
• Text Information Retrieval:
• Search through collections of documents in order to find
objects:
• relevant to a specific query
• similar to a specific document
• For practical reasons, the text documents are
parameterized
• Terminology:
• Documents (text data units: books, articles,
paragraphs, other chunks such as email
messages, ...)
• Terms (specific words, word pairs, phrases)
63
Text Information Retrieval
• Typically, the text databases are parametrized with a document-
term matrix
• Each row of the matrix corresponds to one of the documents
• Each column corresponds to a different term
Shortness of breath
Difficulty breathing
Rash on neck
Sore neck and difficulty breathing
Just plain ugly
64
Text Information Retrieval
• Typically, the text databases are parametrized with a document-
term matrix
• Each row of the matrix corresponds to one of the documents
• Each column corresponds to a different term
breath difficulty just neck plain rash short sore ugly
Shortness of breath 1 0 0 0 0 0 1 0 0
Difficulty breathing 1 1 0 0 0 0 0 0 0
Rash on neck 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing 1 1 0 1 0 0 0 1 0
Just plain ugly 0 0 1 0 1 0 0 0 1
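A minimal sketch of how such a matrix might be built; the crude suffix-stripping stemmer and stopword list are my stand-ins for whatever preprocessing produced the slide's columns:

```python
import re

docs = [
    "Shortness of breath",
    "Difficulty breathing",
    "Rash on neck",
    "Sore neck and difficulty breathing",
    "Just plain ugly",
]

def stem(word):
    """Crude suffix stripper: 'breathing' -> 'breath', 'shortness' -> 'short'."""
    for suffix in ("ness", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

stopwords = {"of", "on", "and"}
tokenized = [
    [stem(w) for w in re.findall(r"[a-z]+", d.lower()) if w not in stopwords]
    for d in docs
]
terms = sorted({t for toks in tokenized for t in toks})
matrix = [[1 if t in toks else 0 for t in terms] for toks in tokenized]

print(terms)  # breath difficulty just neck plain rash short sore ugly
for row in matrix:
    print(row)
```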
65
Parametrization
for Text Information Retrieval
• Depending on the particular method of
parametrization the matrix entries may be:
• binary
(telling whether a term Tj is present in the
document Di or not)
• counts (frequencies)
(total number of repetitions of a term Tj in Di)
• weighted frequencies
(counts reweighted to emphasize informative terms; one
common choice is sketched below)
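The deck's slide on weightings is not part of this text; a common choice for weighted frequencies is TF-IDF, sketched here as an assumption rather than as the deck's method. Terms frequent in a document but common across the whole collection get low weight:

```python
import math

def tf_idf(counts):
    """counts: list of {term: count} per document -> list of {term: weight}."""
    n_docs = len(counts)
    df = {}  # document frequency of each term
    for doc in counts:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in doc.items()}
        for doc in counts
    ]

weighted = tf_idf([{"breath": 1, "short": 1}, {"breath": 1, "difficulty": 1}])
print(weighted)  # "breath" appears in every document, so its weight is 0
```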
66
Typical applications of Text IR
• Document indexing and classification
(e.g. library systems)
• Search engines
(e.g. the Web)
• Extraction of information from textual sources
(e.g. profiling of personal records, consumer
complaint processing)
68
Building a naïve Bayesian Classifier
P(State=1) = ___ P(State=2) = ___ … P(State=N) = ___
P( Sym1=1 | State=1 ) = ___ P( Sym1=1 | State=2 ) = ___ … P( Sym1=1 | State=N ) = ___
P( Sym1=2 | State=1 ) = ___ P( Sym1=2 | State=2 ) = ___ … P( Sym1=2 | State=N ) = ___
: : : : : : :
P( Sym1=M1 | State=1 ) = ___ P( Sym1=M1 | State=2 ) = ___ … P( Sym1=M1 | State=N ) = ___
P( Sym2=1 | State=1 ) = ___ P( Sym2=1 | State=2 ) = ___ … P( Sym2=1 | State=N ) = ___
P( Sym2=2 | State=1 ) = ___ P( Sym2=2 | State=2 ) = ___ … P( Sym2=2 | State=N ) = ___
: : : : : : :
P( Sym2=M2 | State=1 ) = ___ P( Sym2=M2 | State=2 ) = ___ … P( Sym2=M2 | State=N ) = ___
: : :
P( SymK=1 | State=1 ) = ___ P( SymK=1 | State=2 ) = ___ … P( SymK=1 | State=N ) = ___
P( SymK=2 | State=1 ) = ___ P( SymK=2 | State=2 ) = ___ … P( SymK=2 | State=N ) = ___
: : : : : : :
P( SymK=MK | State=1 ) = ___ P( SymK=MK | State=2 ) = ___ … P( SymK=MK | State=N ) = ___
Assume:
• True state has N values: 1, 2, 3 .. N
• There are K symptoms called Symptom1, Symptom2, … SymptomK
• Symptomi has Mi values: 1, 2, .. Mi
72
Building a naïve Bayesian Classifier
Assume:
• True state has N values: 1, 2, 3 .. N
• There are K symptoms called Symptom1, Symptom2, … SymptomK
• Symptomi has Mi values: 1, 2, .. Mi
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
73
Building a naïve Bayesian Classifier
Assume:
• True state has N values: 1, 2, 3 .. N
• There are K symptoms called Symptom1, Symptom2, … SymptomK
• Symptomi has Mi values: 1, 2, .. Mi
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
Example:
Prob( Chief Complaint contains “Blood” | Prodrome = Respiratory ) = 0.003
76
Learning a Bayesian Classifier
breath difficulty just neck plain rash short sore ugly
Shortness of breath 1 0 0 0 0 0 1 0 0
Difficulty breathing 1 1 0 0 0 0 0 0 0
Rash on neck 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing 1 1 0 1 0 0 0 1 0
Just plain ugly 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
77
Learning a Bayesian Classifier
EXPERT SAYS
breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
78
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
79
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
P( breath=1 | prodrome=Resp )
  = (num "resp" training records containing "breath") / (num "resp" training records)
80
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
P( breath=1 | prodrome=Resp )
  = (num "resp" training records containing "breath") / (num "resp" training records)

P( prodrome=Resp )
  = (num "resp" training records) / (total num training records)
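A minimal sketch of these counting estimates over the five labeled records above; the record encoding is mine:

```python
records = [  # (expert label, set of terms present) from the labeled matrix
    ("Resp",  {"breath", "short"}),
    ("Resp",  {"breath", "difficulty"}),
    ("Rash",  {"neck", "rash"}),
    ("Resp",  {"breath", "difficulty", "neck", "sore"}),
    ("Other", {"just", "plain", "ugly"}),
]

n_resp = sum(1 for label, _ in records if label == "Resp")
n_resp_breath = sum(1 for label, terms in records
                    if label == "Resp" and "breath" in terms)

print(n_resp_breath / n_resp)  # P(breath=1 | prodrome=Resp) = 3/3 = 1.0
print(n_resp / len(records))   # P(prodrome=Resp) = 3/5
```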
82
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
3. During deployment, apply classifier
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
83
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
3. During deployment, apply classifier
New Chief Complaint: “Just sore breath”
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
84
Learning a Bayesian Classifier
EXPERT SAYS breath difficulty just neck plain rash short sore ugly
Shortness of breath Resp 1 0 0 0 0 0 1 0 0
Difficulty breathing Resp 1 1 0 0 0 0 0 0 0
Rash on neck Rash 0 0 0 1 0 1 0 0 0
Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0
Just plain ugly Other 0 0 1 0 1 0 0 0 1
1. Before deployment of classifier, get labeled training data
2. Learn parameters (conditionals, and priors)
3. During deployment, apply classifier
P( prodrome=GI | breath=1 and difficulty=0 and just=1 and … )

  = P(breath=1 | prod'm=GI) × P(difficulty=0 | prod'm=GI) × … × P(state=GI)
    / Σ_Z [ P(breath=1 | prod'm=Z) × P(difficulty=0 | prod'm=Z) × … × P(state=Z) ]
New Chief Complaint: “Just sore breath”
P(Prod'm=GI) = ___ P(Prod'm=respir) = ___ … P(Prod'm=const) = ___
P( angry | Prod'm=GI ) = ___ P( angry | Prod'm=respir ) = ___ … P( angry | Prod'm=const ) = ___
P( ~angry | Prod'm=GI ) = ___ P(~angry | Prod'm=respir ) = ___ … P(~angry | Prod'm=const ) = ___
P( blood | Prod'm=GI ) = ___ P( blood | Prod'm=respir ) = ___ … P( blood | Prod'm=const ) = ___
P( ~blood | Prod'm=GI ) = ___ P( ~blood | Prod'm=respir) = ___ … P( ~blood | Prod'm=const ) = ___
: : :
P( vomit | Prod'm=GI ) = ___ P( vomit | Prod'm=respir ) = ___ … P( vomit | Prod'm=const ) = ___
P( ~vomit | Prod'm=GI ) = ___ P( ~vomit |Prod'm=respir ) = ___ … P( ~vomit | Prod'm=const ) = ___
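A minimal sketch of step 3 on the new complaint. The add-one smoothing is my assumption, since raw counts from five records would make some conditionals exactly zero:

```python
terms = ["breath", "difficulty", "just", "neck", "plain", "rash", "short", "sore", "ugly"]
records = [
    ("Resp",  {"breath", "short"}),
    ("Resp",  {"breath", "difficulty"}),
    ("Rash",  {"neck", "rash"}),
    ("Resp",  {"breath", "difficulty", "neck", "sore"}),
    ("Other", {"just", "plain", "ugly"}),
]
new_complaint = {"just", "sore", "breath"}  # "Just sore breath"

def smoothed(count, total):
    """Add-one (Laplace) smoothed estimate; my assumption, not the deck's."""
    return (count + 1) / (total + 2)

scores = {}
for label in {l for l, _ in records}:
    rows = [r for l, r in records if l == label]
    p = len(rows) / len(records)  # prior P(prodrome = label)
    for t in terms:
        p_t = smoothed(sum(t in r for r in rows), len(rows))
        p *= p_t if t in new_complaint else (1 - p_t)
    scores[label] = p

total = sum(scores.values())  # normalize: the sum over Z in the formula above
print({l: round(s / total, 3) for l, s in scores.items()})
```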
85
CoCo Performance (AUC scores)
• Botulism 0.78
• rash 0.91
• neurological 0.92
• hemorrhagic 0.93
• constitutional 0.93
• gastrointestinal 0.95
• other 0.96
• respiratory 0.96
86
Conclusion
• Automated text extraction is increasingly important
• There is a very wide world of text extraction outside
Biosurveillance
• The field has changed very fast, even in the past three
years.
• Warning: although Bayes Classifiers are the simplest to
implement, Logistic Regression or other discriminative
methods often learn more accurately. Consider using off-
the-shelf methods, such as William Cohen's successful
"MinorThird" open-source libraries:
http://minorthird.sourceforge.net/
• Real systems (including CoCo) have many ingenious
special-case improvements.
87
Discussion
1. What new data sources should we apply
algorithms to?
(e.g. self-reporting?)
2. What are related surveillance problems to which
these kinds of algorithms can be applied?
3. Where are the gaps in the current algorithms
world?
4. Are there other spatial problems out there?
5. Could new or pre-existing algorithms help in the
period after an outbreak is detected?
6. Other comments about favorite tools of the trade.
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 

2013-1 Machine Learning Lecture 03 - Andrew Moore - a gentle introduction …

  • 17. 19 What we just did… P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A). This is Bayes Rule. Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418
  • 18. 20 [Figure: stacks of menus from a Bad Hygiene restaurant and a Good Hygiene restaurant] • You are a health official, deciding whether to investigate a restaurant • You lose a dollar if you get it wrong • You win a dollar if you get it right • Half of all restaurants have bad hygiene • In a bad restaurant, 3/4 of the menus are smudged • In a good restaurant, 1/3 of the menus are smudged • You are allowed to see a randomly chosen menu
  • 21-27. 23-29 Bayesian Diagnosis buzzwords (the table is built up one row per slide):
    True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
    Prior: Prob(true state = x). In our example: P(Bad) = 1/2
    Evidence: some symptom, or other thing you can observe. In our example: Smudge
    Conditional: probability of seeing evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
    Posterior: the Prob(true state = x | some evidence). In our example: P(Bad|Smudge) = 9/13
    Inference, Diagnosis, Bayesian Reasoning: getting the posterior from the prior and the evidence
    Decision theory: combining the posterior with known costs in order to decide what to do
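    The 9/13 posterior follows directly from Bayes Rule; here is a minimal Python sketch of the computation (not from the slides; the function name "posterior" is our own):

        from fractions import Fraction

        def posterior(prior, p_e_given_s, p_e_given_not_s):
            # Bayes Rule for a two-valued true state:
            # P(s | e) = P(e|s) P(s) / [P(e|s) P(s) + P(e|not s) P(not s)]
            numerator = p_e_given_s * prior
            return numerator / (numerator + p_e_given_not_s * (1 - prior))

        # Restaurant example: prior 1/2, P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3
        print(posterior(Fraction(1, 2), Fraction(3, 4), Fraction(1, 3)))  # 9/13

    Since 9/13 > 1/2, guessing "bad" on seeing a smudged menu wins the dollar bet more often than it loses.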
  • 28. 30 Many Pieces of Evidence
  • 29. 31 Many Pieces of Evidence Pat walks into the surgery. Pat is sore and has a headache but no cough
  • 30-31. 32-33 Many Pieces of Evidence Priors: P(Flu) = 1/40, P(Not Flu) = 39/40. Conditionals: P( Headache | Flu ) = 1/2, P( Headache | not Flu ) = 7/78, P( Cough | Flu ) = 2/3, P( Cough | not Flu ) = 1/6, P( Sore | Flu ) = 3/4, P( Sore | not Flu ) = 1/3. Pat walks into the surgery. Pat is sore and has a headache but no cough. What is P( F | H and not C and S ) ?
  • 32. 34 P(Flu) = 1/40 P(Not Flu) = 39/40 P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78 P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6 P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3 The Naïve Assumption
  • 33. 35 If I know Pat has Flu… …and I want to know if Pat has a cough… …it won’t help me to find out whether Pat is sore P(Flu) = 1/40 P(Not Flu) = 39/40 P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78 P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6 P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3 The Naïve Assumption
  • 34. 36 If I know Pat has Flu… …and I want to know if Pat has a cough… …it won’t help me to find out whether Pat is sore: P( C | F and S ) = P( C | F ) and P( C | F and not S ) = P( C | F ). Coughing is explained away by Flu. P(Flu) = 1/40 P(Not Flu) = 39/40 P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7/78 P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6 P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3 The Naïve Assumption
  • 35-36. 37-38 The Naïve Assumption: General Case. If I know the true state… …and I want to know about one of the symptoms… …then it won’t help me to find out anything about the other symptoms: P( Symptom | true state and other symptoms ) = P( Symptom | true state ). Other symptoms are explained away by the true state.
  • 37-40. 39-42 P( F | H and not C and S ) = P( H and not C and S and F ) / P( H and not C and S ) = P( H and not C and S and F ) / [ P( H and not C and S and F ) + P( H and not C and S and not F ) ]. How do I get P( H and not C and S and F )? (Priors and conditionals as above: P(Flu) = 1/40, P(Not Flu) = 39/40, P( Headache | Flu ) = 1/2, P( Headache | not Flu ) = 7/78, P( Cough | Flu ) = 2/3, P( Cough | not Flu ) = 1/6, P( Sore | Flu ) = 3/4, P( Sore | not Flu ) = 1/3.)
  • 41-47. 43-49 Factoring the joint, alternating the chain rule with the naïve assumption:
    P( H and not C and S and F )
    = P( H | not C and S and F ) × P( not C and S and F )    [chain rule: P(A and B) = P(A|B) × P(B)]
    = P( H | F ) × P( not C and S and F )    [naïve assumption: lack of cough and soreness have no effect on headache if I am already assuming Flu]
    = P( H | F ) × P( not C | S and F ) × P( S and F )    [chain rule]
    = P( H | F ) × P( not C | F ) × P( S and F )    [naïve assumption: Sore has no effect on Cough if I am already assuming Flu]
    = P( H | F ) × P( not C | F ) × P( S | F ) × P( F )    [chain rule]
    = 1/2 × (1 - 2/3) × 3/4 × 1/40 = 1/320
  • 48-50. 50-52 The same factoring gives the other joint probability:
    P( H and not C and S and not F ) = P( H | not F ) × P( not C | not F ) × P( S | not F ) × P( not F ) = 7/78 × (1 - 1/6) × 1/3 × 39/40 = 7/288
    So P( F | H and not C and S ) = P( H and not C and S and F ) / [ P( H and not C and S and F ) + P( H and not C and S and not F ) ] = (1/320) / ( 1/320 + 7/288 ) = 0.1139 (an 11% chance of Flu, given the symptoms)
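    The whole calculation fits in a few lines of Python; a sketch confirming the 1/320, 7/288 and 0.1139 above (the variable names are ours):

        from fractions import Fraction

        p_f = Fraction(1, 40)                                  # P(Flu)
        p_h_f, p_h_nf = Fraction(1, 2), Fraction(7, 78)        # P(H|F),  P(H|not F)
        p_c_f, p_c_nf = Fraction(2, 3), Fraction(1, 6)         # P(C|F),  P(C|not F)
        p_s_f, p_s_nf = Fraction(3, 4), Fraction(1, 3)         # P(S|F),  P(S|not F)

        # Naive-Bayes joints for the observation "H and not C and S"
        joint_f  = p_h_f  * (1 - p_c_f)  * p_s_f  * p_f        # 1/320
        joint_nf = p_h_nf * (1 - p_c_nf) * p_s_nf * (1 - p_f)  # 7/288

        print(float(joint_f / (joint_f + joint_nf)))           # 0.1139...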
  • 51. 53 Building A Bayes Classifier P(Flu) = 1/40 P(Not Flu) = 39/40 P( Headache | Flu ) = 1/2 P( Headache | not Flu ) = 7 / 78 P( Cough | Flu ) = 2/3 P( Cough | not Flu ) = 1/6 P( Sore | Flu ) = 3/4 P( Sore | not Flu ) = 1/3 Priors Conditionals
  • 53-54. 55-56 Building a naïve Bayesian Classifier. Assume: • True state has N possible values: 1, 2, 3 .. N • There are K symptoms called Symptom1, Symptom2, … SymptomK • Symptomi has Mi possible values: 1, 2, .. Mi. The classifier is a table of parameters to fill in:
    P(State=1) = ___   P(State=2) = ___   …   P(State=N) = ___
    P( Sym1=1 | State=1 ) = ___   P( Sym1=1 | State=2 ) = ___   …   P( Sym1=1 | State=N ) = ___
    P( Sym1=2 | State=1 ) = ___   P( Sym1=2 | State=2 ) = ___   …   P( Sym1=2 | State=N ) = ___
     :
    P( Sym1=M1 | State=1 ) = ___   P( Sym1=M1 | State=2 ) = ___   …   P( Sym1=M1 | State=N ) = ___
     : (and likewise for Sym2 up to SymK)
    P( SymK=MK | State=1 ) = ___   P( SymK=MK | State=2 ) = ___   …   P( SymK=MK | State=N ) = ___
    Example: P( Anemic | Liver Cancer ) = 0.21
  • 55-56. 57-58 Reading a posterior off that parameter table:
    P( state=Y | symp1=X1 and symp2=X2 and … and sympn=Xn )
    = P( symp1=X1 and … and sympn=Xn and state=Y ) / P( symp1=X1 and … and sympn=Xn )
    = P( symp1=X1 and … and sympn=Xn and state=Y ) / Σ_Z P( symp1=X1 and … and sympn=Xn and state=Z )
    = [ Π_{i=1..n} P( sympi=Xi | state=Y ) ] × P( state=Y ) / Σ_Z [ Π_{i=1..n} P( sympi=Xi | state=Z ) ] × P( state=Z )
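    The same computation in general form, as a Python sketch (the dictionary layout for priors, cond and observed is an assumption made here for illustration):

        def naive_bayes_posterior(observed, priors, cond):
            # observed: {symptom: value}
            # priors:   {state: P(state)}
            # cond:     {(symptom, value, state): P(symptom=value | state)}
            scores = {}
            for state, prior in priors.items():
                score = prior
                for symptom, value in observed.items():
                    score *= cond[(symptom, value, state)]  # prod_i P(Xi | state)
                scores[state] = score                       # numerator for this state
            total = sum(scores.values())                    # denominator: sum over Z
            return {state: s / total for state, s in scores.items()}

    Filling cond with the flu conditionals and calling it with observed = {"H": 1, "C": 0, "S": 1} and priors = {"flu": 1/40, "not flu": 39/40} would reproduce the 0.1139 posterior above.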
  • 57. 59 Conclusion • You will hear lots of “Bayesian” this and “conditional probability” that this week. • It’s simple: don’t let wooly academic types trick you into thinking it is fancy. • You should know: • What are: Bayesian Reasoning, Conditional Probabilities, Priors, Posteriors. • Appreciate how conditional probabilities are manipulated. • Why the Naïve Bayes Assumption is Good. • Why the Naïve Bayes Assumption is Evil.
  • 58. 60 Text mining • Motivation: an enormous (and growing!) supply of rich data • Most of the available text data is unstructured… • Some of it is semi-structured: • Header entries (title, authors’ names, section titles, keyword lists, etc.) • Running text bodies (main body, abstract, summary, etc.) • Natural Language Processing (NLP) • Text Information Retrieval
  • 59. 61 Text processing • Natural Language Processing: • Automated understanding of text is a very very very challenging Artificial Intelligence problem • Aims at extracting the semantic content of the processed documents • Involves extensive research into semantics, grammar, automated reasoning, … • Several factors making it tough for a computer include: • Polysemy (the same word having several different meanings) • Synonymy (several different ways to describe the same thing)
  • 60. 62 Text processing • Text Information Retrieval: • Search through collections of documents in order to find objects: • relevant to a specific query • similar to a specific document • For practical reasons, the text documents are parameterized • Terminology: • Documents (text data units: books, articles, paragraphs, other chunks such as email messages, ...) • Terms (specific words, word pairs, phrases)
  • 61. 63 Text Information Retrieval • Typically, the text databases are parametrized with a document-term matrix • Each row of the matrix corresponds to one of the documents • Each column corresponds to a different term. Example documents: “Shortness of breath”, “Difficulty breathing”, “Rash on neck”, “Sore neck and difficulty breathing”, “Just plain ugly”
  • 62. 64 Text Information Retrieval • Typically, the text databases are parametrized with a document-term matrix • Each row of the matrix corresponds to one of the documents • Each column corresponds to a different term
    Document                             breath  difficulty  just  neck  plain  rash  short  sore  ugly
    Shortness of breath                    1         0         0     0     0      0     1      0     0
    Difficulty breathing                   1         1         0     0     0      0     0      0     0
    Rash on neck                           0         0         0     1     0      1     0      0     0
    Sore neck and difficulty breathing     1         1         0     1     0      0     0      1     0
    Just plain ugly                        0         0         1     0     1      0     0      0     1
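    One way to build that binary matrix, as a short Python sketch (we match term stems by prefix so that “breathing” counts as “breath”; a real system would use a proper stemmer):

        docs = ["Shortness of breath", "Difficulty breathing", "Rash on neck",
                "Sore neck and difficulty breathing", "Just plain ugly"]
        terms = ["breath", "difficulty", "just", "neck", "plain",
                 "rash", "short", "sore", "ugly"]

        matrix = [[int(any(w.startswith(t) for w in d.lower().split()))
                   for t in terms] for d in docs]
        for d, row in zip(docs, matrix):
            print(f"{d:36} {row}")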
  • 63. 65 Parametrization for Text Information Retrieval • Depending on the particular method of parametrization the matrix entries may be: • binary (telling whether a term Tj is present in the document Di or not) • counts (frequencies) (total number of repetitions of a term Tj in Di) • weighted frequencies (see the slide following the next)
  • 64-65. 66-67 Typical applications of Text IR • Document indexing and classification (e.g. library systems) • Search engines (e.g. the Web) • Extraction of information from textual sources (e.g. profiling of personal records, consumer complaint processing)
  • 66-69. 68-71 Building a naïve Bayesian Classifier (the same parameter-table template as before). Assume: • True state has N values: 1, 2, 3 .. N • There are K symptoms called Symptom1, Symptom2, … SymptomK • Symptomi has Mi values: 1, 2, .. Mi
  • 70-73. 72-75 Building a naïve Bayesian Classifier for chief complaints. The parameters to fill in:
    P(Prod'm=GI) = ___   P(Prod'm=respir) = ___   …   P(Prod'm=const) = ___
    P( angry | Prod'm=GI ) = ___    P( angry | Prod'm=respir ) = ___    …   P( angry | Prod'm=const ) = ___
    P( ~angry | Prod'm=GI ) = ___   P( ~angry | Prod'm=respir ) = ___   …   P( ~angry | Prod'm=const ) = ___
    P( blood | Prod'm=GI ) = ___    P( blood | Prod'm=respir ) = ___    …   P( blood | Prod'm=const ) = ___
    P( ~blood | Prod'm=GI ) = ___   P( ~blood | Prod'm=respir ) = ___   …   P( ~blood | Prod'm=const ) = ___
     :
    P( vomit | Prod'm=GI ) = ___    P( vomit | Prod'm=respir ) = ___    …   P( vomit | Prod'm=const ) = ___
    P( ~vomit | Prod'm=GI ) = ___   P( ~vomit | Prod'm=respir ) = ___   …   P( ~vomit | Prod'm=const ) = ___
    Example: Prob( Chief Complaint contains “Blood” | Prodrome = Respiratory ) = 0.003
  • 74-75. 76-77 Learning a Bayesian Classifier. 1. Before deployment of the classifier, get labeled training data:
    Document                             EXPERT SAYS   breath  difficulty  just  neck  plain  rash  short  sore  ugly
    Shortness of breath                  Resp            1         0         0     0     0      0     1      0     0
    Difficulty breathing                 Resp            1         1         0     0     0      0     0      0     0
    Rash on neck                         Rash            0         0         0     1     0      1     0      0     0
    Sore neck and difficulty breathing   Resp            1         1         0     1     0      0     0      1     0
    Just plain ugly                      Other           0         0         1     0     1      0     0      0     1
  • 76-79. 78-81 2. Learn parameters (conditionals and priors) by counting over the training records:
    P( breath=1 | prodrome=Resp ) = num “resp” training records containing “breath” / num “resp” training records
    P( prodrome=Resp ) = num “resp” training records / total num training records
    (and likewise for every other term and every other prodrome in the parameter table)
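    A sketch of step 2 on the toy table above (the prefix matching and the helper names are our additions; note that raw counts give probability 0 to any term never seen with a label, which step 3 has to guard against):

        from collections import Counter

        records = [("Resp", "Shortness of breath"),
                   ("Resp", "Difficulty breathing"),
                   ("Rash", "Rash on neck"),
                   ("Resp", "Sore neck and difficulty breathing"),
                   ("Other", "Just plain ugly")]
        label_counts = Counter(label for label, _ in records)

        def contains(text, term):
            # crude stemming: "breathing" matches the term "breath"
            return any(w.startswith(term) for w in text.lower().split())

        def prior(label):
            # P(prodrome = label) = num label records / total num records
            return label_counts[label] / len(records)

        def conditional(term, label):
            # P(term present | prodrome = label)
            hits = sum(contains(text, term) for lab, text in records if lab == label)
            return hits / label_counts[label]

        print(prior("Resp"), conditional("breath", "Resp"))  # 0.6 1.0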
  • 80-82. 82-84 3. During deployment, apply the classifier. New Chief Complaint: “Just sore breath”
    P( prodrome=GI | breath=1, difficulty=0, just=1, … )
    = P( breath=1 | prod'm=GI ) × P( difficulty=0 | prod'm=GI ) × … × P( state=GI )
      / Σ_Z P( breath=1 | prod'm=Z ) × P( difficulty=0 | prod'm=Z ) × … × P( state=Z )
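    And a sketch of step 3, reusing records, label_counts, contains and prior from the sketch above. The conditionals here are Laplace-smoothed, an addition beyond the slides, because with the raw counts every prodrome would score exactly zero for “Just sore breath” (e.g. “just” never occurs in a Resp record):

        TERMS = ["breath", "difficulty", "just", "neck", "plain",
                 "rash", "short", "sore", "ugly"]

        def smoothed_conditional(term, label):
            # (hits + 1) / (num label records + 2) avoids zero probabilities
            hits = sum(contains(text, term) for lab, text in records if lab == label)
            return (hits + 1) / (label_counts[label] + 2)

        def classify(text):
            present = {t: contains(text, t) for t in TERMS}
            scores = {}
            for label in label_counts:
                score = prior(label)
                for t, seen in present.items():
                    p = smoothed_conditional(t, label)
                    score *= p if seen else (1 - p)
                scores[label] = score
            total = sum(scores.values())
            return {lab: s / total for lab, s in scores.items()}

        print(classify("Just sore breath"))  # "Resp" gets the highest posterior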
  • 83. 85 CoCo Performance (AUC scores) • botulism 0.78 • rash 0.91 • neurological 0.92 • hemorrhagic 0.93 • constitutional 0.93 • gastrointestinal 0.95 • other 0.96 • respiratory 0.96
  • 84. 86 Conclusion • Automated text extraction is increasingly important • There is a very wide world of text extraction outside Biosurveillance • The field has changed very fast, even in the past three years. • Warning: although Bayes Classifiers are the simplest to implement, Logistic Regression or other discriminative methods often learn more accurately. Consider using off-the-shelf methods, such as William Cohen’s successful “minor third” open-source libraries: http://minorthird.sourceforge.net/ • Real systems (including CoCo) have many ingenious special-case improvements.
  • 85. 87 Discussion 1. What new data sources should we apply algorithms to? (e.g., self-reporting?) 2. What are related surveillance problems to which these kinds of algorithms can be applied? 3. Where are the gaps in the current algorithms world? 4. Are there other spatial problems out there? 5. Could new or pre-existing algorithms help in the period after an outbreak is detected? 6. Other comments about favorite tools of the trade.