Life and Work of Judea Pearl | Turing100@Persistent
2. ACM A. M. Turing Award
Judea Pearl
United States – 2011
Citation:
For fundamental contributions to
artificial intelligence through the
development of a calculus for
probabilistic and causal
reasoning.
35th Turing Award Recipient
In the Turing Centennial Year.
© 2012 Persistent Systems Ltd
3. Quotes about Judea Pearl’s work
"Judea Pearl's highly influential 1988 book (Probabilistic Reasoning in Intelligent
Systems) brought probability and decision theory into AI."
AI becomes science (1987 – present), AIMA. Stuart Russell & Peter Norvig
"His accomplishments over the last 30 years have provided the theoretical basis
for progress in artificial intelligence and led to extraordinary achievements in
machine learning, and they have redefined the term 'thinking machine'."
Vint Cerf, Turing Award Recipient & President of ACM.
"Before Pearl, most AI systems reasoned with Boolean logic - they understood
true or false, but had a hard time with 'maybe'."
Alfred Spector, VP Research at Google
4. Turing Test – Defining problem for AI
"The computer passes the test if a human interrogator, after posing some
written questions, cannot tell whether the written responses come from a
person or not."
Alan Turing, Computing Machinery and Intelligence (1950)
The computer needs to possess the following capabilities:
Natural Language Processing
Knowledge Representation
Automated Reasoning
Machine Learning
6. Background
Born: 1936, Tel Aviv, Israel
Education:
BS, Technion, Israel, 1960
MS, Electronics, Newark College of Engineering, 1961
MS, Physics, Rutgers, 1965
PhD, Electrical Engineering, Polytechnic Institute of Brooklyn, 1965
Professional Career:
Member of Technical Staff, RCA Research Laboratories, 1961–1965
Director, Electronic Memories, Inc., Hawthorne, 1966–1969
Faculty, University of California, Los Angeles, Computer Science Department, 1969 to date (Emeritus Faculty since 1994)
Director of Cognitive Systems Laboratory, UCLA (1978–)
7. Research Interests
Early Research:
Magnetic and superconducting memories.
Combinatorial Search - A* Search
Heuristics: Intelligent Search Strategies for
Computer Problem Solving,
Probability & Decision Theory
Probabilistic Reasoning in Intelligent Systems
Causality & its applications in different
domains
Causality: Models, Reasoning, and Inference
8. Daniel Pearl (1963 – 2002)
Journalist, Musician
Wall Street Journal (South Asia Bureau Chief, 2002)
Kidnapped and murdered in Karachi, 2002
Daniel Pearl Foundation
Promotes cross-cultural understanding through journalism and music.
Formed by Ruth & Judea Pearl in 2002.
11. Thank You for Smoking!
Nick Naylor
• Academy of Tobacco Studies, a firm that
promotes the benefits of cigarettes.
• Evangelist for tobacco products.
Ortolan K. Finistirre
• Senator from Vermont, anti-tobacco
• Wants to pass a resolution to put a skull-and-crossbones
symbol on cigarette packs.
12. Cigarette Smoking causes Lung Cancer ?
Cause: Smoking Cigarettes
Effect: Lung Cancer
Eating Cheese Leads to Heart Disease ?
Cause: Eating cheese
Effect: Heart Disease
13. Which of these are actually Causal ?
Eating high-protein food leads to weight loss.
Taking aspirin reduces the risk of heart attack.
Women's empowerment reduces the population birth rate.
A bigger search button on a web page increases click-through.
Drinking milk with additives increases the height of kids.
Smaller class sizes improve learning.
Carbon emissions cause global warming.
Reducing taxes increases job creation.
Lower interest rates lead to an improved economy.
Higher pay leads to reduced attrition.
14. Pearl’s Riddles of Causation
What patterns of experience convince
people that a connection is causal?
What difference would it make if I told you that
a certain connection is causal or not causal?
17. From a Pulitzer Prize–winning investigative
reporter at The New York Times comes the
explosive story of the rise of the processed
food industry and its link to the emerging
obesity epidemic.
Michael Moss reveals how companies use
salt, sugar, and fat to addict us and, more
important, how we can fight back
18. The Art and Science of Cause and Effect – Judea Pearl
Transcript of lecture given Thursday, October 29, 1996,
UCLA 81st Faculty Research Lecture Series
20. David Hume, Philosopher (1711 – 1776)
"A Treatise of Human Nature", David Hume
"Thus we remember to have seen that
species of object we call FLAME, and to
have felt that species of sensation we call
HEAT. We likewise call to mind their
constant conjunction in all past instances.
Without any farther ceremony, we call the
one CAUSE and the other EFFECT, and
infer the existence of the one from that of the
other."
22. Francis Galton & Karl Pearson
Francis Galton (1822 – 1911)
Karl Pearson (1857 – 1936)
Study of inheritance of intelligence
Study of forearm and height measurements
"Co-relation must be the consequence of the variations of the two organs being
partly due to common causes."
23. Correlation & Dependence
Correlation: a measure of the
relationship between two
mathematical variables or
sets of measured data values
Correlation coefficient
Pearson’s correlation coefficient
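As a rough illustration of the coefficient named above, the following sketch computes Pearson's correlation coefficient from its textbook definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the forearm and height measurements are hypothetical, chosen only to echo the Galton example; they are not data from the deck.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements in centimetres (assumed for illustration only).
forearm = [24.0, 25.5, 26.0, 27.5, 29.0]
height = [160.0, 165.0, 170.0, 176.0, 183.0]
r = pearson_r(forearm, height)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 signals a strong linear relationship, and a value near 0 signals a weak one. It says nothing by itself about which variable, if either, causes the other.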
24. Correlation is NOT Causation !
Be careful when inferring causation from
correlation!
Correlation indicates the possibility of a predictive
relationship.
Correlation is not a sufficient
condition for causation.
Correlation or Causation?
Did Avas cause the housing bubble?
Is the murder rate related to the
height of a mountain range?
http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html
26. Sir Ronald Fisher
Randomized Controlled Trials
The only widely accepted way of proving
causality.
First proposed by Charles
Sanders Peirce in education
Promoted and formalized by Sir
Ronald Fisher
The Design of Experiments (1935)
27. Randomized Control Trials Diagram
Four Phases for RCT in Clinical
Trials
Enrollment
Intervention Allocation
Follow-up
Data Analysis
28. Randomized Control Trials in Web
Current Search Widget
Proposed Search Widget
Which one is better ?
Ronny Kohavi, http://www.exp-platform.com/Pages/default.aspx
29. Run Experiment (RCT) and Decide.
Data-driven decision making
Also known as A/B testing
Google ran approximately
12,000 randomized experiments
in 2009; 10% resulted in a
change.
The web is ideal for running experiments
and improving based on them.
Running an experiment on the web
costs very little.
Ronny Kohavi, http://www.exp-platform.com/Pages/default.aspx
30. Things to keep in Mind
Randomization before allocation is critical.
Exposure to all other parameters, except for the feature under test, should be
identical in the Control and Treatment groups.
Use statistical significance tests on the results.
Use a large enough sample.
Rule out random chance as the explanation for the result of a trial.
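The checklist above can be sketched in a few lines: randomize users into Control and Treatment, then apply a significance test to the click-through counts. This is a minimal sketch that assumes a two-sided two-proportion z-test, which is one common choice and not necessarily what any particular experimentation platform uses. The function names and the click counts are hypothetical.

```python
import math
import random

def assign_groups(user_ids, seed=0):
    """Randomly allocate each user to control or treatment (seeded for repeatability)."""
    rng = random.Random(seed)
    groups = {"control": [], "treatment": []}
    for uid in user_ids:
        groups[rng.choice(("control", "treatment"))].append(uid)
    return groups

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

groups = assign_groups(range(1000))
# Hypothetical results: 200 clicks from 10,000 control users, 260 from 10,000 treated.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone. With a sample that is too small, the same difference in rates would not reach significance, which is why sample size appears on the checklist.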
31. Challenges of Randomized Controlled
Trials
In most cases running an RCT is infeasible
Economics, Anthropology, Politics
In some cases it might be illegal!
Lack of Deeper Understanding
Understanding = knowing how things work when taken apart!
Lack of language to express causal concepts explicitly is responsible for the
poor scientific activity around Causality.
Judea Pearl
33. Judea Pearl’s Contribution to Causality
1. Representation for capturing relationships between different pieces of
information and their causal links.
Bayesian Networks
2. Algebra of Intervention
Do operator to capture explicit actions
Their relationship with probability.
Judea Pearl's work gave language and notation to Causality
and brought it under Mathematical Sciences
34. Cigarette smoking & Lung Cancer
A 1964 study finds that cigarette smokers have a higher
chance of getting lung cancer.
The cigarette lobby suggests the presence of an unknown gene
that causes both the urge for nicotine and cancer.
A study finds that people who visit bars have a higher
chance of getting lung cancer.
Doctors find a relationship between tar deposits in
the lungs and having lung cancer.
35. Factors Affecting Pneumonia (An
Example)
From: Aronsky, D. and Haug, P.J., Diagnosing community-acquired
pneumonia with a Bayesian network, In: Proceedings of the Fall
Symposium of the American Medical Informatics Association, (1998) 632-636.
36. Challenge !
How can I establish a causal relationship
between smoking and lung cancer using this
data ?
P(cancer | smoking) ?? P(cancer)
P(cancer | smoking) > P(cancer)
P(cancer | smoking) = P(cancer)
37. Primer on Probability
A Tutorial on Bayesian Networks, Weng-Keen Wong - Oregon
State University
38. Probability Primer: Random Variables
A random variable is the basic element of probability
Refers to an event and there is some degree of uncertainty as to
the outcome of the event
For example, the random variable A could be the event of getting
a heads on a coin flip
39. Boolean Random Variables
We deal with the simplest type of random
variables – Boolean ones
Take the values true or false
Think of the event as occurring or not
occurring
Examples (Let A be a Boolean random
variable):
A = Getting heads on a coin flip
A = It will rain today
A = There is a typo in these slides
40. Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an outcome would be
obtained if the process were repeated a large number of times under similar
conditions*
*Ahem…there's also the Bayesian definition which says probability is your degree of belief in an outcome
[Figure: a region split into a red area and a blue area, labeled P(A = true) and P(A = false); the sum of the red and blue areas is 1]
41. Conditional Probability
P(A = true | B = true) = Out of all the outcomes in
which B is true, how many also have A equal to true
Read this as: “Probability of A conditioned on B” or
“Probability of A given B”
H = "Have a headache"
F = "Coming down with Flu"
P(H = true) = 1/10
P(F = true) = 1/40
P(H = true | F = true) = 1/2
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
[Figure: Venn diagram with regions labeled P(F = true) and P(H = true)]
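The headache and flu numbers can be checked with exact fractions. The sketch below uses only the three probabilities stated on the slide. The reverse query P(F = true | H = true) is an extra step that applies Bayes' rule, which the slides mention but do not walk through, so treat that last line as an illustrative addition.

```python
from fractions import Fraction

# Values stated on the slide.
p_h = Fraction(1, 10)          # P(H = true)
p_f = Fraction(1, 40)          # P(F = true)
p_h_given_f = Fraction(1, 2)   # P(H = true | F = true)

# Rearranging P(H | F) = P(H, F) / P(F) gives the joint probability.
p_h_and_f = p_h_given_f * p_f

# Illustrative extra step (Bayes' rule): P(F | H) = P(H, F) / P(H).
p_f_given_h = p_h_and_f / p_h

print(p_h_and_f, p_f_given_h)
```

Using `Fraction` avoids floating-point rounding, so the results can be compared directly with the fractions on the slide.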
42. The Joint Probability Distribution
We will write P(A = true, B = true) to mean “the probability of A =
true and B = true”
Notice that:
P(H = true | F = true)
= (Area of "H and F" region) / (Area of "F" region)
= P(H = true, F = true) / P(F = true)
In general, P(X|Y) = P(X,Y) / P(Y)
[Figure: Venn diagram with regions labeled P(F = true) and P(H = true)]
43. The Joint Probability Distribution
Joint probabilities can be between any number of variables
eg. P(A = true, B = true, C = true)
For each combination of variables, we need to say how probable that combination is
The probabilities of these combinations need to sum to 1

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
(Sums to 1)
44. The Joint Probability Distribution
Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
Note: May need to use marginalization and Bayes rule (both of which are not discussed in these slides)

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15

Examples of things you can compute:
• P(A=true) = sum of P(A,B,C) in rows with A=true
• P(A=true, B = true | C=true) =
P(A = true, B = true, C = true) / P(C = true)
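The two example computations can be reproduced directly from the joint table. The sketch below stores the eight rows from the slide in a dictionary and sums the rows that match an event. The helper name `prob` is hypothetical.

```python
# Joint table from the slide: (A, B, C) -> P(A, B, C).
JOINT = {
    (False, False, False): 0.10,
    (False, False, True): 0.20,
    (False, True, False): 0.05,
    (False, True, True): 0.05,
    (True, False, False): 0.30,
    (True, False, True): 0.10,
    (True, True, False): 0.05,
    (True, True, True): 0.15,
}

def prob(event):
    """Marginalize: add up the rows of the joint table where event(a, b, c) holds."""
    return sum(p for (a, b, c), p in JOINT.items() if event(a, b, c))

# P(A = true): sum of the rows with A = true.
p_a = prob(lambda a, b, c: a)

# P(A = true, B = true | C = true) = P(A, B, C all true) / P(C = true).
p_ab_given_c = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: c)

print(round(p_a, 4), round(p_ab_given_c, 4))
```

Every query is answered by the same pattern, a sum over matching rows followed by an optional division, which is why the full joint table is sufficient but becomes costly as the number of variables grows.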
45. The Problem with the Joint Distribution
Lots of entries in the table to fill up!
For k Boolean random variables, you need a table of size 2^k
How do we use fewer numbers? Need the concept of independence

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
46. Independence
Variables A and B are independent if any of the following hold:
P(A,B) = P(A) P(B)
P(A | B) = P(A)
P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.
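The first condition, P(A,B) = P(A) P(B), is easy to test numerically. In this sketch, `is_independent` is a hypothetical helper that takes a complete two-variable joint table. The first table is the A, B marginal of the earlier A, B, C joint table (C summed out), and the second is a pair of fair coin flips, added here as an assumed contrast case.

```python
def is_independent(joint, tol=1e-9):
    """Check P(A, B) = P(A) P(B) for every cell of a full table {(a, b): P(a, b)}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return all(abs(p - p_a[a] * p_b[b]) <= tol for (a, b), p in joint.items())

# A, B marginal of the A, B, C joint table from the slides (C summed out).
ab_from_slides = {
    (False, False): 0.30,
    (False, True): 0.10,
    (True, False): 0.40,
    (True, True): 0.20,
}

# Two fair coin flips (an assumed contrast case, not from the slides).
two_fair_coins = {(a, b): 0.25 for a in (False, True) for b in (False, True)}

print(is_independent(ab_from_slides), is_independent(two_fair_coins))
```

For the table from the slides, P(A = true) P(B = true) = 0.6 × 0.3 = 0.18 while P(A = true, B = true) = 0.2, so A and B are not independent there.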
47. Independence
How is independence useful?
Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
If the coin flips are not independent, you need
2^n values in the table
If the coin flips are independent, then
$$P(C_1, \ldots, C_n) = \prod_{i=1}^{n} P(C_i)$$
Each P(Ci) table has 2 entries and there are n of them, for a total of 2n values
48. Conditional Independence
Variables A and B are conditionally independent given C if any of the
following hold:
P(A, B | C) = P(A | C) P(B | C)
P(A | B, C) = P(A | C)
P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don't gain anything by knowing A (either because A doesn't
influence B or because knowing C provides all the information knowing A would give)
50. A Bayesian Network
A Bayesian network is made up of two things
1. A Directed Acyclic Graph
[Figure: directed acyclic graph with edges A → B, B → C, B → D]
2. A set of tables for each node in the graph

A      P(A)
false  0.6
true   0.4

A      B      P(B|A)
false  false  0.01
false  true   0.99
true   false  0.7
true   true   0.3

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1
51. A Directed Acyclic Graph
Each node in the graph is a random variable
A node X is a parent of another node Y if there is an arrow from node X to node Y, eg. A is a parent of B
Informally, an arrow from node X to node Y means X has a direct influence on Y
[Figure: directed acyclic graph with edges A → B, B → C, B → D]
52. A Set of Tables for Each Node
Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
The parameters are the probabilities in these conditional probability tables (CPTs)

A      P(A)
false  0.6
true   0.4

A      B      P(B|A)
false  false  0.01
false  true   0.99
true   false  0.7
true   true   0.3

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95

[Figure: directed acyclic graph with edges A → B, B → C, B → D]
53. Bayesian Network for Cigarette Smoking
[Figure: Bayesian network with nodes Mystery Gene, Cancer in Family, Smoking, Late night Partying, Tar Deposits, and Lung Cancer, annotated with P(Smoking | Mystery Gene) and P(Tar | Smoking)]
54. Bayesian Networks
Two important properties:
1. Encodes the conditional independence relationships between the
variables in the graph structure
2. Is a compact representation of the joint probability distribution over
the variables
55. Conditional Independence
The Markov condition: given its parents (P1, P2),
a node (X) is conditionally independent of its
non-descendants (ND1, ND2)
[Figure: graph over the nodes P1, P2, ND1, X, ND2, C1, C2, with P1 and P2 as the parents of X]
56. The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability
distribution over all the variables X1, …, Xn in the Bayesian net
using the formula:
$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid \mathrm{Parents}(X_i))$$
Where Parents(Xi) means the values of the Parents of the node Xi with respect to
the graph
57. Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)
= (0.4)*(0.3)*(0.1)*(0.95)
[Figure: directed acyclic graph with edges A → B, B → C, B → D]
58. Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)   (this factorization is from the graph structure)
= (0.4)*(0.3)*(0.1)*(0.95)   (these numbers are from the conditional probability tables)
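The calculation above can be sketched in code by storing each conditional probability table from the example network as a dictionary and multiplying one entry per node. The dictionary layout and the function name `joint` are choices made for this sketch, not part of the original tutorial.

```python
from itertools import product

# CPTs from the example network (edges A -> B, B -> C, B -> D).
P_A = {False: 0.6, True: 0.4}
P_B_GIVEN_A = {  # key: (a, b) -> P(B = b | A = a)
    (False, False): 0.01, (False, True): 0.99,
    (True, False): 0.70, (True, True): 0.30,
}
P_C_GIVEN_B = {  # key: (b, c) -> P(C = c | B = b)
    (False, False): 0.4, (False, True): 0.6,
    (True, False): 0.9, (True, True): 0.1,
}
P_D_GIVEN_B = {  # key: (b, d) -> P(D = d | B = b)
    (False, False): 0.02, (False, True): 0.98,
    (True, False): 0.05, (True, True): 0.95,
}

def joint(a, b, c, d):
    """P(A, B, C, D) as the product of each node's CPT entry given its parents."""
    return P_A[a] * P_B_GIVEN_A[(a, b)] * P_C_GIVEN_B[(b, c)] * P_D_GIVEN_B[(b, d)]

p_all_true = joint(True, True, True, True)  # (0.4)*(0.3)*(0.1)*(0.95)
total = sum(joint(*values) for values in product((False, True), repeat=4))
print(round(p_all_true, 6), round(total, 6))
```

The network needs 2 + 4 + 4 + 4 = 14 table entries here instead of the 2^4 = 16 rows of a full joint table. The saving is small for four variables but grows quickly as variables are added.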
59. Inference
Using a Bayesian network to compute
probabilities is called inference
In general, inference involves queries of the
form:
P(X | E)
X = The query variable(s)
E = The evidence variable(s)
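A query of the form P(X | E) can be answered on a small network by brute-force enumeration: sum the joint probability over every assignment consistent with the evidence, then normalize. The sketch below does this for the four-node example network. Enumeration is only one inference method, chosen here for clarity, and the function name `conditional` is hypothetical.

```python
from itertools import product

VARS = ("A", "B", "C", "D")
P_A = {False: 0.6, True: 0.4}
P_B_GIVEN_A = {(False, False): 0.01, (False, True): 0.99,
               (True, False): 0.70, (True, True): 0.30}
P_C_GIVEN_B = {(False, False): 0.4, (False, True): 0.6,
               (True, False): 0.9, (True, True): 0.1}
P_D_GIVEN_B = {(False, False): 0.02, (False, True): 0.98,
               (True, False): 0.05, (True, True): 0.95}

def joint(a, b, c, d):
    return P_A[a] * P_B_GIVEN_A[(a, b)] * P_C_GIVEN_B[(b, c)] * P_D_GIVEN_B[(b, d)]

def conditional(query, evidence):
    """P(query | evidence); both are dicts such as {"A": True}."""
    numerator = 0.0
    denominator = 0.0
    for values in product((False, True), repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(*values)
        denominator += p
        if all(world[name] == value for name, value in query.items()):
            numerator += p
    return numerator / denominator

p_a_given_d = conditional({"A": True}, {"D": True})
print(round(p_a_given_d, 4))
```

Enumeration visits all 2^n assignments, so it does not scale. That cost is the reason exact inference on larger networks needs smarter algorithms.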
60. Key Questions on Bayesian Networks
How do you build a Bayesian network?
How do you compute conditional probabilities based on data?
What about continuous variables?
Without data, how do you build a Bayesian network? Can you
capture your experience in the network?
Can you learn the structure from data?
How do you draw inferences using Bayesian networks?
How do you manage the computational complexity of exact inference
in the network?
62. Algebra of Doing
Available: Algebra of Seeing
What is the chance it rained if we see the grass is wet?
P(rain | wet) = P(wet | rain) P(rain) / P(wet)
Algebra of Doing
What is the chance that it rained if we make the grass wet?
P(rain | do(wet)) = P(rain)
Simplify the Bayesian Network by explicitly capturing an intervention.
Causal conditional probabilities: P(x | do(y))
Calculus for moving from causal conditional probability to conditional probability.
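The difference between seeing and doing can be sketched on a two-node network Rain → Wet. The probabilities below are assumed for illustration and do not come from the slides. Seeing uses Bayes' rule on the original network. Doing replaces the table for Wet with one that forces wet = true regardless of rain, which removes the arrow from Rain to Wet.

```python
# Assumed parameters (hypothetical, for illustration only).
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}  # P(wet = true | rain)

def rain_given_wet(p_wet_given_rain):
    """P(rain = true | wet = true) via Bayes' rule for a given Wet table."""
    numerator = p_wet_given_rain[True] * P_RAIN
    denominator = numerator + p_wet_given_rain[False] * (1.0 - P_RAIN)
    return numerator / denominator

# Seeing: we observe wet grass, so the original table for Wet applies.
p_rain_seeing = rain_given_wet(P_WET_GIVEN_RAIN)

# Doing: we wet the grass ourselves, so wet = true no matter what Rain is.
FORCED_WET = {True: 1.0, False: 1.0}
p_rain_doing = rain_given_wet(FORCED_WET)

print(round(p_rain_seeing, 4), round(p_rain_doing, 4))
```

Under these assumed numbers, observing wet grass raises the probability of rain well above its prior, while making the grass wet leaves it at P(rain), matching P(rain | do(wet)) = P(rain) on the slide.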
63. Causal Conditional Probabilities
Borrowing ideas from Randomized Controlled Trials
Hypothetical world where ?
Can we compute P(cancer | do(smoking)) ?
Allows us to override causal influences for that variable.
[Figure: network with nodes Gene, Smoking, Cancer, and Tar]
64. Using Causal Conditional Probabilities
Set up an intervention in Bayesian networks
Override all other causal influences in the presence of the intervention.
P(Tar | do(smoking))
Convert from do calculus to observational calculus.
[Figure: network with nodes Intervention, Smoking, Cancer, and Tar]
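One way to see the conversion from do-expressions to observational quantities is the sketch below. It assumes a specific structure that the slides only hint at: an unobserved Gene influences both Smoking and Cancer, and Smoking affects Cancer only through Tar. Under that assumption Pearl's front-door adjustment applies, a technique not named on the slides. All numbers are hypothetical. The code computes P(cancer | do(smoking)) from observable probabilities alone and compares it with the answer obtained by intervening in the full model.

```python
from itertools import product

BOOLS = (True, False)

# Hypothetical parameters (assumptions, not from the slides).
P_GENE = 0.3                                   # P(gene = true)
P_SMOKE_GIVEN_GENE = {True: 0.8, False: 0.2}   # P(smoking = true | gene)
P_TAR_GIVEN_SMOKE = {True: 0.9, False: 0.05}   # P(tar = true | smoking)
P_CANCER_GIVEN_TAR_GENE = {                    # P(cancer = true | tar, gene)
    (True, True): 0.8, (True, False): 0.4,
    (False, True): 0.3, (False, False): 0.05,
}

def bern(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(gene, smoke, tar, cancer):
    return (bern(P_GENE, gene)
            * bern(P_SMOKE_GIVEN_GENE[gene], smoke)
            * bern(P_TAR_GIVEN_SMOKE[smoke], tar)
            * bern(P_CANCER_GIVEN_TAR_GENE[(tar, gene)], cancer))

def observed(smoke=None, tar=None, cancer=None):
    """Observational probability of the given values; the hidden gene is summed out."""
    total = 0.0
    for g, s, t, c in product(BOOLS, repeat=4):
        if smoke is not None and s != smoke:
            continue
        if tar is not None and t != tar:
            continue
        if cancer is not None and c != cancer:
            continue
        total += joint(g, s, t, c)
    return total

def cancer_do_smoke_front_door(smoke):
    """sum_t P(t | s) * sum_s' P(cancer | t, s') P(s'), observational terms only."""
    total = 0.0
    for t in BOOLS:
        p_tar = observed(smoke=smoke, tar=t) / observed(smoke=smoke)
        p_cancer = sum(
            observed(smoke=s, tar=t, cancer=True) / observed(smoke=s, tar=t)
            * observed(smoke=s)
            for s in BOOLS
        )
        total += p_tar * p_cancer
    return total

def cancer_do_smoke_truth(smoke):
    """Intervene in the full model: drop P(smoking | gene) and fix smoking."""
    return sum(
        bern(P_GENE, g) * bern(P_TAR_GIVEN_SMOKE[smoke], t)
        * P_CANCER_GIVEN_TAR_GENE[(t, g)]
        for g in BOOLS for t in BOOLS
    )

seeing = observed(smoke=True, cancer=True) / observed(smoke=True)
doing = cancer_do_smoke_front_door(True)
print(round(seeing, 4), round(doing, 4), round(cancer_do_smoke_truth(True), 4))
```

In this assumed model the observational P(cancer | smoking) overstates the causal effect because the gene raises both smoking and cancer. The front-door estimate recovers the interventional value without ever observing the gene.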
66. To summarize
Correlation is NOT Causation
Randomized Controlled Trials (RCT) can
establish causation.
Want more ?
Study Causality !
67. References
Causality: Models, Reasoning, and Inference
Judea Pearl, Second Edition
A Tutorial on Learning With Bayesian Networks,
David Heckerman
Technical Report, Microsoft Research.
Bayesian Networks without Tears, Eugene Charniak
AI Magazine, 1991
If Correlation does not imply Causation, what does?
Michael Nielsen blog
68. Thank You
Persistent Systems Limited
www.persistentsys.com
Editor's Notes: Turing Award Lecture, http://www.youtube.com/watch?v=78EmmdfOcI8&feature=player_embedded, "The Mechanization of Causal Inference: A 'Mini Turing Test' and Beyond"