An introduction to causal graphical models with examples of causality in practice from different fields of science. More focused discussion of causal inference in online ads and recommender systems.
2. My route to causality
Building recommender systems in social networks
Conducting user experiments
Estimating impact of recommendations and social feeds
3. Causality is everywhere
Spans every branch of science.
◦ Economics
◦ Political science
◦ Study of human behavior
◦ Biology and medicine
◦ Computer science (?)
Spans centuries of thought.
◦ Aristotle: “To know, is to know the final cause.”
It took us until the 1930s to come up with the randomized experiment (Fisher).
Still early days for estimating causal effects from observational data.
4. Causality in economics
David Card. The causal effect of education on earnings (1999)
Conley and Heerwig. The Long-Term Effects of Military
Conscription on Mortality: Estimates From the Vietnam-Era Draft
Lottery (2012)
5. Causality in political science
Darrell West. Air Wars (2013)
Chattopadhyay and Duflo. Women as Policy Makers:
Evidence from a Randomized Policy Experiment in
India (2004)
6. Causality in human behavior
Thistlethwaite and Campbell. Effect of public recognition of scholastic achievement (1960)
Christakis and Fowler. The collective dynamics
of smoking in a large social network (2008)
7. Causality in biology and medicine
Effect of Vitamin D deficiency on colon cancer
Effect of heart attack surgery on long-term
health of patient
8. Causality in web applications
Sharma and Cosley. Distinguishing between personal preference
and homophily in online activity feeds (2016).
Sharma, Hofman and Watts. Estimating the causal impact of
recommender systems (2015).
9. Counterfactual reasoning
Correlation question: How well can X predict Y?
◦ Machine learning, Statistical estimation.
Interventionist question: If X is changed to X’, what will be
the value of Y?
◦ Experiments, Reinforcement learning, Contextual bandits.
Counterfactual question: If X had been X’, what would the value of Y have been?
◦ Today’s focus.
10. Estimating causal effects from observational data
Why is causal inference hard?
◦ Simpson’s paradox
The language of graphical models
◦ Backdoor criterion
◦ Frontdoor criterion
Common approaches for causal inference
◦ Conditioning
◦ Mechanism-based
◦ Natural Experiments
Example: Estimating causal impact of recommender systems
11. Estimating the effectiveness of kidney stone treatment
               Treatment A      Treatment B
Small stones   93% (81/87)      87% (234/270)
Large stones   73% (192/263)    69% (55/80)
Both           78% (273/350)    83% (289/350)
Julious and Mullee. Confounding and Simpson’s Paradox (1994).
http://en.wikipedia.org/wiki/Simpson's_paradox
Two treatments for kidney stones
Treatment A : 78% effective
Treatment B : 83% effective
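The reversal can be checked directly from the counts in the table. This is a minimal sketch; the dictionary layout and function names are illustrative and not part of the original slides.

```python
# (successes, patients) per treatment and stone size, copied from the table above.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def stratum_rate(treatment, size):
    """Success rate of one treatment within one stone-size group."""
    successes, patients = counts[treatment][size]
    return successes / patients

def pooled_rate(treatment):
    """Success rate of one treatment with both stone sizes pooled together."""
    successes = sum(s for s, _ in counts[treatment].values())
    patients = sum(n for _, n in counts[treatment].values())
    return successes / patients

for size in ("small", "large"):
    print(size, round(stratum_rate("A", size), 3), round(stratum_rate("B", size), 3))
print("both", round(pooled_rate("A"), 3), round(pooled_rate("B"), 3))
```

Treatment A has the higher rate within each stone size, yet B has the higher pooled rate, because A was given mostly to large-stone cases (263 of its 350 patients), which are harder to treat.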
12. Estimating ad placement on a search engine
Suppose we would like to optimize the set of ads shown for a query, rather than optimize each ad individually.
Click probability estimates: q1 (1st ad), q2 (2nd ad)
Does q2 depend on q1?
13. Confounders in ad placement
Let us define two groups with 2000 queries each:
◦ High q1: (149/2000) CTR on second ad
◦ Low q1: (124/2000) CTR on second ad
           Low q1             High q1
Low q2     5.1% (92/1823)     4.8% (71/1500)
High q2    18.1% (32/176)     15.6% (78/500)
Both       6.2% (124/2000)    7.5% (149/2000)
Bottou et al. Counterfactual reasoning and learning systems
(2013).
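One way to remove the influence of q2 is to compare the two q1 groups at a common q2 distribution: compute the CTR within each q2 stratum, then reweight by the overall share of each stratum. The sketch below does this with the counts from the table. It assumes q2 is the only confounder, which is an assumption made here for illustration; the names are hypothetical.

```python
# (clicks on second ad, queries) keyed by (q1 group, q2 group), copied from the table above.
cells = {
    ("low", "low"): (92, 1823),
    ("low", "high"): (32, 176),
    ("high", "low"): (71, 1500),
    ("high", "high"): (78, 500),
}

def pooled_ctr(q1):
    """CTR on the second ad for one q1 group, ignoring q2."""
    clicks = sum(c for (g1, _), (c, _) in cells.items() if g1 == q1)
    queries = sum(n for (g1, _), (_, n) in cells.items() if g1 == q1)
    return clicks / queries

def adjusted_ctr(q1):
    """CTR for one q1 group, standardized to the overall q2 distribution."""
    total = sum(n for _, n in cells.values())
    result = 0.0
    for q2 in ("low", "high"):
        q2_share = sum(n for (_, g2), (_, n) in cells.items() if g2 == q2) / total
        clicks, queries = cells[(q1, q2)]
        result += (clicks / queries) * q2_share
    return result

for group in ("low", "high"):
    print(group, round(pooled_ctr(group), 4), round(adjusted_ctr(group), 4))
```

Pooled, the high-q1 group looks better for the second ad; after adjusting for q2 the ordering flips, matching the within-stratum rows of the table.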
14. Causal graphical models: a framework for causality
Structural equation modeling (SEM)
X = q1
Y = CTR on second ad
15. Which variables to condition on?
Observed variables
◦ Which observed variables?
◦ As we will see, conditioning on all variables may not be correct.
Known unknowns:
◦ Age, Past diseases, Food intake
Unknown unknowns:
◦ What else could impact recovery from kidney stones?
◦ Genetic markers?
17. Connections to Bayesian networks
Markov assumption: Given its direct causes, an effect is independent of all variables other than its own effects.
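In symbols, the Markov assumption yields the usual Bayesian-network factorization of the joint distribution. The notation pa_i for the values of the direct causes (parents) of X_i is an assumption made here; it does not appear on the slide.

```latex
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid \mathrm{pa}_i\right)
```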
Two approaches:
◦ Backdoor criterion
◦ Frontdoor criterion
18. Graphical Models and common methods for causal estimation
Condition on observed covariates
• Stratification
• Matching
• Regression (?)
Mechanism-based strategies
• Path-based approaches
Natural experiments
• As-if experiments
• Instrumental Variables
• Regression discontinuity
19. I. Conditioning on observed covariates
Corresponds to Backdoor criterion.
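For reference, if a set of observed covariates Z satisfies the backdoor criterion relative to (X, Y), the effect of intervening on X is given by the standard adjustment formula. The symbol Z is introduced here for illustration.

```latex
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

Stratification, matching and regression can each be read as a way of estimating the right-hand side from data.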
23. c) Regression
Condition on observed covariates by adding them as independent variables in regression.
Works only if the true causal relationship between variables is linear.
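A small simulation can illustrate the idea. The sketch below assumes a linear world with a single observed confounder z; the data-generating coefficients (2.0 and 3.0), the seed and the helper name are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated linear system: z confounds x and y; the true effect of x on y is 2.0.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols_coefficients(columns, target):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

naive_effect = ols_coefficients([x], y)[1]        # y regressed on x only
adjusted_effect = ols_coefficients([x, z], y)[1]  # z added as a covariate

print(round(naive_effect, 2), round(adjusted_effect, 2))
```

Leaving z out inflates the coefficient on x, while including it recovers a value close to the simulated effect. This works here only because the simulated relationships are linear.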
25. III. Natural Experiments
Look for experiments happening in the real world.
Promise greater generalizability than controlled lab experiments.
Require greater care to ensure validity of causal identification.
29. Summary: Two graphical criteria explain all of the conventional approaches
A principled, succinct framework for causality.
Allows arbitrary functional forms for relationships between variables.
Leads to clear statements about causal assumptions.
If a causal effect can be identified, it can be derived using do-calculus
(helpful for bigger graphs).
32. Why is estimating effects of recommendations difficult using observational data?
AX = Activity on current item that the user is viewing
AY = Activity on the recommended item
UX = Latent properties of X
UY = Latent properties of Y
If latent properties for X and Y are correlated, then observed changes in AY cannot be directly attributed to AX.
Figure (nodes UX, UY, AX, AY): A causal graphical model for the impact of recommendations (ref. Pearl 09)
33. AX = Visits on a product X on Amazon
AY = Recommendation click-throughs from X to Y
UX = Consumer demand for X
UY = Consumer demand for Y
If latent properties for X and Y are correlated, then observed changes in AY cannot be directly attributed to AX.
Figure (nodes UX, UY, AX, AY): A causal graphical model for the impact of recommendations
34. Example: Looking for a machine learning book
Observed clickthrough data due to recommendations do not tell the full story.
For example, let’s assume I just completed the Artificial Intelligence book by Russell and Norvig and now I want to learn more about machine learning.
37. Xi: Focal Product
Yj: Recommended Products
Causal Link
Convenience Link
Revisit Link
Wasted Link
There could also be irrelevant links.
38. The Shock strategy (I.V.)
If direct visits to product Yj are nearly constant, then we can assume that the convenience clicks to Yj will be nearly constant.
Thus,
39. The Shock strategy
We cannot say much during normal traffic for a product. But if a product experiences a spike in visits and its recommended product does not, then we can compute the causal clickthrough rate.
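The reasoning can be sketched as follows: if convenience clicks stay constant during a shock, the extra recommendation clicks are attributable to the extra visits. The function below is an illustration of that logic under this assumption, not the exact estimator from the talk; the function name, argument names and example numbers are hypothetical.

```python
def causal_ctr_from_shock(visits_before, visits_during, clicks_before, clicks_during):
    """Ratio of extra recommendation clicks (X to Y) to extra visits to X during a shock.

    Assumes convenience clicks to Y are the same before and during the shock,
    so any increase in click-throughs is caused by the increase in visits.
    """
    extra_visits = visits_during - visits_before
    if extra_visits <= 0:
        raise ValueError("a shock requires more visits during than before")
    return (clicks_during - clicks_before) / extra_visits

# Hypothetical numbers: visits to X jump from 100 to 600, click-throughs from 10 to 30.
naive_ctr = 30 / 600
causal_ctr = causal_ctr_from_shock(100, 600, 10, 30)
print(naive_ctr, causal_ctr)
```

In this made-up case the naive click-through rate during the shock overstates the causal rate, because it also counts the clicks that would have happened anyway.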
40. Data description
Dataset: Anonymized Amazon URL log data from Bing toolbar for opted-in
users.
Nine months (Sept. 1, 2013 to May 31, 2014).
URL structure allows us to determine:
◦ Type of page visited (product, search, cart, bestsellers, wishlist)
◦ Type of referral to a product (recommendation, search, none, others)
After filtering out bots, sellers, authors, publishers and unpopular products (<5 visits):
◦ Number of products = 1.38 M
◦ Number of users = 2.1M
◦ 60 product categories (such as Books, Toys, Electronics)
41. Implementing the strategy: The shock criteria
Large: Visits during a shock must exceed 5 times the median traffic for a product
Sudden: Visits during a shock must be 5 times the last day’s traffic and 5 times the last week’s traffic
Sane: Visits from at least 10 unique users and on 5 different days before and after a shock
4776 shocks to 4126 products
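The three criteria can be sketched as a filter over a product's daily visit counts. The slide leaves some details open, so this sketch makes explicit assumptions: "last week's traffic" is read as the traffic on the same day one week earlier, and "sane" is read as at least 5 days with visits on each side of the shock. The function and parameter names are hypothetical.

```python
from statistics import median

def is_shock(daily_visits, day, unique_users, active_days_before, active_days_after,
             factor=5, min_users=10, min_days=5):
    """Check the large, sudden and sane criteria for a candidate shock on `day`."""
    today = daily_visits[day]
    # Large: exceeds `factor` times the product's median daily traffic.
    large = today > factor * median(daily_visits)
    # Sudden: at least `factor` times the previous day and the same day a week earlier
    # (assumed reading of "last week's traffic").
    sudden = (day >= 7
              and today >= factor * daily_visits[day - 1]
              and today >= factor * daily_visits[day - 7])
    # Sane: enough unique users, and enough days with visits before and after.
    sane = (unique_users >= min_users
            and active_days_before >= min_days
            and active_days_after >= min_days)
    return large and sudden and sane

# Hypothetical product: 10 visits a day, with a spike to 80 on day 10.
visits = [10] * 10 + [80] + [10] * 10
print(is_shock(visits, 10, unique_users=25, active_days_before=10, active_days_after=10))
```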
42. Implementing the strategy: The shock criteria
Additionally, we want direct visits to Yj to be constant. The maximum change in direct visits to Yj should not be bigger than the size of the shock.
When beta=1, ideally causal. When beta=0, all bets are off.
Good shock Bad shock (filtered out at beta=0.7)
44. Robustness checks
Shocks may not be representative
◦ The distributions of users, popularity and the affinity between users and products do not differ much (except that shocked products are, on average, more popular).
Shocks may be caused by deals which make the focal product more
attractive
◦ Verification using referrals from log data (e.g. bookbub.com) and manual
inspection of past prices (from camelcamelcamel.com)
Shocks may be a property of the weird holiday season.
◦ They occur throughout the data, although with more frequency during the
holidays.
45. Graphical models form a succinct, sound and complete framework for reasoning about causality.
They can also be practical.
THANK YOU!
AMIT SHARMA, MICROSOFT RESEARCH
http://www.amitsharma.in
@amt_shrma
Editor's Notes
Similarly, you can think about personalized, adaptive books.
We are looking for causal clickthroughs
I just read AI by Russell. Now I search. So there could be convenience, revisit, causal and wasted links.
In case of a book, think of the search results as the contents in a book in the normal order. And maybe we want to personalize that.