What is probabilistic programming? By analogy: if functional programming is programming with first-class functions and equational reasoning, probabilistic programming is programming with first-class distributions and Bayesian inference. All computable probability distributions can be encoded as probabilistic programs, and every probabilistic program represents a probability distribution.
What does it do? It gives a concise language for specifying complex, structured statistical models, and abstracts over the implementation details of exact and approximate inference algorithms. These models can be networked, causal, hierarchical, recursive, anything: the graph structure of the program is the generative structure of the distribution.
Who's interested? Cognitive scientists, statisticians, machine-learning specialists, and artificial-intelligence researchers.
3. Bayesian Role-Playing Games
● With enough paper, we could just write out every possible play
based on every possible dice-roll and the fixed player levels.
● How many ways could we have gotten from the start of the
game to the board we see? That's a likelihood.
● Which dice rolls were more common in games that gave us the
real play? That's a posterior probability.
● Bayesianism: unknown levels are just more dice.
– And are the actual dice weighted?
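A hedged sketch of that last point, written in Church (the probabilistic language discussed later in this talk); everything below is illustrative, not from the talk. We treat the die's weighting as one more random choice, then query it.

;; Is the die loaded? The unknown is just another random choice.
(rejection-query
 (define weighted? (flip 0.5))                        ; prior: 50/50 loaded
 (define (roll-six?) (flip (if weighted? 0.5 (/ 1 6))))
 weighted?                                            ; what we ask about
 (and (roll-six?) (roll-six?) (roll-six?)))           ; evidence: three sixes

Three sixes are much likelier under the loaded hypothesis, so the query returns #t far more often than the 0.5 prior would.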
4. Bayesian Reasoning is Hard!
● P(H | E) = P(E | H) P(H) / ∫ P(E | H') P(H') dH'
● Bayesian reasoning: how to update beliefs in response to
evidence
● But the integral in the denominator often cannot be solved
analytically.
● Workarounds: conjugate priors and posteriors (for which we never
need to evaluate the integral), and various numerical methods.
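As a minimal sketch in plain Scheme (with exact rationals; the numbers are invented): when the hypothesis space is finite, the integral in the denominator reduces to a sum we can evaluate directly.

;; Three hypotheses about a die: fair, slightly loaded, very loaded.
(define likelihoods '(1/6 1/3 1/2))   ; P(E | H): chance of rolling a six
(define prior       '(1/3 1/3 1/3))   ; P(H): uniform over hypotheses

(define (posterior likelihoods prior)
  (let* ((joint (map * likelihoods prior))   ; P(E | H) * P(H)
         (z (apply + joint)))                ; normalizing sum (the 'integral')
    (map (lambda (p) (/ p z)) joint)))

(posterior likelihoods prior)
;; => (1/6 1/3 1/2): heavier loadings better explain the observed six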
5. The Trouble with Bayesian Modeling (1)
In general, calculating exact probabilities takes time exponential in the number of relevant variables!
7. The Trouble with Bayesian Modeling (3)
● Exact Bayesian inference methods only support possible worlds
with a finite number of things in them.
● Each pairing of model and inference method currently has to be
implemented by hand as non-reusable code.
● Probabilistic modeling as we think we know it is inexpressive
and slow, ranging in complexity from NP-complete to
DEXPTIME.
8. Programs have generative structure
● Generative model = Rules + Randomness + Random
Parameters
– A program for generating random outcomes, given unknown
parameters.
● Probabilistic Program = Probability Distribution
● The Stochastic Lambda Calculus: a Turing-complete language
with random choices.
– In jargon: its semantics is given by the 'probability monad'.
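A minimal Church sketch of that equation (the names are invented): the program below is, as a whole, a probability distribution over outcomes.

(define fairness (uniform 0 1))    ; random parameter (unknown)
(define (toss) (flip fairness))    ; rules + randomness
(repeat 10 toss)                   ; running the program = sampling an outcome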
9. What sorts of queries?
● Sampling: 'What are some possibilities?'
● Expectation: 'What should we expect to see?'
● Support: 'What could happen?'
● Probability mass/density: 'What's the chance this happens?'
● Conditional query: 'Given this, what about that? What about
when X happens more than Y?'
– 'Given that I saw 18 black ravens, what proportion of all ravens are
black?'
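Hedged Church sketches of a few of these queries, for a toy model (two fair coin flips; all names are illustrative):

(define (heads) (if (flip) 1 0))
(define (model) (+ (heads) (heads)))      ; number of heads in two flips

(repeat 5 model)                          ; sampling: some possibilities

(define samples (repeat 1000 model))      ; expectation, by Monte Carlo:
(/ (apply + samples) (length samples))    ; roughly 1

(rejection-query                          ; conditional query:
 (define h (model))
 h                                        ; given at least one head,
 (> h 0))                                 ; how many heads?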
10. What sorts of inference?
● Mostly sampling-based approximate inference, but there are
some exact algorithms.
● By sampling, we can reason about models where each possible
world might contain a different number of actual entities.
● Approximate inference adds only a small cost of its own once we
have generated many samples from the distribution.
– And advanced inference algorithms can re-use parts of computations
to sample even faster.
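A hedged Church sketch of that open-world point (the raven numbers are invented): the number of entities is itself a random choice, so different sampled worlds contain different numbers of things.

;; A world contains a geometric number of ravens, each black or white.
(define (n-ravens) (if (flip 0.5) 0 (+ 1 (n-ravens))))

(define (sample-world)
  (repeat (n-ravens) (lambda () (if (flip 0.9) 'black 'white))))

(sample-world)   ; e.g. => (black black white)

Sampling copes with this easily; exact enumeration cannot, because there are infinitely many possible world sizes.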
11. What can probabilistic programs express?
● In a 'code as data' language like Church (based on Scheme),
programs can include eval and learn arbitrary code.
● Or they can do a query about a query: inference about
inference.
● Generative structure generalizes logical and physical structure:
'The λ-calculus does not focus on the sequence of time, but rather
on […]'
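A hedged sketch of the 'code as data' point, assuming a Church implementation that exposes eval as the Church paper describes (the expression grammar is invented):

;; Infer which random arithmetic expression explains an observed value.
(rejection-query
 (define op (uniform-draw (list '+ '* '-)))
 (define expr (list op 3 4))
 expr
 (= (eval expr) 12))
;; => (* 3 4): the program has learned a piece of code from data

Nested queries, inference about inference, appear concretely in the bar-choosing model on slide 19.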
12. The Elegance of Probabilistic Programming
● Programs = Distributions.
– Running a program = sampling an outcome from the distribution
– Monadic semantics: every function maps input values to output
distributions.
● Queries are just functions; language runtime performs
inference.
● '[H]allucinate possible worlds', and reason about them.
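To make 'monadic semantics' concrete, here is a minimal sketch of a discrete probability monad in plain Scheme (the names are mine, not from any library): a distribution is a list of value/probability pairs, return is a point mass, and bind pushes a distribution through a function that returns distributions.

(define (return x) (list (cons x 1.0)))

(define (bind dist f)
  ;; For each (value . p) in dist, run f on the value and scale its output.
  (apply append
         (map (lambda (pair)
                (map (lambda (q)
                       (cons (car q) (* (cdr pair) (cdr q))))
                     (f (car pair))))
              dist)))

(define (coin p) (list (cons #t p) (cons #f (- 1 p))))

;; Every function maps input values to output distributions:
(define (heads-count a)
  (bind (coin 0.5)
        (lambda (b) (return (+ (if a 1 0) (if b 1 0))))))

(bind (coin 0.5) heads-count)
;; => ((2 . .25) (1 . .25) (1 . .25) (0 . .25))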
14. How can we use probabilistic programming?
● 'We would like to build computing machines that can interpret
and learn from their sensory experience, act effectively in real
time, and - ultimately - design and program themselves.'
● Professor Vikash Mansinghka, 'Natively Probabilistic Computation'
● Applications in: computer vision, cognitive science, machine
learning, natural language processing, and artificial intelligence.
15. How can I use probabilistic programming?
● Write a program simulating whatever you want to reason about.
Where you don't know which specific choice to make, choose
randomly.
● Present the language runtime's inference engine with real-world
data compatible with your model.
– And go get some coffee.
● The inference engine will learn which random choices generate
data like yours.
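A hedged end-to-end sketch of this workflow in Church (the data and numbers are invented): simulate coin tosses with an unknown weight, then let the inference engine recover weights that generate data like ours.

;; Unknown choice: the coin's weight. Real-world data: 9 heads in 10 tosses.
(mh-query
 1000 10                                   ; number of samples, lag
 (define weight (uniform 0 1))             ; don't know it? choose randomly
 (define (toss) (if (flip weight) 1 0))
 weight                                    ; what we want to learn
 (= 9 (sum (repeat 10 toss))))             ; condition on the real data
;; => 1000 posterior samples of weight, concentrated near the
;;    empirical rate of 0.9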
17. Computer Vision in 20 Lines of Code (2)
● The prior model 'knows' how to render pictures, but chooses
what to render completely at random. Inference then 'learns'
which choices most likely drew the real image.
● 'Our probabilistic graphics program did not originally support
rotation, which was needed for the AOL CAPTCHAs; adding it
required only 1 additional line of probabilistic code.'
– http://probcomp.csail.mit.edu/gpgp/
● Homework takes more than 20 lines of code!
18. Which bar should we meet at?
Social reasoning and cooperation in 11 LOC.
19. Modeling where to meet for drinks
;; Shared prior: both players slightly prefer the popular bar.
(define (sample-location)
  (if (flip .55) 'popular-bar 'unpopular-bar))

;; Alice chooses a location conditioned on meeting Bob, who is himself
;; modeled as reasoning about Alice: a query nested inside a query.
;; (The slide's abstract 'query' is made concrete as rejection-query.)
(define (alice depth)
  (rejection-query
   (define alice-location (sample-location))
   alice-location
   (equal? alice-location (bob (- depth 1)))))

(define (bob depth)
  (rejection-query
   (define bob-location (sample-location))
   bob-location
   (or (= depth 0)
       (equal? bob-location (alice depth)))))
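A hedged usage note: the depth argument bounds the nested reasoning (call alice with depth >= 1 so the recursion bottoms out), and in this model deeper recursion concentrates both players on the popular bar.

(alice 3)   ; => 'popular-bar with high probability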
The correct answer is actually Libirah!
20. Generating Scenes Under Constraints
● Sample fresh coffee shops, conditioned on the constraints of good design.
● Scene generation is an open-world task: not just where to put the
furniture but how much furniture to generate is random.
● Exact inference can't even handle these models!
– The set of random variables isn't fixed in advance.
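A toy Church sketch of the same idea (the furniture, dimensions, and constraint are invented for illustration):

;; How much furniture to generate is itself random (open world)...
(define (n-tables) (if (flip 0.4) 1 (+ 1 (n-tables))))

;; ...and the whole scene is conditioned on a design constraint.
(rejection-query
 (define n (n-tables))
 (define xs (repeat n (lambda () (uniform 0 10))))
 (list n xs)
 (< (apply max xs) 5))   ; constraint: keep half the room clear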
21. What more needs doing?
● Sampling-based approximation helps, but we need better inference
algorithms. No real-time reasoning yet.
● Techniques from compilers research help.
– One recent paper got a 600x speed-up by caching the nonrandom parts of
each computation and applying Just-In-Time compilation.
● Halting Problem, Rice's Theorem: we can't prove one inference
algorithm supreme in all circumstances.
● Reusable data: export what we've learned and pass it on to others.
● Ways to treat intractability itself probabilistically.
22. Conclusions
● Rules = Programs, Uncertain knowledge = Random choice.
● Rules + Uncertainty = Programs + Randomness = Probabilistic
Programming
● Probabilistic programs can express any complex model, but we need
to improve our inference algorithms to make reasoning tractable.
● Fields of application: cognitive science, procedural content
generation, machine learning, computer vision, Bayesian statistics,
artificial intelligence.
23. Bibliography (1)
● Noah D. Goodman and Joshua B. Tenenbaum. Probabilistic Models of Cognition.
https://probmods.org/
● http://probabilistic-programming.org/research/, http://forestdb.org/
● Vikash Mansinghka. Natively Probabilistic Computation. PhD thesis, Massachusetts
Institute of Technology, 2009.
● Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and
Joshua B. Tenenbaum. Church: a language for generative models. In Proc. of
Uncertainty in Artificial Intelligence, 2008.
● Fritz Obermeyer. Automated equational reasoning in nondeterministic lambda-calculi
modulo theories H*. PhD thesis, Carnegie Mellon University, 2009.
24. Bibliography (2)
● Vikash Mansinghka, Tejas Kulkarni, Yura Perov, Josh Tenenbaum. Approximate Bayesian
Image Interpretation using Generative Probabilistic Graphics Programs. Neural Information
Processing Systems 2013.
● Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic
programming. In International Conference on Software Engineering (ICSE, FOSE track), 2014.
● Andreas Stuhlmüller and Noah D. Goodman. Reasoning about Reasoning by Nested
Conditioning: Modeling Theory of Mind with Probabilistic Programs. In Cognitive Systems
Research, 2013.
● Yi-Ting Yeh, Lingfeng Yang, Matthew Watson, Noah D. Goodman, and Pat Hanrahan. 2012.
Synthesizing open worlds with constraints using locally annealed reversible jump MCMC. ACM
Trans. Graph. 31, 4, Article 56 (July 2012).