Decision Theory Research at FRI

Johannes Treutlein
Foundational Research Institute

Johannes Treutlein
A wager for evidential decision theory

Altruistic Newcomb problem
[Figure: the predictor Ω and two boxes. One box contains one wish. The opaque box (?) contains two wishes if Ω predicts one-boxing, and nothing if Ω predicts two-boxing.]

Altruistic Newcomb problem
Payoffs (in wishes):
      S1   S2
A1    2    0
A2    3    1
● A1: One-box; A2: Two-box
● S1: opaque box contains two wishes; S2: opaque box empty
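
The payoff table can be turned into a small worked example. The following is a minimal sketch, not taken from the slides: it assumes a hypothetical predictor accuracy (`accuracy`) for the evidential calculation and a hypothetical fixed probability of S1 (`p_s1`) for the causal calculation. The function names are illustrative only.

```python
# Payoffs (in wishes) from the table: rows are actions, columns are states.
PAYOFF = {
    "one-box": {"S1": 2, "S2": 0},  # A1
    "two-box": {"S1": 3, "S2": 1},  # A2
}


def edt_value(action, accuracy):
    """Expected wishes when the action is treated as evidence about the state.

    `accuracy` is an assumed parameter: the probability that the
    prediction matches the action actually taken.
    """
    # One-boxing is evidence for S1 (box full); two-boxing is evidence for S2.
    p_s1 = accuracy if action == "one-box" else 1.0 - accuracy
    return p_s1 * PAYOFF[action]["S1"] + (1.0 - p_s1) * PAYOFF[action]["S2"]


def cdt_value(action, p_s1):
    """Expected wishes when the state is held fixed, independent of the action.

    `p_s1` is an assumed parameter: the probability that the box is full.
    """
    return p_s1 * PAYOFF[action]["S1"] + (1.0 - p_s1) * PAYOFF[action]["S2"]


def best_action(value_fn, param):
    """Return the action with the highest value under the given calculation."""
    return max(PAYOFF, key=lambda action: value_fn(action, param))
```

Under these assumptions, the causal calculation favors two-boxing for every value of `p_s1`, because A2 pays exactly one wish more than A1 in each column. The evidential calculation favors one-boxing once `accuracy` exceeds 0.75, since 2·accuracy > 3 − 2·accuracy exactly when accuracy > 3/4.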

Meta decision theory
(Nozick 1993; MacAskill 2016)

Altruistic Newcomb problem in a large universe
[Figure: many instances of the predictor Ω.]


EDT Wager
● Large universe
● Caring about the gains of our copies
● Non-zero credence in EDT
● Meta decision theory
Together, these yield a wager for evidential decision theory (and all other theories that take the impact of copies into account).
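
The structure of the wager can be illustrated numerically. The sketch below is an assumption-laden toy model, not the calculation from the talk: it takes meta decision theory to mean weighting each theory's verdict by one's credence in it, splits credence between EDT and CDT only, treats wishes as comparable across the two theories, and assumes `n_copies` perfectly correlated copies each facing the payoff table above. All names and parameters are hypothetical.

```python
# Under CDT, two-boxing gains one wish in either state (3 - 2 = 1 - 0 = 1),
# and the agent's choice has no causal effect on its copies.
CDT_GAIN_TWO_BOXING = 1.0


def edt_gain_per_instance(accuracy):
    """Evidential advantage of one-boxing in a single instance, in wishes.

    `accuracy` is an assumed probability that the prediction matches the action.
    """
    one_box = accuracy * 2 + (1 - accuracy) * 0
    two_box = (1 - accuracy) * 3 + accuracy * 1
    return one_box - two_box


def meta_preference(credence_edt, n_copies, accuracy):
    """Credence-weighted advantage of one-boxing; positive favors one-boxing.

    Under EDT the choice is evidence about what all `n_copies` correlated
    copies do, so the evidential stakes scale with the number of copies.
    """
    edt_term = credence_edt * n_copies * edt_gain_per_instance(accuracy)
    cdt_term = (1 - credence_edt) * CDT_GAIN_TWO_BOXING
    return edt_term - cdt_term
```

With a 1% credence in EDT and an assumed accuracy of 0.9, `meta_preference(0.01, 1, 0.9)` is negative (two-boxing is favored), while `meta_preference(0.01, 1000, 0.9)` is positive. In this toy model, a large enough number of copies lets even a small credence in EDT dominate the decision.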

Relevance
● AI Safety
● Macrostrategy
● Multiverse-wide superrationality (Oesterheld 2017a)

Caspar Oesterheld
Decision theory and approval-directed agents

Implementing decision theories in AIs
• Two problems of decision theory in AI safety:
  • What is the right decision theory for an AI?
  • How do we implement decision theories in AI?
• Decision theory not explicit in AI architecture
  • Example: Doing what has worked well in the past (Oesterheld 2017b)
  • Exception: Gödel machine (Schmidhuber 2006)

Approval-directed agency
(Christiano 2014)

In the paper…
If the overseer only looks at the world, the agent’s decision theory (DT) is decisive.
If the overseer only looks at the agent’s action, the overseer’s DT is decisive.

Thank you.
{johannes,caspar}@foundational-research.org

References
• Ahmed, A. (2014): Evidence, Decision and Causality. Cambridge University Press.
• Almond, P. (2010): On Causation and Correlation. Part 2: Implications of Evidential Decision Theory. https://casparoesterheld.files.wordpress.com/2017/03/correlation2.pdf
• Bostrom, N. (2014b): Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
• Christiano, P. (2014): Model-free decisions. https://ai-alignment.com/model-free-decisions-6e6609f5d99e
• MacAskill, W. (2016): Smokers, Psychos, and Decision-Theoretic Uncertainty. The Journal of Philosophy.
• Nozick, R. (1993): The Nature of Rationality. Princeton: Princeton University Press.
• Oesterheld, C. (2017a): Multiverse-wide Cooperation via Correlated Decision Making. https://foundational-research.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf
• Oesterheld, C. (2017b): Doing what has worked well in the past leads to evidential decision theory. https://casparoesterheld.files.wordpress.com/2017/09/learningdt.pdf
• Schmidhuber, J. (2006): Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements. ftp://ftp.idsia.ch/pub/juergen/gm6.pdf
• Soares, N. and Fallenstein, B. (2014a): Aligning Superintelligence with Human Interests: A Technical Research Agenda. MIRI Tech. rep. 2014-8. https://intelligence.org/files/TechnicalAgenda.pdf
• Soares, N. and Fallenstein, B. (2014b): Toward Idealized Decision Theory. MIRI Tech. rep. 2014-7. https://arxiv.org/abs/1507.01986
• Soares, N. and Levinstein, B. (2017): Cheating Death in Damascus. https://intelligence.org/files/DeathInDamascus.pdf
