SUBGOAL LEARNING, MACRO-ACTIONS, PARTIAL OBSERVATION,
CLUSTERING OF FEATURES,
and other stuff for difficult reinforcement learning settings


One slide out of topic, sorry :-)
Yesterday I was particularly interested in the discussion around
deep networks and convolutional networks and Yann LeCun et al.
for computer vision, thanks :-)

Seemingly, computational power is a big part of computer vision, right?
MASH WP6 – Goal Planning
Controlling a 3D avatar or a robot arm:

  - without expert help
  - without model
  - without parallel runs
  - without knowing the target
  - with expensive runs
  - using existing human expertise if any, in a way compliant
    with crowd-sourcing (human does not know the platform)
Category of problems
●   MDP solving: you have access to the model
●   Generative models:
      –   Cases in which you can “undo”
      –   Cases in which you cannot

    The hardest reinforcement learning setting you can find,
    with expensive simulations
Goals of the project

–   Adapting MCTS for such problems

–   Parallel model-free MCTS

–   Facilitating and Testing Crowd-Sourcing

–   Other methods for such problems
Outline
●   What we have done
       –   Extension to partially observable, expensive, “very”
           model-free problems
       –   Experiments on other WPs' testbeds
       –   Experiments on our testbeds
●   Parallelization
●   Conclusions
●   Perspectives
MCTS / UCT
●   MCTS = UCT (nearly)
●   Very good for high-dimensional problems with little expertise
●   Requires many simulations
●   Principle (a minimal sketch follows):
    ●   Do simulations (plenty of them)
    ●   Adaptive decisions: the first simulations use naive
        strategies, and the simulated strategy is improved online.
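As an illustration of the bandit-based principle above (not the MASH code; the class name and exploration constant are assumptions), a minimal UCB1-style node sketch:

```python
import math

class BanditNode:
    """One tree node: per-action statistics, as used by a UCT-like search."""
    def __init__(self, actions):
        self.visits = {a: 0 for a in actions}
        self.total_reward = {a: 0.0 for a in actions}

    def select(self, exploration=1.4):
        """UCB1 rule: exploit high average reward, explore rarely tried actions."""
        n = sum(self.visits.values()) or 1
        def score(a):
            if self.visits[a] == 0:
                return float("inf")               # try every action at least once
            mean = self.total_reward[a] / self.visits[a]
            return mean + exploration * math.sqrt(math.log(n) / self.visits[a])
        return max(self.visits, key=score)

    def update(self, action, reward):
        """Back up the outcome of one simulation through this node."""
        self.visits[action] += 1
        self.total_reward[action] += reward
```

Repeated over many simulations, this selection rule gradually biases the simulated strategy toward the better actions, which is the “online improvement” mentioned above.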
Change #1: Macro-Actions (MA)
●   With low-level decisions, actions often have to be repeated
    to be meaningful
●   Example with left-right:
       RRRRRR makes sense
       LLLLLLLL makes sense
       RLLLRRL makes no sense
●   Automatically categorize actions (eventually stationary,
    opposite, cyclic); define MAs (a toy sketch follows)
    ==> state of the art + automatization
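A toy sketch of the repetition idea (the repetition lengths and action names are illustrative, not the categorization used in the project):

```python
def build_macro_actions(actions, lengths=(4, 8, 16)):
    """Build macro-actions as a primitive action repeated k times,
    so that sequences like RRRR exist as single decisions."""
    return [tuple([a] * k) for a in actions for k in lengths]

# e.g. left/right controls of a joint
print(build_macro_actions(["L", "R"])[:3])
```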
Change #2: Clustering Features
●   Many state variables are very similar
●   Clustering:
      – Perform simulations
      – Group correlated features

●   Strongly reduces the dimension of the state space
    (a rough sketch follows)
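A rough sketch of correlation-based grouping from simulated trajectories (greedy, pure NumPy; the threshold and greedy scheme are assumptions, not the CluVo algorithm):

```python
import numpy as np

def cluster_features(observations, threshold=0.9):
    """Group strongly correlated state variables.
    observations: array of shape (n_samples, n_features), gathered from simulations."""
    corr = np.abs(np.corrcoef(observations, rowvar=False))
    n_features = corr.shape[0]
    labels = -np.ones(n_features, dtype=int)      # -1 = not assigned yet
    cluster = 0
    for i in range(n_features):
        if labels[i] >= 0:
            continue
        labels[i] = cluster
        for j in range(i + 1, n_features):
            if labels[j] < 0 and corr[i, j] >= threshold:
                labels[j] = cluster               # j joins i's group
        cluster += 1
    return labels                                 # one cluster id per feature
```

Working on cluster ids instead of raw variables is what reduces the state-space dimension.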
Change #3: memory
●   Partially Observable problems require
    memory
●   Tree of subgoals:
Change #3: memory
●   Partially Observable problems require memory
●   Tree of subgoals:
        (figure: subgoal tree, annotated “I choose this action”)
Change #3: memory
●   Partially Observable problems require memory
●   Tree of subgoals:
        (figure: subgoal tree; each node contains a goal,
         i.e. features to be activated)
Change #3: memory
●   Partially Observable problems require memory
●   Tree of subgoals: decisions made by “voting”: MA correlated
    with expected transitions
        (each node contains a goal, i.e. features to be activated)
Change #3: memory
●   Partially Observable problems require memory
●   Tree of subgoals: decisions made by “voting”: MA correlated
    with expected transitions
        (each node contains a goal, i.e. features to be activated)
●   MCTS is an extremely natural tool for building subgoals
    (a minimal node sketch follows)
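To make the subgoal tree concrete, a minimal node sketch (illustrative names, not the GMCTS data structures): each node stores the goal it represents, i.e. the features to be activated, plus the usual search statistics.

```python
from dataclasses import dataclass, field

@dataclass
class SubgoalNode:
    """One node of the subgoal tree."""
    goal_features: frozenset            # feature (or cluster) ids to be activated
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)

    def add_child(self, goal_features):
        """Node creation as in MCTS, but the node is a subgoal, not a state."""
        child = SubgoalNode(frozenset(goal_features))
        self.children.append(child)
        return child
```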
Summary CluVo + GMCTS:
               all in one slide
  1) Simulations, categorization of actions
  2) Building of macro-actions
  3) Clustering of features
  4) MCTS by simulations, correlations, voting:
      1) Node creation as in MCTS, but node = subgoal
      2) Simulations biased by rewards as in MCTS
      3) Goals → votes → MA → decisions

      Vote: actions which statistically activate the goal features,
      in the current state, are preferred (sketched below).
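A hedged sketch of the voting step (the counting scheme is an assumption): macro-actions that, in past simulations, statistically activated the current goal's features are preferred.

```python
import random
from collections import defaultdict

# activation_counts[(context, ma)][f]: how often feature f became active after
# playing macro-action ma in this context; play_counts[(context, ma)]: #trials.
activation_counts = defaultdict(lambda: defaultdict(int))
play_counts = defaultdict(int)

def vote(context, goal_features, macro_actions):
    """Prefer macro-actions (hashable tuples, as built above) that statistically
    activate the goal features; fall back to a random MA without statistics."""
    def score(ma):
        trials = play_counts[(context, ma)]
        if trials == 0:
            return 0.0
        hits = sum(activation_counts[(context, ma)][f] for f in goal_features)
        return hits / trials
    best = max(macro_actions, key=score)
    return best if score(best) > 0 else random.choice(macro_actions)
```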
Other developments
●   Q-learning
●   Fitted Q-iteration
●   Direct Policy Search


    Main issue: representation (macro-actions,
    clusters of features).
    ==> Direct Policy Search
    ●   also uses MA
    ●   but needs a memory
    ==> GMCTS quite convenient / focusing
         simulations.
Results of CluVo+GMCTS
            on other WPs' testcases
●   Blue flag then red flag: ok
●   Looks easy, but in a fully agnostic framework with
    thousands of variables it is not that easy.
●   The same algorithm performed correctly on “catch
    as many flags as possible”.
●   Combines many elements of the state of the art:
       –   Macro-actions
       –   Subgoal learning
       –   Clustering of features
       –   MCTS / UCT
All you can eat: DPS could do it,
 with MA (no memory needed)
Blue Flag then Red Flag:
Clustering ok for 3 out of 12 runs
    (figures: 8h learning / generalization)
Tests on testcases from
                 other WPs

●   This has taken most of the manpower: easy problems,
    but in a very difficult setting
●   No crowd-sourcing
●   We have other testbeds with external developers
    (same platform)
Results on the
    game of Go (~8 contributors)
●   Automatic modifications of the bandit (moderate success,
    far less efficient than supervised learning from databases
    or expert handcrafting)
    ==> Maths for crowd-sourcing:
    –   Automated regression testing by MSHT (good for
        crowd-sourcing)
    –   Constraints on the way humans enter expertise,
        to preserve consistency
●   Automatic precomputing of moves (opening books)
    ==> both are quite parallel, but very expensive for
    moderate progress
Results on Urban Rivals
      (18 million players, ~15
       developers on the core)
●   Also partial observability, but information is fully
    revealed frequently
●   Easy to simulate
●   MCTS was great for this application:
     – No human expertise needed
     – Consistent independently of human expertise
●   Solved the problem, whereas many engineers had failed
MineSweeper – building a code on
   top of an existing heuristic
       (1 existing code...)
Results on Energy management
(~7/8 developers, high turnover)
●   Existing solutions are often very poor for short-term
    volatility and the high dimension of the state space
●   Simulation-based approaches: rigorous use of
    cross-validation, detailed non-simplified simulations
●   MCTS + DPS (for choosing the default policy):
    stable and efficient
Results
Conclusions (1)
●   MCTS adapted to partially observable, expensive,
    agnostic settings
        –   MCTS + all existing tricks from the state of the art
        –   Integration into MCTS probably more natural than in
            many algorithms (in particular subgoal learning)
        –   Big implementation and experimentation work; more
            publications to come
●   An unexpected positive result:
    ●   Merge between two simulation-based tools, DPS
        (long-term effects) and MCTS (short-term effects)
    ●   Quite natural, highly parallel
    ●   Virtually no model bias
    ●   Really efficient in stochastic settings (not adversarial)
Conclusions (2)
●   We tested on problems with external developers:
    ●   Simulation-based optimization makes precise models
        possible
    ●   Interface with humans: automatic non-regression testing /
        constraints for consistency / interface for using human
        knowledge ok
    ●   WP problems did not motivate alternate developers,
        but the principles could be tested on other testcases
●   No real crowd-sourcing, but some moderate teams of
    motivated developers ==> easier
●   Principles developed for the platform are re-used
    for an industrial platform
Perspectives
●   The GMCTS / CluVo program is stable and able to work on
    very hard settings by combining many state-of-the-art
    techniques; it can have a long life. No crowd-sourcing.
●   The application of simulation-based methods is efficient,
    compliant with non-linear stochastic dynamics, and parallel
    ==> validated for industrialization in energy management
●   MCTS variants for partially observable settings also work
    well far from the hard MASH setting
Publication
●   Main MASH publication: J.-B. Hoock's 2012 paper:
    ●   Categorizing actions for automatically designing
        macro-actions
    ●   Clustering of features
    ●   GMCTS for building subgoals
●   Many ideas in it; it is a big part of his Ph.D. in
    one article.
Publications
●   Undecidability of adversarial planning with unbounded horizon
●   DPS: Convergence rates of robust optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power systems

    Planning with:
      - adversarial uncertainties
      - finite state space
      - no observation
      - deterministic problem
    Optimal average reward:
      - undecidable
      - unapproximable
Publications
●   Undecidability of adversarial planning with unbounded horizon
●   DPS: Convergence rates of robust optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power systems

    Planning with:
      - adversarial uncertainties
      - finite state space
      - partial observation
      - stochastic problem
      - for all strategies, stops almost surely
    ==> Optimal average reward is decidable
Publications
●   Undecidability of adversarial planning with unbounded horizon
●   DPS: Convergence rates of robust optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power systems

    Optimal rates in the parallel case for robust
    optimization w.r.t. monotonous compositions
    ==> essentially, bounds and patches for
    evolutionary computation
Publications
●   Undecidability of adversarial planning with unbounded horizon
●   DPS: Convergence rates of robust optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power systems

    Optimal rates for noisy quadratic black-box
    optimization with variance linear in the regret
Publications
●   Undecidability of adversarial planning with unbounded horizon
●   DPS: Convergence rates of robust optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power systems

    Consistency proof in the continuous case
Publications
●   Undecidability of adversarial planning with
    unbounded horizon
●   DPS: Convergence rates of robust
    optimization / noisy optimization
●   Parallel MCTS / nested MCTS
●   MCTS in continuous settings
●   MCTS for PO setting (real-world: Urban Rivals)
●   Model-free MCTS
●   Hybridization MCTS/DPS
●   Simulation-based optimization in power
    systems
