DIRECT POLICY SEARCH


0. What is Direct Policy Search ?

1. Direct Policy Search:
   Parametric Policies for Financial Applications

2. Parametric Bellman values for Stock Problems

3. Direct Policy Search: Optimization Tools
First, you need to know what
              direct policy search (DPS) is.

                  Principle of DPS:

 (1) Define a parametric policy pi
     with parameters t1,...,tk.

 (2) Maximize
     (t1,...,tk) → average reward when applying
     policy pi(t1,...,tk) to the problem.

 ==> You must define pi
 ==> You must choose a noisy optimization algorithm
 ==> There is a default pi (an actor neural network),
       but it's only a default solution (override it)
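
The two steps above can be sketched in C++ as follows. This is only a minimal illustration under stated assumptions: the toy problem (scalar state, dynamics x' = x + d + noise, reward -(x')^2), the one-parameter linear policy, and the names policy, averageReward and directPolicySearch are all hypothetical and not part of the DPS code discussed here. The optimizer is a very small (1+1) evolution strategy, chosen for brevity.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Toy problem (an assumption for illustration): scalar state x,
// dynamics x' = x + d + noise, reward = -(x')^2.
// Parametric policy pi(t)(x) = t[0] * x   (one parameter, k = 1).
double policy(const std::vector<double>& t, double x) {
    assert(t.size() == 1u);
    return t[0] * x;
}

// Noisy objective: average reward per time step when applying pi(t).
double averageReward(const std::vector<double>& t, std::mt19937& rng,
                     int episodes = 20, int horizon = 50) {
    std::normal_distribution<double> noise(0.0, 1.0);
    double total = 0.0;
    for (int e = 0; e < episodes; ++e) {
        double x = 1.0;  // initial state of each episode
        for (int k = 0; k < horizon; ++k) {
            x = x + policy(t, x) + noise(rng);
            total += -(x * x);
        }
    }
    return total / (episodes * horizon);
}

// Very small (1+1) evolution strategy: mutate the parameters, keep the
// child if its (noisy) average reward is better, adapt the step size.
std::vector<double> directPolicySearch(std::vector<double> t,
                                       std::mt19937& rng,
                                       int iterations = 200) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    double sigma = 0.5;
    double best = averageReward(t, rng);
    for (int i = 0; i < iterations; ++i) {
        std::vector<double> child = t;
        for (double& c : child) c += sigma * gauss(rng);
        double value = averageReward(child, rng);
        if (value > best) {
            t = child;
            best = value;
            sigma *= 1.5;  // success: enlarge the step
        } else {
            sigma *= 0.9;  // failure: shrink the step
        }
    }
    return t;
}
```

Starting from t1 = 0, directPolicySearch should drift toward t1 close to -1, the value that cancels the current state in this toy problem. Because the objective is noisy, the exact result depends on the seed.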
Strengths of DPS:

- Good warm start
     If I have a solution for problem A, and
     if I switch to problem B close to A, then I quickly
     get good results.

- Benefits from expert knowledge on the structure

- No constraint on the structure of the objective function

- Anytime (i.e. not that bad in restricted time)

Drawbacks:

- needs structured direct policy search
- not directly applicable to partial observation
virtual MashDecision computeDecision(MashState & state,
             const Vector<double> params)

                ==> “params” = t1,...,tk
        ==> returns the decision pi(t1,...,tk,state)

                  Does it make sense ?

    Override this function, and DPS is ready to work.

    Well, DPS (somewhere between alpha and beta)
                might be full of bugs :-)
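
As an illustration of what overriding this function could look like, here is a minimal sketch. MashState, MashDecision and Vector belong to the framework and their definitions are not shown in these slides, so the stand-in types, the Policy base class and the LinearPolicy example below are hypothetical. params is taken by const reference here only to avoid a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: the real MashState / MashDecision / Vector
// types are defined by the framework, not by this sketch.
struct MashState    { std::vector<double> features; };
struct MashDecision { double value; };
template <typename T> using Vector = std::vector<T>;

// Hypothetical base class exposing the hook named on the slide.
class Policy {
public:
    virtual ~Policy() = default;
    virtual MashDecision computeDecision(MashState& state,
                                         const Vector<double>& params) = 0;
};

// Example override: a linear policy,
// pi(t, s) = t[0] + sum_i t[i+1] * s.features[i].
class LinearPolicy : public Policy {
public:
    MashDecision computeDecision(MashState& state,
                                 const Vector<double>& params) override {
        assert(params.size() == state.features.size() + 1);
        double d = params[0];
        for (std::size_t i = 0; i < state.features.size(); ++i)
            d += params[i + 1] * state.features[i];
        return MashDecision{d};
    }
};
```

Here "params" plays the role of t1,...,tk: the optimizer only sees the vector of numbers, and the override decides what they mean.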
Direct Policy Search:
Parametric Policies for Financial
          Applications
Bengio et al. papers on DPS for financial applications


Stocks (various assets) + Cash

decision =
    tradingUnit(A, prevision(B,data))

Where:
- tradingUnit is designed by human experts
- prevision's outputs are chosen by human experts
- prevision is a neural network
- A and B are parameters

Then,
B is optimized by LMS (prevision criterion)
    ==> poor results, little correlation between
        LMS and financial performance
A and B are optimized on the expected return
    (by DPS) ==> much better

- Can be applied on data sets
  (no simulator, no elasticity model)
  because policy has no impact on prices
- 22 params in first paper
- reduced weight sharing in other paper
  ==> ~ 800 parameters (if I understand correctly)
- there exist much bigger DPS
  (Sigaud et al., 27 000)
- nb: noisy optimization
An alternate solution:

parametric Bellman values

   for Stock Problems
What is a Bellman function ?

V(s): expected benefit, in the future,
  if playing optimally from state s.

V(s) is useful for playing optimally.
Rule for an optimal decision:

  d(s) = argmax_d [ r(s,d) + V(s') ]

- s' = newState(s,d)
- d(s): optimal decision in state s
- V(s'): Bellman value in state s'
- r(s,d): reward associated with
          decision d in state s
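
The decision rule above can be sketched as a plain loop over a finite set of candidate decisions. This sketch assumes scalar states and decisions and a known model (the reward and the transition are passed in as functions). greedyDecision is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the index of the decision d maximizing r(s,d) + V(newState(s,d)).
// States and decisions are plain doubles here (an assumption for brevity).
std::size_t greedyDecision(
    double s,
    const std::vector<double>& decisions,
    const std::function<double(double)>& V,
    const std::function<double(double, double)>& reward,
    const std::function<double(double, double)>& newState) {
    assert(!decisions.empty());
    std::size_t best = 0;
    double bestScore = reward(s, decisions[0]) + V(newState(s, decisions[0]));
    for (std::size_t i = 1; i < decisions.size(); ++i) {
        double score = reward(s, decisions[i]) + V(newState(s, decisions[i]));
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

For instance, with one stock of 10 kWh, a selling price of 3 euros/kWh and V(s') = 2*s', releasing the whole stock is selected; with V(s') = 4*s', keeping the stock is selected.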
Remark 1: V(s) known
up to an additive constant is enough

       Remark 2: dV(s)/d(si)
       is the marginal price of stock i

  Example with one stock, soon.
Q-rule for an optimal decision:

      d(s) = argmax_d Q(s,d)

- d(s): optimal decision in state s
- Q(s,d) : optimal future reward if
   decision = d in s

==> approximate Q instead of V
==> we don't need r(s,d)
       nor newState(s,d)
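
A corresponding sketch for the Q-rule, which needs neither r(s,d) nor newState(s,d). The quadratic form of Q and its two parameters are assumptions made only to have something concrete to maximize; any parametric Q(s,d) could be plugged in instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parametric Q: Q(s,d) = -(d - (q[0] + q[1]*s))^2,
// i.e. the preferred decision is an affine function of the state.
double Q(const std::vector<double>& q, double s, double d) {
    assert(q.size() == 2u);
    double target = q[0] + q[1] * s;
    return -(d - target) * (d - target);
}

// Q-rule: index of the candidate decision with the largest Q(s,d).
std::size_t qDecision(const std::vector<double>& q, double s,
                      const std::vector<double>& decisions) {
    assert(!decisions.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < decisions.size(); ++i)
        if (Q(q, s, decisions[i]) > Q(q, s, decisions[best])) best = i;
    return best;
}
```

With q = (1, 0.5) and s = 4, the preferred decision is 3, so among the candidates 0, 2, 3.5 and 5 the rule picks 3.5.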
[Figure: V(stock) (in euros) as a function of the stock (in kWh).
 Slope = marginal price (euros/kWh).
 Low stock: "I need a lot of stock! I accept to pay a lot."
 High stock: "I have enough stock; I pay only if it's cheap."]
Examples:
For one stock:
   - very simple: constant price
   - piecewise linear (can ensure convexity)
   - “tanh” function
   - neural network, SVM, sum of Gaussians...


For several stocks:
   - each stock separately
   - 2-dimensional: V(s1,s2,s3) = V'(s1,S) + V''(s2,S) + V'''(s3,S)
                   where S = a1*s1 + a2*s2 + a3*s3
   - neural network, SVM, sum of Gaussians...
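
As one possible instance of the "tanh" family for a single stock, the sketch below assumes V(stock) = a * tanh(b * stock) with a, b > 0 (the slides do not give the exact form, and the names are hypothetical). Its slope, a * b * (1 - tanh^2(b * stock)), is the marginal price: high when the stock is low, close to zero when the stock is large.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical two-parameter Bellman value for one stock.
struct TanhValue {
    double a;  // saturation level, in euros
    double b;  // inverse stock scale, in 1/kWh

    // V(stock), in euros.
    double value(double stock) const {
        assert(a > 0.0 && b > 0.0);
        return a * std::tanh(b * stock);
    }

    // dV/dstock, in euros/kWh: the marginal price of the stock.
    double marginalPrice(double stock) const {
        double th = std::tanh(b * stock);
        return a * b * (1.0 - th * th);
    }
};
```

In a DPS setting, a and b would be the parameters t1, t2 handed to the optimizer.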
How to choose coefficients ?
- dynamic programming: robust, but slow in high dimension
- direct policy search:
     - initializing coefficients from expert advice
     - or: supervised machine learning for approximating
             an expert advice
     ==> and then optimize
Conclusions:

V: Very convenient representation of policy:
   we can view prices.
Q: some advantages (model-free)

Yet, less readable than direct rules.

And expensive: we need one optimization for making
  the decision, for each time step of a simulation.
  ==> but this optimization can be
        a simple sort (as a first approximation).

Simpler ? Adrien has a parametric strategy for stocks
   ==> we should see how to generalize it
   ==> transformation “constants → parameters” ==> DPS
Questions (strategic decisions for the DPS):
     - start with Adrien's policy, improve it, generalize it,
           parametrize it ? interface with ARM ?
     - or another strategy ?
     - or a parametric V function, and we assume we have
           r(s,d) and newState(s,d) (often true)
     - or a parametric Q function ?
         (more generic, unusual but appealing,
         but neglects some
         existing knowledge r(s,d) and newState(s,d) )

Further work:
   - finish the validation of Adrien's policy on stock
       (better than random as a policy; better than random
            as a UCT-Monte-Carlo)
   - generalize ? variants ?
   - introduce into DPS, compare to the baseline (neural net)
   - introduce DPS's result into MCTS
Direct Policy Search:

 Optimization Tools

& Optimization Tricks
- Classical tools: Evolution Strategies,
   Cross-Entropy, PSO, ...
   ==> more or less supposed to be
          robust to local minima
   ==> no gradient
   ==> robust to noisy objective function
   ==> weak for high dimension (but: see locality, next slide)

- Hopefully:
   - with a good initialization: nearly convex
   - with fixed random seeds: no noise

==> NewUoa is my favorite choice
   - no gradient
   - can “really” work in high-dimension
   - update rule surprisingly fast
   - people who try to show that their
       algorithm is better than NewUoa
       suffer a lot in noise-free case
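
As an example of such a gradient-free tool, here is a minimal sketch of a cross-entropy method with a diagonal Gaussian, written as a maximizer of a black-box function. Population size, elite size and iteration count are arbitrary defaults, and crossEntropyMaximize is a hypothetical name, not an interface of DPS or NewUoa.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

using Params = std::vector<double>;
using Scored = std::pair<double, Params>;  // (objective value, parameters)

Params crossEntropyMaximize(const std::function<double(const Params&)>& f,
                            Params mean, double sigma0, std::mt19937& rng,
                            int iterations = 30, int population = 100,
                            int elite = 20) {
    assert(elite > 1 && elite <= population);
    const std::size_t k = mean.size();
    Params sigma(k, sigma0);
    std::normal_distribution<double> gauss(0.0, 1.0);
    for (int it = 0; it < iterations; ++it) {
        // Sample candidates around the current mean and score them.
        std::vector<Scored> scored;
        for (int p = 0; p < population; ++p) {
            Params x(k);
            for (std::size_t i = 0; i < k; ++i)
                x[i] = mean[i] + sigma[i] * gauss(rng);
            double fx = f(x);
            scored.emplace_back(fx, std::move(x));
        }
        // Move the `elite` best candidates (highest value) to the front.
        std::partial_sort(scored.begin(), scored.begin() + elite, scored.end(),
                          [](const Scored& a, const Scored& b) {
                              return a.first > b.first;
                          });
        // Refit mean and standard deviation on the elite set.
        for (std::size_t i = 0; i < k; ++i) {
            double m = 0.0;
            for (int e = 0; e < elite; ++e) m += scored[e].second[i];
            m /= elite;
            double v = 0.0;
            for (int e = 0; e < elite; ++e) {
                double d = scored[e].second[i] - m;
                v += d * d;
            }
            mean[i] = m;
            sigma[i] = std::sqrt(v / elite);
        }
    }
    return mean;
}
```

The function f only needs to return a number for a parameter vector, so it can wrap a simulation of the policy. No gradient is used anywhere.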
Improvements of optimization algorithms:

     - active learning: when optimization on scenarios,
            choose “good” scenarios

           ==> maybe “quasi-randomization” ?
                Just choosing a representative sample of
                scenarios. ==> simple, robust...

     - local improvement: when a gradient step/update
            is performed, only update variables concerned
            by the simulation you've used for generating
            the update

           ==> difficult to use in NewUoa
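
One simple way to read "a representative sample of scenarios" is stratification: instead of drawing scenarios at random for each evaluation, evaluate every candidate policy on the same fixed set. The sketch below assumes a scalar scenario variable uniform on [0,1] and uses the midpoints of n equal strata. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Representative scenarios for a scalar quantity uniform on [0,1]:
// the midpoints of n equal strata.
std::vector<double> stratifiedScenarios(std::size_t n) {
    assert(n > 0);
    std::vector<double> u(n);
    for (std::size_t i = 0; i < n; ++i) u[i] = (i + 0.5) / n;
    return u;
}

// Average reward of parameter t over a fixed set of scenarios.
// reward(t, u) is the reward obtained with parameter t in scenario u.
double averageOverScenarios(
    const std::function<double(double, double)>& reward,
    double t, const std::vector<double>& scenarios) {
    assert(!scenarios.empty());
    double total = 0.0;
    for (double u : scenarios) total += reward(t, u);
    return total / scenarios.size();
}
```

Because the scenario set is fixed, two evaluations of the same parameters return exactly the same value, so the optimizer sees a noise-free objective.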
Roadmap:

- default policy for energy management problems:
      test, generalize, formalize, simplify...

- this default policy ==> a parametric policy

- test in DPS: strategy A

- interface DPS with NewUoa and/or others (openDP opt?)

- Strategy A: test into MCTS ==> Strategy B

==> IMHO, strategy A = good tool for fast
        readable non-myopic results

==> IMHO, strategy B = good for combining A with
   the efficiency of MCTS for short term combinatorial effects.

- Also, validating the partial observation (sounds good).

Mais conteúdo relacionado

Mais procurados

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorVivian S. Zhang
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineMusa Hawamdah
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryRikiya Takahashi
 
RT-BDI: A Real-Time BDI model
RT-BDI: A Real-Time BDI modelRT-BDI: A Real-Time BDI model
RT-BDI: A Real-Time BDI modelDavide Calvaresi
 
How to use SVM for data classification
How to use SVM for data classificationHow to use SVM for data classification
How to use SVM for data classificationYiwei Chen
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector MachinesDongseo University
 
A BA-based algorithm for parameter optimization of support vector machine
A BA-based algorithm for parameter optimization of support vector machineA BA-based algorithm for parameter optimization of support vector machine
A BA-based algorithm for parameter optimization of support vector machineAboul Ella Hassanien
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-finalHussain Sultan
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learningYogendra Singh
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance charlesmartin14
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machinesNawal Sharma
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineRishabh Gupta
 
This Week in Machine Learning and AI Feb 2019
This Week in Machine Learning and AI Feb 2019This Week in Machine Learning and AI Feb 2019
This Week in Machine Learning and AI Feb 2019Charles Martin
 
Modeling interest rates and derivatives
Modeling interest rates and derivativesModeling interest rates and derivatives
Modeling interest rates and derivativesAiden Wu, FRM
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 

Mais procurados (20)

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Svm vs ls svm
Svm vs ls svmSvm vs ls svm
Svm vs ls svm
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game Theory
 
RT-BDI: A Real-Time BDI model
RT-BDI: A Real-Time BDI modelRT-BDI: A Real-Time BDI model
RT-BDI: A Real-Time BDI model
 
How to use SVM for data classification
How to use SVM for data classificationHow to use SVM for data classification
How to use SVM for data classification
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
A BA-based algorithm for parameter optimization of support vector machine
A BA-based algorithm for parameter optimization of support vector machineA BA-based algorithm for parameter optimization of support vector machine
A BA-based algorithm for parameter optimization of support vector machine
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-final
 
Introduction to logistic regression
Introduction to logistic regressionIntroduction to logistic regression
Introduction to logistic regression
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
This Week in Machine Learning and AI Feb 2019
This Week in Machine Learning and AI Feb 2019This Week in Machine Learning and AI Feb 2019
This Week in Machine Learning and AI Feb 2019
 
Modeling interest rates and derivatives
Modeling interest rates and derivativesModeling interest rates and derivatives
Modeling interest rates and derivatives
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 

Destaque

Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesOlivier Teytaud
 
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchSimulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchOlivier Teytaud
 
Examples of operational research
Examples of operational researchExamples of operational research
Examples of operational researchOlivier Teytaud
 
Bias and Variance in Continuous EDA: massively parallel continuous optimization
Bias and Variance in Continuous EDA: massively parallel continuous optimizationBias and Variance in Continuous EDA: massively parallel continuous optimization
Bias and Variance in Continuous EDA: massively parallel continuous optimizationOlivier Teytaud
 
Keywords and examples of machine learning
Keywords and examples of machine learningKeywords and examples of machine learning
Keywords and examples of machine learningOlivier Teytaud
 
Disappointing results & open problems in Monte-Carlo Tree Search
Disappointing results & open problems in Monte-Carlo Tree SearchDisappointing results & open problems in Monte-Carlo Tree Search
Disappointing results & open problems in Monte-Carlo Tree SearchOlivier Teytaud
 
Simple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationSimple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationOlivier Teytaud
 
Combining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsCombining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsOlivier Teytaud
 
Fuzzy control - superfast survey
Fuzzy control - superfast surveyFuzzy control - superfast survey
Fuzzy control - superfast surveyOlivier Teytaud
 
Planning for power systems
Planning for power systemsPlanning for power systems
Planning for power systemsOlivier Teytaud
 
Artificial intelligence for power systems
Artificial intelligence for power systemsArtificial intelligence for power systems
Artificial intelligence for power systemsOlivier Teytaud
 
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)Olivier Teytaud
 
Réseaux neuronaux profonds & intelligence artificielle
Réseaux neuronaux profonds & intelligence artificielleRéseaux neuronaux profonds & intelligence artificielle
Réseaux neuronaux profonds & intelligence artificielleOlivier Teytaud
 

Destaque (16)

Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniques
 
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchSimulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
 
Debugging
DebuggingDebugging
Debugging
 
Power systemsilablri
Power systemsilablriPower systemsilablri
Power systemsilablri
 
Examples of operational research
Examples of operational researchExamples of operational research
Examples of operational research
 
Functional programming
Functional programmingFunctional programming
Functional programming
 
Bias and Variance in Continuous EDA: massively parallel continuous optimization
Bias and Variance in Continuous EDA: massively parallel continuous optimizationBias and Variance in Continuous EDA: massively parallel continuous optimization
Bias and Variance in Continuous EDA: massively parallel continuous optimization
 
Keywords and examples of machine learning
Keywords and examples of machine learningKeywords and examples of machine learning
Keywords and examples of machine learning
 
Disappointing results & open problems in Monte-Carlo Tree Search
Disappointing results & open problems in Monte-Carlo Tree SearchDisappointing results & open problems in Monte-Carlo Tree Search
Disappointing results & open problems in Monte-Carlo Tree Search
 
Simple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationSimple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimization
 
Combining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsCombining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seeds
 
Fuzzy control - superfast survey
Fuzzy control - superfast surveyFuzzy control - superfast survey
Fuzzy control - superfast survey
 
Planning for power systems
Planning for power systemsPlanning for power systems
Planning for power systems
 
Artificial intelligence for power systems
Artificial intelligence for power systemsArtificial intelligence for power systems
Artificial intelligence for power systems
 
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
 
Réseaux neuronaux profonds & intelligence artificielle
Réseaux neuronaux profonds & intelligence artificielleRéseaux neuronaux profonds & intelligence artificielle
Réseaux neuronaux profonds & intelligence artificielle
 

Semelhante a Direct policy search

Uncertainties in large scale power systems
Uncertainties in large scale power systemsUncertainties in large scale power systems
Uncertainties in large scale power systemsOlivier Teytaud
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesArvind Rapaka
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning재연 윤
 
Differential Machine Learning Masterclass
Differential Machine Learning MasterclassDifferential Machine Learning Masterclass
Differential Machine Learning MasterclassAntoine Savine
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Andres hernandez ai_machine_learning_london_nov2017
Andres hernandez ai_machine_learning_london_nov2017Andres hernandez ai_machine_learning_london_nov2017
Andres hernandez ai_machine_learning_london_nov2017Andres Hernandez
 
Dynamic Programming and Reinforcement Learning applied to Tetris Game
Dynamic Programming and Reinforcement Learning applied to Tetris GameDynamic Programming and Reinforcement Learning applied to Tetris Game
Dynamic Programming and Reinforcement Learning applied to Tetris GameSuelen Carvalho
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)Pierre Schaus
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos butest
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regressionRaman Kannan
 
Applying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKApplying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKJeremy Chen
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Quantitative techniques
Quantitative techniquesQuantitative techniques
Quantitative techniquesAsif Bodla
 
PSO and Its application in Engineering
PSO and Its application in EngineeringPSO and Its application in Engineering
PSO and Its application in EngineeringPrince Jain
 
MLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackMLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackarogozhnikov
 

Semelhante a Direct policy search (20)

Optimization
OptimizationOptimization
Optimization
 
Uncertainties in large scale power systems
Uncertainties in large scale power systemsUncertainties in large scale power systems
Uncertainties in large scale power systems
 
weatherr.pptx
weatherr.pptxweatherr.pptx
weatherr.pptx
 
ML .pptx
ML .pptxML .pptx
ML .pptx
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
Differential Machine Learning Masterclass
Differential Machine Learning MasterclassDifferential Machine Learning Masterclass
Differential Machine Learning Masterclass
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Andres hernandez ai_machine_learning_london_nov2017
Andres hernandez ai_machine_learning_london_nov2017Andres hernandez ai_machine_learning_london_nov2017
Andres hernandez ai_machine_learning_london_nov2017
 
Dynamic Programming and Reinforcement Learning applied to Tetris Game
Dynamic Programming and Reinforcement Learning applied to Tetris GameDynamic Programming and Reinforcement Learning applied to Tetris Game
Dynamic Programming and Reinforcement Learning applied to Tetris Game
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regression
 
Applying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKApplying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPK
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Quantitative techniques
Quantitative techniquesQuantitative techniques
Quantitative techniques
 
PSO and Its application in Engineering
PSO and Its application in EngineeringPSO and Its application in Engineering
PSO and Its application in Engineering
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
MLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackMLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic track
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 


Direct policy search

  • 1. DIRECT POLICY SEARCH
       0. What is Direct Policy Search?
       1. Direct Policy Search: Parametric Policies for Financial Applications
       2. Parametric Bellman Values for Stock Problems
       3. Direct Policy Search: Optimization Tools
  • 2. First, you need to know what direct policy search (DPS) is.
       Principle of DPS:
       (1) Define a parametric policy Pi with parameters t1,...,tk.
       (2) Maximize (t1,...,tk) → average reward when applying policy Pi(t1,...,tk) to the problem.
       ==> You must define Pi.
       ==> You must choose a noisy optimization algorithm.
       ==> There is a default Pi (an actor neural network), but it is only a default solution (override it).
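The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the toy control problem, the linear policy, and the plain random search standing in for the "noisy optimization algorithm" are all hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Step (1): a parametric policy pi(t1,...,tk, state). Here k = 2 and the
// policy is linear in the state (an assumption made for this sketch).
double policy(const std::vector<double>& t, double state) {
    assert(t.size() == 2);
    return t[0] * state + t[1];
}

// Noisy objective: average reward of pi(t) over nEpisodes simulated episodes
// of a hypothetical toy problem (keep a randomly perturbed state close to 0).
double averageReward(const std::vector<double>& t, std::size_t nEpisodes,
                     std::mt19937& rng) {
    std::normal_distribution<double> noise(0.0, 0.1);
    double total = 0.0;
    for (std::size_t e = 0; e < nEpisodes; ++e) {
        double state = 1.0;
        for (int step = 0; step < 20; ++step) {
            state += policy(t, state) + noise(rng);
            total -= state * state;  // reward: negative squared distance to 0
        }
    }
    return total / static_cast<double>(nEpisodes);
}

// Step (2): maximize t -> averageReward(t). A plain random search is used
// here only as a stand-in for a real noisy optimization algorithm.
std::vector<double> directPolicySearch(std::size_t iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> mutation(0.0, 0.2);
    std::vector<double> best = {0.0, 0.0};
    double bestScore = averageReward(best, 50, rng);
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<double> candidate = best;
        for (double& ti : candidate) ti += mutation(rng);
        double score = averageReward(candidate, 50, rng);
        if (score > bestScore) {  // keep the candidate only if it looks better
            best = candidate;
            bestScore = score;
        }
    }
    return best;
}
```

Calling `directPolicySearch(200, 42u)` returns a parameter vector; re-evaluating it with `averageReward` on fresh episodes gives a less biased estimate than the score seen during the search, since that score is itself noisy.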
  • 3. Strengths of DPS:
       - Good warm start: if I have a solution for problem A and I switch to a problem B close to A, then I quickly get good results.
       - Benefits from expert knowledge of the structure
       - No constraint on the structure of the objective function
       - Anytime (i.e. not that bad under a restricted time budget)
       Drawbacks:
       - Needs structured direct policy search
       - Not directly applicable to partial observation
  • 4. virtual MashDecision computeDecision(MashState & state, const Vector<double> params)
       ==> “params” = t1,...,tk
       ==> returns the decision pi(t1,...,tk,state)
       Does it make sense?
       Override this function, and DPS is ready to work.
       Well, DPS (somewhere between alpha and beta) might be full of bugs :-)
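A minimal sketch of what overriding this function could look like. The real `MashState`, `MashDecision` and `Vector` types are not shown in the slides, so the stand-in structs, the `Policy` base class, the `LinearStockPolicy` name and the use of `std::vector` below are assumptions made for illustration only.

```cpp
#include <cassert>
#include <vector>

// Stand-in types: hypothetical, since the real definitions are not shown.
struct MashState { double stock; };
struct MashDecision { double quantity; };

// Hypothetical base class exposing the virtual function named on the slide.
class Policy {
public:
    virtual ~Policy() = default;
    virtual MashDecision computeDecision(MashState& state,
                                         const std::vector<double>& params) = 0;
};

// A user-defined parametric policy: params = (t1, t2) and the decision is
// t1 * stock + t2. The optimizer only ever sees the vector params.
class LinearStockPolicy : public Policy {
public:
    MashDecision computeDecision(MashState& state,
                                 const std::vector<double>& params) override {
        assert(params.size() == 2);
        return MashDecision{params[0] * state.stock + params[1]};
    }
};
```

With this shape, changing the policy structure means writing another subclass, while the search over t1,...,tk stays untouched.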
  • 5. Direct Policy Search: Parametric Policies for Financial Applications
  • 6. Bengio et al. papers on DPS for financial applications
       Stocks (various assets) + cash
       decision = tradingUnit(A, prevision(B,data))
       Where:
       - tradingUnit is designed by human experts
       - prevision's outputs are chosen by human experts
       - prevision (the forecast) is a neural network
       - A and B are parameters
       Then (if I understand correctly):
       - B is optimized by LMS (prevision criterion) ==> poor results, little correlation between LMS and financial performance
       - A and B are optimized on the expected return (by DPS) ==> much better
       Remarks:
       - Can be applied to data sets (no simulator, no elasticity model) because the policy has no impact on prices
       - 22 params in the first paper
       - Reduced weight sharing in the other paper ==> ~800 parameters
       - There exist much bigger DPS (Sigaud et al., 27 000)
       - NB: noisy optimization
  • 7. An alternate solution: parametric Bellman values for Stock Problems
  • 8. What is a Bellman function?
       V(s): expected future benefit when playing optimally from state s.
       V(s) is useful for playing optimally.
  • 9. Rule for an optimal decision:
       $d(s) = \arg\max_d \left[ V(s') + r(s,d) \right]$
       - s' = nextState(s,d)
       - d(s): optimal decision in state s
       - V(s'): Bellman value in state s'
       - r(s,d): reward associated with decision d in state s
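The rule above can be sketched for a finite set of candidate decisions. States and decisions are plain `double` values here, and `bellmanDecision` is a hypothetical helper name; both are assumptions of this sketch, not part of the slides.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Returns the decision d that maximizes V(nextState(s, d)) + r(s, d) over a
// finite list of candidate decisions (ties go to the first candidate found).
double bellmanDecision(double s, const std::vector<double>& decisions,
                       const std::function<double(double)>& V,
                       const std::function<double(double, double)>& nextState,
                       const std::function<double(double, double)>& r) {
    assert(!decisions.empty());
    double bestDecision = decisions.front();
    double bestValue = -std::numeric_limits<double>::infinity();
    for (double d : decisions) {
        // Value of the state reached, plus the immediate reward.
        double value = V(nextState(s, d)) + r(s, d);
        if (value > bestValue) {
            bestValue = value;
            bestDecision = d;
        }
    }
    return bestDecision;
}
```

As a usage example, take a stock of 5 units, a selling price of 2 per unit, and a value function worth 3 per unit for the first unit kept and nothing beyond it: selling 4 units and keeping 1 scores 3 + 8 = 11, which beats every other integer decision between 0 and 5.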
  • 10. Remark 1: V(s) known up to an additive constant is enough.
        Remark 2: $\partial V(s) / \partial s_i$ is the price of stock i.
        Example with one stock, soon.
  • 11. Q-rule for an optimal decision:
        $d(s) = \arg\max_d Q(s,d)$
        - d(s): optimal decision in state s
        - Q(s,d): optimal future reward if decision = d in s
        ==> approximate Q instead of V
        ==> we need neither r(s,d) nor newState(s,d)
  • 12. [Figure: V(stock) (in euros) as a function of the stock (in kWh); the slope is the marginal price (euros/kWh). Low stock: “I need a lot of stock! I accept to pay a lot.” High stock: “I have enough stock; I pay only if it's cheap.”]
  • 13. Examples:
        For one stock:
        - very simple: constant price
        - piecewise linear (can ensure convexity)
        - “tanh” function
        - neural network, SVM, sum of Gaussians...
        For several stocks:
        - each stock separately
        - 2-dimensional: V(s1,s2,s3) = V'(s1,S) + V''(s2,S) + V'''(s3,S), where S = a1.s1 + a2.s2 + a3.s3
        - neural network, SVM, sum of Gaussians...
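Two of the one-stock examples above, sketched as parametric functions. The exact parametrizations (equal-width segments for the piecewise-linear case; scale, steepness and offset for the tanh case) are assumptions of this sketch, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Piecewise-linear V on equal-width stock segments. slopes[i] is the marginal
// price (e.g. euros/kWh) on segment i. Decreasing slopes give the shape
// described on slide 12: stock is worth more per unit when it is scarce.
double piecewiseLinearV(double stock, double segmentWidth,
                        const std::vector<double>& slopes) {
    assert(segmentWidth > 0.0 && stock >= 0.0);
    double value = 0.0;
    double remaining = stock;
    for (std::size_t i = 0; i < slopes.size() && remaining > 0.0; ++i) {
        double used = std::min(remaining, segmentWidth);
        value += slopes[i] * used;
        remaining -= used;
    }
    return value;  // stock beyond the last segment adds no value
}

// "tanh" V with three parameters: scale a, steepness b, offset c.
double tanhV(double stock, double a, double b, double c) {
    return a * std::tanh(b * (stock - c));
}
```

In both cases the parameters (the slopes, or a, b and c) are exactly the coefficients that DPS or dynamic programming would have to choose.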
  • 14. How to choose coefficients?
        - dynamic programming: robust, but slow in high dimension
        - direct policy search:
          - initialize coefficients from expert advice
          - or: use supervised machine learning to approximate expert advice
          ==> and then optimize
  • 15. Conclusions:
        V: very convenient representation of a policy: we can view prices.
        Q: some advantages (model-free).
        Yet, less readable than direct rules.
        And expensive: we need one optimization to make the decision at each time step of a simulation.
        ==> but this optimization can be a simple sort (as a first approximation).
        Simpler? Adrien has a parametric strategy for stocks
        ==> we should see how to generalize it
        ==> transformation “constants → parameters”
        ==> DPS
  • 16. Questions (strategic decisions for the DPS):
        - start with Adrien's policy, improve it, generalize it, parametrize it? Interface with ARM?
        - or another strategy?
        - or a parametric V function, assuming we have r(s,d) and newState(s,d) (often true)?
        - or a parametric Q function? (more generic, unusual but appealing, but neglects some existing knowledge: r(s,d) and newState(s,d))
        Further work:
        - finish the validation of Adrien's policy on stock (better than random as a policy; better than random as a UCT-Monte-Carlo)
        - generalize? variants?
        - introduce into DPS, compare to the baseline (neural net)
        - introduce DPS's result into MCTS
  • 18. Direct Policy Search: Optimization Tools & Optimization Tricks
  • 19. - Classical tools: Evolution Strategies, Cross-Entropy, PSO, ...
          ==> more or less supposed to be robust to local minima
          ==> no gradient
          ==> robust to a noisy objective function
          ==> weak in high dimension (but: see locality, next slide)
        - Hopefully:
          - good initialization: nearly convex
          - fixed random seeds: no noise
          ==> NewUoa is my favorite choice
          - no gradient
          - can “really” work in high dimension
          - update rule surprisingly fast
          - people who try to show that their algorithm is better than NewUoa suffer a lot in the noise-free case
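The "random seeds: no noise" idea can be sketched as follows: if every candidate is evaluated on the same fixed list of seeded scenarios, the objective becomes a deterministic function of the parameters. The toy simulation, the function names and the small (1+1) Evolution Strategy below are assumptions chosen for illustration; the ES merely stands in for whichever gradient-free optimizer is actually used.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical simulation: reward of parameters t on one scenario. The
// scenario is fully determined by its seed.
double simulate(const std::vector<double>& t, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, 0.5);
    double reward = 0.0;
    for (double ti : t) {
        double target = 1.0 + noise(rng);  // scenario-dependent optimum
        reward -= (ti - target) * (ti - target);
    }
    return reward;
}

// Averaging over a FIXED list of seeds gives a deterministic objective.
double objective(const std::vector<double>& t,
                 const std::vector<unsigned>& seeds) {
    assert(!seeds.empty());
    double total = 0.0;
    for (unsigned s : seeds) total += simulate(t, s);
    return total / static_cast<double>(seeds.size());
}

// (1+1) Evolution Strategy with a success-based step-size rule: the step
// grows after an accepted candidate and shrinks after a rejected one.
std::vector<double> onePlusOneES(std::vector<double> t,
                                 const std::vector<unsigned>& seeds,
                                 int iterations, unsigned optimizerSeed) {
    std::mt19937 rng(optimizerSeed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double sigma = 0.5;
    double best = objective(t, seeds);
    for (int i = 0; i < iterations; ++i) {
        std::vector<double> candidate = t;
        for (double& ci : candidate) ci += sigma * gauss(rng);
        double score = objective(candidate, seeds);
        if (score > best) {
            t = candidate;
            best = score;
            sigma *= 1.5;
        } else {
            sigma *= 0.9;
        }
    }
    return t;
}
```

Because the objective is deterministic once the seeds are fixed, an accepted candidate is a real improvement on those scenarios, which is the setting where a noise-free method such as NewUoa can be applied. The price is a possible overfit to the chosen scenarios.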
  • 20. Improvements of optimization algorithms:
        - active learning: when optimizing over scenarios, choose “good” scenarios
          ==> maybe “quasi-randomization”? Just choose a representative sample of scenarios.
          ==> simple, robust...
        - local improvement: when a gradient step/update is performed, only update the variables concerned by the simulation used to generate the update
          ==> difficult to use in NewUoa
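One possible reading of the "quasi-randomization" suggestion, sketched with a base-2 van der Corput sequence. It assumes the scenarios are already sorted by some scalar summary, so that evenly spread indices form a representative sample; that ordering and the function names are assumptions of this sketch, not something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// i-th element of the base-2 van der Corput sequence, in [0, 1): the binary
// digits of i are mirrored around the binary point.
double vanDerCorput(unsigned i) {
    double value = 0.0;
    double digitWeight = 0.5;
    while (i > 0) {
        if (i & 1u) value += digitWeight;
        digitWeight *= 0.5;
        i >>= 1;
    }
    return value;
}

// Picks n scenario indices spread evenly over `total` scenarios that are
// assumed to be sorted by some scalar summary (hypothetical ordering).
std::vector<std::size_t> representativeScenarios(std::size_t total, unsigned n) {
    assert(total > 0);
    std::vector<std::size_t> picked;
    for (unsigned i = 1; i <= n; ++i) {
        double u = vanDerCorput(i);
        picked.push_back(static_cast<std::size_t>(u * static_cast<double>(total)));
    }
    return picked;
}
```

For 8 scenarios and n = 4 this picks indices 4, 2, 6, 1. Any prefix of the sequence covers the range fairly evenly, so scenarios can be added one at a time as the optimization budget grows.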
  • 21. Roadmap:
        - default policy for energy management problems: test, generalize, formalize, simplify...
        - this default policy ==> a parametric policy
        - test in DPS: strategy A
        - interface DPS with NewUoa and/or others (openDP opt?)
        - Strategy A: test in MCTS ==> Strategy B
        ==> IMHO, strategy A = good tool for fast, readable, non-myopic results
        ==> IMHO, strategy B = good for combining A with the efficiency of MCTS for short-term combinatorial effects
        - Also, validating the partial observation (sounds good).