Presentation of the Exploration & Exploitation Challenge 2011 (http://explo.cs.ucl.ac.uk/), recap of the phase 1 results and announcement of the phase 2 and final results.
Talk given on 2 July 2011 at the 'On‐line Trading of Exploration and Exploitation 2' workshop at the International Conference on Machine Learning.
17. Phase 2

Rank | Name | Affiliation | Total time | Score | Uplift
#1 | Olivier Nicol | INRIA | 3h 40m | 11529 | 106%
#2 | Christophe Salperwyck | Orange | 29h 50m | 10419 | 86%
#3 | Tanguy Urvoy | Orange | 4h | 10179 | 82%
#4 | Aurélien Garivier | CNRS | 1h 17m | 9990 | 78%
#5 | Martin Antenreiter | MUL | 20h | 8049 | 44%
 | Random | | 1h 12m | 5598 | 0%
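The Uplift column appears to be each score's relative improvement over the Random entry. This reading is an assumption inferred from the numbers, not something stated on the slide. A minimal Python sketch that recomputes the column under that assumption (the function name `uplift_percent` is hypothetical):

```python
# Scores copied from the Phase 2 table; "Random" is treated as the baseline.
scores = {
    "Olivier Nicol": 11529,
    "Christophe Salperwyck": 10419,
    "Tanguy Urvoy": 10179,
    "Aurélien Garivier": 9990,
    "Martin Antenreiter": 8049,
}
RANDOM_SCORE = 5598


def uplift_percent(score, baseline=RANDOM_SCORE):
    """Relative improvement over the baseline, rounded to a whole percent.

    Assumed definition: 100 * (score - baseline) / baseline.
    """
    return round(100 * (score - baseline) / baseline)


uplifts = {name: uplift_percent(score) for name, score in scores.items()}

for name, value in uplifts.items():
    print(f"{name}: {value}%")
```

Under this definition the recomputed values match the percentages in the table, which supports the reading but does not confirm how the organisers computed them.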
Editor's Notes
These are my notes
Questions -> interrupt me
The challenge is about finding good algorithms to do that. We can’t evaluate algorithms live, so we evaluate them on offline data, processed in an online fashion.
Simulated data that has the characteristics that can be observed in the actual data, but constructed so that all options have the same CTR.
Batches:
- All visitors in a batch are different and have never been seen before.
- There can be several clicks or no click in a batch.
- All options might not be represented in a batch.

Remarks:
- Need to learn a mapping from (visitor, option) to reward, and need to optimise the cumulated reward: exploration and exploitation trade-off.
- Visitor responses might change through time, making it essential to keep learning their interests.
- Because the CTRs for each option are the same, it is necessary to use the visitor features if we want to make better predictions than random.
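The batch setting above can be sketched as a small replay loop. This is only an illustration under stated assumptions, not the challenge's evaluator or any participant's method: the simulator `make_batch`, the single binary visitor feature, the click probabilities, and the rule that the algorithm selects one (visitor, option) pair per batch and sees only that pair's reward are all hypothetical. The learner is a plain epsilon-greedy policy chosen for brevity.

```python
import random
from collections import defaultdict


def make_batch(rng, n_pairs=6, n_options=4):
    """Hypothetical simulator. Each visitor has one binary feature.

    A click is more likely when the option's parity matches the feature,
    so every option has the same overall CTR (0.2) and only the visitor
    feature makes better-than-random prediction possible.
    """
    batch = []
    for _ in range(n_pairs):
        feature = rng.randint(0, 1)
        option = rng.randrange(n_options)
        p_click = 0.3 if option % 2 == feature else 0.1
        reward = 1 if rng.random() < p_click else 0
        batch.append((feature, option, reward))
    return batch


class EpsilonGreedy:
    """Keeps a click-rate estimate per (feature, option) key."""

    def __init__(self, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.clicks = defaultdict(int)
        self.shows = defaultdict(int)

    def estimate(self, key):
        # Unseen keys get an optimistic value so they are tried at least once.
        if self.shows[key] == 0:
            return 1.0
        return self.clicks[key] / self.shows[key]

    def choose(self, pairs):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(pairs))
        return max(range(len(pairs)), key=lambda i: self.estimate(pairs[i]))

    def update(self, key, reward):
        self.shows[key] += 1
        self.clicks[key] += reward


def replay(policy, n_batches, seed):
    """Feed batches one at a time and return the cumulated reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_batches):
        batch = make_batch(rng)
        pairs = [(feature, option) for feature, option, _ in batch]
        chosen = policy.choose(pairs)
        reward = batch[chosen][2]  # only the chosen pair's reward is revealed
        policy.update(pairs[chosen], reward)
        total += reward
    return total


learner_score = replay(EpsilonGreedy(0.1, random.Random(1)), 5000, seed=0)
random_score = replay(EpsilonGreedy(1.0, random.Random(1)), 5000, seed=0)
print(learner_score, random_score)
```

Setting `epsilon=1.0` turns the policy into a uniformly random baseline on the same stream of batches, which mirrors how the Random entry serves as the reference point. A fixed epsilon does not address the remark about responses changing through time; handling that would need something like decayed or windowed counts.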
Two remarks. First, it’s not certain that Christophe’s algorithm is better than Tanguy’s.