Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use Case
1. Jo-fai Chow, H2O.ai
London Artificial Intelligence and Deep Learning Meetup
September 23 | 5:30 PM BST
2. Confidential2
Agenda
• About Us
– H2O.ai, team, community
• Numerai
– AI hedge fund, tournament, community
• H2O Driverless AI
– Why it is useful for the Numerai tournament
• Workflow Automation
– Simple end-to-end workflow
– My real-world example
• Learning Resources
• What’s Next?
• Q & A
3. Goals
• For you
– An overview of Numerai
– Key features in H2O Driverless AI for machine learning
– A simple end-to-end example
– An overview of my real-world workflow
– Knowing how to get started with Numerai and Driverless AI
• For me
– Getting more people to give Numerai and H2O Driverless AI a try
5. H2O.ai Snapshot
We are Established
• Founded in Silicon Valley 2012
• Funding: $147M | Series D
• Investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures
We Make World-class AI Technology
• H2O Open Source Machine Learning at Scale
• H2O Driverless AI: Automatic Machine Learning
• H2O Q: AI platform for business users
We are Global
• Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore
Key figures: 240, 1K, 20K, 180K (slide labels: Universities, Companies Using H2O Open Source, Meetup Members, Best AI Team)
We are Passionate about Customers
• 4X customers, 2 years, all industries, all continents
• Aetna/CVS, Allergan, AT&T, Capital One, CBA, Citi, Coca Cola, Bradesco, Disney, Franklin Templeton, Genentech, Kaiser Permanente, Lego, Merck, Pepsi, Reckitt Benckiser, Roche
6. Our Team is Made up of the World’s Leading Data Scientists
Your projects are backed by 10% of the World’s Data Science Grandmasters and a Team of Experts who are relentless in solving your critical problems.
7. The H2O.ai Platform
Open Source
In-memory, distributed machine learning algorithms with H2O Flow GUI; H2O open source engine integration with Spark
• 100% open source – Apache V2 Licensed
• Enterprise support subscriptions
• Interface using R, Python on H2O Flow
H2O Driverless AI
• Automatic feature engineering, ML training and interpretability, from ingest to deployment
• Open and Extensible AutoML
• User licenses on a per seat basis annually
• GUI-based interface, along with R & Python API, for end-to-end data science
H2O Q
• A new and innovative platform to make your own AI apps
• Rapid & Easy SDK to build interactive, low latency AI apps
• Easy and intuitive platform to have AI answer your question
H2O ModelOps
Highly flexible and scalable model deployment and monitoring platform.
• AI deployment platform built for DevOps and MLOps
• Scalable to support high throughput and low latency model scoring environments
• Comprehensive model monitoring
App Marketplace
8. H2O.ai is Empowering Companies to Be AI Companies
• 222 of the Fortune 500 use H2O Open Source
• 8 of top 10 Banks
• 7 of top 10 Insurance companies
• 4 of top 10 Healthcare companies
10. About Jo-fai Chow
Background
• Studied Civil Engineering and Water Management
• Taught myself open-source R/Python in 2013
• Discovered H2O in 2014
• Joined H2O in 2016
Roles
• Data Scientist / Sales Engineer / Community Manager / Customer Success Manager
• Current: Senior Data Science Evangelist
Highlights
• 1st H2O Maker in UK
• 40+ Cities in Europe, US, and Asia
• 100+ H2O Talks / Workshops
• 11K London Meetup Members
11. About Jo-fai Chow
Speciality
#360Selfie
Big Data LDN 2019
16. The Challenge - Numerai Tournament
https://docs.numer.ai/tournament/learn
https://medium.com/numerai/encrypted-data-for-efficient-markets-fffbe9743ba8
Note: data is obfuscated
Numerai turns the predictions from participants into real-world decisions for the hedge fund (buying/selling stocks)
21. The Rewards - Weekly Payouts
https://docs.numer.ai/tournament/staking-and-payouts
https://medium.com/numerai/numeraire-the-cryptocurrency-powering-the-world-hedge-fund-5674b7dd73fe
22. The Crypto - Numeraire (1 NMR = 30 USD)
NMR has been available on popular crypto exchanges since August (NMR-USD, NMR-GBP & NMR-EUR)
23. The Numerai Community
• Official Community Sites
– community.numer.ai
– forum.numer.ai
• Office Hours with Arbitrage
– https://docs.numer.ai/office-hours-with-arbitrage/office-hours-recaps
• Payouts App by Bouwe Ceunen
I stopped travelling for events in March. This is how I (re)discovered Numerai. Kudos to Jon Taylor (Arbitrage) and Anthony Mandelli (Numerai team).
31. Six Simple Steps
1. Download data from Numerai
2. Upload training and test datasets to a Driverless AI instance
a. Test-drive Driverless AI at aquarium.h2o.ai (free!)
3. Create a new regression experiment
a. Use a custom metric (Spearman’s rank correlation)
b. Exclude “id”, “era” & “data_type”
4. Automatic machine learning
a. Feature transformation and selection
b. Hyperparameter tuning + ensembles
c. Model documentation
5. Use the final model to make predictions
6. Submit predictions to Numerai
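As a rough illustration of step 3b, the sketch below picks the modelling columns from a CSV header by dropping “id”, “era” and “data_type”. It is not part of the talk: the helper name `feature_columns`, the target column name `target` and the two sample feature names are assumptions made for the example, and in Driverless AI the same exclusion is done through the experiment settings rather than in code.

```python
import csv
import io

# Columns the slide says to exclude from the experiment.
EXCLUDED = {"id", "era", "data_type"}


def feature_columns(header, target="target"):
    """Return the columns left for modelling after dropping the
    excluded columns and the target (target name is an assumption)."""
    return [c for c in header if c not in EXCLUDED and c != target]


# Tiny stand-in for numerai_training_data.csv; column names are illustrative.
sample = io.StringIO(
    "id,era,data_type,feature_a,feature_b,target\n"
    "n001,era1,train,0.25,0.50,0.75\n"
)
header = next(csv.reader(sample))
features = feature_columns(header)
print(features)
```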
32. Step 1 - Download Data from Numerai
Download numerai_datasets.zip
https://numer.ai/tournament
● Training: numerai_training_data.csv
● Test: numerai_tournament_data.csv
37. Step 3 - Using a Custom Metric (Spearman’s Rank Correlation)
https://github.com/woobe/numerati/blob/master/custom_scorer/spearman_correlation.py
SpearmanR for Numerai Tournament
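For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Spearman's rank correlation: rank both series (ties get the mean rank) and take the Pearson correlation of the ranks. This is a generic textbook version written for illustration. It is not the custom scorer linked above, and the function names are made up for the example.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(actual, predicted):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(actual), average_ranks(predicted))


targets = [0.0, 0.25, 0.5, 0.75, 1.0]
print(spearman(targets, [0.1, 0.2, 0.3, 0.4, 0.5]))  # same ordering
print(spearman(targets, [0.5, 0.4, 0.3, 0.2, 0.1]))  # reversed ordering
```

Only the ordering of the predictions matters for this metric, not their scale.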
38. Step 4 - Automatic Machine Learning
• Each dot represents one model
• DAI is trying different modelling strategies to maximise SpearmanR
• Training models on multiple GPUs
• Maximising time efficiency
39. Step 4 - Automatic Machine Learning
• Continuously improving the performance (SpearmanR)
• Complex feature engineering tricks based on our Kaggle experience
40. Step 4 - Automatic Machine Learning
Early stopping strategy - maximise time efficiency
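The idea of early stopping can be sketched as follows: keep track of the best validation score so far and stop once it has not improved for a fixed number of rounds. This is a generic patience-based version for illustration only. It is not how Driverless AI implements early stopping internally, and the function name, the `patience` parameter and the scores are all made up.

```python
def run_with_early_stopping(scores, patience=3):
    """Walk through per-iteration validation scores (higher is better) and
    stop once there has been no improvement for `patience` iterations.

    Returns (best_iteration, best_score, iterations_run).
    """
    best_score = float("-inf")
    best_iter = -1
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_iter, best_score, i + 1
    return best_iter, best_score, len(scores)


# Illustrative validation scores, not real experiment output.
validation_scores = [0.010, 0.020, 0.025, 0.024, 0.023, 0.022, 0.030]
result = run_with_early_stopping(validation_scores, patience=3)
print(result)
```

In this example the loop stops before reaching the last score, which shows the trade-off: time is saved, at the risk of missing a later improvement.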
41. Step 4 - Automatic Machine Learning
Training final models with a lower learning rate in order to achieve better generalisation. That’s why there are more bars (i.e. longer training time) when compared to the models for feature evolution.
Chart labels: final models, models for feature evolution
42. Step 4 - Automatic Machine Learning
• Improved performance due to careful training of final models and ensemble
• DAI tested 2780 features on 100+ models. It found that only 154 features are needed for the best performance (SpearmanR).
43. Step 5 - Making Predictions
• Using the final model from the experiment to score the test dataset.
• Include “id” column in the output (common requirement for data science competitions like Kaggle and Numerai)
44. Step 6 - Upload Predictions to Numerai
One more manual step - changing the column name
Upload the CSV to Numerai.
DONE!
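The manual column rename could also be scripted. The sketch below renames one header field in a CSV using only the standard library. The column names `target.predicted` and `prediction` are placeholders, since the slides do not show the actual names. Check the scored file and the current Numerai submission format before using anything like this.

```python
import csv
import io


def rename_column(csv_text, old_name, new_name):
    """Return csv_text with one header field renamed; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    if old_name not in header:
        raise ValueError(f"column {old_name!r} not found in header {header}")
    header[header.index(old_name)] = new_name
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical scored output: an id column plus a prediction column.
scored = "id,target.predicted\nn001,0.48\nn002,0.53\n"
submission = rename_column(scored, "target.predicted", "prediction")
print(submission)
```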
53. Training
1. Download data from Numerai
2. Data munging (R + data.table)
3. Upload data to Driverless AI
4. Try different modelling strategies
5. Save artifacts (scoring pipelines, report)
6. Run constrained optimisation
a. Different strategies: Sharpe, Sortino, feature exposure, drawdown …
b. Maximum ten models (strategies) allowed
54. Python and R Client for Driverless AI
https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/index.html
55. Download Artifacts from Each Experiment
• Scoring pipeline (ready to be used in production)
• Out-of-fold Predictions
• Out-of-sample Predictions
• Model Documentation
56. Constrained Optimisation for Different Strategies (Same Base Models with Different Weights)
This is actually one of the most interesting parts (in my opinion). I will write a blog post about it.
• Higher average rank correlation, lower Sharpe ratio (i.e. more volatile)
• Lower average rank correlation, higher Sharpe ratio (i.e. more stable)
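To make the trade-off concrete, the sketch below compares two strategies by the mean and the Sharpe ratio of their per-era rank correlations. It assumes Sharpe is computed as mean divided by population standard deviation of the per-era scores. The per-era numbers are invented for illustration and are not results from the talk.

```python
from statistics import mean, pstdev


def sharpe(per_era_corr):
    """Sharpe ratio of per-era correlations: mean / standard deviation.
    Uses the population standard deviation (an assumption of this sketch)."""
    return mean(per_era_corr) / pstdev(per_era_corr)


# Made-up per-era rank correlations for two hypothetical strategies.
strategy_a = [0.06, -0.02, 0.08, 0.00, 0.08]  # higher on average, volatile
strategy_b = [0.03, 0.02, 0.03, 0.02, 0.03]   # lower on average, stable

for name, scores in [("A", strategy_a), ("B", strategy_b)]:
    print(name, round(mean(scores), 4), round(sharpe(scores), 2))
```

Strategy A wins on average correlation while strategy B wins on Sharpe, which is the kind of choice the constrained optimisation has to make when weighting the same base models.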
57. Scoring
1. cronR (cron job scheduler)
2. Download latest data from Numerai
a. Every Sunday morning
3. Data munging
4. Score new data
5. Apply model weights from optimisation
6. Prepare and submit predictions
7. Send notification to my phone
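Step 5 of the scoring workflow (applying model weights) can be pictured as a weighted average of each base model's predictions per row. The sketch below is an assumption about the general shape of that step, not the author's code. The model names, row ids, predictions and weights are all hypothetical.

```python
def blend_predictions(model_preds, weights):
    """Weighted average of predictions per row id.

    model_preds: {model_name: {row_id: prediction}}
    weights:     {model_name: weight}; normalised here so they sum to 1.
    """
    total = sum(weights[m] for m in model_preds)
    row_ids = next(iter(model_preds.values())).keys()
    return {
        row_id: sum(weights[m] * preds[row_id] for m, preds in model_preds.items()) / total
        for row_id in row_ids
    }


# Hypothetical base-model predictions and optimised weights.
preds = {
    "model_a": {"n001": 0.4, "n002": 0.6},
    "model_b": {"n001": 0.6, "n002": 0.2},
}
weights = {"model_a": 0.75, "model_b": 0.25}
blended = blend_predictions(preds, weights)
print(blended)
```

Because only the weights change between strategies, the base models need to be scored once per round and can then be blended into several submissions.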
59. Monitoring
1. cronR (cron job scheduler)
2. Download latest daily scores
a. Tuesday to Saturday
3. Data munging
4. Render HTML -> Push to GitHub
a. Output: https://www.jofaichow.co.uk/numerati/
b. Code: https://github.com/woobe/numerati
5. Send notification to my phone
6. Compare different strategies
7. (Go back to the training step if needed)
72. Learning Resources
• Numerai
– tournament, doc, chat, forum, signals, twitter
– (Almost) Daily Discussion on Twitch https://www.twitch.tv/prof_jtaylor
– Numerai’s Master Plan (link)
– Meta Model Contribution (link)
– Build the World's Open Hedge Fund by Modeling the Stock Market by Carlo Lepelaars (link)
– Evaluating Financial Machine Learning Models on Numerai by Suraj Parmar (link)
• H2O Driverless AI
– Learning Center (https://training.h2o.ai/)
– AI & ML Courses (https://training.h2o.ai/ai-and-ml-foundations-courses)
– Driverless AI Tutorials (https://training.h2o.ai/driverlessai-tutorials)
– Test-drive for free (https://aquarium.h2o.ai/) (Try the simple workflow!)
73. New Features in Driverless AI 1.9
https://www.h2o.ai/blog/exploring-the-next-frontier-of-automatic-machine-learning-with-h2o-driverless-ai/
Visit our Virtual Booth!
74. Q & A
Get your free H2O.ai Intro Pack now:
• Access to free and online courses on AI / ML
• A 21-day free trial license of Driverless AI
• Tailored content about your use-cases
• Invites to our upcoming virtual events
• A link to book a meeting directly with one of our Customer Engagement Managers
Contact Eve-Anne Trehin
eve-anne.trehin@h2o.ai or visit our virtual booth