SlideShare uma empresa Scribd logo
1 de 38
Baixar para ler offline
Road to
Winning at Horse Racing
with Data Science
2018/06/27(Wed)
AlphaImpact
NUKUI Shun
How to win?
2
Profile
● Name: NUKUI Shun
● Speciality: Machine Learning
● Experience of horse racing: 11 years
● Bio
○ The president of horse racing club in Tokyo Tech ~2017/03
○ Participated in 電脳賞(春) 2016/03
○ AlphaImpact (development of horse racing AI) 2016/06~
● Favorites: Watching training progress of LightGBM, Kaggle(Home Credit Default Risk)
3
AlphaImpact Project
● Developing horse racing AI
● Members
○ NUKUI Shun : Machine Learning, Horse racing domain knowledge
○ OMOTO Tsukasa : Machine Learning, System Architect, One of committers of LightGBM
○ HARA Tomonori : Horse racing hacker
● Activities
○ HP: https://alphaimpact.jp/
○ Published AI scores for free (~2018/06/24)
○ Sell predictions on netkeiba ウマい馬券
■ http://yoso.netkeiba.com/
4
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
5
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
6
What is the Purpose of Horse Racing Prediction?
● Hit?
7
What is the Purpose of Horse Racing Prediction?
● Hit?
● MAKE A PROFIT!!
8
Which should we bet on?
1 2 3
win proba. (勝率) 70% 20% 10%
best choice?
9
Which should we bet on?
1 2 3
win proba. (勝率) 70% 20% 10%
x1.1 x5.2 x8.0odds (オッズ)
exp. (期待値) 1.1 * 0.70 = 0.77 5.2 * 0.20 = 1.04 8.0 * 0.10 = 0.8
best choice!!
10
House Edge (控除率)
House edge
total sales
Refund as payoff 70%~80% = mean of return ratio
11
Betting Type
● Win (単勝)
● Place (複勝)
● Bracket Quinella (枠連)
● Quinella (馬連)
● Quinella Place (ワイド)
● Exacta (馬単)
● Trio (3連複)
● Trifecta (3連単)
20.0%
22.5%
25.0%
27.5%
house edge
12
Not Impossible to Win
● Mr. 卍 (Manji) made a profit of 140 million in a 3 years
● His theory and analytical skill is great, but we believe it is
not impossible to exceed him by machine learning
● Not hitting tickets are acknowledged as expenses under
certain conditions
13
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
14
What should we solve?
● classification
○ imbalance problem
● regression
○ better choice
○ important to design smoothed objective
● ranking
○ seems to be a natural choice too
○ but not better performance than regression
15
● Strength
○ place of order (着順)
○ standardized time (標準化走破タイム)
○ standardized velocity (標準化平均速度)
○ prize (賞金)
○ speed index (スピード指数)
● Profit
○ place (=within top3) payoff (複勝払い戻し)
■ 1st 120 yen < 3rd 540 yen
○ dark score (business secret)
■ transform strength score to be high correlated with return ratio
Objective of Regression
16
Two types of Objective vs Hit Ratio
standardized velocity dark score
over-popular
17
all turf races in 2017
Two types of Objective vs Return Ratio
standardized velocity dark score 18
all turf races in 2017
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
19
Feature Engineering in Horse Racing
● Most important and most time-consuming part
● Necessary to collect data by ourselves, unlike Kaggle
● Difficult to handle complicated structured data
● Requires deep domain knowledge to horse racing
20
Horse Table (馬柱)
race info
horse attribute
jockey
odds
race histories
21
Many Types of History
horse
jockey
trainer
22
Automated Achievement Features
horse
jockey
trainer
owner
sire (父馬)
…
track type
course
length
course×length
weather
field condition
pace
…
count
win hit ratio
place hit ratio
win return ratio
place return ratio
subject race condition statistic
● We made more than 1500 achievement features
23require domain knowledge
Smoothing Ratio Features
● Eliminate noise of unpopular subject×condition
count=2
hit ratio=0.5
count=100
hit ratio=0.4
A B
hit ratio=0.2
average
unconfident
where α=0.1 hit ratio=0.254 hit ratio=0.399
24
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
25
● One of the most popular frameworks in Kaggle
● Very strong to structured data
● Not affected by scales of features
● Can handle the missing value
● Robust to meaningless features
● We have a committer of LightGBM
LightGBM
26
Training Architecture
Race1
Race2
RaceN
・・・
Race
Race
Race
split into k-folds
by the race
(not by the horse)
LightGBM
LightGBM
LightGBM
cross validation
(the valid set is used for
early stopping)
valid score
hyperopt
score
feedback
update params
average
27
shuffle
Notice of LightGBM
● Early stopping is important to avoid overfitting
● Dummied features are better than categorical ones
○ Depends on dataset
● Sensitive to random_state
○ May result from subsample/colsample_bytree?
● pred_contrib is useful for feature analysis
28
Feature Analysis with LightGBM
= predict(data, pred_contrib=True)# of horses
# of features
feature contribution
matrix
29
Feature Analysis with LightGBM
30
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
31
nDCG
● Used in ranking evaluation
● Higher relevant score (reli
) should be positioned at higher rank
non-negative, “the higher, the better”
32
The Relevant Score of nDCG
● inverse of order
○ 1/1, 1/2, …, 1/N
○ consider the whole ranking
● prize
○ 15000, 6000, 3800, 2300, 1500, 0, 0, ..., 0
○ consider only top 5
● prize@3
○ 15000, 6000, 3800, 0, 0, 0, 0, ..., 0
● place payoff
○ emphasize the dark horses
● win betting share (単勝支持率)
○ how close to popularity
33
The Relevant Score of nDCG
strength model1 (standardized velocity)
34
strength model2 (standardized velocity)
● model1 is more accurate than model2
● model2 is closer to win betting share
● model1 is able to hit the horses that the public cannot predict
all turf races in 2017
Evaluation of Top-N Box Betting
35
storength model (standardized velocity) profit model (dark score)
the lower hit,
the higher return
all turf races in 2017
Summary
● The purpose of horse racing prediction is making a profit
● Design of objective is critical to performance
● Feature engineering requires domain knowledge too
● LightGBM is cool
● nDCG is useful for model evaluation
36
Future work
● Calculate expectation with predicted probability and real-time odds
● Auto buying system with investment strategies
37
Enjoy horse racing life!!
38

Mais conteúdo relacionado

Mais procurados

協調フィルタリング入門
協調フィルタリング入門協調フィルタリング入門
協調フィルタリング入門hoxo_m
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理Yuya Unno
 
レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法Takeshi Mikami
 
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...Deep Learning JP
 
逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)Futa HIRAKOBA
 
バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践智之 村上
 
機械学習モデルのサービングとは?
機械学習モデルのサービングとは?機械学習モデルのサービングとは?
機械学習モデルのサービングとは?Sho Tanaka
 
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Shotaro Sano
 
機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門hoxo_m
 
Graphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data ScienceGraphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data ScienceKenji Hiranabe
 
時系列分析による異常検知入門
時系列分析による異常検知入門時系列分析による異常検知入門
時系列分析による異常検知入門Yohei Sato
 
情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライド情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライドKenta Oku
 
実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022Teruyuki Sakaue
 
機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルト機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルトBrainPad Inc.
 
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)Shota Imai
 
#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習Takeshi HASEGAWA
 
オープンデータのためのスクレイピング
オープンデータのためのスクレイピングオープンデータのためのスクレイピング
オープンデータのためのスクレイピング直之 伊藤
 
LINE のUI自動テスト事例
LINE のUI自動テスト事例LINE のUI自動テスト事例
LINE のUI自動テスト事例LINE Corporation
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 

Mais procurados (20)

協調フィルタリング入門
協調フィルタリング入門協調フィルタリング入門
協調フィルタリング入門
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理
 
レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法
 
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...
 
逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)
 
バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践
 
機械学習モデルのサービングとは?
機械学習モデルのサービングとは?機械学習モデルのサービングとは?
機械学習モデルのサービングとは?
 
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
 
機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門
 
Graphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data ScienceGraphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data Science
 
時系列分析による異常検知入門
時系列分析による異常検知入門時系列分析による異常検知入門
時系列分析による異常検知入門
 
情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライド情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライド
 
実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022
 
機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルト機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルト
 
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
 
MLOps入門
MLOps入門MLOps入門
MLOps入門
 
#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習
 
オープンデータのためのスクレイピング
オープンデータのためのスクレイピングオープンデータのためのスクレイピング
オープンデータのためのスクレイピング
 
LINE のUI自動テスト事例
LINE のUI自動テスト事例LINE のUI自動テスト事例
LINE のUI自動テスト事例
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 

Semelhante a Road to Winning at Horse Racing with Data Science

Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionPreferred Networks
 
Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...Anmol Rajpurohit
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...PATHALAMRAJESH
 
Improving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move PruningImproving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move PruningAntonio Mora
 
Click prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeClick prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeAlexey Grigorev
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
Predicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored DataPredicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored DataWush Wu
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Final Report HeroMoto
Final Report HeroMotoFinal Report HeroMoto
Final Report HeroMotoVivek Pabla
 
Formula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by StudentsFormula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by StudentsSpeck&Tech
 
Machine learning presentation
Machine learning presentationMachine learning presentation
Machine learning presentationWalid Hamdi
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictionsAnton Kulesh
 
Carvana auctioned kicked car prediction
Carvana auctioned kicked car predictionCarvana auctioned kicked car prediction
Carvana auctioned kicked car predictionSohil Shah
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceDatabricks
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLHimadri Mishra
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI AI Frontiers
 
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...Spark Summit
 

Semelhante a Road to Winning at Horse Racing with Data Science (20)

Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 
Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Improving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move PruningImproving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move Pruning
 
Click prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeClick prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real life
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Predicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored DataPredicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored Data
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Final Report HeroMoto
Final Report HeroMotoFinal Report HeroMoto
Final Report HeroMoto
 
Formula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by StudentsFormula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by Students
 
Machine learning presentation
Machine learning presentationMachine learning presentation
Machine learning presentation
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Carvana auctioned kicked car prediction
Carvana auctioned kicked car predictionCarvana auctioned kicked car prediction
Carvana auctioned kicked car prediction
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
 
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
 
Ad Placement Challenge
Ad Placement ChallengeAd Placement Challenge
Ad Placement Challenge
 

Último

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 

Último (20)

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

Road to Winning at Horse Racing with Data Science

  • 1. Road to Winning at Horse Racing with Data Science 2018/06/27(Wed) AlphaImpact NUKUI Shun
  • 3. Profile ● Name: NUKUI Shun ● Speciality: Machine Learning ● Experience of horse racing: 11 years ● Bio ○ The president of horse racing club in Tokyo Tech ~2017/03 ○ Participated in 電脳賞(春) 2016/03 ○ AlphaImpact (development of horse racing AI) 2016/06~ ● Favorites: Watching training progress of LightGBM, Kaggle(Home Credit Default Risk) 3
  • 4. AlphaImpact Project ● Developing horse racing AI ● Members ○ NUKUI Shun : Machine Learning, Horse racing domain knowledge ○ OMOTO Tsukasa : Machine Learning, System Architect, One of committers of LightGBM ○ HARA Tomonori : Horse racing hacker ● Activities ○ HP: https://alphaimpact.jp/ ○ Published AI scores for free (~2018/06/24) ○ Sell predictions on netkeiba ウマい馬券 ■ http://yoso.netkeiba.com/ 4
  • 5. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 5
  • 6. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 6
  • 7. What is the Purpose of Horse Racing Prediction? ● Hit? 7
  • 8. What is the Purpose of Horse Racing Prediction? ● Hit? ● MAKE A PROFIT!! 8
  • 9. Which should we bet on? 1 2 3 win proba. (勝率) 70% 20% 10% best choice? 9
  • 10. Which should we bet on? 1 2 3 win proba. (勝率) 70% 20% 10% x1.1 x5.2 x8.0odds (オッズ) exp. (期待値) 1.1 * 0.70 = 0.77 5.2 * 0.20 = 1.04 8.0 * 0.10 = 0.8 best choice!! 10
  • 11. House Edge (控除率) House edge total sales Refund as payoff 70%~80% = mean of return ratio 11
  • 12. Betting Type ● Win (単勝) ● Place (複勝) ● Bracket Quinella (枠連) ● Quinella (馬連) ● Quinella Place (ワイド) ● Exacta (馬単) ● Trio (3連複) ● Trifecta (3連単) 20.0% 22.5% 25.0% 27.5% house edge 12
  • 13. Not Impossible to Win ● Mr. 卍 (Manji) made a profit of 140 million in a 3 years ● His theory and analytical skill is great, but we believe it is not impossible to exceed him by machine learning ● Not hitting tickets are acknowledged as expenses under certain conditions 13
  • 14. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 14
  • 15. What should we solve? ● classification ○ imbalance problem ● regression ○ better choice ○ important to design smoothed objective ● ranking ○ seems to be a natural choice too ○ but not better performance than regression 15
  • 16. ● Strength ○ place of order (着順) ○ standardized time (標準化走破タイム) ○ standardized velocity (標準化平均速度) ○ prize (賞金) ○ speed index (スピード指数) ● Profit ○ place (=within top3) payoff (複勝払い戻し) ■ 1st 120 yen < 3rd 540 yen ○ dark score (business secret) ■ transform strength score to be high correlated with return ratio Objective of Regression 16
  • 17. Two types of Objective vs Hit Ratio standardized velocity dark score over-popular 17 all turf races in 2017
  • 18. Two types of Objective vs Return Ratio standardized velocity dark score 18 all turf races in 2017
  • 19. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 19
  • 20. Feature Engineering in Horse Racing ● Most important and most time-consuming part ● Necessary to collect data by ourselves, unlike Kaggle ● Difficult to handle complicated structured data ● Requires deep domain knowledge to horse racing 20
  • 21. Horse Table (馬柱) race info horse attribute jockey odds race histories 21
  • 22. Many Types of History horse jockey trainer 22
  • 23. Automated Achievement Features horse jockey trainer owner sire (父馬) … track type course length course×length weather field condition pace … count win hit ratio place hit ratio win return ratio place return ratio subject race condition statistic ● We made more than 1500 achievement features 23require domain knowledge
  • 24. Smoothing Ratio Features ● Eliminate noise of unpopular subject×condition count=2 hit ratio=0.5 count=100 hit ratio=0.4 A B hit ratio=0.2 average unconfident where α=0.1 hit ratio=0.254 hit ratio=0.399 24
  • 25. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 25
  • 26. ● One of the most popular frameworks in Kaggle ● Very strong to structured data ● Not affected by scales of features ● Can handle the missing value ● Robust to meaningless features ● We have a committer of LightGBM LightGBM 26
  • 27. Training Architecture Race1 Race2 RaceN ・・・ Race Race Race split into k-folds by the race (not by the horse) LightGBM LightGBM LightGBM cross validation (the valid set is used for early stopping) valid score hyperopt score feedback update params average 27 shuffle
  • 28. Notice of LightGBM ● Early stopping is important to avoid overfitting ● Dummied features are better than categorical ones ○ Depends on dataset ● Sensitive to random_state ○ May result from subsample/colsample_bytree? ● pred_contrib is useful for feature analysis 28
  • 29. Feature Analysis with LightGBM = predict(data, pred_contrib=True)# of horses # of features feature contribution matrix 29
  • 30. Feature Analysis with LightGBM 30
  • 31. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 31
  • 32. nDCG ● Used in ranking evaluation ● Higher relevant score (reli ) should be positioned at higher rank non-negative, “the higher, the better” 32
  • 33. The Relevant Score of nDCG ● inverse of order ○ 1/1, 1/2, …, 1/N ○ consider the whole ranking ● prize ○ 15000, 6000, 3800, 2300, 1500, 0, 0, ..., 0 ○ consider only top 5 ● prize@3 ○ 15000, 6000, 3800, 0, 0, 0, 0, ..., 0 ● place payoff ○ emphasize the dark horses ● win betting share (単勝支持率) ○ how close to popularity 33
  • 34. The Relevant Score of nDCG strength model1 (standardized velocity) 34 strength model2 (standardized velocity) ● model1 is more accurate than model2 ● model2 is closer to win betting share ● model1 is able to hit the horses that the public cannot predict all turf races in 2017
  • 35. Evaluation of Top-N Box Betting 35 storength model (standardized velocity) profit model (dark score) the lower hit, the higher return all turf races in 2017
  • 36. Summary ● The purpose of horse racing prediction is making a profit ● Design of objective is critical to performance ● Feature engineering requires domain knowledge too ● LightGBM is cool ● nDCG is useful for model evaluation 36
  • 37. Future work ● Calculate expectation with predicted probability and real-time odds ● Auto buying system with investment strategies 37
  • 38. Enjoy horse racing life!! 38