- 1. Can ‘Black Box’ Responsible Gambling Algorithms be Understood by Users? A real-world example Chris Percy, New Horizons in Responsible Gambling Conference, Vancouver, February 2016
- 2. Key Concepts • Machine Learning: the study and construction of complex “black box” algorithms that are capable of learning from data and making predictions based on it • Supervised Machine Learning: the user provides the algorithm with training data with labelled outcomes (e.g. Person A is high risk, Person B is low risk, …) • Knowledge Extraction: despite being complex, machine learning algorithms can be understood (and simplified); they are not magic
- 3. Question: Would you rather have (1) an algorithm that assesses problematic play that is 90% accurate, which you cannot properly understand or explain, or (2) an algorithm that assesses problematic play that is 75% accurate, which you fully understand and can explain?
- 4. Research Collaboration
  - Overview of problem gambling literature; IT platform; statistical researchers
  - Multi-site, multi-country gambling platform; anonymised data; 240,000 accounts available (mainly EU)
  • Study 1: Analyse casino internet players using k-means clustering (S. Dragicevic, G. Tsogas, A. Kudic, International Gambling Studies, Nov 2011)
  • Study 2: Describe internet casino/poker self-excluders (S. Dragicevic, C. Percy, A. Kudic & J. Parke, J. of Gambling Studies, Nov 2013)
  • Study 3: Predicting self-exclusion events (C. Percy, M. Franca, S. Dragicevic, A. d’Avila Garcez, International Gambling Studies [under review])
  • Study 4: Understanding self-exclusion predictive methods (in progress)
- 5. Focus of today (Research Collaboration overview repeated from slide 4)
- 6. Study Team: Chris Percy (Lead Researcher); Simo Dragicevic (CEO); Manoel Franca (Research Assistant); Artur d’Avila Garcez (Reader); Tillman Weyde (Senior Lecturer); Greg Slabaugh (Senior Lecturer). Machine Learning Group, Department of Computing Science
- 7. Agenda: Session Overview; Analysis & Results; Findings and limitations; Ideas for the future / Q&A
- 9. Session Overview
  “Predicting those who self-excluded from online gambling vs those that did not” (N=845)
  Question for today:
  • Can different machine learning approaches improve on predictive accuracy? Which are best?
  • Are such methods still interpretable and usable in practice? What can be done quickly and easily to help interpret the models?
  Mid-point view:
  • Accuracy can be improved by 10-20 percentage points (random forest), but models are very reliant on human input
  • There is an accuracy / interpretation trade-off at first glance, but additional techniques can help understand what drives the model
  • To explain an individual’s personal results, we may need further layers of interpretative software
- 11. Analysis & Results
  • Raw Data: 176 who self-excluded for 180+ days & 669 as a control group; only partial data coverage
  • Pre-Processing: 5 gambling behaviour risk factors (frequency, trajectory, intensity, variability, session time)
  • Machine Learning: 4 supervised learning techniques (logistic reg., Bayesian nets, random forest, neural nets)
  • Accuracy Results: 72%-87% vs baseline accuracy of 52%; best results from random forest
  • Interpretation: Large multi-variate models provide little actionable insight beyond raw prediction; simplification needed (e.g. TREPAN)
- 13. Our Data
  Demographics: • Gender • Age • Country
  Gambling behaviour: • Start/end of each session • Amount bet per wager • Amount won per wager (not used) • Type of game (not used) • Purchases/withdrawals transactions (not used)
  Self-exclusion: • Start date of self-exclusion • Length (varies from 1 day to 1+ years, to unlimited)
- 14. Sample generation: De-identified data from IGT
  Self-Excluder Cohort (SE)
  • 604 who had self-excluded at least once from April 2009 to July 2011
  • Data on sessions leading up to their first self-exclusion from June 2008+
  • Exclusions: first SE period < 180 days; insufficient data to calculate risk factors (~50% only played one week)
  • Final cohort size: 176
  Control Group Cohort (CG)
  • 871 representative of 11,667 who gambled 10+ sessions in Jan 2009 (as did ~95% of SE cohort)
  • Had not self-excluded as of July 2011
  • Data available from Jan 2009 to Dec 2010
  • Exclusions: festive period (Nov/Dec 2010); insufficient data to calculate risk factors
  • Final cohort size: 669
- 15. Descriptive comparison
  [Chart: average loss per month in EUR 1), shown at the 5th, 25th, median, 75th and 95th percentiles for self-excluders and control group]
  SE mean: 897 EUR; CG mean: 646 EUR; p-value 0.00
  Those who self-excluded lost 250 EUR extra per month on average. Lose more / win more: riskier, higher wager bets?
  Self-excluders were: • Even more focused on casino game types (vs poker) than control group • Less loyal to their top games • Tried fewer games (shorter tenure) • Younger
  Little difference seen in: • Absolute time gambling per month • Number of wager-sessions per active gambling day • Gender
  1) Excl. one lucky self-excluding player who won EUR 24 m in one bet, enough to skew overall averages
  Note: Descriptive charts are taken from Study 2 and were designed to optimise the sample size available; hence different n in some cases vs today’s main results
- 17. Pre-processing: Including risk factors
  Raw data & ML:
  • Direct input of de-contextualised raw data resulted in ineffective models
  • ML methods (as used here) required human-defined pre-processing activities
  • Alternative: Construct ML with general principles for data linkage so it can do “its own” pre-processing (e.g. image recognition ML)
  Risk factors (brief description):
  • Frequency: What proportion of days an individual gambles (vs all days in period)
  • Trajectory: How much an individual wagers in total per day (on days they gamble)
  • Intensity: How many times a gambler places wagers per day (on days they gamble)
  • Variability: How much total bet amount varies from day to day (on days they gamble)
  • Session Time: How long a gambler’s sessions last all together (on days they gamble)
- 18. Pre-processing: 33 variables used in analysis
  Risk Factor Grid: each of the 5 risk factors (Frequency, Trajectory, Variability, Session Time, Intensity) is described by 6 features:
  • Past period (absolute value)
  • Current period (absolute value)
  • Delta (current vs past)
  • Delta > 10%? (delta vs past)
  • P-value (1-PV) (delta vs past)
  • P-value category (H/M/L)
  30 variables available to capture gambling behaviour in terms of: • Absolute level of activity (past and present) • Delta in current activity vs the past • Statistical significance of any changes in trend
  3 demographics variables: • Gender (Male dummy) • Age in 2010 • Country (DE dummy)
  Remove over-identifying data not available in live operating scenario: • Total days gambling • Total bet • Calendar dates
  Focus on large number of variables to optimise for accuracy (no certainty over which will work)
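A minimal sketch of how the five risk factors described on this slide might be computed for one player over one period. The session record layout, the sample values, and the use of a population standard deviation for Variability are assumptions for illustration; the slide does not give the study's exact formulas.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical session records: (day, wagers placed, total amount bet in EUR, minutes played).
sessions = [
    (date(2010, 1, 1), 12, 30.0, 45),
    (date(2010, 1, 1), 8, 20.0, 15),
    (date(2010, 1, 3), 5, 5.0, 60),
    (date(2010, 1, 4), 20, 40.0, 45),
]

def risk_factors(sessions, period_days):
    """Summarise one period of play into the five risk factors."""
    wagers, bets, minutes = {}, {}, {}
    for day, n_wagers, amount, mins in sessions:
        wagers[day] = wagers.get(day, 0) + n_wagers
        bets[day] = bets.get(day, 0.0) + amount
        minutes[day] = minutes.get(day, 0) + mins
    return {
        # Share of days in the period with any gambling.
        "frequency": len(bets) / period_days,
        # The remaining factors are averaged over gambling days only.
        "trajectory": mean(bets.values()),
        "intensity": mean(wagers.values()),
        "variability": pstdev(bets.values()),  # assumed: population std. dev.
        "session_time": mean(minutes.values()),
    }

factors = risk_factors(sessions, period_days=7)
```

Computing the same summary for a past and a current period would then give the absolute values and deltas that feed the model.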
- 20. Supervised machine learning: basic principles (schematic). Training dataset: “Outcome”, “Input data”. General template: “Algorithm”. ML process: “Define Good”, “Iterate”. Model output: “Model”
- 21. Supervised machine learning: basic principles
  Training dataset
  • “Outcome”: We know the correct classification for each one, e.g. Yes, he self-excluded or No, he did not
  • “Input data”: For each person, we also have various descriptive facts to use to estimate the outcome
  General template
  • “Algorithm”: Each ML technique has its own general template for manipulating input data to create a score for each player. A cut-off value will determine what scores are classified as SE. But there are many different ways to apply the template: some are chosen by the programmer, others by the computer
  ML process
  • “Define Good”: Tell the computer if we want a harsh model, e.g. one that only identifies a self-excluder where very confident, or one that optimises accuracy
  • “Iterate”: Computer tries lots (and lots!) of ways to set up and optimise the template. Not just “trial and error” but a directed approach
  Model output
  • “Model”: Fully-determined specific set of rules to turn input data into a single score, plus a rule to turn each score into an outcome classification. We know how accurate the model is on training / test datasets. Some “rules” end up as a long and convoluted “black box”
  Ambition / Perfection
  • For social science models, we know reality is more complex than our template
  • We also know we’re missing important data, e.g. have they just lost their job / are they rich?
  • But a particular model may still be a good enough approximation to be useful
- 22. Machine learning: Four techniques
  Moving down the list: typically larger models 1), typically harder to interpret
  Logistic regression [simple]
  • Template summary: Relates each input variable directly to the output variable via logistic curve. No mapping of inter-input variable relationships or non-linear transformations (this is possible in advanced logistic reg.)
  • WEKA parameters (e.g.): Link function: Binomial log.; Classification cut-off: 0.5
  Bayesian network
  • Template summary: Structured map of the main ways that all input variables and output variable relate to each other, based on their conditional probabilities
  • WEKA parameters (e.g.): Parents: Unlimited; Score type: Entropy; Type: SimpleEstimator; Algorithm: K2; No prior knowledge assumed
  Neural network
  • Template summary: Layers of connected nodes with activation functions (typically non-linear, e.g. sigmoid). First layer: values of all the input variables. Middle layer: non-linear transformations. Output layer: a prediction score
  • WEKA parameters (e.g.): Momentum: 0.1; Learning rate: 0.05; Decay factor: 0.999; Learning rule: Backpropagation
  Random forest
  • Template summary: Ensemble of many decision trees. Each tree seeks to classify a gambler into self-excluder or not based on the values of a subset of input variables
  • WEKA parameters (e.g.): Max depth: Unlimited; Number of trees: 200; Features used for rand. selc.: 3
  1) i.e. the model typically employs a greater number of links from inputs to outputs / more lines of code are required to present the model
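To make the role of the cut-off value concrete, the sketch below classifies a handful of scored players at two different cut-offs. The scores and outcomes are invented for illustration, and the function names are hypothetical; the point is only that a "harsh" cut-off flags fewer players but with fewer false alarms.

```python
# Hypothetical model scores (0 to 1) paired with true outcomes (True = self-excluded).
scored_players = [
    (0.95, True), (0.80, True), (0.65, False), (0.55, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]

def classify(score, cut_off):
    """Rule that turns a score into an outcome: flag as SE at or above the cut-off."""
    return score >= cut_off

def flagged_counts(players, cut_off):
    """Return (correctly flagged self-excluders, wrongly flagged control players)."""
    hits = sum(1 for s, is_se in players if classify(s, cut_off) and is_se)
    false_alarms = sum(1 for s, is_se in players if classify(s, cut_off) and not is_se)
    return hits, false_alarms

balanced = flagged_counts(scored_players, cut_off=0.5)  # (3, 1)
harsh = flagged_counts(scored_players, cut_off=0.9)     # (1, 0)
```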
- 24. Accuracy results
  Method:
  • SMOTE to balance the dataset to ~50:50 control group and self-excluders
  • 10-fold cross-validation to reduce over-fitting
  • Default WEKA parameters used where not specified
  Sensitivity (correctly classified self-excluders) and specificity (correctly classified control group):
  • Logistic regression: 0.70, 0.74
  • Bayesian networks: 0.77, 0.94
  • Neural networks: 0.73, 0.80
  • Random forest: 0.87, 0.87
  [Chart: overall accuracy, bars at 87%, 86%, 77%, 72% on a 0%-100% axis; st. dev. 2.3, 3.5, 3.4, 2.7]
  Comment:
  • Random forest and Bayesian networks perform best on overall accuracy, at ~87% (baseline accuracy of “pick the most common outcome” would be 52%)
  • Higher reliability of random forest and its sensitivity/specificity balance favour random forest
  • Simple RF / demographic judgement rules by eye achieve similar accuracy to logistic regression
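As a rough cross-check, overall accuracy can be recovered from the reported sensitivity and specificity once the class mix is known. The sketch below assumes the control group makes up about 52% of the balanced dataset, which is what a "pick the most common outcome" baseline of 52% implies; that share is an inference, not a figure stated on the slide, and the inputs are already rounded.

```python
# Sensitivity and specificity as reported on the slide (rounded to two decimals).
results = {
    "Logistic regression": (0.70, 0.74),
    "Bayesian networks": (0.77, 0.94),
    "Neural networks": (0.73, 0.80),
    "Random forest": (0.87, 0.87),
}

# Assumption: control group share of the SMOTE-balanced dataset.
CONTROL_SHARE = 0.52

def overall_accuracy(sensitivity, specificity, control_share=CONTROL_SHARE):
    """Accuracy is the class-weighted average of sensitivity and specificity."""
    se_share = 1.0 - control_share
    return se_share * sensitivity + control_share * specificity

accuracy_pct = {
    name: round(100 * overall_accuracy(sens, spec))
    for name, (sens, spec) in results.items()
}
```

Under that assumption the four techniques come out at 72%, 86%, 77% and 87%, in line with the range quoted on the slide.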
- 26. Interpretation: What are we seeking to achieve?
  When is interpretation less important?
  • Focused just on getting an accurate prediction
  • No value in knowing why a prediction is accurate
  • Enough to have an estimate of how accurate it is likely to be
  When does interpretation matter?
  • Understanding what drives the model means we can: challenge it 1) and know its strengths / flaws; take industry-level action based on model insights (e.g. identify if some casino games are very high risk); try to simplify the model and get similar accuracy (e.g. might matter for real-time / fast computing time)
  • Understanding why a gambler gets a specific result helps: explain the result to someone (e.g. might help them accept it and take action); check reasons the model may be wrong in this particular case; understand what to change to get a different prediction
  • Are assessments and decisions open to regulatory, legal or clinical challenge?
  1) Models can be wrong, e.g. poor set-up, over-fitting, quirks of the sample. Challenging a model from experience gives confidence in it being robust in the future
- 27. Random Forest: Raw output (1/2)
  Model schematic:
  • Number of Trees (x2 for binary output variant): 200
  • Model Size in KB: 841
  • Minimum Tree Height: 13
  • Maximum Tree Height: 23
  • Mean Tree Height: 17
  • Minimum Number of Leaves: 146
  • Maximum Number of Leaves: 203
  • Mean Number of Leaves: 176
  [Diagram: hypothetical decision tree segment with Yes/No splits on “Age > 31”, “Frequency > 30%” and “Gender is Male”, ending in Self-Excluder or Control Group leaves]
  Shown segment: Height: 3, Leaves: 3. Very small compared to real output
- 28. Random Forest: Raw output (2/2)
  The model can be exported as Java script: one page shown in Word below
  Why hard to analyse in current form?
  • Each tree is a complex set of unequal routes and variable interactions
  • Just too large… 200 trees
- 29. Random Forest: Frequencies
  What is this?
  • A count of the number of times that each variable occurs across the 2,536 pages of code
  • Scores are normalised relative to the most frequent variable count (Age in this case)
  [Chart: Top variables by scaled frequency. Labels as extracted, ending with the most frequent: … Var - Increase; Int - Stat Sig; Sess - Stat Sig; Sess - Increase; Int - Previous Period; Traj - Increase; Freq - Increase; Traj - Increase; Sess - Increase; Freq - Stat Sig; Int - Current Period; Sess - Current Period; Freq - Previous Period; Freq - Current Period; Var - Previous Period; Traj - Current period; Var - Current Period; Age]
  Why is it limited? No account of:
  • Where on a tree the variable appears
  • How its influence depends on other variables
  • What value ranges drive results (i.e. positive or negative net influence?)
  • How many gamblers are distinguished at each node (minor in well-trained model)
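The frequency count described on this slide can be sketched as follows. The three toy trees and their nested-tuple representation are assumptions for illustration; the real model has 200 much larger trees exported as code.

```python
from collections import Counter

# Toy stand-in for a forest: each tree is either a leaf label (a string)
# or a tuple (variable tested, subtree if No, subtree if Yes).
forest = [
    ("Age", ("Freq - Current Period", "SE", "CG"), ("Var - Current Period", "CG", "SE")),
    ("Age", "CG", ("Freq - Current Period", "SE", "CG")),
    ("Var - Current Period", ("Age", "SE", "CG"), "CG"),
]

def split_variables(tree):
    """Yield the variable tested at every internal node of one tree."""
    if isinstance(tree, str):  # leaf: nothing to count
        return
    variable, no_branch, yes_branch = tree
    yield variable
    yield from split_variables(no_branch)
    yield from split_variables(yes_branch)

counts = Counter(v for tree in forest for v in split_variables(tree))

# Normalise relative to the most frequent variable, as on the slide.
top = max(counts.values())
scaled = {variable: n / top for variable, n in counts.items()}
```

As the slide notes, this ranking ignores where in a tree a variable appears and in which direction it pushes the prediction.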
- 30. Neural Networks: Raw output
  • Input Layer = 33 variables (Trajectory Risk, Age, …, Gender)
  • Hidden Layer = All 33 variables feed into each of 17 nodes (Node 1, Node 2, Node 3, …)
  • Output Layer = All 17 nodes feed into 2 output nodes (Control Group Score, Self-Excluder Score)
  Simple rule: Allocate the player to CG or SE depending on which has the highest score
  How to link inputs to outputs: In this model, all nodes are sigmoid nodes. As well as a threshold score, each node has a weight (i.e. a number) for each input variable. We multiply the value of each input variable by the relevant weight, do the same for all input variables and add them together, then subtract the threshold score. The result is then transformed by the sigmoid function to give the node output: $f(x) = (1 + e^{-x})^{-1}$. Node outputs are between 0 and 1.
- 31. Neural Networks: Raw output (continued)
  Why hard to analyse in current form?
  • We can capture the model in a few hundred Excel cells, but interactions between variables make it hard to analyse directly
  • Each input affects the model via 17 x 2 different routes
  • Not all inputs are equally important, in that their typical range of values might or might not often change the prediction
  • Whether its value is important on any given route depends on the value of the other inputs
  • If an input is important on one route but not another, is it important overall?
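The node calculation described on this slide can be written out in a few lines. This is a sketch of the arithmetic only: the tiny 2-2-2 network and its weights are made up for illustration and are not the study's trained model.

```python
import math

def sigmoid(x):
    """f(x) = (1 + e^(-x))^(-1); output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs, minus the threshold, passed through the sigmoid."""
    total = sum(value * weight for value, weight in zip(inputs, weights))
    return sigmoid(total - threshold)

def classify(inputs, hidden, output):
    """hidden and output are lists of (weights, threshold) pairs, one per node."""
    hidden_values = [node_output(inputs, w, t) for w, t in hidden]
    cg_score, se_score = (node_output(hidden_values, w, t) for w, t in output)
    # Simple rule: allocate the player to whichever output node scores higher.
    return "SE" if se_score > cg_score else "CG"

# Number of weights in the 33-17-2 layout on the slide (thresholds not counted).
n_weights = 33 * 17 + 17 * 2

# Made-up network: 2 inputs, 2 hidden nodes, 2 output nodes ordered [CG, SE].
hidden = [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.2)]
output = [([2.0, -1.0], 0.0), ([-2.0, 1.0], 0.0)]
label = classify([0.2, 0.9], hidden, output)
```

The weight count works out to 595, the same figure the deck later quotes for the neural network's model parameters.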
- 32. Neural Networks: TREPAN (1/2)
  What is TREPAN?
  • Algorithm that treats the neural network as an oracle: try lots of inputs to see which ones matter
  • Seeks a trade-off between fidelity to the original neural network, accuracy and simplicity
  Results
  • Fidelity: 87%
  • Loss of accuracy: 1-2 percentage points (NB. Neural network had relatively poor accuracy)
  [Diagram: extracted decision tree with Yes/No branches ending in SE or NO leaves. Node tests as extracted:]
  • Are at least 3 of these true: Age > 31; Frequency trend not significant; Variability high; Intensity has increased 49% vs previous period
  • Has intensity increased 22% vs the previous period?
  • Do they score medium or high on Variability stat. significance?
  • Is increase in Frequency highly stat. significant (10% level)?
  • Are they based in Germany?
  • Do they score zero on Session Time statistical significance?
  • Are they male?
  • Score low, med or high on Frequency stat. significance?
- 33. Neural Networks: TREPAN (2/2)
  Annotations on the extracted tree from the previous slide:
  • Minor: Only ~3% of sample goes down these routes
  • Strong flag on Frequency = more likely to self-exclude
  • These self-excluders flagged highly on Variability (and possibly on Intensity)
  • All flagged at least a little on Session Time; German players less likely to self-exclude overall (interpretation uncertain)
  • If you flag a little on Frequency, but do not flag on at least one of Intensity or Variability, you are likely to be control group
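Two ingredients of the TREPAN output can be illustrated directly: the "at least m of n" test used at the root of the extracted tree, and fidelity, i.e. how often the tree agrees with the network it was extracted from. This is not an implementation of TREPAN itself; the field names, the encoding of each condition, and the sample labels are assumptions.

```python
def m_of_n(conditions, player, m):
    """True when at least m of the condition functions hold for the player."""
    return sum(1 for condition in conditions if condition(player)) >= m

# Root test from the slide ("at least 3 of these true"), with assumed field names.
root_conditions = [
    lambda p: p["age"] > 31,
    lambda p: not p["frequency_trend_significant"],
    lambda p: p["variability"] == "high",
    lambda p: p["intensity_increase"] >= 0.49,
]

def fidelity(tree_labels, network_labels):
    """Share of cases where the extracted tree gives the same label as the network."""
    matches = sum(1 for t, n in zip(tree_labels, network_labels) if t == n)
    return matches / len(network_labels)

# Hypothetical player and hypothetical label lists.
player = {
    "age": 40,
    "frequency_trend_significant": False,
    "variability": "low",
    "intensity_increase": 0.60,
}
passes_root = m_of_n(root_conditions, player, m=3)
agreement = fidelity(["SE", "NO", "NO", "SE"], ["SE", "NO", "SE", "SE"])
```

Fidelity is measured against the network's predictions, not the true outcomes, so a tree can have high fidelity to a network that is itself only moderately accurate.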
- 34. Logistic regression: Full model
  Each row gives the variable, then its odds ratio (OR) and p-value. Values are rounded to one decimal place for presentation purposes only.
  Intercept: OR 0.1, p 0.0
  Trajectory risk factor
  • Statistical significance (1-P-value): OR 2.9, p 0.1
  • Slope coefficient: OR 1.0, p 0.0
  • Average amount bet per gambling day: OR 1.0, p 0.0
  • Increase in average bet per day: OR 0.9, p 0.5
  • Dummy variable for increase >10% (1=Yes, 0=No): OR 0.6, p 0.1
  • Statistical significance category: OR 1.5, p 0.1
  Frequency risk factor
  • Statistical significance (1-P-value): OR 2.5, p 0.0
  • Current Frequency (share of gambling days in the current period): OR 0.8, p 0.6
  • Prior Frequency (share of gambling days in the previous period): OR 0.7, p 0.6
  • Increase in Frequency between periods: OR 0.9, p 0.0
  • Dummy variable for increase >10% (1=Yes, 0=No): OR 2.3, p 0.0
  • Statistical significance category: OR 0.8, p 0.0
  Intensity risk factor
  • Statistical significance (1-P-value): OR 0.1, p 0.0
  • Current Intensity (number of bets placed over the recent period): OR 1.0, p 0.0
  • Prior Intensity (number of bets placed over the previous period): OR 1.0, p 0.5
  • Increase in Intensity between periods: OR 1.1, p 0.1
  • Dummy variable for increase >10% (1=Yes, 0=No): OR 12.7, p 0.0
  • Statistical significance category: OR 1.4, p 0.1
  Session time risk factor
  • Statistical significance (1-P-value): OR 0.2, p 0.0
  • Slope coefficient: OR 1.0, p 0.1
  • Average session time per gambling day: OR 0.8, p 0.0
  • Increase in average session time: OR 1.2, p 0.4
  • Dummy variable for increase >10% (1=Yes, 0=No): OR 1.7, p 0.1
  • Statistical significance category: OR 1.3, p 0.2
  Variability risk factor
  • Statistical significance (1-P-value): OR 1.1, p 0.9
  • Amount bet standard deviation (recent period): OR 1.0, p 0.0
  • Amount bet standard deviation (previous period): OR 1.0, p 0.0
  • Increase in standard deviation: OR 1.0, p 0.8
  • Dummy variable for increase >10% (1=Yes, 0=No): OR 1.4, p 0.3
  • Statistical significance category: OR 0.9, p 0.4
  Demographic variables
  • Gender (1=Male, 0=Female): OR 2.1, p 0.0
  • Age in 2010: OR 1.1, p 0.0
  • Germany-based dummy variable (1=Germany, 0=Non-German): OR 1.7, p 0.0
- 35. Logistic regression: Excerpt (1/2)
  Excerpt (odds ratio, p-value): Intercept 0.1, 0.0. Trajectory risk factor: Statistical significance (1-P-value) 2.9, 0.1; Slope coefficient 1.0, 0.0; Average amount bet per gambling day 1.0, 0.0; Increase in average bet per day 0.9, 0.5; Dummy variable for increase >10% (1=Yes, 0=No) 0.6, 0.1; Statistical significance category 1.5, 0.1; …
  Odds ratio, i.e. how powerful is the predictor:
  • How much to multiply a gambler’s odds of being in the control group for each unit they score (holding other variables constant)
  • e.g. Bet amount increased by 10%: odds ratio 0.6, so 40% lower odds of being in the control group
  • Multiplying by 1.0 doesn’t change anything!
  • But the slope coefficient can be a large number and varies widely
  • Each unit multiplied by 1.001 can be a big effect over thousands of units
  P-value, i.e. how reliable is the predictor:
  • P-value range is from 0 to 1 and a low value makes us more confident the predictive value of that variable is greater than zero 1)
  • If two variables have the same odds ratio, a lower p-value means the predictive effect is more consistent across different players
  1) More specifically: It states the probability that the coefficient might in fact be zero (i.e. the odds ratio be one) given the volatility and patterns in the data
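A short worked example of the odds-ratio arithmetic on this slide. An odds ratio multiplies odds rather than probabilities, and a ratio that rounds to 1.0 can still compound into a large effect over many units. The starting probability of 0.5 and the function name are illustrative assumptions.

```python
def apply_odds_ratio(probability, odds_ratio, units=1):
    """Scale the odds by odds_ratio once per unit, then convert back to a probability."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_ratio ** units
    return new_odds / (1.0 + new_odds)

# An odds ratio of 0.6 cuts the odds by 40%: even odds (p = 0.5) become 0.6 to 1.
p_after_dummy = apply_odds_ratio(0.5, 0.6)  # 0.375

# A per-unit ratio of 1.001 looks like "no effect" when rounded to 1.0,
# but compounds to roughly 2.7 over 1,000 units.
compounded = 1.001 ** 1000
p_after_1000_units = apply_odds_ratio(0.5, 1.001, units=1000)
```

This is why the rounded odds ratios of 1.0 for variables measured in large units (such as amounts bet) cannot be read as "no effect".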
- 36. Logistic regression: Excerpt (2/2)
  (Same excerpt of odds ratios and p-values as on the previous slide.)
  In theory this looks like we can interpret the importance of each variable quite well…
  Why is interpretation limited?
  • To have a simple structure, we did not include interactions between variables, but what if combinations of behaviours matter? (e.g. betting less per day might mitigate spending more days betting)
  • To optimise accuracy, we included a lot of variables, but some are correlated, which reduces ability to interpret individual ORs or PVs (e.g. Statistical significance and Statistical significance category within each risk factor)
  • When an individual risk factor contains some “strong predictors” and some “weak predictors” with some pointing in opposite directions, what does that mean overall?
- 37. Logistic regression: Variable Importance
  Problem with random forest and neural nets was primarily the scale and number of routes. For regression, it is the large number of co-varying variables (affects other techniques too)
  How to respond? Two ideas within LR framework
  1. Create simpler models with fewer variables from the start 1), e.g. include “Delta value” and exclude “Delta > 10%”, or do each risk factor one by one
     BUT: No longer the same model; accuracy suffers (at least a bit)
  2. Average e.g. the p-values across common topics of interest
     Proportion of variables with p-value < 0.15: Demog 100%; Int 83%; Sess 67%; Traj 83%; Freq 67%; Var 33%
     BUT: Averages might disguise a “killer variable”; still no insight into net effect size or direction 2)
  Exploring either option properly quickly becomes a major task to be done manually for each model, with lots of different approaches to choose from
  1) Various methods exist, e.g. remove those with lowest predictive power, combine or reduce variables a priori that are conceptually related, PCA, etc.
  2) Requires more work but we can do a similar graph for this too
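The proportions on this slide can be reproduced from the rounded p-values in the full logistic regression table (slide 34). The sketch below groups the p-values by risk factor and counts the share below 0.15. Because the inputs are rounded to one decimal, a value shown as 0.1 is treated as below the threshold; that reading is an assumption, though it matches the slide's figures.

```python
# P-values per group, as rounded on the full-model slide.
p_values = {
    "Demog": [0.0, 0.0, 0.0],
    "Int": [0.0, 0.0, 0.5, 0.1, 0.0, 0.1],
    "Sess": [0.0, 0.1, 0.0, 0.4, 0.1, 0.2],
    "Traj": [0.1, 0.0, 0.0, 0.5, 0.1, 0.1],
    "Freq": [0.0, 0.6, 0.6, 0.0, 0.0, 0.0],
    "Var": [0.9, 0.0, 0.0, 0.8, 0.3, 0.4],
}

def share_below(values, threshold=0.15):
    """Proportion of p-values that fall below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)

# Percentage of variables per group with p-value < 0.15.
shares = {group: round(100 * share_below(vals)) for group, vals in p_values.items()}
```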
- 38. Bayesian networks (1/2)
  WEKA graph: Priority Layout, Top Down, With Edge Concentration. One node for each of the 33 variables, plus the top node to represent the Self Excluder / Control Group outcome
  Simple node: Frequency Stat. Significance
  • Links directly to final output
  • Only takes values from 0 to 1
  • Probability of being a self-excluder, conditional just on med/high score, is 0.5
  Probability distribution of Frequency Statistical Significance, by value range (0.00; 0.00-0.21; 0.21-0.54; 0.54-1.00):
  • Control: 0.5, 0.0, 0.1, 0.4
  • Self-Exclude: 0.2, 0.1, 0.2, 0.5
  Reading the table:
  • Can pay less attention to the middle 2 columns: they reflect just 4% of sample, others are 45%+
  • Ratios of 4:5 and 5:2: infer higher risk stat. sig. = more likely to be self-excluder
  • Combine these conditional probabilities across the network to get overall probability
- 39. Bayesian networks (2/2)
  WEKA graph: Priority Layout, Top Down, With Edge Concentration
  Complex low-level node: Variability delta > 10%
  • Simple variable: only takes value “Yes” or “No”
  • But its impact on the conditional probability of being a self-excluder depends on fairly precise values of two Trajectory variables and two other Variability variables
  • Overall it can take 360 separately defined conditional probabilities
  Why hard to analyse in current form?
  • Too many non-linear conditional probabilities; not practical to do much by eye
  • Further computation or techniques would be required to analyse this network usefully
- 40. Interpretation: Overall view on 33-variable model
  Logistic regression [simple]
  • Immediate output: Simple picture (one page). Each variable can be superficially ranked, but not that insightful on its own due to covariance
  • Quick analysis: Looking at sets of p-values gives a sense of good predictors but not effect size or direction
  • Highlighted variables: Demographics; absolute values and statistical significance more than trends; Variability least useful risk factor
  Bayesian network
  • Immediate output: Theoretically accessible (a few pages), but non-linear conditional probabilities too complex
  • Quick analysis: Visual graph analysis
  • Highlighted variables: Demographics; most risk factor features play a role, with least emphasis on the statistical significance of Variability
  Neural network
  • Immediate output: Mathematically clear and accessible (dozen pages), but interconnections too complex to analyse by eye
  • Quick analysis: TREPAN creates an interpretable map, with some loss of accuracy
  • Highlighted variables: Demographics; statistical significance of Session Time, Frequency and Variability; absolute increases in Intensity and Variability
  Random forest
  • Immediate output: Impenetrable; thousands of pages of code
  • Quick analysis: Frequency chart tells what variables are used, but not how they are used or when
  • Highlighted variables: Age; absolute values (all risk factors); Frequency statistical significance
- 42. Mid-point research: A few learnings so far (1/3)
  Human oversight of model creation is essential: transparency / flexibility needed
  1) Supervised models rely on the choice of a well-defined outcome parameter, but problem gambling manifests in different ways; not always well-defined
  2) Raw session data work poorly as inputs. Human role in teaching “how to read” or pre-processing, as well as in model choice, parameters, control group etc.
  “Machine learning” still not “Artificial intelligence” in this instance
- 43. Mid-point research: A few learnings so far (2/3)
  There is a trade-off between accuracy and direct model-level interpretability…
  • Interpretability reduced by more complex models (more “routes” from inputs to outputs)
  • And by models with more variables (especially with correlation between variables)
  [Chart: accuracy on balanced dataset (axis marked 60% and 90%) against improving interpretability (e.g. which variables matter most for model prediction); points shown: Random Forest; Simple linear models / Visual descriptive analysis]
  … but applying additional techniques shows some promise
  • TREPAN reduced 595 Neural Network model parameters to a simple decision tree with 8 nodes; accuracy only declines by a few percentage points (other techniques can be used too)
- 44. Mid-point research: A few learnings so far (3/3)
  Where it is important to explain an individual’s results properly, we probably need different layers of interpretative software
  • Model-level interpretation enables us to say which risk factors / features matter in general; useful for industry analysis and giving a general overview to people
  • But this does not explain why a specific prediction is obtained, as the different inputs of an individual will mean different routes through a random forest or different parameters being important in a neural network
  • So you can’t tell someone specifically why they flagged a risk or what specifically they would need to change to alter it
  • In principle, interpretative software can be created, but we’re not aware of any that already exists:
    • Random forest frequency chart for an individual route
    • Software that tests slight variations on an individual’s inputs to see which ones change the prediction most; highlighted/colour-coded decision trees etc.
    • Supplement with descriptive analytics / trends to enable human judgement
- 45. Key limitations
  1) Even where ML predictions are effective, they rely on a well-defined and visible outcome for the training dataset, but real-world problem gambling manifests in many different outcomes
  2) This paper relies on self-exclusion as a well-defined outcome, but this only captures a small group of those potentially with issues, and only those who then took action
    • Only a small set self-exclude (<0.3%) and many do so very early
    • Some self-exclude for non-problem reasons (punish operator, test system, …); Hayer and Meyer (2011a,b) found 76% self-excluded for PG reasons
  3) Incomplete and imperfect data
    • Missing data we might wish to have (e.g. financial context, credit scores)
    • Small sample relative to number of variables and complexity of outcome
    • Imperfect control group
    • Incomplete picture of a person’s overall gambling activity (e.g. other sites, venues)
    • Data from a single mostly-European platform
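The idea of "software that tests slight variations on an individual's inputs" can be sketched as a simple one-at-a-time perturbation. Everything here is hypothetical: the toy scoring function stands in for a trained black-box model, and the 10% nudge is an arbitrary choice.

```python
def sensitivity_by_input(model, inputs, step=0.10):
    """Nudge each input up by `step` (relative) and record the change in the score.

    Returns (name, change) pairs sorted by absolute change, largest first.
    """
    base = model(inputs)
    changes = {}
    for name, value in inputs.items():
        nudged = dict(inputs)  # copy so the other inputs stay fixed
        nudged[name] = value * (1.0 + step)
        changes[name] = model(nudged) - base
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical stand-in for a trained model that returns a risk score.
def toy_model(x):
    return 0.02 * x["frequency"] * x["intensity"] - 0.001 * x["age"]

player = {"frequency": 0.5, "intensity": 40.0, "age": 30.0}
ranking = sensitivity_by_input(toy_model, player)
```

Varying one input at a time is a deliberate simplification: it shows which inputs matter for this player at their current values, but it can miss effects that only appear when several inputs change together.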
- 47. Ideas for the future
  • Extract more knowledge from these models with additional techniques
    • With random forest, explore “information gain by variable” analysis
    • Apply TREPAN to other ML models / create similar oracle methods
    • Group variables based on domain knowledge, or start with simpler models and build up
  • Use larger and richer samples (e.g. OLG) to:
    • Explore different risk parameters, e.g. trend changes over shorter time periods
    • Create additional risk factors, e.g. loss chasing, time of day gambling, type of game
    • Test different combinations of risk factors (e.g. highest score across a set of risk factors)
    • Model against different outcome variables, e.g. PGSI
  • Explore application issues with industry
    • To what extent the interpretability issue matters
    • What kind of interpretability to prioritise
    • How best to use models to reduce the risk of harm; how early risks can be identified
- 48. Question (again): Having seen the presentation, would you rather have (1) an algorithm that assesses problematic play that is 90% accurate, which you cannot properly understand or explain, or (2) an algorithm that assesses problematic play that is 75% accurate, which you fully understand and can explain?
- 49. Q&A and Contact Us (more info + case study). The Responsible Gaming Analytics Experts. Contact us: Christian.Percy@bet-buddy.com; Web: www.bet-buddy.com; Twitter: @Bet_Buddy, @Bet_Buddy/team