NHL: An Expected Goals Model
Testing different modeling techniques to predict the
success of shots in the NHL
Richard Ramsey
12/18/2015
CS 4800 – Fall 2015
Using a sample of shots taken in the NHL from the 2008-09 season to the 2014-15 season, I tested the
significance of shot distance, relative angle to goal, and other factors in predicting whether or not the
shot was scored. I developed two different forms of models: one a linear regression model with
interactions between different independent variables, and the other a linear mixed-effects model that
treated shot type as a grouping factor to generate individual intercepts and coefficients for each shot
type. The resulting models gave an expected probability of each shot going in.
INTRODUCTION
Over the past decade, the field of analytics and statistical analysis has grown rapidly in
hockey. One of the most important questions that everyone involved would like to understand is
which shots are most likely to be converted into goals. The fluidity of hockey play makes it
difficult to capture all of the variables that contribute to the game situation in each shot, but
being able to better understand the differences between high-percentage and low-percentage
shots has far-reaching implications for analysis. Some in the NHL community have referred to
this line of analysis as looking at shot quality (Krzywicki). The work done in this field has
largely been done privately for the benefit of NHL teams, who have incentive to not share any
advances they have made.
In 2012, Brian McDonald presented an expected goals model for the NHL at the Sloan
Sports Analytics conference. His model used statistics accumulated over the course of a game,
such as shots, turnovers, and hits, to predict the total number of goals that a team would score
(McDonald). It turned out to have significant predictive power, and could be used to evaluate
both teams and players based on the expected goals. However, this model did not examine each
individual shot, which left the question of shot quality open for further analysis. Being able to
better determine the quality of a shot would allow more granular analysis of both players
and teams. An individual shot-based expected goals model would allow for a better
understanding of which players are better or worse than average at converting each type of shot,
and which teams create or concede high-percentage shots. It could even have implications for
offensive and defensive strategy, if the relationships between shot distance and angle
differed from the conventional wisdom. Based on this, I hypothesized that shot
distance and relative angle to goal are both significant predictors of shot conversion.
DATA AND METHODS
In order to obtain a large sample of data to work with, I used the nhlscrapr package in R.
This package scrapes NHL play-by-play data from the NHL’s official site, and converts the raw
play-by-play data into a useable data frame in R. The data available through the nhlscrapr
package goes back to the 2002-03 season, but x-coordinate and y-coordinate data only go as far
back as the 2008-09 season. Given the hypothesis and hockey intuition that the distance from and
angle at which a shot is taken will have significant effects on conversion rate, I needed to restrict
my sample to only include shots from the 2008-09 season and later.
Using this sample of data from the 2008-09 to the 2014-15 season, I had 530,053 shots to
work with. However, not all of these shots came in the course of typical game situations, because
the nhlscrapr package just pulls every play-by-play event logged by the NHL. First, I filtered out
any penalty shots and shootout goals, because those are completely different situations from the
natural state of play. Second, I removed empty-net goals, because I felt that these would distort
the predicted conversion rates from different angles and distances, given that there was no goalie
in to stop them. After removing these data, there remained an inconsequential number of shots
without x-coordinates or y-coordinates, which I had to remove.
In the final sample of shots, I created two variables based on the play-by-play data.
Relative angle represented the shooter’s relative angle to the goal on a scale ranging from 0, the
widest possible angle, to 1, straight in-line with the goal. A binary variable pp represented
whether or not the shooter’s team was on a power play at the time, meaning they had at least one
more skater on the ice than the opposing team.
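The paper does not give the formulas behind these variables, so the following is only a minimal Python sketch of one way they could be derived (the original work was done in R). It assumes coordinates in feet with centre ice at the origin, a goal line 89 feet from centre, and a linear rescaling of the shot angle onto the 0-to-1 scale described above. The function names and the choice to treat shots from behind the goal line as the widest angle are illustrative assumptions, not the author's code.

```python
import math

GOAL_X = 89.0  # assumed: goal line sits 89 ft from centre ice along the x-axis


def shot_distance(x, y):
    """Straight-line distance (ft) from the shot location to the centre of the goal line."""
    return math.hypot(GOAL_X - abs(x), y)


def relative_angle(x, y):
    """Rescale the shot angle to [0, 1]: 1 = straight in line with the goal, 0 = widest angle."""
    dx = GOAL_X - abs(x)                 # distance out from the goal line (negative behind it)
    angle = math.atan2(abs(y), dx)       # 0 = dead centre, pi/2 = level with the goal line
    angle = min(angle, math.pi / 2)      # assumption: behind-the-net shots count as widest angle
    return 1.0 - angle / (math.pi / 2)


def power_play(shooter_skaters, opponent_skaters):
    """Binary pp flag: 1 if the shooter's team has at least one more skater on the ice."""
    return 1 if shooter_skaters > opponent_skaters else 0
```

Taking the absolute value of x assumes every shot is aimed at the net on the side of the rink where it was taken. Data that record the attacking direction separately would need that handled explicitly.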
Table 1 – Summary of Sample Data

Shot Type    Total Shots    Goals
Backhand           41876     4313
Deflected           6778     1415
Slap              113582     6260
Snap               74741     6491
Tip-In             23090     4473
Wrap                6874      382
Wrist             244965    20392
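The raw conversion rate for each shot type follows directly from Table 1 as goals divided by total shots. The short Python sketch below does that arithmetic using only the figures in the table. The variable names are illustrative.

```python
# (total shots, goals) per shot type, copied from Table 1
table1 = {
    "Backhand": (41876, 4313),
    "Deflected": (6778, 1415),
    "Slap": (113582, 6260),
    "Snap": (74741, 6491),
    "Tip-In": (23090, 4473),
    "Wrap": (6874, 382),
    "Wrist": (244965, 20392),
}

# Conversion rate = goals / shots for each shot type
rates = {shot_type: goals / shots for shot_type, (shots, goals) in table1.items()}

total_shots = sum(shots for shots, _ in table1.values())
total_goals = sum(goals for _, goals in table1.values())
overall_rate = total_goals / total_shots

for shot_type, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{shot_type:<10} {rate:.3f}")
print(f"{'Overall':<10} {overall_rate:.3f}")
```

By this calculation, deflected and tip-in shots convert at roughly 20%, while slap and wrap shots convert at under 6%. That spread is consistent with the use of shot type as a model factor in the sections that follow.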
From this sample, I used random sampling to create a build sample and a holdout sample
of shots. The build sample contained 75% of the overall sample, while the holdout sample
contained the remaining 25%. In order to avoid overfitting in the models, I only used the build
sample to train the models, and tested their predictive power on the holdout sample.
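The 75/25 split described above can be sketched in a few lines of Python. This is an illustration only: the original analysis was done in R, and the function name, the seed, and the use of a simple shuffle are assumptions rather than the author's procedure.

```python
import random


def split_build_holdout(rows, build_fraction=0.75, seed=4800):
    """Randomly partition rows into a build (training) sample and a holdout sample."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * build_fraction))
    return shuffled[:cut], shuffled[cut:]


# Example with 1,000 placeholder shot ids
build, holdout = split_build_holdout(range(1000))
```

Every row lands in exactly one of the two samples, so the holdout shots play no part in estimating model coefficients.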
I looked at two different forms of predictive models to begin, and refined the formulas of
each of those models through holdout testing. The first was a simple linear regression model,
which estimates a coefficient for the linear relationship between the dependent variable and each
of the predictor variables. The second model was a linear mixed-effects model, which allows
factors in the model to be treated as random variables rather than as fixed
parameters. This means that the grouping factors specified in the model are given
random intercepts and slopes in relation to other variables. The type of shot, as
categorized in the scraped data, was an intuitive candidate for such a grouping factor. I tested this
form to examine the relationship between the type of shot and other predictor variables in the
model, so that the other model variables had different effects for each type of shot (Bates et al.).
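To make the idea of group-specific intercepts and slopes concrete, the Python sketch below shows how a prediction could be assembled once such a model has been fitted. Every coefficient value is a made-up placeholder chosen for illustration and is not an estimate from this paper. The single predictor and the dictionary layout are also assumptions.

```python
# Hypothetical fixed effects shared by all shot types (placeholder values, not fitted results)
FIXED = {"intercept": 0.05, "rel_angle": 0.04}

# Hypothetical per-shot-type deviations from the fixed effects (placeholder values)
RANDOM = {
    "Wrist": {"intercept": 0.00, "rel_angle": 0.01},
    "Tip-In": {"intercept": 0.08, "rel_angle": -0.02},
}


def predict(shot_type, rel_angle):
    """Expected goal probability using a group-specific intercept and slope."""
    # Unseen shot types fall back to the fixed effects alone
    deviation = RANDOM.get(shot_type, {"intercept": 0.0, "rel_angle": 0.0})
    intercept = FIXED["intercept"] + deviation["intercept"]
    slope = FIXED["rel_angle"] + deviation["rel_angle"]
    return intercept + slope * rel_angle
```

Each shot type gets its own line (intercept plus slope on relative angle), while the fixed effects describe the average line across shot types.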
RESULTS
Using these techniques, I tested out different formulas and relationships between
independent variables for both the linear regression and the linear mixed-effects model.
Table 2 – First stage of the linear regression model, testing significance of distance and relative angle
The first model that I tested was a linear regression using distance and relative angle as
the independent variables, in order to test their significance in predicting shot conversion. Later
iterations of the linear regression model included the shot type.
Table 3 – Final linear regression model
After testing different interactions between independent variables, as
well as the overall formula, I arrived at the model above. The log of (1 + shot distance) is
interacted with relative angle, and shot distance is also interacted with shot type. Adding one to
shot distance keeps the log transformation defined when the recorded distance is zero. In addition,
a binary variable pp, indicating whether a team is on the power play, is included and is significant.
There is no log transformation
on shot distance in its interaction with shot type because the relationship varies significantly by
shot type, so the transformation was not necessarily the best representation of that relationship.
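The terms described in this paragraph can be sketched as one row of a regression design matrix. The Python below is an illustration under assumptions: Table 3 is not reproduced in the text, so it is not known whether the main effects listed here match the fitted formula exactly. The column names, the dummy coding, and the choice of baseline shot type are all hypothetical.

```python
import math

SHOT_TYPES = ["Backhand", "Deflected", "Slap", "Snap", "Tip-In", "Wrap", "Wrist"]


def design_row(distance, rel_angle, shot_type, pp, baseline="Backhand"):
    """Build the predictor values for one shot, including both interactions."""
    log_dist = math.log1p(distance)  # log(1 + distance), defined even at distance 0
    row = {
        "intercept": 1.0,
        "log_dist": log_dist,
        "rel_angle": rel_angle,
        "log_dist:rel_angle": log_dist * rel_angle,  # interaction with relative angle
        "distance": distance,                        # untransformed, for the shot-type interaction
        "pp": float(pp),
    }
    for t in SHOT_TYPES:
        if t == baseline:
            continue  # baseline type is absorbed into the intercept
        is_t = 1.0 if shot_type == t else 0.0
        row["type_" + t] = is_t
        row["distance:type_" + t] = distance * is_t  # separate distance slope per shot type
    return row
```

An interaction is simply the product of two columns, so the distance-by-shot-type terms give each shot type its own distance slope relative to the baseline.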
Table 4 – Summary of final linear mixed-effects model
The development of the linear mixed-effects model began with the assumption that
shot type was a natural grouping factor. Initially, I included shot type as a grouping
factor just for the intercept. The model also converged with shot type as a grouping factor for both the
intercept and the relative angle coefficient, meaning that each shot type had its own intercept and
its own coefficient on relative angle.
Table 5 – random effect coefficients of shot type
The interaction between log(1 + shot distance) and relative angle was also included in
this model, as well as the binary pp variable.
DISCUSSION
As hypothesized, the distance and relative angle of the shot are both strong predictors of
shot conversion at a high level of statistical significance. In order to evaluate each of the models,
I initially looked at the significance of the overall model and the coefficients of each variable and
interaction. In order to differentiate between models, however, I tested on the holdout sample.
Table 6 - Gains chart for simple linear regression model
Using the gains package in R, I created a gains chart for each of the prospective models
on the holdout sample, in order to evaluate the predictive power of the models on data they were
not trained on. The gains chart orders data points in the sample by their predicted probability, in
this case the chance that a shot is scored. Depth of file is the percentage of the sample population
(10 is the first 10 percent by this order). Mean response (Mean Resp) is the mean of the binary
variable indicating whether or not the shot was scored. Cumulative percentage of total responses
(Cume Pct of Total Resp) is the percentage of goals captured in the depth of file when ordered by
the model’s probability. For example, 26% of all goals in the holdout sample were in the top
10% of calculated probabilities by the simple linear regression model. A higher cumulative
percentage of total responses at a low depth of file means that the model has more predictive
power.
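The columns described above can be computed by hand, as the following Python sketch shows. It is not the gains package's implementation. It assumes equal-sized groups formed after sorting by predicted probability, and the function and key names are illustrative.

```python
def gains_table(predicted, actual, groups=10):
    """Rank shots by predicted probability and summarise goals captured at each depth."""
    # Indices of shots ordered from highest to lowest predicted probability
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    n = len(order)
    total_resp = sum(actual)
    rows = []
    cum_resp = 0
    for g in range(groups):
        start = g * n // groups
        end = (g + 1) * n // groups
        bucket = [actual[i] for i in order[start:end]]
        cum_resp += sum(bucket)
        rows.append({
            "depth": 100 * (g + 1) // groups,  # depth of file, in percent
            "n": len(bucket),
            "mean_resp": sum(bucket) / len(bucket) if bucket else 0.0,
            "cume_pct_total_resp": 100.0 * cum_resp / total_resp if total_resp else 0.0,
        })
    return rows


# Toy example: 20 shots, and the 4 highest-rated ones are the only goals
toy_predicted = [i / 20 for i in range(20)]
toy_actual = [0] * 16 + [1] * 4
toy_rows = gains_table(toy_predicted, toy_actual)
```

In the toy example the model ranks every goal at the top, so the cumulative percentage reaches 100 by a depth of 20. For the simple linear regression model, the first row's cumulative percentage corresponds to the 26% figure quoted above.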
Table 7 - Gains chart for final linear regression model
The final linear regression model generated a lift over the simple linear regression model
that I began with, especially beyond the top 10 percent of predicted chance of scoring. The
interaction between shot type and shot distance captured an important relationship in terms of the
probability of scoring.
Table 8 - gains chart for the final mixed-effects model
The linear mixed-effects model saw gains similar to those of the final linear model, though it did
not have as much of a lift over the initial linear model. There is a slight lift over the final linear
regression in the 10th to 20th percentile of predicted probability, but that lift is not sustained over
the whole sample. The mixed-effects model may require more detailed interactions and nested
grouping factors, but the tradeoff in computational complexity and ease of explanation may not
be worth the marginal gains. Both of the final models generated lift over the initial linear
regression, and we can conclude that interactions with shot type and other independent variables
have additional predictive power beyond just looking at shot distance and relative angle.
WORKS CITED
A.C. Thomas and Samuel L. Ventura (2014). nhlscrapr: Compiling the NHL Real Time Scoring
System Database for easy use in R. R package version 1.8.
http://CRAN.R-project.org/package=nhlscrapr
Bates D, Maechler M, Bolker B and Walker S (2014). lme4: Linear mixed-effects models using
Eigen and S4. R package version 1.1-7.
http://CRAN.R-project.org/package=lme4
Craig A. Rolling (2013). gains: Gains Table Package. R package version 1.1.
http://CRAN.R-project.org/package=gains
Krzywicki, Ken. "NHL Shot Quality 2009-10." Hockey Analytics. Hockey Analytics, 22 Oct.
2010. Web. 13 Oct. 2015. http://hockeyanalytics.com/2010/10/nhl-shot-quality-2010/
McDonald, Brian. "An Expected Goals Model for Evaluating NHL Teams and Players." Sloan
Sports Analytics Conference. MIT, 3 Mar. 2012. Web. 13 Oct. 2015.
http://www.sloansportsconference.com/wp-content/uploads/2012/02/NHL-Expected-Goals-Brian-Macdonald.pdf
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-
Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
doi:10.18637/jss.v067.i01.
