Intermediate
Regression Topics
  Daniel Gerlanc, Director
    Enplus Advisors Inc
Topics


Abalone Data

Variable Transformation

Simulation for Predictive Inference
Abalone
http://archive.ics.uci.edu/ml/datasets/Abalone
Loading the data
>   abalone.path = "~/data/abalone.csv"
>   abalone.cols = c("sex", "length", "diameter", "height", "whole.wt",
+                    "shucked.wt", "viscera.wt", "shell.wt", "rings")
>
>   abalone <- read.csv(abalone.path, sep=",", row.names=NULL,
+                       col.names=abalone.cols)
>   str(abalone)

'data.frame':   4177 obs. of  9 variables:
 $ sex       : chr "M" "M" "F" "M" ...
 $ length    : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
 $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
 $ height    : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
 $ whole.wt : num 0.514 0.226 0.677 0.516 0.205 ...
 $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
 $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ...
 $ shell.wt : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
 $ rings     : int 15 7 9 10 7 8 20 16 9 19 ...
Draw pictures
Uses lattice graphics
Lattice Plots
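> library(lattice)
> # 'volume' is not among the loaded columns; it is presumably derived in a step not shown
> xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=TRUE, pch=".",
+        subset=volume < 0.2,
+        panel=function(x, y, ...) {
+           panel.lmline(x, y, ...)
+           panel.xyplot(x, y, ...)
+        },
+        ylab="rings")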


ggplot2 is a newer package that can be used to create similar plots.
Combine groups
[Plot panels: Infant, Adult]
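The later slides treat `sex` as a 0/1 indicator (`sex == 0` for infants, `sex == 1` for adults), while the loaded column holds the strings "M", "F", and "I". The recoding step is not shown on the slides. A minimal sketch of one way the groups might be combined, using a small invented data frame (`toy` is hypothetical, not the real data):

```r
# Hypothetical stand-in for the abalone data; the values are invented.
toy <- data.frame(sex = c("M", "F", "I", "I", "M"),
                  shell.wt = c(0.15, 0.21, 0.055, 0.07, 0.155),
                  stringsAsFactors = FALSE)

# Combine male and female into one adult group: 1 = adult, 0 = infant.
toy$sex <- as.integer(toy$sex != "I")

table(toy$sex)
```

Coding the indicator as a number rather than a factor is what lets it be centered and scaled like any other input later on.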
Why Transform?


Interpretability

Additive vs. Multiplicative Form

Prediction
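As a rough illustration of the additive vs. multiplicative point, the sketch below fits both forms to invented data (not the abalone data). The variable names and the data-generating values are assumptions chosen for the example.

```r
set.seed(42)
x <- rep(c(0, 1), each = 200)                  # binary predictor
y <- exp(1 + 0.3 * x + rnorm(400, sd = 0.1))   # multiplicative data-generating process

fit.add  <- lm(y ~ x)        # additive form: x shifts y by a constant amount
fit.mult <- lm(log(y) ~ x)   # multiplicative form: x scales y by a constant factor

coef(fit.add)["x"]           # estimated difference in y between the groups
exp(coef(fit.mult)["x"])     # estimated ratio of y between the groups
```

On the log scale the coefficient is exponentiated to read it as a ratio, which is often easier to interpret when effects are proportional.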
Simple Model
> fit.1 <- lm(rings ~ sex + shell.wt, abalone)

> summary(fit.1)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone)

Residuals:
    Min      1Q  Median      3Q     Max
 -5.750  -1.592  -0.535   0.886  15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.2423    0.0799   78.08   <2e-16 ***
sex            0.9142    0.0984    9.29   <2e-16 ***
shell.wt      12.8581    0.3300   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Centering with z-scores


 Subtract the mean from each input and
 divide by 1 or 2 standard deviations

 Dummy/Proxy variables may be centered as
 well
Center Values
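> # outcome and predictors are not defined on the slide; these values are
> # inferred from the formula used for fit.1a (rings ~ sex + shell.wt)
> outcome <- "rings"
> predictors <- c("sex", "shell.wt")
> abalone.adj <- abalone[, c(outcome, predictors)]
> for (i in predictors) {
+   abalone.adj[[i]] <-
+     (abalone.adj[[i]] - mean(abalone.adj[[i]])) / (2 * sd(abalone.adj[[i]]))
+ }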

Also look into the ‘scale’ function
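A minimal sketch of how `scale` can reproduce the manual centering, assuming a single numeric vector `x` of invented values:

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)

# Manual version, as in the loop above: center, then divide by 2 SDs.
manual <- (x - mean(x)) / (2 * sd(x))

# scale() centers by the mean when center = TRUE; 'scale' sets the divisor.
# It returns a one-column matrix, so convert back to a plain vector.
scaled <- as.numeric(scale(x, center = TRUE, scale = 2 * sd(x)))

all.equal(manual, scaled)
```

Note that `scale` divides by one standard deviation by default, so the divisor has to be passed explicitly to get the 2-SD version.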
Why center?


Interpret coefficients in terms of standard
deviations

Gives a sense of variable importance
Interpretability
> fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)

> summary(fit.1a)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone.adj)

Residuals:
    Min      1Q  Median      3Q     Max
 -5.750  -1.592  -0.535   0.886  15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9337     0.0385 258.33    <2e-16 ***
sex           0.8539     0.0919    9.29   <2e-16 ***
shell.wt      3.5798     0.0919   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Multiple R-squared: 0.406,      Adjusted R-squared: 0.406
F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
Two Models
    lm(formula = rings ~ sex + shell.wt, data = abalone)
                coef.est coef.se
    (Intercept) 6.24      0.08
    sex          0.91     0.10
    shell.wt    12.86     0.33
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



    lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                coef.est coef.se
    (Intercept) 9.93     0.04
    sex         0.85     0.09
    shell.wt    3.58     0.09
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



Smaller difference in SD terms
Why divide by 2 SDs?
So binary variables may be interpreted
similarly to continuous variables

e.g., a binary variable taking the values 0 and 1 with equal
frequency has an sd of 0.5:
sqrt(0.5 * (1 - 0.5)) = 0.5

Divide by 2 SDs                  Divide by 1 SD
(1 - 0.5) / (2 * 0.5) = +0.5     (1 - 0.5) / (1 * 0.5) = +1
(0 - 0.5) / (2 * 0.5) = -0.5     (0 - 0.5) / (1 * 0.5) = -1
-0.5 --> +0.5: Diff of 1         -1 --> +1: Diff of 2
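The arithmetic can be checked directly. This sketch uses an invented balanced 0/1 vector and the population formula for the SD, sqrt(p * (1 - p)), as on the slide. R's `sd` uses the n - 1 divisor and would give a slightly different value.

```r
# A 0/1 variable with equal frequencies.
b <- rep(c(0, 1), times = 50)
p <- mean(b)
sd.b <- sqrt(p * (1 - p))        # population SD of a binary variable

by.2sd <- (b - p) / (2 * sd.b)   # rescaled by 2 SDs
by.1sd <- (b - p) / sd.b         # rescaled by 1 SD

diff(range(by.2sd))  # 1: same spacing as the raw 0-to-1 comparison
diff(range(by.1sd))  # 2
```

With the 2-SD scaling, the coefficient on a balanced binary input still describes the 0-versus-1 comparison, which puts it on roughly the same footing as a 2-SD change in a continuous input.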
Prediction
Simulation
Allow for more general inferences

Propagation of uncertainty
Prediction Errors
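90th Percentile Adult vs. Median (50th Percentile) Infant
    library(arm)  # provides sigma.hat(), the residual SD of a fitted model

    # sex is assumed to be coded 0 = infant, 1 = adult at this point
    fit.4   <- lm(log(rings) ~ sex + log(shell.wt), abalone)

    # log shell weights for the two cases being compared
    large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
    small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
    # linear predictors on the log scale: intercept + sex + log(shell.wt)
    x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
    x.i <- sum(c(1, 0, small.infant) * coef(fit.4))

    set.seed(1)
    n.sims <- 1000
    # simulate predictions on the log scale, then back-transform to rings
    pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
    pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
    pred.diff <- pred.a - pred.i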

    > mean(pred.diff)
    4.5

    > quantile(pred.diff, c(0.025, 0.975))

    2.5% 98%
    -1.9 11.3
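The same predictive simulation can be sketched without the arm package. The example below uses invented data and hypothetical names (`dat`, `grp`, `w`), takes the residual SD from `summary(fit)$sigma`, and builds the linear predictors with `predict` rather than by hand. It is an illustration under those assumptions, not a reproduction of the abalone result.

```r
set.seed(1)
# Invented data: a log-log relationship with a binary group indicator.
n <- 500
grp <- rbinom(n, 1, 0.5)
w <- runif(n, 0.05, 0.6)
y <- exp(2 + 0.1 * grp + 0.3 * log(w) + rnorm(n, sd = 0.2))
dat <- data.frame(y, grp, w)

fit <- lm(log(y) ~ grp + log(w), dat)
s <- summary(fit)$sigma          # residual SD on the log scale

# Linear predictors for two hypothetical cases.
new <- data.frame(grp = c(1, 0), w = c(0.5, 0.1))
mu <- predict(fit, newdata = new)

n.sims <- 1000
pred.1 <- exp(rnorm(n.sims, mu[1], s))   # simulate, then back-transform
pred.2 <- exp(rnorm(n.sims, mu[2], s))
pred.diff <- pred.1 - pred.2

mean(pred.diff)
quantile(pred.diff, c(0.025, 0.975))
```

Like the slide's version, this propagates only the residual (predictive) variation and treats the fitted coefficients as fixed.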
Simulation for Inferential Uncertainty
Simulate residual standard deviation
Simulate
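Read off the code on the next slide, the simulation can be written as follows. The notation is introduced here for illustration: n is the number of observations, k the number of estimated coefficients, \hat{\sigma} the residual standard error, \hat{\beta} the coefficient estimates, V_\beta the unscaled covariance matrix (`cov.unscaled` in R's `summary.lm`), and the superscript (s) indexes simulation draws.

```latex
\begin{aligned}
X^{(s)} &\sim \chi^2_{n-k}, \\
\sigma^{(s)} &= \hat{\sigma}\,\sqrt{\frac{n-k}{X^{(s)}}}, \\
\beta^{(s)} \mid \sigma^{(s)} &\sim \mathrm{N}\!\left(\hat{\beta},\; \bigl(\sigma^{(s)}\bigr)^{2} V_\beta\right),
\qquad s = 1, \dots, n_{\mathrm{sims}}.
\end{aligned}
```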
## Create 1000 simulations of the residual standard error and coefficients

fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)

n.sims      <-   1000
obj         <-   summary(fit.5) # save off the summary object
sigma.hat   <-   obj$sigma # residual standard error
b.hat       <-   obj$coefficients[, 'Estimate', drop=TRUE]
cov.beta    <-   obj$cov.unscaled # unscaled covariance matrix, (X'X)^-1
k           <-   obj$df[1] # number of estimated coefficients (incl. intercept)
n           <-   obj$df[1] + obj$df[2] # number of observations

set.seed(1)
# one draw of sigma per simulation, using a chi-square draw with n-k degrees of freedom
sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))

# given each sigma draw, draw the coefficients from a multivariate normal
beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
for (i in seq_len(n.sims)) {
  beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
}
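One way the simulated draws might be used is to read off interval estimates for each coefficient. The sketch below repeats the procedure on invented data (`x`, `y`, and the true coefficients are assumptions for the example) and compares the simulation intervals with the classical ones from `confint`.

```r
set.seed(1)
# Invented data standing in for the abalone fit.
n.obs <- 200
x <- rnorm(n.obs)
y <- 1 + 2 * x + rnorm(n.obs, sd = 0.5)
fit <- lm(y ~ x)

obj <- summary(fit)
df.resid <- obj$df[2]            # residual degrees of freedom, n - k
n.sims <- 1000

# Draw sigma, then draw the coefficients given each sigma.
sigma.sim <- obj$sigma * sqrt(df.resid / rchisq(n.sims, df.resid))
beta.sim <- t(sapply(sigma.sim, function(s) {
  MASS::mvrnorm(1, coef(fit), s^2 * obj$cov.unscaled)
}))

# 95% simulation intervals for each coefficient, next to the classical ones.
sim.int <- t(apply(beta.sim, 2, quantile, probs = c(0.025, 0.975)))
sim.int
confint(fit)
```

For a plain linear model the two sets of intervals should agree closely. The simulation approach becomes more useful for quantities that are nonlinear functions of the coefficients, where no closed-form interval is available.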
Inferential Uncertainty

Boston Predictive Analytics: Linear and Logistic Regression Using R - Intermediate Topics
