Regression
APAM E4990
Modeling Social Data
Jake Hofman
Columbia University
February 24, 2017
Jake Hofman (Columbia University) Regression February 24, 2017 1 / 6
Definition
“The primary goal in a regression analysis is to
understand, as far as possible with the available data,
how the conditional distribution of the response varies
across subpopulations determined by the possible
values of the predictor or predictors.”
- “Applied Regression Including Computing and Graphics”, Cook & Weisberg (1999)
Goals
Describe
Provide a compact summary of outcomes under different conditions
Never “false”, but may be wasteful or misleading
Predict
Make forecasts for future outcomes or unobserved conditions
Varying degrees of success, often room for improvement
Explain
Account for associations between predictors and outcomes
Difficult to establish causality in observational studies
See “Regression Analysis: A Constructive Critique”, Berk (2004)
Goals
Models should be flexible enough to describe observed phenomena
but simple enough to generalize to future observations
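The trade-off between flexibility and simplicity can be sketched with a small experiment. This is a minimal illustration, not part of the lecture: the data are synthetic, and the sine-shaped signal, noise level, sample sizes, and polynomial degrees are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n synthetic points from an assumed noisy sine curve."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    return x, y

def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial by least squares; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Small training set ("observed phenomena"), large test set ("future observations").
x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)

# Degrees range from a rigid line to a very flexible curve.
results = {d: fit_and_score(d, x_train, y_train, x_test, y_test) for d in (1, 3, 10)}

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training error can only go down as the degree grows, because each polynomial model contains the lower-degree ones. Test error need not follow, which is the sense in which a model can be too flexible to generalize.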
Examples [1]
1.2 Setting the Regression Context (p. 3)

[…] of some response y varies across subpopulations determined by the possible values of the predictor or predictors” (Cook and Weisberg, 1999: 27). That is, interest centers on the distribution of the response variable Y conditional on one or more predictors X.

This definition includes a wide variety of elementary procedures easily implemented in R. (See, for example, Maindonald and Braun, 2007: Chapter 2.) For example, consider Figures 1.1 and 1.2. The first shows the distribution of SAT scores for recent applicants to a major university, who self-identify as “Hispanic.” The second shows the distribution of SAT scores for recent applicants to that same university, who self-identify as “Asian.”

Fig. 1.1. Distribution of SAT scores for Hispanic applicants.

Fig. 1.2. Distribution of SAT scores for Asian applicants. [Histogram titled “SAT Scores for Asian Applicants”; x-axis: SAT Score, 600 to 1600; y-axis: Frequency.]

It is clear that the two distributions differ substantially. The Asian distribution is shifted to the right, leading to a distribution with a higher mean (1227 compared to 1072), a smaller standard deviation (170 compared to […]) and greater skewing. A comparative description of the two histograms constitutes a proper regression analysis. Using various summary statistics, some key features of the two displays are compared and contrasted […]

Should one be especially interested in a comparison of the means, one could proceed descriptively with a conventional least squares regression analysis as a special case. That is, for each observation i, one could let

$$\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad (1.1)$$

where the response variable y is each applicant’s SAT score, x is an indicator […]
[1] “Statistical Learning from a Regression Perspective”, Berk (2008)
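To see why the comparison of means in equation (1.1) is a special case of least squares, here is a minimal sketch on synthetic data. The scores below are randomly generated, not the applicant data from the excerpt; the group means, spreads, and sizes are made-up values, and coding the indicator as 0 for one group and 1 for the other is an assumption, since the excerpt is cut off before it says how x is defined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores for two hypothetical groups (made-up parameters).
group0 = rng.normal(loc=1000, scale=150, size=500)
group1 = rng.normal(loc=1200, scale=150, size=400)

# Response y and indicator predictor x (0 for group0, 1 for group1).
y = np.concatenate([group0, group1])
x = np.concatenate([np.zeros(len(group0)), np.ones(len(group1))])

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0, beta1 = beta

print(f"beta0 = {beta0:.2f}, mean of group0 = {group0.mean():.2f}")
print(f"beta1 = {beta1:.2f}, difference in means = {group1.mean() - group0.mean():.2f}")
```

With a single indicator predictor, the fitted intercept equals the mean of the group coded 0 and the fitted slope equals the difference between the two group means, so the regression restates the comparison of means.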
Fig. 1.4. SAT scores by family income. [1] [Scatterplot titled “SAT Score by Household Income”; x-axis: income, bounded at $100,000; y-axis: SAT score.]
Fig. 1.5. Freshman GPA on SAT holding high school GPA constant. [1] [Plot labels: Freshman GPA, High School GPA.]
Framework
• Specify the outcome and predictors, along with the form of
the model relating them
• Define a loss function that quantifies how close a model’s
predictions are to observed outcomes
• Develop an algorithm to fit the model to the observations by
minimizing this loss
• Assess model performance and interpret results.
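The four steps above can be sketched end to end as follows. This is one possible instance under stated assumptions, not the lecture's own code: the data are synthetic, the model is a straight line, the loss is mean squared error, the fitting algorithm is plain gradient descent, and the learning rate, step count, and train/test split are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Specify the outcome y, the predictor x, and the model form y ~ w0 + w1 * x.
#    The data are synthetic, generated from an assumed line plus noise.
n = 200
x = rng.uniform(0, 1, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# 2. Define a loss: mean squared error between predictions and outcomes.
def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

# 3. Fit the model by minimizing the loss with gradient descent.
def fit_gd(X, y, lr=0.1, steps=5000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE
        w -= lr * grad
    return w

w_gd = fit_gd(X_train, y_train)

# Reference solution from a direct least squares solve, for comparison.
w_ls, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 4. Assess performance on held-out data and inspect the fitted weights.
print("gradient descent weights:", w_gd)
print("least squares weights:  ", w_ls)
print("train MSE:", mse(w_gd, X_train, y_train))
print("test MSE: ", mse(w_gd, X_test, y_test))
```

For squared loss and a linear model, gradient descent and the direct solve should agree closely; gradient descent is used here only because it carries over to losses and models without a closed-form solution.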
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 

Recently uploaded (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 

Modeling Social Data, Lecture 6: Regression, Part 1

  • 1. Regression APAM E4990 Modeling Social Data Jake Hofman Columbia University February 24, 2017 Jake Hofman (Columbia University) Regression February 24, 2017 1 / 6
  • 2. Definition ?
  • 3. Definition
  • 4. Definition: “The primary goal in a regression analysis is to understand, as far as possible with the available data, how the conditional distribution of the response varies across subpopulations determined by the possible values of the predictor or predictors.” (“Applied Regression Including Computing and Graphics”, Cook & Weisberg, 1999)
  • 5. Goals
      Describe: Provide a compact summary of outcomes under different conditions.
      Predict: Make forecasts for future outcomes or unobserved conditions.
      Explain: Account for associations between predictors and outcomes.
  • 6. Goals
      Describe: Provide a compact summary of outcomes under different conditions. Never “false”, but may be wasteful or misleading.
      Predict: Make forecasts for future outcomes or unobserved conditions. Varying degrees of success, often room for improvement.
      Explain: Account for associations between predictors and outcomes. Difficult to establish causality in observational studies.
      See “Regression Analysis: A Constructive Critique”, Berk (2004).
  • 7. Goals: Models should be flexible enough to describe observed phenomena but simple enough to generalize to future observations.
  • 8. Examples (page excerpts from “Statistical Learning from a Regression Perspective”, Berk (2008), Section 1.2, “Setting the Regression Context”; the page images are cropped, and [...] marks text cut off at the crop)
      Excerpt: Should one be especially interested in a comparison of the means, one could proceed descriptively with a conventional least squares regression analysis as a special case. That is, for each observation i, one could let $\hat{y}_i = \beta_0 + \beta_1 x_i$ (1.1), where the response variable y is each applicant’s SAT score, x is an indicator [...]
      [Fig. 1.2. Distribution of SAT scores for Asian applicants. Histogram titled “SAT Scores for Asian Applicants”; x-axis: SAT Score, y-axis: Frequency.]
      Excerpt: [...] of some response y varies across subpopulations determined by the possible values of the predictor or predictors” (Cook and Weisberg, 1999: 27). That is, interest centers on the distribution of the response variable Y conditional on one or more predictors X. This definition includes a wide variety of elementary procedures [...] implemented in R. (See, for example, Maindonald and Braun, 2007: Chapter 2.) For example, consider Figures 1.1 and 1.2. The first shows the distribution of SAT scores for recent applicants to a major university, who self-identify as “Hispanic.” The second shows the distribution of SAT scores for recent applicants to that same university, who self-identify as “Asian.”
      [Fig. 1.1. Distribution of SAT scores for Hispanic applicants.]
      Excerpt: It is clear that the two distributions differ substantially. The Asian distribution is shifted to the right, leading to a distribution with a higher mean (1227 compared to 1072), a smaller standard deviation (170 compared to [...]), and greater skewing. A comparative description of the two histograms constitutes a proper regression analysis. Using various summary statistics, some key features of the two displays are compared and contrasted ([...]
  • 9. Examples (page excerpt from “Statistical Learning from a Regression Perspective”, Berk (2008))
      [Fig. 1.4. SAT scores by family income. Scatterplot titled “SAT Score by Household Income”; x-axis: Income Bounded at $100,000, y-axis: SAT Score.]
  • 10. Examples (page excerpt from “Statistical Learning from a Regression Perspective”, Berk (2008), Chapter 1, “Regression Framework”)
      [Fig. 1.5. Freshman GPA on SAT holding high school GPA constant. Panels conditioned on High School GPA; y-axis: Freshman GPA.]
  • 11. Framework
      • Specify the outcome and predictors, along with the form of the model relating them.
      • Define a loss function that quantifies how close a model’s predictions are to observed outcomes.
      • Develop an algorithm to fit the model to the observations by minimizing this loss.
      • Assess model performance and interpret results.
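
The four framework steps can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not code from the lecture: it assumes a one-predictor linear model, squared-error loss, and the closed-form least squares solution as the fitting algorithm. All data values and function names (`predict`, `mse`, `fit_least_squares`) are hypothetical and invented for this example.

```python
# Step 1: specify the outcome y, the predictor x, and the model form.
# Assumed model (for illustration): y_hat = b0 + b1 * x.
def predict(b0, b1, x):
    return b0 + b1 * x

# Step 2: define a loss function. Assumed here: mean squared error.
def mse(b0, b1, data):
    return sum((y - predict(b0, b1, x)) ** 2 for x, y in data) / len(data)

# Step 3: fit the model by minimizing the loss. For this model and loss
# the minimizer has a closed form (ordinary least squares).
def fit_least_squares(data):
    n = len(data)
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    sxx = sum((x - x_mean) ** 2 for x, _ in data)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data, invented for illustration only.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
test = [(5.0, 10.1), (6.0, 11.9)]

b0, b1 = fit_least_squares(train)

# Step 4: assess performance, here on observations not used for fitting.
train_loss = mse(b0, b1, train)
test_loss = mse(b0, b1, test)
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"train MSE={train_loss:.4f}, test MSE={test_loss:.4f}")

# Special case from the Examples slides: when x is a 0/1 indicator,
# least squares reduces to a comparison of group means.
# Hypothetical scores for two groups (not the data in Berk's figures).
group0 = [1000.0, 1100.0, 1150.0]
group1 = [1200.0, 1250.0, 1300.0]
indicator_data = [(0.0, y) for y in group0] + [(1.0, y) for y in group1]
c0, c1 = fit_least_squares(indicator_data)
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)
print(f"intercept={c0:.2f} (group 0 mean={mean0:.2f})")
print(f"slope={c1:.2f} (difference in means={mean1 - mean0:.2f})")
```

With an indicator predictor, the fitted intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means, which is the sense in which a comparison of means is a special case of least squares regression. Other losses or fitting algorithms (for example, absolute error or gradient descent) would slot into steps 2 and 3 without changing the overall structure.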