Predicting Prices in the Iowa Housing Market (Regularized Linear Regression)
Erik Bebernes
Introduction
This project asks a common question in the field of predictive analytics: what are houses worth?
Identifying the true price of a home is important in preventing a housing bubble, such as the one
that plagued our country in 2008 and ultimately led to a recession. The data I’m using comes
from Kaggle and looks specifically at houses in Ames, Iowa. There are 81 variables, with the
target (dependent) variable being “Sale Price.” I worked on a problem similar to this as an
undergraduate student in an econometrics class, and although I really enjoyed it, I hadn’t the
slightest clue what I was doing. Now that I am more knowledgeable about multiple regression
analysis, I should be able to come up with some fairly accurate predictions. Before I begin, here
is a look at the 81 variables I’ll be working with.
My plan of attack on this project is as follows:
1.) Identify any missing data (both missing at random and not at random) and impute new
data accordingly.
2.) Remove any outliers so that a few extreme houses don’t distort the fit.
3.) Run a multiple regression model, using backward selection to drop predictors until every
remaining predictor has a p-value below .05.
4.) Try a “regularized” linear model.
Identifying Missing Data and Cleaning It
The first thing I like to do in most of my projects is run a “missmap” (from R’s Amelia package)
on the datasets to see how much of the data is NA.
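A missmap is a picture; the numbers behind it are simply the share of missing values in each column. The Python sketch below (an illustration only; the project itself was done in R) computes that share from rows stored as dictionaries, with None standing in for NA. The column names mirror the Kaggle data, but the rows are made-up toy values, not records from the dataset.

```python
# Toy rows (made-up values); None plays the role of R's NA.
rows = [
    {"SalePrice": 200000, "Alley": None, "PoolQC": None, "GarageType": "Attchd"},
    {"SalePrice": 150000, "Alley": None, "PoolQC": None, "GarageType": None},
    {"SalePrice": 180000, "Alley": "Grvl", "PoolQC": None, "GarageType": "Detchd"},
    {"SalePrice": 250000, "Alley": None, "PoolQC": None, "GarageType": "Attchd"},
]


def missing_share(rows):
    """Return {column: fraction of rows that are None}, most-missing column first."""
    n = len(rows)
    shares = {col: sum(row[col] is None for row in rows) / n for col in rows[0]}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


for column, share in missing_share(rows).items():
    print(f"{column:<12} {share:.0%} missing")
```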
A handful of variables are nearly completely missing; let’s see what they are and why.
The variables with the most missing values are “Alley” (type of alley access), “PoolQC” (pool
quality), “FireplaceQu” (fireplace quality), “Fence” (fence quality), “LotFrontage” (linear feet
of street connected to property), and “MiscFeature” (miscellaneous feature not covered in other
categories). The descriptions of these variables make it clear that the data is not missing at
random: whether a value is recorded depends on whether the house has that feature in the first
place. The same goes for the missing garage and basement variables; these are the houses that
don’t have garages or basements. It’s also worth noting that the number of NAs is equal across
related variables (e.g., all of the garage variables have 81 missing values). There is an easy fix
for this: I’m going to replace all NAs in factor variables with “none” and all NAs in numeric
variables with 0.
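The imputation rule above can be sketched in a few lines. This is a minimal Python illustration, not the author’s R code. It assumes the caller passes in the set of categorical (factor) column names, since plain Python values don’t carry R’s factor type; the helper name and the toy rows are hypothetical.

```python
def impute_missing(rows, categorical):
    """Fill NAs that mean 'feature absent': "none" for factor columns, 0 for numeric ones."""
    filled = []
    for row in rows:
        filled.append({
            col: ("none" if col in categorical else 0) if value is None else value
            for col, value in row.items()
        })
    return filled  # new dicts; the input rows are left untouched


# Hypothetical rows: the second house has no garage, and neither house has a pool.
rows = [
    {"GarageType": "Attchd", "GarageArea": 548, "PoolQC": None},
    {"GarageType": None, "GarageArea": None, "PoolQC": None},
]
clean = impute_missing(rows, categorical={"GarageType", "PoolQC"})
print(clean[1])
```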
Removing Outliers
By making scatterplots of the numeric variables against Sale Price, I’ll be able to identify any
outliers and remove them from the dataset. This keeps a handful of extreme houses from pulling
the fitted coefficients, and therefore the predictions, away from the bulk of the data.
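The write-up spots outliers by eye from scatterplots. As an automated stand-in (a swapped-in technique, not the author’s method), the Python sketch below flags values outside Tukey’s fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR, on a single numeric variable. The living-area values are invented for illustration, and the 1.5 multiplier is the conventional default rather than anything the source specifies.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def drop_indices(items, indices):
    """Return items (values or whole rows) with the flagged positions removed."""
    flagged = set(indices)
    return [item for i, item in enumerate(items) if i not in flagged]


# Invented above-ground living areas in square feet; the last one is extreme.
living_area = [1200, 1350, 1500, 1600, 1700, 1800, 1900, 5600]
outliers = tukey_outliers(living_area)
kept = drop_indices(living_area, outliers)
print(outliers, kept)
```

A fence on one variable at a time will not catch points that are only unusual in combination (for example, a very large house with a low price), which is where the scatterplots against Sale Price earn their keep.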
Multiple Regression Model
In developing my linear model, I used backward selection: I started by including all of the
independent variables and gradually removed the insignificant ones (those with a p-value greater
than .05).
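Backward selection by p-value can be sketched as a loop: fit ordinary least squares, drop the predictor with the largest p-value if it exceeds .05, and refit. The Python below is a self-contained illustration on simulated data (in R, lm() and summary() report these p-values directly). For brevity it approximates the t distribution with a standard normal, which is reasonable only when there are many observations; the helper names are hypothetical.

```python
import random
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_pvalues(X, y):
    """OLS fit where X already has an intercept column; returns two-sided p-values.

    Uses a normal approximation to the t distribution (adequate only for large n)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    std_normal = NormalDist()
    return [2 * (1 - std_normal.cdf(abs(beta[a]) / (sigma2 * inv[a][a]) ** 0.5))
            for a in range(k)]


def backward_select(columns, y, names, alpha=0.05):
    """Drop the least significant predictor until every remaining p-value is <= alpha."""
    names = list(names)
    while names:
        X = [[1.0] + [columns[nm][i] for nm in names] for i in range(len(y))]
        pvals = ols_pvalues(X, y)[1:]  # skip the intercept's p-value
        worst = max(range(len(names)), key=lambda j: pvals[j])
        if pvals[worst] <= alpha:
            return names, dict(zip(names, pvals))
        del names[worst]
    return names, {}


# Simulated data: y depends on x1 only; x2 is pure noise.
rng = random.Random(0)
x1 = [rng.uniform(0, 10) for _ in range(200)]
x2 = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * a + rng.gauss(0, 1) for a in x1]
kept, pvalues = backward_select({"x1": x1, "x2": x2}, y, ["x1", "x2"])
print(kept, pvalues)
```

With x2 pure noise it will usually, though not always (about 5% of the time at alpha = .05), be dropped, while x1 survives. Selecting on p-values this way also tends to overstate how significant the surviving predictors are.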
The adjusted R-squared (R-squared penalized for the number of predictors in the model) is .8718,
meaning the model explains about 87% of the variance in sale price. The model as a whole has an
F-test p-value of < 2.2e-16, making it statistically significant. Time to make my prediction and
see how it stands up on the Kaggle rankings.
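For n observations and p predictors (intercept excluded), adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), so the same raw R² counts for less when more predictors were needed to reach it. A small Python sketch of both quantities, checked on toy numbers rather than the project’s data:

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for p predictors (intercept excluded) fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy numbers: an R-squared of 0.90 is worth less when 50 predictors were used instead of 10.
few = adjusted_r_squared(0.90, n=101, p=10)
many = adjusted_r_squared(0.90, n=101, p=50)
print(round(few, 4), round(many, 4))
```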
After submitting my prediction, I was only in the 13th percentile for accuracy. This is most likely
because the ratio of variables to observations is high, which leads to overfitting. To account for
this, I will attempt to make a “regularized” linear model using the caret package in R, but in order
to do so I need to convert the factor variables into dummy variables.
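Penalized regression routines work on a numeric matrix, so each factor has to be expanded into 0/1 indicator columns. A minimal Python sketch of that expansion follows (in R, caret’s dummyVars() or base model.matrix() serve this purpose). CentralAir, a yes/no variable in the Ames data, is the example, but the rows are made up and the helper name is hypothetical. Dropping the first level keeps an unpenalized model with an intercept from becoming perfectly collinear; penalized fits can also keep every level.

```python
def one_hot(rows, column, drop_first=True):
    """Replace a categorical column with 0/1 indicator columns named '<column>_<level>'."""
    levels = sorted({row[column] for row in rows})
    if drop_first:
        levels = levels[1:]  # first level becomes the baseline (all indicators 0)
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Made-up rows; CentralAir is a yes/no factor in the Ames data.
rows = [
    {"CentralAir": "Y", "GrLivArea": 1000},
    {"CentralAir": "N", "GrLivArea": 800},
    {"CentralAir": "Y", "GrLivArea": 1200},
]
print(one_hot(rows, "CentralAir"))
```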
Regularized Linear Model
Regularizing my model greatly improved my accuracy (I’m now in the 59th percentile on Kaggle’s
leaderboard).
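The write-up doesn’t say which penalty the caret model used (ridge, lasso, and elastic net are all common choices). The sketch below shows ridge regression, one form of regularization, in self-contained Python. Predictors are standardized and the response is centered so the intercept isn’t penalized, and the coefficients solve (ZᵀZ + λI)b = Zᵀyc. The λ values are picked by hand for illustration, whereas caret’s train() would tune the penalty by resampling. It assumes no column is constant, since a zero standard deviation can’t be standardized.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression on standardized predictors with an unpenalized intercept.

    Minimizes sum((yc - Z b)^2) + lam * sum(b^2), where Z is X standardized
    column by column and yc is y centered. Assumes no column of X is constant."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(k)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    ztz = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    zty = [sum(Z[i][a] * yc[i] for i in range(n)) for a in range(k)]
    coefs = solve(ztz, zty)  # coefficients on the standardized scale

    def predict(row):
        return y_mean + sum(c * (row[j] - means[j]) / sds[j] for j, c in enumerate(coefs))

    return coefs, predict


# Toy data where y = 2x exactly; lam = 0 reduces to ordinary least squares.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
ols_coefs, ols_predict = ridge_fit(X, y, lam=0.0)
ridge_coefs, ridge_predict = ridge_fit(X, y, lam=10.0)
print(ols_coefs, ridge_coefs)
```

A larger λ pulls the coefficients toward zero, trading a little bias for lower variance; that trade is what helps when there are many predictors relative to observations.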