Mohamed, Hassan Mohamed Hussein
Business administration department
Faculty of Commerce
Cairo University
Egypt
2016
Data screening and cleaning
Agenda
• Importance
• Data screening steps
• Data cleaning
  • Missing data
  • Normality
  • Linearity
  • Outliers
  • Multicollinearity
  • Homoscedasticity
Hassan Mohamed Cairo University- Statistical
Package, 2016
Importance.
Where should you clean your data in your
research process?
• Data cleaning and screening is the step that directly
follows data entry; you must not start your analysis
without doing it.
• Why data screening matters:
  • It is very easy to make mistakes when entering data.
  • Some errors can mess up your analysis.
  • So it is important to spend the time checking for
  mistakes at the start, rather than trying to repair the
  damage later. Ask another person to check your data too.
Data screening steps
1) Look for abnormal data (out-of-range values) in the
frequencies table.
2) Go back to the original questionnaire and
correct them.
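The two steps above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure itself: the respondent IDs, the responses and the legal range of 1 to 5 are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical responses to one 5-point item, keyed by respondent ID.
responses = {101: 4, 102: 5, 103: 55, 104: 3, 105: 0, 106: 4}
VALID_RANGE = range(1, 6)  # assumed legal codes: 1 to 5

# Step 1: frequency table of every value that was entered.
frequencies = Counter(responses.values())

# Values outside the legal range, with the IDs needed to find the questionnaire.
out_of_range = {rid: v for rid, v in responses.items() if v not in VALID_RANGE}

for value in sorted(frequencies):
    print(value, frequencies[value])

# Step 2: these are the questionnaires to go back to.
print("check questionnaires for IDs:", sorted(out_of_range))
```

Values such as 55 or 0 stand out in the frequency table because they cannot occur on the assumed scale.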
Data cleaning
• Data cleaning includes:
  • Missing data
  • Normality
  • Linearity
  • Outliers
  • Multicollinearity
  • Homoscedasticity
Missing data
- If missing data come from data entry:
• You can detect them from the frequencies of the variable
(the number of missing values).
• Then sort your data in ascending or descending order.
• This gives you the IDs of the cases with missing values.
• Go back to the questionnaires and try to fill them in.
• Run your descriptive analysis again.
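As a rough sketch of the same routine outside SPSS, the snippet below counts the missing values per variable and lists the IDs to look up. The data set, the variable names and the helper `missing_ids` are hypothetical.

```python
# Hypothetical data set: one dict per respondent; None marks a blank cell.
rows = [
    {"id": 1, "age": 34, "income": 5200},
    {"id": 2, "age": None, "income": 4100},
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": None, "income": 3900},
]

def missing_ids(rows, variable):
    """Return the sorted IDs of cases whose value for `variable` is missing."""
    return sorted(r["id"] for r in rows if r[variable] is None)

for variable in ("age", "income"):
    ids = missing_ids(rows, variable)
    print(f"{variable}: {len(ids)} missing, IDs {ids}")
```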
Missing data (cont.)
- If the missing data come from respondent errors:
• The respondent's answer was ambiguous.
• The respondent forgot to answer the question.
• And the missing data are more than 10% of the total
values of the variable that has missing data, then
do not treat (replace) the missing data.
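The 10% cut-off can be checked per variable with a short calculation. The sketch below assumes made-up variables of 20 cases each; how to handle a variable with exactly 10% missing is not stated on the slide, so the code treats it as below the threshold.

```python
def percent_missing(values):
    """Share of missing (None) entries in one variable, as a percentage."""
    return 100 * sum(v is None for v in values) / len(values)

# Hypothetical variables with 20 cases each.
data = {
    "satisfaction": [4, 5, None, 3] + [4] * 16,         # 1 of 20 missing
    "income": [None, None, None, 5200] + [4000] * 16,   # 3 of 20 missing
}

THRESHOLD = 10.0  # the slide's cut-off, in percent

for name, values in data.items():
    pct = percent_missing(values)
    # Exactly 10% is an assumption here: it is counted as "may be treated".
    verdict = "above 10%: do not treat" if pct > THRESHOLD else "may be treated"
    print(f"{name}: {pct:.1f}% missing, {verdict}")
```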
Missing data (cont.)
• If the missing values are less than 10%,
you can deal with them in one of these ways:
1. Substitute a neutral value (Malhotra, 2010).
2. Substitute an imputed value (Hair et al., 2010):
• Imputation using only valid data:
  • Complete data: exclude cases listwise (least preferable
  under 10% of missing data).
  • All available data.
Missing data (cont.)
• Imputation using known replacement values:
  • Case substitution.
  • Hot and cold deck imputation (most similar case, or
  best known value).
• Imputation by calculating replacement values: replace
with...
  • Mean substitution.
  • Regression imputation (prediction equation from the
  valid data).
• The replace-with-mean option should never be used, as it can severely
distort the results of your analysis.
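One way a calculated replacement value can distort results is easy to show with mean substitution: the mean stays the same, but the variance shrinks because every gap receives an identical value. The scores below are invented for the illustration.

```python
from statistics import mean, variance

# Hypothetical scores; None marks a missing answer.
scores = [2, 4, None, 6, 8, None, 10]

observed = [s for s in scores if s is not None]
fill = mean(observed)

# Mean substitution: every missing case receives the mean of the valid cases.
imputed = [fill if s is None else s for s in scores]

print("mean before and after:", mean(observed), mean(imputed))
print("variance of observed values:", variance(observed))
print("variance after mean substitution:", variance(imputed))
```

A smaller variance also weakens correlations with other variables, which is why the distortion grows with the number of missing values.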
Missing data (cont.)
Or
• Exclude cases pairwise (recommended):
  • Excludes a case only if it is missing the
  data required for the specific analysis. The case is still
  included in any other analysis for which it has the data (Pallant, 2011).
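The difference between listwise and pairwise exclusion shows up in the number of cases each analysis keeps. The following sketch uses an invented five-case data set; the function names `listwise` and `pairwise` are illustrative, not SPSS commands.

```python
# Hypothetical data set; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 5200, "satisfaction": 4},
    {"id": 2, "age": None, "income": 4100, "satisfaction": 5},
    {"id": 3, "age": 29, "income": None, "satisfaction": 3},
    {"id": 4, "age": 41, "income": 3900, "satisfaction": None},
    {"id": 5, "age": 38, "income": 4700, "satisfaction": 4},
]
VARIABLES = ("age", "income", "satisfaction")

def listwise(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(r[v] is not None for v in VARIABLES)]

def pairwise(rows, needed):
    """Keep cases that are complete on the variables this analysis needs."""
    return [r for r in rows if all(r[v] is not None for v in needed)]

print("listwise N:", len(listwise(rows)))
print("pairwise N, age x income:", len(pairwise(rows, ("age", "income"))))
print("pairwise N, income x satisfaction:",
      len(pairwise(rows, ("income", "satisfaction"))))
```

Pairwise exclusion keeps more cases per analysis, at the price of a sample size that changes from one analysis to the next.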
Normality
• Normality refers to the shape of the data distribution for an
individual metric variable.
• "Normal" describes a symmetrical, bell-shaped curve,
which has the greatest frequency of scores in the
middle with smaller frequencies towards the extremes.
• It is an assumption of parametric analyses.
• Departures from normality can be treated as negligible if the
sample size is more than 50 respondents.
Normality (Cont.)
• Normality measures:
  • Kurtosis:
    • Peakedness (leptokurtic) or flatness (platykurtic) of
    the distribution compared with the normal distribution.
    • In a normal distribution the kurtosis value (excess kurtosis,
    as SPSS reports it) is zero; values within ±10 are acceptable.
  • Skewness:
    • The balance (symmetry) of the distribution.
    • Positive skew (right-skewed: scores cluster at the low end, with
    a long tail to the right) or negative skew (left-skewed: scores
    cluster at the high end, with a long tail to the left).
    • In a normal distribution the skewness value is zero;
    values within ±3 are acceptable.
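Both measures can be computed from the central moments of the data. The sketch below uses the simple moment-based formulas, which is an assumption: statistical packages such as SPSS report bias-adjusted versions, so their values will differ slightly in small samples. The scores are invented.

```python
from statistics import mean

def moment(values, k):
    """k-th central moment (sum of k-th powers of deviations, divided by n)."""
    m = mean(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: zero for a symmetrical distribution."""
    return moment(values, 3) / moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: zero for a normal distribution."""
    return moment(values, 4) / moment(values, 2) ** 2 - 3

# Hypothetical scores with one large value stretching the right tail.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 12]

# Compare each measure with the slide's limits of 3 and 10.
for name, value, limit in (
    ("skewness", skewness(scores), 3),
    ("excess kurtosis", excess_kurtosis(scores), 10),
):
    status = "within" if abs(value) <= limit else "outside"
    print(f"{name}: {value:.2f} ({status} the limit of {limit})")
```

The positive skewness reflects the long right tail created by the score of 12.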
Normality (Cont.)
• Compare the 5% trimmed mean with the mean: if the two are close,
extreme scores have little influence on the mean.
• Kolmogorov-Smirnov and Shapiro-Wilk significance values of more
than 0.05 indicate normality, but these tests are very sensitive
when the sample size is more than 200.
• Look for the bell shape in the histogram.
Transformation can fix a non-normal
distribution.
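The comparison of the 5% trimmed mean with the ordinary mean can be sketched as follows. This version simply drops whole cases from each tail, which is a simplification and may not match the exact computation of a given statistical package; the scores are hypothetical.

```python
from statistics import mean

def trimmed_mean(values, percent=5):
    """Mean after dropping `percent` percent of the cases from each tail."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # whole cases removed per tail
    return mean(ordered[k:len(ordered) - k])

# Hypothetical scores: 1 to 19 plus one extreme value.
scores = list(range(1, 20)) + [100]

print("mean:", mean(scores))
print("5% trimmed mean:", trimmed_mean(scores))
```

A large gap between the two figures suggests that extreme scores are pulling the mean and deserve a closer look.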
Linearity
• Linearity is an assumption of multivariate techniques based on
correlational measures of association, including multiple regression
(Hair et al., 2010).
• The relationship between the two variables should be
linear. This means that when you look at a scatterplot
of scores you should see a straight line (roughly), not
a curve (curvilinear) (Pallant, 2011).
• Transformation can overcome the curvilinearity issue
(Hair et al., 2010).
Linearity (cont.)
• So you need not transform your data to correct a non-normal
distribution if your sample is larger than 50.
• But you should transform the data to correct
curvilinearity.
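A small example shows how a transformation can straighten a curved relationship. The data are invented (y doubles with every step of x), the log transformation is just one possible choice, and `pearson_r` is a hand-written helper rather than a library call.

```python
from math import log, sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical curvilinear relationship: y grows exponentially with x.
x = list(range(1, 11))
y = [2 ** v for v in x]

r_raw = pearson_r(x, y)                    # curve: r understates the relationship
r_log = pearson_r(x, [log(v) for v in y])  # after log transformation: straight line

print(f"r before transformation: {r_raw:.3f}")
print(f"r after log transformation: {r_log:.3f}")
```

A scatterplot of the transformed scores remains the primary check; the correlation is only a convenient summary.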
Outliers
• These are cases with extreme scores, which therefore
have a much greater impact on the outcome of any
statistical analysis.
• An outlier is not necessarily an error in your data, but it can make
your data unrepresentative of the population (e.g. income).
• Outliers can be detected using box plots.
• Outliers come from (Hair et al., 2010; Tabachnick &
Fidell, 1996):
  • A mistake in data entry (a 6 was entered as
  66, etc.).
  • The missing-value code was not specified, so missing
  values are being read as real entries (e.g. 99 in SPSS).
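The points that a box plot draws separately can also be found numerically. The sketch below assumes the common rule of 1.5 box lengths (interquartile ranges) beyond the quartiles; the incomes are invented, and quartile definitions vary slightly between packages.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Values lying more than k box lengths (IQRs) outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical incomes (in hundreds); one respondent earns far more.
incomes = [21, 23, 24, 25, 26, 27, 28, 29, 30, 95]

outliers = boxplot_outliers(incomes)
print("outliers:", outliers)
```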
Outliers (cont.)
• Outliers come from (continued) (Hair et al., 2010; Tabachnick &
Fidell, 1996):
  • The outlier is not part of the population from which you
  intended to sample:
    • Extraordinary event (remove it).
    • Extraordinary observation (decide
    based on your valid cases; lean towards eliminating it).
    • Neutral value for all variables (lean towards retaining it).
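An undeclared missing-value code is one of the easiest sources of outliers to demonstrate. In the sketch below, the code 99, the 1 to 7 scale and the answers are assumptions made up for the example.

```python
from statistics import mean

MISSING_CODE = 99  # assumed code typed at data entry for "no answer"

# Hypothetical answers on a 1 to 7 scale; two respondents skipped the item.
raw = [5, 6, 99, 4, 7, 99, 5]

# If the code is not declared, the 99s are analysed as real, extreme scores.
naive_mean = mean(raw)

# Declaring the code: turn 99 into None and analyse the valid values only.
declared = [None if v == MISSING_CODE else v for v in raw]
valid = [v for v in declared if v is not None]

print(f"mean with 99 read as a score: {naive_mean:.2f}")
print(f"mean of valid answers: {mean(valid):.2f}")
```

A mean far above the top of the scale is a quick sign that a missing-value code is being read as data.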
Outliers (cont.)
• The outlier is part of the population you wanted, but in the
distribution it appears as an extreme case.
• In this case you have three choices:
1) Delete the extreme cases.
2) Change the outliers' scores so that they are still extreme
but fit within a normal distribution (for example, make
each one a unit larger or smaller than the last case that fits in the
distribution).
3) If the outliers seem to be part of an overall non-normal
distribution, then a transformation can be done, but first
check for normality.
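The second choice, changing an outlier's score so that it stays extreme but sits next to the rest of the distribution, might look like this. The cut-offs `low` and `high` are assumed to come from an earlier outlier check (for example box plot limits), and the incomes and the helper `pull_in` are hypothetical.

```python
def pull_in(values, low, high, unit=1):
    """Replace scores outside [low, high] with a value one unit beyond
    the most extreme score that lies inside the range."""
    inside = [v for v in values if low <= v <= high]
    floor, ceiling = min(inside) - unit, max(inside) + unit
    return [floor if v < low else ceiling if v > high else v for v in values]

# Hypothetical incomes; 95 was flagged as an outlier.
incomes = [21, 23, 24, 25, 26, 27, 28, 29, 30, 95]

# Assumed cut-offs below and above which a score counts as an outlier.
adjusted = pull_in(incomes, low=17.5, high=35.5)
print(adjusted)
```

The adjusted case is still the highest score, but it no longer dominates the mean and variance.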
Outliers (cont.)
• Outliers should be retained to ensure
generalizability to the population, unless they are not
representative of the population.
• So, again, you need not transform your data to correct a
non-normal distribution if your sample is larger than 50.
• But you should transform the data to reduce the influence of outliers.
Thank You
Week-01-2.ppt BBB human Computer interaction
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 

Data cleaning and screening

  • 1. Mohamed, Hassan Mohamed Hussein. Business Administration Department, Faculty of Commerce, Cairo University, Egypt, 2016. Data screening and cleaning
  • 2. Agenda
      - Importance
      - Data screening steps
      - Data cleaning
      - Missing data
      - Normality
      - Linearity
      - Outliers
      - Multicollinearity
      - Homoscedasticity
  • 3. Importance
      - Where should you clean your data in the research process?
          - Data screening and cleaning is the step that directly follows data entry; do not start your analysis until it is done.
      - Why data screening matters:
          - It is very easy to make mistakes when entering data.
          - Some errors can mess up your analysis.
          - It is therefore better to spend time checking for mistakes at the start than to try to repair the damage later. Ask another person to check your data.
  • 4. Data screening steps
      1) Check the frequencies table for abnormal data (out-of-range values).
      2) Go back to the original questionnaire and correct them.
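The two screening steps can be sketched outside SPSS as well. The example below is only an illustration under stated assumptions: the respondent IDs, the answers, and the 1 to 5 valid range are hypothetical, not taken from the slides.

```python
from collections import Counter

# Hypothetical answers to a 5-point item, keyed by respondent ID.
responses = {101: 4, 102: 5, 103: 55, 104: 2, 105: 0, 106: 3}
VALID_RANGE = range(1, 6)  # assumed valid codes: 1 to 5

# Step 1: build a frequency table and flag values outside the valid range.
frequencies = Counter(responses.values())
for value in sorted(frequencies):
    flag = "" if value in VALID_RANGE else "  <- out of range"
    print(f"{value:>3}: {frequencies[value]}{flag}")

# Step 2: list the IDs whose questionnaires need to be checked again.
out_of_range_ids = sorted(
    rid for rid, value in responses.items() if value not in VALID_RANGE
)
print("Check questionnaires for IDs:", out_of_range_ids)
```

The corrections themselves still have to come from the original questionnaires; the code only tells you where to look.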
  • 5. Data cleaning
      - Data cleaning includes:
          - Missing data
          - Normality
          - Linearity
          - Outliers
          - Multicollinearity
          - Homoscedasticity
  • 6. Missing data
      - If the missing data come from data entry:
          - Detect them from the frequencies of the variable (the number of missing values).
          - Sort your data in ascending or descending order.
          - This gives you the IDs of the cases with missing values.
          - Go back to the source and try to fill them in.
          - Run your descriptive analysis again.
  • 7. Missing data (cont.)
      - If the missing data come from respondent errors:
          - The respondent's answer was ambiguous.
          - The respondent forgot to answer the question.
      - and the missing data are more than 10% of the total values of that variable, then do not treat the missing data.
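Finding the cases with missing values and checking them against the 10% cut-off could look roughly like this. It is a minimal sketch: the variable, the IDs, and the use of `None` as the missing-value marker are assumptions made for illustration.

```python
# Hypothetical variable: respondent ID -> answer, with None marking a missing value.
income = {1: 3200, 2: None, 3: 4100, 4: 2800, 5: None,
          6: 3900, 7: 3000, 8: 3600, 9: 4500, 10: 3100}

# IDs of the cases to look up in the original questionnaires.
missing_ids = sorted(rid for rid, value in income.items() if value is None)

# Share of missing values for this variable, compared with the 10% cut-off.
missing_share = len(missing_ids) / len(income)
THRESHOLD = 0.10
exceeds_threshold = missing_share > THRESHOLD

print("Missing IDs:", missing_ids)
print(f"Missing share: {missing_share:.0%}, above 10%: {exceeds_threshold}")
```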
  • 8. Missing data (cont.)
      - If the missing values are less than 10%, you can deal with them:
          1. Substitute a neutral value (Malhotra, 2010).
          2. Substitute an imputed value (Hair et al., 2010):
              - Imputation using only valid data (exclude cases listwise):
                  - Complete data (least preferable when less than 10% of the data are missing).
                  - All available data.
  • 9. Missing data (cont.)
      - Imputation using known replacement values:
          - Case substitution.
          - Hot and cold deck imputation (most similar case, or best known value).
      - Imputation by calculating replacement values (replace with…):
          - Mean substitution.
          - Regression imputation (prediction equation from the valid data).
          - This option should never be used, as it can severely distort the results of your analysis.
  • 10. Missing data (cont.)
      - Or exclude cases pairwise (recommended):
          - A case is excluded only from the analyses for which it is missing required data; it is still included in any other analysis (Pallant, 2011).
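The difference between listwise exclusion, pairwise exclusion, and mean substitution is easier to see on a tiny data set. The sketch below uses hypothetical cases and helper names (`complete_cases`, `mean_substitute`) that are not part of SPSS or of the slides.

```python
from statistics import mean

# Hypothetical data set: each row is a case; None marks a missing value.
cases = [
    {"id": 1, "age": 25, "income": 3000, "score": 4},
    {"id": 2, "age": 31, "income": None, "score": 5},
    {"id": 3, "age": None, "income": 4200, "score": 3},
    {"id": 4, "age": 40, "income": 3900, "score": 2},
]
variables = ["age", "income", "score"]

def complete_cases(rows, names):
    """Keep the cases that have a value for every variable in `names`."""
    return [r for r in rows if all(r[n] is not None for n in names)]

# Listwise: a case must be complete on ALL variables to be used anywhere.
listwise_ids = [r["id"] for r in complete_cases(cases, variables)]

# Pairwise: a case only needs the variables of the analysis at hand,
# here a hypothetical analysis of age and score.
pairwise_ids = [r["id"] for r in complete_cases(cases, ["age", "score"])]

def mean_substitute(rows, name):
    """Replace missing values of one variable with the mean of its valid values."""
    fill = mean(r[name] for r in rows if r[name] is not None)
    return [r[name] if r[name] is not None else fill for r in rows]

income_filled = mean_substitute(cases, "income")

print("Listwise keeps:", listwise_ids)
print("Pairwise (age, score) keeps:", pairwise_ids)
print("Income after mean substitution:", income_filled)
```

Pairwise exclusion keeps case 2 for the age and score analysis even though its income is missing, which is why it loses fewer cases than listwise exclusion.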
  • 11. Normality
      - The shape of the data distribution for an individual metric variable.
      - The term describes a symmetrical, bell-shaped curve, with the greatest frequency of scores in the middle and smaller frequencies towards the extremes.
      - It is required for any parametric analysis.
      - Departures from normality can be treated as negligible if the sample size is more than 50 respondents.
  • 12. Normality (cont.)
      - Normality measures:
          - Kurtosis:
              - The peakedness (leptokurtic) or flatness (platykurtic) of the distribution compared with the normal distribution.
              - In a normal distribution the kurtosis value is zero (values within ±10 are allowed).
          - Skewness:
              - The balance (symmetry) of the distribution.
              - Positive skew (right-skewed, with the tail towards high values) or negative skew (left-skewed, with the tail towards low values).
              - In a normal distribution the skewness value is zero (values within ±3 are allowed).
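Skewness and kurtosis can be computed by hand to see what the two measures capture. The sketch below uses the simple moment-based formulas, which is an assumption on our part: SPSS applies small-sample corrections, so its output will differ slightly. The scores are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: zero for a symmetrical distribution."""
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores zero."""
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical scores with one very high value (a long tail to the right).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]

skew = skewness(scores)
kurt = excess_kurtosis(scores)

# Cut-offs from the slides: skewness within ±3, kurtosis within ±10.
acceptable = abs(skew) <= 3 and abs(kurt) <= 10
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}, acceptable={acceptable}")
```

The long right tail gives a positive skewness value, which matches the definition of positive skew above.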
  • 13. Normality (cont.)
      - Compare the 5% trimmed mean with the mean; a large difference signals that extreme scores are influencing the mean.
      - Kolmogorov-Smirnov and Shapiro-Wilk significance values above 0.05 indicate normality, but these tests are very sensitive when the sample size is more than 200.
      - The histogram should form a bell shape.
      - Transformation can fix a non-normal distribution.
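The comparison between the mean and the 5% trimmed mean can be illustrated as follows. This is a simplified sketch: it drops whole cases from each tail, which may not match the exact weighting SPSS uses, and the scores are hypothetical.

```python
from statistics import fmean

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of cases from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # whole cases dropped from each end
    return fmean(ordered[k:len(ordered) - k])

# Hypothetical scores: 1 to 19 plus one extreme value of 100.
scores = list(range(1, 20)) + [100]

plain = fmean(scores)
trimmed = trimmed_mean(scores)
print(f"mean={plain}, 5% trimmed mean={trimmed}")
```

A gap this large between the two values suggests that the extreme score is pulling the mean upward and deserves a closer look.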
  • 14. Linearity
      - It is an assumption of multivariate techniques based on correlational measures of association, including multiple regression (Hair et al., 2010).
      - The relationship between the two variables should be linear: a scatterplot of the scores should show roughly a straight line, not a curve (curvilinear) (Pallant, 2011).
      - Transformation can overcome curvilinearity (Hair et al., 2010).
  • 15. Linearity (cont.)
      - So you should not transform your data to correct a non-normal distribution if your sample is larger than 50.
      - But you should transform the data to correct curvilinearity.
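How a transformation can straighten a curved relationship is shown in the sketch below. The slides do not say which transformation to use; the logarithm here is only one common choice, and the data are constructed for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a measure of linear association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical curvilinear relationship: y doubles with every step of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** v for v in x]

# Log-transforming y turns the curve into a straight line.
y_log = [math.log2(v) for v in y]

r_raw = pearson_r(x, y)
r_log = pearson_r(x, y_log)
print(f"r before transformation: {r_raw:.3f}")
print(f"r after transformation:  {r_log:.3f}")
```

In practice the scatterplot should be inspected before and after the transformation, because a higher correlation alone does not prove linearity.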
  • 16. Outliers
      - Cases with extreme scores, which therefore have a much higher impact on the outcome of any statistical analysis.
      - An outlier is not necessarily an error in your data, but it makes your data unrepresentative of the population (for example, income).
      - They can be detected using box plots.
      - Outliers come from (Hair et al., 2010; Tabachnick & Fidell, 1996):
          - A mistake in data entry (a 6 entered as 66, etc.).
          - A missing-values code that was not specified, so missing values are read as real scores (for example, 99 in SPSS).
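A box plot flags points that lie far beyond the quartiles. The sketch below applies the common 1.5 × IQR rule, which is an assumption: the slides only mention box plots, and SPSS computes its box-plot hinges slightly differently from `statistics.quantiles`. The incomes are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values beyond k * IQR from the quartiles, plus the fences."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < low or v > high]
    return flagged, (low, high)

# Hypothetical incomes (in thousands) with one extreme case.
incomes = [22, 25, 27, 28, 30, 31, 33, 35, 36, 120]

outliers, fences = iqr_outliers(incomes)
print("Fences:", fences)
print("Outliers:", outliers)
```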
  • 17. Outliers (cont.)
      - Outliers come from (Hair et al., 2010; Tabachnick & Fidell, 1996), continued:
          - The outlier is not part of the population from which you intended to sample:
              - Extraordinary event (remove it).
              - Extraordinary observation (decide based on your valid cases; lean towards eliminating it).
              - Neutral value for all variables (lean towards retaining it).
  • 18. Outliers (cont.)
      - The outlier is part of the population you wanted, but in the distribution it appears as an extreme case.
      - In this case you have three choices:
          1) Delete the extreme cases.
          2) Change the outliers' scores so that they are still extreme but fit within a normal distribution (for example, make each one unit larger or smaller than the last case that fits in the distribution).
          3) If the outliers seem to be part of an overall non-normal distribution, then a transformation can be done, but check for normality first.
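The second choice, changing the outliers' scores, might be sketched like this. The helper name `pull_in_outliers`, the fence values, and the incomes are hypothetical; how you decide which cases count as outliers is left to your own screening.

```python
def pull_in_outliers(values, low, high, unit=1):
    """Move each outlier to one unit beyond the most extreme non-outlying score."""
    inliers = [v for v in values if low <= v <= high]
    floor = min(inliers) - unit    # new score for outliers below `low`
    ceiling = max(inliers) + unit  # new score for outliers above `high`
    adjusted = []
    for v in values:
        if v > high:
            adjusted.append(ceiling)
        elif v < low:
            adjusted.append(floor)
        else:
            adjusted.append(v)
    return adjusted

# Hypothetical incomes; the fences 13 and 48 are assumed to come from an
# earlier outlier check.
incomes = [22, 25, 27, 28, 30, 31, 33, 35, 36, 120]
adjusted = pull_in_outliers(incomes, low=13.0, high=48.0)
print(adjusted)
```

The adjusted case remains the highest score in the data, so its rank is preserved while its pull on the mean is reduced.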
  • 19. Outliers (cont.)
      - Outliers should be retained to ensure generalizability to the population, unless they are not representative of the population.
      - So, again, you should not transform your data to correct a non-normal distribution if your sample is larger than 50.
      - But you should transform the data to deal with outliers.
  • 20. Thank You