A Brief Introduction to the 12 Steps of Evaluation Data Cleaning
Jennifer Ann Morrow, Ph.D.
University of Tennessee
Importance of Cleaning Data
• As evaluators we need our evaluation data to
be:
– Accurate
– Complete
– High quality
– Reliable
– Unbiased
– Valid
If We Don’t Clean Our Data
• Problems that can occur:
– Inaccurate/biased conclusions
– Increased error
– Reduced credibility
– Reduced generalizability
– Violation of statistical assumptions
1: Create a Data Codebook
• Contains all relevant information for your
evaluation project
• Suggestions for what to include:
– Electronic file names
– Variable names, variable labels, value labels
– Complete list of modified variables
– Citations for instrument sources
– Project diary
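A codebook can also be kept in a machine-readable form so that later cleaning steps can check data against it. The following is a minimal sketch, not a prescribed format: the variable names, labels, codes, and the `valid_values` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical codebook entries; every name, label, and code below is illustrative.
CODEBOOK = {
    "satisfaction": {
        "label": "Overall satisfaction with the program",
        "values": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                   4: "Satisfied", 5: "Very satisfied"},
        "missing_codes": {9},  # assumed code for "no response"
        "source": "Locally developed item",
        "modified": False,
    },
    "group": {
        "label": "Program condition",
        "values": {0: "Comparison", 1: "Treatment"},
        "missing_codes": set(),
        "source": "Enrollment records",
        "modified": False,
    },
}


def valid_values(variable):
    """Return the set of legitimate, non-missing codes for a variable."""
    return set(CODEBOOK[variable]["values"])


print(valid_values("satisfaction"))
```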
2: Create a Data Analysis Plan
• Your analysis plan should list each step you
will take when analyzing your data
• Suggestions for what to include:
– General instructions for data analysts
– List of datasets
– Evaluation questions
– Variables used for each analysis
– Specific analyses and graphics for each
evaluation question
3: Perform Initial Frequencies – Round 1
• After organizing your codebook and analysis
plan you can begin the data
cleaning process
• Conduct frequency analyses (frequencies,
percentages) for EVERY variable in your
evaluation dataset
• Suggestion:
– Request a graphic (bar chart or histogram) for
each variable
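A frequency analysis can be sketched in a few lines. The example below assumes a variable stored as a plain Python list with `None` marking missing responses; the `frequency_table` helper and the sample ratings are hypothetical. It prints counts, percentages, and a crude text bar chart.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows sorted by value; None (missing) is listed last."""
    counts = Counter(values)
    total = len(values)
    keys = sorted(k for k in counts if k is not None)
    if None in counts:
        keys.append(None)
    return [(k, counts[k], round(100 * counts[k] / total, 1)) for k in keys]


# Illustrative data: a 1-5 rating item with one stray 9 and one missing response.
ratings = [1, 2, 2, 3, 5, 5, 5, 9, None, 4]
for value, count, pct in frequency_table(ratings):
    print(f"{str(value):>5} {count:>3} {pct:>5.1f}% {'#' * count}")
```

Even this small table makes the stray value of 9 visible, which is the point of running frequencies on every variable before anything else.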
4: Check for Coding Mistakes
• Coding errors are any values that are not
within the specified range for your variable
(e.g., you have a rating scale from 1-5 and
you have a value of 9)
• Suggestions:
– Compare all values to what is listed in your
codebook
– In many cases errors are unspecified missing
data values
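The range check described above can be sketched as follows, assuming the valid codes for each variable are known from the codebook. The `find_coding_errors` helper and the sample data are hypothetical.

```python
def find_coding_errors(values, valid):
    """Return (row_index, value) pairs for non-missing values outside the valid set."""
    return [(i, v) for i, v in enumerate(values)
            if v is not None and v not in valid]


# Illustrative data: a 1-5 rating scale that contains a value of 9.
ratings = [1, 2, 2, 3, 5, 5, 5, 9, None, 4]
errors = find_coding_errors(ratings, valid=set(range(1, 6)))
print(errors)
```

Whether a flagged 9 is a typo or an unlabeled missing-data code is a judgment call that has to be settled against the original data source, not by the script.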
5: Modify and Create Variables
• It is now time to modify your variables so
they can be used in your planned analyses
• Suggestions:
– Reverse code any variables that need to be
merged with others that are on the opposite
scale
– Recode any variables to match your codebook
– Create new variables (e.g., averages, total
scores) to be used for future analyses
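Reverse coding and composite scores can be illustrated with a short sketch. It assumes a 1-5 scale and uses `None` for missing responses; the `reverse_code` and `scale_mean` helpers are hypothetical names, and averaging only the non-missing items is one possible rule among several.

```python
def reverse_code(value, low=1, high=5):
    """Flip a score on a low..high scale (1 becomes 5, 2 becomes 4, ...)."""
    return None if value is None else low + high - value


def scale_mean(items):
    """Average the non-missing items; return None if every item is missing."""
    present = [v for v in items if v is not None]
    return sum(present) / len(present) if present else None


# Illustrative respondent: the second item is negatively worded, so it is reversed.
item1, item2, item3 = 4, 2, None
composite = scale_mean([item1, reverse_code(item2), item3])
print(composite)
```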
6: Frequencies and Descriptives – Round 2
• At this step you conduct frequency analyses
on every variable and descriptive analyses
on every continuous variable
• Suggestions:
– Review the following descriptives: mean,
median, mode, standard deviation, skewness,
kurtosis, minimum, and maximum
– Create standardized scores (i.e., Z-scores) for
every continuous variable
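Several of these descriptives, and the standardized scores, can be computed with Python's standard `statistics` module. This is a minimal sketch for one continuous variable with no missing values; `describe` and `z_scores` are hypothetical helper names, the sample standard deviation (n - 1 denominator) is assumed, and skewness and kurtosis are left out of this sketch.

```python
import statistics


def describe(values):
    """Return basic descriptives for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


def z_scores(values):
    """Standardize each value: (value - mean) / sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


scores = [2, 4, 4, 6]  # illustrative data
print(describe(scores))
print(z_scores(scores))
```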
7: Search for Outliers
• Review your standardized scores and
histograms to check for outliers
• Outliers are scores that deviate greatly from
the mean (e.g., |z| > 3.29, i.e., more than 3.29 standard
deviations from the mean) and can potentially create or
cover up statistical significance
• Suggestions:
– Delete, transform, or alter (winsorize, trim,
modify) your outliers
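The 3.29 standard deviation rule can be sketched as below. The helper names and data are hypothetical, and the `winsorize` function shows only one simple variant (pulling each flagged score to the most extreme unflagged score); other definitions of winsorizing exist.

```python
import statistics


def flag_outliers(values, cutoff=3.29):
    """Return the indices of values whose |z| exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:  # no spread, so nothing can be an outlier
        return []
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > cutoff]


def winsorize(values, cutoff=3.29):
    """Pull flagged values in to the most extreme unflagged value (one simple variant)."""
    flagged = set(flag_outliers(values, cutoff))
    kept = [v for i, v in enumerate(values) if i not in flagged]
    low, high = min(kept), max(kept)
    return [min(max(v, low), high) for v in values]


# Illustrative data: 19 typical scores and one extreme score.
scores = [10] * 19 + [100]
print(flag_outliers(scores))
print(winsorize(scores)[-1])
```

Note that with small samples no score can reach |z| > 3.29, so the rule is only informative when there are enough cases.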
8: Assess for Normality
• For many inferential statistics (e.g., analyses
of variance, regressions) your outcome
(dependent) variable should be approximately normally
distributed (e.g., mean, median, and mode roughly equal)
• Suggestions:
– Check whether the absolute values of your skewness and
kurtosis are greater than 2
– Transform the variable, use a non-parametric
analysis, or modify your alpha level
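Skewness and kurtosis can be computed directly from central moments. The sketch below uses the simple population-moment formulas and reports excess kurtosis (0 for a normal distribution); this is an assumption, since statistical packages often apply bias-corrected sample formulas, so their values can differ somewhat. All function names are hypothetical.

```python
def _central_moment(values, order):
    """Average of (value - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Population skewness: m3 / m2 ** 1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2 ** 2 - 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


def looks_non_normal(values, cutoff=2):
    """Apply the rule of thumb: |skewness| or |kurtosis| above the cutoff."""
    return abs(skewness(values)) > cutoff or abs(excess_kurtosis(values)) > cutoff


symmetric = [1, 2, 3, 4, 5]  # illustrative data
print(skewness(symmetric), excess_kurtosis(symmetric), looks_non_normal(symmetric))
```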
9: Dealing with Missing Data
• You should always check to see if missing
data is random or non-random (i.e.,
patterns of missing data)
• If missing data are ignored, evaluation results can be
misleading and less generalizable
• Suggestions:
– Delete cases/variables with missing data,
estimate missing data, conduct analyses with
and without modifying variables
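A first look at missing data can be sketched as follows, assuming each case is a dictionary and `None` marks a missing value. The records and helper names are hypothetical. The sketch reports the proportion missing per variable and performs listwise deletion; it does not test whether missingness is random, which requires comparing cases with and without missing values on other variables.

```python
# Illustrative records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 5},
    {"id": 3, "age": 29, "satisfaction": None},
    {"id": 4, "age": 41, "satisfaction": 3},
]


def missing_rates(rows, variables):
    """Return the proportion of cases missing each variable."""
    n = len(rows)
    return {var: sum(r[var] is None for r in rows) / n for var in variables}


def listwise_delete(rows, variables):
    """Keep only the cases with complete data on all listed variables."""
    return [r for r in rows if all(r[var] is not None for var in variables)]


variables = ["age", "satisfaction"]
print(missing_rates(rows, variables))
print(len(listwise_delete(rows, variables)), "complete cases of", len(rows))
```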
10: Examine Cell Sample Size
• For many of our analyses (e.g., group
difference statistics) we want to have equal
sample sizes in the cells of our design
• Unequal sample sizes lead to lower statistical
power and reduced generalizability
• Suggestions:
– Collapse categories within a variable, use a
non-parametric analysis, or apply a more stringent
alpha level
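Cell sizes can be tallied by counting cases for each combination of the design factors. The sketch below assumes a hypothetical 2 x 2 design (group by gender) with made-up records; `cell_sizes` is an illustrative helper. Cells with no cases do not appear in the result, so the output should be compared against the full design.

```python
from collections import Counter

# Illustrative records for a hypothetical group-by-gender design.
rows = [
    {"group": "treatment", "gender": "female"},
    {"group": "treatment", "gender": "female"},
    {"group": "treatment", "gender": "male"},
    {"group": "comparison", "gender": "female"},
    {"group": "comparison", "gender": "male"},
    {"group": "comparison", "gender": "male"},
    {"group": "comparison", "gender": "male"},
]


def cell_sizes(rows, factors):
    """Count the cases in each observed combination of the given factors."""
    return Counter(tuple(r[f] for f in factors) for r in rows)


sizes = cell_sizes(rows, ["group", "gender"])
imbalance = max(sizes.values()) / min(sizes.values())  # largest cell / smallest cell
for cell, n in sorted(sizes.items()):
    print(cell, n)
print("largest-to-smallest ratio:", imbalance)
```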
11: Frequencies and Descriptives – The Finale
• Your data is now cleaned and ready to be
summarized!
• Conduct a final set of frequencies and
descriptives prior to conducting your
inferential statistics
• Suggestion:
– Use a variety of graphics and visual aids to
showcase your evaluation data for your clients
12: Assumption Testing
• For some inferential statistics (e.g.,
correlational analyses, group difference
analyses) you still need to address a few
additional assumptions in order to conduct
the analysis
• Suggestions:
– Some common assumptions are: homogeneity of
variance, linearity, independence of errors,
absence of multicollinearity, and reliability
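As one example from this list, reliability of a multi-item scale is commonly summarized with Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The sketch below is a minimal implementation of that formula, assuming complete data with one row per respondent and one column per item; the function name and data are hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])  # number of items
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Illustrative data: four respondents answering three items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 3, 4],
    [5, 5, 4],
]
print(round(cronbach_alpha(responses), 3))
```

The other assumptions listed (homogeneity of variance, linearity, and so on) need their own checks, typically through the diagnostic output of a statistics package.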
Some Helpful Resources
• YouTube videos
– http://www.youtube.com/watch?v=R6Cc5flsbsw
– http://www.youtube.com/watch?v=5qhLDYr70MM&feature=channel&list=UL
• Websites
– http://clinistat.hk/internetresource.php
– http://pareonline.net/getvn.asp?v=9&n=6
• Software
– http://www.gnu.org/software/pspp/
– http://davidmlane.com/hyperstat/Statistical_analyses.html
Contact Information
Jennifer Ann Morrow, Ph.D.
Associate Professor of Evaluation, Statistics, and Measurement
Department of Educational Psychology and Counseling
University of Tennessee
Knoxville, TN 37996-3452
Email: jamorrow@utk.edu
http://web.utk.edu/~edpsych/eval_assessment/default.html
