Data Analysis Module
ILRI Graduate Fellows skills training
Nairobi 4th December 2013
Session Objectives
To be able to:
• Answer the research questions:
What results do you need to show and in what format
(tables, graphs, charts etc.)
Selection of data analysis methods (principles and concepts)
• Identify, evaluate and apply data analysis packages – software
available, open source
• Plan analysis of own data and carry out exploratory analysis
• Carry out formal data analysis using different tools and methods
e.g. R
Research Process
Phases: project development, implementation, communicating findings
• Problem definition: definition of the problem domain & how the specific problem fits in
• Literature review: identification of gaps, appropriate methods & theory
• Objective & hypothesis: research will support or reject the hypothesis
• Study design: research strategy to be used, sample size, sampling frame
• Sampling: sample selection
• Data collection: data collection tools
• Data management: database development and data cleaning
• Formal analysis: exploration, description, modelling & interpretation of statistical outputs
• Reporting: choice of reporting media & format
• Publication: advice on presentation of results
• Data archiving or publication: data sharing media
Data Analysis – Guiding Principles
Translating Research Questions / Objectives / Hypotheses into an
‘analysis plan’
• We used our research questions / objectives / hypotheses to design
the study – experiment or survey, questions to ask & data to
collect
• We use them again to plan the analysis – what differences do I
need to show, what are my response variables, what types of
model may I need to use etc.
• This is often a good time to draft tables and graphs which you
think will help answer the questions
Data Analysis – Multi-level Data Structures
• Many studies are designed at different levels and collect data at these
multiple levels
• E.g. in experiments this could be animals or plots → blocks and for survey this
may be animal → household → village → district.
• Another aspect of ‘levels’ is repeated measurements over time.
• These levels must not be forgotten when we reach analysis stage –
which level do we summarise each variable at? What is our ‘unit of
analysis’?
• For formal analysis there are advanced methods that allow the
data to be analysed in a way that accounts for the multi-level data
structure (including the variation at each level).
• The analysis can often be simplified into fewer dimensions by
summarising particular aspects
• e.g. means between two points in time, the slope of the trend between two
points, the value reached at the end.
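The simplification described in the last bullet can be sketched in R as follows. This is only an illustration with made-up weights for three hypothetical animals measured on four days; the names `growth`, `animal`, `day` and `weight` are assumptions and do not come from the course data.

```r
# Hypothetical repeated measurements: 3 animals weighed on 4 days (made-up values)
growth <- data.frame(
  animal = rep(c("A", "B", "C"), each = 4),
  day    = rep(c(0, 30, 60, 90), times = 3),
  weight = c(100, 112, 121, 133,
              95, 101, 110, 116,
             105, 120, 138, 150)
)

# Reduce each animal's series to one row: mean, final value and slope of the trend
summarise_animal <- function(d) {
  fit <- lm(weight ~ day, data = d)            # straight-line trend for one animal
  data.frame(animal       = d$animal[1],
             mean_weight  = mean(d$weight),
             final_weight = d$weight[which.max(d$day)],
             slope        = unname(coef(fit)["day"]))
}

per_animal <- do.call(rbind, lapply(split(growth, growth$animal), summarise_animal))
print(per_animal)
```

After this step the animal, not the individual weighing, is the unit of analysis, so `per_animal` can be analysed with simpler methods.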
Data Analysis – Response & Explanatory
variables
Terms are discipline specific:
Response ≡ Dependent variable ≡ y
Explanatory ≡ Independent variable ≡ x’s
• Explanatory variables can be continuous or discrete
• In epidemiology they can be ‘confounding’ or biological
‘interacting’*
• In economics they can be exogenous or endogenous
*Note that statistical confounding and interactions have different
interpretations
Data Analysis – Variable Types
• Variables can be:
CONTINUOUS or DISCRETE
• In analysis we may sometimes convert continuous data to discrete
• Both Response and Explanatory variables can be continuous or
discrete.
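One way to convert a continuous variable to a discrete one in R is `cut()`. The sketch below uses made-up packed cell volume readings, and the break points 25 and 30 are arbitrary illustrations, not recommended thresholds.

```r
# Hypothetical packed cell volume (PCV, %) readings
pcv <- c(18, 24, 27, 31, 35, 22, 29)

# Convert to a discrete variable; the breaks are illustrative only
pcv_class <- cut(pcv,
                 breaks = c(-Inf, 25, 30, Inf),
                 labels = c("low", "medium", "high"),
                 right  = FALSE)   # intervals are closed on the left: [a, b)

table(pcv_class)   # frequency of each class
```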
Data Analysis – Aim of Exploratory Analysis
• Data exploration is the first stage to any analysis of the data –
people often jump straight to formal analysis and models but it
is at this stage where you will identify patterns and ‘odd’ data
• In some cases this may be 90% of the analysis you do on the
data (e.g. Case Study 11)
• The more complicated the data set the more interesting and
necessary the exploratory phase becomes.
• With some expertise in data management we can highlight the
important patterns within the data and list the types of
statistical models and their forms that need to be fitted at the
next stage.
Data Analysis – Activities of Exploratory Analysis
Data Analysis – Exploratory Analysis Methods
(Continuous & Discrete Variables)
• Tools for exploratory analysis:
  • Means & ranges (Case Study 2)
  • 1 & 2-way tables of means (Case Study 3)
  • Frequency tables (Case Study 3 & Case Study 8 – use of Excel pivot tables!)
  • Histograms (Case Study 11)
  • Scatterplots (Case Study 1 & Case Study 3)
  • Boxplots (Case Study 3)
  • Bar charts & pie charts (Case Study 11)
  • Trend graphs, survival curves (Case Study 3)
• Identify patterns and unusual values – e.g. outliers, zeros
• Measures of variation – variance, standard deviation, confidence interval,
standard error, CV
• Transformation of variables?
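Several of the measures listed above can be computed with base R. The sketch below assumes a small made-up data frame `cattle` (hypothetical herds and weights) and uses the boxplot rule as one possible way of flagging outliers.

```r
# Hypothetical data: weights (kg) of animals in two herds (made-up values)
cattle <- data.frame(
  herd   = rep(c("H1", "H2"), each = 5),
  weight = c(210, 225, 198, 240, 232, 180, 175, 320, 190, 185)
)

# Measures of variation for one variable
x  <- cattle$weight
n  <- length(x)
se <- sd(x) / sqrt(n)                                # standard error of the mean
cv <- 100 * sd(x) / mean(x)                          # coefficient of variation (%)
ci <- mean(x) + c(-1, 1) * qt(0.975, n - 1) * se     # 95% confidence interval

# One-way table of means and a frequency table
herd_means <- tapply(cattle$weight, cattle$herd, mean)
herd_n     <- table(cattle$herd)

# Values flagged by the boxplot rule (more than 1.5 x IQR beyond the hinges)
possible_outliers <- boxplot.stats(x)$out

print(list(se = se, cv = cv, ci = ci, means = herd_means,
           n = herd_n, outliers = possible_outliers))
```

A flagged value is a prompt to check the record, not a reason to delete it.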
Data Analysis – Confirmatory / Formal Analysis
Exploratory analysis was the first part of the analysis of our
research data – to gain initial understanding of patterns that
exist in the data and suggest further analysis needs
By the time we reach Confirmatory / Formal analysis we
have refined our objectives and these should clearly define
exactly what type of further statistical analysis we need (and
which models to use)
In exploratory analysis it is difficult to look at many variables
at the same time – formal analysis allows us to do this and to
see which variables are more important than others.
Data Analysis – Confirmatory / Formal
Analysis (Some options…)
Explanatory / independent variable(s) discrete:
• Discrete response: Chi-square / Regression (Logistic – binary; Poisson – count)
• Continuous response: T-test / Analysis of Variance (/ Regression)
Explanatory / independent variable(s) continuous:
• Discrete response: Regression (Logistic – binary; Poisson – count)
• Continuous response: Correlation / Linear Regression
Explanatory / independent variable(s) both discrete and continuous:
• Discrete response: Regression (Logistic – binary; Poisson – count)
• Continuous response: ANOVA / Linear Regression
(Non-parametric equivalents for small samples)
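Two of the options above, sketched in base R with made-up numbers: a chi-square test for a discrete response with a discrete explanatory variable, and a t-test with its non-parametric equivalent for a continuous response in two groups. The table `counts` and the vectors `grp_a` and `grp_b` are hypothetical.

```r
# Hypothetical counts: disease status (rows) by breed (columns)
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(status = c("healthy", "sick"),
                                 breed  = c("local", "cross")))
chi <- chisq.test(counts)      # discrete response, discrete explanatory
print(chi)

# Continuous response, discrete explanatory variable with two levels (made-up values)
grp_a <- c(21.5, 23.1, 19.8, 24.0, 22.7)
grp_b <- c(26.2, 27.9, 25.4, 28.8, 26.1)
t_res <- t.test(grp_a, grp_b)          # parametric
w_res <- wilcox.test(grp_a, grp_b)     # non-parametric equivalent (rank-based)
print(c(t_test = t_res$p.value, wilcoxon = w_res$p.value))
```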
Data Analysis – Confirmatory / Formal Analysis
(Some advanced options…)
• Mixed-effects models (REML) – incorporate random effects,
including spatial & temporal repeated measurements; better at
managing data with many levels of hierarchy; used a lot in
epidemiology and animal/plant genetics.
• Survival models.
• Multivariate (> 1 y) methods – can be both parametric & non-parametric.
• Proportional-odds models when the response is (ordered) categorical with more
than 2 categories.
Data Analysis – Confirmatory / Formal Analysis
Concepts
The underlying concept in formal statistical modeling is:
Data = Pattern + Residual
• ‘Data’ are the raw data (responses) that you collected, or sometimes may be
summaries derived from the raw data*. They could also be transformed
values*
• ‘Pattern’ is all the variables (continuous or discrete) that are in the design of
the study (e.g. treatment) or have been selected in your exploratory analysis
as explaining some of the differences in the ‘Data’
• ‘Residual’ is the variation we can’t explain by the ‘Pattern’.
The aim of our formal analysis is to put as much of the
variation as possible into the ‘Pattern’ while keeping the
model as simple as possible…easy!
*See earlier slides
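The decomposition Data = Pattern + Residual can be checked numerically for a fitted model. The sketch below simulates data, so `x`, `y` and the true line are assumptions made for illustration; `fitted()` plays the role of the Pattern and `resid()` the Residual.

```r
set.seed(1)                        # reproducible made-up data
x <- 1:20
y <- 5 + 2 * x + rnorm(20, sd = 3)

fit      <- lm(y ~ x)
pattern  <- fitted(fit)            # part of y described by the model
residual <- resid(fit)             # part the model leaves unexplained

# Data = Pattern + Residual holds for every observation
decomposition_ok <- all.equal(unname(pattern + residual), y)
print(decomposition_ok)

# Share of the total variation that the Pattern accounts for (R-squared)
ss_total    <- sum((y - mean(y))^2)
ss_residual <- sum(residual^2)
r_squared   <- 1 - ss_residual / ss_total
print(r_squared)
```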
Data Analysis – Confirmatory / Formal Analysis:
Correlation & simple Linear Regression 1
• We will use correlation & fitting a straight line to data to explain the
concepts of statistical modeling:
  • The simplest way of looking at the relationship between an x-variable
  and a y-variable is with the CORRELATION.
  • An extension of this is to use a LINEAR REGRESSION model to fit a
  straight line through the points (Case Study 3)
  • To look at how well this model is fitting we use an ‘analysis of
  variance’ – the amount of variation in the Pattern vs. in the Residual
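A minimal sketch of the first two steps, using simulated values in place of real measurements; the variable names `weight` and `pcv` and the relationship between them are assumptions for illustration only.

```r
set.seed(42)
weight <- runif(30, 150, 350)                      # hypothetical live weights (kg)
pcv    <- 20 + 0.03 * weight + rnorm(30, sd = 2)   # hypothetical PCV readings

r   <- cor(weight, pcv)        # Pearson correlation between x and y
fit <- lm(pcv ~ weight)        # straight line: pcv = intercept + slope * weight
print(r)
print(coef(fit))

# With a single x-variable, R-squared of the line equals the squared correlation
same <- all.equal(summary(fit)$r.squared, r^2)
print(same)
```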
Data Analysis – Confirmatory / Formal Analysis:
Correlation & simple Linear Regression 2
• Linear regression and similar models present the analysis in an
‘analysis of variance’ table that looks like (Section 3.2):
Source of variation   d.f.   s.s.   m.s.   v.r. (F-value)   p-value
Slope of line         1
Residual (error)      N-2
Total                 N-1
• In this example the p-value will tell you if the slope of the line is
significantly different from 0 (i.e. a flat line) (Section 5)
• Models such as Logistic for binary data and proportions or
Poisson for counts give a similar table but it is now the ‘analysis
of deviance’ with similar interpretations (Section 10.2)
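In R, `anova()` produces both kinds of table. The sketch below uses simulated data (all names and coefficients are assumptions); note that R prints the slope and residual rows but not the Total row.

```r
set.seed(7)
n <- 25
x <- runif(n, 0, 10)
y <- 3 + 1.5 * x + rnorm(n, sd = 2)

# Analysis of variance for a straight-line fit: slope has 1 d.f., residual N - 2
aov_table <- anova(lm(y ~ x))
print(aov_table)

# Binary response: the same idea gives an analysis of deviance
infected  <- rbinom(n, size = 1, prob = plogis(-2 + 0.4 * x))
dev_table <- anova(glm(infected ~ x, family = binomial), test = "Chisq")
print(dev_table)
```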
Data Analysis – Confirmatory / Formal Analysis:
Correlation & simple Linear Regression 3
• A key aspect of any model is the ‘model checking’ – part of this is
done through examination of the residuals.
• All regression models which assume that either the data or the
residuals are ‘normal’ rely on the same assumptions: independence,
randomness and normal distribution.
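A few common residual checks, sketched on simulated data (the names and the model are assumptions). The plots are judged by eye; the Shapiro-Wilk test is one optional formal check of normality.

```r
set.seed(3)
x   <- 1:40
y   <- 10 + 0.8 * x + rnorm(40, sd = 4)
fit <- lm(y ~ x)
res <- resid(fit)

# Residuals against fitted values: look for curvature or a changing spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Formal test of normality
normality <- shapiro.test(res)
print(normality$p.value)
```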
Data Analysis – Confirmatory / Formal Analysis:
Parameter Estimates & Least Square Means
• We also look at the parameter estimates and their standard
errors – for the linear regression example the parameter estimates
are the slope and the intercept. For more complex models and those
with discrete explanatory variables we will use the parameter
estimates to compare levels of the discrete variables (Case Study 3
for examples & discussion).
• For models containing discrete variables as explanatory /
independent variables we will often want to present Means and
Standard Errors and compare these (with a t-test if comparing 2, or
multiple comparison tests if comparing more) (Section 6).
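A sketch of both outputs for a model with one discrete explanatory variable, using made-up weight gains for three hypothetical treatments. In this balanced one-factor case the least squares means are simply the group means; Tukey's HSD is used here as one example of a multiple comparison test.

```r
# Hypothetical trial: weight gain under three treatments (made-up values)
trial <- data.frame(
  treatment = factor(rep(c("control", "dietA", "dietB"), each = 4)),
  gain      = c(10.1,  9.8, 10.5,  9.6,
                12.2, 11.9, 12.8, 12.5,
                11.0, 10.7, 11.4, 11.3)
)

fit <- lm(gain ~ treatment, data = trial)

# Parameter estimates and standard errors; the dietA and dietB rows are
# differences from the first level (control)
print(summary(fit)$coefficients)

# Group means with standard errors based on the pooled residual variance
means <- tapply(trial$gain, trial$treatment, mean)
se    <- summary(fit)$sigma / sqrt(as.vector(table(trial$treatment)))
print(data.frame(mean = means, se = se))

# Multiple comparisons between the three treatment means
print(TukeyHSD(aov(gain ~ treatment, data = trial)))
```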
Data Analysis – Confirmatory / Formal Analysis:
Exercise
• Identify what sort of model you may use in your research (check
the Statistical Modeling Teaching Guide) – e.g. linear regression
(section 3), designed experiment (section 4), response data which
are proportions or binary (section 10), count response data
(section 11), survival data (section 12).
• Which parameters may be included in your model as the Pattern?
• Draw a mock analysis of variance / deviance or parameter
estimates table of what you may expect to see in the analysis
Data Analysis – Application (Statistical Software Packages)
Features compared across Excel, Stata, SPSS, SAS and R:
• Learning curve: Excel – gradual/flat; Stata – steep/gradual; SPSS – gradual/flat; SAS – pretty steep; R – pretty steep
• User interface: Excel – point-and-click; Stata – programming/point-and-click; SPSS – mostly point-and-click; SAS – programming; R – programming
• Data manipulation: Excel – weak/moderate; Stata – very strong; SPSS – moderate; SAS – very strong; R – very strong
• Data analysis: Excel – modest; Stata – powerful; SPSS – powerful; SAS – powerful/versatile; R – powerful/versatile
• Graphics: Excel – very good; Stata – very good; SPSS – very good; SAS – good; R – excellent
• Cost: Excel – part of MS Office; Stata – affordable (perpetual licenses, renew only when upgrading); SPSS – expensive (with annual license renewal); SAS – expensive (annual renewal); R – open source (free)
Data Analysis – Application (Using R)
Outline
• Installing R
• R Environment
  • Command prompt
  • RStudio
  • Setting your workspace
• Loading and installing R packages
• Importing data into R - *.csv/*.xls
• Saving R data
• Data exploration – summary statistics
• Graphing in R - boxplot
• Data analysis – T-test/linear & logistic regression
Introduction to R
Installing R
- Download R from http://cran.r-project.org
CRAN – Comprehensive R Archive Network
The R version changes over time; the current one given in these notes is R-2.15.0
- Installing RStudio
- Setting up the work environment
Introduction to R
R Environment
Command prompt
R is primarily command-driven software where instructions are typed at
the command prompt (> )
R is case sensitive
RStudio
RStudio has a limited set of commands that can be selected and executed from
the menu
Setting your workspace
It is important to set R preferences to suit your work environment; one such
setting is the working directory.
The working directory is set using the command setwd
setwd("D:/My Documents/R course") or from File->Change dir on the menu
Take note of the forward slash (/): R will not recognise a single backslash (\)
when specifying subdirectories; use / or a doubled backslash (\\)
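The working-directory commands can be tried safely with a temporary folder. In the sketch below `course_dir` is a stand-in created under `tempdir()` rather than the course's "D:/My Documents/R course" folder.

```r
old_dir <- getwd()                                 # remember where we started

# A temporary folder standing in for the course folder
course_dir <- file.path(tempdir(), "R course")
dir.create(course_dir, showWarnings = FALSE)

setwd(course_dir)                                  # change the working directory
print(basename(getwd()))                           # name of the current folder

# Inside an R string a backslash must be doubled; each pair is one character
win_path <- "D:\\data"
print(nchar(win_path))

setwd(old_dir)                                     # go back to the original folder
```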
Introduction to R
Loading and installing R packages
Modules or sets of functions are referred to as PACKAGES in R. Some packages
are part of the base installation while others have to be installed separately.
There are several user-contributed packages.
Type library() to view installed packages
To view functions within a package type
library(help="packagename") e.g. library(help=stats) – the quotes are optional
Install packages using the menu Packages->Install package(s)… or with
install.packages("packagename")
Use the find("item") command to identify the package containing an item of
interest
e.g. find("plot"), if you are sure of the exact name; otherwise use
apropos("item")
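The same commands can be used in a script. The helper `ensure_package()` below is a hypothetical convenience function, not part of R; it installs a package only when it is missing, and is called here on `stats`, which ships with R, so nothing is downloaded.

```r
# Packages currently installed (first few names only)
print(head(rownames(installed.packages())))

# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

have_stats <- ensure_package("stats")   # 'stats' ships with R
library(stats)                          # attach the package

where_plot   <- find("plot")            # exact name: which attached package has it
read_helpers <- apropos("^read\\.")     # partial name: everything starting with 'read.'
print(where_plot)
print(read_helpers)
```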
Introduction to R
Importing data into R - *.csv/*.xls
Although it is possible to enter data directly into R, importing data in a spreadsheet format is more
efficient.
Use:
i. read.table – to import space-separated data with column headings (*.txt); add sep="," if the file is comma separated
prod1 <- read.table("D:/My Documents/R course/PROD2B.txt", header=TRUE)
ii. read.csv – to import comma-separated data with column headings (*.csv)
prod2 <- read.csv("D:/My Documents/R course/PROD2B.csv", header=TRUE)
To save the data from R to a file
write.table(prod2, file="D:/My Documents/R course/proddata2", quote=FALSE)
iii. odbcConnectExcel() – to import an Excel worksheet (needs the RODBC package; Windows only)
library(RODBC)
prod3 <- "D:/My Documents/R course/PROD2B.xls"
datachannel <- odbcConnectExcel(prod3)
# sqtable is the name of the worksheet to read
outprod3 <- sqlFetch(channel=datachannel, sqtable="prod3")
odbcClose(datachannel)
write.table(outprod3, file="D:/My Documents/R course/proddata3", quote=FALSE)
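The import and save steps can be tried without the course files. The sketch below writes a small made-up data frame to a temporary CSV file and reads it back; `prod_demo` and its values are hypothetical, and `saveRDS()` is shown as one way of saving data in R's own format.

```r
# Hypothetical stand-in for the course data (made-up values)
prod_demo <- data.frame(
  HERD   = c("A", "A", "B", "B"),
  SEX    = c("F", "M", "F", "M"),
  WEIGHT = c(250, 310, 240, 295),
  PCV    = c(28, 31, 26, 30)
)

# Write to a temporary CSV file, then import it again
csv_path <- file.path(tempdir(), "prod_demo.csv")
write.csv(prod_demo, csv_path, row.names = FALSE)
prod_back <- read.csv(csv_path, header = TRUE)
str(prod_back)

# Save in R's own format and reload
rds_path <- file.path(tempdir(), "prod_demo.rds")
saveRDS(prod_back, rds_path)
prod_again <- readRDS(rds_path)
```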
Introduction to R
Data exploration – summary statistics
One can get summary statistics on all numeric variables in the dataset
using summary(datasetname)
e.g. summary(outprod3)
It is also possible to get summary statistics on a particular variable; use
$ to select a variable from the data table
e.g. summary(outprod3$WEIGHT)
Use aggregate to get summary statistics by group/category
e.g. aggregate(WEIGHT ~ HERD + SEX, data=outprod3, FUN=mean)
Attaching a data file avoids having to specify the data file
all the time, particularly for long summaries such as aggregate; once it is
attached, variables can be used by name alone
attach(outprod3)
aggregate(data.frame(WEIGHT), by=list(herd=HERD,sex=SEX), mean)
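A runnable sketch of these summaries on a small made-up data frame; `herd_data` and its values are hypothetical. It uses `data=` and `with()` rather than `attach()`, which is an alternative that avoids attaching anything.

```r
# Hypothetical herd data (made-up values)
herd_data <- data.frame(
  HERD   = rep(c("A", "B"), each = 4),
  SEX    = rep(c("F", "F", "M", "M"), times = 2),
  WEIGHT = c(250, 260, 310, 300, 240, 230, 295, 305)
)

print(summary(herd_data))           # every variable
print(summary(herd_data$WEIGHT))    # one variable, selected with $

# Mean weight for each herd-by-sex group
group_means <- aggregate(WEIGHT ~ HERD + SEX, data = herd_data, FUN = mean)
print(group_means)

# The same means laid out as a two-way table
two_way <- with(herd_data, tapply(WEIGHT, list(HERD, SEX), mean))
print(two_way)
```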
Introduction to R
Graphing in R – boxplot
- R has powerful graphing features that can be used in data
exploration, such as histograms, boxplots, scatterplots, etc.
(the examples assume the dataset has been attached)
library(lattice)  # histogram() and xyplot() come from the lattice package
histogram(~PCV, nint=30, xlab="Packed Cell Volume")
boxplot(PCV, ylab="Packed Cell Volume")
boxplot(PCV~HERD, col="orange", ylab="Packed Cell Volume",
xlab="Herd")
xyplot(PCV~WEIGHT, col="orange", ylab="Packed Cell Volume",
xlab="Weight")
Introduction to R
Data analysis – T-test
- t.test(WEIGHT~SEX, data=prod2)
Data analysis – Linear Regression
- Remember to attach the dataset first to make it active (or name it with data=)
- attach(prod2)
- output1<-lm(PCV~WEIGHT)
- summary(output1)
Data analysis – Logistic Regression
-
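The slide gives no command for logistic regression. One common way to fit it in R is `glm()` with `family = binomial`; the sketch below is an assumption about what could go here, not the author's own example, and uses simulated data in which `animals`, `ANAEMIC` and the coefficients are hypothetical.

```r
set.seed(11)
n <- 60

# Simulated animals: weight, sex and a binary outcome (all made up)
animals <- data.frame(
  WEIGHT = runif(n, 150, 350),
  SEX    = factor(sample(c("F", "M"), n, replace = TRUE))
)
animals$ANAEMIC <- rbinom(n, size = 1, prob = plogis(4 - 0.02 * animals$WEIGHT))

# T-test: continuous response, two groups
print(t.test(WEIGHT ~ SEX, data = animals))

# Logistic regression: binary response, continuous explanatory variable
logit_fit <- glm(ANAEMIC ~ WEIGHT, family = binomial, data = animals)
print(summary(logit_fit)$coefficients)

# Odds ratio for a 1 kg increase in weight
odds_ratio <- exp(coef(logit_fit)["WEIGHT"])
print(odds_ratio)
```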
References
• Research Methods & Biometrics Teaching Resource – Case Study 1 & 4 have R,
Case Study 2 has ANOVA; many others are used in this session as examples. The study
guides are useful for reference material on Exploratory and Formal Analysis.
• Take home:
  • Analysis the Data & Models Chapters in Green Book
  • Good Statistical Practice for Natural Resources Research – Part IV
  • R Intro Course Notes – Nicholas Ndiwa
  • Reading University SSC – Approaches to Analysis of Survey Data; Confidence &
  Significance: Key Concepts of Inferential Statistics; Modern methods of analysis;
  Analysis of Experimental Data

 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 

Último (20)

4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 

Module 4 data analysis

  • 1. Data Analysis Module ILRI Graduate Fellows skills training Nairobi 4th December 2013
  • 2. Session Objectives: to be able to
      • Answer the research questions: what results do you need to show and in what format (tables, graphs, charts etc.); selection of data analysis methods (principles and concepts)
      • Identify, evaluate and apply data analysis packages – software available, open source
      • Plan analysis of own data and carry out exploratory analysis
      • Carry out formal data analysis using different tools and methods, e.g. R
  • 3. Research Process
      • Phases: project development, implementation, communicating findings
      • Problem definition: definition of the problem domain & how the specific problem fits in
      • Literature review: identification of gaps, appropriate methods & theory
      • Objective & hypothesis: research will prove or disprove the hypothesis
      • Study design: research strategy to be used, sample size, sampling frame
      • Sampling: sample selection
      • Data collection: data collection tools
      • Data management: database development and data cleaning
      • Formal analysis: exploration, description, modelling & interpretation of statistical outputs
      • Reporting: choice of reporting media & format
      • Publication: advice on presentation of results
      • Data archiving or publication: data sharing media
  • 4. Data Analysis – Guiding Principles: translating Research Questions / Objectives / Hypotheses into an ‘analysis plan’
      • We used our research questions / objectives / hypotheses to design the study – experiment or survey, questions to ask & data to collect
      • We use them again to plan the analysis – what differences do I need to show, what are my response variables, what types of model may I need to use etc.
      • This is often a good time to draft tables and graphs which you think will help answer the questions
  • 5. Data Analysis – Multi-level Data Structures • Many studies are designed at different levels and collect data at these multiple levels • E.g. in experiments this could be animals or plots → blocks and for survey this may be animal → household → village → district. • Another aspect of ‘levels’ is repeated measurements over time. • These levels must not be forgotten when we reach analysis stage – which level do we summarise each variable at? What is our ‘unit of analysis’? • For formal analysis then there are advanced methods that allow the data to be analysed in a way that allows for the multi-level data structure (including variation). • The analysis can often be simplified into fewer dimensions by summarising particular aspects • e.g. means between two points in time, the slope of the trend between two points, the value reached at the end.
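The 'unit of analysis' point on this slide can be pictured with a small R sketch that summarises animal-level records up to household and then village level. The data frame and its column names (village, household, weight) are made up for illustration and are not taken from the course data.

```r
# Hypothetical animal-level records nested in households within villages
animals <- data.frame(
  village   = c("V1", "V1", "V1", "V2", "V2", "V2"),
  household = c("H1", "H1", "H2", "H3", "H3", "H4"),
  weight    = c(210, 190, 250, 180, 220, 205)
)

# Summarise to household level: one mean weight per household
hh_means <- aggregate(weight ~ village + household, data = animals, FUN = mean)

# Summarise again to village level, treating the household mean as the unit
village_means <- aggregate(weight ~ village, data = hh_means, FUN = mean)

print(hh_means)
print(village_means)
```

Which level to stop at depends on the research question; the household means here give each household equal weight regardless of how many animals it owns.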
  • 6. Data Analysis – Response & Explanatory variables Terms are discipline specific: Response ≡ Dependent variable ≡ y Explanatory ≡ Independent variable ≡ x’s • Explanatory variables can be continuous or discrete • In epidemiology they can be ‘confounding’ or biological ‘interacting’* • In economics they can be exogenous or endogenous *Note that statistical confounding and interactions have different interpretations
  • 7. Data Analysis – Variable Types • Variables can be: CONTINUOUS or DISCRETE • In analysis we may sometimes convert continuous data to discrete • Both Response and Explanatory variables can be continuous or discrete.
  • 8. Data Analysis – Aim of Exploratory Analysis • Data exploration is the first stage to any analysis of the data – people often jump straight to formal analysis and models but it is at this stage where you will identify patterns and ‘odd’ data • In some cases this may be 90% of the analysis you do on the data (e.g. Case Study 11) • The more complicated the data set the more interesting and necessary the exploratory phase becomes. • With some expertise in data management we can highlight the important patterns within the data and list the types of statistical models and their forms that need to be fitted at the next stage.
  • 9. Data Analysis – Activities of Exploratory Analysis
  • 10. Data Analysis – Exploratory Analysis Methods (Continuous & Discrete Variables)
      • Tools for exploratory analysis:
          • Means & ranges (Case Study 2)
          • 1 & 2-way tables of means (Case Study 3)
          • Frequency tables (Case Study 3 & Case Study 8 – use of Excel pivot tables!)
          • Histograms (Case Study 11)
          • Scatterplots (Case Study 1 & Case Study 3)
          • Boxplots (Case Study 3)
          • Bar charts & pie charts (Case Study 11)
          • Trend graphs, survival curves (Case Study 3)
      • Identify patterns and unusual variables – e.g. outliers, zeros
      • Measures of variation – variance, standard deviation, confidence interval, standard error, CV
      • Transformation of variables?
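As a rough illustration of the measures of variation listed on this slide, the sketch below computes each of them in base R for one variable. The PCV readings are invented values, and the 95% confidence interval assumes a simple random sample (not the multi-level survey case).

```r
# Hypothetical sample of packed cell volume (PCV) readings
pcv <- c(28, 31, 25, 30, 27, 33, 29, 26)

n        <- length(pcv)
mean_pcv <- mean(pcv)
var_pcv  <- var(pcv)                  # sample variance (divisor n - 1)
sd_pcv   <- sd(pcv)                   # standard deviation
se_pcv   <- sd_pcv / sqrt(n)          # standard error of the mean
cv_pcv   <- 100 * sd_pcv / mean_pcv   # coefficient of variation, in percent

# 95% confidence interval for the mean, based on the t distribution
t_crit <- qt(0.975, df = n - 1)
ci_pcv <- c(lower = mean_pcv - t_crit * se_pcv,
            upper = mean_pcv + t_crit * se_pcv)

print(c(mean = mean_pcv, variance = var_pcv, sd = sd_pcv, se = se_pcv, cv = cv_pcv))
print(ci_pcv)
```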
  • 11. Data Analysis – Confirmatory / Formal Analysis
      • Exploratory analysis was the first part of the analysis of our research data – to gain an initial understanding of the patterns that exist in the data and to suggest further analysis needs.
      • By the time we reach Confirmatory / Formal analysis we have refined our objectives, and these should clearly define exactly what type of further statistical analysis we need (and which models to use).
      • In exploratory analysis it is difficult to look at many variables at the same time – formal analysis allows us to do this and to see which variables are more important than others.
  • 12. Data Analysis – Confirmatory / Formal Analysis (Some options…) (Non-parametric equivalents for small samples)
      • Discrete explanatory variable(s), discrete response: Chi-square / Regression (Logistic – binary; Poisson – count)
      • Discrete explanatory variable(s), continuous response: T-test / Analysis of Variance (/ Regression)
      • Continuous explanatory variable(s), discrete response: Regression (Logistic – binary; Poisson – count)
      • Continuous explanatory variable(s), continuous response: Correlation / Linear Regression
      • Both types of explanatory variable, discrete response: Regression (Logistic – binary; Poisson – count)
      • Both types of explanatory variable, continuous response: ANOVA / Linear Regression
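Two cells of this table can be sketched in base R as follows: a chi-square test for a discrete response with a discrete explanatory variable, and a t-test for a continuous response with a two-level explanatory variable. All variables below (sex, disease, weight) are simulated and purely illustrative.

```r
set.seed(1)

# Hypothetical data: two discrete variables and one continuous response
sex     <- factor(rep(c("F", "M"), each = 20))
disease <- factor(rep(c("no", "yes", "no", "yes"), times = c(12, 8, 7, 13)))
weight  <- rnorm(40, mean = ifelse(sex == "M", 220, 200), sd = 15)

# Discrete response, discrete explanatory: chi-square test on a 2-way table
chi_out <- chisq.test(table(sex, disease))

# Continuous response, discrete explanatory with 2 levels: t-test
t_out <- t.test(weight ~ sex)

print(chi_out)
print(t_out)
```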
  • 13. Data Analysis – Confirmatory / Formal Analysis (Some advanced options…)
      • Mixed-effects models (REML) – incorporate random effects, including spatial & temporal repeated measurements; better at managing data with many levels of hierarchy; used a lot in epidemiology and animal/plant genetics.
      • Survival models.
      • Multivariate (> 1 y) methods – can be both parametric & non-parametric.
      • Proportional-odds models when the response is categorical with more than 2 categories.
  • 14. Data Analysis – Confirmatory / Formal Analysis Concepts
      • The underlying concept in formal statistical modeling is: Data = Pattern + Residual
      • ‘Data’ are the raw data (responses) that you collected, or sometimes may be summaries derived from the raw data*. They could also be transformed values*
      • ‘Pattern’ is all the variables (continuous or discrete) that are in the design of the study (e.g. treatment) or have been selected in your exploratory analysis as explaining some of the differences in the ‘Data’
      • ‘Residual’ is the variation we can’t explain by the ‘Pattern’. The aim of our formal analysis is to put as much of the variation as possible into the ‘Pattern’ while keeping the model as simple as possible…easy
      • *See earlier slides
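The identity Data = Pattern + Residual can be checked numerically with a fitted linear model, where the fitted values play the role of the 'Pattern'. This is a minimal sketch on simulated data; the variable names and coefficients are assumptions made for the example.

```r
set.seed(42)

# Hypothetical data: weight explains part of the variation in PCV
weight <- runif(30, min = 150, max = 300)
pcv    <- 20 + 0.04 * weight + rnorm(30, sd = 2)

fit <- lm(pcv ~ weight)

pattern  <- fitted(fit)      # the part explained by the model
residual <- residuals(fit)   # the part left unexplained

# Data = Pattern + Residual holds for every observation
same_data <- isTRUE(all.equal(unname(pattern + residual), pcv))

# The variation splits the same way: total s.s. = pattern s.s. + residual s.s.
ss_total    <- sum((pcv - mean(pcv))^2)
ss_pattern  <- sum((pattern - mean(pcv))^2)
ss_residual <- sum(residual^2)

print(same_data)
print(c(total = ss_total, pattern = ss_pattern, residual = ss_residual))
```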
  • 15. Data Analysis – Confirmatory / Formal Analysis: Correlation & simple Linear Regression 1
      • We will use correlation & fitting a straight line to data to explain the concepts of statistical modeling:
      • The simplest way of looking at the relationship between an x-variable and a y-variable is with the CORRELATION.
      • An extension of this is to use a LINEAR REGRESSION model to fit a straight line through the points (Case Study 3).
      • To look at how well this model is fitting we use an ‘analysis of variance’ – the amount of variation in the Pattern vs. in the Residual.
  • 16. Data Analysis – Confirmatory / Formal Analysis: Correlation & simple Linear Regression 2
      • Linear regression and similar models present the analysis in an ‘analysis of variance’ table that looks like (Section 3.2):
          Source of variation | d.f. | s.s. | m.s. | v.r. (F-value) | p-value
          Slope of line       | 1    |      |      |                |
          Residual (error)    | N-2  |      |      |                |
          Total               | N-1  |      |      |                |
      • In this example the p-value will tell you if the slope of the line is significantly different from 0 (i.e. a flat line) (Section 5)
      • Models such as Logistic for binary data and proportions or Poisson for counts give a similar table, but it is now the ‘analysis of deviance’ with similar interpretations (Section 10.2)
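In R, a table of this shape is produced by calling anova() on a fitted lm object. The sketch below uses simulated data with N = 25 (an arbitrary choice), so the slope row has 1 d.f. and the residual row has N - 2 = 23; R does not print the Total row.

```r
set.seed(7)

# Hypothetical data with N = 25 observations
n      <- 25
weight <- runif(n, min = 150, max = 300)
pcv    <- 22 + 0.03 * weight + rnorm(n, sd = 2)

fit <- lm(pcv ~ weight)

# Analysis of variance table: columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
aov_table <- anova(fit)
print(aov_table)

# p-value for testing whether the slope differs from 0
slope_p <- aov_table[["Pr(>F)"]][1]
```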
  • 17. Data Analysis – Confirmatory / Formal Analysis: Correlation & simple Linear Regression 3
      • A key aspect of any model is the ‘model checking’ – part of this is done through examination of the residuals.
      • All regression models that assume either the data or the residuals are ‘normal’ share the same assumptions: independence, randomness and normal distribution.
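One common way to examine residuals in R is a residuals-versus-fitted plot, a normal Q-Q plot and a Shapiro-Wilk test. The sketch below is one possible routine on simulated data, not the checking procedure prescribed by the course; plots are written to a temporary PDF so the code also runs without a display.

```r
set.seed(11)

# Hypothetical data for a straight-line model
x   <- runif(40, min = 0, max = 10)
y   <- 3 + 2 * x + rnorm(40)
fit <- lm(y ~ x)
res <- residuals(fit)

# Write the diagnostic plots to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # residuals should scatter evenly around zero
qqnorm(res)              # points close to the line suggest normality
qqline(res)
invisible(dev.off())

# Formal test of normality of the residuals
sw <- shapiro.test(res)
print(sw)
```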
  • 18. Data Analysis – Confirmatory / Formal Analysis: Parameter Estimates & Least Square Means
      • We also look at the parameter estimates and their standard errors – for the linear regression example the parameter estimates are the slope and the intercept. For more complex models and those with discrete explanatory variables we will use the parameter estimates to compare levels of the discrete variables (see Case Study 3 for examples & discussion).
      • For models containing discrete variables as explanatory / independent variables we will often want to present Means and Standard Errors and compare these (with a t-test if comparing 2 means, or with multiple comparison tests for more) (Section 6).
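With a discrete explanatory variable, R's default parameterisation reports the first level as the intercept and the other levels as differences from it. The sketch below shows this on simulated herd data (herd labels and weights are invented) alongside raw group means and standard errors.

```r
set.seed(3)

# Hypothetical data: weights of animals in three herds
herd   <- factor(rep(c("A", "B", "C"), each = 10))
weight <- rnorm(30, mean = c(A = 200, B = 215, C = 230)[as.character(herd)], sd = 12)

fit <- lm(weight ~ herd)

# Parameter estimates with standard errors, t values and p-values
est <- coef(summary(fit))
print(est)

# Group means and their standard errors, computed directly from the data
herd_means <- tapply(weight, herd, mean)
herd_se    <- tapply(weight, herd, function(v) sd(v) / sqrt(length(v)))
print(rbind(mean = herd_means, se = herd_se))
```

The intercept equals the mean of herd A, and the herdB and herdC estimates are the differences of those herd means from herd A.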
  • 19. Data Analysis – Confirmatory / Formal Analysis: Exercise
      • Identify what sort of model you may use in your research (check the Statistical Modeling Teaching Guide) – e.g. linear regression (section 3), designed experiment (section 4), response data which are proportions or binary (section 10), count response data (section 11), survival data (section 12).
      • Which parameters may be included in your model as the Pattern?
      • Draw a pretend analysis of variance / deviance or parameter estimates table of what you may expect to see in the analysis.
  • 20. Data Analysis – Application (Statistical Software Packages)
          Features          | Excel             | Stata                                                      | SPSS                                     | SAS                        | R
          Learning curve    | Gradual/flat      | Steep/gradual                                              | Gradual/flat                             | Pretty steep               | Pretty steep
          User interface    | Point-and-click   | Programming/point-and-click                                | Mostly point-and-click                   | Programming                | Programming
          Data manipulation | Weak/moderate     | Very strong                                                | Moderate                                 | Very strong                | Very strong
          Data analysis     | Modest            | Powerful                                                   | Powerful                                 | Powerful/versatile         | Powerful/versatile
          Graphics          | Very good         | Very good                                                  | Very good                                | Good                       | Excellent
          Cost              | Part of MS Office | Affordable (perpetual licenses, renew only when upgrading) | Expensive (with annual license renewal)  | Expensive (annual renewal) | Open source (free)
  • 21. Data Analysis – Application (Using R): Outline
      • Installing R
      • R Environment: command prompt, RStudio, setting your workspace
      • Loading and installing R packages
      • Importing data into R – *.csv/*.xls
      • Saving R data
      • Data exploration – summary statistics
      • Graphing in R – boxplot
      • Data analysis – T-test / linear & logistic regression
  • 22. Introduction to R: Installing R
      • Download R from http://cran.r-project.org (CRAN – Comprehensive R Archive Network)
      • The R version changes over time; the current one is R-2.15.0
      • Installing RStudio
      • Setting up the work environment
  • 23. Introduction to R: R Environment
      • Command prompt: R is primarily command-driven software where instructions are typed at the command prompt (> ). R is case sensitive.
      • RStudio: RStudio has a limited set of commands that can be selected and executed from the menu.
      • Setting your workspace: it is important to set R preferences to suit your work environment; one such setting is the working directory. The working directory is set using the command setwd, e.g. setwd("D:/My Documents/R course"), or from File->Change dir on the menu.
      • Take note of the forward slash (/): R will not recognize a single backslash (\) when specifying subdirectories.
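A minimal sketch of checking and changing the working directory is given below. It uses a folder under R's temporary directory as a stand-in for the "R course" folder on the slide, so the path is an assumption chosen to make the example runnable on any machine.

```r
# Remember where we started so the change can be undone
old_dir <- getwd()

# Hypothetical course folder, created under the session's temporary directory
course_dir <- file.path(tempdir(), "R course")
dir.create(course_dir, showWarnings = FALSE)

# Set the working directory, then confirm it
setwd(course_dir)
current <- getwd()
print(current)

# file.path() builds paths with forward slashes, which R accepts on all platforms
data_path <- file.path(current, "PROD2B.csv")

# Return to the original working directory
setwd(old_dir)
```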
  • 24. Introduction to R: Loading and installing R packages
      • Modules or sets of functions are referred to as PACKAGES in R. Some packages are part of the base installation while others have to be installed separately. There are several user-contributed packages.
      • Type library() to view installed packages.
      • To view the functions within a package, type library(help="packagename"), e.g. library(help=stats) (the quotes are not required).
      • Install packages using the menu Packages->Install package(s)…
      • Use the find("item") command to identify the package containing an item of interest, e.g. find("plot"), if you are sure of the exact name; otherwise use apropos("item").
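The same package tasks can be done from the command prompt rather than the menu. The sketch below only queries what is already installed; the install.packages() line is left commented out because it needs an internet connection, and the package named there is just an example choice.

```r
# Load a package that ships with every R installation
library(stats)

# Names of all installed packages
installed <- rownames(installed.packages())
has_stats <- "stats" %in% installed

# Which attached package provides an object with this exact name?
where_plot <- find("plot")

# All visible objects whose names contain "boxplot"
boxplot_matches <- apropos("boxplot")

print(has_stats)
print(where_plot)
print(boxplot_matches)

# Installing a contributed package from CRAN (example name, requires internet):
# install.packages("lme4")
```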
  • 25. Introduction to R: Importing data into R – *.csv/*.xls
      • Although it is possible to enter data directly into R, importing data in a spreadsheet format is more efficient. Use:
      • i. read.table – to import delimited text data with column headings (*.txt); sep gives the separator (here a comma; leave it out for space- or tab-separated data):
          prod1 <- read.table("D:/My Documents/R course/PROD2B.txt", header=TRUE, sep=",")
      • ii. read.csv – to import comma-separated data with column headings (*.csv):
          prod2 <- read.csv("D:/My Documents/R course/PROD2B.csv", header=TRUE)
        To save the data from R to a file:
          write.table(prod2, file="D:/My Documents/R course/proddata2", quote=FALSE)
      • iii. odbcConnectExcel() – to import an Excel worksheet (the function comes from the RODBC package, which must be installed and loaded first):
          library(RODBC)
          prod3 <- "D:/My Documents/R course/PROD2B.xls"
          datachannel <- odbcConnectExcel(prod3)
          outprod3 <- sqlFetch(channel=datachannel, sqtable="prod3")
          write.table(outprod3, file="D:/My Documents/R course/proddata3", quote=FALSE)
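The import and save steps can be tried without the course files by writing a small data frame to a temporary CSV file and reading it back. The records below are invented; only the column names (HERD, SEX, WEIGHT, PCV) follow the variables used on the slides.

```r
# Hypothetical records standing in for the course dataset
prod <- data.frame(
  HERD   = c("A", "A", "B", "B"),
  SEX    = c("F", "M", "F", "M"),
  WEIGHT = c(210, 245, 198, 230),
  PCV    = c(29, 31, 27, 33)
)

# Write to a temporary CSV file, then import it with read.csv
csv_file <- tempfile(fileext = ".csv")
write.csv(prod, csv_file, row.names = FALSE)
prod2 <- read.csv(csv_file, header = TRUE)
str(prod2)

# Save the imported data in R's own format, remove it, and load it again
rdata_file <- tempfile(fileext = ".RData")
save(prod2, file = rdata_file)
rm(prod2)
load(rdata_file)   # restores the object prod2
```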
  • 26. Introduction to R: Data exploration – summary statistics
      • One can get summary statistics on all numeric variables in the dataset using summary(datasetname), e.g. summary(outprod3).
      • It is also possible to get summary statistics on a particular variable; use $ to pick the variable from the data table, e.g. summary(outprod3$WEIGHT).
      • It is advisable to attach a data file to avoid having to specify the data file all the time, particularly for long commands such as aggregate: attach(outprod3)
      • Use aggregate to get summary statistics by group/category (with the data attached), e.g. aggregate(data.frame(WEIGHT), by=list(herd=HERD, sex=SEX), mean)
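An alternative to attaching the data is the formula interface of aggregate(), which takes the data frame as an argument. This is a sketch on made-up records, offered as one option rather than the approach taught in the session.

```r
# Hypothetical records with two grouping variables
prod <- data.frame(
  HERD   = rep(c("A", "B"), each = 4),
  SEX    = rep(c("F", "M"), times = 4),
  WEIGHT = c(200, 240, 210, 250, 190, 230, 180, 220)
)

# Summary of the whole data frame and of a single variable
print(summary(prod))
print(summary(prod$WEIGHT))

# Mean weight by herd and sex, without attach()
by_group <- aggregate(WEIGHT ~ HERD + SEX, data = prod, FUN = mean)
print(by_group)
```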
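  • 27. Introduction to R: Graphing in R – boxplot
      • R has powerful graphing features that can be used in data exploration, such as histograms, boxplots, scatterplots, etc.
      • histogram() and xyplot() come from the lattice package, so load it first with library(lattice); boxplot() is part of base R. The colour argument is col. The examples assume the dataset has been attached.
          histogram(~PCV, nint=30, xlab="Packed Cell Volume")
          boxplot(PCV, ylab="Packed Cell Volume")
          boxplot(PCV~HERD, col="orange", ylab="Packed Cell Volume", xlab="Herd")
          xyplot(PCV~WEIGHT, col="orange", ylab="Packed Cell Volume", xlab="Weight")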
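The same three kinds of graph can also be drawn with base R graphics only (hist, boxplot, plot), which avoids loading any extra package. The sketch below uses simulated data and writes the graphs to a temporary PDF; the variable names follow the slides but the values are invented.

```r
set.seed(5)

# Hypothetical dataset with a herd factor, weight and PCV
prod <- data.frame(
  HERD   = factor(rep(c("A", "B", "C"), each = 15)),
  WEIGHT = runif(45, min = 150, max = 300)
)
prod$PCV <- 20 + 0.04 * prod$WEIGHT + rnorm(45, sd = 2)

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Histogram of one continuous variable
hist(prod$PCV, breaks = 10, xlab = "Packed Cell Volume", main = "")

# Boxplots of PCV by herd; col sets the fill colour
bp <- boxplot(PCV ~ HERD, data = prod, col = "orange",
              ylab = "Packed Cell Volume", xlab = "Herd")

# Scatterplot of two continuous variables
plot(PCV ~ WEIGHT, data = prod, col = "orange",
     ylab = "Packed Cell Volume", xlab = "Weight")

invisible(dev.off())
```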
  • 28. Introduction to R: Data analysis
      • T-test: t.test(prod2$WEIGHT~prod2$SEX)
      • Linear regression: remember to attach the dataset to make it active, attach(prod2), then fit output1<-lm(PCV~WEIGHT)
      • Logistic regression
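The three analyses named on this slide can be sketched together on simulated data. The slide gives no logistic regression command, so the glm() call below is an assumption about what such an example might look like, and the binary variable ANAEMIC is hypothetical. The data argument is used in place of attach().

```r
set.seed(9)

# Hypothetical dataset: sex, weight, PCV and a binary outcome
n    <- 60
prod <- data.frame(SEX = factor(rep(c("F", "M"), each = n / 2)))
prod$WEIGHT  <- rnorm(n, mean = ifelse(prod$SEX == "M", 230, 205), sd = 15)
prod$PCV     <- 20 + 0.04 * prod$WEIGHT + rnorm(n, sd = 2)
prod$ANAEMIC <- rbinom(n, size = 1, prob = plogis(8 - 0.04 * prod$WEIGHT))

# T-test: does mean weight differ between the sexes?
t_out <- t.test(WEIGHT ~ SEX, data = prod)

# Linear regression: continuous response, continuous explanatory variable
lin_out <- lm(PCV ~ WEIGHT, data = prod)

# Logistic regression: binary response, fitted as a binomial GLM
log_out <- glm(ANAEMIC ~ WEIGHT, family = binomial, data = prod)

print(t_out)
print(summary(lin_out))
print(summary(log_out))
```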
  • 29. References
      • Research Methods & Biometrics Teaching Resource – Case Study 1 & 4 have R, Case Study 2 has ANOVA; many others were used in this session as examples. The study guides are useful for reference material on Exploratory and Formal Analysis.
      • Take home:
          • Analysis the Data & Models Chapters in Green Book
          • Good Statistical Practice for Natural Resources Research – Part IV
          • R Intro Course Notes – Nicholas Ndiwa
          • Reading University SSC – Approaches to Analysis of Survey Data; Confidence & Significance: Key Concepts of Inferential Statistics; Modern methods of analysis; Analysis of Experimental Data

Editor's Notes

  1. May summarise at the different levels; the answer to ‘unit of analysis’ is that IT DEPENDS. Examples of variability at different levels include animal breeding, where we look at how different animals are in the same household and then in the same environment. Draw a diagram of response changes over time (e.g. plant/animal growth) and show how we can analyse various parts of the graph – average, total, final, slope etc.
  2. Exogenous = determined by variables outside the system / model. Endogenous = value determined by the states of other variables, a.k.a. correlated (= confounded if completely determined). Confounding = a distortion of an association between an exposure (E) and disease (D) brought about by extraneous factors (C1, C2 etc.). This problem occurs when E is associated with C and C is an independent risk factor for D. For example, smoking (C) confounds the relationship between alcohol consumption (E) and lung cancer (D), since alcohol and smoking are related, and smoking (C) is an independent risk factor for lung cancer (D). Biological interaction = occurs when there is a difference in the biologic effect of an exposure according to the presence or absence of another factor. Biological interaction can be thought of as effect modification, and is an example of antagonism and synergy. An example of interaction is seen in the case of oral contraceptive use (E), cardiovascular disease (D), and smoking (C). Because smoking (C) amplifies thromboembolic-disease risk (D) in oral contraceptive users, interaction is said to exist. This is why oral contraceptives carry a boxed warning advising against their use in smokers.
  3. Discrete – distinct values; the variable can only take certain values and can be ordinal, nominal or binary. In experiments these could be the treatments or blocks. In surveys they could be wealth levels, gender, education level. Can also be count data – e.g. plants germinated, number of times an animal is fed. Binary data are also discrete – e.g. presence / absence, yes / no. Continuous – numbers that can take any value. Examples – weights, heights, counts of plants etc. Reasons for changing continuous data to discrete may be the distribution of the data, biological knowledge of the variable and where differences may occur, and sometimes simplicity.
  4. Put in hyperlink to Case Study 11
  5. Decide on appropriate ways to answer the research questions / hypotheses / objectives & may identify additional ones. Decide on which variables to initially explore. Develop a provisional, draft report. Understand the data and their variability better. Study frequency distributions for classification variables and decide whether amalgamation of classification levels is needed. Study data distributions and decide whether transformations of data are required. Detect unusual values in the data and decide what to do about them. Investigate the types of patterns inherent in the data & types of relationships that may be present. Provide preliminary initial summaries, arranged in tables and graphs, that provide some of the answers required to meet the objectives. Lay out a plan and a series of objectives for the statistical modeling phase. Decide on which variables to model.
  6. Identify which methods work for which type of data: means = cont; 1-way / 2-way tables of means = both; frequency tables = discrete; histograms = both; scatterplots = 2 cont; boxplots = both; bar / pie charts = both. Go through the meaning and calculations for measures of variation – or get them to google to see what they find! Note that the calculations are more complex for surveys where we usually have multiple levels of data (households within villages within…). ADD IN HYPERLINKS TO EXAMPLES
  7. Simple formal analysis may use t-tests or chi-square tests – these are similar to exploratory analysis as they only look at 2 variables at the same time. These methods are then extended to statistical models such as linear regression.
  8. What’s the equation of a straight line? – y = mx + c where c = intercept and m = gradient. To this straight line we add the residual e, which is the vertical difference between the point and the line. Sum of e’s = 0. A linear regression uses what is called ‘least squares’ to minimise the sum of the squares of all these differences. Read section 3.1 of the statistical guides in the Biometrics & Research Methods CD. ADD HYPERLINK TO EXAMPLE
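The least-squares idea in this note can be checked by hand in R: compute the gradient m and intercept c from the usual closed-form expressions, confirm the residuals sum to zero, and compare with lm(). The five data points are invented, and the intercept is stored as c0 to avoid masking R's c() function.

```r
# Hypothetical points lying roughly on a straight line
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares gradient (m) and intercept (c0) for y = m*x + c0
m  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
c0 <- mean(y) - m * mean(x)

# Residuals: vertical differences between the points and the fitted line
e <- y - (m * x + c0)

# The same fit from R's linear model function
fit <- lm(y ~ x)

print(c(intercept = c0, gradient = m))
print(coef(fit))
print(sum(e))   # zero up to rounding error
```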
  9. Get students to read through section 3.2 of the Study Guide in the CD on Statistical modeling – the maths part! Exercise: 1. Identify what sort of model they may be using (from the type of data they have collected) – e.g. linear regression (section 3), designed experiment (section 4), response data which are proportions or binary (section 10), count response data (section 11), survival data (section 12)
  10. Give examples of the use of parameter estimates – take students through Statistical Modeling in Case Study 3. ADD HYPERLINK TO CASE STUDY
  11. Search for principal component analysis and install it.