It starts with an idea
Data Analysis
Descriptive Statistics and EDA
Giuseppe Procaccianti
Vrije Universiteit Amsterdam
2 Giuseppe Procaccianti / S2 group / The Green Lab
Quick Recap
Idea → Experiment scoping → Experiment planning → Experiment operation → Analysis & interpretation → Presentation & package
Analysis and Interpretation
● Understanding the data
○ descriptive statistics
○ exploratory data analysis (EDA, e.g. boxplots, scatter plots)
● (Optional) data reduction
● Hypothesis testing
● Results interpretation
Descriptive Statistics
● Goal: get a feel for how the data is distributed
● Properties:
○ Central Tendency (e.g. Mean, Median)
○ Dispersion (e.g. Frequency, Standard Deviation)
○ Dependency (e.g. Correlation)
Parameter vs. statistic
● Parameter: feature of the population
○ μ: mean
○ σ: standard deviation
● Statistic: feature of the sample
○ x̄: mean
○ s: standard deviation
● Statistics are estimates of parameters
Central Tendency
● Arithmetic mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
● Geometric mean: $\left( \prod_{i=1}^{n} x_i \right)^{1/n}$
Central Tendency: example
● Average of scores:
6 - 7 - 8 - 9 - 10
● Arithmetic mean: 8
● Geometric mean: ~7.87
Central Tendency: example
● Average of returns of investments:
90% ; 10% ; 20% ; 30% ; -90%
● Arithmetic mean:
(90 + 10 + 20 + 30 - 90) / 5 = 12%
● Geometric mean:
[(1.9 x 1.1 x 1.2 x 1.3 x 0.1)^(1/5)] - 1 = -0.2008 = -20.08%
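The two examples above can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not part of the original slides; it assumes Python 3.8 or later (for `geometric_mean`) and treats each return as a growth factor 1 + r, as in the calculation above.

```python
import statistics

scores = [6, 7, 8, 9, 10]
returns = [0.90, 0.10, 0.20, 0.30, -0.90]  # the slide's returns, written as fractions

# Scores: the two means are close because the values are similar in size.
scores_arithmetic = statistics.mean(scores)
scores_geometric = statistics.geometric_mean(scores)

# Returns: the geometric mean is taken over growth factors (1 + r),
# then converted back to an average return per period.
growth_factors = [1 + r for r in returns]
returns_arithmetic = statistics.mean(returns)
returns_geometric = statistics.geometric_mean(growth_factors) - 1

print(f"scores:  arithmetic={scores_arithmetic}, geometric={scores_geometric:.2f}")
print(f"returns: arithmetic={returns_arithmetic:.2%}, geometric={returns_geometric:.2%}")
```

The arithmetic mean of the returns is positive while the geometric mean is negative: the single -90% period outweighs the gains once returns are compounded.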
Central Tendency
● Median (or 50th percentile): middle value separating the greater and lesser halves of a data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
X_sort = [13, 13, 13, 13, 14, 14, 16, 18, 21]
Median = 14
Central Tendency
● Mode: most frequent value in data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Mo_X = 13
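As a quick check of the median and mode for the data set X, here is a small sketch using Python's standard `statistics` module (an illustration added here, not taken from the slides).

```python
import statistics

x = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Sorting makes the middle value visible: with 9 values it is the 5th one.
x_sorted = sorted(x)
middle_value = x_sorted[len(x_sorted) // 2]

median = statistics.median(x)  # middle value of the sorted data
mode = statistics.mode(x)      # most frequent value

print(x_sorted, median, mode)
```

For an even number of values, `statistics.median` returns the average of the two middle values.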
Central Tendency - Skewness
Dispersion
● Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
● Standard deviation: $s = \sqrt{s^2}$
● The standard deviation is expressed in the same unit as the data
Dispersion - three-sigma-rule
"Empirical Rule" by Dan Kernler - Own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG#/media/File:Empirical_Rule.PNG
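The three-sigma (empirical) rule can be checked numerically. The sketch below assumes normally distributed data and uses `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Coverage for 1, 2 and 3 standard deviations around the mean.
coverage = {k: share_within(k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} sigma: {share:.2%}")
```

The shares come out at roughly 68%, 95% and 99.7%. They hold only for (approximately) normal data; skewed data can deviate substantially.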
Dispersion - range and coefficient of variation
● Range: $x_{max} - x_{min}$
● Coefficient of variation: $CV = \frac{s}{\bar{x}}$ (standard deviation as a percentage of the mean)
● The coefficient of variation is only meaningful if all values are positive (ratio scale, not interval scale, e.g. temperatures)
Dispersion - example
● Dataset: [100, 100, 100]
● Mean: 100
● Variance: 0
● Standard Deviation: 0
● Coeff. Variation: 0
● Range: 0
Dispersion - example
● Dataset: [90, 100, 110]
● Mean: 100
● Sample Variance: 100
● Standard Deviation: 10
● Coeff. Variation: 10%
● Range: 20
Dispersion - example
● Dataset: [1, 5, 6, 8, 10, 40, 65, 88]
● Mean: 27.875
● Sample Variance: 1082.70
● Standard Deviation: 32.9
● Coeff. Variation: 118%
● Range: 87
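The three dispersion examples can be recomputed with a short helper. This is a sketch using Python's standard `statistics` module; the function name `describe_dispersion` is made up for illustration, and `statistics.variance` uses the sample (n - 1) definition.

```python
import statistics

def describe_dispersion(data):
    """Return mean, sample variance, standard deviation, CV and range of data."""
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance, divides by n - 1
    std_dev = statistics.stdev(data)      # square root of the sample variance
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "cv": std_dev / mean,             # as a ratio; multiply by 100 for a percentage
        "range": max(data) - min(data),
    }

datasets = [
    [100, 100, 100],
    [90, 100, 110],
    [1, 5, 6, 8, 10, 40, 65, 88],
]
summaries = [describe_dispersion(d) for d in datasets]

for data, summary in zip(datasets, summaries):
    print(data, {name: round(value, 2) for name, value in summary.items()})
```

In the third data set the standard deviation exceeds the mean, so the coefficient of variation is above 100%.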
Basic visualizations
Box Plot (figure labels: 1st quartile, median, 3rd quartile)
By Gbdivers (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Figure annotations: outliers; positive skewness
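The numbers a box plot displays can be computed directly. The sketch below is an assumption-laden illustration rather than the slides' method: it uses `statistics.quantiles` (Python 3.8 or later, default "exclusive" method) for the quartiles and the common 1.5 x IQR convention for flagging outliers, which the slides do not state. The function name `box_plot_summary` and the extra value 40 are hypothetical.

```python
import statistics

def box_plot_summary(data, whisker_factor=1.5):
    """Quartiles plus the values beyond whisker_factor * IQR from the box."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    outliers = sorted(v for v in data if v < lower_fence or v > upper_fence)
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

x = [13, 18, 13, 14, 13, 16, 14, 21, 13]
summary = box_plot_summary(x)

# Adding one hypothetical extreme value shows how an outlier is flagged.
summary_with_extreme = box_plot_summary(x + [40])

print(summary)
print(summary_with_extreme)
```

Quartile conventions differ between tools, so a plotting library may report slightly different box edges for the same data.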
Dependency: correlation
● Sample correlation coefficient (Pearson): $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$
● Meaningful when comparing paired values/datasets
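A minimal sketch of the Pearson coefficient for paired values, written from the standard definition. The function name `pearson_r` and the sample data are hypothetical, chosen only to show the two extreme cases.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equally long sequences of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations; the (n - 1) factors cancel.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_increasing = pearson_r(xs, [2, 4, 6, 8, 10])  # perfectly linear, increasing
r_decreasing = pearson_r(xs, [10, 8, 6, 4, 2])  # perfectly linear, decreasing

print(r_increasing, r_decreasing)
```

The function fails with a division by zero if either sequence is constant, because the correlation is undefined in that case.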
Dependency: correlation
● Spearman’s rank correlation coefficient: ρ
● Kendall’s rank correlation coefficient: τ
○ usually yields smaller values than Spearman’s ρ
○ more accurate on small samples
● The Pearson correlation coefficient assumes normally distributed data
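Both rank coefficients can be sketched in a few lines. The variants below are assumptions, since the slides do not say which ones they use: Spearman's coefficient is computed as the Pearson correlation of average ranks, and Kendall's coefficient is the simple tau-a variant (no tie correction). All function names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def kendall_tau_a(xs, ys):
    """(concordant - discordant) pairs divided by the total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 4, 5]  # one swapped pair
rho = spearman_rho(xs, ys)
tau = kendall_tau_a(xs, ys)
print(rho, tau)
```

On this small sample Kendall's value is lower than Spearman's, in line with the "smaller values" remark above.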
Dependency: example
Age vs. body fat %
● Pearson: r = 0.7921
● Spearman: ρ = 0.7539
● Kendall: τ = 0.5762
Basic Visualizations
Scatter Plot
Basic Visualizations
Scatter plots for different values of r
Image Source: http://www.cqeacademy.com/cqe-body-of-knowledge/continuous-improvement/quality-control-tools/the-scatter-plot-linear-regression/
Correlation does NOT imply causation!
● Spurious Correlations: http://tylervigen.com/
Thank you!
g.procaccianti@vu.nl
i.malavolta@vu.nl
The Green Lab - [07-A] Data Analysis
