Exploratory Data Analysis 
Wesley GOI
In today’s session 
• Principles behind exploratory analyses 
• Plotting data on popular exploratory graphs 
• Plotting Systems in R 
  • Base (Week1) 
  • Lattice (Week2) 
  • ggplot2 (Week2) 
• Choosing and using graphics devices, i.e. the output formats 
Scripts can be downloaded at: 
https://www.dropbox.com/s/ii1yj8f650d4l1q/lesson1.r?dl=0 
https://www.dropbox.com/s/eme44h6lrhn775l/final.r?dl=0
Principles behind exploratory analyses 
• Show comparisons 
• Show causality, mechanism, explanation 
• Show multivariate data 
• Integrate multiple modes of evidence 
• Describe and document the evidence 
• Content is king 
• SPEED
Dimensionality 
• Five-number summary 
• Boxplots 
• Histograms 
• Density plot 
• Barplot 
• Multiple overlaid 1D plots 
• Scatter plots
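The five-number summary at the top of this list is available directly in base R. A minimal sketch, assuming a small made-up vector (`rain_mm` is a hypothetical name, not part of the course data):

```r
# Hypothetical daily rainfall values, used only for illustration
rain_mm <- c(0, 0.2, 1.5, 3.1, 4.8, 7.9, 12.4)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(rain_mm)
print(five)

# summary() reports the quartiles plus the mean
print(summary(rain_mm))

# A boxplot is built from these numbers; points far beyond the hinges
# are drawn separately as outliers
boxplot(rain_mm, horizontal = TRUE, main = "Five-number summary as a boxplot")
```

The same calls should work unchanged on a column of a data frame, for example a rainfall column once the course dataset is loaded.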
Downloading our dataset 
R code 
dir.create("exploring_data") 
setwd("exploring_data") 
download.file("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/therbook.zip", destfile="data.zip") 
unzip("data.zip")
Boxplots 
R code 
# header = TRUE (h=T for short): the first row of the file holds the column names 
weather = read.table("SilwoodWeather.txt", header=TRUE) 
onemonth = subset(weather, month==1 & yr==2004) 
boxplot(onemonth$rain)
Histograms 
R code 
hist(weather$upper) 
rug(weather$upper)  # adds a tick for each value
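The density plot listed under Dimensionality can be overlaid on a histogram in the same way. A minimal sketch, assuming the built-in `faithful` dataset as a stand-in for `weather$upper`:

```r
# Built-in dataset, used here only as a stand-in for weather$upper
x <- faithful$eruptions

# freq = FALSE puts the histogram on a density scale so the curve is comparable
hist(x, freq = FALSE, main = "Histogram with density curve", xlab = "Value")
rug(x)                       # one tick per observation
dens <- density(x)           # kernel density estimate
lines(dens, col = "blue")    # overlay the smooth curve
```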
Barplot 
R code 
barplot( 
  table(weather$month), 
  col = "wheat", 
  main = "Number of Observations in Months")
Graphic devices (grDevices) 

               PNG      PDF      SVG 
Type           Raster   Vector   Vector 
File size      small    medium   medium 
Scalable       No       Yes      Yes 
Web friendly   Yes      No       Yes
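Writing to one of these devices follows an open, plot, close pattern. A minimal sketch, assuming the built-in `cars` dataset and a hypothetical file name inside `tempdir()`; `png()` and `svg()` are opened and closed the same way, although their availability depends on how R was built:

```r
# Hypothetical output path; tempdir() keeps the working directory clean
pdf_file <- file.path(tempdir(), "example_plot.pdf")

pdf(pdf_file, width = 7, height = 5)   # open the device (size in inches)
plot(cars, main = "Speed vs stopping distance")
dev.off()                              # close the device to finish writing the file
```

Nothing is written to disk until `dev.off()` is called, so a missing `dev.off()` is a common cause of empty or unreadable output files.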
Plotting Systems 

                        Base              Lattice                  Grid 
Libraries               -                 lattice                  grid, gridExtra, ggplot2 
Example functions       hist ✔            xyplot (scatterplots)    qplot 
                        barplot ✔         bwplot (boxplots)        ggplot 
                        boxplot ✔         levelplot                geom 
                        plot 
Faceted plots           Yes               Yes                      Yes 
Grammar of graphics     No                No                       Yes 
Interface with          Yes               Partial                  Partial + workarounds 
statistical functions 

Note: base plots and grid-based plots cannot be mixed.
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T)
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T) 
# Color: col sets the color of the points 
with(data1, plot(xv, ys, col="red")) 
# Regression line 
with(data1, abline(lm(ys~xv)))
Base plots: Scatterplot 
Set symbol to represent data point
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T) 
# Color 
with(data1, plot(xv, ys, col="red")) 
with(data1, abline(lm(ys~xv))) 
# Shape: pch sets the symbol shape 
with(data2, points(xv2, ys2, col="blue", pch=11))
Base plots: Using par for multiple plots 
R code 
par(mfrow=c(1,2)) 
# Plot 1 
with(data1, plot(xv, ys, col="red")) 
with(data1, abline(lm(ys~xv))) 
# Plot 2 
with(data2, plot(xv2, ys2, col="blue", pch=11)) 
# outer=TRUE places the title in the outer margin (set with oma in par) 
title("My Title", outer=TRUE)
Par: To set global settings 
R code 
par( 
  mar=c(5.1,4.1,4.1,2.1),  # plot margins in lines of text: bottom, left, top, right 
  oma=c(2,2,2,2)           # outer margins, same order 
)
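Because par() changes settings globally, it helps to save and restore the previous values. A minimal sketch, assuming the built-in `cars` dataset in place of the scatter files (`old_par` is a name chosen here for illustration):

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2),              # 1 row, 2 columns of plots
               mar = c(5.1, 4.1, 4.1, 2.1),  # default plot margins
               oma = c(0, 0, 2, 0))          # reserve 2 lines at the top for a shared title

plot(cars$speed, cars$dist, col = "red", xlab = "speed", ylab = "dist")
abline(lm(dist ~ speed, data = cars))
hist(cars$dist, main = "", xlab = "dist")

# outer = TRUE writes into the outer margin reserved by oma
title("My Title", outer = TRUE)

par(old_par)   # restore the previous settings
```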
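Lattice 
R code 
productivity = read.table("productivity.txt", header=TRUE) 
# Number of species in forest against differing productivity 
library(lattice) 
# Plotting: the first argument is a formula, the second the data frame. 
# Lattice formulas read vertical ~ horizontal, so y~x puts x on the horizontal axis. 
xyplot(y~x, productivity, 
  xlab=list(label="Productivity"), 
  ylab=list(label="Mammal Species"))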
Lattice 
R code 
productivity = read.table("productivity.txt", header=TRUE) 
# Number of species in forest against differing productivity 
library(lattice) 
# Plotting: formula (vertical ~ horizontal), then the data frame 
xyplot(y~x, productivity, 
  xlab=list(label="Productivity"), 
  ylab=list(label="Mammal Species")) 
# Conditioning: "| f" reads as "given f" and draws one panel per level of f 
xyplot(y~x | f, productivity, 
  xlab=list(label="Productivity"), 
  ylab=list(label="Mammal Species"))
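The "given" operator can be tried without the course files. A minimal sketch, assuming the built-in `mtcars` dataset (`cars_df` and `p` are names chosen here for illustration):

```r
library(lattice)

# cyl has three levels, so the conditioned plot gets three panels
cars_df <- transform(mtcars, cyl = factor(cyl))

# Formula reads: vertical ~ horizontal | conditioning variable
p <- xyplot(mpg ~ wt | cyl, data = cars_df,
            xlab = "Weight (1000 lbs)",
            ylab = "Miles per gallon")
print(p)   # lattice plots are objects; print() draws them
```

Unlike base graphics, the call returns an object rather than drawing immediately, which is why an explicit print() is needed inside scripts and functions.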
ggplot2 
• Grammar of graphics (gg) 
• Based on the grid plotting system; cannot be mixed with base graphics 
ggplot2.org
ggplot 
Components 
• Data & relationship 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting
ggplot 
Building up a plot: 
• Data 
• Mapping 
• Geometric objects, aka geoms 
• Coordinate system with respect to scales (log scale / sqrt / log ratio) 
• Title, plot, theme, etc.
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting 
R code 
# Remember to change month into a factor 
weather$month = factor(weather$month) 
# weather is the data.frame; aes() is the aesthetics function that maps the relationships 
ggplot(weather, aes(x=month, y=upper))+ 
  geom_boxplot()
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales 
• Coordinate system 
• Facetting 
R code 
library(dplyr)  # provides %>%, group_by() and summarise() 
weather2 = weather %>% 
  group_by(month) %>% 
  summarise(average.upper = mean(upper)) 
ggplot(weather2, aes(month, average.upper))+ 
  geom_bar(stat="identity")
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales✔ 
• Coordinate system 
• Facetting 
R code 
plot2 = ggplot(weather2, aes(month, average.upper))+ 
  geom_bar(aes(fill=month), stat="identity")+ 
  scale_fill_brewer(palette="Set3")+ 
  xlab("Months")+ 
  ylab("Upper Quantile")+ 
  theme_bw()
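The two components not yet ticked off, coordinate system and faceting, are added as further layers. A minimal sketch, assuming the built-in `mtcars` dataset as a stand-in for the weather table (`df` and `p` are names chosen here for illustration):

```r
library(ggplot2)

# Built-in data; cyl and gear are converted to factors so they are treated as discrete
df <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  coord_flip() +          # coordinate system: swap the x and y axes
  facet_wrap(~ gear) +    # faceting: one panel per value of gear
  theme_bw()
print(p)
```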
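qplot 
A separate function that wraps ggplot, for simpler syntax 
R code 
qplot(month, upper, fill=month, data=weather, facets = ~yr, geom="bar", stat="identity") 
# Recent ggplot2 releases deprecate qplot() and its stat argument. The ggplot() form is: 
# ggplot(weather, aes(month, upper, fill=month)) + geom_bar(stat="identity") + facet_wrap(~yr)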
Ethos behind visualization 
http://keylines.com/network-visualization
Final Challenge 
R code 
library(ggplot2) 
library(dplyr)  # provides %>%, group_by() and summarise() 
# Read in the data 
data = read.csv("final.csv") 
# Prepare the rectangle background: one rectangle per planning region 
areas = unique(subset(data, select=c(Planning_Area, Planning_Region))) 
areas = areas[order(areas$Planning_Region),] 
areas$rectid = 1:nrow(areas) 
rectdata = areas %>% 
  group_by(Planning_Region) %>% 
  summarise(xstart = min(rectid) - 0.5, xend = max(rectid) + 0.5) 
# Order the factor levels so that areas in the same region sit together 
data$Planning_Area = factor(data$Planning_Area, 
  levels=as.character(areas[order(areas$Planning_Region),]$Planning_Area))
Final challenge 
R code 
# Plot 
p0 = ggplot(data, aes(Planning_Area, Unit_Price____psm_))+ 
  geom_boxplot(outlier.colour=NA)+ 
  geom_rect(data=rectdata, 
    aes(xmin=xstart, xmax=xend, ymin=-Inf, ymax=Inf, 
        fill=Planning_Region, group=Planning_Region), 
    alpha=0.4, inherit.aes=FALSE)+ 
  geom_jitter(alpha=0.40, aes(color=as.factor(Year)))+ 
  scale_color_brewer("Year", palette='RdBu')+ 
  scale_fill_brewer(palette="Set1", name='Region')+ 
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+ 
  xlab("Planning Area")+ylab("Unit Price (PSM)") 
# Save plot 
ggsave("areaboxplots.pdf", plot=p0, width=20, height=10, units="in", dpi=300)
“Above all else show the data.” 
― Edward R. Tufte, The Visual Display of Quantitative Information 
Thank you for your time
Exploratory Analysis Part1 Coursera DataScience Specialisation
