Using Qplot and R Tutorial 1
                                    Abhik Seal



This tutorial shows you how to use ggplot2, an R package for visualizing data. It does not
cover every function in ggplot2, only the basic and most important functions you need to
visualize your data.



1. Getting R
   Download R for Windows, Linux, and Mac from http://www.r-project.org/
2. Install ggplot2
   Type install.packages("ggplot2") in the R command window and select any
   of the mirror sites to download ggplot2.
3. After installing the ggplot2 package, use the command
   library(ggplot2) to load the package.
   Figure 1 shows a screenshot of the previous steps.
Fig 1



1. Learning qplot()
   qplot() stands for "quick plot". It can produce complex plots with a single line of code. The
   qplot() function is part of the ggplot2 package, which is based on the grammar of graphics.

2. Datasets used: ggplot2 comes with various datasets. For the time being we will use the
   diamonds dataset from the ggplot2 package.
   To see what the diamonds data looks like, you can type
   >diamonds # to get an idea of the diamonds data; or, to view it in Excel, type
   >write.csv(diamonds, "E:/diamonds.csv")

   Open the file diamonds.csv. Figure 2 shows the diamonds table, which contains
   carat, cut, color, and clarity, the price, and 5 physical measurements, i.e. depth, table, and the
   dimensions x, y, z.
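Before opening the file in Excel, it can help to inspect the data from within R. The following is a minimal sketch, assuming ggplot2 is installed. It writes the CSV to a temporary directory instead of the E: drive; that path is an assumption made here only so the example runs on any system.

```r
library(ggplot2)

# Number of rows and columns, and the column names, of the diamonds data frame
dim(diamonds)
names(diamonds)

# First few rows, and the type of each column
head(diamonds)
str(diamonds)

# Export to CSV; tempdir() is used here only so the sketch is portable
csv_path <- file.path(tempdir(), "diamonds.csv")
write.csv(diamonds, csv_path, row.names = FALSE)
```

Replace csv_path with any folder you can write to, such as the E:/ path used above.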




Fig 2
Since the dataset contains more than 50,000 rows, we will use a small random sample of
about 250 rows to analyse and visualize. To select a random
sample from the data, use

>dsmall <- diamonds[sample(nrow(diamonds), 250), ]
>dsmall # dsmall contains the data sampled from the diamonds dataset.
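
Because sample() draws different rows on every run, your plots will differ slightly from the figures in this tutorial. A minimal sketch of a reproducible sample follows; the seed value 42 is an arbitrary choice for illustration and does not come from the tutorial.

```r
library(ggplot2)

# Fix the random number generator so the same 250 rows are drawn each time
set.seed(42)  # arbitrary seed, chosen for illustration only
dsmall <- diamonds[sample(nrow(diamonds), 250), ]

# Check the sample: 250 rows, same columns as diamonds
nrow(dsmall)
ncol(dsmall) == ncol(diamonds)
```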

Plotting the data with qplot()
The general syntax of a simple call to qplot is as follows:
> qplot(x = ???, y = ???, data = ???, color = ???, shape = ???, geom =
???, main = "my plot title")

The arguments are:
x - The x values to plot; they must be a variable in the data frame you specify, and the name is written
without quotes. Note that if you give only x values (no y values), you are plotting univariate data and
qplot figures this out. However, you have to give a geom that makes sense for univariate data.
y - The y values to plot; they must be a variable in the data frame you specify, and again, no quotes
around the name. If you are providing y values, you have to specify a geom that makes sense with
bivariate data.
data - The name of the data frame which contains the x and y values.
color - Perhaps surprisingly, not a set of colors to use, but rather a "mapping" of the color scheme onto
some variable in your data frame. You are basically telling qplot to use different colors for different
values of the variable you specify; hence this variable should usually be a factor, not a number. qplot decides
which colors to use.
shape - Exactly as for color, except that different symbols will be used for each value of the variable you
specify. Note that you can use either color = ?? or shape = ?? or both, depending upon how you want
your plot to look. qplot decides which symbols to use.
geom - A "geom" specification, which is basically a list of keywords describing what to plot. Common
examples are "histogram", "density", "line", "point", which pretty much do what they say. The geom must
make sense for the kind of data you are supplying.
main - The title for the plot.
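
The template above can be filled in as follows. This is a sketch that re-creates the dsmall sample so the block runs on its own; the seed and the title text are illustrative choices, not part of the tutorial. Recent ggplot2 releases mark qplot() as deprecated, so a warning may appear, and mapping the ordered factor cut to shape may also produce a warning.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, for illustration only
dsmall <- diamonds[sample(nrow(diamonds), 250), ]

# Every argument from the template is given explicitly:
# cut is mapped to both the color and the symbol of each point
p <- qplot(x = carat, y = price, data = dsmall,
           color = cut, shape = cut,
           geom = "point", main = "Price against carat")
print(p)
```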

> qplot(price, carat, data = dsmall) # price on the x axis and carat on the y axis. Fig 3a
# Fig 3a shows the relationship between diamond price and carat. As the carat (weight) increases, the price also
# increases.

> qplot(log(price), log(carat), data = dsmall) # taking the log of the data. Fig 3b
# Fig 3b shows the data on a logarithmic scale. Logarithmic scales are useful when the values span a very
# wide range. The figure shows a linear relationship between the variables on the logarithmic scale.
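
The same log-log view can also be produced without transforming the variables by hand. The sketch below assumes the log argument of qplot(), which accepts "x", "y" or "xy"; the axes then keep the original units but are spaced logarithmically. The correlation check at the end is an addition for illustration, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, for illustration only
dsmall <- diamonds[sample(nrow(diamonds), 250), ]

# Let qplot() put both axes on a log scale instead of calling log() on the data
p_log <- qplot(price, carat, data = dsmall, log = "xy")
print(p_log)

# Strength of the linear relationship on the log scale
r_log <- cor(log(dsmall$price), log(dsmall$carat))
r_log
```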

> qplot(carat, x*y*z, data = dsmall) # x*y*z approximates the volume of the diamond. Fig 3c
# Fig 3c shows the weight of the diamond, i.e. carat, against x*y*z, i.e. the volume of the diamond.
> qplot(carat, price, data = dsmall, color = I("red")) # set the color of the dots to red




Fig 3 (panels a, b, c)

> qplot(carat, price, data = dsmall, colour = cut) # Fig 4a
# The graph distinguishes the different cuts in the sampled dsmall table, with carat on the x axis and price on
# the y axis. For premium and very good quality diamonds, price increases roughly linearly with carat.
# More variation is observed in the good and fair diamonds. This command maps cut to the color of the
# points.

> qplot(carat, price, data = dsmall, shape = cut) # Fig 4b
# Here the plot is similar to Fig 4a, but shapes are used in place of colors.

> qplot(carat, price, data = diamonds, alpha = I(1/100)) # Fig 4c
# The alpha aesthetic controls transparency, which ranges from 0 (completely transparent) to
# 1 (completely opaque). Here it is applied to the full diamonds dataset. Transparency makes it possible to see
# where most of the points are concentrated.




Fig 4 (panels a, b, c)
When a plot contains many points, it becomes very difficult to see what trend the data actually shows.
Adding a trend line or a smoothed line to the plot helps to visualize the direction in which the data is
moving. The span parameter controls the wiggliness of the line: when the span is close to 0, the line follows
as many points as possible, which makes it very wiggly; when the span is close to 1, the line is smooth.
> qplot(carat, x*y*z, data = dsmall, geom = c("point", "smooth"), span = 1) # Fig 5a
> qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 1) # Fig 5b
> qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 0.2) # Fig 5c




Fig 5 (panels a, b, c)
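
The effect of span can also be seen with the full ggplot() syntax, where the smoother is named explicitly. This is a sketch under the assumption that a loess smoother is wanted (span only applies to loess); make_smooth_plot is a hypothetical helper introduced for illustration, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, for illustration only
dsmall <- diamonds[sample(nrow(diamonds), 250), ]

# Hypothetical helper: scatter plot of price against carat with a loess
# smoother whose wiggliness is set by span
make_smooth_plot <- function(span) {
  ggplot(dsmall, aes(carat, price)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, span = span)
}

p_wiggly <- make_smooth_plot(0.2)  # follows local variation closely
p_smooth <- make_smooth_plot(1)    # follows only the overall trend
print(p_wiggly)
print(p_smooth)
```

With only 250 points and many repeated carat values, loess may print numerical warnings for small spans; the plot is still drawn.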
> qplot(color, price/carat, data = dsmall, geom = c("boxplot", "jitter")) # Fig 6c
# Here the box plot summarizes the data with only five numbers, i.e. the sample minimum, lower quartile,
# median, upper quartile, and the largest observation. Box plots are therefore often more
# informative than jitter plots, which suffer from overplotting, as in Fig 6a. When we
# increase the transparency with the alpha parameter, we can more easily see where most of the points lie.
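
The five numbers behind each box can be computed directly. The sketch below uses base R's fivenum() on the full diamonds data; the row labels are added here for readability. Note that ggplot2's box plot draws whiskers only as far as the most extreme point within 1.5 times the interquartile range of the box and plots points beyond that individually, so the whisker ends need not match the minimum and maximum shown here.

```r
library(ggplot2)

# Price per carat for every diamond
price_per_carat <- diamonds$price / diamonds$carat

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum;
# split() groups the values by color, giving one column per color
summary_by_color <- sapply(split(price_per_carat, diamonds$color), fivenum)
rownames(summary_by_color) <- c("min", "lower", "median", "upper", "max")
summary_by_color
```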

> qplot(color, price / carat, data = diamonds, geom = "jitter", alpha = I(1 / 5)) # Fig 6a
# Jitter plots and box plots show how a continuous variable is distributed across the levels of a
# categorical variable. Categorical variables are those which are not quantitative; a quantitative variable
# is one whose value is measured numerically. Jittering helps to investigate the distribution of price per
# carat conditional on color. Here, as the color improves, the spread of values decreases.
> qplot(color, price / carat, data = diamonds, geom = "jitter", alpha = I(1/100)) # Fig 6b
Fig 6 (panels a, b, c)
Histograms and density plots show the distribution of univariate data. They provide more information
about the distribution of a single variable than box plots do.
> qplot(price, data = diamonds, geom = "density", color = color) # Fig 7a
Figure 7a shows the distribution of diamond prices for each level of diamond color.
> qplot(price, data = diamonds, geom = "histogram", fill = color) # Fig 7b
In Fig 7b you can see that the largest number of diamonds, across all colors, falls in the price range around 1200.
> qplot(color, data = diamonds, geom = "bar", weight = carat) +
scale_y_continuous("carat") # Fig 7c: total carat weight for each color




Fig 7 (panels a, b, c)


> qplot(carat, ..density.., data = diamonds, facets = color ~ ., geom =
"histogram", binwidth = 0.1, xlim = c(0, 3)) # Fig 8a
# xlim sets the limits of the x axis and binwidth sets the width of the histogram bins. Facets are specified
# with a formula of the form row variable ~ column variable. Faceting on more than one variable, e.g. two or
# three, makes the graph take much longer to compute and much more complex to read. Here color is used as
# the row variable.

> qplot(carat, ..density.., data = diamonds, facets = . ~ color,
geom = "histogram", binwidth = 0.1, xlim = c(0, 3)) # Fig 8b
# Here color is used as the column variable.
Comparing Figures 8a and 8b, 8a is much more informative than 8b because in 8b
the bars are much more crowded and harder to interpret than the bars in 8a. High-quality diamonds (colour D) are
skewed towards small sizes, and as quality declines the distribution becomes flatter.
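
The two facet layouts can also be written with the full ggplot() syntax, which makes the row ~ column formula explicit. This is a sketch assuming a ggplot2 version that provides after_stat() (3.3.0 or later), used here in place of the older ..density.. notation; the object names are illustrative.

```r
library(ggplot2)

# Shared base plot: density histogram of carat, limited to 0-3 carats
base_hist <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(binwidth = 0.1) +
  xlim(0, 3)

p_rows <- base_hist + facet_grid(color ~ .)  # one row per color, as in Fig 8a
p_cols <- base_hist + facet_grid(. ~ color)  # one column per color, as in Fig 8b
print(p_rows)
print(p_cols)
```

xlim() drops the few diamonds heavier than 3 carats, so a warning about removed rows is expected.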




Fig 8 (panels a, b)

Now we use the maps package, which contains maps of the USA, the world, Italy, New Zealand, and France.

To install and load maps
> install.packages("maps")
> library(maps)
The maps package has various datasets; one of them is us.cities. To load the dataset and look at it, type

> data(us.cities)
> us.cities
The cities have population (pop) as a variable. To select the cities with a population greater than 500000:
> sample_city <- subset(us.cities, pop > 500000)
> qplot(long, lat, data = sample_city) + borders("state", size = 0.5)
Note that when plotting points on a map in R you have to provide the longitude and latitude of each
location; otherwise the point cannot be plotted on the map.
> qplot(long, lat, data = sample_city, size = pop) + borders("state", size = 0.5)
The size aesthetic lets you visualize each city's population as the size of its point.
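
The map can also be built with the full ggplot() syntax, drawing the state outlines first so that the city points sit on top of them. This is a sketch assuming both ggplot2 and maps are installed; the name p_map is illustrative, and newer ggplot2 releases may print a deprecation message for borders().

```r
library(ggplot2)
library(maps)  # provides us.cities and the "state" map used by borders()

data(us.cities)
sample_city <- subset(us.cities, pop > 500000)

# State outlines first, then one point per city scaled by its population
p_map <- ggplot(sample_city, aes(long, lat)) +
  borders("state") +
  geom_point(aes(size = pop))
print(p_map)
```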




Note that the size argument in the borders() function sets the thickness of the boundary lines. If it is
increased from 0.5 to 1.0, the borders are drawn thicker, as the diagram given below shows.
