R Intro, Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
Don’t just listen to me!
Other intros to R:
  • http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
  • http://www.cyclismo.org/tutorial/R/
  • http://www.r-tutor.com/r-introduction
  • Quick R: http://www.statmethods.net/
  • http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
R user frameworks
  • R from the command line (OSX and PC): just type “R” into the command line and have fun!
  • R itself: http://www.r-project.org/
  • RStudio (a good choice): http://www.rstudio.org/
  • RevolutionR [free academic version], sort of the SAS-ised version of R: http://www.revolutionanalytics.com/downloads/free-academic.php
      • Uses a proprietary .xdf file format that speeds up computation times
  • Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
      • With these interfaces you can see which code runs when you press a button, and learn from it
R user frameworks, cont.
  • R from Python: RPy, http://rpy.sourceforge.net/
  • C/C++ from R: Rcpp package, http://cran.r-project.org/web/packages/Rcpp/index.html and http://dirk.eddelbuettel.com/code/rcpp.html
      • Can hugely speed up computation by writing functions in C++; the R function then calls the compiled code instead of running R code
      • E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html & http://dirk.eddelbuettel.com/code/rcpp.examples.html
  • Excel from R: XLConnect package, http://cran.r-project.org/web/packages/XLConnect/index.html
  • And more... see for yourself
R Tips
  • R can crash. Do not use R’s built-in text editor, and do not write code only in the R console. Instead, use any text editor that integrates with R. See here for links: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • When asking for help on mailing lists or help websites, use BRIEF and REPRODUCIBLE examples. Not doing this makes people not want to help you!
  • R overwrites files with the same file name without asking!!!! Make sure you want to overwrite a file before doing so
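The overwrite tip can be turned into a small guard. This is only a sketch: `safe_write_csv` is a hypothetical helper name (not part of base R), and the temporary file path is chosen purely for illustration.

```r
# Hypothetical helper (not part of base R): write a CSV only if doing so
# would not silently replace an existing file.
safe_write_csv <- function(df, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File already exists: ", path, " (set overwrite = TRUE to replace it)")
  }
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

# Usage: the first call writes the file, the second call is refused.
out_file <- file.path(tempdir(), "safe_write_demo.csv")
if (file.exists(out_file)) file.remove(out_file)
safe_write_csv(head(iris), out_file)
second_try <- try(safe_write_csv(head(iris), out_file), silent = TRUE)
```

Plain `write.csv()` and `save()` would have replaced the file on the second call without any message, which is the behaviour the tip warns about.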
Style
Not this kind of style…
This kind of style!!!
Style
  • Style is important so YOU and OTHERS can read your code and actually use it
  • Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
  • Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
  • Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style
Preparing your data for R
What makes clean data?
  • Correct spelling
  • Identical capitalization (e.g. Premna vs. premna)
      • R is case-sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
  • No spaces between words (spaces in column names are turned into “.” on import); generally try to avoid spaces and use underscores instead
  • NA or blank (if using csv) for missing values
  • Use find and replace to get rid of spaces after words
  • I generally keep both an .xls and a .csv file, so you can always recreate your work in R from the .csv file and still modify the .xls file
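The cleaning points above can be checked directly in R. A minimal sketch with made-up values; the column names and the species vector are illustrative assumptions, not data from the course files.

```r
# R is case-sensitive: myvector and Myvector are different names.
myvector <- c(3, 4, 5)
has_lower <- exists("myvector")

# Spaces in names become "." when names are made syntactically valid,
# which read.csv() does by default (check.names = TRUE).
raw_names <- c("plant species", "leaf_length")
clean_names <- make.names(raw_names)

# Capitalization differences and trailing spaces create distinct values.
species <- c("Premna", "premna", "Premna ")
n_distinct_raw <- length(unique(species))
n_distinct_fixed <- length(unique(tolower(trimws(species))))
```

Here `trimws()` (available in R 3.2.0 and later) removes the trailing space, and `tolower()` removes the capitalization difference, so the three raw spellings collapse to one value.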
Bringing data into R
  • Create a csv file
      • One worksheet only
      • No special formatting, filters, comments, etc.
      • Copy only the columns and rows with your data to the CSV, as R will sometimes read in columns without data
  • Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
  • In R, set the working directory: setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
      • What is the working directory? getwd()
      • What is in the working directory? dir()
  • Read in data
      • CSV files: iris.df <- read.csv("iris_df.csv", header=TRUE)
      • Clipboard: read.csv("clipboard") reads in copied data, like cutting and pasting it (on OSX, use read.csv(pipe("pbpaste")) instead)
      • From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
      • From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet="Sheet1")
  • Write data
      • write.csv(dataframe, "dataframename.csv"), OR
      • save(iris, file="iris.RData") [and load("iris.RData") to open in R]
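The read and write steps can be tried without any course files. A minimal sketch, assuming the built-in `iris` dataset as a stand-in for `iris_df.csv` and a temporary directory in place of the working directory:

```r
# Write the built-in iris data to a temporary CSV, then read it back.
csv_path <- file.path(tempdir(), "iris_df.csv")
write.csv(iris, csv_path, row.names = FALSE)
iris.df <- read.csv(csv_path, header = TRUE)

# save() needs the file argument to be named; load() restores the object
# under the name it had when it was saved.
rdata_path <- file.path(tempdir(), "iris.RData")
iris_copy <- iris
save(iris_copy, file = rdata_path)
rm(iris_copy)
load(rdata_path)
```

Setting `row.names = FALSE` keeps `write.csv()` from adding an extra column of row numbers, so the file read back has the same five columns as the original.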
R data structures
  • Scalar: an object with a single value, either numeric or character (in R, a vector of length one)
  • Vector: a sequence of values of a single type (e.g. all numeric or all character), which may include NA
  • List: an arbitrary collection of objects of any type; a very useful R object
  • Character: text, e.g., “this is some text”
  • Factor: like character vectors, but only with values in predefined “levels”
  • Matrix: two-dimensional; all values must be of the same type (often numeric)
  • Dataframe: each column can be of a different class
  • Immutable dataframe: a special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe for faster calculations
  • Function
  • Environment
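A short sketch of the main structures listed above, using made-up values. All object names here are illustrative.

```r
x <- 5                                         # a "scalar" is a vector of length 1
v <- c(1, 2, NA)                               # numeric vector with a missing value
mixed <- c(1, "a")                             # mixing types coerces everything to character
l <- list(n = 1, s = "text", v = v)            # list elements can differ in type
f <- factor(c("low", "high", "low"),
            levels = c("low", "high"))         # values restricted to predefined levels
m <- matrix(c("a", "b", "c", "d"), nrow = 2)   # a matrix holds one type, here character
df <- data.frame(id = 1:3, size = f)           # columns can differ in class

# Inspect what each object is.
classes <- c(class(x), class(mixed), class(l), class(f), class(df))
```

The `mixed` vector shows why a vector cannot hold "any values" at once: R converts the number to a string so that every element has the same type.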
Exploring dataframes
  • str(dataframe) gives column formats and dimensions
  • head(dataframe) and tail(dataframe) give the first and last 6 rows by default
  • names(dataframe) gives column names
  • row.names(dataframe) gives row names
  • attributes(dataframe) gives column and row names and object class
  • summary(dataframe) gives a lot of good information
  • Make sure variables are in the appropriate form: character/string, numeric, factor, integer, logical
  • Make sure mins, maxs, means, etc. seem right
  • Make sure you don’t have typing errors, which would make Premna and premna two separate factor levels
      • Use unique(iris$species) to see all the unique values of a column
      • Or use levels(spider$species) to see the different levels
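These functions can be tried on the `iris` dataset that ships with R. Note one assumption: the built-in dataset names its column `Species` with a capital S, unlike the lowercase `species` in the course's own CSV file.

```r
str(iris)                 # column types and dimensions
first_rows <- head(iris)  # first 6 rows by default
last_rows <- tail(iris, n = 3)
col_names <- names(iris)
summary(iris)             # per-column summaries: quartiles, means, factor counts

# Check a categorical column for misspelled or mis-capitalized values.
species_values <- unique(iris$Species)
species_levels <- levels(iris$Species)
```

If a typo such as "Setosa" had crept into the data, it would show up as an extra entry in `species_levels`.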
To attach or not to attach... that is the question
  • Some like to use ‘attach’ to make dataframe variables accessible by name within the R session
  • Generally, ‘attach’ is frowned upon by R junkies
  • Instead, use dataframe$y, or data=dataframe, or dataframe[, "y"], or dataframe[, 2]
  • To detach the object, use: detach(dataframe)
  • I recommend: do not use attach, but do what you want
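The alternatives to attach all reach the same column. A minimal sketch on the built-in `iris` data; `with()` is an extra base R option not named on the slide.

```r
# Four ways to reach a column without attach().
mean_a <- mean(iris$Sepal.Length)          # dataframe$y
mean_b <- mean(iris[, "Sepal.Length"])     # dataframe[, "y"]
mean_c <- mean(iris[, 1])                  # dataframe[, column number]
mean_d <- with(iris, mean(Sepal.Length))   # evaluate an expression inside the dataframe

# Modelling functions usually accept the dataframe through a data argument.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
```

Indexing by name is usually safer than indexing by position, since `iris[, 1]` silently points at a different variable if the column order changes.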
R Packages
  • 3,262 packages!!!!
  • Packages are extensions written by anyone for any purpose, usually installed with install.packages("packagename") and then loaded with library(packagename) or require(packagename)
  • Use ?functionname for help on any function in base R or in loaded R packages
  • In RStudio, just press tab when inside the parentheses after the function name to see function options!!!
  • Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
  • Inside-R package reference: http://www.inside-r.org/packages
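Installing and loading are two separate steps, which a small wrapper makes explicit. This is a sketch only: `ensure_package` is a hypothetical helper name, and the installation branch needs an internet connection and a configured CRAN mirror.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # one-time step, downloads from CRAN
  }
  library(pkg, character.only = TRUE)   # needed in every new R session
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
loaded <- ensure_package("stats")

# Help for a function can then be opened with ?median or help("median").
```

The same call with a CRAN package name, for example `ensure_package("plyr")`, would install it on first use and simply load it afterwards.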
Data manipulation
  • Packages: plyr, data.table, doBy, sqldf, reshape2, and more
  • Comparison of packages, modified from code from the Recipes, scripts and Genomics blog: https://gist.github.com/878919
  • data.table is by far the fastest!!! BUT, plyr may win on ease of use and flexibility? See for yourself...
  • Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
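What these packages have in common is a grouped summary: split the data by a variable, compute something per group, and combine the results. The sketch below uses base R's `aggregate()` as a stand-in so that it runs without installing anything; the plyr and data.table lines are shown as comments and assume those packages are installed.

```r
# Mean sepal length per species, using base R only.
mean_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Roughly equivalent calls in two of the packages named above
# (commented out because they require the packages to be installed):
# plyr::ddply(iris, "Species", plyr::summarise, mean_sl = mean(Sepal.Length))
# data.table::as.data.table(iris)[, list(mean_sl = mean(Sepal.Length)), by = Species]
```

On a dataset this small the speed differences between the approaches are not visible; they matter for large tables, which is what the linked comparison measures.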
Visualizations
A few different approaches:
  • Base graphics
  • Lattice graphics
  • Grid graphics
  • ggplot2 graphics
Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
An example:
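The plot from the original slide is not available here, so the following is a stand-in rather than the author's example: a base graphics scatterplot of the built-in `iris` data, written to a temporary PDF. The ggplot2 version is given as a comment and assumes ggplot2 is installed.

```r
# Base graphics: scatterplot of petal size, coloured by species.
plot_path <- file.path(tempdir(), "iris_scatter.pdf")
pdf(plot_path)
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "Iris petals by species")
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
invisible(dev.off())   # close the PDF device so the file is written

# A similar plot in ggplot2 (requires the ggplot2 package):
# library(ggplot2)
# ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) + geom_point()
```

In an interactive session the `pdf()` and `dev.off()` lines can be dropped, and the plot appears in the graphics window instead.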
More on ggplot2 graphics
  • There are classes taught by Hadley Wickham here at Rice if you want to learn more!
      • Data visualization (Stat645): http://had.co.nz/stat645/
      • Statistical computing (Stat405): http://had.co.nz/stat405/
  • Hadley’s website is really helpful: http://had.co.nz/ggplot2/
  • The ggplot2 google groups site: https://groups.google.com/forum/#!forum/ggplot2
QUICK RSTUDIO RUN THROUGH Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
USE CASE HERE [see intro_usecase.R file]


Editor's Notes

  1. header=T (short for header=TRUE) means the first row contains variable names
  2. Some numbers are actually factors: think of 0/1 for dead/alive, or zip codes (what would an average zip code mean?)