R Intro, Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
Don’t just listen to me!
• Other intros to R:
  http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
  http://www.cyclismo.org/tutorial/R/
  http://www.r-tutor.com/r-introduction
• Quick R: http://www.statmethods.net/
• http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
R user frameworks
• R from the command line (OS X and PC): just type "R" at the command line and have fun!
• R itself: http://www.r-project.org/
• RStudio, a good choice: http://www.rstudio.org/
• RevolutionR [free academic version], sort of the SAS-ised version of R: http://www.revolutionanalytics.com/downloads/free-academic.php
  Uses the proprietary .xdf file format to speed up computation.
• Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R. These interfaces let you learn which code each button runs.
R user frameworks, cont.
• R from Python: RPy, http://rpy.sourceforge.net/
• C++ from R: the Rcpp package
  http://cran.r-project.org/web/packages/Rcpp/index.html
  http://dirk.eddelbuettel.com/code/rcpp.html
  Writing slow functions in C++ and calling them from R, instead of running them as plain R code, can hugely speed up computation. Examples: http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html and http://dirk.eddelbuettel.com/code/rcpp.examples.html
• Excel from R: the XLConnect package, http://cran.r-project.org/web/packages/XLConnect/index.html
• And more… see for yourself.
R Tips R can crash  Do not use R’s built in text editor or solely write code in the R console. Instead use any text editor that integrates with R. See here for links:  https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources When asking for help on listserves/help websites, use BRIEF and  REPRODUCIBLE examples Not doing this makes people not want to help you! R automatically overwrites files with the same file name!!!! Make sure you want to overwrite a file before doing so
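A minimal base-R sketch of the last two tips. `dput()` prints an object as R code that anyone can paste to recreate it, which is the usual way to share a small reproducible example. The `safe_write_csv()` helper is hypothetical (not part of base R); it simply refuses to overwrite an existing file unless asked to, since `write.csv()` itself does not check.

```r
# 1. A brief, reproducible example: dput() prints the object as R code
#    that others can paste into their own session to recreate it.
small_data <- data.frame(species = c("Premna", "Ficus"), count = c(3L, 5L))
dput(small_data)

# 2. write.csv() replaces an existing file silently, so check first.
#    safe_write_csv() is a hypothetical helper written for this example.
safe_write_csv <- function(x, file, overwrite = FALSE) {
  if (file.exists(file) && !overwrite) {
    stop("File already exists: ", file, " (set overwrite = TRUE to replace it)")
  }
  write.csv(x, file, row.names = FALSE)
  invisible(file)
}

out_file <- tempfile(fileext = ".csv")   # temporary path, stands in for a real file name
safe_write_csv(small_data, out_file)     # first write succeeds
# safe_write_csv(small_data, out_file)   # a second call would stop() with an error
```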
Style
Not this kind of style…
This kind of style!!!
Style
• Style matters so that YOU and OTHERS can read your code and actually use it.
• Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
• Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
• Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style
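A small before-and-after sketch of what the guides are getting at. It assumes the conventions they broadly share (`<-` for assignment, spaces around operators, descriptive names, one step per line); the guides differ on details such as whether names use dots or underscores, so pick one and be consistent. Both versions compute the same thing.

```r
# Hard to read: cryptic names, no spaces, several statements on one line.
x<-c(5.1,4.9,4.7);y<-mean(x);z<-x-y

# Easier to read: descriptive names, spaces around operators, one step per line.
sepal_lengths <- c(5.1, 4.9, 4.7)
mean_length   <- mean(sepal_lengths)
deviations    <- sepal_lengths - mean_length
```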
Preparing your data for R
What makes clean data?
• Correct spelling
• Identical capitalization (e.g., Premna vs. premna). R is case-sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
• No spaces between words: read.csv() turns spaces in column names into ".". Generally avoid spaces and use underscores instead.
• NA or blank (if using CSV) for missing values; blank cells in numeric columns are read in as NA
• Use find-and-replace to get rid of trailing spaces after words
• I generally keep both an .xls and a .csv file: you can always recreate your work in R from the .csv file and still modify the .xls file
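A quick base-R sketch of why these clean-data rules matter. `make.names()` is the function `read.csv()` uses to fix column names, and `trimws()`/`tolower()` are one possible way (an assumption, not the only fix) to merge values that differ only by capitalization or trailing spaces. The small CSV is made up for illustration.

```r
# R is case-sensitive: myvector and Myvector are different objects.
myvector <- c(3, 4, 5)
exists("myvector")   # TRUE
exists("Myvector")   # FALSE, so typing Myvector gives "object not found"

# read.csv() passes column names through make.names(), turning spaces into "."
make.names(c("leaf length", "leaf_width"))   # "leaf.length" "leaf_width"

# Capitalization and trailing spaces create separate values.
species <- c("Premna", "premna", "Premna ")
length(unique(species))                       # 3
length(unique(tolower(trimws(species))))      # 1

# A blank cell in a numeric column is read as NA.
counts <- read.csv(text = "plot,count\nA,4\nB,\nC,7")
counts$count                                  # 4 NA 7
```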
Bringing data into R
• Create a CSV file:
  One worksheet only
  No special formatting, filters, comments, etc.
  Copy only the rows and columns that contain your data into the CSV, because R sometimes reads in empty columns too
  Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
• In R, set the working directory: setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
  What is the working directory? getwd()
  What is in the working directory? dir()
• Read in data:
  CSV files: iris.df <- read.csv("iris_df.csv", header = TRUE)
  Clipboard: read.csv("clipboard") reads in data you have copied, like pasting it in (works on Windows; on OS X, use read.csv(pipe("pbpaste")))
  From the web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
  From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet = "Sheet1")
• Write data: write.csv(dataframe, "dataframename.csv"), OR save(iris, file = "iris.RData") [and load("iris.RData") to bring it back into R]
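A minimal round trip using only base R and the built-in iris data. `tempfile()` stands in for a real path in your working directory. `row.names = FALSE` is an added choice so the CSV does not gain an extra index column. Note that `save()` needs the file name passed as `file =`.

```r
# Write the built-in iris data to CSV, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)   # no extra row-number column
iris_df <- read.csv(csv_path, header = TRUE)    # header = TRUE: first row holds variable names

# Save an R object in R's own format and load it again.
rdata_path <- tempfile(fileext = ".RData")
save(iris_df, file = rdata_path)   # the file name must be passed as file = ...
rm(iris_df)                        # remove it from the workspace...
load(rdata_path)                   # ...and load() recreates iris_df
str(iris_df)
```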
R data structures
• Scalar: an object with a single value, either numeric or character (in R, really a vector of length 1)
• Vector: a sequence of values of one type, e.g., numeric or character, possibly including NA; mixing types coerces them to a common type
• List: an arbitrary collection of objects of any type; a very useful R object
• Character: text, e.g., "this is some text"
• Factor: like a character vector, but its values are restricted to predefined "levels"
• Matrix: a two-dimensional table in which all values share one type (usually numeric)
• Data frame: each column can be of a different class
• Immutable data frame: a special data frame used in the plyr package (idata.frame()) that references the original data frame instead of copying it, for faster calculations
• Function
• Environment
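A short base-R tour of the structures above. The object names are made up for illustration; `class()`, `length()` and `dim()` are how you can check what you are holding.

```r
x <- 42                           # a "scalar" is really a length-1 vector
length(x)                         # 1

v <- c(1, "a", NA)                # vectors hold one type; mixing coerces
class(v)                          # "character"

f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)                         # "low" "high"

m <- matrix(1:6, nrow = 2)        # every cell shares one type
dim(m)                            # 2 3

l <- list(name = "iris", data = head(iris), n = 150)  # anything goes in a list

sapply(iris, class)               # a data frame: each column has its own class

sq <- function(z) z^2             # functions are objects too

e <- new.env()                    # an environment stores named objects
assign("y", 10, envir = e)
get("y", envir = e)               # 10
```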
Exploring data frames
• str(dataframe) gives column formats and dimensions
• head(dataframe) and tail(dataframe) give the first and last 6 rows
• names(dataframe) gives column names
• row.names(dataframe) gives row names
• attributes(dataframe) gives column and row names and the object class
• summary(dataframe) gives a lot of good information
• Make sure variables are in the appropriate form: character/string, numeric, factor, integer, logical
• Make sure mins, maxes, means, etc. seem right
• Make sure you don’t have typing errors; otherwise Premna and premna become two separate factor levels
• Use unique(iris$Species) to see all the unique values of a column
• Or use levels(iris$Species) to see the levels of a factor
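The checks above, run on the built-in iris data, plus a made-up `genus` factor showing how a capitalization typo appears as an extra level. Using `tolower()` to merge the levels is one possible fix, assumed here for illustration.

```r
str(iris)               # 150 obs. of 5 variables, with each column's type
head(iris)              # first 6 rows; tail(iris) gives the last 6
names(iris)             # column names
summary(iris)           # min, quartiles, mean, max per numeric column; counts per level
unique(iris$Species)    # distinct values in one column
levels(iris$Species)    # "setosa" "versicolor" "virginica"

# A typo becomes an extra level: 3 levels where 2 were intended.
genus <- factor(c("Premna", "premna", "Ficus"))
nlevels(genus)          # 3

genus_fixed <- factor(tolower(genus))
nlevels(genus_fixed)    # 2
```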
To attach or not to attach… that is the question
• Some like to use attach() to make data frame variables accessible by name within the R session
• Generally, attach() is frowned upon by R junkies: the attached copy can mask other objects and does not update when you change the data frame. Instead use dataframe$y, data = dataframe, dataframe[, "y"], or dataframe[, 2]
• To detach the object, use detach(dataframe)
• I recommend: do not use attach, but do what you want
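A sketch of the alternatives on the built-in iris data; each line computes the same mean. `with()` is one more base-R option not listed on the slide. If you do use `attach()`, pair it with `detach()` on the same object.

```r
mean(iris$Sepal.Length)                          # dataframe$y
mean(iris[, "Sepal.Length"])                     # dataframe[, "y"]
mean(iris[, 1])                                  # by column position
with(iris, mean(Sepal.Length))                   # evaluate inside the data frame
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)  # data = dataframe

# If you attach, detach the same object when you are done.
attach(iris)
m_attached <- mean(Sepal.Length)
detach(iris)
```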
R Packages
• 3,262 packages (as of September 2011)!
• Packages are extensions written by anyone for any purpose. Install one with install.packages("packagename"), then load it with library(packagename) or require(packagename)
• Use ?functionname for help on any function in base R or in a loaded package
• In RStudio, just press Tab inside the parentheses after a function name to see its arguments!
• Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
• Inside-R package reference: http://www.inside-r.org/packages
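One common pattern for scripts, sketched below: install a package only if it is missing, then load it. `load_or_install()` is a hypothetical helper, not a base R function. The example call uses `stats`, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install from CRAN only if needed, then attach.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # downloads from CRAN; needs a connection
  }
  library(pkg, character.only = TRUE)  # attach so its functions work by name
  invisible(TRUE)
}

load_or_install("stats")    # already installed with R, so nothing is downloaded
# load_or_install("plyr")   # would install plyr the first time
# ?ddply                    # help page for a function once plyr is loaded
```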
Data manipulation
• Packages: plyr, data.table, doBy, sqldf, reshape2, and more
• Comparison of packages, modified from code on the Recipes, Scripts and Genomics blog: https://gist.github.com/878919
  data.table is by far the fastest! BUT plyr may win on ease of use and flexibility. See for yourself…
• Also see the examples in the tutorial code for the reshape2 package, for neat data manipulation tricks
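For a baseline, here is the same kind of grouped summary (mean sepal length per species) in base R, which these packages aim to make easier or faster. The plyr and data.table lines are shown only as comments for comparison, since they need those packages installed.

```r
# Base R: split by Species, apply mean(), combine the results.
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
by_species

summary_df <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
summary_df

# For comparison (not run; needs the packages installed):
# library(plyr);       ddply(iris, "Species", summarise, mean_sl = mean(Sepal.Length))
# library(data.table); as.data.table(iris)[, list(mean_sl = mean(Sepal.Length)), by = Species]
```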
Visualizations
• A few different approaches: base graphics, lattice graphics, grid graphics, ggplot2 graphics
• Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
• An example: [figure not reproduced]
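A minimal base-graphics scatterplot of the iris data, written to a temporary PDF so it runs without a screen. The lattice and ggplot2 lines are roughly equivalent calls, shown as comments because those systems are separate packages.

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file, width = 6, height = 4)            # open a PDF graphics device
plot(Sepal.Length ~ Petal.Length, data = iris,
     col = as.integer(iris$Species), pch = 19,   # one colour per species
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
dev.off()                                        # close the device, writing the file

# Roughly equivalent calls (not run):
# lattice::xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species)
# library(ggplot2); ggplot(iris, aes(Petal.Length, Sepal.Length, colour = Species)) + geom_point()
```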
More on ggplot2 graphics
• Hadley Wickham teaches classes here at Rice if you want to learn more!
  Data visualization (Stat645): http://had.co.nz/stat645/
  Statistical computing (Stat405): http://had.co.nz/stat405/
• Hadley’s website is really helpful: http://had.co.nz/ggplot2/
• The ggplot2 Google Group: https://groups.google.com/forum/#!forum/ggplot2
QUICK RSTUDIO RUN-THROUGH
• Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
USE CASE HERE [see intro_usecase.R file]

Speaker notes

  1. header = TRUE means the first row contains the variable names.
  2. Some numbers are actually factors: think of 0/1 for dead/alive, or zip codes (what would an average zip code mean?).