SlideShare uma empresa Scribd logo
1 de 50
Baixar para ler offline
Reading data
    into
  2012-09-28 @HSPH
 Kazuki Yoshida, M.D.
   MPH-CLE student

                        FREEDOM
                        TO(KNOW
Previously in this group


!   Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH
!   Introduction to R
Menu
!   What statistics is all about.
!   Data-reading functions in R
!   Installing packages
!   Reading excel files
!   Reading other files
http://mediacrushllc.com/2012/internet-statistics-2012/




                            is the study of the
collection, organization, analysis, interpretation,
               and presentation of



               data
                 http://en.wikipedia.org/wiki/Statistics
No data,
  No life
No statistics
Loading data is
 the first step
  http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Supported
!   .RData (native): load()
!   .csv: read.csv()
!   .xls/.xlsx: library(gdata) or library(XLConnect)
!   .sas7bdat: read.sas7bdat() via library(sas7bdat)
!   .dta: read.dta via library(foreign)
!   and more...
      http://cran.r-project.org/doc/manuals/R-data.html
library()
packages
4000+
   user-
contributed
 packages
           Fast
       development
         http://r4stats.com/articles/popularity/
Downside:
not much can be
 done without
    packages
CRAN
Comprehensive
      R
      Archive
      Network
http://cran.r-project.org/web/packages/
   available_packages_by_date.html
Let’s try
Open
R Studio
http://rstudio.org


Watch the screencast
Plot              Workspace
        switched


Console              Source
Menu: RStudio - Preferences     My configuration




                     Plot     Workspace




                 Console      Source
Menu: RStudio - Preferences     My configuration




                       Configure CRAN mirror
Comma Separated Values




            Use .CSV
            if possible
http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d
.csv
http://www.wondergraphs.com/img/SFO_Landings.csv
read.csv(“file.csv”)
http://www.wondergraphs.com/img/SFO_Landings.csv

                 Careful big file!
name of a dataset here




    new.dat <-
read.csv(“file.csv”)
                                file name here
function to read .csv files
alternatively
                 name of a dataset here




        new.dat <-
  read.csv(file.choose())
                                function to open a
  function to read .csv files   file-choose dialogue
Space separated

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
read.table(“file.dat”)
                  or
  read.table(“file.dat”, header = T)

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
tab-separated
read.delim(“file.tsv”)
     http://www.brookscole.com/cgi-wadsworth/
              course_products_wp.pl?
fid=M20b&flag=student&product_isbn_issn=9780495384
    960&disciplinenumber=1038&template=AUS
For comma-, tab-, or
space-separated text
       Let’s try!
Excel files
     prevalent
http://www.biography.com/people/bill-gates-9307520
     http://www.last.fm/music/Excel/+images/285200
http://www.philipcoppens.com/matrixconstructs.html




    We will use
  publicly available
         data
http://www.hsph.harvard.edu/faculty/miguel-
         hernan/files/nhefs_book.xls
        http://www.hsph.harvard.edu/faculty/miguel-hernan/causal-inference-book/
install.packages(“gdata”, dep = T)

  library(gdata)
read.xls(“file.xls”)

           Perl configuration necessary on Win
               http://cran.r-project.org/web/packages/gdata/INSTALL
install.packages(“XLConnect”, dep = T)

library(XLConnect)
readWorksheet(loadWorkbook(“file.xls”),
               sheet=1)

                Define a function for simplicity
  my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1)
  my.read.xls(“file.xls”)

        install.packages("XLConnect", type = "source") on Mac
To install a package
                            package name here


install.packages(“package”,
          dep = T)
                             short for TRUE
   short for dependencies
To load a package
                        package name here



library(package)
   double quote “” can be omitted
Just click
  box
Install package
       Load package
Read xls file chosen to nhefs
install.packages(“sas7bdat”, dep = T)

      library(sas7bdat)
read.sas7bdat(“file.sas7bdat”)
 http://www.biostat.harvard.edu/~fitzmaur/ala2e/
               smoking.sas7bdat
library(foreign)
    read.xport(“file.xpt”)
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/
                2009-2010/DEMO_F.xpt
library(foreign)
read.dta(“file.dta”)
http://www.biostat.harvard.edu/~fitzmaur/ala2e/
                 headache.dta
HTML table

            http://
        www.drugs.com/
       top200_2003.html
install.packages(“XML”, dep = T)
               library(XML)
readHTMLTable("http://www.drugs.com/top200_2003.html",
              which = 2, skip.rows = 1)




                                http://www.drugs.com/top200_2003.html
Fixed width
read.fwf(“file.txt”,
width = c(3, 5, ...))

       Use width = list(c(3,5,..), c(5,7,..))
         for multiple rows per subject
Important functions
!   install.packages(“PackageName”, dep = T)
!   library(PackageName)
!   str(dataset)
!   summary(dataset)
!   head(dataset)
Appendix:
Probability
Functions
what it
        -norm          -t      -binom -pois               does

                                                        density

 d-        dnorm        dt     dbinom       dpois
                                                         (mass)
                                                        given x-
                                                          axis
                                                         return

 p-        pnorm        pt     pbinom       ppois
                                                      probability,
                                                        given x-
                                                      axis(quan.)
                                                         return

 q-        qnorm        qt     qbinom       qpois
                                                        quantile
                                                       (x-axis),
                                                      given prob.
        library(BS    t.test,                          return p-

-test       DA):
           z.test,
                   library(BS
                      DA):
                              binom.test poisson.test
                                                       value and
                                                      confidence
         zsum.test tsum.test                            interval

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Datastructures in python
Datastructures in pythonDatastructures in python
Datastructures in python
 
List,tuple,dictionary
List,tuple,dictionaryList,tuple,dictionary
List,tuple,dictionary
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Group By, Having Clause and Order By clause
Group By, Having Clause and Order By clause Group By, Having Clause and Order By clause
Group By, Having Clause and Order By clause
 
Time and Space Complexity
Time and Space ComplexityTime and Space Complexity
Time and Space Complexity
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
 
SQL Queries Information
SQL Queries InformationSQL Queries Information
SQL Queries Information
 
Python list
Python listPython list
Python list
 
Introduction to matplotlib
Introduction to matplotlibIntroduction to matplotlib
Introduction to matplotlib
 
Lab2 ddl commands
Lab2 ddl commandsLab2 ddl commands
Lab2 ddl commands
 
SQL - Structured query language introduction
SQL - Structured query language introductionSQL - Structured query language introduction
SQL - Structured query language introduction
 
Sql subquery
Sql  subquerySql  subquery
Sql subquery
 
Unit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptxUnit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptx
 
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with ExamplesDML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
DML, DDL, DCL ,DRL/DQL and TCL Statements in SQL with Examples
 
Python strings presentation
Python strings presentationPython strings presentation
Python strings presentation
 
Complexity of Algorithm
Complexity of AlgorithmComplexity of Algorithm
Complexity of Algorithm
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
Python variables and data types.pptx
Python variables and data types.pptxPython variables and data types.pptx
Python variables and data types.pptx
 
SQL - DML and DDL Commands
SQL - DML and DDL CommandsSQL - DML and DDL Commands
SQL - DML and DDL Commands
 
SQL BUILT-IN FUNCTION
SQL BUILT-IN FUNCTIONSQL BUILT-IN FUNCTION
SQL BUILT-IN FUNCTION
 

Semelhante a Reading Data into R

Reading Data into R REVISED
Reading Data into R REVISEDReading Data into R REVISED
Reading Data into R REVISED
Kazuki Yoshida
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into R
Kazuki Yoshida
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and R
Radek Maciaszek
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
DataMine Lab
 

Semelhante a Reading Data into R (20)

Reading Data into R REVISED
Reading Data into R REVISEDReading Data into R REVISED
Reading Data into R REVISED
 
Genomic Analysis in Scala
Genomic Analysis in ScalaGenomic Analysis in Scala
Genomic Analysis in Scala
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into R
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and R
 
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedIn
 
IntroductionSTATA.ppt
IntroductionSTATA.pptIntroductionSTATA.ppt
IntroductionSTATA.ppt
 
200612_BioPackathon_ss
200612_BioPackathon_ss200612_BioPackathon_ss
200612_BioPackathon_ss
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
 
Stream or not to Stream?

Stream or not to Stream?
Stream or not to Stream?

Stream or not to Stream?

 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
 
Thea: Processing OWL Ontologies - An application of logic programming
Thea: Processing OWL Ontologies - An application of logic programmingThea: Processing OWL Ontologies - An application of logic programming
Thea: Processing OWL Ontologies - An application of logic programming
 
Processing OWL2 Ontologies using Thea: An application of Logic Programming
Processing OWL2 Ontologies using Thea: An application of Logic ProgrammingProcessing OWL2 Ontologies using Thea: An application of Logic Programming
Processing OWL2 Ontologies using Thea: An application of Logic Programming
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Psycopg2 - Connect to PostgreSQL using Python Script
Psycopg2 - Connect to PostgreSQL using Python ScriptPsycopg2 - Connect to PostgreSQL using Python Script
Psycopg2 - Connect to PostgreSQL using Python Script
 
So various polymorphism in Scala
So various polymorphism in ScalaSo various polymorphism in Scala
So various polymorphism in Scala
 
User biglm
User biglmUser biglm
User biglm
 
Introduction To Scala
Introduction To ScalaIntroduction To Scala
Introduction To Scala
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 

Mais de Kazuki Yoshida

20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R
Kazuki Yoshida
 
Linear regression with R 2
Linear regression with R 2Linear regression with R 2
Linear regression with R 2
Kazuki Yoshida
 
Linear regression with R 1
Linear regression with R 1Linear regression with R 1
Linear regression with R 1
Kazuki Yoshida
 
(Very) Basic graphing with R
(Very) Basic graphing with R(Very) Basic graphing with R
(Very) Basic graphing with R
Kazuki Yoshida
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to Deducer
Kazuki Yoshida
 
Groupwise comparison of continuous data
Groupwise comparison of continuous dataGroupwise comparison of continuous data
Groupwise comparison of continuous data
Kazuki Yoshida
 
Categorical data with R
Categorical data with RCategorical data with R
Categorical data with R
Kazuki Yoshida
 
Install and Configure R and RStudio
Install and Configure R and RStudioInstall and Configure R and RStudio
Install and Configure R and RStudio
Kazuki Yoshida
 

Mais de Kazuki Yoshida (20)

Graphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysisGraphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysis
 
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCTPharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
 
Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...Propensity Score Methods for Comparative Effectiveness Research with Multiple...
Propensity Score Methods for Comparative Effectiveness Research with Multiple...
 
Emacs Key Bindings
Emacs Key BindingsEmacs Key Bindings
Emacs Key Bindings
 
Visual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSOVisual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSO
 
ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...
ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...
ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...
 
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
 
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
 
Spacemacs: emacs user's first impression
Spacemacs: emacs user's first impressionSpacemacs: emacs user's first impression
Spacemacs: emacs user's first impression
 
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...
 
Multiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing DataMultiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing Data
 
20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R
 
Linear regression with R 2
Linear regression with R 2Linear regression with R 2
Linear regression with R 2
 
Linear regression with R 1
Linear regression with R 1Linear regression with R 1
Linear regression with R 1
 
(Very) Basic graphing with R
(Very) Basic graphing with R(Very) Basic graphing with R
(Very) Basic graphing with R
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to Deducer
 
Groupwise comparison of continuous data
Groupwise comparison of continuous dataGroupwise comparison of continuous data
Groupwise comparison of continuous data
 
Categorical data with R
Categorical data with RCategorical data with R
Categorical data with R
 
Install and Configure R and RStudio
Install and Configure R and RStudioInstall and Configure R and RStudio
Install and Configure R and RStudio
 

Último

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 

Reading Data into R