SlideShare a Scribd company logo
1 of 11
Download to read offline
useR Vignette:



    Accessing Databases
          from R


      Greater Boston useR Group
              May 4, 2011


                          by

                 Jeffrey Breen
           jbreen@cambridge.aero




Photo from http://en.wikipedia.org/wiki/File:Oracle_Headquarters_Redwood_Shores.jpg
Outline
●    Why relational databases?
●    Introducing DBI
●    Simple SQL queries
●    dbApply() marries strengths of
     SQL and *apply()
●    The parallel universe of RODBC
●    sqldf: No database? No problem!
●    Further Reading
●    Loading mtcars sample
     data.frame into MySQL




                                            AP Photo/Ben Margot

useR Vignette: Accessing Databases from R                Greater Boston useR Meeting, May 2011   Slide 2
Why relational databases?
●    Databases excel at handling large amounts of, um, data
●    They're everywhere
      ●    Virtually all enterprise applications are built on relational databases (CRM, ERP,
           HRIS, etc.)
      ●    Thanks to high quality open source databases (esp. MySQL and PostgreSQL),
           they're central to dynamic web development since beginning.
             –   “LAMP” = Linux + Apache + MySQL + PHP
      ●    Amazon's “Relational Data Service” is just a tuned deployment of MySQL
●    SQL provides almost-standard language to filter, aggregate, group, sort
      ●    SQL-like query languages showing up in new places (Hadoop Hive)
      ●    ODBC provides SQL interface to non-database data (Excel, CSV, text files)




useR Vignette: Accessing Databases from R       Greater Boston useR Meeting, May 2011    Slide 3
Introducing DBI

 ●      DBI provides a common interface for (most of) R's
        database packages
 ●      Database-specific code implemented in sub-
        packages
          ●     RMySQL, RPostgreSQL, ROracle, RSQLite, RJDBC
 ●      Use dbConnect(), dbDisconnect() to open, close
        connections:
> library(RMySQL)
> con = dbConnect("MySQL", "testdb", username="testuser", password="testpass")
[...]
> dbDisconnect(con)




useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011   Slide 4
Using DBI
 ●    dbReadTable() and dbWriteTable() read and write entire tables
> df = dbReadTable(con, 'motortrend')
> head(df, 4)
                   mpg cyl disp hp drat                     wt     qsec vs am gear carb                     mfg     model
Mazda RX4         21.0   6 160 110 3.90                  2.620    16.46 0 1      4    4                   Mazda       RX4
Mazda RX4 Wag     21.0   6 160 110 3.90                  2.875    17.02 0 1      4    4                   Mazda   RX4 Wag
Datsun 710        22.8   4 108 93 3.85                   2.320    18.61 1 1      4    1                  Datsun       710
Hornet 4 Drive    21.4   6 258 110 3.08                  3.215    19.44 1 0      3    1                  Hornet   4 Drive


 ●    dbGetQuery() runs SQL query and returns entire result set
> df = dbGetQuery(con, "SELECT              * FROM motortrend")
> head(df,4)
       row_names mpg cyl disp                hp   drat      wt    qsec vs am gear carb    mfg   model
1      Mazda RX4 21.0   6 160               110   3.90   2.620   16.46 0 1      4    4 Mazda      RX4
2 Mazda RX4 Wag 21.0    6 160               110   3.90   2.875   17.02 0 1      4    4 Mazda RX4 Wag
3     Datsun 710 22.8   4 108                93   3.85   2.320   18.61 1 1      4    1 Datsun     710
4 Hornet 4 Drive 21.4   6 258               110   3.08   3.215   19.44 1 0      3    1 Hornet 4 Drive

       ●     Note how dbReadTable() uses “row_names” column
 ●    Use dbSendQuery() & fetch() to stream larger result sets
 ●    Advanced functions available to read schema definitions, handle
      transactions, call stored procedures, etc.
useR Vignette: Accessing Databases from R                        Greater Boston useR Meeting, May 2011                      Slide 5
Simple SQL queries

Fetch a column with no filtering but de-dupe:
> df = dbGetQuery(con, "SELECT DISTINCT mfg FROM motortrend")
> head(df, 3)
       mfg
1    Mazda
2   Datsun
3   Hornet


Aggregate and sort result:
> df = dbGetQuery(con, "SELECT mfg, avg(hp) AS meanHP FROM motortrend GROUP BY mfg ORDER BY
meanHP DESC")
> head(df, 4)
       mfg meanHP
1 Maserati    335
2     Ford    264
3   Duster    245
4   Camaro    245

> df = dbGetQuery(con, "SELECT cyl as cylinders, avg(hp) as meanHP FROM motortrend GROUP by
cyl ORDER BY cyl")
> df
  cylinders    meanHP
1         4 82.63636
2         6 122.28571
3         8 209.21429

useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011       Slide 6
●dbApply() marries strengths of SQL and
*apply()
 ●    Operates on result set from dbSendQuery()
        ●     Uses fetch() to bring in smaller chunks at a time to handle Big Data
        ●     You must order result set by your “chunking” variable
 ●    Example: calculate quantiles for horsepower vs. cylinders
> sql = "SELECT cyl, hp FROM motortrend ORDER BY cyl"
> rs = dbSendQuery(con, sql)
> dbApply(rs, INDEX='cyl', FUN=function(x, grp) quantile(x$hp))
$`4.000000`
   0%   25%   50%   75% 100%
 52.0 65.5 91.0 96.0 113.0

$`6.000000`
  0% 25% 50%                    75% 100%
 105 110 110                    123 175

$`8.000000`
    0%    25%    50%    75%   100%
150.00 176.25 192.50 241.25 335.00

 ●    Implemented and available in RMySQL, RPostgreSQL
useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011   Slide 7
The parallel universe of RODBC
 ●    ODBC = “open database connectivity”
       ●     Released by Microsoft in 1992
       ●     Cross-platform, but strongest support on Windows
       ●     ODBC drivers are available for every database you can think of PLUS
             Excel spreadsheets, CSV text files, etc.
 ●    For historical reasons, RODBC not part of DBI family
 ●    Same idea, different details:
       ●     odbcConnect() instead of dbConnection()
       ●     sqlFetch() = dbReadTable()
       ●     sqlSave() = dbWriteTable()
       ●     sqlQuery() = dbGetQuery()
 ●    Closest match in DBI family is RJDBC using Java JDBC drivers

useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011   Slide 8
sqldf: No database? No problem!
 ●    Provides SQL access to data.frames as if they were tables
 ●    Creates & updates SQLite databases automagically
        ●    But can also be used with existing SQLite, MySQL databases
> library(sqldf)
> data(mtcars)
> sqldf("SELECT cyl, avg(hp) FROM mtcars GROUP BY cyl ORDER BY cyl")
  cyl   avg(hp)
1   4 82.63636
2   6 122.28571
3   8 209.21429

> library(stringr)
> mtcars$mfg = str_split_fixed(rownames(mtcars), ' ', 2)[,1]
> sqldf("SELECT mfg, avg(hp) AS meanHP FROM mtcars GROUP BY mfg ORDER BY meanHP DESC
LIMIT 4")
       mfg meanHP
1 Maserati    335
2     Ford    264
3   Camaro    245
4   Duster    245


useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011   Slide 9
Further Reading
●   Bell Labs: R/S-Database Interface
      ●   http://stat.bell-labs.com/RS-DBI/
●   R Data Import/Export manual
      ●   http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases
●   CRAN: DBI and “Reverse depends” friends
      ●   http://cran.r-project.org/web/packages/DBI/
      ●   http://cran.r-project.org/web/packages/RMySQL/
      ●   http://cran.r-project.org/web/packages/RPostgreSQL/
      ●   http://cran.r-project.org/web/packages/RJDBC/
●   CRAN: RODBC
      ●   http://cran.r-project.org/web/packages/RODBC/
●   CRAN: sqldf
      ●   http://cran.r-project.org/web/packages/sqldf/
●   Phil Spector's SQL tutorial
      ●   http://www.stat.berkeley.edu/~spector/sql.pdf
useR Vignette: Accessing Databases from R       Greater Boston useR Meeting, May 2011   Slide 10
Loading 'mtcars' sample data.frame into
MySQL
In MySQL, create new database & user:
mysql> create database testdb;
mysql> grant all privileges on testdb.* to 'testuser'@'localhost' identified by 'testpass';
mysql> flush privileges;


In R, load "mtcars" data.frame, clean up, and write to new "motortrend" data base table:
library(stringr)
library(RMySQL)

data(mtcars)

mtcars$mfg = str_split_fixed(rownames(mtcars), ' ', 2)[,1]
mtcars$mfg[mtcars$mfg=='Merc'] = 'Mercedes'
mtcars$model = str_split_fixed(rownames(mtcars), ' ', 2)[,2]

con = dbConnect("MySQL", "testdb", username="testuser", password="testpass")

dbWriteTable(con, 'motortrend', mtcars)

dbDisconnect(con)



useR Vignette: Accessing Databases from R   Greater Boston useR Meeting, May 2011        Slide 11

More Related Content

What's hot

Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra EnvironmentJim Hatcher
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingGerger
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesJim Hatcher
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalystTakuya UESHIN
 
Oracle sharding : Installation & Configuration
Oracle sharding : Installation & ConfigurationOracle sharding : Installation & Configuration
Oracle sharding : Installation & Configurationsuresh gandhi
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce AnandMHadoop
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
DTCC '14 Spark Runtime Internals
DTCC '14 Spark Runtime InternalsDTCC '14 Spark Runtime Internals
DTCC '14 Spark Runtime InternalsCheng Lian
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsForeign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsShigeru Hanada
 

What's hot (20)

Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI Indexes
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
 
Oracle sharding : Installation & Configuration
Oracle sharding : Installation & ConfigurationOracle sharding : Installation & Configuration
Oracle sharding : Installation & Configuration
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
DTCC '14 Spark Runtime Internals
DTCC '14 Spark Runtime InternalsDTCC '14 Spark Runtime Internals
DTCC '14 Spark Runtime Internals
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsForeign Data Wrapper Enhancements
Foreign Data Wrapper Enhancements
 

Viewers also liked

Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Jeffrey Breen
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Jeffrey Breen
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMJeffrey Breen
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Jeffrey Breen
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeJeffrey Breen
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in RJeffrey Breen
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & HadoopJeffrey Breen
 
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]Goran S. Milovanovic
 
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...Lattice Energy LLC - Battery energy density - product safety - thermal runawa...
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...Lewis Larsen
 
Data Hacking with RHadoop
Data Hacking with RHadoopData Hacking with RHadoop
Data Hacking with RHadoopEd Kohlwey
 
OBIEE Answers Vs Data Visualization: A Cage Match
OBIEE Answers Vs Data Visualization: A Cage MatchOBIEE Answers Vs Data Visualization: A Cage Match
OBIEE Answers Vs Data Visualization: A Cage MatchMichelle Kolbe
 
Hp distributed R User Guide
Hp distributed R User GuideHp distributed R User Guide
Hp distributed R User GuideAndrey Karpov
 

Viewers also liked (20)

Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VM
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R code
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & Hadoop
 
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]
Introduction to R for Data Science :: Session 5 [Data Structuring: Strings in R]
 
Reshaping Data in R
Reshaping Data in RReshaping Data in R
Reshaping Data in R
 
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...Lattice Energy LLC - Battery energy density - product safety - thermal runawa...
Lattice Energy LLC - Battery energy density - product safety - thermal runawa...
 
Data Hacking with RHadoop
Data Hacking with RHadoopData Hacking with RHadoop
Data Hacking with RHadoop
 
OBIEE Answers Vs Data Visualization: A Cage Match
OBIEE Answers Vs Data Visualization: A Cage MatchOBIEE Answers Vs Data Visualization: A Cage Match
OBIEE Answers Vs Data Visualization: A Cage Match
 
Excel/R
Excel/RExcel/R
Excel/R
 
Applications of R (DataWeek 2014)
Applications of R (DataWeek 2014)Applications of R (DataWeek 2014)
Applications of R (DataWeek 2014)
 
Hp distributed R User Guide
Hp distributed R User GuideHp distributed R User Guide
Hp distributed R User Guide
 
R crash course
R crash courseR crash course
R crash course
 

Similar to Accessing Databases from R

TSQL in SQL Server 2012
TSQL in SQL Server 2012TSQL in SQL Server 2012
TSQL in SQL Server 2012Eduardo Castro
 
My sql susecon_crashcourse_2012
My sql susecon_crashcourse_2012My sql susecon_crashcourse_2012
My sql susecon_crashcourse_2012sqlhjalp
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsContinuent
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql databasePARIKSHIT SAVJANI
 
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsLinas Virbalas
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB plc
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerMariaDB plc
 
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...Continuent
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Databasewangzhonnew
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)Steve Min
 
Data stage scenario design9 - job1
Data stage scenario   design9 - job1Data stage scenario   design9 - job1
Data stage scenario design9 - job1Naresh Bala
 
HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)akirahiguchi
 
GoldenGate CDR from UKOUG 2017
GoldenGate CDR from UKOUG 2017GoldenGate CDR from UKOUG 2017
GoldenGate CDR from UKOUG 2017Bobby Curtis
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?Mydbops
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierKellyn Pot'Vin-Gorman
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
RMySQL Tutorial For Beginners
RMySQL Tutorial For BeginnersRMySQL Tutorial For Beginners
RMySQL Tutorial For BeginnersRsquared Academy
 

Similar to Accessing Databases from R (20)

TSQL in SQL Server 2012
TSQL in SQL Server 2012TSQL in SQL Server 2012
TSQL in SQL Server 2012
 
My sql susecon_crashcourse_2012
My sql susecon_crashcourse_2012My sql susecon_crashcourse_2012
My sql susecon_crashcourse_2012
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017
 
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
 
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...
Tungsten Use Case: How Gittigidiyor (a subsidiary of eBay) Replicates Data In...
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)
 
Copy Data Management for the DBA
Copy Data Management for the DBACopy Data Management for the DBA
Copy Data Management for the DBA
 
Data stage scenario design9 - job1
Data stage scenario   design9 - job1Data stage scenario   design9 - job1
Data stage scenario design9 - job1
 
HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)
 
GoldenGate CDR from UKOUG 2017
GoldenGate CDR from UKOUG 2017GoldenGate CDR from UKOUG 2017
GoldenGate CDR from UKOUG 2017
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next Frontier
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
RMySQL Tutorial For Beginners
RMySQL Tutorial For BeginnersRMySQL Tutorial For Beginners
RMySQL Tutorial For Beginners
 

Recently uploaded

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Accessing Databases from R

  • 1. useR Vignette: Accessing Databases from R Greater Boston useR Group May 4, 2011 by Jeffrey Breen jbreen@cambridge.aero Photo from http://en.wikipedia.org/wiki/File:Oracle_Headquarters_Redwood_Shores.jpg
  • 2. Outline ● Why relational databases? ● Introducing DBI ● Simple SQL queries ● dbApply() marries strengths of SQL and *apply() ● The parallel universe of RODBC ● sqldf: No database? No problem! ● Further Reading ● Loading mtcars sample data.frame into MySQL AP Photo/Ben Margot useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 2
  • 3. Why relational databases? ● Databases excel at handling large amounts of, um, data ● They're everywhere ● Virtually all enterprise applications are built on relational databases (CRM, ERP, HRIS, etc.) ● Thanks to high quality open source databases (esp. MySQL and PostgreSQL), they're central to dynamic web development since beginning. – “LAMP” = Linux + Apache + MySQL + PHP ● Amazon's “Relational Data Service” is just a tuned deployment of MySQL ● SQL provides almost-standard language to filter, aggregate, group, sort ● SQL-like query languages showing up in new places (Hadoop Hive) ● ODBC provides SQL interface to non-database data (Excel, CSV, text files) useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 3
  • 4. Introducing DBI ● DBI provides a common interface for (most of) R's database packages ● Database-specific code implemented in sub- packages ● RMySQL, RPostgreSQL, ROracle, RSQLite, RJDBC ● Use dbConnect(), dbDisconnect() to open, close connections: > library(RMySQL) > con = dbConnect("MySQL", "testdb", username="testuser", password="testpass") [...] > dbDisconnect(con) useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 4
  • 5. Using DBI ● dbReadTable() and dbWriteTable() read and write entire tables > df = dbReadTable(con, 'motortrend') > head(df, 4) mpg cyl disp hp drat wt qsec vs am gear carb mfg model Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive ● dbGetQuery() runs SQL query and returns entire result set > df = dbGetQuery(con, "SELECT * FROM motortrend") > head(df,4) row_names mpg cyl disp hp drat wt qsec vs am gear carb mfg model 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive ● Note how dbReadTable() uses “row_names” column ● Use dbSendQuery() & fetch() to stream larger result sets ● Advanced functions available to read schema definitions, handle transactions, call stored procedures, etc. useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 5
  • 6. Simple SQL queries Fetch a column with no filtering but de-dupe: > df = dbGetQuery(con, "SELECT DISTINCT mfg FROM motortrend") > head(df, 3) mfg 1 Mazda 2 Datsun 3 Hornet Aggregate and sort result: > df = dbGetQuery(con, "SELECT mfg, avg(hp) AS meanHP FROM motortrend GROUP BY mfg ORDER BY meanHP DESC") > head(df, 4) mfg meanHP 1 Maserati 335 2 Ford 264 3 Duster 245 4 Camaro 245 > df = dbGetQuery(con, "SELECT cyl as cylinders, avg(hp) as meanHP FROM motortrend GROUP by cyl ORDER BY cyl") > df cylinders meanHP 1 4 82.63636 2 6 122.28571 3 8 209.21429 useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 6
  • 7. ●dbApply() marries strengths of SQL and *apply() ● Operates on result set from dbSendQuery() ● Uses fetch() to bring in smaller chunks at a time to handle Big Data ● You must order result set by your “chunking” variable ● Example: calculate quantiles for horsepower vs. cylinders > sql = "SELECT cyl, hp FROM motortrend ORDER BY cyl" > rs = dbSendQuery(con, sql) > dbApply(rs, INDEX='cyl', FUN=function(x, grp) quantile(x$hp)) $`4.000000` 0% 25% 50% 75% 100% 52.0 65.5 91.0 96.0 113.0 $`6.000000` 0% 25% 50% 75% 100% 105 110 110 123 175 $`8.000000` 0% 25% 50% 75% 100% 150.00 176.25 192.50 241.25 335.00 ● Implemented and available in RMySQL, RPostgreSQL useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 7
  • 8. The parallel universe of RODBC ● ODBC = “open database connectivity” ● Released by Microsoft in 1992 ● Cross-platform, but strongest support on Windows ● ODBC drivers are available for every database you can think of PLUS Excel spreadsheets, CSV text files, etc. ● For historical reasons, RODBC not part of DBI family ● Same idea, different details: ● odbcConnect() instead of dbConnection() ● sqlFetch() = dbReadTable() ● sqlSave() = dbWriteTable() ● sqlQuery() = dbGetQuery() ● Closest match in DBI family is RJDBC using Java JDBC drivers useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 8
  • 9. sqldf: No database? No problem! ● Provides SQL access to data.frames as if they were tables ● Creates & updates SQLite databases automagically ● But can also be used with existing SQLite, MySQL databases > library(sqldf) > data(mtcars) > sqldf("SELECT cyl, avg(hp) FROM mtcars GROUP BY cyl ORDER BY cyl") cyl avg(hp) 1 4 82.63636 2 6 122.28571 3 8 209.21429 > library(stringr) > mtcars$mfg = str_split_fixed(rownames(mtcars), ' ', 2)[,1] > sqldf("SELECT mfg, avg(hp) AS meanHP FROM mtcars GROUP BY mfg ORDER BY meanHP DESC LIMIT 4") mfg meanHP 1 Maserati 335 2 Ford 264 3 Camaro 245 4 Duster 245 useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 9
  • 10. Further Reading ● Bell Labs: R/S-Database Interface ● http://stat.bell-labs.com/RS-DBI/ ● R Data Import/Export manual ● http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases ● CRAN: DBI and “Reverse depends” friends ● http://cran.r-project.org/web/packages/DBI/ ● http://cran.r-project.org/web/packages/RMySQL/ ● http://cran.r-project.org/web/packages/RPostgreSQL/ ● http://cran.r-project.org/web/packages/RJDBC/ ● CRAN: RODBC ● http://cran.r-project.org/web/packages/RODBC/ ● CRAN: sqldf ● http://cran.r-project.org/web/packages/sqldf/ ● Phil Spector's SQL tutorial ● http://www.stat.berkeley.edu/~spector/sql.pdf useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 10
  • 11. Loading 'mtcars' sample data.frame into MySQL In MySQL, create new database & user: mysql> create database testdb; mysql> grant all privileges on testdb.* to 'testuser'@'localhost' identified by 'testpass'; mysql> flush privileges; In R, load "mtcars" data.frame, clean up, and write to new "motortrend" data base table: library(stringr) library(RMySQL) data(mtcars) mtcars$mfg = str_split_fixed(rownames(mtcars), ' ', 2)[,1] mtcars$mfg[mtcars$mfg=='Merc'] = 'Mercedes' mtcars$model = str_split_fixed(rownames(mtcars), ' ', 2)[,2] con = dbConnect("MySQL", "testdb", username="testuser", password="testpass") dbWriteTable(con, 'motortrend', mtcars) dbDisconnect(con) useR Vignette: Accessing Databases from R Greater Boston useR Meeting, May 2011 Slide 11