10. The sqldf package
sqldf
sqldf is an R package for running SQL statements on R data frames, optimized for convenience. The user
simply specifies an SQL statement in R, using data frame names in place of table names. A database with
the appropriate table layouts/schema is created automatically, the data frames are loaded into
the database, the SQL statement is executed, the result is read back into R, and the database is
deleted. All of this happens behind the scenes, so the database's existence is transparent to the user, who
only specifies the SQL statement. Surprisingly, this can at times be even faster than the corresponding pure
R calculation (although the purpose of the project is convenience, not speed).
http://code.google.com/p/sqldf/
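(In short: sqldf lets you run SQL against R data frames.
The DB is created, queried with the SQL, and removed automatically,
so there is no need to set up or manage a DB from R yourself.)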
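The behind-the-scenes steps described on this slide can be imitated by hand. The following is only a sketch, assuming the DBI and RSQLite packages are installed (sqldf uses SQLite by default); it illustrates the idea and is not sqldf's actual implementation.

```r
library(DBI)

# Step 1: create a temporary in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Step 2: load the data frame into the database as a table
dbWriteTable(con, "iris", iris)

# Step 3: run the SQL statement and read the result back into R
res <- dbGetQuery(con, "SELECT COUNT(*) AS iris_count FROM iris")

# Step 4: discard the database
dbDisconnect(con)

print(res)
```

With sqldf, all four steps collapse into a single call that takes only the SQL string.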
03/05/11 10
11. The sqldf package
# Install and load sqldf
install.packages("sqldf")
library(sqldf)
# Basic usage: sqldf("SQL statement")

# Count the rows of iris
sqldf("SELECT COUNT(*) as iris_count FROM iris")
  iris_count
1        150
# Count the rows of iris per Species
sqldf("SELECT Species , COUNT(*) as iris_count FROM iris GROUP BY Species")
     Species iris_count
1     setosa         50
2 versicolor         50
3  virginica         50
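For comparison, the same two counts can be obtained in pure R. This is a minimal sketch using only base R functions; the name `species_counts` is introduced here for illustration.

```r
# Base R counterpart of SELECT COUNT(*) FROM iris
total_rows <- nrow(iris)
print(total_rows)

# Base R counterpart of the GROUP BY Species count
species_counts <- as.data.frame(table(Species = iris$Species),
                                responseName = "iris_count")
print(species_counts)
```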
12. The sqldf package
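# Column names containing "." are awkward to reference in SQL,
# so copy iris and replace each "." with "_".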
iris2 <- iris
colnames(iris2) <- c("Sepal_Length" , "Sepal_Width" , "Petal_Length" ,"Petal_Width" , "Species")
head(iris2)
# Row count and mean sepal length/width per Species
sqldf("
SELECT
  Species,
  COUNT(Species) AS Species_num,
  AVG(Sepal_Length) AS average_length,
  AVG(Sepal_Width) AS average_width
FROM
  iris2
GROUP BY
  Species
")
     Species Species_num average_length average_width
1     setosa          50          5.006         3.428
2 versicolor          50          5.936         2.770
3  virginica          50          6.588         2.974
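The column renaming on this slide can also be done without typing each name by hand. The sketch below uses base R's `gsub` to swap every "." for "_", then cross-checks the per-species means with `aggregate`; the name `means` is introduced here for illustration.

```r
# Replace every "." in the column names with "_"
# (fixed = TRUE makes gsub treat "." as a literal dot, not a regex wildcard)
iris2 <- iris
colnames(iris2) <- gsub(".", "_", colnames(iris), fixed = TRUE)

# Cross-check the per-species means in base R
means <- aggregate(cbind(Sepal_Length, Sepal_Width) ~ Species,
                   data = iris2, FUN = mean)
print(means)
```

The means printed here should agree with the `AVG(...)` columns returned by the SQL query.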