2. R introduction
● R is a statistical and graphical programming language
– Lingua Franca of Data Science
● Easy to use and powerful
● R is free and runs on every major platform (Windows, Unix)
– Large community
● A shortage of data scientists is expected
● Some of this material comes from DataCamp tutorials
3. R in public repositories of Github
Year   Rank   Number of public repositories
2014   14th   48,574
2013   24th   7,867
2012   25th   5,626
● TIOBE Index
– http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
– R is 19th (September 2015)
4. R links
● Datacamp (R training)
– https://www.datacamp.com/
● According to DataCamp, the number of R users grows by 40% each year.
● My examples on GitHub
– https://github.com/franck-benault/test-R
5. R plan
● Environment
● Data types
– From basics to Dataframes
● R and statistics
● Diagrams
7. R's fundamental data types
● Logical value (TRUE, FALSE, T, F, NA)
● Numeric (2, 4.5)
● Integer (2L)
● Character
● Complex
● Raw (store raw bytes)
● is.* functions (type tests: is.numeric(), is.integer(), ...)
● as.* functions (conversions: as.numeric(), as.integer(), ...)
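A minimal sketch of the type tests and conversions listed above. The variable names x and y are placeholders chosen for illustration.

```r
# class() reports the type of a value
class(TRUE)         # "logical"
class(4.5)          # "numeric"
class(2L)           # "integer"
class("hello")      # "character"
class(1+2i)         # "complex"
class(as.raw(255))  # "raw"

x <- 4.7
is.numeric(x)       # TRUE
is.integer(x)       # FALSE: 4.7 is stored as a double

y <- as.integer(x)  # conversion truncates toward zero, giving 4L
is.integer(y)       # TRUE

as.numeric("3.14")  # character to numeric
```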
8. R datatype vector
● Vector
– Sequence of data elements (one dimension)
– Same datatype
– a <- c(1,2,5.3,6,-2,4)
– b <- c("one","two","three")
– d <- c(1,2.1,"three") # vector of character
● Functions
– is.vector(v)
– length(v)
– names(v) <- v2 # associate names with the values
● Basic data types are vectors
– a <- 2
– is.vector(a) # returns TRUE
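The points above can be tried out as follows. This is a small sketch; the vector v, its names and the variable single are made up for illustration.

```r
a <- c(1, 2, 5.3, 6, -2, 4)
d <- c(1, 2.1, "three")  # mixed input is coerced to a single type

class(a)    # "numeric"
class(d)    # "character": the numbers became strings
length(a)   # 6

# Hypothetical vector with names attached afterwards
v <- c(10, 20, 30)
names(v) <- c("x", "y", "z")
v["y"]      # select by name
v[2:3]      # select by position

# A single value is a vector of length 1
single <- 2
is.vector(single)  # TRUE
length(single)     # 1
```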
9. R datatype vector, methods
● Many functions work on numeric vectors
– mean(V) #average
– median(V)
– sum(V)
● Names on vectors
– a <- c(1,6,5)
– n <- c("Ford","Renault", "Fiat")
– names(a) <- n
– b <- c(Ford=1, Renault=6, Fiat=5)
● If you need a collection of elements with different data types, use a list
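A short sketch combining the summary functions and the named vector from this slide:

```r
b <- c(Ford = 1, Renault = 6, Fiat = 5)

mean(b)     # 4
median(b)   # 5
sum(b)      # 12

b["Renault"]  # select one element by name
b[b > 2]      # logical selection keeps the names

# Name of the element with the largest value
names(b)[which.max(b)]  # "Renault"
```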
10. R datatype matrix, names
● Names are set with rownames() and colnames()
– m <- matrix(1:6, byrow=TRUE, nrow=2)
– rownames(m) <- c("row1", "row2")
– colnames(m) <- c("col1", "col2", "col3")
● Same result in one call, with the dimnames argument
– matrix(1:6, byrow=TRUE, nrow=2, dimnames=list(c("row1", "row2"), c("col1", "col2", "col3")))
11. R datatype matrix
● Matrix
– Two dimensions
– All elements have the same type
● Creation with the matrix() function, taking a vector as argument
– y <- matrix(1:20, nrow=5, ncol=4)
● Creation from two or more vectors with cbind() or rbind()
– cbind(1:4, 1:4, 1:4)
– rbind(1:4, 1:4, 1:4)
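The sketch below shows how the resulting matrices are laid out and how elements are selected. The names by_col and by_row are placeholders for illustration.

```r
# matrix() fills column by column unless byrow = TRUE is given
y <- matrix(1:20, nrow = 5, ncol = 4)
dim(y)     # 5 4
y[2, 3]    # row 2, column 3: 12

m <- matrix(1:6, byrow = TRUE, nrow = 2,
            dimnames = list(c("row1", "row2"),
                            c("col1", "col2", "col3")))
m["row2", "col1"]  # 4, selected by name
m[, "col2"]        # a whole column

# cbind() puts the vectors side by side, rbind() stacks them
by_col <- cbind(1:4, 1:4, 1:4)  # 4 rows, 3 columns
by_row <- rbind(1:4, 1:4, 1:4)  # 3 rows, 4 columns
dim(by_col)
dim(by_row)
```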
12. R datatype Factor
● Categorical variable
– Limited number of different values
– Belong to a category
● In R: the factor data structure
● Example: blood types
– blood <- c("A","B", "O", "AB","O", "A")
– blood_factor <- factor(blood)
– blood_factor
– # by default the levels are sorted alphabetically
– str(blood_factor)
– table(blood_factor)
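To make the blood type example concrete, here is a sketch of what the factor holds. The explicit level order in blood_factor2 is an arbitrary choice for illustration.

```r
blood <- c("A", "B", "O", "AB", "O", "A")
blood_factor <- factor(blood)

levels(blood_factor)      # "A" "AB" "B" "O"
as.integer(blood_factor)  # underlying codes: 1 3 4 2 4 1

counts <- table(blood_factor)
counts[["O"]]             # 2

# The level order can be set explicitly instead of alphabetically
blood_factor2 <- factor(blood, levels = c("O", "A", "B", "AB"))
levels(blood_factor2)     # "O" "A" "B" "AB"
```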
13. R datatype List
● List
– One dimension
– Different R objects (even list, matrix, vector)
– Loss of some functionality compared with vectors (for example, arithmetic on the whole object)
● Creation of list
– song <- list("Rsome types", 190, 5)
● Naming a list
– names(song) <- c("title","duration","track")
– song <- list(title="Rsome types", duration=190, track=5)
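Elements of a list can be read back in several ways. The sketch below reuses the song list; the nested element similar and its values are invented for illustration.

```r
song <- list(title = "Rsome types", duration = 190, track = 5)

song$title            # access by name with $
song[["duration"]]    # double brackets return the element itself: 190
song["duration"]      # single brackets return a list of length 1
class(song["duration"])  # "list"

# A list can contain another list (hypothetical data)
song$similar <- list(title = "R you on time?", duration = 230)
song$similar$duration    # 230

str(song)             # compact overview of the structure
```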
14. R datatype dataframe
● Datasets
– Observations
– Variables
● Example: people
– Row = observation (one person)
– Properties = variables
● How to store this in R
– List
– Dataframe
15. R datatype dataframe
● data.frame
– Specifically for a dataset
– Rows = observations
– Columns = variables
– Contains elements of different types
● Read a csv file to create a dataframe
– people <- read.csv("./people.csv", sep="", header=TRUE) # sep="" splits on whitespace; use sep="," for a comma-separated file
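Since people.csv is not shown here, the sketch below builds a small data frame by hand and reads a second one from inline CSV text. All names and values are invented for illustration.

```r
# Hypothetical dataset: one row per person, one column per variable
people <- data.frame(
  name  = c("Anne", "Pete", "Frank"),
  age   = c(28, 30, 21),
  child = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

str(people)       # column types differ: character, numeric, logical
nrow(people)      # 3 observations
ncol(people)      # 3 variables

people$age                   # one column as a vector
people[people$age > 25, ]    # rows matching a condition

# read.csv() can also parse text directly, which is handy for a quick test
csv_text <- "name,age\nAnne,28\nPete,30"
from_csv <- read.csv(text = csv_text, header = TRUE)
from_csv
```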
16. R and statistics
● Four types of variables (S. S. Stevens, 1946)
– Nominal (categories)
– Ordinal (rank 1st 2nd etc)
– Interval (interval between each value is equal)
– Ratio (interval + "true" zero)
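One possible way to represent the four types in R is sketched below. The mapping to factors and numeric vectors is a common convention rather than something stated on the slide, and all values are invented.

```r
# Nominal: categories without order, stored as a plain factor
blood <- factor(c("A", "O", "AB"))

# Ordinal: categories with an order, stored as an ordered factor
placing <- factor(c("2nd", "1st", "3rd"),
                  levels = c("1st", "2nd", "3rd"), ordered = TRUE)
placing[1] > placing[2]   # TRUE: "2nd" comes after "1st"

# Interval: differences are meaningful, ratios are not
temp_celsius <- c(10, 20)
diff(temp_celsius)        # 10 degrees apart

# Ratio: a true zero makes ratios meaningful
height_cm <- c(90, 180)
height_cm[2] / height_cm[1]  # 2: twice as tall

is.ordered(blood)    # FALSE
is.ordered(placing)  # TRUE
```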
17. R and statistics : Data description
● Data description
– Centrality
● Mean (average), function mean()
● Median (50th percentile), function median()
● Mode (peak, the most frequent value)
– Spread
● Standard deviation and variance, functions sd() and var()
● Interquartile range, function IQR()
– scale(): transformation to z-scores (mean = 0, standard deviation = 1)
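The measures above can be computed on a small made-up sample. Base R has no function for the statistical mode, so the sketch derives it from a frequency table; that is one possible approach, not the only one.

```r
x <- c(2, 4, 4, 5, 7, 8)  # hypothetical sample

# Centrality
mean(x)     # 5
median(x)   # 4.5
freq <- table(x)
stat_mode <- as.numeric(names(freq)[which.max(freq)])
stat_mode   # 4, the most frequent value

# Spread
var(x)      # 4.8
sd(x)       # square root of the variance
IQR(x)      # distance between the 1st and 3rd quartiles

# z-scores: subtract the mean, divide by the standard deviation
z <- scale(x)
mean(z)     # 0, up to rounding error
sd(z)       # 1
```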
18. R and statistics : main functions
● rnorm()
– Generates a sample that follows the normal distribution
● summary()
– A lot of information at once
● Minimum, maximum, mean, median, quartiles
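A minimal sketch of the two functions together. The seed, sample size, mean and standard deviation are arbitrary values chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the sample is reproducible

# 1000 draws from a normal distribution (hypothetical parameters)
sample_values <- rnorm(1000, mean = 50, sd = 10)
length(sample_values)  # 1000

s <- summary(sample_values)
s          # minimum, quartiles, median, mean, maximum
names(s)   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```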
20. R Diagrams
● Qualitative data
– Bar plot
– Pie chart
● Quantitative data
– Few numerical values
● Dot plot
– Many values
● Histogram
● Box plot
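Each diagram type has a base R function. The sketch below uses invented data, and stripchart() is one possible choice for the dot plot. When run non-interactively it draws to a null device so that no file or window is needed.

```r
# Outside an interactive session, send the plots to a null PDF device
if (!interactive()) pdf(NULL)

# Qualitative data: counts per category
blood <- c("A", "B", "O", "AB", "O", "A")
counts <- table(blood)
barplot(counts, main = "Bar plot")
pie(counts, main = "Pie chart")

# Quantitative data, few values: dot plot
few <- c(3.1, 4.7, 5.0, 5.2, 6.8)  # hypothetical measurements
stripchart(few, method = "stack", main = "Dot plot")

# Quantitative data, many values: histogram and box plot
set.seed(1)  # arbitrary seed
many <- rnorm(500)
h <- hist(many, main = "Histogram")
b <- boxplot(many, main = "Box plot")

if (!interactive()) invisible(dev.off())
```

Both hist() and boxplot() also return the numbers behind the picture (bin counts in h$counts, the five box plot statistics in b$stats), which is convenient for checking a plot.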
21. R Libraries
● Maps
– install.packages("maps")
– library("maps")
– map("world")
– map("france")
– title("la France")
22. Conclusion
● When will you start using R ?
● It may also be a good idea to take a basic statistics course