3. MACHINE LEARNING WITH R
WHAT IS MACHINE LEARNING
USE CASES FOR MACHINE LEARNING
SUPERVISED LEARNING
UNSUPERVISED LEARNING
INTRODUCING R
COOL FEATURES OF R
R AND ORACLE
4. MACHINE LEARNING
• Machine learning is the subfield of computer science that gives
computers the ability to learn without being explicitly programmed.
5. MACHINE LEARNING
USE CASES
• E-mail categorization
Spam, News, Personal, Orders, …
• Anomaly detection
Fraud detection, behavior which does not fit known classifications well
• Optical Character recognition (OCR)
• Genetics
Will you have a high chance of relapse when you have this cancer type
and these genes?
6. MACHINE LEARNING
USE CASES
• Log file analysis
Which entries are rare?
Which are the variables in a log line?
Intruder detection
• IoT
Self learning thermostats
• Predict weather
Based on environmental measures like
humidity, air pressure, satellite images
• Detect trends
The number of cases present in the
KEI system at Spir-it and performance
• Image recognition
Self driving cars like Tesla, BMW
• Predict stock prices
Find correlations between stocks and try to
find features which can predict future prices
7. WHAT IS MACHINE LEARNING
1. Supervised learning
2. Unsupervised learning
8. SUPERVISED LEARNING
• The computer is presented with input and desired output
• The goal is to derive a general ruleset to map input to output
• This ruleset can be used to make predictions of output based on input
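As a minimal sketch of this idea, the snippet below fits a linear model to R's built-in `cars` dataset (input: speed, desired output: stopping distance) and then predicts the output for new inputs. The choice of `lm` and of this dataset is an assumption made for illustration; the slides do not prescribe a particular method here.

```r
# Training data: known inputs (speed) with desired outputs (dist)
head(cars)

# Derive a general rule that maps input to output
model <- lm(dist ~ speed, data = cars)
coef(model)  # intercept and slope of the learned rule

# Use the rule to predict output for inputs not seen before
new_input <- data.frame(speed = c(10, 20))
predicted <- predict(model, newdata = new_input)
predicted
```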
14. SUPERVISED LEARNING
RANDOM FOREST
• Features are used to classify data
• A set of decision trees is generated, each based on a random set of features
• Every tree sees a subset of the data
• Splits in the tree are determined by training data values
Where does a split add the most information?
• To make predictions, features are put through all decision trees
and the resulting classifications are given a weight
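The steps above can be sketched in a few lines of R. This is a simplified illustration, not the algorithm as implemented in the `randomForest` package: it assumes the `rpart` package (shipped with standard R installations) for the individual trees, picks a random pair of features per tree, and uses an unweighted majority vote. The names `forest` and `predict_forest` are made up for this sketch.

```r
library(rpart)  # decision trees; a recommended package shipped with R

set.seed(42)
n_trees <- 25
features <- setdiff(names(iris), "Species")

# Grow each tree on a random subset of the data (bootstrap sample)
# and a random set of 2 features
forest <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(iris), replace = TRUE)
  feats <- sample(features, 2)
  rpart(reformulate(feats, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Put the features through all trees; the most frequent class wins
predict_forest <- function(forest, newdata) {
  votes <- sapply(forest, function(tree)
    as.character(predict(tree, newdata, type = "class")))
  votes <- matrix(votes, nrow = nrow(newdata))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

predicted <- predict_forest(forest, iris)
accuracy <- mean(predicted == iris$Species)
accuracy  # measured on the training data, so optimistic
```

For real work, a dedicated package is the better choice; this sketch only shows how random data subsets, random features, and voting fit together.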
18. SUPERVISED LEARNING
RANDOM FOREST
• Why is it very useful?
• Data does not have many requirements
• Can deal with multiple dimensions
• Makes good predictions in many cases
• Fast
• Variable importance can easily be determined
If many features are correlated, a single representative feature can be used
21. ARTIFICIAL NEURAL NETWORKS (ANN)
EXAMPLE BACKPROPAGATION
• Backpropagation
1. Nodes have connections and connections have a randomly assigned weight
2. Provide input and let the network generate output
3. Compare generated output with desired output
4. Go from output nodes back to input and adjust the weight of the node connections.
Adjusting a little bit at a time increases learning time and accuracy
5. Repeat from step 2 until the desired error rate is reached
• Can be done with weights or with node activation thresholds
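The five steps can be sketched in base R with a very small network. Everything specific here is an assumption chosen for illustration: the XOR training data, one hidden layer of 3 nodes, sigmoid activations, a learning rate of 0.5, and a fixed number of 5000 repetitions instead of a target error rate.

```r
set.seed(1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Training data: XOR inputs and desired outputs
X <- matrix(c(0, 0,  0, 1,  1, 0,  1, 1), ncol = 2, byrow = TRUE)
y <- matrix(c(0, 1, 1, 0), ncol = 1)

# Step 1: connections get randomly assigned weights
W1 <- matrix(rnorm(2 * 3), nrow = 2); b1 <- rep(0, 3)  # input -> hidden
W2 <- matrix(rnorm(3 * 1), nrow = 3); b2 <- 0          # hidden -> output
rate <- 0.5               # small steps: slower learning, more stable
errors <- numeric(5000)

for (epoch in seq_along(errors)) {
  # Step 2: provide input and let the network generate output
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
  out <- sigmoid(H %*% W2 + b2)

  # Step 3: compare generated output with desired output
  err <- out - y
  errors[epoch] <- mean(err^2)

  # Step 4: go from the output back to the input and adjust the weights
  d_out <- err * out * (1 - out)
  d_hid <- (d_out %*% t(W2)) * H * (1 - H)
  W2 <- W2 - rate * t(H) %*% d_out
  b2 <- b2 - rate * sum(d_out)
  W1 <- W1 - rate * t(X) %*% d_hid
  b1 <- b1 - rate * colSums(d_hid)
}  # Step 5: repeat

c(first = errors[1], last = errors[length(errors)])
```

Comparing the first and last recorded error shows whether the adjustments moved the network towards the desired outputs.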
22. ARTIFICIAL NEURAL NETWORKS (ANN)
SOME PERSONAL THOUGHTS (AS NEUROBIOLOGIST)
• Most examples of artificial neural networks do not take into account several
properties of biological neural networks
• Signals take time to go from A to B
• Neurons are not arranged in layers
Biological neural networks have a 3D structure with specialized areas
• Once trained, most artificial neural networks are static and don’t learn anymore
• Biological neural networks implement a wide range of signaling mechanisms per node
(neurotransmitters)
• Learning algorithms are not only internal to the neural network.
Natural selection also plays a role
23. SUPERVISED LEARNING
CHALLENGES
• Requires a training set of inputs and desired outputs
• Training data should be balanced
• Correlated features cause biases
• Outputs should be distributed as evenly as possible
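Two of these challenges can be checked quickly before training. The sketch below uses the built-in `iris` dataset as a stand-in for a training set (an assumption for illustration): it counts how evenly the output classes are distributed and looks for strongly correlated features.

```r
# Is the training data balanced? Count observations per output class
class_counts <- table(iris$Species)
class_counts
prop.table(class_counts)  # the same as proportions

# Are features correlated? Pairwise correlation of the numeric features
feature_cor <- cor(iris[, 1:4])
round(feature_cor, 2)
```

Values close to 1 or -1 in the correlation matrix point at features that carry largely the same information.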
25. UNSUPERVISED LEARNING
• Unsupervised machine learning is the machine learning task of
inferring a function to describe hidden structure from "unlabeled"
data
A classification or categorization is not included in the observations
• Examples
• Clustering
• Anomaly detection
• Neural networks (Self Organizing Map)
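As a small clustering sketch, the snippet below runs k-means from base R on the `iris` measurements with the species label removed, so the algorithm only sees unlabeled data. The choice of k-means, of 3 clusters, and of this dataset is an assumption for illustration.

```r
set.seed(123)

# Unlabeled data: drop the Species column
measurements <- iris[, 1:4]

# Let k-means find 3 groups in the data
fit <- kmeans(measurements, centers = 3, nstart = 10)

# Afterwards, compare the discovered clusters with the known species
table(cluster = fit$cluster, species = iris$Species)
```

The cluster numbers themselves are arbitrary; only the grouping is meaningful.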
32. R A SHORT HISTORY
• Conceived August 1993
An implementation of the S programming language
S was conceived in 1976
• Open sourced June 1995
• Main competitors: SPSS and SAS
• A lot of (mostly statistical) libraries available
CRAN package repository features 10366 available packages.
35. R BASICS
• R is a functional programming (FP) language
• It provides many tools for the creation and manipulation of functions.
• You can do anything with functions that you can do with vectors: you
can assign them to variables, store them in lists, pass them as
arguments to other functions, create them inside functions, and even
return them as the result of a function.
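A short sketch of these points in base R; the function names (`square`, `ops`, `make_multiplier`, `twice`) are made up for the example.

```r
# Assign a function to a variable
square <- function(x) x^2

# Store functions in a list
ops <- list(square = square, negate = function(x) -x)
ops$negate(3)

# Pass a function as an argument to another function
squares <- sapply(c(1, 2, 3), square)
squares

# Create a function inside a function and return it as the result
make_multiplier <- function(factor) {
  function(x) x * factor
}
twice <- make_multiplier(2)
twice(5)
```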
36. R BASICS
SOME FEATURES
• GIT integration
• Interpreted; does not require compilation
Execute a line in your script and look at the result in the console
• Has its own markdown variant for documentation
Especially useful if you want to have graphs
• R Shiny allows you to generate and host scripts / graphs and make
them available from a browser
37. R BASICS
SOME FEATURES
• Code completion
• Allows parallel execution, for example through the parallel package
• Can be run remotely on an R-server
• Great at reading / writing datasets
For example web site scraping for data
• Of course great at statistics
• Great at generating plots
Especially when using the ggplot2 library
39. R DATATYPES
THE VECTOR
• Vector
a <- c(1,2,5.3,6,-2,4) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector
# arithmetic is applied to every element of the vector
a <- c(1,2,5.3,6,-2,4)
b <- a * 2
b
# [1]  2.0  4.0 10.6 12.0 -4.0  8.0
40. R DATATYPES
THE MATRIX. ALL COLUMNS HAVE THE SAME TYPE AND LENGTH
# generates 5 x 4 numeric matrix
y<-matrix(1:20, nrow=5,ncol=4)
# another example
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow=2, ncol=2,
byrow=TRUE, dimnames=list(rnames, cnames))
# accessing matrix values (using the 5 x 4 matrix y from above)
y[,4] # 4th column of matrix
y[3,] # 3rd row of matrix
y[2:4,1:3] # rows 2,3,4 of columns 1,2,3
41. R DATATYPES
THE DATA.FRAME. LIKE A MATRIX BUT COLUMNS CAN HAVE DIFFERENT TYPES
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names
mydata[2:3] # columns 2 and 3 of data frame
mydata[c("ID","Color")] # columns ID and Color from data frame
mydata$Passed # variable Passed in the data frame
42. R DATATYPES
THE LIST
• An ordered collection of objects (components)
# example of a list with 4 components:
# a string, a numeric vector, a matrix, and a scalar
w <- list(name="Maarten", mynumbers=a, mymatrix=y, age=36)
# example of combining two lists into one list
list1 <- list(x=1, y="two") # placeholder lists for illustration
list2 <- list(z=TRUE)
v <- c(list1,list2)
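Once a list exists, its components can be reached by name or by position. A small self-contained sketch; the list `person` and its `scores` component are made up for the example.

```r
person <- list(name = "Maarten", scores = c(1, 2, 5.3), age = 36)

person$name          # access a component by name
person[["scores"]]   # the same, using double brackets
person[[3]]          # access a component by position
person["age"]        # single brackets return a list, not the value
names(person)        # names of all components
```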
43. COOL FEATURES OF R
1. Hosting plots
Shiny
Plot.ly
2. R markdown
3. Web site crawling