SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
Time Series Analysis with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
2 / 39
Time Series Analysis with R 1 
I time series data in R 
I time series decomposition, forecasting, clustering and 
classi
cation 
I autoregressive integrated moving average (ARIMA) model 
I Dynamic Time Warping (DTW) 
I Discrete Wavelet Transform (DWT) 
I k-NN classi
cation 
1Chapter 8: Time Series Analysis and Mining, in book 
R and Data Mining: Examples and Case Studies. 
http://www.rdatamining.com/docs/RDataMining.pdf 
3 / 39
R 
I a free software environment for statistical computing and 
graphics 
I runs on Windows, Linux and MacOS 
I widely used in academia and research, as well as industrial 
applications 
I 5,800+ packages (as of 13 Sept 2014) 
I CRAN Task View: Time Series Analysis 
http://cran.r-project.org/web/views/TimeSeries.html 
4 / 39
Time Series Data in R 
I class ts 
I represents data which has been sampled at equispaced points 
in time 
I frequency=7: a weekly series 
I frequency=12: a monthly series 
I frequency=4: a quarterly series 
5 / 39
Time Series Data in R 
a <- ts(1:20, frequency = 12, start = c(2011, 3)) 
print(a) 
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
## 2011 1 2 3 4 5 6 7 8 9 10 
## 2012 11 12 13 14 15 16 17 18 19 20 
str(a) 
## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... 
attributes(a) 
## $tsp 
## [1] 2011 2013 12 
## 
## $class 
## [1] "ts" 
6 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
7 / 39
What is Time Series Decomposition 
To decompose a time series into components: 
I Trend component: long term trend 
I Seasonal component: seasonal variation 
I Cyclical component: repeated but non-periodic 
uctuations 
I Irregular component: the residuals 
8 / 39
Data AirPassengers 
Data AirPassengers: monthly totals of Box Jenkins international 
airline passengers, 1949 to 1960. It has 144(=1212) values. 
plot(AirPassengers) 
Time 
AirPassengers 
1950 1952 1954 1956 1958 1960 
100 200 300 400 500 600 
9 / 39
Decomposition 
apts - ts(AirPassengers, frequency = 12) 
f - decompose(apts) 
plot(f$figure, type = b) # seasonal figures 
2 4 6 8 10 12 
−40 −20 0 20 40 60 
Index 
f$figure 
10 / 39
Decomposition 
plot(f) 
observed 
150 250 350 450 
100 300 500 
trend 
seasonal 
−40 0 20 40 60 
−40 0 20 40 60 
2 4 6 8 10 12 
random 
Time 
Decomposition of additive time series 
11 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
12 / 39
Time Series Forecasting 
I To forecast future events based on known past data 
I For example, to predict the price of a stock based on its past 
performance 
I Popular models 
I Autoregressive moving average (ARMA) 
I Autoregressive integrated moving average (ARIMA) 
13 / 39
Forecasting 
# build an ARIMA model 
fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 
1, 0), period = 12)) 
fore - predict(fit, n.ahead = 24) 
# error bounds at 95% confidence level 
U - fore$pred + 2 * fore$se 
L - fore$pred - 2 * fore$se 
ts.plot(AirPassengers, fore$pred, U, L, 
col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) 
legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), 
c(Actual, Forecast, Error Bounds (95% Confidence))) 
14 / 39
Forecasting 
1950 1952 1954 1956 1958 1960 1962 
Time 
100 200 300 400 500 600 700 
Actual 
Forecast 
Error Bounds (95% Confidence) 
15 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
16 / 39
Time Series Clustering 
I To partition time series data into groups based on similarity or 
distance, so that time series in the same cluster are similar 
I Measure of distance/dissimilarity 
I Euclidean distance 
I Manhattan distance 
I Maximum norm 
I Hamming distance 
I The angle between two vectors (inner product) 
I Dynamic Time Warping (DTW) distance 
I ... 
17 / 39
Dynamic Time Warping (DTW) 
DTW
nds optimal alignment between two time series. 
library(dtw) 
idx - seq(0, 2 * pi, len = 100) 
a - sin(idx) + runif(100)/10 
b - cos(idx) 
align - dtw(a, b, step = asymmetricP1, keep = T) 
dtwPlotTwoWay(align) 
Index 
Query value 
0 20 40 60 80 100 
−1.0 −0.5 0.0 0.5 1.0 
18 / 39
Synthetic Control Chart Time Series 
I The dataset contains 600 examples of control charts 
synthetically generated by the process in Alcock and 
Manolopoulos (1999). 
I Each control chart is a time series with 60 values. 
I Six classes: 
I 1-100 Normal 
I 101-200 Cyclic 
I 201-300 Increasing trend 
I 301-400 Decreasing trend 
I 401-500 Upward shift 
I 501-600 Downward shift 
I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ 
control.html 
19 / 39
Synthetic Control Chart Time Series 
# read data into R sep='': the separator is white space, i.e., one 
# or more spaces, tabs, newlines or carriage returns 
sc - read.table(./data/synthetic_control.data, header = F, sep = ) 
# show one sample from each class 
idx - c(1, 101, 201, 301, 401, 501) 
sample1 - t(sc[idx, ]) 
plot.ts(sample1, main = ) 
20 / 39
Six Classes 
1 
24 26 28 30 32 34 36 
101 
15 20 25 30 35 40 45 
25 30 35 40 45 
0 10 20 30 40 50 60 
201 
Time 
301 
0 10 20 30 
401 
25 30 35 40 45 
10 15 20 25 30 35 
0 10 20 30 40 50 60 
501 
Time 
21 / 39
Hierarchical Clustering with Euclidean distance 
# sample n cases from every class 
n - 10 
s - sample(1:100, n) 
idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) 
sample2 - sc[idx, ] 
observedLabels - rep(1:6, each = n) 
# hierarchical clustering with Euclidean distance 
hc - hclust(dist(sample2), method = ave) 
plot(hc, labels = observedLabels, main = ) 
22 / 39
Hierarchical Clustering with Euclidean distance 
20 40 60 80 100 120 140 
4 6 66 
1 22 11 
11 11 
1 11 66 
6 6 6 66 
4 4 44 
44 
4 44 
33 
33 33 
5 5 55 
55 
33 
33 
55 
55 
22 
2 22 
2 22 
hclust (*, average) 
dist(sample2) 
Height 
23 / 39
Hierarchical Clustering with Euclidean distance 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 3 1 1 3 2 0 0 
## 3 0 0 0 0 0 0 10 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
24 / 39
Hierarchical Clustering with DTW Distance 
myDist - dist(sample2, method = DTW) 
hc - hclust(myDist, method = average) 
plot(hc, labels = observedLabels, main = ) 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 4 3 2 1 0 0 0 
## 3 0 0 0 0 0 6 4 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
25 / 39
Hierarchical Clustering with DTW Distance 
0 200 400 600 800 1000 
3 33 
3 33 
55 
33 33 
5 55 
5 55 
55 
66 
6 46 
4 44 
44 
44 
44 
66 66 66 
11 111 
11 111 
22 22 
22 
22 
22 
hclust (*, average) 
myDist 
Height 
26 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
27 / 39
Time Series Classi
cation 
Time Series Classi
cation 
I To build a classi
cation model based on labelled time series 
I and then use the model to predict the lable of unlabelled time 
series 
Feature Extraction 
I Singular Value Decomposition (SVD) 
I Discrete Fourier Transform (DFT) 
I Discrete Wavelet Transform (DWT) 
I Piecewise Aggregate Approximation (PAA) 
I Perpetually Important Points (PIP) 
I Piecewise Linear Representation 
I Symbolic Representation 
28 / 39
Decision Tree (ctree) 
ctree from package party 
classId - rep(as.character(1:6), each = 100) 
newSc - data.frame(cbind(classId, sc)) 
library(party) 
ct - ctree(classId ~ ., data = newSc, 
controls = ctree_control(minsplit = 20, 
minbucket = 5, maxdepth = 5)) 
29 / 39

Mais conteúdo relacionado

Mais procurados

Bivariate analysis
Bivariate analysisBivariate analysis
Bivariate analysis
ariassam
 

Mais procurados (20)

Time Series Analysis, Components and Application in Forecasting
Time Series Analysis, Components and Application in ForecastingTime Series Analysis, Components and Application in Forecasting
Time Series Analysis, Components and Application in Forecasting
 
1634 time series and trend analysis
1634 time series and trend analysis1634 time series and trend analysis
1634 time series and trend analysis
 
Charts And Graphs
Charts And GraphsCharts And Graphs
Charts And Graphs
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
 
Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting  Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Trend and Seasonal Components
Trend and Seasonal ComponentsTrend and Seasonal Components
Trend and Seasonal Components
 
Time Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingTime Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and Forecasting
 
Bivariate analysis
Bivariate analysisBivariate analysis
Bivariate analysis
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Time series
Time seriesTime series
Time series
 
INTRODUCTION TO STATA.pptx
INTRODUCTION TO STATA.pptxINTRODUCTION TO STATA.pptx
INTRODUCTION TO STATA.pptx
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 
Basic Classification.pptx
Basic Classification.pptxBasic Classification.pptx
Basic Classification.pptx
 
Holt winters exponential.pptx
Holt winters exponential.pptxHolt winters exponential.pptx
Holt winters exponential.pptx
 
Introduction to Stata
Introduction to StataIntroduction to Stata
Introduction to Stata
 
time series analysis
time series analysistime series analysis
time series analysis
 
Time Series Decomposition
Time Series DecompositionTime Series Decomposition
Time Series Decomposition
 
Analysis of Time Series
Analysis of Time SeriesAnalysis of Time Series
Analysis of Time Series
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive Models
 

Destaque

INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
USAINS Holding Sdn. Bhd. (wholly-owned by Universiti Sains Malaysia)
 
Time Series
Time SeriesTime Series
Time Series
yush313
 

Destaque (20)

Timeseries Analysis with R
Timeseries Analysis with RTimeseries Analysis with R
Timeseries Analysis with R
 
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Time Series
Time SeriesTime Series
Time Series
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time series
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniques
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
 
Demand forecasting by time series analysis
Demand forecasting by time series analysisDemand forecasting by time series analysis
Demand forecasting by time series analysis
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 

Semelhante a Time Series Analysis and Mining with R

Semelhante a Time Series Analysis and Mining with R (20)

R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with R
 
RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
R programming language
R programming languageR programming language
R programming language
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Data manipulation with dplyr
Data manipulation with dplyrData manipulation with dplyr
Data manipulation with dplyr
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
C PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMC PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAM
 
R and data mining
R and data miningR and data mining
R and data mining
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
Assignment 5.2.pdf
Assignment 5.2.pdfAssignment 5.2.pdf
Assignment 5.2.pdf
 

Mais de Yanchang Zhao (8)

RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Time Series Analysis and Mining with R

  • 1. Time Series Analysis with R Yanchang Zhao http://www.RDataMining.com 30 September 2014 1 / 39
  • 2. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 4. Time Series Analysis with R 1 I time series data in R I time series decomposition, forecasting, clustering and classi
  • 5. cation I autoregressive integrated moving average (ARIMA) model I Dynamic Time Warping (DTW) I Discrete Wavelet Transform (DWT) I k-NN classi
  • 6. cation 1Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 3 / 39
  • 7. R I a free software environment for statistical computing and graphics I runs on Windows, Linux and MacOS I widely used in academia and research, as well as industrial applications I 5,800+ packages (as of 13 Sept 2014) I CRAN Task View: Time Series Analysis http://cran.r-project.org/web/views/TimeSeries.html 4 / 39
  • 8. Time Series Data in R I class ts I represents data which has been sampled at equispaced points in time I frequency=7: a weekly series I frequency=12: a monthly series I frequency=4: a quarterly series 5 / 39
  • 9. Time Series Data in R a <- ts(1:20, frequency = 12, start = c(2011, 3)) print(a) ## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ## 2011 1 2 3 4 5 6 7 8 9 10 ## 2012 11 12 13 14 15 16 17 18 19 20 str(a) ## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... attributes(a) ## $tsp ## [1] 2011 2013 12 ## ## $class ## [1] "ts" 6 / 39
  • 10. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 12. What is Time Series Decomposition To decompose a time series into components: I Trend component: long term trend I Seasonal component: seasonal variation I Cyclical component: repeated but non-periodic uctuations I Irregular component: the residuals 8 / 39
  • 13. Data AirPassengers Data AirPassengers: monthly totals of Box Jenkins international airline passengers, 1949 to 1960. It has 144(=1212) values. plot(AirPassengers) Time AirPassengers 1950 1952 1954 1956 1958 1960 100 200 300 400 500 600 9 / 39
  • 14. Decomposition apts - ts(AirPassengers, frequency = 12) f - decompose(apts) plot(f$figure, type = b) # seasonal figures 2 4 6 8 10 12 −40 −20 0 20 40 60 Index f$figure 10 / 39
  • 15. Decomposition plot(f) observed 150 250 350 450 100 300 500 trend seasonal −40 0 20 40 60 −40 0 20 40 60 2 4 6 8 10 12 random Time Decomposition of additive time series 11 / 39
  • 16. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 18. Time Series Forecasting I To forecast future events based on known past data I For example, to predict the price of a stock based on its past performance I Popular models I Autoregressive moving average (ARMA) I Autoregressive integrated moving average (ARIMA) 13 / 39
  • 19. Forecasting # build an ARIMA model fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 1, 0), period = 12)) fore - predict(fit, n.ahead = 24) # error bounds at 95% confidence level U - fore$pred + 2 * fore$se L - fore$pred - 2 * fore$se ts.plot(AirPassengers, fore$pred, U, L, col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), c(Actual, Forecast, Error Bounds (95% Confidence))) 14 / 39
  • 20. Forecasting 1950 1952 1954 1956 1958 1960 1962 Time 100 200 300 400 500 600 700 Actual Forecast Error Bounds (95% Confidence) 15 / 39
  • 21. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 23. Time Series Clustering I To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar I Measure of distance/dissimilarity I Euclidean distance I Manhattan distance I Maximum norm I Hamming distance I The angle between two vectors (inner product) I Dynamic Time Warping (DTW) distance I ... 17 / 39
  • 24. Dynamic Time Warping (DTW) DTW
  • 25. nds optimal alignment between two time series. library(dtw) idx - seq(0, 2 * pi, len = 100) a - sin(idx) + runif(100)/10 b - cos(idx) align - dtw(a, b, step = asymmetricP1, keep = T) dtwPlotTwoWay(align) Index Query value 0 20 40 60 80 100 −1.0 −0.5 0.0 0.5 1.0 18 / 39
  • 26. Synthetic Control Chart Time Series I The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). I Each control chart is a time series with 60 values. I Six classes: I 1-100 Normal I 101-200 Cyclic I 201-300 Increasing trend I 301-400 Decreasing trend I 401-500 Upward shift I 501-600 Downward shift I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ control.html 19 / 39
  • 27. Synthetic Control Chart Time Series # read data into R sep='': the separator is white space, i.e., one # or more spaces, tabs, newlines or carriage returns sc - read.table(./data/synthetic_control.data, header = F, sep = ) # show one sample from each class idx - c(1, 101, 201, 301, 401, 501) sample1 - t(sc[idx, ]) plot.ts(sample1, main = ) 20 / 39
  • 28. Six Classes 1 24 26 28 30 32 34 36 101 15 20 25 30 35 40 45 25 30 35 40 45 0 10 20 30 40 50 60 201 Time 301 0 10 20 30 401 25 30 35 40 45 10 15 20 25 30 35 0 10 20 30 40 50 60 501 Time 21 / 39
  • 29. Hierarchical Clustering with Euclidean distance # sample n cases from every class n - 10 s - sample(1:100, n) idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) sample2 - sc[idx, ] observedLabels - rep(1:6, each = n) # hierarchical clustering with Euclidean distance hc - hclust(dist(sample2), method = ave) plot(hc, labels = observedLabels, main = ) 22 / 39
  • 30. Hierarchical Clustering with Euclidean distance 20 40 60 80 100 120 140 4 6 66 1 22 11 11 11 1 11 66 6 6 6 66 4 4 44 44 4 44 33 33 33 5 5 55 55 33 33 55 55 22 2 22 2 22 hclust (*, average) dist(sample2) Height 23 / 39
  • 31. Hierarchical Clustering with Euclidean distance # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 3 1 1 3 2 0 0 ## 3 0 0 0 0 0 0 10 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 24 / 39
  • 32. Hierarchical Clustering with DTW Distance myDist - dist(sample2, method = DTW) hc - hclust(myDist, method = average) plot(hc, labels = observedLabels, main = ) # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 4 3 2 1 0 0 0 ## 3 0 0 0 0 0 6 4 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 25 / 39
  • 33. Hierarchical Clustering with DTW Distance 0 200 400 600 800 1000 3 33 3 33 55 33 33 5 55 5 55 55 66 6 46 4 44 44 44 44 66 66 66 11 111 11 111 22 22 22 22 22 hclust (*, average) myDist Height 26 / 39
  • 34. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 38. cation I To build a classi
  • 39. cation model based on labelled time series I and then use the model to predict the lable of unlabelled time series Feature Extraction I Singular Value Decomposition (SVD) I Discrete Fourier Transform (DFT) I Discrete Wavelet Transform (DWT) I Piecewise Aggregate Approximation (PAA) I Perpetually Important Points (PIP) I Piecewise Linear Representation I Symbolic Representation 28 / 39
  • 40. Decision Tree (ctree) ctree from package party classId - rep(as.character(1:6), each = 100) newSc - data.frame(cbind(classId, sc)) library(party) ct - ctree(classId ~ ., data = newSc, controls = ctree_control(minsplit = 20, minbucket = 5, maxdepth = 5)) 29 / 39
  • 41. Decision Tree pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 100 0 0 0 0 0 ## 2 1 97 2 0 0 0 ## 3 0 0 99 0 1 0 ## 4 0 0 0 100 0 0 ## 5 4 0 8 0 88 0 ## 6 0 3 0 90 0 7 # accuracy (sum(classId == pClassId))/nrow(sc) ## [1] 0.8183 30 / 39
  • 42. DWT (Discrete Wavelet Transform) I Wavelet transform provides a multi-resolution representation using wavelets. I Haar Wavelet Transform { the simplest DWT http://dmr.ath.cx/gfx/haar/ I DFT (Discrete Fourier Transform): another popular feature extraction technique 31 / 39
  • 43. DWT (Discrete Wavelet Transform) # extract DWT (with Haar filter) coefficients library(wavelets) wtData - NULL for (i in 1:nrow(sc)) f a - t(sc[i, ]) wt - dwt(a, filter = haar, boundary = periodic) wtData - rbind(wtData, unlist(c(wt@W, wt@V[[wt@level]]))) g wtData - as.data.frame(wtData) wtSc - data.frame(cbind(classId, wtData)) 32 / 39
  • 44. Decision Tree with DWT ct - ctree(classId ~ ., data = wtSc, controls = ctree_control(minsplit=20, minbucket=5, maxdepth=5)) pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 98 2 0 0 0 0 ## 2 1 99 0 0 0 0 ## 3 0 0 81 0 19 0 ## 4 0 0 0 74 0 26 ## 5 0 0 16 0 84 0 ## 6 0 0 0 3 0 97 (sum(classId==pClassId)) / nrow(wtSc) ## [1] 0.8883 33 / 39
  • 45. plot(ct, ip_args = list(pval = F), ep_args = list(digits = 0)) 1 V57 £ 117 117 2 W43 £ −4 −4 3 W5 £ −8 −8 Node 4 (n = 68) 123456 1 0.8 0.6 0.4 0.2 0 Node 5 (n = 6) 123456 1 0.8 0.6 0.4 0.2 0 6 W31 £ −6 −6 Node 7 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 8 (n = 86) 123456 1 0.8 0.6 0.4 0.2 0 9 V57 £ 140 140 Node 10 (n = 31) 123456 1 0.8 0.6 0.4 0.2 0 11 V57 £ 178 178 12 W22 £ −6 −6 Node 13 (n = 80) 123456 1 0.8 0.6 0.4 0.2 0 14 W31 £ −13 −13 Node 15 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 16 (n = 99) 123456 1 0.8 0.6 0.4 0.2 0 17 W31 £ −15 −15 Node 18 (n = 12) 123456 1 0.8 0.6 0.4 0.2 0 19 W43 £ 3 3 Node 20 (n = 103) 123456 1 0.8 0.6 0.4 0.2 0 Node 21 (n = 97) 123456 1 0.8 0.6 0.4 0.2 0 34 / 39
  • 48. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 35 / 39
  • 51. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 Results of majority voting: class 6 35 / 39
  • 52. The TSclust Package I TSclust: a package for time seriesclustering 2 I measures of dissimilarity between time series to perform time series clustering. I metrics based on raw data, on generating models and on the forecast behavior I time series clustering algorithms and cluster evaluation metrics 2http://cran.r-project.org/web/packages/TSclust/ 36 / 39
  • 53. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 55. Online Resources I Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining.pdf I R Reference Card for Data Mining http://www.rdatamining.com/docs/R-refcard-data-mining.pdf I Free online courses and documents http://www.rdatamining.com/resources/ I RDataMining Group on LinkedIn (7,000+ members) http://group.rdatamining.com I RDataMining on Twitter (1,700+ followers) @RDataMining 38 / 39
  • 56. The End Thanks! Email: yanchang(at)rdatamining.com 39 / 39