This presentation covers the fundamentals of time series forecasting. It starts with basic naive and regression models and then explains advanced ARIMA models.
3. Contents
– What is forecasting
– Why we need it
– Forecasting approaches
– Forecasting evaluation
– Forecasting applications
4. Forecasting: Case Scenario - I
[Figure: Power (W) vs. timestamp, Aug 28 to Sep 01, with Train and Forecast series]
5. Forecasting: Case Scenario - II
Find the carbon footprint for a Toyota Matrix with a capacity of 1.8 (L) and mileages of 25 (C) and 31 (H).
Statistics of 4 cylinder cars
Table: https://www.otexts.org/fpp/1/4
6. Forecasting Types
– Time series forecasting
  – Data collected at regular intervals of time
  – e.g., weather, electricity forecasting
– Cross-sectional forecasting
  – Data collected at a single point in time
  – e.g., carbon emission, disease prediction
Time series forecasting (Energy)
7. Assumptions
1. Historical information is available
2. Past patterns will continue in the future
[Figure: two panels of Power (W) vs. timestamp, one for Aug 03 to Aug 06 and one for Aug 27 to Aug 30]
8. Forecasting Horizon
1. Short-term forecasting:
  – Hours to a few days ahead
2. Medium-term forecasting:
  – A few days to months ahead
3. Long-term forecasting:
  – Months to years ahead
9. Why electricity forecasting
I. Short term decision
II. Long term investment
[Figure: Power (GW) vs. months (approx. 6 years)]
Sol: Backup generators
1. Renewables (Solar)
2. Diesel generators
3. Power plants
Img: http://www.installeronline.co.uk/brownout-one-coming/
16. 4. Model fitting: Vertical approach
[Figure: Power (kW) vs. hour of the day for days 2, 9, 16 and 23, and the forecast of day 30, '15]
Weighted forecasting
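One way to read the vertical approach in the figure is that the forecast for each hour of day 30 is a weighted average of the same hour on days 2, 9, 16 and 23. The sketch below illustrates that reading in Python. The weights, the sample readings and the function name `weighted_vertical_forecast` are illustrative assumptions, not values from the slides.

```python
def weighted_vertical_forecast(past_days, weights):
    """Forecast each hour as a weighted average of the same hour on past days.

    past_days: list of daily profiles (oldest first), each a list of hourly values.
    weights:   one weight per past day; normalized here so they sum to 1.
    """
    if len(past_days) != len(weights):
        raise ValueError("need exactly one weight per past day")
    total = sum(weights)
    norm = [w / total for w in weights]
    hours = len(past_days[0])
    return [
        sum(w * day[h] for w, day in zip(norm, past_days))
        for h in range(hours)
    ]


# Hypothetical readings (kW) for three hours on days 2, 9, 16 and 23.
history = [
    [0.10, 0.20, 0.30],  # day 2
    [0.12, 0.22, 0.28],  # day 9
    [0.11, 0.18, 0.32],  # day 16
    [0.13, 0.20, 0.30],  # day 23
]
# Assumed weights: more recent days count more.
day_30 = weighted_vertical_forecast(history, weights=[1, 2, 3, 4])
```

With equal weights the same function reduces to a plain per-hour average of the past days.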
17. 5. Model evaluation: Prediction accuracy
Root mean square error (RMSE): Lower is better
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
$n$ = number of values
$y_i$ = actual values
$\hat{y}_i$ = forecast values
$y = [713, 711, 652, 522]$
$\hat{y} = [751, 713, 711, 652]$
RMSE ≈ 73.9
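The RMSE example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `rmse` is an assumption made for illustration. For the slide's values the result comes out at roughly 73.9.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    n = len(actual)
    squared_errors = ((y - y_hat) ** 2 for y, y_hat in zip(actual, forecast))
    return math.sqrt(sum(squared_errors) / n)


y = [713, 711, 652, 522]      # actual values from the slide
y_hat = [751, 713, 711, 652]  # forecast values from the slide
print(round(rmse(y, y_hat), 2))  # 73.87
```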
24. Evaluation
1. Standard error: Lower is better
$e_i = y_i - \hat{y}_i$
$s_e = \sqrt{\frac{1}{N-2} \sum_{i=1}^{N} e_i^2}$
2. Goodness of fit: Higher is better
$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$
R_squared: https://goo.gl/Xm5gUd
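Both measures can be computed directly from a fitted line. The sketch below assumes a simple linear regression with one predictor, which is consistent with the N - 2 denominator (two fitted parameters). The sample data and the function names `fit_line` and `evaluate_fit` are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def evaluate_fit(x, y):
    """Return (standard error, R squared) of the least-squares line."""
    intercept, slope = fit_line(x, y)
    y_hat = [intercept + slope * xi for xi in x]
    n = len(y)
    y_bar = sum(y) / n
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]
    std_error = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    total = sum((yi - y_bar) ** 2 for yi in y)
    return std_error, explained / total


# Hypothetical, nearly linear data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
se, r2 = evaluate_fit(x, y)
```

A standard error close to 0 and an R squared close to 1 both indicate that the line follows the data closely.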
30. Multiple regression
[Figure: Power (W) vs. timestamp, Aug 28 to Sep 01, with Train, Forecast, Actual and Residual series; Test RMSE: 1.05]
Demo 2
31. Summary - II
1. Regression (line fitting)
  1. Linear
  2. Non-linear (cousin: splines)
  3. Multiple
2. Demonstration
32. Auto-regressive (AR) model
– Auto-regressive model:
  – Models future values as a function of recent past sequential values
– Representation: An AR model with past p values is denoted as AR(p)
$Y_t = f(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \epsilon_t)$
$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \epsilon_t$
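To make the AR(p) equation concrete, the sketch below computes a one-step-ahead forecast from given coefficients, with the noise term replaced by its mean of zero. The coefficient values, the sample series and the function name `ar_forecast` are illustrative assumptions; in practice the coefficients would be estimated from data.

```python
def ar_forecast(history, const, coeffs):
    """One-step-ahead AR(p) forecast, with the noise term set to 0.

    history: past observations, oldest first.
    const:   the constant term of the AR equation.
    coeffs:  coefficients for lags 1..p (lag 1 first).
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # Reverse so that recent[0] is Y_{t-1}, recent[1] is Y_{t-2}, ...
    recent = history[::-1][:p]
    return const + sum(c * y for c, y in zip(coeffs, recent))


series = [500.0, 520.0, 510.0]  # hypothetical power readings (W)
# AR(2) with assumed coefficients: 10 + 0.6 * 510 + 0.3 * 520
next_value = ar_forecast(series, const=10.0, coeffs=[0.6, 0.3])
```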
33. Moving Average (MA) model
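$Y_t = \theta_0 + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}$
$Y_t = f(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_{t-q})$
• Moving average model:
  • Models future values as a function of recent past sequential error terms
• Representation: An MA model with past q error terms is denoted as MA(q)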
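The MA(q) equation can be illustrated by feeding a known sequence of error terms through it. In the sketch below, a single unit shock affects the output for exactly q further steps and then disappears. The coefficients, the treatment of errors before the start of the list as zero, and the function name `ma_series` are assumptions made for illustration.

```python
def ma_series(errors, const, coeffs):
    """Build an MA(q) series from a given list of error terms.

    errors: error terms, oldest first.
    const:  the constant term of the MA equation.
    coeffs: coefficients for lagged errors 1..q (lag 1 first).
    Errors before the start of the list are treated as 0 (an assumption).
    """
    out = []
    for t, e_t in enumerate(errors):
        value = const + e_t
        for lag, c in enumerate(coeffs, start=1):
            if t - lag >= 0:
                value += c * errors[t - lag]
        out.append(value)
    return out


shock = [1.0, 0.0, 0.0, 0.0]  # one unit shock at t = 0
response = ma_series(shock, const=0.0, coeffs=[0.5, 0.25])  # MA(2)
print(response)  # [1.0, 0.5, 0.25, 0.0]
```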
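34. ARMA model
$Y_t = f(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_{t-q}, Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p})$
$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}$
• Auto-Regressive Moving Average (ARMA) model:
  • Models future values as a function of recent past sequential values and error terms
• Representation: ARMA(p, q) model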
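The ARMA recursion combines past values and past error terms in a single update. The sketch below simulates such a series with Gaussian noise from Python's standard library. The parameter values, the zero start-up terms and the function name `simulate_arma` are illustrative assumptions.

```python
import random


def simulate_arma(n, const, ar_coeffs, ma_coeffs, sigma=1.0, seed=0):
    """Simulate n steps of an ARMA(p, q) process.

    Values and errors before t = 0 are treated as 0 (an assumption).
    """
    rng = random.Random(seed)
    values, errors = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        y_t = const + e_t
        # AR part: weighted past values.
        for i, c in enumerate(ar_coeffs, start=1):
            if t - i >= 0:
                y_t += c * values[t - i]
        # MA part: weighted past error terms.
        for j, c in enumerate(ma_coeffs, start=1):
            if t - j >= 0:
                y_t += c * errors[t - j]
        values.append(y_t)
        errors.append(e_t)
    return values


# ARMA(1, 1) with assumed coefficients.
sample = simulate_arma(200, const=0.0, ar_coeffs=[0.6], ma_coeffs=[0.4])
```

Setting `sigma=0.0` removes the noise, which makes the AR part of the recursion easy to follow by hand.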
37. ARIMA model
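ARIMA is defined by a tuple (p, d, q)
Auto-Regressive Integrated Moving Average
AR [Order p]: $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \epsilon_t$
I [Order d]: $Y'_t = Y_t - Y_{t-1}$
MA [Order q]: $Y_t = \theta_0 + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}$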
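The "I" part of ARIMA is differencing applied d times. The sketch below shows differencing and its inverse on a small trending series. The sample data and the function names `difference` and `undo_difference` are assumptions made for illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undo_difference(diffed, first_value):
    """Invert one round of differencing, given the first original value."""
    out = [first_value]
    for delta in diffed:
        out.append(out[-1] + delta)
    return out


trend = [10, 12, 14, 16, 18]   # hypothetical series with a linear trend
once = difference(trend)        # [2, 2, 2, 2]: the trend is removed
twice = difference(trend, d=2)  # [0, 0, 0]
restored = undo_difference(once, trend[0])
```

Each round of differencing shortens the series by one value, which is why the inverse needs the first original value.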
38. ACF/PACF plots
1. Auto-Correlation Function (ACF) plot:
  – Correlation coefficients of the time series at different lags
  – Defines the order q of the MA model
2. Partial Auto-Correlation Function (PACF) plot:
  – Partial correlation coefficients of the time series at different lags
  – Defines the order p of the AR model
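The values behind both plots can be computed without a plotting library. The sketch below computes the sample ACF and derives the PACF from it with the Durbin-Levinson recursion, one common way to obtain partial autocorrelations. The simulated AR(1) series, its coefficient of 0.7 and the function names `acf` and `pacf` are illustrative assumptions.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients of the order k-1 fit, lag 1 first
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Simulated AR(1) series with an assumed coefficient of 0.7.
rng = random.Random(0)
simulated = [0.0]
for _ in range(1999):
    simulated.append(0.7 * simulated[-1] + rng.gauss(0.0, 1.0))

acf_values = acf(simulated, max_lag=5)    # decays gradually for an AR process
pacf_values = pacf(simulated, max_lag=5)  # expected to be small beyond lag 1
```

For this AR(1) example the PACF should show one clear spike at lag 1, which matches reading the AR order p from the PACF plot.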
39. ACF/PACF plots
[Figure: three panels: Data (Power (W) vs. timestamp, Aug 29 to Sep 01), PACF plot (PACF vs. lag) and ACF plot (ACF vs. lag)]
42. SARIMA (Seasonal ARIMA)
– SARIMA(p,d,q)(P,D,Q):
  – Order (P,D,Q) handles the seasonality part
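The seasonal differencing order D works like ordinary differencing, except that each value is compared with the value one season earlier. The sketch below assumes a known seasonal period (for example, 24 for hourly data with a daily cycle), which the slide does not state. The sample data and the function name `seasonal_difference` are also illustrative assumptions.

```python
def seasonal_difference(series, period):
    """Subtract from each value the value one seasonal period earlier."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be positive and shorter than the series")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical series: a repeating pattern of length 3 that rises by 1 per cycle.
seasonal = [1, 5, 2, 2, 6, 3, 3, 7, 4]
deseasonalized = seasonal_difference(seasonal, period=3)  # [1, 1, 1, 1, 1, 1]
```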
43. Complete Forecasting pseudocode
1. Visualize time-series data
2. If data is noisy
  – Apply averaging or naïve model
3. If data is not stationary
  – First, stationarize data using differencing
  – Next, apply any time series model
4. If data is already stationary
  – Apply any time series model
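The decision flow can be sketched in Python as follows. The stationarity check here is a crude, hypothetical heuristic that compares the means of the two halves of the series; in practice a formal statistical test would normally be used. The mean model stands in for "any time series model", and all function names and thresholds are assumptions made for illustration.

```python
import math


def mean_model(series, horizon):
    """Stand-in for 'any time series model': forecast the series mean."""
    return [sum(series) / len(series)] * horizon


def looks_stationary(series, tolerance=1.0):
    """Crude, hypothetical check: the two half-means differ by at most
    `tolerance` standard deviations of the whole series."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    first, second = series[: n // 2], series[n // 2:]
    gap = abs(sum(first) / len(first) - sum(second) / len(second))
    return gap <= tolerance * sd


def forecast(series, horizon, noisy=False, window=3):
    if noisy:
        # Step 2: averaging model over the most recent values.
        return mean_model(series[-window:], horizon)
    if looks_stationary(series):
        # Step 4: model the series directly.
        return mean_model(series, horizon)
    # Step 3: difference, model the differences, then integrate back.
    diffs = [b - a for a, b in zip(series, series[1:])]
    steps = mean_model(diffs, horizon)
    out, last = [], series[-1]
    for step in steps:
        last += step
        out.append(last)
    return out


trend_forecast = forecast([10, 12, 14, 16, 18], horizon=2)  # [20.0, 22.0]
flat_forecast = forecast([5, 6] * 4, horizon=2)             # [5.5, 5.5]
```

Step 1 (visualizing the data) is left out because it is a manual inspection step, and the `noisy` flag stands for the judgement made during that inspection.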
44. Challenges: Outliers
[Figure: two panels of Power (kW) vs. hour of the day: days 26, 27, 28 and 29 with anomalous usage marked, and forecast vs. actual for day 30]
45. Challenges: Domain Knowledge
[Figure: two cases of Power (W) vs. hour of the day. Case 1: days 2, 9, 16 and 23, with forecast vs. actual. Case 2: days 26, 27, 28 and 29, with forecast vs. actual]
47. References
1. Book: Forecasting: Principles and Practice, https://www.otexts.org/fpp
2. Understanding seasonality and trend with code: https://anomaly.io/seasonal-trend-decomposition-in-r/
3. ARIMA ordering: https://people.duke.edu/~rnau/411arim3.htm
4. Time series forecasting theory: https://www.youtube.com/watch?v=Aw77aMLj9uM
5. Book: Applied Predictive Modeling by Kuhn et al.
6. Book: An Introduction to Statistical Learning by James et al.