Motivation

1. It is common in business to have over 1000 products that need forecasting at least monthly.
2. Forecasts are often required by people who are untrained in time series analysis.

Specifications
Automatic forecasting algorithms must:
- determine an appropriate time series model;
- estimate the parameters;
- compute the forecasts with prediction intervals.
Example: Asian sheep

[Figure: Numbers of sheep in Asia, 1960-2010, in millions of sheep; a second panel shows automatic ETS forecasts of the same series.]
Example: Corticosteroid sales

[Figure: Monthly corticosteroid drug sales in Australia, 1995-2010, in total scripts (millions); a second panel shows automatic ARIMA forecasts of the same series.]
Outline

1. Motivation
2. Forecasting competitions
3. Forecasting the PBS
4. Exponential smoothing
5. ARIMA modelling
6. Automatic nonlinear forecasting?
7. Time series with complex seasonality
8. Recent developments
Makridakis and Hibon (1979)

This was the first large-scale empirical evaluation of time series forecasting methods, and it was highly controversial at the time.

Difficulties:
- How to measure forecast accuracy?
- How to apply methods consistently and objectively?
- How to explain unexpected results?

Common thinking was that the more sophisticated mathematical models (ARIMA models at the time) were necessarily better. If results showed ARIMA models were not best, it must be because the analyst was unskilled.
Makridakis and Hibon (1979)

"I do not believe that it is very fruitful to attempt to classify series according to which forecasting techniques perform 'best'. The performance of any particular technique when applied to a particular series depends essentially on (a) the model which the series obeys; (b) our ability to identify and fit this model correctly and (c) the criterion chosen to measure the forecasting accuracy." — M.B. Priestley

". . . the paper suggests the application of normal scientific experimental design to forecasting, with measures of unbiased testing of forecasts against subsequent reality, for success or failure. A long overdue reform." — F.H. Hansford-Miller
Makridakis and Hibon (1979)

"Modern man is fascinated with the subject of forecasting." — W.G. Gilchrist

"It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors." — W.G. Gilchrist
Makridakis and Hibon (1979)

"I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods." — C. Chatfield

"Why do empirical studies sometimes give different answers? It may depend on the selected sample of time series, but I suspect it is more likely to depend on the skill of the analyst and on their individual interpretations of what is meant by Method X." — C. Chatfield

". . . these authors are more at home with simple procedures than with Box-Jenkins." — C. Chatfield
Makridakis and Hibon (1979)

"There is a fact that Professor Priestley must accept: empirical evidence is in disagreement with his theoretical arguments." — S. Makridakis & M. Hibon

"Dr Chatfield expresses some personal views about the first author . . . It might be useful for Dr Chatfield to read some of the psychological literature quoted in the main paper, and he can then learn a little more about biases and how they affect prior probabilities." — S. Makridakis & M. Hibon
Consequences of MH (1979)

As a result of this paper, researchers started to:
- consider how to automate forecasting methods;
- study which methods give the best forecasts;
- be aware of the dangers of over-fitting;
- treat forecasting as a different problem from time series analysis.

Makridakis & Hibon followed up with a new competition in 1982:
- 1001 series
- Anyone could submit forecasts (avoiding the charge of incompetence)
- Multiple forecast measures used.
M-competition

Main findings (taken from Makridakis & Hibon, 2000):

1. Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
2. The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
3. Combinations of methods outperform, on average, the individual methods being combined, and do very well in comparison to other methods.
4. The accuracy of the various methods depends upon the length of the forecasting horizon involved.
Makridakis and Hibon (2000)

"The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods. . . The extension involves the inclusion of more methods/researchers (in particular in the areas of neural networks and expert systems) and more series."

- 3003 series
- All data from business, demography, finance and economics.
- Series length between 14 and 126.
- Either non-seasonal, monthly or quarterly.
- All time series positive.

MH claimed that the M3-competition supported the findings of their earlier work. However, the best performing methods were far from "simple".
Makridakis and Hibon (2000)

Best methods:

Theta
- A very confusing explanation.
- Shown by Hyndman and Billah (2003) to be an average of linear regression and simple exponential smoothing with drift, applied to seasonally adjusted data.
- Later, the original authors claimed that their explanation was incorrect.

Forecast Pro
- A commercial software package with an undisclosed algorithm.
- Known to fit either exponential smoothing or ARIMA models using BIC.
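The Theta method is implemented in the forecast package for R as thetaf(). A minimal sketch, assuming that package and the h02 drug sales series used later in this talk:

# thetaf() implements the Theta method; per Hyndman & Billah (2003) it is
# equivalent to simple exponential smoothing with drift applied to
# seasonally adjusted data.
library(forecast)
fc <- thetaf(h02, h = 24)   # forecast 24 months ahead
plot(fc)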
M3 results (recalculated)

Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
ForecastX       17.35   13.09   1.42
Automatic ANN   17.18   13.98   1.53
B-J automatic   19.13   13.72   1.54

Notes:
- Calculations do not match the published paper.
- Some contestants apparently submitted multiple entries, but only the best ones were published.
Forecasting the PBS

- The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.
- Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
- The cost to government is determined by the number and types of drugs purchased; it is currently nearly 1% of GDP ($14 billion).
- The total cost is budgeted based on forecasts of drug usage.
Forecasting the PBS

- In 2001: a $4.5 billion budget, under-forecast by $800 million.
- Thousands of products. Seasonal demand.
- Subject to covert marketing, volatile products, uncontrollable expenditure.
- Although monthly data are available for 10 years, data are aggregated to annual values, and only the first three years are used in estimating the forecasts.
- All forecasting is done with the FORECAST function in MS-Excel!
PBS data

[Figures: monthly total cost in $ thousands, 1995-2008, for five PBS groups: A03 concession safety net, A05 general copayments, D01 general copayments, S01 general copayments, and R03 general copayments.]
Exponential smoothing methods

                                 Seasonal Component
Trend Component                  N (None)   A (Additive)   M (Multiplicative)
N  (None)                        N,N        N,A            N,M
A  (Additive)                    A,N        A,A            A,M
Ad (Additive damped)             Ad,N       Ad,A           Ad,M
M  (Multiplicative)              M,N        M,A            M,M
Md (Multiplicative damped)       Md,N       Md,A           Md,M

- N,N: Simple exponential smoothing
- A,N: Holt's linear method
- Ad,N: Additive damped trend method
- M,N: Exponential trend method
- Md,N: Multiplicative damped trend method
- A,A: Additive Holt-Winters' method
- A,M: Multiplicative Holt-Winters' method

There are 15 separate exponential smoothing methods. Each can have an additive or multiplicative error, giving 30 separate models. Only 19 of these models are numerically stable.
Exponential smoothing methods

General notation: ETS(Error, Trend, Seasonal), short for ExponenTial Smoothing.

Examples:
- A,N,N: Simple exponential smoothing with additive errors
- A,A,N: Holt's linear method with additive errors
- M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
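The ETS(Error,Trend,Seasonal) labels map directly onto the model argument of ets() in the forecast package. A small sketch, assuming that package and the livestock and h02 series used in the examples below:

library(forecast)
ses  <- ets(livestock, model = "ANN")  # simple exponential smoothing, additive errors
holt <- ets(livestock, model = "AAN")  # Holt's linear method, additive errors
hw   <- ets(h02,       model = "MAM")  # multiplicative Holt-Winters, multiplicative errors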
Innovations state space models

- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.
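A quick illustration of the second point, assuming the forecast package; because the two models' parameters are estimated separately, the point forecasts are close rather than exactly equal, but the prediction intervals differ visibly:

library(forecast)
f1 <- forecast(ets(livestock, model = "ANN"), h = 10)  # additive errors
f2 <- forecast(ets(livestock, model = "MNN"), h = 10)  # multiplicative errors
max(abs(f1$mean - f2$mean))                  # point forecasts nearly identical
cbind(f1$upper[, "95%"], f2$upper[, "95%"])  # 95% interval bounds differ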
ETS state space model

[Diagram: the state x_t = (level, slope, seasonal) evolves from x_{t-1} driven by the innovation ε_t, and each observation y_t is generated from the previous state x_{t-1} and ε_t; the chain continues through x_{t+1}, y_{t+1}, and so on.]

Estimation: compute the likelihood L from ε_1, ε_2, . . . , ε_T, then optimize L with respect to the model parameters.
Innovations state space models

Let x_t = (ℓ_t, b_t, s_t, s_{t-1}, . . . , s_{t-m+1}) and let ε_t be iid N(0, σ²). Then

    y_t = h(x_{t-1}) + k(x_{t-1}) ε_t        (observation equation)
    x_t = f(x_{t-1}) + g(x_{t-1}) ε_t        (state equation)

where μ_t = h(x_{t-1}) and e_t = k(x_{t-1}) ε_t.

- Additive errors: k(x_{t-1}) = 1, so y_t = μ_t + ε_t.
- Multiplicative errors: k(x_{t-1}) = μ_t, so y_t = μ_t (1 + ε_t), and ε_t = (y_t − μ_t)/μ_t is a relative error.

Estimation

    L*(θ, x_0) = n log( Σ_{t=1}^{n} ε_t² / k²(x_{t-1}) ) + 2 Σ_{t=1}^{n} log |k(x_{t-1})|
               = −2 log(Likelihood) + constant

Minimize with respect to θ = (α, β, γ, φ) and the initial states x_0 = (ℓ_0, b_0, s_0, s_{-1}, . . . , s_{-m+1}).
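This is the estimation performed by ets() in the forecast package. A minimal sketch for one specified model, assuming that package:

library(forecast)
fit <- ets(h02, model = "MAM")  # multiplicative errors, additive trend, multiplicative seasonality
coef(fit)                       # estimated alpha, beta, gamma and initial states
fit$loglik                      # maximized log-likelihood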
Q: How to choose between the 19 different ETS models?
Akaike's Information Criterion

    AIC = −2 log(L) + 2k

where L is the likelihood and k is the number of estimated parameters in the model.

- This is a penalized likelihood approach.
- If L is Gaussian, then AIC ≈ c + T log(MSE) + 2k, where c is a constant, MSE is computed from one-step forecasts on the training set, and T is the length of the series.
- Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE of one-step forecasts on the test set via time series cross-validation.
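A sketch of that equivalence in practice, assuming the forecast package, whose tsCV() computes forecast errors over rolling origins: compare the one-step cross-validated MSE of two candidate ETS models.

library(forecast)
f_man <- function(y, h) forecast(ets(y, model = "MAN"), h = h)
f_ann <- function(y, h) forecast(ets(y, model = "ANN"), h = h)
mean(tsCV(livestock, f_man, h = 1)^2, na.rm = TRUE)  # CV MSE for ETS(M,A,N)
mean(tsCV(livestock, f_ann, h = 1)^2, na.rm = TRUE)  # CV MSE for ETS(A,N,N)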
Akaike's Information Criterion

    AIC = −2 log(L) + 2k

Corrected AIC
For small T, the AIC tends to over-fit. The bias-corrected version is

    AICc = AIC + 2(k+1)(k+2) / (T−k)

Bayesian Information Criterion

    BIC = AIC + k[log(T) − 2]

- BIC penalizes extra terms more heavily than AIC.
- Minimizing the BIC is consistent if there is a true model.
What to use?

Choice: AIC, AICc, BIC, or CV-MSE.

- CV-MSE is too time-consuming for most automatic forecasting purposes, and it requires a large T.
- As T → ∞, BIC selects the true model if there is one. But that is never true!
- AICc focuses on forecasting performance, can be used on small samples, and is very fast to compute.
- Empirical studies in forecasting show that AIC is better than BIC for forecast accuracy.
ets algorithm in R

Based on Hyndman, Koehler, Snyder & Grose (IJF, 2002):

- Apply each of the 19 models that are appropriate to the data. Optimize parameters and initial values using MLE.
- Select the best model using the AICc.
- Produce forecasts using the best model.
- Obtain prediction intervals using the underlying state space model.
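A rough sketch of the selection loop, assuming the forecast package; ets() performs this internally over all admissible models, so the candidate set here is an illustrative subset only.

library(forecast)
candidates <- c("ANN", "MNN", "AAN", "MAN", "MAM")
fits <- lapply(candidates, function(m) ets(h02, model = m))
best <- fits[[which.min(sapply(fits, function(f) f$aicc))]]
best$method   # the name of the AICc-best candidate, e.g. "ETS(M,A,M)"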
Exponential smoothing

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,A,N) for the Asian sheep data, 1960-2010, in millions of sheep.]
Exponential smoothing

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,Md,M) for the monthly corticosteroid scripts, 1995-2010, in total scripts (millions).]
Exponential smoothing

fit

ETS(M,Md,M)

Smoothing parameters:
  alpha = 0.3318
  beta  = 4e-04
  gamma = 1e-04
  phi   = 0.9695

Initial states:
  l = 0.4003
  b = 1.0233
  s = 0.8575 0.8183 0.7559 0.7627 0.6873 1.2884
      1.3456 1.1867 1.1653 1.1033 1.0398 0.9893

sigma: 0.0651

       AIC       AICc        BIC
-121.97999 -118.68967  -65.57195
M3 comparisons

Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
ForecastX       17.35   13.09   1.42
Automatic ANN   17.18   13.98   1.53
B-J automatic   19.13   13.72   1.54
ETS             18.06   13.38   1.52
ARIMA models

[Diagram: inputs y_{t-1}, y_{t-2}, y_{t-3} and ε_t, ε_{t-1}, ε_{t-2} feed into the output y_t: an autoregressive moving average (ARMA) model.]

Estimation: compute the likelihood L from ε_1, ε_2, . . . , ε_T, and use an optimization algorithm to maximize L.

ARIMA model: an ARMA model applied to differences of the series.
Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(0,1,0) with drift for the Asian sheep data, 1960-2010, in millions of sheep.]
Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for the monthly corticosteroid scripts, 1995-2010, in total scripts (millions).]
Auto ARIMA

fit

Series: h02
ARIMA(3,1,3)(0,1,1)[12]

Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651

sigma^2 estimated as 0.002706: log likelihood = 290.25
AIC=-564.5   AICc=-563.71   BIC=-538.48
How does auto.arima() work?

A non-seasonal ARIMA process:

    φ(B)(1 − B)^d y_t = c + θ(B)ε_t

We need to select the appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select p, q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices are driven by forecast accuracy; a sketch of the main steps follows below.
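A sketch of the two main steps, assuming the forecast package: choose d with the KPSS test, then compare the AICc over nearby (p, q) candidates. auto.arima() wraps this in a full stepwise search rather than the small grid used here.

library(forecast)
d <- ndiffs(livestock, test = "kpss")              # number of first differences
cands <- expand.grid(p = 0:2, q = 0:2)             # small illustrative grid
aiccs <- apply(cands, 1, function(o)
  Arima(livestock, order = c(o[["p"]], d, o[["q"]]),
        include.constant = TRUE)$aicc)
cands[which.min(aiccs), ]                          # AICc-best (p, q)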
How does auto.arima() work?

A seasonal ARIMA process:

    Φ(B^m) φ(B) (1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m) θ(B) ε_t

We need to select the appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
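The seasonal analogue, assuming the forecast package: nsdiffs() applies a seasonal unit root test such as OCSB to choose D, after which d is chosen on the seasonally differenced series.

library(forecast)
D <- nsdiffs(h02, test = "ocsb")                                   # seasonal differences
y <- if (D > 0) diff(h02, lag = frequency(h02), differences = D) else h02
d <- ndiffs(y, test = "kpss")                                      # then first differences
c(D = D, d = d)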
M3 conclusions

MYTHS
- Simple methods do better.
- Exponential smoothing is better than ARIMA.

FACTS
- The best methods are hybrid approaches.
- ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners.
Automatic nonlinear forecasting

- The automatic ANN in the M3 competition did poorly.
- Linear methods did best in the NN3 competition!
- Very few machine learning methods get published in the IJF because the authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.
- There is some good recent work by Kourentzes and Crone (Neurocomputing, 2010) on automated ANNs for time series.
- Watch this space!
Examples

[Figures: three series with complex seasonality: US finished motor gasoline products, weekly, 1992-2004, in thousands of barrels per day; the number of calls to a large American bank (7am-9pm) in 5-minute intervals, 3 March to 12 May; and daily Turkish electricity demand, 2000-2008, in GW.]
TBATS model

TBATS:
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)

The automatic algorithm is described in A.M. De Livera, R.J. Hyndman and R.D. Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513-1527.
TBATS model

y_t = observation at time t.

Box-Cox transformation:

    y_t^(ω) = (y_t^ω − 1)/ω   if ω ≠ 0;
    y_t^(ω) = log y_t          if ω = 0.

Measurement and state equations, with M seasonal periods, global and local trend, and an ARMA error:

    y_t^(ω) = ℓ_{t-1} + φ b_{t-1} + Σ_{i=1}^{M} s^(i)_{t-m_i} + d_t
    ℓ_t = ℓ_{t-1} + φ b_{t-1} + α d_t
    b_t = (1 − φ) b + φ b_{t-1} + β d_t
    d_t = Σ_{i=1}^{p} φ_i d_{t-i} + Σ_{j=1}^{q} θ_j ε_{t-j} + ε_t

Fourier-like (trigonometric) seasonal terms:

    s^(i)_t = Σ_{j=1}^{k_i} s^(i)_{j,t}
    s^(i)_{j,t}  =  s^(i)_{j,t-1} cos λ^(i)_j + s*^(i)_{j,t-1} sin λ^(i)_j + γ_1^(i) d_t
    s*^(i)_{j,t} = −s^(i)_{j,t-1} sin λ^(i)_j + s*^(i)_{j,t-1} cos λ^(i)_j + γ_2^(i) d_t

TBATS: Trigonometric, Box-Cox, ARMA, Trend, Seasonal.
Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {52.1785714285714,8}) for the US gasoline data, in thousands of barrels per day.]

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {169,5, 845,3}) for the call centre data, in call arrivals per 5-minute interval.]

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {7,3, 354.37,12, 365.25,4}) for Turkish electricity demand, in GW.]
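The multiple and non-integer periods in the model specifications above are declared before fitting. A minimal sketch, assuming the forecast package and the call-centre series from the talk: msts() attaches several seasonal periods to a series, and tbats() then fits the model automatically.

library(forecast)
calls <- msts(callcentre, seasonal.periods = c(169, 845))  # daily and weekly periods in 5-minute slots
fit <- tbats(calls)
plot(forecast(fit))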
Further competitions

1. The 2011 tourism forecasting competition.
2. Kaggle and other forecasting platforms.
3. GEFCom 2012: point forecasting of electricity load and wind power.
4. GEFCom 2014: probabilistic forecasting of electricity load, electricity price, wind energy and solar energy.
Forecasts about forecasting

1. Automatic algorithms will become more general, handling a wide variety of time series.
2. Model selection methods will take account of multi-step forecast accuracy as well as one-step forecast accuracy.
3. Automatic forecasting algorithms for multivariate time series will be developed.
4. Automatic forecasting algorithms that include covariate information will be developed.
For further information

robjhyndman.com
- Slides and references for this talk.
- Links to all papers and books.
- Links to R packages.
- A blog about forecasting research.