Recent advances in computer recording and storing technology have tremendously increased the presence of functional data, whose graphical representation can be infinite-dimensional curve, image, or shape. When the same functional object is observed over a period of time, such data are known as functional time series. This article makes first attempt to describe several techniques (centered around functional principal component analysis) for modeling and forecasting functional time series from a computational aspect, using a readily-available R addon package. These methods are demonstrated using age-specific Australian fertility rate data from 1921 to 2006, and monthly sea surface temperature data from January 1950 to December 2011.
The aim of this article is to describe the R functions that are readily-available in the ftsa package (Hyndman and H. L. Shang 2013), for modeling and forecasting functional time series. This article was motivated by recent advances in computer recording and storing technology that have enabled researchers to collect and store (ultra) high-dimensional data. When the high-dimensional data are repeatedly measured on the same object over a period of time, a time series of continuous functions is observed within a common bounded interval (Shen and J. Z. Huang 2008).
Analyzing functional time series has received increasing attention in the functional data analysis literature (see for example, Hörmann and P. Kokoszka 2010; Horváth, M. Hušková, and P. Kokoszka 2010; Horváth and P. Kokoszka 2012). (Hyndman and H. L. Shang 2010) presented a rainbow plot for visualizing functional time series, where the distant past data are shown in red and most recent data are shown in purple. (Aguilera, F. Ocana, and M. Valderrama 1999) proposed functional principal component regression (FPCR) to model and forecast functional time series.
Before reviewing the FPCR, we first define the problem more precisely.
Let
where
This article proceeds as follows. Techniques for modeling and forecasting functional time series are reviewed and their implementations using the ftsa package are described. Conclusions are given at the end.
The theoretical, methodological and practical aspects of functional principal component analysis (FPCA) have been extensively studied in the functional data analysis literature, since it allows finite dimensional analysis of a problem that is intrinsically infinite-dimensional (Hall and M. Hosseini-Nasab 2006). Numerous examples of using FPCA as an estimation tool in regression problem can be found in different fields of applications, such as breast cancer mortality rate modeling and forecasting (Erbas, R. J. Hyndman, and D. M. Gertig 2007), call volume forecasting (Shen and J. Z. Huang 2008), climate forecasting (Shang and R. J. Hyndman 2011), demographical modeling and forecasting (Hyndman and H. L. Shang 2009), and electricity demand forecasting (Antoch, L. Prchal, M. R. De Rosa, and P. Sarda 2008).
At a population level, a stochastic process denoted by
In practice, we can only observe
where
The functional principal component decomposition is first demonstrated using the age-specific Australian fertility rate data between ages 15 and 49 observed from 1921 to 2006. This data set was obtained from the Australian Bureau of Statistics (Cat No, 3105.0.65.001, Table 38). A functional graphical display is given in (Shang 2011).
Figure 1 presents the first two functional principal components and their associated principal component scores. The bottom panel of Figure 1 also plots the forecasted principal component scores, and their 80% prediction intervals (in yellow color), using an exponential smoothing state-space model (Hyndman, A. B. Koehler, J. K. Ord, and R. D. Snyder 2008). As pointed out by a referee, the forecasts of principal component scores appear to quickly level off, and the prediction intervals widen very quickly. This reflects the difficulty of our model in forecasting medium or long term horizon, as a result of the increase in variability.
Figure 1 was produced by the following code.
# load the package used throughout this article
library("ftsa")
# Fit and plot functional principal components
# order specifies the number of principal components
# h specifies the forecast horizon
plot(forecast(ftsm(Australiasmoothfertility, order=2), h=20), "components")
By conditioning on the set of smoothed functions
where
Figure 2 shows the forecasts of Australian fertility rate data from 2007 to 2026 highlighted in rainbow color, while the data used for estimation are grayed out. Both the multi-step-ahead and iterative one-step-ahead forecasts exhibit a continuing shift to older ages of peak fertility rates, caused by a recent tendency to postpone child-bearing while pursuing careers.
(a)
Multiple-step-ahead forecasts. Based on the historical data from 1921 to
2006, we obtain 20-step-ahead forecasts for 2007 to 2026.
(b) Iterative
one-step-ahead forecasts. Based on the historical data from 1921 to
2006, we obtain iterative one-step-ahead forecasts for 2007 to 2026
using the rolling origin approach.
Figure 2 was produced by the following code.
# Plot the historical data in gray
plot(Australiasmoothfertility, col = gray(0.8), xlab = "Age",
ylab = "Count of live birth (per 1,000 females)",
main = "Forecasted fertility rates (2007-2026)")
# Plot the forecasts in rainbow color for Fig. 4(a)
plot(forecast(ftsm(Australiasmoothfertility, order = 2), h = 20), add = TRUE)
legend("topright", c("2007", "2026"), col = c("red", "blue"), lty = 1)
plot(Australiasmoothfertility, col = gray(0.8), xlab = "Age",
ylab = "Count of live birth (per 1,000 females)",
main = "Forecasted fertility rates (2007-2026)")
# Plot the forecasts in rainbow color for Fig. 4(b)
plot(ftsmiterativeforecasts(Australiasmoothfertility, components = 2, iteration = 20),
add = TRUE)
legend("topright", c("2007", "2026"), col = c("red", "blue"), lty = 1)
To construct prediction interval, we calculate the forecast variance
that follows from (1) and (2). Because of
orthogonality, the forecast variance can be approximated by the sum of
component variances
Based on the normality assumption, the
Figure 3 displays the forecasts of fertility rates in 2007, along with the 80% prediction interval. It was created by the following code.
# Plot the point forecast
aus = forecast(ftsm(Australiasmoothfertility, order = 2), h = 1)
plot(aus, ylim = c(0, 140))
# Plot the lower and upper bounds
lines(aus$lower, col = 2); lines(aus$upper, col = 2)
# Add a legend to the plot
legend("topright", c("Point forecasts", "Interval forecasts"), col = c(1, 2), lty = 1,
cex = 0.9)
For this Australian fertility rate data, the point and interval forecast accuracy obtained by the FPCR have already been studied by (Shang 2012).
A special case of functional time series is when the continuous variable
is also a time variable, such as the monthly sea surface temperature
data from 1950 to 2011, obtained from National Oceanic and Atmospheric
Administration
(http://www.cpc.noaa.gov/ data/indices/sstoi.indices).
A similar type of functional graphical display is given in Shang (2011 2). Such data originate from a univariate seasonal time series.
Let
When
When functional time series are segments of a univariate time series,
the most recent trajectory is observed sequentially (Hyndman and H. L. Shang 2010). When we
observe the first
In order to improve point forecast accuracy, it is desirable to
dynamically update the point and interval forecasts for the rest of year
The BM method re-defines the start and end points of trajectories.
Because time is a continuous variable, we can change the function
support from
The re-defined data are shown diagrammatically in Figure 4, where the bottom box has moved to become the top box. The cyan colored region shows the data loss in the first year. The partially observed last trajectory under the old “year" completes the last trajectory under the new year.
As an illustration, suppose we observe the monthly sea surface temperature data from January 1950 to May 2011, we aim to update the point and interval forecasts from June 2011 to December 2011. Figure 5 displays the point and interval forecasts for the remaining months of 2011, by using the BM method.
Figure 5 was created by the following code
# Name history to represent historical data,
history <- ElNino2011smooth
# Name obs to represent partially observed data,
obs <- ElNino2011smooth$y[1:5,62]
# Name fore to represent the forecasting period
fore <- ElNino2011smooth$y[6:12,62]
int <- dynupdate(data = history, newdata = obs, holdoutdata = fore,
method = "block", interval = TRUE, level = 80)
bmupdate <- dynupdate(data = history, newdata = obs, holdoutdata = fore,
method = "block", value = TRUE)
plot(6:12, fore, type = "l", ylim = c(19, 26), xlab = "Month",
ylab = "Sea surface temperature")
lines(6:12, bmupdate, col = 4)
lines(6:12, int$low$y, col = 2); lines(6:12, int$up$y, col = 2)
legend("topright", c("True observations", "Point forecasts", "Interval forecasts"),
col=c(1, 4, 2), lty = 1, cex = 0.8)
We can also update the remaining part of the trajectory by using
regression-based approaches. Let
where
The point forecasts of
Among these regression-based approaches, the PLS method can also update
the interval forecasts. Let the one-step-ahead forecast errors of the
principal component scores be given by
Figure 6 displays the point and interval forecasts for the remaining time period of year 2011, by using the PLS method. It was created by the following code
history <- ElNino2011smooth
obs <- ElNino2011smooth$y[1:5, 62]
fore <- ElNino2011smooth$y[6:12, 62]
# Implement the ridge and PLS regressions,
# The tuning parameter lambda=100 as an
# illustration
rrmethod <- dynupdate(history, newdata = obs, holdoutdata = fore, method = "ridge",
value = TRUE, lambda = 100, level = 80)
plsmethod <- dynupdate(history, newdata = obs, holdoutdata = fore, method = "pls",
value = TRUE, lambda = 100, level = 80)
plsmethodint <- dynupdate(history, newdata = obs, holdoutdata = fore, method = "pls",
interval = TRUE, lambda = 100, level = 80)
# Plot the true observations for forecasting period
plot(6:12, fore, type = "l", ylim = c(19, 26), xlab = "Month",
ylab = "Sea surface temperature")
# Plot point forecasts obtained by RR and PLS
lines(6:12, plsmethod, col = 4); lines(6:12, rrmethod, col = "purple")
# Plot interval forecasts obtained by PLS
lines(6:12, plsmethodint$low$y, col = 2); lines(6:12, plsmethodint$up$y, col = 2)
legend("topright",c("True observations", "PLS point forecasts", "RR point forecasts",
"PLS interval forecasts"), col = c(1, 4, "purple", 2), lty = 1, cex = 0.8)
For this sea surface temperature data set, the point and interval forecast accuracy obtained by the FPCR, BM and PLS methods have already been studied by (Shang and R. J. Hyndman 2011).
This article described several techniques in the ftsa package, for modeling and forecasting functional time series. These methods centered on the FPCR, which is a common dimension reduction technique in the functional data analysis literature. FPCR reduces intrinsically infinite number of variables to several orthogonal regressors, which captures the main mode of variation in data. As illustrated by the Australian fertility rate data, FPCR is able to model and forecast annual Australian fertility rates through either multi-step-ahead forecasts or iterative one-step-ahead forecasts using the rolling origin approach. When the continuous variable in a functional time series is also a time variable, a new observation arrives sequentially. As shown using the monthly sea surface temperature data, the BM and PLS methods can update the point and interval forecasts based on the FPCR.
To sum up, the methods reviewed in this article focus on extracting patterns from a set of functional time series, and should be considered when the interest lies in modeling and forecasting the future realizations of a stochastic process.
I would like to thank the editors and two reviewers for constructive comments and suggestions that have significantly improved this article. Thanks also go to Professor Rob Hyndman for helping with R code.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Shang, "ftsa: An R Package for Analyzing Functional Time Series", The R Journal, 2013
BibTeX citation
@article{RJ-2013-006, author = {Shang, Han Lin}, title = {ftsa: An R Package for Analyzing Functional Time Series}, journal = {The R Journal}, year = {2013}, note = {https://rjournal.github.io/}, volume = {5}, issue = {1}, issn = {2073-4859}, pages = {64-72} }