The bayesdfa package provides a flexible Bayesian modeling framework for applying dynamic factor analysis (DFA) to multivariate time-series data as a dimension reduction tool. The core estimation is done with the Stan probabilistic programming language. In addition to being one of the few Bayesian implementations of DFA, novel features of this model include (1) optionally modeling latent process deviations as drawn from a Student-t distribution to better model extremes, and (2) optionally including autoregressive and moving-average components in the latent trends. Besides estimation, we provide a series of plotting functions to visualize trends, loadings, and model predicted values. A secondary analysis for some applications is to identify regimes in latent trends. We provide a flexible Bayesian implementation of a Hidden Markov Model — also written with Stan — to characterize regime shifts in latent processes. We provide simulation testing and details on parameter sensitivities in supplementary information.
A goal of many multivariate statistical techniques is to reduce dimensionality in observed data to identify shared or latent processes. Factor analysis models represent a general class of models used to relate multiple observations to a lower dimension (factors), while also considering different covariance structures of the observed data. Factors are not directly observed, but represent a hidden, shared process among variables. Though goals of factor analysis are sometimes similar to techniques such as principal component analysis (PCA), factor analysis models explicitly estimate residual error terms, whereas PCA does not (Anderson and Rubin 1956; Jolliffe 1986). These factor models are written as \({ y }_{ i }={ u }_{ i }+\textbf{ Z }{f}_{ i }+{ \varepsilon }_{ i }\), where observed data \({ y }_{ i }\) is a linear combination of an intercept \({ u }_{ i }\) and the product of latent factors \({f}_{ i }\) and loadings \(\textbf{ Z }\) (loadings are sometimes referred to in the literature as \(\textbf{L}\)).
In a time-series setting, factor models may be extended to dynamic factor analysis (DFA) models. DFA models aim to reduce the dimensionality of a collection of time series by estimating a set of shared trends and factors, representing the linear effects of each trend on the observed data (Molenaar 1985; Zuur et al. 2003; Stock and Watson 2005). The number of trends \(m\) is chosen to be less or equal than the number of time series \(n\). The general form of the DFA model can be formulated as a state-space model (Petris 2010). The latent processes (also referred to as ‘trends’) are generally modeled as random walks, so that trend \(i\) is modeled as \({ x }_{ i,t+1 }={ x }_{ i,t }+{ w }_{ i,t }\) where \({ x }_{ i, t }\) is the value of the \(i\)-th latent trend at time \(t\), and the deviations \({ w }_{ i,t }\) are modeled as white noise. Across trends, these deviations are modeled as \(\textbf{w}_{t} \sim \mathrm{MVN}(0, \textbf{Q})\). The latent trends \({ x }_{ i,t }\) are linked to data via a loadings matrix \(\textbf{Z}\) whose values do not evolve through time, \({ y }_{ t }=\textbf{Z}{ x }_{ t } + \textbf{a} + \textbf{B}{ d }_{ t } + \textbf{e}_{t}\). The loadings matrix \(\textbf{Z}\) is dimensioned \(n\: \times\: m\) so that \({Z}_{j,i}\) represents the effect of trend \(i\) on time series \(j\). The parameters \(\textbf{a}\) and \(\textbf{B}\) are optional parameters, representing time-series-specific intercepts and effects of covariates, \({d}_{t}\). Finally, the residual errors are assumed to be \(\textbf{e}_{t} \sim \mathrm{MVN}(0,\textbf{R})\), where \(\textbf{R}\) is an estimated covariance matrix.
Estimation of DFA models is typically done in a maximum likelihood framework, using the expectation-maximization (EM) algorithm or other optimization tools. Implementation of these methods is available in multiple R packages including dlm (Petris 2010), KFAS (Helske 2017), MARSS (Holmes et al. 2012b), and tsfa (Gilbert and Meijer 2005). Challenges in parameter estimation and interpretation for DFA models have been well studied. Without constraints, parameters in the DFA model are not identifiable (Harvey 1990; Zuur et al. 2003). To ensure identifiability of variance parameters, for example, the covariance matrix \(\textbf{Q}\) is generally fixed as an identity matrix (Harvey 1990). To avoid confounding the latent trends and loadings matrix \(\textbf{Z}\), elements of \(\textbf{Z}\) must also be constrained. A common choice of constraints is for the elements in the first \(m-1\) rows of \(\textbf{Z}\) to be set to zero if the column index is greater than the row index, \(j > i\) (Harvey 1990), though other constraints have been proposed (Bai and Wang 2015). For a 3-trend DFA model for instance, these constraints would mean that the \(\textbf{Z}\) matrix parameters would be configured as \[\begin{matrix} \begin{bmatrix} { Z }_{ 1,1 } & 0 & 0 \\ { Z }_{ 2,1 } & { Z }_{ 2,2 } & 0 \\ { Z }_{ 3,1 } & { Z }_{ 3,2 } & { Z }_{ 3,3 } \\ ... & ... & ... \end{bmatrix} \end{matrix}.\]
Several previous approaches to DFA estimation in a maximum likelihood framework also center (subtract the sample means) or standardize (subtract the sample means and divide by the sample standard deviations) data prior to fitting DFA models and set the intercepts \(a\) equal to zero to avoid potential confounding of level parameters (Holmes et al. 2012a). We adopt a similar approach, allowing users to either center or standardize data before estimation, and not including the intercepts as estimated parameters.
We developed our DFA model in a Bayesian framework, using Stan and the package rstan (Stan Development Team 2016), which implements Markov chain Monte Carlo (MCMC) using the No-U Turn Sampling (NUTS) algorithm (Hoffman and Gelman 2014; Carpenter et al. 2017). Although estimation of the DFA model in a Bayesian setting is not new (Aguilar and West 2000; Koop and Korobilis 2010; Stock and Watson 2011), it presents several interesting challenges over the EM algorithm. In addition to the constraints on \(\textbf{Q}\) and \(\textbf{Z}\), Bayesian estimation suffers from a problem of label switching. In particular, elements of \(\textbf{F}\) or \(\textbf{Z}\) may flip sign within an MCMC chain, or multiple chains may converge on parameters that are identical in magnitude but with different signs.
To minimize issues with label switching, previous work on Bayesian
factor analysis has proposed additional constraints on the loadings
matrix, including setting the elements of \(\textbf{Z}\) to be constrained
(-1, 1), or adding a positive constraint to the diagonal,
\({ Z }_{ ii }>0\) (Geweke and Zhou 1996; Aguilar and West 2000). Though these constraints
generally help, there may be situations where MCMC chains still do not
converge. To address this issue, we adopt the parameter-expanded priors
for the loadings and trends proposed by Ghosh and Dunson (2009). To ensure that the
sign of the estimated quantities is the same across MCMC chains, we
created the function flip_trends()
to flip the posterior samples of
MCMC chains relative to the first chain as needed.
There are several approaches for modeling extreme deviations in time series models. Techniques include modeling deviations as a two-component mixture (Ward et al. 2007; Evin et al. 2011), or modeling deviations with non-Gaussian distributions including the Student-t distribution (Praetz 1972; Anderson et al. 2017; Anderson and Ward 2018). There are several existing packages to include Student-t distributions; these include heavy for applications to regression and mixed effects models (Osorio and F. 2018), bsts for univariate time series models (Scott 2018), and stochvol for stochastic volatility models (Kastner 2016). Because switching from a Gaussian to Student-t distribution only introduces a single parameter, \(\nu\), the degrees of freedom, we extend the latter approach to a multivariate setting to model extreme events in the latent trends, so that deviations in the trends are modeled as \({ w }_{ t } \sim \mathrm{MVT}(\nu, 0, \textbf{Q})\). As before, \(\textbf{Q}\) is fixed as an identity matrix \(\textbf{I}\). Our parameterization constrains DFA models to have the same degrees of freedom \(\nu\) in the residuals of the multiple trends, which may be fixed a priori or treated as a free parameter with a gamma(shape = 2, rate = 0.1)[2,\(\infty\)] prior (Juárez and Steel 2010).
The trends of the dynamic factor model are most commonly modeled as
non-stationary random walks,
\({ x }_{ i,t+1 }={ x }_{ i,t }+{ w }_{ i,t }\), where the
\({ w }_{ i,t } \sim N(0,1)\) are Gaussian white noise. Like with other
vector autoregressive time series models, this framework can be easily
extended to include optional autoregressive (AR) or moving average (MA)
components (Chow et al. 2011). We allow for AR(1) and MA(1) processes to be
specified with boolean arguments to the fit_dfa()
function. For both
the AR(1) and MA(1) components, we assume separate parameters for each
trend. Including the AR(1) component \(\phi_{i}\) makes the trend process
become \({ x }_{ i,t+1 }=\phi_{i}{ x }_{ i,t }+{ w }_{ i,t }\), where
values of \(\phi_{i}\) close to 1 make the trend behave as a random walk,
and small values of \(\phi_{i}\) close to 0 make the trend behave as white
noise. Similarly, we model the MA(1) component as an AR(1) process on
the error terms \({ w }_{ i,t }\). Instead of being independent at each
time step, \({ \theta }_{ i }\) controls the degree of autocorrelation
among deviations,
\({ w }_{ i,t }\sim \mathrm{N}\left( { { \theta }_{ i }w }_{ i,t-1 }, 1 \right)\).
For stationarity and invertability, we constrain
\(\left| \phi _{ i } \right|\) < 1 and \(\left| \theta _{ i } \right|\) <
1.
Like factor analysis models, there are many solutions from a DFA model capable of producing the same fit to the data. Following previous authors, we use a varimax rotation of the loadings matrix \(\textbf{Z}\) to transform the posterior loadings and trends (Kaiser 1958; Harvey 1990; Holmes et al. 2012a). If \(\widehat { \textbf{Z} }\) is the posterior mean of the loadings matrix from a DFA model of 4 time series and 2 trends for example, the rotation matrix \(\textbf{ W }^{ * }=\mathrm{varimax}(\widehat{ \textbf{Z} } )\) is dimensioned \(2\: \times\: 2\). The rotated loadings matrix can then be calculated as \(\widehat{ \textbf{Z} }^{ * } = \widehat{\textbf{Z}}\textbf{ W }^{ * }\) and rotated trends calculated as \(\widehat{ \textbf{x} }^{ * }=\textbf{ W }^{ *-1 }\widehat{ \textbf{x} }\), where \(\widehat{\textbf{x}}\) is the posterior mean of the trends.
Since the number of trends in a DFA model is not a parameter, comparing
data support across models is often necessary. Using model selection
tools to identify data support is available via Akaike’s Information
Criterion (AIC) in packages implementing maximum likelihood for
estimation of state-space models (Petris 2010; Holmes et al. 2012b). In addition
to comparing the relative support of different number of trends, model
selection for Bayesian dynamic factor models may be useful for
evaluating the error structure for the residual error covariance matrix
\(\textbf{R}\), whether covariates should be included, whether latent
trends are better modeled with a distribution allowing for extremes (MVT
versus MVN), and whether the latent trends support estimation of AR or
MA components. For our Bayesian DFA models, we extend the
loo package
(Vehtari et al. 2016a,b) to generate estimates of LOOIC
(Leave-One-Out Information Criterion) for fitted models. To ease the
selection process, bayesdfa includes the function find_dfa_trends()
to run multiple models specified by the user. It returns a table of
LOOIC values (denoting which of those failed convergence criteria) and
the model with the lowest LOOIC value.
As a diagnostic tool, we include the function find_swans()
to fitted
DFA models. We adopt the same approach and terminology for ‘black-swan
events’ as in Anderson et al. (2017), where black-swan events are rare and
unexpected extremes. Our find_swans()
function first-differences the
posterior mean estimates of each DFA trend and evaluates the probability
of observing a difference that is more extreme than expected under a
normal distribution with the same scale parameter. Events beyond a
user-defined threshold (e.g. 1 in 100, or 1 in 10,000) are then
classified as outliers and plotted.
To evaluate the ability of the Bayesian DFA model to identify anomalies
in latent processes, we created simulated data using our sim_dfa()
function. We generated simulated multivariate time series (\(n = 4\) time
series with \(T = 20\) time steps each) with \(m = 2\) underlying latent
trends. Extremes were included as a step-change in the midpoint of the
first trend in each simulated dataset. We varied the value of the step
from -4 to -8, which represent unlikely events under the assumption that
temporal deviations in the latent trends are distributed according to
\(N(0,1)\). Because increased observation error may corrupt inference
about anomalies in the trends, we considered three levels of observation
error \((\sigma = 0.25, 0.75, 1.25)\). We generated 200 simulated samples
for each permutation of parameters, resulting in a total of 3000
datasets.
We fit the Bayesian DFA model with Student-t errors to each simulated dataset. As expected, the posterior estimates from these simulations illustrate that the ability to estimate low degrees of freedom is related to the magnitude of extremes (Figure 1). Similarly, higher observation error corrupts the ability to estimate extreme events, even when they are large in magnitude (Figure 1).
An alternative approach to DFA for dimension reduction of multivariate time series data are Hidden Markov Models (HMMs). Like DFA models, they model a latent process for a time series (or collection of multivariate time series). Instead of the latent process being modeled continuously (e.g. as a random walk in DFA), HMMs conceive the latent process as a series of discrete-time, discrete-state first-order Markov chains \(s_t \in \left\{ 1, \dots, G \right\}\) with the number of possible states \(G\) specified a priori. State transition is characterized by the \(G\: \times\: G\) transition matrix with simplex rows \(\mathbf{A} = \left\{ a_{ig} \right\}\) where \(a_{ig} = p({ s }_{ t } = g | { s }_{ t-1 } = i)\) represents the probability of transitioning from state \(i\) to \(g\). Useful quantities from HMMs include the transition probabilities between latent states, and the probability of being in a given lantent state at each point in time (Zucchini et al. 2017).
HMMs can be applied to raw multivariate data to identify latent states;
however, they may also be linked with DFA to identify regimes and
transitions in the latent DFA trends. Similar to DFA, applications of
HMMs are widely available in R, including via the packages
depmixS4 (Visser and Speekenbrink 2010),
HMM (Himmelmann 2010), and
msm (Jackson 2011).
Consistent with our implementation of the Bayesian DFA model, we include
fully Bayesian inference in Stan based on (Damiano et al. 2018). We apply
independent HMM models to each DFA trend to identify alternate states or
regimes. Like with the estimation of DFA models, we use the LOOIC metric
to evaluate the relative support for HMMs with different numbers of
underlying states, selecting the converged model with the lowest LOOIC.
By default, we assume the observation model of the input time series to
be normally distributed with the scale parameter equal to the estimated
residual variance. However, for some applications, such as datasets with
changing sampling frequencies over time, uncertainty in DFA trends may
also vary through time. To propogate this uncertainty forward, we also
allow the residual variance to be entered as a known quantity for every
data point in our find_regimes()
function.
To illustrate an example application of the bayesdfa package to real data, we use monthly anomalies of sea surface temperature (SST, measured in \(C^\circ\)). SST is observed from satellite and buoy data at fixed locations, and model-based interpolations are used to generate estimates at additional gridded locations1. We used estimates generated at the locations of 4 observing stations used by the Pacific Fisheries Environmental Laboratory2 from the west coast of North America (USA). The four stations have some degree of correlation with one another, and are separated by approximately 6 degrees of latitude from one another. In summary, we work with \(n = 4\) monthly time series with \(T = 167\) observations each (from 2003–01 to 2016–05) and no missing values.
Initially, we fit a DFA model with 2 hidden trends, and will assume the
4 time series to have the same error variances \(\textbf{R}\). We will fit
the DFA model with possible extremes, modeling process error with a
Student-t distribution by using the argument estimate_nu()
. To
evaluate whether these data support an extreme DFA with trends modeled
as a t-distribution, we will fit two competing forms: one modeling the
random walks with a Gaussian distribution, and the other using a
Student-t distribution. Generating posterior samples for each model
takes approximately 7 minutes per chain, when MCMC chains aren’t run in
parallel.
After fitting the models, we confirm whether the MCMC chains are
consistent with convergence using a threshold value of \(\hat { R }=1.05\)
(Gelman et al. 2014) using our is_converged()
function. We also visually
inspect chain traceplots (e.g. Figure 3) and check the minimum effective
sample size across parameters: NaN.
As a consistency diagnostic, we also retrieve the estimated degrees of freedom from the Student-t model \(\nu\). By visual inspection, Figure 4 shows that the posterior distribution on \(\nu\) is lower than the prior distribution.
We will focus the remaining portion of our analysis on the results from the DFA model with Student-t deviations. In Figure 5, we observe that Trend 1 and Trend 2 both support SST anomalies increasing over the latter half of the time series. Both trends appear to have reversed direction (reverting to the mean in the last 2–3 years) and this pattern is more evident in Trend 1. Because we do not model seasonality explicitly, for example by including a covariate effect for the month, each of the estimated trends also includes the within-year variability that describes seasonal patterns in observed sea surface temperature.
In the violin plot of Figure 6, we note that more southern stations (24 and 30N) contribute largely to Trend 1, while the more northern stations appear to load more heavily on Trend 2.
For each trend, we apply independent HMMs to examine the support for differing numbers of underlying regimes. Both the posterior mean and standard deviation (optional argument) will be the inputs to the HMM.
Using LOOIC as a metric of support for the number of regimes, the estimates reported in Table 1 support the inclusion of 2 regimes for both Trends 1 and 2.
Regimes | LOOIC Trend 1 | LOOIC Trend 2 |
---|---|---|
1 | 855.5 | 756.7 |
2 | 31.0 | 30.3 |
3 | 69.1 | 99.9 |
4 | 139.7 | 164.6 |
Our fit_regimes()
function computes the probability of each time point
being in one of the regime states, which may also be visualized using
plot_regime_model()
. For example, the output of the 2-regime model for
Trend 1 in Figure 7 suggests a change in the middle of the time series,
then changing back again to State 1. Similarly, by the end of the
series, the HMM assigns Trend 1 to being in State 1.
There are a number of extensions to our implementation of the Bayesian DFA model with extremes that could make the model more applicable to a wider range of problems. Examples for the process model include adopting a skew-t distribution for asymmetric extremes. For models estimating multiple trends, multiple parameters may be treated hierarchically (e.g. covariate effects, variance parameters). For the observation or data model, our implementation of the Bayesian DFA model only includes data arising from a Gaussian or Student-t distribution, though this could be extended to include discrete or other continuous densities. Finally, spatial dynamic factor models (sDFA) have emerged as a useful tool for complicated multivariate spatial datasets (Lopes et al. 2011; Thorson et al. 2015), and could be similarly implemented in Stan.
This paper presents the bayesdfa package for applying Bayesian DFA to multivariate time series as a dimension reduction tool, particularly if extreme events may be present in observed data. In addition to allowing for the inclusion of covariates, we also extend the conventional dynamic factor model to include optionial moving average and autoregressive components in the latent trends. Applying this package to a dataset of sea surface temperature from the Northeast Pacific Ocean, we fit DFA models with Gaussian and Student-t errors. Though the model with Student-t errors has slightly lower LOOIC, the results from the two models are similar. Output from these 2-trend DFA models of sea surface temperature are useful in demonstrating a north-to-south gradient in temperature anomalies (Figure 6). Standardized temperature data from southern stations experience more interannual variability and temperatures that are greater in magnitude compared to northern stations (Figure 5). We also illustrate how latent trends from DFA models can be analyzed in a HMM framework to identify regimes and transitions; applied to the sea surface temperature data, both Trend 1 and Trend 2 support 2-regime models (roughly interpreted as ‘warm’ and ‘cool’ regimes; Figure 7).
This work was funded by NOAA’s Fisheries and the Environment (FATE) Program. Development of this package benefitted from discussions with other members of our working group (Jin Gao, Chris Harvey, Sam McClatchie, Stepahni Zador) and scientists at the Northwest Fisheries Science Center (including Mark Scheuerell, James Thorson, Eli Holmes, and Kelly Andrews). 2 anonymous reviewers helped improve the clarity and plots of this paper.
dlm, KFAS, MARSS, tsfa, rstan, heavy, bsts, stochvol, loo, depmixS4, HMM, msm
Bayesian, Cluster, Distributions, Finance, MixedModels, Survival, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ward, et al., "Modeling regimes with extremes: the bayesdfa package for identifying and forecasting common trends and anomalies in multivariate time-series data", The R Journal, 2019
BibTeX citation
@article{RJ-2019-007, author = {Ward, Eric J. and Anderson, Sean C. and Damiano, Luis A. and Hunsicker, Mary E. and Litzow, Michael A.}, title = {Modeling regimes with extremes: the bayesdfa package for identifying and forecasting common trends and anomalies in multivariate time-series data}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {46-55} }