The bayesdfa package provides a flexible Bayesian modeling framework for applying dynamic factor analysis (DFA) to multivariate time-series data as a dimension reduction tool. The core estimation is done with the Stan probabilistic programming language. In addition to being one of the few Bayesian implementations of DFA, novel features of this model include (1) optionally modeling latent process deviations as drawn from a Student-t distribution to better model extremes, and (2) optionally including autoregressive and moving-average components in the latent trends. Besides estimation, we provide a series of plotting functions to visualize trends, loadings, and model predicted values. A secondary analysis for some applications is to identify regimes in latent trends. We provide a flexible Bayesian implementation of a Hidden Markov Model — also written with Stan — to characterize regime shifts in latent processes. We provide simulation testing and details on parameter sensitivities in supplementary information.
A goal of many multivariate statistical techniques is to reduce
dimensionality in observed data to identify shared or latent processes.
Factor analysis models represent a general class of models used to
relate multiple observations to a lower dimension (factors), while also
considering different covariance structures of the observed data.
Factors are not directly observed, but represent a hidden, shared
process among variables. Though goals of factor analysis are sometimes
similar to techniques such as principal component analysis (PCA), factor
analysis models explicitly estimate residual error terms, whereas PCA
does not (Anderson and Rubin 1956; Jolliffe 1986). These factor models are written
as
In a time-series setting, factor models may be extended to dynamic
factor analysis (DFA) models. DFA models aim to reduce the
dimensionality of a collection of time series by estimating a set of
shared trends and factors, representing the linear effects of each trend
on the observed data (Molenaar 1985; Zuur et al. 2003; Stock and Watson 2005). The number
of trends
Estimation of DFA models is typically done in a maximum likelihood
framework, using the expectation-maximization (EM) algorithm or other
optimization tools. Implementation of these methods is available in
multiple R packages including
dlm (Petris 2010),
KFAS (Helske 2017),
MARSS (Holmes et al. 2012b), and
tsfa (Gilbert and Meijer 2005).
Challenges in parameter estimation and interpretation for DFA models
have been well studied. Without constraints, parameters in the DFA model
are not identifiable (Harvey 1990; Zuur et al. 2003). To ensure identifiability
of variance parameters, for example, the covariance matrix
Several previous approaches to DFA estimation in a maximum likelihood
framework also center (subtract the sample means) or standardize
(subtract the sample means and divide by the sample standard deviations)
data prior to fitting DFA models and set the intercepts
We developed our DFA model in a Bayesian framework, using Stan and the
package rstan
(Stan Development Team 2016), which implements Markov chain Monte Carlo
(MCMC) using the No-U Turn Sampling (NUTS) algorithm
(Hoffman and Gelman 2014; Carpenter et al. 2017). Although estimation of the DFA model in
a Bayesian setting is not new (Aguilar and West 2000; Koop and Korobilis 2010; Stock and Watson 2011), it
presents several interesting challenges over the EM algorithm. In
addition to the constraints on
To minimize issues with label switching, previous work on Bayesian
factor analysis has proposed additional constraints on the loadings
matrix, including setting the elements of flip_trends()
to flip the posterior samples of
MCMC chains relative to the first chain as needed.
There are several approaches for modeling extreme deviations in time
series models. Techniques include modeling deviations as a two-component
mixture (Ward et al. 2007; Evin et al. 2011), or modeling deviations with
non-Gaussian distributions including the Student-t distribution
(Praetz 1972; Anderson et al. 2017; Anderson and Ward 2018). There are several
existing packages to include Student-t distributions; these include
heavy for applications to
regression and mixed effects models (Osorio and F. 2018),
bsts for univariate time
series models (Scott 2018), and
stochvol for stochastic
volatility models (Kastner 2016). Because switching from a Gaussian to
Student-t distribution only introduces a single parameter,
The trends of the dynamic factor model are most commonly modeled as
non-stationary random walks,
fit_dfa()
function. For both
the AR(1) and MA(1) components, we assume separate parameters for each
trend. Including the AR(1) component
Like factor analysis models, there are many solutions from a DFA model
capable of producing the same fit to the data. Following previous
authors, we use a varimax rotation of the loadings matrix
Since the number of trends in a DFA model is not a parameter, comparing
data support across models is often necessary. Using model selection
tools to identify data support is available via Akaike’s Information
Criterion (AIC) in packages implementing maximum likelihood for
estimation of state-space models (Petris 2010; Holmes et al. 2012b). In addition
to comparing the relative support of different number of trends, model
selection for Bayesian dynamic factor models may be useful for
evaluating the error structure for the residual error covariance matrix
find_dfa_trends()
to run multiple models specified by the user. It returns a table of
LOOIC values (denoting which of those failed convergence criteria) and
the model with the lowest LOOIC value.
As a diagnostic tool, we include the function find_swans()
to fitted
DFA models. We adopt the same approach and terminology for ‘black-swan
events’ as in Anderson et al. (2017), where black-swan events are rare and
unexpected extremes. Our find_swans()
function first-differences the
posterior mean estimates of each DFA trend and evaluates the probability
of observing a difference that is more extreme than expected under a
normal distribution with the same scale parameter. Events beyond a
user-defined threshold (e.g. 1 in 100, or 1 in 10,000) are then
classified as outliers and plotted.
To evaluate the ability of the Bayesian DFA model to identify anomalies
in latent processes, we created simulated data using our sim_dfa()
function. We generated simulated multivariate time series (
We fit the Bayesian DFA model with Student-t errors to each simulated dataset. As expected, the posterior estimates from these simulations illustrate that the ability to estimate low degrees of freedom is related to the magnitude of extremes (Figure 1). Similarly, higher observation error corrupts the ability to estimate extreme events, even when they are large in magnitude (Figure 1).
An alternative approach to DFA for dimension reduction of multivariate
time series data are Hidden Markov Models (HMMs). Like DFA models, they
model a latent process for a time series (or collection of multivariate
time series). Instead of the latent process being modeled continuously
(e.g. as a random walk in DFA), HMMs conceive the latent process as a
series of discrete-time, discrete-state first-order Markov chains
HMMs can be applied to raw multivariate data to identify latent states;
however, they may also be linked with DFA to identify regimes and
transitions in the latent DFA trends. Similar to DFA, applications of
HMMs are widely available in R, including via the packages
depmixS4 (Visser and Speekenbrink 2010),
HMM (Himmelmann 2010), and
msm (Jackson 2011).
Consistent with our implementation of the Bayesian DFA model, we include
fully Bayesian inference in Stan based on (Damiano et al. 2018). We apply
independent HMM models to each DFA trend to identify alternate states or
regimes. Like with the estimation of DFA models, we use the LOOIC metric
to evaluate the relative support for HMMs with different numbers of
underlying states, selecting the converged model with the lowest LOOIC.
By default, we assume the observation model of the input time series to
be normally distributed with the scale parameter equal to the estimated
residual variance. However, for some applications, such as datasets with
changing sampling frequencies over time, uncertainty in DFA trends may
also vary through time. To propogate this uncertainty forward, we also
allow the residual variance to be entered as a known quantity for every
data point in our find_regimes()
function.
To illustrate an example application of the bayesdfa package to real
data, we use monthly anomalies of sea surface temperature (SST, measured
in
Initially, we fit a DFA model with 2 hidden trends, and will assume the
4 time series to have the same error variances estimate_nu()
. To
evaluate whether these data support an extreme DFA with trends modeled
as a t-distribution, we will fit two competing forms: one modeling the
random walks with a Gaussian distribution, and the other using a
Student-t distribution. Generating posterior samples for each model
takes approximately 7 minutes per chain, when MCMC chains aren’t run in
parallel.
After fitting the models, we confirm whether the MCMC chains are
consistent with convergence using a threshold value of is_converged()
function. We also visually
inspect chain traceplots (e.g. Figure 3) and check the minimum effective
sample size across parameters: NaN.
As a consistency diagnostic, we also retrieve the estimated degrees of
freedom from the Student-t model
We will focus the remaining portion of our analysis on the results from the DFA model with Student-t deviations. In Figure 5, we observe that Trend 1 and Trend 2 both support SST anomalies increasing over the latter half of the time series. Both trends appear to have reversed direction (reverting to the mean in the last 2–3 years) and this pattern is more evident in Trend 1. Because we do not model seasonality explicitly, for example by including a covariate effect for the month, each of the estimated trends also includes the within-year variability that describes seasonal patterns in observed sea surface temperature.
stats::varimax()
rotation.
In the violin plot of Figure 6, we note that more southern stations (24 and 30N) contribute largely to Trend 1, while the more northern stations appear to load more heavily on Trend 2.
stats::varimax()
rotation.
For each trend, we apply independent HMMs to examine the support for differing numbers of underlying regimes. Both the posterior mean and standard deviation (optional argument) will be the inputs to the HMM.
Using LOOIC as a metric of support for the number of regimes, the estimates reported in Table 1 support the inclusion of 2 regimes for both Trends 1 and 2.
Regimes | LOOIC Trend 1 | LOOIC Trend 2 |
---|---|---|
1 | 855.5 | 756.7 |
2 | 31.0 | 30.3 |
3 | 69.1 | 99.9 |
4 | 139.7 | 164.6 |
Our fit_regimes()
function computes the probability of each time point
being in one of the regime states, which may also be visualized using
plot_regime_model()
. For example, the output of the 2-regime model for
Trend 1 in Figure 7 suggests a change in the middle of the time series,
then changing back again to State 1. Similarly, by the end of the
series, the HMM assigns Trend 1 to being in State 1.
There are a number of extensions to our implementation of the Bayesian DFA model with extremes that could make the model more applicable to a wider range of problems. Examples for the process model include adopting a skew-t distribution for asymmetric extremes. For models estimating multiple trends, multiple parameters may be treated hierarchically (e.g. covariate effects, variance parameters). For the observation or data model, our implementation of the Bayesian DFA model only includes data arising from a Gaussian or Student-t distribution, though this could be extended to include discrete or other continuous densities. Finally, spatial dynamic factor models (sDFA) have emerged as a useful tool for complicated multivariate spatial datasets (Lopes et al. 2011; Thorson et al. 2015), and could be similarly implemented in Stan.
This paper presents the bayesdfa package for applying Bayesian DFA to multivariate time series as a dimension reduction tool, particularly if extreme events may be present in observed data. In addition to allowing for the inclusion of covariates, we also extend the conventional dynamic factor model to include optionial moving average and autoregressive components in the latent trends. Applying this package to a dataset of sea surface temperature from the Northeast Pacific Ocean, we fit DFA models with Gaussian and Student-t errors. Though the model with Student-t errors has slightly lower LOOIC, the results from the two models are similar. Output from these 2-trend DFA models of sea surface temperature are useful in demonstrating a north-to-south gradient in temperature anomalies (Figure 6). Standardized temperature data from southern stations experience more interannual variability and temperatures that are greater in magnitude compared to northern stations (Figure 5). We also illustrate how latent trends from DFA models can be analyzed in a HMM framework to identify regimes and transitions; applied to the sea surface temperature data, both Trend 1 and Trend 2 support 2-regime models (roughly interpreted as ‘warm’ and ‘cool’ regimes; Figure 7).
This work was funded by NOAA’s Fisheries and the Environment (FATE) Program. Development of this package benefitted from discussions with other members of our working group (Jin Gao, Chris Harvey, Sam McClatchie, Stepahni Zador) and scientists at the Northwest Fisheries Science Center (including Mark Scheuerell, James Thorson, Eli Holmes, and Kelly Andrews). 2 anonymous reviewers helped improve the clarity and plots of this paper.
dlm, KFAS, MARSS, tsfa, rstan, heavy, bsts, stochvol, loo, depmixS4, HMM, msm
Bayesian, Cluster, Distributions, Finance, MixedModels, Survival, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ward, et al., "Modeling regimes with extremes: the bayesdfa package for identifying and forecasting common trends and anomalies in multivariate time-series data", The R Journal, 2019
BibTeX citation
@article{RJ-2019-007, author = {Ward, Eric J. and Anderson, Sean C. and Damiano, Luis A. and Hunsicker, Mary E. and Litzow, Michael A.}, title = {Modeling regimes with extremes: the bayesdfa package for identifying and forecasting common trends and anomalies in multivariate time-series data}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {46-55} }