We present an R package bssm for Bayesian non-linear/non-Gaussian state space modeling. Unlike existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretely observed latent diffusion processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package offers an easy-to-use interface for defining models with linear-Gaussian state dynamics and non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models.
State space models (SSM) are a flexible class of latent variable models commonly used in analyzing time series data (cf. Durbin and Koopman 2012). There are several packages available for state space modeling in R, especially for two special cases: the linear-Gaussian SSM (LGSSM), where both the observation and state densities are Gaussian with linear relationships with the states, and the SSM with a discrete state space, which is sometimes called a hidden Markov model (HMM). These classes admit analytically tractable marginal likelihood functions and conditional state distributions (conditioned on the observations), making inference relatively straightforward. See, for example, Petris and Petrone (2011), Tusell (2011), Helske (2017), and Helske and Helske (2019) for reviews of some of the R packages dealing with these types of models. The present R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package's primary aim is to provide easy-to-use and fast functions for fully Bayesian inference with common time series models such as the basic structural time series model (Harvey 1989) with exogenous covariates, and simple stochastic volatility models. The package also accommodates custom non-linear models and discretized diffusion models.
When extending state space modeling to non-linear or non-Gaussian models, some difficulties arise. As the likelihood is no longer analytically tractable, computing the latent state distributions and estimating the hyperparameters of the model become more challenging. One general option is to use Markov chain Monte Carlo (MCMC) methods targeting the full joint posterior of the hyperparameters and the latent states, for example by Gibbs sampling or Hamiltonian Monte Carlo. Unfortunately, the joint posterior is typically very high dimensional, and due to the strong autocorrelation structures of the state densities, the efficiency of such methods can be relatively poor. Another asymptotically exact approach is based on the pseudo-marginal particle MCMC approach (Andrieu et al. 2010), where the likelihood function and the state distributions are estimated using sequential Monte Carlo (SMC), i.e., the particle filter (PF) algorithm. Instead of computationally demanding Monte Carlo methods, approximation-based methods such as the extended and unscented Kalman filters may be used, as well as Laplace approximations, which are provided, for example, by the INLA (Lindgren and Rue 2015) R package. These approximations are computationally appealing but may lead to hard-to-quantify biases in the posterior.
Some of the R packages suitable for Bayesian state space modeling include pomp (King et al. 2016), rbi (Jacob and Funk 2020), nimbleSMC (Michaud et al. 2020; NIMBLE Development Team 2020), and rstan (Stan Development Team 2020). With the pomp package, the user defines the model using R or C snippets for simulation from, and evaluation of, the latent state and observation level densities, allowing flexible model construction. The rbi package is an interface to LibBi (Murray 2015), a standalone software package with a focus on Bayesian state space modeling on high-performance computers. The pomp package provides several simulation-based inference methods mainly based on iterated filtering and maximum likelihood, whereas rbi is typically used for Bayesian inference via particle MCMC. For a more detailed comparison of rbi/LibBi and pomp, with examples, see Funk and King (2020). The nimbleSMC package contains some particle filtering algorithms which can be used in the general NIMBLE modeling system (de Valpine et al. 2017), whereas the rstan package provides an R interface to Stan, a general statistical modeling platform written in C++ (Carpenter et al. 2017).
The key difference from the aforementioned packages, and the motivation behind the present bssm package, is to combine the use of fast approximation-based methods with a Monte Carlo correction step, leading to computationally efficient and unbiased (approximation-error-free) inference on the joint posterior of the hyperparameters and latent states, as suggested by Vihola et al. (2020). In a nutshell, the method uses MCMC which targets an approximate marginal posterior of the hyperparameters, together with an importance sampling type weighting which provides asymptotically exact inference on the joint posterior of the hyperparameters and the latent states. In addition to this two-stage procedure, bssm also supports delayed acceptance pseudo-marginal MCMC (Christen and Fox 2005) using the approximations, as well as direct pseudo-marginal MCMC. To our knowledge, importance sampling and delayed acceptance in this form are not available in other Bayesian state space modeling packages in R.
We denote the sequence of observations \((y_1,\ldots,y_T)\) as \(y\), and the sequence of latent state variables \((\alpha_1,\ldots, \alpha_T)\) as \(\alpha\). The latent states \(\alpha_t \in \mathbb{R}^d\) are typically vector-valued, whereas we focus mainly on scalar observations \(y_t \in \mathbb{R}\) (vector-valued observations are also supported, assuming conditional independence (given \(\alpha_t\)) in case of non-Gaussian observations).
A general state space model consists of two parts: the observation level densities \(g_t^{(\theta)}(y_t | \alpha_t)\) and the latent state transition densities \(\mu_t^{(\theta)}(\alpha_{t+1} | \alpha_t)\). Typically, both \(g_t^{(\theta)}\) and \(\mu_t^{(\theta)}\) depend on an unknown parameter vector \(\theta\), for which we can define an arbitrary prior \(p(\theta)\).
In a linear-Gaussian SSM, both \(g_t^{(\theta)}\) and \(\mu_t^{(\theta)}\) are Gaussian densities, and they depend linearly on the current and previous state vectors, respectively. Section 2.1 describes a common extension of these models supported by bssm, which relaxes the assumptions on the observational density \(g_t^{(\theta)}\) by allowing exponential family links and stochastic volatility models. While the main focus of bssm is on state space models with linear-Gaussian state dynamics, there is also support for more general non-linear models, discussed briefly in Section 2.2. Section 4 describes how arbitrary models based on these definitions are constructed in bssm.
The primary class of models supported by bssm consists of SSMs with linear-Gaussian state dynamics of the form \[\begin{aligned} \alpha_{t+1} &= c_t + T_t \alpha_t + R_t \eta_t, \end{aligned}\] where \(c_t \in\mathbb{R}^d\), \(T_t \in\mathbb{R}^{d\times d}\), and \(R_t\in\mathbb{R}^{d \times k}\) can depend on the unknown parameters \(\theta\) and on covariates. The noise terms \(\eta_t \sim N(0, I_k)\) and \(\alpha_1 \sim N(a_1, P_1)\) are independent. These state dynamics can be combined with an observational level density \(g_t\) of the form \[g_t(y_t | d_t + Z_t \alpha_t, \phi, u_t),\] where the parameters \(\phi\) and the known vector \(u_t\) are distribution-specific and can be omitted in some cases. Currently, the supported observational level distributions are Gaussian, Poisson, binomial, negative binomial, and gamma, as well as a stochastic volatility model.
For multivariate models, these distributions can be combined arbitrarily, except the stochastic volatility model case, which is currently handled separately. Also, for a fully Gaussian model, the observational level errors \(\epsilon_t\) can be correlated across time series.
The general non-linear Gaussian model in bssm has the following form: \[\begin{aligned} y_t &= Z(t, \alpha_t, \theta) + H(t, \alpha_t, \theta)\epsilon_t,\\ \alpha_{t+1} &= T(t, \alpha_t, \theta) + R(t, \alpha_t, \theta)\eta_t,\\ \alpha_1 &\sim N(a_1(\theta), P_1(\theta)), \end{aligned}\] with \(t=1,\ldots, n\), \(\epsilon_t \sim N(0,\textrm{I}_p)\), and \(\eta_t \sim N(0,\textrm{I}_k)\).
The bssm package also supports models where the state equation is defined as a continuous-time diffusion of the form \[\textrm{d} \alpha_t = \mu(\alpha_t,\theta) \textrm{d} t + \sigma(\alpha_t, \theta) \textrm{d} B_t, \quad t\geq0,\] where \(B_t\) is a Brownian motion and \(\mu\) and \(\sigma\) are scalar-valued functions, with the univariate observation density \(p(y_k | \alpha_k)\) defined at integer times \(k=1,\ldots,n\).
The main goal of bssm is to facilitate easy-to-use full Bayesian inference of the joint posterior \(p(\alpha, \theta | y)\) for models discussed in Section 2. The inference methods implemented in bssm are based on a factorized approach where the joint posterior of hyperparameters \(\theta\) and latent states \(\alpha\) is given as \[p(\alpha, \theta | y) \propto p(\theta) p(\alpha, y | \theta) = p(\theta) p(y | \theta) p( \alpha | y, \theta),\] where \(p(y | \theta)\) is the parameter marginal likelihood and \(p(\alpha | y, \theta)\) is the smoothing distribution.
All the inference algorithms are based on a Markov chain Monte Carlo on the parameters \(\theta\), whose single iteration may be summarised as follows:
1. Draw a proposal \(\theta' \sim N(\theta^{i-1}, \Sigma_{i-1})\).
2. Calculate the (approximate) marginal likelihood \(\hat{p}(y | \theta')\).
3. Accept the proposal with probability \(\alpha := \min\Big\{1, \frac{p(\theta')\hat{p}(y | \theta')}{p(\theta^{i-1}) \hat{p}(y | \theta^{i-1})}\Big\}\).
4. If the proposal \(\theta'\) is accepted, set \(\theta^i = \theta'\); otherwise, set \(\theta^i = \theta^{i-1}\).
5. Adapt the proposal covariance matrix \(\Sigma_{i-1}\to \Sigma_i\).
The adaptation step 5 in bssm currently implements the robust adaptive Metropolis algorithm (Vihola 2012) with a fixed target acceptance rate (0.234 by default) provided by the ramcmc package (Helske 2016). The (approximate) marginal likelihood \(\hat{p}(y | \theta)\) takes different forms, leading to different inference algorithms discussed below.
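To make steps 1-5 concrete, the following is a minimal sketch of a single iteration in R. The functions approx_loglik() and log_prior() are hypothetical user-supplied stand-ins, and for brevity step 5 adapts only a scalar step size via a Robbins-Monro recursion, whereas bssm adapts the full proposal covariance with the RAM algorithm:

# One Metropolis iteration (sketch). approx_loglik() and log_prior()
# are hypothetical; bssm uses ramcmc's full covariance adaptation.
mcmc_step <- function(theta, loglik, s, i,
                      approx_loglik, log_prior, target = 0.234) {
  theta_prop <- rnorm(length(theta), mean = theta, sd = s)    # step 1
  loglik_prop <- approx_loglik(theta_prop)                    # step 2
  log_alpha <- min(0, loglik_prop + log_prior(theta_prop) -
                      loglik - log_prior(theta))              # step 3
  if (log(runif(1)) < log_alpha) {                            # step 4
    theta <- theta_prop
    loglik <- loglik_prop
  }
  # step 5 (simplified): shrink/grow the step size toward the target rate
  s <- s * exp(i^(-2/3) * (exp(log_alpha) - target))
  list(theta = theta, loglik = loglik, s = s)
}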
The simplest case is a linear-Gaussian SSM, where we can use the exact marginal likelihood \(\hat{p}(y | \theta)=p(y | \theta)\), in which case Algorithm 1 reduces to (an adaptive) random-walk Metropolis algorithm targeting the posterior marginal of the parameters \(\theta\). Inference on the full posterior may be made using the simulation smoothing algorithm (Durbin and Koopman 2002) conditional on the sampled hyperparameters.
The other ‘direct’ option, which can be used with any model, is using the bootstrap particle filter (BSF) (Gordon et al. 1993), which leads to a random \(\hat{p}(y | \theta)\) which is an unbiased estimator of \(p(y | \theta)\). In this case, Algorithm 1 reduces to (an adaptive) particle marginal Metropolis-Hastings (Andrieu et al. 2010). The full posterior inference is achieved simultaneously, by picking particle trajectories based on their ancestries as in the filter-smoother algorithm (Kitagawa 1996). Note that with BSF, the desired acceptance rate needs to be lower, depending on the number of particles used (Doucet et al. 2015).
The direct BSF discussed above may be used with any non-linear and/or non-Gaussian model but may be slow and/or mix poorly. To alleviate this, bssm provides computationally efficient (intermediate) approximate inference for the non-Gaussian observation models of Section 2.1 and the non-linear models of Section 2.2.
For non-Gaussian models with linear-Gaussian dynamics, we use an approximating Gaussian model \(\tilde{p}(y,\alpha | \theta)\), which is a Laplace approximation of \(p(\alpha, y | \theta)\) following Durbin and Koopman (2000). We write the likelihood as \[\begin{aligned} p(y | \theta) &= \int p(\alpha, y | \theta)\textrm{d}\alpha = \tilde{p}(y | \theta) E\left[\frac{p(y| \alpha, \theta)}{\tilde p(y| \alpha, \theta)}\right], \end{aligned}\] where \(\tilde p(y | \theta)\) is the likelihood of the Laplace approximation, and the expectation is taken with respect to its conditional \(\tilde p(\alpha|y, \theta)\) (Durbin and Koopman 2012). Indeed, denoting by \(\hat{\alpha}\) the mode of \(\tilde{p}(\alpha | \theta, y)\), we may write \[\begin{aligned} \log p(y | \theta) &= \log \tilde p(y | \theta) + \log \frac{p(y| \hat \alpha, \theta)}{\tilde p(y| \hat \alpha, \theta)}+ \log E\left[\frac{p(y| \alpha, \theta) / p(y | \hat \alpha, \theta)}{\tilde p(y| \alpha, \theta) / \tilde p(y | \hat \alpha, \theta)}\right]. \end{aligned}\] If \(\tilde{p}\) resembles \(p\) for typical values of \(\alpha\), the last term, the logarithm of an expectation, is approximately zero. We take \(\hat{p}(y |\theta)\) to be the right-hand side expression with this expectation term dropped.
When \(\hat{p}\) is approximate, the MCMC algorithm targets an approximate posterior marginal. Approximate full inference may be done analogously to the previous section, by simulating trajectories conditional on the sampled parameter configurations \(\theta^i\). We believe that approximate inference is often good enough for model development, but we strongly recommend using the post-correction discussed in Section 3.3 to check the validity of the final inference.
In addition to these algorithms, bssm also supports \(\hat{p}(y | \theta)\) based on the extended Kalman filter (EKF) or the iterated EKF (IEKF) (Jazwinski 1970), which can be used for models with non-linear dynamics. Approximate smoothing based on the (iterated) EKF is also supported. It is also possible to perform direct inference which, instead of the BSF, employs a particle filter based on the EKF (Van Der Merwe et al. 2001).
The approximate inference methods of the previous section are computationally efficient but come with a bias. The bssm package implements the importance sampling type post-correction discussed in Vihola et al. (2020). Having MCMC samples \((\theta^i)\) from the approximate posterior, we may produce (random) weights and latent states such that the weighted samples form estimators which are consistent with respect to the true posterior \(p(\alpha,\theta|y)\).
The primary approach we recommend for post-correction is based on the \(\psi\)-APF, a particle filter using the intermediate Gaussian approximations of the previous section. In essence, this particle filter employs the dynamics and a look-ahead strategy coming from the approximation, which leads to low-variance estimators; see Vihola et al. (2020) and the package vignettes for a more detailed description. Naturally, the \(\psi\)-APF can also be used in place of the BSF in the direct inference of Section 3.1.
As an alternative to approximate MCMC with post-correction, bssm also supports an analogous delayed acceptance method (Christen and Fox 2005; Banterle et al. 2019), here denoted DA-MCMC. This algorithm is similar to Algorithm 1, but in the case of acceptance, it leads to a second-stage acceptance using the same weights that the post-correction would use; see Vihola et al. (2020) for details. Note that, as in the direct approach for non-Gaussian/non-linear models, the desired acceptance rate with DA-MCMC should be lower than the default 0.234.
The DA-MCMC also leads to consistent posterior estimators and often outperforms the direct particle marginal Metropolis-Hastings. However, empirical findings (Vihola et al. 2020) and theoretical considerations (Franks and Vihola 2020) suggest that approximate inference with post-correction may often be preferable. The bssm package supports parallelization of the post-correction using OpenMP, which may further favor the latter approach.
For general continuous-time diffusion models, the transition densities are intractable. The bssm package uses the Milstein time-discretization scheme for approximate simulation, with inference based on the corresponding BSF. A finer time-discretization mesh gives smaller bias than a coarser one, at the cost of increased computational complexity. The DA and IS approaches can be used to speed up the inference by using a coarse discretization in the first stage and a finer mesh in the second stage. For a comparison of the DA and IS approaches in the case of a geometric Brownian motion model, see Vihola et al. (2020).
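To illustrate the discretization, here is a minimal sketch of the Milstein scheme for a geometric Brownian motion \(\textrm{d}\alpha_t = \mu \alpha_t \textrm{d}t + \sigma \alpha_t \textrm{d}B_t\); the function and parameter values are illustrative, not part of the package internals:

# Milstein discretization of GBM on [0, t_end] with L subintervals
milstein_gbm <- function(alpha0, mu, sigma, t_end, L) {
  dt <- t_end / L
  alpha <- numeric(L + 1)
  alpha[1] <- alpha0
  for (i in seq_len(L)) {
    dW <- rnorm(1, sd = sqrt(dt))
    alpha[i + 1] <- alpha[i] + mu * alpha[i] * dt + sigma * alpha[i] * dW +
      0.5 * sigma^2 * alpha[i] * (dW^2 - dt)  # Milstein correction term
  }
  alpha
}
# e.g., a path with mesh 2^5; halving dt (doubling L) reduces the bias
path <- milstein_gbm(alpha0 = 1, mu = 0.05, sigma = 0.3, t_end = 1, L = 2^5)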
The main functions of bssm related to MCMC sampling, approximations, and particle filtering are written in C++, with the help of the Rcpp (Eddelbuettel and François 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages. On the R side, the package uses S3 methods to provide a relatively unified workflow independent of the type of model one is working with. Model building functions such as bsm_ng and svm are used to construct model objects of the same name, which can then be passed to other methods, such as logLik and run_mcmc, which compute the log-likelihood value and run the MCMC algorithm, respectively. We will now briefly describe the main functionality of bssm. For more detailed descriptions of the different functions and their arguments, see the corresponding documentation in R and the package vignettes.
For models with linear-Gaussian state dynamics, bssm includes some predefined models such as bsm_lg and bsm_ng for univariate Gaussian and non-Gaussian structural time series models with external covariates, for which the user only needs to supply the data and the priors for the unknown model parameters. In addition, bssm supports the general model building functions ssm_ulg and ssm_mlg for custom univariate and multivariate Gaussian models, and ssm_ung and ssm_mng for their non-Gaussian counterparts. For these models, users need to supply their own R functions for the evaluation of the log prior density and for updating the model matrices given the current value of the parameter vector \(\theta\). It is also possible to avoid defining the matrices manually by leveraging the formula interface of the KFAS package (Helske 2017) together with the as_bssm function, which converts a KFAS model to an equivalent bssm model object. This is especially useful in the case of complex multivariate models with covariates.
As an example, consider a Gaussian local linear trend model of the form
\[\begin{aligned}
y_t &= \mu_t + \epsilon_t,\\
\mu_{t+1} &= \mu_t + \nu_t + \eta_t,\\
\nu_{t+1} &= \nu_t + \xi_t,
\end{aligned}\]
with zero-mean Gaussian noise terms \(\epsilon_t, \eta_t, \xi_t\) with unknown standard deviations. Using the time series of the mean annual temperature (in Fahrenheit) in New Haven, Connecticut, from 1912 to 1971 (available in the datasets package) as an example, this model can be built with the bsm_lg function as
library("bssm")
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior, sd_level = prior,
  sd_slope = prior)
Here, we use a helper function halfnormal, which defines a half-Normal prior distribution for the standard deviation parameters, with the first argument defining the initial value of the parameter and the second the scale parameter of the half-Normal distribution. Other prior options are normal, tnormal (truncated normal), gamma, and uniform.
As an example of a multivariate model, consider a bivariate Poisson model with a latent random walk, defined as
\[\begin{aligned}
y_{i,t} &\sim \textrm{Poisson}(\exp(x_t)), \quad i = 1, 2,\\
x_{t+1} &= x_t + \eta_t,
\end{aligned}\]
with \(\eta_t \sim N(0, \sigma^2)\) and prior \(\sigma \sim \textrm{Gamma}(2, 0.01)\). This model can be built with the ssm_mng function as
# Generate observations
set.seed(1)
x <- cumsum(rnorm(50, sd = 0.2))
y <- cbind(
  rpois(50, exp(x)),
  rpois(50, exp(x)))
# Log prior density function
prior_fn <- function(theta) {
  dgamma(theta, 2, 0.01, log = TRUE)
}
# Model parameters from hyperparameters
update_fn <- function(theta) {
  list(R = array(theta, c(1, 1, 1)))
}
# define the model
mng_model <- ssm_mng(y = y, Z = matrix(1, 2, 1), T = 1,
  R = 0.1, P1 = 1, distribution = "poisson",
  init_theta = 0.1,
  prior_fn = prior_fn, update_fn = update_fn)
Here, the user-defined functions prior_fn and update_fn define the log-prior for the model and how the model components depend on the hyperparameters \(\theta\), respectively.
For models where the state equation is no longer linear-Gaussian, we use a pointer-based interface by defining all model components, as well as functions defining the Jacobians of \(Z(\cdot)\) and \(T(\cdot)\) needed by the extended Kalman filter, as C++ snippets. The general non-linear Gaussian model can be defined with the function ssm_nlg. Discretely observed diffusion models, where the state process is assumed to be a continuous stochastic process, can be constructed using the ssm_sde function, which takes pointers to C++ functions defining the drift, the diffusion, the derivative of the diffusion function, and the log-densities of the observations and the prior. As an example of the latter, let us consider an Ornstein–Uhlenbeck process
\[\textrm{d} \alpha_t = \rho (\nu - \alpha_t) \textrm{d} t + \sigma \textrm{d} B_t,\]
with parameters \(\theta = (\rho, \nu, \sigma) = (0.5, 2, 1)\) and the initial condition \(\alpha_0 = 1\). For the observation density, we use a Poisson distribution with parameter \(\exp(\alpha_k)\). We first simulate a trajectory \(x_0, \ldots, x_n\) using the sde.sim function from the sde package (Iacus 2016) and use that for the simulation of observations \(y\):
library("sde")
<- sde.sim(t0 = 0, T = 100, X0 = 1, N = 100,
x drift = expression(0.5 * (2 - x)),
sigma = expression(1),
sigma.x = expression(0))
<- rpois(100, exp(x[-1])) y
We then compile and build the model as
Rcpp::sourceCpp("ssm_sde_template.cpp")
pntrs <- create_xptrs()
sde_model <- ssm_sde(y, pntrs$drift, pntrs$diffusion,
  pntrs$ddiffusion, pntrs$obs_density, pntrs$prior,
  c(0.5, 2, 1), 1, FALSE)
The templates for the C++ functions for the SDE and non-linear Gaussian models can be found in the package vignettes on CRAN.
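MCMC for such a model can then be run with run_mcmc (introduced in the next section), using a coarse mesh for the approximate stage and a finer mesh for the correction, as discussed in Section 3.4. The sketch below assumes the mesh-level arguments L_c and L_f (defining \(2^{L}\) subintervals per unit time) as described in the package documentation; particle and iteration counts are illustrative:

# IS-corrected MCMC for the SDE model (sketch): coarse mesh 2^2 for
# the approximation, finer mesh 2^5 for the correction weights
out_sde <- run_mcmc(sde_model, iter = 20000, particles = 50,
  mcmc_type = "is2", L_c = 2, L_f = 5)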
The main purpose of bssm is to allow computationally efficient MCMC-based inference for various state space models. For this task, the method run_mcmc can be used. The function takes several arguments, depending on the model class, but default values are provided for many of them. For linear-Gaussian models, we only need to supply the number of iterations. Using the previously created local linear trend model for the New Haven temperature data of Section 4.1, we run an MCMC with 100,000 iterations, where the first 10,000 are discarded as burn-in (the burn-in phase is also used for the adaptation of the proposal distribution):
mcmc_bsm <- run_mcmc(bsm_model, iter = 1e5, burnin = 1e4)
The print method for the output of the MCMC algorithms gives a summary of the results, and detailed summaries for \(\theta\) and \(\alpha\) can be obtained using the summary function. For all MCMC algorithms, bssm uses a so-called jump chain representation of the Markov chain \(X_1,\ldots,X_n\), where we only store each accepted \(X_k\) and the number of steps we stayed in the same state. For example, if \(X_{1:n} = (1,2,2,1,1,1)\), we present such a chain as \(\tilde X = (1,2,1)\), \(N=(1,2,3)\). This approach reduces the storage space and makes the importance sampling type correction algorithms computationally more efficient. One drawback of this approach is that the results from the MCMC run correspond to weighted samples from the target posterior, so some of the commonly used postprocessing tools need to be adjusted. Of course, for methods other than IS-weighting, the simplest option is to expand the samples to a typical Markov chain using the stored counts \(N\). This can be done using the function expand_sample, which returns an object of class "mcmc" of the coda package (Plummer et al. 2006) (thus the plotting and diagnostic methods of coda can also be used).
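To illustrate, the expansion of the jump chain in the example above amounts to repeating each stored state by its count (expand_sample does essentially this, plus the class bookkeeping):

# states (1, 2, 1) with counts (1, 2, 3) expand back to the full chain
rep(c(1, 2, 1), times = c(1, 2, 3))
# [1] 1 2 2 1 1 1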
We can also directly transform the posterior samples to a "data.frame" object by using the as.data.frame method for the MCMC output (for IS-weighting, the returned data frame contains an additional column weights). This is useful, for example, for visualization purposes with the ggplot2 (Wickham 2016) package:
library("ggplot2")
<- as.data.frame(mcmc_bsm, variable = "theta")
d ggplot(d, aes(x = value)) +
geom_density(bw = 0.1, fill = "#9ebcda") +
facet_wrap(~ variable, scales = "free") +
theme_bw()
Figure 1 shows the estimated posterior densities of the three standard deviation parameters of the model. The relatively large observational level standard deviation \(\sigma_y\) suggests that the underlying latent temperature series is much smoother than the observed series, which can also be seen from Figure 2, which shows the original observations (black dots) spread around the estimated temperature series (solid line). This figure was drawn using dplyr (Wickham et al. 2021) and ggplot2 with the following code:
library("dplyr")
<- as.data.frame(mcmc_bsm, variable = "states")
d <- d %>%
summary_y filter(variable == "level") %>%
group_by(time) %>%
summarise(mean = mean(value),
lwr = quantile(value, 0.025),
upr = quantile(value, 0.975))
ggplot(summary_y, aes(x = time, y = mean)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.25) +
geom_line() +
geom_point(data = data.frame(mean = nhtemp,
time = time(nhtemp))) +
theme_bw() + xlab("Year") +
ylab("Mean annual temperature in New Haven")
For non-Gaussian models, the default MCMC algorithm is approximate inference based on the Laplace approximation, combined with importance sampling post-correction. It is also possible to first perform approximate MCMC using the argument mcmc_type = "approx" and then perform the post-correction step using the results from the approximate MCMC. In doing so, we can also use the function suggest_N to find a suitable number of particles \(N\) for the \(\psi\)-APF in the spirit of Doucet et al. (2015):
out_approx <- run_mcmc(mng_model, mcmc_type = "approx", iter = 50000)
est_N <- suggest_N(mng_model, out_approx)
out_exact <- post_correct(mng_model, out_approx, particles = est_N$N)
The function suggest_N computes the standard deviation of the logarithm of the post-correction weights (i.e., the random part of the log-likelihood of the \(\psi\)-APF) at the approximate MAP estimate of \(\theta\) using a range of values of \(N\), and returns a list with component N, the smallest number of particles for which this standard deviation was less than one. For small and moderate problems, 10-20 particles are typically enough.
The bssm package also offers separate methods for performing (approximate) state filtering and smoothing, which may be useful in some custom settings. For LGSSMs, the methods kfilter and smoother perform Kalman filtering and smoothing. For non-Gaussian models with linear-Gaussian dynamics, approximate filtering and smoothing estimates can be obtained by calls to kfilter and smoother; these functions first construct an approximating Gaussian model, to which the Kalman filter/smoother is then applied. For non-linear models defined by ssm_nlg, we can run approximate filtering using the extended Kalman filter with the function ekf, the unscented Kalman filter with the function ukf, or the iterated EKF by changing the argument iekf_iter of the ekf function. The function ekf_smoother can be used for smoothing based on the EKF/IEKF.
For particle filtering, the bssm package supports a general bootstrap particle filter for all model classes of bssm (function bootstrap_filter). For "ssm_nlg" models, extended Kalman particle filtering (Van Der Merwe et al. 2001) is also supported (function ekpf_filter). For particle smoothing, the function particle_smoother with smoothing based on the BSF is available for all models. In addition, the \(\psi\)-APF (using the argument method = "psi") is available for all models except the "ssm_sde" class. Currently, only the filter-smoother approach (Kitagawa 1996) is supported for particle smoothing.
Vihola et al. (2020) compared the computational efficiency of the delayed acceptance MCMC and importance sampling type MCMC approaches in various settings. Here we present a small experiment comparing generic Hamiltonian Monte Carlo, using the NUTS sampler (Hoffman and Gelman 2014) of rstan, with the IS-MCMC of bssm. Given that the bssm package is specialized for state space models, whereas Stan is a general-purpose tool suitable for a wider range of problems, it is to be expected that bssm performs better in terms of computational efficiency. The purpose of this comparison is to illustrate this fact, i.e., that there is still demand for specialized algorithms for various types of statistical models. For the complete code of the experiment, see the supplementary materials.
We consider the case of a random walk with drift model with negative binomial observations and some known covariate \(x_t\), defined as \[\begin{aligned} y_t &\sim \textrm{NB}(\exp(\beta x_t + \mu_t), \phi),\\ \mu_{t+1} &= \mu_t + \nu_t + \eta_t,\\ \nu_{t+1} &= \nu_t, \end{aligned}\] with zero-mean Gaussian noise term \(\eta_t\) with unknown standard deviation \(\sigma_\mu\). Based on this we simulate one realization of \(y\) and \(x\) with \(n=200\), \(\phi=5\), \(\beta= -0.9\), \(\nu = 0.01\), \(\sigma_\mu = 0.1\).
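For reference, the data could be generated along the following lines; the seed and the covariate distribution are our assumptions, and the article's exact simulation code is in the supplementary materials:

# Simulate one realization from the NB random walk with drift model
set.seed(1)                                      # assumed seed
n <- 200
x <- rnorm(n)                                    # known covariate (assumed N(0,1))
mu <- cumsum(rnorm(n, mean = 0.01, sd = 0.1))    # RW with drift nu = 0.01
y <- rnbinom(n, size = 5, mu = exp(-0.9 * x + mu))  # phi = 5, beta = -0.9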
For the IS approach, we used the bsm_ng function for model building, with prior variances 100 and 0.01 for the initial states \(\mu_1\) and \(\nu_1\). For the hyperparameters, we used fairly uninformative half-Normal priors with standard deviations 0.5 for \(\sigma_\mu\) and 0.1 for \(\sigma_\nu\). We then ran the IS-MCMC algorithm with run_mcmc, using a burn-in phase of length 10,000 followed by 50,000 iterations after the burn-in, with 10 particles per SMC run.
Using the same setup, we ran the MCMC with rstan using 15,000 iterations (with the first 5000 used for warm-up). Note that in order to avoid sampling problems, it was necessary to tweak the default control parameters of the sampler (see Appendix).
Table 1 shows the results. Both methods produce identical results (within the Monte Carlo error), but while rstan achieves similar Monte Carlo standard errors with a smaller number of total iterations than bssm, the total computation time of rstan is almost 80 times that of bssm (58 minutes versus 45 seconds). This suggests that for these types of problems it is highly beneficial to take advantage of the known model structure and the available approximations, compared to general-purpose Bayesian software such as Stan, which makes no distinction between the latent states \(\alpha\) and the hyperparameters \(\theta\).
Parameter | bssm Mean | bssm SD | bssm MCSE | rstan Mean | rstan SD | rstan MCSE
\(\sigma_{\mu}\) | \(0.092\) | \(0.037\) | \(9\times 10^{-4}\) | \(0.090\) | \(0.036\) | \(9\times 10^{-4}\) |
\(\sigma_{\nu}\) | \(0.003\) | \(0.003\) | \(5\times 10^{-5}\) | \(0.003\) | \(0.003\) | \(7\times 10^{-5}\) |
\(\phi\) | \(5.392\) | \(0.910\) | \(2\times 10^{-2}\) | \(5.386\) | \(0.898\) | \(1\times 10^{-2}\) |
\(\beta\) | \(-0.912\) | \(0.056\) | \(1\times 10^{-3}\) | \(-0.911\) | \(0.056\) | \(7\times 10^{-4}\) |
\(\mu_{200}\) | \(6.962\) | \(0.346\) | \(5\times 10^{-3}\) | \(6.965\) | \(0.349\) | \(4\times 10^{-3}\) |
\(\nu_{200}\) | \(0.006\) | \(0.020\) | \(3\times 10^{-4}\) | \(0.006\) | \(0.019\) | \(2\times 10^{-4}\) |
State space models are a flexible tool for analyzing a variety of time series data. Here, we introduced the R package bssm for fully Bayesian state space modeling of a large class of models, with several alternative MCMC sampling strategies. All computationally intensive parts of the package are implemented in C++, with parallel computation support for IS-MCMC, making it an attractive option for many common models where relatively accurate Gaussian approximations are available.
Compared to early versions of the bssm package, the option to define R functions for model updating and prior evaluation has lowered the bar for analyzing custom models. The package is also written in a way that makes it relatively easy to extend to new model types similar to the current bsm_lg in the future. The bssm package could be expanded to allow other proposal adaptation schemes, such as the adaptive Metropolis algorithm of Haario et al. (2001), as well as support for multivariate SDE models and automatic differentiation for EKF-type algorithms.
This work has been supported by the Academy of Finland research grants 284513, 312605, 315619, 311877, and 331817.