NGSSEML: Non-Gaussian State Space with Exact Marginal Likelihood

Thiago R. Santos; Dani Gamerman; Glaura C. Franco

doi:10.32614/RJ-2021-087

1 Introduction

Standard statistical software and packages in the State Space Model (SSM) context mainly implement Gaussian models, which are well developed in the literature. For example, the R-packages StructTS (Ripley 2002), dlm (Petris 2010), dlmodeler (Szymanski 2014), SSsimple (Zes 2019), and MARSS (Holmes et al. 2013) are a sample of them. The last introduces multivariate autoregressive state-space modeling or multivariate Gaussian state-space models. In spite of that, there is an increasing demand for methods or models to handle non-Gaussian and non-linear time series that can be applied to real data. In the SSM class, a general structure called Dynamic Generalized Linear Models (DGLM), proposed by West et al. (1985), attracted a great deal of interest due to its flexibility, allowing the observation distribution to belong to the exponential family. The book by Durbin and Koopman (2012) also analyzed non-Gaussian time series using importance sampling in SSM.

Many other researchers have devoted themselves, over the last decades, to auxiliary procedures to handle non-Gaussian State Space under the Bayesian approach (see, for example, Andrieu and Doucet (2002; Carvalho et al. 2010), among others). Some attempts to code NGSS include the sspir (Dethlefsen and Lundbye-Christensen 2006), pomp (King et al. 2016), KFAS (Helske 2016), bssm (Helske and Vihola 2021), and dynamichazard (Christoffersen et al. 2021) packages in the R environment (Team 2021) and SsfPack (Koopman et al. 1999) in the Ox software (Doornik 1997). The SsfPack is the most consolidated package with several updated versions.

Nevertheless, even for simple structures in this class of models, the analytical tractability is easily lost, and, hence, approximations are required in order to perform inferential procedures. Thus, all packages mentioned above use intensive computational methods for making inferences for the model parameters, like sequential Monte Carlo methods or particle filters, importance sampling, particle Markov chain Monte Carlo (MCMC), etc.

In spite of this increasing number of models and packages to handle non-Gaussian SSM, a common feature shared by all of them is the need to use approximations to perform inferential procedures even in very simple cases. Gamerman et al. (2013) identified a subset of NGSS that allows for exact and analytical calculation of the marginal likelihood and the predictive and filtering distributions, called NGSSEML. This family unified several different models previously existing in the literature and included more cases. These exact computations are extremely easy to code and can be performed effortlessly and in a simple way. Applications to real data are presented, and the codes are made available by the authors in Ox language. Also, NGSSEML includes the works of Aktekin et al. (2013, 2018), Pinho et al. (2016) for modeling count and volatility data, respectively, and was later extended by Santos et al. (2017) to include models commonly used in reliability and survival contexts.

Thus, in the face of the absence of consolidated software and packages to handle NGSSEML, the objective of this work is to unify the range of models encompassed and to introduce the R-package NGSSEML. The package illustrates how the NGSS models can be easily employed in non-Gaussian time series and reliability analysis using R under the Bayesian and classical perspectives. It also provides codes in the R environment for mainly formulating and specifying the NGSS models considered in Gamerman et al. (2013) and Santos et al. (2017). The main advantage of this package, which is designed for the specific class of NGSSEML, is that the models are implemented in a simple and flexible way. All other softwares listed above deal with state space models where the marginal likelihood can not be exactly obtained. The NGSSEML approach has the advantage of not needing to approximate the likelihood as others do, achieving that via an alternative evolution. This exact specification enables a computational gain, making the NGSSEML package an attractive option to handle non-Gaussian state space. Our approach to inference is to use (virtually) exact methods. Intensive computational methods are only required when the dimension of the unknown static parameter vector is high. In this case, the Adaptive Rejection Metropolis Sampling (ARMS) algorithm is used to approximate the marginal posterior of the static parameters (Petris 2010). Applications to count data, volatility time series, and reliability models, such as the piecewise exponential model (PEM) and Weibull and Gamma software reliability models are presented.

The paper is organized as follows. The NGSSEML family is presented in Section 2, as well as the inferential and smoothing procedures for the latent states. Section 3 introduces the package functions. Section 4 shows illustrative examples of time series and reliability data. Finally, Section 5 presents the main conclusions and final remarks.

2 The model

Gamerman et al. (2013) introduced a rich class of non-Gaussian state space models with exact marginal likelihood, called here NGSSEML, based on the seminal work of Smith and Miller (1986). Inference free from approximations can be performed in the NGSSEML, and this is its main advantage compared with other NGSS procedures, such as the DGLM. Due to the model formulation, some components, such as seasonality and the effect of external covariates, are introduced as fixed effects.

A time series $\{y_ {t}\}_{t=1}^n$ belongs to this class of models if its probability (density) function can be written as: \[\begin{aligned} \label{eq31fgmd} p(y_{t}|\mu_{t},\mu_{t-1},\ldots,\mu_{1},\mathbf Y_{t-1}, \mathbf \theta ) = & p(y_{t}|\mu_{t},\mathbf Y_{t-1},\mathbf \theta)\nonumber\\ = & a(y_{t},\mathbf \theta)\mu_{t}^{b(y_{t},\mathbf \theta)} \exp\left(-\mu_{t}c(y_{t},\mathbf \theta)\right), \end{aligned} \tag{1}\]

for $y_{t}\in S(\mathbf \theta)\subset\Re$ and $p(y_{t}|\mu_{t},\mathbf \theta)=0$, otherwise. Functions $a(\cdot)$, $b(\cdot)$, $c(\cdot)$, and $S(\cdot)$ are such that $p(y_{t}|\mu_{t},\mathbf \theta)\geq0$ and, therefore, $\mu_t > 0$, for all $t>0$. $\mu_t$ is the state at time $t$ and $\mathbf Y_{t}= \{ Y_0 , y_{1},\ldots,y_{t} \}$ represents previously available information.

It is also assumed that $\mathbf \theta$ varies in the $q$-dimensional parameter space .

The state $\mu_{t}$ in Equation ((1)) is defined as $\mu_{t}=\lambda_{t}g({\mathbf x}_{t}, {\mathbf \beta})$, where ${\mathbf \beta}$ are the regression coefficients (one of the components of $\mathbf{\theta}$), ${\mathbf x}_{t}$ is a covariate vector, and $\lambda_{t}$ is the latent variable related to the dynamic level. The usual specification for the link function $g$ is the logarithmic function given the positive nature of $\mu_t$, but other link functions dictated by the application may also be used. Given $\lambda_{t}$, the dynamic level $\lambda_{t+1}$ evolves according to the system equation

\[\begin{aligned} \label{eq4evol} w \frac{\lambda_{t+1}}{\lambda_{t}} \mid \lambda_{t},\mathbf Y_{t} , \mathbf \theta \sim \text{Beta}\left(w a_{t},(1-w) a_{t}\right), \end{aligned} \tag{2}\] where $a_{t}$ are values obtained in a sequential procedure and $0 < w < 1$. The parameter $w$ is responsible for increasing the variance over time and plays a similar role to that of discount factors in the DGLM (West and Harrison 1997). Finally, the initial prior distribution is given by $\lambda_{0}|Y_{0}\sim \text{Gamma}(a_{0},b_{0})$. Thus, the full model specification is completed.

There is a wide range of distributions that belong to this class of models. It includes many commonly known discrete and continuous distributions such as Poisson, Gamma, and Normal (with static mean) but also includes many other distributions that are not so common and some reliability models. Table 1 provides the form of functions $a$, $b$, $c$, and $S$ for some usual distributions in this family. .

This family provides a collection of distributions for modeling a variety of real-time series and reliability data that are of practical importance, including software reliability. A detailed explanation and some illustrations can be found in Pinho et al. (2016) and Santos et al. (2017).

Table 1: Special cases of the NGSSEML.
Models		$\mathbf \theta$	$a(y_{t},\mathbf \theta)$	$b(y_{t},\mathbf \theta)$	$c(y_{t},{\mathbf{\theta}})$
	Poisson	$w,\beta$	$(y_{t}!)^{-1}$	$y_{t}$	1
	Gamma	$w,\chi,\beta$	$y_{t}^{\chi-1}/ \Gamma(\chi)$	$\chi$	$y_{t}$
	Weibull	$w,\nu,\beta$	$\nu(y_{t})^{\nu-1}$	1	$(y_{t})^{\nu}$
Time	Normal	$w,\mu,\beta$	$(2\pi)^{-1/2}$	$1/2$	$(y_{t}- \mu )^{2}/2$
Series	Laplace	$w,\mu,\beta$	$\frac{1}{\sqrt{2}}$	$1$	$\sqrt{2}\left\|y_t- \mu \right\|$
	Power Exponential (GED*)	$w,\nu,\mu,\beta$	$\frac{\nu}{2^{\frac{\nu+1}{\nu}}\Gamma(1/\nu)}$	$1/\nu$	$\frac{(y_t- \mu)^\nu}{2}$
	Generalized Gamma	$w,\nu,\chi,\beta$	$\nu y_{t}^{\nu\chi-1} / \Gamma(\chi)$	$\chi$	$y_t^\nu$
	$\text{PEM}^{\star}$	$w$	$1$	$\sum\limits_{j=1}^{N} \chi_{ij}$	$\sum\limits_{j=1}^{N}(t_{ij}-t_{i-1})$
Relia-	$\text{PH}^{\star}$	$w,\beta$	$\exp \left( \sum\limits_{j\in R_t}^{} \chi_{ij} \mathbf x_{j}'\mathbf \beta \right)$	$\sum\limits_{j=1}^{N}\chi_{ij}$	$\sum\limits_{j=1}^{N}(t_{ij}-t_{i-1})\exp(\mathbf x_{ij}'\mathbf \beta)$
bility	Weibull $\text{SR*}$	$w, \nu, \beta$	$\nu y_{t}^{\nu-1} \exp(-\nu x_t \beta)$	$1$	$y_{t}^\nu \exp(- \nu x_t \beta)$
	Gamma $\text{SR*}$	$w,\alpha,\beta$	$\frac{\left(y_{t}\right)^{\alpha-1}}{\Gamma(\alpha)\exp\left(\alpha x_{t}\beta\right)}$	$\alpha$	$y_{t}\exp(- x_{t}\beta)$

Note: $^{\star}$PEM: Piecewise Exponential Model; PH: Proportional Hazards model; GED: Generalized Error Distribution; SR: Software Reliability.

Hereafter, following Theorem 1 in Gamerman et al. (2013), sequential inference for the NGSSEML will be presented as a basic result that includes the one-step-ahead predicted distributions and the filtering distribution of the latent states $\{\lambda_t\}_{t=1:n}$. If the model is defined as in ((1)) and ((2)), the following results can be obtained:

One-step-ahead predicted (forecast) distribution of the states \[\begin{aligned} \lambda_{t} |\mathbf Y_{t-1},\mathbf{\theta}\sim \text{Gamma}(wa_{t-1}, wb_{t-1}), \label{eq33fg} \end{aligned} \tag{3}\] where $0<w< 1$.
Filtering distribution of the states \[\begin{aligned} % \lambda_{t}=\left(\mu_{t}[g(x_{t}^{'}\mathbf{\beta})]^{-1}\right)|\mathbf Y_{t},\mathbf{\theta}\sim \text{Gamma}(a_{t},b_{t}), \label{eq35fg1} \end{aligned} \tag{4}\] where $a_{t} = wa_{t-1}+b(y_{t},\mathbf{\theta})$ and $b_{t} = wb_{t-1} +c(y_{t},\mathbf{\theta})g(x_{t}, \mathbf{\beta}).$
One-step-ahead predictive density function of the observations \[\label{eq310fg} p(y_t \vert \mathbf Y_{t-1}, \mathbf \theta )=\frac{\Gamma\left(b(y_{t},\mathbf \theta)+wa_{t-1}\right)a(y_{t},\mathbf \theta)[wb_{t-1}\left(g(x_{t}, \mathbf \beta)\right)^{-1}]^{wa_{t-1}}}{\Gamma(a_{t-1})[c(y_{t},\mathbf \theta) +wb_{t-1}(g(x_{t}, \mathbf \beta))^{-1}]^{b(y_{t},\mathbf \theta)+wa_{t-1}}} , \quad y_{t}\in S(\mathbf \theta), \tag{5}\]

$\forall t \leq n$ where $\Gamma(\cdot)$ is the gamma function.

The distribution of $\mu_{t}=\lambda_{t}g(x_{t},\mathbf \beta)$ can be easily obtained from the predictive and filtering distributions.

The analytical form of the predictive density function in ((5)) allows the exact computation of the marginal likelihood function \[\label{eqmlik} L(\mathbf \theta)=L(\mathbf \theta;\mathbf Y_{n})=\displaystyle\prod_{t=1}^{n}\left(p(y_t \vert \mathbf Y_{t-1}, \mathbf \theta)\right). \tag{6}\] Thus, classical and Bayesian inference for the model parameters are easily performed.

Classical inference for the static parameter vector $\mathbf \theta$ can be performed through the maximum likelihood estimator (MLE) obtained by maximizing the marginal likelihood function with respect to $\mathbf\theta$. In general, since the MLE does not possess a closed-form expression, numerical maximization methods (such as the BFGS (Shanno 1970)) are required to maximize the likelihood function.

Bayesian inference for the static parameters can be performed combining the marginal likelihood function and a prior of the static parameters, $p(\mathbf \theta)$, thus obtaining the marginal posterior \[\label{margpost} p(\mathbf \theta\vert \mathbf Y_{n})\propto L(\mathbf \theta)p(\mathbf \theta). \tag{7}\] When the posterior distribution does not have a closed-form, computational methods can be used, such as the Markov chain Monte Carlo (MCMC), Adaptive Rejection Metropolis Sampling (ARMS) (Petris 2010), or numerical integration/quadrature (Santos et al. 2017). In the latter, an inexpensive grid of points can be utilized due to the low dimensionality of $\mathbf \theta$ in many special models and examples presented in this work. Table 2 shows the prior distributions for the static parameters used in each model of Table 1.

Table 2: Prior for the static parameters of the NGSSEML.
Models		$\mathbf \theta$	Prior ($p(\mathbf \theta$))
Time	Poisson	$w,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
Series			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Gamma	$w,\chi,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\chi\sim \text{Gamma}(a_\chi, b_\chi)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Weibull	$w,\nu,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\nu\sim \text{Gamma}(a_\nu, b_\nu)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Normal	$w,\mu,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\mu\sim \text{Normal}(m, v)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Laplace	$w,\mu,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\mu\sim \text{Normal}(m, v)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Power Exponential (GED)	$w,\nu,\mu,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\nu\sim \text{Gamma}(a_\nu, b_\nu)$,
			$\mu\sim \text{Normal}(m, v)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Generalized Gamma	$w,\nu,\chi,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\nu\sim \text{Gamma}(a_\nu, b_\nu)$,
			$\chi\sim \text{Gamma}(a_\chi, b_\chi)$,
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
Relia-	$\text{PEM}^{\star}$	$w$	$w\sim \text{Beta}(a_w, b_w)$
bility
	$\text{PH}^{\star}$	$w,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
		$w,\beta$	$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Weibull $\text{SR}$	$w, \nu, \beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\nu\sim \text{Gamma}(a_\nu, b_\nu)$
			$\beta\sim \text{Normal}_q(m_{beta}, V_{beta})$
	Gamma $\text{SR}$	$w,\alpha,\beta$	$w\sim \text{Beta}(a_w, b_w)$,
			$\alpha\sim \text{Gamma}(a_\alpha, b_\alpha)$
			$\beta\sim \text{Normal}_q((m_{beta}, V_{beta})$

Note: $^{\star}$PEM: Piecewise Exponential Model; PH: Proportional Hazards model; GED: Generalized Error Distribution; SR: Software Reliability.

Regarding the latent state, $\mathbf{\lambda}=\{\lambda_t\}_{t=1:n}$, inference can be made using its predictive and filtering distributions as well as smoothing procedures with $\mathbf{\theta}$ set as a fixed value, e.g., its respective MLE (Brockwell and Davis 1996), sampled from its posterior distribution via MCMC output or directly. The latter approach is explained in the sequel.

In order to obtain a sample of the latent state, the smoothing procedure, described in Theorem 2 of Gamerman et al. (2013), is required. It states that the distribution of $(\mathbf{\lambda} |\mathbf{\theta},\mathbf Y_{n})$ is given by \[\begin{aligned} p(\mathbf{\lambda} | \mathbf{\theta}, \mathbf Y_{n}) =p(\lambda_{n}|\mathbf{\theta},\mathbf Y_{n})\prod_{t=1}^{n-1}p\left(\lambda_{t}|\lambda_{t+1},\mathbf{\theta},\mathbf Y_{t}\right), \end{aligned}\] where \[\begin{aligned} \label{resimsuav} \displaystyle\lambda_{t}-w\lambda_{t+1}|\lambda_{t+1},\mathbf{\theta},\mathbf Y_{t}\sim\text{Gamma}\left((1-w)a_{t},b_{t}\right), \forall t \ge 0. \end{aligned} \tag{8}\]

In an analogous way, the marginal distribution of $(\mathbf{\lambda} |\mathbf Y_{n})$ is expressed as \[\begin{aligned} \label{lambdamarg} p(\mathbf{\lambda}|\mathbf Y_{n}) =\int p(\mathbf{\lambda} | \mathbf{\theta}, \mathbf Y_{n})p(\mathbf{\theta}|\mathbf Y_{n}) d\mathbf{\theta}. \end{aligned} \tag{9}\] Once a sample $\mathbf{\theta}^{(1)},\ldots, \mathbf{\theta}^{(M)}$ is available, samples from the states are obtained retrospectively from the distribution of the states using the result in ((8)). More details can be found in Gamerman et al. (2013). A sample $\mathbf{\lambda}^{(1)},\ldots,\mathbf{\lambda}^{(M)}$ of the marginal distribution in ((9)) is obtained in the following manner: i) sample $\mathbf{\theta}^{(1)},\ldots, \mathbf{\theta}^{(M)}$ from its marginal posterior $p(\mathbf{\theta} |\mathbf Y_{n})$. ii) sample $\mathbf{\lambda}^{(j)}$ from $p(\mathbf{\lambda} | \mathbf{\theta}^{(j)}, \mathbf Y_{n}),$ $j=1,\ldots,M$. If the dimensionality of the static parameters $\mathbf \theta$ is low, the numerical calculation of $p(\mathbf \theta\vert \mathbf Y_{n})$ using quadrature is indistinguishable from the exact calculation. If $\mathbf \theta$ is discrete, it is an exact result. Thus, the numerical calculation of the marginal posterior distribution $p(\mathbf{\lambda} |\mathbf Y_{n})$ is a suitable approximation, whose quality depends on the approximation of $p(\mathbf \theta\vert \mathbf Y_{n})$ and, hence, the dimensionality of $\mathbf \theta$ and its nature.

Gamerman et al. (2013) used the Pearson and deviance residuals for model diagnostics and AIC, BIC, and DIC for model comparison in this family. It should be noted that, if MCMC methods are used, convergence checks using graphs like trace plot, autocorrelogram, etc., and tests like the Geweke and Gelman-Rubin tests are necessary.

3 Specification of the state space objects

The package NGSSEML can be downloaded and installed from GitHub or CRAN website and is then activated in R by library("NGSSEML"). Assuming that the data are available in the current environment, the NGSSEML can be set up using the formulas and family arguments presented in the next subsections.

3.1 Estimation of the static parameters

This subsection presents the estimation of the static parameters $\mathbf{\theta}$ under classical and Bayesian approaches. The classical procedure uses the BFGS algorithm, implemented in the “optim” function of the R package, to obtain the MLE. The Bayesian procedure uses numerical integration/quadrature with an inexpensive grid of points. The functions ‘ngssm.mle’ and ‘ngssm.bayes’ perform the classical and Bayesian inferences, respectively, shown in the previous section and utilize the function ‘LikeF’ (the marginal likelihood function) in Eq. ((6)), internally.

Basically, the functions of NGSSEML work with a very simple specification of a few arguments. We need to specify the formula, data (data frame) arguments, and model, as well as the function ‘lm’ for linear regression. For piecewise exponential models in reliability analysis, we cannot forget to include the breaks and events arguments.

The function ‘ngssm.mle’ performs the marginal likelihood estimation of the static parameters of the model using Eq. ((6)), and can be executed by the following command:

  > ngssm.mle(formula, data, na.action="na.omit", pz = NULL, nBreaks = NULL,
  + model = "Poisson", StaPar = NULL, amp = FALSE, a0 = 0.01, b0 = 0.01,
  + ci = 0.95, LabelParTheta = NULL, verbose = TRUE, method = "BFGS",
  + hessian = TRUE, control = list(maxit = 30000, temp = 2000,
  + trace = FALSE, REPORT = 500))

The output of the model fit presents the maximum likelihood estimators, standard errors, Z statistics, and asymptotic confidence intervals of the model parameters. Furthermore, it allows the user to change the control settings, choose the optimizing algorithm, and set arguments as the hyperparameters $a_0$ and $b_0$, the confidence level, etc.

The function ‘ngssm.bayes’ performs the Bayesian estimation of the static parameters of the model using the marginal posterior in Eq. ((7)), and can be run by the following command:

  > ngssm.bayes(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
  + model = "Poisson", StaPar = NULL, amp = FALSE, a0 = 0.01, b0 = 0.01, prw = c(1, 1),
  + prnu = NULL, prchi = NULL, prmu = NULL, prbetamu = NULL, prbetasigma = NULL,
  + lower = NULL, upper = NULL, ci = 0.95, pointss = 10, nsamplex = 1000, mcmc = NULL,
  + postplot = FALSE, contourplot = FALSE, LabelParTheta = NULL, verbose = TRUE)

The model fit’s output provides the Bayesian estimators, posterior mean and median, standard error, and percentile credibility intervals for NGSSEML parameters, and the posterior samples of the static parameters of the NGSSEML using multinomial sampling. The main functions of the model fit provide an object class “ngssm”, associated with print, summary, and extract further information. Besides, the user can choose the number of grid points (the argument pointss) of the quadrature and samples (the argument nsamplex) and set the hyperparameters of prior distributions (the arguments a0,b0,prw,prnu,prchi,prmu,prbetamu,prbetasigma), the credibility level, etc. If mcmc=TRUE or the number of points in the grid is very high, the ARMS algorithm is executed instead of the quadrature method. Samples of the posterior distribution of the latent states using the smoothing procedure are calculated as described in Eq. ((9)).

3.2 Estimation of latent states

The filtering, smoothing, and plot functions for the NGSSEML are shown in this subsection. For these functions, the static parameters are estimated with the whole data.

The function ‘FilteringF’ computes the shape and scale parameters of the one-step-ahead forecast and filtering distributions of the latent states using the filters in Eq. 3 and 4, respectively, for several models. It can be called by the command below.

  > FilteringF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
  + model = "Poisson", StaPar = NULL, a0 = 0.01, b0 = 0.01, amp = FALSE,
  + distl = "PRED", splot = FALSE)

The output presents the shape and scale parameters of the one-step-ahead forecast and filtering distributions of the latent states. Furthermore, it allows the user to modify the arguments of the function as the hyperparameters $a_0$ and $b_0$, etc.

The function ‘SmoothingF’ provides an exact sample of the smoothing distribution of the latent states in Eq. ((9)), and its command is

  > SmoothingF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
  + model = "Poisson", StaPar = NULL, Type = "Cond", a0 = 0.01, b0 = 0.01,
  + amp = FALSE, samples = 1, ci = 0.95, splot = FALSE)

The function provides an object with an exact sample of the joint distribution of the states. The user can choose the arguments as the hyperparameters $a_0$ and $b_0$, type of the distribution of the latent states - margin/conditional on the static parameters, the confidence/credibility level, etc. If the number of samples is greater than 1, some summaries of the state samples are returned. If the argument splot=TRUE, a graph with the states’ smoothed estimates is shown.

The function ‘PlotF’ builds graphs with smoothed/filtered estimates of the latent states and can be run by the following command:

  > PlotF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
  + plotYt = TRUE, axisxdate = NULL, transf = 1, model = "Poisson", posts,
  + Proc = "Smooth", Type = "Marg", distl = "PRED", a0 = 0.01, b0 = 0.01,
  + ci = 0.95, startdate = NULL, enddate = NULL, Freq = NULL, ...)

The function returns a graph with smoothed or filtered estimates of the latent states. The user can change the arguments as the hyperparameters $a_0$ and $b_0$, the type of the distribution of the latent states - margin/conditional on the static parameters, the procedure between smoothing and filtering, the confidence/credibility level, etc. Other options related to graph edition, such as color, type line, labels, can be inserted into the function ‘PlotF’ using the argument ellipsis (...). To save space in this manuscript, they are not shown. We will try to provide more details in the examples.

4 Examples

In this section, we present four examples to illustrate the use of the proposed package for count, volatility, lifetime, and software reliability data. The count time series is the poliomyelitis data in the USA. The volatility series refers to the return data set from the Petrobras stock market. The lifetime data set (GTE) are the daily failure times of 125 telecommunication systems. The “SYS1” data set are times between 136 successive computer software failures. All examples are widely used in the literature of their respective areas. We present parameter estimation, the goodness of fit for the adjusted models and predictions. For the poliomyelitis data, we also provide comparisons with other softwares available in the literature.

As previously explained, this software is designed for the specific class of non-Gaussian state space models with exact marginal likelihood. The NGSSEML may seem analytically more complex in comparison with other procedures, which present in general simpler equations. Nevertheless, although simplest in form, these procedures frequently need more calculations and heavier computational effort to produce similar results to our method. Based on an alternative evolution, our approach requires less approximation, and thus, inference for static parameters is almost indistinguishable from the (unobtainable) exact form. Readers interested in the advantages and comparisons of the NGSSEML against alternatives should refer to Gamerman et al. (2013) and Santos et al. (2017).

4.1 Count data

A Poisson NGSSEML model is used to fit the monthly data of the number of poliomyelitis cases in the USA, shown in Figure 1, from 1970/01 to 1983/12 (168 observations). The polio data is a well-known example in the count time series literature and was first analyzed by Zeger (1988). The objective is to explain the number of poliomyelitis cases through covariates and make predictions. The covariates are the deterministic trend centered at 73, annual cosine and sine, and semiannual cosine and sine. The intercept is inserted in the model as a dynamic level. The confidence and credibility levels for the intervals are fixed at 0.95. The state initial condition is $\lambda_{0}|Y_{0}\sim \text{Gamma}(a_0, b_0)$, where $a_0=0.3$ and $b_0=0.1$ that can be specified in the functions ‘ngssm.mle’ and ‘ngssm.bayes’ by the arguments $a0$ and $b0$. The static parameter vector in this example is $\mathbf {\theta}=(w,\beta_1,\beta_2,\beta_2,\beta_3,\beta_4,\beta_5)$, so it is necessary to specify a joint probability density for it, that is, a joint prior $p(\mathbf {\theta})$. Assuming that the static parameters are independent and based on the priors available for the package in Table 2, the marginal priors for its components are $w\sim \text{Beta}(1,1)$ and $\beta_j\sim N(0,10),$ for $j=1,\ldots,4$. Parameter $w$ controls the information loss over time, while parameters $\mathbf \beta$ are regression coefficients associated with the covariates for capturing trend and seasonal patterns of the count time series. The prior hyperparameters are chosen to set vague priors for the model parameters.

The MLE for the static parameters of the NGSSM in the poliomyelitis data can be obtained using the BFGS algorithm (default: method = "BFGS") implemented in the optim function, which is the default of ‘ngssm.mle’ function. It is necessary to specify data1 = data.frame(Ytm, Xtm) in the argument data, the formula Ytm$\sim$CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual, and the model model = "Poisson". The arguments StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38), a0 = 0.01, b0 = 0.01, and ci = 0.95 are optional. The name of the variables in the argument must coincide with the name in the data. The model fit is stored in the fit object. The code for classical inference is given below. We should note that a graph with the filtered estimates of the latent states is returned when the ‘FilteringF’ function is called.

  > library(NGSSEML)
  > data(Polio_data)
  > Xtm = Polio_data[, 3:7]
  > Xtm[,1] = (1:168-73)/168
  > Ytm = Polio_data$y
  > Ztm=NULL
  ## CosAnnual, SinAnnual, CosSemiAnnual, SinSemiAnnual
  ## LabelParTheta = c("w", "Beta1", "Beta2", "Beta3", "Beta4")
  StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38)
  > a0 = 0.2
  > b0 = 0.1
  > ci = 0.95
  > data1=data.frame(Ytm,Xtm)
  ## Fit
  > fit = ngssm.mle(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
  + data = data1, model = model, pz = NULL, StaPar = StaPar, 
  + a0 = a0, b0 = b0, ci = ci)
  > esttheta = as.numeric(fit[,1])
  > filt = FilteringF(Ytm ~ Trend + CosAnnual + SinAnnual + CosSemiAnnual
  + SinSemiAnnual, data = data1, StaPar = esttheta, 
  + model = "Poisson", a0 = a0, b0 = b0,
  + distl = "FILTER", splot = TRUE)
  > filtest = (filt[1,]/filt[2,])

The results are presented in Table 3, showing the MLE’s and their respective 95% confidence intervals for $w$ and $\beta's$.

Bayesian estimation can be performed using quadrature rules with a grid of 6 points (pointss = 6), providing a sample of 2,000 draws (nsamplex = 2000) of the marginal posterior distribution of the static parameters. This can be performed without the use of intensive computational methods to approximate the posterior distribution as, is usually the case in other non-Gaussian state-space models. We should set the arguments in the function as the data frame data1, the formula Ytm$\sim$CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual, and the model model = "Poisson". The marginal priors are: $w\sim U(0,1) \equiv \text{Beta}(1,1)$, and $\beta_j\sim N(0,10),$ for $j=1,\ldots,4$, specified in the function ‘ngssm.bayes’ by the following arguments prw = c(1,1) (two shape parameters of a Beta distribution), prbetamu = rep(0,4), and prbetasigma = diag(10, 4, 4) (two parameters of a multivariate Normal distribution). If postplot = TRUE and contourplot = TRUE, the graph of the marginal posterior and the contour plot are provided. If verbose = TRUE, the estimation results are shown; otherwise, they are omitted. The code is presented below.

  > library(NGSSEML)
  > data(Polio_data)
  > Xtm=Polio_data[,3:7]
  > Xtm[,1] = (1:168-73)/168
  > Ytm = Polio_data$y
  > Ztm = NULL
  ## CosAnnual, SinAnnual, CosSemiAnnual, SinSemiAnnual
  ## LabelParTheta=c("w", "Beta1", "Beta2", "Beta3", "Beta4")
  > StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38)
  > a0 = 0.2
  > b0 = 0.1
  ## points
  > pointss = 6
  ## posterior sample
  > nsamplex = 2000
  ## Cred. level
  > ci = 0.95
  > data1 = data.frame(Ytm, Xtm)
  ## Bayesian fit
  > fitbayes = ngssm.bayes(Ytm ~ CosAnnual + SinAnnual + 
   CosSemiAnnual + SinSemiAnnual,
  + data = data1, model = "Poisson", pz = NULL, StaPar = StaPar, a0 = a0, b0 = b0,
  + prw = c(1, 1), prbetamu = rep(0, 4), prbetasigma = diag(10, 4, 4), ci = ci,
  + pointss = pointss, nsamplex = nsamplex, postplot = TRUE, contourplot = TRUE,
  + verbose = TRUE)

The results of the Bayesian inference are also presented in Table 3, showing the posterior mean and the credibility intervals. The coefficient regression related to the determinist trend is not significant at the 5% significance level, so we decide to remove it from the model. Analyzing Table 3, we can see that both for $\beta_2$ and $\beta_4$ associated with the trigonometric annual and semiannual sine covariates, respectively, are negative and seem to be significant under both approaches at the same significance level. We keep the cosine covariates to preserve the trigonometric representation of seasonality. The magnitude and significance of the regression coefficients are similar to those obtained by other works in the count time series literature (Davis and Wu 2009). Furthermore, the classical estimates are close to the Bayesian estimates. If the ARMS algorithm is used, we need to make a convergence check using graphs like trace plot, autocorrelogram, etc., and tests like the Geweke and Gelman-Rubin tests in the R-package coda.

The R codes for the smoothing function below are used to build the graphs presented in Figure 1, for the Polio data with the smoothed estimates for the observations and mean under the Bayesian approach. The first three arguments of the function ‘PlotF’ refer to the Bayesian model obtained with ‘ngssm.bayes’. The commands posts = posts, axisxdate = x, Proc = "Smooth", Type = "Marg", startdate = "1970/01/01", enddate = "1983/12/31", and Freq = "months" indicate, respectively, the sample of the marginal posterior of the static parameters, the vector of monthly dates, the smoothed distribution for the states, the chosen distribution for the latent states, the start and end dates, and the frequency of the observations. The last arguments refer to the type of lines, colors, and legends in the graph. We can see that the smoothed mean trajectory in Figure 1 follows well the behavior of the time series.

  > posts = fitbayes$samplepost
  > x = seq(as.Date("1970/01/01"), as.Date("1983/12/31"), "months")
  > PlotF(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
  + data = data1, model = "Poisson", axisxdate = x, Proc = "Smooth", Type = "Marg",
  + posts = posts, startdate = "1970/01/01", enddate = "1983/12/31",
  + Freq = "months", type = 'l', col = c("black", "blue","lightgrey"), xlab = "Months",
  + ylab = "The number of poliomyelitis cases", xlim = NULL, ylim = c(0, 15),
  + Lty = c(1, 2, 1), lwd = c(2, 2, 2), cex = 0.68)

graphic without alt text — Figure 1: The polio data: The full line represents the polio monthly time series in the USA from 1970/01 to 1983/12 (168 observations). The dashed line and the grey area indicate the smoothed means and their respective 95% percentile credibility intervals using the smoothing procedure and the quadrature method.

Table 3: Point and interval (level 95%) estimation for the Poisson model fitted to the polio count data.
$\mathbf{\theta}$	MLE	95% Conf. Int.	Posterior Mean	95% Cred. Int.
$w$	0.793	[0.711; 0.874]	0.783	[0.713; 0.843]
$\beta_1$	-0.116	[-0.324; 0.092]	-0.117	[-0.314; 0.015]
$\beta_2$	-0.488	[-0.717; -0.259]	-0.491	[-0.707; -0.344]
$\beta_3$	0.175	[-0.024; 0.374]	0.176	[-0.014; 0.301]
$\beta_4$	-0.385	[-0.577; -0.193]	-0.387	[-0.568; -0.265]

Figure 2 shows the standardized Pearson residuals and its autocorrelogram for model diagnostic. The residuals are not correlated and distributed between -2 to 2, except the points indicated in the graph (a) that coincide with the largest values of the series in Figure 1. Further investigation of these points could be performed, and the inclusion of interventions may improve the model fit. The R code for the diagnostic analysis in this example is

  > esttheta = as.numeric(fit[,1]); n = length(Ytm)
  > filt = FilteringF(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
  + data = data1, StaPar = esttheta, model = "Poisson", distl = "FILTER",
  + splot = TRUE)
  > filtest = (filt[1,]/filt[2,])
  > pred = FilteringF(Ytm ~CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
  + data = data1, StaPar = esttheta, model = "Poisson", a0 = a0, b0 = b0,
  + distl = "PRED")
  > predonestepahead = (pred[1,]/pred[2,])
  > residuals = (Ytm-predonestepahead)/sqrt(predonestepahead)
  > summary(residuals)
  > residuals = scale(residuals[1:n])
  > par(mfrow = c(1, 2))
  > x = seq(as.Date("1970/01/01"), as.Date("1983/12/31"), "months")
  > plot(x[1:n], residuals, xlab = "Months", ylab = "Standardized Pearson residuals")
  > points(x[7], residuals[7], col = "red", lwd = c(2))
  > points(x[35], residuals[35], col = "red", lwd = c(2))
  > points(x[74], residuals[74] ,col = "red", lwd = c(2))
  > points(x[113], residuals[113], col = "red", lwd = c(2))
  > title("(a)")
  > acf(residuals, ci = 0.99, main = "(b)")

For comparison purposes, Table 4 displays some statistics for evaluating the in-sample performance of the fitted model using the KFAS, sspir and our software. The statistics are the square root of the mean square error $\text{SMSE}=\frac{1}{n}\sum_{i=1}^n (y_i-\hat{y}_i)^2$, the mean absolute error $\text{MAE}=\frac{1}{n}\sum_{i=1}^n |y_i-\hat{y}_i|$, and the mean absolute percentage error, $\text{MAPE}=\frac{1}{n}\sum_{i=1}^n |(y_i-\hat{y}_i)/y_i|$, where $\hat{y}_i$, $i=1,\ldots,n,$ are the fitted values. The Poisson model with the logarithmic link function, level component, and covariates is specified for the three packages, as described in this subsection. The static parameters were fixed at the MLE and posterior mean for our package and the MLE for the other packages to get the smoothed/filtered mean. The results are similar for the KFAS and sspir packages, which use the maximum likelihood estimation via importance sampling methods. Our package provided the predicted values (the filtered estimates) using the classical and Bayesian approaches described in this paper and presented the best results for the SMSE, MAE, and MAPE. Convergence of the maximization algorithm is fast and stable because the likelihood is a well-behaved and exact function. The joint likelihood function ((6)) of our Poisson model is a product of the negative binomial distributions given by Eq. ((5)).

Table 4: In sample SMSE, MAE, and MAPE for the Poisson model using the *KFAS*, *NGSSEML*, and *sspir* packages.
	SMSE	MAE	MAPE
KFAS*	1.776	1.179	0.638
NGSSEML*	1.280	0.890	0.468
NGSSEML**	1.264	0.880	0.462
sspir*	1.788	1.202	0.648

Note: *MLE; **The posterior mean.

4.2 Volatility data

The second example refers to the daily return data from Petrobras stock market from 2000/01/06 to 2008/29/01 (1999 observations), which can be obtained on the website finance.yahoo.com/. The return at time $t$ is defined as $y_t=\ln\left(\frac{P_{t}}{P_{t-1}}\right)$, where $P_{t}$ is the daily closing spot price. Data collection irregularity due to holidays and weekends will be ignored. Figure 3 presents the time series plot of $y_t$. A distinctive feature of financial series is that they usually present non-constant variance or volatility. Thus, this series can be modeled using distributions with a heavy tail, such as GED, which does not belong to the exponential family. The aim is to estimate the daily volatility of Petrobras return time series.

The maximum likelihood estimation for the Petrobras return data is performed using the R code below with the following arguments in the function ‘ngssm.mle’: Ytm$\sim$1, data = data.frame(Ytm), and model = "GED". The state initial condition is $\lambda_{0}|Y_{0}\sim \text{Gamma}(a_0, b_0)$, where $a_0=b_0=0.01$ (the default values) that can be specified in the functions ‘ngssm.mle’ and ‘ngssm.bayes’ by the arguments $a0$ and $b0$.

  > library(NGSSEML)
  > data(Return_data)
  > plot(Return_data[,2], Return_data[,1], type = 'l', xlab = "Days",
  + ylab = "Returns")
  > Ytm = Return_data$Rt
  > Xt = NULL
  > Zt = NULL
  > model = "GED"
  ## LabelParTheta = c("w", "nu")
  > StaPar = c(0.9, 1)
  > a0 = 0.01
  > b0 = 0.01
  > ci = 0.95
  > fit = ngssm.mle(Ytm ~ 1, data=data.frame(Ytm), model = model, a0 = a0, b0 = b0,
  + ci = ci, verbose = FALSE)

Bayesian estimation for the Petrobras return data is obtained via quadrature with 15 points (argument pointss=15) and a sample of 1000 draws (nsamplex=1000) of the marginal posterior distribution. The state initial condition is $\lambda_{0}|Y_{0}\sim \text{Gamma}(0.01, 0.01)$. Analogously to the previous example, the first arguments, which refer to the marginal prior distribution (see Table 2) for the static parameter vector $\mathbf{\theta}=(w,\nu)$ are $w\sim U(0,1)\equiv \text{Beta}(1,1)$ and $\nu\sim \text{Gamma}(0.01,0.01)$, which can be specified by the arguments prw = c(1, 1) (two shape parameters of a Beta distribution) and prmu=rep(0.01,0.01) (two parameters of a Gamma distribution with the mean equals to one). The difference here is that model = "GED". Again, the prior hyperparameters are chosen to set vague priors for the model parameters. A summary of the estimates was contained in the objects fitbayes (Bayesian fit). The code is presented below.

  > library(NGSSEML)
  > data(Return_data)
  > Ytm = Return_data$Rt
  > Date = Return_data$Date
  > Xtm = NULL
  > Ztm = NULL
  ## LabelParTheta = c("W", "nu")
  > StaPar =c (0.9,1)
  > a0 = 0.01
  > b0 = 0.01
  ## points
  > pointss = 15
  ## posterior sample size
  > nsamplex = 1000
  ## Cred. level
  > ci = 0.95
  ## Bayesian fit
  > fitbayes = ngssm.bayes(Ytm ~ 1, data = data.frame(Ytm), model = "GED", pz = NULL,
  + StaPar = StaPar, a0 = a0, b0 = b0, prw = c(1, 1), prnu = c(0.01, 0.01), ci = ci,
  + pointss = pointss, nsamplex = nsamplex, postplot = TRUE, contourplot = TRUE,
  + verbose  =TRUE)

Table 5 presents the point and the interval estimates for the GED model fitted to the Petrobras return data under the classical and Bayesian approaches obtained by the above codes. Figure 4 shows the marginal posterior distributions and contour plot of the joint posterior distribution of the parameters $w$ and $\nu$, respectively, with postplot = TRUE and contourplot = TRUE in the code of the Bayesian approach. Parameter $w$ controls the information loss over time, while $r$ is the shape parameter of the GED model and controls its tails. If $\nu<2$, the GED is a heavy-tailed distribution. It is the case of the Petrobras return data, as expected, since the estimate for $\nu$ is around 1.6 and the upper limit of the 95% intervals for $\nu$ is smaller than 2.

To illustrate the usage of the ‘PlotF’ function in this example, a detailed description of its arguments is given. The first three entries in the ‘PlotF’ function refer to the data and model used. As there are no explanatory variables to be inserted in the volatility, pz = NULL. A date vector for the x-axis was specified in the axisxdate argument. The time series was not inserted in the plot, thus plotYt = FALSE. A transformation of $-0.5$ was applied to the estimates of the latent states to get the volatility. The chosen distribution for the latent states was conditional on the static parameters and marginal (Type = "Marg"). A sample of the static parameters was set as posts = fitbayes$samplepost, and the latent states distribution was the smoothed (Proc = "Smooth"). The started date was set as ’2000-06-01’ and the enddate as ’2008-01-29’. The frequency of data is daily (Freq = "days"). The remaining arguments refer to type of lines, colors, and legends in the graph. We can see that the peaks in Figure 5 correspond to periods of crisis known in the literature and are pointed out by the model.

Table 5: Point and interval estimation for the GED model fitted to the Petrobras series.
$\mathbf{\theta}$	MLE	95%Conf. Int.	Posterior Mean	95% Cred. Int.
$w$	0.949	[0.931; 0.966]	0.947	[0.928; 0.962]
$\nu$	1.601	[1.458; 1.745]	1.606	[1.467; 1.743]

  ## Smoothing
  > set.seed(1000)
  > posts = fitbayes$samplepost
  > PlotF(Ytm ~ 1, data = data.frame(Ytm), model = "GED", pz = NULL,
  + axisxdate = Return_data$Date, plotYt = FALSE, transf = -0.5, Proc = "Smooth",
  + Type = "Marg", posts = posts, startdate = '2000-06-01', enddate = '2008-01-29',
  + Freq = "days", typeline = 'l', col = c("black", "blue", "lightgrey"),
  + xlab = "Days", ylab = expression(paste(hat(sigma)[t])), xlim = NULL,
  + ylim = c(0.02,0.10), lty = c(1, 2, 1), lwd = c(2, 2, 2), cex = 0.68)

4.3 Lifetime data

The third example refers to the daily failure times of 125 telecommunication systems installed by the GTE corporation in a pre-specified time period (Kim and Proschan 1991). Seventeen failures were observed out of the 125 observations. The purpose is to estimate the failure rate for the GTE data.

We specify the formula Ytm$\sim$Event in the first argument of ‘ngssm.mle’ function. The data must contain columns with the times and events. We should also define the model (model = "PEM") and obtain the breaks (one per interval) using the function ‘GridP’ with the times, events, and nT = NULL (the number of one interval per failure). The breaks are automatically inserted into the functions ‘ngssm.mle’ and ‘ngssm.bayes’ with nBreaks = NULL. The maximum likelihood estimation for the GTE data is performed using the R code below.

  > library(NGSSEML)
  > data(gte_data)
  ## Times
  > Ytm = gte_data$V1
  > Xtm = NULL
  > Ztm = NULL
  > amp = FALSE
  ## Event: failure, 1
  > Event = gte_data$V2
  > Break=NGSSEML:::GridP(Ytm, Event, nT = NULL)
  ## LabelParTheta = c("w")
  > StaPar = c(0.73)
  > a0 = 0.01
  > b0 = 0.01
  > ci = 0.95
  > fit = ngssm.mle(Ytm ~ Event, data = data.frame(Ytm,Event), model = "PEM",
  +  nBreaks = NULL, amp = amp, a0 = a0, b0 = b0, ci = ci, verbose = FALSE)

Bayesian estimation for the GTE data is built using quadrature with a grid of 50 points (pointss = 50) and a sample of 1000 draws (nsamplex = 1000). The state initial condition is $\lambda_{0}|Y_{0}\sim \text{Gamma}(a_0, b_0)$, where $a_0 =b_0=0.01$ are specified in the functions by the arguments $a0$ and $b0$, and the prior is $w\sim U(0,1)\equiv \text{Beta}(1,1)$. Both priors are specified in the function ‘ngssm.bayes’ in a similar manner of the previous examples to get vague priors. The formula Ytm$\sim$Event contains the times and events, and the breaks are automatically inserted with nBreaks = NULL. The R code is below.

  > library(NGSSEML)
  > data(gte_data)
  ## Times
  > Ytm = gte_data$V1
  ## Event: failure, 1
  > Event = gte_data$V2
  > Break = NGSSEML:::GridP(Ytm, Event, nT = NULL)
  > Xtm = NULL
  > Ztm = NULL
  > amp = FALSE
  ## LabelParTheta = c("w")
  > StaPar = c(0.5)
  > lower = c(0.01)
  > upper = c(0.99)
  > a0 = 0.01
  > b0 = 0.01
  ## points
  > pointss = 50
  ## Number of samples from the posterior
  > nsamplex = 1000
  ## Bayesian fit
  > fitbayes = ngssm.bayes(Ytm ~ Event, data = data.frame(Ytm,Event), model = "PEM",
  + StaPar = StaPar, amp = amp, a0 = a0, b0 = b0, prw = c(1,1), pointss = pointss,
  + nsamplex = nsamplex, postplot = TRUE, contourplot = FALSE, verbose = TRUE)

Table 6 presents the point and the interval estimates for the PEM fitted to the GTE data under the classical and Bayesian approaches. Parameter $w$ controls the information loss over time in the same manner as the previous application. If pz = TRUE, $w$ becomes $w_i$ for each interval, weighted by the interval width. These estimates are stored in the objects “fit” (classical fit) and “fitbayes” (Bayesian fit) of the previous codes.

The code below is used to build Figure 6, which shows the smoothed failure rate estimates under the Bayesian approach, with a 95% credibility interval (the grey area). As already mentioned, the first three entries in the ‘PlotF’ function refer to the data and model used. The date for the x-axis was specified in axisxdate = Break[1:17]. Once more, the chosen distribution for the latent states was conditional on the static parameters and marginal (Type = "Marg"), and the latent states distribution was the smoothed (Proc = "Smooth"). A sample of the static parameters was built as posts=fitbayes$samplepost. The remaining arguments refer to type of lines, colors, and legends in the graph. We can note that the failure rate is decreasing, as expected.

  > posts = fitbayes$samplepost
  > set.seed(1000)
  > PlotF(Ytm ~ Event, data = data.frame(Ytm, Event), axisxdate = Break[1:17],
  + model = "PEM", Proc = "Smooth", Type = "Marg",  posts = posts, typeline = 's',
  + col = c("black", "blue", "lightgrey"),  xlab = "Time to Failure (Days)",
  + ylab = "Failure rate", xlim = c(0, 139), ylim = c(0, 0.008), lty = c(1, 2, 1),
  + lwd = c(2, 2, 2), cex = 0.68)

Table 6: Point and interval (level 95%) estimation for the PEM fitted to the GTE data.
$\mathbf{\theta}$	MLE	95%Conf. Int.	Posterior Mean	95% Cred. Int.
$w$	0.773	[0.518; 1.000]	0.752	[0.503; 0.956]

4.4 Software reliability data

The Weibull software reliability model is used here for modeling the times between $136$ successive computer software failures of “SYS1” data (Chang and Liu 2009) and for estimating the mean time. A very small constant is added to failure times in order to avoid the case of a zero value in the likelihood function, but the value of the constant did not change the estimation results.

The state initial condition is $\lambda_{0}|Y_{0}\sim \text{Gamma}(0.01, 0.01)$ in a similar way to the previous examples. The arguments of the ‘ngssm.mle’ that should be set are the formula Ytm$\sim$Xtm, data.frame(Ytm, Xtm), and model = "SRWeibull". An initial static parameter vector is optional (StaPar = c(0.9, 0.7, 0.01)). The maximum likelihood estimation for the “SYS1” data is performed using the R code below.

  > library(NGSSEML)
  > data(sys1_data)
  > Ytm = sys1_data[,1]+0.00001
  > Xtm = sys1_data[,2]
  > Zt = NULL
  > model = "SRWeibull"
  ## LabelParTheta = c("w", "alpha", "Beta1")
  > StaPar = c(0.9, 0.7, 0.01)
  > fit = ngssm.mle(Ytm ~ Xtm, data = data.frame(Ytm,Xtm), model = model,
  + StaPar = StaPar)

Bayesian estimation for the “SYS1” data is obtained via quadrature rules with a grid of 15 points (pointss=15) and a sample of 1000 draws of the marginal posterior distribution (nsamplex=1000). As in the chassical fit, the following arguments have to be inserted in the ‘ngssm.bayes’ function: Ytm$\sim$Xtm, data = data.frame(Ytm, Xtm), and model = "SRWeibull". An initial static parameter vector is optional (StaPar = c(0.98, 0.75, 0.02). The marginal priors for the static parameter vector $\mathbf{\theta}=(w,\nu,\beta_1)$ are $w\sim \text{Beta}(1,1)$, $\nu\sim \text{Gamma}(0.01,0.01)$ and $\beta_1\sim N(0,100)$ (vague priors) that are stated in the code via the arguments prw = c(1, 1), prnu = c(0.1, 0.1), prbetamu = c(0), and prbetasigma = diag(100,1,1), respectively.

  > library(NGSSEML)
  > data(sys1_data)
  > Ytm = sys1_data[,1]+0.00001
  > Xtm = sys1_data[,2]
  > model = "SRWeibull"
  ## LabelParTheta = c("w", "nu", "Beta")
  > StaPar = c(0.98,0.75,0.02)
  ## points
  > pointss = 15
  ## Bayesian Fit:
  > fitbayes = ngssm.bayes(Ytm ~ Xtm, data = data.frame(Ytm,Xtm), model = model,
  + pz = NULL, StaPar = StaPar, prw = c(1, 1), prnu = c(0.1, 0.1), prbetamu = c(0),
  + prbetasigma = diag(100, 1, 1), pointss = pointss, nsamplex = 1000, postplot = TRUE,
  + contourplot = TRUE, verbose = TRUE)

Table 7 presents the point and interval estimates for the Weibull SR model fitted to the “SYS1” data under the classical and Bayesian approaches. The estimate of parameter $w$ is high (larger than 0.9), which means that the information loss over time is small. Besides, we have strong evidence from the data in favor of positive values for the regression coefficient $\beta_1$ and values lower than 1 for the shape parameter $\nu$ (i.e., decreasing failure rates).

Table 7: Point and interval (level 95%) estimation for the Weibull SR model fitted to the “SYS1” data.
$\mathbf{\theta}$	MLE	95% Conf. Int.	Posterior Mean	95% Cred. Int.
$w$	0.999	[0.996; 1.000]	0.969	[0.895; 0.994]
$\nu$	0.753	[0.648; 0.857]	0.754	[0.655; 0.856]
$\beta_1$	0.023	[0.018; 0.029]	0.023	[0.015; 0.031]

The commands postplot = TRUE and contourplot = TRUE in the code of the Bayesian approach provide Figure 7 that shows the marginal posterior distributions and contour plot of the joint posterior distribution of the parameters $w$, $\nu$, and $\beta_1$, respectively. Parameter $w$ controls the information loss over time, while $\nu$ is the shape parameter of the Weibull SR model. The estimate of $\nu$ is lower than one, which means strong evidence in favor of decreasing failure rates. Analyzing the marginal posterior of $\beta_1$, we can observe that the number of previous failures seems to be relevant to the model.

The arguments posts = posts, Proc = "Smooth", Type = "Marg", transf = 1/4, and model = "SRWeibull" establish a sample of the marginal posterior of the static parameters, the smoothed distribution for the states, integrating the static parameters out, a transformation of the fourth root in the series, and the Weibull SR model are included in the code below to build Figure 8. Figure 8 provides transformed smoothed estimates for the mean of the data under the Weibull SR model. A transformation to reduce the scale of the data was necessary in order to obtain a better visualization of the graph. The fourth root was a suitable transformation in this case. As can be seen in Figure 8, the estimates follow well the behavior of the data.

  > library(NGSSEML)
  ## Smoothing
  > posts = fitbayes$samplepost
  > set.seed(1000)
  > PlotF(Ytm ~ Xtm, data = data.frame(Ytm, Xtm), plotYt = TRUE, transf = 1/4,
  + model = "SRWeibull", Proc = "Smooth", Type = "Marg", posts = posts, typeline = 'l',
  + col = c("black", "blue", "lightgrey"), xlab = "Number of Failures",
  + ylab = "Transformed Times", xlim = c(0, 139), ylim = c(-2, 10), lty = c(1, 2, 1),
  + lwd = c(1, 2, 1), cex = 0.68)

5 Conclusion

The package demonstrates how non-Gaussian state-space models (with exact marginal likelihood) can suitably be applied and employed in non-Gaussian time series and reliability analysis using R. The main contribution of the R-package NGSSEML is to provide an easy and fast code for classical and Bayesian estimations in non-Gaussian state space models with the exact marginal likelihood in the R programming language. Four important examples were presented for modeling time series and reliability data. The main advantages of the employed methodology are its analytical and computational simplicity, fast routines, and the ability to accommodate different types of distributions, even those which do not belong to the exponential family, combined with exact inference. Both classical and Bayesian approaches may be performed since the exact marginal likelihood for the static parameters is available. The package also provides R codes or functions for the prediction, filtering, and smoothing procedures of the latent states. Also, it is a platform where frequently used models, like the Normal and Poisson, can be included, as well as other volatility and reliability models, like the GED and Generalized Gamma.

Future work aims to increase the number of distributions (special cases) that can be specified for the observations in the package (mainly for time series data), to include dynamic linear models with the mean and variance varying over time that can conditionally be written in the NGSSEML form (Gamerman et al. 2013; Rego and Santos 2020), and to introduce models for modeling multivariate time series data (Aktekin et al. 2018, 2020).

Acknowledgements

T. R. Santos was supported by CNPq-Brazil and FAPEMIG Foundation/Grant 25909. G. C. Franco was supported by CNPq-Brazil and FAPEMIG Foundation. D. Gamerman was supported by CNPq-Brazil and FAPERJ-Brazil. We thank the editor and reviewers for their constructive critique and positive review of our paper.

Models		\(\mathbf \theta\)	\(a(y_{t},\mathbf \theta)\)	\(b(y_{t},\mathbf \theta)\)	\(c(y_{t},{\mathbf{\theta}})\)
	Poisson	\(w,\beta\)	\((y_{t}!)^{-1}\)	\(y_{t}\)	1
	Gamma	\(w,\chi,\beta\)	\(y_{t}^{\chi-1}/ \Gamma(\chi)\)	\(\chi\)	\(y_{t}\)
	Weibull	\(w,\nu,\beta\)	\(\nu(y_{t})^{\nu-1}\)	1	\((y_{t})^{\nu}\)
Time	Normal	\(w,\mu,\beta\)	\((2\pi)^{-1/2}\)	\(1/2\)	\((y_{t}- \mu )^{2}/2\)
Series	Laplace	\(w,\mu,\beta\)	\(\frac{1}{\sqrt{2}}\)	\(1\)	\(\sqrt{2}\left\|y_t- \mu \right\|\)
	Power Exponential (GED*)	\(w,\nu,\mu,\beta\)	\(\frac{\nu}{2^{\frac{\nu+1}{\nu}}\Gamma(1/\nu)}\)	\(1/\nu\)	\(\frac{(y_t- \mu)^\nu}{2}\)
	Generalized Gamma	\(w,\nu,\chi,\beta\)	\(\nu y_{t}^{\nu\chi-1} / \Gamma(\chi)\)	\(\chi\)	\(y_t^\nu\)
	\(\text{PEM}^{\star}\)	\(w\)	\(1\)	\(\sum\limits_{j=1}^{N} \chi_{ij}\)	\(\sum\limits_{j=1}^{N}(t_{ij}-t_{i-1})\)
Relia-	\(\text{PH}^{\star}\)	\(w,\beta\)	\(\exp \left( \sum\limits_{j\in R_t}^{} \chi_{ij} \mathbf x_{j}'\mathbf \beta \right)\)	\(\sum\limits_{j=1}^{N}\chi_{ij}\)	\(\sum\limits_{j=1}^{N}(t_{ij}-t_{i-1})\exp(\mathbf x_{ij}'\mathbf \beta)\)
bility	Weibull \(\text{SR*}\)	\(w, \nu, \beta\)	\(\nu y_{t}^{\nu-1} \exp(-\nu x_t \beta)\)	\(1\)	\(y_{t}^\nu \exp(- \nu x_t \beta)\)
	Gamma \(\text{SR*}\)	\(w,\alpha,\beta\)	\(\frac{\left(y_{t}\right)^{\alpha-1}}{\Gamma(\alpha)\exp\left(\alpha x_{t}\beta\right)}\)	\(\alpha\)	\(y_{t}\exp(- x_{t}\beta)\)

Models		\(\mathbf \theta\)	Prior (\(p(\mathbf \theta\)))
Time	Poisson	\(w,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
Series			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Gamma	\(w,\chi,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\chi\sim \text{Gamma}(a_\chi, b_\chi)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Weibull	\(w,\nu,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\nu\sim \text{Gamma}(a_\nu, b_\nu)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Normal	\(w,\mu,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\mu\sim \text{Normal}(m, v)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Laplace	\(w,\mu,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\mu\sim \text{Normal}(m, v)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Power Exponential (GED)	\(w,\nu,\mu,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\nu\sim \text{Gamma}(a_\nu, b_\nu)\),
			\(\mu\sim \text{Normal}(m, v)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Generalized Gamma	\(w,\nu,\chi,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\nu\sim \text{Gamma}(a_\nu, b_\nu)\),
			\(\chi\sim \text{Gamma}(a_\chi, b_\chi)\),
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
Relia-	\(\text{PEM}^{\star}\)	\(w\)	\(w\sim \text{Beta}(a_w, b_w)\)
bility
	\(\text{PH}^{\star}\)	\(w,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
		\(w,\beta\)	\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Weibull \(\text{SR}\)	\(w, \nu, \beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\nu\sim \text{Gamma}(a_\nu, b_\nu)\)
			\(\beta\sim \text{Normal}_q(m_{beta}, V_{beta})\)
	Gamma \(\text{SR}\)	\(w,\alpha,\beta\)	\(w\sim \text{Beta}(a_w, b_w)\),
			\(\alpha\sim \text{Gamma}(a_\alpha, b_\alpha)\)
			\(\beta\sim \text{Normal}_q((m_{beta}, V_{beta})\)