The number of packages/software for Gaussian State Space models has
increased over recent decades. However, there are very few codes
available for non-Gaussian State Space (NGSS) models due to analytical
intractability that prevents exact calculations. One of the few
tractable exceptions is the family of NGSS with exact marginal
likelihood, named NGSSEML. In this work, we present the wide range of
data formats and distributions handled by NGSSEML and a package in the
R language to perform classical and Bayesian inference for them.
Special functions for filtering, forecasting, and smoothing procedures
and the exact calculation of the marginal likelihood function are
provided. The methods implemented in the package are illustrated for
count and volatility time series and some reliability/survival models,
showing that the codes are easy to handle. Therefore, the NGSSEML
family emerges as a simple and interesting option/alternative for
modeling non-Gaussian time-varying structures commonly encountered in
time series and reliability/survival studies.
Keywords: Bayesian, classical inference, reliability, smoothing,
time series, software R
Standard statistical software and packages in the State Space Model (SSM) context mainly implement Gaussian models, which are well developed in the literature. For example, the R-packages StructTS (Ripley 2002), dlm (Petris 2010), dlmodeler (Szymanski 2014), SSsimple (Zes 2019), and MARSS (Holmes et al. 2013) are a sample of them. The last introduces multivariate autoregressive state-space modeling or multivariate Gaussian state-space models. In spite of that, there is an increasing demand for methods or models to handle non-Gaussian and non-linear time series that can be applied to real data. In the SSM class, a general structure called Dynamic Generalized Linear Models (DGLM), proposed by West et al. (1985), attracted a great deal of interest due to its flexibility, allowing the observation distribution to belong to the exponential family. The book by Durbin and Koopman (2012) also analyzed non-Gaussian time series using importance sampling in SSM.
Many other researchers have devoted themselves, over the last decades, to auxiliary procedures to handle non-Gaussian State Space under the Bayesian approach (see, for example, Andrieu and Doucet (2002; Carvalho et al. 2010), among others). Some attempts to code NGSS include the sspir (Dethlefsen and Lundbye-Christensen 2006), pomp (King et al. 2016), KFAS (Helske 2016), bssm (Helske and Vihola 2021), and dynamichazard (Christoffersen et al. 2021) packages in the R environment (Team 2021) and SsfPack (Koopman et al. 1999) in the Ox software (Doornik 1997). The SsfPack is the most consolidated package with several updated versions.
Nevertheless, even for simple structures in this class of models, the analytical tractability is easily lost, and, hence, approximations are required in order to perform inferential procedures. Thus, all packages mentioned above use intensive computational methods for making inferences for the model parameters, like sequential Monte Carlo methods or particle filters, importance sampling, particle Markov chain Monte Carlo (MCMC), etc.
In spite of this increasing number of models and packages to handle non-Gaussian SSM, a common feature shared by all of them is the need to use approximations to perform inferential procedures even in very simple cases. Gamerman et al. (2013) identified a subset of NGSS that allows for exact and analytical calculation of the marginal likelihood and the predictive and filtering distributions, called NGSSEML. This family unified several different models previously existing in the literature and included more cases. These exact computations are extremely easy to code and can be performed effortlessly and in a simple way. Applications to real data are presented, and the codes are made available by the authors in Ox language. Also, NGSSEML includes the works of Aktekin et al. (2013, 2018), Pinho et al. (2016) for modeling count and volatility data, respectively, and was later extended by Santos et al. (2017) to include models commonly used in reliability and survival contexts.
Thus, in the face of the absence of consolidated software and packages to handle NGSSEML, the objective of this work is to unify the range of models encompassed and to introduce the R-package NGSSEML. The package illustrates how the NGSS models can be easily employed in non-Gaussian time series and reliability analysis using R under the Bayesian and classical perspectives. It also provides codes in the R environment for mainly formulating and specifying the NGSS models considered in Gamerman et al. (2013) and Santos et al. (2017). The main advantage of this package, which is designed for the specific class of NGSSEML, is that the models are implemented in a simple and flexible way. All other softwares listed above deal with state space models where the marginal likelihood can not be exactly obtained. The NGSSEML approach has the advantage of not needing to approximate the likelihood as others do, achieving that via an alternative evolution. This exact specification enables a computational gain, making the NGSSEML package an attractive option to handle non-Gaussian state space. Our approach to inference is to use (virtually) exact methods. Intensive computational methods are only required when the dimension of the unknown static parameter vector is high. In this case, the Adaptive Rejection Metropolis Sampling (ARMS) algorithm is used to approximate the marginal posterior of the static parameters (Petris 2010). Applications to count data, volatility time series, and reliability models, such as the piecewise exponential model (PEM) and Weibull and Gamma software reliability models are presented.
The paper is organized as follows. The NGSSEML family is presented in Section 2, as well as the inferential and smoothing procedures for the latent states. Section 3 introduces the package functions. Section 4 shows illustrative examples of time series and reliability data. Finally, Section 5 presents the main conclusions and final remarks.
Gamerman et al. (2013) introduced a rich class of non-Gaussian state space models with exact marginal likelihood, called here NGSSEML, based on the seminal work of Smith and Miller (1986). Inference free from approximations can be performed in the NGSSEML, and this is its main advantage compared with other NGSS procedures, such as the DGLM. Due to the model formulation, some components, such as seasonality and the effect of external covariates, are introduced as fixed effects.
A time series
for
It is also assumed that
The state
There is a wide range of distributions that belong to this class of
models. It includes many commonly known discrete and continuous
distributions such as Poisson, Gamma, and Normal (with static mean) but
also includes many other distributions that are not so common and some
reliability models. Table 1 provides the form of functions
This family provides a collection of distributions for modeling a variety of real-time series and reliability data that are of practical importance, including software reliability. A detailed explanation and some illustrations can be found in Pinho et al. (2016) and Santos et al. (2017).
Models | |||||
---|---|---|---|---|---|
Poisson | 1 | ||||
Gamma | |||||
Weibull | 1 | ||||
Time | Normal | ||||
Series | Laplace | ||||
Power Exponential (GED*) | |||||
Generalized Gamma | |||||
Relia- | |||||
bility | Weibull |
||||
Gamma |
Note:
Hereafter, following Theorem 1 in Gamerman et al. (2013), sequential inference for the
NGSSEML will be presented as a basic result that includes the
one-step-ahead predicted distributions and the filtering distribution of
the latent states
The distribution of
The analytical form of the predictive density function in
((5)) allows the exact computation of the marginal
likelihood function
Classical inference for the static parameter vector
Bayesian inference for the static parameters can be performed combining
the marginal likelihood function and a prior of the static parameters,
Models | Prior ( |
||
---|---|---|---|
Time | Poisson | ||
Series | |||
Gamma | |||
Weibull | |||
Normal | |||
Laplace | |||
Power Exponential (GED) | |||
Generalized Gamma | |||
Relia- | |||
bility | |||
Weibull |
|||
Gamma |
|||
Note:
Regarding the latent state,
In order to obtain a sample of the latent state, the smoothing
procedure, described in Theorem 2 of Gamerman et al. (2013), is required. It states that
the distribution of
In an analogous way, the marginal distribution of
Gamerman et al. (2013) used the Pearson and deviance residuals for model diagnostics and AIC, BIC, and DIC for model comparison in this family. It should be noted that, if MCMC methods are used, convergence checks using graphs like trace plot, autocorrelogram, etc., and tests like the Geweke and Gelman-Rubin tests are necessary.
The package NGSSEML can
be downloaded and installed from
GitHub or CRAN
website and is then
activated in R by library("NGSSEML")
. Assuming that the data are
available in the current environment, the NGSSEML can be set up using
the formulas and family arguments presented in the next subsections.
This subsection presents the estimation of the static parameters
Basically, the functions of
NGSSEML work with a very
simple specification of a few arguments. We need to specify the
formula
, data
(data frame) arguments, and model
, as well as the
function ‘lm’ for linear regression. For piecewise exponential models in
reliability analysis, we cannot forget to include the breaks
and
events
arguments.
The function ‘ngssm.mle’ performs the marginal likelihood estimation of the static parameters of the model using Eq. ((6)), and can be executed by the following command:
> ngssm.mle(formula, data, na.action="na.omit", pz = NULL, nBreaks = NULL,
+ model = "Poisson", StaPar = NULL, amp = FALSE, a0 = 0.01, b0 = 0.01,
+ ci = 0.95, LabelParTheta = NULL, verbose = TRUE, method = "BFGS",
+ hessian = TRUE, control = list(maxit = 30000, temp = 2000,
+ trace = FALSE, REPORT = 500))
The output of the model fit presents the maximum likelihood estimators,
standard errors, Z statistics, and asymptotic confidence intervals of
the model parameters. Furthermore, it allows the user to change the
control settings, choose the optimizing algorithm, and set arguments as
the hyperparameters
The function ‘ngssm.bayes’ performs the Bayesian estimation of the static parameters of the model using the marginal posterior in Eq. ((7)), and can be run by the following command:
> ngssm.bayes(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
+ model = "Poisson", StaPar = NULL, amp = FALSE, a0 = 0.01, b0 = 0.01, prw = c(1, 1),
+ prnu = NULL, prchi = NULL, prmu = NULL, prbetamu = NULL, prbetasigma = NULL,
+ lower = NULL, upper = NULL, ci = 0.95, pointss = 10, nsamplex = 1000, mcmc = NULL,
+ postplot = FALSE, contourplot = FALSE, LabelParTheta = NULL, verbose = TRUE)
The model fit’s output provides the Bayesian estimators, posterior mean
and median, standard error, and percentile credibility intervals for
NGSSEML parameters, and the posterior samples of the static parameters
of the NGSSEML using multinomial sampling. The main functions of the
model fit provide an object class “ngssm”, associated with print,
summary, and extract further information. Besides, the user can choose
the number of grid points (the argument pointss
) of the quadrature and
samples (the argument nsamplex
) and set the hyperparameters of prior
distributions (the arguments
a0,b0,prw,prnu,prchi,prmu,prbetamu,prbetasigma
), the credibility
level, etc. If mcmc=TRUE
or the number of points in the grid is very
high, the ARMS algorithm is executed instead of the quadrature method.
Samples of the posterior distribution of the latent states using the
smoothing procedure are calculated as described in Eq.
((9)).
The filtering, smoothing, and plot functions for the NGSSEML are shown in this subsection. For these functions, the static parameters are estimated with the whole data.
The function ‘FilteringF’ computes the shape and scale parameters of the one-step-ahead forecast and filtering distributions of the latent states using the filters in Eq. 3 and 4, respectively, for several models. It can be called by the command below.
> FilteringF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
+ model = "Poisson", StaPar = NULL, a0 = 0.01, b0 = 0.01, amp = FALSE,
+ distl = "PRED", splot = FALSE)
The output presents the shape and scale parameters of the one-step-ahead
forecast and filtering distributions of the latent states. Furthermore,
it allows the user to modify the arguments of the function as the
hyperparameters
The function ‘SmoothingF’ provides an exact sample of the smoothing distribution of the latent states in Eq. ((9)), and its command is
> SmoothingF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
+ model = "Poisson", StaPar = NULL, Type = "Cond", a0 = 0.01, b0 = 0.01,
+ amp = FALSE, samples = 1, ci = 0.95, splot = FALSE)
The function provides an object with an exact sample of the joint
distribution of the states. The user can choose the arguments as the
hyperparameters splot=TRUE
, a graph with the states’ smoothed estimates is
shown.
The function ‘PlotF’ builds graphs with smoothed/filtered estimates of the latent states and can be run by the following command:
> PlotF(formula, data, na.action = "na.omit", pz = NULL, nBreaks = NULL,
+ plotYt = TRUE, axisxdate = NULL, transf = 1, model = "Poisson", posts,
+ Proc = "Smooth", Type = "Marg", distl = "PRED", a0 = 0.01, b0 = 0.01,
+ ci = 0.95, startdate = NULL, enddate = NULL, Freq = NULL, ...)
The function returns a graph with smoothed or filtered estimates of the
latent states. The user can change the arguments as the hyperparameters
In this section, we present four examples to illustrate the use of the proposed package for count, volatility, lifetime, and software reliability data. The count time series is the poliomyelitis data in the USA. The volatility series refers to the return data set from the Petrobras stock market. The lifetime data set (GTE) are the daily failure times of 125 telecommunication systems. The “SYS1” data set are times between 136 successive computer software failures. All examples are widely used in the literature of their respective areas. We present parameter estimation, the goodness of fit for the adjusted models and predictions. For the poliomyelitis data, we also provide comparisons with other softwares available in the literature.
As previously explained, this software is designed for the specific class of non-Gaussian state space models with exact marginal likelihood. The NGSSEML may seem analytically more complex in comparison with other procedures, which present in general simpler equations. Nevertheless, although simplest in form, these procedures frequently need more calculations and heavier computational effort to produce similar results to our method. Based on an alternative evolution, our approach requires less approximation, and thus, inference for static parameters is almost indistinguishable from the (unobtainable) exact form. Readers interested in the advantages and comparisons of the NGSSEML against alternatives should refer to Gamerman et al. (2013) and Santos et al. (2017).
A Poisson NGSSEML model is used to fit the monthly data of the number of
poliomyelitis cases in the USA, shown in Figure 1, from
1970/01 to 1983/12 (168 observations). The polio data is a well-known
example in the count time series literature and was first analyzed by
Zeger (1988). The objective is to explain the number of poliomyelitis cases
through covariates and make predictions. The covariates are the
deterministic trend centered at 73, annual cosine and sine, and
semiannual cosine and sine. The intercept is inserted in the model as a
dynamic level. The confidence and credibility levels for the intervals
are fixed at 0.95. The state initial condition is
The MLE for the static parameters of the NGSSM in the poliomyelitis data
can be obtained using the BFGS algorithm (default: method = "BFGS"
)
implemented in the optim function, which is the default of ‘ngssm.mle’
function. It is necessary to specify data1 = data.frame(Ytm, Xtm)
in
the argument data, the formula
Ytm
CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual
,
and the model model = "Poisson"
. The arguments
StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38)
, a0 = 0.01
, b0 = 0.01
,
and ci = 0.95
are optional. The name of the variables in the argument
must coincide with the name in the data. The model fit is stored in the
fit
object. The code for classical inference is given below. We should
note that a graph with the filtered estimates of the latent states is
returned when the ‘FilteringF’ function is called.
> library(NGSSEML)
> data(Polio_data)
> Xtm = Polio_data[, 3:7]
> Xtm[,1] = (1:168-73)/168
> Ytm = Polio_data$y
> Ztm=NULL
## CosAnnual, SinAnnual, CosSemiAnnual, SinSemiAnnual
## LabelParTheta = c("w", "Beta1", "Beta2", "Beta3", "Beta4")
StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38)
> a0 = 0.2
> b0 = 0.1
> ci = 0.95
> data1=data.frame(Ytm,Xtm)
## Fit
> fit = ngssm.mle(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
+ data = data1, model = model, pz = NULL, StaPar = StaPar,
+ a0 = a0, b0 = b0, ci = ci)
> esttheta = as.numeric(fit[,1])
> filt = FilteringF(Ytm ~ Trend + CosAnnual + SinAnnual + CosSemiAnnual
+ SinSemiAnnual, data = data1, StaPar = esttheta,
+ model = "Poisson", a0 = a0, b0 = b0,
+ distl = "FILTER", splot = TRUE)
> filtest = (filt[1,]/filt[2,])
The results are presented in Table 3, showing the MLE’s and
their respective 95% confidence intervals for
Bayesian estimation can be performed using quadrature rules with a grid
of 6 points (pointss = 6
), providing a sample of 2,000 draws
(nsamplex = 2000
) of the marginal posterior distribution of the static
parameters. This can be performed without the use of intensive
computational methods to approximate the posterior distribution as, is
usually the case in other non-Gaussian state-space models. We should set
the arguments in the function as the data frame data1
, the formula
Ytm
CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual
,
and the model model = "Poisson"
. The marginal priors are:
prw = c(1,1)
(two shape parameters of a Beta distribution),
prbetamu = rep(0,4)
, and prbetasigma = diag(10, 4, 4)
(two
parameters of a multivariate Normal distribution). If postplot = TRUE
and contourplot = TRUE
, the graph of the marginal posterior and the
contour plot are provided. If verbose = TRUE
, the estimation results
are shown; otherwise, they are omitted. The code is presented below.
> library(NGSSEML)
> data(Polio_data)
> Xtm=Polio_data[,3:7]
> Xtm[,1] = (1:168-73)/168
> Ytm = Polio_data$y
> Ztm = NULL
## CosAnnual, SinAnnual, CosSemiAnnual, SinSemiAnnual
## LabelParTheta=c("w", "Beta1", "Beta2", "Beta3", "Beta4")
> StaPar = c(0.79, -0.11, -0.49, 0.18, -0.38)
> a0 = 0.2
> b0 = 0.1
## points
> pointss = 6
## posterior sample
> nsamplex = 2000
## Cred. level
> ci = 0.95
> data1 = data.frame(Ytm, Xtm)
## Bayesian fit
> fitbayes = ngssm.bayes(Ytm ~ CosAnnual + SinAnnual +
CosSemiAnnual + SinSemiAnnual,
+ data = data1, model = "Poisson", pz = NULL, StaPar = StaPar, a0 = a0, b0 = b0,
+ prw = c(1, 1), prbetamu = rep(0, 4), prbetasigma = diag(10, 4, 4), ci = ci,
+ pointss = pointss, nsamplex = nsamplex, postplot = TRUE, contourplot = TRUE,
+ verbose = TRUE)
The results of the Bayesian inference are also presented in Table
3, showing the posterior mean and the credibility intervals.
The coefficient regression related to the determinist trend is not
significant at the 5% significance level, so we decide to remove it from
the model. Analyzing Table 3, we can see that both for
The R codes for the smoothing function below are used to build the
graphs presented in Figure 1, for the Polio data with the
smoothed estimates for the observations and mean under the Bayesian
approach. The first three arguments of the function ‘PlotF’ refer to the
Bayesian model obtained with ‘ngssm.bayes’. The commands
posts = posts
, axisxdate = x
, Proc = "Smooth"
, Type = "Marg"
,
startdate = "1970/01/01"
, enddate = "1983/12/31"
, and
Freq = "months"
indicate, respectively, the sample of the marginal
posterior of the static parameters, the vector of monthly dates, the
smoothed distribution for the states, the chosen distribution for the
latent states, the start and end dates, and the frequency of the
observations. The last arguments refer to the type of lines, colors, and
legends in the graph. We can see that the smoothed mean trajectory in
Figure 1 follows well the behavior of the time series.
> posts = fitbayes$samplepost
> x = seq(as.Date("1970/01/01"), as.Date("1983/12/31"), "months")
> PlotF(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
+ data = data1, model = "Poisson", axisxdate = x, Proc = "Smooth", Type = "Marg",
+ posts = posts, startdate = "1970/01/01", enddate = "1983/12/31",
+ Freq = "months", type = 'l', col = c("black", "blue","lightgrey"), xlab = "Months",
+ ylab = "The number of poliomyelitis cases", xlim = NULL, ylim = c(0, 15),
+ Lty = c(1, 2, 1), lwd = c(2, 2, 2), cex = 0.68)
MLE | 95% Conf. Int. | Posterior Mean | 95% Cred. Int. | |
0.793 | [0.711; 0.874] | 0.783 | [0.713; 0.843] | |
-0.116 | [-0.324; 0.092] | -0.117 | [-0.314; 0.015] | |
-0.488 | [-0.717; -0.259] | -0.491 | [-0.707; -0.344] | |
0.175 | [-0.024; 0.374] | 0.176 | [-0.014; 0.301] | |
-0.385 | [-0.577; -0.193] | -0.387 | [-0.568; -0.265] |
Figure 2 shows the standardized Pearson residuals and its autocorrelogram for model diagnostic. The residuals are not correlated and distributed between -2 to 2, except the points indicated in the graph (a) that coincide with the largest values of the series in Figure 1. Further investigation of these points could be performed, and the inclusion of interventions may improve the model fit. The R code for the diagnostic analysis in this example is
> esttheta = as.numeric(fit[,1]); n = length(Ytm)
> filt = FilteringF(Ytm ~ CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
+ data = data1, StaPar = esttheta, model = "Poisson", distl = "FILTER",
+ splot = TRUE)
> filtest = (filt[1,]/filt[2,])
> pred = FilteringF(Ytm ~CosAnnual + SinAnnual + CosSemiAnnual + SinSemiAnnual,
+ data = data1, StaPar = esttheta, model = "Poisson", a0 = a0, b0 = b0,
+ distl = "PRED")
> predonestepahead = (pred[1,]/pred[2,])
> residuals = (Ytm-predonestepahead)/sqrt(predonestepahead)
> summary(residuals)
> residuals = scale(residuals[1:n])
> par(mfrow = c(1, 2))
> x = seq(as.Date("1970/01/01"), as.Date("1983/12/31"), "months")
> plot(x[1:n], residuals, xlab = "Months", ylab = "Standardized Pearson residuals")
> points(x[7], residuals[7], col = "red", lwd = c(2))
> points(x[35], residuals[35], col = "red", lwd = c(2))
> points(x[74], residuals[74] ,col = "red", lwd = c(2))
> points(x[113], residuals[113], col = "red", lwd = c(2))
> title("(a)")
> acf(residuals, ci = 0.99, main = "(b)")
For comparison purposes, Table 4 displays some
statistics for evaluating the in-sample performance of the fitted model
using the KFAS,
sspir and our software.
The statistics are the square root of the mean square error
SMSE | MAE | MAPE | |
---|---|---|---|
KFAS* | 1.776 | 1.179 | 0.638 |
NGSSEML* | 1.280 | 0.890 | 0.468 |
NGSSEML** | 1.264 | 0.880 | 0.462 |
sspir* | 1.788 | 1.202 | 0.648 |
Note: *MLE; **The posterior mean.
The second example refers to the daily return data from Petrobras stock
market from 2000/01/06 to 2008/29/01 (1999 observations), which can be
obtained on the website finance.yahoo.com/. The return at time
The maximum likelihood estimation for the Petrobras return data is
performed using the R code below with the following arguments in the
function ‘ngssm.mle’: Ytm
1
, data = data.frame(Ytm)
, and
model = "GED"
. The state initial condition is
> library(NGSSEML)
> data(Return_data)
> plot(Return_data[,2], Return_data[,1], type = 'l', xlab = "Days",
+ ylab = "Returns")
> Ytm = Return_data$Rt
> Xt = NULL
> Zt = NULL
> model = "GED"
## LabelParTheta = c("w", "nu")
> StaPar = c(0.9, 1)
> a0 = 0.01
> b0 = 0.01
> ci = 0.95
> fit = ngssm.mle(Ytm ~ 1, data=data.frame(Ytm), model = model, a0 = a0, b0 = b0,
+ ci = ci, verbose = FALSE)
Bayesian estimation for the Petrobras return data is obtained via
quadrature with 15 points (argument pointss=15
) and a sample of 1000
draws (nsamplex=1000
) of the marginal posterior distribution. The
state initial condition is
prw = c(1, 1)
(two shape parameters of a Beta distribution)
and prmu=rep(0.01,0.01)
(two parameters of a Gamma distribution with
the mean equals to one). The difference here is that model = "GED"
.
Again, the prior hyperparameters are chosen to set vague priors for the
model parameters. A summary of the estimates was contained in the
objects fitbayes
(Bayesian fit). The code is presented below.
> library(NGSSEML)
> data(Return_data)
> Ytm = Return_data$Rt
> Date = Return_data$Date
> Xtm = NULL
> Ztm = NULL
## LabelParTheta = c("W", "nu")
> StaPar =c (0.9,1)
> a0 = 0.01
> b0 = 0.01
## points
> pointss = 15
## posterior sample size
> nsamplex = 1000
## Cred. level
> ci = 0.95
## Bayesian fit
> fitbayes = ngssm.bayes(Ytm ~ 1, data = data.frame(Ytm), model = "GED", pz = NULL,
+ StaPar = StaPar, a0 = a0, b0 = b0, prw = c(1, 1), prnu = c(0.01, 0.01), ci = ci,
+ pointss = pointss, nsamplex = nsamplex, postplot = TRUE, contourplot = TRUE,
+ verbose =TRUE)
Table 5 presents the point and the interval estimates for the
GED model fitted to the Petrobras return data under the classical and
Bayesian approaches obtained by the above codes. Figure
4 shows the marginal posterior distributions and
contour plot of the joint posterior distribution of the parameters postplot = TRUE
and contourplot = TRUE
in the code of the Bayesian approach. Parameter
To illustrate the usage of the ‘PlotF’ function in this example, a
detailed description of its arguments is given. The first three entries
in the ‘PlotF’ function refer to the data and model used. As there are
no explanatory variables to be inserted in the volatility, pz = NULL
.
A date vector for the x-axis was specified in the axisxdate
argument.
The time series was not inserted in the plot, thus plotYt = FALSE
. A
transformation of Type = "Marg"
). A sample of the static parameters was set as
posts = fitbayes$samplepost
, and the latent states distribution was
the smoothed (Proc = "Smooth"
). The started
date was set as
’2000-06-01’
and the enddate
as ’2008-01-29’
. The frequency of
data is daily (Freq = "days"
). The remaining arguments refer to type
of lines, colors, and legends in the graph. We can see that the peaks in
Figure 5 correspond to periods of crisis known in the
literature and are pointed out by the model.
MLE | 95%Conf. Int. | Posterior Mean | 95% Cred. Int. | |
0.949 | [0.931; 0.966] | 0.947 | [0.928; 0.962] | |
1.601 | [1.458; 1.745] | 1.606 | [1.467; 1.743] |
## Smoothing
> set.seed(1000)
> posts = fitbayes$samplepost
> PlotF(Ytm ~ 1, data = data.frame(Ytm), model = "GED", pz = NULL,
+ axisxdate = Return_data$Date, plotYt = FALSE, transf = -0.5, Proc = "Smooth",
+ Type = "Marg", posts = posts, startdate = '2000-06-01', enddate = '2008-01-29',
+ Freq = "days", typeline = 'l', col = c("black", "blue", "lightgrey"),
+ xlab = "Days", ylab = expression(paste(hat(sigma)[t])), xlim = NULL,
+ ylim = c(0.02,0.10), lty = c(1, 2, 1), lwd = c(2, 2, 2), cex = 0.68)
The third example refers to the daily failure times of 125 telecommunication systems installed by the GTE corporation in a pre-specified time period (Kim and Proschan 1991). Seventeen failures were observed out of the 125 observations. The purpose is to estimate the failure rate for the GTE data.
We specify the formula Ytm
Event
in the first argument of
‘ngssm.mle’ function. The data must contain columns with the times and
events. We should also define the model (model = "PEM"
) and obtain the
breaks (one per interval) using the function ‘GridP’ with the times,
events, and nT = NULL
(the number of one interval per failure). The
breaks are automatically inserted into the functions ‘ngssm.mle’ and
‘ngssm.bayes’ with nBreaks = NULL
. The maximum likelihood estimation
for the GTE data is performed using the R code below.
> library(NGSSEML)
> data(gte_data)
## Times
> Ytm = gte_data$V1
> Xtm = NULL
> Ztm = NULL
> amp = FALSE
## Event: failure, 1
> Event = gte_data$V2
> Break=NGSSEML:::GridP(Ytm, Event, nT = NULL)
## LabelParTheta = c("w")
> StaPar = c(0.73)
> a0 = 0.01
> b0 = 0.01
> ci = 0.95
> fit = ngssm.mle(Ytm ~ Event, data = data.frame(Ytm,Event), model = "PEM",
+ nBreaks = NULL, amp = amp, a0 = a0, b0 = b0, ci = ci, verbose = FALSE)
Bayesian estimation for the GTE data is built using quadrature with a
grid of 50 points (pointss = 50
) and a sample of 1000 draws
(nsamplex = 1000
). The state initial condition is
Ytm
Event
contains the times and events, and the breaks are automatically inserted
with nBreaks = NULL
. The R code is below.
> library(NGSSEML)
> data(gte_data)
## Times
> Ytm = gte_data$V1
## Event: failure, 1
> Event = gte_data$V2
> Break = NGSSEML:::GridP(Ytm, Event, nT = NULL)
> Xtm = NULL
> Ztm = NULL
> amp = FALSE
## LabelParTheta = c("w")
> StaPar = c(0.5)
> lower = c(0.01)
> upper = c(0.99)
> a0 = 0.01
> b0 = 0.01
## points
> pointss = 50
## Number of samples from the posterior
> nsamplex = 1000
## Bayesian fit
> fitbayes = ngssm.bayes(Ytm ~ Event, data = data.frame(Ytm,Event), model = "PEM",
+ StaPar = StaPar, amp = amp, a0 = a0, b0 = b0, prw = c(1,1), pointss = pointss,
+ nsamplex = nsamplex, postplot = TRUE, contourplot = FALSE, verbose = TRUE)
Table 6 presents the point and the interval estimates for the
PEM fitted to the GTE data under the classical and Bayesian approaches.
Parameter pz = TRUE
,
The code below is used to build Figure 6, which shows the
smoothed failure rate estimates under the Bayesian approach, with a 95%
credibility interval (the grey area). As already mentioned, the first
three entries in the ‘PlotF’ function refer to the data and model used.
The date for the x-axis was specified in axisxdate = Break[1:17]
. Once
more, the chosen distribution for the latent states was conditional on
the static parameters and marginal (Type = "Marg"
), and the latent
states distribution was the smoothed (Proc = "Smooth"
). A sample of
the static parameters was built as posts=fitbayes$samplepost. The
remaining arguments refer to type of lines, colors, and legends in the
graph. We can note that the failure rate is decreasing, as expected.
> posts = fitbayes$samplepost
> set.seed(1000)
> PlotF(Ytm ~ Event, data = data.frame(Ytm, Event), axisxdate = Break[1:17],
+ model = "PEM", Proc = "Smooth", Type = "Marg", posts = posts, typeline = 's',
+ col = c("black", "blue", "lightgrey"), xlab = "Time to Failure (Days)",
+ ylab = "Failure rate", xlim = c(0, 139), ylim = c(0, 0.008), lty = c(1, 2, 1),
+ lwd = c(2, 2, 2), cex = 0.68)
MLE | 95%Conf. Int. | Posterior Mean | 95% Cred. Int. | |
0.773 | [0.518; 1.000] | 0.752 | [0.503; 0.956] |
The Weibull software reliability model is used here for modeling the
times between
The state initial condition is
Ytm
Xtm
, data.frame(Ytm, Xtm)
, and
model = "SRWeibull"
. An initial static parameter vector is optional
(StaPar = c(0.9, 0.7, 0.01)
). The maximum likelihood estimation for
the “SYS1” data is performed using the R code below.
> library(NGSSEML)
> data(sys1_data)
> Ytm = sys1_data[,1]+0.00001
> Xtm = sys1_data[,2]
> Zt = NULL
> model = "SRWeibull"
## LabelParTheta = c("w", "alpha", "Beta1")
> StaPar = c(0.9, 0.7, 0.01)
> fit = ngssm.mle(Ytm ~ Xtm, data = data.frame(Ytm,Xtm), model = model,
+ StaPar = StaPar)
Bayesian estimation for the “SYS1” data is obtained via quadrature rules
with a grid of 15 points (pointss=15
) and a sample of 1000 draws of
the marginal posterior distribution (nsamplex=1000
). As in the
chassical fit, the following arguments have to be inserted in the
‘ngssm.bayes’ function: Ytm
Xtm
, data = data.frame(Ytm, Xtm)
,
and model = "SRWeibull"
. An initial static parameter vector is
optional (StaPar = c(0.98, 0.75, 0.02)
. The marginal priors for the
static parameter vector prw = c(1, 1)
, prnu = c(0.1, 0.1)
, prbetamu = c(0)
,
and prbetasigma = diag(100,1,1)
, respectively.
> library(NGSSEML)
> data(sys1_data)
> Ytm = sys1_data[,1]+0.00001
> Xtm = sys1_data[,2]
> model = "SRWeibull"
## LabelParTheta = c("w", "nu", "Beta")
> StaPar = c(0.98,0.75,0.02)
## points
> pointss = 15
## Bayesian Fit:
> fitbayes = ngssm.bayes(Ytm ~ Xtm, data = data.frame(Ytm,Xtm), model = model,
+ pz = NULL, StaPar = StaPar, prw = c(1, 1), prnu = c(0.1, 0.1), prbetamu = c(0),
+ prbetasigma = diag(100, 1, 1), pointss = pointss, nsamplex = 1000, postplot = TRUE,
+ contourplot = TRUE, verbose = TRUE)
Table 7 presents the point and interval estimates for
the Weibull SR model fitted to the “SYS1” data under the classical and
Bayesian approaches. The estimate of parameter
MLE | 95% Conf. Int. | Posterior Mean | 95% Cred. Int. | |
---|---|---|---|---|
0.999 | [0.996; 1.000] | 0.969 | [0.895; 0.994] | |
0.753 | [0.648; 0.857] | 0.754 | [0.655; 0.856] | |
0.023 | [0.018; 0.029] | 0.023 | [0.015; 0.031] |
The commands postplot = TRUE
and contourplot = TRUE
in the code of
the Bayesian approach provide Figure 7
that shows the marginal posterior distributions and contour plot of the
joint posterior distribution of the parameters
The arguments posts = posts
, Proc = "Smooth"
, Type = "Marg"
,
transf = 1/4
, and model = "SRWeibull"
establish a sample of the
marginal posterior of the static parameters, the smoothed distribution
for the states, integrating the static parameters out, a transformation
of the fourth root in the series, and the Weibull SR model are included
in the code below to build Figure 8. Figure
8 provides transformed smoothed estimates for
the mean of the data under the Weibull SR model. A transformation to
reduce the scale of the data was necessary in order to obtain a better
visualization of the graph. The fourth root was a suitable
transformation in this case. As can be seen in Figure
8, the estimates follow well the behavior of
the data.
> library(NGSSEML)
## Smoothing
> posts = fitbayes$samplepost
> set.seed(1000)
> PlotF(Ytm ~ Xtm, data = data.frame(Ytm, Xtm), plotYt = TRUE, transf = 1/4,
+ model = "SRWeibull", Proc = "Smooth", Type = "Marg", posts = posts, typeline = 'l',
+ col = c("black", "blue", "lightgrey"), xlab = "Number of Failures",
+ ylab = "Transformed Times", xlim = c(0, 139), ylim = c(-2, 10), lty = c(1, 2, 1),
+ lwd = c(1, 2, 1), cex = 0.68)
The package demonstrates how non-Gaussian state-space models (with exact marginal likelihood) can suitably be applied and employed in non-Gaussian time series and reliability analysis using R. The main contribution of the R-package NGSSEML is to provide an easy and fast code for classical and Bayesian estimations in non-Gaussian state space models with the exact marginal likelihood in the R programming language. Four important examples were presented for modeling time series and reliability data. The main advantages of the employed methodology are its analytical and computational simplicity, fast routines, and the ability to accommodate different types of distributions, even those which do not belong to the exponential family, combined with exact inference. Both classical and Bayesian approaches may be performed since the exact marginal likelihood for the static parameters is available. The package also provides R codes or functions for the prediction, filtering, and smoothing procedures of the latent states. Also, it is a platform where frequently used models, like the Normal and Poisson, can be included, as well as other volatility and reliability models, like the GED and Generalized Gamma.
Future work aims to increase the number of distributions (special cases) that can be specified for the observations in the package (mainly for time series data), to include dynamic linear models with the mean and variance varying over time that can conditionally be written in the NGSSEML form (Gamerman et al. 2013; Rego and Santos 2020), and to introduce models for modeling multivariate time series data (Aktekin et al. 2018, 2020).
Acknowledgements
T. R. Santos was supported by CNPq-Brazil and FAPEMIG Foundation/Grant 25909. G. C. Franco was supported by CNPq-Brazil and FAPEMIG Foundation. D. Gamerman was supported by CNPq-Brazil and FAPERJ-Brazil. We thank the editor and reviewers for their constructive critique and positive review of our paper.
StructTS, dlm, dlmodeler, SSsimple, MARSS, sspir, pomp, KFAS, bssm, dynamichazard, NGSSEML, coda
Bayesian, DifferentialEquations, Epidemiology, GraphicalModels, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Santos, et al., "NGSSEML: Non-Gaussian State Space with Exact Marginal Likelihood", The R Journal, 2021
BibTeX citation
@article{RJ-2021-087, author = {Santos, Thiago R. and Gamerman, Dani and Franco, Glaura C.}, title = {NGSSEML: Non-Gaussian State Space with Exact Marginal Likelihood}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {208-227} }