Linear time series models are commonly used in analyzing dependent data and in forecasting. On the other hand, real phenomena often exhibit nonlinear behavior and the observed data show nonlinear dynamics. This paper introduces the R package NTS that offers various computational tools and nonlinear models for analyzing nonlinear dependent data. The package fills the gaps of several outstanding R packages for nonlinear time series analysis. Specifically, the NTS package covers the implementation of threshold autoregressive (TAR) models, autoregressive conditional mean models with exogenous variables (ACMx), functional autoregressive models, and state-space models. Users can also evaluate and compare the performance of different models and select the best one for prediction. Furthermore, the package implements flexible and comprehensive sequential Monte Carlo methods (also known as particle filters) for modeling non-Gaussian or nonlinear processes. Several examples are used to demonstrate the capabilities of the NTS package.
Time series analysis investigates the dynamic dependence of data observed over time or in space. While linear time series analysis has been extensively studied in the literature with many software packages widely available, nonlinear time series analysis only attracts limited attention. Although there exist some software packages for analyzing nonlinear time series focusing on different sets of tools, there are still significant gaps in capability. The NTS (Tsay et al. 2020), a recent R package, provides a number of functions for simulating, analyzing, and predicting nonlinear time series data. The available models include univariate and multivariate TAR models, conditional intensity models, nonlinear state-space models, and functional time series models. The package also features various nonlinearity tests and sequential Monte Carlo (SMC) methods. While NTS package does not intend to be comprehensive, it fills the important missing parts of the existing packages, providing users valuable tools for analyzing dependent data with nonlinear dynamics. The package is now available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=NTS.
NTS incorporates the latest developments in statistical methods and algorithms for analyzing nonlinear time series data, and it makes the following contributions: (1) NTS offers various computational tools with a wide range of applications, and it fills the gaps left by the existing R functions. There are several R packages focusing on nonlinear time series. The nonlinearTseries (Garcia and Sawitzki 2020) package implements the methods based on information theory, the NlinTS (Youssef 2020) package introduces functions for causality detection and neural networks, and the nlts (Bjornstad 2018) package emphasizes nonparametric autoregression and tests. NTS, providing computational tools for TAR models, ACMx models, convolutional functional autoregressive (CFAR) models, and non-Gaussian and nonlinear state-space models, consists of some of the missing pieces in the current coverage, hence making a more completed toolkit for nonlinear time series analysis in R. Other well-known modern methods for nonlinear data such as smoothing, deep learning and random forest that have been implemented in packages sm (Bowman and Azzalini 2018), tree (Ripley 2019) and randomForest (Breiman et al. 2018) can be adopted for nonlinear time series analysis, even though they are mainly designed for independent data. Hence, they are not included in this package. (2) NTS provides complete solutions with superior performance for the nonlinear models entertained. For example, NTS implements estimation, prediction, model building and model comparison procedures for TAR models. It allows the threshold variable in the model to be a lag variable or an exogenous variable, while the TAR (Zhang and Nieto 2017) package, using Markov Chain Monte Carlo and Bayesian methods aiming to deal with missing values, assumes the threshold variable is exogenous. Threshold estimation methods in NTS, which perform recursive least squares or nested sub-sample searching, are more computationally efficient than the conditional least squares methods implemented in package tsDyn (Narzo et al. 2020). Furthermore, the threshold nonlinearity test proposed by Tsay (1989) in NTS is specifically designed for self-exciting TAR (SETAR) models while the existing R package nonlinearTseries just conducts general nonlinearity tests. In addition, NTS utilizes the out-of-sample forecasting to evaluate different TAR models to avoid overfitting, while other R packages such as tsDyn just compare TAR models based on AIC and residuals. (3) NTS offers additional options to existing packages with more flexibility. Specifically, NTS offers R functions to fit the ACMx model for time series analysis of count data, which allow the conditional distribution to be double Poisson, while the tscount (Liboschik et al. 2020) package uses the generalized linear models and only considers Poisson and negative binomial distributions. Another example is that NTS implements the estimation and prediction procedures of CFAR models proposed by Liu et al. (2016), which give an intuitive and direct interpretation for functional time series analysis and provide more flexibility for estimation to deal with irregular observation locations compared to functional autoregressive models developed by Bosq (2000) introduced in the ftsa (Hyndman and Shang 2020) package. (4) NTS provides easy access to SMC methods with various options for statistical inference. It contains different R functions which can be easily implemented for filtering and smoothing and are much more user-friendly, while the SMC (Goswami 2011) package only writes a generic function for SMC and requires more effort from users.
The goal of this paper is to highlight the main functions of the NTS package. In the paper, we first consider different models for nonlinear time series analysis, and provide an overview of the available functions for parameter estimation, prediction and nonlinearity tests in the NTS package. Then we discuss the functions for SMC methods and demonstrate their applications via an example. Conclusions are given at the end.
TAR models are a piecewise extension of the autoregressive (AR) model proposed by (Tong 1978). It has been widely used in many scientific fields, such as economics (Tong and Lim 1980; Tiao and Tsay 1989), finance (Domian and Louton 1997; Narayan 2006), among others (Chen 1995). The models are characterized by partitioning the Euclidean space into non-overlapping regimes via a threshold variable and fitting a linear AR model in each regime (Li and Tong 2016). The partition is by various thresholds in the domain of the threshold variable.
Let
In this subsection, we introduce three algorithms for estimation of two-regime TAR models.
The two-regime TAR model can be rewritten as
Define
(Conditional) least squares: For each fixed threshold candidate
Recursive least squares method provides an efficient way to update the
least squares solution with new observations, and is much less
computationally expensive than the ordinary least squares method. When
we traverse all possible thresholds and calculate
Let
Here is the algorithm of recursive least squares for TAR model estimation:
When
When
For
With
NeSS algorithm proposed by (Li and Tong 2016) produces a much faster way to search threshold candidates, and reduce the computational complexity dramatically.
(Li and Tong 2016) shows that there exists a positive constant
NeSS algorithm:
Get the initial feasible set
Obtain the 25th, 50th, and 75th percentiles of the feasible set, and
define them as
If
If
Otherwise, the feasible set is updated as
Repeat Steps 1-2 until the number of elements in the new feasible
set is less than a pre-specified positive integer
Minimize
Comparing to the standard search algorithm which traverses all the
threshold candidates, NeSS algorithm reduces the number of least squares
operations from
In the R package NTS, the function uTAR
implements recursive least
squares estimation or the NeSS algorithm for TAR model estimation. The
two methods both have lower computational complexity than the existing R
function setar
designed for SETAR model estimation in the tsDyn
package, which performs least squares estimation and adopts a single
grid search algorithm.
To illustrate, we use the following data generating process to compare
the performance of the three methodsuTAR
both take shorter time than setar
when sample
size is large. It is also seen that when sample size is large, NeSS
algorithm is the fastest, but when the sample size is relatively small,
the recursive least squares method is the fastest.
Function | Sample size 200 | Sample size 2000 | ||
Elapsed time | MSE | Elapsed time | MSE | |
uTAR with recursive least squares |
2.802s | 0.0017 | 44.020s | 2.08e-05 |
uTAR with NeSS algorithm |
20.360s | 0.0017 | 25.878s | 2.08e-05 |
setar |
7.147s | 0.0017 | 286.226s | 2.08e-05 |
Besides threshold value estimation for univariate time series, the NTS package implements data generating, forecasting, model checking, and model comparison procedures for both univariate and multivariate time series into user-friendly computational tools. Table 2 lists these functions of NTS related to TAR models. In the following we will demonstrate the usage of functions for univariate time series through the data generating process in Model ((7)).
Function | Description | |
---|---|---|
Univariate TAR model | uTAR.sim |
Generate a univariate SETAR process for up to 3 regimes |
uTAR |
Estimate univariate two-regime TAR models including threshold | |
uTAR.est |
Estimate multiple regimes TAR models with known threshold(s) | |
uTAR.pred |
Predict a fitted univariate TAR model | |
thr.test |
Test for threshold nonlinearity of a scalar series | |
Multivariate TAR model | mTAR.sim |
Generate a multivariate two-regime SETAR process |
mTAR |
Estimate multivariate two-regime TAR models including threshold | |
mTAR.est |
Estimate multivariate multiple-regime TAR models | |
ref.mTAR |
Refine a fitted multivariate two-regime TAR model | |
mTAR.pred |
Predict a fitted multivariate TAR model |
The function uTAR.sim
generates data from a given univariate SETAR
model for up to three regimes with following arguments: nob
is the
sample size of the generated data, arorder
specifies the AR orders for
different regimes, phi
is a real matrix containing the AR coefficients
with one row for a regime, d
is the time delay, cnst
is a vector of
constant terms for the regimes, and sigma
is a vector containing the
standard deviations of the innovation process of the regimes. It also
allows users to customize the burn-in period with option ini
. The
function returns a list of components including the generated data from
the specified TAR model (series
) and the innovation series (at
).
We simulate the data generating process in Model ((7)) with the following code. Figure 1 shows the time series plot of the first 200 observations of the simulated data.
R> set.seed(1687)
R> y <- uTAR.sim(nob = 2000, arorder = c(2,2), phi = t(matrix(c(-0.3, 0.5, 0.6,
+ -0.3), 2, 2)), d = 2, thr = 0.2, cnst = c(1, -1), sigma = c(1, 1))
Estimation of the threshold value of the two-regime SETAR process can be
done via the function uTAR
as illustrated below:
R> thr.est<- uTAR(y = y$series, p1 = 2, p2 = 2, d = 2, thrQ = c(0, 1), Trim = c(0.1,
+ 0.9), include.mean = T, method = "NeSS", k0 = 50)
Estimated Threshold: 0.1951103
Regime 1:
Estimate Std. Error t value Pr(>|t|)
X1 1.0356009 0.04902797 21.12265 8.946275e-85
X2 -0.3017810 0.01581242 -19.08506 2.383743e-71
X3 0.4890477 0.02707987 18.05945 7.230880e-65
nob1 & sigma1: 1236 1.017973
Regime 2:
Estimate Std. Error t value Pr(>|t|)
X1 -1.1352678 0.07222915 -15.717585 2.107275e-48
X2 0.5560001 0.03177212 17.499622 7.360494e-58
X3 -0.2122922 0.04641671 -4.573616 5.596852e-06
nob2 & sigma2: 762 1.034592
Overall MLE of sigma: 1.024343
Overall AIC: 101.8515
R> thr.est <- uTAR(y = y$series, p1 = 2, p2 = 2, d = 2, thrQ = c(0,1), Trim = c(0.1,
+ 0.9), include.mean = T, method = "RLS")
Estimated Threshold: 0.1951103
Regime 1:
Estimate Std. Error t value Pr(>|t|)
X1 1.0356009 0.04902797 21.12265 8.946275e-85
X2 -0.3017810 0.01581242 -19.08506 2.383743e-71
X3 0.4890477 0.02707987 18.05945 7.230880e-65
nob1 & sigma1: 1236 1.017973
Regime 2:
Estimate Std. Error t value Pr(>|t|)
X1 -1.1352678 0.07222915 -15.717585 2.107275e-48
X2 0.5560001 0.03177212 17.499622 7.360494e-58
X3 -0.2122922 0.04641671 -4.573616 5.596852e-06
nob2 & sigma2: 762 1.034592
Overall MLE of sigma: 1.024343
Overall AIC: 101.8515
uTAR
has the following arguments: y
is a vector of observed time
seres, p1
and p2
are the AR order of regime 1 and regime 2,
respectively, d
is the delay, and thrV
contains the external
threshold variable y
. For SETAR models, thrV
is not needed and should be set to NULL.
thrQ
determines the lower and upper quantiles to search for threshold
value. Trim
defines the lower and upper trimmings to control the
minimum sample size in each regime and determine
include.mean
is a
logical value for including the constant term in each linear model.
method
decides the way to search the threshold value, and there are
two choices, "RLS" for recursive least squares and "NeSS" for NeSS
algorithm. k0
is only used when NeSS algorithm is selected to controls
the maximum sub-sample size.
From the output, the estimated threshold value is
Here we provide an incomplete list of the returned values of the
function uTAR
:
residuals
: estimated innovations or residuals series.coefs
: a 2-by-(sigma
: estimated covariances of the innovation process in regime 1
and regime 2.thr
: estimated threshold value.Estimation of a multiple-regime TAR model with pre-specified threshold
values can be done by the function uTAR.est
.
R> est <- uTAR.est(y = y$series, arorder = c(2, 2), thr = thr.est$thr, d = 2,
+ output = FALSE)
Here aroder
is a row vector of positive integers containing the AR
orders of all the regimes. thr
collects the threshold values whose
length should be the number of regimes minus 1. output
is a logical
value for printing out the estimation results with default being TRUE.
The function uTAR.est
returns the following components: coefs
is a
matrix with sigma
contains the estimated innovation variances for
different regimes, residuals
collects the estimated innovations, and
sresi
shows the standardized residuals.
The following R code provides one-step-ahead prediction with function
uTAR.pred
.
R> set.seed(12)
R> pred <- uTAR.pred(model = est, orig = 2000, h = 1, iteration = 100, ci = 0.95,
+ output = TRUE)
Forecast origin: 2000
Predictions: 1-step to 1 -step
step forecast
[1,] 1 -1.429635
Pointwise 95 % confident intervals
step Lowb Uppb
int 1 -2.991667 0.6531542
The output above shows that the one-step ahead prediction for uTAR.pred
provide users
the flexibility to customize the forecasting origin with orig
,
forecast horizon with h
, number of iterations with iterations
, and
confidence level with ci
. The function uTAR.pred
returns the
prediction with pred
.
The R function thr.test
in the NTS package implements the thr.test
.
R> thr.test(y$series, p = 2, d = 2, ini = 40, include.mean = T)
SETAR model is entertained
Threshold nonlinearity test for (p,d): 2 2
F-ratio and p-value: 213.0101 1.511847e-119
ini
is the initial number of data to start the recursive least square
estimation. The output shows that y$series
.
Back-testing can be used to evaluate the forecasting performance of a
model and to conduct model comparison between different models.
Back-testing for a univariate SETAR model is implemented through the
function backTAR
with syntax:
R> backTAR(model, orig, h = 1, iter = 3000)
where model
is an object returned by uTAR
or uTAR.est
, h
is the
forecast horizon, and iter
controls the number of simulation
iterations in prediction.
The function returns the model, out-of-sample rolling prediction errors and predicted states. It also provides information for model comparison. The following example shows the out-of-sample forecasting performance of SETAR models with delay 2 and 1, respectively. It shows that the root MSE, mean absolute error, and biases of the model with delay 2 are all smaller than those of the model with delay 1. Hence, as expected, the model with delay 2 is preferred.
R> set.seed(11)
R> backTAR(est, 50, 1, 3000)
Starting forecast origin: 50
1-step to 1 -step out-sample forecasts
RMSE: 1.02828
MAE: 0.8172728
Bias: -0.001337478
Performance based on the regime of forecast origins:
Summary Statistics when forecast origins are in State: 1
Number of forecasts used: 1204
RMSEj: 1.029292
MAEj: 0.8172963
Biasj: 0.00259177
Summary Statistics when forecast origins are in State: 2
Number of forecasts used: 746
RMSEj: 1.026645
MAEj: 0.817235
Biasj: -0.007679051
R> thr.est2 <- uTAR(y = y$series, p1 = 2, p2 = 2, d = 1, thrQ = c(0, 1),
+ Trim=c(0.1, 0.9), include.mean = T, method = "RLS")
R> est2 <- uTAR.est(y = y$series, arorder = c(2, 2), thr = thr.est2$thr, d = 1)
R> set.seed(11)
R> backTAR(est2, 50, 1, 3000)
Starting forecast origin: 50
1-step to 1 -step out-sample forecasts
RMSE: 1.38731
MAE: 1.105443
Bias: -0.006635381
Performance based on the regime of forecast origins:
Summary Statistics when forecast origins are in State: 1
Number of forecasts used: 1112
RMSEj: 1.360347
MAEj: 1.090989
Biasj: 0.2462278
Summary Statistics when forecast origins are in State: 2
Number of forecasts used: 838
RMSEj: 1.4223
MAEj: 1.124622
Biasj: -0.3421769
The usage of functions for multivariate two-regime TAR models listed in
Table 2, including mTAR.sim
, mTAR
, mTAR.pred
, is
similar to that of the univariate counterpart functions discussed
before. The only exception is that these multivariate functions take
different arguments to define the vector autoregressive(VAR)
coefficients:
phi1, phi2
: VAR coefficient matrices of regime 1 and regime 2.sigma1, sigma2
: innovation covariance matrices of regime 1 and
regime 2.c1, c2
: constant vectors of regime 1 and regime 2.delay
: two elements The function mTAR
conducts the nested sub-sample search algorithm and
provides different choices of criterion for threshold selection with the
option score
, namely (AIC, det(RSS)). It has less computational cost,
but only applies to two-regime models. mTAR.est
can handle multiple
regimes. They both return a list of components with the estimated VAR
coefficients in beta
, estimated innovation covariance matrices in
sigma
, and estimated innovations in residuals
.
Autoregressive conditional mean (ACM) models are designed for time
series of count data, starting with the autoregressive conditional
Poisson models, and various extensions of ACM models were investigated.
The NTS includes a function ACMx
for the estimation of ACMx models.
Let
The estimation of ACMx models is implemented via the function ACMx
with syntax:
R> ACMx(y, order = c(1, 1), X = NULL, cond.dist = "po", ini = NULL)
where y
is the series of count data, X
is the matrix of exogenous
variables, order
specifies the values for cond.dist
determines the conditional distribution with options: "po" for
Poisson, "nb" for negative binomial, and "dp" for double Poisson,
and ini
collects initial parameter estimates designed for use with
"nb" or "dp".
We illustrate the function ACMx
with an example below:
R> set.seed(12)
R> x <- rnorm(1000)*0.1
R> y <- matrix(0, 1000, 1)
R> y[1] <- 2
R> lambda <- matrix(0, 1000, 1)
R> for (i in 2:1000){
+ lambda[i] <- 2 + 0.2*y[i-1]/exp(x[i-1]) + 0.5*lambda[i-1]
+ set.seed(i)
+ y[i] <- rpois(1, exp(x[i]) * lambda[i])
+ }
R> ACMx(y, order = c(1, 1), x, "po")
Initial estimates: 1.056732 1.738874 0.05 0.5
loB: -1.056732 1e-06 1e-06 1e-06
upB: 3.170195 19.12762 0.5 0.999999
Maximized log-likehood: -2373.08
Coefficient(s):
Estimate Std. Error t value Pr(>|t|)
beta 1.0562836 0.1274853 8.28553 2.2204e-16 ***
omega 2.6696378 0.5569954 4.79293 1.6437e-06 ***
alpha 0.1579050 0.0265997 5.93634 2.9145e-09 ***
gamma 0.4427157 0.0913361 4.84711 1.2528e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here a time series following the ACMx model with Poisson conditional
distribution, order (1,1),
Functional time series analysis has received much attention since the pioneering work of (Bosq 2000), and has been widely applied in many fields, including environmental science (Hormann and Kokoszka 2010), social science (Hyndman and Shang 2009), and finance (Diebold and Li 2006; Horváth et al. 2013). (Liu et al. 2016) proposed a new class of models called the CFAR models, which has an intuitive and direct interpretation of the dynamics of a stochastic process. The NTS encompasses functions to implement the method proposed by (Liu et al. 2016).
Before presenting these R functions, we briefly introduce the CFAR model
and its estimation procedure. A sequence of square integrable random
functions
In practice,
With the above approximation, the B-spline coefficients
The convolutional functions are estimated by
In the NTS package, model specification and estimation of a CFAR
process can be carried out by the functions F_test_cfar
and est_cfar
with the syntax:
R> F_test_cfar(f, p.max = 6, df_b = 10, grid = 1000)
R> est_cfar(f, p = 3, df_b = 10, grid = 1000)
The observed functional time series is stored in f
, which is a
p
specifies the CFAR order. df_b
determines the degrees
of freedom grid
is the number
of grid points used to construct the functional time series and noise
process.
The function F_test_cfar
returns the test statistics and the
p.max
. The
function est_cfar
returns the following values: phi_coef
collects
the estimated spline coefficients phi_func
contains the estimated convolutional function
values which is a rho
is
the estimated sigma
is estimated
standard deviation of the noise process, respectively.
The function g_cfar
in the NTS package generates a CFAR process with
argument tmax
being the length of time, rho
is the parameter for the
O-U process, sigma
is the standard deviation of the O-U process,
phi_list
is the convolutional function(s), and ini
is the burn-in
period. It returns a list with two components. One is cfar
, a
tmax
-by-(grid
+1) matrix following the CFAR(p) model, and the other
one epsilon
is the innovation at time tmax
. Function p_cfar
provides the forecasts of a CFAR model with argument model
as a result
of an est_cfar
fit and argument m
as the forecasting horizon.
Let us consider a CFAR(1) process with the following convolutional
coefficient function
R> phi_func <- function(x){
+ return(dnorm(x, mean = 0, sd = 0.1))
+ }
R> t <- 1000; N <- 50
R> x <- g_cfar(t, rho = 5, phi_func, sigma = 1, ini = 100)
R> f <- x$cfar[, seq(1, 1001, 1000/N)]
R> F_test_cfar(f, p.max = 2, df_b = 10, grid = 1000)
Test and p-value of Order 0 vs Order 1:
[1] 1368.231 0.000
Test and p-value of Order 1 vs Order 2 :
[1] 0.6848113 0.7544222
R> model <- est_cfar(f, p = 1, df_b = 10, grid = 1000)
R> print(c(model$rho, model$sigma))
[1] 4.940534 1.005315
R> pred <- p_cfar(model, f, m = 3)
From the output, the est_cfar
performs well.
F_test_cfarh
and est_cfarh
in NTS can deal with heteroscedasticity
and irregular observation locations, while the existing R package ftsa
designed for functional time series assumes that the functional data are
observed at regular locations. The two functions come with following
arguments: weight
is the heteroscedasticity weight function of the
noise process with grid
+1 elements, num_obs
is a x_pos
is a
R> num_obs <- rep(N+1, t); x_pos <- matrix(rep(seq(0, 1, 1/N), each = t), t, N+1);
R> weight0 <- function(x){return(rep(1, length(x)))}
R> F_test_cfarh(f, weight0, p.max = 2, df_b = 10, grid = 1000, num_obs, x_pos)
R> modelh <- est_cfarh(f, weight0, p = 1, df_b = 10, grid = 1000, num_obs, x_pos)
It is challenging to derive analytic solutions for filtering or
smoothing of nonlinear or non-Gaussian state-space models. The SMC
approach fully utilizes the dynamic nature of the model, and is an
effective way to solve such complex problems (Tsay and Chen 2018). Consider the
state-space model:
In general, there are four main statistical inference objectives associated with a state-space model:
Filtering: obtain the marginal posterior distribution of the current
state
Prediction: obtain the marginal posterior distribution of the future
state given the current available information, that is,
Smoothing: obtain the posterior distribution of the state at the
time
Likelihood and parameter estimation. SMC uses a set of weighted
samples
SMC is a recursive procedure with three components:
The selection of the propagation trial distribution
Algorithm: Full information propagation step
At time
Draw
Compute the incremental weight
Because using full information propagation allows for more efficient estimation procedure with Rao-Blackwellization, in NTS we provide a specific function for it, different from the more general function using any user designed propagation function.
It is shown that the variance of the weight distributions stochastically increases over time. Continuing to propagate the samples with small weights forward is a waste of computational resources since inference is based on weighted average of the samples. One solution is to use resampling schemes to duplicate the importance samples and to eliminate the ones with low weights. Hence resampling steps are essential in SMC implementations. Another issue to consider is how to make inference as efficient as possible. Rao-Blackwellization is one of the effective tools that can be used.
Smoothing is another important inference to make when we analyze data
with state-space models. Delayed estimation (e. g. , making inference on
Algorithm: Weight marginalization SMC smoother
Let
Calculate
Table 3 lists functions in NTS that implement the aforementioned SMC procedures. Compared to the existing R package SMC coming with one generic function, NTS provides various functions for statistical inference and are much more user-friendly.
Usage | Function | Description |
---|---|---|
Generic function | SMC |
SMC method with delay but not using a full information propagation step |
SMC.Smooth |
SMC smoothing method | |
SMC.Full |
SMC method using a full information propagation step | |
SMC.Full.RB |
SMC method using a full information propagation step with | |
Rao-Blackwellization estimate and no delay |
The SMC
function can be called by:
R> SMC(Sstep, nobs, yy, mm, par, xx.init, xdim, ydim, resample.sch,
+ delay = 0, funH = identity)
The following arguments need to be specified for SMC
.
Sstep
: A function that performs one step propagation using a
proposal distribution. Its input variables include
(mm, xx, logww, yyy, par, xdim, ydim
), where xx
and logww
are
the prior iteration samples and their corresponding log weights, and
yyy
is the observation at current time step. It returns a list
that contains xx
(the sample logww
(their
corresponding log weights).nobs
: the number of observations, yy
: the observations with ydim
rows.mm
: the Monte Carlo sample size par
: a list of parameter values to pass to Sstep
.xx.init
: the initial samples of xdim, ydim
: the dimension of the state variable resample.sch
: a binary vector of length nobs
, reflecting the
resampling schedule.resample.sch[i]=1
indicates resample should be carried out at step
delay
: the maximum delay lag for delayed wighting estimation.
Default is zero.funH
: a user supplied function The function returns xhat
, an array with dimensions
(xdim,nobs,delay+1
) and the scaled log-likelihood value loglike
. The
functions SMC.Smooth
, SMC.Full
and SMC.Full.RB
have similar inputs
and outputs, except that SMC.Smooth
needs another input function for
the backward smoothing step and funH
is a function for estimating
Here is an example demonstrating how to implement SMC methods using the NTS package. Passive sonar is often used in military surveillance systems to track a target and to reduce the chance to be detected by the target. Without using an active sonar, it collects the signals generated by the motion of the target, and thus only the direction (or bearing) of the target is observed with error. With multiple detectors, the location of the target can be identified (Peach 1995; Kronhamn 1998; Arulampalam et al. 2004; Tsay and Chen 2018).
Suppose the target is moving in a two-dimensional plane and there are
two stationary detectors located on the same plane at
The unobserved state variable is
In this example, the sensors are placed as
We use the following code to generate data.
R> simPassiveSonar <- function(nn = 300, q, r, W, V, s2, start, seed){
+ set.seed(seed)
+ x <- matrix(nrow = 4, ncol = nn)
+ y <- matrix(nrow = 2, ncol = nn)
+ for(ii in 1:nn){
+ if(ii == 1) x[, ii] <- start
+ if(ii > 1) x[, ii] <- H%*%x[, ii - 1] + W%*%rnorm(2)
+ y[1, ii] <- atan(x[2, ii]/x[1, ii])
+ y[2, ii] <- atan(x[2, ii]/(x[1, ii] - s2))
+ y[, ii] <- (y[, ii] + V%*%rnorm(2) + 0.5*pi)%%pi -0.5*pi
+ }
+ return(list(xx = x, yy = y, H = H, W = W, V = V))
+ }
R> s2 <- 20; nobs <- 300
R> q <- c(0.03, 0.03); r <- c(0.02, 0.02)
R> start <- c(10, 10, 0.01, 0.01)
R> H <- c(1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,1)
R> H <- matrix(H, ncol = 4, nrow = 4, byrow = T)
R> W <- c(0.5*q[1], 0, 0, 0.5*q[2], q[1], 0, 0, q[2])
R> W <- matrix(W, ncol = 2, nrow = 4, byrow = T)
R> V <- diag(r)
R> simu_out <- simPassiveSonar(nobs, q, r, W, V, s2, start, seed = 2000)
R> plot(simu_out$xx[1, ], simu_out$xx[2, ], xlab = 'x', ylab = 'y', type = "l")
R> points(c(0, 20), c(0, 0), pch = 15, cex = 2)
Figure 3 shows a simulated trajectory for
We consider using the state equation as the proposal distribution.
Specifically, given
The following statements are user generated functions for SMC implementation.
R> SISstep.Sonar <- function(mm, xx, logww, yy, par, xdim = 1, ydim = 1){
+ H <- par$H; W <- par$W; V <- par$V; s2 <- par$s2;
+ xx <- H%*%xx + W%*%matrix(rnorm(2*mm), nrow = 2, ncol = mm)
+ y1 <- atan(xx[2, ]/xx[1, ])
+ y2 <- atan(xx[2,]/(xx[1,] - s2))
+ res1 <- (yy[1] - y1 + 0.5*pi)%%pi - 0.5*pi
+ res2 <- (yy[2] - y2 + 0.5*pi)%%pi - 0.5*pi
+ uu <- -res1**2/2/V[1, 1]**2 - res2**2/2/V[2, 2]**2
+ logww <- logww + uu
+ return(list(xx = xx, logww = logww))
+ }
R> SISstep.Smooth.Sonar <- function(mm, xxt, xxt1, ww, vv, par){
+ H <- par$H; W <- par$W;
+ uu <- 1:mm
+ aa <- 1:mm
+ xxt1p <- H%*%xxt
+ for(i in 1:mm){
+ res1 <- (xxt1[1, i] - xxt1p[1, ])/W[1,1]
+ res2 <- (xxt1[2, i] - xxt1p[2, ])/W[2, 2]
+ aa[i] <- sum(exp(-0.5*res1**2 - 0.5*res2**2)*ww)
+ }
+ for(j in 1:mm){
+ res1 <- (xxt1[1, ] - xxt1p[1, j])/W[1, 1]
+ res2 <- (xxt1[2, ] - xxt1p[2, j])/W[2, 2]
+ uu[j] <- sum(exp(-0.5*res1**2 - 0.5*res2**2)*vv/aa)
+ }
+ vv <- ww*uu
+ return(list(vv = vv))
+ }
Now we are ready to run the Monte Carlo sample with size
R> mm <- 100000
R> set.seed(1)
R> resample.sch <- rep(1,nobs)
R> xdim <- 4; ydim <- 2
R> mu0 <- start; SS0 <- diag(c(1, 1, 1, 1))*0.01
R> xx.init <- mu0 + SS0%*%matrix(rnorm(mm*4), nrow = 4, ncol = mm)
R> par <- list(H = H, W = W, V = V, s2 = s2)
R> delay <- 10
R> out <- SMC(SISstep.Sonar, nobs, yy, mm, par, xx.init, xdim, ydim, resample.sch, delay)
R> tt <- 100:nobs
R> plot(simu_out$xx[1, tt], simu_out$xx[2, tt], xlab = 'x', ylab = 'y')
R> for(dd in c(1, 6, 11)){
+ tt <- 100:(nobs - dd)
+ lines(out$xhat[1, tt, dd], out$xhat[2, tt, dd], lty = 23 - 2*dd, lwd = 1)
+ }
> legend(25, 22.5, legend = c("delay 0", "delay 5", "delay 10"), lty = c(21, 11, 1))
The top left panel in Figure 4 shows the delayed estimation
using the delay weighting method with delay
SMC smoothing can be implemented with the following R code. The top right panel in Figure 4 plots the smoothing results, and it shows that the SMC smoothing function performs very well.
R> set.seed(1)
R> mm <- 5000
R> par <- list(H = H, W = W, V = V, s2 = s2)
R> xx.init <- mu0 + SS0%*%matrix(rnorm(mm*4), nrow = 4, ncol = mm)
R> out.s5K <- SMC.Smooth(SISstep.Sonar, SISstep.Smooth.Sonar, nobs, yy, mm, par,
+ xx.init, xdim, ydim, resample.sch)
R> plot(simu_out$xx[1, tt], simu_out$xx[2, tt], xlab = 'x', ylab = 'y')
R> lines(out.s5K$xhat[1, tt], out.s5K$xhat[2, tt], lty = 1, lwd = 2)
+ legend(17, 19.5, legend = c("SMC Smoother"), lty = 1, lwd = 2)
The paper introduces the R package NTS which offers a broad collection of functions for the analysis of nonlinear time series data. We briefly review various nonlinear time series models, including TAR models, ACMx models, CFAR models, and state-space models. The associated estimation, identification, and forecasting procedures are discussed. The NTS package provides computational tools to fit these models, to evaluate their performance, and to provide predictions. Furthermore, the functions can be used, extended, and modified within the package to analyze larger univariate/multivariate, Gaussian/non-Gaussian time series. These features enable users to carry out a comprehensive and complex analysis of time series without the constraints from software availability.
Chen’s research is supported in part by National Science Foundation grants DMS-1503409, DMS-1737857, IIS-1741390 and CCF-1934924. Tsay’s research is supported in part by the Booth School of Business, University of Chicago.
NTS, nonlinearTseries, NlinTS, nlts, sm, tree, randomForest, TAR, tsDyn, tscount, ftsa, SMC
Econometrics, Environmetrics, Finance, FunctionalData, MachineLearning, MissingData, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Liu, et al., "NTS: An R Package for Nonlinear Time Series Analysis", The R Journal, 2021
BibTeX citation
@article{RJ-2021-016, author = {Liu, Xialu and Chen, Rong and Tsay, Ruey}, title = {NTS: An R Package for Nonlinear Time Series Analysis}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {12}, issue = {2}, issn = {2073-4859}, pages = {293-310} }