Chaos theory has been hailed as a revolution in thinking and has attracted ever-increasing attention from scientists across diverse disciplines. Chaotic systems are non-linear deterministic dynamic systems that can exhibit erratic and apparently random behaviour. A relevant field within chaos theory is the detection of chaotic behaviour from empirical time-series data. One of the main features of chaos is the well-known sensitivity to initial conditions. Methods and techniques for testing the hypothesis of chaos try to quantify this property by estimating the so-called Lyapunov exponents. This paper describes the main methods for estimating the Lyapunov exponent from time-series data and, at the same time, presents the DChaos library. R users may compute the delayed-coordinate embedding vectors from time-series data, estimate the best-fitted neural network model from those embedding vectors, and calculate analytically the partial derivatives of the chosen neural net model. They can also obtain the neural net estimator of the Lyapunov exponent from the previously computed partial derivatives through two different procedures and four blocking (subsampling) schemes. To sum up, the DChaos package allows R users to test robustly the hypothesis of chaos in order to determine whether the data-generating process behind a time series behaves chaotically or not. The package’s functionality is illustrated by examples.
According to the literature, countless techniques have been developed
and used to estimate the complexity of time series data; for a review
see, e.g., (Faggini 2014), (Bradley and Kantz 2015), (Tang et al. 2015). We have
focused on methods derived from chaos theory, which estimate the
complexity of a dataset by exploring the structure of its
attractor. In particular, we have been interested in the so-called
Lyapunov exponents as a quantitative measure of the sensitivity to
initial conditions.
This ergodic measure can be defined as follows. Let x_{t+1} = f(x_t) be a
dynamical system and x_0 an initial condition. The largest Lyapunov exponent
is defined as λ = lim_{n→∞} (1/n) Σ_{t=0}^{n-1} log |f′(x_t)|.
This relation indicates the average rate of divergence (λ > 0) or convergence (λ < 0) of
an orbit starting from initial conditions arbitrarily close to x_0.
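As a minimal illustration of this definition (a sketch, not part of the DChaos package), the average rate of divergence can be computed directly for the logistic map x_{t+1} = 4x_t(1 - x_t), whose derivative f′(x) = 4 - 8x is known analytically:

```r
# Sketch: Lyapunov exponent of the logistic map x[t+1] = 4*x[t]*(1 - x[t])
# computed as the orbit average of log|f'(x)|, with f'(x) = 4 - 8*x.
set.seed(34)
x <- numeric(10000)
x[1] <- runif(1)
for (t in 1:(length(x) - 1)) x[t + 1] <- 4 * x[t] * (1 - x[t])
# Average exponential rate of divergence of nearby orbits
lambda <- mean(log(abs(4 - 8 * x)))
lambda  # close to the theoretical value log(2) = 0.6931...
```

A positive value confirms the initial-value sensitivity that the estimators discussed below try to recover when f is unknown.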
There are some R packages recently developed related to nonlinear time
series analysis and chaos. For instance, the tseriesChaos package
written by (Narzo 2019), the nonlinearTseries package
proposed by (Garcia 2019), and the fNonlinear package provided
by (Wuertz et al. 2017). These R packages are based on ideas inspired by the
time series analysis (TISEAN) project suggested by (Hegger et al. 1999). All of
them implement the algorithm written by (Kantz 1994) using exclusively
the direct method for estimating the Lyapunov exponent. The main
drawbacks of the direct methods are the following:
Function | Description |
---|---|
embedding | Provides the delayed-coordinate embedding vectors backwards |
gauss.sim | Simulates time series data from the Gauss map |
henon.sim | Simulates time series data from the Hénon system |
jacobian.net | Computes the partial derivatives from the best-fitted neural net model |
logistic.sim | Simulates time series data from the Logistic map |
lyapunov | Estimates the Lyapunov exponent through several methods |
lyapunov.max | Estimates the largest Lyapunov exponent by the Norma-2 procedure |
lyapunov.spec | Estimates the Lyapunov exponent spectrum by the QR decomposition |
netfit | Fits any standard feed-forward neural net model from time series data |
rossler.sim | Simulates time series data from the Rössler system |
summary.lyapunov | Summary method for a lyapunov object |
w0.net | Estimates the initial parameter vector of the neural net model |
In this paper, we present the DChaos package written by (Sandubete and Escot 2021), which provides the following contributions. First, as far as we know, the DChaos package is the first R library that provides the Jacobian indirect methods proposed by (Eckmann and Ruelle 1985) and (Gençay and Dechert 1992) for estimating the Lyapunov exponents. These Jacobian indirect methods solve all the drawbacks of the direct methods. In particular, this package has focused on the neural net approach following (McCaffrey et al. 1992) and (Nychka et al. 1992), approximating the unknown non-linear system through a feed-forward single hidden layer neural network. These neural net methods provide a good fit of any unknown linear or non-linear model. They also have the advantage of allowing the Jacobian needed for estimating the Lyapunov exponents to be derived analytically rather than numerically.
Second, the DChaos package allows the use of full sample estimation, but it also provides three different blocking methods for estimating the Lyapunov exponents following (McCaffrey et al. 1992) and (Shintani and Linton 2004). One of them is a new proposal based on the bootstrap method. The results provided by the three blocking methods improve on those of the full sample and are very similar to each other, although, on average, the bootstrap method gives better results.
Third, the DChaos package provides new algorithms implementing the formal test proposed by (Shintani and Linton 2004), who provided a statistical framework for testing the hypothesis of chaos based on the theoretical asymptotic properties of the neural net estimator and its variance. R users may make statistical inferences about the significance of the Lyapunov exponents and test the hypothesis of chaos (eq. (1)). Recall that the estimator obtained from direct methods has no known asymptotic distribution, so those methods offer no way to make statistical inferences about chaos. Fourth, all methods for estimating the Lyapunov exponent from time series data rely on a prior state space reconstruction procedure. Note that a key point in creating a suitable reconstruction of the state space is to fix criteria in order to estimate the embedding parameters. Researchers usually estimate them using heuristic approaches based on prescriptions proposed by, e.g., (Abarbanel 1996) or (Kantz and Schreiber 2004). The main drawbacks of these heuristic approaches are the following: they are not intrinsically statistical; their results are not robust; they lead to estimators whose properties are unknown or largely unexplored; and they do not take into account the results of any model fit. Although the DChaos package also allows the use of heuristic methods, we have implemented some alternative, consistent statistical methods based on model selection criteria following (Chan and Tong 2001), which solve those disadvantages.
Fifth, instead of considering only time-series data with uniform time-frequency, e.g., 1-month, 1-day, 1-hour, 30-min, 5-min, 1-min, and so on, as the R packages mentioned above do, the DChaos package also allows the use of time series data with non-uniform time-frequency. As far as we know, the DChaos package is the first R library that provides the reconstruction of the pseudo-state space from time series data with non-uniform time-frequency (which are not equally spaced in time). In this sense, this package provides a new algorithm implementing the theorem proposed by (Huke and Broomhead 2007), who extended the reconstruction theorem to the case when the dynamical system is sampled non-uniformly in time. Their results have allowed us to get the non-uniform delayed-coordinate embedding vectors from time series data with non-uniform time-frequency, e.g., tick-by-tick financial time series.
To sum up, the DChaos package provides a novel R interface for researchers interested in the field of chaotic dynamic systems and non-linear time series analysis, and for professors (and students) who teach (or take) courses related to those topics. There are 12 functions available in the DChaos package; see Table 1. We will describe all of them as we explain the theoretical procedure. The rest of the paper is organized as follows. Section 2 presents the state space reconstruction procedure from time series data. Section 3 gives an overview of the main estimation methods of the Lyapunov exponents and provides some technical details about the DChaos package. Section 4 illustrates the package’s functionality with some examples. Section 5 contains some concluding remarks.
From now on, we assume that the true data-generating process is unknown.
Hence, we do not have the advantage of observing directly the state of
the system
The embedding procedure allows us to get all the relevant information
(invariant properties) about the unknown underlying dynamical system
that generates the time series data, e.g., the Lyapunov exponents
defined previously must have approximately the same value in both the
true and the reconstructed state space. This fact allows us to test the
hypothesis of chaos in the unknown original dynamic system
(eq. (1)). All methods for estimating the Lyapunov exponent from
observed time series data rely on a prior state space
reconstruction procedure. The embedding theorem proposed by
(Takens 1981) provides a framework to reconstruct an unknown dynamical
system which gave rise to a given observed scalar time series simply by
reconstructing a new state space out of successive values of the time
series. We have considered the method of delayed coordinates proposed by
(Ruelle and Takens 1971) to get the delayed-coordinate embedding vectors as
follows. Let {x_t}, t = 1, ..., n, be the observed time series. The
delayed-coordinate embedding vectors backward are defined as
X_t = (x_t, x_{t-d}, ..., x_{t-(m-1)d}) (eq. (3)),
where m is the embedding dimension and d is the reconstruction delay (lag).
The underlying idea is to make copies of the measured signal with a uniform time lapse between observations and consider these delayed values as coordinates of a reconstructed state space, retrieved from the data. Notice that the reconstruction theorem assumes that the dynamical system is sampled uniformly in time. There has, however, been increasing interest in situations where observations are not uniform in time, that is, where the data do not come from a series of values measured periodically. A motivating example is the case of financial markets. The information generated by the interactions between traders who buy (bid orders) and sell (ask orders) financial instruments such as stocks, bonds, commodities, currencies, derivatives, and so on is apparently encoded in frequencies which are not usually equally spaced in time. Thus, if we observe a particular financial asset, e.g., a currency pair on the Foreign Exchange Market, its quotes or rates are sampled at tick-by-tick intervals which do not follow a constant rhythm: each tick appears when there is a change, upward or downward, in the trade price of a transaction. Can we use this kind of information to build a dynamical system model, and if so, how is it related to the true data-generating process? (Huke and Broomhead 2007) showed how to extend the reconstruction theorem under certain conditions when the dynamical system is sampled non-uniformly in time. So, we can use their results to get the non-uniform delayed-coordinate embedding vectors from time series data with non-uniform time-frequency.
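For readers who want to see the mechanics, the backward delayed-coordinate construction can be sketched in a few lines of base R (an illustrative helper, not the DChaos implementation):

```r
# Sketch: backward delayed-coordinate embedding of a scalar series x with
# embedding dimension m and time delay d (illustrative, not DChaos code).
embed_backward <- function(x, m, d) {
  n <- length(x) - (m - 1) * d
  # Row t holds (x_t, x_{t-d}, ..., x_{t-(m-1)d}), i.e., read backwards in time
  sapply(0:(m - 1), function(j) x[((m - 1 - j) * d + 1):(n + (m - 1 - j) * d)])
}
x <- sin(seq(0, 10, by = 0.1))   # 101 observations
E <- embed_backward(x, m = 3, d = 2)
dim(E)  # 97 rows and 3 columns; column 1 is x_t, column 3 is x_{t-4}
```

Each row is one reconstructed state; sliding the window forward traces out the reconstructed attractor.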
The three R packages mentioned before have implemented their algorithms
considering only the state space reconstruction from time series data
sampled uniformly in time. The fNonlinear package has a function
called embeddPSR
. buildTakens
is the function that belongs to the
nonlinearTseries package, and the tseriesChaos package includes the
function embedd
. Note that such packages consider the
delayed-coordinate embedding vectors forward while we consider them
backward. Both ways of reconstructing the attractor are correct, but we
find it more convenient to define the delayed-coordinate embedding
vectors backward since, in time series analysis, we explain the current
and future instants in terms of what has happened in the past, as in equation
(3). Let us show some brief examples of how to deal with the
state space reconstruction procedure from time series data.
Uniform delayed-coordinate embedding vectors. On the one hand, the
first five values are shown using the embedd function, which belongs
to the tseriesChaos package, for an embedding dimension m=5
and a time
delay equal to d=2
. We have simulated time series data from the
logistic equation contained in the DChaos package. The command
set.seed(34)
will set the seed for reproducibility.
## Simulates time-series data from the Logistic map with chaos
ts <- DChaos::logistic.sim(a=4, n=1000)
[1] 7.47e-01 7.57e-01 7.37e-01 7.76e-01 6.95e-01 8.48e-01 5.16e-01 9.99e-01 4.22e-03
## Provides the uniform delayed-coordinate embedding vectors forwards
data <- tseriesChaos::embedd(ts, m=5, d=2)
show(head(data, 5))
V1/0 V1/2 V1/4 V1/6 V1/8
[1,] 0.7466701 0.7365940 0.69509054 0.51625546 0.00422337
[2,] 0.7566155 0.7760930 0.84775873 0.99894304 0.01682213
[3,] 0.7365940 0.6950905 0.51625546 0.00422337 0.06615659
[4,] 0.7760930 0.8477587 0.99894304 0.01682213 0.24711960
[5,] 0.6950905 0.5162555 0.00422337 0.06615659 0.74420600
On the other hand, the DChaos package provides the function
embedding, whose arguments are (x, m, lag, timelapse), to get
the delayed-coordinate embedding vectors backward considering both the
uniform and the non-uniform case. If the observations are sampled at
uniform time intervals, then timelapse=FIXED
. Otherwise,
timelapse=VARIABLE
has to be specified. Note that if FIXED
has been
selected, data
must be a vector
or a time series object ts
or
xts
. Otherwise, VARIABLE
has to be specified. In this case, data
must be a data.frame
, a data.table
, or a matrix
with two columns,
the date and the univariate time series as a sequence of numerical
values, in that order. The date can have the following three classes:
POSIXt
, Date
, or Factor
.
In the latter case, the date should come in the following format
YMD H:M:OS3
considering milliseconds, e.g., 20190407 00:00:03.347. If
the R users do not consider milliseconds, they must put .000 after the
seconds. The function embedding
returns the uniform or non-uniform
delayed-coordinate embedding vectors backward by columns from univariate
time series considering the parameter set selected by the R users. Let
us show an example. Firstly, the first five values corresponding to the
uniform embedding vectors set for m=5
, lag=2
, and timelapse=FIXED
are shown considering the time series previously simulated from the
logistic map. As we said before, we consider the delayed-coordinate
embedding vectors backward while other R packages (i.e., tseriesChaos)
consider them forward.
## Provides the uniform delayed-coordinate embedding vectors backwards
data <- DChaos::embedding(ts, m=5, lag=2, timelapse="FIXED")
show(head(data, 5))
y x1 x2 x3 x4
1 0.00422337 0.51625546 0.69509054 0.7365940 0.7466701
2 0.01682213 0.99894304 0.84775873 0.7760930 0.7566155
3 0.06615659 0.00422337 0.51625546 0.6950905 0.7365940
4 0.24711960 0.01682213 0.99894304 0.8477587 0.7760930
5 0.74420600 0.06615659 0.00422337 0.5162555 0.6950905
Non-uniform delayed-coordinate embedding vectors. We have used a data
set called sbux
, which belongs to the
highfrequency
package. It contains the bid quote price data for Starbucks company.
This R package provides several tools for high-frequency time series
data analysis. The first five values corresponding to the non-uniform
embedding vectors set for m = 3
, lag = 4
, and timelapse=VARIABLE
are shown. We can get the non-uniform delayed-coordinate embedding
vectors as follows.
## Simulates tick-by-tick data from bid quote price for Starbucks company
ts2 <- highfrequency::sbux
## Provides the non-uniform delayed-coordinate embedding vectors backwards
data <- DChaos::embedding(ts2, m=3, lag=4, timelapse="VARIABLE")
show(head(data, 5))
y x1 x2
2010-07-01 15:30:09 -0.0012327924 -0.0010255359 0.0008179960
2010-07-01 15:30:11 0.0000000000 -0.0002053177 -0.0004090816
2010-07-01 15:30:12 -0.0004112688 -0.0002053177 -0.0020479221
2010-07-01 15:30:15 0.0004112688 0.0000000000 -0.0002053177
2010-07-01 15:30:16 -0.0004112688 -0.0004112688 -0.0002053177
Note that a key point to create a suitable reconstruction of the state
space is to fix criteria in order to estimate the embedding parameters
(the embedding dimension m and the time delay d). Researchers usually
estimate them using heuristic approaches. The main drawbacks of these
heuristic approaches are the following: they are not intrinsically
statistical, their results are not robust, they lead to estimators whose
properties are unknown or largely unexplored, and they do not take into
account the results of any model fit.
For instance, the Akaike’s information criterion (AIC), the bayesian information criterion (BIC), the Hannan-Quinn information criterion (HQC), or the focused information criterion (FIC).
In any case, we think that the information derived from the heuristic approaches may still be useful and should not be disregarded as complementary information. The DChaos package allows the R users to choose between both approaches. By default, it uses the statistical approach based on model selection procedures instead of heuristic techniques. Now, once we have shown how to deal with the state space reconstruction, we are going to focus on the next step.
In this section, we briefly discuss the main estimation methods of the Lyapunov exponents from time series data. As we said before, we have assumed that the true data-generating process is unknown because, for most real-world observed time series, the underlying generator system is rarely known. For this reason, we will not take into consideration estimation methods that presuppose knowledge of the equations of the dynamical system. There are mainly two kinds of methods to compute the estimated Lyapunov exponent from time-series data: the direct methods and the Jacobian indirect methods. Let us begin by focusing on the direct methods.
The direct method was first proposed by (Wolf et al. 1985), and then revisited
by (Rosenstein et al. 1993), and by (Kantz 1994; Kantz and Schreiber 2004). The underlying
algorithm is explained in detail in (Kantz and Schreiber 1997). The three R packages
already mentioned are based on ideas inspired by the time series
analysis (TISEAN) project proposed by (Hegger et al. 1999). All of them
implement the algorithm proposed by Kantz using exclusively the direct
method. The tseriesChaos package includes the function lyap.k
, the
nonlinearTseries package has a function called maxLyapunov
, and
lyapunovPlot
is the function that belongs to the fNonlinear package.
We will compare the results from these packages with those provided by
the DChaos library later. Notice that the function lyapunovPlot
uses exactly the same code as the function lyap.k, which belongs
to tseriesChaos. For this reason, we are going to consider only the
algorithms included in the R packages tseriesChaos and
nonlinearTseries.
The idea behind this direct approach can be described as follows. Given
a time series of length n, one selects pairs of points whose initial
distance in the reconstructed state space is small and tracks how the
distance between their subsequent iterates grows with time.
Hence, the presence of a sharp linear region in the plot of the
evolution of the logarithm of the mean distance between nearby points
could be a strong signal of chaotic behavior. The main advantages of the
direct methods are the following:
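To make the idea concrete, here is a hedged base-R sketch in the spirit of the direct method of (Rosenstein et al. 1993): for each point we locate its nearest neighbour in the reconstructed state space and average the logarithm of the distance between the two orbits k steps ahead; the slope of that curve approximates the largest Lyapunov exponent (an illustration only, not the TISEAN algorithm):

```r
# Sketch of the direct method: average log-distance growth between each
# point and its nearest neighbour (illustrative, not the TISEAN code).
set.seed(34)
x <- numeric(600); x[1] <- runif(1)
for (t in 1:599) x[t + 1] <- 4 * x[t] * (1 - x[t])   # logistic map
E <- stats::embed(x, 2)     # rows are (x_t, x_{t-1})
K <- 4                      # evolution horizon
n <- nrow(E) - K
S <- numeric(K)
for (i in 1:n) {
  d0 <- sqrt(rowSums((E[1:n, ] - matrix(E[i, ], n, 2, byrow = TRUE))^2))
  d0[max(1, i - 5):min(n, i + 5)] <- Inf   # exclude temporal neighbours
  j <- which.min(d0)                       # nearest neighbour of point i
  for (k in 1:K) S[k] <- S[k] + log(abs(x[i + 1 + k] - x[j + 1 + k]) + 1e-12)
}
S <- S / n
# A sharp linear region of S against k signals chaos; its slope estimates
# the largest Lyapunov exponent (roughly log 2 for this map)
coef(lm(S ~ seq_len(K)))[2]
```

This also makes the method's weakness visible: the output is a curve whose linear region must be judged by eye, with no distribution theory attached to the slope.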
The DChaos package is the first R library that provides the Jacobian
indirect methods proposed by (Eckmann and Ruelle 1985) for estimating the Lyapunov
exponents. The procedure behind the Jacobian indirect method can be
described as follows. Following (Gençay and Dechert 1992), we have assumed
that there exists a function g such that
x_t = g(x_{t-d}, x_{t-2d}, ..., x_{t-md}). The Jacobian corresponding
to this dynamical system is given by the partial derivatives of g evaluated
along the observed orbit (eq. (6)). Notice that the estimation of the
Lyapunov exponents by the Jacobian indirect method thus reduces to the
estimation of the unknown nonlinear function g and its partial derivatives.
We have focused on the neural net approach, which is a global
nonparametric method that tries to estimate the underlying dynamic
system without imposing the restriction of local linearity. This
approach has the following advantages over all other methods:
The neural network (or neural net) estimator of the Lyapunov exponent was first proposed by (McCaffrey et al. 1992) and (Nychka et al. 1992) and then revisited by (Gençay and Dechert 1992) and by (Shintani and Linton 2003, 2004). Note that in our case, the main reason for using neural network models is not to look for the best predictive model but to estimate a model that captures the non-linear time dependence well enough and, additionally, allows us to obtain in an analytical way (instead of numerical) the jacobian functional of the unknown data-generating process (eq.(6)).
The estimation of this Jacobian will allow us to contrast the hypothesis
of chaos (eq.(1)) using equation (2). (Hornik et al. 1989)
showed that any standard feed-forward networks with as few as one hidden
layer using arbitrary squashing functions are capable of approximating
any Borel measurable function from one finite-dimensional space to
another to any desired degree of accuracy, provided sufficiently many
hidden units. In this sense, the feed-forward networks are a class of
universal approximators. Theoretically, neural nets are expected to
perform better than other approximation methods, especially with
high-dimensional models, since the approximation form is not so
sensitive to the increasing dimension. The results proposed by the
authors mentioned above have enabled us to consider a neural network
with just one single hidden layer. The number of hidden units (or
neurons) in the single hidden layer is determined by statistical methods
based on model selection criteria, as it appears in the results of the
paper. Let us show how we have obtained a consistent neural net
estimator based on the robust estimation of the function g. The neural
network estimator can be obtained by approximating the unknown
non-linear function g through a feed-forward single hidden layer network
of the form g(x_{t-d}, ..., x_{t-md}) ≈ β_0 + Σ_{j=1}^{h} β_j Φ(γ_{0j} + Σ_{i=1}^{m} γ_{ij} x_{t-id}) (eq. (7)),
where h is the number of units (or neurons) in the hidden layer, Φ is the
activation (squashing) function, and β and γ are the network weights.
We have obtained the partial derivatives of the Jacobian in eq. (6), which will later allow us to contrast the hypothesis of chaos (eq. (1)) using equation (2), by applying the chain rule to eq. (7) as
shown in eq. (8), where Φ′ denotes the derivative of the activation function, and the estimated partial derivatives are given by ∂ĝ/∂x_{t-id} = Σ_{j=1}^{h} β̂_j Φ′(γ̂_{0j} + Σ_{k=1}^{m} γ̂_{kj} x_{t-kd}) γ̂_{ij}, i = 1, ..., m (eq. (9)).
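The chain-rule computation can be illustrated outside the package with a toy network (the weights below are made up for the example; phi is the logistic activation):

```r
# Sketch: analytic partial derivatives of a single-hidden-layer net
# f(x) = b0 + sum_j beta[j] * phi(g0[j] + G[j, ] %*% x); the weights are
# illustrative, not taken from a fitted nnet model.
phi <- function(z) 1 / (1 + exp(-z))
net_jacobian <- function(x, beta, g0, G) {
  a <- phi(as.vector(g0 + G %*% x))       # hidden-unit activations
  # Chain rule: df/dx_i = sum_j beta[j] * phi'(z_j) * G[j, i],
  # with phi'(z) = phi(z) * (1 - phi(z))
  as.vector(t(G) %*% (beta * a * (1 - a)))
}
G <- matrix(c(0.5, -0.3, 0.2, 0.8), nrow = 2)   # h = 2 units, m = 2 inputs
grad <- net_jacobian(c(0.1, 0.4), beta = c(1, -1), g0 = c(0, 0), G = G)
grad
```

The analytic gradient agrees with a numerical finite-difference check, which is exactly the benefit of deriving the Jacobian analytically rather than numerically.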
The DChaos package provides the function netfit, whose arguments are
(serie, m, lag, timelapse, h, w0maxit, wtsmaxit, pre.white, trace,
seed.t, seed), to fit the unknown non-linear function g. It uses the
nnet function, which belongs to the nnet package, to fit the neural net
models. Notice that the process of adjusting a neural network often
suffers from being trapped in local optima, and different
initialization strategies should be taken into account.
For this reason, we have implemented the function w0.net, whose
arguments are (x, y, m, h, rangx, w0maxit, seed.t, seed).
This function previously estimates the initial parameter vector of the
neural network, and the maximum number of iterations can be set through
w0maxit. In addition, by default, the
neural network estimation is initialized with a fixed seed denoted by
seed.t=TRUE
with a value equal to seed=56666459
. The R users can either let
the seed be chosen randomly by setting seed.t=FALSE or fix another
seed value in order to replicate the results obtained.
Best-fitted neural network models. We are going to illustrate an
example considering the time series data previously simulated from the
logistic map. The netfit
function returns several objects. The
best-fitted feed-forward single hidden layer neural net model is saved.
It also contains some useful information about the best set of weights
found, the fitted values, the residuals obtained, or the best embedding
parameters set chosen. The best 10 models, ranked by the BIC criterion,
are displayed on the console as we show below for m=1:4, lag=1:3, and
h=2:10. In this case, the best-fitted neural net model has m=1, lag=1,
and h=9.
The BIC criterion is defined following (Shintani and Linton 2003) as
BIC = log(RSS/n) + k log(n)/n, where RSS is the residual sum of squares of
the fitted network, n is the number of observations, and k is the number of
estimated weights.
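Assuming the per-observation form log(RSS/n) + k log(n)/n (our reading of the criterion; the exact penalty used by DChaos follows Shintani and Linton 2003), the computation is straightforward:

```r
# Sketch: per-observation BIC = log(RSS/n) + k*log(n)/n, where k counts
# the fitted weights (assumed form; see Shintani and Linton 2003).
bic <- function(residuals, k) {
  n <- length(residuals)
  log(mean(residuals^2)) + k * log(n) / n
}
# A 1-9-1 network, as in the netfit example, has 28 weights
bic(rep(3e-4, 997), k = 28)
```

Lower values are better: the first term rewards fit while the second penalizes additional hidden units, which is how the number of neurons is selected.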
## Provides the best-fitted neural network models for certain parameter set
model <- DChaos::netfit(ts, m=1:4, lag=1:3, timelapse="FIXED", h=2:10)
Best models:
m lag h BIC
85 1 1 9 -15.96086
73 1 1 8 -15.95343
97 1 1 10 -15.91173
37 1 1 5 -15.29339
61 1 1 7 -15.26654
63 3 1 7 -15.04988
49 1 1 6 -15.04090
3 3 1 2 -14.74957
98 2 1 10 -14.17451
## Summary method for a nnet object
## Provides the best set of weights found for the best-fitted neural net model (#85)
summary(model)
a 1-9-1 network with 28 weights
options were - linear output units
b->h1 i1->h1
-0.91 0.38
b->h2 i1->h2
-6.29 4.61
b->h3 i1->h3
3.06 -3.11
b->h4 i1->h4
1.49 6.10
b->h5 i1->h5
0.11 0.64
b->h6 i1->h6
0.70 3.02
b->h7 i1->h7
-0.19 0.57
b->h8 i1->h8
-1.37 0.84
b->h9 i1->h9
-0.03 -1.44
b->o h1->o h2->o h3->o h4->o h5->o h6->o h7->o h8->o h9->o
-4.75 -0.16 -3.30 2.97 0.64 -0.39 5.07 -0.45 -2.02 -2.27
Partial derivatives are calculated analytically from the best-fitted
neural net model. The DChaos package also provides the function
jacobian.net, whose arguments are (model, data, m, lag, timelapse, h,
w0maxit, wtsmaxit, pre.white, trace, seed.t, seed), to compute the
partial derivatives of the Jacobian in eq. (9) from the best-fitted
feed-forward single hidden layer network. See the R documentation for
the interpretation of this parameter
set. This function returns several objects considering the parameter set
selected by the user. It also contains some useful
information about the best-fitted feed-forward single hidden layer
neural net model saved, the best set of weights found, the fitted
values, the residuals obtained, and the best embedding parameters set
chosen. This function allows the R user to reuse the results previously
obtained from the best-fitted neural network estimated by the netfit
function if model
is not empty. Otherwise, data
has to be specified.
Let us show an example. Firstly, we provide the first seven partial
derivative values (dx1) corresponding to the best-fitted neural net
model estimated above (the 1-9-1 network, model #85).
## Computes analytically the partial derivatives from the best-fitted neural net model
## showed in the netfit example (#85)
jacobian <- DChaos::jacobian.net(model=model)
show(head(jacobian$jacobian))
dx1
1 -1.9701373
2 -2.0499734
3 -1.8893034
4 -2.2064735
5 -1.5568866
6 -2.7836348
7 -0.1313633
Now, we are going to provide the results obtained from the function
jacobian.net
without setting the best-fitted neural net model
estimated previously. We have considered the dataset simulated from the
logistic map again. We have chosen the following parameter set values
m=3
, lag=1
, timelapse=FIXED
, h=2:10
, w0maxit=100
,
wtsmaxit=1e6
, pre.white=FALSE
, trace=1
, seed.t=TRUE
, and
seed=56666459
. We show below the first seven partial derivatives
values (dx1
, dx2
, dx3
) corresponding to compute the partial
derivatives of the Jacobian following eq.(9) from the
best-fitted feed-forward single hidden layer network (neural net
## Partial derivatives are calculated analytically without setting previously
## any neural net model
jacobian <- DChaos::jacobian.net(data=ts, m=3:3, lag=1:1, timelapse="FIXED", h=2:10)
Best models:
m lag h BIC
6 3 1 7 -15.04988
1 3 1 2 -14.74957
7 3 1 8 -13.73532
4 3 1 5 -13.63287
3 3 1 4 -13.29422
8 3 1 9 -13.08082
2 3 1 3 -11.79369
9 3 1 10 -11.78703
5 3 1 6 -11.61098
show(head(jacobian$jacobian))
dx1 dx2 dx3
1 -1.8880449 -0.0003353808 0.0020567805
2 -2.2071082 -0.0007068399 0.0014469903
3 -1.5552770 0.0001417462 0.0026879632
4 -2.7896525 -0.0011615596 0.0003214048
5 -0.1343617 0.0029934521 0.0053918764
6 -3.9235626 -0.0012869863 -0.0015386951
7 3.9269526 0.0116947029 0.0046071740
In this section, we provide the right procedure to obtain a consistent
estimator of the Lyapunov exponent from the partial derivatives estimated
following eq. (9). In addition, we are going to illustrate how
to test the hypothesis of chaos (eq. (1)) based on the
theoretical asymptotic properties of the neural net estimator of the
Lyapunov exponents (eq. (10)).
Blocking method | Sample size | Block length | Block number | Block subset |
---|---|---|---|---|
Full | n | n | 1 | Complete sample |
Non-overlapping | n | M | B | Consecutive blocks |
Equally spaced | n | M | B | Equally spaced observations |
Bootstrap | n | M | B | Randomly drawn blocks |
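The subsampling schemes can be illustrated with index sets alone (the block indices below are our own sketch; the sample size and block length mirror the examples later in the paper):

```r
# Sketch: index sets for the blocking methods (illustrative only).
n <- 997; M <- 82; B <- n %/% M          # sample size, block length, blocks
# Non-overlapping: consecutive blocks of length M
nonoverlap <- split(1:(B * M), rep(1:B, each = M))
# Equally spaced: every B-th observation, starting at s = 1, ..., B
eqspaced <- lapply(1:B, function(s) seq(s, n, by = B)[1:M])
# Bootstrap: blocks of M indices drawn at random
set.seed(56666459)
bootstrap <- replicate(5, sort(sample(n, M)), simplify = FALSE)
c(length(nonoverlap), length(eqspaced), length(bootstrap))
```

The Lyapunov exponent is then estimated once per block, and the mean or median across blocks gives the reported estimate.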
The asymptotic properties of the nonparametric neural network estimator
of the Lyapunov exponent allow us to build a standard normal test
statistic from the estimated exponent and its standard error. Hence, we
will reject the null hypothesis whenever the value of this statistic
exceeds the corresponding critical value at the chosen significance
level.
Lyapunov exponent estimator by the Jacobian indirect method using a neural
net approach. The DChaos package provides several ways to figure out
robustly the neural net estimator of the Lyapunov exponents. Once the R
users have obtained the partial derivatives from the jacobian.net
function, they can apply directly the function lyapunov.spec, which
estimates the Lyapunov exponent spectrum taking into account the QR
decomposition procedure. They can also use the function lyapunov.max,
which estimates only the largest Lyapunov exponent considering the
Norma-2 procedure. The arguments of both functions are the same (data,
blocking, B, doplot). See the R documentation for the interpretation of
this parameter set. Hence, the DChaos package allows the R users to
choose between two different procedures and four blocking methods to
obtain the neural net estimator of the Lyapunov exponents.
Note that the DChaos package provides 8 internal functions (one for
each procedure and blocking method), which estimate the Lyapunov
exponents consistently (eq.(10)). These functions return
several objects considering the parameter set selected by the user. The
largest Lyapunov exponent (lyapunov.max
) or the Lyapunov exponent
spectrum (lyapunov.spec
) by each blocking method are estimated. They
also contain some useful information about the estimated jacobian, the
best-fitted feed-forward single hidden layer neural net model, the best
set of weights found, the fitted values, the residuals obtained, the
best embedding parameters set chosen, the sample size, or the block
length considered by each blocking method. These functions provide the
standard error, the test statistic, and the p-value for the full sample,
the non-overlapping sample (blocking = NOVER), the equally spaced sample
(blocking = EQS), or the bootstrap sample (blocking = BOOT). The
mean and median values of the Lyapunov exponent for each block or
subsample are saved. By default, we recommend using the median value as
it is more robust to the presence of outliers. Notice that parameter B
(a non-negative integer denoting the number of bootstrap iterations)
will only be considered if the R users choose the bootstrap blocking
method.
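The QR procedure used by lyapunov.spec can be sketched as follows: an orthonormal frame is propagated through the sequence of estimated Jacobians, and the logs of the diagonal entries of R accumulate the exponents (our own minimal version, not the DChaos internals):

```r
# Sketch: Lyapunov spectrum via successive QR decompositions of the
# Jacobians along the orbit (illustrative, not the DChaos internals).
lyap_spectrum_qr <- function(jacobians) {
  m <- ncol(jacobians[[1]])
  Q <- diag(m)
  acc <- numeric(m)
  for (J in jacobians) {
    dec <- qr(J %*% Q)
    Q <- qr.Q(dec)
    acc <- acc + log(abs(diag(qr.R(dec))))
  }
  sort(acc / length(jacobians), decreasing = TRUE)
}
# Check on a linear system whose exact exponents are log(2) and log(0.5)
J <- matrix(c(2, 0, 0, 0.5), 2, 2)
lyap_spectrum_qr(rep(list(J), 100))   # approximately 0.693 and -0.693
```

Repeated re-orthonormalization keeps the columns of Q from collapsing onto the fastest-growing direction, which is why the whole spectrum (not just the largest exponent) can be recovered.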
Now, we are going to compare the results obtained from the
lyapunov.max
and lyapunov.spec
functions for estimating the largest
Lyapunov exponent and the Lyapunov exponent spectrum respectively. We
have considered the dataset simulated from the logistic map again, the
best-fitted neural net model estimated previously (neural net
m=3
, lag=1
, h=7
, and the partial derivatives are dx1
, dx2
,
dx3
. We show below the estimation of the largest Lyapunov exponent
provided by the Norma-2 procedure and the Lyapunov exponent spectrum
taking into account the results obtained by the QR decomposition
procedure (bootstrap blocking method). The results provided by both
methods are very similar (0.6943782 and 0.6942063, respectively), but
the lyapunov.spec function also allows the estimation of the full
spectrum of the Lyapunov exponents. The estimated values by both methods
are certainly close to the theoretical value (0.6931472), and they are
statistically significant at the 99% confidence level.
## Provides the largest Lyapunov exponent by the Norma-2 procedure considering the
## bootstrap blocking method from the best-fitted neural net model and the partial
## derivatives showed in the jacobian.net example.
exponent <- DChaos::lyapunov.max(data=jacobian, blocking="BOOT", doplot=FALSE)
## Provides summary statistics of the results given in an object of class lyapunov
summary(exponent)
Call:
Largest Lyapunov exponent
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Exponent 0.6943782 0.00443619 1783.442 1
---
Procedure: Norma-2 by bootstrap blocking method
Embedding dimension: 3, Time-delay: 1, No. hidden units: 7
Sample size: 997, Block length: 82, No. blocks: 1000
## Provides the Lyapunov exponent spectrum by the QR decomposition procedure considering
## the bootstrap blocking method from the best-fitted neural net model and the partial
## derivatives showed in the jacobian.net example as well.
exponent <- DChaos::lyapunov.spec(data=jacobian, blocking="BOOT", doplot=FALSE)
## Provides summary statistics of the results given in an object of class lyapunov
summary(exponent)
Call:
Lyapunov exponent spectrum
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Exponent 1 0.6942063 0.00309979 1670.773 1
Exponent 2 -3.5818150 0.00915042 -3873.927 0
Exponent 3 -3.7257022 0.00898081 -4053.132 0
---
Procedure: QR decomposition by bootstrap blocking method
Embedding dimension: 3, Time-delay: 1, No. hidden units: 7
Sample size: 997, Block length: 82, No. blocks: 1000
On the other hand, the DChaos package provides an all-in-one function called lyapunov. We have considered it appropriate to incorporate a function that unifies the whole process to make it easier and more intuitive for R users. The lyapunov() function has the following usage:
lyapunov(data, m, lag, timelapse, h, w0maxit, wtsmaxit, pre.white, lyapmethod,
blocking, B, trace, seed.t, seed, doplot)
This function has fifteen main arguments, of which only the first one is mandatory. These are:
data: a vector, a time-series object ts or xts, a data.frame, a data.table, or a matrix depending on the method selected in timelapse. If FIXED has been selected, data must be a vector or a time-series object ts or xts. Otherwise, VARIABLE has to be specified; in this case, data must be a data.frame, a data.table, or a matrix.
m:
a non-negative integer denoting a lower and upper bound for the
embedding dimension (Default 1:4).
lag: a non-negative integer denoting a lower and upper bound for the reconstruction delay (Default 1:1).
timelapse: a character denoting if the time-series data are sampled at uniform time-frequency, e.g., 1-month, 1-day, 1-hour, 30-min, 5-min, 1-min, and so on (FIXED), or at non-uniform time-frequency, i.e., not equally spaced in time (VARIABLE) (Default FIXED).
h:
a non-negative integer denoting a lower and upper bound for the
number of neurons (or nodes) in the single hidden layer (Default
2:10).
w0maxit:
a non-negative integer denoting the maximum iterations to
estimate the initial parameter vector of the neural net models
(Default 100).
wtsmaxit:
a non-negative integer denoting the maximum iterations
to estimate the weights parameter vector of the neural net models
(Default 1e6).
pre.white: a logical value denoting if the user wants to evaluate the partial derivatives at the delayed-coordinate vectors filtered by the chosen neural net model (TRUE) or not (FALSE) (Default TRUE).
lyapmethod: a character denoting the procedure chosen to estimate the Lyapunov exponent. If LLE has been selected, the function will estimate only the largest Lyapunov exponent through the Norma-2 method. If SLE has been selected, the function will estimate the Lyapunov exponent spectrum through the QR decomposition. Otherwise, ALL has to be specified; in this case, the function will estimate the Lyapunov exponent considering both procedures (Default SLE).
blocking: a character denoting the blocking method chosen for figuring out the Lyapunov exponent. Available options are FULL if the user considers the full sample, NOVER for the non-overlapping sample, EQS for the equally spaced sample, BOOT for the bootstrap sample, or ALL to consider all of them (Default BOOT).
B:
a non-negative integer denoting the number of bootstrap
iterations (Default 1000).
trace: a binary value denoting if the user wants to print the output on the console (1) or not (0) (Default 1).
seed.t: a logical value denoting if the user wants to fix the seed (TRUE) or not (FALSE) (Default TRUE).
seed: a non-negative integer denoting the value of the seed selected if seed.t = TRUE (Default 56666459).
doplot: a logical value denoting if the user wants to draw a plot (TRUE) or not (FALSE). If TRUE, the evolution of the Lyapunov exponent values is plotted for the whole period, considering the different procedures and blocking methods chosen by the user (Default TRUE).
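The two input shapes accepted through timelapse can be illustrated with plain R objects (the two-column time/value layout used for VARIABLE below is a hypothetical illustration of a non-equally spaced series, not the package's required column names):

```r
# FIXED: an equally spaced series may be passed as a vector or a ts/xts object.
x_fixed <- ts(runif(100), frequency = 12)

# VARIABLE: a non-equally spaced series travels as a two-column object, e.g. a
# data.frame holding irregular timestamps plus the observed values (illustrative).
set.seed(34)
x_variable <- data.frame(time = cumsum(rexp(100)), value = runif(100))
```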
The lyapunov() function provides (at the same time) the delayed-coordinate embedding vector from time-series data (eq.(3)), estimates the best-fitted neural net model from the delayed-coordinate embedding vectors (eq.(7)), and calculates analytically the partial derivatives from the chosen neural net model (eq.(9)). Finally, the neural net estimator of the largest Lyapunov exponent (lyapmethod = LLE), the Lyapunov exponent spectrum (lyapmethod = SLE), or both (lyapmethod = ALL) are estimated by each blocking method. The resulting objects also contain some useful information about the estimated Jacobian, the best-fitted feed-forward single hidden layer neural net model, the best set of weights found, the fitted values, the residuals obtained, the best embedding parameters set chosen, the sample size, or the block length considered by each blocking method. This function also provides the standard error, the z-test value, and the p-value for testing the hypothesis of chaos.
Now, we are going to show an example of this all-in-one function. We provide the results obtained from the lyapunov function considering the dataset simulated from the logistic map again for the following parameter set values: m=3:3, lag=1:1, timelapse=FIXED, h=2:10, w0maxit=100, wtsmaxit=1e6, trace=1, seed.t=TRUE, seed=56666459, and doplot=FALSE. In this case, we have estimated the Lyapunov exponent spectrum taking into account the results provided by the QR decomposition procedure (lyapmethod=SLE) and all blocking methods (blocking=ALL). Note that the delayed-coordinate embedding vector, the best-fitted neural net model (with a lower BIC value), and the partial derivatives are internally calculated according to the parameter set chosen (all-in-one way). The DChaos package provides the summary method for class "lyapunov", called summary.lyapunov, as we show below.
Let us point out briefly some useful information to understand the results presented in this example. For the estimation based on blocking methods, median values over all used blocks are given, together with the block length and the number of blocks considered by each blocking method.
## Provides the Lyapunov exponent spectrum by several blocking methods (all-in-one)
exponent <- DChaos::lyapunov(ts, m=3:3, lag=1:1, timelapse="FIXED", h=2:10, w0maxit=100,
wtsmaxit=1e6, pre.white=TRUE, lyapmethod="SLE", blocking="ALL",
B=100, trace=1, seed.t=TRUE, seed=56666459, doplot=FALSE)
Best models:
m lag h BIC
8 3 1 9 -13.37732
9 3 1 10 -12.90353
7 3 1 8 -12.76361
4 3 1 5 -12.60372
2 3 1 3 -12.20489
5 3 1 6 -11.88894
6 3 1 7 -11.85880
1 3 1 2 -11.75219
3 3 1 4 -10.76974
## Summary method for a lyapunov object
## Provides summary statistics of the results given in an object of class lyapunov
summary(exponent)
Call:
Lyapunov exponent spectrum
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Exponent 1 0.6922016 0.08452517 74.15722 1
Exponent 2 -2.7082326 0.09158355 -267.77832 0
Exponent 3 -3.2056094 0.10913227 -265.98941 0
---
Procedure: QR decomposition by bootstrap blocking method
Embedding dimension: 3, Time-delay: 1, No. hidden units: 9
Sample size: 997, Block length: 82, No. blocks: 100
... only the first method is shown (see lyapunov object)
Firstly, let us illustrate how adding just a (small) measurement noise to a well-known deterministic dynamic system such as the logistic map increases the dispersion of the huge amount of points that describe the attractor, with the consequent inaccuracy; see figure 1. We have added to each time-series observation a normally distributed measurement error term with several variance values.
## The user should make a loop adding a measurement noise with several variance values
par(mfrow=c(2,3))
noise <- c(0,0.01,0.02,0.03,0.04,0.05)
for (i in 1:length(noise)){
X <- logistic.sim(a=4, n=1000, s=noise[i])
plot(X[c(1:999)],X[c(2:1000)], cex=.4, xlab= "X(t-1)",ylab="X(t)",
main=paste("Measurement noise","s =",noise[i]))
}
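The s argument used above can be mimicked without the package; the helper below is a hypothetical stand-in for DChaos::logistic.sim that contaminates the observations (not the dynamics) with i.i.d. Gaussian measurement noise:

```r
# Hypothetical stand-in for DChaos::logistic.sim: simulate a logistic trajectory
# and add Gaussian *measurement* noise of standard deviation s to the observations.
logistic_noisy <- function(a = 4, n = 1000, s = 0, x0 = runif(1)) {
  x <- numeric(n)
  x[1] <- x0
  for (t in 1:(n - 1)) x[t + 1] <- a * x[t] * (1 - x[t])
  x + rnorm(n, mean = 0, sd = s)   # noise enters the observation, not the dynamics
}
set.seed(34)
X <- logistic_noisy(s = 0.01)
```

Because the noise is added only at observation time, the underlying deterministic dynamics (and its Lyapunov exponents) are unchanged; only the observed attractor becomes blurred.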
In order to test the robustness of the different methods available to estimate the Lyapunov exponent, we are going to compare the results provided by the tseriesChaos package and the nonlinearTseries package with those of the DChaos package, taking into account different measurement noise levels. We have considered four well-known chaotic dynamic systems. These datasets are available in the DChaos library; see Table 3.
The commands install.packages("DChaos") and library(DChaos) will download, install, and load the DChaos package so it can be used. The command set.seed(34) will also set the seed for reproducibility. To save CPU time, we have fixed the embedding parameters in advance.
Note that the neural net models are sorted from lowest to highest BIC values. Then each best-fitted neural net model (with a lower BIC value) is considered to estimate the 24000 largest Lyapunov exponents. The mean square error (MSE) is calculated between the theoretical value and the value estimated by several direct and indirect methods. To do that, we have used the Monte Carlo method. We have done 1000 repetitions by different initial conditions when simulating the time-series data from the four dynamic systems and six measurement noise levels considered.
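The MSE criterion used in this Monte Carlo exercise can be sketched in a few lines; the simulated draws below are placeholders for the exponents actually returned by the direct and indirect methods:

```r
# MSE between a theoretical Lyapunov exponent and a vector of Monte Carlo
# estimates, as used to fill Table 4 (placeholder estimator draws shown here).
mse <- function(estimates, theoretical) mean((estimates - theoretical)^2)
set.seed(34)
est <- log(2) + rnorm(1000, sd = 0.005)  # stand-in for 1000 estimated exponents
mse(est, log(2))
```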
The MSE values based on the estimation of the largest Lyapunov exponent
from the direct methods provided by the tseriesChaos and
nonlinearTseries packages are denoted by D1 and D2, respectively.
Those obtained by the Jacobian indirect methods through the DChaos
library are denoted by N2 (lyapunov.max
) and QR (lyapunov.spec
)
regarding the Norma-2 and QR decomposition procedures, respectively. The
results shown in table 4 suggest the following remarks.
First, we can remark that our algorithms are robust to the presence of (small) measurement errors because the results obtained are comparable to those which are noise-free. As the noise increases, the error committed also increases, although not proportionally in any case. Second, the indirect methods provide better estimates than the direct methods in all the experiments we have conducted. The algorithms proposed by the tseriesChaos package behave better than those of the nonlinearTseries library. Between the two methods available in the DChaos package, we do not observe significant differences.
Dynamic system | Theoretical Lyapunov exponents |
---|---|
Logistic map | 0.69314 |
Gauss map | 0.38367 |
Hénon system | 0.41921, -1.63479 |
Rössler system | 0.07143, 0.00000, -0.53943 |
Logistic map | s = 0 | s = 0.01 | s = 0.02 | s = 0.03 | s = 0.04 | s = 0.05 |
---|---|---|---|---|---|---|
D1 direct method | 0.0001220 | 0.0056643 | 0.0030120 | 0.003006 | 0.0033485 | 0.0030913 |
D2 direct method | 0.4802315 | 0.4765133 | 0.4814125 | 0.4815446 | 0.4790305 | 0.4830895 |
N2 indirect method | 0.0000324 | 0.0000382 | 0.0000691 | 0.0000994 | 0.0001314 | 0.0001532 |
QR indirect method | 0.0000331 | 0.0000348 | 0.0000672 | 0.0000986 | 0.0000997 | 0.0001124 |
Gauss map | | | | | | |
D1 direct method | 0.0015270 | 0.0111180 | 0.0205349 | 0.0293853 | 0.0275621 | 0.0336681 |
D2 direct method | 0.1474216 | 0.1480353 | 0.1477251 | 0.1464405 | 0.1481204 | 0.1476371 |
N2 indirect method | 0.0000436 | 0.0000526 | 0.0000555 | 0.0000678 | 0.0000719 | 0.0000944 |
QR indirect method | 0.0000618 | 0.0000656 | 0.0000672 | 0.0000782 | 0.0000817 | 0.0000924 |
Hénon system | | | | | | |
D1 direct method | 0.0035650 | 0.0067221 | 0.0092761 | 0.0100339 | 0.0141379 | 0.0189926 |
D2 direct method | 0.3121588 | 0.3133259 | 0.3145991 | 0.3115671 | 0.3226997 | 0.3178221 |
N2 indirect method | 0.0000365 | 0.0000486 | 0.0000635 | 0.0000761 | 0.0000899 | 0.0000917 |
QR indirect method | 0.0000318 | 0.0000451 | 0.0000589 | 0.0000601 | 0.0000866 | 0.0000932 |
Rössler system | | | | | | |
D1 direct method | 0.0024471 | 0.0049521 | 0.0063189 | 0.0072719 | 0.0127326 | 0.0174911 |
D2 direct method | 0.6398841 | 0.6412752 | 0.6388524 | 0.6396631 | 0.6451333 | 0.6499127 |
N2 indirect method | 0.0002477 | 0.0003529 | 0.0005997 | 0.0006122 | 0.0009521 | 0.0019947 |
QR indirect method | 0.0003168 | 0.0004891 | 0.0006070 | 0.0007155 | 0.0008190 | 0.0009268 |
Third, the direct methods are surely less flexible and robust, but much faster and still informative. The Jacobian indirect methods based on the neural net approach seem to perform well for noisy time series data. The price we have to pay is greater computational complexity from two points of view: the computing time and the number of tuning parameters (node weights). Fourth, we have focused only on the largest Lyapunov exponent, as the direct methods do not estimate the full spectrum. In addition, those methods do not allow us to make inferences about it; in our case, we will be able to do so. Hence, we are going to focus on the reliability of the proposed methods: we want to know the power and size of our algorithms.
Finally, we will focus on testing the reliability of the algorithms provided by the DChaos package. For this purpose, we have calculated the so-called size and power of our hypothesis test. Because the test is based on probabilities, there is always a chance of drawing an incorrect conclusion. When one does a hypothesis test, two types of errors are possible. As a reminder, when the null hypothesis is true and we reject it, we make a type I error; the probability of making a type I error is denoted by α (the significance level). When the null hypothesis is false and we fail to reject it, we make a type II error, whose probability is denoted by β; the power of the test is 1 − β.
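The empirical size reported below can be approximated by a generic Monte Carlo sketch: simulate the test statistic under a true null, where it is asymptotically standard normal, and count one-sided rejections at a nominal level (this is an illustration of the size calculation, not the package's internal scaling):

```r
# Monte Carlo sketch of empirical size: the fraction of replications in which a
# true null hypothesis is rejected at nominal level alpha by a one-sided z-test.
set.seed(34)
alpha <- 0.05
z <- rnorm(5000)                     # test statistic under H0, asymptotically N(0,1)
empirical_size <- mean(z > qnorm(1 - alpha))
```

With a well-calibrated test, empirical_size should fluctuate around the nominal alpha, which is the pattern the rejection percentages in the tables below are meant to check.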
In order to understand the interrelationship between the type I and type II errors in this case, consider the following. As we pointed out before, feasible test statistics were introduced, and a one-sided test was proposed for the purpose of testing the hypothesis of chaos (eq.(1)) based on the theoretical asymptotic properties of the neural net estimator. Under the null hypothesis, the standardized neural net estimator of the Lyapunov exponent is asymptotically normally distributed.
Logistic map | 2.40 | 2.60 | 3.60 | 3.50 | 3.30 | 4.10 |
Gauss map | 2.70 | 3.60 | 4.20 | 4.70 | 4.30 | 4.70 |
Hénon system | 2.90 | 3.70 | 3.90 | 4.00 | 4.50 | 4.30 |
Rössler system | 3.10 | 3.50 | 4.70 | 4.90 | 4.50 | 5.10 |
Logistic map | 1.20 | 1.50 | 2.30 | 2.50 | 3.40 | 3.90 |
Gauss map | 1.50 | 1.70 | 1.90 | 2.70 | 3.10 | 3.50 |
Hénon system | 1.70 | 1.90 | 2.20 | 3.30 | 3.70 | 4.00 |
Rössler system | 1.70 | 1.80 | 2.30 | 4.10 | 3.70 | 4.70 |
Logistic map | 0.30 | 0.50 | 1.20 | 1.40 | 1.50 | 2.10 |
Gauss map | 0.50 | 0.80 | 0.90 | 1.10 | 1.40 | 2.50 |
Hénon system | 0.40 | 0.60 | 0.70 | 1.80 | 1.90 | 2.20 |
Rössler system | 0.50 | 1.10 | 1.30 | 1.70 | 2.50 | 3.30 |
Note that the results shown in tables 5-6 are given in terms of rejection percentages of the tests at the significance levels considered, where the columns correspond to the six measurement noise levels. The data given in tables 5-6 suggest the following comments. First, we can point out that the reliability of the tests is solid at the usual significance levels.
Logistic map | 99.30 | 98.90 | 98.40 | 97.70 | 97.50 | 96.90 |
Gauss map | 99.40 | 99.00 | 98.70 | 98.50 | 97.10 | 97.00 |
Hénon system | 99.20 | 98.90 | 98.60 | 97.90 | 97.40 | 96.90 |
Rössler system | 98.70 | 98.10 | 97.40 | 97.00 | 96.70 | 96.20 |
Logistic map | 99.70 | 99.00 | 98.70 | 98.20 | 97.70 | 97.20 |
Gauss map | 99.70 | 99.30 | 98.90 | 98.70 | 98.40 | 97.70 |
Hénon system | 99.60 | 99.00 | 98.80 | 98.20 | 97.70 | 97.10 |
Rössler system | 99.40 | 98.80 | 98.30 | 97.70 | 97.20 | 96.90 |
Logistic map | 99.90 | 99.40 | 99.00 | 98.70 | 98.20 | 97.90 |
Gauss map | 100.00 | 99.60 | 99.30 | 99.00 | 98.80 | 98.20 |
Hénon system | 99.80 | 99.30 | 99.00 | 98.60 | 98.10 | 97.80 |
Rössler system | 99.70 | 99.00 | 98.70 | 98.20 | 97.50 | 97.00 |
The main feature of chaos is the well-known initial-value sensitivity property. The study of measures to quantify the initial-value sensitivity property began with the introduction of Lyapunov exponents. So quantifying chaos through this kind of quantitative measure is a key point for understanding chaotic behavior. This paper describes the main statistical estimation methods of the Lyapunov exponent from time series data. At the same time, we present the DChaos library. This package provides an R interface for chaotic time series data analysis. R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.
R users may compute the delayed-coordinate embedding vectors from time series data, estimate the best-fitted neural net model, and calculate the partial derivatives analytically from the chosen neural net model. Finally, they can estimate both the largest Lyapunov exponent through the Norma-2 procedure and the Lyapunov exponent spectrum through the QR decomposition procedure, taking into account the full sample and three different methods of subsampling by blocks. The results provided by (Shintani and Linton 2004) have enabled us to obtain a consistent Lyapunov exponent estimator and to test the hypothesis of chaos based on the theoretical asymptotic properties of our neural net estimators.
Finally, looking to the future, our focus will be on extending the functionality of this package. Let us remark that the different contributions that compose the Jacobian indirect methods differ in the algorithm used for the estimation of the Jacobian. So far, we have focused on the neural net approach, which tries to estimate the underlying dynamic system without imposing the restriction of local linearity. We are working on three more Jacobian indirect methods that try to estimate the Jacobian by local polynomial kernel regressions, by local neural nets models, and by convolutional neural nets models. We are interested in developing the proposed algorithms from multivariate time series data. We are also focusing on how to apply parallelization and big data techniques to the design of the proposed algorithms in order to obtain computational efficiency when applying them to massive databases. We will keep providing useful tools and robust algorithms related to the analysis of chaotic time-series data.
This work was supported by the Government of Spain (grant RTI2018-094901-B-I00); the Complutense University of Madrid (Faculty of Statistical Studies); the San Pablo CEU University (Faculty of Economics); and the Data Analysis in Social and Gender Studies and Equality Policies Research Group (www.ucm.es/aedipi/).
CRAN packages used: tseriesChaos, nonlinearTseries, fNonlinear, DChaos, highfrequency, nnet
CRAN Task Views implied by cited packages: Econometrics, Finance, MachineLearning, TimeSeries
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sandubete & Escot, "DChaos: An R Package for Chaotic Time Series Analysis", The R Journal, 2021
BibTeX citation
@article{RJ-2021-036,
  author  = {Sandubete, Julio E. and Escot, Lorenzo},
  title   = {DChaos: An R Package for Chaotic Time Series Analysis},
  journal = {The R Journal},
  year    = {2021},
  note    = {https://rjournal.github.io/},
  volume  = {13},
  issue   = {1},
  issn    = {2073-4859},
  pages   = {232-252}
}