DChaos: An R Package for Chaotic Time Series Analysis

Chaos theory has been hailed as a revolution of thoughts and attracting ever-increasing attention of many scientists from diverse disciplines. Chaotic systems are non-linear deterministic dynamic systems which can behave like an erratic and apparently random motion. A relevant field inside chaos theory is the detection of chaotic behavior from empirical time-series data. One of the main features of chaos is the well-known initial-value sensitivity property. Methods and techniques related to testing the hypothesis of chaos try to quantify the initial-value sensitive property estimating the so-called Lyapunov exponents. This paper describes the main estimation methods of the Lyapunov exponent from time series data. At the same time, we present the DChaos library. R users may compute the delayed-coordinate embedding vector from time series data, estimates the best-fitted neural net model from the delayed-coordinate embedding vectors, calculates analytically the partial derivatives from the chosen neural nets model. They can also obtain the neural net estimator of the Lyapunov exponent from the partial derivatives computed previously by two different procedures and four ways of subsampling by blocks. To sum up, the DChaos package allows the R users to test robustly the hypothesis of chaos in order to know if the data-generating process behind time series behaves chaotically or not. The package’s functionality is illustrated by examples.


Introduction
According to the literature, countless techniques have been developed and used to estimate the complexity of time series data; for a review see, e.g., Faggini (2014), Bradley and Kantz (2015), Tang et al. (2015).We have focused on methods derived from chaos theory which estimates the complexity of a dataset through exploring the structure of the attractor.Particularly, we have been interested in the so-called Lyapunov exponent (λ) as an attractor invariant measure.Quantifing chaos through this kind of quantitative measure is a key point for understanding chaotic behavior.Hence, our interest will be to test the hypothesis of chaos defined as follows: for k = 1, 2, 3, . . . on a k-dimensional system.Reject the null hypothesis H 0 : λk > 0 means lack of chaotic behavior.That is, the data-generating process does not have a chaotic attractor because it does not show the property of sensitivity to initial conditions, see Gençay and Dechert (1992).The existence of a positive Lyapunov exponent makes it possible to distinguish whether an apparently erratic, non-cyclical, and aperiodic dynamic system is random or chaotic.In this sense, a positive Lyapunov exponent allows us to evidence that a deterministic generating system exists behind that chaotic system in spite of showing an apparently random dynamic behavior.This fact would provide us to take advantage of this deterministic character to be able to seek modeling of time series using non-linear dynamic models, make reliable predictions, at least within the limits established by the sensitivity to the initial conditions, and control over the variables of these chaotic deterministic dynamic systems, see Fernández-Díaz (2019).
This ergodic measure can be defined as follows.Let X t = f (X t−1 ) be a difference equation where f : R k → R k for t = 1, 2, 3, . . ., n.For a k-dimensional system, there will be k Lyapunov exponents which are given by This relation indicates the average rate of divergence or convergence of an orbit f t (X 0 ) starting at point X 0 , where D f t (X 0 ) is the Jacobian evaluated along the trajectory {X 0 , X 1 , . . . ,X t }.If one knows the data-generating process behind the time series, the theoretical Lyapunov exponent can be calculated directly using its own definition as outlined in eq.2.However, we have assumed that the true dynamics of the system is unknown because in most real-world observed time series, the data-generating process is rarely known a priori.
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 For this reason, we will not take into consideration estimation methods that presuppose the knowledge of the equations of the dynamical system.There are basically two kinds of methods in order to compute the estimated Lyapunov exponent from time-series data.The first one, the socalled direct approach, which directly measures the growth rate of the divergence between two neighboring trajectories with an infinitesimal difference in their initial conditions.The second one, the so-called indirect approach (or Jacobian-based method), which first fits a model to the data based on approximations of the trajectories in the reconstructed state space, and then the Jacobian matrices of the estimated dynamic system are used to compute the Lyapunov exponent.We will discuss both methods, although we have focused in greater detail on the Jacobian indirect methods for the reasons set out later.
There are some R packages recently developed related to nonlinear time series analysis and chaos.For instance, the tseriesChaos written by Narzo (2019), nonlinearTseries proposed by Garcia (2019) and fNonlinear provided by Wuertz et al. (2017).These R packages are based on ideas inspired by the time series analysis (TISEAN) project suggested by Hegger et al. (1999).All of them implement the algorithm written by Kantz (1994) using exclusively the direct method for estimating the Lyapunov exponent.The main drawbacks of the direct methods are the following: (i) it does not allow the estimation of the full spectrum of Lyapunov exponents; (ii) it is not robust to the presence of (small) measurement noise because these estimators can produce a linear scaling region giving a wrong chaotic Lyapunov exponent value even for non-chaotic systems; (iii) it does not have a satisfactory performance in detecting existing non-linearities on time-series data of moderate sample sizes; (iv) the asymptotic distribution of the estimator cannot be derived, means that it does not allow the building of formal tests.

Function Description embedding
Provides the delayed-coordinate embedding vectors backwards gauss.sim Simulates time series data from the Gauss map henon.sim Simulates time series data from the Hénon system jacobian.net Computes the partial derivatives from the best-fitted neural net model logistic.sim Simulates time series data from the Logistic map lyapunov Estimates the Lyapunov exponent through several methods lyapunov.max Estimates the largest Lyapunov exponent by the Norma-2 procedure lyapunov.spec Estimates the Lyapunov exponent spectrum by the QR decomposition netfit Fits any standard feed-forward neural net model from time series data rossler.sim Simulates time series data from the Rössler system summary.lyapunovSummary method for a lyapunov object w0.netEstimates the initial parameter vector of the neural net model In this paper, we present the DChaos package written by Sandubete and Escot (2021), which provides the following contributions.First, as far as we know, the DChaos package is the first R library that provides the Jacobian indirect methods proposed by Eckmann and Ruelle (1985) and Gençay and Dechert (1992) for estimating the Lyapunov exponents.These Jacobian indirect methods solve all drawbacks which belong to the direct methods.Particularly, this package has focused on the neural net approach following McCaffrey et al. (1992) and Nychka et al. (1992) by approximating the unknown non-linear system through a feed-forward single hidden layer neural network.Those neural net methods provide a well-fit of any unknown linear or non-linear model.They also have the advantage to allow the analytical derivation, rather than numerically, the jacobian needed for the estimation of the Lyapunov exponents.
Second, the DChaos package allows the use of full sample estimation, but it also provides three different blocking methods for estimating the Lyapunov exponents following McCaffrey et al. (1992) and Shintani and Linton (2004).One of them is a new proposal based on the bootstrap method.The results provided by the three blocking methods improve the case of the full sample being very similar between them.Although, on average, the bootstrap method gives better results.
Third, the DChaos package provides new algorithms implementing the formal test proposed by Shintani and Linton (2004), who provided a statistical framework for testing the hypothesis of chaos, based on the theoretical asymptotic properties from the neural net estimator and its variance.R users might make statistical inferences about Lyapunov exponents' significance and test the hypothesis of chaos (eq.1).Remember that the asymptotic distribution of the estimator obtained from direct methods does not exist, and that is, there is no chance to make statistical inference about chaos.
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 Fourth, any methods for estimating the Lyapunov exponent from time series data are previously based on the state space reconstruction procedure.Note that a key point to create a suitable reconstruction of the state-space is to fix criteria in order to estimate the embedding parameters.Researchers usually estimate them using heuristic approaches based on prescriptions proposed by, e.g., Abarbanel (1996) or Kantz and Schreiber (2004).The main drawbacks of these heuristic approaches are the following: they are not intrinsically statistical; their results are not robust; they lead to estimators whose properties are unknown or largely unexplored; they do not take into account the results of any model fit.Although the Dchaos package also allows the use of heuristic methods, we have implemented some alternative and consistent statistical methods based on model selection criteria following Chan and Tong (2001) which solves those disadvantages.
Fifth, instead of considering only time-series data with uniform time-frequency, e.g., 1-month, 1-day, 1-hour, 30-min, 5-min, 1-min, and so on, as R packages mentioned above, the DChaos package also allows the use of time series data with non-uniform time-frequency.As far as we know, the DChaos package is the first R library that provides the reconstruction of the pseudo-estate space from time series data with non-uniform time-frequency (which are not equally spaced in time).In this sense, this package provides a new algorithm implementing the theorem proposed by Huke and Broomhead (2007), who extended the reconstruction theorem when the dynamical system is sampled non-uniformly in time.Their results have allowed us to get the non-uniform delayed-coordinate embedding vectors from time series data with non-uniform time-frequency, e.g., tick-by-tick financial time series.
To sum up, the DChaos package provides a novel R interface for researchers interested in the field of chaotic dynamic systems and non-linear time series analysis and professors (and students) who teach (learn) courses related to those topics.There are 12 functions available in the DChaos package; see table 1.We are going to describe all of them as we go on to explain the theoretical procedure.The rest of the paper is organized as follows.Section 2 presents the state space reconstruction procedure from time series data.Section 3 gives an overview about the main estimation methods of the Lyapunov exponents and provides some technical details about the DChaos package.Section 4 illustrates some examples showing the package's functionality.Section 5 contains some concluding remarks.

State space reconstruction from time series data
From now on, we assume that the true data-generating process is unknown.Hence, we do not have the advantage of observing directly the state of the system X t , let alone knowing the functional form f that generates the dynamic associated with it.Instead of that, there is an observer function that includes an additive measurement error ψ : R k+1 → R which generates observations as x t = ψ (X t , ε t ), where ε t is a sequence of independent and identically distributed random variables such that ε t is independent of X j , 0 ⩽ j ⩽ t, for t = 1, 2, 3, . . ., n.Therefore, it is assumed that all information available is the noise-contaminated sequence {x t } n t=1 as a time series data.We have considered it appropriate to add a measurement noise term in the observation function because most real-world, observed time series data are usually noise-contaminated signals.
The embedding procedure allows us to get all the relevant information (invariant properties) about the unknown underlying dynamical system that generates the time series data, e.g., the Lyapunov exponents defined previously must have approximately the same value in both the true and the reconstructed state space.This fact allows us to test the hypothesis of chaos in the unknown original dynamic system (1).Any methods for estimating the Lyapunov exponent from some observed time series data are based previously on the state space reconstruction procedure.The embedding theorem proposed by Takens (1981) provides a framework to reconstruct an unknown dynamical system which gave rise to a given observed scalar time series simply by reconstructing a new state space out of successive values of the time series.We have considered the method of delayed-coordinates proposed by Ruelle and Takens (1971) to get the delayed-coordinate embedding vectors as follows.Let {x t } n t=1 be the time series data.We form a sequence of delayed vectors by associating for each time period a vector in a reconstructed state space R m , whose coordinates satisfy the following equation: where m is the embedding dimension and τ is the reconstruction time-delay (or lag).
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 We can construct a vector space whose axes represent all the relevant variables given by The underlying idea is to make copies of the measured signal with uniform time-lapse between observations and consider these delayed values as coordinates of a reconstructed state space, retrieved from the data.Notice that the reconstruction theorem assumes that the dynamical system is sampled uniformly in time.There has, however, been increasing interest in situations where observations are not uniform in time.That is, the data does not come from a series of values measured periodically in time.For instance, a motivating example would be the case of financial markets.The information generated by the interactions between traders who buy (bid orders) and sell (ask orders) financial instruments such as stocks, bonds, commodities, currencies, derivatives, and so on is apparently encoded in frequencies which are not usually equally spaced in time.Thus, if we observe a particular financial asset, i.e., a currency pair on the Foreign Exchange Market, the quotes or rates of this financial asset are sampling, by tick-by-tick intervals, which do not follow a constant rhythm.Each tick will appear when there is a change, upward or downward, in the trade price of each transaction.Can we use this kind of information to build a dynamical system model, and if so, how is it related to the true data-generating process?Huke and Broomhead (2007) showed how to extend the reconstruction theorem under certain conditions when the dynamical system is sampled non-uniformly in time.So, we can use their results to get the non-uniform delayed-coordinate embedding vectors from time series data with non-uniform time-frequency.
The three R packages mentioned before have implemented their algorithms considering only the state space reconstruction from time series data sampled uniformly in time.The fNonlinear package has a function called embeddPSR.buildTakens is the function that belongs to the nonlinearTseries package, and the tseriesChaos package includes the function embedd.Note that such packages consider the delayed-coordinate embedding vectors forward while we consider them backward.Both ways of reconstructing the attractor are correct, but we think it more convenient to define the delayedcoordinate embedding vectors backward since in time series analysis we explain the initial and future instants from what has happened in the past as in equation 3. Let us show some brief examples of how to deal with the state space reconstruction procedure from time series data.
Uniform delayed-coordinate embedding vectors.On the one hand, the first five values are shown using the embedd function, which belongs to tseriesChaos package for an embedding dimension m=5 and a time delay equal to d=2 .We have simulated time series data from the logistic equation contained in the DChaos package.The command set.seed(34) will set the seed for reproducibility.
## Simulates time-series data from the Logistic map with chaos ts <-DChaos::logistic.sim(a=4,n=1000) [1] 7.47e-01 7.57e-01 7.37e-01 7.76e-01 6.95e-01 8.48e-01 5.16e-01 9.99e-01 4.22e-03 ## Provides the uniform delayed-coordinate embedding vectors forwards data <-tseriesChaos::embedd(ts, m=5, d=2) show(head(data, 5)) On the other hand, the DChaos package provides the function embedding which arguments are (x, m, lag, timelapse) to get the delayed-coordinate embedding vectors backward considering both the uniform and the non-uniform case.If the observations are sampled at uniform time intervals, then timelapse=FIXED.Otherwise, timelapse=VARIABLE has to be specified.Note that if FIXED has been selected, data must be a vector or a time series object ts or xts.Otherwise, VARIABLE has to be specified.In this case, data must be a data.frame, a data.table,or a matrix with two columns, the date and the univariate time series as a sequence of numerical values, in that order.The date can have the following three classes: POSIXt, Date, or Factor.
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 In the latter case, the date should come in the following format YMD H:M:OS3 considering milliseconds, e.g., 20190407 00:00:03.347.If the R users do not consider milliseconds, they must put .000 after the seconds.The function embedding returns the uniform or non-uniform delayed-coordinate embedding vectors backward by columns from univariate time series considering the parameter set selected by the R users.Let us show an example.Firstly, the first five values corresponding to the uniform embedding vectors set for m=5, lag=2, and timelapse=FIXED are shown considering the time series previously simulated from the logistic map.As we said before, we consider the delayed-coordinate embedding vectors backward while other R packages (i.e., tseriesChaos) consider them forward.
## Provides the uniform delayed-coordinate embedding vectors backwards data <-DChaos::embedding(ts, m=5, lag=2, timelapse="FIXED") show(head(data, 5)) This R package provides several tools for high-frequency time series data analysis.The first five values corresponding to the non-uniform embedding vectors set for m = 3, lag = 4, and timelapse=VARIABLE are shown.We can get the non-uniform delayed-coordinate embedding as follows.Let {x t i } n i=1 be our time series taking t i − t i−1 ̸ = t s − t s−1 ∀i ̸ = s.We form a sequence of delayed vectors by associating with each time a vector in a reconstructed state space R m , whose coordinates satisfy the following equation: where m is the embedding dimension and τ is the reconstruction delay (or lag).As in the uniform case (eq.3), we can construct a vector space whose axes represent all the relevant variables.
The main drawbacks of these heuristic approaches are the following: (i) they are not intrinsically statistical; (ii) they lead to estimators whose properties are unknown or largely unexplored; (iii) they do not take into account the results of any model fit.The alternative proposed by the statistical approach solves those 3 disadvantages.The statistical approach to state space reconstruction can be viewed as the best subset selection problem within the nonparametric regression context as argued Chan and Tong (2001).The idea behind it is to select the embedding parameters, τ and m, that provide the best fit in the estimation of the Lyapunov exponents taking into account some information criteria.
In any case, we think that the information derived from the heuristic approaches might be still useful and should not be disregarded as complementary information.The DChaos allows the R users to choose between both methods.By default, it uses the statistical approach based on model selection procedures instead of heuristic techniques.Now, once we have shown how to deal with the state space reconstruction, we are going to focus on the next step.

Estimating the Lyapunov exponents from time series data
In this section, we are going to discuss the main estimation methods of the Lyapunov exponents from time series data briefly.As we said before, we have assumed that the true data-generating process is unknown because in most real-world, observed time series data, the underlying generator system is rarely known.For this reason, we will not take into consideration estimation methods that presuppose the knowledge of the equations of the dynamical system.There are mainly two kind of methods to compute the estimated Lyapunov exponent from time-series data: the direct methods and the Jacobian indirect methods.Let us begin by focusing on the direct methods.

Direct method
The direct method was first proposed by Wolf et al. (1985), and then revisited by Rosenstein et al. (1993), and by Kantz (1994); Kantz and Schreiber (2004).The underlying algorithm is explained in detail in Kantz and Schreiber (1997).The three R packages already mentioned are based on ideas inspired by the time series analysis (TISEAN) project proposed by Hegger et al. (1999).All of them implement the algorithm proposed by Kantz using exclusively the direct method.The tseriesChaos package includes the function lyap.k, the nonlinearTseries package has a function called maxLyapunov, and lyapunovPlot is the function that belongs to the fNonlinear package.We will compare the results from these packages with those provide by the DChaos library later.Notice that the function lyapunovPlot has considered exactly the same code as the function lyap.k, which belongs to tseriesChaos.For this reason, we are going to consider only the algorithms included in the R packages tseriesChaos and nonlinearTseries.
The idea behind this direct approach can be described as follows.Given a time series data of length n, embedded in a m-dimensional space reconstructed with a time delay τ, the main aim is to compute a local average of the distances between every point x i in the embedding space and its neighbors for 1 ⩽ i ⩽ n − (m − 1) τ.That is, the evolution of the logarithm of this (local) mean distance L (∆) is monitored for a finite number ∆ of step ahead in time.The formula for this distance can be written as where T = n − (m − 1) τ is the number of points in the state space that are involved in the computation, #U (x i , ϵ) is the number of neighbors of each point that are closer than a distance equal to ϵ and have a temporal separation greater than a certain value.We compute the log-average distance between every point and its neighbours, and we follow it for ∆ time steps ahead.Thus, for every point x i , we have approximately a straight line which represents the evolution of the logarithm of the local mean distance.The average line of all these straight lines gives the evolution of the mean distance for the whole attractor.The slope of this average line gives us the estimated values of the Lyapunov exponent.
Note that the algorithm suggested by Kantz includes the one proposed by Rosentein, Collins, and De Luca as a special case (their number of neighbors are equal to 1).
Hence, the presence of a sharp linear region in the plot of the evolution of the logarithm of the mean distance between nearby points could be a strong signal of chaotic behavior.The main advantages of the direct methods are the following: (i) their low number of estimation parameters; (ii) easy implementation; (iii) they provide direct visual feedback to the R users whether the available time series data really exhibits exponential divergence on small scales; (iv) do not involve any kind of modeling; (v) do not require assumptions on the nature of the process.Despite this, there are certain drawbacks, as we pointed before: (i) it does not allow the estimation of the full spectrum of Lyapunov exponents; (ii) it is not robust to the presence of (small) noise because these estimators can produce a linear scaling region giving a positive estimate of the Lyapunov exponent even for non-chaotic systems; (iii) it does not have a satisfactory performance in detecting existing non-linearities on time series data of moderate sample sizes; (iv) the asymptotic distribution of the estimator cannot be derived, which means that it does not allow the building of formal tests.
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 For these reasons, we have focused on another alternative called the jacobian indirect methods.Let us move on to explain how we have implemented this approach.

Jacobian indirect method
The DChaos package is the first R library that provides the Jacobian indirect methods proposed by Eckmann and Ruelle (1985) for estimating the Lyapunov exponents.The procedure behind the Jacobian indirect method can be described as follows.We have assumed following Gençay and Dechert (1992) that there exist a function g : R m → R m such that x m t i = g x m t i −τ , where x m t i are the non-uniform delayed-coordinate embedding vectors which contains the uniform case taking t i − t i−1 = t s − t s−1 ∀i ̸ = s.Under the assumption that the embedding is a homeomorphism, the map g is topologically conjugate to the unknown dynamic system f in equation 2. This implies that certain dynamical properties of f and g are the same.In our case, the Lyapunov exponents of f and g should be the same, so we can focus on estimating the exponents from the map g.The dynamical system g may be expressed as a matrix by The Jacobian corresponding to the dynamical system g will be as follows: Notice that the estimation of the Lyapunov exponent by the jacobian indirect method is reduced to the estimation of the unknown nonlinear function υ : R m → R. The different approaches that compose the indirect method differ in the algorithm used for the estimation of this function υ in the Jacobian (eq.7).The main contributions focus on two different approaches.Firstly, those who use some kind of local linear regression, see Sano and Sawada (1985), Eckmann et al. (1986), Brown et al. (1991), or their extension in the form of local polynomial regression proposed by Lu and Smith (1997).Second, other approaches use nonlinear models based on neural networks, see McCaffrey et al. (1992), Nychka et al. (1992), Dechert and Gençay (1992), Whang and Linton (1999), Shintani andLinton (2003, 2004).In this sense, there are no analogous functions in the R packages already mentioned since they consider exclusively the direct method.In addition, those packages do not allow us to make inferences about the Lyapunov exponents and test if the estimated values of the Lyapunov exponents are or not statistically significant.
We have focused on the neural net approach, which is a global nonparametric method that tries to estimate the underlying dynamic system without imposing the restriction of local linearity.This approach has the following advantages over all other methods: (i) its robustness to the presence of (small) noise as the measurement errors present in most real-world, observed time series data; (ii) their satisfactory performance in detecting existing non-linearities on time series data of moderate sample sizes; (iii) allows the estimation of the full spectrum of Lyapunov exponents; (iv) the asymptotic distribution of the estimator can be derived, allowing the building of formal tests.

Neural network approach
The neural network (or neural net) estimator of the Lyapunov exponent was first proposed by Mc-Caffrey et al. (1992) and Nychka et al. (1992) and then revisited by Gençay and Dechert (1992) and by Shintani andLinton (2003, 2004).Note that in our case, the main reason for using neural network models is not to look for the best predictive model but to estimate a model that captures the non-linear time dependence well enough and, additionally, allows us to obtain in an analytical way (instead of numerical) the jacobian functional of the unknown data-generating process (eq.7).
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 The estimation of this Jacobian will allow us to contrast the hypothesis of chaos (eq.1) using equation 2. Hornik et al. (1989) showed that any standard feed-forward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite-dimensional space to another for any desired degree of accuracy, providing sufficiently many hidden units.In this sense, the feed-forward networks are a class of universal approximations.Theoretically, neural nets are expected to perform better than other approximation methods, especially with high-dimensional models, since the approximation form is not so sensitive to the increasing dimension.The results proposed by the authors mentioned above have enabled us to consider a neural network with just one single hidden layer.The number of hidden units (or neurons) in the single hidden layer is determined by statistical methods based on model selection criteria, as it appears in the results of the paper.Let us show how we have obtained a consistent neural net estimator based on the robust estimation of the function υ in the Jacobian (eq.7).Note that if we consider the m-dimensional reconstruction vector as defined by eq.6, the neural network estimator can be obtained by approximating the unknown non-linear function υ through a feed-forward single hidden layer network with a single output by where Φ 0 ∈ I, α0 is the estimated network bias from input, h is the number of neurones (or nodes) in the single hidden layer, ωq0 are the estimated layers connection weights from input to hidden layer, Φ q is the transfer function, which in our case is the logistic function, αq is the estimated network bias from hidden layer, m is the embedding dimension, and ωjq are the estimated layers connection weights from hidden layer to output.The issue of parameter estimation is reduced to a least squares problem in which the quantity to be minimized is defined by Note that in our case, the main reason for using neural network models is not to look for the best predictive model but to estimate a model that captures the non-linear time dependence well enough and, additionally, allows us to obtain in an analytical way (instead of numerical) the jacobian functional of the unknown underlying generator system (eq.7).The estimation of this jacobian or partial derivatives will later allow us to contrast our hypothesis of chaos using equation 2. We have obtained the partial derivates of the Jacobian in eq.7, applying the chain rule to eq.8 as ∂ υ where and the estimated partial derivates are given by The DChaos package provides the function netfit whose arguments are serie, m, lag, timelapse, h, w0maxit, wtsmaxit, pre.white, trace, seed.t,seed, to fit the unknown non-linear function υ in eq.8 through a feed-forward single hidden layer network.See R documentation for the interpretation of this parameter set.We have considered the nnet function that belongs to the nnet package to fit the neural net models.Notice that the process of adjustment to a neural network often suffers from being trapped in local optima, and different initialization strategies should be taken into account.
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 For this reason, we have implemented the function w0.net whose arguments are x, y, m, h, rangx, w0maxit, seed.t,seed.This function estimates previously the initial parameter vector of the neural network being able to set the maximum number of iterations that we want to obtain, setting w0maxit.In addition, by default, the neural network estimation is initialized with a fixed seed denoted by seed.t=TRUE with a value equal to seed=56666459.The R users can let the seed be fixed either randomly by seed.t=FALSE or even fix other values of the seed to be able to replicate the results obtained.
Best-fitted neural network models.We are going to illustrate an example considering the time series data previously simulated from the logistic map.The netfit function returns several objects.The best-fitted feed-forward single hidden layer neural net model is saved.It also contains some useful information about the best set of weights found, the fitted values, the residuals obtained, or the best embedding parameters set chosen.The best 10 models are displayed on the console as we show below for m ∈ {1 : 4}, lag ∈ {1 : 3} and h ∈ {2 : 10}.The first column is the neural net number, the second column is the embedding dimension, the third column is the lag or reconstruction delay considered, the fourth column is the number of neurons (or nodes) in the single hidden layer, and the fifth column is the Bayesian Information Criterion (BIC) value corresponding to that neural net.Notice that the neural net models are sorted from lowest to highest BIC values.In this case, the best-fitted neural net model has the following parameter set values m=1, lag=1, and h=9.The BIC criterion is defined following Shintani and Linton (2003) by where RSS is the residual sum of squares, n is the number of observations, m is the embedding dimension, and h is the number of neurones (or nodes) used in the single hidden layer as noted above.

Lyapunov exponent estimation and inference
In this section, we provide the right procedure to obtain a consistent estimator of the Lyapunov exponent from the partial derivates estimated following eq.10.In addition, we are going to illustrate how to test the hypothesis of chaos (eq.1) based on the theoretical asymptotic properties of the neural net estimator.The kth Lyapunov exponent is given by where µ k is the kth largest eigenvalue provided by Dg M = Dg (x t M ) • Dg (x t M−1 ) • . . .• Dg (x t 1 ) for k = 1, 2, 3, . . ., m. Dg () are the partial derivates estimated above following eq.10.Note that it is necessary to distinguish between the sample size n used for estimating the partial derivatives of the Jacobian in eq.9 and the block length M, defined in eq.11, which is the number of evaluation points (number of products of the Jacobian) used for estimating the kth Lyapunov exponent.Since the number of evaluation points is less than or equal to n, M can be also understood as the sample size of a subsample.We have taken into account in our algorithms both the full sample and three different methods of subsampling by blocks, as we show below in table 2. The first column shows the blocking methods considered.The second, third, and fourth column give the sample size, the block length, and the block number of each subsampling method, respectively.The fifth column provides the way in which the algorithm picks the position of each element inside the block where each block B corresponds to one row.The bootstrap blocking method takes random samples without replacement by each block.The asymptotic properties of the non parametric neural network estimator of the Lyapunov exponent λk was derived by Shintani and Linton Shintani and Linton (2004).They provided a statistical framework for testing the hypothesis of chaos (eq.1) based on the neural net estimator of the Lyapunov exponent and the consistent estimator of its variance.Their results showed the asymptotic normality of the Lyapunov exponent estimator by where φ k is the variance of the kth Lyapunov exponent estimator (eq.2).They proved that φk is a consistent variance estimator of φ k .It can be defined as follows: where M is the subsampling size, that is equal to n for the full sample.The quadratic spectral kernel function η k,t is given by η k,t = ξ k,t − λk following Shintani and Linton (2004).The parameter ξ k,t is obtained by The R Journal Vol.13/1, June 2021 ISSN 2073-4859 where t = 1, 2, . . ., M.Then, a feasible test statistics were introduced, and a one-sided test was proposed for the purpose of testing the hypothesis of chaos (eq.1) based on the theoretical asymptotic properties of the neural net estimator.That is, we would know if the estimated Lyapunov exponent values are or not statistically significant.Hence, our interest will be to test the null hypothesis H 0 : λk > 0 against the alternative H 1 : λk ⩽ 0. The test statistics can be defined as follows: Hence, we will reject the null hypothesis if tk ⩽ −z α , where z α is the critical value that satisfies Pr [Z ⩾ z α ] = α with Z being a standard normal random variable and α is the significance level.Under the null hypothesis H 0 that the data-generating process is chaotic, the neural net estimator λk leads to asymptotically valid inferences in that the associated p-value follows a normal distribution on N (0, φk ).Rejecting the null hypothesis H 0 : λk > 0 means lack of chaotic behavior.Thus, we have used these results to calculate the standard error of the Lyapunov exponent estimator and investigate the statistical significance of the sign of the exponents.
Lyapunov exponent estimator by jacobian indirect method using a neural net approach.The DChaos package provides several ways to figure out robustly the neural net estimator of the kth Lyapunov exponent.On the one hand, if the R users have previously obtained the partial derivatives from the jacobian.netfunction, they can apply directly the function lyapunov.spec,which estimates the Lyapunov exponent spectrum taking into account the QR decomposition procedure.They can also use the function lyapunov.max,which estimates only the largest Lyapunov exponent considering the Norma-2 procedure.The arguments of both functions are the same (data, blocking, B, doplot).See R documentation for the interpretation of this parameter set.Hence, the DChaos package allows the R users to choose between two different procedures to obtain the neural net estimator of the kth Lyapunov exponent and four ways of subsampling by blocks: full sample, non-overlapping sample, equally spaced sample, and bootstrap sample; for a review see table 2.
Note that the DChaos package provides 8 internal functions (one for each procedure and blocking method), which estimate the Lyapunov exponents consistently (eq.11).These functions return several objects considering the parameter set selected by the user.The largest Lyapunov exponent (lyapunov.max)or the Lyapunov exponent spectrum (lyapunov.spec)by each blocking method are estimated.They also contain some useful information about the estimated jacobian, the best-fitted feed-forward single hidden layer neural net model, the best set of weights found, the fitted values, the residuals obtained, the best embedding parameters set chosen, the sample size, or the block length considered by each blocking method.These functions provide the standard error, the z test value, and the p-value for testing the hypothesis of chaos (eq.1).Rejecting the null hypothesis H 0 means lack of chaotic behavior.That is, the data-generating process does not have a chaotic attractor because it does not show the property of sensitivity to initial conditions.The blocking methods split the time-series data into several blocks to estimate a Lyapunov exponent for each subsample.The R users can choose the non-overlapping sample (blocking = NOVER), the equally spaced sample (blocking = EQS), or the bootstrap sample (blocking = BOOT).The mean and median values of the Lyapunov exponent for each block or subsample are saved.By default, we recommend using the median value as it is more robust to the presence of outliers.Notice that parameter B (a non-negative integer denoting the number of bootstrap iterations) will only be considered if the R users choose the bootstrap blocking method.Now, we are going to compare the results obtained from the lyapunov.maxand lyapunov.specfunctions for estimating the largest Lyapunov exponent and the Lyapunov exponent spectrum respectively.We have considered the dataset simulated from the logistic map again, the best-fitted neural net model estimated previously (neural net #6) which the best embedding parameters set chosen is m=3, lag=1, h=7, and the partial derivatives are dx1, dx2, dx3.We show below the estimation of the largest Lyapunov exponent provided by the Norma-2 procedure and the Lyapunov exponent spectrum taking into account the results obtained by the QR decomposition procedure (bootstrap blocking method).The results provided by both methods are very similar between them (0.6943782 and 0.6942063 respectively), but thelyapunov.specfunction also allows the estimation of the full spectrum of the Lyapunov exponents.The estimated values by both methods are certainly close to the theoretical value (0.6931472), and they are statistically significant at the 99% confidence level.This fact evidences the consistency of the proposed algorithms.On the other hand, the DChaos package provides an all-in-one function called lyapunov.We have considered it appropriate to incorporate a function that unifies the whole process to make it easier and more intuitive for the R users.The lyapunov() function has the following usage: lyapunov (data, m, lag, timelapse, h, w0maxit, wtsmaxit, pre.white, lyapmethod, blocking, B, trace, seed.t, seed, doplot) This function has fifteen main arguments where only the first one is mandatory.These are: data: a vector, a time-series object ts or xts, a data.frame, a data.table,or a matrix depending on the method selected in timelapse.If FIXED has been selected, data must be a vector or a time-series object ts or xts.Otherwise, VARIABLE has to be specified.In this case, data must be a data.frame, a data.table,or a matrix.m: a non-negative integer denoting a lower and upper bound for the embedding dimension (Default 1:4).
lag: a non-negative integer denoting a lower and upper bound for the the reconstruction delay (Default 1:1).
timelapse: a character denoting if the time-series data are sampled at uniform time-frequency e.g., 1-month, 1-day, 1-hour, 30-min, 5-min, 1-min, and so on, FIXED or non-uniform timefrequency, which are not equally spaced in time VARIABLE (Default FIXED).
h: a non-negative integer denoting a lower and upper bound for the number of neurons (or nodes) in the single hidden layer (Default 2:10).
w0maxit: a non-negative integer denoting the maximum iterations to estimate the initial parameter vector of the neural net models (Default 100).
The R Journal Vol.13/1, June 2021 ISSN 2073-4859 wtsmaxit: a non-negative integer denoting the maximum iterations to estimate the weights parameter vector of the neural net models (Default 1e6).
pre.white: a logical value denoting if the user wants to use as points to evaluate the partial derivatives the delayed vectors filtered by the neural net model chosen TRUE or not FALSE (Default TRUE).
lyapmethod: a character denoting the procedure chosen to estimate the Lyapunov exponent.If LLE has been selected, the function will estimate only the largest Lyapunov exponent through the Norma-2 method.If SLE has been selected, the function will estimate the Lyapunov exponent spectrum through the QR decomposition.Otherwise, ALL has to be specified.In this case, the function will estimate the Lyapunov exponent considering both procedures (Default SLE).
blocking: a character denoting the blocking method chosen for figuring out the Lyapunov exponent.Available options are FULL if the user considers the full sample, NOVER if the user considers the non-overlapping sample, EQS if the user considers the equally spaced sample, BOOT if the user considers the bootstrap sample, or ALL if the user considers all of them (Default BOOT).
B: a non-negative integer denoting the number of bootstrap iterations (Default 1000).
trace: a binary value denoting if the user wants to print the output on the console 1 or not 0 (Default 1).
seed.t: a logical value denoting if the user wants to fix the seed TRUE or not FALSE (Default TRUE).
seed: a non-negative integer denoting the value of the seed selected if seed.t = TRUE (Default 56666459).
doplot: a logical value denoting if the user wants to draw a plot TRUE or not FALSE.If it is TRUE, the evolution of the Lyapunov exponent values are represented for the whole period considering the different procedures and blocking methods chosen by the user.The default value is TRUE.
The lyapunov() function provides (at the same time) the delayed-coordinate embedding vector from time-series data (eq.3),estimates the best-fitted neural net model from the delayed-coordinate embedding vectors (eq.8), calculates analytically the partial derivatives from the chosen neural nets model (eq.10).Finally, the neural net estimator of the kth Lyapunov exponent (eq.11) is obtained from the partial derivatives computed previously.This function returns several objects considering the parameter set selected by the user.The largest Lyapunov exponent (lyapmethod = LLE), the Lyapunov exponent spectrum (lyapmethod = SLE), or both (lyapmethod = ALL) by each blocking method are estimated.They also contain some useful information about the estimated Jacobian, the best-fitted feed-forward single hidden layer neural net model, the best set of weights found, the fitted values, the residuals obtained, the best embedding parameters set chosen, the sample size, or the block length considered by each blocking method.This function also provides the standard error, the z test value, and the p-value for testing the hypothesis of chaos (eq.1).As we said before, rejecting the null hypothesis H 0 means lack of chaotic behavior.That is, the data-generating process does not have chaotic attractor because it does not show the property of sensitivity to initial conditions.Now, we are going to show an example of this all-in-one function.We provide the results obtained from the lyapunov function considering the dataset simulated from the logistic map again for the following parameter set values m=3:3, lag=1:1, timelapse=FIXED, h=2:10, w0maxit=100, wtsmaxit=1e6, trace=1, seed.t=TRUE,seed=56666459, and doplot=FALSE.In this case, we have estimated the Lyapunov exponent spectrum taking into account the results provided by the QR decomposition procedure (lyapmethod=SLE) and all blocking methods (blocking=ALL).Note that the delayed-coordinate embedding vector, the best-fitted neural net model (with a lower BIC value), and the partial derivatives are internally calculated according to the parameter set chosen (all-in-one way).The DChaos package provides the summary method for class "lyapunov" called summary.lyapunov as we show below.
Let us point out briefly some useful information to understand the results presented in this example.For the estimation based on blocking methods, median values of all used blocks are given.
For the block length M, the DChaos package uses M = int c × (n/ log n) 1/6 with c = 36.2where int [A] signifies the integer part of A following Shintani and Linton (2004).The number of blocks B depends on the sample size n of each time series data; for a review, see table 2. QS Kernel with optimal bandwidth has been used for the heteroskedasticity and autocorrelation consistent covariance matrix estimation following Andrews (1991).In this example, the results provided by the three blocking methods improve the case of the full sample being very similar between them.This result is consistent with the recommendation proposed by Shintani and Linton (2004) on the use of blocking methods instead of full sampling when estimating the Lyapunov exponent.

Analysis of chaotic time-series data adding a noise
Firstly, let us illustrate how adding just a (small) measurement noise to a well-known deterministic dynamic system as the logistic map.This fact increases the dispersion of the huge amount of points that describe the attractor with the consequent inaccuracy; see figure 1.We have added to each time-series data a normal multinomial error term denoted by ε t ∼ N (0, s) with different variance values s.The R code used to obtain the graphs shown below is the following.We have used the dataset simulated previously from the logistic map with chaotic behavior.
The commands install.packages("DChaos")and library(DChaos) will download, install, and load the DChaos package so it can be used.The command set.seed(34) also will set the seed for reproducibility.To save CPU time, we have set the embedding dimension 1 ⩽ m ⩽ 7, the time delay τ = 1, the number of nodes in the single hidden layer 2 ⩽ h ⩽ 10, and the length of all time series data The R Journal Vol.13/1, June 2021 ISSN 2073-4859  4.
Note that the neural net models are sorted from lowest to highest BIC values.Then each best-fitted neural net model (with a lower BIC value) is considered to estimate the 24000 largest Lyapunov exponents.The mean square error (MSE) is calculated between the theoretical value and the value estimated by several direct and indirect methods.To do that, we have used the Monte Carlo method.We have done 1000 repetitions by different initial conditions when simulating the time-series data from the four dynamic systems and six measurement noise levels considered.
The MSE values based on the estimation of the largest Lyapunov exponent from the direct methods provided by the tseriesChaos and nonlinearTseries packages are denoted by D1 and D2, respectively.Those obtained by the Jacobian indirect methods through the DChaos library are denoted by N2 (lyapunov.max)and QR (lyapunov.spec)regarding the Norma-2 and QR decomposition procedures, respectively.The results shown in table 4 provide the following comments.
First, we can remark that our algorithms are robust to the presence of (small) measurement errors because the results obtained are comparable to those which are noise-free.Although as the noise increases, the error committed increases, but it is not proportional in any case.Second, the indirect methods provide better estimates than direct methods in all the experiments we have conducted.The algorithms proposed by the tseriesChaos package behave better than those of the nonlinearTseries library.Between the two methods available in the DChaos package, we do not observe significant differences.

Dynamic system Equations
Parameters  Third, the direct methods are surely less flexible and robust but much faster and still informative.The Jacobian indirect methods based on the neural net approach seems to perform well for every noisy time series data.The price we have to pay is a greater computational complexity from two points of view, the computing time and the tuning parameters (node weights).Fourth, we have only focused on the largest Lyapunov exponent as direct methods do not estimate the full spectrum.In addition, those methods do not allow us to make inferences about it.In our case, we will be able to do so.Hence, we are going to focus on the reliability of the proposed methods.We want to know the power and size of our algorithms.
Finally, we will focus on testing the reliability of the algorithms provided by the DChaos package.For this purpose, we have calculated the so-called size and power of our hypothesis test.Due to the test is based on probabilities, there is always a chance of making an incorrect conclusion.When one does a hypothesis test, two types of errors are possible.As a reminder, when the null hypothesis is true, and we reject it, we make a type I error.The probability of making a type I error is denoted by α, which is the significance level that we set for our hypothesis test.An α of 0.05 indicates that we are willing to accept a 5% chance that we are wrong when we reject the null hypothesis.To lower this risk, we must use a lower value for α.However, using a lower value for alpha means that we will be less likely to detect a true difference if one really exists.The probability of not rejecting the null hypothesis when it is true (not committing a type I error) is called the size of the test.When the null hypothesis is false, and we fail to reject it, we make a type II error.The probability of making a type II error is called beta, and is often denoted by β.We can decrease our risk of committing a type II error by ensuring our test has enough power.We can do this by ensuring our sample size is large enough to detect a practical difference when one truly exists.The probability of rejecting the null hypothesis when it is false (not committing a type II error) is called the power of the test.
In order to understand the interrelationship between the type error I and II, in this case, consider the following.As we pointed out before, feasible test statistics were introduced, and a one-sided test was proposed for the purpose of testing the hypothesis of chaos (eq.1) based on the theoretical asymptotic properties of the neural net estimator.Under the null hypothesis H 0 that the data-generating process is chaotic, the neural net estimator λk leads to asymptotically valid inferences in that the associated pvalue follows a normal distribution on N (0, φk ).Hence, our interest will be to test the null hypothesis H 0 : λk > 0 against the alternative H 1 : λk ⩽ 0. We will reject the null hypothesis if tk ⩽ −z α , where z α is the critical value that satisfies Pr [Z ⩾ z α ] = α with Z being a standard normal random variable and α is the significance level.Rejecting the null hypothesis H 0 : λk > 0 means lack of chaotic behavior.That is, the data-generating process does not have a chaotic attractor because it does not show the property of sensitivity to initial conditions.Thus, we have used these results to calculate the standard error of the Lyapunov exponent estimator and investigate the statistical significance of the sign of the exponents in order to test the reliability of the proposed algorithms.Table 5: Rejection percentages at the 5% significance level from the dynamic systems considered with a chaotic behaviour.These results provide the size of our hypothesis test.
Note that the results shown in tables 5-6 are given in terms of rejection percentages of the tests at the 5% significance level over 1000 Monte Carlo replications.The rejection percentages give an indication of the size of the tests (tab.5) and the power of the tests (tab.6).We have chosen n = 50, 100, 200 as we want to check the reliability of the algorithms in small sample sizes.The four dynamic systems used are listed in tab.3.Table 5 provides the results from chaotic dynamic systems which have the following parameter set values: µ = 4 (logistic map); α = 6.2, β = −0.5 (Gauss map); a = 1.4,b = 0.3 (Hénon system), and a = 0.2, b = 0.2, c = 5.7 (Rössler system).Table 6 gives the results from nonchaotic dynamic systems which have the following parameter set values: µ = 3.2 (logistic map); α = 4.9, β = −0.58(Gauss map); a = 1.2, b = 0.1 (Hénon system), and a = 0.1, b = 0.1, c = 7 (Rössler system).
The data given in tables 5-6 provide the following comments.First, we can point out that the reliability of the tests is solid at the 5% significance level.The rejection percentages in almost every situation, even for n = 50, are low when the null hypothesis H 0 is true and high when H 0 is false.Second, the results provide satisfactory performance in moderate sample sizes.This fact is really important for those researchers who work with short time series data.Third, the empirical size decrease and the empirical power increase as the sample size increase which means that our tests are consistent as well.Fourth, the noise-contaminated data are comparable to those which are noise-free.Although as the noise increases, the errors committed increase but not significantly.Fifth, the results shown are fairly robust with respect to the parameter values of the four dynamic systems considered.

Conclusion
The main feature of chaos is the well-known initial-value sensitivity property.The study of measures to quantify the initial-value sensitive property began with the introduction of Lyapunov exponents.So quantifying chaos through this kind of quantitative measure is a key point for understanding chaotic behavior.This paper describes the main statistical estimation methods of the Lyapunov exponent from time series data.At the same time, we present the DChaos library.This package provides an R interface for chaotic time series data analysis.R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.
R users may compute the delayed-coordinate embedding vectors from time series data, estimates the best neural net model, calculates the partial derivatives directly from the chosen neural network model.Finally, they can estimate both the largest Lyapunov exponent through the Norma-2 procedure and the Lyapunov exponent spectrum through the QR decomposition procedure taking into account the full sample and three different methods of subsampling by blocks.The results provided by Shintani and Linton (2004) have enabled us to obtain a consistent Lyapunov exponent estimator and test the chaotic hypothesis based on the theoretical asymptotic properties of our neural net estimators.
Finally, looking to the future, our focus will be on extending the functionality of this package.Let us remark that the different contributions that compose the Jacobian indirect methods differ in Table 6: Rejection percentages at the 5% significance level from the dynamic systems considered with a non-chaotic behaviour.These results provide the power of our hypothesis test.
the algorithm used for the estimation of the Jacobian.So far, we have focused on the neural net approach, which tries to estimate the underlying dynamic system without imposing the restriction of local linearity.We are working on three more Jacobian indirect methods that try to estimate the Jacobian by local polynomial kernel regressions, by local neural nets models, and by convolutional neural nets models.We are interested in developing the proposed algorithms from multivariate time series data.We are also focusing on how to apply parallelization and big data techniques to the design of the proposed algorithms in order to obtain computational efficiency when applying them to massive databases.We will keep providing useful tools and robust algorithms related to the analysis of chaotic time-series data.

The
the largest Lyapunov exponent by the Norma-2 procedure considering the ## bootstrap blocking method from the best-fitted neural net model and the partial ## derivatives showed in the jacobian.netexample.exponent <-DChaos::lyapunov.max(data=jacobian,blocking="BOOT", doplot=FALSE) ## Provides summary statistics of the results given in an object of class lyapunov summary-2 by bootstrap blocking method Embedding dimension: 3, Time-delay: 1, No. hidden units: 7 Sample size: 997, Block length: 82, No. blocks: 1000 ## Provides the Lyapunov exponent spectrum by the QR decomposition procedure considering ## the bootstrap blocking method from the best-fitted neural net model and the partial ## derivatives showed in the jacobian.netexample as well.exponent <-DChaos::lyapunov.spec(data=jacobian,blocking="BOOT", doplot=FALSE) ## Provides summary statistics of the results given in an object of class lyapunov summary(decomposition by bootstrap blocking method Embedding dimension: 3, Time-delay: 1, No. hidden units: 7 Sample size: 997, Block length: 82, No. blocks: 1000

Figure 1 :
Figure 1: Logistic map attractor adding a measurement noise with several variance values

Table 1 :
A summary of the functions available in the DChaos package Non-uniform delayed-coordinate embedding vectors.We have used a data set called sbux, which belongs to the highfrequency package.It contains the bid quote price data for Starbucks company.

Table 2 :
Blocking methods for figuring out the neural network estimator of the kth Lyapunov exponent by Jacobian indirect methods.

Table 3 :
Theoretical Lyapunov exponents (λ th ) from time-series data available on DChaos package.

Table 4 :
The mean square error (MSE) values based on the estimation of the largest Lyapunov exponent from direct methods provided by the tseriesChaos (D1) and nonlinearTseries (D2) packages are showed.Also, those obtained by the Jacobian indirect methods through the DChaos library are presented (N2 for lyapunov.maxand QR for lyapunov.spec).