Unidimensional and Multidimensional Methods for Recurrence Quantification Analysis with crqa

Recurrence quantification analysis is a widely used method for characterizing patterns in time series. This article presents a comprehensive survey for conducting a wide range of recurrence-based analyses to quantify the dynamical structure of single and multivariate time series, and to capture coupling properties underlying leader-follower relationships. The basics of recurrence quantification analysis (RQA) and all its variants are formally introduced step-by-step from the simplest auto-recurrence to the most advanced multivariate case. Importantly, we show how such RQA methods can be deployed under a single computational framework in R using a substantially renewed version our crqa 2.0 package. This package includes implementations of several recent advances in recurrence-based analysis, among them applications to multivariate data, and improved entropy calculations for categorical data. We show concrete applications of our package to example data, together with a detailed description of its functions and some guidelines on their usage.


Introduction
In the current article, we present the updated 2.0 version of the R package crqa to perform many variants of recurrence-based analyses [1] including some very recent developments for the treatment of multivariate and categorical data. Recurrence-based techniques allow the quantification of temporal structure and generalized autocorrelation properties of individual time series [2,3], the quantification of bivariate relationships and coupling between two time series [4,5], as well as the quantification of multidimensional dynamics of multivariate time series [6,7]. Recurrence-based techniques originate from the description and analysis of dynamical systems [8,5], and have been widely applied to data from physics [9,10,11,12,13], physiology [14,15,16,17,18,19], and psychology [20,21,22,23,24,25], to name a few fields.
The real success of recurrence-based analyses has revolved around their power of capturing the dynamics of complex and non-stationary time series data, and of time series exhibiting qualitatively different patterns along their temporal evolution [8]. This is because recurrence-based analyses are modelfree techniques that make few assumptions and hence are well suited for the analysis of complex systems. Moreover, recurrence-based analyses are versatile and can be applied to interval-scale data as well as nominal data, continuously sampled data and inter-event data alike [26,4].
The new version of the crqa package features the integration of major developments in recurrence analysis, such as its extension to multidimensional data, as well as a key simplification of its design and a marked improvement of the underlying computational procedures. It includes useful new functions, including a tool for mining the parameter settings for continuous data, and piece-wise computation of recurrence plots to mitigate computational cost of long time series.
The remainder of this article is divided into two broad sections. In the first section, we provide a concise introduction to the core concepts of recurrence analysis from the simplest case of auto-recurrence of a unidimensional time series, to the most complex case of multidimensional cross-recurrence, which is now integrated in the new version of the crqa package. In the second section, we showcase example applications of the different analysis methods using empirical data, hence providing a hands-on tutorial for how to use the different functions of the package.

Methodological background
In section 2.1, we briefly introduce the framework of recurrence quantification analysis (RQA hereafter) with the simplest case of a unidimensional time series. Then, in section 2.2, we discuss how RQA can be applied to two different unidimensional times series. Finally, in section 2.3 and 2.4 we explain how RQA methods can be extended to multidimensional data.

Recurrence quantification analysis (RQA)
The concept of recurrence is at the heart of all recurrence-based analyses [5,27], which mostly apply to time series or sequenced data (but see [28]). As we will see, recurrences are often defined in terms of phase space coordinates, not directly in terms of the values of a single time series, but the concept is easily demonstrated using this case-a single time series or sequence-as a starting point. Loosely speaking, a recurrence is the repetition of a value in a sequence of data points. More precisely, recurrence in a time series x, with n data points x 1 , x 2 , . . . , x n is defined as: The above equation defines the elements R ij of the recurrence matrix R in terms of identical repetitions. Such a definition is useful for a nominal sequence, where there are categorical elements that are either identical or not, and so no meaningful 'distance' norm among categories can be defined (see section 3.3.1 for an application to text data). In order to define recurrences for continuous data, we need to establish a threshold parameter (or radius parameter) ε, which provides the width of a tolerance band in the chosen distance norm within which similar, but not identical values in a time series are counted as recurrent: Setting a threshold is necessary in most cases for empirical data, because such data feature intrinsic fluctuations as well as measurement error [8]. By using recurrences as defined above, we can convert any unidimensional time series x into a recurrence plot (RP), which is a two-dimensional portrait of its dynamics expressed through its recurrence characteristics. Figure 1 shows some examples of time series of various complexity (i.e., a sinusoidal, a chaotic attractor and white noise) and their associated RPs, given some value for the threshold parameter ε.
If a time series x constitutes the one-dimensional measurement of an underlying multidimensional system, and the dynamics of the underlying dimensions are co-dependent, then these underlying dimensions can be recovered via the method of time-delayed embedding from the unidimensional Figure 1: The three time series in the first row (A) -a periodical sine wave, one of the dimensions of the chaotic Lorenz attractor and a white noise signal -were subjected to recurrence analysis without dimensional embedding, that is the recurrence plots depict recurrences based on the values of the one-dimensional time series (B). The third row (C) shows their associated recurrence plots with dimensional embedding (the sine wave was embedded in 2 dimensions, the Lorenz attractor and the white noise signal in 3 dimensions).
time series [29,30]. In these cases, the time series x is delayed (or lagged) by a certain number of data points, τ , and the number of times such delays are applied to x depend on an embedding dimension parameter m. The time-shifted copies of x, x τ , x 2τ ,. . . ,x (m−1)τ , can now be integrated into a single m-dimensional phase space, which shows the recovered multidimensional dynamics behind the measured unidimensional time series (Figure 2).
If the time series data comes from a multidimensional system, embedding the data into a higher dimensional phase space before the computation of an RP will improve the quantification of the dynamics of the systems from which that time series was recorded [8]. Now, with embedded data, recurrences are defined not in terms of the individual values of the original, unidimensional time series x, but in terms of coordinates in m-dimensional phase space. It is possible to construct a total of N = n − (m − 1)τ points in phase space, with the following coordinates: . . .
In terms of points in the m-dimensional phase space, the elements of the recurrence matrix are then given by where || · || is a distance norm in the m-dimensional phase space and ε is the threshold used to determine whether the points are close enough in phase space to be considered recurring or not. Examples of embedding and the resulting RPs are shown in Figure 1, lower panel, and in Figure 2, panel D. Note, however, that in the process of phase space reconstruction, the number of coordinates available in phase space is less than the number of data points in the original time series, the difference being equal to n − N = τ (m − 1). Moreover, the resulting phase space portraits do not exactly reflect the true underlying multidimensional dynamics, but are isomorphic to them [31]. For continuous time series (i.e., not categorical), the delay τ , the embedding dimension m and the radius ε are usually unknown, and have to be estimated After selecting an appropriate value of the threshold (or radius) parameter (here ε = 0.25) recurrence analysis can be applied to the trajectory in the two-dimensional phase space (A), and a recurrence plot is generated (D). In the recurrence plot (D), the recurrence points around the coordinates of (X 9 ;X 9+τ ) are highlighted in red color.
from the data. τ can be obtained by examining the average mutual information function (AMI) of x [32], and the false-nearest-neighbor function (FNN) of x, given some value for τ , can in turn be used to determine m [33]. The radius ε is then chosen to achieve a desired proportion of recurrence points that is expected given the type of data at hand ( [34]; see [7] for practical information on parameter estimation). The crqa package implements functions to carry out parameter estimation semi-automatically.
RPs are a very useful visualisation to qualitatively explore the dynamics of a time series. However, their main advantage is that they provide the basis to quantify the dynamics of a time series based on the patterns of recurrence points found in the plot [3]. There are various measures that can be computed from RPs. Here, we briefly describe a selection of measures, including the most common ones, that are implemented in the new version of the crqa package.
The most basic measure is the recurrence rate (RR) or percentage recurrence, which is defined as the sum of all recurrence points in an RP divided by the area of that RP. RR provides a measure for how many individual values of a time series-or its phase space coordinates-recur over time: All other measures characterise dynamics by exploiting the patterns of recurrences along the vertical and diagonal structures of the RP (see Table 1 for a concise summary of the measures). In particular, measures based on diagonal-line structures reflect repetitions of the trajectories of the time series, whereas measures based on vertical-line structures focus on the states during which a time series slows down its dynamics. The entropy of the time series can also be computed on the basis of diagonal and vertical line structures of the RP. In version 2.0 of the crqa package, we include a novel entropy measure, which is based on the distribution of the areas of the rectangular structures in an RP generated from categorical time series. This measure provides a more accurate estimation of entropy over the classic diagonal-line entropy for categorical time series that predominantly evolve in terms of changes of states [35].

Cross-recurrence quantification analysis (CRQA)
Until now, we focused on quantifying the dynamics of a system by way of recurrences of a single time series. But, the concept of recurrence can be extended to that of cross-recurrence, which extends the univariate recurrence analysis to a bivariate analysis technique that allows quantification of the temporal coupling properties or similarity of two time series [4,8]. In other words, cross-recurrence is the recurrence of a value in a time series x, with data points x 1 , x 2 , . . . , x n with the values of a time series y, with data points y 1 , y 2 , . . . , y n . Formally: As for Equation 2, a threshold parameter ε is applied to identify similar, but not necessarily identical, values that are recurrent across the two time series. As in the univariate case, this parameter can be set to values closer to 0, forcing cross-recurrences to be identical values, such as between two nominal (categorical) sequences. Also in the case of cross-recurrence analysis, the two time series x and y can be embedded before computing the cross-recurrence plot (CRP): Commonly it is expected that x and y have the same number of data points, and that the delay and embedding dimension parameters, τ and m, have to be the same too (see [7,36] for practical aspects of parameter estimation) 1 . The recurrence measures obtained from cross-recurrence quantification analysis are calculated in the same way as for the univariate recurrence quantification analysis (see Table 1). However, the values for cross-recurrence now reflect the coupled dynamics of the two time series [37], rather than the dynamics of an individual time series in univariate RQA. There is a key difference between RPs and CRPs. RPs always have recurrence points all along the main diagonal line of the plot (so-called line of identity, LOI), because a time series is by definition recurrent with itself at lag 0, which is what recurrences along the main diagonal reflect (i.e., x i = x j when i = j). This is not necessarily the case for CRPs as two time series are not necessarily synchronized (x i need not be the same as y j when i = j). If the time series are not synchronized, cross-recurrences around the main diagonal are absent or sparse. The presence of a full LOI in a CRP implies that the dynamics of the two time series are identical, or that they have a strict linear relationship to each other.

Multidimensional recurrence quantification analysis (MdRQA)
Multidimensional recurrence quantification analysis (MdRQA) is one of the recent extensions of RQA [6] that is now also available in the crqa package. MdRQA allows the analysis of multidimensional time series z with samples z 1 , z 2 , . . . , z n , where each point (sample) in z has d dimensions: Here, recurrences are defined on the d-dimensional coordinate space made up of the points of z: Like their unidimensional counterparts, multidimensional time series can be embedded into a higher dimensional space. The logic of estimating the delay and embedding dimension parameters, τ and m, are the same as with univariate RQA [38,36]. However, if one has multivariate time series and wants to estimate embedding parameters for those time series, one should use the multivariate embedding functions which are also provided with the new version of the crqa package, because they provide superior estimates of embedding parameters for multidimensional time series [7].
Depending on the underlying data, MdRQA can, in principle, be used for two different purposes. On the one hand, MdRQA can be used to quantify a multidimensional construct, such as physiological arousal, by simultaneously looking at its different measurable dimensions (e.g., heart rate, breathing and body temperature). This would be the multivariate version of how univariate RQA quantifies the dynamics of a single unidimensional time series (e.g., breathing alone). On the other hand, MdRQA can be used to examine the shared dynamics of multiple individual time series, such as, for example, three electrodermal signals measured from three members of a team performing a collaborative task. Here, MdRQA variables would be interpreted as capturing higher-order inter-correlative properties between the three signals at the level of the group [6].
In either case, one has to make a decision about whether to normalize the different dimensions of the time series or not. If each dimension of the multidimensional time series is normalized, for example z-scored, it would effectively give each time series equal weight for the definition of recurrence. In particular, if one does not know how the different time series interact, or if they are measured on different scales without regard to their potential effects on one another, then normalization is recommended, otherwise the risk is to assign a greater weight to the time series bearing greater variance. If one is certain that the values of each dimension of the multidimensional time series are already properly scaled with regard to each other, such as when simultaneously analyzing the three dimensions of the Lorenz system [39], then one needs not-and perhaps should not-normalize the dimensions.

Multidimensional cross-recurrence quantification analysis (MdCRQA)
Multidimensional cross-recurrence quantification analysis (MdCRQA) extends MdRQA in the same way that CRQA extends RQA. Effectively, Md-CRQA allows for the computation of cross-recurrences between two multidimensional time series x and y [40], where cross-recurrences are defined between the two d-dimensional coordinate spaces between the points of x and y: It is important that the different dimensions of the two multivariate time series enter the analysis in the same order. For the example of physiological arousal, if heart rate is the first dimension in x, breathing is the second dimension in x, and body temperature is the third dimension in x, then heart rate also needs to be the first dimension in y, breathing needs to be the second dimension in y, and accordingly body temperature needs to be the third dimension in y. Otherwise, the resulting MdCRQA measure will not be interpretable.
Unidimensional and multidimensional cross-recurrence analysis can also be performed in a time-dependent manner. This so-called windowed crossrecurrence analysis is conceptually very similar to windowed cross-correlation analysis as in Boker et al. [41] and it can be used to track how cross-recurrence changes over the time course. To that end, one simply partitions the time series of interest into a number of overlapping or non-overlapping sub-series and calculates CRPs for each of the sub-series. For each CRP, i.e., each sub-series of the original time series, the recurrence measures are calculated which allows tracking changes in cross-recurrence over time.

The diagonal cross-recurrence profile (DCRP)
Finally, from cross-recurrence plots of either two unidimensional time series (i.e., CRQA) or two multidimensional time series (i.e., MdCRQA), it is possible to extract the diagonal cross-recurrence profiles (DCRPs), and use it to capture leader-follower-relationships [26,8]. To that end, one has to specify a window size w for the number of lags on the recurrence plot that one wants to investigate. For example, a window-size of 10 data-points, i.e., w = 10 would span recurrences of ± 10 diagonals from the LOI. Specifically, this procedure determines the cross-recurrence rates of each diagonal, CR w , by summing up all cross-recurrence points that fall along such diagonal, and divide them by its length: The DCRP permits quantification of the recurrence rate over different relative lags between two time series. If the peak of cross-recurrences fall along the central diagonal, the line of synchronization (LOS), then this suggests strong coupling at lag 0 between the two time series. If the peak of crossrecurrences falls instead on one of the diagonals off the LOS, it indicates that the dynamics of one time series follow the dynamics of the other time series by some lag equal to that diagonal position ( Figure 3).
Note, however that interpreting the lags in terms of the sampling rate of the underlying measured time series only applies to CRPs based on unembedded (often categorical) time series. If the time series are embedded, this means that the observed lag is based on coordinates that are made up of several data points from the respective original time series, and hence introduces a degree of uncertainty with regard to the precise time interval of the lag when one tries to map particular recurrence points back to data points of the original time series. In addition, the leader-follower interpretation of the lags cannot be granted the status of a causal interpretation. For example, a parent can deliberately 'lag' their behavior behind that of their child. In such a case, it would be odd to say definitively that the child's behavior is causing the parent's behavior. For this reason, the DCRP has to be interpreted with caution and is best treated as a general description of relative temporal relationships.

Package design
The crqa is available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/crqa/index.html. The software is written in R with the exception of the spdiags function, which contains a section written in Fortran. The main function is crqa, which takes its name from the package. This function performs all types of recurrence quantification analyses discussed in the previous section, and it returns the actual recurrence plot, along with the measures listed in Table 1 extracted from it. Here is how to call this primary function, with arguments and defaults: Several arguments in this function allow the user to refine aspects of the computation. For example, the user can modify the metric to obtain the distance matrix when estimating recurrence, or the settings of thresholds to accept contiguous points as recurring (see Table 2 for the list of arguments of crqa() with a brief explanation of each). drpfromts() can be used to obtain the diagonal cross-recurrence profile, and this function is built using the crqa() at its core. In fact, most arguments stay exactly the same, and we will illustrate the additional arguments that are specific to this function when describing it. Likewise, the functions wincrqa() and windowdrp() are built on the main function crqa() and are used to compute windowed cross-recurrence. The package also contains functions, such as optimizeParam(), to automatically estimate the settings for the three main parameters of RQA analyses, i.e., radius ε, delay τ and embedding dimension m, for continuous measures. As recurrence quantification analysis is heavy on memory requirements, the package features a function (piecewiseRQA()) which can be used to break down the analysis of long time series into smaller and more manageable chunks that are computationally more tractable. Finally, the package provides the user with functions to simulate data from classic examples of dynamical systems, such as the Lorenz (lorenzattractor()) or categorical series with different distributions (simts()). In what follows, we briefly describe the data available with the package. These data are used to illustrate the different functionalities of the crqa() package.

Related Packages
A search of CRAN shows very few other packages offering alternative methods to compute recurrence quantification analysis. In particular, tseri-esChaos, nonlinearTseries and RHRV. The function recurr in tseriesChaos computes a recurrence plot for a (section of) a single unidimensional timeseries, but it does not compute any derived measures from the plot characterising the dynamics of the system, nor does it handle cross recurrence plots or any of the many extensions to simple recurrence plots provided by our crqa package. Going a little further, the function rqa in nonlinearTseries returns key measures from the recurrence plots (e.g., recurrence rate), as well as, a quick visualisation of the recurrence plot (available also with the function RecurrencePlot). Finally, the function RecurrencePlot in RHRV is simply a wrapper built on the function in nonlinearTseries with the same name, but specifically tailored to electrocardiogram data. The functions available in non-linearTseries provide very basic functionality in terms of the range of metrics available and optimization routines. They do not allow the user to examine diagonal structures for leader-follower analyses, nor explore the evolution of recurrence rate using windowed methods. All such features are integrated into crqa, which, to the best of our knowledge, is the most comprehensive statistical package to perform recurrence quantification analysis in R.

Data
Different types of categorical and continuous time series, both unidimensional and multidimensional, are available with the package. The command load(crqa) will load the data into the R workspace. In particular, we include a nursery rhyme "The wheels on the bus" by Verna Hills to illustrate the most basic recurrence quantification analysis. This text is a vector of 120 strings (i.e., the words of the song), and as it is extremely simple and highly repetitive, it makes it a very good example to illustrate the core concept of recurrence. Then, we move on to cross-recurrence quantification analysis and include in the data object of the package eye-tracking data from the study by [42]. In this study, a narrator describes the characters of a TV series (Friends) to a listener, who will have to later answer some comprehension questions about them, while their eye-movement are co-registered. Here, we use a single trial of this study, which is stored as a dataframe of 2,000 observations of six possible screen locations that are looked at by the narrator and the listener. These are numerically coded from 1 to 6, representing a 2x3 visual grid 2 ). This data will also be used to illustrate diagonal and windowed cross-recurrence. Finally, we illustrate multidimensional cross-recurrence analysis using hand movement data from the study by [43]. In this study, dyads were instructed to cooperate in a complex LEGO joint construction task, under different conditions, while their hand movements and heart-rates were co-registered. Again, we select only a single trial of hand-movement from the turn-taking condition. The dataframe comprises of 5,799 observations from two participants (P1 and P2) for the dominant (_d) and non-dominant (_n) hand.

Using the crqa package 3.3.1. RQA
As explained in section 2.1, RQA entails computing the auto-recurrence of a unidimensional time series. In the context of the nursery rhyme, we expect clear phases of recurrence to emerge because the words greatly repeat. In order to run this analysis, we use the main function crqa, and specify in the argument method that we are running an RQA analysis (method = "rqa"). As the data that we use is categorical, we need to specify in the argument datatype that the nature of the data is categorical (datatype = "categorical") this will automatically recode the categorical states of the series (i.e., the words) into unique numerical integers so that recurrence can be computed using a radius that has to be smaller than 1 (e.g., radius = 0.01), so that we can capture the recurrence of identical words. For categorical RQA, the delay and embedding dimension have to be set to 1 (delay = 1; embed = 1 3 .). We also need to set the Theiler window parameter to 1 (tw = 1), so that we can exclude the LOI from all recurrence measures. Finally, the same unidimensional time series has to be input both as ts1 and ts2 to obtain its auto-recurrence.
3 If embedding dimension is set to higher values, this becomes equivalent to doing recurrence on n-grams, where m = n. In fact this interpretation of categorical recurrence creates bridges to traditional natural language processing, summarized in [44] We can use the plotting function plotRP to visualise the resulting recurrence plot. This function provides some basic arguments to change the size of the points in the plot (pcex), the color (cols), or their type (pch) which are taken verbatim from the generic plot function.
RP ← res $ RP parC ← list ( unit = 10 , labelx = " Time " , labely = " Time " , cols = " black " , pcex = .5 , pch = 15 , las = 0 , labax = seq (0 , nrow ( RP ) , 10) , labay = seq (0 , nrow ( RP ) , 10) ) plotRP ( RP , parC ) In order to get a closer understanding of recurrences, we zoom into a segment of the text, re-run the crqa() function and visualise it (Figure 4). We add the labels of the axes (x,y), print the words vertically using the las argument, and decrease the temporal unit argument to print each individual word on the axes. As it can be clearly seen in Figure 4, the bulk of recurrence in that portion is driven by the repeated use of the single word wah. When looking at a few measures associated with the RP, we observe an overall determinism of 85.4%, which implies that the system is fairly repetitive, an average diagonal line length of 3.88, which implies that on average there are sequences of four words that repeat, and a maximum diagonal length of 9, which means that the longest sequence repeating is made of 9 words. Some measures such as determinism or the average diagonal line length depend on the setting of the argument mindiagline (equivalent to l min in Table 1). The default value of this argument is 2, as two contiguous points form a line, but it can depend on the type of data (e.g., words vs. eye-movement), or the sample rate at which it is acquired (55 Hz vs. 1,000 Hz). For example, if we have acquired data at 1,000 Hz, we would practically have one data-point every 2 ms. This means that if we use a default mindiagline of 2, we would be considering as lines, any states that contiguously repeat over a 4 ms window. This value would certainly be unrealistic for some type of responses that unfolds over longer period of time (e.g., an eye-movement fixation lasts for an average of 200 ms).

CRQA: crqa()
As already explained in section 2.2, cross-recurrence is an extension of auto-recurrence to two different unidimensional time series. In order to run a cross-recurrence analysis with the crqa package, we simply need to change the method argument to method = "crqa". Here, we illustrate its use through two time series of eye-movement data explained in section 3.2. Also, we input two different time series of eye-movement data, narrator and listener, rather than just one. An optional argument that is available in the crqa function is the side (upper, lower or both) of the recurrence plot on which recurrence measures are computed. This may be useful, for example, for researchers interested in leader-follower dynamics (see Figure 5, left panel, for the visualisation of the cross recurrence plot).

Diagonal-CRQA: drpfromts
In section 2.2, we explained what diagonal cross-recurrence is, and how it can be used. In the crqa package, this measure is computed by the function drpfromts, which utilises the same arguments of the main crqa function plus an additional argument, windowsize, to define the number of lags (or diagonals) around the line of synchronization (LOS) of the CRP over which recurrence rate is computed. In the example visualised in Figure 5, center panel, we have chosen a window of 100 lags, which spans about ±3 seconds around the LOS. We can clearly see that the peak recurrence is shifted by ≈ 1 second from the LOS, i.e., lag 0. This reflects the time taken by the listener to look at the same panel the narrator was looking at-namely, about 1 second for a listener to "catch up" to the speaker.

Windowed-CRQA: windowdrp
Windowed cross-recurrence captures the evolution of recurrence rate over time. In the context of the eye-movement data, this measure reflects how consistently listener and narrator are looking at the same scene location at any given point in time. More importantly, the windowed methodology reveals how recurrence changes over time (refer to section 2.5 for more details). We use the function windowdrp to compute this measure, which again shares the same arguments of crqa, plus three more that are specific to it. In particular, we have to set: (a) the size of the window that slides over the time-course, e.g., windowsize = 100, (b) the step that we want this window to move, e.g., windowstep = 20, and (c) the number of lags 4 within the window of interest over which recurrence rate is computed, e.g., lagwidth = 50. In Figure 5, right panel, we observe that recurrence rate grows over time, which means that the narrator and the listener tend to look more and more at the same panels as the trial progresses.

MdCRQA
Finally, we compute multidimensional cross-recurrence by setting the method argument, i.e., method = "mdcrqa". Note, in order to compute multidimensional recurrence, the user needs to provide the same dataframe as input to both ts1 and ts2. This method is also available for drpfromts and windowdrp. Applied to the hand movement data, we first restructure the data of the two participants into independent sets. We re-use the same parameter settings of [43] to compute MdCRQA, and leave all other arguments with their default values.

Breaking down the computation: piecewiseRQA
Often, researchers are interested in time series which contain several thousands of observations. Sometimes the dimensionality of these time series can be reduced without losing too much information, such as by downsampling. This strategy may not always be possible or serve the researcher's purpose. Recurrence quantification analysis can require more RAM than is available in standard laptops or personal workstations, hence making it nearly impossible to run. In the new version of the crqa package, we provide the user with the piecewiseRQA function, which can be used to compute all different variants of recurrence quantification analysis described above on long time series. Conceptually, this function divides the time series into blocks, it obtains a recurrence plot for each individual block, and then fills the original recurrence plot in with all such sub-blocks, before computing the measures. 5 For example, if we are handling a time series of 10,000 observations, we could divide it into 10 blocks of 1,000 observation each. piecewiseRQA has exactly the same arguments that we have already encountered in the main crqa function, but has an additional two that are used to control the size of the block, e.g., blockSize = 100, and the argument typeRQA, which can take two options, either full or diagonal. If the value for typeRQA is diagonal only the diagonal cross-recurrence will be computed; if full, the recurrence measures will be obtained out of the full plot 6 . In Figure 6, we visualise the computational speed (left panel) and memory demand (right panel) on simulated time series of sinusoids of increasing size, and compare what happens if we run the piecewiseRQA, also with blocks of increasing size, as compared to running the main crqa function. We can clearly see that for time series of increasing length, memory demands are kept lower by the piecewiseRQA function as compared to the crqa. However, we can also see that there is a wide variance for blocks of different sizes. Therefore, it may be wise to explore the block sizes to find the one that can optimize the computational performance over a single trial before running the piecewise recurrence analysis on an entire dataset.

Estimating starting parameters: optimizeParam
The last function we showcase is optimizeParam, which helps the user exploring the space of values for the parameters of delay, embedding dimension and radius to compute recurrence quantification on continuous valued time series. In particular, optimizeParam first estimates the average mutual information (AMI) of either the unidimensional or multidimensional time series, and chooses the value that minimizes it. Then, it takes such a delay value, and evaluates the embedding dimensions maximizing the false-nearest neighbors (FNN) 7 . As a last step, optimizeParam applies the values of delay and embedding dimension obtained, to find a radius which returns a recurrence rate within a minimum and maximum value established by the user. We apply optimizeParam() to simulated sinusoids to show the unidimensional case, and to the hand movement data of [43] to show the multidimensional

case.
In order to set up the estimation of the delay and embedding dimensions for unidimensional time series, the user has to decide how to choose the value of average mutual information (i.e., typeami = mindip, the lag at which minimal information is observed, or typeami = maxlag, the maximum lag at which minimal information is observed) and the relative percentage of information gained in FNN, relative to the first embedding dimension, when higher embeddings are considered (i.e., fnnpercent). Then, as crqa is integrated in the optimizeParam to estimate the radius, most of the arguments are the same (e.g., mindiagline or tw), except the number of values of that are considered (i.e., radiusspan = 100). ts1 ← seq (0 .1 , 200 , .1 ) ts1 ← sin ( ts1 ) + linspace (0 , 1 , length ( ts1 ) ) ts2 ← ts1 par ← list ( method = " rqa " , metric = " euclidean " , maxlag = 20 , radiusspan = 100 , normalize = 0 , rescale = 4 , mindiagline = 10 , minvertline = 10 , tw = 0 , whiteline = FALSE , recpt = FALSE , side = " both " , datatype = " continuous " , fnnpercent = 10 , typeami = " mindip " ) results ← optimizeParam ( ts1 , ts2 , par , min.rec = 2 , max.rec = 5) print ( unlist ( results ) ) radius emddim delay 0 .17 2 18 For multidimensional series, the user needs to specify the right RQA method (i.e., method = "mdcrqa"). Then, for the estimation of the delay via AMI: (1) nbins the number of breaks used to define the bins within which the two-dimensional histogram (or frequency distribution) of the original and delayed time series are computed and (2) the criterion to select the delay (firstBelow to use the lowest delay at which the AMI function drops below the value set by the threshold argument, and localMin to use the position of the first local AMI minimum). The estimation of the embedding dimensions instead needs the following arguments: (1) maxEmb, which is the maximum number of embedding dimensions considered, (2) noSamples, which is the number of randomly drawn coordinates from phase space used to estimate the percentage of false-nearest neighbors, (3) Rtol, which is the first distance criterion for separating false neighbors, and (4) Atol, which is the second distance criterion for separating false neighbors. The radius is estimated as before.

Conclusion
This paper describes recurrence quantification analysis, a statistical method to characterise the nonlinear dynamics of a system. It has received a growing interest by researchers across very different fields from physiology to psychology because of its flexibility, ease of application, and explanatory power. In particular, we explain recurrence analysis from the simplest case of auto-recurrence of a unidimensional time series to the most complex case of multidimensional cross-recurrence. More importantly, we presented a significantly updated version of the crqa to perform all different variants of recurrence analysis described in the theoretical section of this manuscript. We showcased the different functions available in crqa with real data of categorical and continuous nature, and illustrated how starting parameters for continuous data can be obtained (i.e., radius, embedding dimension and delay) as well as handling long time series in a memory efficient way using additional functions available in crqa.
It is useful to end on some observations regarding the broader relevance of the package presented here. The RQA methodology and the updated crqa tap into a number of evolving problems in data analysis across various disciplines. For example, there is a drive to improve measures and models of multimodal data [45,46,47], including multi-person measures [48,49,50,51,43]. RQA and its multidimensional counterpart implemented in our package constitute an extraordinarily expansive analysis tool for exploring varied kinds of complex and multidimensional data. In addition, a demand for more dynamic quantitative analyses has now also penetrated into the social sciences [52,53,54,55]. rqa is designed to be a comprehensive analysis package for studying the dynamics of diverse systems, especially systems that exhibit high degrees of interdependence, and that show signatures in their dynamics that are critical for understanding them. rescale Whether the distance matrix on which recurrence is evaluated should be rescaled and how If rescale = 0, keep the distance matrix as is; if rescale = 1, rescale the distance matrix to its mean; if rescale = 2, rescale to distance matrix to its maximum value. radius A constant used to decide whether the distance between two points is small enough to be considered recurrent For categorical time series, the radius needs to be set at values smaller than 1. For continuous time series, the value of the radius needs to tailored to the type of data and its range. normalize Normalize the time series if normalize = 0, keep the time series at their original scale; if normalize = 1, normalize the time series to unit interval; if normalize = 2, z-score the time series. mindiagline The minimum number of contiguous points along the diagonals to consider the system into a recurrent state The default value is usually 2, as it takes a minimum of two points to define any line.
minvertline The minimum number of contiguous points along the vertical lines to consider the system into a recurrent state The default value is usually 2, as it takes a minimum of two points to define any line.