dCovTS: Distance Covariance/Correlation for Time Series

Abstract:

The distance covariance function is a new measure of dependence between random vectors. We drop the assumption of iid data to introduce distance covariance for time series. The R package dCovTS provides functions that compute and plot distance covariance and correlation functions for both univariate and multivariate time series. Additionally it includes functions for testing serial independence based on distance covariance. This paper describes the theoretical background of distance covariance methodology in time series and discusses in detail the implementation of these methods with the R package dCovTS.

Cite PDF Tweet

Published

Oct. 21, 2016

Received

Apr 18, 2016

Citation

Pitsillou & Fokianos, 2016

Volume

Pages

8/2

324 - 340


1 Introduction

There has been a considerable recent interest in measuring dependence by employing the concept of the distance covariance function. initially introduced the distance covariance as a new measure of dependence defined as the weighted L2-norm between the joint characteristic function of two random vectors of arbitrary, but not necessarily of equal dimensions, and their marginal characteristic functions. However, the idea of using distance covariance for detecting independence can be also found in some early work by . He considered measures of this form, with the main differences being the restriction to the univariate case and the choice of the weight function. Since work, there has been a wide range of studies extending the distance covariance definition and methodology in various topics; see and and the references therein for a nice review.

distance covariance methodology is based on the assumption that the underlying data are iid. However, this assumption is often violated in many practical problems. proposed to extend the distance covariance methodology to a time series context in order to measure serial dependence. There have been few works on how to develop a distance covariance methodology in the context of time series . Motivated by the work of , recently defined the so-called auto-distance covariance function (ADCV) - and its rescaled version, the so-called auto-distance correlation function (ADCF), for a strictly stationary multivariate time series. Compared to the classical Pearson autocorrelation function (ACF) which measures the strength of linear dependencies and can be equal to zero even when the variables are related, ADCF vanishes only in the case where the observations are independent. However, studied the asymptotic behavior of ADCV at a fixed lag order. relaxed this assumption and constructed a univariate test of independence by considering an increasing number of lags following generalized spectral domain methodology. Although the proposed methodology is for univariate processes, it can be extended for multivariate processes.

developed a distance covariance methodology for multivariate time series, but he did not explore the interrelationships between the various time series components. made this possible by defining the matrix version of pairwise auto-distance covariance and correlation functions. In particular, they construct multivariate tests of independence based on these new measures in order to identify whether there is some inherent nonlinear interdependence between the component series.

The energy package for R is a package that involves a wide range of functions for the existing distance covariance methodology. However, there is no package for the aforementioned distance covariance methodology in time series. Thus, we aim at filling this gap by publishing an R-package named dCovTS. In this first version of the package, we provide functions that compute and plot ADCV and ADCF using the functions dcov() and dcor() respectively from energy package. The new testing methodology proposed by is also included in the package.

The structure of the paper is as follows. In the first two sections we introduce the theoretical background of distance covariance function for both univariate and multivariate time series respectively. In the next section, we briefly state the main results about the asymptotic properties of distance covariance function. The proposed testing methodology for both univariate and multivariate time series are also described. Empirical p-values of the tests and empirical critical values for the distance correlation plots are computed via the wild bootstrap methodology which is explained in the corresponding section. The implementation section demonstrates the usage of the package with two real data examples. Lastly, we give some concluding remarks and some further points for future extensions of the dCovTS package.

2 Distance covariance function

Denote a univariate strictly stationary time series by {Xt,tZ}. Motivated by and , we define the distance covariance function as a function of the joint and marginal characteristic functions of the pair (Xt,Xt+j). Denote by ϕj(u,v) the joint characteristic function of Xt and Xt+j; that is (1)ϕj(u,v)=E[exp(i(uXt+vXt+j))],    j=0,±1,±2,, and the marginal characteristic functions of Xt and Xt+j as ϕ(u):=ϕj(u,0) and ϕ(v):=ϕj(0,v) respectively, where (u,v)R2, and i2=1. For a strictly stationary α-mixing univariate time series, defined a new measure of dependence between the joint characteristic function of Xt and its lagged observation Xt+j and the product of their marginals, namely (2)σj(u,v)=ϕj(u,v)ϕ(u)ϕ(v),    j=0,±1,±2,, where (u,v)R2. Considering the property that the joint characteristic function factorizes under independence of Xt and Xt+j, σj(u,v) equals 0 if and only if Xt and Xt+j are independent. Thus, compared to the classical autocorrelation function (ACF), σj(,) can capture all pairwise dependencies including those with zero autocorrelation. The auto-distance covariance function (ADCV), VX(j), between Xt and Xt+j is then defined as the square root of (3)VX2(j)=R2|σj(u,v)|2dW(u,v),    j=0,±1,±2, where W(,) is a positive weight function for which the above integral exists.

Although suggests the use of an arbitrary integrable weight function, W(,), we propose the use of a non-integrable weight function, i.e. (4)W(u,v)=W0(u)W0(v)=1π|u|21π|v|2,   (u,v)R2 which avoids missing any potential dependence among observations . Rescaling (3), one can define the auto-distance correlation function (ADCF) as the positive square root of (5)RX2(j)=VX2(j)VX2(0),     j=0,±1,±2, for VX2(0)0 and zero otherwise. showed that by applying a non-integrable weight function, like (4), ADCF is scale invariant and is not zero under dependence.

The empirical ADCV, V^X(), is the non-negative square root of (6)V^X2(j)=1(nj)2r,l=1+jnArlBrl,     0j(n1) and V^X2(j)=V^X2(j), for (n1)j<0, where A=Arl and B=Brl are Euclidean distance matrices given by Arl=arla¯r.a¯.l+a¯.., with arl=|XrXl|, a¯r.=(l=1+jnarl)/(nj), a¯.l=(r=1+jnarl)/(nj), a¯..=(r,l=1+jnarl)/(nj)2. Brl is defined analogously in terms of brl=|YrYl|, where YtXt+j. proposed an unbiased version of the sample distance covariance. In the context of time series data this is given by (7)V~X2(j)=1(nj)(nj3)rlA~rlB~rl, for n>3, where A~rl is the (r,l) element of the so-called U-centered matrix A~, defined by

A~rl={arl1nj2t=1+jnart1nj2s=1+jnasl+1(nj1)(nj2)t,s=1+jnats,rl;0,r=l.

The empirical ADCF, R^X(j) (or its unbiased version, R~X(j)), can be obtained by replacing (6) (or (7)) into (5). The functions ADCV() and ADCF() in dCovTS return the empirical quantities V^X() and R^X() respectively. Using the same functions with argument unbiased=TRUE, the results correspond to the unbiased sqaured quantities V~X2() and R~X2(). Note that the default option has been set to unbiased=FALSE (corresponding to (6)).

3 Distance covariance matrix

We denote by {Xt:t=0,±1,±2,} a d-dimensional time series process, with components {Xt;i}i=1d. The characteristic functions can be defined in analogous way as in the univariate case. In particular, the joint characteristic function of Xt;r and Xt+j;m is given by ϕj(r,m)(u,v)=E[exp(i(uXt;r+vXt+j;m))],    j=0,±1,±2, and the marginal characteristic functions of Xt;r and Xt+j;m by ϕ(r)(u):=ϕj(r,m)(u,0) and ϕ(m)(v):=ϕj(r,m)(0,v) respectively, with (u,v)R2, r,m=1,d and i2=1. The pairwise ADCV between Xt;r and Xt+j;m is denoted by Vrm(j) and it is defined as the non-negative square root of Vrm2(j)=R2|σj(r,m)(u,v)|2W(u,v)dudv,    j=0,±1,±2, where W(,) is given by (4) and σj(r,m)(u,v) is similarly defined as in the univariate case, namely σj(r,m)(u,v)=ϕj(r,m)(u,v)ϕ(r)(u)ϕ(m)(v). Clearly, Vrm2(j)0,   j and Xt;r and Xt+j;m are independent if and only if Vrm2(j)=0. The ADCV matrix, V(j), is then defined by (8)V(j)=[Vrm(j)]r,m=1d,    j=0,±1,±2,

The pairwise ADCF between Xt;r and Xt+j;m, Rrm(j), is a coefficient that lies in the interval [0,1] and also measures dependence and is defined as the positive square root of (9)Rrm2(j)=Vrm2(j)Vrr2(0)Vmm2(0), for Vrr(0)Vmm(0)0 and zero otherwise. The ADCF matrix of Xt, is then defined as R(j)=[Rrm(j)]r,m=1d,    j=0,±1,±2,

Vrm(j) measures the dependence of Xt;r on Xt+j;m. In general, Vrm(j)Vmr(j) for rm, since they measure different dependence structure between the series {Xt;r} and {Xt;m} for all r,m=1,2,,d. Thus, V(j) and R(j) are non-symmetric matrices. Moreover, because of the assumed stationarity and relation Cov(x,y)=Cov(y,x), V(j)=V(j) and consequently R(j)=R(j). More properties of these new defined functions can be found in .

Estimation of Vrm2() can be dealt in a similar way as in the univariate case. In particular, let first Yt;mXt+j;m. Based on the sample {(Xt;r,Yt;m):t=1+j,,n}, we define the Euclidean distance matrices by (atsr)=|Xt;rXs;r| and (btsm)=|Yt;mYs;m| and the centered distance matrices by Atsr=atsra¯t.ra¯.sr+a¯..r,Btsm=btsmb¯t.mb¯.sm+b¯..m, where the quantities in the right hand side are defined analogously as those defined in the univariate case. The biased estimator of Vrm2() is then given by (10)V^rm2(j)={1(nj)2t,s=1+jnAtsrBtsm,0j(n1);1(n+j)2t,s=1n+jAtsrBtsm,(n1)j<0. Analogously to (7), an unbiased estimator of V^rm2() is given by

(11)V~rm2(j)={1(nj)(nj3)t,s=1+jnA~tsrB~tsm,0j(n1);1(n+j)(n+j3)t,s=1n+jA~tsrB~tsm,(n1)j<0, where A~tsr are computed appropriately.

The sample ADCV matrix, V^(), is then obtained by replacing its elements by the positive square root of (10) and can be calculated from dCovTS using the mADCV() function. The estimator based on (11), V~(), is obtained from dCovTS using the argument unbiased = TRUE. The package also gives the sample ADCF matrix R^() (function mADCF()) which is obtained after replacing (10) (or R~X(j) which is based on (7)) into (9). The distance correlation plots for both univariate and multivariate time series are obtained by the ADCFplot() and mADCFplot() functions respectively, where the shown critical values (blue dotted horizontal line) are computed by employing bootstrap methodology described in the appropriate section. Recall that these are computed by using the biased definition of distance covariance and correlation.

4 Consistency and asymptotic distribution of distance covariance

Consider first the univariate case. For a strictly stationary and α-mixing process Xt, with E|Xt|<, then for all j=0,±1,±2, V^X2(j)VX2(j) almost surely, as n. A detailed proof of this result can be found in . Under mild conditions, obtained the weak consistency of V^X2() and its asymptotic distribution at a fixed lag, but under alternative mixing conditions.

In addition, showed that for a d-dimensional strictly stationary and ergodic time series process {Xt} with E|Xt;r|< for r=1,,d, then for all j=0,±1,±2, V^(j)V(j) almost surely as n. Under pairwise independence, the empirical pairwise ADCV is a degenerate V-statistic of order two with a measurable kernel function that is symmetric, continuous and positive semidefinite Then (12)(nj)V^X2(j)Z:=kλkZk2 in distribution, as n, where {Zk} is an iid sequence of N(0,1) random variables, and (λk) is a sequence of nonzero eigenvalues. A similar result showing the limiting distribution of V^rm() can be obtained by replacing V^X() by V^rm() in (12).

5 Testing for pairwise dependence in univariate time series

As shown in the previous section, the asymptotic distribution of distance covariance is derived at a fixed lag, for both univariate and multivariate time series. constructed the asymptotic behavior of distance covariance considering an increasing number of lags by employing generalized spectral domain methodology. highlighted that standard spectral density approaches become inappropriate for non-Gaussian and nonlinear processes with zero autocorrelation. Considering a univariate strictly stationary α-mixing process, he proposed the generalized spectral density, which is the Fourier transform of σj(u,v) defined in (2), given by f(ω,u,v)=12πj=σj(u,v)eijω. Under the null hypothesis of independence, the corresponding null density is given by f0(ω,u,v)=12πσ0(u,v),    ω[π,π]. Any deviation of f from f0 is a strong evidence of pairwise dependence. Thus, compares the kernel-type estimators f^(ω,u,v) and f^0(ω,u,v) via an L2-norm resulting in a test statistic of the form (13)Tn(2)=R2j=1n1(nj)k2(j/p)|σ^j(u,v)|dW(u,v), where W(,):R2R is an arbitrary nondecreasing function with bounded total variation, p is a bandwidth of the form p=cnλ for c>0 λ(0,1) and k() is a Lipschitz-continuous kernel function satisfying the following assumption:

assumption

k:R[1,1] is symmetric and is continuous at 0 and at all but a finite number of points, with k(0)=1, k2(z)dz< and |k(z)|C|z|b for large z and b>1/2.

The function kernelFun() in dCovTS computes a number of such kernel functions including the truncated (default option), Bartlett, Daniell, QS and Parzen kernels.

proposed a portmanteau type statistic based on ADCV (14)Tn=j=1n1(nj)k2(j/p)V^X2(j). Under the null hypothesis that the data are iid and some further assumptions about the kernel function k(), the standardized version of Tn follows a N(0,1) asymptotically, and it is consistent. The authors also considered a similar test statistic based on ADCF (15)j=1n1(nj)k2(j/p)R^X2(j). The function UnivTest from dCovTS package performs univariate tests of independence based on (14) and its rescaled version (15), using the arguments testType = "covariance" and testType = "correlation" respectively.

6 Testing for pairwise dependence in multivariate time series

Following a similar methodology described in the previous section, suggested a test statistic suitable for testing pairwise independence in a multivariate time series framework. The proposed test statistic is based on the ADCV matrix (8), and it is given by (16)T~n=j=1n1(nj)k2(j/p)trV^(j)V^(j).

where k() is a univariate kernel function satisfying Assumption 1, p is a bandwidth as described before. Moreover, V^() denotes the complex conjugate matrix of V^() and tr(A) denotes the trace of the matrix A. The authors formed the statistic (16) in terms of the ADCF matrix as follows

(17)Tn=j=1n1(nj)k2(j/p)trV^(j)D^1V^(j)D^1,

where D=diag{Vrr(0),r=1,2,,d}. Under the null hypothesis of independence and some further assumptions about the kernel function k(), the standardized version of the test statistics T~n and Tn given in (16) and (17) were proved to follow N(0,1) asymptotically and they are consistent. The multivariate tests of independence based on T~n and Tn are performed via mADCVtest() and mADCFtest() respectively in dCovTS package.

7 Bootstrap methodology

To examine the asymptotic behavior of the proposed test statistics, a resampling method is proposed. First, recall that all test statistics Tn, T~n and Tn of equations (14), (16) and (17) respectively, are functions of degenerate V- statistics of order two. proposed wild bootstrap techniques to approximate the distribution of degenerate U-statistics for the case of iid data. Recently, suggested the use of a new variant of dependent wild bootstrap to approximate the limit distribution of degenerate U- and V-statistics for dependent data. The method relies on generating auxiliary random variables (Wtn)t=1nj. highlighted that the methodology of wild bootstrap for time series extends that of by allowing the auxiliary random variables Wtn to be dependent. In particular, proposed to generate the sequence Wtn by a first order autoregressive model. In the case of independent data, studied the wild bootstrap methodology by employing independent auxiliary variables Wtn. Because our focus is on testing independence we implement the calculation of the test statistics by using Wtn iid standard normal random variables. Thus, the empirical p-values of the tests are derived based on this methodology.

We also suggest the use of independent wild bootstrap for obtaining simultaneous 95% empirical critical values for the distance correlation plots. In the case of a univariate time series, we additionally propose the subsampling approach suggested by for computing the pairwise 95% critical values (argument method = "Subsampling"). The choice of the subsampling block size is based on the minimum volatility method proposed by . In addition, the package provides the ordinary independent bootstrap methodology to derive empirical p-values of the tests and simultaneous 95% critical values for the ADCF plots (argument method = "Independent Bootstrap"). The default bootstrap method provided to the user is the independent wild bootstrap technique.

The computation of the bootstrap replications, and thus the empirical p-values and the critical values, can be distributed to multiple cores simultaneously (argument parallel = TRUE). To do this, the doParallel package needs to be installed first, in order to register a computing cluster.

8 Implementation of dCovTS package

The current version of dCovTS package (version number 1.1) is available from CRAN and can be downloaded via https://cran.r-project.org/web/packages/dCovTS/. The aim of the dCovTS package is to provide a set of functions that compute and plot distance covariance and correlation functions in both univariate and multivariate time series. As we mentioned, the package supports both versions of biased and unbiased estimators of distance covariance and correlation functions. Moreover, it offers functions that perform univariate and multivariate tests of independence based on distance covariance function using the biased estimator (corresponding to (6) and (10)). All these functions are provided in Table 1. Apart from these functions, the package also provides two real datasets listed in Table 2. A more detailed description of the functions and datasets can be found in the help files. We apply dCovTS to two real data examples.

Table 1: Functions in dCovTS
Function Description
ADCF, mADCF Estimates distance correlation for a univariate and multivariate time series respectively
ADCV, mADCV Estimates distance covariance for a univariate and multivariate time series respectively
ADCFplot, mADCFplot Plots sample distance correlation in a univariate and multivariate time series framework respectively
kernelFun Gives a range of univariate kernel function, k(), that satisfy Assumption 1
UnivTest Performs a univariate test of independence based on Tn
mADCFtest, mADCVtest Perform multivariate tests of independence based on Tn and T~n respectively

Table 2: Datasets in dCovTS
Data Description
ibmSp500 Monthly returns of IBM and S&P 500 composite index from January 1926 to December 2011
MortTempPart Mortality, temperature and pollution data measured daily in Los Angeles County over the period 1970-1979

Regression with autocorrelated errors

We first consider the pollution, temperature and mortality data measured daily in Los Angeles County over the 10 year period 1970-1979 . The data are available in our package by the argument MortTempPart and contain 508 observations and 3 variables representing the mortality ("cmort"), temperature ("tempr") and pollutant particulates ("part") data.

library(dCovTS)
data(MortTempPart)
MortTempPart[1:10,] # the first ten observations
##     cmort tempr  part
## 1   97.85 72.38 72.72
## 2  104.64 67.19 49.60
## 3   94.36 62.94 55.68
## 4   98.05 72.49 55.16
## 5   95.85 74.25 66.02
## 6   95.98 67.88 44.01
## 7   88.63 74.20 47.83
## 8   90.85 74.88 43.60
## 9   92.06 64.17 24.99
## 10  88.75 67.09 40.41
attach(MortTempPart)

Following the analysis of , the possible effects of temperature (Tt) and pollutant particulates (Pt) on daily cardiovascular mortality (Mt) are examined via regression. In particular, once the temperature is adjusted for its mean (T.=74.3), we fit the following regression model using the function lm() (18)M^t=2831.491.396(0.101)t0.472(0.032)(TtT.)=+0.023(0.003)(TtT.)2+0.255(0.019)Pt, where the standard errors of the estimators are given in parentheses. Figure 1 provides the sample autocorrelation (ACF), partial correlation (PACF) and ADCF plots of the residuals of model (18).

graphic without alt textgraphic without alt text

graphic without alt textgraphic without alt text

Figure 1: Sample ACF, PACF and ADCF plots of the mortality residuals of model .

The plots shown in Figure 1 suggest an AR(2) process for the residuals. The new fit is (19)M^t=3075.151.517(0.423)t0.019(0.050)(TtT.)=+0.015(0.002)(TtT.)2+0.155(0.027)Pt, where the standard errors of the estimators are given in parentheses. The above model fit was derived by using the arima() function of R. The correlation plots for the residuals from the new model (19) are shown in Figure 2 indicating that there is no serial dependence.

graphic without alt textgraphic without alt text

graphic without alt textgraphic without alt text

Figure 2: Sample ACF, PACF and ADCF plots of the mortality residuals of model indicating that the new residuals can be taken as white noise.

The calls for both model fits and their diagnostic plots are given below. ADCF plots (lower plots of Figures 1 and 2) are constructed using both resampling schemes explained in the previous section: independent wild bootstrap (with b=499 replications) and subsampling.

temp <- tempr - mean(tempr)  # center temperature
temp2 <- temp^2
trend <- time(cmort)
fit <- lm(cmort ~ trend + temp + temp2 + part, na.action = NULL)
Residuals <- as.numeric(resid(fit))
##Correlation plots
acf(Residuals, lag.max = 18,main = "")
pacf(Residuals, lag.max = 18,main = "")
ADCFplot(Residuals, MaxLag = 18, main = "Wild Bootstrap", method = "Wild")
ADCFplot(Residuals, MaxLag = 18, main = "Subsampling", method = "Subsampling")

fit2 <- arima(cmort, order =c(2, 0, 0), xreg = cbind(trend, temp, temp2, part))
Residuals2 <- as.numeric(residuals(fit2))
##Correlation plots
acf(Residuals2, lag.max = 18, main = "")
pacf(Residuals2, lag.max = 18, main = "")
ADCFplot(Residuals2, MaxLag = 18, main = "Wild Bootstrap", method = "Wild")
ADCFplot(Residuals2, MaxLag = 18, main = "Subsampling", method = "Subsampling")

To formally confirm the absence of any serial dependence among the new residuals of model (19), as shown in Figure 2, we perform univariate tests of independence based on the test statistic Tn given in (14). We use the UnivTest() function from our package with argument testType = "covariance" (default option). In order to examine the effect of using different bandwidths, we choose p=[3nλ] for λ=0.1, 0.2 and 0.3, that is p=6, 11, and 20, and we apply Bartlett kernel. The resulting p-values are 0.118, 0.170 and 0.208 respectively suggesting acceptance of independence. We calculated p-values for b=499 independent wild bootstrap replications. The bootstrap procedure can be computed on multiple cores simultaneously by using the argument parallel = TRUE (they take about 10, 14 and 23 seconds respectively on a standard laptop with Intel Core i5 system and CPU 2.30 GHz):

UnivTest(Residuals2, type = "bartlett", p = 6, b = 499, parallel = TRUE)

##         Univariate test of independence based on distance covariance
##
## data:  Residuals2, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tn = 67.7344, p-value = 0.118

UnivTest(Residuals2, type = "bartlett", p = 11, b = 499, parallel = TRUE)

##         Univariate test of independence based on distance covariance
##
## data:  Residuals2, kernel type: bartlett, bandwidth=11, boot replicates 499
## Tn = 125.6674, p-value = 0.170

UnivTest(Residuals2, type = "bartlett", p = 20, b = 499, parallel = TRUE)

##         Univariate test of independence based on distance covariance
##
## data:  Residuals2, kernel type: bartlett, bandwidth=20, boot replicates 499
## Tn = 225.9266, p-value = 0.208

We compare the proposed test statistic with other test statistics to check its performance. In particular, we consider the Box-Pierce test statistic BP=nj=1pρ^2(j), the Ljung-Box test statistic LB=n(n+2)j=1p(nj)1ρ^2(j), the test statistic proposed by Tn(1)=nj=1n1k2(j/p)ρ^2(j) and the test statistic Tn(2) proposed by defined in (13) with W(u,v)=Φ(u)Φ(v), and Φ() being the cumulative distribution function of standard normal. For the aforestated bandwidth values, all these alternative test statistic give large p-values indicating the absence of any serial dependence among the new residuals. More precisely, BP and LB give 0.848, 0.906, 0.170 and 0.844, 0.901, 0.142 respectively. BP and LB based tests are performed in R by the function Box.test() as follows:

box1 <- Box.test(Residuals2, lag = 6)
box2 <- Box.test(Residuals2, lag = 11)
box3 <- Box.test(Residuals2, lag = 20)
ljung1 <- Box.test(Residuals2, lag = 6, type = "Ljung")
ljung2 <- Box.test(Residuals2, lag = 11, type = "Ljung")
ljung3 <- Box.test(Residuals2, lag = 20, type = "Ljung")

The p-values obtained by Tn(1) are 0.896, 0.930 and 0.870 respectively. Tn(2) gives the following p-values: 0.854, 0.752 and 0.504 respectively. Tn(1) and Tn(2) are calculated by employing the Bartlett kernel. These p-values are calculated for b=499 ordinary bootstrap replications. The R functions for constructing these test statistics are beyond the scope of this paper and are available from the authors upon request.

Bivariate financial time series

We now analyze the monthly log returns of the stocks of International Business Machines (IBM) and the S&P 500 composite index starting from 30 September 1953 to 30 December 2011 for 700 observations. A larger dataset is available in our package by the object ibmSp500 starting from January 1926 for 1032 observations. It is actually a combination of two smaller datasets: the first one was first reported by and the second one was first reported by . ACF and ADCF plots of the original series are provided in Figure 3, whereas Figure 4 shows the ACF and ADCF plots of the squared series.

(a)
graphic without alt text
(b)
graphic without alt text

Figure 3: (a) The sample ACF of the original series. (b) The sample ADCF of the original series.

(a)
graphic without alt text
(b)
graphic without alt text

Figure 4: (a) The sample ACF of the squared series. (b) The sample ADCF of the squared series.

The R commands for constructing these plots are as follows:

data(ibmSp500)
new_data <- tail(ibmSp500[,2:3], 700)
series <- log(new_data + 1)
t=scale(lseries, center = TRUE, scale = FALSE)
t2 <- at^2
olnames(at) <- c("IBM", "SP")
olnames(at2) <- c("IBM_sq", "SP_sq")
cf(at, lag.max = 18)
cf(at2, lag.max = 18)
ADCFplot(at, MaxLag = 18, ylim = c(0, 0.2))
ADCFplot(at2, MaxLag = 18, ylim = c(0, 0.2))

The ACF plots of the original series (upper panel of Figure 3) suggest no serial correlation among observations, while the ACF plots of the squared series (upper panel of Figure 4) imply strong dependence. This confirms the conditional heteroscedasticity in the monthly log returns. However, the ADCF plots for both original and squared series (lower panels of Figures 3 and 4) suggest dependence. Indeed, choosing p=[3nλ] for λ=0.1, 0.2 and 0.3, that is p=6, 12 and 22, and employing Bartlett kernel, Tn gives low p-values (0.022, 0.014 and 0.020 respectively). The calls for these three multivariate tests of independence can be found below (they take about 2, 3 and 6 minutes respectively for b=499 bootstrap replications on a standard laptop with Intel Core i5 system and CPU 2.30 GHz):

mADCFtest(at, "bartlett", p = 6, b = 499, parallel = TRUE)

##        Multivariate test of independence based on distance correlation
##
## data:  at, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tnbar = 34.1743, p-value = 0.022

mADCFtest(at, "bartlett", p = 12, b = 499, parallel = TRUE)

##        Multivariate test of independence based on distance correlation
##
## data:  at, kernel type: bartlett, bandwidth=12, boot replicates 499
## Tnbar = 71.1713, p-value = 0.014

mADCFtest(at, "bartlett", p = 22, b = 499, parallel = TRUE)

##        Multivariate test of independence based on distance correlation
##
## data:  at, kernel type: bartlett, bandwidth=22, boot replicates 499
## Tnbar = 122.9424, p-value = 0.02

To compare the performance of the proposed test statistic Tn, we consider the multivariate Ljung-Box statistic defined by: mLB=n2j=1p(nj)1tr{Γ^(j)Γ^1(0)Γ^(j)Γ^1(0)} where Γ^() is the ordinary covariance matrix. In contrast to the Mn’s results, mLB gives large p-values (0.218, 0.731 and 0.525) respectively. The portes package needs to be installed in order to perform tests of independence based on mLB statistic:

> library(portes)
> LjungBox(at, c(6, 12, 22))

Assuming that the bivariate log returns follows a VAR model and employing the AIC to choose its best order, we obtain that a VAR(2) model fits the data well. Figure 5 shows the ACF plots (upper panel) and ADCF plots (lower panel) of the residuals after fitting a VAR(2) model to the original bivariate log return series using the function VAR() from the MTS package.

(a)
graphic without alt text
(b)
graphic without alt text

Figure 5: (a) The sample ACF and (b) sample ADCF of the residuals after fitting VAR(2) model to the original series.

In contrast to the ACF plot, the ADCF plot still indicates some dependence among the residuals. Constructing tests of independence based on Tn and mLB for the same choices of bandwidth, p=6, 12, 22, we confirm this visual result. In particular, employing a Bartlett kernel, Tn statistic gives low p-values (0.036, 0.018 and 0.034 respectively) whereas the mLB statistic yields large p-values (0.669, 0.958 and 0.806 respectively). The calls for the plots of Figure 5 and the corresponding tests of independence are as follows:

library(MTS)
model <- VAR(at, 2)
resids <- residuals(model)
colnames(resids) <- c("IBM_res", "SP_res")
windows(9, 6)
acf(resids, lag.max = 18)
mADCFplot(resids, MaxLag = 18, ylim = c(0, 0.13))

## Tests of independence based on \overline{T}_n
mADCFtest(resids, "bartlett", p = 6, b = 499, parallel = TRUE)

##         Multivariate test of independence based on distance correlation
##
## data:  resids, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tnbar = 29.9114, p-value = 0.036

mADCFtest(resids, "bartlett", p = 12, b = 499, parallel = TRUE)

##         Multivariate test of independence based on distance correlation
##
## data:  resids, kernel type: bartlett, bandwidth=12, boot replicates 499
## Tnbar = 64.7754, p-value = 0.018

mADCFtest(resids, "bartlett", p = 22, b = 499, parallel = TRUE)

##        Multivariate test of independence based on distance correlation
##
## data:  resids, kernel type: bartlett, bandwidth=22, boot replicates 499
## Tnbar = 115.3462, p-value = 0.034

## Tests of independence based on mLB
LjungBox(resids, c(6, 12, 22))

9 Summary and further research

There have been many works in the literature based on distance covariance methodology. The R package energy , provides functions that cover this methodology. However, there is no published package that includes functions about distance covariance for time series data. dCovTS contributes to filling this gap by providing functions that compute distance covariance and correlation functions for both univariate and multivariate time series. We also include functions that develop univariate and multivariate tests of serial dependence based on distance covariance and correlation functions.

There is a number of possible extensions of this package, and some of them are not covered by existing theory and can be seen as further research. One possible direction is to develop a theory based on partial ADCV or conditional ADCV and a related testing methodology to identify possible dependencies among time series (see for partial distance covariance methodology and , for conditional distance covariance methodology; all three works deal with independent random variables). Among the many applications of partial correlation are graphical models. Thus, a graphical modeling theory based on partial ADCV could be carried out and this methodology can be included for a future version of this package.

10 Acknowledgments

The authors thank Tobias Liboschik for his considerable help on the development of this package. The authors would also like to thank Dominic Edelmann for carefully checking the package and making helpful comments and suggestions for its improvement. In addition, we would like to extend our gratitude to R. Bivand and to an anonymous reviewer whose comments improved our original submission.

CRAN packages used

energy, doParallel, portes, MTS

CRAN Task Views implied by cited packages

TimeSeries

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

Footnotes

    References

    R. Analytics and S. Weston. doParallel: Foreach Parallel Adaptor for the ’parallel’ Package, . R package version 1.0.10, 2015. URL http://CRAN.R-project.org/package=doParallel.
    G. E. P. Box and D. A. Pierce. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65:, 1970.
    R. A. Davis, M. Matsui, T. Mikosch, and P. Wan. Applications of distance correlation to time series. 2016. URL http://arxiv.org/abs/1606.05481.
    H. Dehling and T. Mikosch. Random quadratic forms and the boostrap for U-statistics. Journal of Multivariate Analysis 51:, 1994.
    J. Dueck, D. Edelmann, T. Gneiting, and D. Richards. The affinely invariant distance correlation. Bernoulli 20:, 2014.
    A. Feuerverger. A consistent test for bivariate dependence. International Statistical Review 61:, 1993.
    K. Fokianos and M. Pitsillou. Consistent testing for pairwise dependence in time series. Technometrics natexlaba, 2016a. URL http://dx.doi.org/10.1080/00401706.2016.1156024.
    K. Fokianos and M. Pitsillou. Testing pairwise independence for multivariate time series by the auto-distance correlation matrix. Submitted for publication natexlabb, 2016b.
    A. Gretton, K. Fukumizu, and B. K. Sriperumbudur. Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3:, 2009.
    Y. Hong. Consistent testing for serial correlation of unknown form. Econometrica 64:, 1996.
    Y. Hong. Hypothesis testing in time series via the emprical characteristic function: A generalized spectral density approach. Journal of the American Statistical Association 94:, 1999.
    J. R. M. Hosking. Multivariate Portmanteau statistic. Journal of the American Statistical Association 75:, 1980.
    J. Josse and S. Holmes. Tests of independence and beyond. 2014. URL http://arxiv.org/pdf/1307.7383.pdf.
    A. Leucht and M. H. Neumann. Degenerate U- and V-statistics under ergodicity: asymptotics, bootstrap and applications in statistics. Annals of the Institute of Statistical Mathematics,65: natexlaba, 2013a.
    A. Leucht and M. H. Neumann. Dependent wild bootstrap for degenerate U- and V- statistics. Journal of Multivariate Analysis 117: ,natexlabb, 2013b.
    G. M. Ljung and G. E. P. Box. On a measure of lack of fit in time series models. Biometrika 62:, 1978.
    E. Mahdi and A. I. McLeod. Improved multivariate Portmanteau diagnostic test. Journal of Time Series Analysis 33:, 2012. URL http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9892.2011.00752.x/abstract.
    E. Parzen. On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28:, 1957.
    B. Poczos and J. Schneider. Conditional distance variance and correlation.  bapoczos/articles/poczos12distancecorr.pdf, 2012. URL http://www.cs.cmu.edu/.
    N. P. Politis, J. P. Romano, and M. Wolf. Subsampling. Springer-Verlag New York, 1999.
    B. Remillard. Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3:, 2009.
    M. L. Rizzo and G. J. Szekely. energy: E-statistics (energy statistics), . R package version 1.6.2, 2014. URL https://CRAN.R-project.org/package=energy.
    X. Shao. The dependent wild bootstrap. Journal of the American Statistical Association 105:, 2010.
    R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications With R Examples. Springer-Verlag New York Third Edition, 2011.
    R. H. Shumway, R. S. Azari, and Y. Pawitan. Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmetnal research 45:, 1988.
    G. J. Székely and M. L. Rizzo. Partial distance correlation with methods for dissimilarities. The Annals of Statistics 42:, 2014.
    G. J. Székely, M. L. Rizzo, and N. K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics 35:, 2007.
    R. S. Tsay. Analysis of Financial Time Series. Wiley Series in Probability; Statistics Hoboken NJ Third Edition, 2010.
    R. S. Tsay. MTS: All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models, . R package version 0.33, 2015. URL http://CRAN.R-project.org/package=MTS.
    R. S. Tsay. Multivariate Time Series Analysis With R and Financial Applications. Wiley Series in Probability; Statistics Hoboken NJ, 2014.
    X. Wang, W. Pan, W. Hu, Y. Tian, and H. Zhang. Conditional distance correlation. Journal of the American Statistical Association DOI: 10.1080/01621459.2014.993081, 2015.
    C. F. J. Wu. Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics 14:, 1986.
    Z. Zhou. Measuring nonlinear dependence in time-series, a distance correlation approach. Journal of Time Series Analysis 33:, 2012.

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Pitsillou & Fokianos, "dCovTS: Distance Covariance/Correlation for Time Series", The R Journal, 2016

    BibTeX citation

    @article{RJ-2016-049,
      author = {Pitsillou, Maria and Fokianos, Konstantinos},
      title = {dCovTS: Distance Covariance/Correlation for Time Series},
      journal = {The R Journal},
      year = {2016},
      note = {https://rjournal.github.io/},
      volume = {8},
      issue = {2},
      issn = {2073-4859},
      pages = {324-340}
    }