The distance covariance function is a new measure of dependence between random vectors. We drop the assumption of iid data to introduce distance covariance for time series. The R package dCovTS provides functions that compute and plot distance covariance and correlation functions for both univariate and multivariate time series. Additionally, it includes functions for testing serial independence based on distance covariance. This paper describes the theoretical background of distance covariance methodology in time series and discusses in detail the implementation of these methods with the R package dCovTS.
There has been considerable recent interest in measuring dependence by employing the concept of the distance covariance function. Székely, Rizzo, and Bakirov (2007) initially introduced distance covariance as a new measure of dependence, defined as the weighted $L_2$ norm of the difference between the joint characteristic function of two random vectors and the product of their marginal characteristic functions.
The distance covariance methodology of Székely, Rizzo, and Bakirov (2007) is based on the assumption that the underlying data are iid. However, this assumption is often violated in practical problems. Remillard (2009) proposed extending the distance covariance methodology to a time series context in order to measure serial dependence. There have been a few works on developing distance covariance methodology in the context of time series (Zhou 2012; Dueck, Edelmann, Gneiting, and Richards 2014; Davis, Matsui, Mikosch, and Wan 2016). Motivated by the work of Székely, Rizzo, and Bakirov (2007), Zhou (2012) recently defined the so-called auto-distance covariance function (ADCV) and its rescaled version, the so-called auto-distance correlation function (ADCF), for a strictly stationary multivariate time series. Unlike the classical Pearson autocorrelation function (ACF), which measures the strength of linear dependence and can be equal to zero even when the variables are related, the ADCF vanishes only when the observations are independent. However, Zhou (2012) studied the asymptotic behavior of ADCV only at a fixed lag order. Fokianos and Pitsillou (2016a) relaxed this assumption and constructed a univariate test of independence by considering an increasing number of lags, following the generalized spectral domain methodology of Hong (1999). Although the proposed methodology is for univariate processes, it can be extended to multivariate processes.
Zhou (2012) developed distance covariance methodology for multivariate time series, but he did not explore the interrelationships between the various time series components. Fokianos and Pitsillou (2016b) made this possible by defining the matrix version of the pairwise auto-distance covariance and correlation functions. In particular, they constructed multivariate tests of independence based on these new measures in order to identify whether there is some inherent nonlinear interdependence between the component series.
The energy package for R (Rizzo and Székely 2014) provides a wide
range of functions for the existing distance covariance methodology.
However, there is no package implementing the aforementioned distance covariance
methodology for time series. We aim to fill this gap with the R package
dCovTS. In this first version of the package, we provide functions that
compute and plot ADCV and ADCF using the functions dcov() and dcor(),
respectively, from the energy package.
The new testing methodology proposed by Fokianos and Pitsillou (2016a,b) is also included in the package.
The structure of the paper is as follows. In the first two sections we
introduce the theoretical background of the distance covariance function for
univariate and multivariate time series, respectively. In the next
section, we briefly state the main results about the asymptotic
properties of the distance covariance function. The proposed testing
methodology for both univariate and multivariate time series is also
described. Empirical
Denote a univariate strictly stationary time series by
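Although Hong (1999) suggests the use of an arbitrary integrable weight
function, distance covariance corresponds to the non-integrable weight function $\mathcal{W}(u,v) = 1/(\pi^2 |u|^2 |v|^2)$ of Székely, Rizzo, and Bakirov (2007).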
The functions ADCV() and ADCF() in dCovTS return the empirical ADCV and ADCF, respectively. With unbiased = TRUE, the results correspond to the unbiased squared quantities. With unbiased = FALSE, they correspond to the biased estimator (6).
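The empirical ADCV and ADCF can be illustrated without dCovTS. The following is a minimal base-R sketch. It assumes the biased V-statistic form of Székely, Rizzo, and Bakirov (2007), applied to the lagged pairs $(X_t, X_{t+j})$, $t = 1, \ldots, n-j$. For the ADCF it takes the distance correlation of those same pairs, which is what calling dcor() on the two lagged vectors would compute. The helpers dcenter(), dcov_v(), adcv_sketch() and adcf_sketch() are hypothetical names. They are not functions of dCovTS or energy, and the package's own normalization may differ in detail.

```r
# Sketch of the biased (V-statistic) sample ADCV and ADCF for a univariate series.
# dcenter(), dcov_v(), adcv_sketch() and adcf_sketch() are hypothetical helpers.

# Double-centered Euclidean distance matrix of a vector (or of the rows of a matrix)
dcenter <- function(x) {
  a <- as.matrix(dist(x))                       # a[k, l] = |x_k - x_l|
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}

# Biased sample distance covariance V_N(x, y): square root of mean(A * B)
dcov_v <- function(x, y) {
  sqrt(max(0, mean(dcenter(x) * dcenter(y))))   # max() guards against tiny negative rounding
}

# Sample ADCV at lags 0, ..., max_lag: distance covariance of the pairs (X_t, X_{t+j})
adcv_sketch <- function(x, max_lag) {
  n <- length(x)
  sapply(0:max_lag, function(j) dcov_v(x[1:(n - j)], x[(1 + j):n]))
}

# Sample ADCF: distance correlation of the same lagged pairs
adcf_sketch <- function(x, max_lag) {
  n <- length(x)
  sapply(0:max_lag, function(j) {
    u <- x[1:(n - j)]
    v <- x[(1 + j):n]
    dcov_v(u, v) / sqrt(dcov_v(u, u) * dcov_v(v, v))
  })
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # simulated AR(1) series
round(adcf_sketch(x, 5), 3)                           # the lag-0 value is 1 by construction
```

If energy is installed and energy::dcov() returns the biased sample distance covariance (our assumption), it should agree with dcov_v() on the same lagged vectors up to rounding.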
We denote by
The pairwise ADCF between
Estimation of
The sample ADCV matrix is computed by the mADCV() function. The
estimator based on (11) is obtained by setting unbiased = TRUE. The package
also gives the sample ADCF matrix through mADCF(), which is obtained after
replacing the population quantities by their estimators (10) (or (11)).
Sample ADCF plots for univariate and multivariate time series are produced by the
ADCFplot() and mADCFplot() functions, respectively, where the critical values shown
(blue dotted horizontal line) are computed by employing the bootstrap
methodology described in the appropriate section. Recall that these plots are
computed using the biased definition of distance covariance and
correlation.
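The pairwise ADCV and ADCF matrices can be sketched in the same way. The sketch assumes that entry $(r, m)$ of the lag-$j$ ADCV matrix is the distance covariance between component $r$ at time $t$ and component $m$ at time $t+j$. It also assumes that the ADCF matrix rescales that entry by $\sqrt{V_{rr}(0) V_{mm}(0)}$. The functions madcv_sketch() and madcf_sketch() are hypothetical helpers and do not reproduce mADCV() or mADCF(). In the simulated example, component b depends on the squared lag of component a. The [a, b] entry at lag 1 should therefore stand out, even though the linear cross-correlation between $a_t$ and $b_{t+1}$ is close to zero.

```r
# Sketch of pairwise sample ADCV / ADCF matrices for a multivariate series.
# madcv_sketch() and madcf_sketch() are hypothetical helpers.

dcenter <- function(x) {
  a <- as.matrix(dist(x))                       # pairwise absolute differences
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}
dcov_v <- function(x, y) sqrt(max(0, mean(dcenter(x) * dcenter(y))))

# Lag-j ADCV matrix: entry [r, m] estimates V(X_{t;r}, X_{t+j;m})
madcv_sketch <- function(X, j) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  now  <- X[1:(n - j), , drop = FALSE]          # X_t
  lead <- X[(1 + j):n, , drop = FALSE]          # X_{t+j}
  V <- matrix(0, d, d, dimnames = list(colnames(X), colnames(X)))
  for (r in 1:d) for (m in 1:d) V[r, m] <- dcov_v(now[, r], lead[, m])
  V
}

# Lag-j ADCF matrix: R_{rm}(j) = V_{rm}(j) / sqrt(V_{rr}(0) * V_{mm}(0))
madcf_sketch <- function(X, j) {
  v0 <- diag(madcv_sketch(X, 0))                # distance covariances of each component with itself
  madcv_sketch(X, j) / sqrt(outer(v0, v0))
}

set.seed(2)
e <- matrix(rnorm(400), ncol = 2)
# b_t depends nonlinearly on a_{t-1}
X <- cbind(a = e[, 1], b = c(0, e[-200, 1]^2) + e[, 2])
round(madcf_sketch(X, 1), 3)
```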
Consider first the univariate case. For a strictly stationary and
In addition, Fokianos and Pitsillou (2016b) showed that for a
As shown in the previous section, the asymptotic distribution of
distance covariance is derived at a fixed lag, for both univariate and
multivariate time series. Fokianos and Pitsillou (2016a,b)
studied the asymptotic behavior of distance covariance when an
increasing number of lags is considered, by employing the generalized spectral
domain methodology of Hong (1999). Hong (1999) highlighted that standard spectral density
approaches become inappropriate for non-Gaussian and nonlinear processes
with zero autocorrelation. Considering a univariate strictly stationary
The function kernelFun()
in dCovTS computes a number of such kernel
functions including the truncated (default option), Bartlett, Daniell,
QS and Parzen kernels.
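For reference, the five kernels named above are sketched below using their standard definitions from kernel-based spectral testing. kernel_sketch() is a hypothetical stand-in, not kernelFun(), and the package's parameterization may differ. With these definitions the truncated, Bartlett and Parzen kernels vanish for $|z| > 1$, so only lags $j \le p$ contribute. The Daniell and QS kernels, by contrast, give non-zero weight to every lag.

```r
# Standard kernel functions k(z) used in kernel-weighted portmanteau statistics.
# kernel_sketch() is a hypothetical helper, not dCovTS::kernelFun().
kernel_sketch <- function(z, type = c("truncated", "bartlett", "daniell", "qs", "parzen")) {
  type <- match.arg(type)
  az <- abs(z)
  switch(type,
    truncated = as.numeric(az <= 1),
    bartlett  = ifelse(az <= 1, 1 - az, 0),
    # Daniell: sin(pi z) / (pi z), with limit 1 at z = 0
    daniell   = ifelse(z == 0, 1, sin(pi * z) / (pi * z)),
    # Quadratic spectral: 25 / (12 pi^2 z^2) * (sin(w) / w - cos(w)), w = 6 pi z / 5
    qs        = {
      w <- 6 * pi * z / 5
      ifelse(z == 0, 1, 25 / (12 * pi^2 * z^2) * (sin(w) / w - cos(w)))
    },
    parzen    = ifelse(az <= 0.5, 1 - 6 * az^2 + 6 * az^3,
                       ifelse(az <= 1, 2 * (1 - az)^3, 0))
  )
}

# Weight each kernel gives to lag j = 3 when the bandwidth is p = 6
sapply(c("truncated", "bartlett", "daniell", "qs", "parzen"),
       function(k) kernel_sketch(3 / 6, k))
```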
Fokianos and Pitsillou (2016a) proposed a portmanteau-type statistic based on ADCV.
The function UnivTest() from the dCovTS package performs univariate tests
of independence based on (14) and its rescaled version
(15), using the arguments testType = "covariance"
and testType = "correlation", respectively.
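The portmanteau statistic itself is not reproduced above. Assume that (14) has the kernel-weighted form $T_n = \sum_{j=1}^{n-1} (n-j)\, k^2(j/p)\, \hat{V}_X^2(j)$, and that (15) replaces $\hat{V}_X^2(j)$ by the squared sample ADCF. Under those assumptions, the computation can be written in base R as follows. univ_stat_sketch() is hypothetical and only returns the statistic. UnivTest() additionally obtains p-values by resampling.

```r
# Sketch of a kernel-weighted portmanteau statistic built on squared sample ADCV.
# univ_stat_sketch() and dcov2() are hypothetical helpers.

dcenter <- function(x) {
  a <- as.matrix(dist(x))
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}
# Biased squared sample distance covariance V_N^2(x, y)
dcov2 <- function(x, y) max(0, mean(dcenter(x) * dcenter(y)))
bartlett <- function(z) ifelse(abs(z) <= 1, 1 - abs(z), 0)

# T_n = sum_{j=1}^{n-1} (n - j) k^2(j / p) V^2(j); rescale = TRUE uses squared ADCF instead
univ_stat_sketch <- function(x, p, kernel = bartlett, rescale = FALSE) {
  n <- length(x)
  total <- 0
  for (j in 1:(n - 1)) {
    w <- kernel(j / p)^2
    if (w == 0) next                      # lags with zero kernel weight add nothing
    u <- x[1:(n - j)]
    v <- x[(1 + j):n]
    v2 <- dcov2(u, v)
    if (rescale) {
      den <- sqrt(dcov2(u, u) * dcov2(v, v))
      v2 <- if (den > 0) v2 / den else 0  # degenerate lags contribute nothing
    }
    total <- total + (n - j) * w * v2
  }
  total
}

set.seed(3)
x <- rnorm(300)
univ_stat_sketch(x, p = 6)                  # covariance version
univ_stat_sketch(x, p = 6, rescale = TRUE)  # correlation version
```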
Following a methodology similar to that of the previous section,
Fokianos and Pitsillou (2016b) suggested a test statistic suitable for testing
pairwise independence in a multivariate time series framework. The
proposed test statistic is based on the ADCV matrix (8), and it
is given by
where
The corresponding tests are implemented by the functions mADCVtest() and mADCFtest(),
respectively, in the dCovTS package.
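A similar sketch covers the multivariate case. It assumes that the rescaled multivariate statistic sums the squared entries of the sample ADCF matrices: $\overline{T}_n = \sum_{j=1}^{n-1} (n-j)\, k^2(j/p) \sum_{r,m} \hat{R}_{rm}^2(j)$. mstat_sketch() is a hypothetical helper, not mADCFtest(), and it returns only the statistic. For a single component it reduces to a univariate rescaled statistic of the same kernel-weighted form.

```r
# Sketch of a multivariate kernel-weighted statistic built on squared pairwise ADCF entries.
# mstat_sketch() is a hypothetical helper.

dcenter <- function(x) {
  a <- as.matrix(dist(x))
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}
dcov_v <- function(x, y) sqrt(max(0, mean(dcenter(x) * dcenter(y))))
bartlett <- function(z) ifelse(abs(z) <= 1, 1 - abs(z), 0)

# sum_j (n - j) k^2(j / p) * sum_{r,m} R_{rm}(j)^2,
# with R_{rm}(j) = V_{rm}(j) / sqrt(V_{rr}(0) V_{mm}(0))
mstat_sketch <- function(X, p, kernel = bartlett) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  v0 <- sapply(1:d, function(r) dcov_v(X[, r], X[, r]))   # V_rr(0) for each component
  total <- 0
  for (j in 1:(n - 1)) {
    w <- kernel(j / p)^2
    if (w == 0) next                                      # skip lags outside the kernel support
    R <- matrix(0, d, d)
    for (r in 1:d) for (m in 1:d) {
      R[r, m] <- dcov_v(X[1:(n - j), r], X[(1 + j):n, m]) / sqrt(v0[r] * v0[m])
    }
    total <- total + (n - j) * w * sum(R^2)
  }
  total
}

set.seed(4)
X <- matrix(rnorm(600), ncol = 2)
mstat_sketch(X, p = 6)
```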
To examine the asymptotic behavior of the proposed test statistics, a
resampling method is proposed. First, recall that all test statistics
We also suggest the use of the independent wild bootstrap for obtaining
simultaneous critical values in the ADCF plots. Subsampling is also available (method = "Subsampling"). The choice of the subsampling block
size is based on the minimum volatility method proposed by Politis, Romano, and Wolf (1999, Section 9.4.2). In addition, the package provides the ordinary
independent bootstrap methodology to derive empirical p-values (method = "Independent Bootstrap"). The default bootstrap
method provided to the user is the independent wild bootstrap technique.
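The idea behind the independent wild bootstrap can be sketched for the statistic $T_n$. Under the null hypothesis of independence, each $\hat{V}_X^2(j)$ is a degenerate V-statistic. A bootstrap replicate is therefore formed by weighting the double-centered product matrix with iid multipliers: $W^\top (A \circ B) W / (n-j)^2$. This is a simplified sketch under stated assumptions:

* N(0, 1) multipliers;
* one multiplier vector per replicate, shared across lags;
* a p-value computed as $(1 + \#\{T^* \ge T\})/(b+1)$.

The scheme implemented in dCovTS may differ, and wild_boot_sketch() is a hypothetical helper.

```r
# Sketch of an independent wild bootstrap p-value for T_n = sum_j (n - j) k^2(j / p) V^2(j).
# wild_boot_sketch() is a hypothetical helper; multipliers are iid N(0, 1) by assumption.

dcenter <- function(x) {
  a <- as.matrix(dist(x))
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}
bartlett <- function(z) ifelse(abs(z) <= 1, 1 - abs(z), 0)

wild_boot_sketch <- function(x, p, b = 199, kernel = bartlett) {
  n <- length(x)
  lags <- which(kernel((1:(n - 1)) / p) != 0)     # lags with non-zero kernel weight
  stopifnot(length(lags) > 0)
  # Hadamard products A o B of the double-centered matrices, one per lag
  prods <- lapply(lags, function(j) dcenter(x[1:(n - j)]) * dcenter(x[(1 + j):n]))
  weights <- (n - lags) * kernel(lags / p)^2
  # Observed statistic, with V^2(j) = mean(A o B)
  stat <- sum(weights * vapply(prods, mean, numeric(1)))
  # Bootstrap replicates: V*^2(j) = w' (A o B) w / (n - j)^2
  boot <- replicate(b, {
    w <- rnorm(n)
    sum(weights * mapply(function(M, j) {
      wj <- w[1:(n - j)]
      drop(crossprod(wj, M %*% wj)) / (n - j)^2
    }, prods, lags))
  })
  list(statistic = stat, p.value = (1 + sum(boot >= stat)) / (b + 1))
}

set.seed(5)
wild_boot_sketch(rnorm(200), p = 6, b = 99)$p.value                                # iid series
wild_boot_sketch(as.numeric(arima.sim(list(ar = 0.8), n = 200)), p = 6, b = 99)$p.value  # AR(1)
```

Precomputing $A \circ B$ once per lag keeps each replicate down to a few matrix-vector products. The replicates are also mutually independent, which is why they are straightforward to distribute across cores.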
The computation of the bootstrap replications, and thus of the empirical
p-values, can be carried out in parallel (argument parallel = TRUE). To do this, the
doParallel package (Revolution Analytics and Weston 2015) needs to be installed first, in order to register
a computing cluster.
The current version of the dCovTS package (version 1.1) is available from CRAN and can be downloaded via https://cran.r-project.org/web/packages/dCovTS/. The aim of dCovTS is to provide a set of functions that compute and plot distance covariance and correlation functions for both univariate and multivariate time series. As mentioned, the package supports both biased and unbiased estimators of the distance covariance and correlation functions. Moreover, it offers functions that perform univariate and multivariate tests of independence based on the distance covariance function, using the biased estimator (corresponding to (6) and (10)). All these functions are listed in Table 1. The package also provides the two real datasets listed in Table 2. A more detailed description of the functions and datasets can be found in the help files. We apply dCovTS to two real data examples.
Function | Description |
---|---|
ADCF, mADCF | Estimates distance correlation for a univariate and multivariate time series, respectively |
ADCV, mADCV | Estimates distance covariance for a univariate and multivariate time series, respectively |
ADCFplot, mADCFplot | Plots sample distance correlation in a univariate and multivariate time series framework, respectively |
kernelFun | Gives a range of univariate kernel functions |
UnivTest | Performs a univariate test of independence based on (14) or (15) |
mADCFtest, mADCVtest | Perform multivariate tests of independence based on the ADCF and ADCV matrices, respectively |
Data | Description |
---|---|
ibmSp500 | Monthly returns of IBM and S&P 500 composite index |
MortTempPart | Weekly averages of mortality, temperature and pollution data measured daily in Los Angeles County over the period 1970-1979 |
We first consider the weekly averages of pollution, temperature and mortality data measured
daily in Los Angeles County over the 10-year period 1970-1979
(Shumway, Azari, and Pawitan 1988). The data are available in our package as the object
MortTempPart and contain 508 observations of 3 variables representing
the mortality ("cmort"), temperature ("tempr") and pollutant
particulates ("part") data.
library(dCovTS)
data(MortTempPart)
MortTempPart[1:10,] # the first ten observations
## cmort tempr part
## 1 97.85 72.38 72.72
## 2 104.64 67.19 49.60
## 3 94.36 62.94 55.68
## 4 98.05 72.49 55.16
## 5 95.85 74.25 66.02
## 6 95.98 67.88 44.01
## 7 88.63 74.20 47.83
## 8 90.85 74.88 43.60
## 9 92.06 64.17 24.99
## 10 88.75 67.09 40.41
attach(MortTempPart)
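Following the analysis of Shumway and Stoffer (2011), the possible effects of
temperature and particulate pollution on mortality are first examined by regressing mortality
on a linear time trend, centered temperature, squared centered temperature and particulates, using lm().
The plots shown in Figure 1 suggest an AR(2) process
for the residuals. The new fit is therefore the same regression with AR(2) errors,
estimated with the arima() function of R.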
The correlation plots for the residuals from the new model
(19) are shown in Figure 2, indicating that
there is no serial dependence.
The calls for both model fits and their diagnostic plots are given
below. ADCF plots (lower plots of Figures 1 and
2) are constructed using both resampling schemes
explained in the previous section: independent wild bootstrap and subsampling.
## Regression of mortality on trend, centered temperature, its square and particulates
temp <- tempr - mean(tempr) # center temperature
temp2 <- temp^2
trend <- time(cmort)
fit <- lm(cmort ~ trend + temp + temp2 + part, na.action = NULL)
Residuals <- as.numeric(resid(fit))
## Correlation plots of the regression residuals (Figure 1)
acf(Residuals, lag.max = 18, main = "")
pacf(Residuals, lag.max = 18, main = "")
ADCFplot(Residuals, MaxLag = 18, main = "Wild Bootstrap", method = "Wild")
ADCFplot(Residuals, MaxLag = 18, main = "Subsampling", method = "Subsampling")
## Same regression with AR(2) errors, model (19)
fit2 <- arima(cmort, order = c(2, 0, 0), xreg = cbind(trend, temp, temp2, part))
Residuals2 <- as.numeric(residuals(fit2))
## Correlation plots of the residuals of model (19) (Figure 2)
acf(Residuals2, lag.max = 18, main = "")
pacf(Residuals2, lag.max = 18, main = "")
ADCFplot(Residuals2, MaxLag = 18, main = "Wild Bootstrap", method = "Wild")
ADCFplot(Residuals2, MaxLag = 18, main = "Subsampling", method = "Subsampling")
To formally confirm the absence of any serial dependence among the new
residuals of model (19), as shown in Figure
2, we perform univariate tests of independence based
on the test statistic $T_n$, using the UnivTest() function from our package with argument
testType = "covariance" (the default option). To examine the
effect of using different bandwidths, we choose p = 6, 11 and 20, with the Bartlett kernel,
b = 499 bootstrap replications and parallel = TRUE
(the three runs take about 10, 14 and 23 seconds, respectively, on
a standard laptop with an Intel Core i5 CPU at 2.30 GHz):
UnivTest(Residuals2, type = "bartlett", p = 6, b = 499, parallel = TRUE)
## Univariate test of independence based on distance covariance
##
## data: Residuals2, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tn = 67.7344, p-value = 0.118
UnivTest(Residuals2, type = "bartlett", p = 11, b = 499, parallel = TRUE)
## Univariate test of independence based on distance covariance
##
## data: Residuals2, kernel type: bartlett, bandwidth=11, boot replicates 499
## Tn = 125.6674, p-value = 0.170
UnivTest(Residuals2, type = "bartlett", p = 20, b = 499, parallel = TRUE)
## Univariate test of independence based on distance covariance
##
## data: Residuals2, kernel type: bartlett, bandwidth=20, boot replicates 499
## Tn = 225.9266, p-value = 0.208
To check its performance, we compare the proposed test statistic with
the Box-Pierce (Box and Pierce 1970) and Ljung-Box test statistics,
computed by the R function Box.test() as follows:
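## Box-Pierce tests at lags 6, 11 and 20
box1 <- Box.test(Residuals2, lag = 6)
box2 <- Box.test(Residuals2, lag = 11)
box3 <- Box.test(Residuals2, lag = 20)
## Ljung-Box tests at the same lags
ljung1 <- Box.test(Residuals2, lag = 6, type = "Ljung")
ljung2 <- Box.test(Residuals2, lag = 11, type = "Ljung")
ljung3 <- Box.test(Residuals2, lag = 20, type = "Ljung")
## Collect the p-values
sapply(list(box1, box2, box3, ljung1, ljung2, ljung3), function(tt) tt$p.value)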
The
We now analyze the monthly log returns of the stocks of International
Business Machines (IBM) and the S&P 500 composite index from 30
September 1953 to 30 December 2011, a total of 700 observations. A larger
dataset, starting from January 1926 (1032 observations), is available in our
package as the object ibmSp500. It combines two smaller datasets, first reported
by Tsay (2010) and Tsay (2014), respectively. ACF and ADCF plots of the
original series are provided in Figure 3, whereas
Figure 4 shows the ACF and ADCF plots of the squared
series.
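The R commands for constructing these plots are as follows:
data(ibmSp500)
new_data <- tail(ibmSp500[, 2:3], 700)   # last 700 monthly returns of IBM and S&P 500
lseries <- log(new_data + 1)             # log returns
at <- scale(lseries, center = TRUE, scale = FALSE)  # mean-corrected log returns
at2 <- at^2                              # squared series
colnames(at) <- c("IBM", "SP")
colnames(at2) <- c("IBM_sq", "SP_sq")
acf(at, lag.max = 18)                    # Figure 3, upper panel
acf(at2, lag.max = 18)                   # Figure 4, upper panel
mADCFplot(at, MaxLag = 18, ylim = c(0, 0.2))   # Figure 3, lower panel
mADCFplot(at2, MaxLag = 18, ylim = c(0, 0.2))  # Figure 4, lower panel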
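The ACF plots of the original series (upper panel of Figure
3) suggest no serial correlation among the observations,
while the ACF plots of the squared series (upper panel of Figure
4) imply strong dependence. This confirms the
conditional heteroscedasticity in the monthly log returns. However, the
ADCF plots for both the original and the squared series (lower panels of Figures
3 and 4) suggest dependence. Indeed,
choosing the Bartlett kernel with bandwidths p = 6, 12 and 22 and b = 499 bootstrap
replications, the multivariate tests of independence based on the ADCF matrix reject
the null hypothesis of independence at the 5% level: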
mADCFtest(at, "bartlett", p = 6, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: at, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tnbar = 34.1743, p-value = 0.022
mADCFtest(at, "bartlett", p = 12, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: at, kernel type: bartlett, bandwidth=12, boot replicates 499
## Tnbar = 71.1713, p-value = 0.014
mADCFtest(at, "bartlett", p = 22, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: at, kernel type: bartlett, bandwidth=22, boot replicates 499
## Tnbar = 122.9424, p-value = 0.02
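To compare the performance of the proposed test statistic, we also compute the
multivariate Ljung-Box test statistic with the LjungBox() function of the portes package:
library(portes)
LjungBox(at, c(6, 12, 22))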
Assuming that the bivariate log returns follow a VAR model and
employing the AIC to choose its order, we find that a VAR(2)
model fits the data well. Figure 5 shows the ACF plots
(upper panel) and ADCF plots (lower panel) of the residuals after
fitting a VAR(2) model to the original bivariate log return series using
the function VAR() from the MTS package (Tsay 2015).
In contrast to the ACF plot, the ADCF plot still indicates some
dependence among the residuals. Constructing tests of independence based
on
library(MTS)
model <- VAR(at, 2)
resids <- residuals(model)
colnames(resids) <- c("IBM_res", "SP_res")
dev.new(width = 9, height = 6)  # portable replacement for the Windows-only windows(9, 6)
acf(resids, lag.max = 18)
mADCFplot(resids, MaxLag = 18, ylim = c(0, 0.13))
## Tests of independence based on \overline{T}_n
mADCFtest(resids, "bartlett", p = 6, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: resids, kernel type: bartlett, bandwidth=6, boot replicates 499
## Tnbar = 29.9114, p-value = 0.036
mADCFtest(resids, "bartlett", p = 12, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: resids, kernel type: bartlett, bandwidth=12, boot replicates 499
## Tnbar = 64.7754, p-value = 0.018
mADCFtest(resids, "bartlett", p = 22, b = 499, parallel = TRUE)
## Multivariate test of independence based on distance correlation
##
## data: resids, kernel type: bartlett, bandwidth=22, boot replicates 499
## Tnbar = 115.3462, p-value = 0.034
## Tests of independence based on mLB
LjungBox(resids, c(6, 12, 22))
There have been many works in the literature based on the distance covariance methodology of Székely, Rizzo, and Bakirov (2007). The R package energy (Rizzo and Székely 2014) provides functions that cover this methodology. However, there is no published package that includes functions for distance covariance with time series data. dCovTS contributes to filling this gap by providing functions that compute distance covariance and correlation functions for both univariate and multivariate time series. We also include functions that perform univariate and multivariate tests of serial dependence based on distance covariance and correlation functions.
There are a number of possible extensions of this package. Some of them are not covered by existing theory and can be seen as further research. One possible direction is to develop a theory based on partial ADCV or conditional ADCV, together with a related testing methodology, to identify possible dependencies among time series. See Székely and Rizzo (2014) for partial distance covariance methodology, and Póczos and Schneider (2012) and Wang, Pan, Hu, Tian, and Zhang (2015) for conditional distance covariance methodology; all three works deal with independent random variables. Among the many applications of partial correlation are graphical models. Thus, a graphical modeling theory based on partial ADCV could be developed, and this methodology could be included in a future version of this package.
The authors thank Tobias Liboschik for his considerable help on the development of this package. The authors would also like to thank Dominic Edelmann for carefully checking the package and making helpful comments and suggestions for its improvement. In addition, we would like to extend our gratitude to R. Bivand and to an anonymous reviewer whose comments improved our original submission.
CRAN packages used: energy, doParallel, portes, MTS
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pitsillou & Fokianos, "dCovTS: Distance Covariance/Correlation for Time Series", The R Journal, 2016
BibTeX citation
@article{RJ-2016-049, author = {Pitsillou, Maria and Fokianos, Konstantinos}, title = {dCovTS: Distance Covariance/Correlation for Time Series}, journal = {The R Journal}, year = {2016}, note = {https://rjournal.github.io/}, volume = {8}, issue = {2}, issn = {2073-4859}, pages = {324-340} }