The last decades show an increased interest in modeling various types of data through copulae. Different copula models have been developed, which lead to the challenge of finding the best fitting model for a particular dataset. From the other side, a strand of literature developed a list of different Goodness-of-Fit (GoF) tests with different powers under different conditions. The usual practice is the selection of the best copula via the
Being firstly introduced in 1959 by Abe Sklar (see Sklar (1959)), copulae gained enormous popularity in applications in the last two decades. Researchers from different fields recognize the power of copulae while working with multivariate datasets from insurance (Fang and Madsen 2013; Shi et al. 2016), finance (Salvatierra and Patton 2015; Oh and Patton 2018), biology (Konigorski et al. 2014; Dokuzoğlu and Purutçuoğlu 2017), hydrology (Liu et al. 2018; Valle and Kaplan 2019), medicine (Kuss et al. 2014; Gomes et al. 2019), traffic engineering, (Huang et al. 2017; Ma et al. 2017), etc. For a recent review, we refer to (Größer and Okhrin 2021). Unfortunately, the correct specification of the multivariate distribution is not easy to find, and often interest in the understanding of the functional form of the copula is dominated by the expected performance of the whole model. This is natural, taking into account the huge list of different copula models proposed in the literature for different needs; see, e.g., Durante and Sempi (2010), Joe and Kurowicka (2010), or Genest and Nešlehová (2014). Although an expanding list of R-packages devoted to copulae is existent, the issue of GoF testing is less frequently addressed. Primarily, GoF tests for copulae are implemented in copula comparison packages as copula (Hofert et al. 2020), TwoCop (Remillard and Plante 2012), and VineCopula (Nagler et al. 2019), but since Remillard and Scaillet (2009) and Genest et al. (2009), many other powerful tests were developed that are not integrated into these packages. Most of the tests focus on the bivariate case, leaving a further gap in the existing R-package landscape.
Given a variety of tests, the selection of the most appropriate copula
seems simple at first glance. However, the power of these tests varies
significantly depending on the use case and the copula tested for. The
absence of the overall best GoF test leads researchers and practitioners
often to the selection of the test (and copula), which supports some
subjective expectation, but not the copula that fits the data at its
best. Although GoF tests are not intended for model selection but rather
to decide whether the selected copula is not suitable for the data,
the model selection strategy based on the rank of the
This implies that for each year, a different pair of tests returns consistent results. In an empirical study, it is difficult to decide which copula is suitable and which test provides the most plausible results. An extensive comparative study of different GoF tests was performed a decade ago by Genest et al. (2009), intensively discussing all, up to that moment existing, tests for copulae. The main findings are that there is no superior blanket test, but several tests have very good power under different, often disjunct conditions. A test proposed by Zhang et al. (2016) fills some gaps in the set of models under which this test performs better than others under certain conditions. However, a common phenomenon in empirical studies is the interpretation of the non-rejection of a copula as the correct model. Especially, in situations where the used GoF test has low power, this is not necessarily the case. Tackling this issue, Zhang et al. (2016) also developed the hybrid test, which is simple in construction and implementation. It combines the power of different tests and is very helpful for practitioners; see Section 6. However, even in this case, the interpretation of finding the correct copula should be treated with care.
We propose the R-package
gofCopula to
automatize the whole empirical procedure of selecting the most suitable
copula. Table 1 displays the broad range
of available tests, copula models, and the maximum dimension. The latest
version of this table is also accessible via the function
CopulaTestTable()
in the package. Further details on the functionality
of each test are provided in Section 3, while Table
3 of Appendix A
contains some characteristics of the copulae implemented in the package.
Test | normal |
t |
clayton |
gumbel |
frank |
joe |
amh |
galambos |
huslerReiss |
tawn |
tev |
fgm |
plackett |
|
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gofCvM | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||
gofKS | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||
gofKendallCvM | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||
gofKendallKS | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||
gofRosenblattSnB | 2 | 2 | - | - | - | 2 | 2 | |||||||
gofRosenblattSnC | 2 | 2 | - | - | - | 2 | 2 | |||||||
gofRosenblattGamma | 2 | 2 | - | - | - | 2 | 2 | |||||||
gofRosenblattChisq | 2 | 2 | - | - | - | 2 | 2 | |||||||
gofArchmSnB | - | - | 2 | - | - | - | - | - | - | |||||
gofArchmSnC | - | - | 2 | - | - | - | - | - | - | |||||
gofArchmGamma | - | - | 2 | - | - | - | - | - | - | |||||
gofArchmChisq | - | - | 2 | - | - | - | - | - | - | |||||
gofKernel | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
gofWhite | 2 | 2 | 2 | 2 | 2 | 2 | - | - | - | - | - | - | - | |
gofPIOSTn | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 2 | - | - | - | 2 | 2 | |
gofPIOSRn | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 2 | - | - | - | 2 | 2 |
In summary, the package gofCopula offers the following attractive features which distinguish it from other R-packages:
"gofCOP"
, which allows for a comprehensive overview of the test
results. For a better comparison of the results, we extend the
generic plot
function for objects of class "gofCOP"
, which
illustrates the results in a convenient manner. The plot
function
is supported by the R-package
yarrr
(Phillips 2017), and was customized to provide the user an
insightful figure for the interpretation of the results.The test statistics of six GoF tests were already implemented in
R-packages. Thus, for the computation of the test statistics of some
tests, we use the functions gofTstat
and BiCopGofTest
from the
packages copula and VineCopula, respectively. For obvious reasons,
we did not implement all existing tests on copulae, but we will embed
new tests in the proposed package as soon as they become more relevant
and actively used among academics and practitioners.
The paper, introducing the R-package gofCopula, is structured as follows: The tests and methodology implemented in the package are introduced in Sections 2 and 3 before presenting the functionalities of the package in Section 4. We explain major functions, how to apply them, and elaborate the main arguments of each function. The explanations are supported by R-code and output. To provide an impression of the runtime of various tests, we discuss the speed of the tests depending on the copula to test for, the number of observations, and the number of bootstrap samples. A simulated example (Section 5) contains a typical step-by-step procedure of how the package can be used in practice, which is also applied to two real-world examples (Section 6), in which all corresponding codes are given and explained. The cases are illustrated with interpretations of the console output and plots, both generated directly from gofCopula, without any additional code. The results of the two applications can be fully reproduced by the gofCopula package, which also contains the used datasets. All illustrations, simulations, and applications in this paper are fully reproducible and designed to guide the user into conducting their own research with the gofCopula package.
Consider a
In the bivariate case, one of the standard methods of estimation of the
univariate parameter
Furthermore, Fermanian and Scaillet (2003) and Chen and Huang (2007) consider a
fully non-parametric estimation of the copula, which is heavily used in
the GoF testing. It is called an empirical copula and is shown to be a
consistent estimator of the true underlying copula, c.f.
Gaensler and Stute (1987) and Radulovic and Wegkamp (2004). This
estimator is defined as
Having a list of different copulae, the most suitable one for real
applications still needs to be found and motivated. For this purpose, a
series of different GoF tests has been developed in the last decades.
Several authors, e.g., Genest et al. (2009), tested the power
of those tests against each other and showed that no superior test for
all possible situations exist. We cover 16 tests and implement them into
the gofCopula package. Most of these tests work with the parametric
family of copulae denoted by
The first group is based on the most natural approach: the deviation of
the empirical copula gofTstat
, so we included it into our
package. The tests are later denoted by gofCvM
and gofKS
,
respectively.
The tests from the second group were developed and investigated by
Genest and Rivest (1993), Wang and Wells (2000), Genest et al. (2006).
The main idea behind them is to use the copula-based random variable:
gofKendallCvM
and gofKendallKS
, respectively.
Under the assumption of copula dependency, the conditional distribution
of
Definition 1. Rosenblatt’s probability integral transform of a
copula
If the copula is correctly specified, the variables
Two different types of tests may be constructed using this property. In
the first type, similar to the previous two groups, we measure the
deviation of the product copula of gofRosenblattSnB
and gofRosenblattSnC
, respectively.
The second type of test uses the fact that a specific combination of
independent uniformly distributed random variables follows some known
distribution. Based on this, two further Anderson-Darling type tests
were introduced by Breymann et al. (2003). By defining
gofRosenblattGamma
and
gofRosenblattChisq
, respectively, and are obtained via the function
gofTstat
from the package copula.
Recently, (Hering and Hofert 2015) proposed a procedure of GoF testing based on a transformation similar to the one of (Rosenblatt 1952) specifically designed for Archimedean copulae.
Definition 2. Hering and Hofert’s transformation of an Archimedean
copula
Distribution function
The null hypothesis equals ((6)) from the tests
based on Rosenblatt’s probability integral transform:
gofArchmSnB
, gofArchmSnC
, gofArchmGamma
,
and gofArchmChisq
in the package.
A test
from this group has been introduced by Scaillet (2007). Following his
approach, a nodes.Integration
. A scaling parameter for delta.J
, and the internal size of the bootstrapping samples can be
controlled via MJ
. This test is denoted by gofKernel
in the package.
This test was introduced by Huang and Prokhorov (2014) and had its
foundation in the information matrix equality stated by
White (1982). Given the presence of certain regularity conditions,
the White equality establishes a connection between the negative
sensitivity matrix BiCopGofTest
from the VineCopula package, again, in
order to avoid code redundancy. Note that the implementation of the test
can be unstable for the gofWhite
in the package.
A recent test using a leave-one-block strategy, and its approximation
were introduced by Zhang et al. (2016). Authors derive
compares the full likelihood, “in-sample”, against the resulting likelihoods from the leave-one-block out estimation, “out-of-sample”. If the data in each block significantly influence the estimation of the copula parameter under the null hypothesis, then the chosen copula model is inadequate to represent the data.
Depending on the number of blocks, gofPIOSTn
and gofPIOSRn
,
respectively.
Many power studies including Genest et al. (2009) showed that
no overall single optimal test exists for testing for copula models.
Zhang et al. (2016) introduced a Hybrid test to combine the
testing power of several tests. Having
As the distribution of the test statistics is in most cases unknown, we
perform a parametric bootstrap to receive the
Depending on the different tests, variants of the described steps have
to be performed. For example, the Kernel density estimation test of
Scaillet (2007) described in Section
3.5 relies on a
double bootstrapping procedure, in which for the computation of each
test statistic, M
and MJ
, respectively.
The core of the gofCopula package is the function gof
, which
computes different tests for different copulae for a given dataset,
based on the user’s choice.
R> library("gofCopula")
R> data("IndexReturns2D", package = "gofCopula")
R> system.time(result <- gof(IndexReturns2D, M = 100, seed.active = 1))
The margins will be estimated as: ranks
normal copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
t copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
clayton copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
gumbel copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
frank copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
joe copula
Test gofCvM is running
Test gofKendallCvM is running
Test gofKendallKS is running
Test gofKernel is running
Test gofKS is running
amh copula
The copula amh is excluded from the analysis since the parameters do not fit its
parameter space. See warnings and manual for more details.
galambos copula
Test gofCvM is running
Progress: [===>--------------------------------------] 15% | time left: 3s
...
user system elapsed
629.26 1.08 628.94
Warnings:
...
gof
considers all 13 available copula models if no copulae or tests
are specified. If a copula is unsuitable in the sense that the estimated
parameter is at the boundary of the parameter space, the copula is
automatically excluded, and the user is informed via a console statement
(see above) and additional warnings. In the given example, this is the
case for the AMH, tawn, and FGM copulae because the used
IndexReturns2D
dataset exhibits an estimated Kendall’s
result
is of class "gofCOP"
and has length 10, which is the number
of copulae used in testing (here: 13) minus the ones excluded during
calculation (here: 3). Following Table 1,
five tests are available for all of these copula models in
If the user specifies copulae, tests, or both, the intersection of
possible tests and copulae following Table
1 is considered. For example, if
copula = c("normal", "tawn")
is specified, the function calculates the
five tests which are implemented for both copulae (assuming tests = c("gofKernel", "gofArchmSnB")
is
selected, the five Archimedean copulae implemented for both tests are
computed. In the case when both copulae and tests are defined, the
function provides results for the possible combinations. During the
calculation, the user is informed about the computation progress by
statements about the running test and copula. Furthermore, a progress
bar indicates the percentage of progress for this specific test as well
as a dynamically updated estimated remaining time.
R> result$normal
$method
[1] "Parametric bootstrap goodness-of-fit test with hybrid test and normal copula"
$copula
[1] "normal"
$margins
[1] "ranks"
$param.margins
list()
$theta
[,1]
[1,] 0.8347428
$df
NULL
$res.tests
p.value test statistic
CvM 1.00 0.01520542
KendallCvM 0.41 0.06286712
KendallKS 0.11 0.80800000
Kernel 0.39 0.56012429
KS 1.00 0.31392428
hybrid(1, 2) 0.82 NA
hybrid(1, 3) 0.22 NA
...
The first element of result
provides results for the normal copula.
Note that in the field res.tests
the hybrid tests, starting after the
individual ones, contain numbers in brackets indicating which tests are
considered for this hybrid. Thus, hybrid(1, 2)
means that this is the
hybrid of CvM
and KendallCvM
tests. The copula = c("clayton", "gumbel")
together with flip = c(0, 180)
,
which would test for the Clayton copula and the 180 degrees rotated
Gumbel copula.
Objects of class "gofCOP"
are generated by the function gof
or a
single test function like, e.g., gofPIOSTn
. They consist of different
sub-elements - one for each copula - as, e.g., result$normal
in the
given example. These sub-elements are lists of length seven and contain
the estimation and test results for the specific copula. They present in
the field method
a description of the test scenario. The field
margins
lists the defined marginal distribution that can also be a
vector of distributions where each element is applied to the respective
data column, whereas param.margins
returns the estimates of the
parameters of the marginal distributions if a parametric approach was
specified. Field theta
contains the ML estimate of the copula
parameter. In case of the df
is
the estimated number of degrees of freedom for the copula. The values of
these parameters are identical for all the tests. In res.tests
, the
M
.
"gofCOP"
objects can be called by a generic plot
function allowing
the user to get the pirateplot
of the R-package yarrr. It enables the
user to select which copulae and hybrid testing sizes are desired for
plotting. The remaining customization options are equal to those of the
function pirateplot
from the package yarrr, except for the arguments
formula
, data
, sortx
, xaxt
, xlim
, ylim
, and ylab
.
R> plot(result, copula = c("clayton", "joe", "plackett"), hybrid = c(1, 3, 5))
Specifying hybrid = c(1, 3, 5)
means that the singleTests
in Figure 1), the
hyb3
), and size five
(column hyb5
) should be plotted, separated by selected copulae. For
example, we focus on the column hyb3
for the Plackett copula. It
contains information of all hybrid tests, which include three single
tests for the Plackett copula. In this case, we can see that the mean of
these tests is approximately 0.76, as shown by the thick horizontal
line. All test pirateplot
and its customization options, we refer to Phillips (2017).
The R-package gofCopula includes all the discussed tests in Section 3. For each of the tests, a separate function is implemented with a variety of arguments. We give shortly the most important arguments all the tests share before we go into details about the structure of the package.
copula
: The copula to test for. Possible options depend on the
test and dimension.x
: A matrix containing the data with rows being observations and
columns being variables.M
(default: 1000): Number of bootstrapping loops.param
(default: 0.5): The copula parameter to use if it shall not
be estimated. In case of the Gumbel copula, the default value is set
to 1.5.param.est
(default: TRUE
): Boolean. TRUE
means that param
will be estimated.margins
(default: ranks
): Specifies which estimation method
shall be used. The default ranks
stands for formula
((3)), which is the standard approach to convert data.
Alternatively, the following distributions can be specified: beta
,
cauchy
, Chi-squared (chisq
), f
, gamma
, Log normal (lnorm
),
Normal (norm
), t
, weibull
, Exponential (exp
).flip
(default: 0): The parameter to rotate the copula by 90, 180,
270 degrees clockwise. Only applicable for bivariate copula.seed.active
(default: NULL
): Sets the seeds for the
bootstrapping procedure. It has to be either an integer or a vector
of M+1
integers. If an integer is provided, the seeds for the
bootstrap samples will be simulated based on it. If M+1
seeds are
given, these are used in the bootstrapping procedure. In the default
case (seed.active = NULL
), R generates the seeds from the computer
runtime.processes
(default: 1): The number of parallel processes which are
performed to speed up the bootstrapping. Should not be larger than
the number of logical processors.The package is coded as a fire-and-forget package. Each of the single
tests just requires the input of a dataset x
and a copula
to test
for. All the other function parameters have reasonable default values
such that quick first results can be achieved easily. The calculation
steps of each GoF test function are the following:
processes
.Besides gof
and the single tests, the package gofCopula offers
additional functionality for the user. Next to descriptions,
illustrative examples are provided, assuming the following was called
beforehand:
R> library("gofCopula")
R> data("IndexReturns2D", package = "gofCopula")
R> (res <- gof(copula = "normal", x = IndexReturns2D, M = 10, seed.active = 1,
+ tests = c("gofPIOSRn", "gofCvM", "gofKernel")))
The margins will be estimated as: ranks
normal copula
Test gofPIOSRn is running
Test gofCvM is running
Test gofKernel is running
-------------------------------------------------------------------------------
Parametric bootstrap goodness-of-fit test with hybrid test and normal copula
Parameters:
theta.1 = 0.834742824340301
Tests results:
p.value test statistic
PIOSRn 0.5 -0.11032857
Sn 1.0 0.01520542
Kernel 0.3 0.56012429
hybrid(1, 2) 1.0 NA
hybrid(1, 3) 0.6 NA
hybrid(2, 3) 0.6 NA
hybrid(1, 2, 3) 0.9 NA
Please use the functions gofGetHybrid() and gofOutputHybrid() for
display of subsets of Hybrid tests. To access the results, please
obtain them from the structure of the gofCOP object.
The functions are:
gofCustomTest
: This function has - next to the standard arguments
listed in Section 4.3 - the argument
customTest
, a character string referencing one customized test
loaded in the workspace. The function containing this test should
have two arguments: a matrix named x
for the dataset and a
character string named copula
for the copula to test for. The
whole calculation process, including the estimation of the margins
and the copula, the calculation of the test statistics, and the
bootstrapping of the gofCvM
test.
R> Testfunc = function(x, copula) {
+ C.theo = pCopula(x, copula = copula)
+ C.n = F.n(x, X = x)
+ CnK = sum((C.n - C.theo)^2)
+ return(CnK)
+ }
R> gofCustomTest(copula = "normal", x = IndexReturns2D, M = 10,
+ customTest = "Testfunc", seed.active = 1)
The margins will be estimated as: ranks
-------------------------------------------------------------------------------
Parametric bootstrap goodness-of-fit test with Testfunc test and normal copula
Parameters:
theta.1 = 0.834742824340301
Tests results:
p.value test statistic
Testfunc 1 0.01520542
gofGetHybrid
: Allows calculating hybrid test "gofCOP"
generated in the package. Through the combination of gofCustomTest
and gofGetHybrid
, the users are not limited to the implemented
tests in the package and have the opportunity to include their own
tests in the analysis. Note that the function gofOutputHybrid
has
slightly different but comparable functionality, which is the reason
it is not separately shown.
R> gofGetHybrid(result = res, nsets = 5, p_values = c("MyTest" = 0.7,
+ "AnotherTest" = 0.3))
```
``` r
-------------------------------------------------------------------------------
Hybrid test p-values for given single tests.
Parameters:
theta.1 = 0.834742824340301
Tests results:
p.value
PIOSRn 0.5
Sn 1.0
Kernel 0.3
MyTest 0.7
AnotherTest 0.3
hybrid(1, 2, 3, 4, 5) 1.0
gofTest4Copula
: Returns for a given copula and a given dimension
the list of applicable implemented tests.
R> gofTest4Copula("gumbel", d = 5)
```
``` r
[1] "gofArchmChisq" "gofArchmGamma" "gofArchmSnB"
[4] "gofArchmSnC" "gofCustomTest" "gofCvM"
[7] "gofKendallCvM" "gofKendallKS" "gofKS"
[10] "gofRosenblattChisq" "gofRosenblattGamma" "gofRosenblattSnB"
[13] "gofRosenblattSnC"
gofCopula4Test
: Returns for a given test the list of applicable
implemented copulae.
R> gofCopula4Test("gofPIOSTn")
```
``` r
[1] "normal" "t" "clayton" "gumbel" "frank" "joe"
[7] "amh" "galambos" "fgm" "plackett"
gofCheckTime
: Estimates the time necessary to compute a selected
single or group of GoF tests for a given number of bootstrapping
rounds. This function uses an underlying regression model, so the
results may vary from reality and also from the progress bar
predictions. See Section 4.5.
R> gofCheckTime("normal", x = IndexReturns2D, tests = "gofRosenblattSnC",
+ M = 10000, seed.active = 1)
```
``` r
The margins will be estimated as: ranks
An estimate of the computational time is under derivation.
Depending on the tests chosen, dimensionality and complexity of the data, this
might take a while.
The computation will take approximately 0 d, 0 h, 10 min and 7 sec.
gofco
: In the case a copula is already estimated with the package
copula, one can provide an object of class "copula"
to this
function, and the parameter estimates are taken from the respective
object.
R> copObject = normalCopula(param = 0.8)
R> gofco(copObject, x = IndexReturns2D, M = 10, seed.active = 1,
+ tests = c("gofPIOSRn", "gofKernel"))
```
``` r
The margins will be estimated as: ranks
Test gofPIOSRn is running
Test gofKernel is running
-------------------------------------------------------------------------------
Parametric bootstrap goodness-of-fit test with hybrid test and normal copula
Parameters:
theta.1 = 0.8
Tests results:
p.value test statistic
PIOSRn 0.9 -0.03641543
Kernel 0.2 0.57115224
hybrid(1, 2) 0.4 NA
Please use the functions gofGetHybrid() and gofOutputHybrid() for
display of subsets of Hybrid tests. To access the results, please
obtain them from the structure of the gofCOP object.
One of the main drivers of long computation times is the high number of
bootstrapping loops to achieve an asymptotically reliable result. As
mentioned in Section 4.4, the build-in function
gofCheckTime
allows estimating the necessary computation time for a
given test, copula, dataset, and number of bootstrapping rounds. Since
different machines may have highly varying computation times for tests,
the function relies on a regression using the number of bootstrapping
loops as the independent variable. To ensure that the linear model is a
valid assumption, we investigated the case using the functions
gofKendallKS
and gofKernel
; see Section
5.2 for the results.
Enabling parallelization of the bootstrapping is necessary for
computationally demanding tests as gofPIOSTn
where, e.g., the
computation for a dataset of 500 observations and 1000 bootstrapping
loops for the
Test | normal | t | clayton | gumbel | frank |
---|---|---|---|---|---|
#processes = 1: | |||||
gofKendallCvM | 200.33 | 530.22 | 166.85 | 249.86 | 167.81 |
gofKendallKS | 115.43 | 450.50 | 100.69 | 172.26 | 88.83 |
gofPIOSRn | 84.90 | 433.32 | 67.20 | 148.09 | 50.98 |
gofRosenblattChisq | 75.79 | 431.02 | 73.09 | 143.94 | 55.97 |
gofRosenblattGamma | 75.28 | 420.31 | 75.77 | 142.16 | 52.17 |
#processes = 4: | |||||
gofKendallCvM | 106.19 | 288.86 | 104.29 | 140.25 | 94.69 |
gofKendallKS | 68.73 | 238.55 | 56.13 | 89.02 | 53.00 |
gofPIOSRn | 48.81 | 235.35 | 47.37 | 83.71 | 33.66 |
gofRosenblattChisq | 49.43 | 230.33 | 45.83 | 85.70 | 35.47 |
gofRosenblattGamma | 47.45 | 234.50 | 48.41 | 82.04 | 37.32 |
We would like to illustrate the power of the GoF tests with the use of
the gofCopula package. In practice, one is often confronted with
realizations of random variables for which an adequate copula model has
to be found, as, e.g., in the two examples from the financial domain
provided in Section 6. To illustrate the procedure,
we focus in this Section on an easy replicable example. For this
purpose, we start by simulating
R> library("gofCopula")
R> param = iTau(copula = claytonCopula(), tau = 0.5)
R> n = 1000; set.seed(1)
R> x = rCopula(n = n, copula = claytonCopula(param = param))
To gain a better understanding of the data, Figure 2 shows the simulated data with different margins, reflecting the typical shape of the Clayton copula.
R> par(mfrow = c(1,2))
R> u = cbind(ecdf(x[,1])(x[,1]), ecdf(x[,2])(x[,2])) * n / (n + 1)
R> plot(u, col = "blue3", pch = 19, cex.lab = 1.25,
+ xlab = expression(u[1]), ylab = expression(u[2]))
R> plot(qnorm(u), col = "blue3", pch = 19, cex.lab = 1.25,
+ xlab = expression(x[1]), ylab = expression(x[2]))
R> par(mfrow = c(1,1))
To make an adequate decision on which copula should be used in the
respective modeling task, the GoF testing should involve more than
looking at one or two test results and should consider a reasonable
amount of potential copula models. We structure our procedure by testing
for three groups of copulae separately: Elliptical, Archimedean, and
extreme value (EV) copulae. In the function call, we select the FGM and
Plackett copulae together with the EV category, although they do not
belong to any of the three categories. Elliptical copulae include the
normal and processes = 7
to speed up the calculation using
parallelization on 7 cores. We use the default margins = "ranks"
and
set seed.active = 10
for reproducibility.
R> cop_1 = gof(x = x, M = 100, MJ = 1000, processes = 7, seed.active = 10,
+ copula = c("normal", "t"))
R> cop_2 = gof(x = x, M = 100, MJ = 1000, processes = 7, seed.active = 10,
+ copula = c("clayton", "gumbel", "frank", "joe", "amh"))
R> cop_3 = gof(x = x, M = 100, MJ = 1000, processes = 7, seed.active = 10,
+ copula = c("galambos", "huslerReiss", "tawn", "tev", "fgm", "plackett"))
To evaluate the gained objects of class "gofCOP"
, one can manually
inspect the resulting plot
function.
R> plot(cop_1)
R> plot(cop_2)
R> plot(cop_3)
Interpreting Figures 3, 4, and
5 clearly shows the ability of the tests to detect the
true copula. The column singleTests
in Figure 4
indicates that the Clayton copula is appropriate. The decision is
supported by the higher-order hybrid tests, as all
In the next step, we validate the assumption of using a linear model for
estimating the computation time in gofCheckTime
. We have chosen the
gofKendallKS
test as a representative for the group of single
bootstrapping tests and gofKernel
, as the test having a double
bootstrapping procedure. Both tests are available for all copulae in the
bivariate case. For gofKendallKS
, we measured the computation times
for 12 copulae, varying numbers of bootstrap loops (gofKernel
, we
fixed
For the gofKendallKS
test, the computation time increases linearly
with the number of bootstrapping loops gofKernel
test. Here, a rapid increase in computation time is
expected if both gofKernel
a linear model with
gofKendallKS
for different copulae,
sample sizes gofKernel
for different copulae,
number of bootstrapping loops gofKernel
for different copulae,
number of bootstrapping loops We intend to demonstrate the functionality of the gofCopula package and show the empirical procedure as described in Section 5.1 on a real-world example from the market of cryptocurrencies. To account for the relevant steps in a realistic application study, we split the procedure into Data Investigation and Goodness-of-Fit testing.
We have chosen Bitcoin (BTC) and Litecoin (LTC) for our analysis. The
objective is to detect which copula is appropriate to model the
dependence structure between BTC-LTC and check whether the copula
changes over the years. For that purpose, we use the volatility-adjusted
log-returns of the currencies in the time span from 2015 to 2018. The
volatility correction was performed by fitting a GARCH(1,1) process to
each time series for each year separately in order to extract their
standardized residuals. These are included in the package as
CryptoCurrencies
, whereas each element of the list contains the data
for a particular year. In order to gain a visual impression beforehand,
we plotted the data with margins transformed to standard normal, leading
to Figure 9. A strong dependency between both
cryptocurrencies is visible, especially in the year 2018. Based on these
residual diagrams, it is possible to take a guess which copula is the
most adequate for the given situation. For 2015, one could possibly
argue that the elliptical shape of a normal copula is present, while in
2016 and 2018, the shapes are more similar to the one of a
R> library("gofCopula")
R> data("CryptoCurrencies", package = "gofCopula")
R> par(mfrow = c(2,2))
R> years = as.character(2015:2018)
R> for(i in years){
+ x1 = CryptoCurrencies[[i]][,1]
+ x2 = CryptoCurrencies[[i]][,2]
+ n = length(x1)
+ plot(qnorm(cbind(ecdf(x1)(x1), ecdf(x2)(x2)) * n / (n + 1)), col = "blue3",
+ pch = 19, cex.lab = 1.25, main = i, xlab = "Bitcoin", ylab = "Litecoin")
+ }
R> par(mfrow = c(1,1))
In this example, the focus in testing is on the most popular copula
models in practice: normal, gof
while setting the
bootstrap parameters processes = 7
. For replicability, we
set seed.active = 1:101
and apply the non-parametric margin
transformation by default.
R> copulae = c("normal", "t", "clayton", "gumbel", "frank")
R> BTC_LTC_15 = gof(x = CryptoCurrencies[["2015"]], copula = copulae, M = 100,
+ MJ = 1000, processes = 7, seed.active = 1:101)
R> BTC_LTC_16 = gof(x = CryptoCurrencies[["2016"]], copula = copulae, M = 100,
+ MJ = 1000, processes = 7, seed.active = 1:101)
R> BTC_LTC_17 = gof(x = CryptoCurrencies[["2017"]], copula = copulae, M = 100,
+ MJ = 1000, processes = 7, seed.active = 1:101)
R> BTC_LTC_18 = gof(x = CryptoCurrencies[["2018"]], copula = copulae, M = 100,
+ MJ = 1000, processes = 7, seed.active = 1:101)
After finishing the calculations, we proceed by plotting the received
objects of class "gofCOP"
. For a detailed explanation about the
information contained in the gofCopula pirateplots, please see Section
4.2 and Phillips (2017).
R> plot(BTC_LTC_15)
R> plot(BTC_LTC_16)
R> plot(BTC_LTC_17)
R> plot(BTC_LTC_18)
Figures 10 to 13
show the resulting "gofCOP"
-plots for all
the considered years. Following the usual approach in practice, we
select the copula corresponding to the highest
Summarizing, the following conclusions can be drawn from this analysis:
As a second real-world example, we analyze the volatility-adjusted stock log-returns of Citigroup (C) and the Bank of America (BoA) in the time span from 2004 to 2012. The procedure is, again, splitted into Data Investigation and Goodness-of-Fit testing.
This data was analyzed by Zhang et al. (2016), and we are
expanding their procedure and consider the same copulae and tests as in
the example in Section 6.1. The volatility correction
was performed similarly in terms of fitting a GARCH(1,1) process, and
the resulting data is included in gofCopula in the list Banks
. Note
that in this section, we focus on the years 2004 and 2007, while the
results of the other years are given in Appendix
B. We start by visualizing the residuals with
margins transformed to standard normal.
R> library("gofCopula")
R> data("Banks", package = "gofCopula")
R> par(mfrow = c(1,2))
R> for(i in c("2004", "2007")){
+ x1 = Banks[[i]][,1]
+ x2 = Banks[[i]][,2]
+ n = length(x1)
+ plot(qnorm(cbind(ecdf(x1)(x1), ecdf(x2)(x2)) * n / (n + 1)), col = "blue3",
+ pch = 19, cex.lab = 1.25, main = i, xlab = "Citigroup", ylab = "Bank of America")
+ }
R> par(mfrow = c(1,1))
Analyzing the shape of the data in Figure 14 for
2004, one may argue that the elliptical normal copula is present, while
in 2007, a
We set M = 100
and MJ = 1000
as bootstrap parameters, parallelize
via processes = 7
, and set seed.active = 1:101
for reproducibility.
Further, we implicitly keep the default margins = "ranks"
to perform
the margin transformation nonparametrically.
R> copulae = c("normal", "t", "clayton", "gumbel", "frank")
R> C_BoA_04 = gof(x = Banks[["2004"]], copula = copulae, M = 100, MJ = 1000,
+ processes = 7, seed.active = 1:101)
R> C_BoA_07 = gof(x = Banks[["2007"]], copula = copulae, M = 100, MJ = 1000,
+ processes = 7, seed.active = 1:101)
Following these calculations, we continue to plot the results, leading
to Figures 15 and 16.
For a detailed explanation about the information contained in the
gofCopula pirateplots, please see Section 4.2
and Phillips (2017). Interpreting these "gofCOP"
-plots of the
R> plot(C_BoA_04)
hyb12
of the gofWhite
could not be computed due to instability in the test statistics. For a
detailed description of this phenomenon, see
Nagler et al. (2019).R> plot(C_BoA_07)
This paper introduces a gofCopula package that provides maximum
flexibility in performing statistical Goodness-of-Fit tests for copulae.
The package provides an interface for 16 most popular GoF tests for 13
copulae with automatic estimation of margins via different techniques.
The user is not limited to the implemented tests as self-defined test
statistics functions can be easily embedded via a function provided in
the package. As the computation of
The authors are grateful to Shulin Zhang, Qian Zhou, Peter Song, and Pannier for helpful discussions and to Oliver Scaillet for the code of the version of his test used in Zhang et al. (2016) that is being adapted in this package. Financial support from NUS FRC grant R-146-000-298-114 “Augmented machine learning and network analysis with applications to cryptocurrencies and blockchains” as well as CityU Start-Up Grant 7200680 “Textual Analysis in Digital Markets” is gratefully acknowledged by Simon Trimborn. Ostap Okhrin thankfully received financial support from RFBR, project number 20-04-60158.
Table 3 contains parameter ranges, the
bivariate cumulative distribution function (CDF), and possible values of
Kendall’s
Copula | |||
---|---|---|---|
Normal | |||
Clayton | |||
Gumbel | |||
Frank | |||
Joe | |||
AMH | |||
Galambos | |||
Husler-Reiss | |||
Tawn | |||
see (Demarta and McNeil 2005) | |||
FGM | |||
Plackett |
This section is devoted to the results of Section 6.2.
hyb12
of the gofWhite
could not be computed due to instability in the test statistics. For a
detailed description of this phenomenon, see
Nagler et al. (2019).hyb12
of the gofWhite
could not be computed due to instability in the test statistics. For a
detailed description of this phenomenon, see
Nagler et al. (2019).hyb12
of the gofWhite
could not be computed due to instability in the test statistics. For a
detailed description of this phenomenon, see
Nagler et al. (2019).hyb12
of the gofWhite
could not be computed due to instability in the test statistics. For a
detailed description of this phenomenon, see
Nagler et al. (2019).copula, TwoCop, VineCopula, gofCopula, progress, yarrr
Distributions, ExtremeValue, Finance
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Okhrin, et al., "gofCopula: Goodness-of-Fit Tests for Copulae", The R Journal, 2021
BibTeX citation
@article{RJ-2021-060, author = {Okhrin, Ostap and Trimborn, Simon and Waltz, Martin}, title = {gofCopula: Goodness-of-Fit Tests for Copulae}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {1}, issn = {2073-4859}, pages = {493-524} }