Bivariate time-to-event data frequently arise in research areas such as clinical trials and epidemiological studies, where the occurrence of two events are correlated. In many cases, the exact event times are unknown due to censoring. The copula model is a popular approach for modeling correlated bivariate censored data, in which the two marginal distributions and the between-margin dependence are modeled separately. This article presents the R package CopulaCenR, which is designed for modeling and testing bivariate data under right or (general) interval censoring in a regression setting. It provides a variety of Archimedean copula functions including a flexible two-parameter copula and different types of regression models (parametric and semiparametric) for marginal distributions. In particular, it implements a semiparametric transformation model for the margins with proportional hazards and proportional odds models being its special cases. The numerical optimization is based on a novel two-step algorithm. For the regression parameters, three likelihood-based tests (Wald, generalized score and likelihood ratio tests) are also provided. We use two real data examples to illustrate the key functions in CopulaCenR.
Bivariate data arise frequently in many research areas such as health, epidemiology, and economics. For example, bivariate time-to-event endpoints are often used in clinical trials studying bilateral diseases (e.g., eye diseases) or complex diseases (e.g., cancer and psychiatric disorders). The two events are correlated as they come from the same subject. In many situations, the two event times cannot be precisely observed, leading to bivariate censored data. Specifically, bivariate right-censored data occur when the study ends prior to the occurrence of one or both events. An example of such data comes from a clinical study assessing the treatment effect on preventing blindness in Diabetic Retinopathy patients where each patient had one eye randomized to the treatment and the other eye received no treatment (Huster et al. 1989), and the time-to-blindness are bivariate and right-censored. We will illustrate the analysis of this study in Section 4. In another situation, bivariate interval-censored data occur when the status of both events are periodically examined at intermittent assessment times. In this case, the right censoring could also happen if the event still does not occur at the last assessment time. A special case of interval-censored data is the current status data if there is only one assessment time and the event is only known to occur or not by its assessment time. An example of bivariate interval-censored data will be demonstrated in Section 4, which came from a clinical trial studying the progression of a bilateral eye disease, Age-related Macular Degeneration (AMD), where the progression time to late-AMD are interval or right censored (AREDS Group 1999). More examples can be found in books Hougaard (2000) and Sun (2007).
The development of our package is motivated by researches that are interested in (1) discovering covariates that are significantly associated with the bivariate censored outcomes, and (2) characterizing the joint and conditional risks of two events. For the bivariate events, the joint and conditional risks could be clinically more important than the marginal risk (of a single event). For example, the joint 5-year progression-free probability for both eyes helps identify patients with a high risk of progressing to late-AMD. For another example, for patients having one eye already progressed, the conditional 5-year progression-free probability for the non-progressed eye (given its fellow eye already progressed) provides important information for both clinicians and the patient since patients with both eyes progressed to the late stage of the disease may lose the ability to live independently.
There are three major approaches to fit regression models for bivariate censored data. The simplest way is to fit a marginal model and estimate the variance-covariance by a robust sandwich estimator (for example, (Wei et al. 1989)). This approach takes a working independence assumption, and thus cannot generate joint or conditional distributions. The second approach is based on frailty models (for example, (Oakes 1982)), which are essentially mixed effects models and account for the dependence between two events by a latent frailty variable. However, the covariate effects in frailty models are usually interpreted on a conditional level (by conditioning on the frailty term), which is not straightforward. The third approach is to use copula models (for example, (Clayton 1978)). Unlike the marginal or frailty approaches, the copula approach models the joint survival distribution by directly connecting the two marginal distributions through a copula function. One unique advantage of the copula is that it separately models the marginal distributions and the dependence parameter(s), allowing flexibility in marginal models and straightforward interpretation for covariate effects. Moreover, the challenge from censoring can be naturally handled through the marginal distributions within the copula function. Besides, the joint and conditional distributions can be obtained based on the copula model.
Along with these three major approaches, multiple endeavors have been
devoted to the development of software, mostly R (R Core Team 2019)
packages, to build regression models for bivariate censored data. For
bivariate right-censored data, the
survival (Therneau 2018a)
package can fit parametric or semiparametric Cox (Cox 1972) marginal and
frailty models. Also, packages such as
parfm (Munda et al. 2012) and
frailtypack
(Rondeau et al. 2012) implement proportional hazards (PH) frailty models under
the parametric and semiparametric settings. Other R
packages such as coxme
(Therneau 2018b) and phmm (Donohue and Xu 2019)
also fit PH frailty models for right-censored data. For bivariate
interval-censored data, the
survival and
frailtypack packages
provide marginal and frailty models under the parametric or
semiparametric (Cox PH) situation, respectively. The C++
program IntCens
(codes located under
https://dlin.web.unc.edu/software/intcens/) implements a class of
semiparametric frailty models, including both PH and proportional odds
(PO) models.
To the best of our knowledge, there exists no R package for fitting copula-based regression models for both bivariate right-censored and interval-censored data. The existing copula packages for bivariate data handle either the non-censoring (i.e., complete data) or the right-censoring situation. In the non-censoring situation, the package copula (Hofert et al. 2018) by Yan (2007) and Kojadinovic and Yan (2010) implements multivariate copula models without covariates for complete data and obtains the maximum likelihood estimator for the copula dependence parameter. It gives useful codes for implementing regression models in bivariate complete data in the appendix of Yan (2007). It also provides copula goodness-of-fit tests for model selection purpose. The package VineCopula (Schepsmeier et al. 2018) can also model bivariate or multivariate complete data without covariates through the vine copula models (Aas et al. 2009). Packages such as CopulaRegression (Nicole Kraemer 2014) and gcmr (Masarotto and Varin 2017) can provide copula-based regression models with parametric margins for bivariate or multivariate complete data and provide maximum likelihood estimators for model parameters. The package gamCopula (Nagler and Vatter 2020) implements a generalized additive model that can take into account the effect of the predictors on the dependence structure of bivariate and vine copula models (Vatter and Chavez-Demoulin 2015). For the right-censoring situation, the Copula.surv package (Emura 2018) can estimate the Clayton copula dependence parameter in bivariate right-censored data without covariates and also perform a goodness-of-fit test for a fitted Clayton model (Emura et al. 2010). The Sunclarco package (Prenen et al. 2017b) provides Clayton or Gumbel copula-based regression models with parametric (Weibull and piecewise constant) or Cox semiparametric margins for multivariate right-censored data (Prenen et al. 2017a). The package GJRM (Marra and Radice 2020) can fit both marginal and copula regression models in complete and right-censored data (Marra et al. 2017; Marra and Radice 2017, 2019). By far, there is no copula-based R package for bivariate interval-censored data.
We develop the CopulaCenR package, which fits copula-based regression models for both bivariate right-censored and interval-censored data. The package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=CopulaCenR. The main advantage of CopulaCenR relies on the diverse choice of copula and marginal models for both bivariate right-censored and interval-censored data. Specifically, it provides a class of Archimedean copulas that correspond to a variety of dependence structures, as illustrated in Table 1. In particular, in addition to these frequently used one-parameter Archimedean copulas, a two-parameter copula function (Copula2) is also included. This Copula2 has more flexibility in modeling dependence structure, as we show in Section 2. Furthermore, CopulaCenR implements a list of parametric and semiparametric marginal regression models, as illustrated in Table 2. For parameter estimation, the package utilizes a novel two-step procedure that is computationally stable and efficient. For the inference of regression parameters, three likelihood-based tests such as Wald, generalized score and likelihood ratio tests are provided.
We will describe the major features of
CopulaCenR in Section
2 and presents the model and estimation procedure in
Section 3. We will demonstrate two real data examples in
Section 4 using the version
The most popular copula family for bivariate censored data is the
Archimedean copula family, which has an explicit form of
Table 1 lists six Archimedean family copula models that
are implemented in
CopulaCenR. Two most
frequently used Archimedean copulas are Clayton (Clayton (1978), (1978))
and Gumbel (Gumbel (1960), (1960)) models, which account for the lower or
upper tail dependence between two margins using a single parameter
Family | Parameter Space | Generator |
Generator Inverse |
Kendall’s |
|||
---|---|---|---|---|---|---|---|
Clayton | 0 | ||||||
Gumbel | |||||||
Frank | |||||||
AMH | |||||||
Joe | |||||||
Copula2 |
To fit a copula-based regression model, one also needs to choose a
regression model for the margins. Table 2 lists the
available marginal models in
CopulaCenR. For
bivariate right-censored data, users can fit either a parametric
marginal model via the function rc_par_copula
or a semiparametric Cox
PH model via the function rc_spCox_copula
(Sun et al. 2019).
Specifically, the parametric models incorporate both the PH (e.g.,
Weibull, Gompertz) and the PO (e.g., Loglogistic) models. For bivariate
interval-censored data, one can choose to fit a parametric marginal
model via the function ic_par_copula
. Moreover, the package can also
fit a semiparametric transformation model via the function
ic_spTran_copula
. It contains a variety of marginal models including
the PH and PO models, as we explain in Section
3.3. A novel two-step sieve estimation
procedure is implemented (Sun and Ding 2019).
Type | Models | Survival Distributions |
Corresponding R Functions |
---|---|---|---|
Parametric | Weibull | rc_par_copula , ic_par_copula |
|
Gompertz | |||
Loglogistic | |||
Semiparametric | Cox | rc_spCox_copula |
|
Transformation | ic_spTran_copula |
For the inference of the covariate effects, three types of
likelihood-based tests are implemented in
CopulaCenR: the Wald
test (built within rc_par_copula
, rc_spCox_copula
, ic_par_copula
,
and ic_spTran_copula
), the generalized score test (score_copula
) and
the likelihood-ratio test (lrt_copula
).
After a copula model being fitted, fitted values (i.e., linear
predictors, survival probabilities) can be extracted by the general S3
function fitted
. For new observations, the linear predictors and
survival probabilities can be obtained using the function predict
.
Moreover, the user can plot three types of distributions (joint,
conditional and marginal) using the general functions plot
and
lines
. In particular, an interactive 3D contour will be plotted to
visualize the joint distribution.
Besides, the package provides a bivariate event time generating function
data_sim_copula
, which can generate random bivariate event times based
on a specified copula function, a marginal distribution, and covariate
values.
In summary, the key functions of CopulaCenR are
rc_par_copula
: for fitting parametric regression models to
bivariate right-censored data;rc_spCox_copula
: for fitting a semiparametric Cox regression model
to bivariate right-censored data;ic_par_copula
: for fitting parametric regression models to
bivariate interval-censored data;ic_spTran_copula
: for fitting a semiparametric transformation
model to bivariate interval-censored data;score_copula
: for performing the generalized score test on
covariate effects;lrt_copula
: for performing the likelihood ratio test (LRT) on
covariate effects between two nested models;tau_copula
: for calculating Kendall’s plot, lines
: S3 methods for plotting joint, conditional and
marginal distributions based on a fitted copula model;fitted, predict
: S3 methods for extracting fitted values and
predicting new observations;summary, print, coef, logLik, AIC, BIC
: other S3 functions for a
fitted object;data_sim_copula
: for generating bivariate event times through a
specified copula model and marginal distributions.We use two real data examples to illustrate the implementation of these functions in Section 4.
Let
By the Sklar’s theorem (Sklar (1959), (1959)), so long as the marginal
survival functions
In this section, we present the joint likelihood functions for bivariate right-censored data and bivariate interval-censored data, respectively.
Define the density function for copula
Similarly, using the notation introduced in Section
3.1, the joint likelihood function for bivariate
interval-censored data from
We implement several popular parametric marginal models in
CopulaCenR, as shown
in Table 2. For example, the marginal Weibull survival
distribution can be written as
More generally, we implement the semiparametric Cox PH marginal model
for bivariate right-censored data. The model does not specify the
marginal distribution for the baseline hazards function. Instead, the
baseline hazards are treated as piecewise constants between all
uncensored event times as suggested by Breslow (1972). The model is expressed
as
We also consider a class of semiparametric linear transformation models
for the marginal distribution of the interval-censored data. The model
is expressed as:
In model ((4)),
Note that the model ((4)) contains a class of survival
models. For example, when
In this section, we illustrate the estimation procedure for the unknown
parameter
Obtain initial estimates of
Simultaneously maximize the joint log-likelihood to get final
estimates:
[Remark 1.] In the case of semiparametric Cox PH margins (with the
Breslow baseline cumulative hazard estimator), although the maximum
likelihood estimators from step 2 are consistent and asymptotically
normal, the Hessian matrix cannot be directly used for estimating the
variance-covariance matrix of (
[Remark 2.] In the case of semiparametric transformation model margins (with the use of Bernstein polynomials), the two-step estimation procedure becomes a two-step “sieve” estimation procedure. Sun and Ding (2019) rigorously proved the asymptotic properties of the sieve maximum likelihood estimators.
The main model-fitting functions (rc_par_copula
, rc_spCox_copula
,
ic_par_copula
and
ic_spTran_copula
) provide a built-in optimization option, which is a
wrapper to the optimization routines optim
and nlm
in
R.
We now separate
The package
CopulaCenR provides a
user-friendly function data_sim_copula
for generating random bivariate
event times based on a specified copula model, marginal distributions
and covariate values. The arguments n
, copula
, and eta
assign the
sample size, the copula type, and the dependence parameter value. For
marginal distributions, the argument dist
can be one of the three
parametric distributions in Table 2 (i.e., Weibull,
Loglogistic and Gompertz), and their distribution parameters are given
through baseline
. For Weibull and Loglogistic, the baseline parameters
are var_list
and COV_beta
, respectively. Lastly, x1
and x2
input a data frame of covariate values for the two margins,
respectively. Figure 2 illustrates a scatter plot of
library(CopulaCenR)
set.seed(1)
dat <- data_sim_copula(n = 500, copula = "Clayton", eta = 3, dist = "Weibull",
baseline = c(0.1,2), var_list = c("var1", "var2"), COV_beta = c(0.1, 0.1),
x1 = cbind(rnorm(500, 6, 2), rbinom(500, 1, 0.5)),
x2 = cbind(rnorm(500, 6, 2), rbinom(500, 1, 0.5)))
head(dat)
id ind var1 var2 time
1 1 6.130533 1 8.062168
1 2 6.154606 1 7.472649
2 1 8.070653 1 6.317247
2 2 5.406263 1 5.904064
3 1 10.520432 1 4.195788
3 2 3.633516 1 4.771523
plot(x = dat$time[dat$ind == 1], y = dat$time[dat$ind == 2],
xlab = expression(t[1]), ylab = expression(t[2]), cex.axis = 1, cex.lab = 1.3)
The bivariate right-censored input dataset shall be a data frame including the covariates and four additional key input columns:
We use the DRS (Diabetic Retinopathy Study) data as an example. The DRS data contain bivariate right-censored time to blindness from 197 diabetic retinopathy patients. These patients were from a 50% random sample of the patients with "high-risk" diabetic retinopathy as defined by the DRS (Huster et al. 1989). Each patient had one eye randomized to one of the two laser treatments and the other eye received no treatment. For each eye, the event of interest was the time from initiation of treatment to the time to blindness in months. Censoring was caused by death, dropout, or end of the study. The data can be loaded by
data("DRS", package = "CopulaCenR")
head(DRS)
id ind obs_time status treat age type
5 1 46.23 0 0 28 2
5 2 46.23 0 2 28 2
14 1 42.50 0 2 12 1
14 2 31.30 1 0 12 1
16 1 42.27 0 1 9 1
16 2 42.27 0 0 9 1
There are three covariates: treat
is treatment with age
is the age at diagnosis of diabetes; type
is the type
of diabetes with
We now demonstrate how to fit a Clayton copula model with Weibull
margins to the DRS data using the function rc_par_copula
. We are
interested in the treatment effect, as indicated in argument var_list
.
The arguments copula
and m.dis
specify the fitted copula model and
marginal baseline distributions. The default optimization method is
BFGS
(Nash 1990). Other optimization methods and control
parameters can also be applied (see ?optim
).
library(CopulaCenR)
clayton_wb <- rc_par_copula(data = DRS, var_list = "treat", copula = "Clayton",
m.dist = "Weibull", method = "BFGS")
summary(clayton_wb)
Copula: Clayton
Margin: Weibull
estimate SE stat pvalue
lambda 90.6440318 13.1887218 47.2360 6.293e-12 ***
k 0.8062766 0.0586207 189.1758 < 2.2e-16 ***
treat1 -0.5714498 0.1997080 8.1878 0.004217 **
treat2 0.0052997 0.1739106 0.0009 0.975689
eta 0.6205855 0.2610638 5.6508 0.017447 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(The Wald tests are testing whether each coefficient is 0)
Final llk: -839.7212
Convergence is completed successfully
The estimation and Wald test results suggest the xenon treatment
significantly reduced the risk of blindness compared to controls
(treat1
). We also compared our estimates with
previous findings in Huster et al. (1989). Due to the differences in the model
parameterization, we first transformed our estimates into comparable
forms. Specifically, the xenon treatment effect in Huster et al. (1989) can be
expressed as AIC
and BIC
.
AIC(clayton_wb)
1689.442
BIC(clayton_wb)
1705.858
After the model is fitted, Kendall’s tau_copula
.
tau_copula(eta = as.numeric(coef(clayton_wb)["eta"]), copula = "Clayton")
0.2368118
The fitted values (i.e., linear predictors and survival probabilities)
can be extracted through the function fitted
. As the model is a PH
model, the linear predictors (type
is “lp”) are the estimated log
proportional hazards.
fit1 <- fitted(clayton_wb, type = "lp")
fit1[1:3, ]
id lp1 lp2
5 0.000000000 0.005299655
14 0.005299655 0.000000000
16 -0.571449835 0.000000000
When type
is “survival”, the fitted outputs are marginal (S1, S2
)
and joint (S12
) survival probabilities at the observed times
(t1, t2
).
fit2 <- fitted(clayton_wb, type = "survival")
fit2[1:3, ]
id t1 t2 S1 S2 S12
5 46.23 46.23 0.5592967 0.5575724 0.3643588
14 42.50 31.30 0.5793467 0.6542323 0.4234880
16 42.27 42.27 0.7369175 0.5823995 0.4655204
Similarly, the predict
function provides predictions for new
observations with covariates. Its outputs can be either linear
predictors or survival probabilities (at specified times). The following
newdata1
example contains two subjects under different treatments.
newdata1 <- data.frame(id = rep(1:2, each=2), ind = rep(c(1,2),2),
time = rep(40,4), treat = factor(c(0,1,0,2)))
newdata1
id ind time treat
1 1 40 0
1 2 40 1
2 1 40 0
2 2 40 2
predict(clayton_wb, newdata = newdata1, type = "lp")
id lp1 lp2
1 0 -0.571449835
2 0 0.005299655
predict(clayton_wb, newdata = newdata1, type = "survival")
id t1 t2 S1 S2 S12
1 40 40 0.5962669 0.7467754 0.4799705
2 40 40 0.5962669 0.5946309 0.4024998
The bivariate interval-censored input dataset shall be a data frame including the covariates and five key input columns:
We use the AREDS (Age-Related Eye Disease Study) data as an example. The event of interest is the AMD (Age-related Macular Degeneration) disease, which is a leading cause of blindness in the developed world (Swaroop et al. 2009). It is known as a polygenic, progressive and neurodegenerative disorder. The AREDS study is a multi-center randomized clinical trial studying the development and progression of AMD, sponsored by the National Eye Institute (AREDS Group (1999), (1999)). Due to intermittent assessment times (every 6 months up to the first 6 years and every 1 year since after), the exact time when each eye progressed to late-AMD was only known to lie in a certain interval. As a result, the outcome data are bivariate interval-censored. The package includes a subset data of 629 Caucasian participants from AREDS who had at least one eye in moderate AMD stage at baseline. The data can be loaded by
data("AREDS", package = "CopulaCenR")
head(AREDS)
id ind Left Right status SevScaleBL ENROLLAGE rs2284665
1 1 0.0 2.0 1 6 67.0 1
1 2 0.0 2.0 1 8 67.0 1
2 1 0.0 2.0 1 7 68.0 0
2 2 5.9 9.3 1 4 68.0 0
3 1 8.0 9.1 1 7 64.9 0
3 2 10.0 Inf 0 7 64.9 0
Out of these 629 subjects, 273 subjects developed late-AMD in both eyes during the study and the times to late-AMD were interval-censored; 138 subjects developed late-AMD in one eye (interval-censored) and did not develop late-AMD before the end of the study (right-censored); the rest 218 subjects were right-censored for late-AMD in both eyes.
There are three continuous covariates: SevScaleBL
for baseline AMD
severity score (a value between 1 and 8 with a higher value indicating
more severe AMD), ENROLLAGE
for baseline age and rs2284665
for a
genetic variant (SevScaleBL
and
ENROLLAGE
are well-known risk factors of AMD. Thus, our primary
interest is to find out whether the genetic variant rs2284665
is
significantly associated with AMD progression.
We fit a two-parameter copula semiparametric transformation model for
the AREDS data through the function ic_spTran_copula
. The arguments
l
and u
are the range of event times, which need to be pre-specified
by the user. In practice, l
and u
can be set as the minimum and
maximum of observed times. The argument m
corresponds to the degree of
Bernstein polynomials (as shown in formula (5)), with the
default value r
specifies the form of marginal
transformation model (as shown in formula (6)). In
practice, the values of m
and r
can be chosen based on the smallest
AIC for a list of fitted models with different values.
We now demonstrate how to fit a two-parameter copula semiparametric
model to the AREDS data. We chose the range of event times as
library(CopulaCenR)
copula2_sp <- ic_spTran_copula(data = AREDS, copula = "Copula2",
var_list = c("ENROLLAGE","rs2284665","SevScaleBL"),
l = 0, u = 15, m = 3, r = 3)
summary(copula2_sp)
Copula: Copula2
Margin: semiparametric
estimate SE stat pvalue
ENROLLAGE 0.042610 0.012271 12.057 0.0005159 ***
rs2284665 0.397712 0.091180 19.026 1.290e-05 ***
SevScaleBL 0.722681 0.053258 184.132 < 2.2e-16 ***
alpha 0.930508 0.058714 251.167 < 2.2e-16 ***
kappa 0.974037 0.226081 18.562 1.645e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(The Wald tests are testing whether each coefficient is 0)
Final llk: -2104.178
Convergence is completed successfully
From the output, the estimated odds ratio for the genetic variant
AIC(copula2_sp)
4226.356
BIC(copula2_sp)
4266.353
Also, the estimated Kendall’s
tau_copula(eta = as.numeric(coef(copula2_sp)[c("alpha","kappa")]),
copula = "Copula2")
0.3851248
Furthermore, we can test the effect of score_copula
.
copula2_sp_null <- ic_spTran_copula(data = AREDS, copula = "Copula2",
var_list = c("ENROLLAGE","SevScaleBL"),
l = 0, u = 15, m = 3, r = 3)
score_copula(object = copula2_sp_null, var_score = "rs2284665")
stat pvalue
1.943163e+01 1.042661e-05
The LRT can also be performed by applying two nested models to the
function lrt_copula
.
lrt_copula(model1 = copula2_sp, model2 = copula2_sp_null)
stat pvalue
9.543119588 0.002007003
The following codes plot the 3D joint survival probabilities for the
three subjects in newdata2
, which have the same SevScaleBL = 3
in
both eyes and ENROLLAGE = 60
, but vary in the genotype of rs2284665
.
In the plot
function, the argument class
specifies the plot type,
which can be one of “joint”, “conditional” and “marginal”. When
class = "joint"
, it generates a 3D interactive contour that can be
manually rotated for the desired visualization. Figure
3 is a snapshot of 3D contours for the three
subjects in newdata2
.
newdata2 <- data.frame(id = rep(1:3, each=2), ind = rep(c(1,2),3),
SevScaleBL = rep(3,6), ENROLLAGE = rep(60,6),
rs2284665 = c(0,0,1,1,2,2))
newdata2
id ind SevScaleBL ENROLLAGE rs2284665
1 1 3 60 0
1 2 3 60 0
2 1 3 60 1
2 2 3 60 1
3 1 3 60 2
3 2 3 60 2
plot(x = copula2_sp, class = "joint", newdata = newdata2)
Similarly, the conditional survival probabilities (Figure
4) can be obtained for the left eyes from the same
three subjects, given their right eyes (i.e., cond_margin
cond_time
plot(x = copula2_sp, class = "conditional", newdata = newdata2,
cond_margin = 2, cond_time = 5, ylim = c(0.25,1),
ylab = "Conditional Survival Probability")
Likewise, we can also obtain the eye-level marginal survival
probabilities (i.e., plot_margin
plot(x = copula2_sp, class = "marginal", newdata = newdata2,
plot_margin = 1, ylim = c(0.6,1), ylab = "Marginal Survival Probability")
This paper presents the R package CopulaCenR for implementing copula-based regression models in bivariate censored data, including both bivariate right-censored data and bivariate interval-censored data. A variety of Archimedean copulas, including a flexible two-parameter copula, are built in the package to accommodate different dependence structures. Moreover, the package can fit various parametric and semiparametric regression models for the two margins within the copula function. In particular, a general semiparametric transformation model with PH and PO models being its special cases is implemented for the margins in this package. For parameter estimation, a novel two-step procedure is adopted to guarantee stable and fast computation. For the inference of covariate effects, all three likelihood-based tests are provided. Lastly, two real data examples are given to demonstrate the key features and capabilities of this package.
One future extension of this package is to allow multivariate copula functions for handling multivariate censored events. Another important research extension is to add goodness-of-fit tests, which is critical for choosing a proper copula model. However, there are limited works in testing copula models in bivariate censored data, especially in bivariate interval-censored data under the regression setting. The current literature (e.g., (Shih 1998; Andersen et al. 2010; Emura et al. 2010; Wang 2010)) only focus on testing copulas in bivariate right-censored data without covariates. We are currently investigating these directions and plan to incorporate them in a future version of CopulaCenR.
The Authors are grateful to Dr. Wei Chen for providing valuable suggestions about package development and the AREDS data analysis.
CopulaCenR, survival, parfm, frailtypack, coxme, phmm, copula, VineCopula, CopulaRegression, gcmr, gamCopula, Copula.surv, Sunclarco, GJRM
Agriculture, ClinicalTrials, Distributions, Econometrics, ExtremeValue, Finance, MixedModels, Robust, Survival
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sun & Ding, "CopulaCenR: Copula based Regression Models for Bivariate Censored Data in R", The R Journal, 2020
BibTeX citation
@article{RJ-2020-025, author = {Sun, Tao and Ding, Ying}, title = {CopulaCenR: Copula based Regression Models for Bivariate Censored Data in R}, journal = {The R Journal}, year = {2020}, note = {https://rjournal.github.io/}, volume = {12}, issue = {1}, issn = {2073-4859}, pages = {266-282} }