Interval censored outcomes arise when a silent event of interest is known to have occurred within a specific time period determined by the times of the last negative and first positive diagnostic tests. There is a rich literature on parametric and non-parametric approaches for the analysis of interval-censored outcomes. A commonly used strategy is to use a proportional hazards (PH) model with the baseline hazard function parameterized. The proportional hazards assumption can be relaxed in stratified models by allowing the baseline hazard function to vary across strata defined by a subset of explanatory variables. In this paper, we describe and implement a new R package straweib, for fitting a stratified Weibull model appropriate for interval censored outcomes. We illustrate the R package straweib by analyzing data from a longitudinal oral health study on the timing of the emergence of permanent teeth in 4430 children.
In many clinical studies, the time to a silent event is known only up to an interval defined by the times of the last negative and first positive diagnostic test. Event times arising from such studies are referred to as ‘interval-censored’ data. For example, in pediatric HIV clinical studies, the timing of HIV infection is known only up to the interval from the last negative to the first positive HIV diagnostic test (Dunn et al. 2000). Examples of interval-censored outcomes can also be found in many other medical studies (Gomez et al. 2009).
A rich literature exists on the analysis of interval-censored outcomes. Non-parametric approaches include the self-consistency algorithm for the estimation of the survival function (Turnbull 1976). A semi-parametric approach based on the proportional hazards model has been developed for interval-censored data (Finkelstein 1986; Goetghebeur and Ryan 2000). A variety of parametric models can also be used to estimate the distribution of the time to the event of interest, in the presence of interval-censoring (Lindsey and Ryan 1998). An often used parametric approach for the analysis of interval-censored data is based on the assumption of a Weibull distribution for the event times (Lindsey and Ryan 1998). The Weibull distribution is appropriate for modeling event times when the hazard function can be reliably assumed to be monotone. Covariate effects can be modeled through the assumption of proportional hazards (PH), which assumes that the ratio of hazard functions when comparing individuals in different strata defined by explanatory variables is time-invariant. The article by Gomez et al. (2009) presents a comprehensive review of the state-of-the-art techniques available for the analysis of interval-censored data.
In this paper, we implement a parametric approach for modeling covariates applicable to interval-censored outcomes, but where the assumption of proportional hazards may be questionable for a certain subset of explanatory variables. For this setting, we implement a stratified Weibull model by relaxing the PH assumption across levels of a subset of explanatory variables. We compare the proposed model to an alternative stratified Weibull regression model that is currently implemented in the R package survival (Therneau 2012). We illustrate the difference between these two models analytically and through simulation.
The paper is organized as follows: In Section 2, we present and compare two models for relaxing the PH assumption, based on the assumption of a Weibull distribution for the time to event of interest. In this section, we discuss estimation of the unknown parameters of interest, hazard ratios comparing different groups of subjects based on specific values of explanatory covariates and tests of the PH assumption. These methods are implemented in a new R package, straweib (Gu and R. Balasubramanian 2013). In Section 3, we perform simulation studies to compare two stratified Weibull models implemented in R packages straweib and survival. In Section 4, we illustrate the use of the R package straweib by analyzing data from a longitudinal oral health study on the timing of the emergence of permanent teeth in 4430 children in Belgium (Leroy et al. 2003; Gomez et al. 2009). In Section 5, we discuss the models implemented in this paper and present concluding remarks.
Let
We assume that the random variable
Thus, under the Weibull PH model, the survival and hazard functions
corresponding to
where,
In this section, we describe the stratified Weibull PH regression model implemented in the the R package survival (Therneau 2012).
Consider the following log-linear model for the random variable
The assumption of a standard Gumbel distribution with location and scale
parameters equal to 0 and 1, respectively, implies that the random
variable
The coefficients for the explanatory variables (
The log-linear form of the Weibull model can be generalized to allow
arbitrary baseline hazard functions within subgroups defined by a
stratum indicator
However, the hazard ratio comparing two individuals with covariate
vectors and stratum indicators denoted by (Z,
For
In this section, we describe the stratified Weibull regression model that is implemented in the new R package, straweib (Gu and R. Balasubramanian 2013).
To relax the proportional hazards assumption in the Weibull regression
model, we propose the following model for an individual in the stratum
Equivalently, the model can be stated in terms of the survival function as:
Here, we assume that the scale and shape parameters (
This hazard ratio is invariant with respect to time
Let
The unknown parameters to be estimated are optim
function in R. The shape and scale
parameters can be estimated from the estimates of
One can test whether or not the baseline hazard functions of each strata
are proportional to each other, by testing the equality of shape
parameters across strata
The log hazard ratio comparing two individuals with covariate vectors
and stratum indicators denoted by (Z,
In this section, we compare the stratified Weibull regression model implemented in the survival package to that implemented in our package, straweib.
In the absence of stratification, both models are identical and reduce to the Weibull PH model. However, in the presence of a stratification factor, the models implemented by survival and straweib correspond to different models, resulting in different likelihood functions and inference. As we discussed in Section 2, the hazard ratio between two subjects with different covariate values within same stratum depends on their stratum in the model implemented in the R package survival (Equation ((5))), whereas the hazard ratio comparing two individuals within the same stratum is invariant to stratum in the model implemented in the R package straweib (Equation ((7))). In particular, the Weibull model implemented in the straweib shares similarities with the semi-parametric, stratified Cox model for right censored data.
To illustrate the difference between the models implemented in the R
packages survival and straweib, we conducted a simulation
study in which 1000 datasets were simulated under the model assumed in
the straweib package (Equation ((6))). For each
simulated dataset, since both models have the same number of unknown
parameters, we compare the values of the log-likelihood evaluated at the
MLEs. Datasets were simulated based on the assumptions that there are 3
strata, each with a 100 subjects; the shape parameters
(survreg
function in the R package survival
and the icweib
function in the straweib package.
Figure 1 compares the maximized value of the log-likelihoods under both models, when the data are generated using a simulation mechanism that corresponds to the model implemented in the R package straweib. The maximized value of the log-likelihood from the R package survival is lower than that from the R package straweib for 93.1% of simulated datasets. This is expected as in this simulation study the data generating mechanism is identical to the model implemented in the R package straweib. In applications where the proportional hazards assumption is questionable, we recommend fitting both models and comparing the resulting maximized values of the log likelihood. Whether one model is better than another depends on the data.
We illustrate the R package straweib with data from a study on the timing of emergence of permanent teeth in Flemish children in Belgium (Leroy et al. 2003). The data analyzed were from the Signal-Tandmobiel project (Vanobbergen et al. 2000), a longitudinal oral health study in a sample of 4430 children conducted between 1996 and 2001. Dental examinations were conducted annually for a period of 6 years and tooth emergence was recorded based on visual inspection. As in Gomez et al. (2009), we will illustrate our R package by analyzing the timing of emergence of the permanent upper left first premolars. As dental exams were conducted annually, for each child, the timing of tooth emergence is known up to the interval from the last negative to the first positive dental examination.
data(tooth24)
head(tooth24)
id left right sex dmf
1 1 2.7 3.5 1 1
2 2 2.4 3.4 0 1
3 3 4.5 5.5 1 0
4 4 5.9 Inf 1 0
5 5 4.1 5.0 1 1
6 6 3.7 4.5 0 1
The dataset is formatted to include 1 row per child. The variable denoted id corresponds to the ID of the child, left and right correspond to the left and right endpoints of the censoring interval in years, sex denotes the gender of the child (0 = boy, and 1 = girl), and dmf denotes the status of primary predecessor of the tooth (0 = sound, and 1 = decayed or missing due to caries or filled). Right censored observations are denoted by setting the variable right to "Inf".
In our analysis below, we use the function icweib
in the package
straweib, to fit a stratified Weibull regression model, where the
variable dmf is the stratum indicator (
fit <- icweib(L = left, R = right, data = tooth24, strata = dmf, covariates = ~sex)
fit
Total observations used: 4386. Model Convergence: TRUE
Coefficients:
coefficient SE z p.value
sex 0.331 0.0387 8.55 0
Weibull parameters - gamma(shape), lambda(scale):
straname strata gamma lambda
dmf 0 5.99 1.63e-05
dmf 1 4.85 1.76e-04
Test of proportional hazards for strata (H0: all strata's shape parameters are equal):
test TestStat df p.value
Wald 44.2 1 2.96e-11
Likelihood Ratio 44.2 1 3.00e-11
Loglik(model)= -5501.781 Loglik(reduced)= -5523.87
Loglik(null)= -5538.309 Chisq= 73.05611 df= 1 p.value= 0
The likelihood ratio test of the PH assumption results in a p value of
3.00e-11, indicating that the PH model is not appropriate for this
dataset. Or in other words, the data suggest that the hazard functions
corresponding to the strata defined by
The p value from the Wald test of the null hypothesis of no effect of
gender results in a p value of approximately 0 (
To test the global null hypothesis that both covariates sex and
dmf are not associated with the outcome (time to teeth emergence),
we obtain the log-likelihood for global null model, as shown below.
fit0 <- icweib(L = left, R = right, data = tooth24)
fit0
Total observations used: 4386. Model Convergence: TRUE
Weibull parameters - gamma(shape), lambda(scale):
straname strata gamma lambda
strata ALL 5.3 7.78e-05
Loglik(model)= -5596.986
Loglik(null)= -5596.986
The likelihood ratio test testing the global null hypothesis results in
a test statistic
We illustrate the HRatio
function in the straweib package to
estimate the hazard ratio and corresponding 95% confidence intervals for
comparing boys without tooth decay (
HRatio(fit, times = 1:7, NumStra = 0, NumZ = 0, DemStra = 1, DemZ = 0)
time NumStra DemStra beta*(Z1-Z2) HR low95 high95
1 1 0 1 0 0.1143698 0.06596383 0.1982972
2 2 0 1 0 0.2520248 0.18308361 0.3469262
3 3 0 1 0 0.4000946 0.33112219 0.4834339
4 4 0 1 0 0.5553610 0.49863912 0.6185351
5 5 0 1 0 0.7162080 0.66319999 0.7734529
6 6 0 1 0 0.8816470 0.79879884 0.9730878
7 7 0 1 0 1.0510048 0.91593721 1.2059899
The output indicates that the hazard ratio for boys comparing the
stratum
We illustrate estimation of the survival function in Figure
2 by plotting the survival functions and corresponding 95%
point wise confidence intervals for girls (
plot(fit, Z = 1, tRange = c(1, 7), xlab = "Time (years)", ylab = "Survival Function",
main = "Estimated survival function for girls")
We compare our results from the straweib package to that obtained from the survival package.
library(survival)
tooth24.survreg <- tooth24
tooth24.survreg$right <- with(tooth24, ifelse(is.finite(right), right, NA))
fit1 <- survreg(Surv(left, right, type="interval2") ~ sex + strata(dmf) + factor(dmf),
data = tooth24.survreg)
fit1
Call:
survreg(formula = Surv(left, right, type = "interval2") ~ sex +
strata(dmf) + factor(dmf), data = tooth24.survreg)
Coefficients:
(Intercept) sex factor(dmf)1
1.84389938 -0.06254599 -0.06491729
Scale:
dmf=Sound1 dmf=Sound2
0.1659477 0.2072465
Loglik(model)= -5499.3 Loglik(intercept only)= -5576.2
Chisq= 153.8 on 2 degrees of freedom, p= 0
n= 4386
The maximized value of the log-likelihood from the R package
survival is
To clarify the specific assumptions made by the models implemented in
the survival and straweib packages, we carried out subgroup
analyses in which we fit a Weibull PH model separately to each of the
strata
fit20 <- icweib(L= left, R=right, data=tooth24[tooth24$dmf==0, ], covariates = ~sex)
fit20 ### Partial results shown below
Coefficients:
coefficient SE z p.value
sex 0.448 0.0543 8.25 2.22e-16
The results from the Weibull PH model fit to the subgroup
fit21 <- icweib(L= left, R=right, data=tooth24[tooth24$dmf==1, ], covariates = ~sex)
fit21 ### Partial results shown below
Coefficients:
coefficient SE z p.value
sex 0.208 0.0554 3.76 0.000169
The model using the PH scale (implemented by straweib package)
replaces the stratum specific hazard ratios for sex of
Since the Weibull distribution has both the PH and accelerated failure
time (AFT) property (Collett 2003), the identical set of subgroup analyses
can be fit using the survival package. Results from the fit using
the survival package for the subgroup
fit20.survreg <- survreg(Surv(left, right, type="interval2") ~ sex,
data = tooth24.survreg[tooth24.survreg$dmf==0, ])
fit20.survreg ### Partial results shown below
Coefficients:
(Intercept) sex
1.85029150 -0.07453785
Similar results using the survival package for the subgroup
fit21.survreg <- survreg(Surv(left, right, type="interval2") ~ sex,
data = tooth24.survreg[tooth24.survreg$dmf==1, ])
fit21.survreg ### Partial results shown below
Coefficients:
(Intercept) sex
1.76931556 -0.04303767
In particular, the model assuming a common sex coefficient in the AFT
scale (implemented by survival package) replaces the value of sex
coefficient
To assess the goodness of fit of the stratified Weibull model implemented by straweib, we created a multiple probability plot, as described in chapter 19 of Meeker and Escobar (1998). This diagnostic plot was created by splitting the dataset into 4 subgroups based on the values of sex and dmf. Within each group, we estimated the cumulative incidence at each visit time using a non-parametric procedure for interval censored data (Turnbull 1976). The non-parametric estimates of cumulative incidence within each subgroup were compared to that obtained from the stratified Weibull model implemented by straweib package. We use the R package interval (Fay and Shaw 2010) to obtain Turnbull’s NPMLE estimates and the R package straweib for the estimates from the stratified Weibull model (code available upon request). Figure 3 shows the diagnostic plot.
Table 1 presents the estimates of hazard ratio for sex,
within each of the strata defined by
HR.straweib <- exp(fit$coef[1, 1])
HR.survreg <- exp(-fit1$coefficients['sex']/fit1$scale)
HR.subgroup <- exp(c(fit20$coef[1, 1], fit21$coef[1, 1]))
stratum | Package survival | Package straweib | Stratum specific subgroup analyses |
---|---|---|---|
dmf = 0 | 1.46 | 1.39 | 1.56 |
dmf = 1 | 1.35 | 1.39 | 1.23 |
We have developed and illustrated an R package straweib for the analysis of interval-censored outcomes, based on a stratified Weibull regression model. The proposed model shares similarities with the semi-parametric stratified Cox model. We illustrated the R package straweib using data from a prospective study on the timing of emergence of permanent teeth in Flemish children in Belgium (Leroy et al. 2003).
Although the models and R package are illustrated for the analysis of interval-censored time-to-event outcomes, the methods proposed here are equally applicable for the analysis of right-censored outcomes. The syntax for the analysis of right-censored observations is explained in the manual accompanying the straweib package available on CRAN (Gu and R. Balasubramanian 2013).
This research was supported by NICHD grant R21 HD072792.
ClinicalTrials, Econometrics, Survival
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gu, et al., "Stratified Weibull Regression Model for Interval-Censored Data", The R Journal, 2014
BibTeX citation
@article{RJ-2014-003, author = {Gu, Xiangdong and Shapiro, David and Hughes, Michael D. and Balasubramanian, Raji}, title = {Stratified Weibull Regression Model for Interval-Censored Data}, journal = {The R Journal}, year = {2014}, note = {https://rjournal.github.io/}, volume = {6}, issue = {1}, issn = {2073-4859}, pages = {31-40} }