Interval censored outcomes arise when a silent event of interest is known to have occurred within a specific time period determined by the times of the last negative and first positive diagnostic tests. There is a rich literature on parametric and non-parametric approaches for the analysis of interval-censored outcomes. A commonly used strategy is to use a proportional hazards (PH) model with the baseline hazard function parameterized. The proportional hazards assumption can be relaxed in stratified models by allowing the baseline hazard function to vary across strata defined by a subset of explanatory variables. In this paper, we describe and implement a new R package ** straweib**, for fitting a stratified Weibull model appropriate for interval censored outcomes. We illustrate the R package

In many clinical studies, the time to a silent event is known only up to an interval defined by the times of the last negative and first positive diagnostic test. Event times arising from such studies are referred to as ‘interval-censored’ data. For example, in pediatric HIV clinical studies, the timing of HIV infection is known only up to the interval from the last negative to the first positive HIV diagnostic test (Dunn et al. 2000). Examples of interval-censored outcomes can also be found in many other medical studies (Gomez et al. 2009).

A rich literature exists on the analysis of interval-censored outcomes. Non-parametric approaches include the self-consistency algorithm for the estimation of the survival function (Turnbull 1976). A semi-parametric approach based on the proportional hazards model has been developed for interval-censored data (Finkelstein 1986; Goetghebeur and Ryan 2000). A variety of parametric models can also be used to estimate the distribution of the time to the event of interest, in the presence of interval-censoring (Lindsey and Ryan 1998). An often used parametric approach for the analysis of interval-censored data is based on the assumption of a Weibull distribution for the event times (Lindsey and Ryan 1998). The Weibull distribution is appropriate for modeling event times when the hazard function can be reliably assumed to be monotone. Covariate effects can be modeled through the assumption of proportional hazards (PH), which assumes that the ratio of hazard functions when comparing individuals in different strata defined by explanatory variables is time-invariant. The article by Gomez et al. (2009) presents a comprehensive review of the state-of-the-art techniques available for the analysis of interval-censored data.

In this paper, we implement a parametric approach for modeling
covariates applicable to interval-censored outcomes, but where the
assumption of proportional hazards may be questionable for a certain
subset of explanatory variables. For this setting, we implement a
stratified Weibull model by relaxing the PH assumption across levels of
a subset of explanatory variables. We compare the proposed model to an
alternative stratified Weibull regression model that is currently
implemented in the R package
** survival** (Therneau 2012).
We illustrate the difference between these two models analytically and
through simulation.

The paper is organized as follows: In Section 2, we present and compare
two models for relaxing the PH assumption, based on the assumption of a
Weibull distribution for the time to event of interest. In this section,
we discuss estimation of the unknown parameters of interest, hazard
ratios comparing different groups of subjects based on specific values
of explanatory covariates and tests of the PH assumption. These methods
are implemented in a new R package,
** straweib**
(Gu and R. Balasubramanian 2013). In Section 3, we perform simulation studies to compare two
stratified Weibull models implemented in R packages

Let \(T\) denote the continuous, non-negative random variable
corresponding to the time to event of interest, with corresponding
probability distribution function (pdf) and cumulative distribution
function (cdf), denoted by \(f(t)\) and \(F(t)\), respectively. We let
\(S(t) = 1- F(t)\) to denote the corresponding survival function and
\(h(t) = lim_{\delta t \to 0} \frac{P( t \le T < t + \delta t \mid T \ge t)}{\delta t}\)
to denote the hazard function. We let **Z** denote the \(p \times 1\)
vector of explanatory variables or covariates.

We assume that the random variable
\(T \mid \boldsymbol{Z} = \boldsymbol{0}\) is distributed according to a
Weibull distribution, with scale and shape parameters denoted by
\(\lambda\) and \(\gamma\), respectively. The well known PH model to
accommodate the effect of covariates on \(T\) is expressed as:
\[h(t \mid \boldsymbol{Z}) = h(t \mid \boldsymbol{Z=0}) \times \exp(\boldsymbol{\beta'} \boldsymbol{Z})\,,\]
where \(\boldsymbol{\beta}\) denotes the \(p \times 1\) vector of regression
coefficients corresponding to the vector of explanatory variables,
**Z**.

Thus, under the Weibull PH model, the survival and hazard functions corresponding to \(T\) can be expressed as

\[\begin{aligned} \label{eq:weibphS} S(t \mid \boldsymbol{Z}) &=\exp\left(-\lambda \exp\left(\boldsymbol \beta ' \boldsymbol Z\right) t^\gamma\right) \end{aligned} \tag{1}\]

\[\begin{aligned} \label{eq:weibphH} h(t \mid \boldsymbol{Z}) &= \lambda \exp(\boldsymbol \beta ' \boldsymbol Z) \gamma t^{\gamma-1} \end{aligned} \tag{2}\]

where, \(\lambda > 0\) and \(\gamma > 0\) correspond to the scale and shape
parameters corresponding to \(T\) when \(\boldsymbol{Z} = \boldsymbol{0}\).
The hazard ratio comparing two individuals with covariate vectors **Z**
and \(\boldsymbol{Z}^{*}\) is equal to
\(\exp(\boldsymbol \beta '( \boldsymbol{Z} - \boldsymbol{Z}^{*}))\).

In this section, we describe the stratified Weibull PH regression model
implemented in the the R package ** survival** (Therneau 2012).

Consider the following log-linear model for the random variable \(T\):
\[log(T \mid \boldsymbol{Z} ) = \mu + \alpha_1 Z_{1} + \cdots \alpha_p Z_{p} + \sigma \epsilon\]
where, \(\alpha_1, \cdots, \alpha_p\) denote unknown regression
coefficients corresponding to the \(p\) dimensional vector of explanatory
variables, \(\mu\) denotes the intercept, and \(\sigma\) denotes the scale
parameter. The random variable \(\epsilon\) captures the random deviation
of event times on the natural logarithm scale (i. e. log(\(T\))) from the
linear model as a function of the covariate vector **Z**. In general,
the log-linear form of the model for \(T\) can be shown to be equivalent
to the accelerated failure time (AFT) model (Collett 2003).

The assumption of a standard Gumbel distribution with location and scale parameters equal to 0 and 1, respectively, implies that the random variable \(T\) follows a Weibull distribution. Moreover, in this case, both the PH and AFT assumptions (or equivalently, the log-linear model) lead to identical models with different parameterizations (Collett 2003). The survival and hazard functions can be expressed as:

\[\begin{aligned} \label{eq:loglnS} S(t \mid \boldsymbol {Z}) &= \exp \left[ -\exp\left(\frac{\log(t) - \mu - \boldsymbol{\alpha}^{'} \boldsymbol Z}{\sigma}\right) \right ] \\ \end{aligned} \tag{3}\]

\[\begin{aligned} \label{eq:loglnH} h(t \mid \boldsymbol{Z}) &=\exp \left[ -\frac{\mu + \boldsymbol{\alpha}^{'} \boldsymbol Z}{\sigma} \right ] \frac{1}{\sigma}t^{\frac{1}{\sigma}-1} \end{aligned} \tag{4}\]

The coefficients for the explanatory variables (\(\boldsymbol{\beta}\)) in the hazard function (\(h(t \mid \boldsymbol{Z})\)) are equal to \(-\frac{\boldsymbol \alpha}{\sigma}\). Moreover, there is a one-one correspondence between the parameters \(\lambda, \gamma, \boldsymbol{\beta}\) in equations ((1))-((2)) and the parameters \(\mu, \sigma, \boldsymbol{\alpha}\) in equations ((3))-((4)), where \(\lambda = \exp(-\frac{\mu}{\sigma}), \gamma = \sigma^{-1}\) and \(\beta_j = - \frac{\alpha_j}{\sigma}\) (Collett 2003).

The log-linear form of the Weibull model can be generalized to allow
arbitrary baseline hazard functions within subgroups defined by a
stratum indicator \(S = 1, \cdots, s\). Thus, the stratified Weibull
regression model for an individual in the \(j^{th}\) stratum is expressed
as:
\[\begin{array}{lll}
log\left(T \mid \boldsymbol{Z}, S=j\right) &=& \mu_j + \alpha_1 Z_{1} + \cdots \alpha_p Z_{p} + \sigma_j \epsilon
\end{array}\]
where \(\mu_j\) and \(\sigma_j\) denote stratum specific intercept and scale
parameters. This model is implemented in the R package ** survival**
(Therneau 2012). In this model, the regression coefficients
\(\boldsymbol{\alpha}\) on the AFT scale are assumed to be stratum
independent.

However, the hazard ratio comparing two individuals with covariate
vectors and stratum indicators denoted by (**Z**, \(S=j\)) and
(\(\boldsymbol{Z^{*}}, S = k\)) is stratum specific and is given by:

\[\begin{aligned} \frac{h\left(t \mid S=j, \boldsymbol{Z}\right)}{h\left(t \mid S=k, \boldsymbol{Z}^{*}\right)} &=t^{1/\sigma_j - 1/\sigma_k} \frac{\sigma_k}{\sigma_j} \exp\left({\frac{\mu_k}{\sigma_k} - \frac{\mu_j}{\sigma_j}}\right) \exp\left(\boldsymbol{\alpha}^{'}\left(\boldsymbol{Z}^{*}/\sigma_k - \boldsymbol{Z}/\sigma_j\right)\right) \end{aligned}\]

For \(j \ne k\), the hazard ratio varies with time \(t\). However, when \(j=k\), the hazard ratio comparing two individuals within the same stratum \(S=j\) is invariant with respect to time \(t\) but is stratum-dependent and reduces to:

\[\begin{aligned} \frac{h\left(t \mid S=j, \boldsymbol{Z}\right)}{h\left(t \mid S=j, \boldsymbol{Z}^{*}\right)} &= \exp\left(\frac{\boldsymbol{\alpha}^{'}}{\sigma_j}\left( \boldsymbol{Z}^{*} - \boldsymbol{Z}\right)\right) \label{eq:compLL} \end{aligned} \tag{5}\]

In this section, we describe the stratified Weibull regression model
that is implemented in the new R package, ** straweib** (Gu and R. Balasubramanian 2013).

To relax the proportional hazards assumption in the Weibull regression model, we propose the following model for an individual in the stratum \(S=j\):

\[\begin{aligned} h(t \mid \boldsymbol{Z}, S=j) &= \lambda_j \exp\left(\boldsymbol \beta ' \boldsymbol{Z}\right) \gamma_j t^{\gamma_j-1} \label{eq:stratweib} \end{aligned} \tag{6}\]

Equivalently, the model can be stated in terms of the survival function as:

\[\begin{aligned} S(t \mid \boldsymbol{Z}, S=j) &= \exp\left(-\lambda_j \exp\left(\boldsymbol \beta ' \boldsymbol Z\right) t^{\gamma_j}\right) \\ \end{aligned}\]

Here, we assume that the scale and shape parameters (\(\lambda, \gamma\))
are stratum specific - however, the regression coefficients
\(\boldsymbol{\beta}\) are assumed to be constant across strata (\(S\)). The
hazard ratio comparing two individuals with covariate vectors and
stratum indicators denoted by (**Z**, \(S=j\)) and
(\(\boldsymbol{Z^{*}}, S = k\)) is given by:
\[\begin{aligned}
\frac{h(t \mid S=j, \boldsymbol{Z})}{h(t \mid S=k, \boldsymbol{Z}^{*})} &= t^{\gamma_j- \gamma_k} \exp\left(\boldsymbol \beta^{'}\left( \boldsymbol{Z} - \boldsymbol{Z}^{*}\right)\right) \frac{ \lambda_j \gamma_j }{\lambda_k \gamma_k}
\end{aligned}\]
For \(j \ne k\), the hazard ratio varies with time \(t\) and thus relaxes
the PH assumption. However, for \(j=k\), the hazard ratio comparing two
individuals within the same stratum \(S=j\) reduces to:

\[\begin{aligned} \frac{h(t \mid S=j, \boldsymbol{Z})}{h(t \mid S=j, \boldsymbol{Z}^{*})} &= \exp\left(\boldsymbol \beta^{'}\left( \boldsymbol{Z} - \boldsymbol{Z}^{*}\right)\right) \label{eq:compPH} \end{aligned} \tag{7}\]

This hazard ratio is invariant with respect to time \(t\) and stratum \(S\), as in the stratified Cox model (Collett 2003).

Let \(u_j=\log(\lambda_j)\) and \(v_j=\log(\gamma_j)\). Let \(n_j\) denote the
number of subjects in stratum \(S=j\). For the \(k^{th}\) subject in stratum
\(j\), let \(\boldsymbol Z_{jk}\) denote the \(p\) dimensional vector of
covariates and let \(a_{jk}\) and \(b_{jk}\) denote the left and right
endpoints of the censoring interval. That is, \(a_{jk}\) denotes the time
of the last negative test and \(b_{jk}\) denotes the time of the first
positive test for the event of interest. Then the log-likelihood
function can be expressed as:

\[\begin{array}{rl} l(\boldsymbol v, \boldsymbol u, \boldsymbol \beta)=&\sum_{j=1}^{s} \sum_{k=1}^{n_j} \log \{\exp[-\exp[u_j+\boldsymbol \beta ' \boldsymbol Z_{jk}+\exp(v_j)\log(a_{jk})]] \\ & - \exp[-\exp[u_j+\boldsymbol \beta ' \boldsymbol Z_{jk}+\exp(v_j)\log(b_{jk})]] \} \\ \end{array}\]

The unknown parameters to be estimated are \(\boldsymbol v\),
\(\boldsymbol u\), and \(\boldsymbol \beta\). The log-likelihood function
can be optimized using the `optim`

function in R. The shape and scale
parameters can be estimated from the estimates of \(\boldsymbol v\) and
\(\boldsymbol u\). The covariance matrix of the estimates of these unknown
parameters can be obtained by inverting the negative Hessian matrix that
is output from the optimization routine (Cox and D. V. Hinkley 1979).

One can test whether or not the baseline hazard functions of each strata
are proportional to each other, by testing the equality of shape
parameters across strata \(S= 1, \cdots, s\). That is,
\[H_0 : \gamma_1 = \gamma_2 = \cdots = \gamma_s\]
or equivalently,
\[H_0 : v_1 = v_2 = \cdots = v_s.\]
The null hypothesis \(H_0\) can be tested using a likelihood ratio test,
by comparing a reduced model that assumes that
\(\gamma_1 = \gamma_2 == \cdots = \gamma_s\) to the full model in
((6)) assuming stratum specific shape parameters. We
note that the reduced model is equivalent to the Weibull PH model that
includes the stratum indicator \(S\) as an explanatory variable. Thus the
reduced model has \(s-1\) fewer parameters than the stratified model, or
the full model. Let \(l_F\) and \(l_R\) denote the log-likelihoods of the
full and reduced models evaluated at their MLE. Then the test statistic
\(T=-2(l_R - l_F)\) follows a \(\chi_{s-1}^2\) distribution under \(H_0\). In
addition to the likelihood ratio test, one can also use a Wald test to
test the null hypothesis \(H_0\). The R package ** straweib** illustrated
in Section 3 outputs both the Wald and Likelihood Ratio test statistics.

The log hazard ratio comparing two individuals with covariate vectors
and stratum indicators denoted by (**Z**, \(S=j\)) and
(\(\boldsymbol{Z^{*}}, S = j^*\)) at time \(t\) can be expressed as:
\[r_{tjj^*}=\log\left(R_{tjj^*}\right) = u_j + v_j + \log\left(t\right)\exp\left(v_j\right) - u_{j^*} - v_{j^*} - \log\left( t \right)\exp\left(v_{j^*}\right) + \boldsymbol{\beta}^{'}\left(\boldsymbol{Z} - \boldsymbol{Z}^{*}\right)\]
Let \(\boldsymbol{\hat{v}}\), \(\boldsymbol{\hat{u}}\) and
\(\boldsymbol{\hat{\beta}}\) denote the maximum likelihood estimates for
\(\boldsymbol v\), \(\boldsymbol u\) and \(\boldsymbol{\beta}\), then
\(r_{tjj^*}\) can be estimated by
\[\hat r_{tjj^*} = \hat u_j + \hat v_j + \log\left(t\right) \exp\left(\hat v_j\right) - \hat u_{j^*} - \hat v_{j^*} - \log\left(t\right) \exp\left(\hat v_{j^*}\right) + \boldsymbol{\hat{\beta}}^{'}\left(\boldsymbol{Z} - \boldsymbol{Z}^{*}\right)\]
Let
\(\boldsymbol w = (\boldsymbol v, \boldsymbol u, \boldsymbol \beta)=(v_1, v_2, \cdots, v_s, u_1, u_2, \cdots, u_s, \beta_1, \cdots, \beta_p)\).
Let \(\widehat{\Sigma}\) denote the estimate of the covariance matrix of
\(\boldsymbol{\hat w}\). Let \(\boldsymbol J_{tjj^*}\) denote the Jacobian
vector,
\(\boldsymbol J_{tjj^*}=\frac{\partial r_{tjj^*}}{\partial \boldsymbol w}|_{\boldsymbol w=\boldsymbol{\hat w}}\).
Thus, the estimate of the variance of \(\hat r_{tjj^*}\) is obtained by:
\[\widehat{Var}(\hat r_{tjj^*}) = \boldsymbol J_{tjj^*}^T \widehat{\Sigma} \boldsymbol J_{tjj^*}\]
We obtain a 95% confidence interval for \(r_{tjj^*}\) as
\(\bigg( \hat r_{tjj^*} - 1.96 \sqrt{\widehat{Var}(\hat r_{tjj^*})}, \hat r_{tjj^*} + 1.96 \sqrt{\widehat{Var}(\hat r_{tjj^*})} \bigg)\).
We exponentiate \(\hat r_{tjj^*}\) and its corresponding 95% confidence
interval to obtain the estimate and the 95% confidence interval for the
hazard ratio, \(R_{tjj^*}\). We illustrate the use of the ** straweib** R
package for obtaining hazard ratios and corresponding confidence
intervals in Section 4.

In this section, we compare the stratified Weibull regression model
implemented in the ** survival** package to that implemented in our
package,

In the absence of stratification, both models are identical and reduce
to the Weibull PH model. However, in the presence of a stratification
factor, the models implemented by ** survival** and

To illustrate the difference between the models implemented in the R
packages ** survival** and

`survreg`

function in the R package `icweib`

function in the Figure 1 compares the maximized value of the
log-likelihoods under both models, when the data are generated using a
simulation mechanism that corresponds to the model implemented in the R
package ** straweib**. The maximized value of the log-likelihood from
the R package

We illustrate the R package ** straweib** with data from a study on the
timing of emergence of permanent teeth in Flemish children in Belgium
(Leroy et al. 2003). The data analyzed were from the Signal-Tandmobiel project
(Vanobbergen et al. 2000), a longitudinal oral health study in a sample of 4430 children
conducted between 1996 and 2001. Dental examinations were conducted
annually for a period of 6 years and tooth emergence was recorded based
on visual inspection. As in Gomez et al. (2009), we will illustrate our R package by
analyzing the timing of emergence of the permanent upper left first
premolars. As dental exams were conducted annually, for each child, the
timing of tooth emergence is known up to the interval from the last
negative to the first positive dental examination.

```
data(tooth24)
head(tooth24)
id left right sex dmf1 1 2.7 3.5 1 1
2 2 2.4 3.4 0 1
3 3 4.5 5.5 1 0
4 4 5.9 Inf 1 0
5 5 4.1 5.0 1 1
6 6 3.7 4.5 0 1
```

The dataset is formatted to include 1 row per child. The variable
denoted **id** corresponds to the ID of the child, **left** and
**right** correspond to the left and right endpoints of the censoring
interval in years, **sex** denotes the gender of the child (0 = boy, and
1 = girl), and **dmf** denotes the status of primary predecessor of the
tooth (0 = sound, and 1 = decayed or missing due to caries or filled).
Right censored observations are denoted by setting the variable
**right** to "Inf".

In our analysis below, we use the function `icweib`

in the package
** straweib**, to fit a stratified Weibull regression model, where the
variable

```
<- icweib(L = left, R = right, data = tooth24, strata = dmf, covariates = ~sex)
fit
fit
: 4386. Model Convergence: TRUE
Total observations used
:
Coefficients
coefficient SE z p.value0.331 0.0387 8.55 0
sex
- gamma(shape), lambda(scale):
Weibull parameters
straname strata gamma lambda0 5.99 1.63e-05
dmf 1 4.85 1.76e-04
dmf
for strata (H0: all strata's shape parameters are equal):
Test of proportional hazards test TestStat df p.value
Wald 44.2 1 2.96e-11
Likelihood Ratio 44.2 1 3.00e-11
Loglik(model)= -5501.781 Loglik(reduced)= -5523.87
Loglik(null)= -5538.309 Chisq= 73.05611 df= 1 p.value= 0
```

The likelihood ratio test of the PH assumption results in a *p* value of
3.00e-11, indicating that the PH model is not appropriate for this
dataset. Or in other words, the data suggest that the hazard functions
corresponding to the strata defined by \(dmf=0\) and \(dmf=1\) are not
proportional. From the stratified Weibull regression model, the
estimated regression coefficient for **sex** is 0.331, corresponding to
a hazard ratio of 1.39 (95% CI: 1.29 - 1.50). In the output above, the
maximized value of the log likelihood of the null model corresponds to
the model stratified by covariate **dmf** but excluding the explanatory
variable **sex**.

The *p* value from the Wald test of the null hypothesis of no effect of
gender results in a *p* value of approximately 0 (\(p < 10^{-16}\)), which
indicates that the timing of emergence of teeth is significantly
different between girls and boys.

To test the global null hypothesis that both covariates **sex** and
**dmf** are not associated with the outcome (time to teeth emergence),
we obtain the log-likelihood for global null model, as shown below.

```
<- icweib(L = left, R = right, data = tooth24)
fit0
fit0
: 4386. Model Convergence: TRUE
Total observations used
- gamma(shape), lambda(scale):
Weibull parameters
straname strata gamma lambda5.3 7.78e-05
strata ALL
Loglik(model)= -5596.986
Loglik(null)= -5596.986
```

The likelihood ratio test testing the global null hypothesis results in
a test statistic \(T=-2(l_R - l_F) = -2(-5596.986 + 5501.781)=190.41\),
which follows a \(\chi_{3}^2\) distribution under \(H_0\), resulting in a
*p* value of approximately 0 (\(p < 10^{-16}\)).

We illustrate the `HRatio`

function in the ** straweib** package to
estimate the hazard ratio and corresponding 95% confidence intervals for
comparing boys without tooth decay (\(dmf=0\)) to boys with evidence of
tooth decay (\(dmf=1\)), where the hazard ratio is evaluated at various
time points from 1 through 7 years.

```
HRatio(fit, times = 1:7, NumStra = 0, NumZ = 0, DemStra = 1, DemZ = 0)
*(Z1-Z2) HR low95 high95
time NumStra DemStra beta1 1 0 1 0 0.1143698 0.06596383 0.1982972
2 2 0 1 0 0.2520248 0.18308361 0.3469262
3 3 0 1 0 0.4000946 0.33112219 0.4834339
4 4 0 1 0 0.5553610 0.49863912 0.6185351
5 5 0 1 0 0.7162080 0.66319999 0.7734529
6 6 0 1 0 0.8816470 0.79879884 0.9730878
7 7 0 1 0 1.0510048 0.91593721 1.2059899
```

The output indicates that the hazard ratio for boys comparing the stratum \(dmf = 0\) to stratum \(dmf = 1\) is small initially (e.g. 0.11 at 1 year) but tends to 1 in later years (e.g. 0.88 at 6 years and 1.05 at 7 years). Prior to 6 years, the hazard ratio is significantly less than 1, indicating that the timing of teeth emergence is delayed in children with tooth decay (\(dmf=1\)) when compared to children without tooth decay (\(dmf=0\)).

We illustrate estimation of the survival function in Figure 2 by plotting the survival functions and corresponding 95% point wise confidence intervals for girls (\(Z=1\)), with and without tooth decay .

```
plot(fit, Z = 1, tRange = c(1, 7), xlab = "Time (years)", ylab = "Survival Function",
main = "Estimated survival function for girls")
```

We compare our results from the ** straweib** package to that obtained
from the

```
library(survival)
<- tooth24
tooth24.survreg $right <- with(tooth24, ifelse(is.finite(right), right, NA))
tooth24.survreg<- survreg(Surv(left, right, type="interval2") ~ sex + strata(dmf) + factor(dmf),
fit1 data = tooth24.survreg)
fit1
:
Callsurvreg(formula = Surv(left, right, type = "interval2") ~ sex +
strata(dmf) + factor(dmf), data = tooth24.survreg)
:
Coefficientsfactor(dmf)1
(Intercept) sex 1.84389938 -0.06254599 -0.06491729
:
Scale=Sound1 dmf=Sound2
dmf0.1659477 0.2072465
Loglik(model)= -5499.3 Loglik(intercept only)= -5576.2
= 153.8 on 2 degrees of freedom, p= 0
Chisq= 4386 n
```

The maximized value of the log-likelihood from the R package
** survival** is \(-5499.3\) (shown below), as compared to the maximized
value of the log-likelihood of \(-5501.8\) from the R package

To clarify the specific assumptions made by the models implemented in
the ** survival** and

```
<- icweib(L= left, R=right, data=tooth24[tooth24$dmf==0, ], covariates = ~sex)
fit20 ### Partial results shown below
fit20 :
Coefficients
coefficient SE z p.value0.448 0.0543 8.25 2.22e-16 sex
```

The results from the Weibull PH model fit to the subgroup \(dmf=1\) is shown below:

```
<- icweib(L= left, R=right, data=tooth24[tooth24$dmf==1, ], covariates = ~sex)
fit21 ### Partial results shown below
fit21 :
Coefficients
coefficient SE z p.value0.208 0.0554 3.76 0.000169 sex
```

The model using the PH scale (implemented by ** straweib** package)
replaces the stratum specific hazard ratios for sex of \(e^{0.448}=1.57\)
for the subgroup \(dmf=0\) and \(e^{0.208}=1.23\) for the subgroup \(dmf=1\)
with a common value, \(e^{0.331}=1.39\).

Since the Weibull distribution has both the PH and accelerated failure
time (AFT) property (Collett 2003), the identical set of subgroup analyses
can be fit using the ** survival** package. Results from the fit using
the

```
<- survreg(Surv(left, right, type="interval2") ~ sex,
fit20.survreg data = tooth24.survreg[tooth24.survreg$dmf==0, ])
### Partial results shown below
fit20.survreg :
Coefficients
(Intercept) sex 1.85029150 -0.07453785
```

Similar results using the ** survival** package for the subgroup
\(dmf=1\) are shown below:

```
<- survreg(Surv(left, right, type="interval2") ~ sex,
fit21.survreg data = tooth24.survreg[tooth24.survreg$dmf==1, ])
### Partial results shown below
fit21.survreg :
Coefficients
(Intercept) sex 1.76931556 -0.04303767
```

In particular, the model assuming a common sex coefficient in the AFT
scale (implemented by ** survival** package) replaces the value of sex
coefficient \(-0.075\) for the subgroup with \(dmf=0\) and sex coefficient
of \(-0.043\) for the subgroup \(dmf=1\) with a shared common value,
\(-0.063\).

To assess the goodness of fit of the stratified Weibull model
implemented by ** straweib**, we created a multiple probability plot,
as described in chapter 19 of Meeker and Escobar (1998). This diagnostic
plot was created by splitting the dataset into 4 subgroups based on the
values of

Table 1 presents the estimates of hazard ratio for **sex**,
within each of the strata defined by \(dmf=0\) and \(dmf=1\), comparing
three different analyses - (1) Using the ** survival** package to
stratify on the variable

```
<- exp(fit$coef[1, 1])
HR.straweib <- exp(-fit1$coefficients['sex']/fit1$scale)
HR.survreg <- exp(c(fit20$coef[1, 1], fit21$coef[1, 1])) HR.subgroup
```

stratum | Package survival |
Package straweib |
Stratum specific subgroup analyses |
---|---|---|---|

dmf = 0 | 1.46 | 1.39 | 1.56 |

dmf = 1 | 1.35 | 1.39 | 1.23 |

We have developed and illustrated an R package ** straweib** for the
analysis of interval-censored outcomes, based on a stratified Weibull
regression model. The proposed model shares similarities with the
semi-parametric stratified Cox model. We illustrated the R package

Although the models and R package are illustrated for the analysis of
interval-censored time-to-event outcomes, the methods proposed here are
equally applicable for the analysis of right-censored outcomes. The
syntax for the analysis of right-censored observations is explained in
the manual accompanying the ** straweib** package available on CRAN
(Gu and R. Balasubramanian 2013).

This research was supported by NICHD grant R21 HD072792.

ClinicalTrials, Econometrics, Survival

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

D. Collett. *Modelling Survival Data in Medical Research, Second Edition.* Texts in statistical science Taylor & Francis, 2003.

D. R. Cox and D. V. Hinkley. *Theoretical Statistics.* Chapman; Hall, 1979.

D. T. Dunn, R. J. Simonds, M. Bulterys, L. A. Kalish, J. Moye, A. de Maria, C. Kind, C. Rudin, E. Denamur, A. Krivine, et al. Interventions to prevent vertical transmission of HIV-1: Effect on viral detection rate in early infant samples. *AIDS*, 14(10): 1421–1428, 2000.

M. P. Fay and P. A. Shaw. Exact and asymptotic weighted logrank tests for interval censored data: The interval R package. *Journal of Statistical Software*, 36(2): 1–34, 2010. URL http://www.jstatsoft.org/v36/i02/.

D. M. Finkelstein. A proportional hazards model for interval-censored failure time data. *Biometrics*, 42(4): 845–854, 1986.

E. Goetghebeur and L. Ryan. Semiparametric regression analysis of interval-censored data. *Biometrics*, 56(4): 1139–1144, 2000.

G. Gomez, M. L. Calle, R. Oller and K. Langohr. Tutorial on methods for interval-censored data and their implementation in R. *Statistical Modelling*, 9(4): 259–297, 2009.

X. Gu and R. Balasubramanian. *straweib: Stratified Weibull Regression Model, .* R package version 1.0, 2013. URL http://CRAN.R-project.org/package=straweib.

R. Leroy, K. Bogaerts, E. Lesaffre and D. Declerck. The emergence of permanent teeth in flemish children. *Community Dentistry and Oral Epidemiology*, 31(1): 30–39, 2003.

J. C. Lindsey and L. M. Ryan. Tutorial in biostatistics - methods for interval-censored data. *Statistics in Medicine*, 17(2): 219–238, 1998.

W. Q. Meeker and L. A. Escobar. *Statistical methods for reliability data.* Wiley, 1998.

T. Therneau. *A Package for Survival Analysis in S, .* R package version 2.36-14, 2012.

B. W. Turnbull. Empirical distribution function with arbitrarily grouped, censored and truncated data. *Journal of the Royal Statistical Society Series B-Methodological*, 38(3): 290–295, 1976.

J. Vanobbergen, L. Martens, E. Lesaffre and D. Declerck. The Signal-Tandmobiel project a longitudinal intervention health promotion study in flanders (belgium): Baseline and first year results. *Eur J Paediatr Dent*, 2: 87–96, 2000.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Gu, et al., "Stratified Weibull Regression Model for Interval-Censored Data", The R Journal, 2014

BibTeX citation

@article{RJ-2014-003, author = {Gu, Xiangdong and Shapiro, David and Hughes, Michael D. and Balasubramanian, Raji}, title = {Stratified Weibull Regression Model for Interval-Censored Data}, journal = {The R Journal}, year = {2014}, note = {https://rjournal.github.io/}, volume = {6}, issue = {1}, issn = {2073-4859}, pages = {31-40} }