We present the frailtyHL package for fitting semi-parametric frailty models using h-likelihood. This package allows lognormal or gamma frailties for random-effect distribution, and it fits shared or multilevel frailty models for correlated survival data. Functions are provided to format and summarize the frailtyHL results. The estimates of fixed effects and frailty parameters and their standard errors are calculated. We illustrate the use of our package with three well-known data sets and compare our results with various alternative R-procedures.
Frailty models with a non-parametric baseline hazard have been widely adopted for the analysis of survival data (Hougaard 2000; Duchateau and P. Janssen 2008). The frailtyHL package (Ha, M. Noh and Y. Lee 2012) implements h-likelihood procedures (Ha, Y. Lee and J. K. Song 2001; Ha and Y. Lee 2003, 2005) for frailty models. The package fits Cox proportional hazards models (PHMs) with random effects, or frailties. The lognormal or gamma distribution can be adopted as the frailty distribution, corresponding to the normal or log-gamma distribution for the log frailties. H-likelihood obviates the need for marginalization over the frailty distribution, thereby providing a statistically efficient procedure for various random-effect models (Lee, J. A. Nelder and Y. Pawitan 2006).
The main function is frailtyHL()
. For instance,
> frailtyHL(Surv(time, status) ~ x + (1|id),
+ RandDist = "Normal",
+ mord = 0, dord = 1,
+ Maxiter = 200, convergence = 10^-6,
+ varfixed = FALSE, varinit = 0.1)
fits a lognormal frailty model. The first argument is a formula object,
with the response on the left of a ~
operator, and the terms for the
fixed and random effects on the right. The response is a survival object
as returned by the Surv
function (Therneau 2011). Here, time
and
status
denote survival time and censoring indicator taking value 1 or
0 for uncensored or censored observations, respectively; x
denotes a
fixed covariate and id
denotes the subject identifier. The expression
(1|id)
specifies a random intercept model ((x|id)
would specify a
random intercept, random slope model). The parameters mord
and dord
are the orders of Laplace approximations to fit the mean parameters
(mord
= 0 or 1) and the dispersion parameters (dord
= 1 or 2),
respectively. The Maxiter
parameter specifies the maximum number of
iterations and convergence
specifies the tolerance of the convergence
criterion. If varfixed
is specified as TRUE
(or FALSE
), the value
of one or more of the variance terms for the frailties is fixed (or
estimated) with starting value (e.g. 0.1) given by varinit
.
Previously, frailty models have been implemented in several R functions
such as the coxph()
function in the
survival package
(Therneau 2010) and the coxme()
function in the
coxme package
(Therneau 2011), based on penalized partial likelihood (PPL), the
phmm()
function in the
phmm package
(Donohue and R. Xu 2012), based on a Monte Carlo EM (MCEM) method, and the
frailtyPenal()
function in the
frailtypack package
(Gonzalez, V. Rondeau Y. Mazroui, A. Mauguen, and A. Diakité 2012), based on penalized marginal
likelihood. The phmm package fits one-component frailty models,
although it does allow multivariate frailties. The coxme()
function
can fit the multi-component model as shown in Example 2. Results from
frailtyHL are compared with those from survival, coxme and phmm.
Recently, the h-likelihood procedures of Lee and J. A. Nelder (1996) for fitting
hierarchical generalized linear models (HGLMs) have been implemented
using the hglm()
function in the
hglm package
(Alam, L. Ronnegard, and X. Shen 2010), the HGLMfit()
function in the
HGLMMM package
(Molas 2010) and the dhglmfit()
function in the
dhglm package
(Noh and Y. Lee 2011). The frailtyHL package is for fitting frailty models
with a non-parametric baseline hazard, providing estimates of fixed
effects, random effects, and variance components as well as their
standard errors. In addition, it provides a statistical test for the
variance components of frailties and also three AIC criteria for the
model selection.
The frailtyHL package makes it possible to
fit models with log-normal and gamma frailty distributions and
estimate variance components when the frailty structure is shared or nested.
For illustration, we present two models below and show how to fit these
models using frailtyHL()
in the Examples section.
Suppose that data consist of right censored time-to-event observations from \(q\) subjects (or clusters), with \(n_i\) observations each; \(i=1,\ldots ,q\). Let \(T_{ij}\) be the survival time for the \(j\)th observation of the \(i\)th subject; \(j=1,\ldots ,n_{i}\). Here, \(n=\sum_{i} n_{i}\) is the total sample size and \(n_i\) is the cluster size. Let \(C_{ij}\) be the corresponding censoring time and let \(y_{ij}=\mathrm{min} \{T_{ij},C_{ij}\}\) and \(\delta _{ij}=I(T_{ij}\leq C_{ij})\) be the observable random variables, where \(I(\cdot )\) is the indicator function.
Given the common unobserved frailty for the \(i\)th subject \(u_{i}\), the conditional hazard function of \(T_{ij}\) is of the form
\[\lambda _{ij}(t|u_{i})=\lambda _{0}(t)\exp (x_{ij}^{T}\beta )u_{i}, \label{eq:1} \tag{1}\]
where \(\lambda _{0}(\cdot )\) is an unspecified baseline hazard function and \(\beta=(\beta_1, \ldots, \beta_p)^T\) is a vector of regression parameters for the fixed covariates \(x_{ij}\). Here, the term \(x_{ij}^{T}\beta\) does not include an intercept term because of identifiability. Assume that the frailties \(u_{i}\) are independent and identically distributed (i.i.d) random variables with a frailty parameter \(\alpha\); the gamma and lognormal frailty models assume gamma and lognormal distributions for \(u_i\), respectively. That is, our package allows
The model represented by (1) is known as a shared or one-component frailty model (Hougaard 2000).
We can fit a multi-component model as below (Ha, Y. Lee and G. MacKenzie 2007):
\[X\beta +Z^{(1)}v^{(1)}+Z^{(2)}v^{(2)}+\cdots +Z^{(k)}v^{(k)}, \label{eq:2} \tag{2}\]
\(X\) is the \(n\times p\) model matrix, \(Z^{(r)}~(r=1,2,\ldots ,k)\) are the \(n\times q_{r}\) model matrices corresponding to the \(q_{r}\times 1\) frailties \(v^{(r)}\), and \(v^{(r)}\) and \(v^{(l)}\) are independent for \(r\neq l\).
For example, the CGD data (Fleming and D. P. Harrington 1991) have a multilevel structure in which patients nested within centers have recurrent infection times.
Later, we analyze these data using model (2) with \(k=2\). Here, the frailty structures are:
\(v^{(1)}\): center frailty \(\sim N(0, \alpha_1 A_1)\),
\(v^{(2)}\): patient frailty \(\sim N(0, \alpha_2 A_2)\),
where \(A_{r}=I_{r}~(r=1,2)\) are the \(q_{r}\times q_{r}\) identity matrices, and \(q_{1}\) and \(q_{2}\) are the number of centers and patients, respectively. Notice that the corresponding \(Z^{(1)}\) to \(v^{(1)}\) (or \(Z^{(2)}\) to \(v^{(2)}\)) is, respectively, a matrix of indicator variables such that \(Z_{st}^{(1)}=1\) (or \(Z_{st}^{(2)}=1\)) if observation \(s\) is a member of center (or patient) \(t\) and 0 otherwise (Therneau and P. M. Grambsch 2000). This is called the multilevel frailty model (Yau 2001; Ha, Y. Lee and G. MacKenzie 2007).
The h-likelihood \(h\) (Ha, Y. Lee and J. K. Song 2001) for frailty model (1) is defined by
\[h=h(\beta ,\lambda _{0},\alpha )=\ell _{0}+\ell _{1}, \label{eq:3} \tag{3}\]
where \[\begin{aligned} \ell _{0} &= \sum_{ij}\log f(y_{ij},\delta _{ij}|u_{i};\beta ,\lambda_{0}) \\ &= \sum_{ij}\delta _{ij}\{\log \lambda _{0}(y_{ij})+\eta _{ij}\} - \sum_{ij}\Lambda_{0}(y_{ij})\exp (\eta _{ij}) \end{aligned}\] is the sum of conditional log densities for \(y_{ij}\) and \(\delta _{ij}\) given \(u_{i}\), and \[\ell _{1}=\sum_{i}\log f(v_{i};\alpha )\] is the sum of log densities for \(v_{i}=\log u_{i}\) with parameter \(\alpha\). Here, \(\eta _{ij}=x_{ij}^{T}\beta +v_{i}\) is the linear predictor for the hazards, and \[\Lambda _{0}(t)=\int_{0}^{t}\lambda _{0}(k)dk\] is the baseline cumulative hazard function.
The functional form of \(\lambda _{0}(t)\) in (1) is unknown; following Breslow (1972), we consider \(\Lambda _{0}(t)\) to be a step function with jumps at the observed event times: \[\Lambda _{0}(t)=\sum_{k:y_{(k)}\leq t}\lambda _{0k}\] where \(y_{(k)}\) is the \(k\)th (\(k=1,\ldots ,l\)) smallest distinct event time among the \(y_{ij}\)’s, and \({\lambda }_{0k}=\lambda _{0}(y_{(k)})\). However, the dimension of \(\lambda _{0}=(\lambda _{01},\ldots ,\lambda _{0l})^{T}\) increases with the sample size \(n\). For inference, Ha, Y. Lee and J. K. Song (2001) proposed the use of the profile h-likelihood with \(\lambda _{0}\) eliminated, \(h^{\ast }\equiv h|_{\lambda _{0}=\widehat{\lambda }_{0}}\), given by
\[h^{\ast }=h^{\ast }(\beta ,\alpha )=\ell _{0}^{\ast }+\ell _{1}, \label{eq:4} \tag{4}\]
where \[\begin{aligned} \ell _{0}^{\ast } &= \sum_{ij}\log f^{\ast }(y_{ij},\delta_{ij}|u_{i};\beta )\\ &= \sum_{ij}f(y_{ij},\delta _{ij}|u_{i}; \beta ,\widehat{\lambda }_{0}) \end{aligned}\] does not depend on \(\lambda _{0}\), and
\[\widehat{\lambda }_{0k}(\beta ,v)=\frac{d_{(k)}}{\sum_{~(i,j)\in R_{(k)}}\exp (\eta _{ij})}, \label{eq:5} \tag{5}\] are solutions of the estimating equations, \(\partial h/\partial \lambda _{0k}=0\), for \(k=1,\ldots ,l\). Here, \(d_{(k)}\) is the number of events at \(y_{(k)}\) and \[R_{(k)}=R(y_{(k)})=\{(i,j):y_{ij}\geq y_{(k)}\}\] is the risk set at \(y_{(k)}\). Therneau and P. M. Grambsch (2000) and Ripatti and J. Palmgren (2000) proposed an h-likelihood (3), while using the partial likelihood (Breslow 1974) for \(\ell _{0}\). They call it the penalized partial likelihood (PPL) \(h_{p}\), defined by \[\begin{split} h_{p}(\beta ,v,\alpha )=&\sum_{ij}\delta _{ij}\eta _{ij}-\\ & \sum_{k}d_{(k)}\log \biggl \{\sum_{~ij\in R_{(k)}~}\exp (\eta _{ij})\biggl \}+\ell _{1}. \end{split}\] Furthermore, Ha, Y. Lee and J. K. Song (2001) and Ha, M. Noh and Y. Lee (2010) have shown that \(h^{\ast }\) is proportional to the PPL \(h_{p}\) because \[\begin{aligned} h^{\ast } &= \sum_{k}d_{(k)}\log \widehat{\lambda }_{0k}+\sum_{ij}\delta _{ij}\eta _{ij}-\sum_{k}d_{(k)}~+~\ell _{1} \\ &= h_{p}+\sum_{k}d_{(k)}\{\log d_{(k)}-1\}, \end{aligned}\] where \(\sum_{k}d_{(k)}\{\log d_{(k)}-1\}\) is a constant that does not depend upon unknown parameters. Notice here that PPL \(h_{p}\) does not depend on nuisance parameters \(\lambda _{0}\). Thus, Lee and Nelder’s (1996, 2001) h-likelihood procedure for HGLMs can be directly used to fit frailty models based on \(h_{p}\) (Ha, M. Noh and Y. Lee 2010).
Lee and J. A. Nelder (1996, 2001) have proposed the use of the Laplace approximation based on the h-likelihood when the marginal likelihood, \(m=\log \{\int \exp (h)dv\}\), is hard to obtain. To reduce the bias in small cluster sizes higher-order approximations for the mean (\(\beta\)) and the frailty variance (\(\alpha\)) have been developed. The lower-order approximation is computationally efficient, but could have large biases when cluster sizes are small (Ha and Y. Lee 2003, 2005; Ha, M. Noh and Y. Lee 2010).
The h-likelihood procedures use the Breslow method for handling tied event times, while the PPL procedures allow the Efron method. For estimating \(\beta\), the h-likelihood methods allow the Laplace approximation \(p_{v}(h_{p})\) to a marginal partial likelihood \(m_{p}=\log \{\int \exp (h_{p})dv\}\), but the PPL procedures do not. For estimating \(\alpha\), the PPL methods use adjusted profile h-likelihoods \(p_{v}(h_{p})\) and \(p_{\beta ,v}(h_{p}\)) which give maximum partial likelihood (MPL) and restricted maximum partial likelihood (REMPL) estimators, respectively. In contrast, the h-likelihood method uses the restricted partial likelihood (based upon the first-order Laplace approximation \(p_{\beta ,v}(h_{p})\) or the second-order Laplace approximation \(s_{\beta ,v}(h_{p})\)) for REMPL estimators. Here \(p_{v}(h_{p})\) and \(p_{\beta ,v}(h_{p})\) are defined as follows: \[p_{v}(h_{p})=[h_{p}-\frac{1}{2}\log \det \{D(h_{p},v)/(2\pi )\}]|_{v=\hat{v},}\] where \(D(h_{p},v)=-\partial ^{2}h_{p}/\partial v^{2}\) and \(\hat{v}\) solves \(\partial h_{p}/\partial v=0\), which is the first-order Laplace approximation of \(m_{p}\), and \[p_{\beta ,v}(h_{p})=\left.\left[h_{p}-\frac{1}{2}\log \det \left\{ \frac{D(h_{p},(\beta ,v))}{2\pi} \right\}\right]\right|_{\beta =\hat{\beta},v=\hat{v},}\] where \(D(h_{p},(\beta ,v))=-\partial ^{2}h_{p}/\partial (\beta ,v)^{2}\) and \((\hat{\beta},\hat{v})\) solves \(\partial h_{p}/\partial (\beta ,v)=0\), which becomes the Cox and N. Reid (1987) adjusted profile marginal likelihood, eliminating fixed effects \(\beta\) by conditioning their asymptotic sufficient statistics \(\hat{\beta},\) in addition to eliminating random effects \(v\) by the first-order Laplace approximation (Ha and Y. Lee 2003; Lee, J. A. Nelder and Y. Pawitan 2006). The corresponding second-order approximation is \[s_{\beta ,v}(h_{p})=p_{\beta ,v}(h_{p})-\{F(h_{p})/24\},\] with \[\begin{gathered} \textstyle F(h_{p})=\left.\mathrm{tr}\left[-\left\{3\frac{\partial ^{4}h_{p}}{\partial v^{4}}+ 5\frac{\partial ^{3}h_{p}}{\partial v^{3}}D(h_{p},v)^{-1} \frac{\partial^{3}h_{p}}{\partial v^{3}}\right\} %% Braces must match on each line of multline: \right.\right. %fake closure \\ %split \left.\left. %fake open on next line \times D(h_{p},v)^{-2}\right]\right|_{v=\widehat{v}.} \end{gathered}\] To reduce the computational burden Ha, M. Noh and Y. Lee (2010) used \(F(h)\) instead of \(F(h_{p})\).
Criterion | Literature | ||
Method | \(\beta\) | \(\alpha\) | |
HL(0,1) | \(h_p\) | \(p_{\beta,v}(h_p)\) | Ha and Y. Lee (2003) |
HL(0,2) | \(h_p\) | \(s_{\beta,v}(h_p)\) | Ha and Y. Lee (2003) |
HL(1,1) | \(p_v(h_p)\) | \(p_{\beta,v}(h_p)\) | Ha, M. Noh and Y. Lee (2012) |
HL(1,2) | \(p_v(h_p)\) | \(s_{\beta,v}(h_p)\) | Ha, M. Noh and Y. Lee (2012) |
coxph() |
\(h_p\) | \(p_{\beta,v}(h_p)\) | Therneau (2010) |
for LN FM | |||
coxph() |
\(h_p\) | \(m\) | Therneau (2010) |
for gamma FM | |||
coxme() |
\(h_p\) | \(p_{v}(h_p)\) | Therneau (2011) |
for LN FM | |||
phmm() |
\(m\) | \(m\) | Donohue and R. Xu (2012) |
for LN FM |
Table 1 shows historical developments of estimating criteria for frailty
models. The frailtyHL()
function provides estimators based on the
h-likelihood. As the orders in mord
and dord
increase, the bias of
estimator is reduced, but the calculation becomes computationally
intensive due to the extra terms. We recommend the use of HL(1,1) for
the lognormal frailty and of HL(1,2) for the gamma frailty. However,
HL(0,1) for the lognormal frailty and HL(0,2) for the gamma frailty
often perform well if \(\alpha\) is not large. Note that the variance
matrices of \(\hat{\tau}=(\hat{\beta},\hat{v})\) and \(\hat{\alpha}\) are
directly obtained from the Hessian matrices, \(\{-\partial ^{2}h_{p}/\partial \tau ^{2}\}^{-1}\) and \(\{-\partial ^{2}p_{\beta ,v}(h_{p})/\partial \alpha ^{2}\}^{-1}\), respectively; the frailtyHL
package provides the standard errors (SEs) of \(\hat{\alpha}\) as well as
\(\hat{\beta}\). For the use of standard errors of \(\hat{v}-v\), see
Lee and I. D. Ha (2010), Lee, M. Jang and W. Lee (2011) and Ha, R. Sylvester, C. Legrand and G. MacKenzie (2011).
Based on the PPL methods, the coxph()
and coxme()
functions,
respectively, implement REMPL and MPL estimators for \(\alpha\) in the
lognormal frailty model, and the coxph()
function the maximum
likelihood (ML) estimators, maximizing the marginal likelihood \(m\), for
\(\alpha\) in the gamma frailty model. For comparison, we present the
Breslow and Efron methods for handling ties in survival times in the
coxph()
and coxme()
functions; Therneau (2010) recommended the
Efron method. For the lognormal frailty the ML estimator maximizing \(m\)
is available via the phmm()
function, but care must be taken to ensure
that the MCEM algorithm has converged (Donohue and R. Xu 2012). However, the
ML estimator can be biased when the number of nuisance parameters
increases with the sample size (Ha, M. Noh and Y. Lee 2010).
Furthermore, for the lognormal frailty the coxph()
function uses the
existing codes in linear mixed models so that it misses the
\(\partial \hat{v} /\partial \alpha\) term in solving the score equation \(\partial p_{\beta ,v}(h_{p})/\partial \alpha =0\); this can lead to an underestimation of
the parameters, especially when the number of subjects \(q\) is large or
censoring is high (Lee, J. A. Nelder and Y. Pawitan 2006; Ha, M. Noh and Y. Lee 2010). To
overcome this problem, in gamma frailty models Therneau and P. M. Grambsch (2000)
develop the code for the ML estimator for \(\alpha\).
Suppose that HL(0,1) is used. The fitting algorithm is as follows:
Take \((0,0,0.1)\) as initial estimates of components of \(% (\beta ,v,\alpha )\).
Given \(\hat{\alpha}\), new estimates \((\hat{\beta},\hat{v})\) are obtained by solving the joint estimating equations \(\partial h_{p}/\partial (\beta,v)= \{\partial h/\partial (\beta,v)\}|_{\lambda_0=\hat \lambda_0}=0\); then, given \((\hat{\beta},\hat{v})\), new estimates \(\hat{% \alpha}\) are obtained by solving \(\partial p_{\beta ,v}(h_{p})/\partial \alpha =0\).
Repeat Step 2 until the maximum absolute difference between the previous and current estimates for \((\beta, v)\) and \(\alpha\) is less than \(10^{-6}\).
After convergence, we compute the estimates of the standard errors of \(\hat \beta\) and \(\hat \alpha\).
To demonstrate differences of various estimation methods in small
cluster size \(n_{i}\equiv 2\), we use the kidney infection data
(McGilchrist and C. W. Aisbett 1991). The data consist of times until the first
and second recurrences (\(n_{i}\equiv 2\) ) of kidney infection in 38
(\(q=38\)) patients using a portable dialysis machine. Each survival time
(time
) is the time until infection since the insertion of the
catheter. The survival times for the same patient are likely to be
related because of a shared frailty describing the common patient’s
effect. The catheter is later removed if infection occurs and can be
removed for other reasons, which we regard as censoring; about 24% of
the data were censored.
We fit frailty models with two covariates, the sex (1 = male; 2 =
female) and age of each patient, using the functions (frailtyHL()
,
coxph()
, coxme()
and phmm()
) in the four packages. The results
are summarized in Table 2.
Sex | Age | Patient | |
Method | \(\hat \beta_1\) (SE) | \(\hat \beta_2\) (SE) | \(\hat \alpha\) (SE) |
lognormal model | |||
HL(0,1) | -1.380 | 0.005 | 0.535 |
(0.431) | (0.012) | (0.338) | |
HL(1,1) | -1.414 | 0.005 | 0.545 |
(0.432) | (0.012) | (0.340) | |
coxph() |
-1.388 | 0.005 | 0.551 |
(Breslow) | (0.441) | (0.012) | ( – ) |
coxph() |
-1.411 | 0.005 | 0.569 |
(Efron) | (0.445) | (0.013) | ( – ) |
coxme() |
-1.332 | 0.005 | 0.440 |
(Breslow) | (0.414) | (0.012) | ( – ) |
coxme() |
-1.355 | 0.004 | 0.456 |
(Efron) | (0.417) | (0.012) | ( – ) |
phmm() |
-1.329 | 0.004 | 0.378 |
(0.452) | (0.012) | ( – ) | |
gamma model | |||
HL(0,2) | -1.691 | 0.007 | 0.561 |
(0.483) | (0.013) | (0.280) | |
HL(1,2) | -1.730 | 0.007 | 0.570 |
(0.485) | (0.013) | (0.281) | |
coxph() |
-1.557 | 0.005 | 0.398 |
(Breslow) | (0.456) | (0.012) | ( – ) |
coxph() |
-1.587 | 0.005 | 0.412 |
(Efron) | (0.461) | (0.012) | ( – ) |
In PPL procedures (coxph()
and coxme()
), the Breslow method provides
slightly smaller estimate for \(\alpha\) than the Efron method. In the
lognormal frailty, REMPL procedures (fraityHL()
and coxph()
) give
larger estimates for \(\alpha\) than ML (phmm()
) and MPL (coxme()
)
procedures. However, both ML and MPL estimates from phmm()
and
coxme()
are somewhat different when cluster size is small,
\(n_{i}\equiv 2\) for all \(i\). For the gamma frailty, coxph()
uses the
ML procedure, but it still gives smaller estimate for \(\alpha\) than the
REMPL (h-likelihood) procedures. Compared with the h-likelihood methods,
PPL methods are computationally more efficient, but could have larger
biases (Ha, M. Noh and Y. Lee 2010).
The current version of the frailtyHL package allows multi-component (multilevel) frailties. Allowance for correlation (Ha, R. Sylvester, C. Legrand and G. MacKenzie 2011) between random effects, for example correlations between random center effect and random treatment effect, is currently in progress. Other developments include dispersion frailty models based on double HGLMs (Lee and J. A. Nelder 2006), which allow fixed and random effects in dispersion parts (i.e. variance of random effects) of the model as well as the hazard functions, and joint modelling (Ha, T. Park and Y. Lee 2003) of HGLMs and frailty models.
The data set presented by Mantel, N. R. Bohidar and J. L. Ciminera (1977) is based on a
tumorigenesis study of 50 (\(q=50\)) litters of female rats. For each
litter, one rat was selected to receive the drug and the other two rats
were placebo-treated controls (\(n_{i}\equiv 3\)). Here each litter is
treated as a cluster. The survival time (time
) is the time to
development of tumor, measured in weeks. Death before occurrence of
tumor yields a right-censored observation; forty rats developed a tumor,
leading to censoring of about 73%. The survival times for rats in a
given litter may be correlated due to a random effect representing
shared genetic or environmental effects.
We fit frailty models with one covariate, rx (1 = drug; 0 = placebo),
using frailtyHL()
. Below, we present the code and results for the
lognormal frailty model with HL(1,1). The output from the R code shows
that the effect of rx is significant (t-value = 2.823 with p-value =
0.005). That is, the rx group has a significantly higher risk than in
control group. Here, the variance estimate of the frailty is
\(\hat{\alpha}=0.427\) (with SE = 0.423).
Note that although we report the SE of \(\alpha\), one should not use it
for testing the absence of frailty \(\alpha =0\) (Vaida and R. Xu 2000). Such
a null hypothesis is on the boundary of the parameter space, so that the
critical value of an asymptotic \((\chi _{0}^{2}+\chi _{1}^{2})/2\)
distribution is 2.71 at 5% significant level
(Lee, J. A. Nelder and Y. Pawitan 2006; Ha, R. Sylvester, C. Legrand and G. MacKenzie 2011). The
difference in deviance (based on REMPL) \(-2p_{\beta ,v}(h_{p})\) between
the Cox model without frailty and the lognormal frailty model is
\(364.15-362.56=1.59(<2.71)\), indicating that the frailty effect is
non-significant, i.e. \(\alpha =0\). Here, the results from fitting the
Cox model without frailty are available by adding the two arguments
varfixed = TRUE
and varinit = 0
in the frailtyHL()
: see below.
> library(survival)
> data(rats)
> frailtyHL(Surv(time,status) ~ rx + (1|litter),
+ data = rats,
+ varfixed = TRUE, varinit = 0)
:
iteration 4
:
convergence 4.801639e-09
1] "converged"
[1] "Results from the Cox model"
[1] "Number of data : "
[1] 150
[1] "Number of event : "
[1] 40
[1] "Model for conditional hazard : "
[Surv(time, status) ~ rx + (1 | litter)
1] "Method : HL(0,1)"
[1] "Estimates from the mean model"
[-value p-value
Estimate Std. Error t0.8982 0.3174 2.83 0.004655
rx 1] "Estimates from the dispersion model"
[
Estimate Std. Error"0" "NULL"
litter -2h0 -2*hp -2*p_b,v(hp)
1,] 363.69 363.69 364.15
[
cAIC mAIC rAIC1,] 365.69 365.69 364.15 [
> frailtyHL(Surv(time,status) ~ rx + (1|litter),
+ data = rats, RandDist = "Normal",
+ mord = 1, dord = 1)
:
iteration 87
:
convergence 9.97616e-07
1] "converged"
[1] "Results from the log-normal frailty model"
[1] "Number of data : "
[1] 150
[1] "Number of event : "
[1] 40
[1] "Model for conditional hazard : "
[Surv(time, status) ~ rx + (1 | litter)
1] "Method : HL(1,1)"
[1] "Estimates from the mean model"
[-value p-value
Estimate Std. Error t0.9107 0.3226 2.823 0.004754
rx 1] "Estimates from the dispersion model"
[
Estimate Std. Error0.4272 0.4232
litter -2h0 -2*hp -2*p_v(hp) -2*p_b,v(hp)
1,] 335.97 397.36 362.14 362.56
[
cAIC mAIC rAIC1,] 362.22 366.14 364.56 [
The code and results for the gamma frailty model with HL(1,2) are presented below. The output shows that these results are similar to those of the lognormal frailty, particularly for estimation of \(\beta\). The deviance difference (based on REMPL) between the Cox model and gamma frailty model using the second-order approximation \(-2s_{\beta ,v}(h_{p})\) is \(364.15-362.12=2.03(<2.71)\), again indicating the absence of frailty effect (i.e. \(\alpha =0\)) as evident in the lognormal frailty model.
> frailtyHL(Surv(time,status) ~ rx + (1|litter),
+ data = rats, RandDist = "Gamma",
+ mord = 1, dord = 2)
:
iteration 170
:
convergence 9.567765e-07
1] "converged"
[1] "Results from the gamma frailty model"
[1] "Number of data : "
[1] 150
[1] "Number of event : "
[1] 40
[1] "Model for conditional hazard : "
[Surv(time, status) ~ rx + (1 | litter)
1] "Method : HL(1,2)"
[1] "Estimates from the mean model"
[-value p-value
Estimate Std. Error t0.9126 0.3236 2.82 0.004806
rx 1] "Estimates from the dispersion model"
[
Estimate Std. Error0.5757 0.5977
litter -2h0 -2*hp -2*p_v(hp) -2*s_v(hp)
1,] 331.60 413.85 365.35 361.71
[-2*p_b,v(hp) -2*s_b,v(hp)
365.77 362.12
cAIC mAIC rAIC1,] 365.30 365.71 364.12 [
For the selection of a model among nested or non-nested models such as lognormal and gamma frailty models, we may use three Akaike information criteria (AIC) (Lee, J. A. Nelder and Y. Pawitan 2006; Ha, Y. Lee and G. MacKenzie 2007; Donohue, R. Overholser, R. Xu, and F. Vaida 2011) based on conditional likelihood, marginal likelihood and restricted likelihood, respectively, defined by \[\begin{aligned} \mathrm{cAIC} &= -2h_{0}+2\mathrm{df_{c}}, \\ \mathrm{mAIC} &= -2p_{v}(h_{p})+2\mathrm{df_{m}}, \\ \mathrm{rAIC} &= -2p_{\beta ,v}(h_{p})+2\mathrm{df_{r}}, \end{aligned}\] where \(h_{0}=\ell _{0}^{\ast }\) in (4), and \[\mathrm{df_{c}} = \mathrm{trace}\left\{ D^{-1}(h_{p},(\beta ,v)) D(h_{0},(\beta,v))\right\}\] is an ‘effective degrees of freedom adjustment’ for estimating the fixed and random effects, computed using the Hessian matrices \(D(h_{p},(\beta ,v))=-\partial ^{2}h_{p}/\partial (\beta ,v)^{2}\) and \(D(h_{0},(\beta ,v))=-\partial ^{2}h_{0}/\partial (\beta ,v)^{2}\). Note here that \(\mathrm{df_{m}}\) is the number of fixed parameters and \(\mathrm{df_{r}}\) is the number of dispersion parameters (Ha, Y. Lee and G. MacKenzie 2007). For calculation of the mAIC and rAIC of gamma frailty model using HL(0,2) or HL(1,2), we use the corresponding second-order approximations, defined by \(\mathrm{mAIC} =-2s_{v}(h_{p})+2\mathrm{df_{m}}\) and \(\mathrm{rAIC} =-2s_{\beta ,v}(h_{p})+2 \mathrm{df_{r}}\). We select a model to minimize the AIC values among models. If the AIC difference is larger than 1 the choice can be made (Sakamoto, M. Ishiguro and G. Kitagawa 1986). However, if the difference is less than 1 a simpler model can be selected by a parsimony principle (Donohue, R. Overholser, R. Xu, and F. Vaida 2011).
In the data set, in the Cox model cAIC=365.69, mAIC=365.69 and rAIC=364.15, and in the lognormal frailty model cAIC=362.22, mAIC=366.14 and rAIC=364.56, and in the gamma frailty model cAIC=365.30, mAIC=365.71 and rAIC=364.12. The likelihood tests based upon the REMPL showed the absence of frailty effect (\(\alpha =0\)), so that mAIC and rAIC of all the three models are similar. Thus, we may choose the parsimonious Cox model. However, the cAIC selects the lognormal model, indicating that this model could give better prediction.
The CGD data set presented by Fleming and D. P. Harrington (1991) consists of a
placebo-controlled randomized trial of gamma interferon (rIFN-g) in the
treatment of chronic granulomatous disease (CGD). 128 patients (id
)
from 13 centers (\(q_1=13, q_2=128\)) were tracked for around 1 year. The
number (i.e. cluster size) of patients per center ranged from 4 to 26.
The survival times (tstop
-tstart
) are the recurrent infection times
of each patient from the different centers. Censoring occurred at the
last observation for all patients, except one, who experienced a serious
infection on the day he left the study; in the CGD study about 63% of
the data were censored. The recurrent infection times for a given
patient are likely to be correlated. However, each patient belongs to
one of the 13 centers; hence, the correlation may also be attributed to
a center effect.
Ignoring important random components may render invalid many of the
traditional statistical analysis techniques. We fit a multilevel
lognormal frailty model with two frailties and a single covariate, treat
(rIFN-g, placebo), using frailtyHL()
. Here, the two frailties are
random center and patient effects. The code and results using HL(1,1)
are provided below. The output shows that the effect of treatment is
significant (t-value = -3.476 with p-value < 0.001), indicating that
rIFN-g significantly reduces the rate of serious infection in CGD
patients. The estimate of variance of patient frailty
(\(\hat{\alpha}_{2}=1.002\)) is considerably larger than variance of
center frailty (\(\hat{\alpha}_{1}=0.030\)), indicating that the
random-patient effect is more heterogeneous.
> library(survival)
> data(cgd)
> frailtyHL(Surv(tstop-tstart,status) ~ treat +
+ (1|center) + (1|id),
+ data = cgd,
+ RandDist = "Normal", mord = 1, dord = 1)
:
iteration 157
:
convergence 9.336249e-07
1] "converged"
[1] "Results from the log-normal frailty model"
[1] "Number of data : "
[1] 203
[1] "Number of event : "
[1] 76
[1] "Model for conditional hazard : "
[Surv(tstop-tstart,status)~treat+(1|center)+(1|id)
1] "Method : HL(1,1)"
[1] "Estimates from the mean model"
[-value p-value
Estimate Std. Error t-g -1.184 0.3407 -3.476 0.0005085
treatrIFN1] "Estimates from the dispersion model"
[
Estimate Std. Error0.02986 0.1572
center 1.00235 0.5089
id -2h0 -2*hp -2*p_v(hp) -2*p_b,v(hp)
1,] 603.30 853.66 692.63 692.95
[
cAIC mAIC rAIC1,] 684.92 698.63 696.95 [
For testing the absence of a random component (\(\alpha _{1}=0\) or \(\alpha _{2}=0\)), we use the deviance based on REMPL, \(-2p_{\beta ,v}(h_{p})\), and fit the following four models including the Cox model and three lognormal frailty models using HL(1,1) method,
Cox model without frailty \((\alpha _{1}=0,\alpha _{2}=0)\) has \(-2p_{\beta ,v}(h_{p})=707.48\),
model without patient effect \((\alpha_{1} >0, \alpha _{2}=0)\) has \(-2p_{\beta,v}(h_p)=703.66\),
model without center effect \((\alpha_{1}=0,\alpha_{2} >0)\) has \(-2p_{\beta,v}(h_p)=692.99\), and
multilevel model above requiring both patient and center effects \((\alpha_{1}>0,\alpha_{2}>0)\) has \(-2p_{\beta,v}(h_p)=692.95\).
The deviance difference between M3 and M4 is \(692.99-692.95=0.04\) , which is not significant at a 5% level (\(\chi _{1,0.10}^{2}=2.71\)), indicating the absence of the random-center effects, i.e. \(\alpha _{1}=0\). The deviance difference between M2 and M4 is 703.66-692.95=10.71, indicating that the random-patient effects are necessary, i.e. \(\alpha _{2}>0\). In addition, the deviance difference between M1 and M3 is 707.48-692.99=14.49, indicating that the random-patient effects are indeed necessary with or without random-center effects.
Let us choose a final model using information criteria. For M1 we have cAIC=708.68, mAIC=708.68 and rAIC=707.48; for M2 cAIC=702.96, mAIC=706.88 and rAIC=705.66; for M3 cAIC=684.84, mAIC=696.68 and rAIC=694.99; for M4 cAIC=684.92, mAIC=698.63 and rAIC=696.95. All of the three criteria choose M3 in the CGD data set.
Using the examples in the previous section, we compare the outcomes from
frailtyHL and other packages. We consider the three functions (
coxph()
, coxme()
and phmm()
) for the lognormal frailty model and
the coxph()
function for the gamma frailty model.
Example 1: Rat data The codes of coxph()
, coxme()
and phmm()
for fitting lognormal frailty model are, respectively, as follows:
>coxph(Surv(time, status) ~ rx +
+ frailty(litter, dist = "gauss"),
+ method = "breslow", rats)
> coxme(Surv(time, status) ~ rx + (1|litter),
+ ties = "breslow", rats)
> phmm(Surv(time, status) ~ rx + (1|litter),
+ data = rats, Gbs = 2000, Gbsvar = 3000,
+ VARSTART = 1, NINIT = 10,
+ MAXSTEP = 200, CONVERG=90)
Table 3 summarizes the estimation results. Even though cluster size
\(n_{i}\equiv 3\) is not large, the results are similar. For example, MPL
and ML estimates for \(\alpha\) from coxme()
and phmm()
are somewhat
different in Table 2 when \(n_{i}\equiv 2\), while they become similar in
Table 3.
Next, the code of coxph()
for fitting the gamma frailty model is
below:
> coxph(Surv(time, status) ~ rx +
+ frailty(litter, dist = "gamma"),
+ method = "breslow", rats)
The results of frailtyHL()
(HL(0,2), HL(1,2)) and coxph()
for gamma
frailty are also presented in Table 3. For the estimation of \(\beta\)
both results from frailtyHL()
and coxph()
are similar, but for
\(\alpha\) they are somewhat different. That is, our REMPL estimates from
frailtyHL()
(\(\hat{\alpha}=0.575\) with HL(0,2) and \(\hat{\alpha} =0.576\) with HL(1,2)) are somewhat larger than the ML estimates from
coxph()
(\(\hat{\alpha}=0.474\) with Breslow method and \(\hat{\alpha} =0.499\) with Efron method).
Rx | Litter | |
Method | \(\hat \beta\) (SE) | \(\hat \alpha\) (SE) |
lognormal model | ||
HL(0,1) | 0.906 (0.323) | 0.427 (0.423) |
HL(1,1) | 0.911 (0.323) | 0.427 (0.423) |
coxph() |
0.905 (0.322) | 0.395 ( – ) |
(Breslow) | ||
coxph() |
0.913 (0.323) | 0.412 ( – ) |
(Efron) | ||
coxme() |
0.905 (0.322) | 0.406 ( – ) |
(Breslow) | ||
coxme() |
0.913 (0.323) | 0.426 ( – ) |
(Efron) | ||
phmm() |
0.920 (0.326) | 0.449 ( – ) |
gamma model | ||
HL(0,2) | 0.908 (0.324) | 0.575 (0.598) |
HL(1,2) | 0.913 (0.324) | 0.576 (0.598) |
coxph() |
0.906 (0.323) | 0.474 ( – ) |
(Breslow) | ||
coxph() |
0.914 (0.323) | 0.499 ( – ) |
(Efron) |
Example 2: CGD data The code of the coxme()
function for fitting
multilevel lognormal frailty model is as follows:
> coxme(Surv(tstop - tstart,status) ~
+ treat + (1|center) + (1|id),
+ ties = "breslow", cgd)
The results of frailtyHL()
(HL(0,1), HL(1,1)) and coxme()
are
summarized in Table 4. The results from HL and PPL methods for frailty
parameters become more similar because the cluster sizes (the number of
patients from different centers) are somewhat large, ranging from 4 to
26.
Treat | Center | Patient | |
Method | \(\hat \beta\) (SE) | \(\hat \alpha_1\) (SE) | \(\hat \alpha_2\) (SE) |
lognormal model | |||
HL(0,1) | -1.074 | 0.026 | 0.982 |
(0.335) | (0.153) | (0.501) | |
HL(1,1) | -1.184 | 0.030 | 1.002 |
(0.341) | (0.157) | (0.509) | |
coxme() |
-1.074 | 0.033 | 0.939 |
(Breslow) | (0.333) | ( – ) | ( – ) |
coxme() |
-1.074 | 0.032 | 0.947 |
(Efron) | (0.333) | ( – ) | ( – ) |
The h-likelihood method offers novel possibilities to fit various models with random effects. The frailtyHL package for frailty models eliminates the nuisance parameters \(\lambda_{0}\) in the h-likelihood (3) by profiling. Such models have important applications in multi-center clinical study (Vaida and R. Xu 2000), meta analysis (Rondeau, S. Michiels, B. Liquet and J. P. Pignon 2008), and genetic analysis (Lee, J. A. Nelder and Y. Pawitan 2006). Therefore, this package can be potentially adopted by statisticians in several different fields.
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2009-0088978, No. 2010-0021165 and No. 2010-0011372).
frailtyHL, survival, coxme, phmm, frailtypack, hglm, HGLMMM, dhglm
Agriculture, ClinicalTrials, Econometrics, MixedModels, Spatial, Survival
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ha, et al., "frailtyHL: A Package for Fitting Frailty Models with H-likelihood", The R Journal, 2012
BibTeX citation
@article{RJ-2012-010, author = {Ha, Il Do and Noh, Maengseok and Lee, Youngjo}, title = {frailtyHL: A Package for Fitting Frailty Models with H-likelihood}, journal = {The R Journal}, year = {2012}, note = {https://rjournal.github.io/}, volume = {4}, issue = {2}, issn = {2073-4859}, pages = {28-37} }