28 CONTRIBUTED RESEARCH ARTICLES frailtyHL: A Package for Fitting Frailty Models with H-likelihood

We present the frailtyHL package for fitting semi-parametric frailty models using h- likelihood. This package allows lognormal or gamma frailties for random-effect distribution, and it fits shared or multilevel frailty models for correlated survival data. Functions are provided to format and summarize the frailtyHL results. The estimates of fixed effects and frailty parame- ters and their standard errors are calculated. We illustrate the use of our package with three well- known data sets and compare our results with various alternative R-procedures.


Introduction
Frailty models with a non-parametric baseline hazard have been widely adopted for the analysis of survival data (Hougaard, 2000;Duchateau and Janssen, 2008).The frailtyHL package (Ha et al., 2012) implements h-likelihood procedures (Ha et al., 2001;Ha andLee, 2003, 2005) for frailty models.The package fits Cox proportional hazards models (PHMs) with random effects, or frailties.The lognormal or gamma distribution can be adopted as the frailty distribution, corresponding to the normal or log-gamma distribution for the log frailties.H-likelihood obviates the need for marginalization over the frailty distribution, thereby providing a statistically efficient procedure for various random-effect models (Lee et al., 2006).
The main function is frailtyHL().For instance, > frailtyHL(Surv(time, status) ~x + (1|id), + RandDist = "Normal", + mord = 0, dord = 1, + Maxiter = 200, convergence = 10^-6, + varfixed = FALSE, varinit = 0.1) fits a lognormal frailty model.The first argument is a formula object, with the response on the left of a ~operator, and the terms for the fixed and random effects on the right.The response is a survival object as returned by the Surv function (Therneau, 2011).Here, time and status denote survival time and censoring indicator taking value 1 or 0 for uncensored or censored observations, respectively; x denotes a fixed covariate and id denotes the subject identifier.The expression (1|id) specifies a random intercept model ((x|id) would specify a random intercept, random slope model).The parameters mord and dord are the orders of Laplace approximations to fit the mean parameters (mord = 0 or 1) and the dispersion parameters (dord = 1 or 2), respectively.The Maxiter parameter specifies the maximum number of iterations and convergence specifies the tolerance of the convergence criterion.If varfixed is specified as TRUE (or FALSE), the value of one or more of the variance terms for the frailties is fixed (or estimated) with starting value (e.g.0.1) given by varinit.
Previously, frailty models have been implemented in several R functions such as the coxph() function in the survival package (Therneau, 2010) and the coxme() function in the coxme package (Therneau, 2011), based on penalized partial likelihood (PPL), the phmm() function in the phmm package (Donohue and Xu, 2012), based on a Monte Carlo EM (MCEM) method, and the frailtyPenal() function in the frailtypack package (Gonzalez et al., 2012), based on penalized marginal likelihood.The phmm package fits one-component frailty models, although it does allow multivariate frailties.The coxme() function can fit the multi-component model as shown in Example 2. Results from frailtyHL are compared with those from survival, coxme and phmm.
Recently, the h-likelihood procedures of Lee and Nelder (1996) for fitting hierarchical generalized linear models (HGLMs) have been implemented using the hglm() function in the hglm package (Alam et al., 2010), the HGLMfit() function in the HGLMMM package (Molas, 2010) and the dhglmfit() function in the dhglm package (Noh and Lee, 2011).The frailtyHL package is for fitting frailty models with a non-parametric baseline hazard, providing estimates of fixed effects, random effects, and variance components as well as their standard errors.In addition, it provides a statistical test for the variance components of frailties and also three AIC criteria for the model selection.

Frailty models
The frailtyHL package makes it possible to 1. fit models with log-normal and gamma frailty distributions and 2. estimate variance components when the frailty structure is shared or nested.
For illustration, we present two models below and show how to fit these models using frailtyHL() in the Examples section.

One-component frailty models
Suppose that data consist of right censored time-toevent observations from q subjects (or clusters), with The R Journal Vol.4/2, December 2012 ISSN 2073-4859 n i observations each; i = 1, . . ., q.Let T ij be the survival time for the jth observation of the ith subject; j = 1, . . ., n i .Here, n = ∑ i n i is the total sample size and n i is the cluster size.Let C ij be the corresponding censoring time and let be the observable random variables, where I(•) is the indicator function.
Given the common unobserved frailty for the ith subject u i , the conditional hazard function of T ij is of the form where λ 0 (•) is an unspecified baseline hazard function and β = (β 1 , . . ., β p ) T is a vector of regression parameters for the fixed covariates x ij .Here, the term x T ij β does not include an intercept term because of identifiability.Assume that the frailties u i are independent and identically distributed (i.i.d) random variables with a frailty parameter α; the gamma and lognormal frailty models assume gamma and lognormal distributions for u i , respectively.That is, our package allows • gamma frailty with E(u i ) = 1 and var(u i ) = α, and The model represented by ( 1) is known as a shared or one-component frailty model (Hougaard, 2000).

Multi-component frailty models
We can fit a multi-component model as below (Ha et al., 2007): X is the n × p model matrix, Z (r) (r = 1, 2, . . ., k) are the n × q r model matrices corresponding to the q r × 1 frailties v (r) , and v (r) and v (l) are independent for r = l.
For example, the CGD data (Fleming and Harrington, 1991) have a multilevel structure in which patients nested within centers have recurrent infection times.
Later, we analyze these data using model (2) with k = 2. Here, the frailty structures are: where A r = I r (r = 1, 2) are the q r × q r identity matrices, and q 1 and q 2 are the number of centers and patients, respectively.Notice that the corresponding Z (1) to v (1) (or Z (2) to v (2) ) is, respectively, a matrix of indicator variables such that Z (1) st = 1 (or Z (2) st = 1) if observation s is a member of center (or patient) t and 0 otherwise (Therneau and Grambsch, 2000).This is called the multilevel frailty model (Yau, 2001;Ha et al., 2007).

H-likelihood theory
The h-likelihood h (Ha et al., 2001) for frailty model (1) is defined by is the sum of conditional log densities for y ij and δ ij given u i , and is the sum of log densities for v i = log u i with parameter α.Here, η ij = x T ij β + v i is the linear predictor for the hazards, and is the baseline cumulative hazard function.
The functional form of λ 0 (t) in (1) is unknown; following Breslow (1972), we consider Λ 0 (t) to be a step function with jumps at the observed event times: where y (k) is the kth (k = 1, . . ., l) smallest distinct event time among the y ij 's, and λ 0k = λ 0 (y (k) ).However, the dimension of λ 0 = (λ 01 , . . ., λ 0l ) T increases with the sample size n.For inference, Ha et al. (2001) proposed the use of the profile h-likelihood with λ 0 eliminated, h * ≡ h| where does not depend on λ 0 , and are solutions of the estimating equations, ∂h/∂λ 0k = 0, for k = 1, . . ., l.Here, d (k) is the number of events at y (k) and is the risk set at y (k) .Therneau and Grambsch (2000) and Ripatti and Palmgren (2000) proposed an h-likelihood (3), while using the partial likelihood (Breslow, 1974) for 0 .They call it the penalized partial likelihood (PPL) h p , defined by Furthermore, Ha et al. (2001) and Ha et al. (2010) have shown that h * is proportional to the PPL h p because where 1} is a constant that does not depend upon unknown parameters.Notice here that PPL h p does not depend on nuisance parameters λ 0 .Thus, Lee and Nelder's (1996;2001) h-likelihood procedure for HGLMs can be directly used to fit frailty models based on h p (Ha et al., 2010).Lee andNelder (1996, 2001) have proposed the use of the Laplace approximation based on the h-likelihood when the marginal likelihood, m = log{ exp(h)dv}, is hard to obtain.To reduce the bias in small cluster sizes higher-order approximations for the mean (β) and the frailty variance (α) have been developed.The lower-order approximation is computationally efficient, but could have large biases when cluster sizes are small (Ha andLee, 2003, 2005;Ha et al., 2010).The h-likelihood procedures use the Breslow method for handling tied event times, while the PPL procedures allow the Efron method.For estimating β, the h-likelihood methods allow the Laplace approximation p v (h p ) to a marginal partial likelihood m p = log{ exp(h p )dv}, but the PPL procedures do not.For estimating α, the PPL methods use adjusted profile h-likelihoods p v (h p ) and p β,v (h p ) which give maximum partial likelihood (MPL) and restricted maximum partial likelihood (REMPL) estimators, respectively.In contrast, the h-likelihood method uses the restricted partial likelihood (based upon the firstorder Laplace approximation p β,v (h p ) or the secondorder Laplace approximation s β,v (h p )) for REMPL estimators.Here p v (h p ) and p β,v (h p ) are defined as follows:

Review of estimation procedures
where D(h p , v) = −∂ 2 h p /∂v 2 and v solves ∂h p /∂v = 0, which is the first-order Laplace approximation of m p , and Cox and Reid (1987) adjusted profile marginal likelihood, eliminating fixed effects β by conditioning their asymptotic sufficient statistics β, in addition to eliminating random effects v by the first-order Laplace approximation (Ha and Lee, 2003;Lee et al., 2006).The corresponding second-order approximation is To reduce the computational burden Ha et al. (2010) used F(h) instead of F(h p ).We recommend the use of HL(1,1) for the lognormal frailty and of HL(1,2) for the gamma frailty.However, HL(0,1) for the lognormal frailty and HL(0,2) for the gamma frailty often perform well if α is not large.Note that the variance matrices of τ = ( β, v) and α are directly obtained from the Hessian matrices, {−∂ 2 h p /∂τ 2 } −1 and {−∂ 2 p β,v (h p )/∂α 2 } −1 , respectively; the frailtyHL package provides the standard errors (SEs) of α as well as β.For the use of standard errors of v − v, see Lee and Ha (2010), Lee et al (2011) and Ha et al. (2011).
Based on the PPL methods, the coxph() and coxme() functions, respectively, implement REMPL and MPL estimators for α in the lognormal frailty model, and the coxph() function the maximum likelihood (ML) estimators, maximizing the marginal likelihood m, for α in the gamma frailty model.For comparison, we present the Breslow and Efron methods for handling ties in survival times in the coxph() and coxme() functions; Therneau (2010) recommended the Efron method.For the lognormal frailty the ML estimator maximizing m is available via the phmm() function, but care must be taken to ensure that the MCEM algorithm has converged (Donohue and Xu, 2012).However, the ML estimator can be biased when the number of nuisance parameters increases with the sample size (Ha et al., 2010).
Furthermore, for the lognormal frailty the coxph() function uses the existing codes in linear mixed models so that it misses the ∂ v/∂α term in solving the score equation ∂p β,v (h p )/∂α = 0; this can lead to an underestimation of the parameters, especially when the number of subjects q is large or censoring is high (Lee et al., 2006;Ha et al., 2010).To overcome this problem, in gamma frailty models Therneau and Grambsch (2000) develop the code for the ML estimator for α.
Step 3: Repeat Step 2 until the maximum absolute difference between the previous and current estimates for (β, v) and α is less than 10 −6 .
After convergence, we compute the estimates of the standard errors of β and α.

Illustration: kidney infection data
To demonstrate differences of various estimation methods in small cluster size n i ≡ 2, we use the kidney infection data (McGilchrist and Aisbett, 1991).
The data consist of times until the first and second recurrences (n i ≡ 2 ) of kidney infection in 38 (q = 38) patients using a portable dialysis machine.Each survival time (time) is the time until infection since the insertion of the catheter.The survival times for the same patient are likely to be related because of a shared frailty describing the common patient's effect.
The catheter is later removed if infection occurs and can be removed for other reasons, which we regard as censoring; about 24% of the data were censored.
For the gamma frailty, coxph() uses the ML procedure, but it still gives smaller estimate for α than the REMPL (h-likelihood) procedures.Compared with the h-likelihood methods, PPL methods are computationally more efficient, but could have larger biases (Ha et al., 2010).

Potential future developments
The current version of the frailtyHL package allows multi-component (multilevel) frailties.Allowance for correlation (Ha et al., 2011) between random effects, for example correlations between random center effect and random treatment effect, is currently in progress.Other developments include dispersion frailty models based on double HGLMs (Lee and Nelder, 2006), which allow fixed and random effects in dispersion parts (i.e.variance of random effects) of the model as well as the hazard functions, and joint modelling (Ha et al, 2003) of HGLMs and frailty models.

Examples Example 1: Lognormal and gamma frailty models on rat data
The data set presented by Mantel et al. (1977) is based on a tumorigenesis study of 50 (q = 50) litters of female rats.For each litter, one rat was selected to receive the drug and the other two rats were placebotreated controls (n i ≡ 3).Here each litter is treated as a cluster.The survival time (time) is the time to development of tumor, measured in weeks.Death before occurrence of tumor yields a right-censored observation; forty rats developed a tumor, leading to censoring of about 73%.The survival times for rats in a given litter may be correlated due to a random effect representing shared genetic or environmental effects.
We fit frailty models with one covariate, rx (1 = drug; 0 = placebo), using frailtyHL().Below, we present the code and results for the lognormal frailty model with HL(1,1).The output from the R code shows that the effect of rx is significant (t-value = 2.823 with p-value = 0.005).That is, the rx group has a significantly higher risk than in control group.Here, the variance estimate of the frailty is α = 0.427 (with SE = 0.423).
Note that although we report the SE of α, one should not use it for testing the absence of frailty α = 0 (Vaida and Xu, 2000).Such a null hypothesis is on the boundary of the parameter space, so that the critical value of an asymptotic (χ 2 0 + χ 2 1 )/2 distribution is 2.71 at 5% significant level (Lee et al., 2006;Ha et al., 2011).The difference in deviance (based on REMPL) −2p β,v (h p ) between the Cox model without frailty and the lognormal frailty model is 364.15 − 362.56 = 1.59(< 2.71), indicating that the frailty effect is non-significant, i.e. α = 0. Here, the results from fitting the Cox model without frailty are available by adding the two arguments varfixed = TRUE and varinit = 0 in the frailtyHL(): see below.
> library(survival) > data(rats) > frailtyHL (Surv(time,status)  The code and results for the gamma frailty model with HL(1,2) are presented below.The output shows that these results are similar to those of the lognormal frailty, particularly for estimation of β.The deviance difference (based on REMPL) between the Cox model and gamma frailty model using the second-order approximation −2s β,v (h p ) is 364.15 − 362.12 = 2.03(< 2.71), again indicating the absence of frailty effect (i.e.α = 0) as evident in the lognormal frailty model.> frailtyHL(Surv(time,status) ~rx + (1|litter), + data = rats, RandDist = "Gamma", For the selection of a model among nested or non-nested models such as lognormal and gamma frailty models, we may use three Akaike information criteria (AIC) (Lee et al., 2006;Donohue et al., 2011;Ha et al., 2007) based on conditional likelihood, marginal likelihood and restricted likelihood, respectively, defined by where h 0 = * 0 in (4), and is an 'effective degrees of freedom adjustment' for estimating the fixed and random effects, computed using the Hessian matrices Note here that df m is the number of fixed parameters and df r is the number of dispersion parameters (Ha et al., 2007).For calculation of the mAIC and rAIC of gamma frailty model using HL(0,2) or HL(1,2), we use the corresponding second-order approximations, defined by mAIC = −2s v (h p ) + 2df m and rAIC = −2s β,v (h p ) + 2df r .We select a model to minimize the AIC values among models.If the AIC difference is larger than 1 the choice can be made (Sakamoto et al., 1986).However, if the difference is less than 1 a simpler model can be selected by a parsimony principle (Donohue et al., 2011).
In the data set, in the Cox model cAIC=365.69,mAIC=365.69and rAIC=364.15,and in the lognormal frailty model cAIC=362.22,mAIC=366.14and rAIC=364.56,and in the gamma frailty model cAIC=365.30,mAIC=365.71and rAIC=364.12.The likelihood tests based upon the REMPL showed the absence of frailty effect (α = 0), so that mAIC and rAIC of all the three models are similar.Thus, we may choose the parsimonious Cox model.However, the cAIC selects the lognormal model, indicating that this model could give better prediction.

Example 2: Multilevel frailty models on CGD infection data
The CGD data set presented by Fleming and Harrington (1991) consists of a placebo-controlled randomized trial of gamma interferon (rIFN-g) in the treatment of chronic granulomatous disease (CGD).128 patients (id) from 13 centers (q 1 = 13, q 2 = 128) were tracked for around 1 year.The number (i.e.cluster size) of patients per center ranged from 4 to 26.The survival times (tstop-tstart) are the recurrent infection times of each patient from the different centers.Censoring occurred at the last observation for all patients, except one, who experienced a serious infection on the day he left the study; in the CGD study about 63% of the data were censored.The recurrent infection times for a given patient are likely to be correlated.However, each patient belongs to one of the 13 centers; hence, the correlation may also be attributed to a center effect.
Ignoring important random components may render invalid many of the traditional statistical analysis techniques.We fit a multilevel lognormal frailty model with two frailties and a single covariate, treat (rIFN-g, placebo), using frailtyHL().Here, the two frailties are random center and patient effects.The code and results using HL(1,1) are provided below.The output shows that the effect of treatment is significant (t-value = -3.476with p-value < 0.001), indicating that rIFN-g significantly reduces the rate of serious infection in CGD patients.For testing the absence of a random component (α 1 = 0 or α 2 = 0), we use the deviance based on REMPL, −2p β,v (h p ), and fit the following four models including the Cox model and three lognormal frailty models using HL(1,1) method, The deviance difference between M3 and M4 is 692.99 − 692.95 = 0.04 , which is not significant at a 5% level (χ 2 1,0.10 = 2.71), indicating the absence of the random-center effects, i.e. α 1 = 0.The deviance difference between M2 and M4 is 703.66-692.95=10.71,indicating that the random-patient effects are necessary, i.e. α 2 > 0. In addition, the deviance difference between M1 and M3 is 707.48-692.99=14.49,indicating that the random-patient effects are indeed necessary with or without random-center effects.

Comparison of results with alternative procedures
Using the examples in the previous section, we compare the outcomes from frailtyHL and other packages.We consider the three functions ( coxph(), coxme() and phmm()) for the lognormal frailty model and the coxph() function for the gamma frailty model.Table 3 summarizes the estimation results.Even though cluster size n i ≡ 3 is not large, the results are similar.For example, MPL and ML estimates for α from coxme() and phmm() are somewhat different in Table 2 when n i ≡ 2, while they become similar in Table 3.
The R Journal Vol.4/2, December 2012 ISSN 2073-4859 The results of frailtyHL()(HL(0,1), HL(1,1)) and coxme() are summarized in Table 4.The results from HL and PPL methods for frailty parameters become more similar because the cluster sizes (the number of patients from different centers) are somewhat large, ranging from 4 to 26.

Summary
The h-likelihood method offers novel possibilities to fit various models with random effects.The frailtyHL package for frailty models eliminates the nuisance parameters λ 0 in the h-likelihood (3) by profiling.Such models have important applications in multi-center clinical study (Vaida and Xu, 2000), meta analysis (Rondeau et al., 2008), and genetic analysis (Lee et al., 2006).Therefore, this package can be potentially adopted by statisticians in several different fields.

Table 1
shows historical developments of estimating criteria for frailty models.The frailtyHL() function provides estimators based on the h-likelihood.As the orders in mord and dord increase, the bias of estimator is reduced, but the calculation becomes computationally intensive due to the extra terms.

Table 2 :
Comparison of different estimation methods for the kidney infection data

Table 3 :
Comparison of different estimation methods for the rat data

Table 4 :
Comparison of different estimation methods for the CGD data