We present the hglm package for fitting hierarchical generalized linear models. It can be used for linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. Fixed effects can also be fitted in the dispersion part of the model.
The hglm package (Alam, L. Ronnegard, and X. Shen 2010) implements the estimation algorithm for hierarchical generalized linear models [HGLM; Lee and J. A. Nelder (1996)]. The package fits generalized linear models [GLM; McCullagh and J. A. Nelder (1989)] with random effects, where the random effect may come from a distribution conjugate to one of the exponential-family distributions (normal, gamma, beta or inverse-gamma). The user may explicitly specify the design matrices both for the fixed and random effects. In consequence, correlated random effects, as well as random regression models can be fitted. The dispersion parameter can also be modeled with fixed effects.
The main function is hglm()
and the input is specified in a similar
manner as for glm()
. For instance,
R> hglm(fixed = y ~ week, random = ~ 1|ID,
family = binomial(link = logit))
fits a logit model for y
with week
as fixed effect and ID
representing the clusters for a normally distributed random intercept.
Given an hglm
object, the standard generic functions are print()
,
summary()
and plot()
.
Generalized linear mixed models (GLMM) have previously been implemented
in several R functions, such as the lmer()
function in the
lme4 package (Bates and M. Maechler 2010) and
the glmmPQL()
function in the
MASS package (Venables and B. D. Ripley 2002). In
GLMM, the random effects are assumed to be Gaussian whereas the hglm()
function allows other distributions to be specified for the random
effect. The hglm()
function also extends the fitting algorithm of the
dglm package (Dunn and G. K. Smyth 2009) by
including random effects in the linear predictor for the mean, i.e. it
extends the algorithm so that it can cope with mixed models. Moreover,
the model specification in hglm()
can be given as a formula or
alternatively in terms of y
, X
, Z
and X.disp
. Here y
is the
vector of observed responses, X
and Z
are the design matrices for
the fixed and random effects, respectively, in the linear predictor for
the means and X.disp
is the design matrix for the fixed effects in the
dispersion parameter. This enables a more flexible modeling of the
random effects than specifying the model by an R formula. Consequently,
this option is not as user friendly but gives the user the possibility
to fit random regression models and random effects with known
correlation structure.
The hglm package produces estimates of fixed effects, random effects and variance components as well as their standard errors. In the output it also produces diagnostics such as deviance components and leverages.
The hglm package makes it possible to
include fixed effects in a model for the residual variance,
fit models where the random effect distribution is not necessarily Gaussian,
estimate variance components when we have correlated random effects.
Below we describe three models that can be fitted using hglm()
, which
illustrate these three points. Later, in the Examples section, five
examples are presented that include the R syntax and output for the
hglm()
function.
We start by considering a normal-normal model with heteroscedastic
residual variance. In biology, for instance, this is important if we
wish to model a random genetic effect (e.g., Rönnegård and Ö. Carlborg 2007) for a trait
For the response
This model cannot be fitted with the
dglm package, for instance,
because we have random effects in the mean part of the model. It is also
beyond the scope of the lmer()
function since we allow a model for the
residual variance.
The implementation in hglm()
for this model is demonstrated in Example
2 in the Examples section below.
For dependent count data it is common to model a Poisson distributed
response with a gamma distributed random effect (Lee, J. A. Nelder, and Y. Pawitan 2006). If we assume no
overdispersion conditional on
This model can also be fitted with the
hglm package, since it
extends existing GLMM functions (e.g. lmer()
) to allow a non-normal
distribution for the random effect. Later on, in Example 3, we show the
hglm()
code used for fitting a gamma-Poisson model with fixed effects
included in the dispersion parameter.
In animal breeding it is important to estimate variance components prior
to ranking of animal performances (Lynch and B. Walsh 1998). In such models the genetic
effect of each animal is modeled as a level in a random effect and the
correlation structure
This may be reformulated as (see Lee, J. A. Nelder, and Y. Pawitan 2006; Rönnegård and Ö. Carlborg 2007)
Thus the model can be fitted using the hglm()
function with a
user-specified input matrix Z
(see R code in Example 4 below).
The fitting algorithm is described in detail in Lee, J. A. Nelder, and Y. Pawitan (2006) and is summarized
as follows. Let
Initialize starting values.
Construct an augmented model with response
Use a GLM to estimate
Use a gamma GLM to estimate
Use a similar GLM as in Step 4 to estimate
Iterate between steps 3-5 until convergence.
For a more detailed description of the algorithm in a particular context, see below.
Let
We follow the notation of Lee and J. A. Nelder (1996), which is based on the GLM
terminology by McCullagh and J. A. Nelder (1989). We also follow the likelihood approach where the
model is described in terms of likelihoods. The conditional
(log-)likelihood for
In this section we describe the fitting algorithm in detail for a linear mixed model where fixed effects are included in the model for the residual variance. The extension to distributions other than Gaussian is described at the end of the section.
Lee and J. A. Nelder (1996) showed that linear mixed models can be fitted using a
hierarchy of GLM by using an augmented linear model. The linear mixed
model
where
The estimates from weighted least squares are given by:
The two variance components are estimated iteratively by applying a
gamma GLM to the residuals
A gamma GLM is used to fit the dispersion part of the model with
response
The algorithm iterates by updating both
For a non-Gaussian response variable
There are two important classes of models that can be fitted in
hglm: GLMM and conjugate
HGLM. GLMMs have Gaussian random effects. Conjugate HGLMs have been
commonly used partly due to the fact that explicit formulas for the
marginal likelihood exist. HGLMs may be used to fit models in survival
analysis (frailty models), where for instance the complementary-log-log
link function can be used on binary responses (see e.g., Carling, L. Rönnegård, and K. Roszbach 2004). The
gamma distribution plays an important role in modeling responses with a
constant coefficient of variation (see Chapter 8 in McCullagh and J. A. Nelder 1989). For such
responses with a gamma distributed random effect we have a gamma-gamma
model. A summary of the most important models is given in Tables
1 and 3. Note that the
random-effect distribution can be an arbitrary conjugate
exponential-family distribution. For the specific case where the
random-effect distribution is a conjugate to the distribution of
Model name | Link |
Link |
||
---|---|---|---|---|
Linear mixed model | Gaussian | identity | Gaussian | identity |
Binomial conjugate | Binomial | logit | Beta | logit |
Binomial GLMM | Binomial | logit | Gaussian | identity |
Binomial frailty | Binomial | comp-log-log | Gamma | log |
Poisson GLMM | Poisson | log | Gaussian | identity |
Poisson conjugate | Poisson | log | Gamma | log |
Gamma GLMM | Gamma | log | Gaussian | identity |
Gamma conjugate | Gamma | inverse | Inverse-Gamma | inverse |
Gamma-Gamma | Gamma | log | Gamma | log |
Model name | Setting for family argument |
Setting for rand.family argument |
---|---|---|
Linear mixed model |
gaussian(link = identity) |
gaussian(link = identity) |
Beta-Binomial | binomial(link = logit) |
Beta(link = logit) |
Binomial GLMM | binomial(link = logit) |
gaussian(link = identity) |
Binomial frailty | binomial(link = cloglog) |
Gamma(link = log) |
Poisson GLMM | poisson(link = log) |
gaussian(link = identity) |
Poisson frailty | poisson(link = log) |
Gamma(link = log) |
Gamma GLMM | Gamma(link = log) |
gaussian(link = identity) |
Gamma conjugate | Gamma(link = inverse) |
inverse.gamma(link = inverse) |
Gamma-Gamma | Gamma(link = log) |
Gamma(link = log) |
hglm() code for a linear mixed model is |
hglm(family = gaussian(link = identity), rand.family = gaussian(link = identity), ...) |
In the current version of hglm()
it is possible to include a single
random effect in the mean part of the model. An important development
would be to include several random effects in the mean part of the model
and also to include random effects in the dispersion parts of the model.
The latter class of models is called Double HGLM and has been shown to
be a useful tool for modeling heavy tailed distributions (Lee and J. A. Nelder 2006).
The algorithm of hglm()
gives true marginal likelihood estimates for
the fixed effects in conjugate HGLM (Lee and J. A. Nelder 1996 629), whereas for
other models the estimates are approximated. Lee and co-workers (see Lee, J. A. Nelder, and Y. Pawitan 2006 and references therein) have developed higher-order
approximations, which are not implemented in the current version of the
hglm package. For such
extensions, we refer to the commercially available GenStat software
(Payne, D. A. Murray, S. A. Harding, D. B. Baird, and D. M. Soutar 2007), the recently available R package
HGLMMM (Molas 2010) and also
to coming updates of hglm.
The output from the hglm()
function for a linear mixed model is
compared to the results from the lme()
function in the
nlme (Pinheiro, D. Bates, S. DebRoy, D. Sarkar, and the R Core team 2009) package
using simulated data. In the simulated data there are five clusters with
20 observations in each cluster. For the mean part of the model, the
simulated intercept value is
Both functions produce the same estimate of the fixed intercept effect
of 0.1473 (s.e. 0.16) and also the same variance component estimates.
The summary.hglm()
function gives the estimate of the variance
component for the random intercept (0.082) as well as the residual
variance (0.84). It also gives the logarithm of the variance component
estimates together with standard errors below the lines
Model estimates for the dispersion term
and
Dispersion model for the random effects
. The lme()
function gives
the square root of the variance component estimates.
The model diagnostics produced by the plot.hglm
function are shown in
Figures 1 and 2. The data are completely
balanced and therefore produce equal leverages (hatvalues) for all
observations and also for all random effects (Figure 1).
Moreover, the assumption of the deviance components being gamma
distributed is acceptable (Figure 2).
The R code and output for this example is as follows:
R> set.seed(123)
R> n.clus <- 5 #No. of clusters
R> n.per.clus <- 20 #No. of obs. per cluster
R> sigma2_u <- 0.2 #Variance of random effect
R> sigma2_e <- 1 #Residual variance
R> n <- n.clus*n.per.clus
R> X <- matrix(1, n, 1)
R> Z <- diag(n.clus)%x%rep(1, n.per.clus)
R> a <- rnorm(n.clus, 0, sqrt(sigma2_u))
R> e <- rnorm(n, 0, sqrt(sigma2_e))
R> mu <- 0
R> y <- mu + Z%*%a + e
R> lmm <- hglm(y = y, X = X, Z = Z)
R> summary(lmm)
R> plot(lmm)
Call:
hglm.default(X = X, y = y, Z = Z)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:[1] 0.8400608
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
-0.1743 0.1441
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.08211
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-2.4997 0.8682
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 0.1473 0.1580 0.933 0.353
Note: P-values are based on 96 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] -0.3237 0.1971
[2,] -0.0383 0.1971
[3,] 0.3108 0.1971
[4,] -0.0572 0.1971
[5,] 0.1084 0.1971
EQL estimation converged in 5 iterations.
R> #Same analysis with the lme function
R> library(nlme)
R> clus <- rep(1:n.clus,
+ rep(n.per.clus, n.clus))
R> summary(lme(y ~ 0 + X,
+ random = ~ 1 | clus))
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
278.635 286.4203 -136.3175
Random effects:
Formula: ~1 | clus
(Intercept) Residual
StdDev: 0.2859608 0.9166
Fixed effects: y ~ 0 + X
Value Std.Error DF t-value p-value
X 0.1473009 0.1573412 95 0.9361873 0.3516
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.5834807 -0.6570612 0.0270673 0.6677986 2.1724148
Number of Observations: 100
Number of Groups: 5
Here, a heteroscedastic residual variance is added to the simulated data
from the previous example. Given the explanatory variable
R> beta.disp <- 1
R> X_d <- matrix(1, n, 2)
R> X_d[,2] <- rbinom(n, 1, .5)
R> colnames(X_d) <- c("Intercept", "x_d")
R> e <- rnorm(n, 0,
+ sqrt(sigma2_e*exp(beta.disp*X_d[,2])))
R> y <- mu + Z%*%a + e
R> summary(hglm(y = y, X = X, Z = Z,
+ X.disp = X_d))
Call:
hglm.default(X = X, y = y, Z = Z, X.disp = X_d)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
Intercept -0.3225 0.2040
x_d 1.4744 0.2881
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.1093
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-2.2135 0.8747
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 -0.0535 0.1836 -0.291 0.771
Note: P-values are based on 96 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] 0.0498 0.2341
[2,] -0.2223 0.2276
[3,] 0.4404 0.2276
[4,] -0.1786 0.2276
[5,] -0.0893 0.2296
EQL estimation converged in 5 iterations.
We simulate a Poisson model with random effects and estimate the
parameter in the dispersion term for an explanatory variable
R> u <- rgamma(n.clus,1)
R> eta <- exp(mu + Z%*%u)
R> y <- rpois(length(eta), eta)
R> gamma.pois <- hglm(y = y, X = X, Z = Z,
+ X.disp = X_d,
+ family = poisson(
+ link = log),
+ rand.family =
+ Gamma(link = log))
R> summary(gamma.pois)
Call:
hglm.default(X = X, y = y, Z = Z,
family = poisson(link = log),
rand.family = Gamma(link = log), X.disp = X_d)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
Intercept -0.0186 0.2042
x_d 0.4087 0.2902
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
Dispersion parameter for the random effects
[1] 1.926
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
0.6556 0.7081
Dispersion = 1 is used in Gamma model on deviances
to calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value Pr(>|t|)
X.1 2.3363 0.6213 3.76 0.000293
---
Note: P-values are based on 95 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] 1.1443 0.6209
[2,] -1.6482 0.6425
[3,] -2.5183 0.6713
[4,] -1.0243 0.6319
[5,] 0.2052 0.6232
EQL estimation converged in 3 iterations.
The data consists of 2025 individuals from two generations where 1000
individuals have observed trait values http://www.qtlmas2009.wur.nl/UK/Dataset
We fitted a model with a fixed intercept and a random animal effect,
The R code for this example is given below.
R> data(QTLMAS)
R> y <- QTLMAS[,1]
R> Z <- QTLMAS[,2:2026]
R> X <- matrix(1, 1000, 1)
R> animal.model <- hglm(y = y, X = X, Z = Z)
R> print(animal.model)
Call:
hglm.default(X = X, y = y, Z = Z)
Fixed effects:
X.1
7.279766
Random effects:
[1] -1.191733707 1.648604776 1.319427376 -0.928258503
[5] -0.471083317 -1.058333534 1.011451565 1.879641994
[9] 0.611705900 -0.259125073 -1.426788944 -0.005165978
...
Dispersion parameter for the mean model:[1] 2.211169
Dispersion parameter for the random effects:[1] 1.502516
EQL estimation converged in 2 iterations
The seed germination data presented by Crowder (1978) has previously been
analyzed using a binomial GLMM (Breslow and D. G. Clayton 1993) and a binomial-beta HGLM
(Lee and J. A. Nelder 1996). The data consists of 831 observations from 21
germination plates. The effect of seed variety and type of root extract
was studied in a fix.disp = 1
in hglm()
produces comparable estimates to the ones obtained by Lee and Nelder
(with differences hglm()
. The output from the R code given
below gives
R> data(seeds)
R> germ <- hglm(
+ fixed = r/n ~ extract*I(seed=="O73"),
+ weights = n, data = seeds,
+ random = ~1|plate, family = binomial(),
+ rand.family = Beta(), fix.disp = 1)
R> summary(germ)
Call:
hglm.formula(family = binomial(), rand.family = Beta(),
fixed = r/n ~ extract * I(seed == "O73"),
random = ~1 | plate, data = seeds,
weights = n, fix.disp = 1)
DISPERSION MODEL
WARNING: h-likelihood estimates through EQL can be biased.
Model estimates for the dispersion term:[1] 1
Model estimates for the dispersion term:
Link = log
Effects:
[1] 1
Dispersion = 1 is used in Gamma model on deviances to
calculate the standard error(s).
Dispersion parameter for the random effects
[1] 0.02483
Dispersion model for the random effects:
Link = log
Effects:
Estimate Std. Error
-3.6956 0.5304
Dispersion = 1 is used in Gamma model on deviances to
calculate the standard error(s).
MEAN MODEL
Summary of the fixed effects estimates
Estimate Std. Error t value
(Intercept) -0.5421 0.1928 -2.811
extractCucumber 1.3386 0.2733 4.898
I(seed == "O73")TRUE 0.0751 0.3114 0.241
extractCucumber:I(seed=="O73") -0.8257 0.4341 -1.902
Pr(>|t|)
(Intercept) 0.018429
extractCucumber 0.000625
I(seed == "O73")TRUE 0.814264
extractCucumber:I(seed=="O73") 0.086343
---
Note: P-values are based on 10 degrees of freedom
Summary of the random effects estimate
Estimate Std. Error
[1,] -0.2333 0.2510
[2,] 0.0085 0.2328
...
[21,] -0.0499 0.2953
EQL estimation converged in 7 iterations.
The hierarchical generalized linear model approach offers new possibilities to fit generalized linear models with random effects. The hglm package extends existing GLMM fitting algorithms to include fixed effects in a model for the residual variance, fits models where the random effect distribution is not necessarily Gaussian and estimates variance components for correlated random effects. For such models there are important applications in, for instance: genetics (Noh, B. Yip, Y. Lee, and Y. Pawitan 2006), survival analysis (Ha and Y. Lee 2005), credit risk modeling (Alam and K. Carling 2008), count data (Lee, J. A. Nelder, and Y. Pawitan 2006) and dichotomous responses (Noh and Y. Lee 2007). We therefore expect that this new package will be of use for applied statisticians in several different fields.
hglm, lme4, MASS, dglm, HGLMMM, nlme
ChemPhys, Distributions, Econometrics, Environmetrics, Finance, MixedModels, NumericalMathematics, OfficialStatistics, Psychometrics, Robust, Spatial, SpatioTemporal, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
http://www.qtlmas2009.wur.nl/UK/Dataset
[↩]Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
ård, et al., "hglm: A Package for Fitting Hierarchical Generalized Linear Models", The R Journal, 2010
BibTeX citation
@article{RJ-2010-009, author = {ård, Lars and Shen, Xia and Alam, Moudud}, title = {hglm: A Package for Fitting Hierarchical Generalized Linear Models}, journal = {The R Journal}, year = {2010}, note = {https://rjournal.github.io/}, volume = {2}, issue = {2}, issn = {2073-4859}, pages = {20-28} }