Bootstrapping Clustered Data in R using lmeresampler

Linear mixed-effects models are commonly used to analyze clustered data structures. There are numerous packages to fit these models in R and conduct likelihood-based inference. The implementation of resampling-based procedures for inference are more limited. In this paper, we introduce the lmeresampler package for bootstrapping nested linear mixed-effects models fit via lme4 or nlme. Bootstrap estimation allows for bias correction, adjusted standard errors and confidence intervals for small samples sizes and when distributional assumptions break down. We will also illustrate how bootstrap resampling can be used to diagnose this model class. In addition, lmeresampler makes it easy to construct interval estimates of functions of model parameters.

Adam Loy (Carleton College) , Jenna Korobova (Carleton College)


Clustered data structures occur in a wide range of studies. For example, students are organized within classrooms, schools, and districts, imposing a correlation structure that must be accounted for in the modeling process. Similarly, cholesterol measurements could be tracked across time for a number of subjects, resulting in measurements being grouped by subject. Other names for clustered data structures include grouped, nested, multilevel, hierarchical, longitudinal, repeated measurements, and blocked data. The covariance structure imposed by clustered data is commonly modeled using linear mixed-effects (LME) models, also referred to as hierarchical linear or multilevel linear models (Pinheiro and Bates 2000; Raudenbush and Bryk 2002; Goldstein 2011).

In this paper, we restrict attention to the Gaussian response LME model for clustered data structures. For cluster \(i=1, \ldots, g\), this model is expressed as \[\begin{equation} \underset{(n_i \times 1)}{\boldsymbol{y}_i} = \underset{(n_i \times p)}{\boldsymbol{X}_i} \ \underset{(p \times 1)}{\boldsymbol{\beta}} + \underset{(n_i \times q)}{\boldsymbol{Z}_i} \ \underset{(q \times 1)}{\boldsymbol{b}_i} + \underset{(n_i \times 1)}{\boldsymbol{\varepsilon}_i}, \tag{1} \end{equation}\] where \[\begin{equation} \boldsymbol{\varepsilon}_i \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_{n_i}), \boldsymbol{b}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{D}), \tag{2} \end{equation}\] where \(\boldsymbol{\varepsilon}_i\) and \(\boldsymbol{b}_i\) are independent for all \(i\), \(\boldsymbol{\varepsilon}_i\) is independent of \(\boldsymbol{\varepsilon}_j\) for \(i\ne j\), and \(\boldsymbol{b}_i\) is independent of \(\boldsymbol{b}_j\) for \(i\ne j\). Here, \(\boldsymbol{0}\) denotes a vector of zeros of length \(n_i\) (the number of observations in group \(i\)), \(\boldsymbol{I}_{n_i}\) denotes the \(n_i\)-dimensional identity matrix, and \(\boldsymbol{D}\) is a \(q \times q\) covariance matrix.

In R, the two most popular packages to fit LME models are nlme (Pinheiro et al. 2017) and lme4 (Bates et al. 2015). Both packages fit LME models using either maximum likelihood or restricted maximum likelihood methods. These methods rely on the distributional assumptions placed on the residual quantities, \(\boldsymbol{\varepsilon}_i\) and \(\boldsymbol{b}_i\), as well as large enough sample sizes. While some aspects of LME models are quite robust to model misspecification, others are more sensitive, leading to biased estimates and/or incorrect standard errors. Jacqmin-Gadda et al. (2007) found that inference for the fixed effects is robust if the distribution of the error terms, \(\boldsymbol{\varepsilon}_i\), is non-normal or heteroscedastic, but that variance components and random effects are biased if the covariance structure for the error terms is misspecified. The fixed intercept is not robust to misspecification of the random effects (Hui et al. 2021). In addition, misspecification of the random effects distribution can lead to biased estimates of the variance components and undercoverage of the confidence intervals (Hui et al. 2021). In cases where distributional assumptions are suspect, bootstrapping provides an alternative inferential approach that leads to consistent, bias-corrected parameter estimates, standard errors, and confidence intervals. Standard errors and confidence intervals for functions of model parameters are also easily calculated using a bootstrap procedure, and are available even in situations where closed-form solutions are not.

A variety of bootstrap procedures for clustered data and the LME model have been proposed and investigated, including the cases (non-parametric) bootstrap, the residual bootstrap, the parametric bootstrap, the random effect block (REB) bootstrap, and the wild bootstrap (Morris 2002; Carpenter et al. 2003; Field and Welsh 2007; Chambers and Chandra 2013; Modugno and Giannerini 2015). Van der Leeden et al. (2008) provide a thorough overview of bootstrapping LME models. Sánchez-Espigares and Ocaña (2009) developed a full-featured bootstrapping framework for LME models in R; however, this package is not available and there appears to be no active development. Consequently, R users must pick and choose packages based on what bootstrap procedure they wish to implement. The parametric bootstrap is available in lme4 via the bootMer() function, as is the semi-parametric (residual) bootstrap proposed by Morris (2002). The simulateY() function in nlmeU (Galecki and Burzykowski 2013) makes it easy to simulate values of the response variable for lme model objects; however, the user is required to implement the remainder of the parametric bootstrap. Chambers and Chandra (2013) made R code available to implement their REB bootstrap as well as the parametric bootstrap and the residual bootstrap proposed by Carpenter et al. (2003) for LME models fix via nlme::lme(). Unfortunately, these functions were not published on CRAN or extended to models fit via lme4::lmer(). Bootstrap procedures for specific inferential tasks have also been implemented. The parametric bootstrap has been implemented to carry out likelihood ratio tests for the presence of variance components in RLRsim (Scheipl et al. 2008). pbkrtest (Halekoh and Højsgaard 2014) provides a Kenward-Roger approximation for performing F-tests using a parametric bootstrap approach for LME models and generalized LME models. rptR uses the parametric bootstrap to estimate repeatabilities for LME models (Stoffel et al. 2017). Finally, constrained inference for LMEs using the residual bootstrap is implemented in CLME (Farnan et al. 2014).

In this paper, we introduce the lmeresampler package which implements a suite of bootstrap methods for LME models fit using either nlme or lme4 in a single package. lmeresampler provides a unified bootstrap() command to reduce cognitive load on the user and also provides access to procedures that were not previously available on CRAN. In the next section, we will clarify some notation and terminology for LME models. In Bootstrap procedures for clustered data, we provide an overview of the bootstrap methods implemented in lmeresampler. We then give an Overview of lmeresampler, discuss a variety of Example applications for LME models, and show how to Bootstrapping in parallel works in lmeresampler. We conclude with a Summary and highlight areas for future development.

Bootstrap procedures for clustered data

In lmeresampler, we implement five bootstrap procedures for clustered data: a cases (i.e., nonparamteric) bootstrap, a residual bootstrap (i.e., semiparametric), a parametric bootstrap, a wild bootstrap, and a block bootstrap. In this section, we provide an overview of these bootstrap approaches. Our discussion focuses on two-level models, but procedures generalize to higher-level models unless otherwise noted.

The cases bootstrap

The cases bootstrap is a fully non-parametric bootstrap that resamples the clusters in the data set to generate bootstrap resamples. Depending on the nature of the data, this resampling can be done only for the top-level cluster, only at the observation-level within a cluster, or at both levels. The choice of the exact resampling scheme should be dictated by the way the data were generated, since the cases bootstrap generates new data by mimicking this process. Van der Leeden et al. (2008) provide a cogent explanation of how to select a resampling scheme. To help ground the idea of resampling, consider a two-level hierarchical data set where students are organized into schools.

One version of the cases bootstrap is implemented by only resampling the clusters. This version of the bootstrap is what Field and Welsh (2007) term the cluster bootstrap and Goldstein (2011) term the non-parametric bootstrap. We would choose this resampling scheme, for example, if schools were chosen at random and then all students within each school were observed. In this case, the bootstrap proceeds as follows:

  1. Draw a random sample, with replacement, of size \(g\) from the clusters.
  2. For each selected cluster, \(k\), extract all of the cases to form the bootstrap sample \(\left(\boldsymbol{y}^*_k, \boldsymbol{X}^*_k, \boldsymbol{Z}^*_k \right)\). Since entire clusters are sampled, the total sample size may differ from the that of the original data set.
  3. Refit the model to the bootstrap sample and extract the parameter estimates of interest.
  4. Repeat steps 1–3 \(B\) times.

An alternative version of the cases bootstrap only resamples the observations within clusters, which makes sense in our example if the schools were fixed and students were randomly sampled within schools.

  1. For each cluster \(i=1,\ldots,g\), draw a random sample of the rows for cluster \(i\), with replacement, to form the bootstrap sample \(\left(\boldsymbol{y}^*_i, \boldsymbol{X}^*_i, \boldsymbol{Z}^*_i \right)\).
  2. Refit the model to the bootstrap sample and extract the parameter estimates of interest.
  3. Repeat steps 1–2 \(B\) times.

A third version of the cases bootstrap resamples both clusters and cases within clusters. This is what Field and Welsh (2007) term the two-state bootstrap. We would choose this resampling scheme if both schools and students were sampled during the data collection process.

All three versions of the case bootstrap are implemented in lmeresampler. We explain how the resampling is specified in our Overview of lmeresampler. Regardless of which version of the cases bootstrap you choose, it requires the weakest conditions: it only requires that the hierarchical structure in the data set is correctly specified.

The parametric bootstrap

The parametric bootstrap simulates random effects and error terms from the fitted distributions to form bootstrap resamples. Consequently, it requires the strongest conditions—that is, the parametric bootstrap assumes that the model, as specified by Equations (1) and (2), is correctly specified.

Let \(\widehat{\boldsymbol \beta}\), \(\widehat{\boldsymbol D}\), and \(\widehat{\sigma}^2\) be maximum likelihood or restricted maximum likelihood estimates of the fixed effects and variance components from the fitted model. The parametric bootstrap is then implemented through the following steps:

  1. Simulate \(g\) error term vectors, \(\boldsymbol{e}_i^*\), of length \(n_i\) from \(\mathcal{N}\left(\boldsymbol{0},\widehat{\sigma}^2 \boldsymbol{I}_{n_i} \right)\).
  2. Simulate \(g\) random effects vectors, \(\boldsymbol{b}_i^*\), from \(\mathcal{N}\left(\boldsymbol{0},\widehat{\boldsymbol{D}} \right)\).
  3. Generate bootstrap responses \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \boldsymbol{Z}_i \boldsymbol{b}_i^* + \boldsymbol{e}_i^*\).
  4. Refit the model to the bootstrap responses and extract the parameter estimates of interest.
  5. Repeat steps 2–4 \(B\) times.

The residual bootstrap

The residual bootstrap resamples the residual quantities from the fitted LME model in order to generate bootstrap resamples. There are three general types of residuals for LME models (Haslett and Haslett 2007; Singer et al. 2017). correspond to the errors made using the marginal predictions, \(\widehat{\boldsymbol r}_i = \boldsymbol{y_i} - \widehat{\boldsymbol{y}}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}}\). correspond to the errors made using predictions conditioned on the clusters, \(\widehat{\boldsymbol{e}}_i = \boldsymbol{y}_i - \widehat{\boldsymbol{y}}_i| \boldsymbol{b}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} - \boldsymbol{Z}_i \widehat{\boldsymbol{b}}_i\). The are the last type of residual quantity and are defined as \(\widehat{\boldsymbol{b}}_i = \widehat{\boldsymbol{D}} \boldsymbol{Z}_i^\prime \widehat{\boldsymbol{V}}_i^{-1}\left(\boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} \right)\) where \(\widehat{\boldsymbol{V}}_i = \boldsymbol{Z}_i \widehat{\boldsymbol{D}}\boldsymbol{Z}_i^\prime + \widehat{\sigma}^2 \boldsymbol{I}_{n_i}\).

A naive implementation of the residual bootstrap would draw random samples, with replacement, from the estimated conditional residuals and the best linear unbiased predictors (BLUPS); however, this will consistently underestimate the variability in the data because the residuals are shrunken toward zero (Morris 2002; Carpenter et al. 2003). Carpenter et al. (2003) solve this problem by “reflating” the residual quantities so that the empirical covariance matrices match the estimated covariance matrices prior to resampling:

  1. Fit the model and calculate the empirical BLUPs, \(\widehat{\boldsymbol{b}}_i\), and the predicted conditional residuals, \(\widehat{\boldsymbol{e}}_i\).

  2. Mean center each residual quantity and reflate the centered residuals. Only the process to reflate the predicted random effects is discussed below, but the process is analogous for the conditional residuals.

    1. Arrange the random effects into a \(g \times q\) matrix, where each row contains the predicted random effects from a single group. Denote this matrix as \(\widehat{\boldsymbol{U}}\). Define \(\boldsymbol{\widehat{\Gamma}} = \boldsymbol{I}_g \otimes \widehat{\boldsymbol{D}}\), the block diagonal covariance matrix of \(\widehat{\boldsymbol{U}}\).
    2. Calculate the empirical covariance matrix as \(\boldsymbol{S} = \widehat{\boldsymbol{U}}^\prime \widehat{\boldsymbol{U}} / g\). We follow the approach of Carpenter et al. (2003), dividing by the number of groups, \(g\), rather than \(g-1\).
    3. Find a transformation of \(\widehat{\boldsymbol{U}}\), \(\widehat{\boldsymbol{U}}^* = \widehat{\boldsymbol{U}} \boldsymbol{A}\), such that \(\widehat{\boldsymbol{U}}^{*\prime} \widehat{\boldsymbol{U}}^* / g = \boldsymbol{\widehat{\Gamma}}\). Specifically, we will find \(\boldsymbol{A}\) such that \(\boldsymbol{A}^\prime \widehat{\boldsymbol{U}}^\prime \widehat{\boldsymbol{U}} \boldsymbol{A} / g = \boldsymbol{A}^\prime \boldsymbol{SA} = \boldsymbol{\widehat{\Gamma}}\). The choice of \(\boldsymbol{A}\) is not unique, so we use the recommendation given by Carpenter et al. (2003): \(\boldsymbol{A} = \left(\boldsymbol{L}_D \boldsymbol{L}_S^{-1} \right)^\prime\) where \(\boldsymbol{L}_D\) and \(\boldsymbol{L}_S\) are the Cholesky factors of \(\boldsymbol{\widehat{\Gamma}}\) and \(\boldsymbol{S}\), respectively.
  3. Draw a random sample, with replacement, from the set \(\lbrace \boldsymbol{u}_i^* \rbrace\) of size \(g\), where \(\boldsymbol{u}_i^*\) is the \(i\)th row of the centered and reflated random effects matrix, \(\widehat{\boldsymbol{U}}^*\).

  4. Draw \(g\) random samples, with replacement, of sizes \(n_i\) from the set of the centered and reflated conditional residuals, \(\lbrace \boldsymbol{e}_i^* \rbrace\).

  5. Generate the bootstrap responses, \(\boldsymbol{y}^*_i\), using the fitted model equation: \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \boldsymbol{Z}_i \widehat{\boldsymbol{u}}^*_i + \boldsymbol{e}_i^*\).

  6. Refit the model to the bootstrap responses and extract the parameter estimates of interest.

  7. Repeat steps 3–6 B times.

Notice that the residual bootstrap is a semiparametric bootstrap, since it depends on the model structure (both the mean function and the covariance structure) but not the distributional conditions (Morris 2002).

The random-effects block bootstrap

Another semiparametric bootstrap is the random effect block (REB) bootstrap proposed by Chambers and Chandra (2013). The REB bootstrap can be viewed as a version of the residual bootstrap where conditional residuals are resampled from within clusters (i.e., blocks) to allow for weaker assumptions on the covariance structure of the residuals. The residual bootstrap requires that the conditional residuals are independent and identically distributed, whereas the REB bootstrap relaxes this to only require that the covariance structure of the error terms is similar across clusters. In addition, the REB bootstrap utilizes the marginal residuals to calculate non-parametric predicted random effects rather than relying on the model-based empirical best linear unbiased predictors (EBLUPS). Chambers and Chandra (2013) developed three versions of the REB bootstrap, all of which have been implemented in lmeresampler. We refer the reader to Chambers and Chandra (2013) for a discussion of when each should be used. It’s important to note that at the time of this writing, that the REB bootstrap has only been explored and implemented for use with two-level models.


The base algorithm for the REB bootstrap (also known as REB/0) is as follows:

  1. Calculate non-parametric residual quantities for the model.
    1. Calculate the marginal residuals for each group, \(\widehat{\boldsymbol{r}}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}}\).
    2. Calculate the non-parametric predicted random effects, \(\tilde{\boldsymbol{b}}_i = \left( \boldsymbol{Z}_i^\prime \boldsymbol{Z}_i \right)^{-1} \boldsymbol{Z}^\prime_i \boldsymbol{r}_i\).
    3. Calculate the non-parametric conditional residuals using the residuals quantities obtained in the previous two steps, \(\boldsymbol{\tilde{e}}_i = \widehat{\boldsymbol{r}}_i - \boldsymbol{Z}_i \boldsymbol{\tilde{b}}_i\).
  2. Take a random sample, with replacement, of size \(g\) from the set \(\lbrace \tilde{\boldsymbol{b}}_i \rbrace\). Denote these resampled random effects as \(\boldsymbol{\tilde{b}}^*_i\).
  3. Take a random sample, with replacement, of size \(g\) from the cluster ids. For each sampled cluster, draw a random sample, with replacement, of size \(n_i\) from that cluster’s vector of error terms, \(\boldsymbol{\tilde{e}}_i\).
  4. Generate bootstrap responses, \(\boldsymbol{y}^*_i\), using the fitted model equation: \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\beta} + \boldsymbol{Z}_i \boldsymbol{\tilde{b}}^*_i + \boldsymbol{\tilde{e}}^*_i\).
  5. Refit the model to the bootstrap sample and extract the parameter estimates of interest.
  6. Repeat steps 2–5 \(B\) times.


The first variation of the REB bootstrap (REB/1) zero centers and reflates the residual quantities prior to resampling in order to satisfy the conditions for consistency (Shao and Tu 1995). This is the same process outlined in Step 2 of the residual bootstrap outlined above.


The second variation of the REB bootstrap (REB/2 or postscaled REB) addresses two issues: potential non-zero covariances in the joint bootstrap distribution of the variance components and potential bias in the parameter estimates. After the REB/0 algorithm is run, the following post processing is performed:

Uncorrelate the variance components.

To uncorrelate the bootstrap estimates of the variance components produced by REB/0, Chambers and Chandra (2013) propose the following procedure:

  1. Apply natural logarithms to the bootstrap distribution of each variance component and form the following matrices. Note that we use \(\nu\) to denote the total number of variance components.

    • \(\boldsymbol{S}^*\): a \(B \times \nu\) matrix formed by binding the columns of these distributions together
    • \(\boldsymbol{M}^*\): a \(B \times \nu\) matrix where each column contains the column mean from the corresponding column in \(\boldsymbol{S}^*\)
    • \(\boldsymbol{D}^*\): a \(B \times \nu\) matrix where each column contains the column standard deviation from the corresponding column in \(\boldsymbol{S}^*\)
  2. Calculate \(\boldsymbol{C}^*=\mathrm{cov} \left( \boldsymbol{S}^* \right)\).

  3. Calculate \(\boldsymbol{L}^* = \boldsymbol{M}^* + \lbrace \left(\boldsymbol{S}^* - \boldsymbol{M}^*\right)\boldsymbol{C}^{*-1/2} \rbrace \circ \boldsymbol{D}^*\), where \(\circ\) denotes the Hadamard (elementwise) product.

  4. Exponentiate the elements of \(\boldsymbol{L}^*\). The columns of \(\boldsymbol{L}^*\) are then uncorrelated versions of the bootstrap variance components.

Center the bootstrap estimates at the original estimate values.

To correct bias in the estimation of the fixed effects, apply a mean correction. For each parameter estimate, \(\widehat{\beta}_k\), adjust the bootstrapped estimates, \(\widehat{\boldsymbol{\beta}}_k^*\) as follows: \(\widetilde{\boldsymbol{\beta}}_k^{*} = \widehat{\beta}_k \boldsymbol{1}_{B} + \widehat{\boldsymbol{\beta}}_k^* -{\rm avg} \left( \widehat{ \boldsymbol{\beta} }_k^* \right)\). To correct bias in the estimation of the variance components, apply a ratio correction. For each estimated variance component , \(\widehat{\sigma}^2_v\), adjust the uncorrelated bootstrapped estimates, \(\widehat{\boldsymbol{\sigma}}_v^{2*}\) as follows: \(\widetilde{\boldsymbol{\sigma}}_{v}^{2*} = \widehat{\boldsymbol{\sigma}}_v^{2*} \circ \lbrace \widehat{\sigma}^2_v / {\rm avg} \left( \widehat{\boldsymbol{\sigma}}_v^{2*} \right) \rbrace\)

The wild bootstrap

The wild bootstrap also relaxes the assumptions made on the error terms of the model, allowing heteroscedasticity both within and across groups. The wild bootstrap is well developed for the ordinary regression model (Liu 1988; Flachaire 2005; Davidson and Flachaire 2008) and Modugno and Giannerini (2015) adapt it for the nested LME model.

To begin, we can re-express model (1) as

\[\begin{equation} \boldsymbol{y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{v}_i, \text{ where } \boldsymbol{v}_i = \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{\varepsilon}_i. \tag{3} \end{equation}\]

The wild bootstrap proceeds as follows:

  1. Draw a random sample, \(w_1, w_2, \ldots, w_g\), from an auxiliary distribution with mean zero and unit variance.

  2. Generate bootstrap responses using the re-expressed model equation (3): \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \tilde{\boldsymbol{v}}_i w_j\), where \(\tilde{\boldsymbol{v}}_i\) is a heteroscedasticity consistent covariance matrix estimator. Modugno and Giannerini (2015) suggest using what Flachaire (2005) calls \({{\rm HC}_2}\) or \({{\rm HC}_3}\) in the regression context:

    \[\begin{align} {{\rm HC}_2}&: \tilde{\boldsymbol{v}}_i = {\rm diag} \left( \boldsymbol{I}_{n_i} - \boldsymbol{H}_i \right)^{-1/2} \circ \boldsymbol{r}_i\\ {{\rm HC}_3}&: \tilde{\boldsymbol{v}}_i = {\rm diag} \left( \boldsymbol{I}_{n_i} - \boldsymbol{H}_i \right) \circ \boldsymbol{r}_i, \end{align}\]

    where \(\boldsymbol{H}_i = \boldsymbol{X}_i \left(\boldsymbol{X}_i^\prime \boldsymbol{X}_i \right)^\prime \boldsymbol{X}_i^\prime\), the \(i\)th diagonal block of the orthogonal projection matrix, and \(\boldsymbol{r}_i\) is the vector of marginal residuals for group \(i\).

  3. Refit the model to the bootstrap sample and extract the parameter estimates of interest.

  4. Repeat steps 1–3 \(B\) times.

Five options for the auxiliary distribution are implemented in lmeresampler:

Modugno and Giannerini (2015) found the Mammen distribution to be preferred based on a Monte Carlo study, but only compared it to the Rademacher distribution.

Overview of lmeresampler

The lmeresampler package implements the five bootstrap procedures outlined in the previous section for Gaussian response LME models for clustered data structures fit using either nlme (Pinheiro et al. 2017) or lme4. The workhorse function in lmeresampler is

bootstrap(model, .f, type, B, resample = NULL, reb_type = NULL, hccme, aux.dist,
          orig_data, .refit)

The four required parameters to are:

There are also five optional parameters: resample, reb_type, hccme, aux.dist, and orig_data.

The bootstrap() function is a generic function that calls functions for each type of bootstrap. The user can call the specific bootstrap function directly if preferred. An overview of the specific bootstrap functions is given in Table 1.

Table 1: Summary of the specific bootstrap functions called by bootstrap() and their required arguments.
Bootstrap Function name Required arguments
Cases case_bootstrap model, .f, type, B, resample, orig_data, .refit
Residual resid_bootstrap model, .f, type, B, orig_data, .refit
REB reb_bootstrap model, .f, type, B, reb_type, orig_data, .refit
Wild wild_bootstrap model, .f, type, B, hccme, aux.dist, orig_data, .refit
Parametric parametric_bootstrap model, .f, type, B, orig_data, .refit

Each of the specific bootstrap functions performs four general steps:

  1. Setup. Key information (parameter estimates, design matrices, etc.) is extracted from the fitted model to eliminate repetitive actions during the resampling process.

  2. Resampling. The setup information is passed to an internal resampler function to generate the B bootstrap samples.

  3. Refitting. The model is refit for each of the bootstrap samples and the specified parameters are extracted/calculated.

  4. Clean up. An internal completion function formats the original and bootstrapped quantities to return a list to the user.

Each function returns an object of class lmeresamp, which is a list with elements outlined in Table 2. print(), summary(), plot(), and confint() methods are available for lmeresamp objects.

Table 2: Summary of the specific bootstrap functions called by bootstrap() and their required arguments.
Element Description
observed values for the original model parameter estimates.
model the original fitted model object.
.f the function call defining the parameters of interest.
replicates a \(B \times p\) tibble containing the bootstrapped quantities. Each column contains a single bootstrap distribution.
stats a tibble containing the observed, rep.mean (bootstrap mean), se (bootstrap standard error), and bias values for each parameter.
B the number of bootstrap resamples performed.
data the original data set.
seed a vector of randomly generated seeds that are used by the bootstrap.
type a character string specifying the type of bootstrap performed.
call the user’s call to the bootstrap function.
message a list of length B giving any messages generated during refitting. An entry will be NULL if no message was generated.
warning a list of length B giving any warnings generated during refitting. An entry will be NULL if no warning was generated.
error a list of length B giving any errors encountered during refitting. An entry will be NULL if no error was encountered.

lmeresampler also provides the extract_parameters() helper function to extract the fixed effects and variance components from merMod and lme objects as a named vector.

Example applications

A two-level example: JSP data

As a first application of the bootstrap for nested LME models, consider the junior school project (JSP) data (Mortimore et al. 1988; Goldstein 2011) that is stored as jsp728 in lmeresampler. The data set is comprised of measurements taken on 728 elementary school students across 48 schools in London.

# A tibble: 728 × 9
   mathA…¹ mathA…² gender class school normA…³ normA…⁴ schoo…⁵ mathA…⁶
     <dbl>   <dbl> <fct>  <fct> <fct>    <dbl>   <dbl>   <dbl>   <dbl>
 1      39      36 M      nonm… 1       1.80    1.55      22.4  13.6  
 2      11      19 F      manu… 1      -2.29   -0.980     22.4  -3.42 
 3      32      31 F      manu… 1      -0.0413  0.638     22.4   8.58 
 4      27      23 F      nonm… 1      -0.750  -0.460     22.4   0.579
 5      36      39 F      nonm… 1       0.743   2.15      22.4  16.6  
 6      33      25 M      manu… 1       0.163  -0.182     22.4   2.58 
 7      30      27 M      manu… 1      -0.372   0.0724    22.4   4.58 
 8      17      14 M      manu… 1      -1.63   -1.52      22.4  -8.42 
 9      33      30 M      manu… 1       0.163   0.454     22.4   7.58 
10      20      19 M      manu… 1      -1.40   -0.980     22.4  -3.42 
# … with 718 more rows, and abbreviated variable names ¹​mathAge11,
#   ²​mathAge8, ³​normAge11, ⁴​normAge8, ⁵​schoolMathAge8, ⁶​mathAge8c

Suppose we wish to fit a model using the math score at age 8, gender, and the father’s social class to describe math scores at age 11, including a random intercept for school (Goldstein 2011). This LME model can be fit using the lmer() function in lme4.

jsp_mod <- lmer(mathAge11 ~ mathAge8 + gender + class + (1 | school), data = jsp728)

To implement the residual bootstrap to estimate the fixed effects, we can use the bootstrap() function and set type = "residual".

(jsp_boot <- bootstrap(jsp_mod, .f = fixef, type = "residual", B = 2000))
Bootstrap type: residual 

Number of resamples: 2000 

            term   observed   rep.mean        se          bias
1    (Intercept) 14.1577509 14.1697871 0.6702835  0.0120361440
2       mathAge8  0.6388895  0.6391441 0.0248334  0.0002546297
3        genderM -0.3571922 -0.3654058 0.3391162 -0.0082135863
4 classnonmanual  0.7200815  0.7154916 0.3783149 -0.0045898872

There were 0 messages, 0 warnings, and 0 errors.

We can then calculate normal, percentile, and basic bootstrap confidence intervals via confint().

# A tibble: 12 × 6
   term           estimate     lower  upper type  level
   <chr>             <dbl>     <dbl>  <dbl> <chr> <dbl>
 1 (Intercept)      14.2   12.8      15.5   norm   0.95
 2 mathAge8          0.639  0.590     0.687 norm   0.95
 3 genderM          -0.357 -1.01      0.316 norm   0.95
 4 classnonmanual    0.720 -0.0168    1.47  norm   0.95
 5 (Intercept)      14.2   12.8      15.5   basic  0.95
 6 mathAge8          0.639  0.588     0.687 basic  0.95
 7 genderM          -0.357 -1.00      0.323 basic  0.95
 8 classnonmanual    0.720 -0.000162  1.46  basic  0.95
 9 (Intercept)      14.2   12.9      15.5   perc   0.95
10 mathAge8          0.639  0.590     0.690 perc   0.95
11 genderM          -0.357 -1.04      0.287 perc   0.95
12 classnonmanual    0.720 -0.0187    1.44  perc   0.95

The default setting is to calculate all three intervals, but this can be restricted by setting the type parameter to "norm", "basic", or "perc".

User-specified statistics: Estimating repeatability/intraclass correlation

The beauty of the bootstrap is its flexibility. Interval estimates can be constructed for functions of model parameters that would otherwise require more complex derivations. For example, the bootstrap can be used to estimate the intraclass correlation. The intraclass correlation measures the proportion of the total variance in the response accounted for by groups, and is an important measure of repeatability in ecology and evolutionary biology (Nakagawa and Schielzeth 2010). As a simple example, we’ll consider the BeetlesBody data set in rptR (Stoffel et al. 2017). This simulated data set contains information on body length (BodyL) and the Population from which the beetles were sampled. A simple Guassian-response LME model of the form

\[ y_{ij} = \beta_0 + b_i + \varepsilon_{ij}, \qquad b_i \sim \mathcal{N}(0, \sigma^2_b), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \]

can be used to describe the body length of beetle \(j\) from population \(i\). The repeatability is then calculated as \(R = \sigma^2_b / (\sigma^2_b + \sigma^2)\). Below we fit this model using lmer():

data("BeetlesBody", package = "rptR")
(beetle_mod <- lmer(BodyL ~ (1 | Population), data = BeetlesBody))
Linear mixed model fit by REML ['lmerMod']
Formula: BodyL ~ (1 | Population)
   Data: BeetlesBody
REML criterion at convergence: 3893.268
Random effects:
 Groups     Name        Std.Dev.
 Population (Intercept) 1.173   
 Residual               1.798   
Number of obs: 960, groups:  Population, 12
Fixed Effects:

To construct a bootstrap confidence interval for the repeatability, we first must write a function to calculate it from the fitted model. Below we write a one-off function for this model to demonstrate a “typical” workflow rather than trying to be overly general.

repeatability <- function(object) {
  vc <-
  vc$vcov[1] / (sum(vc$vcov))

The original estimate of repeatability can then be quickly calculated:

[1] 0.2985548

To construct a bootstrap confidence interval for the repeatability, we run the desired bootstrap procedure, specifying .f = repeatability and then pass the results to confint().

(beetle_boot <- bootstrap(beetle_mod, .f = repeatability, type = "parametric", B = 2000))
Bootstrap type: parametric 

Number of resamples: 2000 

   observed  rep.mean         se        bias
1 0.2985548 0.2874126 0.08787679 -0.01114219

There were 0 messages, 0 warnings, and 0 errors.
(beetle_ci <- confint(beetle_boot, type = "basic"))
# A tibble: 1 × 6
  term  estimate lower upper type  level
  <chr>    <dbl> <dbl> <dbl> <chr> <dbl>
1 ""       0.299 0.139 0.471 basic  0.95

Notice that the column of beetle_ci is an empty character string since we did not have return a named vector.

Alternatively, we can plot() the results, as shown in Figure 1. The plot method for lmeresamp objects uses stat_halfeye() from ggdist (Kay 2021) to render a density plot with associated 66% and 95% percentile intervals.

plot(beetle_boot, .width = c(.5, .9)) + 
    title = "Bootstrap repeatabilities",
    y = "density",
    x = "repeatability"
The figure is a density plot displaying the bootstrap distribution of the repeatability. Repeatability is displayed on the x-axis and density is displayed on the y-axis. A point marks the median repeatability on the x-axis, with the 66% and 95% bootstrap percentile intervals marked using lines.

Figure 1: Density plot of the repeatabilities from the beetle model. The median bootstrap repeatability is approximatley .28 and denoted by a point under the density. 66% and 95% confidence intervals for the bootstrap repeatability are (0.217, 0.394) and (0.118, 0.475), respectively, and are displayed as line segments below the density.

Bootstrap tests for a single parameter

While lmeresampler was designed with a focus on estimation, the bootstrap functions can be used to conduct bootstrap tests on individual parameters. For example, returning to the JSP example, we might be interested in generating approximate \(p\)-values for the fixed effects:

                 Estimate Std. Error   t value
(Intercept)    14.1577509 0.73344165 19.303173
mathAge8        0.6388895 0.02544478 25.108865
genderM        -0.3571922 0.34009284 -1.050279
classnonmanual  0.7200815 0.38696812  1.860829

To generate a bootstrap \(p\)-value for a fixed effect, we must generate \(B\) bootstrap resamples under the reduced model (i.e., the null hypothesis), refit the full model, and then calculate the \(t\)-statistic for each resample, denoted \(t^*_i\). The bootstrap \(p\)-value is then calculated as \((n_{\rm extreme} + 1) / (B + 1)\) (Davison and Hinkley 1997; Halekoh and Højsgaard 2014).

Using the bootstrap() function, you can implement this procedure for a single parameter. For example, if we wish to calculate the bootstrap \(p\)-value for the class fixed effect we first generate \(B=1000\) bootstrap samples from the reduced model without class. In this example, we use the Wild bootstrap to illustrate that we are not restricted to a parametric bootstrap.

reduced_model <- update(jsp_mod, . ~ . - class)
reduced_boot <- bootstrap(reduced_model, type = "wild", B = 1000, hccme = "hc2", 
                          aux.dist = "mammen", .refit = FALSE)

Next, we refit the full model, jsp_mod, to each simulation and extract the \(t\)-statistic for the class variable. Note that the function extract_t() extracts the specified \(t\)-statistic from the coefficient table from model summary.

extract_t <- function(model, term) {
  coef(summary(model))[term, "t value"]

tstats <- purrr::map_dbl(
  ~refit(jsp_mod, .x) %>% extract_t(., term = "classnonmanual")

With the bootstrap \(t\)-statistics in hand, we can approximate the \(p\)-value using basic logical and arithmetic operators

(sum(abs(tstats) >= extract_t(jsp_mod)) + 1) / (1000 + 1)
[1] 0.2687313

While the above process is not particularly difficult to implement using the tools provided by lmeresampler, things get tedious if multiple parameters are of interest in this summary table. To help reduce this burden on the user, we have provided the bootstrap_pvals() function that will add bootstrap \(p\)-values for each term in the coefficient summary table. For jsp_mod, this is achieved in the below code chunk:

bootstrap_pvals(jsp_mod, type = "wild", B = 1000, hccme = "hc2", aux.dist = "mammen")
Bootstrap type: wild 

Number of resamples: 1000 

# A tibble: 4 × 5
  term           Estimate `Std. Error` `t value`  p.value
  <chr>             <dbl>        <dbl>     <dbl>    <dbl>
1 (Intercept)      14.2         0.733      19.3  0.000999
2 mathAge8          0.639       0.0254     25.1  0.000999
3 genderM          -0.357       0.340      -1.05 0.273   
4 classnonmanual    0.720       0.387       1.86 0.0659  

It’s important to note that running bootstraps for each term in the model is computationally demanding. To speed up the computation, you can run the command in parallel, as we discuss below in Bootstrapping in parallel.

Model comparison

The bootstrap can be useful during model selection. For example, if you are comparing a full and reduced model where the reduced model has fewer random effects, a 50:50 mixture of \(\chi^2\) distributions is often used (Stram and Lee 1994); however, Pinheiro and Bates (2000) point out that this approximation is not always optimal. In this example, we explore the Machine data set discussed by Pinheiro and Bates (2000), which consists of productivity scores for six workers on three brands of machine. This data set can be loaded from nlme:

data("Machines", package = "nlme")

Pinheiro and Bates (2000) consider two LME models for these data. The first model has a fixed effect for the machine and a random intercept for the worker.

reduced_mod <- lmer(score ~ Machine + (1 | Worker), data = Machines, REML = FALSE)

The second model has the same fixed effects structure, but adds an additional random effect for the machine within the worker.

full_mod <- lmer(score ~ Machine + (1 | Worker/Machine), data = Machines, REML = FALSE)

Pinheiro and Bates (2000) note that the approximate null distribution given by \(0.5\chi^2_0 + 0.5 \chi^2_1\) is not successful when the models are fit via maximum likelihood, and that the mixture is closer to \(0.65\chi^2_0 + 0.35 \chi^2_1\). Instead of relying on the conventional approximation, a bootstrap test can be conducted using bootstrap() to simulate the responses from the reduced model.

To conduct this bootstrap test, we first extract the observed statistic obtained via anova() and then generate \(B=1000\) bootstrap responses from the reduced model, fm1_machine. Recall that specifying .refit = FALSE returns a data frame of the simulated responses. Here, we use a residual bootstrap for illustration.

observed <- anova(full_mod, reduced_mod)$Chisq[2]

reduced_boot <- bootstrap(reduced_mod, type = "residual", B = 1000, .refit = FALSE)

Next, we must fit both the full and reduced models to the bootstrap responses and calculate the test statistic. The user-written compare_models() function performs this refit and calculation for given models and bootstrap responses. The control argument for the full model was set to reduce the number of convergence warnings, since the null model had a variance component of 0 for machines within workers, so we expect warnings as we fit an expanded model.

compare_models <- function(full, reduced, newdata) {
  full_mod <- refit(full, newdata, 
                    control = lmerControl(check.conv.singular = "ignore", 
                                          check.conv.grad = "ignore"))
  reduced_mod <- refit(reduced, newdata)
  anova(full_mod, reduced_mod)$Chisq[2]
chisq_stats <- purrr::map_dbl(reduced_boot, ~compare_models(full_mod, reduced_mod, newdata = .x))

With the test statistics in hand, we can quickly calculate the \(p\)-value

(sum(chisq_stats >= observed) + 1) / (1000 + 1)
[1] 0.000999001

Simulation-based model diagnostics

Our final example illustrates how the bootstrap() function can be used for model diagnosis. Loy et al. (2017) propose using the lineup protocol to diagnose LME models, since artificial structures often appear in conventional residual plots for this model class that are not indicative of a model deficiency.

In this example, we consider the Dialyzer data set provided by nlme. The data arise from a study characterizing the water transportation characteristics of 20 high flux membrane dialyzers, which were introduced to reduce the time a patient spends on hemodialysis (Vonesh and Carter 1992). The dialyzers were studied in vitro using bovine blood at flow rates of either 200 or 300 ml/min. The study measured the the ultrafiltration rate (ml/hr) at even transmembrane pressures (in mmHg). Pinheiro and Bates (2000) discuss modeling these data. Here, we explore how to create a lineup of residual plots to investigate the adequacy of the initial homoscedastic LME model fit by Pinheiro and Bates (2000).

dialyzer_mod <- lme(
  rate ~ (pressure + I(pressure^2) + I(pressure^3) + I(pressure^4)) * QB, 
  data = Dialyzer, 
  random = ~ pressure + I(pressure^2)

Pinheiro and Bates (2000) construct a residual plot of the conditional residuals plotted against the transmembrane pressure to explore the adequacy of the fitted model (Figure 2). There appears to be increasing spread of the conditional residuals, which would indicate that the homoscedastic model is not sufficient.

A plot of the residuals on the y-axis and the transmembrane pressure on the x-axis. A line is draw at y = 0 and the spread of the points around this line is increasing with transmembrane pressure.

Figure 2: A plot of the conditional residuals against the transmembrane pressure for the dialyzer model. It appears that the variability of the conditional residuals increases with transmembrance pressure, but does this indicate a model condition is violated?

To check if this pattern is actually indicative of a problem, we construct a lineup of residual plots. To do this, we must generate data from a number of properly specified models, say 19, to serve as decoy residual plots. Then, we create a faceted set of residual plots where the observed residual plot (Figure 2) is randomly assigned to a facet. To generate the residuals from properly specified models, we use the parametric bootstrap via bootstrap() and extract a data frame containing the residuals from each bootstrap sample using hlm_resid() from HLMdiag (Loy and Hofmann 2014).

sim_resids <- bootstrap(dialyzer_mod, .f = hlm_resid, type = "parametric", B = 19)

The simulated residuals are stored in the replicates element of the sim_resids list. sim_resids$replicates is a tibble containing the 19 bootstrap samples, with the the replicate number stored in the .n column.

Rows: 2,660
Columns: 15
$ id              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ rate            <dbl> -0.8579226, 17.0489029, 34.3658738, 44.84955…
$ pressure        <dbl> 0.240, 0.505, 0.995, 1.485, 2.020, 2.495, 2.…
$ `I(pressure^2)` <I<dbl>>   0.0576, 0.255025, 0.990025, 2.205225,  …
$ `I(pressure^3)` <I<dbl>>     0.013824,  0.128787625,  0.985074875,…
$ `I(pressure^4)` <I<dbl>>   0.00331776, 0.065037...., 0.980149....,…
$ QB              <fct> 200, 200, 200, 200, 200, 200, 200, 200, 200,…
$ Subject         <ord> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3,…
$ .resid          <dbl> -1.19758632, 0.65600425, -0.90104247, 1.3913…
$ .fitted         <dbl> 0.3396637, 16.3928987, 35.2669162, 43.458244…
$ .ls.resid       <dbl> -0.51757746, 1.23968783, -1.32231163, 0.5907…
$ .ls.fitted      <dbl> -0.3403452, 15.8092151, 35.6881854, 44.25884…
$ .mar.resid      <dbl> -2.1236410, -0.3100360, -2.1298323, -0.34531…
$ .mar.fitted     <dbl> 1.265718, 17.358939, 36.495706, 45.194867, 4…
$ .n              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

Now, we use the lineup() function from nullabor (Buja et al. 2009) to generate the lineup data. lineup() will randomly insert the observed (true) data into the samples data, “encrypt” the position of the observed data, and print a message that you can later decrypt in the console.

lineup_data <- lineup(true = hlm_resid(dialyzer_mod), n = 19, samples = sim_resids$replicates)
Rows: 2,800
Columns: 15
$ id              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ rate            <dbl> -0.8579226, 17.0489029, 34.3658738, 44.84955…
$ pressure        <dbl> 0.240, 0.505, 0.995, 1.485, 2.020, 2.495, 2.…
$ `I(pressure^2)` <I<dbl>>   0.0576, 0.255025, 0.990025, 2.205225,  …
$ `I(pressure^3)` <I<dbl>>     0.013824,  0.128787625,  0.985074875,…
$ `I(pressure^4)` <I<dbl>>   0.00331776, 0.065037...., 0.980149....,…
$ QB              <fct> 200, 200, 200, 200, 200, 200, 200, 200, 200,…
$ Subject         <ord> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3,…
$ .resid          <dbl> -1.19758632, 0.65600425, -0.90104247, 1.3913…
$ .fitted         <dbl> 0.3396637, 16.3928987, 35.2669162, 43.458244…
$ .ls.resid       <dbl> -0.51757746, 1.23968783, -1.32231163, 0.5907…
$ .ls.fitted      <dbl> -0.3403452, 15.8092151, 35.6881854, 44.25884…
$ .mar.resid      <dbl> -2.1236410, -0.3100360, -2.1298323, -0.34531…
$ .mar.fitted     <dbl> 1.265718, 17.358939, 36.495706, 45.194867, 4…
$ .sample         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

With the lineup data in hand, we can create a lineup of residual plots using facet_wrap():

ggplot(lineup_data, aes(x = pressure, y = .resid)) +
  geom_hline(yintercept = 0, color = "gray60") +
  geom_point(shape = 1) +
  facet_wrap(~.sample) +
  theme_bw() +
  labs(x = "Transmembrane pressure (dmHg)", y = "Residuals (ml/hr)")