This paper introduces a comprehensive implementation, available in the new `R` package `glmtoolbox`, of Generalized Estimating Equations (GEE), a flexible statistical tool that analyzes cluster-correlated data using marginal models. In addition to providing more built-in structures for the working-correlation matrix than other GEE implementations in `R`, this implementation allows the user to: \((1)\) compute several estimates of the variance-covariance matrix of the estimators of the parameters of interest; \((2)\) compute several criteria to assist in selecting the structure of the working-correlation matrix; \((3)\) compare nested models using the Wald test as well as the generalized score test; \((4)\) assess the goodness-of-fit of the model using Pearson-, deviance- and Mahalanobis-type residuals; \((5)\) perform sensitivity analysis using the global influence approach (that is, the dfbeta statistic and Cook’s distance) as well as the local influence approach; \((6)\) use several criteria to perform variable selection with a hybrid stepwise procedure; \((7)\) fit models with nonlinear predictors; \((8)\) handle dropout-type missing data under the missing at random (MAR) assumption, rather than the missing completely at random (MCAR) assumption, by using observation-specific or cluster-specific weighted methods. The capabilities of this GEE implementation are illustrated by analyzing four real datasets obtained from longitudinal studies.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Vanegas, et al., "Generalized Estimating Equations using the new R package glmtoolbox", The R Journal, 2023

BibTeX citation

@article{RJ-2023-056,
  author = {Vanegas, L.H. and Rondón, L.M. and Paula, G.A.},
  title = {Generalized Estimating Equations using the new R package glmtoolbox},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-056},
  doi = {10.32614/RJ-2023-056},
  volume = {15},
  issue = {2},
  issn = {2073-4859},
  pages = {105-133}
}