influence.ME: Tools for Detecting Influential Data in Mixed Effects Models

influence.ME provides tools for detecting influential data in mixed effects models. The application of these models has become common practice, but the development of diagnostic tools has lagged behind. influence.ME calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS and Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. The package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested.

The application of mixed effects regression models has become common practice in the field of social sciences. As used in the social sciences, mixed effects regression models take into account that observations on individual respondents are nested within higher-level groups such as schools, classrooms, states, and countries (Snijders and Bosker, 1999), and are often referred to as multilevel regression models. Despite these models’ increasing popularity, diagnostic tools to evaluate fitted models lag behind. We introduce influence.ME (Nieuwenhuis, Pelzer, and te Grotenhuis, 2012), an R package that provides tools for detecting influential cases in mixed effects regression models estimated with lme4 (Bates and Maechler, 2010).

It is commonly accepted that tests for influential data should be performed on regression models, especially when estimates are based on a relatively small number of cases. However, most existing procedures do not account for the nesting structure of the data. As a result, these procedures fail to detect that higher-level cases may be influential on the estimates of variables measured at that same level.
In this paper, we outline the basic rationale for detecting influential data, describe standardized measures of influence, provide a practical example of the analysis of students in 23 schools, and discuss strategies for dealing with influential cases.

Testing for influential cases in mixed effects regression models is important, because influential data negatively affect the statistical fit and generalizability of the model. In social science applications of mixed models, testing for influential data is especially important, since these models are frequently based on large numbers of observations at the individual level while the number of higher-level groups is relatively small. For instance, Van der Meer, te Grotenhuis, and Pelzer (2010) were unable to find any country-level comparative studies involving more than 54 countries. With such a relatively low number of countries, a single country can easily be overly influential on the parameter estimates of one or more of the country-level variables.
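
As a rough illustration of what a standardized, group-level measure of influence looks like, DFBETAS can be sketched as the change in one parameter estimate when a single higher-level group is deleted, scaled by the standard error from the reduced model. The notation is an assumption made for this sketch and does not appear in the text above: $\hat{\gamma}_i$ is the estimate of parameter $i$ on the full data, and $\hat{\gamma}_{i(-j)}$ is the estimate after removing group $j$ (for example, one school or one country).

```latex
\mathrm{DFBETAS}_{ij} =
  \frac{\hat{\gamma}_i - \hat{\gamma}_{i(-j)}}
       {\mathrm{se}\left(\hat{\gamma}_{i(-j)}\right)}
```

Under this sketch, a large absolute value indicates that group $j$ shifts the estimate of parameter $i$ by a sizeable fraction of its standard error, which is the kind of case the measures described above are meant to flag.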

Rense Nieuwenhuis, Manfred te Grotenhuis, Ben Pelzer
2012-12-01

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Nieuwenhuis, et al., "influence.ME: Tools for Detecting Influential Data in Mixed Effects Models", The R Journal, 2012

BibTeX citation

@article{RJ-2012-011,
  author = {Nieuwenhuis, Rense and Grotenhuis, Manfred te and Pelzer, Ben},
  title = {influence.ME: Tools for Detecting Influential Data in Mixed Effects Models},
  journal = {The R Journal},
  year = {2012},
  note = {https://doi.org/10.32614/RJ-2012-011},
  doi = {10.32614/RJ-2012-011},
  volume = {4},
  issue = {2},
  issn = {2073-4859},
  pages = {38-47}
}