Supplementary materials are available in addition to this article; they can be downloaded at RJ-2023-062.zip.
E. Alfaro, M. Gámez and N. García. adabag: An R package for classification with boosting and bagging. Journal of Statistical Software, 54(2): 1–35, 2013. URL http://www.jstatsoft.org/v54/i02/.
A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58: 82–115, 2020.
M. Banerjee, Y. Ding and A.-M. Noone. Identifying representative trees from ensembles. Statistics in Medicine, 31(15): 1601–1616, 2012. DOI 10.1002/sim.4492.
L. Breiman. Bagging predictors. Machine Learning, 24(2): 123–140, 1996a.
L. Breiman. Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6): 2350–2383, 1996b. DOI 10.1214/aos/1032181158.
L. Breiman. Random forests. Machine Learning, 45(1): 5–32, 2001. DOI 10.1023/A:1010933404324.
L. Breiman, J. Friedman, C. J. Stone and R. A. Olshen. Classification and regression trees. Belmont: Wadsworth International Group, 1984.
B. Briand, G. R. Ducharme, V. Parache and C. Mercat-Rommens. A similarity measure to assess the stability of classification trees. Computational Statistics & Data Analysis, 53(4): 1208–1217, 2009. DOI 10.1016/j.csda.2008.10.033.
H. Chipman, E. George and R. McCulloch. Making sense of a forest of trees. In Computing science and statistics: Proceedings of the 30th symposium on the interface, Ed. S. Weisberg, pages 84–92, 1998. Fairfax, VA: Interface Foundation of North America.
D. Dua and C. Graff. UCI machine learning repository. 2017. URL http://archive.ics.uci.edu/ml.
E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban. The five factor model of personality and evaluation of drug consumption risk. In Data science, pages 231–242, 2017. Springer. DOI 10.1037/10140-001.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119–139, 1997.
T. J. Hastie, R. J. Tibshirani and J. H. Friedman. The elements of statistical learning: Data mining, inference, and prediction. New York: Springer, 2009. DOI 10.1007/978-0-387-84858-7.
K. Hornik, C. Buchta and A. Zeileis. Open-source machine learning: R meets Weka. Computational Statistics, 24(2): 225–232, 2009. DOI 10.1007/s00180-008-0119-7.
T. Hothorn, K. Hornik and A. Zeileis. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3): 651–674, 2006. DOI 10.1198/106186006X133933.
T. Hothorn and A. Zeileis. partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research, 16: 3905–3909, 2015. URL http://jmlr.org/papers/v16/hothorn15a.html.
L. Kaufman and P. J. Rousseeuw. Finding groups in data: An introduction to cluster analysis. Hoboken: John Wiley & Sons, 1990. DOI 10.1002/9780470316801.
G. W. Leibniz. Nouveaux essais sur l'entendement humain, Livre IV, Chapitre XVII. 1764.
A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3): 18–22, 2002. URL https://CRAN.R-project.org/doc/Rnews/.
S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal and S.-I. Lee. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1): 56–67, 2020.
M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert and K. Hornik. cluster: Cluster analysis basics and extensions. 2019.
R. R. McCrae and P. T. Costa. A contemplated revision of the NEO five-factor inventory. Personality and Individual Differences, 36(3): 587–596, 2004. DOI 10.1016/s0191-8869(03)00118-1.
J. H. Patton, M. S. Stanford and E. S. Barratt. Factor structure of the Barratt impulsiveness scale. Journal of Clinical Psychology, 51(6): 768–774, 1995. DOI 10.1002/1097-4679(199511)51:6<768::aid-jclp2270510607>3.0.co;2-1.
B. Pfeifer, H. Baniecki, A. Saranti, P. Biecek and A. Holzinger. Multi-omics disease module detection with an explainable greedy decision forest. Scientific Reports, 12(1): 1–15, 2022.
M. Philipp, T. Rusch, K. Hornik and C. Strobl. Measuring the stability of results from supervised statistical learning. Journal of Computational and Graphical Statistics, 27(4): 685–700, 2018.
M. Philipp, A. Zeileis and C. Strobl. A toolkit for stability assessment of tree-based learners. In Proceedings of COMPSTAT 2016 – 22nd international conference on computational statistics, Eds. A. Colubi, A. Blanco and C. Gatu, pages 315–325, 2016. The International Statistical Institute/International Association for Statistical Computing. ISBN 978-90-73592-36-0.
G. Ridgeway. Generalized boosted models: A guide to the gbm package. Update, 1(1): 2007, 2007.
P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20: 53–65, 1987. DOI 10.1016/0377-0427(87)90125-7.
E. Schubert and P. J. Rousseeuw. Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS algorithms. In International conference on similarity search and applications, pages 171–187, 2019. Springer.
W. D. Shannon and D. Banks. Combining classification trees using MLE. Statistics in Medicine, 18(6): 727–740, 1999. DOI 10.1002/(sici)1097-0258(19990330)18:6<727::aid-sim61>3.3.co;2-u.
A. Sies and I. Van Mechelen. C443: A methodology to see a forest for the trees. Journal of Classification, 37: 730–753, 2020. DOI 10.1007/s00357-019-09350-4.
M. Skurichina and R. P. Duin. Bagging, boosting and the random subspace method for linear classifiers. Pattern Analysis & Applications, 5(2): 121–135, 2002.
C. Strobl, J. Malley and G. Tutz. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4): 323, 2009. DOI 10.1037/a0016973.
T. Therneau, B. Atkinson and B. Ripley. Package rpart. 2015.
P. Turney. Technical note: Bias and the quantification of stability. Machine Learning, 20(1): 23–33, 1995. DOI 10.1007/bf00993473.
S. van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3): 1–67, 2011. URL https://www.jstatsoft.org/v45/i03/.
M. N. Wright and A. Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1): 1–17, 2017. DOI 10.18637/jss.v077.i01.
M. Zuckerman, D. M. Kuhlman, J. Joireman, P. Teta and M. Kraft. A comparison of three structural models for personality: The big three, the big five, and the alternative five. Journal of Personality and Social Psychology, 65(4): 757, 1993. DOI 10.1037//0022-3514.65.4.757.