ggdensity: Improved Bivariate Density Visualization in R

The ggdensity R package extends the functionality of ggplot2 by providing more interpretable visualizations of bivariate density estimates using highest density regions (HDRs). The visualizations are created via drop-in replacements for the standard ggplot2 functions used for this purpose: geom_hdr() for geom_density_2d_filled() and geom_hdr_lines() for geom_density_2d(). These new geoms improve on those of ggplot2 by communicating the probabilities associated with the displayed regions. Various statistically rigorous estimators are available, as well as convenience functions geom_hdr_fun() and geom_hdr_fun_lines() for plotting HDRs of user-specified probability density functions. Associated geoms for rug plots and pointdensity scatterplots are also presented.
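To make the notion of an HDR concrete, the following base-R sketch approximates the density cutoffs that bound HDRs of a known density evaluated on a regular grid. It is an illustration under stated assumptions, not ggdensity's implementation: the helper name hdr_cutoffs(), the grid resolution, the example probabilities, and the choice of a standard bivariate normal density are all hypothetical choices made for this example.

```r
# Illustrative sketch only (not ggdensity's code): approximate HDR cutoffs
# for a density evaluated on a regular grid.
# hdr_cutoffs() is a hypothetical helper introduced for this example.
hdr_cutoffs <- function(dens, cell_area, probs = c(0.5, 0.8, 0.95, 0.99)) {
  # Visit grid cells from highest to lowest density
  d <- sort(as.vector(dens), decreasing = TRUE)
  # Approximate probability mass accumulated so far (Riemann sum)
  mass <- cumsum(d) * cell_area
  # For each probability, take the density value at which the
  # accumulated mass first reaches that probability
  vapply(probs, function(p) d[which(mass >= p)[1]], numeric(1))
}

# Assumed example density: standard bivariate normal, independent components
x <- seq(-6, 6, length.out = 401)
y <- x
dens <- outer(x, y, function(u, v) dnorm(u) * dnorm(v))
cell_area <- (x[2] - x[1]) * (y[2] - y[1])

cutoffs <- hdr_cutoffs(dens, cell_area)
print(cutoffs)
```

Each cutoff defines a region {f >= cutoff} that contains approximately the stated probability, which is the information an HDR plot attaches to its contours. For this particular density the exact cutoff for probability p is (1 - p) / (2 * pi), so the grid approximation can be checked against a closed form.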

James Otto (Baylor University), David Kahle (Baylor University)
2023-11-01

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded as RJ-2023-048.zip.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Otto & Kahle, "ggdensity: Improved Bivariate Density Visualization in R", The R Journal, 2023

BibTeX citation

@article{RJ-2023-048,
  author = {Otto, James and Kahle, David},
  title = {ggdensity: Improved Bivariate Density Visualization in R},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-048},
  doi = {10.32614/RJ-2023-048},
  volume = {15},
  issue = {2},
  issn = {2073-4859},
  pages = {220--236}
}