This paper introduces the R package bayesanova, which performs Bayesian inference in the analysis of variance (ANOVA). Traditional ANOVA based on null hypothesis significance testing (NHST) is prone to overestimating effects and to claiming effects when none are present. The Bayesian ANOVAs developed so far are based on Bayes factors (BF), which also enforce a hypothesis testing stance. Instead, the Bayesian ANOVA implemented in bayesanova focuses on effect size estimation and is based on a Gaussian mixture with known allocations, for which full posterior inference for the component parameters is implemented via Markov chain Monte Carlo (MCMC). Inference for the differences in means, standard deviations and effect sizes between each of the groups is obtained automatically. Estimation of the parameters instead of hypothesis testing is embraced via the region of practical equivalence (ROPE), and helper functions provide checks of the model assumptions and visualization of the results.
This article introduces
bayesanova, an R
package for conducting a Bayesian analysis of variance (ANOVA) via
Markov Chain Monte Carlo (MCMC) in a Gaussian mixture model. Classic
frequentist analysis of variance is based on null hypothesis
significance testing (NHST), which recently has been shown to produce
serious problems regarding the reproducibility and reliability of
scientific results
(Wasserstein and Lazar 2016; Colquhoun 2017, 2019; Benjamin et al. 2018; Wasserstein et al. 2019).
NHST is based on test statistics and the p-values derived from them.
As a solution to these problems, Bayesian methods have been proposed
recently and are since gaining popularity in a wide range of scientific
domains (Kruschke 2013, 2015; McElreath and Smaldino 2015). The Bayesian
philosophy proceeds by combining the model likelihood with a prior distribution to obtain the posterior distribution of the parameters.
The bayesanova package is designed as a Bayesian alternative to the frequentist analysis of variance. It uses a Gaussian mixture model and implements a Markov chain Monte Carlo algorithm for this model, through which full posterior inference can be obtained. This allows for explicit hypothesis testing between groups as in the frequentist ANOVA, or for estimation of parameters under uncertainty. The focus in bayesanova is on the latter perspective, and explicit hypothesis testing is avoided. While Bayesian versions of the analysis of variance have been proposed recently by (Rouder et al. 2012) and (Bergh et al. 2019), these implementations focus on the Bayes factor as a measure of evidence (Doorn et al. 2019; JASP Team 2019). As the Bayes factor suffers from multiple problems, one of which is its strong dependence on the priors used – see (Kamary et al. 2014) and (Robert 2016) – the implementation in bayesanova avoids the Bayes factor and uses a different posterior index, the region of practical equivalence (ROPE) (Kruschke 2018), which has lately been shown to have desirable properties, in particular in contrast to the Bayes factor (Makowski et al. 2019a).
The plan of the paper is as follows: The next section introduces the analysis of variance in a frequentist and Bayesian fashion and gives an overview of packages implementing these methods. The following section then introduces the novel approach implemented in bayesanova. The details of the mixture representation of the Bayesian analysis of variance are discussed, and scenarios where bayesanova is designed to be used are detailed. The section thereafter outlines the structure of the package and details the included functions. The following section presents a variety of examples and illustrations using real datasets from biomedical and psychological research as well as synthetic datasets. The last section then provides a summary of the benefits and drawbacks of the implementation, as well as future plans for the package.
In applied statistics, the one-way analysis of variance is a method
which can be used to compare means of two or more samples (typically
three). The one-way ANOVA assumes numerical (response) data in each
group and (usually) categorical input data like a group indicator in a
randomized clinical trial (RCT). Interpreting the ANOVA as a linear
model, one obtains for data
The one-way ANOVA then tests the null hypothesis
The one-way ANOVA as detailed above makes several assumptions, the most
important of which are: (1) variances of populations are equal; (2)
responses for a given group are independent and identically distributed
random variables; (3) response variable residuals are normally
distributed, that is
While Monte Carlo studies have shown that the ANOVA is quite robust to small to medium violations of these assumptions (Donaldson 1966), severe violations of assumptions (1)-(3) will result in inflated rates of false positives and thereby unreliable results (Tiku 1971).
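The inflation of false positives under heterogeneous variances can be illustrated with a small simulation (a hypothetical sketch, not taken from the cited studies; the group sizes and standard deviations are chosen for illustration):

```r
# Simulate the type I error rate of the classical F-test when the
# equal-variance assumption (1) is violated: the smallest group has a
# much larger standard deviation, a known worst case for the ANOVA.
set.seed(1)
pvals <- replicate(1000, {
  y <- c(rnorm(40, 0, 1), rnorm(40, 0, 1), rnorm(10, 0, 5))
  g <- factor(rep(1:3, times = c(40, 40, 10)))
  summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]  # p-value of the group factor
})
mean(pvals < 0.05)  # empirical rejection rate; the nominal rate would be 0.05
```

Despite all group means being equal, the empirical rejection rate clearly exceeds the nominal 5% level in this setting.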
Bayesian models for the ANOVA have been developed recently to solve some of the problems of NHST. The developed models can be categorized broadly into two approaches: The first approach relies on the Bayes factor as a measure of relative evidence and was developed by (Rouder et al. 2012). The second approach is based on MCMC algorithms like Gibbs sampling in JAGS (Plummer 2003) or Hamiltonian Monte Carlo (HMC) in Stan (Carpenter et al. 2017; Stan Development Team 2020). This approach was popularized by (Kruschke 2015). Here the region of practical equivalence (ROPE) as introduced by (Kruschke 2015) is used for measuring the evidence given the data. Also, an explicit hypothesis testing stance is avoided.
The approach of (Rouder et al. 2012) can be summarized as follows: An
independent Cauchy prior is considered
Putting a Jeffreys prior
The second approach to a Bayesian ANOVA model can be credited to (Kruschke 2015), who uses the MCMC sampler JAGS (Plummer 2003) to obtain full posterior inference in his model instead of relying on the Bayes factor. The reasons for avoiding the Bayes factor as a measure of evidence are that (1) it depends strongly on the selected prior modeling (Kamary et al. 2014); (2) the Bayes factor states only relative evidence for the alternative over the null hypothesis (or vice versa), so that even a large Bayes factor does not indicate that either of the two hypotheses is a good fit for the actual data (Kelter 2020b,a); (3) it remains within the same hypothesis-testing formalism that the pioneers of frequentist testing advocated at the time of its invention (Robert 2016; Tendeiro and Kiers 2019). In addition, the calculation of the Bayes factor for increasingly complex models can be difficult, as the above derivations of (Rouder et al. 2012) exemplify; see also (Kamary et al. 2014). Importantly, the Bayes factor assigns positive measure to a Lebesgue null set, which is puzzling from a measure-theoretic perspective; compare (Kelter 2021d), (Rao and Lovric 2016), and (Berger 1985).
(Kruschke 2015) modeled the Bayesian ANOVA for
Inference for the full posterior, that is for the parameters
Conducting a traditional analysis of variance is possible with an abundance of software, for example via the stats package, which is part of the R programming language (R Core Team 2020).
The BayesFactor package by (Morey and Rouder 2018) provides the Bayesian ANOVA Bayes factor of (Rouder et al. 2012), and various helper functions for analysis of the results.
A simple illustration of the main workflow in the BayesFactor package
is given here, using the ToothGrowth
dataset in the
datasets package
(Cannon et al. 2019). The ToothGrowth dataset contains three columns: len, the dependent variable, which is the length of a guinea pig’s tooth after treatment with vitamin C; supp, the supplement type (either orange juice or ascorbic acid); and dose, the amount of vitamin C administered.
The BayesFactor package’s core function anovaBF allows the comparison of models: it computes Bayes factors for several models at once, so that the model with the largest Bayes factor can be selected. The data are first loaded and the categorical predictors converted to factors:
R> set.seed(42)
R> library(datasets)
R> data(ToothGrowth)
R> head(ToothGrowth,n=3)
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
R> ToothGrowth$dose = factor(ToothGrowth$dose)
R> levels(ToothGrowth$dose) = c('Low', 'Medium', 'High')
Then, a Bayesian ANOVA is conducted using both predictors dose and supp as well as the interaction term dose * supp:
R> library(BayesFactor)
R> bf = anovaBF(len ~ supp * dose, data = ToothGrowth)
Bayes factor analysis
--------------
[1] supp : 1.198757 +- 0.01%
[2] dose : 4.983636e+12 +- 0%
[3] supp + dose : 2.963312e+14 +- 1.59%
[4] supp + dose + supp:dose : 8.067205e+14 +- 1.94%
Against denominator:
Intercept only
---
Bayes factor type: BFlinearModel, JZS
The results are shown in form of the Jeffreys-Zellner-Siow (JZS) Bayes factor. As the Bayes factor of the model including supp, dose and the interaction supp:dose is largest, the Bayesian ANOVA favours this model over the null model which includes only the intercept. Thus, as there are the low, medium and high dose groups and the two supplement groups, in total one obtains six groups.
Note that this solution is also implemented in the open-source software JASP; for an introduction see (Bergh et al. 2019).
The Bayesian ANOVA model of (Kruschke 2015) has not been implemented in a standalone software package so far. Instead, users have to write their own model scripts for JAGS (Plummer 2003) to run the analysis. Recently, however, the package BANOVA was published by (Dong and Wedel 2019), which uses JAGS (Plummer 2003) and the Hamiltonian Monte Carlo (HMC) sampler Stan (Carpenter et al. 2017) via the package RStan (Stan Development Team 2020) to provide similar inferences without the need to code the JAGS or Stan models oneself.
Note that in the above example, a traditional ANOVA can easily be fit via
R> summary(aov(len ~ supp * dose, data = ToothGrowth))
Df Sum Sq Mean Sq F value Pr(>F)
supp 1 205.4 205.4 15.572 0.000231 ***
dose 2 2426.4 1213.2 92.000 < 2e-16 ***
supp:dose 2 108.3 54.2 4.107 0.021860 *
Residuals 54 712.1 13.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
which yields similar results, favouring the full model with both predictors and interaction term, as both predictors and the interaction term are significant.
The method used in the bayesanova package is based on estimation of
parameters in a Gaussian mixture distribution. On this mixture a Gibbs
sampling algorithm is applied to produce posterior distributions of all
unknown parameters given the data in the Gaussian components, that is
for
Consider a population made up of
subgroups, mixed at random in proportion to the relative group sizes . Assume interest lies in some random feature which is heterogeneous across and homogeneous within the subgroups. Due to heterogeneity, has a different probability distribution in each group, usually assumed to arise from the same parametric family however, with the parameter differing across the groups. The groups may be labeled through a discrete indicator variable taking values in the set . When sampling randomly from such a population, we may record not only , but also the group indicator . The probability of sampling from the group labeled is equal to , whereas conditional on knowing , is a random variable following the distribution with being the parameter in group . (...) The marginal density is obviously given by the following mixture density
Clearly, this resembles the situation of the analysis of variance, in
which the allocations
If unbalanced groups are the goal, weights can be adjusted accordingly,
for example
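Concretely, sampling from such a mixture with weights proportional to the group sizes can be sketched as follows (all numeric values are assumed for illustration):

```r
# Draw the latent allocations S with probabilities eta, then draw each
# response Y from the Gaussian component selected by its allocation.
set.seed(42)
n_k <- c(30, 50, 20)                  # hypothetical group sizes
eta <- n_k / sum(n_k)                 # weights proportional to relative sizes
mu  <- c(0, 2, 5)                     # component means
s   <- c(1, 1, 1)                     # component standard deviations
S <- sample(1:3, 500, replace = TRUE, prob = eta)  # allocations
Y <- rnorm(500, mean = mu[S], sd = s[S])           # responses
# Marginally, Y follows the mixture density sum_k eta_k * N(mu_k, s_k^2)
table(S) / 500                        # empirical weights approximate eta
```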
The Bayesian ANOVA model based on Gaussian mixtures is summarized in Figure 1 using the three-group case as an example:
The measured variables
Instead of relying on the Bayes factor, the bayesanova package follows
the approach of (Kruschke 2018) to use a region of practical
equivalence (ROPE). The effect size
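The ROPE idea can be sketched in a few lines: one computes the share of posterior draws of the effect size that fall inside an interval of practically negligible values. The interval bounds and the mock posterior draws below are assumptions for illustration, not package defaults:

```r
# Share of posterior probability mass inside the ROPE around zero.
set.seed(1)
delta <- rnorm(10000, mean = 0.05, sd = 0.10)  # mock posterior effect-size draws
rope  <- c(-0.1, 0.1)                          # assumed ROPE for a negligible effect
p_rope <- mean(delta >= rope[1] & delta <= rope[2])
p_rope  # large values support practical equivalence to zero
```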
Table 1 provides an overview about the four ANOVA models and their purpose.
| Model | Purpose | Evidence measure | Computational aspects |
|---|---|---|---|
| Frequentist ANOVA | Testing the global hypothesis that all samples are drawn from populations with identical means against the alternative | F-statistic and p-value | Analytic solution |
| Bayesian ANOVA of (Rouder et al. 2012) | Testing the global hypothesis that the effect size vector is zero versus the alternative | Bayes factor | Numerical integration required |
| Bayesian ANOVA of (Kruschke 2015) | Estimation of effect sizes between groups via the ROPE and 95% HPD | ROPE | Gibbs sampling in the MCMC sampler JAGS (or Stan) required |
| Bayesian ANOVA based on Gaussian mixtures | Estimation of effect sizes between groups via the ROPE and posterior probability mass | ROPE | Gibbs sampling without requiring the MCMC samplers JAGS or Stan |
Although it appears that the model of (Kruschke 2015) and the Gaussian
mixture modeling approach proposed in this paper have the same purpose,
they differ in how data
The bayesanova package has four functions. These provide (1) the MCMC algorithm for conducting the Bayesian ANOVA in the Gaussian mixture model with known allocations, detailed above, (2) checks of the model assumptions and (3) visualizations of the posterior results for easy interpretation and communication of research results. Visualizations of the posterior mixture components in comparison with the original data are provided by the fourth function. An overview is provided in Table 2.
| Function | Description |
|---|---|
| bayes.anova | Main function of the package; conducts the MCMC algorithm to provide full posterior inference in the three-component Gaussian mixture model |
| assumption.check | Helper function for checking the assumption of normality in each group prior to running a Bayesian ANOVA |
| anovaplot | Provides multiple visualizations of the results, including posterior distributions, differences in means and standard deviations, and effect sizes, as well as a full ROPE-analysis |
| post.pred.check | Provides a posterior predictive check for a fitted Bayesian ANOVA model |
The core function is bayes.anova
, which provides the MCMC algorithm to
obtain full posterior inference in a
the means
the standard deviations
The details of the Gibbs sampler can be found in (Kelter 2020c, 2021a), and the validity of the method follows from standard MCMC theory, see for example (Robert and Casella 2004).
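To convey the flavour of such a sampler, the following is a minimal sketch of conjugate Gibbs updates for a single Gaussian component (normal prior on the mean, inverse-gamma prior on the variance). The data and hyperparameters are assumed for illustration; this is not the package's exact implementation, which is described in the cited references:

```r
# Alternate between the two full conditionals: mu | sigma2, y is normal,
# and sigma2 | mu, y is inverse-gamma (sampled via 1/rgamma).
set.seed(42)
y <- rnorm(30, mean = 2, sd = 1)      # mock data in one group
n <- length(y)
mu0 <- 0; tau0_sq <- 100              # assumed prior hyperparameters
a0 <- 0.5; b0 <- 0.5
mu_cur <- mean(y); sigma2_cur <- var(y)
draws <- matrix(NA_real_, nrow = 1000, ncol = 2,
                dimnames = list(NULL, c("mu", "sigma2")))
for (t in 1:1000) {
  prec   <- n / sigma2_cur + 1 / tau0_sq               # posterior precision
  m      <- (sum(y) / sigma2_cur + mu0 / tau0_sq) / prec
  mu_cur <- rnorm(1, m, sqrt(1 / prec))                # update the mean
  sigma2_cur <- 1 / rgamma(1, a0 + n / 2,              # update the variance
                           b0 + sum((y - mu_cur)^2) / 2)
  draws[t, ] <- c(mu_cur, sigma2_cur)
}
colMeans(draws)  # posterior means concentrate near the data-generating values
```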
The bayes.anova
function takes as input three numerical vectors
first
, second
and third
, which correspond to the observed
responses in each of the three groups and provides multiple optional
parameters:
bayes.anova(n=10000, first, second, third,
fourth = NULL, fifth = NULL, sixth = NULL,
hyperpars="custom", burnin=n/2, sd="sd", q=0.1, ci=0.95)
The vectors first, second and third are the only mandatory input values, and currently six groups are the limit bayesanova supports. More than three groups can be handed to the function by providing numerical vectors for the parameters fourth, fifth and sixth.
If no other parameters are provided, the function chooses a default of n=10000 Gibbs sampling iterations, where the burn-in of the Markov chains is set to burnin=n/2, so that the first 5000 iterations are discarded. The default setting sd="sd" provides inference for the standard deviations, while sd="var" provides inference for the variances instead. The credible level for all computed credible intervals defaults to ci=0.95. The two remaining parameters hyperpars and q define preselected values for the hyperparameters in the prior, to ensure weakly informative priors are used which influence the analysis as little as possible. For details, see (Kelter 2020c, 2021a), but in general these values apply to a broad range of contexts, so that changing them is not recommended. Note that another set of hyperparameters based on (Raftery 1996) can be selected via hyperpars="rafterys", if desired.
After execution, the function returns a dataframe including four Markov chains for each parameter, each of size n-burnin, to make subsequent convergence assessment or post-processing of the MCMC results possible.
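As a sketch of such post-processing, a basic Gelman-Rubin shrink factor can be computed in base R from four chains of one parameter. The mock draws below merely stand in for a column block of the returned dataframe:

```r
# Potential scale reduction factor from m chains of length n each.
set.seed(1)
chains <- replicate(4, rnorm(500, mean = 10, sd = 1))  # 500 draws x 4 chains
n <- nrow(chains); m <- ncol(chains)
W <- mean(apply(chains, 2, var))        # mean within-chain variance
B <- n * var(colMeans(chains))          # between-chain variance
var_hat <- (n - 1) / n * W + B / n      # pooled variance estimate
R_hat <- sqrt(var_hat / W)              # values close to 1 indicate convergence
R_hat
```

In practice, packages such as coda provide this and further diagnostics out of the box.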
The second function is assumption.check
. This function runs a
preliminary assumption check on the data, which is recommended before
running a Bayesian ANOVA. The model assumptions are normality in each
mixture component, so that the assumption.check
function runs
Shapiro-Wilk tests to check for normality (Shapiro and Wilk 1965). The
input parameters are the three numerical vectors x1
, x2
and x3
including the observed responses in the first, second and third group,
and the desired confidence level conf.level
for the Shapiro-Wilk
tests:
assumption.check(x1, x2, x3, x4 = NULL, x5 = NULL, x6 = NULL, conf.level=0.95)
The default confidence level is 0.95
. More than three groups can
easily be added by providing values for x4
, x5
and x6
.
The third function is anovaplot, which provides a variety of visualizations of the results. The function takes as input a dataframe dataframe, which should be the result of the bayes.anova function detailed above, a parameter type, which indicates which visualization is desired, a parameter sd, which indicates whether the provided dataframe includes posterior draws of the standard deviations or the variances, and a parameter ci, which again defines the credible level used in the computations.
anovaplot(dataframe, type="rope", sd="sd", ci=0.95)
The default value for sd is "sd", and the default credible level is 0.95. The type parameter takes one of four possible values: (1) type="pars", (2) type="diff", (3) type="effect" and (4) type="rope". In the first case, posterior distributions of all model parameters are produced, complemented by convergence diagnostics in form of trace plots, autocorrelation plots and the Gelman-Brooks-Rubin shrink factor (Gelman and Brooks 1998), which should be close to one to indicate convergence to the posterior. In the second case, the posterior distributions of the differences in means and standard deviations are produced (differences in variances instead, if sd="var" and the dataframe includes posterior draws of the variances). In the third case, the posterior distributions of the effect sizes are shown, and in the fourth case a full ROPE-analysis is produced.
The last function post.pred.check
provides a posterior predictive
check for a fitted Bayesian ANOVA model against the original data, which
is routine in a Bayesian workflow (Gabry et al. 2019).
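The underlying density-overlay idea can be sketched generically as follows (mock data and posterior draws; post.pred.check() performs the analogous steps for a fitted model automatically):

```r
# Overlay densities of replicated datasets drawn from the posterior
# predictive distribution on the density of the observed data.
set.seed(1)
y_obs    <- rnorm(200, mean = 5, sd = 2)    # mock observed data
mu_draws <- rnorm(50, mean = 5, sd = 0.15)  # mock posterior draws of the mean
sd_draws <- abs(rnorm(50, mean = 2, sd = 0.10))
plot(density(y_obs), lwd = 2, main = "Posterior predictive check")
for (r in 1:50) {
  y_rep <- rnorm(length(y_obs), mu_draws[r], sd_draws[r])  # replicated data
  lines(density(y_rep), col = rgb(0, 0, 1, alpha = 0.15))
}
```

If the replicated densities envelop the observed density, the model fit is plausible; systematic discrepancies point to model misfit.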
This section provides illustrations and a variety of examples in which the bayesanova package can be used and provides richer information than existing solutions.
The guinea pig dataset from above is used as a first example. The data
are included in the dataset ToothGrowth
in the datasets package
which is part of R. First, data is loaded and split into three groups,
corresponding to a low, medium and high administered vitamin C dose:
R> library(datasets)
R> data(ToothGrowth)
R> head(ToothGrowth,n=3)
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
R> library(dplyr)
R> library(tidyr)
R> library(bayesanova)
R> grp1 = (ToothGrowth %>% filter(dose==0.5) %>% select(len))$len
R> grp2 = (ToothGrowth %>% filter(dose==1.0) %>% select(len))$len
R> grp3 = (ToothGrowth %>% filter(dose==2.0) %>% select(len))$len
Next, we run the assumption checks on the data:
R> assumption.check(grp1, grp2, grp3, conf.level=0.95)
Model assumptions checked. No significant deviations from normality detected.
Bayesian ANOVA can be run safely.
Figure 2: Assumption check for the ToothGrowth dataset using the assumption.check() function in bayesanova, showing that the data in the three groups can be assumed normally distributed, so that running the Bayesian ANOVA based on the Gaussian mixture model is justified.

Figure 2 shows the histograms and quantile-quantile plots for all three groups produced by assumption.check(). Clearly, there are no large deviations, and no Shapiro-Wilk test was significant at the specified confidence level.
Next, the Bayesian ANOVA can be run via the bayes.anova function. Here, the default parameter values are used, yielding 5000 posterior draws after burn-in:
R> set.seed(42)
R> res = bayes.anova(first = grp1, second = grp2, third = grp3)
|Parameter |LQ |Mean |UQ |Std.Err |
|:-------------|:-----|:-----|:-----|:-------|
|mu1 |8.69 |10.61 |12.5 |0.91 |
|mu2 |18.05 |19.75 |21.46 |0.84 |
|mu3 |24.94 |26.1 |27.25 |0.57 |
|sigma1 |3.02 |4.07 |5.67 |0.67 |
|sigma2 |2.95 |3.96 |5.43 |0.64 |
|sigma3 |2.43 |3.25 |4.42 |0.52 |
|mu2-mu1 |6.7 |9.15 |11.7 |1.25 |
|mu3-mu1 |13.42 |15.49 |17.67 |1.06 |
|mu3-mu2 |4.36 |6.34 |8.38 |1.01 |
|sigma2-sigma1 |-2.02 |-0.11 |1.68 |0.93 |
|sigma3-sigma1 |-2.62 |-0.81 |0.74 |0.85 |
|sigma3-sigma2 |-2.46 |-0.71 |0.85 |0.82 |
|delta12 |-5.77 |-4.59 |-3.21 |0.65 |
|delta13 |-9.37 |-8.14 |-6.63 |0.71 |
|delta23 |-4.36 |-3.36 |-2.19 |0.56 |
The results table shows the lower and upper quantiles of the posterior distributions, corresponding to the credible level ci chosen above. Also, the posterior mean and standard error are given for each parameter, difference of parameters and effect size. The results clearly show that there are huge differences between the groups: For example, one can immediately spot that the more vitamin C given, the more tooth growth can be observed via tooth lengths. While the first group (low dose) has a posterior mean of 10.61, the second (medium dose) and third (high dose) groups have posterior means of 19.75 and 26.1, respectively.
Note that the information provided is much more fine-grained than in the solutions via the traditional ANOVA and the Jeffreys-Zellner-Siow based Bayes-factor ANOVA above. While in these two solutions one could only infer that the model using both predictors and the interaction term is the best, now we are given precise estimates of the effect sizes between each group defined by the dose of vitamin C administered. Note also that including the second predictor supp is no problem, leading to a setting which incorporates six groups in the mixture.
The second example is from the biomedical sciences and uses the heart
rate data from (Moore et al. 2012). In the study, heart rates of female and
male runners and generally sedentary participants (not regularly
running) following six minutes of exercise were recorded. The
participant’s Gender and Heart.Rate are given, as well as the group he or she belongs to (Group=="Runners" or Group=="Control"). In the study, 800 participants were recruited, so that each of the four groups given by the combinations of Gender and Group contains 200 subjects.
Therefore, the situation requires a
R> library(dplyr)
R> hr=read.csv("heartrate.csv",sep=",")
R> head(hr)
Gender Group Heart.Rate
1 Female Runners 119
2 Female Runners 84
3 Female Runners 89
4 Female Runners 119
5 Female Runners 127
6 Female Runners 111
R> femaleRunners = (hr %>% filter(Gender=="Female")
+ %>% filter(Group=="Runners")
+ %>% select(Heart.Rate))$Heart.Rate
R> maleRunners = (hr %>% filter(Gender=="Male") %>% filter(Group=="Runners")
+ %>% select(Heart.Rate))$Heart.Rate
R> femaleControl = (hr %>% filter(Gender=="Female")
+ %>% filter(Group=="Control")
+ %>% select(Heart.Rate))$Heart.Rate
R> maleControl = (hr %>% filter(Gender=="Male") %>% filter(Group=="Control")
+ %>% select(Heart.Rate))$Heart.Rate
Then, we check the model assumptions:
R> assumption.check(femaleRunners, maleRunners, femaleControl, maleControl)
We can thus safely proceed running the Bayesian ANOVA:
R> set.seed(42)
R> resRunners = bayes.anova(first = femaleRunners, second = maleRunners,
+ third = femaleControl, fourth = maleControl)
|Parameter |LQ |Mean |UQ |Std.Err |
|:-------------|:------|:------|:------|:-------|
|mu1 |113.48 |116 |118.5 |1.27 |
|mu2 |102.51 |103.98 |105.55 |0.76 |
|mu3 |145.44 |148.04 |150.52 |1.3 |
|mu4 |127.12 |130.01 |132.82 |1.47 |
|sigma1 |14.38 |15.87 |17.51 |0.8 |
|sigma2 |11.21 |12.35 |13.67 |0.63 |
|sigma3 |14.71 |16.19 |17.85 |0.82 |
|sigma4 |15.46 |17.02 |18.79 |0.85 |
|mu2-mu1 |-14.9 |-12.01 |-9.06 |1.48 |
|mu3-mu1 |28.47 |32.04 |35.6 |1.83 |
|mu4-mu1 |10.19 |14.01 |17.9 |1.96 |
|mu3-mu2 |41.12 |44.05 |46.95 |1.51 |
|mu4-mu2 |22.83 |26.02 |29.21 |1.66 |
|mu4-mu3 |-21.8 |-18.03 |-14.4 |1.92 |
|sigma2-sigma1 |-5.6 |-3.52 |-1.57 |1.02 |
|sigma3-sigma1 |-1.94 |0.32 |2.53 |1.15 |
|sigma4-sigma1 |-1.14 |1.15 |3.51 |1.18 |
|sigma3-sigma2 |1.83 |3.84 |5.85 |1.03 |
|sigma4-sigma2 |2.7 |4.67 |6.8 |1.05 |
|sigma4-sigma3 |-1.48 |0.83 |3.13 |1.17 |
|delta12 |2.4 |3.2 |3.96 |0.4 |
|delta13 |-8.92 |-8.01 |-7.05 |0.48 |
|delta14 |-4.42 |-3.46 |-2.5 |0.49 |
|delta23 |-12.55 |-11.67 |-10.77 |0.45 |
|delta24 |-7.65 |-6.79 |-5.91 |0.45 |
|delta34 |3.52 |4.43 |5.37 |0.48 |
The results reveal multiple insights now. To support the interpretation,
we first produce visualisations of the results via the anovaplot()
function:
R> anovaplot(resRunners)
Figure 3 shows the plots produced by the above call to anovaplot().
Figure 3: Results for the heart rate dataset using the anovaplot() function in bayesanova, showing (1) the resulting posterior distributions of the effect sizes between each pair of groups (first and third row) and (2) the posterior ROPE-analysis for each group comparison (second and fourth row).
The first row shows the posterior distributions of the effect sizes
Thus, we can see that for
To check if this effect exists also in the control groups, we compare
the posterior of
To check the model fit, we use the post.pred.check
function, which
performs a posterior predictive check against the observed data by
drawing reps
samples from the posterior distribution and visualizing
the original data’s density with density overlays for the reps sampled
posterior predictive densities of the data:
R> post.pred.check(anovafit = resRunners, ngroups = 4, out = hr$Heart.Rate,
+                  reps = 50, eta = c(1/4,1/4,1/4,1/4))
The argument anovafit
takes the resulting dataframe of the
bayes.anova
function as input, the number of groups is specified in
ngroups
, out
is the vector of all data originally observed (no
matter which group), reps
is the number of posterior predictive
density overlays desired, and eta
is the vector of weights used in the
Gaussian mixture. Here, as all four groups include 200 participants, each weight is 1/4.
Figure 4: Posterior predictive checks obtained via the post.pred.check function; left: for the runners dataset; right: for the feelings dataset. In both cases, the results show that the overall fit of the Gaussian mixture model is reasonable.
This example uses data from a study conducted by (Balzarini et al. 2017), in which men's and women's feelings towards their partners after watching either erotic or abstract art pictures were analysed. The study was published in the Journal of Experimental Social Psychology. The average pleasantness obtained from viewing the pictures was also studied, as one of the research questions was whether men and women rate the pleasantness of the pictures differently for nude and abstract art. This leads to a setting with four groups defined by the combinations of Gender and Condition in the dataframe.
First, data is loaded and split into the four groups of interest:
R> feelings=read.csv("feelings.csv",sep=",")
R> head(feelings)
Gender Age RelLen Condition PartnerAttractiveness
1 Male 43 3.7500 Nudes 21
2 Female 26 3.0000 Nudes 19
3 Female 35 5.2500 Abstract Art 27
4 Female 31 2.0000 Abstract Art 22
5 Female 23 4.0000 Abstract Art 27
6 Male 36 19.9167 Nudes 16
LoveForPartner AveragePleasantness
1 76 5.9375
2 66 4.7500
3 103 6.2500
4 76 5.5625
5 109 2.3750
6 98 5.1250
R> femaleArtistic = (feelings %>% filter(Gender=="Female") %>%
+ filter(Condition=="Abstract Art"))$AveragePleasantness
R> maleArtistic = (feelings %>% filter(Gender=="Male") %>%
+ filter(Condition=="Abstract Art"))$AveragePleasantness
R> femaleNude = (feelings %>% filter(Gender=="Female") %>%
+ filter(Condition=="Nudes"))$AveragePleasantness
R> maleNude = (feelings %>% filter(Gender=="Male") %>%
+ filter(Condition=="Nudes"))$AveragePleasantness
Second, the model assumption of normality in each group is checked:
R> assumption.check(femaleArtistic, maleArtistic, femaleNude, maleNude)
1: In assumption.check(femaleArtistic, maleArtistic, femaleNude, maleNude) :
Model assumption of normally distributed data in each group is violated.
All results of the Bayesian ANOVA based on a Gaussian mixture
could therefore be unreliable and not trustworthy.
2: In assumption.check(femaleArtistic, maleArtistic, femaleNude, maleNude) :
Run further diagnostics (like Quantile-Quantile-plots) to check if the
Bayesian ANOVA can be expected to be robust to the violations of normality
This time the function gives a warning that there are violations of the distributional assumptions. Investigating the results leads to the conclusion that data in the fourth group deviate from normality, as shown in the QQ-plot in Figure 5. Still, as all other groups show no strong deviations from normality, we proceed and remain cautious when drawing inferences involving the fourth group.
Figure 5: Assumption check for the feelings dataset, showing that the assumption of normality is violated.

Keeping this in mind, the Bayesian ANOVA is now run with default hyperparameters:
R> set.seed(42)
R> resFeelings = bayes.anova(first = femaleArtistic, second = maleArtistic,
+ third = femaleNude, fourth = maleNude)
|Parameter |LQ |Mean |UQ |Std.Err |
|:-------------|:-----|:-----|:-----|:-------|
|mu1 |4.86 |4.9 |4.95 |0.02 |
|mu2 |4.62 |4.66 |4.69 |0.02 |
|mu3 |4.07 |4.2 |4.34 |0.07 |
|mu4 |5.42 |5.47 |5.53 |0.03 |
|sigma1 |0.98 |1.16 |1.4 |0.11 |
|sigma2 |0.86 |1.02 |1.21 |0.09 |
|sigma3 |1.34 |1.66 |2.06 |0.19 |
|sigma4 |1.06 |1.26 |1.52 |0.12 |
|mu2-mu1 |-0.31 |-0.25 |-0.19 |0.03 |
|mu3-mu1 |-0.85 |-0.7 |-0.56 |0.07 |
|mu4-mu1 |0.5 |0.57 |0.64 |0.04 |
|mu3-mu2 |-0.6 |-0.46 |-0.32 |0.07 |
|mu4-mu2 |0.75 |0.81 |0.87 |0.03 |
|mu4-mu3 |1.12 |1.27 |1.41 |0.07 |
|sigma2-sigma1 |-0.43 |-0.14 |0.12 |0.14 |
|sigma3-sigma1 |0.1 |0.49 |0.95 |0.21 |
|sigma4-sigma1 |-0.21 |0.1 |0.42 |0.16 |
|sigma3-sigma2 |0.27 |0.64 |1.07 |0.21 |
|sigma4-sigma2 |-0.04 |0.24 |0.55 |0.15 |
|sigma4-sigma3 |-0.84 |-0.39 |0.01 |0.22 |
|delta12 |0.18 |0.24 |0.29 |0.03 |
|delta13 |0.47 |0.6 |0.73 |0.07 |
|delta14 |-0.58 |-0.52 |-0.44 |0.04 |
|delta23 |0.27 |0.41 |0.53 |0.06 |
|delta24 |-0.84 |-0.76 |-0.69 |0.04 |
|delta34 |-1.2 |-1.07 |-0.91 |0.07 |
The results show that the differences are now much more subtle than in the previous examples. From the results one can spot that the means in the first three groups are located closer to each other, and that the fourth group differs more strongly from the first three. The standard deviations do not differ much between groups, and the magnitude of the posterior effect sizes is now smaller, too. To investigate the effect sizes, visualisations are produced first:
R> anovaplot(resFeelings)
Figure 6: Results for the feelings dataset using the anovaplot() function in bayesanova, showing which effects are most probable a posteriori based on a ROPE-analysis for each pair of groups.

Figure 6 shows the plots produced by the above call to anovaplot(). The two left plots show that with
The middle two plots in 6 show the effect between the
female artistic and female nude picture groups. We can see that based on
the posterior distribution of
The right two plots in 6 show the effect between the
male artistic and female nude groups. The posterior reveals that
Figure 7 shows the effects which include the fourth group. Due to the violations of distributional assumptions one needs to be cautious here, as the results could be distorted. Still, the two right plots show the effect size between the female and male nude groups, and indicate that the full posterior (100%) signals a large negative effect. This means that males rate the pleasantness of nude pictures much higher than females. However, the result (as well as the results for
The posterior predictive check in the right plot of Figure 4, obtained via
post.pred.check(anovafit = resFeelings, ngroups = 4, out = feelings$AveragePleasantness,
reps = 100, eta = c(58/223,64/223,41/223,60/223))
shows that the overall fit seems reasonable, although there is some room for improvement in the range of average pleasantness ratings between zero and two, and in the peak between average pleasantness ratings of four and six. Subdividing the data even further and refitting the ANOVA model with a higher number of components would be an option to improve the fit. Alternatively, one could discuss the prior hyperparameters chosen here.
Figure 7: Results for the feelings dataset using the anovaplot() function in bayesanova, showing which effects are most probable a posteriori based on a ROPE-analysis for each pair of groups.

A solution via a traditional ANOVA in this case would yield:
R> summary(aov(AveragePleasantness ~ Gender * Condition, data = feelings))
Df Sum Sq Mean Sq F value Pr(>F)
Gender 1 10.63 10.629 7.605 0.00631 **
Condition 1 1.27 1.267 0.906 0.34210
Gender:Condition 1 31.15 31.155 22.291 4.18e-06 ***
Residuals 219 306.09 1.398
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, the condition is not significant, but the interaction is. The above analysis via the Bayesian mixture ANOVA made this more explicit: the posteriors for each combination of gender and condition were derived via MCMC, leading for example to the conclusion that females rate artistic pictures as more pleasant than nude pictures with
A solution via a Bayes factor based ANOVA would yield:
R> library(BayesFactor)
R> set.seed(42)
R> feelings$Gender = factor(feelings$Gender)
R> feelings$Condition = factor(feelings$Condition)
R> bfFeelings = anovaBF(AveragePleasantness ~ Gender * Condition, data = feelings)
R> bfFeelings
Bayes factor analysis
--------------
[1] Gender : 3.727898 +- 0%
[2] Condition : 0.2532455 +- 0.01%
[3] Gender + Condition : 0.822604 +- 1.01%
[4] Gender + Condition + Gender:Condition : 3048.134 +- 1.14%
Against denominator:
Intercept only
---
Bayes factor type: BFlinearModel, JZS
The conclusion drawn in this case is that the model including gender, the condition and the interaction between both is the most favourable, due to its huge Bayes factor of about 3048 against the intercept-only model.
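Since each Bayes factor above is stated against the intercept-only model, the printed values can be converted into posterior model probabilities under equal prior model odds. A small base-R computation using the values from the output:

```r
# Posterior model probabilities from the Bayes factors printed above,
# assuming equal prior probabilities for all five models (the
# intercept-only denominator model has a Bayes factor of 1 against itself).
bf <- c(intercept    = 1,
        gender       = 3.727898,
        condition    = 0.2532455,
        main_effects = 0.822604,
        full         = 3048.134)
post_prob <- bf / sum(bf)
round(post_prob, 4)
# the full model including the interaction receives nearly all posterior mass
```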
This example uses data from medical research about Alzheimer’s disease. Amyloid-beta (Abeta) is a protein fragment which has been linked frequently to Alzheimer’s disease. Autopsies from a sample of Catholic priests included measurements of Abeta (pmol/g tissue from the posterior cingulate cortex) from three groups: subjects who had exhibited no cognitive impairment before death, subjects who had exhibited mild cognitive impairment, and subjects who had mild to moderate Alzheimer’s disease. The original study results were published by (Pivtoraiko et al. 2015) in the journal Neurobiology of Aging and are used here.
The Amyloid dataset is available in the Stat2Data package (Cannon et al. 2019) and includes a group indicator Group, which takes one of three values: mAD, which classifies a subject as having had mild Alzheimer's disease; MCI, which is a mild cognitive impairment; and NCI, which is no cognitive impairment. Also, the amount of Amyloid-beta from the posterior cingulate cortex is given in pmol per gram tissue in the variable Abeta.
[Figure 8: Assumption check for the Amyloid dataset using the assumption.check() function in bayesanova, showing that the assumption of a Gaussian mixture model is violated.]
After loading and splitting the data into the three groups, we run the assumption.check() function:
R> library(Stat2Data)
R> data(Amyloid)
R> head(Amyloid)
Group Abeta
1 NCI 114
2 NCI 41
3 NCI 276
4 NCI 0
5 NCI 16
6 NCI 228
R> library(dplyr)
R> NCI = (Amyloid %>% filter(Group=="NCI"))$Abeta
R> MCI = (Amyloid %>% filter(Group=="MCI"))$Abeta
R> mAD = (Amyloid %>% filter(Group=="mAD"))$Abeta
R> assumption.check(NCI, MCI, mAD)
Warning messages:
1: In assumption.check(NCI, MCI, mAD) :
Model assumption of normally distributed data in each group is violated.
All results of the Bayesian ANOVA based on a multi-component Gaussian
mixture could therefore be unreliable and not trustworthy.
2: In assumption.check(NCI, MCI, mAD) :
Run further diagnostics (like Quantile-Quantile-plots) to check if the
Bayesian ANOVA can be expected to be robust to the violations of normality
The results in Figure 8 clearly show that the model assumptions are violated. Therefore, it is not recommended to run a Bayesian ANOVA in this case. A solution via a traditional ANOVA or via a Bayes factor based ANOVA would not proceed at this point either.
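Although the exact diagnostics inside assumption.check() are not shown here, a per-group normality check in the same spirit can be sketched with base R's Shapiro-Wilk test. The group sizes and the right-skewed mock data below are assumptions for illustration only:

```r
# Per-group normality check in the spirit of assumption.check()
# (illustrative sketch; the package may use different diagnostics).
set.seed(7)
groups <- list(
  NCI = rexp(55, rate = 1/150),  # mock right-skewed, Abeta-like values
  MCI = rexp(21, rate = 1/200),
  mAD = rexp(15, rate = 1/350)
)
pvals    <- sapply(groups, function(g) shapiro.test(g)$p.value)
violated <- any(pvals < 0.05)
# strongly skewed data like this is flagged as a violation of normality
```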
The next example is more in the vein of a simulation approach. We simulate three-, four-, five- and six-component Gaussian mixtures with increasing means
[Figure: Results of applying bayes.anova() to the simulated mixtures, showing that the Gibbs sampler yields consistent estimates of the underlying effect sizes.]
The results clearly show that even for
This paper introduces bayesanova, an R package for conducting a Bayesian analysis of variance based on MCMC in a Gaussian mixture distribution with known allocations. The Bayesian ANOVA implemented in bayesanova is based on Gibbs sampling and supports up to six distinct components, which covers the typical range of ANOVAs used in empirical research.
The package provides four functions to check the model assumptions, run the Bayesian ANOVA, visualize the results and check the posterior fit. All functions have a variety of optional parameters to adapt them to a specific workflow or goal. Also, convergence issues can be detected via the built-in convergence diagnostics for all MCMC results in the anovaplot() function, and it is possible to post-process the results delivered as raw Markov chain draws by bayes.anova(), for example via the R package bayestestR (Makowski et al. 2019b).
In the paper, multiple examples from medical and psychological research
using real datasets were provided, showing the richness of information
provided by the proposed procedure. Also, while explicit testing (for
example via Bayes factors) is not implemented as standard output, it is
worth noting that computing Bayes factors numerically based on the
Gaussian mixture model is possible for example by using numerical
techniques such as the Savage-Dickey density ratio
(Dickey and Lientz 1970; Verdinelli and Wasserman 1995; Wagenmakers et al. 2010; Kelter 2021c). However,
the focus on explicit hypothesis testing is replaced in the default
output of the procedure by estimation of the effect sizes between groups
(or component density parameters) under uncertainty. If hypothesis
testing is needed, the implemented ROPE can be used for rejecting a
hypothesis based on interval hypothesis tests – compare
(Kelter 2021b), (Linde et al. 2020) and (Kruschke 2018) –
or by using external packages like bayestestR (Makowski et al. 2019b) in
conjunction with the raw samples provided by bayes.anova
. Also, other
indices like the probability of direction (Makowski et al. 2019a) or the
MAP-based p-value (Mills 2018) can be obtained via the package
bayestestR (Makowski et al. 2019b) if hypothesis testing is desired, for an
overview see (Kelter 2021c). To offer users the freedom of choice for
their preferred statistical evidence measure, only a ROPE-based estimate
of the maximum a posteriori effect size
A small simulation study showed, for the case of three-component Gaussian mixtures, that the provided MCMC algorithm precisely captures the true parameter values. Similar results hold for the four- or more-component case, as can easily be checked by adapting the provided R code.
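The data-generating side of such a consistency check can be sketched in base R: simulate a three-component mixture with known allocations and compute the empirical effect sizes that the Gibbs sampler in bayes.anova() should recover. The true means 0, 0.5 and 1.0 below are arbitrary illustration values:

```r
# Ground truth for a consistency check of the Gibbs sampler: simulate a
# three-component Gaussian mixture with known allocations and compute the
# empirical between-group effect sizes (Cohen's d).
set.seed(123)
n  <- 10000
mu <- c(0, 0.5, 1.0)                       # illustrative true means
x  <- lapply(mu, function(m) rnorm(n, mean = m, sd = 1))

cohens_d <- function(a, b) {
  sp <- sqrt((var(a) + var(b)) / 2)        # pooled sd, equal group sizes
  (mean(b) - mean(a)) / sp
}
d12 <- cohens_d(x[[1]], x[[2]])            # true value 0.5
d13 <- cohens_d(x[[1]], x[[3]])            # true value 1.0
```

Running bayes.anova() on such simulated groups and comparing the posterior effect-size estimates against these empirical values is the essence of the reported simulation.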
In summary, the bayesanova package provides a novel and easy to apply alternative to existing packages like stats (R Core Team 2020) or BayesFactor (Morey and Rouder 2018), which implement the traditional frequentist ANOVA and Bayesian ANOVA models based on the Bayes factor.
Future plans include adding prior predictive checks and support for up to 12 components, allowing for
bayesanova, stats, BayesFactor, datasets, BANOVA, RStan, Stat2Data, bayestestR
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kelter, "bayesanova: An R package for Bayesian Inference in the Analysis of Variance via Markov Chain Monte Carlo in Gaussian Mixture Models", The R Journal, 2022
BibTeX citation
@article{RJ-2022-009, author = {Kelter, Riko}, title = {bayesanova: An R package for Bayesian Inference in the Analysis of Variance via Markov Chain Monte Carlo in Gaussian Mixture Models}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {54-78} }