The likelihoodR package has been developed to allow users to obtain statistics according to the likelihood approach to statistical inference. Commonly used tests are available in the package, such as t tests, ANOVA, correlation, regression and a range of categorical analyses. In addition, there is a sample size calculator for t tests, based upon the concepts of strength of evidence and the probabilities of misleading and weak evidence.
Maximum likelihood estimation (MLE) is well-understood and widely used
throughout statistics. In contrast, the use of the likelihood function
as a basis for inference is much less understood and even confused with
MLE. As Edwards wrote in his excellent book: "At one recent
international conference at which I laboured for three-quarters of an
hour to make clear the advantages of likelihood inference, the chairman
thanked me for my lecture on the Method of Maximum Likelihood"
(Edwards 1992, p 101). In the Epilogue of the same book (p 212), Edwards
says that MLE is “a red herring”. To clarify: MLE is used to estimate a
parameter value according to a supposed probability distribution, whereas
likelihood inference is used to compare two hypotheses through the ratio
of their values on a likelihood function.
All the main statistical approaches to scientific inference
(frequentist, Bayesian and information criterion) are based upon
calculated probabilities. The frequentist approach, for example,
typically uses a sampling distribution centred on a null hypothesis and
calculates the probability of obtaining the observed value or values
more extreme from the null. The likelihood approach, also known as the
evidential approach, differs in that it is simply based upon the
evidence provided by the observed data (not including more extreme
values) and represented by the likelihood function. The ratio of the
heights of the likelihood function at the two hypothesis values being
compared gives the likelihood ratio. To create a linear scale of
evidence, the natural logarithm of this ratio is taken, giving the log
likelihood ratio, also known as the support, a term that was first
defined by Harold Jeffreys (Jeffreys 1936).
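As a minimal illustration (not using the package), consider 7 successes observed in 10 binomial trials: the MLE is the peak of the likelihood function, whereas evidential inference compares the heights of that function at two specified hypothesis values.

> # likelihood function for 7 successes in 10 binomial trials (made-up data)
> L <- function(p) dbinom(7, 10, p)
> L(0.7) / L(0.4)        # ratio of heights at p = 0.7 versus p = 0.4, about 6.3
> log(L(0.7) / L(0.4))   # the support S, about 1.8; the MLE (curve peak) is 7/10 = 0.7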
The likelihood approach is not subject to the same criticisms often
levelled at the frequentist and Bayesian approaches
(Goodman and Royall 1988; Edwards 1992; Royall 1997; Goodman 1999; Dixon 2003; Dienes 2008; Wasserstein and Lazar 2016; Lakens 2021).
Likelihood ratios provide objective measures of evidence between
competing hypotheses, unaffected by the intentions of the investigator.
Log likelihood ratio (support, see below) values are proportional to
the quantity of data, representing the weight of evidence. This means
that support values from independent studies can simply be added
together, e.g. for meta-analysis. Unlike other approaches based on
probabilities, likelihood ratios are unaffected by transformations of
variables.
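The additivity of support from independent data can be illustrated with a short sketch using made-up binomial counts, comparing p = 0.7 against p = 0.4: the support computed from the pooled data equals the sum of the separate supports.

> # support for p = 0.7 versus p = 0.4 from k successes in n trials
> S_binom <- function(k, n) log(dbinom(k, n, 0.7) / dbinom(k, n, 0.4))
> S_binom(7, 10) + S_binom(15, 20)   # sum of supports from two separate studies
> S_binom(22, 30)                    # support from the pooled data: the same value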
Despite the apparent advantages of the evidential approach there is a
dearth of available resources in statistical computing. None of the
major commercial packages (e.g. SPSS, SAS, Minitab) provide likelihood
ratios or support values. This should not be confused with the wide
availability of likelihood ratio tests (also known as G-tests), which
ultimately provide p values according to the frequentist approach.
The near-absence of software limits the use of the evidential approach
in scientific reporting. It also hampers the teaching of the approach,
which in turn feeds back into reduced use in published research. The
likelihoodR package for R (Cahusac 2021) is an attempt to address
this situation; its calculations are based upon a recent book
(Cahusac 2020b). R (Ihaka and Gentleman 1996) is a widely used statistical platform
with a huge variety of packages.
The package always reports the support (log likelihood ratio). This
relative evidence scale ranges from -\(\infty\) to +\(\infty\), with zero
representing no evidence either way. A working interpretation of this
scale has been offered by Goodman and Royall (1988); see Table 1. No
threshold need be applied, and S values are given to just one decimal
place, e.g. 2.3.
S | Interpretation of H\(_{1}\) vs H\(_{2}\) |
---|---|
0 | No evidence either way |
1 | Weak evidence |
2 | Moderate evidence |
3 | Strong evidence |
4 | Extremely strong evidence |
The support (S) values reported in the package are distinct from the
surprise/surprisal S-values described by Palm (2012), based on Shannon
information theory, and from the S-value of Greenland (2019), produced by
simply taking the negative logarithm to base 2 of the p value.
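For example, Greenland's quantity for a conventional p value is simply:

> -log2(0.05)   # about 4.3 bits of surprisal for p = 0.05, not a support value in the present sense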
Likelihood intervals are reported wherever possible and are given in
terms of support rather than likelihood ratio. A typical likelihood
interval (support interval) is the S-2 interval, which numerically often
corresponds closely to the frequentist 95% confidence interval
(Cahusac 2020b). The S-2 interval is based upon a likelihood ratio of
e\(^{-2}\) = 0.135 = 1 / 7.40: all the values within the interval have a
likelihood ratio of no less than 1 / 7.40 relative to the maximum, i.e.
S = -2. The S-3 interval corresponds to e\(^{-3}\) = 0.0498 = 1 / 20.09,
and so on. Analyses also report other relevant statistics, such as t, F
and \(\chi^{2}\), as well as the corresponding frequentist p value. Where
possible the likelihood function is plotted, decorated with hypothesis
parameter values shown as coloured lines.
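The close correspondence can be seen in a minimal sketch for a normal mean, treating the sample standard deviation as known (the package uses the full likelihood for each analysis, so its intervals will differ slightly):

> set.seed(1)
> x <- rnorm(30, mean = 5, sd = 2)    # made-up sample
> m <- mean(x); s <- sd(x); n <- length(x)
> m + c(-1, 1) * 2 * s / sqrt(n)      # S-2 interval: all means with support within 2 of the maximum
> m + c(-1, 1) * 1.96 * s / sqrt(n)   # 95% confidence interval, usually very similar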
Currently 14 different statistical tests are implemented by the package;
a summary of each is given in Table 2 (listed alphabetically).
Function | Description |
---|---|
L_1way_ANOVA | Independent samples one-way ANOVA |
L_1way_cat | One-way categorical data analysis for binomial and multinomial |
L_1way_RM_ANOVA | One-way repeated measures ANOVA |
L_2S_ttest | Independent samples t test |
L_2way_cat | Two-way categorical data analysis |
L_2way_Factorial_ANOVA | Two-way independent samples factorial ANOVA |
L_corr | Bivariate normal correlation |
L_efficacy | Efficacy analysis for binomial categorical data |
L_logistic_regress | Multiple logistic regression |
L_OR | Odds ratio |
L_regress | Bivariate regression for linear, quadratic and cubic comparisons |
L_RR | Relative risk |
L_ttest | One sample and related samples t test |
L_t_test_sample_size | Sample size calculation using the evidential approach for t tests |
A range of tests for differences in continuous variables is available.
One function, L_ttest, performs both the one sample and the related
samples test. As well as the null value, two alternative hypothesis
values can be specified: one in terms of the units of measurement and
the other in terms of Cohen’s d.
L_ttest(data1, data2, null=0, d=0.5, alt.2=NULL, L.int=2, verb=TRUE)
Where the arguments are:
data1 | a (non-empty) numeric vector of data values |
data2 | a (non-empty) numeric vector of data values for related sample, default = NULL |
null | value for the null hypothesis, default = 0 |
d | Cohen’s effect size, default = 0.5 |
alt.2 | value for an alternative hypothesis, in units used for data, default = NULL |
L.int | likelihood interval given for a given support value, e.g. 2 or 3, default = 2 |
verb | show output, default = TRUE |
As an example:
> # one sample Gosset's original additional hours of sleep data Cahusac (2020) p 29
> mysample <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
> a1=L_ttest(mysample, d=0.5, alt.2=2, L.int=2)
Support for the observed mean 0.75 (dashed line) against the null 0 (black line) = 0.892
Maximum support for d of 0.5 (0.8945048, blue line) versus null = 0.856
Support for d versus 2nd alt Hypothesis 2 (green line) = 2.131
Support for 2nd alt Hypothesis versus null = -1.275
S-2 likelihood interval (red line) is from -0.44025 to 1.94025
t(9) = 1.326, p = 0.2175978, d = 0.419
The assigned object a1 contains values of interest which can be accessed
in the usual way with a1$..., but entering a1 on its own gives all the
values:
$obs.mean | the observed mean or difference in mean for related samples |
$df | degrees of freedom |
$alt.H1 | mean value according to specified d |
$alt.H2 | specified second hypothesis value |
$S_max | maximum support for observed mean against the null |
$S_10 | support for d versus null |
$S_12 | support for d versus specified second hypothesis |
$S_20 | support for second hypothesis versus the null |
$like.int | likelihood interval |
$L.int.spec | specified likelihood interval in units of support |
$null.value | null value |
$t.val | t value for test against null |
$p.val | p value for test against null |
$d.obs | observed effect size |
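For example, individual components of the result above can be extracted directly:

> a1$S_max      # 0.892, the support for the observed mean against the null
> a1$like.int   # limits of the S-2 likelihood interval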
For the related samples test, a second vector of values of equal length
to data1 (and paired with it) is entered for data2. The independent
samples test function is L_2S_ttest. The data are entered as the first
argument, followed by a vector of the same length coding group
membership. Similar arguments to the one sample/related samples test can
be specified, and the output is similar, but this time it shows the
difference in means.
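A call might therefore look like the sketch below, where the data are made up and the remaining argument names are assumed to mirror those of L_ttest (see ?L_2S_ttest for the exact signature).

> # sketch with made-up data: scores in one vector, group codes in another
> score <- c(5.1, 6.3, 4.8, 7.2, 5.9, 6.6, 4.2, 5.5, 5.0, 6.1)
> group <- gl(2, 5, labels = c("g1", "g2"))
> L_2S_ttest(score, group, null = 0, d = 0.5, L.int = 2)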
The package includes one-way ANOVA (L_1way_ANOVA), one-way repeated
measures ANOVA (L_1way_RM_ANOVA) and two-way between-participants
factorial ANOVA (L_2way_Factorial_ANOVA). All these functions use
contrasts, employing the model comparison approach espoused by Glover
and Dixon (Dixon 2003, 2013; Glover and Dixon 2004). For the one-way
analyses, if the arguments for the contrasts are not specified then they
default to testing a linear and a quadratic contrast. In the factorial
ANOVA, no contrast comparisons are made if the contrasts are left
unspecified. If the first contrast is specified then it is compared with
the main effects, and if the second contrast is also specified then it is
compared with the first contrast. Such contrast comparisons are easy to
do in this approach, and are difficult or impossible to do using
frequentist p values. The support values for the between-participants
analyses are adjusted using Akaike’s correction (Hurvich and Tsai 1989).
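The default linear and quadratic contrasts are polynomial trend contrasts; their weights can be inspected in base R (whether the package uses these exact normalised weights is an assumption here):

> # orthogonal polynomial contrasts for three groups (base R): column .L is the
> # linear trend (-0.71, 0, 0.71), column .Q the quadratic trend (0.41, -0.82, 0.41)
> contr.poly(3)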
We will look at L_2way_Factorial_ANOVA. The first line of output gives
the S for comparing the full model of main effects and interaction with
the null model. Using data from Cahusac (2020b) p 91:
> time <- c(6.4, 4.6, 6.4, 5.6, 5.9, 6.1, 6.3, 4.5, 4.8, 6.6, 7, 9.3, 7.9, 9.4, 8.2,
4.4, 4.2, 5, 6.9,4.5, 4, 4.3, 6.9, 5.5, 5.8, 4.4, 4.2, 5.1, 6.9, 4.5)
> Treatment = gl(3,5,30, labels=c("T1","T2","T3"))
> Health = gl(2,15,30, labels=c("Hemophiliac","Normal"))
> contrast1 <- c(-1, -1, 5, -1, -1, -1) # interaction Hemo T3 higher than others
> contrast2 <- c(-1, -1, -1, 1, 1, 1) # main effect of health status (Hemo higher)
> m1=L_2way_Factorial_ANOVA(time, Treatment, Health, contrast1, contrast2, verb=TRUE)
Support for full model (including interaction) versus null = 7.036
Support for full model versus main effects = 2.968
Support for contrast 1 versus main effects = 7.587
Support for contrast 1 versus contrast 2 = 8.462
Interaction F(2,24) = 5.039, p = 0.01488452, partial eta-squared = 0.296
First factor main effect F(1,24) = 15.992, p = 0.0005281828, partial eta-squared = 0.4
Second factor main effect F(2,24) = 6.219, p = 0.006664992, partial eta-squared = 0.341
Contrast 1 F(1,24) = 36.048, p = 3.373928e-06
As before, the assigned object m1 contains values of interest which can
be accessed in the usual way with m1$..., but entering m1 on its own
gives all the values.
Bivariate normal correlation uses the L_corr function, and is
demonstrated using data from the heptathlon (Cahusac 2020b) p 104:
> m200 <- c(22.6,23.7,23.1,23.6,23.6,23.6,25.5,23.9,24.5,23.9,24.9,24.8,24.7,
25.0,24.6,24.9,25.0,25.6,24.8,25.5,25.7,24.9,26.6,25.2,26.2)
> m800 <- c(128.5,126.1,124.2,132.5,134.7,132.5,138.5,127.9,133.7,132.2,136.1,142.8,
125.8,131.5,137.1,134.9,146.7,133.9,146.4,144.0,133.4,138.0,139.2,137.3,163.4)
> m2=L_corr(m200, m800, null=0, exp.r=0.5, L.int=3, alpha=.05, verb=TRUE)
Support for observed correlation 0.6198 (dashed line) versus null of 0 (black line) = 5.776
Support for specified correlation of 0.5 (blue line) versus observed r = -0.338
Support for specified correlation versus null = 5.438
S-3 likelihood interval (red line) is from 0.19968 to 0.8474
P value = 0.00095
N = 25
The assigned object m2 contains values of interest which can be accessed
in the usual way with m2$..., but entering m2 on its own gives all the
values.
The regression function L_regress only accommodates one predictor, while
the logistic regression function L_logistic_regress allows up to 6
predictors, which need to be dummy coded for nominal data with more
than 2 levels.
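The dummy coding for such a predictor can be prepared with base R before the columns are supplied as separate predictors; the factor below is hypothetical:

> # hypothetical three-level factor expanded into two 0/1 dummy variables
> treatment <- factor(c("A", "B", "C", "A", "C", "B"))
> dummies <- model.matrix(~ treatment)[, -1]   # drop the intercept column
> dummies   # columns treatmentB and treatmentC would be entered as two predictors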
There are 5 different tests for categorical data included in the package
(excluding the logistic regression mentioned above). The simplest is the
one-way categorical data analysis using the function L_1way_cat. Data
with two categories represent the binomial case (giving the likelihood
function plot), while data with multiple categories represent the
multinomial distribution. The two-way categorical analysis uses the
L_2way_cat function. For these two functions an additional
evidence-based statistic, S for the variance, is calculated. This uses
the formula given by Cahusac (2020b) p 158 and derived from
Edwards (1992) p 187:
\[\label{eqn}
S = \frac{df}{2} \log\left( \frac{df}{\chi_{df}^{2}} \right) - \frac{1}{2}\left(df - \chi_{df}^{2}\right), \tag{1}\]
where df is the degrees of freedom. This is most useful for testing the
variance in the model, specifically whether the data are "too good to be
true", i.e. whether they fit a particular hypothesis more closely than we
would expect by chance (Edwards 1986). It can be demonstrated using just
2 categories in the one-way analysis:
> obs <- c(18,5); exp.p <- c(0.7, 0.3) # observed and expected values
> m3 <- L_1way_cat(obs, exp.p, verb = TRUE)
Binomial support for difference of MLE 0.7826087 (dashed line)
from 0.7 (blue line) with 1 df = 0.398
Support for variance differing more than expected = 0.019
S-2 likelihood interval (red line) from 0.5853 to 0.91794
Chi-square(1) = 0.747, p = 0.3872968
Likelihood ratio test G(1) = 0.795, p = 0.37258, N = 23
Likelihood-based 95% confidence interval from 0.58958 to 0.91597
As previously, the assigned object m3 contains values of interest which
can be accessed in the usual way with m3$..., but entering m3 on its own
gives all the values.
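As a quick check of equation (1), the variance support shown in the output can be reproduced directly from the reported chi-square value:

> S_var <- function(chi2, df) (df / 2) * log(df / chi2) - (df - chi2) / 2
> S_var(0.747, df = 1)   # 0.019, matching the variance line in the output above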
Some of the functions provide a likelihood-based confidence interval,
such as the 95% interval shown in the output above (Aitkin et al. 1989).
Finally, there are functions to calculate the odds ratio (L_OR),
relative risk (L_RR) and binomial efficacy (L_efficacy).
The main challenge faced by the evidential researcher is that of
obtaining sufficiently strong evidence for or against one of two
specified hypotheses. Like the probability of a Type II error, this
probability is large with a small sample size and decreases as sample
size increases. The function L_t_test_sample_size can be used to
calculate the pre-study sample size for all the t tests
(Cahusac and Mansour 2022). The combined misleading and weak probability
(Royall 1997, 2000, 2004) is entered, with a default of 0.05, together
with the strength of evidence desired (default = 3). For a paired
samples test, where we wish to calculate the sample size for a combined
0.2 probability of obtaining misleading or weak evidence, a strength of
evidence S = 2 and an effect size of 0.5, we would obtain a value of 38
using the following:
> L_t_test_sample_size(MW = 0.2, sd = 1, d = 0.5, S = 2, paired = TRUE)
For 1 sample, or related samples, t test with M1 + W1 probability of 0.2
Strength of evidence required is 2, and effect size of 0.5
Required sample size = 38
The somewhat comparable calculation for Type II error of 0.2, two-sided alpha = 0.05 and same effect size of 0.5 produces a sample size of 34 (using stats::power.t.test(power = .80, delta = 0.5, sig.level=0.05, type="paired")).
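For reference, that frequentist calculation runs as:

> stats::power.t.test(power = 0.80, delta = 0.5, sig.level = 0.05, type = "paired")
> # n = 33.4, i.e. 34 pairs when rounded up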
The functions described for the likelihoodR package may be useful for
those researchers and statisticians who wish to use the evidential
approach for their data analysis (Cahusac 2020a). In addition to the
advantages mentioned earlier in the introduction there are other
desirable features. First, categorical data analyses are not restricted
by normality assumptions, and support values for independent components
of cross-tabulated data sum precisely and algebraically (unlike such
calculations in chi-square analyses). Second, in categorical and
measurement analyses it is possible to show that the data fit the null
(or other) hypothesis too well (e.g. for detection of data fraud).
Third, analyses are versatile with unlimited complexity for model
comparisons within a dataset, for example in ANOVA (Glover and Dixon 2004).
As far as the author is aware, no comparable packages are available in R
or on other platforms, and users must currently calculate likelihood
ratios manually. This package addresses that shortcoming and will
hopefully encourage more users to express their results in terms of log
likelihood ratios.
One of the products of the likelihoodR package is that a module has been
developed for jamovi (The jamovi project 2021), named jeva, which includes many of
these functions. Hopefully this will encourage further interest in the
likelihood approach and facilitate teaching and research. The equivalent
jamovi analysis for the L_ttest example given earlier produces the
output shown in Figure 5. The output produced by jamovi is identical to that produced
earlier by the package, although simpler in that it lacks the option of
comparing an effect size (d, illustrated by the blue line in package
output). The null versus observed (1st line of Support output) is
-0.892, while in the package it is 0.892, the positive value being due
to comparing observed versus the null. Other outputs match apart from
decimal rounding, which can be selected in jamovi. The t, degrees of
freedom (df) and p are the same for the null versus observed, although
the jamovi output includes another line giving these statistics for the
alternative hypothesis versus the observed. The jamovi output includes
group descriptive statistics (although these are available from the
assigned object in the package, e.g. $obs.mean). The jamovi output also
includes the option of a descriptives plot (not shown) which displays
the mean with specified likelihood interval.
The likelihoodR package described in this article provides a large
number of functions, currently many more than envisaged for the jamovi
module. As such, it will provide a major reference package for users
interested in the likelihood approach.