Recent statistical literature has paid attention to the presentation of pairwise comparisons either from the point of view of the reference category problem in generalized linear models (GLMs) or in terms of multiple comparisons. Both schools of thought are interested in the parsimonious presentation of sufficient information to enable readers to evaluate the significance of contrasts resulting from the inclusion of qualitative variables in GLMs. These comparisons also arise when trying to interpret multinomial models where one category of the dependent variable is omitted as a reference. While considerable advances have been made, opportunities remain to improve the presentation of this information, especially in graphical form. The factorplot package provides new functions for graphically and numerically presenting results of hypothesis tests related to pairwise comparisons resulting from qualitative covariates in GLMs or coefficients in multinomial logistic regression models.
The problem of presenting information about categorical covariates in
generalized linear models is a relatively simple one. Nevertheless, it
has received some attention in the recent literature. To be clear about
the problem, consider the following linear model where
where
where
Thus, the calculation is not difficult, but calculating and presenting
all of these differences can become cumbersome, especially as
However, when simple contrasts are the only quantities of interest,
neither method above is perfect. When floating/quasi-variances are
presented, the user still has to evaluate a potentially large number of
hypothesis tests by either relying on the overlap in the floating
confidence intervals or by calculating the floating t-statistic. Either
solution requires a good deal of cognitive energy on the part of the
analyst or reader. Compact letter displays do well at identifying
patterns of statistical significance, but are perhaps cumbersome to
investigate when patterns of (in)significance are complicated and,
though mitigated to some degree, the problem still exists for the more
recent multcompTs suggested by Graves et al. (2012). Below, I
discuss a means for presenting this information in a manner that will
permit the immediate evaluation of all the
There are a number of reasonable solutions to the reference category
problem.
Easton et al. (1991) proposed the idea of floating absolute risk as a
means for evaluating multiple comparisons in matched case-control
studies. The idea was to provide sufficient information such that
readers could perform multiple comparisons with estimates of floating
absolute risk at the expense of presenting a single extra number for
each binary variable representing a level of a categorical covariate
(i.e., risk factor). Although Greenland et al. (1999) disagreed on
terminology and on the utility of Easton’s idea of a floating scale,
they agreed on the utility presenting information that would permit
users to easily make the right inferences about relative risks among any
levels of a categorical risk factor. Both Firth and De Menezes (2004)
and Plummer (2004) provided a more rigorous statistical
foundation on which to build estimates of floating absolute risk (or as
Firth and De Menezes call them, quasi-variances). Firth and De Menezes’
method has been operationalized in R in the
qvcalc package
(Firth 2010) and both the methods of Plummer as well as
Greenland et al. have been operationalized in the float()
and
ftrend()
functions, respectively, in the
Epi package
(Carstensen et al. 2013). In general, these solutions allow sufficient (or
nearly sufficient) information to be presented in a single column of a
statistical table that makes valid, arbitrary multiple comparisons
possible.
The measures of floating absolute risk are often used to create floating
(or quasi-) confidence intervals.
The methods discussed above still require the analyst or reader to either evaluate the pairwise hypothesis tests based on the extent to which confidence intervals overlap or calculate the floating t-statistic for each desired contrast. If the former, readers must still engage in a cognitive task of position detection (Cleveland 1985) and then make an inference based on the extent to which intervals overlap. As the horizontal distance between vertically-oriented floating confidence intervals grows, this task becomes more difficult. Finally, as Easton et al. (1991) suggests, floating variances are a “virtually sufficient” summary of the uncertainty relating to relative risks; however, they can produce erroneous inferences if the error rate is sufficiently high. Both Firth and De Menezes (2004) and Plummer (2004) provide methods for calculating this error rate, which is often small relative to other sources of error in the model.
To put a finer point on the problem, consider the example below using data from Ornstein (1976) from the car package (Fox and Weisberg 2011). The model of interest is:
where
The plot of the floating confidence intervals provides similar
information, but readers are still required to make judgements about
statistical significance with a visual method prone to occasional
inferential errors. Consider Figure 1, which presents
confidence intervals using the three different functions that produce
floating variances R — qvcalc()
, float()
and ftrend()
.ftrend()
to put all of these estimates on the same scale. I
recognize that this is not what the authors had intended, but this
should not lead to erroneous inferences in any event
(Easton and Peto 2000).
Contrast | Estimate/(SE) |
---|---|
MIN - AGR | 0.250* |
(0.069) | |
MIN - BNK | 0.416* |
(0.084) | |
MIN - CON | 0.739* |
(0.210) | |
MIN - FIN | 0.361* |
(0.067) | |
MIN - HLD | 0.265* |
(0.118) | |
MIN - MAN | 0.128 |
(0.071) | |
MIN - MER | 0.188* |
(0.085) | |
MIN - TRN | 0.098 |
(0.071) | |
MIN - WOD | -0.248* |
(0.072) |
Estimates and standard errors produced by glht()
from the multcomp
package.
Even if the evidence regarding the outcome of a hypothesis test from two confidence intervals is clear, there are other potential sources of error. Cleveland (1985) finds that detecting position along a common scale is one of the easiest tasks of graphical perception, but that discerning length is considerably more difficult. His experiments show that readers are prone to errors in even the easiest graphical perception tasks and the error rate is nearly twice as high when readers are asked to adjudicate the relative lengths of lines. Conducting hypothesis tests using confidence intervals is an endeavor rife with opportunities for inferential errors.
Means for calculating and presenting models with multiple simple
contrasts have developed in the multiple testing literature as well.
While the thrust of the literature mentioned above was dealing with the
reference category problem directly, the multiple comparisons literature
has placed greater focus on finding the appropriate
Gramm et al. (2006) discuss the two generally accepted methods for
presenting multiple comparisons — the line display and the letter
display. A line display (see for example, Steel and Torrie 1980) prints
a column where each row represents a single element in the multiple
comparisons. In the example above, using the Ornstein data, these would
be the names of the various sectors. Then, vertical lines are drawn
connecting all values that are not significantly different from each
other. This is a relatively simple display, but as shown generally by
Piepho (2004) and in this particular case, it is not always
possible to faithfully represent all of the pairwise comparisons with
connecting line segments. Note that in the third line, a discontinuity
is required to properly depict all of the pairwise relationships.
Further, this method requires that the levels of factors (at least
potentially) be reordered to identify insignificant differences. This
reordering, while reasonable for unordered factors, is not at all
reasonable if the factor is inherently ordered.
Figure 2(a) shows the line display for the Ornstein
model above. A compact letter display (Piepho 2004) places a
series of letters by each level of the categorical variable such that
any two levels with the same letter are not significantly different from
each other.
|
|
|
|
Graves et al. (2012) discuss enhancements to the letter display
that make it somewhat more visually appealing and make the cognitive
tasks involved less cumbersome. This method is operationalized by the
multcompTs
function in the
multcompView
package. While these functions are potentially useful, they are A) still
improved upon by the method discussed below and B) not intended for use
directly with “glm” class objects or “glht” class objects. Despite the
improvements over letter displays, complicated patterns of
(in)significance still result in cluttered displays.
I argue that a good solution to the reference category problem is one
that permits the most efficient presentation and evaluation of a series
of hypothesis tests relating to various (simple) factor contrasts. As
discussed above, both the numerical presentation of floating variances
and the visual presentation of floating confidence intervals are not
maximally efficient on either dimension (presentation or evaluation)
when the analyst desires information about the simple pairwise
difference between coefficients related to the levels of a factor (i.e.,
simple contrasts). Similarly, I suggested that compact letter displays
(and to a lesser extent multcompTs), though they present all of the
appropriate information, are not maximally efficient at presenting the
desired information graphically. As Chambers et al. (1983) and
Cleveland (1985) suggest, one efficient way of presenting many
pairwise relationships is through a scatterplot matrix or a generalized
draftsman’s display (a lower- or upper-triangular scatterplot
matrix).
The factorplot
function in the package of the same name (version 1.1) for R computes
all pairwise comparisons of coefficients relating to a factor; its
print, summary and plot methods provide the user with a wealth of
information regarding the nature of the differences in these
coefficients.glht()
and only those comparisons will
be computed, thus increasing efficiency.
The function calculates equation ((1)) for each simple contrast
directly through a set of elementary matrix operations. First,
The function has methods for objects of class “lm”, “glm”, “glht” and “multinom” which do slightly different things depending on the input. The default method will accept a vector of estimates and either A) a full variance-covariance matrix or B) a vector of quasi or floated variances that will be turned into a diagonal variance-covariance matrix. The methods for “lm”, “glm”, “glht” and “summary.glht” objects calculate the pairwise differences in the linear predictor for the values of the specified factor variable. The method for “multinom” class objects calculates the pairwise differences in coefficients across the categories of the dependent variable for a single variable (i.e., column of the model matrix).
The factorplot
method for “lm” class objects has six arguments. The
first two arguments, obj
and adjust.method
, indicate the object and
the method by which p-values are to be adjusted for multiple comparisons
(possibilities include all of those to p.adjust
from the stats
package). The factor.variable
argument indicates the factor for which
comparisons are desired. pval
allows the user to set the desired Type
I error rate and two.sided
allows the user to specify whether the null
hypothesis is tested against a one- or two-sided alternative with the
latter as the default. The order
argument sets the ordering of the
coefficients, with three possibilities — ‘natural’, ‘alph’ and ‘size’.
The ‘natural’ option maintains the original ordering of the factor, the
‘alph’ option sorts them alphabetically and the ‘size’ option sorts in
ascending order of the magnitude of the coefficient. The choices made
here propagate through the plot, print and summary methods.
The plot method for “factorplot” class objects produces something akin
to an upper-triangular scatterplot. The analogy is not perfect, but the
idea is similar; each entry of the rows-by-columns display indicates the
pairwise difference between coefficients. The statistical significance
of these differences is indicated by three colors (one for
significant-positive, one for significant-negative and one for
insignificant differences). The three colors can be controlled with the
polycol
argument and the text color within the polygons can be
controlled with the textcol
argument.print.est = FALSE
and print.se = FALSE
,
respectively.abbrev.char
argument. Setting this to an
arbitrarily high value will result in no abbreviation. Finally, the
trans
argument allows the user to impose a post-hypothesis-test
transformation to the coefficient estimates. For example, if the
underlying model is a logistic regression, tests will be done on the
log-relative risks, but the relative risks could be plotted with
trans = "exp"
.r.bdiff
holds the coefficient differences. The transformation is done as
follows: do.call(trans, list(r.bdiff))
, so only transformations
amenable to this procedure will work.print.sig.leg
and print.square.leg
,
respectively. Figure 3 shows the display for the Ornstein
model. The following code produces the result in the figure.
library(factorplot)
library(car)
mod <- glm(interlocks ~ log2(assets) + nation + sector, data = Ornstein,
family = poisson)
fp <- factorplot(mod, adjust.method="none", factor.variable = "sector", pval = 0.05,
two.sided = TRUE, order = "natural")
plot(fp, abbrev.char = 100)
The print method for a “factorplot” object prints all of the pairwise
differences, their accompanying analytical standard errors and
(optionally adjusted) digits
argument. The
sig
argument is logical allowing the user to print all pairwise
differences if FALSE
and only significant differences when TRUE
. The
print method also permits the same trans
argument as the plot method
for objects of class “factorplot”. An example of the output from the
print method is below. Here, twenty-five of the forty-five pairwise
differences are statistically different from zero when.
print(fp, sig = T)
Difference SE p.val
AGR - CON 0.489 0.213 0.023
CON - HLD -0.474 0.235 0.045
BNK - MAN -0.288 0.102 0.005
CON - MAN -0.611 0.215 0.005
FIN - MAN -0.233 0.082 0.005
BNK - MER -0.228 0.106 0.032
CON - MER -0.551 0.220 0.013
AGR - MIN -0.250 0.069 0.000
BNK - MIN -0.416 0.084 0.000
CON - MIN -0.739 0.210 0.001
FIN - MIN -0.361 0.067 0.000
HLD - MIN -0.265 0.118 0.026
MER - MIN -0.188 0.085 0.029
BNK - TRN -0.318 0.082 0.000
CON - TRN -0.641 0.217 0.004
FIN - TRN -0.263 0.070 0.000
AGR - WOD -0.498 0.076 0.000
BNK - WOD -0.665 0.095 0.000
CON - WOD -0.988 0.215 0.000
FIN - WOD -0.610 0.077 0.000
HLD - WOD -0.513 0.121 0.000
MAN - WOD -0.376 0.080 0.000
MER - WOD -0.437 0.090 0.000
MIN - WOD -0.248 0.072 0.001
TRN - WOD -0.346 0.081 0.000
The summary method for “factorplot” objects prints the number of coefficients that are significantly smaller than the one of interest and the number of coefficients larger than the one of interest for each level of the factor. While this is not a common means of presenting the results, this does nicely summarize the extent of significant differences among the coefficients. Below is an example of printout from the summary method. It is easy to see that the wood industry (WOD) has the highest conditional means as it is significantly bigger than all of other categories. It is also easy to see that the construction industry (CON) has one of the smallest conditional means as it is significantly smaller than seven of the other categories and not significantly bigger than any.
summary(fp)
sig+ sig- insig
AGR 1 2 6
BNK 0 5 4
CON 0 7 2
FIN 0 4 5
HLD 1 2 6
MAN 3 1 5
MER 2 2 5
MIN 6 1 2
TRN 3 1 5
WOD 9 0 0
Together, the factorplot
function and its associated print, plot and
summary methods provide a wealth of information including direct
hypothesis tests using analytical standard errors for the simple
contrasts most commonly desired in (G)LMs.
Plummer et al. (2007) were interested in discerning the extent to which infection with H. pylori containing the cytotoxin-associated (cagA) gene increased the severity of gastric precancerous lesions. They found that cagA+ patients had increased risks of more severe lesions while cagA- patients were only at significantly higher risk (than their uninfected counterparts) of chronic gastritis. Table 2 summarizes the results of the relative risk of the various types of gastric lesions versus the baseline of normal or superficial gastritis.
cagA- | cagA+ | |||
OR | FSE | OR | FSE | |
Normal and superficial gastritis | 1.00 | 0.242 | 1.00 | 0.320 |
Chronic gastritis | 2.12 | 0.096 | 4.33 | 0.101 |
Chronic atrophic gastritis | 1.44 | 0.156 | 3.89 | 0.160 |
Intestinal metaplasia I | 1.31 | 0.140 | 4.14 | 0.141 |
Intestinal metaplasia II | 1.44 | 0.380 | 10.8 | 0.349 |
Intestinal metaplasia III | 1.46 | 0.484 | 21.9 | 0.431 |
Dysplasia | 0.90 | 0.375 | 15.5 | 0.311 |
OR = odds ratio
FSE = floating standard error
Adapted from Figure 1 in Plummer et al. (2007 p1331).
The default method for the factorplot
function allows the user to
supply a vector of point estimates and (floating) variances rather than
an estimated model object. This function will be particularly useful for
those scholars in epidemiology, where floating standard errors are more
routinely presented. With 7 levels of the factor in
Table 2, there are 21 pairwise comparisons implied,
which would require users to do a lot of calculations. However,
inputting the estimates and floated variances into R and subjecting them
to the factorplot
function can do all of the calculations
automatically. Below is an example of how the results could be used in
conjunction with the factorplot suite of functions.
est1 <- log(c(1.00,2.12,1.44,1.31,1.44,1.46,0.90))
var1 <- c(0.242,0.096,0.156,0.140,0.380,0.484,0.375)^2
est2 <- log(c(1.00,4.33,3.89,4.14,10.8,21.9,15.5))
var2 <- c(0.320,0.101,0.160,0.141,0.349,0.431,0.311)^2
resdf <- 48+16+27+532+346+144+144+124+58+166+162+75+24+
53+10+15+61+6+18+90+12-18
names(est1) <- names(est2) <- c(
"Normal Gas","Chronic Gas", "Chronic A. Gas",
"IM I", "IM II", "IM III", "Dysplasia")
plummer_fp1 <- factorplot(est1, var = var1, resdf = resdf, adjust.method = "none")
plummer_fp2 <- factorplot(est2, var = var2, resdf = resdf, adjust.method = "none")
plot(plummer_fp1, trans = "exp", abbrev.char = 100, scale.text = 1.5,
scale.space = 1.5)
plot(plummer_fp2, trans = "exp", abbrev.char = 100, scale.text = 1.5,
scale.space = 1.5)
The plots are displayed in Figure 4. The left-hand plot suggests that H. pylori cagA- seems to raise the risk of chronic gastritis relative to Intestinal metaplasia I and the reference group of normal and superficial gastritis. The differences in the risk of chronic gastritis and chronic atrophic gastritis or dysplasia are also significant. The right-hand plot indicates that there are no significant differences among the second through fourth diagnoses and the fifth through seventh diagnoses. The difference between the risk of intestinal metaplasia I and II (for cagA+) is also significant.
|
|
|
|
When factorplot()
encounters an object of class multinom
, it will
make comparisons within the same variable across all levels of the
dependent variable. The coefficient table presents a specific set of
pairwise comparisons — namely those indicating the relationship of
each variable to the binary choice of each non-reference category versus
the reference category. However, other comparisons implied by that
coefficient table may be interesting or useful and should be
investigated.
In the example below, I estimate a multinomial logistic regression model
of vote choice (vote
) on a number of standard controls: retrospective
national economic evaluations (retnat
), self-placement on the
left-right ideological continuum (lrself
), gender (male
) and age
(age
).france
in the package factorplot for more
details about the origin and coding of the data.
library(nnet)
data(france)
france.mod <- multinom(vote ~ retnat + lrself + male + age, data = france)
fp3 <- factorplot(france.mod, variable = "age")
plot(fp3)
Figure 5 shows that as people get older, they are more
likely to vote for RPR or UDF than the Greens or Communists (PCF) and
more likely to vote for the Socialists (PS) than the Greens. If one is
interested in whether variables have significant effects on vote choice,
all pairwise comparisons should be considered. factorplot
makes it
easy for users to appropriately evaluate all relevant pairwise
comparisons.
Easton’s (1991) contribution of floating absolute risk has been
influential, especially in epidemiology and medicine, allowing
researchers to present easily information that permits the reader to
make any pairwise comparison among the different levels of a risk
factor. (De Menezes 1999; Firth and De Menezes; 2004) and
Plummer (2004) have provided not only a rigorous, model-based
foundation for this idea, but have also provided software that easily
produces these quantities for a wide array of statistical models. I
argue that while these quantities are interesting and useful, floating
confidence intervals, which are often provided ostensibly to permit
hypothesis testing can be imprecise and potentially misleading, as
regards hypothesis testing. Compact letter displays
(Piepho 2004) are a step in the right direction, but I argue
that they can still be improved upon in terms of graphically presenting
information of interest to many researchers. In the common situation
wherein one is interested in simple contrasts, the factorplot()
functions and their associated print, plot and summary methods discussed
above provide much greater transparency with respect to the presentation
and evaluation of hypothesis tests than floating absolute risk or
quasi-variance estimates. The visual presentation of direct hypothesis
tests requires much less effort to adjudicate significance and uncover
patterns in the results than other methods, including compact letter
displays. While the calculation of these hypothesis tests is not novel,
the methods of presenting and summarizing the information represent a
significant advance over the previously available general solutions
available in R.
I would like to thank Bob Andersen, Ryan Bakker, John Fox, Bill Jacoby, Martyn Plummer and an anonymous reviewer as well as participants in the Regression III course at the ICPSR Summer Program for helpful comments and suggestions.
multcomp, qvcalc, Epi, car, multcompView, factorplot
ClinicalTrials, Econometrics, Epidemiology, Finance, MixedModels, Survival, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
ftrend()
to put all of these estimates on the same scale. I
recognize that this is not what the authors had intended, but this
should not lead to erroneous inferences in any event
(Easton and Peto 2000).[↩]glht()
and only those comparisons will
be computed, thus increasing efficiency.[↩]print.est = FALSE
and print.se = FALSE
,
respectively.[↩]r.bdiff
holds the coefficient differences. The transformation is done as
follows: do.call(trans, list(r.bdiff))
, so only transformations
amenable to this procedure will work.[↩]france
in the package factorplot for more
details about the origin and coding of the data.[↩]Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
II, "factorplot: Improving Presentation of Simple Contrasts in Generalized Linear Models", The R Journal, 2013
BibTeX citation
@article{RJ-2013-021, author = {II, David A. Armstrong}, title = {factorplot: Improving Presentation of Simple Contrasts in Generalized Linear Models}, journal = {The R Journal}, year = {2013}, note = {https://rjournal.github.io/}, volume = {5}, issue = {2}, issn = {2073-4859}, pages = {4-15} }