In crossed, two-factor studies with one observation per factor-level combination, interaction effects between factors can be hard to detect and can make the choice of a suitable statistical model difficult. This article describes hiddenf, an R package that enables users to quantify and characterize a certain form of interaction in two-factor layouts. When effects of one factor (a) fall into two groups depending on the level of another factor, and (b) are constant within these groups, the interaction pattern is deemed "hidden additivity" because within groups, the effects of the two factors are additive, while between groups the factors are allowed to interact. The hiddenf software can be used to estimate, test, and report an appropriate factorial effects model corresponding to hidden additivity, which is intermediate between the unavailable full factorial model and the overly-simplistic additive model. Further, the software also conducts five statistical tests for interaction proposed between 1949 and 2014. A collection of 17 datasets is used for illustration.
The crossed, two-factor layout with one observation per factor-level combination is a simple and widely used plan. These designs arise in the context of (a) completely randomized two-factor experiments, (b) randomized complete block experiments that block on a source of nuisance variability, and (c) observational studies with two factors. If the effects of the two factors do not interact, then this design permits inference for both factors simultaneously. The following model is commonly used for data collected using these designs:
where
Model ((1)) assumes that the effect of the row factor on
the response
If the effect of factor cjejuni.mtx
data set included in the hiddenf package
(Qiu 2013).
cjejuni.mtx
data, which visually
exhibit non-additivity.Since the data in Figure 1 panel B do not exhibit
parallelism, there is graphical evidence to suggest that
((1)) may not be a suitable model. This is problematic
because inferences conducted under ((1)) will be incorrect
if ((1)) fails to describe the true nature of the variables
under investigation. Fortunately, many methods have been devised to
assess non-additivity in these designs, including those in (Tukey 1949),
(Anscombe and Tukey 1963), (Mandel 1961), (Mandel 1971), (Johnson and Graybill 1972), (Hirotsu 1982), (Kharrati-Kopaei and Sadooghi-Alvandi 2007),
(Tusell 1990), (Boik 1993b), (Franck et al. 2013), and (Malik et al. 2014). (Alin and Kurt 2006) provide a review of
methods available up to 2006. We return to the analysis of these and
other data in subsequent sections of the paper.
Despite the wide usage of unreplicated two-way designs, computational
tools to assess non-additivity are lacking, particularly for more recent
methods. Currently, the
additivityTests
package (Simecek and Simeckova 2012) provides test statistics, critical thresholds, and
binary reject/fail to reject decisions for tests proposed in (Tukey 1949),
(Mandel 1961), (Boik 1993b), (Tusell 1990), (Johnson and Graybill 1972), and (Simecek and Simeckova 2012). The
additivityTests package does not produce p-values.
This paper describes the R package hiddenf, which makes available
several tools for the analysis of non-additivity in unreplicated two-way
layouts. This package contributes to available computational resources
by (a) providing statistical and graphical diagnostic tools within the
context of hidden additivity that enable the user to go beyond overall
tests of additivity towards the inferential goal of characterizing and
quantifying the magnitude of interaction, and (b) providing p-value
computations for five methods to detect non-additivity. Supported tests
include those proposed in (Tukey 1949), (Mandel 1961), (Kharrati-Kopaei and Sadooghi-Alvandi 2007), (Franck et al. 2013), and
(Malik et al. 2014). The latter three are newly available in an open-source
repository via the hiddenf package, which is available from the
Comprehensive R Archive Network (CRAN).
Hidden additivity occurs when the levels of factor
The HiddenF()
function accepts an print()
, anova()
,
summary()
and plot()
, can be applied to objects of the HiddenF
class to produce output useful for the quantification and
characterization of interaction. By default, the HiddenF()
function
considers the row factor as the variable to group. Submitting a
transposed matrix executes the grouping on the column variable. Deciding
which factor to group is subjective, and can be approached using domain
knowledge or a trial-and-error approach. Figure 2
summarizes the functionality of the hiddenf package.
HiddenF()
function, which returns an
object of the HiddenF class. The user can then characterize hidden
additivity and obtain p-values for tests of
non-additivity.The remainder of the paper is organized as follows. The C. jejuni data are first described as a relevant example. The following section reviews the testing procedure for hidden additivity. Functionality to detect hidden additivity using hiddenf is then detailed. Four other supported methods to detect non-additivity are then reviewed. Seventeen data sets are described and explored using the methods supported by hiddenf. The article concludes with a summary.
We now return to the data from Figure 1 panel B. The
vertical axis in this plot is the proportion of disease-resistant
bacteria samples of the C. jejuni strain taken from turkey processing
facilities. These data were collected over a five year period across
four processing plants in North Carolina. Each line corresponds to a
plant, and the year labels correspond to 2008-2012. The matrix
cjejuni.mtx
contains the fractions of bacteria that are classified as
the strain C. jejuni.
> library(hiddenf)
> data(cjejuni.mtx)
> cjejuni.mtx
[,1] [,2] [,3] [,4] [,5]
[1,] 0.16 0.08 0.44 0.06 0.10
[2,] 0.21 0.10 0.16 0.55 0.25
[3,] 0.16 0.08 0.56 0.26 0.26
[4,] 0.07 0.16 0.21 0.42 0.04
Hidden additivity is a useful concept for the analysis of many types of
data because the structure is easy to visualize and interpret. The ACMIF
test (Franck et al. 2013) to detect hidden additivity assigns classical
significance testing-based p-values. A combined testing and plotting
approach allows the researcher to interpret the way in which column
effects change across groups of the rows.
The ACMIF test works by (1) considering each placement of factor
where
Objects in the HiddenF class can be used to characterize hidden
additivity further with application of the generic plot()
, print()
,
summary()
, and anova()
functions. The following code produces an
enhanced interaction plot to help visualize hidden additivity (Figure
3):
> cjejuni.out <-HiddenF(cjejuni.mtx)
> plot(cjejuni.out, main="Hidden Additivity of time effects across plants",
+ rfactor="plant", cfactor="year", colorvec=c("blue", "red"), lwd=3, legendx=TRUE)
The enhanced interaction plot in Figure 3 displays
levels of factor colorvec
which is a vector of length two giving color names. Another
optional argument, legendx
, may be set to TRUE
in order to provide a
legend whose location will be determined according to where on the plot
the user clicks.
Because of the ellipsis(…), arguments such as main
(for a title) or
lwd
(for specifying line widths) can be passed to matplot
or
legend
. The arguments lty
, type
, ylab
, and xlab
are utilized
by the plot.HiddenF()
function to produce the enhanced interaction
plot and therefore are not available for customization by the user.
The print()
function returns results from the ACMIF test.
> print(cjejuni.out)
The ACMIF test for the hidden additivity form of interaction
F=8.965 p-value =0.03309 df=4,8
(Bonferroni-adjusted for all 7 possible configurations)
This output can also be obtained by typing the name of an object of
class HiddenF into the console. The method
argument in the print.HiddenF()
function
supports all of the methods supported by the hiddenf package discussed
later. To see which of these seven possible configurations of rows in
two non-empty groups leads to the greatest additivity within groups, use
the generic summary()
function, which returns information useful for
analysis based on ((2)):
> summary(cjejuni.out)
Number of configurations: 7
Minimum adjusted pvalue: 0.03308869
Rows in group 1: 1 3
Rows in group 2: 2 4
Column means for grp 1: 0.16 0.08 0.5 0.16 0.18
Column means for grp 2: 0.14 0.13 0.185 0.485 0.145
The output above includes the number of configurations under
consideration, the smallest Bonferroni adjusted p-value among these, the
identity of the rows that fall into each group and the means across
levels of
> anova(cjejuni.out)
The ACMIF test for the hidden additivity form of interaction
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
group 1 0.00000 0.000005 0.0009 0.977350
col 4 0.18753 0.046882 8.0450 0.006606
row 2 0.03673 0.018365 3.1514 0.097874
group:col 4 0.20897 0.052243 8.9648 0.004727
Residuals 8 0.04662 0.005828
C.Total 19 0.47986
(Pvalues in ANOVA table are NOT corrected for multiplicity.)
For these data, the interaction between time (col
) and plant group
(group
) accounts for more variability (method
argument as "KKSA", "Mandel" or "Tukey" (see Other Tests for
Non-additivity section).
Levels of factor HiddenF()
function. The output below
suggests no statistical evidence for hidden additivity in the levels of
factor
> cjejuni.trans.mtx<-t(cjejuni.mtx)
> cjejuni.trans.out<-HiddenF(cjejuni.trans.mtx)
> print(cjejuni.trans.out)
The ACMIF test for the hidden additivity form of interaction
F=3.63 p-value =0.8671 df=3,9
(Bonferroni-adjusted for all 15 possible configurations)
Note that support for more than 20 rows is not yet included due to the exponential increase in computational demand as the number of grouped levels increases. Factors with more levels than this will be accommodated in future versions of the software.
The center
option in the plot command allows the user to graph the
hidden additivity plot with data centered at the row means to better see
the concordance of rows with regard to column effects. This is a way of
de-noising the plot. Figure 4 demonstrates this feature
using data from (Graybill 1954). These data will be further analyzed
later. Inspection of the right panel of Figure 4
suggests that the third genotype appears to give inferior yields for
rows in the red group (relative to the performance of other genotypes in
these blocks), but does better for rows colored black. To produce this
plot:
> data(Graybill.mtx)
> Graybill.out <- HiddenF(Graybill.mtx)
> par(mfrow=c(1,2))
> plot(Graybill.out)
> plot(Graybill.out, center=TRUE, main="Hidden Additivity plot\ncenter=TRUE")
The ACMIF approach (Franck et al. 2013) to detect hidden additivity has been described above. The subsections below include code examples and briefly review the other four supported approaches to detect non-additivity in unreplicated studies.
The one degree of freedom test (Tukey 1949) is the earliest and most widely-known approach. The following model for non-additivity corresponds to Tukey’s procedure:
A hypothesis test for this approach is based on the null hypothesis
The TukeyPvalue()
function accepts an object of class HiddenF and
returns a list that contains the p-value for the one degree of freedom
test and an lm object that contains the model that includes the
squared predictions from ((1)) as described above. The
method
argument in the anova.HiddenF()
function can be used to
produce an ANOVA table for this test. Since the p-value is large, the
output below does not provide statistical evidence for non-additivity of
the form suggested in ((3)).
> TukeyPvalue(cjejuni.out)
$pvalue
[1] 0.7077391
$singledf.out
Call:
lm(formula = y ~ rows + cols + psq, data = hfobj$tall)
Coefficients:
(Intercept) rows2 rows3 rows4 cols2 cols3
0.095454 0.029036 0.030905 0.005445 -0.026989 0.043692
cols4 cols5 psq
0.044568 0.006369 1.569601
An extension of the one degree of freedom test is based on the following model (Mandel 1961):
where
By setting
The ANOVA table sum of squares associated with the
The residual sum of squares has the following form:
The test statistic is:
The statistic MandelPvalue()
in a similar fashion.
The MandelPvalue()
function accepts an object of class HiddenF and
returns a list that includes a p-value, sums of squares, F ratio, and
degrees of freedom for the rows-linear test. The output below fails to
detect interaction using Mandel’s rows-linear test:
> MandelPvalue(cjejuni.out)
$pvalue
[1] 0.9458807
$SumSq
SSRow SSCol SSMandel SSE SSTot
0.036735000 0.187530000 0.009849526 0.245740474 0.479855000
$Fratio
[1] 0.120243
$df
[1] 3 9
The error mean square (MSE) subtable comparison approach (Kharrati-Kopaei and Sadooghi-Alvandi 2007) forms
an
Under the assumptions of additivity and homogeneous variance,
As with the ACMIF procedure, grouping is subjective and is accomplished
by choosing one of the two factors upon which to form the groups, and
then placing that factor’s levels into two groups. Different groupings
result in different p-values. Groups are typically chosen based on a
priori knowledge, inspection of an interaction plot, or by screening
several candidate groupings and applying multiplicity adjustment to the
resulting p-values. The hiddenf package adopts the authors’ (Kharrati-Kopaei and Sadooghi-Alvandi 2007)
suggestion of assessing the
The KKSAPvalue()
function accepts an object of class HiddenF and
returns a list that contains the maximal
> t(KKSAPvalue(cjejuni.out))
fmax pvalue grp.vector NumDf DenomDf
[1,] 1.748821 1 Numeric,4 4 4
A recent approach (Malik et al. 2014) considers non-additivity elicited due to cell
values which are remote relative to the additive model. Not all cells
are assumed to contribute to the interaction pattern. The method detects
non-additivity by exploiting the large residuals that arise under this
structure.
Execution of the residual clustering method proceeds as follows. First,
residuals from ((1))
where MalikPvalue
and MalikTab
, respectively. By
default
The MalikPvalue
function accepts an object of class HiddenF and
returns a Monte Carlo p-value, observed test statistic, and Monte Carlo
sample size used in the computation. The k-means approach uses the
method of (Hartigan and Wong 1979) with 100 starting values per Monte Carlo
iteration. The output below suggests that the C. jejuni data do not
exhibit non-additivity of the form suggested by this method. This
approach is invariant to data transposition.
> t(MalikPvalue(cjejuni.out, N=10000))
(Pvalue from Malik's test estimated with N=10000 Monte Carlo datasets)
pvalue Tc N
[1,] 0.8498 40.97983 10000
The MalikTab function accepts as input a user-specified
> Mtab.24.6<-MalikTab(r=24, c=6, N=10000)
> ls(Mtab.24.6)
[1] "q" "Tcsim"
> Mtab.24.6\$q
r c 99% 95% 90%
24.0000 6.0000 445.5725 404.3399 384.0591
additivityPvalues
functionThe additivityPvalues
function accepts an object of class HiddenF
and returns p-values for each of the supported methods.
> t(additivityPvalues(cjejuni.out))
(Pvalue from Malik's test estimated with N=500 Monte Carlo datasets)
Malik.pvalue Mandel.pvalue Tukey.pvalue KKSA.pvalue ACMIF.pvalue
[1,] 0.834 0.9459 0.7077 1 0.0331
Table 1 summarizes an investigation of 17 data sets using the methods
supported by hiddenf. These examples are taken from bioinformatic,
agricultural, industrial, and medical settings, and are all publicly
available. To our knowledge, this is the largest collection and
description of data that has been used to study non-additivity in the
literature to date. Table 1 includes columns for references, labels, and
p-values. Note that ACMIF c, columns-linear, and KKSA c are obtained by
submitting the transposed data matrix to HiddenF()
. A brief
description of each data set follows.
The "Liming" data arises from an experiment that explores the utility
of seven types of blast furnace slags as liming material in agriculture
on three types of soil (Carter et al. 1951). The data were analyzed in the context
of non-additivity in (Johnson and Graybill 1972) and (Kharrati-Kopaei and Sadooghi-Alvandi 2007). The outcome variable is
yields of corn in bushels per acre.
The "Penicillin" data set (Davies and Goldsmith 1972) assesses variability between six
samples of penicillin on 24 plates. The outcome variable is the diameter
of the zone of inhibition. The data were analyzed for non-additivity in
(Malik et al. 2014).
The "Osmotic" data set (Kharrati-Kopaei and Sadooghi-Alvandi 2007) explores five varieties of safflower
grown in solutions with six osmotic potentials. Safflowers are an
important crop due to their oil which can be used for cooking and their
flowers which can be used as a substitute for saffron. The outcome
variable is average root weight.
The "Fertilizer" example originally appeared in (Ostle 1963) and was
re-analyzed to detect non-additivity in (Kharrati-Kopaei and Sadooghi-Alvandi 2007). This example explores
the ratio of dry to wet wheat in four blocks for four fertilizers.
The "Bottles" example was originally analyzed in (Ott and Snee 1973) and also
explored in (Boik 1993a) and (Kharrati-Kopaei and Sadooghi-Alvandi 2007). The study assesses the performance of
a six-headed machine on five occasions.
The "Grain Yields" data measures the yield of a grain crop in bushels
per acre for five levels of fertilizer on five blocks. The data come
from (Ostle 1963) and are re-analyzed in (Kharrati-Kopaei and Sadooghi-Alvandi 2007).
The "Absorbance" data (Mandel 1991) measures absorbancy of wood pulp of
nine polysachharide concentrations from seven different laboratories.
The data are also analyzed in (Alin and Kurt 2006).
The "Permeability" example examines permeability of three sheets of
building material on nine different days. The data appear in (Hald 1952) and
(Giesbrecht and Gumpertz 2004).
The "Wool" data (Lentner and Bishop 1993) examines four cleaning processes for wool
from five different batches. The outcome is losses in weight in
The "Red Blood Cells" data measures the number of red blood cells
counted by five doctors in ten counting chambers. The data appear in
(Biggs and Macmillan 1948) and were also analyzed in (Boik 1993b).
The "Tukey" data set is an illustrative example that includes three
rows and four columns from (Tukey 1949).
The "Ethyl Alcohol" data (Osborne et al. 1913) measures the density of aqueous
solutions of ethyl alcohol at six concentrations and seven temperatures.
These data were also analyzed in (Mandel 1971).
The "Insecticide" data (Ott and Longnecker 2001) comes from a complete block
design that measures the impact of four plots and three insecticides on
the number of string bean seedlings that emerge.
The "Rubber" data (Mandel 1961) contains data on stress measurements in
The "C. jejuni" data measures the fraction of bacteria found on
disease-resistant turkeys (Qiu 2013). These data were collected over a
five year period across four turkey plants in North Carolina. These data
are displayed in Figures 1 and 3.
The "Wheat" data (Graybill 1954) measures wheat yields in bushels per
acre in four varieties in 13 locations. These data are plotted in Figure
4.
The eight Copy number variation data sets "CNV1-CNV8" compare
discrepancies in the copy number signal between normal and tumor tissue
samples from a comparative genomic hybridization array. Each data set
corresponds to a separate location in the genome that is tested with the
assay. Six dogs each had two tissue samples (one normal and one tumor)
upon which the assay was conducted. The eight specific sets were
selected from 5899 sets that were analyzed in a previous study and
because they showed evidence of hidden additivity (Franck et al. 2013). The hidden
additivity exhibited in these copy number responses might suggest the
existence of multiple tumor sub-types for lymphoma in dogs. The full
data for this example may be accessed here:
http://www.sciencedirect.com/science/article/pii/S0167947313001618.
The results of non-additivity tests using methods supported by the
hiddenf package are reported in Table 1. The data come from 17
sources, and there are 24 total sets with the copy number variation
contributing eight individual sets. This collection is not intended as a
representative sample of the larger class of all unreplicated two-factor
data, since many sets were included due to their apparent
non-additivity. This exercise is intended to show the breadth of
applications for which the study of non-additivity is important. To
facilitate discussion, the standard p-value less than
This paper describes the main functionality of the hiddenf package. hiddenf provides descriptive and inferential tools to visualize and characterize hidden additivity. Further, hiddenf includes the ability to compute p-values for five tests for non-additivity. The package is illustrated using seventeen data sets spanning studies in industrial applications, agriculture, and the medical sciences. Non-additivity is evident in many of these data sets, motivating the importance of statistical interaction in unreplicated studies across a variety of scientific domains. The hiddenf package contributes to existing software resources specifically by making three recent tests for non-additivity available in an open-source repository for the first time (in addition to two historical methods) and by providing visualization and descriptive tools to further explore hidden additivity.
Citation | Label | r | c | ACMIF r | ACMIF c | Tukey | row-linear | col-linear | KKSA r | KKSA c | Malik |
---|---|---|---|---|---|---|---|---|---|---|---|
(Carter et al. 1951) | Liming | 7 | 3 | 0.1993 | 0.0110 | 0.9106 | 0.9045 | 0.8604 | 0.0068 | 0.2471 | |
(Davies and Goldsmith 1972) | Penicillin | 6 | 24 | 0.0387 | 0.5382 | 0.9374 | 1.000 | 0.2231 | |||
(Kharrati-Kopaei and Sadooghi-Alvandi 2007) | Osmotic Bars | 6 | 5 | 0.2075 | 0.0388 | 0.4450 | 0.5300 | 0.1848 | 0.1859 | 0.4155 | 0.7453 |
(Ostle 1963) | Fertilizer | 4 | 4 | 0.0107 | 0.0043 | 0.0012 | 0.0146 | 0.0077 | 0.0339 | 0.0493 | 0.4373 |
(Ott and Snee 1973) | Bottles | 6 | 5 | 0.0001 | 0.0031 | 0.0003 | 0.0025 | 0.0006 | 0.0178 | 0.4925 | 0.8839 |
(Ostle 1963) | Grain yields | 5 | 5 | 0.6556 | 0.1485 | 0.1139 | 0.0103 | 0.4425 | 0.0528 | 1.0000 | 0.0403 |
(Mandel 1991) | Absorbance | 7 | 9 | 0.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | 0.9972 |
(Hald 1952) | Permeability | 9 | 3 | 0.4066 | 0.2629 | 0.7844 | 0.7761 | 0.9544 | 0.3908 | 0.8674 | |
(Lentner and Bishop 1993) | Wool | 4 | 5 | 0.2355 | 0.4087 | 0.0359 | 0.0519 | 0.0410 | 0.7155 | 0.1998 | 0.4182 |
(Biggs and Macmillan 1948) | Red blood cells | 5 | 10 | 0.3291 | 0.0825 | 0.2322 | 0.6790 | 0.4903 | 0.0668 | 1.0000 | 0.8702 |
(Tukey 1949) | Tukey | 3 | 4 | 0.2213 | 0.5773 | 0.8566 | 0.8743 | 0.9298 | 0.5806 | 0.4769 | |
(Osborne et al. 1913) | Ethyl Alcohol | 6 | 7 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | 0.0002 | 0.0469 | 0.9954 |
(Ott and Longnecker 2001) | Insecticide | 3 | 4 | 0.4038 | 0.5569 | 0.6875 | 0.3846 | 0.8125 | 0.3000 | 0.7603 | |
(Mandel 1961) | Rubber | 11 | 7 | <.0001 | <.0001 | 0.3465 | <.0001 | 0.5403 | 0.0021 | 0.0058 | 0.9787 |
(Qiu 2013) | C. jejuni | 4 | 5 | 0.0331 | 0.8671 | 0.7077 | 0.9459 | 0.8842 | 1.0000 | 0.2862 | 0.8443 |
(Graybill 1954) | Wheat | 13 | 4 | <.0001 | 0.0070 | <.0001 | 0.0003 | <.0001 | 0.0320 | 0.0143 | 0.1335 |
(Franck et al. 2013) | CNV1 | 6 | 2 | 0.0007 | 0.0138 | 0.0310 | 0.0009 | ||||
CNV2 | 6 | 2 | 0.0005 | 0.0659 | 0.0010 | 0.0164 | |||||
CNV3 | 6 | 2 | <.0001 | 0.4674 | 0.0017 | 0.0018 | |||||
CNV4 | 6 | 2 | 0.0005 | 0.1659 | 0.1249 | 0.0182 | |||||
CNV5 | 6 | 2 | 0.0005 | 0.2388 | 0.0259 | 0.0173 | |||||
CNV6 | 6 | 2 | 0.0007 | 0.6187 | 0.2799 | 0.0224 | |||||
CNV7 | 6 | 2 | 0.0006 | 0.0277 | 0.0939 | 0.0193 | |||||
CNV8 | 6 | 2 | 0.0007 | 0.1391 | 0.0340 | 0.0216 |
We’d like to acknowledge Zahra Shenavari of Shiraz University for helpful comments regarding the error mean square subtable comparison approach.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Franck & Osborne, "Exploring Interaction Effects in Two-Factor Studies using the hiddenf Package in R.", The R Journal, 2016
BibTeX citation
@article{RJ-2016-011, author = {Franck, Christopher and Osborne, Jason}, title = {Exploring Interaction Effects in Two-Factor Studies using the hiddenf Package in R.}, journal = {The R Journal}, year = {2016}, note = {https://rjournal.github.io/}, volume = {8}, issue = {1}, issn = {2073-4859}, pages = {159-172} }