In this paper we present a new R package called sgof for multiple hypothesis testing. The principal aim of this package is to implement SGoF-type multiple testing methods, known to be more powerful than the classical false discovery rate (FDR) and family-wise error rate (FWER) based methods in certain situations, particularly when the number of tests is large. This package includes Binomial and Conservative SGoF and the Bayesian and Beta-Binomial SGoF multiple testing procedures, which are adaptations of the original SGoF method to the Bayesian setting and to possibly correlated tests, respectively. The sgof package also implements the Benjamini-Hochberg and Benjamini-Yekutieli FDR controlling procedures. For each method the package provides (among other things) the number of rejected null hypotheses, an estimation of the corresponding FDR, and the set of adjusted p-values.
Multiple testing refers to any instance that involves the simultaneous testing of several null hypotheses. Nowadays, many statistical inference problems in areas such as genomics and proteomics involve the simultaneous testing of thousands of null hypotheses, producing as a result a number of significant p-values. The goal is to decide which of these significant results correspond to real effects. It is well known that the smaller the significance threshold, the fewer false positives are committed, but also the lower the power to detect the existing effects; multiple testing procedures aim at balancing this trade-off while taking the number of performed tests into account.
In this paper we introduce the sgof package which implements, for the first time in R, SGoF-type methods (Carvajal-Rodríguez et al. 2009; de Uña-Álvarez 2011), which have been proved to be more powerful than FDR and FWER based methods in certain situations, particularly when the number of tests is large (Castro-Conde and de Uña-Álvarez, in press). The BH (Benjamini and Hochberg 1995) and BY (Benjamini and Yekutieli 2001) methods are included in the package for completeness. Users can easily obtain from this package a complete list of results of interest in the multiple testing context. The original SGoF procedure (Carvajal-Rodríguez et al. 2009) is also implemented in the GNU software SGoF+ (Carvajal-Rodríguez and Uña-Álvarez 2011), see http://webs.uvigo.es/acraaj/SGoF.htm, while a MATLAB version was also developed (Thompson 2010). However, none of these tools work within R, nor do they include the several existing corrections of SGoF for dependent tests. These limitations are overcome by package sgof.
Recent contributions in which the SGoF method has been found to be a very useful tool include protein evolution (Ladner et al. 2012) and neuroimaging (Thompson et al. 2014).
The Bioconductor software (Gentleman et al. 2004) provides tools for the analysis and comprehension of high-throughput genomics data. Bioconductor uses the R statistical programming language and is open source and open development. It has two releases each year, nearly a thousand software packages, and an active user community. Some of the tools of Bioconductor related to multiple testing methods are the following.
The qvalue package (Dabney and Storey 2014) takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values, together with an estimate of the proportion of true null hypotheses.
The HybridMTest package (Pounds and Fofana 2011) performs hybrid multiple testing that incorporates method selection and assumption evaluations into the analysis using empirical Bayes probability estimates obtained by Grenander density estimation.
The multtest package (Pollard et al. 2005) performs non-parametric bootstrap and permutation resampling-based multiple testing procedures (including empirical Bayes methods) for controlling the FWER, generalized FWER, tail probability of the proportion of false positives, and FDR. Results are reported in terms of adjusted p-values, confidence regions and test statistic cutoffs.
Other R packages for multiple testing problems include the following.
The mutoss package (MuToss Coding Team et al. 2014) is designed for the application and comparison of multiple hypothesis testing procedures, such as the LSL method of Hochberg and Benjamini (1990) or the adaptive step-up procedure of Storey et al. (2004).
The multcomp package (Hothorn et al. 2008) performs simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects and survival models.
The stats package includes the function p.adjust which, given a set of p-values, returns p-values adjusted using one of several methods, including "holm" (Holm 1979), "hochberg" (Hochberg 1988), "hommel" (Hommel 1988) and "BH" (Benjamini and Hochberg 1995).
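For instance (the p-values below are an arbitrary example set, used here only for illustration), p.adjust() can be called as follows:

# Adjust an arbitrary vector of p-values with two of the methods listed above.
p <- c(0.001, 0.008, 0.039, 0.041, 0.09, 0.205, 0.5)
p.adjust(p, method = "holm")   # Holm's step-down adjustment
p.adjust(p, method = "BH")     # Benjamini-Hochberg adjustment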
The rest of the paper is organized as follows. First we introduce the methodological background for SGoF- and FDR-type methods. Then the sgof package is described and its usage is illustrated through the analysis of two real data sets. Finally, the last section contains the main conclusions of this work.
Carvajal-Rodríguez et al. (2009) proposed a new multiple comparisons adjustment, called SGoF (from sequential goodness-of-fit), which is based on the idea of comparing the observed number of p-values falling below an initial significance threshold gamma with the number expected under the complete (intersection) null hypothesis.
In order to formalize things, let p_1, ..., p_n be the p-values corresponding to the null hypotheses H_01, ..., H_0n, and let s(gamma) be the number of these p-values falling below gamma. Under the complete null hypothesis of no effects, s(gamma) follows a Binomial(n, gamma) distribution. Binomial SGoF performs a one-sided binomial metatest of excess of significant cases at level alpha and, when this metatest rejects, declares as true effects the N(gamma) smallest p-values, where N(gamma) is the excess of observed significant cases over the critical value of the metatest.
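The following is a minimal sketch of this idea (it is not the code of Binomial.SGoF(); in particular, the exact critical-value convention used by the package may differ slightly):

# Sketch of the Binomial SGoF idea: compare the observed number of p-values
# below gamma with the number expected under the complete null hypothesis.
sgof_sketch <- function(p, alpha = 0.05, gamma = 0.05) {
  n <- length(p)
  s <- sum(p <= gamma)                              # observed significant cases
  meta <- binom.test(s, n, p = gamma, alternative = "greater")   # metatest
  b <- qbinom(1 - alpha, n, gamma) + 1              # one possible alpha-level critical count
  N <- max(s - b, 0)                                # excess attributed to true effects
  list(metatest.pvalue = meta$p.value, rejections = N)
}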
A slightly different version of the SGoF procedure, Conservative SGoF (implemented in the SGoF() function of the package), is obtained when the exact binomial metatest is replaced by its normal approximation, with the variance estimated from the data rather than under the complete null; being asymptotic, this variant is intended for a moderate to large number of tests.
The main properties of SGoF-type procedures were analyzed in detail by de Uña-Álvarez (2011, 2012). In particular, it was shown that SGoF gives flexibility to the FDR, which is not controlled at a pre-specified level but typically remains small, while its power to detect weak effects increases with the number of tests.
The Bayesian SGoF procedure is an adaptation of the original SGoF to the Bayesian paradigm (Castro-Conde and de Uña-Álvarez 2013). In this context, it is assumed that the probability theta of a p-value falling below the initial threshold gamma is random, with prior probability P0 for the complete null hypothesis (theta = gamma) and a Beta(a0, b0) prior for theta under the alternative.
The Bayesian SGoF procedure consists of two main steps. In the first step Bayesian SGoF decides if the complete null hypothesis is true or false by using a pre-test rule which works as follows. First, the usual default prior probabilities for the complete null (P0 = 0.5) and the non-informative Beta(1, 1) prior under the alternative are adopted (these defaults can be changed by the user); then, the complete null hypothesis is rejected when the number s of p-values below gamma reaches the critical point s.alpha of the Bayesian pre-test at level alpha (which depends on P0).
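As an illustration of the kind of computation involved, the posterior probability of a point complete null theta = gamma, with prior mass P0 on the null and a Beta(a0, b0) prior for theta under the alternative, could be obtained as follows (a sketch under these assumptions, not the exact code of Bayesian.SGoF()):

# Posterior probability of the complete null theta = gamma, given that s out of
# n p-values fall below gamma (point-mass / Beta mixture prior).
posterior_null <- function(s, n, gamma = 0.05, P0 = 0.5, a0 = 1, b0 = 1) {
  like_null <- dbinom(s, n, gamma)   # likelihood under theta = gamma
  # marginal likelihood under the alternative: beta-binomial(n, a0, b0) mass at s
  like_alt <- exp(lchoose(n, s) + lbeta(a0 + s, b0 + n - s) - lbeta(a0, b0))
  P0 * like_null / (P0 * like_null + (1 - P0) * like_alt)
}
posterior_null(s = 9, n = 11)   # e.g., 9 of 11 p-values below gamma: essentially 0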
The second step in Bayesian SGoF is to compute the number of rejected nulls. Proceeding analogously to the frequentist SGoF, a one-sided credible interval for the excess of p-values below gamma with respect to the amount expected under the complete null is computed, and its lower bound gives the number of declared effects (the smallest p-values).
Compared to frequentist versions of SGoF (Binomial SGoF, Conservative SGoF), Bayesian SGoF may result in a more conservative approach, particularly when the number of tests is low to moderate. This is so because of the Bayesian perspective for testing point nulls, on which the pre-test rule is based. That is, Bayesian SGoF will accept the absence of features in situations where classical SGoF detects a signal.
Another feature of Bayesian SGoF is the interpretation of the results. It should be taken into account that Bayesian SGoF controls the probability of type I errors conditionally on the given set of p-values.
It has been noted that the SGoF multiple testing procedure is very sensitive to correlation among the tests, in the sense that it may become too liberal when the p-values are positively correlated. Beta-Binomial SGoF (BB-SGoF; de Uña-Álvarez 2012) is a correction of SGoF for possibly correlated tests. Given the initial significance threshold gamma, the sequence of p-values is divided into k blocks of possibly correlated tests, and the number of p-values below gamma within each block is modeled through a beta-binomial distribution, which accommodates the within-block correlation.
Analogously to original SGoF, BB-SGoF proceeds by computing a one-sided confidence interval for the difference between the observed and expected amounts of p-values below gamma, with the variance estimated from the fitted beta-binomial model; the lower bound of this interval gives the number of declared effects, which are the smallest p-values.
A practical issue in the application of BB-SGoF is the choice of the number and the size of the blocks (the blocks are assumed to be located following the given sequence of p-values). The BBSGoF() function handles this by fitting the model for every candidate number of blocks between kmin and kmax, reporting an automatic choice of k together with Tarone's test of no within-block correlation for each k.
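A rough sketch of this block-wise counting, together with Tarone's (1979) test of the binomial against the beta-binomial (i.e., of no within-block correlation), is given below; the exact block construction and model fitting inside BBSGoF() may differ:

# Split the sequence of p-values into k consecutive blocks, count the p-values
# below gamma in each block, and compute Tarone's Z statistic for overdispersion.
tarone_blocks <- function(p, k, gamma = 0.05) {
  g <- cut(seq_along(p), k, labels = FALSE)     # k consecutive blocks
  x <- tapply(p <= gamma, g, sum)               # successes per block
  n <- tapply(p, g, length)                     # block sizes
  phat <- sum(x) / sum(n)
  S <- sum((x - n * phat)^2) / (phat * (1 - phat))
  Z <- (S - sum(n)) / sqrt(2 * sum(n * (n - 1)))
  c(statistic = Z, p.value = 1 - pnorm(Z))      # one-sided p-value
}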
Unlike SGoF-type procedures, FDR based methods aim to control the expected proportion of false discoveries among the rejected nulls at a given level alpha. For a given alpha, let p_(1) <= ... <= p_(n) be the ordered p-values and let k be the largest i for which p_(i) <= (i/n) alpha. Then reject (i.e., declare positive discoveries) all the null hypotheses corresponding to p_(1), ..., p_(k). The BH procedure controls the FDR at level alpha when the tests are independent or positively correlated; the BY procedure, which replaces alpha by alpha/(1 + 1/2 + ... + 1/n) in the rule above, controls the FDR under arbitrary dependence among the tests.
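Written out by hand (a hypothetical helper, shown only to make the step-up rule explicit; in practice one would rely on the package's BH() function or on p.adjust()):

# The BH step-up rule: find the largest i with p_(i) <= (i/n) * alpha and
# reject the nulls corresponding to the i smallest p-values.
bh_stepup <- function(p, alpha = 0.05) {
  n <- length(p)
  o <- order(p)                                  # indices of the sorted p-values
  below <- which(p[o] <= seq_len(n) / n * alpha)
  k <- if (length(below)) max(below) else 0      # largest i with p_(i) <= i*alpha/n
  rejected <- logical(n)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE     # reject the k smallest p-values
  rejected
}
# For any vector p, sum(bh_stepup(p, alpha)) equals sum(p.adjust(p, "BH") <= alpha).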
FDR based methods are often used nowadays to take the multiplicity of tests into account. However, as mentioned, they may exhibit poor power in particular scenarios, namely, those with a large number of tests and a small to moderate proportion of ‘weak effects’ (true alternatives close to the corresponding nulls). In such settings, application of alternative methods like SGoF-type procedures is recommended.
A method closely related to BH is the q-value approach (Storey 2003). The q-value of an individual test is the expected proportion of false positives incurred when calling that test significant. Formally, define the positive false discovery rate (pFDR) as pFDR = E[V/R | R > 0], where V is the number of false discoveries and R is the total number of rejections. The q-value of an observed statistic is then the minimum possible pFDR when rejecting all the tests whose statistics are at least as extreme as the observed one. The q-value procedure rejects all the null hypotheses with a q-value below the nominal level alpha.
A very important concept in the multiple testing context is that of adjusted p-values: the adjusted p-value of a given test is the smallest significance level of the multiple testing procedure at which the corresponding null hypothesis would be rejected. Adjusted p-values for the BH method are computed in the usual step-up manner; for instance, the BH adjusted p-value of the i-th ordered p-value is min_{j >= i} min(n p_(j)/j, 1). On the other hand, the adjusted p-values for SGoF-type methods were introduced by Castro-Conde and de Uña-Álvarez (2015). In this case, the adjusted p-value of a given test can be interpreted as the smallest threshold gamma at which that test would be declared significant by the SGoF procedure, so significance is assessed by comparing the adjusted p-values with gamma.
As mentioned, the sgof package implements different procedures for solving multiple testing problems. This section illustrates the usage of sgof by describing its main features and by analyzing two real data sets. The first data set (Needleman data) refers to a situation in which the number of tests is small, whereas the second one (Hedenfalk data) involves several thousands of tests. The package sgof implements for the first time the four SGoF-type methods which have been reviewed in the previous section.
The sgof package includes six functions: Binomial.SGoF, SGoF, Bayesian.SGoF, BBSGoF, BH and BY. All of the six functions estimate the FDR by the simple method proposed by Dalmasso et al. (2005), taking n = 1 in their formula.
Argument | Description
---|---
BH() and BY() arguments: |
u | The (non-empty) numeric vector of p-values.
alpha | Numerical value. The significance level of the metatest. Default is alpha = 0.05.
Binomial.SGoF(), SGoF(), Bayesian.SGoF() and BBSGoF() arguments: |
u | The (non-empty) numeric vector of p-values.
alpha | Numerical value. The significance level of the metatest. Default is alpha = 0.05.
gamma | Numerical value. The p-value threshold for looking for significance in the amount of p-values below gamma. Default is gamma = 0.05.
Bayesian.SGoF() arguments: |
P0 | Numerical value. The a priori probability of the null hypothesis. Default is P0 = 0.5.
a0 | Numerical value. The first parameter of the a priori beta distribution. Default is a0 = 1.
b0 | Numerical value. The second parameter of the a priori beta distribution. Default is b0 = 1.
BBSGoF() arguments: |
kmin | Numerical value. The smallest allowed number of blocks of correlated tests. Default is kmin = 2.
kmax | Numerical value. The largest allowed number of blocks of correlated tests. Default is kmax = min(length(u)/10, 100).
tol | Numerical value. The tolerance in model fitting. Default is tol = 10. It allows for a stronger (small tol) or weaker (large tol) criterion when removing poor fits of the beta-binomial model: when the variance of the estimated beta-binomial parameters for a given number of blocks is larger than tol times the median variance along kmin:kmax, that particular number of blocks is removed.
adjusted.pvalues | Logical. Default is FALSE. If TRUE, the adjusted p-values are computed.
blocks | Numerical value. The number of existing blocks, needed in order to compute the adjusted p-values.
Table 1 shows a list of the arguments in the six functions. It should be noted that only the argument u (the vector of p-values) has no default value, so if the user forgets to introduce it, e.g., in the function SGoF(), the following message will be returned:
> SGoF(alpha = 0.05, gamma = 0.05)
Error in SGoF(alpha = 0.05, gamma = 0.05) : data argument is required
Moreover, in the event the user chooses the option
adjusted.pvalues = TRUE
in the function BBSGoF()
and forgets to
write the argument blocks
, then this function will return the
following message:
> BBSGoF(u, adjusted.pvalues = TRUE)
Error in BBSGoF(u, adjusted.pvalues = TRUE) :
blocks argument is required to compute the Adjusted p-values
Note also that kmax should be larger than kmin and smaller than the number of tests (if a number of blocks as large as the number of tests is intended, one should rather use SGoF() instead of BBSGoF()); otherwise BBSGoF() will return the following messages:
> BBSGoF(u, kmin = 5, kmax = 3)
Error in BBSGoF(u, kmin = 5, kmax = 3) : kmax should be larger than kmin
> BBSGoF(u, kmax = length(u))
Error in BBSGoF(u, kmax = length(u)) : kmax should be lower than n
Finally, note that BBSGoF() usually returns a warning message indicating which blocks have been removed because they provided negative or atypical variances; this depends on the argument tol, which allows for a stronger or weaker criterion when removing poor fits of the beta-binomial model (see Section Package sgof in practice for an example).
Value | Description
---|---
Binomial.SGoF(), SGoF(), Bayesian.SGoF(), BBSGoF(), BH() and BY(): |
Rejections | The number of declared effects.
FDR | The estimated false discovery rate.
Adjusted.pvalues | The adjusted p-values (not returned by Bayesian.SGoF()).
BBSGoF(): |
effects | A vector with the number of effects declared by BBSGoF() for each value of k (number of blocks).
SGoF | The number of effects declared by SGoF().
automatic.blocks | The automatic number of blocks.
deleted.blocks | A vector with the values of k removed because of a poor fit of the beta-binomial model.
n.blocks | A vector with the values of k for which the beta-binomial model fitted well.
p | The average ratio of p-values below gamma.
cor | A vector with the estimated within-block correlation.
Tarone.pvalues | A vector with the p-values of Tarone's test of no correlation for each value of k.
Tarone.pvalue.auto | The p-value of Tarone's test for the automatic number of blocks.
beta.parameters | The estimated parameters of the Beta(a, b) model for the automatic k.
betabinomial.parameters | The estimated parameters of the Betabinomial(p, rho) model for the automatic k.
sd.betabinomial.parameters | The standard deviation of the estimated parameters of the Betabinomial(p, rho) model for the automatic k.
Bayesian.SGoF(): |
Posterior | The posterior probability that the complete null hypothesis is true, considering the prior information a0, b0 and P0.
s | The amount of p-values falling below gamma.
s.alpha | Critical point at level alpha of the Bayesian pre-test for the complete null, which depends on P0.
On the other hand, Table 2 shows a summary of the results given by each of the functions. It can be seen that the number of rejections and the estimation of the FDR are returned by all the functions, whereas the adjusted p-values are not provided by Bayesian.SGoF(). Moreover, the Bayesian.SGoF() function also computes the posterior probability that the complete null hypothesis is true, based on the default a priori probabilities (Posterior), the amount of p-values below gamma (s), and the critical point at level alpha for the Bayesian pre-test for the complete null (s.alpha). Finally, the BBSGoF() function also computes some parameters of interest such as (among others) a vector with the number of effects declared by BBSGoF() for each value of k (effects), the automatic number of blocks (automatic.blocks), a vector with the values of k for which the model fitted well (n.blocks), a vector with the estimated within-block correlation (cor), a vector with the p-values of Tarone's test of no correlation (Tarone.pvalues), and the estimated parameters of the Beta(a, b) and Betabinomial(p, rho) models for the automatic k.
Finally, the sgof package implements three different methods for the returned objects of classes ‘Binomial.SGoF’, ‘SGoF’, ‘BBSGoF’, ‘BH’ and ‘BY’: the print method, which prints the corresponding object in a nice way; the summary method, which prints a summary of the main results reported; and the plot method, which provides a graphical representation of the adjusted p-values versus the original ones and, in the case of BBSGoF(), four more plots of interest, such as the fitted beta density and the Tarone’s test p-values along the candidate numbers of blocks. The ‘Bayesian.SGoF’ class does not have a plot method as the adjusted p-values are not computed by this procedure.
Needleman et al. (1979) compared various psychological and classroom performances between
two groups of children in order to study the neuropsychologic effects of
unidentified childhood exposure to lead. Needleman’s study was attacked
because it presented three families of endpoints but carried out
separate multiplicity adjustments within each family. For illustration
of sgof, we will focus on the family of endpoints corresponding to the
teacher’s behavioral ratings. Table 3 shows the original p-values (stored in the vector u) as well as the adjusted p-values reported by the BH() and Binomial.SGoF() functions, computed using the following code:
> u <- c(0.003, 0.003, 0.003, 0.01, 0.01, 0.04, 0.05, 0.05, 0.05, 0.08, 0.14)
> BH(u)$Adjusted.pvalues
> Binomial.SGoF(u)$Adjusted.pvalues
Note that tied p-values receive the same adjusted p-values.
Endpoint | p-value | Adjusted (BH) | Adjusted (Binomial.SGoF) |
---|---|---|---|
Distractible | 0.003 | 0.011 | 0.010 |
Does not follows sequence of directions | 0.003 | 0.011 | 0.010 |
Low overall functioning | 0.003 | 0.011 | 0.010 |
Impulsive | 0.010 | 0.022 | 0.050 |
Daydreamer | 0.010 | 0.022 | 0.050 |
Easily frustrated | 0.040 | 0.061 | 0.050 |
Not persistent | 0.050 | 0.061 | 1.000 |
Dependent | 0.050 | 0.061 | 1.000 |
Does not follow simple directions | 0.050 | 0.061 | 1.000 |
Hyperactive | 0.080 | 0.088 | 1.000 |
Disorganized | 0.140 | 0.140 | 1.000 |
We will use Needleman’s p-values (stored above in the vector u) to illustrate the performance of the BH(), Binomial.SGoF() and Bayesian.SGoF() functions, using default argument values. SGoF() and BBSGoF() are not applied in this case because these are asymptotic methods and here the sample size is small (only 11 tests).
The first step to analyze the Needleman data is to load the sgof package by using the code line library(sgof). We then start by applying the BH() function:
> m1 <- BH(u)
> summary(m1)
Call:
BH(u = u)
Parameters:
alpha= 0.05
$Rejections
[1] 5
$FDR
[1] 9e-04
$Adjusted.pvalues
>alpha <=alpha
6 5
The output of the summary shows that the BH procedure with alpha = 0.05 declares 5 effects, with an estimated FDR of 9e-04; accordingly, 5 adjusted p-values fall below alpha, which is by force the case. If we apply the BY procedure to this set of p-values, the number of rejections decreases to 3:
> BY(u)$Rejections
[1] 3
Now, we illustrate the usage of the Binomial.SGoF()
function:
> m2 <- Binomial.SGoF(u)
> summary(m2)
Call:
Binomial.SGoF(u = u)
Parameters:
alpha= 0.05
gamma= 0.05
$Rejections
[1] 6
$FDR
[1] 0.0031
$Adjusted.pvalues
>gamma <=gamma
5 6
In this case, the summary indicates that the default Binomial SGoF procedure (alpha = gamma = 0.05) declares 6 effects, with an estimated FDR of 0.0031. Here the number of adjusted p-values falling below gamma is equal to the number of rejections, which will not be the case in general (recall that the number of rejections of SGoF is an increasing-decreasing function of gamma).
Figure 1 reports the graphical displays from the plot method applied to the m1 and m2 objects (plot(m1), plot(m2)), where the adjusted p-values are plotted against the original ones. In the multiple testing setting, these plots are often used to inspect the relative size and distribution of the adjusted p-values.
Results of Bayesian.SGoF()
are as follows:
> m3 <- Bayesian.SGoF(u)
> summary(m3)
Call:
Bayesian.SGoF(u = u)
Parameters:
alpha= 0.05
gamma= 0.05
P0= 0.5
a0= 1
b0= 1
$Rejections
[1] 6
$FDR
[1] 0.0031
$Posterior
[1] 0
$s
[1] 9
$s.alpha
[1] 5
By using the Bayesian.SGoF() function one obtains the same number of declared effects and estimated FDR as those reported by the Binomial SGoF procedure. Besides, the summary of the ‘Bayesian.SGoF’ object shows that, while there are nine original p-values below gamma (s), the critical point at level alpha for the Bayesian pre-test is five, which is lower than s as expected (if s.alpha were larger than s, then Bayesian SGoF would have accepted the complete null). Besides, the posterior probability that the complete null is true is zero. By choosing the default values of Bayesian.SGoF() one considers as non-informative the a priori information on the proportion of p-values below gamma; when such prior information is available, the arguments a0 and b0 may be used to include it. Below we provide the results when choosing a0 = 2 and b0 = 8, which corresponds to a Beta(2, 8) distribution with mean 0.2. Note that s.alpha does not depend on a0 and b0.
> m32 <- Bayesian.SGoF(u, a0 = 2, b0 = 8)
> summary(m32)
Call:
Bayesian.SGoF(u = u, a0 = 2, b0 = 8)
Parameters:
alpha= 0.05
gamma= 0.05
P0= 0.5
a0= 2
b0= 8
$Rejections
[1] 3
$FDR
[1] 5e-04
$Posterior
[1] 0
$s
[1] 9
$s.alpha
[1] 5
Now, by choosing P0 = 0.2 to represent a small a priori probability that the complete null is true, one obtains the same number of rejections, but the critical point of the Bayesian pre-test changes (it depends on P0 but not on a0 or b0), being s.alpha = 3. That is, if the number of existing p-values below gamma were 4 (rather than 9), the complete null would be rejected with P0 = 0.2 but not with the default option P0 = 0.5. This means that, by choosing a lower a priori probability P0, Bayesian SGoF is more likely to reject the complete null hypothesis.
> m33 <- Bayesian.SGoF(u, a0 = 2, b0 = 8, P0 = 0.2)
> summary(m33)
...
$Rejections
[1] 3
$FDR
[1] 5e-04
$Posterior
[1] 0
$s
[1] 9
$s.alpha
[1] 3
In order to illustrate how some of these results change when changing
the value for the argument alpha
, we apply the Binomial.SGoF()
,
Bayesian.SGoF()
, BH()
and BY()
functions to the Needleman data
with alpha = 0.01
. While the number of rejections reported by the Binomial SGoF procedure remains the same, Bayesian SGoF becomes more conservative, declaring one effect fewer. Stronger consequences are found for the BH and BY procedures, which are unable to find any effect with such a restrictive FDR level.
> Binomial.SGoF(u, alpha = 0.01)$Rejections
[1] 6
> Bayesian.SGoF(u, alpha = 0.01)$Rejections
[1] 5
> BH(u, alpha = 0.01)$Rejections
[1] 0
> BY(u, alpha = 0.01)$Rejections
[1] 0
As an illustrative example of a large number of dependent tests, we consider the microarray study of hereditary breast cancer of Hedenfalk et al. (2001). The principal aim of this study was to find genes differentially expressed between BRCA1- and BRCA2-mutation positive tumors. For that, a p-value was computed for each of the genes under study, giving a total of 3,170 p-values; these are included in the sgof package as the data set Hedenfalk.
The first step to analyze the Hedenfalk data is to load the package and the data set. To do so, we use the next code lines:
> library(sgof)
> u <- Hedenfalk$x
Here we use the Hedenfalk data to illustrate the BH() and BBSGoF() functions, which are suitable because these data involve a large number of possibly dependent tests. We also apply the BY(), SGoF(), Binomial.SGoF() and Bayesian.SGoF() functions, and the qvalue procedure, to compare the results. Starting with the BH() function (with default argument values):
> m41 <- BH(u)
> summary(m41)
Call:
BH(u = u)
Parameters:
alpha= 0.05
$Rejections
[1] 94
$FDR
[1] 0.0356
$Adjusted.pvalues
>alpha <=alpha
3076 94
The summary of the object m41 reveals that the Benjamini and Hochberg procedure (with a FDR level of alpha = 0.05) declares 94 effects, with an estimated FDR of 0.0356. Now we apply the BY() function with default argument values:
> m42 <- BY(u)
> summary(m42)
Call:
BY(u = u)
Parameters:
alpha= 0.05
$Rejections
[1] 0
$FDR
[1] 0
$Adjusted.pvalues
>alpha
3170
The output of the summary indicates that the Benjamini and Yekutieli FDR controlling procedure (with a FDR level of alpha = 0.05) declares no effects, all the adjusted p-values falling above alpha. In fact, the smallest adjusted p-value (given by min(m42$Adjusted.pvalues)) is above alpha, which means that, to find at least one effect, a larger FDR level should be allowed.
In third place, we apply to the Hedenfalk data the qvalue() function of the qvalue package for comparison purposes. We obtain that the qvalue procedure declares 162 effects at a q-value cutoff of 0.05:
> m43 <- qvalue(u)
> summary(m43)
Call:
qvalue(p = u)
pi0: 0.6635185
Cumulative number of significant calls:
<1e-04 <0.001 <0.01 <0.025 <0.05 <0.1 <1
p-value 15 76 265 424 605 868 3170
q-value 0 0 1 73 162 319 3170
When applying the BBSGoF() function to the Hedenfalk data and printing the results (saved in the m5 object), a warning alerts the user that blocks 2, 3, 4, 5, 6, 7, 8, 9, 11, 15, 18 and 19 have been removed because they provided negative or atypical variances (see output below). We see that the BBSGoF procedure rejects 393 nulls. In this case, we have chosen the option adjusted.pvalues = TRUE in order to compute the adjusted p-values, with blocks = 13 (the automatic number of blocks obtained in a preliminary application of the same function). We note that the output is not immediately obtained in this case since the computation of the adjusted p-values is time-consuming. The summary of the m5 object indicates that BBSGoF’s decision entails an estimated FDR of 0.1296 along the candidate numbers of blocks explored (from kmin = 2 to kmax = 100), as well as the p-value of Tarone’s test for the automatic number of blocks (5e-04), and the parameters of the fitted beta and beta-binomial distributions.
> m5 <- BBSGoF(u, adjusted.pvalues = TRUE, blocks = 13)
> m5
Call:
BBSGoF(u = u, adjusted.pvalues = TRUE, blocks = 13)
Parameters:
alpha= 0.05
gamma= 0.05
kmin= 2
kmax= 100
Warning:
Blocks 2 3 4 5 6 7 8 9 11 15 18 19 have been removed because they provided negative or
atypical variances.
Rejections:
[1] 393
> summary(m5)
...
$Rejections
[1] 393
$FDR
[1] 0.1296
$Adjusted.pvalues
>gamma <=gamma
2777 393
$Tarone.pvalue.auto
[1] 5e-04
$beta.parameters
[1] 35.0405 148.4139
$betabinomial.parameters
[1] 0.1910 0.0054
$sd.betabinomial.parameters
[1] 0.0106 0.0038
$automatic.blocks
[1] 13
Figure 2 depicts the graphics obtained when using the plot method (plot(m5)). Among other displays, the adjusted p-values are plotted against the original ones (with adjusted.pvalues = FALSE this last plot is not displayed).
Next, we apply the SGoF()
function to the Hedenfalk data (with default
values of alpha
and gamma
):
> m6 <- SGoF(u)
> summary(m6)
Call:
SGoF(u = u)
Parameters:
alpha= 0.05
gamma= 0.05
$Rejections
[1] 412
$FDR
[1] 0.131
$Adjusted.pvalues
>gamma <=gamma
2758 412
The Conservative SGoF procedure reports 412 effects with an estimated FDR of 0.131. The adjusted p-values of BH() and SGoF() can be plotted against the original ones using the code lines plot(m41) and plot(m6).
We will use the Hedenfalk data example to illustrate the role of the alpha and gamma arguments of the SGoF-type procedures too. The m61 object shows that Conservative SGoF with alpha = 0.05 and gamma = 0.1 declares 510 effects, while 520 adjusted p-values fall below gamma. This illustrates how the number of rejections may change depending on the initial threshold gamma and also that, when alpha is not equal to gamma, the number of rejections may be different from the number of adjusted p-values below gamma (if alpha = gamma, the number of rejections is always a lower bound for the number of adjusted p-values below gamma, see Castro-Conde and de Uña-Álvarez (2015); something which does not hold in general). When alpha = 0.1 and gamma = 0.05, SGoF() reports 420 effects, which illustrates how the number of rejections grows as the alpha argument increases; but, since alpha differs from gamma, the number of adjusted p-values below gamma (412) does not match the number of rejections:
> m61 <- SGoF(u, gamma = 0.1)
> m61
Call:
SGoF(u = u, gamma = 0.1)
Parameters:
alpha= 0.05
gamma= 0.1
Rejections:
[1] 510
> sum(m61$Adjusted.pvalues <= m61$gamma)
[1] 520
> m62 <- SGoF(u,alpha = 0.1)
> m62
Call:
SGoF(u = u, alpha = 0.1)
Parameters:
alpha= 0.1
gamma= 0.05
Rejections:
[1] 420
> sum(m62$Adjusted.pvalues <= m62$gamma)
[1] 412
Finally, by applying Binomial.SGoF()
(m7
) and Bayesian.SGoF()
(m8
) functions to the Hedenfalk data one obtains 427 and 413
rejections, respectively. Binomial SGoF rejects more nulls than
Conservative SGoF does (427 vs. 412), as expected, since the first
method estimates the variance under the complete null of no effects. On
the other hand, Bayesian SGoF reports approximately the same number of
effects as Conservative SGoF, which will generally be the case with a large number of tests.
> m7 <- Binomial.SGoF(u)
> m7
Call:
Binomial.SGoF(u = u)
Parameters:
alpha= 0.05
gamma= 0.05
Rejections:
[1] 427
> m8 <- Bayesian.SGoF(u)
> m8
Call:
Bayesian.SGoF(u = u)
Parameters:
alpha= 0.05
gamma= 0.05
P0= 0.5
a0= 1
b0= 1
Rejections:
[1] 413
In this paper we introduced the sgof package which implements in R for the first time SGoF-type multiple testing procedures; the classical FDR-controlling step-up BH and BY procedures are also included. We reviewed the definition of the several methods and discussed their relative advantages and disadvantages, and how they are implemented. Guidelines to decide which method is best suited to the data at hand have been given. Specifically, if the tests are independent, Binomial SGoF is recommended, with the possibility of using Conservative SGoF when the number of tests is moderate to large. On the other hand, BB-SGoF is suitable for serially dependent tests, while Bayesian SGoF allows for a stronger dependence structure with pairwise correlation depending on the user’s a priori information. Finally, the BH (independent or positively correlated tests) and BY (dependent tests) methods are indicated when the aim is to strongly control the expected proportion of false discoveries. Existing improvements on the BH and BY methods include the qvalue procedure (Storey and Tibshirani 2003) or the empirical Bayes procedures (Pollard et al. 2005), which are implemented in other packages. sgof has been illustrated in practice by analyzing two well-known real data sets: the Needleman data (Needleman et al. 1979) and the Hedenfalk data (Hedenfalk et al. 2001). Summarizing, it has been shown that the sgof package is very user-friendly, and it is hoped that it serves the community by providing a simple and powerful tool for solving multiple testing problems.
Financial support from the Grant MTM2011-23204 (FEDER support included) of the Spanish Ministry of Science and Innovation is acknowledged.