IsoGene is an R package for the analysis of dose-response microarray experiments to identify gene or subsets of genes with a monotone relationship between the gene expression and the doses. Several testing procedures (i.e., the likelihood ratio test, Williams, Marcus, the
The exploration of dose-response relationship is important in drug-discovery in the pharmaceutical industry. The response in this type of studies can be either the efficacy of a treatment or the risk associated with exposure to a treatment. Primary concerns of such studies include establishing that a treatment has some effect and selecting a dose or doses that appear efficacious and safe (Pinheiro et al. 2006). In recent years, dose-response studies have been integrated with microarray technologies (Lin et al. 2010). Within the microarray setting, the response is gene expression measured at a certain dose level. The aim of such a study is usually to identify a subset of genes with expression levels that change with experimented dose levels.
One of four main questions formulated in dose-response studies by Ruberg
((1995b), (1995a)) and Chuang-Stein and Agresti (1997) is whether there is
any evidence of the drug effect. To answer this question, the null
hypothesis of homogeneity of means (no dose effect) is tested against
ordered alternatives. Testing for trends in dose-response microarray experiments: A comparison of severaltesting procedures, multiplicity and resampling-based inference (2007) ((2007), (2010)) discussed
several testing procedures used in dose-response studies of microarray
experiments. Testing procedures which take into account the order
restriction of means with respect to the increasing doses include
Williams (Williams (1971), (1971) and (1972)), Marcus
(Marcus 1976), the likelihood ratio test (Barlow et al. (1972,),
and Robertson et al. (1988,)), the
To carry out the analysis of dose-response microarray experiments discussed by Testing for trends in dose-response microarray experiments: A comparison of severaltesting procedures, multiplicity and resampling-based inference (2007) ((2007), (2010)), an R package called IsoGene has been developed. The IsoGene package implements the testing procedures described by Testing for trends in dose-response microarray experiments: A comparison of severaltesting procedures, multiplicity and resampling-based inference (2007) to identify a subset of genes where a monotone relationship between gene expression and dose can be detected. In this package, the inference is based on resampling methods, both permutations (Ge, S. Dudoit, and T. P. Speed 2003) and the Significance Analysis of Microarrays (SAM), Tusher et al. (2001), (2001). To control the False Discovery Rate (FDR) the Benjamini Hochberg (BH) procedure (Benjamini and Hochberg 1995) is implemented.
This paper introduces the IsoGene package with background information about the methodology used for analysis and its main functions. Illustrative examples of analysis using this package are also provided.
In a microarray experiment, for each gene, the following ANOVA model is considered:
where
The null hypothesis of homogeneity of means (no dose effect) is given by
where
For testing
In order to test
Here,
Test statistic | Formula |
---|---|
Likelihood Ratio Test (LRT) |
|
Williams | t = (μ̂K⋆−ȳ0)/s |
Marcus | t = (μ̂K⋆−μ̂0⋆)/s |
M | M = (μ̂K⋆−μ̂0⋆)/s̃ |
Modified M (M’) | M′ = (μ̂K⋆−μ̂0⋆)s̃′ |
Williams ((1971), (1972)) proposed a step-down
procedure to test for the dose effect. The tests are performed
sequentially from the comparison between the isotonic mean of the
highest dose and the sample mean of the control, to the comparison
between the isotonic mean of the lowest dose and the sample mean of the
control. The procedure stops at the dose level where the null hypothesis
(of no dose effect) is not rejected. The test statistic is shown in
Figure 1, where
Analysis of dose responseeffects on gene expression data with comparison of two microarrayplatforms (2005) proposed a test statistic (denoted by
The Significance Analysis of Microarrays procedure (SAM, Tusher et al. (2001),
(2001)) can be also adapted to the five test statistics
described above. The generic algorithm of SAM discussed by Chu et al. (2001) is
implemented in this package. For the
In this study, a gene-by-gene analysis is carried out. When many hypotheses are tested, the probability of making the type I error increases sharply with the number of hypotheses tested. Hence, multiplicity adjustments need to be performed. Dudoit, J. P. Shaffer, and J. Boldrick (2003) and Dudoit and Laan (2008) provided extensive discussions on the multiple adjustment procedures in genomics. Testing for trends in dose-response microarray experiments: A comparison of severaltesting procedures, multiplicity and resampling-based inference (2007) compared several approaches for multiplicity adjustments, and showed the application in dose-response microarray experiments. In the IsoGene package the inference is based on resampling-based methods. The Benjamini Hochberg (BH) procedure is used to control the FDR . We use the definition of FDR from Benjamini and Hochberg (1995). The Significance Analysis of Microarrays (SAM, Tusher et al. (2001), (2001)) approach is also considered, which estimates the FDR by using permutations, where the FDR is computed as median of the number of falsely called genes divided by the number of genes called significant.
The IsoGene package provides the estimates and
The main functions of the IsoGene package are IsoRawp()
and
IsoTestBH()
which calculate the raw IsoTestSAM()
.
Function | Description |
---|---|
IsoRawp()
|
Calculates raw p-values for |
each test statistic using | |
permutations | |
IsoTestBH()
|
Adjusts raw p-values of the |
five test statistics using the | |
BH- or BY-FDR procedure | |
IsoGene1()
|
Calculates the values of the |
five test statistics for a | |
single gene | |
IsoGenem()
|
Calculates the values of the |
five test statistics for all genes | |
IsoPlot()
|
Plots the data points, sample |
means at each dose and | |
an isotonic regression line | |
(optional) | |
IsopvaluePlot()
|
Plots the pUp and pDown-values |
of a gene for a given test | |
IsoBHplot()
|
Plots the raw p-values and |
adjusted BH- and BY-FDR | |
p-values | |
IsoGenemSAM()
|
Calculates the values for the |
five SAM regularized test | |
statistics | |
IsoTestSAM()
|
Obtains the list of significant |
genes using the SAM | |
procedure |
The supporting functions are IsoGenem()
and IsoGenemSAM()
, which
calculate the values for the five test statistics and for five SAM
regularized test statistics, respectively. The functions
IsopvaluePlot(), IsoBHplot()
, and IsoPlot()
can be used to display
the data and show the results of the testing procedures. The summary of
the functions and their descriptions are presented in
Figure 2.
The IsoGene package can be obtained from CRAN: http://cran.r-project.org/package=IsoGene. The IsoGene package requires the ff and Iso packages.
To illustrate the analysis of dose-response in microarray using IsoGene package, we use the dopamine data. The data were obtained from a pre-clinical evaluation study of an active compound and Talloen, (2009)). In this study the potency of the compound was investigated. The compound had 6 dose levels (0, 0.01, 0.04, 0.16, 0.63, 2.5 mg/kg) and each dose was given to four to five independent rats. The experiment was performed using Affymetrix whole-human genome chip. There are 26 chips/arrays and each chip/array contains 11,562 probe sets (for simplicity, we refer to the probe sets as genes). The dopamine data set with 1000 genes is provided inside the package as example data. The complete data set can be obtained on request upon the first author. For this paper, the analysis is based on the original data set (using all genes).
The example data is in an ExpressionSet
object called dopamine
. More
detailed explanation of the ExpressionSet
object is discussed by
Falcon et al. (2007). In order to load the object into R, the following code
can be used:
> library(affy)
> library(IsoGene)
> data(dopamine)
> dopamine
ExpressionSet(storageMode:lockedEnvironment)
assayData: 11562 features, 26 samples
element names: exprs
phenoData
sampleNames: X1, X2, ..., X26 (26 total)
varLabels and varMetadata description:
dose: Dose Levels
featureData
featureNames: 201_at,202_at,...,11762_at
(11562 total)
fvarLabels and fvarMetadata
description: none
experimentData: use 'experimentData(object)'
The IsoGene package requires the information of dose levels and gene expression as input. The information of dose levels and the log2 transformed gene intensities can be extracted using the following code:
> express <- data.frame(exprs(dopamine))
> dose <- pData(dopamine)$dose
For data exploration, the function IsoPlot()
can be used to produce a
scatterplot of the data. The function IsoPlot()
has two options to
specify the dose levels, i.e., type ="ordinal"
or "continuous"
. By
default, the function produces the plot with continuous dose levels and
data points. To add an isotonic regression line, one can specify
add.curve=TRUE
.
Plots of the data and an isotonic regression line for one of the genes in the data set with dose on the continuous and ordinal scale can be produced by using the following code:
# Figure 3
> par(mfrow=c(1,2))
> IsoPlot(dose,express[56,],type="continuous",
+ add.curve=TRUE)
> IsoPlot(dose,express[56,],type="ordinal",
+ add.curve=TRUE)
IsoPlot()
with dose as continuous variable (left panel) and
dose as ordinal variable (right panel and the real dose level is
presented on the x-axis). The data points are plotted as
circles, while the sample means as red crosses and the fitted increasing
isotonic regression model as a blue solid line.
In this section, we illustrate an analysis to test for monotone trend
using the five statistics presented in section 2 using the
IsoGene package. The function IsoRawp()
is used to calculate raw
set.seed
to obtain the same random number. To calculate raw dopamine
data using 1000 permutations with seed=1234
, we can
use the following code:
> set.seed(1234)
> rawpvalue <- IsoRawp(dose, express,
+ niter=1000)
The output object rawpvalue
is a list with four components containing
the
> twosided.pval <- rawpvalue[[2]]
> twosided.pval[1:4, ]
Probe.ID E2 Williams Marcus M ModM
1 201_at 1.000 0.770 0.996 0.918 0.946
2 202_at 0.764 0.788 0.714 0.674 0.700
3 203_at 0.154 0.094 0.122 0.120 0.134
4 204_at 0.218 0.484 0.378 0.324 0.348
Once the raw IsoTestBH()
is used to
adjust the rawp
), FDR level, type of multiplicity adjustment (BH-FDR or
BY-FDR) and the statistic need to be specified in following way:
IsoTestBH(rawp, FDR=c(0.05,0.1),
type=c("BH","BY"), stat=c("E2",
"Williams","Marcus","M","ModifM"))
IsoTestBH()
produces a list of genes, which have a significant
increasing/decreasing trend. The following code can be used to adjust
the two-sided
> E2.BH <- IsoTestBH(twosided.pval,
+ FDR = 0.05, type = "BH", stat ="E2")
> dim(E2.BH)
[1] 250 4
The object E2.BH
contains a list of significant genes along with the
probe ID, the corresponding row numbers of the gene in the original data
set, the unadjusted (raw), and the BH-FDR adjusted dopamine
data, there are 250
genes tested with a significant monotone trend using the likelihood
ratio test (
> E2.BH[1:4, ]
Probe.ID row.num raw p-values BH adj.p-values
1 256_at 56 0 0
2 260_at 60 0 0
3 280_at 80 0 0
4 283_at 83 0 0
One can visualize the number of significant findings for the BH-FDR and
BY-FDR procedures for a given test statistic using the IsoBHPlot()
function.
> # Figure 4
> IsoBHPlot(twosided.pval,FDR=0.05,stat="E2")
Figure 4 shows the raw
The Significance Analysis of Microarrays (SAM) for testing for the
dose-response relationship under order restricted alternatives is
implemented in the IsoGene package as well. The function
IsoTestSAM()
provides a list of significant genes based on the SAM
procedure.
> IsoTestSAM(x, y, fudge=c("none","pooled"),
+ niter=100, FDR=0.05, stat=c("E2",
+ "Williams","Marcus","M","ModifM"))
The input for this function is the dose levels (x
), gene expression
(y
), number of permutations (niter
), the FDR level, the test
statistic, and the option of using fudge factor. The option fudge
is
used to specify the calculation the fudge factor in the SAM regularized
test statistic. If the option fudge ="pooled"
is used, the fudge
factor will be calculated using the method described in the SAM manual
(Chu et al. 2001). If we specify fudge ="none"
no fudge factor is used.
The following code is used for analyzing the dopamine
data using the
SAM regularized
> set.seed(1235)
> SAM.ModifM <- IsoTestSAM(dose, express,
+ fudge="pooled", niter=100,
+ FDR=0.05, stat="ModifM")
The resulting object SAM.ModifM
, contains three components:
sign.genes1
contains a list of genes declared significant using
the SAM procedure.
qqstat
gives the SAM regularized test statistics obtained from
permutations.
allfdr
provides a delta table in the SAM procedure for the
specified test statistic.
To extract the list of significant gene, one can do:
> SAM.ModifM.result <- SAM.ModifM[[1]]
> dim(SAM.ModifM.result)
[1] 151 6
The object SAM.ModifM.result
, contains a matrix with six columns: the
Probe IDs, the corresponding row numbers of the genes in the data set,
the observed SAM regularized
For dopamine
data, there are 151 genes found to have a significant
monotone trend based on the SAM regularized
> SAM.ModifM.result[1:4,]
Probe.ID row.num stat.val qvalue pvalue adj.pvalue
1 4199_at 3999 -4.142371 0.0000 3.4596e-06 0.0012903
2 4677_at 4477 -3.997728 0.0000 6.9192e-06 0.0022857
3 7896_at 7696 -3.699228 0.0000 1.9027e-05 0.0052380
4 9287_at 9087 -3.324213 0.0108 4.4974e-05 0.0101960
Note that genes are sorted increasingly based on the SAM regularized
stat.val
).
In the IsoGene package, the function IsoSAMPlot()
provides graphical
outputs of the SAM procedure. This function requires the SAM regularized
test statistic values and the delta table in the SAM procedure, which
can be obtained from the resulting object of the IsoTestSAM()
function, which in this example data is called SAM.ModifM
. To extract
the objects we can use the following code:
# Obtaining SAM regularized test statistic
qqstat <- SAM.ModifM[[2]]
# Obtaining delta table
allfdr <- SAM.ModifM[[3]]
We can also obtain the qqstat
and allfdr
from the functions
Isoqqstat()
and Isoallfdr()
, respectively. The code for the two
functions are:
Isoqqstat(x, y, fudge=c("none","pooled"),niter)
Isoallfdr(qqstat, ddelta, stat=c("E2",
"Williams","Marcus","M","ModifM"))
The examples of using the functions Isoqqstat()
and Isoallfdr()
are
as follows:
# Obtaining SAM regularized test statistic
> qqstat <- Isoqqstat(dose, express,
+ fudge="pooled", niter=100)
# Obtaining delta table
> allfdr <- Isoallfdr(qqstat, ,stat="ModifM")
Note that in Isoallfdr()
, ddelta
is left blank, with default values
taken from the data, i.e., all the percentiles of the standard errors of
the
The two approaches above will give the same result. Then to produces the
SAM plot for the SAM regularized IsoSAMPlot
:
# Figure 5
> IsoSAMPlot(qqstat, allfdr, FDR = 0.05,
+ stat = "ModifM")
Panel
In the previous sections we have illustrated analysis for testing a monotone trend for dose-response microarray data by using permutations and the SAM. The same approach can also be applied to obtain a list of significant genes based on other statistics and other multiplicity adjustments. Figure 6 presents the number of genes that are found to have a significant monotone trend using five different statistics with the FDR level of 0.05 using the two approaches.
It can be seen from Figure 6 that the
Number of significant genes | |||
Test statistic | BH-FDR | SAM | # common genes |
Ē012 | 250 | 279 | 200 |
Williams | 186 | 104 | 95 |
Marcus | 195 | 117 | 105 |
M | 209 | 158 | 142 |
M′ | 203 | 151 | 134 |
In the inference based on the permutations (BH-FDR) and the SAM, most of the genes found by the five statistics are in common (see Figure 6). The plots of the samples and the isotonic trend of four best genes that are found in all statistics and in both the permutations and the SAM approaches are presented in Figure 7, which have shown a significant increasing monotone trend.
We have shown the use of the IsoGene package for dose-response studies
within a microarray setting along with the motivating examples. For the
analysis using the SAM procedure, it should be noted that the fudge
factor in the
The calculation of the raw IsoRawp()
function is computationally intensive. For example, when we
used 100 permutations for analyzing 10,000 genes, it takes around 30
minutes. When we use 1,000 permutations, the elapsed time increases
around 10 times. Usually to approximate the null distribution of the
test statistics, a large number of permutations is needed, which leads
to the increase of computation time. Considering the computation time,
we also provide the IsoTestSAM()
. This approach is sufficient to obtain small
A further development of this package is the Graphical User Interface
(GUI) using tcl/tk
for users without knowledge of R programming.
Financial support from the IAP research network nr P5/24 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
http://bioconductor.org/packages/2.3/bioc/ vignettes/Biobase/inst/doc/ ExpressionSetIntroduction.pdf
: 2007.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Shkedy & Verbeke, "IsoGene: An R Package for Analyzing Dose-response Studies in Microarray Experiments", The R Journal, 2010
BibTeX citation
@article{RJ-2010-001, author = {Shkedy, Setia Pramana, Dan Lin, Philippe Haldermans and Ziv and Verbeke, Tobias}, title = {IsoGene: An R Package for Analyzing Dose-response Studies in Microarray Experiments}, journal = {The R Journal}, year = {2010}, note = {https://rjournal.github.io/}, volume = {2}, issue = {1}, issn = {2073-4859}, pages = {5-12} }