This article describes the R package copas which is an add-on package to the R package meta. The R package copas can be used to fit the Copas selection model to adjust for bias in meta-analysis. A clinical example is used to illustrate fitting and interpreting the Copas selection model.
Systematic reviews play a key role in the evaluation of new interventions. While combining evidence from a number of studies should reduce both bias and uncertainty, this is sometimes not the case because published studies represent a biased selection of the evidence. This can happen for a number of reasons, for example authors may be more likely to submit trials with ‘significant’ results for publication or journals may be more likely to publish smaller trials if they have ‘significant’ results. Empirical studies have established evidence for these kinds of biases and others (Rothstein, A. J. Sutton, and M. Borenstein 2005; Nieminen, G. Rücker, J. Miettunen, J. R. Carpenter, and M. Schumacher 2007).
In consequence, considerable effort has been directed at the problem of developing reliable tests for selection biases (Harbord, M. Egger, and J. A. Sterne 2006; Rücker, G. Schwarzer, and J. R. Carpenter 2008), and in a second step, correcting estimates for publication bias (Rothstein, A. J. Sutton, and M. Borenstein 2005). One of the most promising methods to date has been the so-called Copas selection model (Copas 1999; Copas and J. Q. Shi 2000; Copas and J. Q. Shi 2001), which is derived from the Heckman 2-stage regression model (see Little and D. B. Rubin (2002)).
Comprehensive evaluations suggest that this approach (i) can provide a useful summary in around 80% of meta-analyses (Carpenter, G. Schwarzer, G. Rücker, and R. Künstler 2009) and (ii) is preferable to the trim-and-fill method to adjust for bias in meta-analysis (Schwarzer, J. R. Carpenter, and G. Rücker 2009). This article describes the R package copas for fitting and interpreting the Copas selection model.
We first briefly describe the model. The Copas selection model has two
components: (i) a model for the outcome, measured on a chosen scale,
e.g. the log odds ratio, log risk ratio or the arcsine difference
(Rücker, G. Schwarzer, J. Carpenter, and I. Olkin 2009), and (ii) a ‘selection’ model giving the
probability that study
In more detail, let
and say study
with fixed
where
Copas (1999) and Copas and J. Q. Shi (2000) use standard properties of the
normal distribution to show that the probability of observing study
Thus if
> ## Read in data
> data(Crowther2003)
> ## Do meta-analysis using function metabin() from R package meta
> m.crowther <- metabin(event.e, total.e, event.c, total.c,
+ data=Crowther2003, sm="OR", studlab=study)
Warning message:
In metabin(event.e, total.e, event.c, total.c, data = Crowther2003, :
Increment 0.5 added to each cell in 2x2 tables with zero cell frequencies
> ## Do test for funnel plot asymmetry (Harbord et al.)
> ## using function metabias() from R package meta
> metabias(m.crowther, meth="score")
Linear regression test of funnel plot asymmetry (efficient score)
data: m.crowther
t = -3.4551, df = 7, p-value = 0.01062
alternative hypothesis: asymmetry in funnel plot
sample estimates:
bias se.bias slope
-2.6149672 0.7568465 0.2927025
> ## Do Copas analysis
> cop1 <- copas(m.crowther)
> ## Plot Copas analysis
> plot(cop1)
> ## Redo Copas analysis as P-value of residual selection bias
> ## is still significant
> cop2 <- copas(m.crowther,
+ gamma0.range=c(-0.55, 2),
+ gamma1.range=cop1$gamma1.range)
Warning messages:
1: In sqrt(1 - rho.tilde^2) : NaNs produced
2: In sqrt((tau^2 + sigma^2) * (1 - rho.tilde^2)) : NaNs produced
3: In sqrt((1 - rho.tilde^2)) : NaNs produced
> ## Plot Copas analysis
> plot(cop2)
> ## Print summary of Copas analysis
> summary(cop2)
Summary of Copas selection model analysis:
publprob OR 95%-CI pval.treat pval.rsb N.unpubl
1.00 0.4967 [0.3247; 0.7599] 0.0013 0.006 0
0.82 0.5483 [0.3494; 0.8605] 0.009 0.007 1
0.67 0.6063 [0.3938; 0.9335] 0.023 0.0115 2
0.55 0.6702 [0.4601; 0.9761] 0.037 0.0205 4
0.45 0.7402 [0.5376; 1.0193] 0.0653 0.046 6
0.37 0.8337 [0.6055; 1.1480] 0.2652 0.2461 9
Copas model (adj) 0.8337 [0.6055; 1.1480] 0.2652 0.2461 9
Random effects model 0.4880 [0.3234; 0.7363] 0.0006
Legend:
publprob - Probability of publishing the study with the largest standard error
pval.treat - P-value for hypothesis that the treatment effect is equal in both groups
pval.rsb - P-value for hypothesis that no further selection remains unexplained
N.unpubl - Approximate number of studies the model suggests remain unpublished
summary.copas()
.
We have developed the R package copas to provide a comprehensive set of R functions for fitting the Copas selection model, then printing and displaying the results. The R package copas is an add-on package to the R package meta (Schwarzer 2007). To illustrate the use of the R package copas, consider a meta-analysis of 9 studies comparing prophylactic maternal phenobarbital with control for periventricular haemorrhage in preterm infants (Crowther and D. J. Henderson-Smart 2003).
Figure 1 illustrates the analysis of these data,
which have a binary response (presence or absence of haemorrhage). After
reading the data in, we perform a meta-analysis on the odds-ratio scale
using function metabin()
from the R package
meta. Note the warning that
a continuity correction has been used, since one study has a zero cell.
By default, 0.5 is added only to cell counts of two-by-two tables with
zero cell frequencies. Two other stategies to adjust for zero cell
frequencies are implemented in the function metabin()
(Schwarzer 2007): (i) add 0.5 to all two-by-two tables in the case of
zero cell counts in one or more studies (allincr = TRUE
), (ii) add 0.5
to all two-by-two tables irrespective of zero cell counts
(addincr = TRUE
).
The random effects model suggests a significant intervention benefit (p
= 0.0006). Nevertheless, a funnel plot of the data (top left panel
Figure 2) suggests effects may be systematically
larger in smaller studies. Investigating this with a statistical test
(function metabias()
, Figure 1) supports this
suspicion with
We therefore perform a Copas analysis, using the function copas()
, as
shown in Figure 1. This fits the Copas selection
model repeatedly, by maximising the likelihood
(Carpenter, G. Schwarzer, G. Rücker, and R. Künstler 2009) — subject to the constraints that
optim()
(L-BFGS-B method). We use transformations of rho.tilde
which is used internally in likelihood estimation
(Copas and J. Q. Shi 2000 250). Typically, these warnings can be safely
ignored. The function plot.copas()
can be used as a diagnostic tool
(see Discussion).
An object of class copas
is created which we can print
, summarise
and plot
. Figure 2 shows the result of function
plot.copas()
, which by default displays four panels. The top left plot
is the usual funnel plot, which plots the study specific effect sizes
(here log odds ratios) against their standard error. The vertical broken
line is the usual fixed effect estimate of the treatment effect, while
the vertical grey line is the usual random effects estimate of the
treatment effect. Diagonal broken lines show
Given the range of study standard errors in the meta-analysis, the
function copas()
chooses a range of
The function copas()
fits the Copas selection model over a grid of (by
default 400) points. The top right panel produced by plot.copas
shows
a contour plot of the resulting treatment estimates
copas
analysis of phenobarbital versus control for
reducing neonatal periventricular haemorrhage
(Crowther and D. J. Henderson-Smart 2003).The lower two panels of Figure 2 use this
information to present an accessible summary of the Copas selection
model analysis. First, at each of the line/contour intersections marked
with an ‘o’ in the contour plot, the program calculates the probability
of publishing the trial with the largest SE. Then, in the lower left
panel this is plotted against the corresponding treatment estimate
(
Finally, the bottom right panel attempts to answer the question of which degree of selection is most plausible under the model; i.e. which treatment estimates should we pay most attention to. It does this as follows. For each degree of selection (summarised by the probability of publishing the trial with the largest SE) the program calculates whether the remaining asymmetry in the funnel plot is more than might be expected by chance, and calculates a p-value to test this hypothesis. These are then plotted against the corresponding degree of selection. Degrees of selection corresponding to p-values above 0.1 (i.e. at this degree of selection, no evidence of residual selection bias in the data) are more plausible under the model; corresponding treatment estimates in the bottom left panel are more plausible.
In this example, with no selection bias (left hand end of bottom left
panel) the p-value for residual selection bias in the data is summary.copas()
(Figure 1) complements this with a
numerical summary.
The Copas selection model analysis therefore suggests that after accounting for selection bias and/or other small study effects, there is no evidence of a benefit of the intervention. This agrees with the authors of the original study, who comment that the two largest trials — which are of higher quality — show no benefit, and conclude “evidence does not support phenobarbital treatment to women giving birth before 34 weeks to decrease the risk of bleeding into the babies’ brains”.
copas()
Although the majority of studies can be analysed automatically using the Copas selection model this is not always true. Some analyses will require fine tuning. The following options are available:
gamma0.range
, gamma1.range
:
These allow users to control the grid of
ngrid
:
This parameter controls how fine the grid of copas
function fits the Copas selection model over a grid
of ngrid
ngrid
values.
levels
:
Fitting the Copas model over the grid specified by the arguments
gamma0.range
, gamma1.range
and ngrid
results in a treatment
estimate at every point in the grid. These are then displayed on a
contour plot where contours of treatment effect (z-axis) are shown
by gamma0 (x-axis) and gamma1 (y-axis). This argument is a numeric
vector which specifies the treatment effects for which contour lines
will be drawn.
left
:
A logical indicating whether the cause of any selection bias is due
to missing studies on the left or right of the funnel plot: left
hand side if left = TRUE
, right hand side if left = FALSE
. This
information is needed in order to be sure the test for presence of
residual selection bias is calculated correctly.
rho.bound
:
A number giving the upper bound for the correlation parameter
silent
:
A logical indicating whether information on progress in fitting the
Copas selection model should be printed: silent = TRUE
specifies
not to print information (the default).
warn
:
A number setting the handling of warning messages. It is not
uncommon for numerical problems to be encountered during estimation
over the grid of (warn = -1
: ignore all warnings;
warn = 0
(the default): store warnings till the function finishes;
if there are less than 10, print them, otherwise print a message
saying warning messages were generated; warn = 1
: print warnings
as they occur; warn = 2
: stop the function when the first warning
is generated.
All the information used to generate the plots is available as
attributes of the object created by the copas
function. Thus
tailor-made versions of the panels in Figure 2 can
be created by users without any further calculation.
Finally, the summary
function and the plot
function allow the user
to specify the confidence level.
The extensive literature on selection bias in meta-analysis (see Rothstein, A. J. Sutton, and M. Borenstein (2005) and references therein) reflects the importance to the community of systematic reviewers of detecting, and where possible adjusting for, selection bias. The Copas selection model is a key tool for doing this (Carpenter, G. Schwarzer, G. Rücker, and R. Künstler 2009) (alongside other methods such as the trim-and-fill method included in the meta package (Schwarzer 2007)).
An empirical evaluation of 157 meta-analyses with 4 to 66 studies showed
that our implementation of the Copas selection model provided a useful
summary in about 80% of meta-analyses (Carpenter, G. Schwarzer, G. Rücker, and R. Künstler 2009).
In the remaining meta-analyses (i) the contour plot did not show roughly
parallel contour lines, (ii) the 95% confidence intervals in the
treatment effect plot did not vary smoothly, or (iii) plot.copas()
which should be used routinely as a
diagnostic tool.
We have attempted to make the help files for the copas package accessible to systematic reviewers, who in most cases are likely to be new users of R. With this package, we therefore believe that R has a powerful toolkit for systematic reviewers.
This research was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) under grant FOR 534 Schw 821/2-2
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Carpenter, et al., "copas: An R package for Fitting the Copas Selection Model", The R Journal, 2009
BibTeX citation
@article{RJ-2009-012, author = {Carpenter, James and Rücker, Gerta and Schwarzer, Guido}, title = {copas: An R package for Fitting the Copas Selection Model}, journal = {The R Journal}, year = {2009}, note = {https://rjournal.github.io/}, volume = {1}, issue = {2}, issn = {2073-4859}, pages = {31-36} }