Meta-analysis, a statistical procedure that compares, combines, and synthesizes research findings from multiple studies in a principled manner, has become popular in a variety of fields. Meta-analyses using study-level (or equivalently aggregate) data are of particular interest due to data availability and modeling flexibility. In this paper, we describe an R package metapack that introduces a unified formula interface for both meta-analysis and network meta-analysis. The user interface—and therefore the package—allows flexible variance-covariance modeling for multivariate meta-analysis models and univariate network meta-analysis models. Complicated computing for these models has prevented their widespread adoption. The package also provides functions to generate relevant plots and perform statistical inferences like model assessments. Use cases are demonstrated using two real data sets contained in metapack.
The U.S. Food and Drug Administration provides a clear definition of meta-analysis as “the combining of evidence from relevant studies using appropriate statistical methods to allow inference(s) to be made to the population of interest” (U.S. Food and Drug Administration et al. 2018). In fields like medicine, pharmacology, and epidemiology, meta-analysis has become popular for reconciling conflicting results or corroborating consistent ones in multiple studies (Chalmers et al. 2002; Borenstein et al. 2011; Hartung et al. 2011; Balduzzi et al. 2019). Findings produced from meta-analyses are often placed at the apex of the evidence hierarchy (U.S. Food and Drug Administration et al. 2018).
R already has a large supply of meta-analysis packages. meta (Schwarzer 2007) and rmeta (Lumley 2018) use the method of moments introduced in DerSimonian and Laird (1986). metafor (Viechtbauer 2010) further contains moderator analyses and fits meta-regression (Berkey et al. 1995) through weighted least squares. On the other hand, metaLik (Guolo 2012) takes a likelihood approach based on the second-order approximation of the modified likelihood ratio test statistic (Skovgaard et al. 1996). metatest (Huizenga et al. 2011) further includes hypothesis testing capabilities through the likelihood ratio test with Barlett correction, and mvmeta (Gasparrini et al. 2012) fits multivariate meta-analysis and meta-regression models via the method of maximum likelihood. There are packages for Bayesian meta-analytic inference as well. bayesmeta (Röver 2020) assumes the normal-normal hierarchical random-effect model and allows the user to choose prior distributions with a great deal of flexibility, both informative and noninformative. nmaINLA (Guenhan et al. 2018) provides functionalities for network meta-analysis and meta-regression with integrated nested Laplace approximations (INLA) as an alternative to the Markov chain Monte Carlo (MCMC) algorithm. On the other hand, bmeta (Ding and Baio 2016) delivers flexible meta-analytic modeling by interfacing with JAGS (Plummer 2003). MetaStan (Guenhan 2020) provides binomial-normal hierarchical models with weakly informative priors, building upon the probabilistic language Stan (Stan Development Team 2020).
Despite its importance and the wide array of R packages available, meta-analysis is still regarded as a niche field that interests a narrow group of researchers and remains relatively low impact. We partially attribute this phenomenon to the fact that the R community has yet to come up with a user interface that unifies the theoretical distinctions between univariate and multivariate models, and between meta-analysis and network meta-analysis although the models have grown more complicated in the intervening years. Furthermore, a large class of simple Bayesian meta-analytic models is handled by probabilistic programming languages like Stan (Stan Development Team 2020) or BUGS (Sturtz et al. 2005). Meanwhile, many complex models are not easily programmable in probabilistic languages, and are not readily available in R. Especially, in the context of variance-covariance matrix modeling in (network) meta-analysis, metapack is the first attempt in the R cosmos, to the best of our knowledge, to provide easy access to regression-modeling of the variances (of the treatment effects) as well as a wide array of options for modeling the response covariance matrices when the aggregate responses are multivariate with only a partially observed within-study sample covariance matrix.
metapack presented in this paper proposes a formula structure that flexibly represents the types of responses (univariate and multivariate) and the number of treatments (meta-analysis and network meta-analysis). The package also provides functions to assess model fits such as the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML), and to generate diagnostic plots. Some potential complications, theoretical and computational, in these model selection criteria may break the algorithm or erode the statistical inference when unaddressed (see Section 5.0.0.1), which metapack takes care of by default—an advantage over model-agnostic programming languages.
The rest of this paper is organized as follows. Section 2 briefly reviews the (network) meta-analysis model. Section 3 describes the general form of a meta-analysis data set to establish a generic data structure for meta-analysis and network meta-analysis. Section 4 explains how the data structure can be represented using R’s extended formula and how metapack’s main function parses it, and lays down the various modeling options for meta-analysis and network meta-analysis. Section 5 further introduces the S3 methods available for performing statistical inferences and comparing models. Some computational considerations to accelerate the computation are detailed as well. Section 6 provides demonstrations using the cholesterol data and TNM data included in metapack. Finally, Section 7 concludes the paper by offering a cautionary remark for multivariate meta-analysis models regarding the number of observations required to perform valid inferences on the correlation matrix, and discussing future research and package development directions.
In this section, we briefly review the models considered in metapack.
There are largely two umbrella models: univariate or multivariate
meta-analysis based on (Yao et al. 2015), and univariate network
meta-analysis based on (Li et al. 2021). The various modeling options
for each model introduced in this section encompass the ones in
(Yao et al. 2015) and (Li et al. 2021). In what follows, the model
description will deal with a general multivariate model including both
meta-analysis and network meta-analysis, which is valid even when
univariate response is assumed, i.e.
Assume
Stacking the random effects for all response endpoints,
To streamline configuring models in R formula, it is important to understand the data structure for metapack. Table 1 represents a typical arm-level data set for (network) meta-analysis, where each row represents a trial arm.
Outcome ( |
SD ( |
DesignM1 ( |
DesignM2 ( |
Trial (k ) |
Treat (t ) |
n |
---|---|---|---|---|---|---|
1 | 4 | 1000 | ||||
1 | 1 | 545 | ||||
2 | 1 | 1200 | ||||
Outcome
is the response (or responses for multivariate cases), SD
is
the standard deviation(s) of the response(s), DesignM1
and DesignM2
are design matrices, and n
is the arm sample size. The pair of trial
and treatment indicators is unique to a row. The first design matrix,
DesignM1
, contains the covariates for fixed effects and will be
written as DesignM2
or
A meta-analytic data set is characterized as folows: (1) univariate or
multivariate and (2) meta-analysis or network meta-analysis. Here,
meta-analysis refers to when trials have specifically two treatments
(i.e. Treat
can have more than two unique values. The first
category (univariate vs. multivariate) is determined by the number of
response endpoints, and the second category (meta- vs. network
meta-analysis) by the number of treatments. All other modeling choices
fall into prior specification.
bmeta_analyze
is the main function in metapack, whose first argument
is an R formula. bmeta_analyze
internally parses a formula to identify
a model and ultimately calls a worker function. An extension of R’s
formula class, Formula
(Zeileis and Croissant 2010), accommodates multiple responses and parts, lending itself
well into meta-analysis modeling. Once a model is fully identified, the
MCMC algorithm is executed in C++, thanks to
Rcpp (Eddelbuettel and Balamuta 2017) and
RcppArmadillo
(Eddelbuettel and Sanderson 2014).
Name | Functionality | Description |
---|---|---|
bmeta_analyze |
Estimation | Fits (network) meta-analysis models |
hpd |
Inference | Computes highest posterior density (HPD) intervals of model parameters |
model_comp |
Inference | Computes model comparison measures (DIC or LPML) |
print |
Displays a summary of the output | |
summary |
Displays a summary of the output | |
plot |
Plots trace plots, density plots, or surface-under-the-cumulative-ranking-curve (SUCRA) plots | |
fitted |
Computes posterior means, standard deviations, and HPD intervals | |
coef |
Computes posterior means of fixed-effect coefficients | |
cholesterol |
Data set | Cholesterol data for multivariate meta-analysis |
TNM |
Data set | Triglyceride data for network meta-analysis |
Formula
The three characterizations of a meta-analytic data set must be encoded
in the formula. Requiring the formula to have two left-hand sides (LHS)
and two or three right-hand sides (RHS)
y1 + y2 | sd1 + sd2 ~ x1 + x2 + x3 + ns(n) | w1 + w2 + w3 | treat + trial (+ groups)
Each part in LHS or RHS is an R expression where variables (or
functions of variables) are chained with a plus sign (+
)—e.g.
x1 + x2
. The tilde (~
) separates all LHSs from all RHSs, each
further separated into parts by vertical bars (|
). The meaning of each
part is syntactically determined by its location within the formula,
like an English sentence. Therefore, every part must come in the exact
order as prescribed for bmeta_analyze
to correctly identify the model.
y1 + y2
), the responses, is required of all models.
Depending on the number of variables given in the first LHS,
bmeta_analyze
will determine whether the model is multivariate or
univariate. For example, a first LHS with only y
would flag the
model as univariate.sd1 + sd2
) supplies the standard deviations of the
endpoints required of an aggregate-data meta-analysis. The function
call will break if this part is missing.x1 + x2 + x3 + ns(n)
) defines the fixed-effect
covariates. For aggregate-data models, the arm sample sizes must be
passed as an argument to ns()
. In the example code, n
is the
variable containing the arm sample sizes.w1 + w2 + w3
) defines either the random-effect
covariates (bmeta_analyze
will
assume treat + trial
or treat + trial + groups
)
corresponds to the treatment and trial indicators, and optionally
the grouping variable if it exists. Variables here must be provided
in the exact order shown in the example.The dimension of the response(s) is explicit in the formula, which
determines the first characterization. The treatments are coerced to a
factor—if not already one—whose number of levels is extracted (i.e.
nlevels(treat)
) to resolve the second characterization, meta-analysis
versus network meta-analysis.
Aside from the first two arguments, formula
and data
, there are four
other optional arguments that must be provided as R’s list
class:
prior
, mcmc
, control
, and init
. All hyperparameters for the
prior distributions should be included in prior
—see
Section 2 for hyperparameters. mcmc
only
regards the numbers of MCMC iterations: ndiscard
for the number of
burn-in iterations, nskip
for thinning, and nkeep
for the posterior
sample size. control
configures the Metropolis-Hastings algorithm.
*_stepsize
with the asterisk replaced with one of the parameter names
indicates the step size for determining the sample evaluation points in
the localized Metropolis algorithm. Lastly, initial values for the
parameters can be provided in init
in case a user has a priori known
high-quality starting points.
For meta-analysis models, metapack acknowledges the possible existence
of the first-line and second-line treatments trials. More generally, the
trials may be grouped by a factor believed to generate disparate random
effects. Although an arbitrary number of groups can exist in theory, we
restrict our attention to two groups. Denoting the binary group
indicators by
A (boilerplate) template for this class of models is as follows:
f <- "y1 + y2 | sd1 + sd2 ~ x1 + x2 | w1 + w2 | treat + trial + groups"
fit <- bmeta_analyze(formula(f), data = df,
prior = list(c0 = [real], dj0 = [real], Omega0 = [matrix],
a0 = [real], b0 = [real],
d0 = [real], nu0 = [real], Sigma0 = [matrix]),
control = list(sample_Rho = [logical], Rho_stepsize = [real],
R_stepsize = [real], delta_stepsize = [real], model = [string]))
We use real
and string
as aliases for double
and character
in R.
Every bracketed expression should be replaced with an instance of the
enclosed class. The hyperparameters in prior
and step sizes in
control
will be clarified in the following modeling options. Note that
all parameters with a step size are sampled through the
Metropolis-Hastings algorithm.
The covariance matrix between the response endpoints (model
in the control
argument. For
model="NoRecovery"
) The simplest and easiest way
to model the covariance matrices is to relinquish correlation
recovery, in which case
Equation (5) is ignored. We assume
model="EquiCovariance"
) We can assume the
covariance matrix is for all pairs of treatments and trials. That
is, model="EquiWithinTreat"
) If the equal covariance
is too strong an assumption, we can allow the covariance matrices to
be equivalent within a treatment. That is, model="EquiCorrelation"
) To achieve the best of
both worlds—variances and correlations—we can let the variances
enjoy maximum freedom but attempt to recover the correlations by
restricting them to be identical across treatments and trials for
identifiability. Performing the decomposition,
model="Hierarchical"
) The hierarchical prior for
For univariate network meta-analysis, the design matrix for random
effects is restricted to be the selection matrix
A (boilerplate) template for this class of models is as follows:
f <- "y | sd ~ x1 + x2 | w1 + w2 | treat + trial"
fit <- bmeta_analyze(formula(f), data = df,
prior = list(c01 = [real], c02 = [real], df = [real],
a4 = [real], b4 = [real], a5 = [real], b5 = [real]),
control = list(sample_df = [logical], sample_Rho = [logical],
Rho_stepsize = [real], phi_stepsize = [real], lambda_stepsize = [real]))
Every bracketed expression should be replaced with an instance of the
enclosed class. The hyperparameters in prior
and step sizes in
control
will be clarified in the following modeling options. Note that
all parameters with a step size are sampled through the
Metropolis-Hastings algorithm.
The appeal of considering heavy-tailed random effects and modeling the
variance to be a deterministic linear function of a covariate is that
both extend but still cover the common cases (normal random effects and
no variance modeling) as either the limiting or a special case. Unlike
meta-analysis, there is no single argument (model
) determining the
modeling option. A model is rather specified by a combination of
arguments.
y | sd ~x1 + x2 | treat + trial
.bmeta_analyze
treats df
) as part of the prior
specification. Thus, df=Inf
in the prior
argument corresponds to
opting for normal random effects.sample_df
is placed in the
control
argument. If sample_df=TRUE
, the value for df
in
prior
will be used as the initial value, and the related
hyperparameters (a4
, b4
, a5
, and b5
) can be set in prior
.
For obvious reasons, df
cannot be assigned Inf
if
sample_df=TRUE
.The object (fit
) returned from bmeta_analyze
remembers the function
arguments, encapsulates the model specification, and contains the
posterior sample from the MCMC algorithm. Therefore, this object alone
can be passed to other methods to perform subsequent inferences. These
methods include fitted
, hpd
, coef
, model_comp
, and sucra
.
The posterior means, standard deviations, and the HPD intervals are
computed via the fitted()
function. The fitted()
function has two
optional arguments: level
and HPD
. level
determines the
credibility level of the interval estimation. HPD
, a logical
parameter, decides whether a highest posterior density or equal-tailed
credible interval will be produced. It is also possible to obtain the
posterior interval estimates only using hpd()
.
R> est <- fitted(fit, level=0.99, HPD = TRUE)
R> hpd <- hpd(fit, level = 0.95, HPD = TRUE)
coef(fit)
allows users to extract the posterior mean of the
fixed-effect coefficients.
It is crucial to determine a suitable model to base the statistical
inference on. Section 4.3 introduces five models for
The deviance information criterion (Spiegelhalter et al. 2002), or DIC,
is defined as follows:
R> dic <- model_comp(fit, type = "dic", verbose = TRUE, ncores = 3)
Another Bayesian model selection criterion is the logarithm of the
pseudo-marginal likelihood (LPML), defined as the summed logarithm of
the conditional predictive ordinates (CPO). The CPO has a
“leave-one-out” predictive interpretation, which in meta-analyses is
often overlooked—a significant oversight that will undermine model
comparison. The aggregate nature of meta-analysis data calls for a
redefinition of what is “left out” and what should be the base unit for
prediction. With trials as the base unit, the CPO for the
R> lpml <- model_comp(fit, type = "lpml", verbose = TRUE, ncores = 3)
Treatments included in only one trial violate the predictive
interpretation of the CPO. The corresponding trials thus should be
removed from LPML calculation. model_comp
returns the logarithm of the
CPO of every trial but corrects the LPML should such treatments exist.
For those occasions, a naive sum of the logarithm of the
The DIC and LPML for the meta-analysis models are relatively
straightforward since the observed-data likelihood is a multivariate
normal density. However, network meta-analysis models require some
technical considerations in evaluating the following observed-data
likelihood that involves an analytically intractable integral when
ncores
is specified otherwise, model_comp
will use two CPU cores for
parallel computing by default.
To prevent overflow, we take the following steps:
Although the exponential shifting scheme does not warrant preventing every occurrence of numerical overflow, we have observed stable evaluations of the integral for over several thousand batches of simulations.
It is important to diagnose whether the results are consistent with the
assumptions and visualize the findings. metapack provides methods for
these: plot
and sucra
. The plot
method is available for both
meta-analysis and network meta-analysis whereas sucra
is exclusively
for network meta-analysis.
The plot
method will take the fit
object and generate the density
plots and trace plots of
R> library("coda")
R> posterior <- as.mcmc(data.frame(gammaR = fit$mcmc.draws$gamR,
+ sig2 = fit$mcmc.draws$sig2))
R> plot(posterior)
Similarly, boa (Smith 2007)
can be used for output analysis. The posterior sample of
R> idx <- lower.tri(fit$mcmc.draws$Rho[,,1])
R> n_idx <- ncol(idx) * (ncol(idx) - 1) / 2
R> posterior <- as.mcmc(data.frame(
+ rho = t(vapply(1:fit$mcmc$nkeep, function(ikeep) {
+ rho_i <- fit$mcmc.draws$Rho[,,ikeep]
+ rho_i[idx]
+ }, FUN.VALUE=numeric(n_idx)))))
R> plot(posterior)
The surface under the cumulative ranking (SUCRA) curve is useful when
the ranking of the treatments in a network meta-analysis is of interest.
Based on the posterior sample, sucra
will take the fit
object and return the SUCRA and the discrete
rank probability matrix
R> s <- sucra(fit)
R> s$SUCRA
R> s$rankprob
If only plotting is needed, the storage of the sucra
object can be
bypassed via running plot(sucra(fit))
. Note that SUCRA
metapack includes a data set, cholesterol
, which consists of 26
double-blind, randomized, active, or placebo-controlled clinical trials
on patients with primary hypercholesterolemia sponsored by Merck & Co.,
Inc., Kenilworth, NJ, USA (Yao et al. 2015). The data set can be loaded
by running data("cholesterol")
. The cholesterol
data set has three
endpoints: low density lipoprotein cholesterol (pldlc
), high density
lipoprotein cholesterol (phdlc
), and triglycerides (ptg
). The
percent change from the baseline in the endpoints, variables prefixed by
p-
, are the aggregate responses, followed by the corresponding
standard deviations prefixed by sd-
.
Variable | Description |
---|---|
study |
study identifier |
trial |
trial identifier |
treat |
treatment indicator for Statin or Statin+Ezetimibe |
n |
the number of participants in the study arms |
pldlc |
aggregate percentage change in LDL-C |
phdlc |
aggregate percentage change from baseline in HDL-C |
ptg |
aggregate percentage change from baseline in triglycerides (TG) |
sdldl |
sample standard deviation of percentage change in LDL-C |
sdhdl |
sample standard deviation of percentage change in HDL-C |
sdtg |
sample standard deviation of percentage change in triglycerides (TG) |
onstat |
whether the participants were on Statin prior to the trial |
bldlc |
baseline LDL-C |
bhdlc |
baseline HDL-C |
btg |
baseline triglycerides (TG) |
age |
age in years |
white |
the proportion of white participants |
male |
the proportion of male participants |
dm |
the proportion of participants with diabetes mellitus |
durat |
duration in weeks |
Variable documentation is also available in help("cholesterol")
.
R> set.seed(2797542)
R> f_1 <- 'pldlc + phdlc + ptg | sdldl + sdhdl + sdtg ~
+ 0 + bldlc + bhdlc + btg + age + durat + white + male + dm + ns(n) | treat |
+ treat + trial + onstat'
R> fit_ma <- bmeta_analyze(formula(f_1), data = cholesterol,
+ prior = list(model="NoRecovery"),
+ mcmc = list(ndiscard = 1000, nkeep = 1000),
+ control=list(scale_x = TRUE, verbose=TRUE))
plot(fit_ma)
—plots have been omitted for brevity. The
trace plots show good mixing and convergence of MCMC chains and the
density plots indicate that the marginal posterior distribution for each
treatment effect is roughly symmetric. The MCMC samples of the
regression coefficients will be automatically assigned row names
according to the formula provided by the user. treat*(1-2nd)_3
indicates the treatment effect within the first-line patients (i.e.
onstat=0
) with respect to the third response variable, ptg
.
Likewise, (Intercept)*2nd_3
is the baseline effect within the
second-line patients (i.e. onstat=1
) with respect to
ptg
.summary
can be used to summarize the posterior sample from the fit
object. summary
will name the variables accordingly in the output,
suffixed by the index bldlc_1
in the output below corresponds to the
base LDL-C’s coefficient associated with the first endpoint (pldlc
).
The summary table consists of the posterior mean, posterior standard
deviation, the HPD lower bound, and the HPD upper bound. Users may
choose to compute the equal-tailed credible intervals (CI) instead of
the HPD intervals by setting HPD=FALSE
.
R> summary(fit_ma, HPD = TRUE, level = 0.95)
Call:
bmeta_analyze(formula = formula(f_1), data = cholesterol,
prior = list(model = "NoRecovery"),
mcmc = list(ndiscard = 1000, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
Fixed-effects:
Post.Mean Std.Dev HPD(Lower) HPD(Upper)
bldlc_1 0.1394 0.0938 -0.0532 0.3051
bhdlc_1 -0.5146 0.6329 -1.7641 0.7243
btg_1 0.0018 0.0818 -0.1693 0.1607
age_1 0.3785 0.4389 -0.5308 1.1648
durat_1 0.9699 0.5234 -0.1658 1.9979
white_1 -5.2449 7.4048 -19.7345 8.9764
male_1 -1.2652 12.3948 -25.6807 23.0538
dm_1 0.3987 5.3392 -10.4995 9.4673
bldlc_2 -0.0128 0.0137 -0.0393 0.0134
bhdlc_2 -0.0294 0.1280 -0.2887 0.2149
btg_2 0.0228 0.0212 -0.0203 0.0619
age_2 -0.0963 0.0735 -0.2422 0.0460
durat_2 -0.0007 0.0712 -0.1320 0.1382
white_2 5.5822 1.3044 3.0802 8.1901
male_2 -2.1449 2.6956 -7.5975 2.6164
dm_2 -1.0457 1.0923 -3.1816 1.0341
bldlc_3 0.0141 0.0406 -0.0719 0.0853
bhdlc_3 0.0082 0.2637 -0.5278 0.4872
btg_3 -0.0734 0.0467 -0.1598 0.0154
age_3 -0.1385 0.2043 -0.5077 0.2591
durat_3 0.0996 0.2546 -0.3718 0.6080
white_3 -0.7053 5.2345 -11.0570 8.9591
male_3 14.5482 7.1735 0.0642 28.2959
dm_3 5.0535 2.9542 -1.4157 10.5545
(Intercept)*(1-2nd)_1 -42.8675 3.2934 -48.8356 -35.8794
treat*(1-2nd)_1 -12.1435 1.1211 -14.4040 -10.0135
(Intercept)*2nd_1 -3.2219 3.0426 -9.3109 2.7419
treat*2nd_1 -20.0843 1.5235 -23.3169 -17.2637
(Intercept)*(1-2nd)_2 5.1425 0.5840 3.8240 6.1036
treat*(1-2nd)_2 2.0787 0.4718 1.0663 3.0222
(Intercept)*2nd_2 0.7346 0.5814 -0.3377 1.8738
treat*2nd_2 1.3482 0.3116 0.7579 1.9880
(Intercept)*(1-2nd)_3 -18.5035 2.1543 -22.3232 -13.7985
treat*(1-2nd)_3 -4.7726 0.9952 -6.7784 -2.9511
(Intercept)*2nd_3 -4.1805 1.2910 -6.7672 -1.7005
treat*2nd_3 -8.8579 0.9443 -10.6208 -7.0530
---------------------------------------------------
*HPD level: 0.95
The suffixed _j
where j
can be 1, 2, or 3 corresponds to the
response endpoint. Since scale_x = TRUE
is equivalent to
scale(<var>, center=TRUE, scale=TRUE)
, the covariates have been
centered, which affects the interpretation of the intercepts. This
allows us to interpret (Intercept*(1-2nd)_1)=-42.8675
as the statin
effect in the first-line studies, where 2nd
represents the indicator
variable for second-line studies evaluating to one if second-line and
zero otherwise. On the other hand, the coefficient estimate -12.1435 for
treat*(1-2nd)_1
is the Statin+Ezetimibe effect, compared to
administering statin alone. For the second-line studies where patients
had already been on statin, (Intercept)*2nd_1=-3.2219
came out
insignificant, according to the 95% HPD interval, as anticipated because
the treatment for this group was merely a continuation of taking statin.
The coefficient estimate -20.0843 for treat*2nd_1
shows that ezetimibe
on top of statin has a greater cholesterol-lowering effect than statin
alone.
print
is similar to summary
but additionally prints the model
specification. The output from print(fit_ma)
is given as follows.
R> print(fit_ma, HPD = TRUE, level = 0.95)
Call:
bmeta_analyze(formula = formula(f_1),
data = cholesterol, prior = list(model = "NoRecovery"),
mcmc = list(ndiscard = 1000, nkeep = 1000),
control = list(scale_x = TRUE, verbose = TRUE))
Model:
(Aggregate mean)
y_kt = X_kt * theta + W_kt * gamma_k + N(0, Sigma_kt / n_kt)
(Sample Variance)
(n_kt - 1) S_kt ~ Wishart(n_kt - 1, Sigma_kt)
(Random effects)
[gamma_k | Omega] ~ N(0, Omega)
Priors:
theta ~ MVN(0, 1e+05 * I_p)
Omega_j^{-1} ~ Wishart( 2.1 , Omega0)
Sigma_kt = diag(sig_{tk,11}^2, ..., sig_{tk,JJ}^2)
where sig_{tk,jj}^2 ~ IG( 0.1 , 3.1 )
---------------------------------------------------
Number of trials: 26
Number of arms: 52
Number of treatments: 2
Post.Mean Std.Dev HPD(Lower) HPD(Upper)
bldlc_1 0.1394 0.0938 -0.0532 0.3051
bhdlc_1 -0.5146 0.6329 -1.7641 0.7243
btg_1 0.0018 0.0818 -0.1693 0.1607
age_1 0.3785 0.4389 -0.5308 1.1648
durat_1 0.9699 0.5234 -0.1658 1.9979
white_1 -5.2449 7.4048 -19.7345 8.9764
male_1 -1.2652 12.3948 -25.6807 23.0538
dm_1 0.3987 5.3392 -10.4995 9.4673
bldlc_2 -0.0128 0.0137 -0.0393 0.0134
bhdlc_2 -0.0294 0.1280 -0.2887 0.2149
btg_2 0.0228 0.0212 -0.0203 0.0619
age_2 -0.0963 0.0735 -0.2422 0.0460
durat_2 -0.0007 0.0712 -0.1320 0.1382
white_2 5.5822 1.3044 3.0802 8.1901
male_2 -2.1449 2.6956 -7.5975 2.6164
dm_2 -1.0457 1.0923 -3.1816 1.0341
bldlc_3 0.0141 0.0406 -0.0719 0.0853
bhdlc_3 0.0082 0.2637 -0.5278 0.4872
btg_3 -0.0734 0.0467 -0.1598 0.0154
age_3 -0.1385 0.2043 -0.5077 0.2591
durat_3 0.0996 0.2546 -0.3718 0.6080
white_3 -0.7053 5.2345 -11.0570 8.9591
male_3 14.5482 7.1735 0.0642 28.2959
dm_3 5.0535 2.9542 -1.4157 10.5545
(Intercept)*(1-2nd)_1 -42.8675 3.2934 -48.8356 -35.8794
treat*(1-2nd)_1 -12.1435 1.1211 -14.4040 -10.0135
(Intercept)*2nd_1 -3.2219 3.0426 -9.3109 2.7419
treat*2nd_1 -20.0843 1.5235 -23.3169 -17.2637
(Intercept)*(1-2nd)_2 5.1425 0.5840 3.8240 6.1036
treat*(1-2nd)_2 2.0787 0.4718 1.0663 3.0222
(Intercept)*2nd_2 0.7346 0.5814 -0.3377 1.8738
treat*2nd_2 1.3482 0.3116 0.7579 1.9880
(Intercept)*(1-2nd)_3 -18.5035 2.1543 -22.3232 -13.7985
treat*(1-2nd)_3 -4.7726 0.9952 -6.7784 -2.9511
(Intercept)*2nd_3 -4.1805 1.2910 -6.7672 -1.7005
treat*2nd_3 -8.8579 0.9443 -10.6208 -7.0530
---------------------------------------------------
*HPD level: 0.95
For model comparison, the deviance information criterion (DIC) and the
logarithm of the pseudo marginal likelihood (LPML) can be computed using
the model_comp
method. The DIC will also contain
R> dic <- model_comp(fit_ma, "dic")
R> lpml <- model_comp(fit_ma, "lpml")
R> c(dic$dic, dic$Dev, dic$pD)
[1] 827.80726 734.50691 46.65018
R> lpml$lpml
[1] -428.1813
metapack includes another data set, TNM
, which consists of 29
studies, dubbed the Triglycerides Network Meta (TNM) data
(Li et al. 2019). The data set has 73 observations and 15 variables,
which can be loaded via data("TNM")
. The aggregate response variable
is the mean percentage difference in triglycerides (ptg
), paired with
its corresponding standard deviation (sdtg
).
Variable | Description |
---|---|
trial |
trial identifier |
treat |
lovastatin (L), rosuvastatin (R), pravastatin (P), ezetimibe (E), |
simvastatin+ezetimibe (SE), atorvastatin+ezetimibe (AE), | |
lovastatin+ezetimibe (LE), or pravastatin+ezetimibe (PE) | |
n |
the number of participants in the study arms |
ptg |
percentage change from baseline in triglycerides (TG) |
sdtg |
sample standard deviation of percentage change in triglycerides (TG) |
bldlc |
baseline LDL-C |
bhdlc |
baseline HDL-C |
btg |
baseline triglycerides (TG) |
age |
age in years |
white |
the proportion of white participants |
male |
the proportion of male participants |
bmi |
body fat index |
potencymed |
the proportion of medium statin potency |
potencyhigh |
the proportion of high statin potency |
durat |
duration in weeks |
Similarly to cholesterol
, variable descriptions are also available
through help("TNM")
.
LPML in Section 5.0.0.1 is not the only quantity
affected by the treatments only included in a single trial. The
variances of the corresponding treatment effects are nonestimable.
(Li et al. 2019) proposes to group those treatments and allow the
treatments in a group to share the same variance. This grouping scheme
can be easily achieved using match
in R:
R> TNM$group <- factor(match(TNM$treat, c("PBO", "R"), nomatch = 0))
In the following demonstration, we consider the following model:
R> f_2 <- 'ptg | sdtg ~
+ 0 + bldlc + bhdlc + btg + age + white + male + bmi +
+ potencymed + potencyhigh + durat + ns(n) |
+ scale(bldlc) + scale(btg) + group | treat + trial'
The model can be fit by running
R> set.seed(2797542)
R> fit_nma <- bmeta_analyze(formula(f_2), data = TNM,
+ mcmc = list(ndiscard = 1000, nskip = 1, nkeep = 1000),
+ control=list(scale_x = TRUE, verbose=TRUE))
Again, the model summary can be obtained using either summary
or
print
with minor differences.
R> summary(fit_nma)
Call:
bmeta_analyze(formula = formula(f_2), data = TNM, mcmc = list(ndiscard = 1000,
nskip = 1, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
Posterior inference in network meta-regression models
Fixed-effects:
Post.Mean Std.Dev HPD(Lower) HPD(Upper)
bldlc -0.0071 0.0171 -0.0409 0.0230
bhdlc 0.2332 0.2490 -0.2802 0.7027
btg 0.0846 0.0366 0.0006 0.1460
age -0.0068 0.1028 -0.2003 0.2020
white -7.2148 1.9867 -10.7112 -3.0757
male 1.2323 7.3305 -12.5660 15.6644
bmi -0.4505 0.3534 -1.1544 0.2184
potencymed 6.8278 7.9203 -8.5359 22.4551
potencyhigh -0.6474 7.9330 -16.9216 14.4191
durat 0.1880 0.1879 -0.1638 0.5652
A -24.3175 1.7820 -27.9255 -20.9635
AE -30.3604 2.9591 -35.7474 -24.2297
E -5.2737 6.1207 -17.4431 6.4066
L -11.6521 4.3747 -20.1106 -2.7431
LE -26.9507 3.4730 -33.3236 -19.9223
P -8.3357 4.5065 -16.9311 0.7870
PBO 1.6170 6.0629 -9.9366 13.6298
PE -22.7722 3.6333 -29.6129 -15.7540
R -17.7669 2.0010 -21.3989 -13.3839
S -19.1616 1.2175 -21.7647 -16.9279
SE -21.8271 2.0509 -25.8031 -17.6347
---------------------------------------------------
*HPD level: 0.95
We observe that with covariate adjustment, all active treatments (A, AE, E, L, LE, P, PE, R, S, SE) reduce triglyceride (TG) more effectively than the placebo (PBO), although E and P have 95% HPD intervals including zero.
The output from print(fit_nma)
further includes the model
specification, and summary statistics for
R> print(fit_nma)
Call:
bmeta_analyze(formula = formula(f_2), data = TNM, mcmc = list(ndiscard = 1000,
nskip = 1, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
Model:
(Aggregate mean)
y_kt = x_kt'theta + tau_kt * gamma_kt + N(0, sigma_kt^2 / n_kt)
(Sample Variance)
(n_kt - 1) S^2 / sigma_kt^2 ~ chi^2(n_kt - 1)
(Random effects)
[gam | Rho,nu] ~ MVT(0, E_k' Rho E_k, nu)
Priors:
theta ~ MVN(0, c01 * I_p), c01= 1e+05
phi ~ MVN(0, c02 * I_q), c02= 4
p(sigma^2) ~ 1/sigma^2 * I(sigma^2 > 0)
p(Rho) ~ 1
---------------------------------------------------
Number of studies: 29
Number of arms: 73
Number of treatments: 11
Post.Mean Std.Dev HPD(Lower) HPD(Upper)
bldlc -0.0071 0.0171 -0.0409 0.0230
bhdlc 0.2332 0.2490 -0.2802 0.7027
btg 0.0846 0.0366 0.0006 0.1460
age -0.0068 0.1028 -0.2003 0.2020
white -7.2148 1.9867 -10.7112 -3.0757
male 1.2323 7.3305 -12.5660 15.6644
bmi -0.4505 0.3534 -1.1544 0.2184
potencymed 6.8278 7.9203 -8.5359 22.4551
potencyhigh -0.6474 7.9330 -16.9216 14.4191
durat 0.1880 0.1879 -0.1638 0.5652
A -24.3175 1.7820 -27.9255 -20.9635
AE -30.3604 2.9591 -35.7474 -24.2297
E -5.2737 6.1207 -17.4431 6.4066
L -11.6521 4.3747 -20.1106 -2.7431
LE -26.9507 3.4730 -33.3236 -19.9223
P -8.3357 4.5065 -16.9311 0.7870
PBO 1.6170 6.0629 -9.9366 13.6298
PE -22.7722 3.6333 -29.6129 -15.7540
R -17.7669 2.0010 -21.3989 -13.3839
S -19.1616 1.2175 -21.7647 -16.9279
SE -21.8271 2.0509 -25.8031 -17.6347
phi1 0.4088 0.2716 -0.1610 0.8882
phi2 -0.3248 0.3869 -1.1721 0.2902
phi3 0.2692 0.2239 -0.1859 0.6835
phi4 -1.0862 1.2024 -3.2686 0.9419
phi5 0.5973 0.3407 -0.0341 1.2829
---------------------------------------------------
*HPD level: 0.95
The model comparison measures are computed using model_comp
. For
example,
R> dic <- model_comp(fit_nma, "dic")
R> c(dic$dic, dic$Dev, dic$pD)
[1] 386.41450 334.72118 25.84666
R> lpml <- model_comp(fit_nma, "lpml")
R> lpml$lpml
[1] -161.518
The plot
method will generate trace plots and density plots of the
fixed-effect coefficients. Figure 2 shows the
trace plots and density plots of treatments R, S, and SE from the
network meta-analysis model, generated by plot(fit_nma)
.
plot(fit_nma)
. The
trace plots show good mixing and convergence of MCMC chains and the
density plots indicate that the marginal posterior distribution for each
treatment effect is roughly symmetric. The treatment labels come from
the group
. If group
is a factor, its levels will be used. Otherwise,
treatment labels will be
numbered.In addition to the MCMC diagnostics, network meta models can be
visualized using SUCRA plots, i.e. plot(sucra(fit_nma))
.
Figure 3 shows the SUCRA plot from the fit_nma
object. The plot
function for SUCRA uses
ggplot2 (Wickham 2016)
and gridExtra
(Auguie 2017) to generate and combine plots.
plot(sucra(fit_nma))
. The green dashed line represents the discrete
probability mass and the orange solid line represents the cumulative
probability. The SUCRA values are displayed on top of each subplot. For
optimal visualization, we recommend the labels be three characters or
fewer. AE is ranked highest according to SUCRA, followed by
LE.This paper introduces metapack for (network) meta-analysis, and
illustrates the usage of the main function, bmeta_analyze
. We further
demonstrate how to analyze data using the cholesterol
and TNM
data
sets included in metapack. The package relies on Rcpp,
RcppArmadillo, and OpenMP to boost computation speed. Furthermore,
we propose a unified formula structure to represent meta-analytic data
using Formula, which we hope to see gain currency in the community.
There is a cautionary remark worth mentioning about the ratio between
the correlation information in the data and the number of
correlation-related parameters. The number of endpoints and the number
of arms in the multivariate meta-analysis models are critical in
determining whether the missing correlations (cholesterol
results in nonidentifiable correlations since
each off-diagonal entry of
The efficient estimation of the correlation matrix is an important future research direction for which the package will serve as a valuable repository of resources. The first few future implementations will focus on the regression modeling of the correlations to address the cases where either the data are too small to estimate the correlations or the number of treatments is too large. Models accommodating various circumstances regarding the sample variances are under active development. For example, researchers might want to suppress the sampling of the variance-covariance matrix as a whole for various reasons. Researchers might also have partially observed sample variances, not covariances, depending on the study included in the systematic review. metapack in the coming releases will provide these options. Therefore, metapack has great potential for further development.
We would like to thank the Editor, the Associate Editor, and the two reviewers for their helpful comments and suggestions, which led to a much improved version of the paper. Dr. Chen and Dr. Ibrahim’s research was partially supported by NIH grants #GM70335 and #P01CA142538, and Merck & Co., Inc., Rahway, NJ, USA. Dr. Kim’s research was supported by the Intramural Research Program of National Institutes of Health, National Cancer Institute.
meta, rmeta, metafor, metaLik, metatest, mvmeta, bayesmeta, nmaINLA, bmeta, MetaStan, metapack, Formula, Rcpp, RcppArmadillo, BH, coda, boa, ggplot2, gridExtra
Bayesian, ClinicalTrials, GraphicalModels, HighPerformanceComputing, MetaAnalysis, NumericalMathematics, Phylogenetics, Spatial, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lim, et al., "metapack: An R Package for Bayesian Meta-Analysis and Network Meta-Analysis with a Unified Formula Interface", The R Journal, 2022
BibTeX citation
@article{RJ-2022-047, author = {Lim, Daeyoung and Chen, Ming-Hui and Ibrahim, Joseph G. and Kim, Sungduk and Shah, Arvind K. and Lin, Jianxin}, title = {metapack: An R Package for Bayesian Meta-Analysis and Network Meta-Analysis with a Unified Formula Interface}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {3}, issn = {2073-4859}, pages = {142-161} }