Meta-analysis, a statistical procedure that compares, combines, and synthesizes research findings from multiple studies in a principled manner, has become popular in a variety of fields. Meta-analyses using study-level (or equivalently *aggregate*) data are of particular interest due to data availability and modeling flexibility. In this paper, we describe an R package *metapack* that introduces a unified formula interface for both meta-analysis and network meta-analysis. The user interface—and therefore the package—allows flexible variance-covariance modeling for multivariate meta-analysis models and univariate network meta-analysis models. Complicated computing for these models has prevented their widespread adoption. The package also provides functions to generate relevant plots and perform statistical inferences like model assessments. Use cases are demonstrated using two real data sets contained in *metapack*.

The U.S. Food and Drug Administration provides a clear definition of
meta-analysis as *“the combining of evidence from relevant studies using
appropriate statistical methods to allow inference(s) to be made to the
population of interest”* (U.S. Food and Drug Administration et al. 2018). In fields like
medicine, pharmacology, and epidemiology, meta-analysis has become
popular for reconciling conflicting results or corroborating consistent
ones in multiple studies
(Chalmers et al. 2002; Borenstein et al. 2011; Hartung et al. 2011; Balduzzi et al. 2019).
Findings produced from meta-analyses are often placed at the apex of the
evidence hierarchy (U.S. Food and Drug Administration et al. 2018).

R already has a large supply of meta-analysis packages.
*meta* (Schwarzer 2007) and
*rmeta* (Lumley 2018) use the
method of moments introduced in DerSimonian and Laird (1986).
*metafor* (Viechtbauer 2010)
further contains moderator analyses and fits meta-regression
(Berkey et al. 1995) through weighted least squares. On the other hand,
*metaLik* (Guolo 2012)
takes a likelihood approach based on the second-order approximation of
the modified likelihood ratio test statistic (Skovgaard et al. 1996).
*metatest*
(Huizenga et al. 2011) further includes hypothesis testing capabilities through
the likelihood ratio test with Barlett correction, and
*mvmeta* (Gasparrini et al. 2012) fits
multivariate meta-analysis and meta-regression models via the method of
maximum likelihood. There are packages for Bayesian meta-analytic
inference as well.
*bayesmeta*
(Röver 2020) assumes the normal-normal hierarchical random-effect
model and allows the user to choose prior distributions with a great
deal of flexibility, both informative and noninformative.
*nmaINLA* (Guenhan et al. 2018)
provides functionalities for network meta-analysis and meta-regression
with integrated nested Laplace approximations (INLA) as an alternative
to the Markov chain Monte Carlo (MCMC) algorithm. On the other hand,
*bmeta* (Ding and Baio 2016)
delivers flexible meta-analytic modeling by interfacing with JAGS
(Plummer 2003).
*MetaStan*
(Guenhan 2020) provides binomial-normal hierarchical models with weakly
informative priors, building upon the probabilistic language Stan
(Stan Development Team 2020).

Despite its importance and the wide array of R packages available,
meta-analysis is still regarded as a niche field that interests a narrow
group of researchers and remains relatively low impact. We partially
attribute this phenomenon to the fact that the R community has yet to
come up with a user interface that unifies the theoretical distinctions
between univariate and multivariate models, and between meta-analysis
and network meta-analysis although the models have grown more
complicated in the intervening years. Furthermore, a large class of
simple Bayesian meta-analytic models is handled by probabilistic
programming languages like Stan (Stan Development Team 2020) or BUGS (Sturtz et al. 2005).
Meanwhile, many complex models are not easily programmable in
probabilistic languages, and are not readily available in R. Especially,
in the context of variance-covariance matrix modeling in (network)
meta-analysis, *metapack*
is the first attempt in the R cosmos, to the best of our knowledge, to
provide easy access to regression-modeling of the variances (of the
treatment effects) as well as a wide array of options for modeling the
response covariance matrices when the aggregate responses are
multivariate with only a partially observed within-study sample
covariance matrix.

*metapack* presented in this paper proposes a formula structure that
flexibly represents the types of responses (univariate and multivariate)
and the number of treatments (meta-analysis and network meta-analysis).
The package also provides functions to assess model fits such as the
deviance information criterion (DIC) and the logarithm of the
pseudo-marginal likelihood (LPML), and to generate diagnostic plots.
Some potential complications, theoretical and computational, in these
model selection criteria may break the algorithm or erode the
statistical inference when unaddressed (see
Section 5.0.0.1), which *metapack* takes care of
by default—an advantage over model-agnostic programming languages.

The rest of this paper is organized as follows.
Section 2 briefly reviews the (network)
meta-analysis model. Section 3 describes the
general form of a meta-analysis data set to establish a generic data
structure for meta-analysis and network meta-analysis.
Section 4 explains how the data structure can be
represented using R’s extended formula and how *metapack*’s main
function parses it, and lays down the various modeling options for
meta-analysis and network meta-analysis.
Section 5 further introduces the S3 methods
available for performing statistical inferences and comparing models.
Some computational considerations to accelerate the computation are
detailed as well. Section 6 provides
demonstrations using the *cholesterol* data and *TNM* data included in
*metapack*. Finally, Section 7 concludes the paper by
offering a cautionary remark for multivariate meta-analysis models
regarding the number of observations required to perform valid
inferences on the correlation matrix, and discussing future research and
package development directions.

In this section, we briefly review the models considered in *metapack*.
There are largely two umbrella models: univariate or multivariate
meta-analysis based on (Yao et al. 2015), and univariate network
meta-analysis based on (Li et al. 2021). The various modeling options
for each model introduced in this section *encompass* the ones in
(Yao et al. 2015) and (Li et al. 2021). In what follows, the model
description will deal with a general multivariate model including both
meta-analysis and network meta-analysis, which is valid even when
univariate response is assumed, i.e. \(J=1\). Occasionally, the univariate
model description will be provided side by side to avoid confusion.

Assume \(K\) randomized controlled trials (RCTs) where the \(k\)-th trial includes \(T_k\) treatment arms. Meta-analysis refers to a special case where \(T_k=2\) for all \(k=1,\ldots,K\). We adopt the notational abuse of omitting the trial indicator in the treatment’s subscript. For instance, for the sample size of the \(t\)-th treatment arm of the \(k\)-th trial, we write \(n_{kt}\), not \(n_{t_kk}\). Furthermore, for readability, note that boldface lowercase Latin letters are vectors, while uppercase Latin letters are matrices. Boldface Greek letters can either be vectors or matrices and will be defined contextually. Let \(\mathbf{y}_{kt}\) be a \(J\)-dimensional aggregate response vector for the \(t\)-th treatment of the \(k\)-th trial. Similarly, let \(\boldsymbol{x}_{ktj}\in \mathbf{R}^{p_j}\) be the treatment-within-trial level covariate corresponding to the \(j\)-th response, reflecting the fixed effects of the \(t\)-th treatment arm, and let \(\boldsymbol{w}_{ktj} \in \mathbf{R}^{q_j}\) be the vector of covariates for the random effects. The model \(\boldsymbol{y}_{kt}=\boldsymbol{X}_{kt}\boldsymbol{\beta} + \boldsymbol{W}_{kt} \boldsymbol{\gamma}_k + \boldsymbol{\epsilon}_{kt}\) describes the aggregate data model, where \(\boldsymbol{\epsilon}_{kt} = (\epsilon_{kt1},\ldots,\epsilon_{ktJ})^\top\), \(\boldsymbol{X}_{kt} = \oplus_{j=1}^J x_{ktj}^\top\) for which \(\oplus\) indicates direct sum, \(\boldsymbol{\beta} = (\boldsymbol{\beta}_1^\top,\ldots,\boldsymbol{\beta}_J^\top)^\top\), \(\boldsymbol{W}_{kt}=\oplus_{j=1}^J \boldsymbol{w}_{ktj}^\top\), and \(\boldsymbol{\gamma}_k = (\boldsymbol{\gamma}_{k1}^\top,\boldsymbol{\gamma}_{k2}^\top,\ldots,\boldsymbol{\gamma}_{kJ}^\top)^\top\). The aggregate (network) meta-analysis model becomes \[\label{eq:inital-mvmr} \begin{dcases} \mathbf{y}_{kt} = \mathbf{X}_{kt}\mathbf{\beta} + \mathbf{W}_{kt} \mathbf{\gamma}_k + \mathbf{\epsilon}_{kt} \;\; and \;\; (n_{kt}-1)\mathbf{S}_{kt} \sim \mathcal{W}_{n_{kt}-1}(\Sigma_{kt}),&\text{if J }\geq \text{ 2 (multivariate)}\\ y_{kt} = \mathbf{x}_{kt}^\top\mathbf{\beta} + \mathbf{w}_{kt}^\top \mathbf{\gamma}_k + \epsilon_{kt} \;\; and \;\; (n_{kt}-1)s^2_{kt}/\sigma^2 \sim \chi^2_{n_{kt}-1},&\text{if J=1 (univariate)} \end{dcases}, \tag{1}\] where \(\mathbf{\epsilon}_{kt} \sim \mathcal{N}(\mathbf{0}, \Sigma_{kt}/n_{kt})\) or \(\epsilon_{kt} \sim \mathcal{N}(0,\sigma_{kt}^2/n_{kt})\), \(\mathbf{S}_{kt}\) is the sample covariance matrix, \(s_{kt}^2\) is the sample variance, and \(\Sigma_{kt} \in \mathcal{S}_{++}^J\) for which \(\mathcal{S}_{++}^J\) is the space of \(J\times J\) symmetric positive-definite matrices. In Equation (1), \(\mathcal{W}_{\nu}(\Sigma)\) is the Wishart distribution with \(\nu\) degrees of freedom and a \(J\times J\) scale matrix \(\Sigma\) with density \[p(X\mid \nu, \Sigma) = \dfrac{1}{2^{J\nu}|\Sigma|^{\nu/2}\Gamma_J(\nu/2)} |X|^{(\nu-J-1)/2}\exp\left(-\dfrac{1}{2}\mathrm{tr}(\Sigma^{-1}X) \right)I(X\in \mathcal{S}_{++}^J),\] where \(\Gamma_J\) is the multivariate gamma function defined by \(\Gamma_J(z) = \pi^{J(J-1)/4}\prod_{j=1}^J \Gamma[z+(1-j)/2]\). \(\chi_\nu^2\) indicates the chi-squared distribution with \(\nu\) degrees of freedom.

Stacking the random effects for all response endpoints, \(\boldsymbol{\gamma}_k = (\boldsymbol{\gamma}_{k1}^\top, \ldots, \boldsymbol{\gamma}_{kJ}^\top)^\top\). Since the random effects are assumed to follow a distribution in the location-scale family (either a multivariate \(t\)-distribution or a multivariate normal distribution), i.e. \(\boldsymbol{\gamma}_k \sim \mathcal{LS}(\boldsymbol{\gamma}, \Omega)\), the fixed-effect coefficient vector \(\boldsymbol{\beta}\) absorbs the random effects’ location parameter \(\boldsymbol{\gamma}\), forming \(\boldsymbol{\theta} = (\boldsymbol{\beta}^\top,\boldsymbol{\gamma}^\top)^\top\). The corresponding design matrix \(\boldsymbol{X}_{kt}\) is also expanded to include the random-effect design matrix \(\boldsymbol{W}_{kt}\), written as \(\boldsymbol{X}_{kt}^* = [\boldsymbol{X}_{kt}, \boldsymbol{W}_{kt}]\). With \(\boldsymbol{\gamma}_{k,o} = \boldsymbol{\gamma}_k - \boldsymbol{\gamma}\), the model now becomes \[\label{eq:model-centered} \begin{dcases} \mathbf{y}_{kt} = \mathbf{X}_{kt}^*\mathbf{\theta} + \mathbf{W}_{kt} \mathbf{\gamma}_{k,o} + \mathbf{\epsilon}_{kt},&\text{if J } \geq \text{ 2 (multivariate)}\\ \begin{split} y_{kt} &= {\mathbf{x}_{kt}^*}^\top\mathbf{\theta} + \mathbf{w}_{kt}^\top \mathbf{\gamma}_{k,o} + \epsilon_{kt}\\ &\Rightarrow \boldsymbol{y}_k = \boldsymbol{X}_k^* \boldsymbol{\theta} + \boldsymbol{W}_k \boldsymbol{\gamma}_{k,o} + \boldsymbol{\epsilon}_k \end{split} ,&\text{if J=1 (univariate)} \end{dcases}, \tag{2}\] where \(\boldsymbol{y}_k=(y_{k,t_{k1}},\ldots,y_{k,t_{kT_k}})^\top\), \(\boldsymbol{X}_k^*= ((\boldsymbol{x}^*_{k,t_{k1}})^\top,\ldots,(\boldsymbol{x}^*_{k,t_{kT_k}})^\top)^\top\), \(\boldsymbol{W}_k = (\boldsymbol{w}_{k,t_{k1}}^\top,\ldots,\boldsymbol{w}_{k,t_{kT_k}}^\top)^\top\), and \(\boldsymbol{\epsilon}_k=(\epsilon_{k,t_{k1}},\ldots,\epsilon_{k,t_{kT_k}})^\top\). We briefly restore the correct subscripts (i.e. \(t_{kl}\) for \(l=1,\ldots,T_k\)) to demonstrate that \(\boldsymbol{y}_k\), \(\boldsymbol{X}_k^*\), and \(\boldsymbol{W}_k\) may be different lengths and dimensions for different \(k\)’s. Here, \(\{t_{k1},\ldots,t_{kT_k}\}\) denotes the set of treatments compared in the \(k\)-th trial. The random effects are modeled differently in meta-analysis than in network meta-analysis. A major reason for this divergence is that the variables explaining the treatment effects are not easily found in the presence of varying numbers of treatments in different trials. The differences are further detailed in Section 4.3 and Section 4.4.

To streamline configuring models in R formula, it is important to
understand the data structure for
*metapack*.
Table 1 represents a typical arm-level data set
for (network) meta-analysis, where each row represents a trial arm.

Outcome (\(\mathbf{y}_{kt}\)) |
SD (\(s_{kt}\)) |
DesignM1 (\(\mathbf{x}_{kt}\)) |
DesignM2 (\(\mathbf{w}_{kt}\)) |
Trial (`k` ) |
Treat (`t` ) |
n |
---|---|---|---|---|---|---|

\(\mathbf{y}_{14}^\top\) | \(s_{14}\) | \(\mathbf{x}_{14}^\top\) | \(\mathbf{w}_{14}^\top\) | 1 | 4 | 1000 |

\(\mathbf{y}_{11}^\top\) | \(s_{11}\) | \(\mathbf{x}_{11}^\top\) | \(\mathbf{w}_{11}^\top\) | 1 | 1 | 545 |

\(\mathbf{y}_{21}^\top\) | \(s_{21}\) | \(\mathbf{x}_{21}^\top\) | \(\mathbf{w}_{21}^\top\) | 2 | 1 | 1200 |

\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |

`Outcome`

is the response (or responses for multivariate cases), `SD`

is
the standard deviation(s) of the response(s), `DesignM1`

and `DesignM2`

are design matrices, and `n`

is the arm sample size. The pair of trial
and treatment indicators is unique to a row. The first design matrix,
`DesignM1`

, contains the covariates for fixed effects and will be
written as \(\mathbf{X}\) henceforth. The second design matrix, `DesignM2`

or \(\mathbf{W}\), represents different things depending on the model,
which will be explained in Section 4.3 and
Section 4.4. It should be noted that there can
always be two design matrices, whose configuration will be illustrated
in Section 4.

A meta-analytic data set is characterized as folows: (1) univariate or
multivariate and (2) meta-analysis or network meta-analysis. Here,
meta-analysis refers to when trials have specifically two treatments
(i.e. \(t=1,2\) for all \(k\)) and all treatments are compared head to head.
On the other hand, network meta-analysis includes more than two total
treatments, where each trial can have a different set of treatments,
allowing indirect comparison between treatments that are not compared
head to head. The data structure is unchanged for network meta-analysis
except that `Treat`

can have more than two unique values. The first
category (univariate vs. multivariate) is determined by the number of
response endpoints, and the second category (meta- vs. network
meta-analysis) by the number of treatments. All other modeling choices
fall into prior specification.

`bmeta_analyze`

is the main function in *metapack*, whose first argument
is an R formula. `bmeta_analyze`

internally parses a formula to identify
a model and ultimately calls a *worker function*. An extension of R’s
formula class, *Formula*
(Zeileis and Croissant 2010), accommodates multiple responses and parts, lending itself
well into meta-analysis modeling. Once a model is fully identified, the
MCMC algorithm is executed in C++, thanks to
*Rcpp* (Eddelbuettel and Balamuta 2017) and
*RcppArmadillo*
(Eddelbuettel and Sanderson 2014).

Name | Functionality | Description |
---|---|---|

`bmeta_analyze` |
Estimation | Fits (network) meta-analysis models |

`hpd` |
Inference | Computes highest posterior density (HPD) intervals of model parameters |

`model_comp` |
Inference | Computes model comparison measures (DIC or LPML) |

`print` |
Displays a summary of the output | |

`summary` |
Displays a summary of the output | |

`plot` |
Plots trace plots, density plots, or surface-under-the-cumulative-ranking-curve (SUCRA) plots | |

`fitted` |
Computes posterior means, standard deviations, and HPD intervals | |

`coef` |
Computes posterior means of fixed-effect coefficients | |

`cholesterol` |
Data set | Cholesterol data for multivariate meta-analysis |

`TNM` |
Data set | Triglyceride data for network meta-analysis |

`Formula`

The three characterizations of a meta-analytic data set must be encoded
in the formula. Requiring the formula to have two left-hand sides (LHS)
and two or three right-hand sides (RHS)^{1} is enough to communicate the
characterizations for a wide class of meta-analysis models. We invite
other R package developers to adopt the following representation for
meta-analytic models, the general form of which is given by

`+ y2 | sd1 + sd2 ~ x1 + x2 + x3 + ns(n) | w1 + w2 + w3 | treat + trial (+ groups) y1 `

Each *part* in LHS or RHS is an R expression where variables (or
functions of variables) are chained with a plus sign (`+`

)—e.g.
`x1 + x2`

. The tilde (`~`

) separates all LHSs from all RHSs, each
further separated into parts by vertical bars (`|`

). The meaning of each
part is syntactically determined by its location within the formula,
like an English sentence. Therefore, every part must come in the exact
order as prescribed for `bmeta_analyze`

to correctly identify the model.

- The first LHS (
`y1 + y2`

), the responses, is required of all models. Depending on the number of variables given in the first LHS,`bmeta_analyze`

will determine whether the model is multivariate or univariate. For example, a first LHS with only`y`

would flag the model as univariate. - The second LHS (
`sd1 + sd2`

) supplies the standard deviations of the endpoints required of an aggregate-data meta-analysis. The function call will break if this part is missing. - The first RHS (
`x1 + x2 + x3 + ns(n)`

) defines the fixed-effect covariates. For aggregate-data models, the arm sample sizes must be passed as an argument to`ns()`

. In the example code,`n`

is the variable containing the arm sample sizes. - The second RHS (
`w1 + w2 + w3`

) defines either the random-effect covariates (\(\boldsymbol{w}_{kt}^\top \boldsymbol{\gamma}\)) or the variance-related covariates (\(\log \tau_{kt} = \boldsymbol{w}_{kt}^\top \boldsymbol{\phi}\))—see Sections 4.3 or 4.4 for details. This part is optional. If omitted,`bmeta_analyze`

will assume \(\boldsymbol{w}_{kt}=\boldsymbol{1}\) where \(\boldsymbol{1}\) is a vector of ones. - The third RHS (
`treat + trial`

or`treat + trial + groups`

) corresponds to the treatment and trial indicators, and optionally the grouping variable if it exists. Variables here must be provided in the exact order shown in the example.

The dimension of the response(s) is explicit in the formula, which
determines the first characterization. The treatments are coerced to a
factor—if not already one—whose number of levels is extracted (i.e.
`nlevels(treat)`

) to resolve the second characterization, meta-analysis
versus network meta-analysis.

Aside from the first two arguments, `formula`

and `data`

, there are four
other optional arguments that must be provided as R’s `list`

class:
`prior`

, `mcmc`

, `control`

, and `init`

. All hyperparameters for the
prior distributions should be included in `prior`

—see
Section 2 for hyperparameters. `mcmc`

only
regards the numbers of MCMC iterations: `ndiscard`

for the number of
burn-in iterations, `nskip`

for thinning, and `nkeep`

for the posterior
sample size. `control`

configures the Metropolis-Hastings algorithm.
`*_stepsize`

with the asterisk replaced with one of the parameter names
indicates the step size for determining the sample evaluation points in
the localized Metropolis algorithm. Lastly, initial values for the
parameters can be provided in `init`

in case a user has *a priori* known
high-quality starting points.

For meta-analysis models, *metapack* acknowledges the possible existence
of the first-line and second-line treatments trials. More generally, the
trials may be grouped by a factor believed to generate disparate random
effects. Although an arbitrary number of groups can exist in theory, we
restrict our attention to two groups. Denoting the binary group
indicators by \(u_{kt}\in \{0,1\}\) yields
\[\label{eq:meta-first-second-line}
y_{ktj} = \mathbf{x}_{ktj}^\top \mathbf{\beta} + (1-u_{kt})\mathbf{w}_{ktj}^\top \mathbf{\gamma}_{kj}^0 + u_{kt}\mathbf{w}_{ktj}^\top \mathbf{\gamma}_{kj}^1 + \epsilon_{ktj}. \tag{3}\]
The random effects are modeled as
\(\mathbf{\gamma}_{kj}^{l} \overset{\text{ind}}{\sim} \mathcal{N}({\mathbf{\gamma}_j^l}^*, \Omega_j^{l})\)
and \((\Omega_j^{l})^{-1} \sim \mathcal{W}_{d_{0j}}(\Omega_{0j})\). Stacking the vectors,
\(\mathbf{\gamma}_k^l = ((\mathbf{\gamma}_{k1}^l)^\top,\ldots,(\mathbf{\gamma}_{kJ}^l)^\top)^\top \sim \mathcal{N}({\mathbf{\gamma}^l}^*,\Omega^l)\), where
\({\mathbf{\gamma}^l}^* = (({\mathbf{\gamma}_1^l}^*)^\top,\ldots,({\mathbf{\gamma}_J^l}^*)^\top)^\top\),
\(\Omega_j =\Omega_j^0 \oplus \Omega_j^1\), and
\(\Omega = \oplus_{j=1}^J \Omega_j\) for \(l\in \{0,1\}\). Adopting the
*noncentered parameterization* (Bernardo et al. 2003), define
\(\mathbf{\gamma}_{k,o}^l = \mathbf{\gamma}_k^l - {\mathbf{\gamma}^l}^*\).
Denoting \(\mathbf{W}_{kt}^* = [(1-u_{kt})\mathbf{W}_{kt}, u_{kt}\mathbf{W}_{kt}]\),
\(\mathbf{X}_{kt}^* = [\mathbf{X}_{kt}, \mathbf{W}_{kt}^*]\),
\(\mathbf{\theta} = (\mathbf{\beta}^\top,{{\mathbf{\gamma}^0}^*}^\top, {{\mathbf{\gamma}^1}^*}^\top)^\top\), and \(\mathbf{\gamma}_{k,o} = ((\mathbf{\gamma}_{k,o}^0)^\top, (\mathbf{\gamma}_{k,o}^1)^\top)^\top\),
the model is written as follows:
\[\label{eq:final-mvmr}
\mathbf{y}_{kt} = \mathbf{X}_{kt}^*\mathbf{\theta} + \mathbf{W}_{kt}^* \mathbf{\gamma}_{k,o} + \mathbf{\epsilon}_{kt}. \tag{4}\]
If there is no distinction between the first-line and second-line
therapies, then setting \(u_{kt} = 0\) for all \((k,t)\) reduces the model
back to Equation (1). Finally, we assume
\(\boldsymbol{\theta}\sim \mathcal{N}(\boldsymbol{0},c_0\boldsymbol{I})\)
where \(\boldsymbol{I}\) is an identity matrix.

A (boilerplate) template for this class of models is as follows:

```
<- "y1 + y2 | sd1 + sd2 ~ x1 + x2 | w1 + w2 | treat + trial + groups"
f <- bmeta_analyze(formula(f), data = df,
fit prior = list(c0 = [real], dj0 = [real], Omega0 = [matrix],
a0 = [real], b0 = [real],
d0 = [real], nu0 = [real], Sigma0 = [matrix]),
control = list(sample_Rho = [logical], Rho_stepsize = [real],
R_stepsize = [real], delta_stepsize = [real], model = [string]))
```

We use `real`

and `string`

as aliases for `double`

and `character`

in R.
Every bracketed expression should be replaced with an instance of the
enclosed class. The hyperparameters in `prior`

and step sizes in
`control`

will be clarified in the following modeling options. Note that
all parameters with a step size are sampled through the
Metropolis-Hastings algorithm.

The covariance matrix between the response endpoints (\(\Sigma_{kt}\)) can
be modeled depending on (1) the amount of data available; and (2) what
assumptions the practitioner is willing to make. The diagonal elements
of \(\Sigma_{kt}\) are always identifiable whereas the off-diagonal
elements require additional modeling assumptions. *metapack* presently
offers five options, specifiable through `model`

in the `control`

argument. For \(\mathcal{M}_2\)–\(\mathcal{M}_5\), the unobserved sample
correlation matrices are sampled from their conditional distributions
\((R_{kt}\mid V_{kt},\Sigma_{kt})\) given by
\[\label{eq:sample-correlation-density}
p(R_{kt}\mid V_{kt},\Sigma_{kt}) \propto |R_{kt}|^{(n_{kt}-J-2)/2} \exp\left\{-\dfrac{(n_{kt}-1)}{2}\mathrm{tr}\left(V_{kt}^{\frac{1}{2}}\Sigma_{kt}^{-1}V_{kt}^{\frac{1}{2}}R_{kt} \right) \right\}. \tag{5}\]

- (\(\mathcal{M}_1\):
`model="NoRecovery"`

) The simplest and easiest way to model the covariance matrices is to relinquish correlation recovery, in which case Equation (5) is ignored. We assume \(\Sigma_{kt} = \mathrm{diag}(\sigma_{kt,11}^2,\ldots, \sigma_{kt,JJ}^2)\) with \(\sigma_{kt,jj}^2 \sim \mathcal{IG}(a_0,b_0)\) for \(a_0,b_0>0\), where \(\mathcal{IG}(a,b)\) denotes the inverse-gamma distribution whose density function is proportional to \(x^{-(a+1)} \exp(-b/x)\) for \(x>0\). For univariate meta-analyses, this is the only valid option since there are no off-diagonal entries. - (\(\mathcal{M}_2\):
`model="EquiCovariance"`

) We can assume the covariance matrix is for all pairs of treatments and trials. That is, \(\Sigma_{kt} = \Sigma\) for every combination of \((k,t)\) for \(t=1,\ldots,T\) and \(k=1,\ldots,K\). A Wishart prior distribution is assumed for \(\Sigma^{-1}\), i.e. \(\Sigma^{-1} \sim \mathcal{W}_{s_0}(\Sigma_0)\) for \(s_0>J-1\) and \(\Sigma_0\in \mathcal{S}_{++}^J\). - (\(\mathcal{M}_3\):
`model="EquiWithinTreat"`

) If the equal covariance is too strong an assumption, we can allow the covariance matrices to be equivalent within a treatment. That is, \(\Sigma_{kt} = \Sigma_t\) for \(t=1,\ldots,T\). Similar to \(\mathcal{M}_2\), a Wishart prior distribution is assumed for \(\Sigma_t^{-1}\), i.e. \(\Sigma_t^{-1} \sim \mathcal{W}_{s_0}(\Sigma_0)\). - (\(\mathcal{M}_4\):
`model="EquiCorrelation"`

) To achieve the best of both worlds—variances and correlations—we can let the variances enjoy maximum freedom but attempt to recover the correlations by restricting them to be identical across treatments and trials for identifiability. Performing the decomposition, \[\Sigma_{kt} = \mathbf{\delta}_{kt}\mathbf{\rho} \mathbf{\delta}_{kt},\] where \(\mathbf{\delta}_{kt} = \mathrm{diag}(\Sigma_{kt,11}^{1/2},\ldots,\Sigma_{kt,JJ}^{1/2})\), and \(\mathbf{\rho}\) is the correlation matrix, the elements in \(\mathbf{\delta}_{kt}\) and \(\mathbf{\rho}\) are sampled through the Metropolis-Hastings algorithm. - (\(\mathcal{M}_5\):
`model="Hierarchical"`

) The hierarchical prior for \(\Sigma_{kt}\) is given by \((\Sigma_{kt}^{-1}\mid \Sigma) \sim \mathcal{W}_{\nu_0}((\nu_0-J-1)^{-1}\Sigma^{-1})\) and \(\Sigma \sim \mathcal{W}_{d_0}(\Sigma_0)\). By allowing the \(\Sigma_{kt}\)’s to differ but having them share information across treatments and trials via \(\Sigma\), this assumption aims to control the between-treatment and between-trial variations simultaneously. Since the amount of information shared between treatments and trials is controlled by \(\nu_0\) \((>J-1)\), it is advised to try multiple values for \(\nu_0\) and perform a model assessment through the deviance information criterion (DIC) or the logarithm of the pseudo-marginal likelihood (LPML). \(\Sigma\) is further decomposed for Metropolis-Hastings algorithm as \(\Sigma=\Delta \boldsymbol{\rho}\Delta\) where \(\Delta = \mathrm{diag}(\delta_1,\ldots,\delta_J)\) is the diagonal matrix of standard deviations, and \(\boldsymbol{\rho}\) is a correlation matrix with unit diagonal elements.

For univariate network meta-analysis, the design matrix for random effects is restricted to be the selection matrix \(E_k^\top = (e_{t_{k1}}, e_{t_{k2}}, \ldots, e_{t_{k T_k}})^\top\), where \(e_{t_{kl}}=(0,\ldots, 1,\ldots,0)^\top\), \(l=1,\ldots,T_k\), with the \(t_{kl}\)th element set to 1 and 0 otherwise, and \(T_k\) is the number of treatments included in the \(k\)-th trial. Furthermore, we redefine \(\boldsymbol{\gamma}_{k,o}\) to be \(\boldsymbol{\gamma}_{k,o}:= E_k^\top \boldsymbol{\gamma}_k\), a vector of \(T_k\)-dimensional scaled random effects. The random effects \(\boldsymbol{\gamma}_k \sim t_T(\boldsymbol{\gamma},\boldsymbol{\rho},\nu)\), where \(t_T(\boldsymbol{\mu},\Sigma,\nu)\) denotes a multivariate \(t\)-distribution with \(\nu\) degrees of freedom, a location parameter vector \(\boldsymbol{\mu}\), and a scale matrix \(\Sigma\). \(T\) indicates the number of distinct treatments in all trials. The random effects \(\boldsymbol{\gamma}_k\) are scaled since \(\boldsymbol{\rho}\) is a correlation matrix with unit diagonal entries, and the variance components can be modeled as a multiplicative term. That is, with \(\mathbf{W}_k(\mathbf{\phi}) = \mathrm{diag}(\exp(\mathbf{w}_{kt_{k1}}^\top \mathbf{\phi}), \ldots, \exp(\mathbf{w}_{kt_{k T_{k}}}^\top \mathbf{\phi}))\), the model is recast as \[\mathbf{y}_{k} = \mathbf{X}_k^*\mathbf{\theta} + \mathbf{W}_k(\mathbf{\phi})\mathbf{\gamma}_{k,o} + \mathbf{\epsilon}_k,\] where \(\mathbf{X}_k^* = (\mathbf{X}_k, E_k^\top)\), \(\mathbf{\theta} = (\mathbf{\beta}^\top, \mathbf{\gamma}^\top)^\top\), \(\mathbf{\epsilon}_k \sim \mathcal{N}_{T_k}(\mathbf{0},\Sigma_k)\), and \(\Sigma_k = \mathrm{diag}\left(\frac{\sigma_{kt_{k1}}^2}{n_{kt_{k1}}}, \ldots, \frac{\sigma_{kt_{k T_{k}}}^2}{n_{kt_{k T_{k}}}}\right)\). This allows \(\exp(\boldsymbol{w}_{kt}^\top \boldsymbol{\phi})\) to be the standard deviation of \(\gamma_{kt}\). Since the multivariate \(t\)-random effects are not analytically marginalizable, we represent it as a scale mixture of normals as \[\label{eq:nma-hierarchical-re} (\mathbf{\gamma}_{k,o}\mid \lambda_k) \overset{\text{ind}}{\sim}\mathcal{N}_{T_k} \left(\mathbf{0},\lambda_k^{-1}(E_k^\top \mathbf{\rho}E_k) \right),\qquad \lambda_k \overset{\text{iid}}{\sim}\mathcal{G}a\left(\frac{\nu}{2},\frac{\nu}{2} \right), \tag{6}\] where \(\mathcal{G}a(a,b)\) indicates the gamma distribution with mean \(a/b\). Finally, \(\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{0},c_{01}\boldsymbol{I})\) and \(\boldsymbol{\phi} \sim \mathcal{N}(\boldsymbol{0},c_{02}\boldsymbol{I})\).

A (boilerplate) template for this class of models is as follows:

```
<- "y | sd ~ x1 + x2 | w1 + w2 | treat + trial"
f <- bmeta_analyze(formula(f), data = df,
fit prior = list(c01 = [real], c02 = [real], df = [real],
a4 = [real], b4 = [real], a5 = [real], b5 = [real]),
control = list(sample_df = [logical], sample_Rho = [logical],
Rho_stepsize = [real], phi_stepsize = [real], lambda_stepsize = [real]))
```

Every bracketed expression should be replaced with an instance of the
enclosed class. The hyperparameters in `prior`

and step sizes in
`control`

will be clarified in the following modeling options. Note that
all parameters with a step size are sampled through the
Metropolis-Hastings algorithm.

The appeal of considering heavy-tailed random effects and modeling the
variance to be a deterministic linear function of a covariate is that
both extend but still cover the common cases (normal random effects and
no variance modeling) as either the limiting or a special case. Unlike
meta-analysis, there is no single argument (`model`

) determining the
modeling option. A model is rather specified by a combination of
arguments.

- (\(\mathcal{M}_1\) - No variance modeling) Sometimes, no covariate
information is available for modeling the variances, where
\(\boldsymbol{w}_{kt_{kt}}\) reduces to
one—\(\boldsymbol{W}_k(\boldsymbol{\phi}) = \mathrm{diag}(\mathrm{e}^\phi,\ldots,\mathrm{e}^\phi)\).
The marginal variance of \(y_{kt}\) becomes
\(\mathrm{Var}(y_{kt})=\sigma_{kt}^2/n_{kt}+\mathrm{e}^{2\phi}\). This
can be achieved by simply omitting the
*second RHS*, thereby making the trial and treatment configuration the second RHS. For example,`y | sd ~x1 + x2 | treat + trial`

. - (\(\mathcal{M}_2\) - Normal random effects) In the limiting case where
\(\nu\to\infty\),
\(\mathbf{\gamma}_{k,o} \sim \mathcal{N}_{T_k}(0,E_k^\top \boldsymbol{\rho}E_k)\).
`bmeta_analyze`

treats \(\nu\) (`df`

) as part of the prior specification. Thus,`df=Inf`

in the`prior`

argument corresponds to opting for normal random effects. - (\(\mathcal{M}_3\) - Random degrees of freedom) The degrees of freedom
of the multivariate \(t\)-distribution for the random effects are
treated as unknown. In this case, a hierarchical prior is considered
for the degrees of freedom. That is,
\((\nu\mid \nu_a,\nu_b) \sim \mathcal{G}a(\nu_a,\nu_a/\nu_b)\),
\(\nu_a \sim \mathcal{G}a(a_4,b_4)\), and
\(\nu_b \sim \mathcal{IG}(a_5,b_5)\). Since sampling \(\nu\) regards the
MCMC algorithm, a logical variable
`sample_df`

is placed in the`control`

argument. If`sample_df=TRUE`

, the value for`df`

in`prior`

will be used as the initial value, and the related hyperparameters (`a4`

,`b4`

,`a5`

, and`b5`

) can be set in`prior`

. For obvious reasons,`df`

cannot be assigned`Inf`

if`sample_df=TRUE`

.

The object (`fit`

) returned from `bmeta_analyze`

remembers the function
arguments, encapsulates the model specification, and contains the
posterior sample from the MCMC algorithm. Therefore, this object alone
can be passed to other methods to perform subsequent inferences. These
methods include `fitted`

, `hpd`

, `coef`

, `model_comp`

, and `sucra`

.

The posterior means, standard deviations, and the HPD intervals are
computed via the `fitted()`

function. The `fitted()`

function has two
optional arguments: `level`

and `HPD`

. `level`

determines the
credibility level of the interval estimation. `HPD`

, a logical
parameter, decides whether a highest posterior density or equal-tailed
credible interval will be produced. It is also possible to obtain the
posterior interval estimates only using `hpd()`

.

```
> est <- fitted(fit, level=0.99, HPD = TRUE)
R> hpd <- hpd(fit, level = 0.95, HPD = TRUE) R
```

`coef(fit)`

allows users to extract the posterior mean of the
fixed-effect coefficients.

It is crucial to determine a suitable model to base the statistical inference on. Section 4.3 introduces five models for \(\Sigma_{kt}\), and Section 4.4 contains infinitely many models since the degrees of freedom \(\nu\) for the random effects can assume any number on the positive real line including infinity. Such a circumstance calls for a principled way of comparing models as well as evaluating goodness of fit.

The deviance information criterion (Spiegelhalter et al. 2002), or DIC, is defined as follows: \[\mathrm{DIC} = \mathrm{Dev}(\bar{\mathbf{\eta}}) +2 p_D,\] where \(\mathbf{\eta}\) indicates all model parameters for which \(\bar{\mathbf{\eta}} = \mathbb{E}[\mathbf{\eta}\mid D_{\text{obs}}]\). \(\mathrm{Dev}(\mathbf{\eta})\) is the deviance function given by \(\mathrm{Dev}(\mathbf{\eta}) = -2 \log L_{o\mathbf{y}}(\mathbf{\eta}\mid D_{\text{obs}})\), for which \(L_{o\mathbf{y}}\) is the observed-data likelihood associated with \(\mathbf{y}\), and \(p_D\) is defined as \(p_D = \overline{\mathrm{Dev}(\mathbf{\eta})} -\mathrm{Dev}(\bar{\mathbf{\eta}})\) where \(\overline{\mathrm{Dev}(\mathbf{\eta})} =\mathbb{E}[\mathrm{Dev}(\mathbf{\eta})\mid D_{\text{obs}}]\).

`> dic <- model_comp(fit, type = "dic", verbose = TRUE, ncores = 3) R`

Another Bayesian model selection criterion is the logarithm of the pseudo-marginal likelihood (LPML), defined as the summed logarithm of the conditional predictive ordinates (CPO). The CPO has a “leave-one-out” predictive interpretation, which in meta-analyses is often overlooked—a significant oversight that will undermine model comparison. The aggregate nature of meta-analysis data calls for a redefinition of what is “left out” and what should be the base unit for prediction. With trials as the base unit, the CPO for the \(k\)-th trial is \[\mathrm{CPO}_k = \int L(\mathbf{\eta}\mid D_{o\mathbf{y}})p(\mathbf{\eta}\mid D_{o\mathbf{y}}^{-k},D_{o\mathbf{s}})\,\mathrm{d}\mathbf{\eta},\] where \(D_{o\mathbf{y}}^{-k}\) is \(D_{o\mathbf{y}}\) with the \(k\)-th trial removed, and \(p(\mathbf{\eta}\mid D_{o\mathbf{y}}^{-k},D_{o\mathbf{s}})\) is the posterior distribution based on the data without the \(k\)-th trial. Then, \(\mathrm{LPML} = \frac{1}{K}\sum_{k=1}^K \log (\mathrm{CPO}_k)\).

`> lpml <- model_comp(fit, type = "lpml", verbose = TRUE, ncores = 3) R`

Treatments included in only one trial violate the predictive
interpretation of the CPO. The corresponding trials thus should be
removed from LPML calculation. `model_comp`

returns the logarithm of the
CPO of every trial but corrects the LPML should such treatments exist.
For those occasions, a naive sum of the logarithm of the
\(\mathrm{CPO}_k\)’s will not equal the corrected LPML in the returned
object.

The DIC and LPML for the meta-analysis models are relatively
straightforward since the observed-data likelihood is a multivariate
normal density. However, network meta-analysis models require some
technical considerations in evaluating the following observed-data
likelihood that involves an analytically intractable integral when
\(t\)-random effects are used: slow computation speed and numerical
overflow. Observe the following observed likelihood function for the
network meta-analysis model:
\[\begin{aligned}
L_{o\mathbf{y}}(\mathbf{\eta}\mid D_\text{obs}) = \prod_{k=1}^K \int_0^\infty &(2\pi)^{-\frac{T_k}{2}}g(\lambda_k)\left| \lambda_k^{-1}(\mathbf{W}_k(\mathbf{\phi})E_k^\top\mathbf{\rho}E_k \mathbf{W}_k(\mathbf{\phi})) + \Sigma_k \right|^{-\frac{1}{2}}\\
&\times \exp\left\{-\dfrac{(\mathbf{y}_k - \mathbf{X}_k^*\mathbf{\theta})^\top\left[\lambda_k^{-1}\mathbf{W}_k(\mathbf{\phi})E_k^\top \mathbf{\rho}E_k \mathbf{W}_k(\mathbf{\phi})+\Sigma_k \right]^{-1}(\mathbf{y}_k - \mathbf{X}_k^*\mathbf{\theta})}{2}\right\}\,\mathrm{d}\lambda_k,
\end{aligned}\]
where \(g(\lambda_k)\) is the gamma density with shape and rate parameters
\(\nu/2\). The integral is evaluated via double exponential (DE)
quadrature, or equivalently tanh-sinh quadrature
(Takahasi and Mori 1974; Bailey et al. 2005), available in the *math* header of
the *BH* package
(Eddelbuettel et al. 2019). DE quadrature is robust to singularities, terminates
fast, and provides high precision (Bailey et al. 2005). We address the slow
speed through *“shared memory multiprocessing programming”* via *OpenMP*
(OpenMP Architecture Review Board 2018). *OpenMP* is a widely used application programming interface
(API) for portable and scalable parallel processing in C, C++, and
Fortran across many operating systems. As long as R is configured for
*OpenMP*, *metapack* will deploy parallelism. Unless the argument
`ncores`

is specified otherwise, `model_comp`

will use two CPU cores for
parallel computing by default.

To prevent overflow, we take the following steps:

- Let \(h(\lambda_k)\) denote the integrand. Then, compute \(\widehat{\lambda}_k = \arg\max_{\lambda_k} \log h(\lambda_k)\).
- Redefine the integral as \[\exp\left\{\log h(\widehat{\lambda}_k) + \log \int_0^\infty\exp\left[\log h(\lambda_k) - \log h(\widehat{\lambda}_k) \right]\,\mathrm{d}\lambda_k\right\}.\]

Although the exponential shifting scheme does not warrant preventing every occurrence of numerical overflow, we have observed stable evaluations of the integral for over several thousand batches of simulations.

It is important to diagnose whether the results are consistent with the
assumptions and visualize the findings. *metapack* provides methods for
these: `plot`

and `sucra`

. The `plot`

method is available for both
meta-analysis and network meta-analysis whereas `sucra`

is exclusively
for network meta-analysis.

The `plot`

method will take the `fit`

object and generate the density
plots and trace plots of \(\mathbf{\theta}\). To see the plots for other
parameters, run the following commands using
*coda* (Plummer et al. 2006):

```
> library("coda")
R> posterior <- as.mcmc(data.frame(gammaR = fit$mcmc.draws$gamR,
R+ sig2 = fit$mcmc.draws$sig2))
> plot(posterior) R
```

Similarly, *boa* (Smith 2007)
can be used for output analysis. The posterior sample of \(\mathbf{\rho}\)
comes in three-dimensional arrays, which requires suitable indexing to
generate trace plots for the off-diagonal lower-triangular elements. To
generate such trace plots, run the following command:

```
> idx <- lower.tri(fit$mcmc.draws$Rho[,,1])
R> n_idx <- ncol(idx) * (ncol(idx) - 1) / 2
R> posterior <- as.mcmc(data.frame(
R+ rho = t(vapply(1:fit$mcmc$nkeep, function(ikeep) {
+ rho_i <- fit$mcmc.draws$Rho[,,ikeep]
+ rho_i[idx]
+ }, FUN.VALUE=numeric(n_idx)))))
> plot(posterior) R
```

The *surface under the cumulative ranking (SUCRA)* curve is useful when
the ranking of the treatments in a network meta-analysis is of interest.
Based on the posterior sample, \(\mathbb{P}(t,r)\) denotes the probability
that the treatment \(t\) is ranked \(1\leq r \leq T\) for \(t=1,\ldots,T\).
Let \(\mathbf{P}\) be the \(T\times T\) discrete rank (row-stochastic) probability matrix whose \((t,r)\)th
element is \(\mathbb{P}(t,r)\). The cumulative probability is then
computed through \(F(t,x) = \sum_{r=1}^x \mathbb{P}(t,r),\) where \(F(t,x)\)
is the probability that the \(t\)-th treatment is ranked \(x\) or better.
Since \(F(t,T)=1\) for every \(t\), the surface under the cumulative ranking
distribution for the \(t\)-th distribution is given by
\[\mathrm{SUCRA}(t) = \dfrac{1}{T-1}\sum_{x=1}^{T-1}F(t,x).\]
`sucra`

will take the `fit`

object and return the SUCRA and the discrete
rank probability matrix \(\mathbf{P}\).

```
> s <- sucra(fit)
R> s$SUCRA
R> s$rankprob R
```

If only plotting is needed, the storage of the `sucra`

object can be
bypassed via running `plot(sucra(fit))`

. Note that SUCRA \(s\) has a
bijection with the mean rank \(r\) of a treatment, \(r = 1 + (1-s)(T-1)\),
where \(T\) is the number of treatments.

*metapack* includes a data set, `cholesterol`

, which consists of 26
double-blind, randomized, active, or placebo-controlled clinical trials
on patients with primary hypercholesterolemia sponsored by Merck & Co.,
Inc., Kenilworth, NJ, USA (Yao et al. 2015). The data set can be loaded
by running `data("cholesterol")`

. The `cholesterol`

data set has three
endpoints: low density lipoprotein cholesterol (`pldlc`

), high density
lipoprotein cholesterol (`phdlc`

), and triglycerides (`ptg`

). The
percent change from the baseline in the endpoints, variables prefixed by
`p-`

, are the aggregate responses, followed by the corresponding
standard deviations prefixed by `sd-`

.

Variable | Description |
---|---|

`study` |
study identifier |

`trial` |
trial identifier |

`treat` |
treatment indicator for Statin or Statin+Ezetimibe |

`n` |
the number of participants in the study arms |

`pldlc` |
aggregate percentage change in LDL-C |

`phdlc` |
aggregate percentage change from baseline in HDL-C |

`ptg` |
aggregate percentage change from baseline in triglycerides (TG) |

`sdldl` |
sample standard deviation of percentage change in LDL-C |

`sdhdl` |
sample standard deviation of percentage change in HDL-C |

`sdtg` |
sample standard deviation of percentage change in triglycerides (TG) |

`onstat` |
whether the participants were on Statin prior to the trial |

`bldlc` |
baseline LDL-C |

`bhdlc` |
baseline HDL-C |

`btg` |
baseline triglycerides (TG) |

`age` |
age in years |

`white` |
the proportion of white participants |

`male` |
the proportion of male participants |

`dm` |
the proportion of participants with diabetes mellitus |

`durat` |
duration in weeks |

Variable documentation is also available in `help("cholesterol")`

.

```
> set.seed(2797542)
R> f_1 <- 'pldlc + phdlc + ptg | sdldl + sdhdl + sdtg ~
R+ 0 + bldlc + bhdlc + btg + age + durat + white + male + dm + ns(n) | treat |
+ treat + trial + onstat'
> fit_ma <- bmeta_analyze(formula(f_1), data = cholesterol,
R+ prior = list(model="NoRecovery"),
+ mcmc = list(ndiscard = 1000, nkeep = 1000),
+ control=list(scale_x = TRUE, verbose=TRUE))
```

`summary`

can be used to summarize the posterior sample from the `fit`

object. `summary`

will name the variables accordingly in the output,
suffixed by the index \(j\) in the corresonding outcome variable
\(y_{ktj}\). For example, `bldlc_1`

in the output below corresponds to the
base LDL-C’s coefficient associated with the first endpoint (`pldlc`

).
The summary table consists of the posterior mean, posterior standard
deviation, the HPD lower bound, and the HPD upper bound. Users may
choose to compute the equal-tailed credible intervals (CI) instead of
the HPD intervals by setting `HPD=FALSE`

.

```
> summary(fit_ma, HPD = TRUE, level = 0.95)
R:
Callbmeta_analyze(formula = formula(f_1), data = cholesterol,
prior = list(model = "NoRecovery"),
mcmc = list(ndiscard = 1000, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
-effects:
FixedHPD(Lower) HPD(Upper)
Post.Mean Std.Dev 0.1394 0.0938 -0.0532 0.3051
bldlc_1 -0.5146 0.6329 -1.7641 0.7243
bhdlc_1 0.0018 0.0818 -0.1693 0.1607
btg_1 0.3785 0.4389 -0.5308 1.1648
age_1 0.9699 0.5234 -0.1658 1.9979
durat_1 -5.2449 7.4048 -19.7345 8.9764
white_1 -1.2652 12.3948 -25.6807 23.0538
male_1 0.3987 5.3392 -10.4995 9.4673
dm_1 -0.0128 0.0137 -0.0393 0.0134
bldlc_2 -0.0294 0.1280 -0.2887 0.2149
bhdlc_2 0.0228 0.0212 -0.0203 0.0619
btg_2 -0.0963 0.0735 -0.2422 0.0460
age_2 -0.0007 0.0712 -0.1320 0.1382
durat_2 5.5822 1.3044 3.0802 8.1901
white_2 -2.1449 2.6956 -7.5975 2.6164
male_2 -1.0457 1.0923 -3.1816 1.0341
dm_2 0.0141 0.0406 -0.0719 0.0853
bldlc_3 0.0082 0.2637 -0.5278 0.4872
bhdlc_3 -0.0734 0.0467 -0.1598 0.0154
btg_3 -0.1385 0.2043 -0.5077 0.2591
age_3 0.0996 0.2546 -0.3718 0.6080
durat_3 -0.7053 5.2345 -11.0570 8.9591
white_3 14.5482 7.1735 0.0642 28.2959
male_3 5.0535 2.9542 -1.4157 10.5545
dm_3 *(1-2nd)_1 -42.8675 3.2934 -48.8356 -35.8794
(Intercept)*(1-2nd)_1 -12.1435 1.1211 -14.4040 -10.0135
treat*2nd_1 -3.2219 3.0426 -9.3109 2.7419
(Intercept)*2nd_1 -20.0843 1.5235 -23.3169 -17.2637
treat*(1-2nd)_2 5.1425 0.5840 3.8240 6.1036
(Intercept)*(1-2nd)_2 2.0787 0.4718 1.0663 3.0222
treat*2nd_2 0.7346 0.5814 -0.3377 1.8738
(Intercept)*2nd_2 1.3482 0.3116 0.7579 1.9880
treat*(1-2nd)_3 -18.5035 2.1543 -22.3232 -13.7985
(Intercept)*(1-2nd)_3 -4.7726 0.9952 -6.7784 -2.9511
treat*2nd_3 -4.1805 1.2910 -6.7672 -1.7005
(Intercept)*2nd_3 -8.8579 0.9443 -10.6208 -7.0530
treat---------------------------------------------------
*HPD level: 0.95
```

The suffixed `_j`

where `j`

can be 1, 2, or 3 corresponds to the
response endpoint. Since `scale_x = TRUE`

is equivalent to
`scale(<var>, center=TRUE, scale=TRUE)`

, the covariates have been
centered, which affects the interpretation of the intercepts. This
allows us to interpret `(Intercept*(1-2nd)_1)=-42.8675`

as the statin
effect in the first-line studies, where `2nd`

represents the indicator
variable for second-line studies evaluating to one if second-line and
zero otherwise. On the other hand, the coefficient estimate -12.1435 for
`treat*(1-2nd)_1`

is the Statin+Ezetimibe effect, compared to
administering statin alone. For the second-line studies where patients
had already been on statin, `(Intercept)*2nd_1=-3.2219`

came out
insignificant, according to the 95% HPD interval, as anticipated because
the treatment for this group was merely a continuation of taking statin.
The coefficient estimate -20.0843 for `treat*2nd_1`

shows that ezetimibe
on top of statin has a greater cholesterol-lowering effect than statin
alone.

`print`

is similar to `summary`

but additionally prints the model
specification. The output from `print(fit_ma)`

is given as follows.

```
> print(fit_ma, HPD = TRUE, level = 0.95)
R:
Callbmeta_analyze(formula = formula(f_1),
data = cholesterol, prior = list(model = "NoRecovery"),
mcmc = list(ndiscard = 1000, nkeep = 1000),
control = list(scale_x = TRUE, verbose = TRUE))
:
Model
(Aggregate mean)= X_kt * theta + W_kt * gamma_k + N(0, Sigma_kt / n_kt)
y_kt
(Sample Variance)- 1) S_kt ~ Wishart(n_kt - 1, Sigma_kt)
(n_kt
(Random effects)| Omega] ~ N(0, Omega)
[gamma_k :
Priors~ MVN(0, 1e+05 * I_p)
theta ^{-1} ~ Wishart( 2.1 , Omega0)
Omega_j= diag(sig_{tk,11}^2, ..., sig_{tk,JJ}^2)
Sigma_kt ^2 ~ IG( 0.1 , 3.1 )
where sig_{tk,jj}---------------------------------------------------
: 26
Number of trials: 52
Number of arms: 2
Number of treatmentsHPD(Lower) HPD(Upper)
Post.Mean Std.Dev 0.1394 0.0938 -0.0532 0.3051
bldlc_1 -0.5146 0.6329 -1.7641 0.7243
bhdlc_1 0.0018 0.0818 -0.1693 0.1607
btg_1 0.3785 0.4389 -0.5308 1.1648
age_1 0.9699 0.5234 -0.1658 1.9979
durat_1 -5.2449 7.4048 -19.7345 8.9764
white_1 -1.2652 12.3948 -25.6807 23.0538
male_1 0.3987 5.3392 -10.4995 9.4673
dm_1 -0.0128 0.0137 -0.0393 0.0134
bldlc_2 -0.0294 0.1280 -0.2887 0.2149
bhdlc_2 0.0228 0.0212 -0.0203 0.0619
btg_2 -0.0963 0.0735 -0.2422 0.0460
age_2 -0.0007 0.0712 -0.1320 0.1382
durat_2 5.5822 1.3044 3.0802 8.1901
white_2 -2.1449 2.6956 -7.5975 2.6164
male_2 -1.0457 1.0923 -3.1816 1.0341
dm_2 0.0141 0.0406 -0.0719 0.0853
bldlc_3 0.0082 0.2637 -0.5278 0.4872
bhdlc_3 -0.0734 0.0467 -0.1598 0.0154
btg_3 -0.1385 0.2043 -0.5077 0.2591
age_3 0.0996 0.2546 -0.3718 0.6080
durat_3 -0.7053 5.2345 -11.0570 8.9591
white_3 14.5482 7.1735 0.0642 28.2959
male_3 5.0535 2.9542 -1.4157 10.5545
dm_3 *(1-2nd)_1 -42.8675 3.2934 -48.8356 -35.8794
(Intercept)*(1-2nd)_1 -12.1435 1.1211 -14.4040 -10.0135
treat*2nd_1 -3.2219 3.0426 -9.3109 2.7419
(Intercept)*2nd_1 -20.0843 1.5235 -23.3169 -17.2637
treat*(1-2nd)_2 5.1425 0.5840 3.8240 6.1036
(Intercept)*(1-2nd)_2 2.0787 0.4718 1.0663 3.0222
treat*2nd_2 0.7346 0.5814 -0.3377 1.8738
(Intercept)*2nd_2 1.3482 0.3116 0.7579 1.9880
treat*(1-2nd)_3 -18.5035 2.1543 -22.3232 -13.7985
(Intercept)*(1-2nd)_3 -4.7726 0.9952 -6.7784 -2.9511
treat*2nd_3 -4.1805 1.2910 -6.7672 -1.7005
(Intercept)*2nd_3 -8.8579 0.9443 -10.6208 -7.0530
treat---------------------------------------------------
*HPD level: 0.95
```

For model comparison, the deviance information criterion (DIC) and the
logarithm of the pseudo marginal likelihood (LPML) can be computed using
the `model_comp`

method. The DIC will also contain
\(\mathrm{Dev}(\bar{\mathbf{\eta}})\) and \(p_D\). Similarly, the LPML will
contain the logarithm of the CPOs, which is omitted.

```
> dic <- model_comp(fit_ma, "dic")
R> lpml <- model_comp(fit_ma, "lpml")
R> c(dic$dic, dic$Dev, dic$pD)
R1] 827.80726 734.50691 46.65018
[
> lpml$lpml
R1] -428.1813 [
```

*metapack* includes another data set, `TNM`

, which consists of 29
studies, dubbed the *Triglycerides Network Meta (TNM)* data
(Li et al. 2019). The data set has 73 observations and 15 variables,
which can be loaded via `data("TNM")`

. The aggregate response variable
is the mean percentage difference in triglycerides (`ptg`

), paired with
its corresponding standard deviation (`sdtg`

).

Variable | Description |
---|---|

`trial` |
trial identifier |

`treat` |
lovastatin (L), rosuvastatin (R), pravastatin (P), ezetimibe (E), |

simvastatin+ezetimibe (SE), atorvastatin+ezetimibe (AE), | |

lovastatin+ezetimibe (LE), or pravastatin+ezetimibe (PE) | |

`n` |
the number of participants in the study arms |

`ptg` |
percentage change from baseline in triglycerides (TG) |

`sdtg` |
sample standard deviation of percentage change in triglycerides (TG) |

`bldlc` |
baseline LDL-C |

`bhdlc` |
baseline HDL-C |

`btg` |
baseline triglycerides (TG) |

`age` |
age in years |

`white` |
the proportion of white participants |

`male` |
the proportion of male participants |

`bmi` |
body fat index |

`potencymed` |
the proportion of medium statin potency |

`potencyhigh` |
the proportion of high statin potency |

`durat` |
duration in weeks |

Similarly to `cholesterol`

, variable descriptions are also available
through `help("TNM")`

.

LPML in Section 5.0.0.1 is not the only quantity
affected by the treatments only included in a single trial. The
variances of the corresponding treatment effects are nonestimable.
(Li et al. 2019) proposes to group those treatments and allow the
treatments in a group to share the same variance. This grouping scheme
can be easily achieved using `match`

in R:

`> TNM$group <- factor(match(TNM$treat, c("PBO", "R"), nomatch = 0)) R`

In the following demonstration, we consider the following model:

```
> f_2 <- 'ptg | sdtg ~
R+ 0 + bldlc + bhdlc + btg + age + white + male + bmi +
+ potencymed + potencyhigh + durat + ns(n) |
+ scale(bldlc) + scale(btg) + group | treat + trial'
```

The model can be fit by running

```
> set.seed(2797542)
R> fit_nma <- bmeta_analyze(formula(f_2), data = TNM,
R+ mcmc = list(ndiscard = 1000, nskip = 1, nkeep = 1000),
+ control=list(scale_x = TRUE, verbose=TRUE))
```

Again, the model summary can be obtained using either `summary`

or
`print`

with minor differences.

```
> summary(fit_nma)
R:
Callbmeta_analyze(formula = formula(f_2), data = TNM, mcmc = list(ndiscard = 1000,
nskip = 1, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
in network meta-regression models
Posterior inference -effects:
FixedHPD(Lower) HPD(Upper)
Post.Mean Std.Dev -0.0071 0.0171 -0.0409 0.0230
bldlc 0.2332 0.2490 -0.2802 0.7027
bhdlc 0.0846 0.0366 0.0006 0.1460
btg -0.0068 0.1028 -0.2003 0.2020
age -7.2148 1.9867 -10.7112 -3.0757
white 1.2323 7.3305 -12.5660 15.6644
male -0.4505 0.3534 -1.1544 0.2184
bmi 6.8278 7.9203 -8.5359 22.4551
potencymed -0.6474 7.9330 -16.9216 14.4191
potencyhigh 0.1880 0.1879 -0.1638 0.5652
durat -24.3175 1.7820 -27.9255 -20.9635
A -30.3604 2.9591 -35.7474 -24.2297
AE -5.2737 6.1207 -17.4431 6.4066
E -11.6521 4.3747 -20.1106 -2.7431
L -26.9507 3.4730 -33.3236 -19.9223
LE -8.3357 4.5065 -16.9311 0.7870
P 1.6170 6.0629 -9.9366 13.6298
PBO -22.7722 3.6333 -29.6129 -15.7540
PE -17.7669 2.0010 -21.3989 -13.3839
R -19.1616 1.2175 -21.7647 -16.9279
S -21.8271 2.0509 -25.8031 -17.6347
SE ---------------------------------------------------
*HPD level: 0.95
```

We observe that with covariate adjustment, all active treatments (A, AE, E, L, LE, P, PE, R, S, SE) reduce triglyceride (TG) more effectively than the placebo (PBO), although E and P have 95% HPD intervals including zero.

The output from `print(fit_nma)`

further includes the model
specification, and summary statistics for \(\boldsymbol{\phi}\).

```
> print(fit_nma)
R:
Callbmeta_analyze(formula = formula(f_2), data = TNM, mcmc = list(ndiscard = 1000,
nskip = 1, nkeep = 1000), control = list(scale_x = TRUE,
verbose = TRUE))
:
Model
(Aggregate mean)= x_kt'theta + tau_kt * gamma_kt + N(0, sigma_kt^2 / n_kt)
y_kt (Sample Variance)
(n_kt - 1) S^2 / sigma_kt^2 ~ chi^2(n_kt - 1)
(Random effects)
[gam | Rho,nu] ~ MVT(0, E_k' Rho E_k, nu)
:
Priors~ MVN(0, c01 * I_p), c01= 1e+05
theta ~ MVN(0, c02 * I_q), c02= 4
phi p(sigma^2) ~ 1/sigma^2 * I(sigma^2 > 0)
p(Rho) ~ 1
---------------------------------------------------
: 29
Number of studies: 73
Number of arms: 11
Number of treatmentsHPD(Lower) HPD(Upper)
Post.Mean Std.Dev -0.0071 0.0171 -0.0409 0.0230
bldlc 0.2332 0.2490 -0.2802 0.7027
bhdlc 0.0846 0.0366 0.0006 0.1460
btg -0.0068 0.1028 -0.2003 0.2020
age -7.2148 1.9867 -10.7112 -3.0757
white 1.2323 7.3305 -12.5660 15.6644
male -0.4505 0.3534 -1.1544 0.2184
bmi 6.8278 7.9203 -8.5359 22.4551
potencymed -0.6474 7.9330 -16.9216 14.4191
potencyhigh 0.1880 0.1879 -0.1638 0.5652
durat -24.3175 1.7820 -27.9255 -20.9635
A -30.3604 2.9591 -35.7474 -24.2297
AE -5.2737 6.1207 -17.4431 6.4066
E -11.6521 4.3747 -20.1106 -2.7431
L -26.9507 3.4730 -33.3236 -19.9223
LE -8.3357 4.5065 -16.9311 0.7870
P 1.6170 6.0629 -9.9366 13.6298
PBO -22.7722 3.6333 -29.6129 -15.7540
PE -17.7669 2.0010 -21.3989 -13.3839
R -19.1616 1.2175 -21.7647 -16.9279
S -21.8271 2.0509 -25.8031 -17.6347
SE 0.4088 0.2716 -0.1610 0.8882
phi1 -0.3248 0.3869 -1.1721 0.2902
phi2 0.2692 0.2239 -0.1859 0.6835
phi3 -1.0862 1.2024 -3.2686 0.9419
phi4 0.5973 0.3407 -0.0341 1.2829
phi5 ---------------------------------------------------
*HPD level: 0.95
```

The model comparison measures are computed using `model_comp`

. For
example,

```
> dic <- model_comp(fit_nma, "dic")
R> c(dic$dic, dic$Dev, dic$pD)
R1] 386.41450 334.72118 25.84666
[
> lpml <- model_comp(fit_nma, "lpml")
R> lpml$lpml
R1] -161.518 [
```

The `plot`

method will generate trace plots and density plots of the
fixed-effect coefficients. Figure 2 shows the
trace plots and density plots of treatments R, S, and SE from the
network meta-analysis model, generated by `plot(fit_nma)`

.

In addition to the MCMC diagnostics, network meta models can be
visualized using SUCRA plots, i.e. `plot(sucra(fit_nma))`

.
Figure 3 shows the SUCRA plot from the `fit_nma`

object. The `plot`

function for SUCRA uses
*ggplot2* (Wickham 2016)
and *gridExtra*
(Auguie 2017) to generate and combine plots.