CIMTx: An R package for causal inference with multiple treatments using observational data

CIMTx provides efficient and unified functions to implement modern methods for causal inferences with multiple treatments using observational data with a focus on binary outcomes. The methods include regression adjustment, inverse probability of treatment weighting, Bayesian additive regression trees, regression adjustment with multivariate spline of the generalized propensity score, vector matching and targeted maximum likelihood estimation. In addition, CIMTx illustrates ways in which users can simulate data adhering to the complex data structures in the multiple treatment setting. Furthermore, the CIMTx package offers a unique set of features to address the key causal assumptions: positivity and ignorability. For the positivity assumption, CIMTx demonstrates techniques to identify the common support region for retaining inferential units using inverse probability of treatment weighting, Bayesian additive regression trees and vector matching. To handle the ignorability assumption, CIMTx provides a flexible Monte Carlo sensitivity analysis approach to evaluate how causal conclusions would be altered in response to different magnitudes of departure from ignorable treatment assignment.


Introduction
Modern comparative effectiveness research (CER) questions often require comparing the effectiveness of multiple treatments on a binary outcome (Hu et al., 2020a). To answer these CER questions, specialized causal inference methods are needed. Methods appropriate for drawing causal inferences about multiple treatments include regression adjustment (RA) (Rubin, 1973; Linden et al., 2016), inverse probability of treatment weighting (IPTW) (Feng et al., 2012; McCaffrey et al., 2013), Bayesian Additive Regression Trees (BART) (Hill, 2011; Hu et al., 2021b, 2020a), regression adjustment with multivariate spline of the generalized propensity score (RAMS) (Hu and Gu, 2021), vector matching (VM) (Lopez and Gutman, 2017) and targeted maximum likelihood estimation (TMLE) (Rose and Normand, 2019). Drawing causal inferences using observational data, however, inevitably requires assumptions. A key causal identification assumption is the positivity or sufficient overlap assumption, which implies that there are no values of pre-treatment covariates that could occur only among units receiving one of the treatments (Hu et al., 2020a). Another key assumption requires appropriately conditioning on all pre-treatment variables that predict both treatment and outcome. The pre-treatment variables are known as confounders and this requirement is referred to as the ignorability assumption (also known as no unmeasured confounding) (Hu et al., 2022b). An important strategy to handle the positivity assumption is to identify a common support region for retaining inferential units. The ignorability assumption can be violated in observational studies, which can lead to biased treatment effect estimates. One widely recognized way to address such concerns is sensitivity analysis (Erik von Elm et al., 2007; Hu et al., 2022b).
The CIMTx package provides a suite of functions to easily implement the causal estimation methods, many of which were recently developed (Lopez and Gutman, 2017; Hu et al., 2020a; Hu and Gu, 2021). In addition, CIMTx provides strategies to define a common support region to address the positivity assumption using IPTW, BART and VM, and implements a flexible Monte Carlo sensitivity analysis approach (Hu et al., 2022b) for unmeasured confounding to address the ignorability assumption. Finally, CIMTx offers detailed examples of how to simulate data adhering to the complex structures in the multiple treatment setting. The simulated data can then be used by an analyst to compare the performance of different causal estimation methods. Table 1 summarizes key functionalities of CIMTx in comparison to recent R packages designed for causal inference with multiple treatments using observational data. CIMTx provides a comprehensive set of functionalities: from simulating data to estimating the causal effects to addressing causal assumptions and elucidating their ramifications. To assist applied researchers and practitioners who work with observational data and wish to draw inferences about the effects of multiple treatments, this article provides a comprehensive illustration of the CIMTx package.

Design factors for data simulation
CIMTx provides specific functions to simulate data possessing complex data characteristics of the multiple treatment setting. Seven design factors are considered: (1) sample size, (2) ratio of units across treatment groups, (3) whether the treatment assignment model and the outcome generating model are linear or nonlinear, (4) whether the covariates that best predict the treatment also predict the outcome well, (5) whether the response surfaces are parallel across treatment groups, (6) outcome prevalence, and (7) degree of covariate overlap.

Design factors (1)-(5)
For the data generating process of treatment assignment, consider a multinomial logistic regression model,

$$\ln \frac{P(W = w \mid X)}{P(W = T \mid X)} = \delta_w + X \xi^{L}_w + Q \xi^{NL}_w, \quad w = 1, \ldots, T-1, \tag{1}$$

where $Q$ denotes the nonlinear transformations and higher-order terms of the predictors $X$, $\xi^{L}_1, \ldots, \xi^{L}_{(T-1)}$ are vectors of coefficients for the untransformed versions of the predictors $X$ and $\xi^{NL}_1, \ldots, \xi^{NL}_{(T-1)}$ for the transformed versions of the predictors captured in $Q$. The intercepts $\delta_1, \ldots, \delta_{(T-1)}$ can be specified to create the corresponding ratio of units across $T$ treatment groups. The $T$ sets of potential response surfaces can be generated as follows:

$$E[Y(w) \mid X] = \operatorname{logit}^{-1}\{\tau_w + X \gamma^{L}_w + Q \gamma^{NL}_w\}, \quad w = 1, \ldots, T, \tag{2}$$

where the coefficient setting $\gamma^{L}_1 = \ldots = \gamma^{L}_T$, $\gamma^{NL}_1 = \ldots = \gamma^{NL}_T$ and $\tau_1 \neq \ldots \neq \tau_T$ corresponds to the parallel response surfaces, and by assigning different values to $\gamma^{L}_w$ and $\gamma^{NL}_w$ and setting $\tau_1 = \ldots = \tau_T = 0$, nonparallel response surfaces are generated, which imply treatment effect heterogeneity. Note that the predictors $X$ and the transformed versions of the predictors $Q$ in the treatment assignment model (1) can be different from those in the outcome generating model (2) to create various degrees of alignment. The observed outcomes are related to the potential outcomes through $Y_i = \sum_{w \in \{1, \ldots, T\}} Y_i(w) I(W_i = w)$. Covariates $X$ can be generated from user-specified data distributions.

The R Journal Vol. 14/3, September 2022 ISSN 2073-4859
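The data generating process for design factors (1)-(5) can be sketched in a few lines of base R. This is only an illustration of the idea, not the code behind data_sim(): the covariates, all coefficient values and the use of a logistic link for the outcome are assumptions chosen for the example.

```r
# Illustrative base-R sketch; every coefficient value below is an assumption.
set.seed(1)
n <- 500
n_trt <- 3

x <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))   # predictors X
q <- cbind(x1_sq = x[, "x1"]^2)                     # nonlinear term Q

# Treatment assignment: multinomial logit with treatment 3 as reference
delta <- c(0.2, -0.1)                               # intercepts
xi_l  <- cbind(c(0.4, -0.3), c(-0.2, 0.5))          # column w: X coefficients
xi_nl <- cbind(0.3, -0.3)                           # column w: Q coefficient
lin   <- sweep(x %*% xi_l + q %*% xi_nl, 2, delta, "+")
gps   <- cbind(exp(lin), 1) / (1 + rowSums(exp(lin)))   # n x 3 true GPS
w     <- apply(gps, 1, function(p) sample.int(n_trt, 1, prob = p))

# Parallel response surfaces: shared gamma, treatment-specific tau
tau      <- c(-1.5, 0, 1.5)
gamma_l  <- c(0.3, -0.2)
gamma_nl <- 0.2
shared   <- drop(x %*% gamma_l + q %*% gamma_nl)
p_y      <- sapply(tau, function(t) plogis(t + shared)) # P(Y(w) = 1 | X)
y_pot    <- matrix(rbinom(n * n_trt, 1, p_y), n, n_trt) # potential outcomes
y        <- y_pot[cbind(seq_len(n), w)]                 # observed outcome
```

Because the covariate part of the linear predictor is shared across treatments, the three response surfaces differ only through tau, which is what makes them parallel on the logit scale.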

Outcome prevalence
Values for parameters $\tau_1, \ldots, \tau_T$ in model (2) can be chosen to create various outcome prevalence rates. The outcomes are considered rare if the prevalence rate is < 5%.
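One way to pick an intercept that yields a target prevalence is a one-dimensional root search. The sketch below is a hypothetical helper (tau_for() is not part of CIMTx) and assumes a logistic outcome model with a made-up covariate contribution.

```r
# Hypothetical helper: find the intercept tau giving a target prevalence.
set.seed(2)
x <- rnorm(5000)
covariate_part <- 0.5 * x            # assumed covariate contribution

tau_for <- function(target) {
  uniroot(function(t) mean(plogis(t + covariate_part)) - target,
          interval = c(-20, 20), tol = 1e-8)$root
}

tau_rare <- tau_for(0.03)            # a rare outcome (< 5%)
prevalence <- mean(plogis(tau_rare + covariate_part))
```

The same search can be repeated per treatment group to obtain a full vector of intercepts.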

Covariate overlap
With observational data, it is important to investigate how the sparsity of covariate overlap impacts the estimation of causal effects. We can modify the formulation of the treatment assignment model (1) to adjust the sparsity of overlap by including a multiplier parameter $\psi$ (Hu et al., 2021a) as follows:

$$\ln \frac{P(W = w \mid X)}{P(W = T \mid X)} = \delta_w + \psi \left( X \xi^{L}_w + Q \xi^{NL}_w \right), \quad w = 1, \ldots, T-1, \tag{3}$$

where larger values of $\psi$ correspond to sparser overlap.
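The effect of the multiplier can be checked numerically. The following sketch assumes a single covariate and illustrative coefficients, and reports the smallest true GPS in the sample for several values of psi; values near zero signal sparse overlap.

```r
# Illustrative check of how psi changes overlap; coefficients are assumptions.
set.seed(3)
n <- 2000
x <- rnorm(n)
delta <- c(0, 0)
xi <- c(1, -1)

gps_psi <- function(psi) {
  lin <- cbind(delta[1] + psi * xi[1] * x,
               delta[2] + psi * xi[2] * x)
  cbind(exp(lin), 1) / (1 + rowSums(exp(lin)))   # treatment 3 is the reference
}

psi_grid <- c(0.1, 1, 5)
min_gps <- sapply(psi_grid, function(psi) min(gps_psi(psi)))
names(min_gps) <- paste0("psi=", psi_grid)
min_gps
```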

Implementation in CIMTx
We first demonstrate the data_sim() function in CIMTx, which simulates data in the multiple treatment setting using the seven design factors above. We simulate a dataset with the following characteristics: (1) sample size = 500, (2) ratio of units = 1:1:1 across three treatment groups, (3) nonlinear treatment assignment and outcome generating models, (4) different predictors for the treatment assignment and outcome generating mechanisms, (5) parallel response surfaces, (6) outcome prevalence = (0.16, 0.51, 0.75) in three treatment groups with an overall rate around 0.5 and (7) moderate covariate overlap. Note that for design factor (6), we can adjust tau to generate rare outcome events.

Methodology and implementation in CIMTx

Estimation of causal effects
Consider an observational study with $N$ individuals, indexed by $i = 1, \ldots, N$, drawn randomly from a target population. Each individual was exposed to one and only one treatment, indexed by $W$. The goal of this study is to estimate the causal effect of treatment $W$ on a binary outcome $Y$. There are a total of $T$ possible treatments, with $w \in \mathcal{W} = \{1, 2, \ldots, T\}$. Pre-treatment measured confounders are denoted by $X_i$. Under the potential outcomes framework (Rubin, 1974; Holland, 1986), individual $i$ has $T$ potential outcomes $\{Y_i(1), \ldots, Y_i(T)\}$, one under each treatment in $\mathcal{W}$. For each individual, at most one of the potential outcomes is observed: the one corresponding to the treatment to which the individual is exposed. All other potential outcomes are missing, which is known as the fundamental problem of causal inference (Holland, 1986). In general, three standard causal identification assumptions (Rubin, 1980; Hu et al., 2020a) need to be maintained in order to estimate the causal effects from observational data:
(A1) The stable unit treatment value assumption: there is no interference between units and there are no different versions of a treatment.
(A2) Positivity: the GPS for treatment assignment $e(X_i) = (P(W_i = 1 \mid X_i), \ldots, P(W_i = T \mid X_i))$ is bounded away from 0 and 1.
(A3) Ignorability: pre-treatment covariates $X_i$ are sufficiently predictive of both treatment assignment and outcome, $p(W_i \mid Y_i(1), \ldots, Y_i(T), X_i) = p(W_i \mid X_i)$.
The CIMTx package addresses assumption (A2) in the section "Identification of a common support region" and (A3) in the section "Sensitivity analysis for unmeasured confounding".
Causal effects can be estimated by summarizing functionals of individual-level potential outcomes. For dichotomous outcomes, causal estimands can be the risk difference (RD), odds ratio (OR) or relative risk (RR). For purposes of illustration, we define causal effects based on the RD. Let $s_1$ and $s_2$ be two subgroups of treatments such that $s_1, s_2 \subset \mathcal{W}$ and $s_1 \cap s_2 = \emptyset$, and define $|s_1|$ as the cardinality of $s_1$ and $|s_2|$ of $s_2$. Two commonly used causal estimands are the average treatment effect (ATE), $ATE_{s_1,s_2}$, and the average treatment effect on the treated (ATT), for example, among those receiving $s_1$, $ATT_{s_1|s_1,s_2}$. They are defined as:

$$ATE_{s_1,s_2} = E\left[\frac{1}{|s_1|}\sum_{w \in s_1} Y_i(w) - \frac{1}{|s_2|}\sum_{w' \in s_2} Y_i(w')\right],$$
$$ATT_{s_1|s_1,s_2} = E\left[\frac{1}{|s_1|}\sum_{w \in s_1} Y_i(w) - \frac{1}{|s_2|}\sum_{w' \in s_2} Y_i(w') \,\Big|\, W_i \in s_1\right]. \tag{4}$$

We now introduce six methods implemented in CIMTx for estimating the causal effects of multiple treatments: RA, IPTW, BART, RAMS, VM and TMLE.

Regression adjustment
Regression adjustment (Rubin, 1973; Linden et al., 2016), also known as model-based imputation (Imbens and Rubin, 2015), uses a regression model to impute missing potential outcomes: what would have happened to a specific individual had this individual received a treatment to which he or she was not exposed. RA regresses the outcomes on treatment and confounders,

$$f(w, X_i) = E[Y_i \mid W_i = w, X_i] = \operatorname{logit}^{-1}\{\beta_0 + \beta_1 w + \beta_2^{\top} X_i\}, \tag{5}$$

where $\beta_0$ is the intercept, $\beta_1$ is the coefficient for treatment and $\beta_2$ is a vector of coefficients for covariates $X_i$. From the fitted regression model (5), the missing potential outcomes for each individual are imputed using the observed data. The causal effects can be estimated by contrasting the imputed potential outcomes between treatment groups. CIMTx implements RA with the Bayesian logistic regression model via the bayesglm() function of the arm package. For the ATE effects, we first average the $L$ predictive posterior draws $\{f^{l}(w, X_i), l = 1, \ldots, L\}$ over the empirical distribution of $\{X_i\}_{i=1}^{N}$, and for the ATT effects using $s_1$ as the reference group, over the empirical distribution of $\{X_i\}_{i: W_i \in s_1}$. We then take the difference of the averaged values between two treatment groups $w \in s_1$ and $w' \in s_2$. Inferences about treatment effect can be obtained based on the $L$ posterior average treatment effects. The 95% credible interval is calculated using the 2.5th percentile and the 97.5th percentile of the posterior draws (Kruschke, 2014).
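The imputation logic of RA can be illustrated with a frequentist stand-in. The sketch below uses glm() rather than the bayesglm() fit used by CIMTx, enters treatment as a factor, and works on simulated data with made-up coefficients, so it yields point estimates only and no posterior draws.

```r
# RA sketch with glm() as a stand-in for bayesglm(); data are simulated.
set.seed(4)
n <- 600
x <- rnorm(n)
lin <- cbind(0.5 * x, -0.5 * x, 0)
gps <- exp(lin) / rowSums(exp(lin))
w <- apply(gps, 1, function(p) sample.int(3, 1, prob = p))
y <- rbinom(n, 1, plogis(c(-1, 0, 1)[w] + 0.8 * x))
dat <- data.frame(y = y, w = factor(w), x = x)

fit <- glm(y ~ w + x, family = binomial, data = dat)

# Impute the potential outcome under each treatment for every individual
impute <- function(level) {
  nd <- dat
  nd$w <- factor(rep(level, nrow(dat)), levels = levels(dat$w))
  predict(fit, newdata = nd, type = "response")
}
p_hat <- sapply(levels(dat$w), impute)           # n x 3 matrix

# Risk differences for treatment 1 versus treatment 2
ate_12 <- mean(p_hat[, "1"] - p_hat[, "2"])      # averaged over everyone
in_ref <- dat$w == "1"
att_12 <- mean(p_hat[in_ref, "1"] - p_hat[in_ref, "2"])  # over the treated
```

The only difference between the two estimands here is the set of individuals over which the imputed contrasts are averaged.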
Inverse probability of treatment weighting
The GPS used to construct the inverse probability weights can be estimated in three ways.
(i) The multinomial logistic regression model for treatment assignment is as follows:

$$P(W_i = w \mid X_i) = \frac{\exp(\alpha'_w X_i)}{1 + \sum_{w'=1}^{T-1} \exp(\alpha'_{w'} X_i)}, \tag{6}$$

where $\alpha'_w$ is a vector of coefficients for $X_i$ corresponding to treatment $w$, and can be estimated by using an iterative procedure such as generalized iterative scaling or iteratively reweighted least squares.
(ii) Generalized boosted models (GBM) use machine learning to flexibly model the relationships between treatment assignment and covariates. GBM grows a series of boosted classification trees to minimize an exponential loss function. This process is effective for fitting nonlinear treatment models characterized by curves and interactions. The procedure of estimating the GPS can be tuned to find the GPS model producing the best covariate balance between treatment groups.
(iii) Super learner is an algorithm that creates the optimally weighted average of several machine learning models. The machine learning models can be specified via the SL.library argument of the SuperLearner package. This approach has been proven to be asymptotically as accurate as the best possible prediction algorithm that is included in the library (Van der Laan et al., 2007).
IPTW can be implemented in CIMTx by setting a specific method and estimand. For IPTW estimators, variance can be estimated via a robust sandwich-type variance estimator or a bootstrap variance estimator. In practice, a bootstrap variance estimator is often recommended (Austin, 2016).
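For intuition, an IPTW estimate of the ATE can be assembled by hand. The sketch below is an assumption-laden illustration rather than the CIMTx implementation: it fits the GPS with nnet::multinom(), weights each individual by the inverse of the estimated probability of the treatment actually received, and uses normalized weighted means on simulated data.

```r
# IPTW sketch with a multinomial logistic GPS model; data are simulated.
library(nnet)   # recommended package shipped with R

set.seed(5)
n <- 1000
x <- rnorm(n)
lin <- cbind(0.5 * x, -0.5 * x, 0)
gps <- exp(lin) / rowSums(exp(lin))
w <- apply(gps, 1, function(p) sample.int(3, 1, prob = p))
y <- rbinom(n, 1, plogis(c(-1, 0, 1)[w] + 0.8 * x))
dat <- data.frame(y = y, w = factor(w), x = x)

gps_fit <- multinom(w ~ x, data = dat, trace = FALSE)
gps_hat <- predict(gps_fit, newdata = dat, type = "probs")  # n x 3

# Weight = 1 / estimated probability of the treatment actually received
ipw <- 1 / gps_hat[cbind(seq_len(n), w)]

# Normalized weighted mean outcome in each treatment group
mu_hat <- sapply(1:3, function(k) weighted.mean(y[w == k], ipw[w == k]))
ate_rd_12 <- mu_hat[1] - mu_hat[2]
```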
The following shows the code to estimate ATE using IPTW with weights estimated by multinomial logistic regression.

iptw_multi_res <- ce_estimate(y = data$y, x = data$covariates, w = data$w,
                              method = "IPTW-Multinomial", estimand = "ATE")

We can estimate the ATE effects with weights estimated by super learner and GBM by changing the method argument to "IPTW-SL" and "IPTW-GBM", respectively. We can then estimate the causal effects and bootstrap confidence intervals by setting boot = TRUE.

iptw_sl_trim_ate_res <- ce_estimate(y = data$y, x = data$covariates, w = data$w,
                                    method = "IPTW-SL", estimand = "ATE",
                                    sl_library = c("SL.glm", "SL.glmnet", "SL.rpart"),
                                    trim_perc = c(0. ...

Bayesian additive regression trees
BART (Chipman et al., 2010) is a likelihood-based machine learning model and has been adapted into causal inference settings in recent years (Hill, 2011; Hu et al., 2020a; Hu and Gu, 2021; Hu et al., 2021a,c). For a binary outcome, BART uses the probit regression

$$f(w, X_i) = E[Y_i \mid W_i = w, X_i] = \Phi\left(\sum_{j=1}^{J} g_j(w, X_i; T_j, M_j)\right), \tag{7}$$

where $\Phi$ is the standard normal cumulative distribution function, $(T_j, M_j)$ indexes a single subtree model in which $T_j$ denotes the regression tree and $M_j$ is a set of parameter values associated with the terminal nodes of the $j$th regression tree, $g_j(w, X_i; T_j, M_j)$ represents the mean assigned to the node in the $j$th regression tree associated with covariate value $X_i$ and treatment level $w$, and the number of regression trees $J$ is considered to be fixed and known. BART uses regularizing priors for $(T_j, M_j)$ to keep the impact of each tree small. Although the prior distributions can be specified via the ce_estimate() function of CIMTx, the default priors tend to work well and require little modification in many situations (Hill, 2011; Hu et al., 2020a,b). The details of prior specification and the Bayesian backfitting algorithm for posterior sampling can be found in Chipman et al. (2010). The posterior inferences about the treatment effects can be drawn in a similar way as described in the Regression adjustment section.
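Posterior inference of this kind reduces to summarizing draws. The sketch below uses Beta random numbers as stand-ins for posterior draws of the sample-averaged potential outcome under two treatments (no BART model is fitted) and computes a posterior mean, posterior standard deviation and 2.5th/97.5th percentiles for the RD, RR and OR. Treating the standard deviation of the draws as the reported SE is an assumption of this example.

```r
# Summarizing stand-in posterior draws; no model is fitted here.
set.seed(6)
n_draw <- 1000
p1 <- rbeta(n_draw, 30, 70)   # draws of the averaged outcome, treatment 1
p2 <- rbeta(n_draw, 60, 40)   # draws of the averaged outcome, treatment 2

summarise_draws <- function(d) {
  c(EST   = mean(d),
    SE    = sd(d),
    LOWER = unname(quantile(d, 0.025)),
    UPPER = unname(quantile(d, 0.975)))
}

res <- rbind(
  RD = summarise_draws(p1 - p2),
  RR = summarise_draws(p1 / p2),
  OR = summarise_draws((p1 / (1 - p1)) / (p2 / (1 - p2)))
)
round(res, 2)
```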
bart_res <- ce_estimate(y = data$y, x = data$covariates, w = data$w,
                        method = "BART", estimand = "ATT", ndpost = 100,
                        reference_trt = 1)
summary(bart_res)
#> $ATT12
#>      EST   SE LOWER UPPER
#> RD -0.38 0.07 -0.51 -0.25
#> RR  0.47 0.08  0.31  0.61
#> OR  0.21 0.07  0.10  0.35
#>
#> $ATT13
#>      EST   SE LOWER UPPER
#> RD -0.56 0.07 -0.69 -0.43
#> RR  0.38 0.07  0.24  0.50
#> OR  0.06 0.03  0.02  0.13

Regression adjustment with multivariate spline of GPS
For a binary outcome, the number of outcome events can be small. The estimation of causal effects is challenging with rare outcomes because the great majority of units contribute no information to explaining the variability attributable to the differential treatment regimens in the health outcomes (Hu and Gu, 2021). Franklin et al. (2017) found that regression adjustment on propensity score using one nonlinear spline performed best with respect to bias and root-mean-squared-error in estimating treatment effects. Hu and Gu (2021) proposed RAMS, which accommodates multiple treatments by using a nonlinear spline model for the outcome that is additive in the treatment and a multivariate spline function of the GPS, as follows:

$$f(w, X_i) = E[Y_i \mid W_i, X_i] = \operatorname{logit}^{-1}\{\beta W_i + h(e(X_i), \phi)\}, \tag{8}$$

where $h(\cdot)$ is a spline function of the GPS indexed by $\phi$ and $\beta = [\beta_1, \ldots, \beta_T]^{\top}$ are regression coefficients associated with the treatment $W_i$. The dimension of the spline function $h(\cdot)$ depends on the number of treatments $T$. Confidence intervals of treatment effect estimates can be obtained using nonparametric bootstrap for RAMS (Hu and Gu, 2021).
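A rough sketch of the RAMS idea with three treatments follows. It is not the CIMTx code: the data are simulated with made-up coefficients, the GPS is estimated with nnet::multinom(), and the outcome model is an mgcv gam() with a tensor product smooth te() of two GPS components (the third is determined by the other two).

```r
# RAMS-style sketch; simulated data and model choices are assumptions.
library(nnet)   # multinom() for the GPS model
library(mgcv)   # gam() and te()

set.seed(7)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
lin <- cbind(0.6 * x1, 0.6 * x2, 0)
gps <- exp(lin) / rowSums(exp(lin))
w <- apply(gps, 1, function(p) sample.int(3, 1, prob = p))
y <- rbinom(n, 1, plogis(c(-1, 0, 1)[w] + 0.5 * x1 - 0.5 * x2))
dat <- data.frame(y = y, w = factor(w), x1 = x1, x2 = x2)

gps_fit <- multinom(w ~ x1 + x2, data = dat, trace = FALSE)
gps_hat <- predict(gps_fit, newdata = dat, type = "probs")
dat$e1 <- gps_hat[, 1]
dat$e2 <- gps_hat[, 2]

# Outcome model: additive in treatment and a bivariate spline of the GPS
fit <- gam(y ~ w + te(e1, e2), family = binomial, data = dat)

# Average the predicted outcome under each treatment, then contrast
predict_under <- function(level) {
  nd <- dat
  nd$w <- factor(rep(level, n), levels = levels(dat$w))
  mean(predict(fit, newdata = nd, type = "response"))
}
mu_hat <- sapply(levels(dat$w), predict_under)
ate_rd_12 <- unname(mu_hat["1"] - mu_hat["2"])
```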
In CIMTx, RAMS is implemented using the gam() function with tensor product smoother te() between treatments from the mgcv package. Treatment effects can then be estimated by averaging and contrasting the predicted $f(w, X_i)$ between treatment groups. RAMS can be called by setting method = "RAMS-Multinomial" and specifying the estimand as estimand = "ATE" or estimand = "ATT".

rams_multi_res <- ce_estimate(y = data$y, x = data$covariates, w = data$w,
                              method = "RAMS-Multinomial", estimand = "ATE",
                              boot = TRUE, nboots = 100, verbose_boot = FALSE)

Vector matching
Lopez and Gutman (2017) proposed the VM algorithm, which matches individuals with similar vectors of the GPS. VM obtains matched sets using a combination of k-means clustering and one-to-one matching with replacement within each cluster stratum. Currently, VM is only designed to estimate the ATT effects. In CIMTx, VM is implemented via method = "VM". CIMTx does not provide confidence intervals for treatment effect estimates because the authors of this method, Lopez and Gutman (2017), did not provide an approach to estimate the sampling variance of the VM estimator.

Targeted maximum likelihood estimation
TMLE is a doubly robust approach that combines outcome estimation, IPTW estimation, and a targeting step to optimize the parameter of interest with respect to bias and variance. Rose and Normand (2019) implemented TMLE to estimate the ATE effects of multiple treatments. CIMTx calls the R package tmle to implement TMLE for the ATE effects. As suggested by Rose and Normand (2019), nonparametric bootstrap is used in CIMTx to obtain the confidence interval of the treatment effect estimate.
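The nonparametric bootstrap itself is method-agnostic. The sketch below wraps a simple estimator in a percentile bootstrap; rd_12() is a hypothetical stand-in (a logistic regression adjustment on simulated data), not TMLE, and 200 resamples is an arbitrary choice.

```r
# Percentile bootstrap sketch; rd_12() is a stand-in estimator, not TMLE.
set.seed(8)
n <- 600
x <- rnorm(n)
w <- sample(1:3, n, replace = TRUE)
y <- rbinom(n, 1, plogis(c(-1, 0, 1)[w] + 0.5 * x))
dat <- data.frame(y = y, w = factor(w), x = x)

rd_12 <- function(d) {
  fit <- glm(y ~ w + x, family = binomial, data = d)
  nd1 <- nd2 <- d
  nd1$w <- factor(rep("1", nrow(d)), levels = levels(d$w))
  nd2$w <- factor(rep("2", nrow(d)), levels = levels(d$w))
  mean(predict(fit, newdata = nd1, type = "response") -
       predict(fit, newdata = nd2, type = "response"))
}

point_est <- rd_12(dat)
boot_est <- replicate(200, rd_12(dat[sample.int(n, replace = TRUE), ]))
ci <- quantile(boot_est, c(0.025, 0.975))
```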

Identification of a common support region
We now turn to the causal identification assumptions. If the positivity assumption (A2) is violated, problems can arise when extrapolating over the areas of the covariate space where common support does not exist. It is important to define a common support region to which the causal conclusions can be generalized. In CIMTx, the identification of a common support region is offered in three methods: IPTW, VM and BART.
For IPTW, one strategy is weight truncation, by which extreme weights that fall outside a specified range limit of the weight distribution are set to the range limit. This functionality is offered in CIMTx via the trim_perc argument, which takes two values: one for the lower and one for the upper percentile of the weight distribution for trimming. Figure 3 shows the distributions of the weights estimated by the three methods before and after weight trimming at the 5% and 95% of the weight distribution.
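Weight truncation amounts to clamping the weights at two quantiles. The helper below, trim_weights(), is hypothetical and only mirrors the description above; whether CIMTx computes the percentiles over all individuals or within treatment groups is not stated here, and this sketch uses the pooled distribution.

```r
# Hypothetical helper: clamp weights at the given lower/upper percentiles.
trim_weights <- function(wt, trim_perc = c(0.05, 0.95)) {
  lim <- quantile(wt, probs = trim_perc)
  pmin(pmax(wt, lim[1]), lim[2])
}

set.seed(9)
wt <- 1 / runif(1000, 0.02, 1)     # stand-in inverse probability weights
wt_trim <- trim_weights(wt)

range(wt)
range(wt_trim)
```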
plot(iptw_multi_res, iptw_sl_res, iptw_gbm_res,
     iptw_multi_trim_res, iptw_sl_trim_res, iptw_gbm_trim_res)

For VM, Lopez and Gutman (2017) proposed a rectangular support region defined by the maximum value of the smallest GPS and the minimum value of the largest GPS among the treatment groups. Individuals that fall outside the region are discarded from the causal analysis. This feature is automatically implemented with "VM" in CIMTx.
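The rectangular region can be computed directly from a matrix of GPS values. In the sketch below, common_support() is a hypothetical helper written for this illustration: for each GPS component it takes the largest of the group-wise minima as the lower bound and the smallest of the group-wise maxima as the upper bound, then keeps individuals whose whole GPS vector lies inside the box. The true GPS of simulated data is used in place of an estimate.

```r
# Hypothetical helper for the rectangular common support region.
common_support <- function(gps, w) {
  lower <- apply(gps, 2, function(e) max(tapply(e, w, min)))
  upper <- apply(gps, 2, function(e) min(tapply(e, w, max)))
  inside <- sweep(gps, 2, lower, ">=") & sweep(gps, 2, upper, "<=")
  list(lower = lower, upper = upper, keep = rowSums(inside) == ncol(gps))
}

set.seed(11)
n <- 900
x <- rnorm(n)
lin <- cbind(x, -x, 0)
gps <- exp(lin) / rowSums(exp(lin))
w <- apply(gps, 1, function(p) sample.int(3, 1, prob = p))

cs <- common_support(gps, w)
table(kept = cs$keep, treatment = w)
```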
For BART, Hu et al. (2020a) proposed a strategy to identify a common support region for retaining inferential units, which is to discard individuals with a large variability in their predicted potential outcomes. Specifically, for the ATT effects, any individual $i$ with $W_i = w$ will be discarded if

$$s_i^{f_{w'}} > \max_j \{ s_j^{f_w} \}, \quad \forall j : W_j = w, \; w' \neq w \in \mathcal{W}, \tag{9}$$

where $s_j^{f_w}$ and $s_i^{f_{w'}}$ denote the standard deviations of the posterior distributions of the potential outcomes under treatment $W = w$ for individual $j$ and under $W = w'$ for individual $i$, respectively. For the ATE effects, the discarding rule in equation (9) is applied to each treatment group. Users can implement the discarding rule by setting the discard argument in CIMTx. Using $ATT_{1|1,2}$ as an example, 5 individuals in the reference group $w = 1$ (reported in bart_dis_res$n_discard) were discarded from the simulated data.
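One reading of this discarding rule for the ATT can be written as a small function. The helper discard_att() is hypothetical, takes a matrix of posterior standard deviations (one row per individual, one column per treatment) that would normally come from a fitted BART model, and flags a reference-group individual when the standard deviation under any other treatment exceeds the largest factual standard deviation in the reference group. The "any other treatment" interpretation is an assumption of this sketch.

```r
# Hypothetical helper: flag reference-group individuals to discard.
discard_att <- function(sd_post, w, reference = 1) {
  in_ref <- w == reference
  threshold <- max(sd_post[in_ref, reference])       # largest factual SD
  others <- setdiff(seq_len(ncol(sd_post)), reference)
  exceeds <- rowSums(sd_post[, others, drop = FALSE] > threshold) > 0
  in_ref & exceeds
}

# Toy posterior SDs: rows are individuals, columns are treatments 1 to 3
sd_post <- rbind(c(0.10, 0.12, 0.11),
                 c(0.20, 0.15, 0.35),
                 c(0.15, 0.18, 0.19),
                 c(0.30, 0.05, 0.05))
w <- c(1, 1, 1, 2)
discarded <- discard_att(sd_post, w, reference = 1)
```

In this toy example only the second individual is flagged, because its counterfactual standard deviation of 0.35 exceeds the threshold of 0.20.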

Sensitivity analysis for unmeasured confounding
The violation of the ignorability assumption (A3) can lead to biased treatment effect estimates. Sensitivity analysis is useful in gauging how much the causal conclusions will be altered in response to different magnitudes of departure from the ignorability assumption. CIMTx implements a new flexible sensitivity analysis approach developed by Hu et al. (2022b). This approach first defines a confounding function for any pair of treatments $(w, w')$ as

$$c(w, w', x) = E[Y(w) \mid W = w, X = x] - E[Y(w) \mid W = w', X = x]. \tag{10}$$

The confounding function, also viewed as the sensitivity parameter in a sensitivity analysis, directly represents the difference in the mean potential outcomes $Y(w)$ between those treated with $W = w$ and those treated with $W = w'$, who have the same level of $x$. If the ignorability assumption holds, the confounding function will be zero for all $w \in \mathcal{W}$. When treatment assignment is not ignorable, unmeasured confounding is present and the causal effect estimates using measured $X$ will be biased. Hu et al. (2022b) derived the form of the resultant bias. Table 2 demonstrates the plausible assumptions about the confounding functions and their interpretations. There are three ways in which we can specify the prior for the confounding functions: (i) point mass prior; (ii) re-analysis over a range of point mass priors (tipping point); (iii) full prior with uncertainty specified. Since the new sensitivity analysis approach was developed within the Bayesian framework, strategy (iii) offers an advantage of incorporating the statistical uncertainty due to sampling and the uncertainty about the values of the sensitivity parameters. In strategy (i), a fixed value is assumed for the sensitivity parameter. Strategy (ii) expands on strategy (i) and examines how the causal conclusion would change when a range of values are assumed for the sensitivity parameter. We will demonstrate all three cases of prior specifications with the sa() function in the CIMTx package. Hu et al. (2022b) further discussed (a) strategies to specify the confounding functions that represent our prior beliefs about the degrees of unmeasured confounding via the remaining variability in the outcomes unexplained by measured $X$ (Hogan et al., 2014); and (b) ways in which the causal effects can be estimated adjusting for the presumed degree of unmeasured confounding.
Table 2: Interpretation of assumed priors on c(w, w′, x) and c(w′, w, x) for causal estimands based on the risk difference, assuming the outcome is an adverse event.
c(w, w′, x) | c(w′, w, x) | Interpretation and implications of the assumptions
> 0 | < 0 | Unhealthier individuals are treated with w.
< 0 | > 0 | Contrary to the above interpretation, unhealthier individuals are treated with w′.
< 0 | < 0 | The observed treatment allocation between w′ and w is beneficial relative to the alternative which reverses treatment assignment for everyone.
> 0 | > 0 | Contrary to the above interpretation, the observed treatment allocation between w′ and w is undesirable relative to the alternative which reverses treatment assignment for everyone.
The proposed sensitivity analysis algorithm proceeds with the following steps (Hu et al., 2022b):
1. Fit a multinomial probit BART model (Kindo et al., 2016) to estimate the GPS for each individual.
2. for w ← 1 to T do
     Draw $M_1$ values of the GPS from the posterior predictive distribution of the fitted model for each individual.
     for m ← 1 to $M_1$ do
       Draw $M_2$ values $\eta^{*}_{lm1}, \ldots, \eta^{*}_{lmM_2}$ from the prior distribution of each of the confounding functions $c(w, l, x)$, for each $l \neq w \wedge l \in \mathcal{W}$.
     end for
   end for
3. Compute the adjusted outcomes $Y^{CF}$.
4. Fit a BART model to each of $M_1 \times M_2$ sets of observed data with the adjusted outcomes $Y^{CF}$.
5. Estimate the combined adjusted causal effects and uncertainty intervals by pooling posterior samples across model fits arising from the $M_1 \times M_2$ data sets.
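The nested Monte Carlo structure of steps 2 and 3 can be sketched as follows. Everything here is a stand-in: the GPS draws are random rather than posterior draws from a fitted model, the confounding function values are drawn from an arbitrary uniform prior, and the adjustment itself (subtracting the GPS-weighted sum of confounding function values over the treatments not received) is an assumed form that should be checked against Hu et al. (2022b). The helper adjust_outcome() is hypothetical.

```r
# Assumed adjustment: y minus the GPS-weighted confounding function values.
# c_fun[w, l] holds an assumed value of the confounding function c(w, l, x).
adjust_outcome <- function(y, w, gps, c_fun) {
  n_trt <- ncol(gps)
  correction <- vapply(seq_along(y), function(i) {
    others <- setdiff(seq_len(n_trt), w[i])
    sum(gps[i, others] * c_fun[w[i], others])
  }, numeric(1))
  y - correction
}

set.seed(10)
n <- 6
m1 <- 2                                  # number of GPS draws
m2 <- 3                                  # number of confounding function draws
w <- c(1, 2, 3, 1, 2, 3)
y <- c(1, 0, 1, 0, 1, 1)

y_cf <- array(NA_real_, dim = c(m1, m2, n))
for (m in seq_len(m1)) {
  raw <- matrix(rexp(n * 3), n, 3)
  gps_draw <- raw / rowSums(raw)         # stand-in for one posterior GPS draw
  for (k in seq_len(m2)) {
    c_draw <- matrix(runif(9, -0.1, 0.1), 3, 3)   # stand-in prior draw
    y_cf[m, k, ] <- adjust_outcome(y, w, gps_draw, c_draw)
  }
}
```

Each slice y_cf[m, k, ] is one adjusted data set; steps 4 and 5 would fit an outcome model to each slice and pool the resulting posterior samples.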
We now demonstrate the Monte Carlo sensitivity analysis approach for unmeasured confounding (Hu et al., 2022b). We first simulate a small dataset in a simple causal inference setting. There are two binary confounders: $X_1$ is measured and $X_2$ is unmeasured.
The output of the posterior GPS is a three-dimensional array. The first dimension is the number of posterior draws for the GPS ($M_1$), the second is the number of treatments, and the third is the total sample size.
3. Specify the prior distributions and the number of draws ($M_2$) for the confounding functions $c(w, w', x)$. In this illustrative simulation example, we use the true values of the confounding functions within each stratum of $x_1$. This represents strategy (i), the point mass prior.
The true values of the confounding functions within each stratum of $x_1$ can be calculated using the helper true_c_fun_cal() in our package.

true_c_fun <- true_c_fun_cal(x = x1, w = w)

4. Calculate the confounding function adjusted outcomes with the drawn values of GPS and confounding functions.
5. Use the adjusted outcomes to estimate the causal effects.
The sa() function implements the sensitivity analysis approach while fitting the $M_1 \times M_2$ models using parallel computation.

Discussion
We contribute a comprehensive R package CIMTx suitable for causal analysis of observational data with multiple treatments and a binary outcome. In this package, we introduce six methods for the estimation of causal effects, including both the classical approaches and machine learning based methods. Drawing causal inference from non-experimental data inevitably involves structural causal assumptions. CIMTx offers a unique set of features to address two key assumptions: positivity and ignorability, using appropriate estimation procedures. Additionally, the CIMTx package provides guidance to readers on how to simulate data possessing the data characteristics in the multiple treatment setting. Detailed step-by-step examples are provided to demonstrate all methods. The current version of the CIMTx package focuses on binary outcomes. For future research, developing methods and R packages for causal inferences with more complex outcomes such as censored survival outcomes (Hu et al., 2022a) could be a worthwhile contribution.

Figure 1: Moderate overlap with psi = 1. Each panel presents boxplots by treatment group of the true generalized propensity score for one of the treatments, P(W i = w | X = x), for every unit in the sample. The left-hand panel presents treatment 1 (W = 1), the middle panel presents treatment 2 (W = 2), and the right-hand panel presents treatment 3 (W = 3).

Figure 2: Strong overlap with psi = 0.1. Each panel presents boxplots by treatment group of the true generalized propensity score for one of the treatments for every unit in the sample.

Figure 3: Distributions of the inverse probability of treatment weights estimated by multinomial logistic regression, super learner and generalized boosted models. Panel (a) shows results before weight trimming. Panel (b) displays results after trimming the weights at 5% and 95% of the distribution. Super learner and the generalized boosted models produced less extreme weights compared to multinomial logistic regression.

Table 1: Comparisons of R packages for causal inference.