Propensity score weighting is an important tool for comparative effectiveness research. Besides the inverse probability of treatment weights (IPW), recent development has introduced a general class of balancing weights, corresponding to alternative target populations and estimands. In particular, the overlap weights (OW) lead to optimal covariate balance and estimation efficiency, and a target population of scientific and policy interest. We develop the R package PSweight to provide a comprehensive design and analysis platform for causal inference based on propensity score weighting. PSweight supports (i) a variety of balancing weights, (ii) binary and multiple treatments, (iii) simple and augmented weighting estimators, (iv) nuisance-adjusted sandwich variances, and (v) ratio estimands. PSweight also provides diagnostic tables and graphs for covariate balance assessment. We demonstrate the functionality of the package using a data example from the National Child Development Survey (NCDS), where we evaluate the causal effect of educational attainment on income.
Propensity score is one of the most widely used causal inference methods for observational studies (Rosenbaum and Rubin 1983). Propensity score methods include weighting, matching, stratification, regression, and mixed methods such as the augmented weighting estimators. The PSweight package provides a design and analysis pipeline for causal inference with propensity score weighting (Robins et al. 1994; Hirano et al. 2003; Lunceford and Davidian 2004; Li et al. 2018). There are a number of existing R packages on propensity score weighting (see Table 1). Comparing to those, PSweight offers three major advantages: it incorporates (i) visualization and diagnostic tools of checking covariate overlap and balance, (ii) a general class of balancing weights, including overlap weights and inverse probability of treatment weights, and (iii) multiple treatments. More importantly, PSweight comprises a wide range of functionalities, whereas each of the competing packages only supports a subset of these functionalities. As such, PSweight is currently the most comprehensive platform for causal inference with propensity score weighting, offering analysts a one-stop shop for the design and analysis. Table 1 summarizes the key functionalities of PSweight in comparison to related existing R packages. We elaborate the main features of PSweight below.
PSweight facilitates better practices in the design stage of observational studies, an aspect that has not been sufficiently emphasized in related packages. Specifically, we provide a design module that facilitates visualizing overlap (also known as the positivity assumption) and evaluating covariate balance without access to the final outcome (Austin and Stuart 2015). When there is limited overlap, PSweight allows for symmetric propensity score trimming (Crump et al. 2009; Yoshida et al. 2018) and optimal trimming (Crump et al. 2009; Yang et al. 2016) to improve the internal validity. We extend the class of balance metrics suggested in Austin and Stuart (2015) and Li et al. (2019) for binary treatments, and those in McCaffrey et al. (2013) and Li and Li (2019) for multiple treatments. In addition, the design module helps describe the weighted target population by providing the information required in the standard “Table 1” of a clinical article.
In addition to the standard inverse probability of treatment weights (IPW), PSweight implements the average treatment effect among the treated (Treated) weights, overlap weights (OW), matching weights (MW) and entropy weights (EW) for both binary (Li and Greene 2013; Li et al. 2018; Mao et al. 2018; Zhou et al. 2020) and multiple treatments (Yoshida et al. 2017; Li and Li 2019). All weights are members of the family of balancing weights (Li et al. 2018); the last three types of weights target at the subpopulation with improved overlap in the covariates between (or across) treatment groups, similar to the target population in randomized controlled trials (Thomas et al. 2020a,b). Among them, OW achieves optimal balance and estimation efficiency (Li et al. 2018, 2019). We also implement the augmented weighting estimators corresponding to each of the above weighting schemes (Mao et al. 2018). By default, PSweight employs parametric regression models to estimate propensity scores and potential outcomes. Nonetheless, it also allows for propensity scores to be estimated by external machine learning methods including generalized boosted regression models (McCaffrey et al. 2013) and super learner (Van der Laan et al. 2007), or importing any other propensity or outcome model estimates of interest.
Multiple | Balance | IPW/ATT | OW/other | Ratio | Augmented | Nuisance-adj | Optimal | |
treatments | diagnostics | weights | weights | estimands | weighting | variance | trimming | |
PSweight | ||||||||
twang | ||||||||
CBPS | ||||||||
PSW | ||||||||
optweight | ||||||||
ATE | ||||||||
WeightIt | ||||||||
causalweight | ||||||||
sbw |
References: twang (Version 1.6): Ridgeway et al. (2020); CBPS (Version 0.21): Fong et al. (2019); PSW (Version 1.1-3): Mao and Li (2018); optweight (Version 0.2.5): Greifer (2019); ATE (Version 0.2.0): Haris and Chan (2015); WeightIt (Version 0.10.2): Greifer (2020); causalweight (Version 0.2.1): Bodory and Huber (2020); sbw (Version 1.1.1): Zubizarreta and Li (2020).
To our knowledge, PSweight is the first R package to accommodate a variety of balancing weighting schemes with multiple treatments. Existing R packages such as twang (Ridgeway et al. 2020), CBPS (Fong et al. 2019), optweight (Greifer 2019) have also implemented weighting-based estimation with multiple treatments, but focus on IPW. The PSW R package (Mao and Li 2018) implements both OW and MW and allows for nuisance-adjusted variance estimation, but it is only for binary treatments. To better assist applied researchers to perform propensity score weighting analysis, this article provides a full introduction of the PSweight package. In what follows, we explain the methodological foundation of PSweight and outline the main functions and their arguments. We further illustrate the use of these functions with a data example that studies the causal effect of educational attainment on income, and finally conclude with a short discussion.
Before diving into the implementation details of PSweight, we briefly introduce the basics of the propensity score weighting framework.
Assume we have an observational study with
Causal effects are contrasts of the potential outcomes of the same units
in a target population, which usually is the population of a
scientific interest (Thomas et al. 2020b). PSweight incorporates a
general class of weighted average treatment effect (WATE) estimands.
Specifically, assume the observed sample is drawn from a probability
density
For a given tilting function
IPW has been the dominant weighting method in the literature, but has a well-known shortcoming of being sensitive to extreme propensity scores, which induces bias and large variance in estimating treatment effects. OW addresses the conceptual and operational problems of IPW. Among all balancing weights, OW leads to the smallest asymptotic (and often finite-sample) variance of the weighting estimator (2). (Li et al. 2018, 2019). Recent simulations also show that OW provides more stable causal estimates under limited overlap (Yoshida et al. 2017; Mao et al. 2018; Yoshida et al. 2018; Li et al. 2019), and is more robust to misspecification of the propensity score model (Zhou et al. 2020).
PSweight implements two additional types of balancing weights: matching weights (MW) (Li and Greene 2013), and entropy weights (EW) (Zhou et al. 2020). Similar to OW, MW and EW focus on target populations with substantial overlap between treatment groups. Though having similar operating characteristics, MW and EW do not possess the same theoretical optimality as OW, and are less used in practice. Therefore, we will not separately describe MW and EW hereafter.
Target population | Tilting function |
Estimand | Balancing weights |
---|---|---|---|
Combined | 1 | ATE | |
Treated | ATT | ||
Overlap | ATO | ||
Matching | ATM | ||
Entropy | ATEN |
Notes:
In observational studies, propensity scores are generally unknown and
need to be estimated. Therefore, propensity score analysis usually
involves two steps: (1) estimating the propensity scores, and (2)
estimating the causal effects based on the estimated propensity scores.
In PSweight, the default model for estimating propensity scores with
binary treatments is a logistic regression model. Spline or polynomial
models can be easily incorporated by adding bs()
, ns()
or poly()
terms into the model formula. PSweight also allows for importing
propensity scores estimated from external routines, such as boosted
models or super learner.
Goodness-of-fit of the propensity score model is usually assessed based
on the resulting covariate balance. In the context of propensity score
weighting, this is measured based on either the absolute standardized
difference (ASD):
(Li and Li 2019) extend the framework of balancing weights to
multiple treatments. Assume that we have
To define the target estimand, let
With multiple treatments, the tilting function
Among all the weights, OW minimizes the total asymptotic variances of all pairwise comparisons, and has been shown to have the best finite-sample efficiency in estimating pairwise WATEs (Li and Li 2019). Table 3 summarizes the target population, tilting function and balancing weight for multiple treatments that are available in PSweight.
Target population | Tilting function |
Balancing weights |
---|---|---|
Combined | 1 | |
Treated ( |
||
Overlap | ||
Matching | ||
Entropy |
To estimate the generalized propensity scores for multiple treatments, the default model in PSweight is a multinomial logistic model. PSweight also allows for externally estimated generalized propensity scores. Goodness-of-fit of the generalized propensity score model is assessed by the resulting covariate balance, which is measured by the pairwise versions of the ASD and PSD. The detailed formula of these metrics can be found in (Li and Li 2019). A common threshold for balance is that the maximum pairwise ASD or maximum PSD is below 0.1.
Propensity score trimming excludes units with estimated (generalized)
propensity scores close to zero (or one). It is a popular approach to
address the extreme weights problem of IPW. PSweight implements the
symmetric trimming rules in Crump et al. (2009) and Yoshida et al. (2018). Operationally,
we allow users to specify a single cutoff
PSweight also implements augmented weighting estimators, which augment
a weighting estimator by an outcome regression and improves the
efficiency. With IPW, the augmented weighting estimator is known as the
doubly-robust estimator
(Lunceford and Davidian 2004; Bang and Robins 2005; Funk et al. 2011). With binary
treatments, the augmented estimator with general balancing weights are
discussed Hirano et al. (2003) and Mao et al. (2018). Below, we briefly outline the form
of this estimator with multiple treatments. Recall the conditional mean
of
With a pre-specified tilting function, the augmented weighting estimator
for group
With binary and count outcomes, ratio causal estimands are often of
interest. Using notation from the multiple treatments as an example,
once we use weighting to obtain estimates for the set of average
potential outcomes
PSweight by default implements the empirical sandwich variance for
propensity score weighting estimators
(Lunceford and Davidian 2004; Mao et al. 2018; Li et al. 2019) based on the M-estimation
theory (Stefanski and Boos 2002). The variance adjusted for the uncertainty in
estimating the propensity score and outcome models, and are sometime
referred to as the nuisance-adjusted sandwich variance. Below we
illustrate the main steps with multiple treatments and general balancing
weights. Write
For ratio causal estimands, PSweight applies the logarithm
transformation to improve the accuracy of the normal approximation
(Agresti 2003). For estimating the variance of causal RR, we
first obtain the joint variance of
PSweight also allows using bootstrap to estimate variances, which can
be much more computationally intensive than the closed-form sandwich
estimator but sometimes give better finite-sample performance in small
samples. By default, PSweight resamples
The PSweight package includes two modules tailored for design and analysis of observational studies. The design module provides diagnostics to assess the adequacy of the propensity score model and the weighted target population, prior to the use of outcome data. The analysis module provides functions to estimate the causal estimands. We briefly describe the two modules below.
PSweight offers the SumStat()
function to visualize the distribution
of the estimated propensity scores, to assess the balance of covariates
under different weighting schemes, and to characterize the weighted
target population. It uses the following code snippet:
SumStat(ps.formula, ps.estimate = NULL, trtgrp = NULL, Z = NULL, covM = NULL,
zname = NULL, xname = NULL, data = NULL,weight = "overlap", delta = 0,
method = "glm", ps.control = list())
By default, the (generalized) propensity scores are estimated by the
(multinomial) logistic regression, through the argument ps.formula
.
Alternatively, gbm()
functions in the gbm package (Greenwell et al. 2019) or the
SuperLearner()
function in the SuperLearner package (Polley et al. 2019) can
also be called by using method = "gbm"
or method = "SuperLearner"
.
Additional parameters of those functions can be supplied through the
ps.control
argument. The argument ps.estimate
supports estimated
propensity scores from external routines. SumStat()
produces a
SumStat
object, with estimated propensity scores, unweighted and
weighted covariate means for each treatment group, balance diagnostics,
and effective sample sizes (defined in (Li and Li 2019)). We then
provide a summary.SumStat()
function, which takes the SumStat
object
and summarizes weighted covariate means by treatment groups and the
between-group differences in either ASD or PSD. The default options in
weighted.var = TRUE
and metric = "ASD"
yield ASD based on weighted
standard deviations in (Austin and Stuart 2015). The weighted covariate means
can be used to build a baseline characteristics “Table 1” to illustrate
the target population where trimming or balancing weights are applied.
summary(object, weighted.var = TRUE, metric = "ASD")
Function | Description |
---|---|
SumStat() |
Generate a SumStat object with information of propensity scores and weighted covariate balance |
summary.SumStat() |
Summarize the SumStat object and return weighted covariate means by treatment groups and weighted or unweighted between-group differences in ASD or PSD |
plot.SumStat() |
Plot the distribution of propensity scores or weighted covariate balance metrics from the SumStat object |
PStrim() |
Trim the data set based on estimated propensity scores |
Diagnostics of propensity score models can be visualized with the
plot.SumStat()
function. It takes the SumStat
object and produces a
balance plot (type = "balance"
) based on the ASD and PSD. A vertical
dashed line can be set by the threshold
argument, with a default value
equal to plot.SumStat()
function can also supply density
plot (type = "density"
), or histogram (type = "hist"
) of the
estimated propensity scores. The histogram, however, is only available
for the binary treatment case. The plot function is implemented as
follows:
plot(x, type = "balance", weighted.var = TRUE, threshold = 0.1, metric = "ASD")
In the design stage, propensity score trimming can be carried out with
the PStrim()
function. The trimming threshold delta
is set to 0 by
default. PStrim()
also enables optimal trimming rules
(optimal = TRUE
) that give the most statistically efficient (pairwise)
subpopulation ATE, among all possible trimming rules. A trimmed data set
along with a summary of trimmed cases will be returned by PStrim()
.
This function is given below:
PStrim(data, ps.formula = NULL, zname = NULL, ps.estimate = NULL, delta = 0,
optimal = FALSE, method = "glm", ps.control = list())
Alternatively, trimming is also anchored in the SumStat()
function
with the delta
argument. All functions in the design module are
summarized in Table 4.
The analysis module of PSweight includes two functions: PSweight()
and summary.PSweight()
. The PSweight()
function estimates the
average potential outcomes in the target population,
bootstrap = TRUE)
. The weight
argument can take "IPW"
,
"treated"
, "overlap"
, "matching"
or "entropy"
, corresponding to
the weights introduced in Tables 2 and
3. More detailed descriptions of each input
argument in the PSweight()
function can be found in Table
5. A typical PSweight()
code snippet looks like
PSweight(ps.formula, ps.estimate, trtgrp, zname, yname, data, weight = "overlap",
delta = 0, augmentation = FALSE, bootstrap = FALSE, R = 50, out.formula = NULL,
out.estimate = NULL, family = "gaussian", ps.method = "glm", ps.control = list(),
out.method = "glm", out.control = list())
Argument | Description | Default |
---|---|---|
ps.formula |
A symbolic description of the propensity score model. | – |
ps.estimate |
An optional matrix or data frame with externally estimated (generalized) propensity scores for each observation; can also be a vector with binary treatments. | NULL |
trtgrp |
An optional character defining the treated population for estimating (pairwise) ATT. It can also be used to specify the treatment level when only a vector of values are supplied for ps.estimate in the binary treatment setting. |
Last value in alphabetic order |
zname |
An optional character specifying the name of the treatment variable when ps.formula is not provided. |
NULL |
yname |
A character specifying name of the outcome variable in data . |
|
weight |
A character specifying the type of weights to be used. | "overlap" |
delta |
Trimming threshold for (generalized) propensity scores. | 0 |
augmentation |
Logical value of whether augmented weighting estimators should be used. | FALSE |
bootstrap |
Logical value of whether bootstrap is used to estimate the standard error | FALSE |
R |
Number of bootstrap replicates if bootstrap = TRUE |
50 |
out.formula |
A symbolic description of the outcome model to be estimated when augmentation = TRUE |
|
out.estimate |
An optional matrix or data frame containing externally estimated potential outcomes for each observation under each treatment level. | NULL |
family |
A description of the error distribution and canonical link function to be used in the outcome model if out.formula is provided |
"gaussian" |
ps.method |
a character to specify the method for propensity model. | "glm" |
ps.control |
A list to specify additional options when method is set to "gbm" or "SuperLearner" . |
list() |
out.method |
A character to specify the method for outcome model. | "glm" |
out.control |
A list to specify additional options when methodout is set to "gbm" or "SuperLearner" . |
list() |
Similar to the design module, the summary.PSweight()
function
synthesizes information from the PSweight
object for statistical
inference. A typical code snippet looks like
summary(object, contrast, type = "DIF", CI = TRUE)
In the summary.PSweight()
function, the argument type
corresponds to
the three types estimands: type = "DIF"
is the default argument that
specifies the additive causal contrasts; type = "RR"
specifies the
contrast on the log scale as in equation (10); type = "OR"
specifies the contrast on the log odds scale as in equation
(11). Confidence intervals and p-values are obtained using
normal approximation and reported by the summary.PSweight()
function.
The argument contrast
represents a contrast vector contrast
is not specified,
summary.PSweight()
provides all pairwise comparisons of the average
potential outcomes. By default, confidence interval is printed
(CI = TRUE
); alternatively, one can print the test statistics and
p-values by CI = FALSE
.
We demonstrate PSweight in a case study that estimates the causal
effect of educational attainment on hourly wage, based on the National
Child Development Survey (NCDS) data. The National Child Development
Survey (NCDS) is a longitudinal study on children born in the United
Kingdom (UK) in wage
is log of the gross hourly
wage in Pound. The treatment variable is educational attainment. For the
multiple treatment case, To start with, we created Dmult
as a
treatment variable with three levels: ">=A/eq"
, "O/eq"
and "None"
,
representing advanced qualification (white
indicates whether an individual
identified himself as white race; scht
indicates the school type they
attended at age qmab
and qmab2
are math test scores at age qvab
and qvab2
are two reading test scores at age sib
u
stands for the number of siblings; agepa
and
agema
are the ages of parents in year 1,974; in the same year, the
employment status of mother maemp
was also collected; paed
u
and maed
u
are the years of education for parents. For
simplicity, we will focus on IPW and the three types of weights that
improve covariate overlap: OW, MW and EW.
We use Dmult
, the three-level variable, as the treatment of interest.
About one half of the population attained advanced academic
qualification, there are approximately equal numbers of individuals with
intermediate academic qualification or no academic qualification. To
illustrate the estimation and inference for ratio estimands, we also
introduce a binary outcome of wage, wagebin
. The dichotomized wage was
obtained with the cutoff of the average hourly wage of actively employed
British male aged
We specify a multinominal regression model, ps.mult
, to estimate the
generalized propensity scores.
ps.mult <- Dmult ~ white + maemp + as.factor(scht) + as.factor(qmab)
as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u + maed_u +
agepa + agema + sib_u + paed_u * agepa + maed_u * agema
Then we obtain the propensity score estimates and assess weighted
covariate balance with the SumStat()
function.
bal.mult <- SumStat(ps.formula = ps.mult,
weight = c("IPW", "overlap", "matching", "entropy"), data = NCDS)
plot(bal.mult, type = "density")
Dmult
generated by plot.SumStat()
function in
the PSweight package.
The distributions of generalized propensity scores are given in Figure
1 (in alphabetic order of the names of treatment
groups). For the generalized propensity score to receive the advanced
qualification (">=A/eq"
) or no qualification ("None"
), there is a
mild lack of overlap due to separation of the group-specific
distribution. Since bal.mult
includes four weighting schemes, we plot
the maximum pairwise ASD and assess the (weighted) covariate balance in
a single Love plot.
plot(bal.mult, metric = "ASD")
The covariates are imbalanced across the three groups prior to any
weighting. Although IPW can generally improve covariate balance, the
maximum pairwise ASD still ocassionally exceeds the threshold
Dmult
using the maximum pairwise ASD metric, generated by plot.SumStat()
function in the PSweight package.The PSweight package can perform trimming based on (generalized)
propensity scores. As IPW does not adequately balance the covariates
across the three groups in Figure 2, we explore trimming
as a way to improve balance for IPW. There are two types of trimming
performed by the PSweight package: (1) symmetric trimming that removes
units with extreme (generalized propensity scores)
(Crump et al. 2009; Yoshida et al. 2018) and (2) optimal trimming that provides the
most efficient IPW estimator for estimating (pairwise) ATE
(Crump et al. 2009; Yang et al. 2016). Specifically, the symmetric trimming is
supported by both the SumStat()
and PSweight()
functions through the
delta
argument. Both functions refit the (generalized) propensity
score model after trimming following the recommendations in
Li et al. (2019). We also provide a stand-alone PStrim
function that
performs both symmetric trimming and optimal trimming. Following
Yoshida et al. (2018), with three treatment groups, we exclude all individuals
with the estimated generalized propensity scores less than
trim
element in the SumStat
object). As discussed in
Yoshida et al. (2018), propensity trimming could improve the estimation of ATE
and ATT, but barely have any effect for estimation of ATO and ATM.
Evidently, Figure 3 indicates that IPW controls all
pairwise ASD within
bal.mult.trim <- SumStat(ps.formula = ps.mult, weight = c("IPW", "overlap", "matching",
"entropy"), data = NCDS, delta = 0.067)
bal.mult.trim
1050 cases trimmed, 2592 cases remained
trimmed result by trt group:
>=A/eq None O/eq
trimmed 778 71 201
remained 1028 824 740
weights estimated for: IPW overlap matching entropy
plot(bal.mult.trim,metric = "ASD")
Dmult
using the maximum pairwise ASD metric, after symmetric trimming with
plot.SumStat()
function in
the PSweight package.Alternatively, if one does not specify the trimming threshold, the
PStrim
function supports the optimal trimming procedure that
identifies the optimal threshold based on data. Example syntax is given
as follows. By pulling out the summary statistics for trimming, we can
see that optimal trimming excludes
PStrim(ps.formula = ps.mult, data = NCDS, optimal = TRUE)
>=A/eq None O/eq
trimmed 479 21 82
remained 1327 874 859
For illustration, we estimate the ratio estimands using the binary
outcome wagebin
. For illustration, we will only estimate the causal
effects based on the data without trimming, and the analysis with the
trimmed data follows the exact same steps. Based on the multinomial
logistic propensity score model, we obtain the pairwise causal RR among
the combined population via IPW.
ate.mult <- PSweight(ps.formula = ps.mult, yname = "wagebin", data = NCDS,
weight = "IPW")}
contrasts.mult <- rbind(c(1,-1, 0), c(1, 0,-1), c(0, -1, 1))
sum.ate.mult.rr <- summary(ate.mult, type = "RR", contrast = contrasts.mult)
sum.ate.mult.rr
Closed-form inference:
Inference in log scale:
Original group value: >=A/eq, None, O/eq
Contrast:
>=A/eq None O/eq
Contrast 1 1 -1 0
Contrast 2 1 0 -1
Contrast 3 0 -1 1
Estimate Std.Error lwr upr Pr(>|z|)
Contrast 1 0.607027 0.115771 0.380120 0.83393 1.577e-07 ***
Contrast 2 0.459261 0.052294 0.356767 0.56176 < 2.2e-16 ***
Contrast 3 0.147766 0.121692 -0.090746 0.38628 0.2246
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
By providing the appropriate contrast matrix, we obtain all pairwise
comparisons of the average potential outcomes on the log scale with the
summary.PSweight()
function, and estimate
summary.PSweight()
function.
exp(sum.ate.mult.rr$estimates[,c(1,4,5)])
Estimate lwr upr
Contrast 1 1.834968 1.4624601 2.302358
Contrast 2 1.582904 1.4287028 1.753749
Contrast 3 1.159241 0.9132496 1.471493
Focusing on the target population that has the most overlap in the
observed covariates, we further use the OW to estimate the pairwise
causal RR. OW theoretically provides the best internal validity for
pairwise comparisons; Figure 3 also indicates that OW
achieves better covariate balance among the overlap population.
Exponentiating the results provided by the summary.PSweight()
function, we observe each pairwise causal RR has a larger effect size
among the overlap weighted population. Interestingly, among the overlap
population, the proportion that receives an above-average hourly wage
when everyone attains intermediate qualification becomes approximately
ato.mult <- PSweight(ps.formula = ps.mult, yname = "wagebin", data = NCDS,
weight = "overlap")
sum.ato.mult.rr <- summary(ato.mult, type = "RR", contrast = contrasts.mult)
exp(sum.ato.mult.rr$estimates[,c(1,4,5)])
Estimate lwr upr
Contrast 1 2.299609 1.947140 2.715882
Contrast 2 1.527931 1.363092 1.712705
Contrast 3 1.505048 1.257180 1.801785
The above output suggests that among the overlap population, the causal
RR for comparing advanced qualification to intermediate qualification is
similar in magnitude to that for comparing intermediate qualification to
no qualification. We can formally test for the equality of two
consecutive causal RR based on the null hypothesis
contrast = c(1, 1, -2)
. The
p-value for testing this null is
summary(ato.mult, type = "RR", contrast = c(1, 1, -2), CI = FALSE)
With the binary outcome wagebin
, we can also estimate the pairwise
causal OR among a specific target population. For example, using OW, the
causal conclusions regarding the effectiveness due to attaining academic
qualification do not change, because all three
sum.ato.mult.or <- summary(ato.mult, type = "OR", contrast = contrasts.mult)
exp(sum.ato.mult.or$estimates[,c(1,4,5)])
Estimate lwr upr
Contrast 1 3.586050 2.841383 4.525879
Contrast 2 2.050513 1.696916 2.477791
Contrast 3 1.748855 1.375483 2.223578
As a final step, we illustrate how to combine OW with outcome regression and estimate the pairwise causal RR among the overlap population. We use the same set of covariates in the binary outcome regression model.
out.wagebin <- wagebin ~ white + maemp + as.factor(scht) + as.factor(qmab) +
as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u + maed_u +
agepa + agema + sib_u + paed_u * agepa + maed_u * agema
Loading this outcome regression formula into the PSweight()
function,
and specifying family = "binomial"
to indicate the type of outcome, we
obtain the augmented overlap weighting estimates on the log RR scale.
Exponentiating the point estimates and confidence limits, one reports
the pairwise causal RR. The pairwise causal RR reported by the augmented
OW estimator is similar to that reported by the simple OW estimator;
further, the width of the confidence interval is also comparable before
and after outcome augmentation, and the causal conclusions based on
pairwise RR remain the same. The similarity between simple and augmented
OW estimators implies that OW itself may already be efficient.
ato.mult.aug <- PSweight(ps.formula = ps.mult, yname = "wagebin", data = NCDS,
augmentation = TRUE, out.formula = out.wagebin, family = "binomial")
sum.ato.mult.aug.rr <- summary(ato.mult.aug, type = "RR", contrast = contrasts.mult)
exp(sum.ato.mult.aug.rr$estimates[,c(1,4,5)])
Estimate lwr upr
Contrast 1 2.310628 1.957754 2.727105
Contrast 2 1.540176 1.375066 1.725111
Contrast 3 1.500237 1.253646 1.795331
As an alternative to the default generalized linear models, we can use
more advanced machine learning models to estimate propensity scores and
potential outcomes. Flexible propensity score and outcome estimation has
been demonstrated to reduce bias due to model misspecification, and
potentially improve covariate balance
(Lee et al. 2010; Hill 2011; McCaffrey et al. 2013). This can be
achieved in PSweight for both balance check and constructing weighted
estimator by specifying the method as the generalized boosted model
(GBM) or the super learner methods. Additional model specifications for
these methods can be supplied through ps.control
and out.control
.
Machine learning models that are included in neither gbm
nor
SuperLearner
could be estimated externally and then imported through
the ps.estimate
and out.estimate
arguments. These two arguments
broaden the utility of PSweight where any externally generated
estimates of propensity scores and potential outcomes models can be
easily incorporated.
We now illustrate the use of GBM as an alternative of the default
generalized linear models. For simplicity, this illustration is based on
binary education. Specifically, we created Dany
to indicate whether
one had attained any academic qualification. There are 2,399 individuals
that attained academic qualification, and 1,243 individuals without any.
GBM is a family of non-parametric tree-based regressions that allow for
flexible non-linear relationships between predictors and outcomes
(Friedman et al. 2000). The following propensity model formula is specified;
the formula does not include interactions terms because boosted
regression is already capable of capturing non-linear effects and
interactions (McCaffrey et al. 2004). In this illustration, we use the AdaBoost
(Freund and Schapire 1997) algorithm to fit the propensity model through the
control setting, ps.control=list(distribution = "adaboost")
. We use
the default values for other model parameters such as the number of
trees (n.trees = 100
), interaction depth (interaction.depth = 1
),
the minimum number of observations in the terminal nodes
(n.minobsinnode = 1
), shrinkage reduction (shrinkage = 0.1
), and
bagging fraction (shrinkage = 0.5
). Alternative values for these
parameters could also be passed through ps.control
.
ps.any.gbm <- Dany ~ white + maemp + as.factor(scht) + as.factor(qmab) +
as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u + maed_u+
agepa + agema + sib_u
bal.any.gbm <-SumStat(ps.formula = ps.any.gbm, data= NCDS, weight = "overlap",
method = "gbm", ps.control = list(distribution = "adaboost"))
The balance check through plot.SumStat()
suggests substantial
improvement in covariate balance with SMD of all covariates below 0.1
after weighting. After assessing balance and confirming the adequacy of
the propensity score model, we further fit the outcome model using GBM
with the default logistic regression and parameters. In the PSweight()
function, we can specify both ps.method = "gbm"
and
out.method = "gbm"
and leave the out.control
argument as default.
The detailed code and summary of the output is in below. Here we
redefine the propensity score model without interaction terms because
GBM considers interactions between covariates by default. The results
using GBM, in this example, are very similar to those using generalized
linear models (results omitted).
out.wage.gbm <- wage ~ white + maemp + as.factor(scht) + as.factor(qmab) +
as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u +
maed_u + agepa + agema + sib_u
ato.any.aug.gbm <- PSweight(ps.formula = ps.any.gbm, yname = "wagebin",
data = NCDS, augmentation = TRUE, out.formula = out.wage.gbm,
ps.method = "gbm", ps.control = list(distribution = "adaboost"),
out.method = "gbm")
summary(ato.any.aug.gbm, CI = FALSE)
Closed-form inference:
Original group value: 0, 1
Contrast:
0 1
Contrast 1 -1 1
Estimate Std.Error z value Pr(>|z|)
Contrast 1 0.186908 0.018609 10.044 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Propensity score weighting is an important tool for causal inference and comparative effectiveness research. This paper introduces the PSweight package and demonstrates its functionality with the NCDS data example in the context of binary and multiple treatment groups. In addition to providing easy-to-read balance statistics and plots to aid the design of observational studies, the PSweight offers point and variance estimation with a variety of weighting schemes for the (weighted) average treatment effects on both the additive and ratio scales. These weighting schemes include the optimal overlap weights recently introduced in Li et al. (2018) and Li and Li (2019), and could help generate valid causal comparative effectiveness evidence among the population at equipoise.
Although propensity score weighting has been largely developed in observational studies, it is also an important tool for covariate adjustment in randomized controlled trials (RCTs). Williamson et al. (2014) showed that IPW can reduce the variance of the unadjusted difference-in-means treatment effect estimator in RCTs, and Shen et al. (2014) proved that the IPW estimator is semiparametric efficient and asymptotically equivalent to the analysis of covariance (ANCOVA) estimator (Tsiatis et al. 2008). Zeng et al. (2020) generalized these results of IPW to the family of balancing weights. Operationally, there is no difference in implementing propensity score weighting between RCTs and observational studies. Therefore, PSweight is directly applicable to perform covariate-adjusted analysis in RCTs.
The PSweight package is under continuing development to include other useful components for propensity score weighting analysis. Specifically, future versions of PSweight will include components to enable pre-specified subgroup analysis with balancing weights and flexible variable selection tools (Yang et al. 2021). We are also studying overlap weighting estimators with time-to-event outcomes and complex survey designs. Those new features are being actively developed concurrently with our extensions to the methodology.
Tianhui Zhou and Guangyu Tong contributed equally to this work and are considered co-first authors. The authors would like to acknowledge the support of the Patient-Centered Outcomes Research Institute (PCORI) contract ME-2018C2-13289, and the NCDS replication data published on Harvard Dataverse (https://dataverse.harvard.edu/) (Battistin and Sianesi 2012), which provides a coded data set for our illustrative analysis.
PSweight, twang, CBPS, PSW, optweight, ATE, WeightIt, causalweight, sbw
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Zhou, et al., "PSweight: An R Package for Propensity Score Weighting Analysis", The R Journal, 2022
BibTeX citation
@article{RJ-2022-011, author = {Zhou, Tianhui and Tong, Guangyu and Li, Fan and Thomas, Laine E. and Li, Fan}, title = {PSweight: An R Package for Propensity Score Weighting Analysis}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {282-300} }