We describe a new package called pseval that implements the core methods for the evaluation of principal surrogates in a single clinical trial. It provides a flexible interface for defining models for the risk given treatment and the surrogate, the models for integration over the missing counterfactual surrogate responses, and the estimation methods. Estimated maximum likelihood and pseudo-score can be used for estimation, and the bootstrap for inference. A variety of post-estimation methods are provided, including print, summary, plot, and testing. We summarize the main statistical methods that are implemented in the package and illustrate its use from the perspective of a novice R user.
A valid principal surrogate endpoint, also called a specific nonmechanistic correlate of protection (Plotkin and Gilbert 2012), can be used as a target for treatment improvement in early phase trials and, in the specific setting of evaluation, for predicting individual treatment effects post-licensure. A surrogate is considered to be valid if it provides reliable predictions of treatment effects on the clinical endpoint of interest. Frangakis and Rubin (2002) introduced the concept of principal stratification and the definition of a principal surrogate (PS). Informally, a post-treatment intermediate response variable is a principal surrogate if causal effects of the treatment on the clinical outcome only exist when causal effects of the treatment on the intermediate variable exist. The criteria for a PS have been modified and extended in more recent works, with most current literature focusing on wide effect modification as the primary criterion of interest.
The goal of PS evaluation is estimation and testing of how treatment efficacy on the clinical outcome of interest varies over subgroups defined by possible treatment and surrogate combinations of interest; this is an effect modification objective. The combinations of interest are called the principal strata and they include a set of unobservable counterfactual responses: responses that would have occurred under a set of conditions counter to the observed conditions. To finesse this problem of unobservable responses, a variety of clever trial designs and estimation approaches have been proposed. Several of these have been implemented in the pseval package (Sachs and Gabriel 2016).
Let
Criteria for
For instance, for a binary outcome, the risk function may simply be the
probability
Currently we focus only on marginal risk estimands which condition only
on
Neither of the joint risk estimands are indentifiable in a standard
randomized trial, as either
There are specific trial augmentations that allow for the measurement or
imputation of the missing counterfactual
Specification of the distributions of
Frangakis and Rubin (2002) gave a single criterion for a biomarker S to be a PS:
causal effects of the treatment on the clinical outcome only exist when
causal effects of the treatment on the intermediate variable exist. In
general this can only be evaluated using the joint risk estimands, which
consider not only the counterfactual values of the biomarker under
treatment, but also under control
More recently, other works Gilbert and Hudgens (2008), Wolfson and Gilbert (2010), Huang and Gilbert (2011), Huang et al. (2013),
Gabriel and Gilbert (2014), and Gabriel and Follmann (2016) have suggested that this criterion is both
too restrictive and in some cases can be vacuously true. Instead most
current works suggest that the wide effect modification (WEM) criterion
is of primary importance, ACN being of secondary importance. WEM is
given formally in terms of the risk estimands and a known contrast
function
We first make three standard assumptions used in much of the literature for absorbing events outcomes:
Stable Unit Treatment Value Assumption (SUTVA): Observations on the independent units in the trial should be unaffected by the treatment assignment of other units.
Ignorable Treatment Assignment: The observed treatment assignment does not change the counterfactual clinical outcome.
Equal individual risk up to the time of candidate surrogate
measurement
In time-to-event settings one more assumption is needed:
It should be noted that the equal individual risk assumption requires
that time-to-event analysis start at time
Wolfson and Gilbert (2010) outlines how these assumptions are needed for identification
of the risk estimands. Now to deal with the missing
Briefly, a BIP
The assumptions needed for a BIP to be useful depend on the risk model
used. If the BIP is included in the risk model, only the assumption of
no interaction with treatment and the candidate surrogate are needed.
However, if the BIP is not included in the risk model, the assumption
that that clinical outcome is independent of the BIP given the candidate
surrogate is needed. Although not a requirement for identification of
the risk estimands, it has been found in most simulations studies that a
correlation between the BIP and
Under a CPV or CCT augmented design, control recipients that do not have
events, or all willing control subjects for a non-event driven clinical
outcome, are given the experimental treatment at the end of the
follow-up period. Then, their intermediate response is measured at some
time after that treatment. This measurement is then used as a direct
imputation for the missing
Individual time constancy of the true intermediate response under
active treatment,
No events (infections) during the close-out period.
In the general treatment trial setting, the CCT augmentation can be used under the same Individual time constancy assumption, and the assumption that drop-out or unwillingness to receive close-out treatment is completely at random.
Gabriel and Gilbert (2014) suggested the baseline augmentation BSM, which is a
pre-treatment measurement of the candidate PS, denoted
then
Let
This procedure is called estimated maximum likelihood (EML) and was
developed in Pepe and Fleming (1991). The key idea is that we are averaging the
likelihood contributions for subjects missing
Closed-form inference is not available for EML estimates, thus we recommend use of the bootstrap for estimation of standard errors. It was suggested as an approach to principal surrogate evaluation by Gilbert and Hudgens (2008) and Huang and Gilbert (2011).
Huang et al. (2013) suggest a different estimation procedure that does have a
closed form variance estimator. Instead of numerically optimizing the
estimated likelihood, the pseudoscore approach iteratively finds the
solution to weighted versions of the score equations. Pseudoscore
estimates were also suggested in Wolfson (2009) and implemented for
several special cases in Huang et al. (2013). We have implemented here only one of
the special cases: categorical
Typically, users would have to code up the likelihood, integration model, and perform the optimization themselves. This is beyond the reach of many researchers who desire to use these methods. The goal of pseval is to correctly implement these methods with a flexible and user-friendly interface, enabling researchers to implement and interpret a wide variety of models.
The pseval package allows users to specify the type of augmented
design that is used in their study, specify the form of the risk model
along with the distribution of
Here we will walk through some basic analyses from the point of view of a new R user. Along the way we will highlight the main features of pseval. We support binary, continuous, count, and time-to-event outcomes, thus we will also need to load the survival package (Therneau and Grambsch 2000; Therneau 2015).
First let’s create an example dataset. The pseval package provides the
function generate_example_data
which takes a single argument: the
sample size.
set.seed(1492)
fakedata <- generate_example_data(n = 800)
head(fakedata)
## Z BIP CPV BSM S.obs time.obs
## 1 0 0.3353179 1.4851399 0.45961614 0.35268095 0.3301972
## 2 0 1.4536863 2.6379400 1.39591042 1.46688905 0.1195136
## 3 0 -0.7243934 NA -0.62723499 -0.73190763 0.2631222
## 4 0 -0.1183592 0.9421504 0.07738308 -0.01833409 0.1373458
## 5 0 -0.2352566 NA -0.14971448 -0.18470242 0.8543703
## 6 0 -0.7782851 0.1159434 -0.65721609 -0.66313714 0.2200481
## event.obs Y.obs S.obs.cat BIP.cat
## 1 0 0 (-0.198,0.503] (0.0574,0.766]
## 2 1 0 (1.36, Inf] (0.766, Inf]
## 3 1 1 (-Inf,-0.198] (-Inf,-0.678]
## 4 1 0 (-0.198,0.503] (-0.678,0.0574]
## 5 1 1 (-0.198,0.503] (-0.678,0.0574]
## 6 1 0 (-Inf,-0.198] (-Inf,-0.678]
The example data includes both a time-to-event outcome, a binary outcome, a surrogate, a BIP, CPV, and BSM, and a categorical version of the surrogate. The true model for the time is exponential, with parameters (intercept) = -1, S(1) = 0.0, Z = 0.0, S(1):Z = -0.75. The true model for binary is logistic, with the same parameter values.
In the above table S.obs.cat and BIP.cat are formed as
S.obs.cat <- factor(S.obs,levels=c(-Inf, quantile(c(S.0, S.1), c(.25, .5, .75), na.rm = TRUE), Inf))
and similarly for BIP.cat. Alternatively a user could input arbitrary
numeric values to represent different discrete subgroups (e.g., 0s and
1s to denote 2 subgroups).
"psdesign"
objectWe begin by creating a "psdesign"
object with the synonymous function.
This is the object that combines the raw dataset with information about
the study design and the structure of the data. Subsequent analysis will
operate on this psdesign object. It is designed to be analogous to the
svydesign
function in the
survey package
(Lumley 2004, 2014). The first argument is the data frame where the data
are stored. All subsequent arguments describe the mappings from the
variable names in the data frame to important variables in the PS
analysis, using the same notation as above. Other covariates or
variables can be mapped to arbitrary variable names using the same
syntax. An optional weights argument describes the sampling weights, if
present. Our first analysis will use the binary version of the outcome,
with continuous
binary.ps <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)
binary.ps
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## None present, see ?add_integration for information on integration models.
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
The printout displays a brief description of the data, including the
empirical treatment efficacy estimate, the variables used in the
analysis and their corresponding variables in the original dataset.
Finally the printout invites the user to see the help page for
add_integration
, in order to add an integration model to the psdesign
object, the next step in the analysis.
Missing values in the
psdesign
easily accommodates case-control or case-cohort sampling. In
this case, the surrogate
fakedata.cc <- fakedata
missdex <- sample((1:nrow(fakedata.cc))[fakedata.cc$Y.obs == 0],
size = floor(sum(fakedata.cc$Y.obs == 0) * .8))
fakedata.cc[missdex, ]$S.obs <- NA
fakedata.cc$weights <- ifelse(fakedata.cc$Y.obs == 1, 1, .2)
Now we can create the "psdesign"
object, using the entire dataset
(including those missing S.obs
) and passing the weights to the
weights
field.
binary.cc <- psdesign(data = fakedata.cc, Z = Z, Y = Y.obs, S = S.obs,
BIP = BIP, weights = weights)
The other augmentation types can be defined by mapping variables to the
names BIP
, CPV
, and/or BSM
. The augmentations are handled as
described in the previous section: CPV is used as a direct imputation
for
For survival outcomes, a key assumption is that the potential surrogate
is measured at a fixed time tau
is not specified in the
psdesign
object, then it is assumed to be 0. Survival outcomes are
specified by mapping Y
to a Surv
object, which requires the
survival package:
surv.ps <- psdesign(data = fakedata, Z = Z, Y = Surv(time.obs, event.obs), S = S.obs,
BIP = BIP, CPV = CPV, BSM = BSM)
## Warning in psdesign(data = fakedata, Z = Z, Y = Surv(time.obs,
## event.obs), : tau missing in psdesign: assuming that the
## surrogate S was measured at time 0.
The EML procedure requires an estimate of add_integration
. Several integration models are implemented,
including a parametric model that uses a formula interface to define a
regression model, a semiparametric model that specifies a location and a
scale model is robust to the specification of the distribution, and a
non-parametric model that uses empirical conditional probability
estimates for discrete
For this first example, let’s use the parametric integration model. We
specify the mean model for
binary.ps <- binary.ps + integrate_parametric(S.1 ~ BIP)
binary.ps
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
We can add multiple integration models to a psdesign object, say we want
a model for
binary.ps + integrate_parametric(S.0 ~ BIP)
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
## integration model for S.0 :
## integrate_parametric(formula = S.0 ~ BIP )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
In a future version of the package, we will allow for estimation of the
joint risk estimands that depend on both
library(splines)
binary.ps + integrate_parametric(S.1 ~ BIP^2)
binary.ps + integrate_parametric(S.1 ~ bs(BIP, df = 3))
binary.ps + integrate_parametric(S.1 ~ BIP + BSM + BSM^2)
To include additional baseline covariates in the model for psdesign
function call so that they are visible
in the subsequent functions:
binary.ps <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP,
BSM = BSM, age = age)
binary.ps + integrate_parametric(S.1 ~ BIP + age)
These are shown as examples, we will proceed with the simple linear
model for integration. The other integration models are called
integrate_bivnorm
, integrate_nonparametric
, and
integrate_semiparametric
. See their help files for details on the
models and their specification.
The next step is to define the risk model.
The risk model is the specification of the distribution for the outcome
add_riskmodel
for more details.
Let’s add a simple binary risk model using the logit link. The argument
D
specifies the number of samples to use for the simulated annealing,
also known as empirical integration, in the EML procedure. In general, D
should be set to something reasonably large, like 2 or 3 times the
sample size.
binary.ps <- binary.ps + risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit)
binary.ps
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
We estimate the parameters and bootstrap using the same type of syntax.
We can add a "ps_estimate"
object, which takes optional arguments
start
for starting values, and other arguments that are passed to the
optim
function in base R. The method
argument determines the
optimization method, we have found that “BFGS” works well in these types
of problems and it is the default. Use "pseudo-score"
as the method
argument for pseudo-score estimation for binary risk models with
categorical BIPs.
The ps_bootstrap
function takes the additional arguments n.boots
for
the number of bootstrap replicates, and progress.bar
which is a
logical that displays a progress bar in the R console if true. It is
helpful to pass the estimates as starting values in the bootstrap
resampling. With estimates and bootstrap replicates present, printing
the psdesign object displays additional information.
binary.est <- binary.ps + ps_estimate(method = "BFGS")
binary.boot <- binary.est + ps_bootstrap(n.boots = 500, progress.bar = FALSE,
start = binary.est$estimates$par, method = "BFGS")
binary.boot
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -0.920 -0.028 -0.220 -1.133
## Convergence: TRUE
##
## Bootstrap replicates:
## Estimate boot.se lower.CL.2.5. upper.CL.97.5.
## (Intercept) -0.920 0.182 -1.286 -0.580
## S.1 -0.028 0.128 -0.276 0.220
## Z -0.220 0.250 -0.697 0.277
## S.1:Z -1.133 0.214 -1.581 -0.780
## p.value
## (Intercept) 4.02e-07
## S.1 8.27e-01
## Z 3.80e-01
## S.1:Z 1.29e-07
##
## Out of 500 bootstraps, 500 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
The next code chunk shows how the model can be defined and estimated all at once.
binary.est <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) +
integrate_parametric(S.1 ~ BIP) +
risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit) +
ps_estimate(method = "BFGS")
We provide summary and plotting methods for the psdesign object. If
bootstrap replicates are present, the summary method does a test for
wide effect modification. Under the parametric risk models implemented
in this package, the test for wide effect modification is equivalent to
a test of the null hypothesis that the
Another way to assess wide effect modification is to compute the
standardized total gain (STG) (Huang and Gilbert 2011) and (Gabriel et al. 2015). This is
implemented in the calc_STG
function. The standardized total gain can
be interpreted as the area sandwiched between the risk difference curve
and the horizontal line at the marginal risk difference. It is a measure
of the spread of the distribution of the risk difference, and is a less
parametric way to test for wide effect modification. The calc_STG
function computes the STG at the estimated parameters and at the
bootstrap samples, if present. The function prints the results and
invisibly returns a list containing the observed STG, and the
bootstrapped STGS.
calc_STG(binary.boot, progress.bar = FALSE)
## $obsSTG
## [1] 0.3397774
##
## $bootstraps
## STG.boot.se STG.lower.CL.2.5 STG.upper.CL.97.5
## V1 0.1243311 0.1573031 0.6382418
The summary method also computes the marginal treatment efficacy
marginalized over
smary <- summary(binary.boot)
## Augmented data frame: 800 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 0 0 NA 0.3527 1 0.335
## 2 0 0 NA 1.4669 1 1.454
## 3 0 1 NA -0.7319 1 -0.724
## 4 0 0 NA -0.0183 1 -0.118
## 5 0 1 NA -0.1847 1 -0.235
## 6 0 0 NA -0.6631 1 -0.778
##
## Empirical TE: 0.526
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -0.920 -0.028 -0.220 -1.133
## Convergence: TRUE
##
## Bootstrap replicates:
## Estimate boot.se lower.CL.2.5. upper.CL.97.5.
## (Intercept) -0.920 0.182 -1.286 -0.580
## S.1 -0.028 0.128 -0.276 0.220
## Z -0.220 0.250 -0.697 0.277
## S.1:Z -1.133 0.214 -1.581 -0.780
## p.value
## (Intercept) 4.02e-07
## S.1 8.27e-01
## Z 3.80e-01
## S.1:Z 1.29e-07
##
## Out of 500 bootstraps, 500 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
##
## Treatment Efficacy:
## empirical marginal model
## 0.526 0.526 0.539
## Model-based average TE is 2.3 % different from the empirical and 2.3 % different
## from the marginal.
The calc_risk
function computes the risk in each treatment arm, and
contrasts of the risks. By default it computes the treatment efficacy,
but there are other contrast functions available. The contrast function
is a function that takes 2 inputs, the calc_risk
. See ?calc_risk
for more information about contrast functions.
Other arguments of the calc_risk
function include t
, the time at
which to calculate the risk for time-to-event outcomes, n.samps
which
is the number of samples over the range of S.1 at which the risk will be
calculated, and CI.type
, which can be set to "pointwise"
for
pointwise confidence intervals or "band"
for a simultaneous confidence
band. sig.level
is the significance level for the bootstrap confidence
intervals. If the outcome is time-to-event and
head(calc_risk(binary.boot, contrast = "TE", n.samps = 20), 3)
## S.1 Y R0 R1 Y.boot.se
## V1 -2.2756987 -1.7437221 0.2980453 0.8177536 1.1622104
## V2 -1.4262708 -1.1360482 0.2930970 0.6260692 0.6957994
## V3 -0.5973759 -0.3532827 0.2883149 0.3901715 0.3328793
## Y.upper.CL.0.95 Y.lower.CL.0.95 R0.boot.se R0.upper.CL.0.95
## V1 -0.30455106 -3.780389 0.08994238 0.4766278
## V2 -0.05541741 -2.556452 0.06901664 0.4331237
## V3 0.29675970 -1.275280 0.04941685 0.4007098
## R0.lower.CL.0.95 R1.boot.se R1.upper.CL.0.95 R1.lower.CL.0.95
## V1 0.1188411 0.06911306 0.9368827 0.6720409
## V2 0.1468385 0.07592515 0.7762517 0.4768403
## V3 0.1734875 0.05079824 0.5248493 0.2834909
head(calc_risk(binary.boot, contrast = function(R0, R1) 1 - R1/R0, n.samps = 20), 3)
## S.1 Y R0 R1 Y.boot.se
## V1 -0.97417991 -0.71327775 0.2904830 0.4976780 0.4840781
## V2 -0.11875337 0.05966359 0.2855748 0.2685364 0.1882398
## V3 -0.09236484 0.08009338 0.2854242 0.2625636 0.1820450
## Y.upper.CL.0.95 Y.lower.CL.0.95 R0.boot.se R0.upper.CL.0.95
## V1 0.1395560 -1.6775172 0.05815888 0.4084405
## V2 0.4746753 -0.4357974 0.03903555 0.3909614
## V3 0.4835707 -0.4036444 0.03849822 0.3904263
## R0.lower.CL.0.95 R1.boot.se R1.upper.CL.0.95 R1.lower.CL.0.95
## V1 0.1763127 0.06531273 0.6258814 0.3695242
## V2 0.2043944 0.03260224 0.3454742 0.1923057
## V3 0.2053113 0.03176977 0.3379838 0.1879937
It is easy to plot the risk estimates. By default, the plot method
displays the TE contrast, but this can be changed using the same syntax
as in calc_risk
.
plot(binary.boot, contrast = "TE", lwd = 2)
abline(h = smary$TE.estimates[2], lty = 3)
expit <- function(x) exp(x)/(1 + exp(x))
trueTE <- function(s){
r0 <- expit(-1 - 0 * s)
r1 <- expit(-1 - 1.25 * s)
1 - r1/r0
}
rug(binary.boot$augdata$S.1)
curve(trueTE(x), add = TRUE, col = "red")
legend("bottomright", legend = c("estimated TE", "95\\% CB",
"marginal TE", "true TE"),
col = c("black", "black", "black", "red"),
lty = c(1, 2, 3, 1), lwd = c(2, 2, 1, 1))
By default, plots of psdesign objects with bootstrap samples will
display simultaneous confidence bands for the curve. These bands
for confidence level CI.type = "pointwise"
. These
intervals satisfy
Different summary measures are available for plotting. The options are
“TE” for treatment efficacy = log
option of plot
.
plot(binary.boot, contrast = "logRR", lwd = 2,
col = c("black", "grey75", "grey75"))
plot(binary.boot, contrast = "RR", log = "y", lwd = 2,
col = c("black", "grey75", "grey75"))
plot(binary.boot, contrast = "RD", lwd = 2,
col = c("black", "grey75", "grey75"))
plot(binary.boot, contrast = "risk", lwd = 2, lty = c(1, 0, 0, 2, 0, 0))
legend("topright", legend = c("R0", "R1"), lty = c(1, 2), lwd = 2)
The calc_risk
function is the workhorse that creates the plots. You
can call this function directly to obtain estimates, standard errors,
and confidence intervals for the estimated risk in each treatment arm
and transformations of the risk like TE. The parameter n.samps
determines the number of points at which to calculate the risk. The
points are evenly spaced over the range of S.1. Use this function to
compute other summaries, make plots using
ggplot2 (Wickham 2009)
or lattice (Sarkar 2008)
and more.
te.est <- calc_risk(binary.boot, CI.type = "pointwise", n.samps = 200)
head(te.est, 3)
## S.1 Y R0 R1 Y.boot.se
## V1 -2.328509 -1.770899 0.2983546 0.8267105 1.1943792
## V2 -2.275699 -1.743722 0.2980453 0.8177536 1.1622104
## V3 -1.694556 -1.360959 0.2946547 0.6956675 0.8334835
## Y.lower.CL.2.5 Y.upper.CL.97.5 R0.boot.se R0.lower.CL.2.5
## V1 -4.768551 -0.6276106 0.09125321 0.1460172
## V2 -4.671015 -0.6184582 0.08994238 0.1475266
## V3 -3.456249 -0.4524787 0.07557692 0.1648504
## R0.upper.CL.97.5 R1.boot.se R1.lower.CL.2.5 R1.upper.CL.97.5
## V1 0.4890998 0.06786962 0.6663453 0.9237321
## V2 0.4856823 0.06911306 0.6556765 0.9183559
## V3 0.4513863 0.07721039 0.5287710 0.8307616
We have implemented the core methods for principal surrogate evaluation in our pseval package. Our aim was to create a flexible and consistent user interface that allows for the estimation of a wide variety of statistical models in this framework. There has been some other work in this area. The Surrogate package implements the core methods for the evaluation of trial-level surrogates using a meta-analytic framework. It also has a wide variety of models, each implemented in a different function each with a long list of parameters (Elst et al. 2016).
Our package uses the +
sign to combine function calls into a single
object. This is called “overloading the +
operator” and is most
famously known from the ggplot2 package. Conceptually, this was
appealing to us because it allows users to build up analysis objects
starting from the design, and ending with the estimation. The distinct
analysis concepts of the design, risk model specification, integration
model, and estimation/bootstrap approaches are separated into distinct
function calls, each with a limited number of parameters. This makes it
easier for users to keep track of their models, makes it easier to
understand the methods involved, and allows for the specification of a
wide variety of models by mixing and matching the function calls. This
framework will also make it easier to maintain the codebase, and to
extend it in the future as the methods evolve. Our package is useful for
novice and expert R users alike, and implements an important set of
statistical methods for the first time.
plot(binary.boot, contrast = "TE", lwd = 2, CI.type = "band")
sbs <- calc_risk(binary.boot, CI.type = "pointwise", n.samps = 200)
lines(Y.lower.CL.2.5 ~ S.1, data = sbs, lty = 3, lwd = 2)
lines(Y.upper.CL.97.5 ~ S.1, data = sbs, lty = 3, lwd = 2)
legend("bottomright", lwd = 2, lty = 1:3,
legend = c("estimate", "simultaneous CI", "pointwise CI"))
library(ggplot2)
TE.est <- calc_risk(binary.boot, n.samps = 200)
ggplot(TE.est,
aes(x = S.1, y = Y, ymin = Y.lower.CL.0.95, ymax = Y.upper.CL.0.95)) +
geom_line() + geom_ribbon(alpha = .2) + ylab(attr(TE.est, "Y.function"))
cc.fit <- binary.cc + integrate_parametric(S.1 ~ BIP) +
risk_binary(D = 10) + ps_estimate()
cc.fit
surv.fit <- psdesign(fakedata, Z = Z, Y = Surv(time.obs, event.obs),
S = S.obs, BIP = BIP, CPV = CPV) +
integrate_semiparametric(formula.location = S.1 ~ BIP, formula.scale = S.1 ~ 1) +
risk_exponential(D = 10) + ps_estimate(method = "BFGS") + ps_bootstrap(n.boots = 20)
surv.fit
plot(surv.fit)
fakedata$Y.cont <- log(fakedata$time.obs + 0.01)
cont.fit <- psdesign(fakedata, Z = Z, Y = Y.cont,
S = S.obs, BIP = BIP, CPV = CPV) +
integrate_semiparametric(formula.location = S.1 ~ BIP, formula.scale = S.1 ~ 1) +
risk_continuous(D = 10) + ps_estimate(method = "BFGS") + ps_bootstrap(n.boots = 20)
cont.fit
plot(cont.fit, contrast = "risk")
S.obs.cat
and BIP.cat
are factors:
with(fakedata, table(S.obs.cat, BIP.cat))
cat.fit <- psdesign(fakedata, Z = Z, Y = Y.obs,
S = S.obs.cat, BIP = BIP.cat) +
integrate_nonparametric(formula = S.1 ~ BIP) +
risk_binary(Y ~ S.1 * Z, D = 10, risk = risk.probit) + ps_estimate(method = "BFGS")
cat.fit
plot(cat.fit)
Categorical W allows for estimation of the model using the pseudo-score
method for binary outcomes.
cat.fit.ps <- psdesign(fakedata, Z = Z, Y = Y.obs,
S = S.obs, BIP = BIP.cat) +
integrate_nonparametric(formula = S.1 ~ BIP) +
risk_binary(Y ~ S.1 * Z, D = 10, risk = risk.logit) +
ps_estimate(method = "pseudo-score") +
ps_bootstrap(n.boots = 20, method = "pseudo-score")
summary(cat.fit.ps)
plot(cat.fit.ps)
pseval, survival, survey, ggplot2, lattice, Surrogate
ClinicalTrials, Econometrics, MissingData, OfficialStatistics, Phylogenetics, Spatial, Survival, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sachs & Gabriel, "An Introduction to Principal Surrogate Evaluation with the pseval Package", The R Journal, 2016
BibTeX citation
@article{RJ-2016-046, author = {Sachs, Michael C and Gabriel, Erin E}, title = {An Introduction to Principal Surrogate Evaluation with the pseval Package}, journal = {The R Journal}, year = {2016}, note = {https://rjournal.github.io/}, volume = {8}, issue = {2}, issn = {2073-4859}, pages = {277-292} }