The R package gset calculates equivalence and futility boundaries based on the exact bivariate non-central \(t\) test statistics. It is the first R package that specifically targets the group sequential testing of equivalence hypotheses. The exact test approach adopted by gset neither assumes large-sample normality of the test statistics nor ignores the contribution to the overall Type I error rate from rejecting only one of the two one-sided null hypotheses under a null value. The features of gset include: error spending functions; computation of equivalence boundaries and futility boundaries, either binding or nonbinding; depiction of stagewise boundary plots; and the operating characteristics of a given group sequential design, including the empirical Type I error rate, empirical power, expected sample size, and the probability of stopping at an interim look due to equivalence or futility.
Group sequential tests are repeated significance tests on data accumulated during a study, in contrast to the traditional one-time analysis at the end of the study. Since the same hypothesis is repeatedly tested, it is critical to compute the proper critical values at each interim analysis to keep the overall Type I error rate at a prespecified nominal level. Applied appropriately, group sequential designs (GSDs) can help save resources, shorten study duration, and stop ineffective treatments much earlier than traditional non-sequential designs. Several software applications of GSDs exist, both commercial and open-source, including the PROC SEQDESIGN and PROC SEQTEST procedures in SAS©, EAST© developed by Cytel, as well as the R packages gsDesign (general GSDs and operating characteristics; Anderson (2014)), GroupSeq (GSD via the Type I error spending approach; Pahl (2014)), the ldBand function from Hmisc (GSD via the Lan-DeMets method with a variety of \(\alpha\)-spending functions; Harrell Jr (2014)), ldbands (boundary calculation using the Lan-DeMets \(\alpha\)-spending function approach), PwrGSD (evaluation of interim analysis plans for GSDs on a survival endpoint; construction of efficacy and futility boundaries, and calculation of the power of a GSD; Izmirlian (2014)), AGSDest (parameter estimation in adaptive GSDs; Hack et al. (2013)), and clinfun (design and analysis of clinical trials; Seshan (2014)).
This discussion focuses on GSDs in studies with equivalence hypotheses. Equivalence studies concern “equivalence” between two groups. Since mathematical equivalence is impossible to establish, the concept of “equivalence bounds” is often applied. Denote by \(\theta\) the parameter that represents the dissimilarity in a response variable between two groups; the hypothesis set being tested in equivalence studies can then be expressed as two one-sided hypotheses \(\{H_{10}\!\!: \theta < L \mbox{ vs. } H_{11}\!\!: \theta > L\}\) and \(\{H_{20}\!\!: \theta > U \mbox{ vs. } H_{21}\!\!: \theta < U\}\), where the equivalence bounds \(L\) and \(U\) are located on opposite sides of the value that represents mathematical equivalence, though not necessarily symmetrically. The choice of \((L,U)\) is primarily driven by practical and scientific justifications, or set by regulatory requirements in the case of pharmaceutical studies. Simultaneous rejection of \(H_{10}\) and \(H_{20}\) leads to the claim of equivalence. GSDs with equivalence hypotheses have been discussed in the literature (see Jennison and Turnbull 1989, 1993; Durrleman and Simon 1990; Gould 1995; Hwang 1996; Whitehead 1996; Potvin et al. 2008; Huang and Self 2010; Montague et al. 2012). Many test procedures discussed in the literature are based on the large-sample normality assumption of the test statistics, which can be problematic in sequential tests given that the stagewise sample sizes at early looks can be small. In addition, to facilitate the computation of the critical values, rather than working with the dual test statistics (one for each of the two one-sided hypotheses), which is computationally more demanding, an analytical approximation that ignores the contribution to the Type I error rate from rejecting only one of the two null hypotheses is commonly applied in practice. Though this approach is valid in the sense that it keeps the Type I error rate at or below the nominal level, it can be overly conservative when the Type I error rate or the sample size is small, which can be the very situation that GSDs face in early stages. The exact test procedure formed with the dual \(t\) statistics in the recent work of Liu and Li (2014) overcomes these major methodological shortcomings in the GSDs of equivalence studies.
To the best of our knowledge, there are no software packages or tools that specifically target the GSDs of equivalence studies. Though some existing software applications, which are not designed for GSDs in equivalence studies, can be tricked into doing sequential tests on equivalence hypotheses, the tricking process itself can be confusing and error-prone. Even if the tricking is successful, other potential problems remain. First, many of the procedures are based on the large-sample normality assumption. Second, the futility boundaries are calculated from testing \(H_0\!\!:\theta=0\). Since \(\theta=0\) represents perfect equivalence, rejecting \(\theta=0\) to claim futility is inappropriately conservative and results in an inflated Type II error rate, since any other \(\theta\in(L,U)\) per the equivalence hypothesis also falls within the range of “equivalence”. Third, the existing GSD software packages, because they are not designed for equivalence studies, can only accommodate equivalence bounds that are symmetric about 0, \((-\Delta, \Delta)\), which can be restrictive in some real-life applications.
In this discussion, we introduce the R package gset that fills the software gap in GSDs for studies with equivalence hypotheses. The stagewise equivalence and futility boundaries, either binding or nonbinding, are back-calculated from equations formed with the exact dual \(t\) test statistics. The procedure works for any \(\theta\in(L,U)\) rather than just \(\theta=0\). Furthermore, symmetry of the equivalence bounds \(L\) and \(U\) about 0 is not required. In the rest of the paper, we provide more details on the following topics: the method and computation employed by gset, the functions and outputs of the package, and some examples of applying gset to design GSDs in equivalence studies.
Denote by \(\theta\) the true difference between two groups of normally distributed variables \(Y_1\) and \(Y_2\). The estimate of \(\theta\) is denoted by \(\hat{\theta}\), and its standard error \(\sigma_{\hat{\theta}}\) is estimated by \(s\). If the original data deviate significantly from the normal distribution, an appropriate transformation can be used, and \(\theta\) would then be the difference on the transformed scale. A familiar example is the area under the concentration curve (AUC) and the maximum concentration (Cmax) following a drug administration in bioequivalence studies, which are often modeled by the log-normal distribution; the comparisons between two groups on AUC and Cmax are then performed on the log scale.
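As a brief illustration of this log-scale setup, here is a minimal sketch with simulated, made-up AUC data; the variable names (`auc1`, `theta.hat`, `s`, etc.) are ours, not part of gset:

```r
# Illustration only: AUC modeled as log-normal, so theta is the difference
# in means on the log scale (data simulated, not from a real study)
set.seed(1)
auc1 <- rlnorm(20, meanlog = 1.0, sdlog = 0.4)   # group 1 AUC values
auc2 <- rlnorm(20, meanlog = 1.1, sdlog = 0.4)   # group 2 AUC values
y1 <- log(auc1); y2 <- log(auc2)                 # analyze on the log scale
theta.hat <- mean(y1) - mean(y2)                 # estimate of theta
s <- sqrt(var(y1)/20 + var(y2)/20)               # estimate of sigma_theta.hat
```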
The \(t\) statistics for testing the dual hypotheses \(H_{10}\) and \(H_{20}\) in equivalence studies (in the non-sequential setting) are \(T(L) = (\hat{\theta} - L)s^{-1}\) and \(T(U) = (\hat{\theta} - U)s^{-1}= \frac{ (\hat{\theta}-L)/\sigma_{\hat{\theta}}+ (L-U)/\sigma_{\hat{\theta}}} {(s/\sigma_{\hat{\theta}})}\), respectively. \(\{T(L), T(U)\}\) jointly follow a bivariate non-central \(t\) distribution; marginally, \(T(L)\) follows a central \(t\) distribution and \(T(U)\) follows a non-central \(t\) distribution with non-centrality parameter \((L-U)/\sigma_{\hat{\theta}}\). A decision rule should reject both null hypotheses simultaneously while controlling the overall Type I error rate at a nominal level \(\alpha\). In the sequential setting, the dual test statistics are denoted by \(T_k(L)=(\hat{\theta}_k-L)/s_k\) and \(T_k(U)=(\hat{\theta}_k-U)/s_k\), where \(\hat{\theta}_k\) and \(s_k\) are computed with the data accrued up to stage \(k\) (\(k=1,\ldots, K\), with \(K\) the total number of looks). At an interim look \(k\) (\(k<K\)), there are three options: a) stop the trial and reject both \(H_{10}\) and \(H_{20}\) to claim equivalence; b) stop the trial and “reject” either \(H_{11}\) or \(H_{21}\) to conclude futility; c) move on to the next stage if the test statistics cross neither the equivalence nor the futility boundaries. The rejection region \(\mathcal{R}_k\) in stage \(k\) (to claim equivalence), the acceptance region \(\mathcal{A}_k\) (to conclude futility), and the continuation region \(\mathcal{C}_k\) are:
\[\begin{aligned} \mathcal{R}_k &= \{T_k(L) > c_k^L\} \cap \{T_k(U) < c_k^U\},\label{eq:Rk} \end{aligned} \tag{1} \]
\[\begin{aligned} \mathcal{A}_k &= \{T_k(L) < d_k^L\} \cup \{T_k(U) > d_k^U\},\label{eq:Ak} \end{aligned} \tag{2} \]
\[\begin{aligned} \mathcal{C}_k &= \left(\mathcal{R}_k \cup \mathcal{A}_k\right)^c,\label{eq:Ck} \end{aligned} \tag{3} \]
where \((c_k^L,c_k^U)\) are the equivalence boundaries and \((d_k^L,d_k^U)\) are the futility boundaries. In the last stage \(K\), \(\Pr(\mathcal{C}_K)\) is necessarily 0 so that a final dichotomous decision can be made between rejection and acceptance of non-equivalence.
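The stagewise decision rule can be summarized in a few lines of R; the following is a minimal sketch (the function `decide` and the critical values passed to it are hypothetical, for illustration only):

```r
# Sketch of the stagewise decision rule in Equations (1)-(3): TL and TU are
# the stage-k dual t statistics; (cL, cU) and (dL, dU) are the equivalence
# and futility boundaries at stage k
decide <- function(TL, TU, cL, cU, dL, dU) {
  if (TL > cL && TU < cU)      "stop: claim equivalence"    # region R_k
  else if (TL < dL || TU > dU) "stop: conclude futility"    # region A_k
  else                         "continue to the next look"  # region C_k
}
decide(TL = 2.5, TU = -2.4, cL = 2.2, cU = -2.2, dL = 0.7, dU = -0.7)
# [1] "stop: claim equivalence"
```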
The futility boundaries can be binding or non-binding, depending on whether the action of stopping for futility affects the equivalence boundaries. When the futility boundaries are binding, the decisions to stop for equivalence or futility are “competitive”; that is, the trial must stop as soon as either set of boundaries is crossed, otherwise the overall Type I error rate would be inflated. The equivalence and binding futility boundaries can be determined from the following equation system
\[\begin{aligned} &\Pr(\mathcal{R}_1 \mid \theta = L) = \Pr(\mathcal{R}_1 \mid \theta = U) = \alpha(t_1)\label{eq:c1} \end{aligned} \tag{4} \]
\[\begin{aligned} &\Pr(\mathcal{C}_1 \cap \cdots \cap \mathcal{C}_{k-1} \cap \mathcal{R}_k \mid \theta = L) = \Pr(\mathcal{C}_1 \cap \cdots \cap \mathcal{C}_{k-1} \cap \mathcal{R}_k \mid \theta = U) = \alpha(t_k) - \alpha(t_{k-1}) \label{eq:c2} \end{aligned} \tag{5} \]
\[\begin{aligned} &\Pr(\mathcal{A}_1 \mid \theta \in(L, U)) = \beta(t_1) \label{eq:c3} \end{aligned} \tag{6} \]
\[\begin{aligned} & \Pr(\mathcal{C}_1 \cap \cdots \cap \mathcal{C}_{k-1} \cap \mathcal{A}_k \mid \theta \in(L, U))=\beta(t_k) - \beta(t_{k-1}),\label{eq:c4} \end{aligned} \tag{7} \] where \(k=2,\ldots,K\) in Equations (5) and (7). \(\alpha(t_k)\) and \(\beta(t_k)\) are the cumulative Type I and Type II error rates at stage \(k\), which can be conveniently specified using error spending functions (Lan and DeMets 1983; Hwang et al. 1990). The \(\alpha\)- and \(\beta\)-spending functions do not have to be the same.
In many practical cases, the study sponsors may want the option to continue the trial even when the interim data suggest stopping for futility. The futility boundaries calculated under this circumstance are referred to as “non-binding”: a study with non-binding futility boundaries may, but is not obliged to, stop for futility at an interim look when the futility boundaries are crossed. Non-binding futility provides more flexibility to a study without inflating the Type I error rate. The equivalence and non-binding futility boundaries can be determined from Equations (4), (6), and (7), plus the equation below
\[\label{eq:c2NB} \Pr(\mathcal{R}_1^c \cap \cdots \cap \mathcal{R}_{k-1}^c \cap \mathcal{R}_k \mid \theta = L) = \Pr(\mathcal{R}_1^c \cap \cdots \cap \mathcal{R}_{k-1}^c \cap \mathcal{R}_k \mid \theta = U) = \alpha(t_k) - \alpha(t_{k-1}). \tag{8} \]
In terms of the actual computation of the critical values, gset employs the Monte Carlo (MC) simulation approach to calculate the equivalence and futility boundaries (either non-binding or binding) from the two equation systems given above. Liu and Li (2014) prove that there are some inherent constraints among the critical values, including \(c_k^U + c_k^L=0\) in both the binding and non-binding cases, as well as \(d_k^U+ d_k^L=0\) in the non-binding case. In the binding case, a constraint has to be imposed on \((d_k^L, d_k^U)\) in order to obtain unique solutions for the boundaries, since the number of unknowns exceeds the number of equations. We adopt the probability symmetry constraint as suggested by Liu and Li (2014); that is, the likelihood that equivalence is rejected due to the test on \(H_{10}\) is the same as that due to the test on \(H_{20}\). It turns out that the probability symmetry constraint reduces to the simple constraint \(d_k^L= -d_k^U\), as in the non-binding case, if \((\theta-L)\sigma_{\hat{\theta}}^{-1}\gg1\) and \((U-\theta)\sigma_{\hat{\theta}}^{-1}\gg1\), which holds in many real-life situations. With the appropriate constraints in place, the boundaries in the non-binding case are calculated in two steps: first solve for \((c_k^L, c_k^U)\) as if there were no futility boundaries, and then compute \((d_k^L, d_k^U)\) conditional on the \((c_k^L, c_k^U)\) calculated in the previous step (Cytel 2009). In the binding case, the \(c_k\) and \(d_k\) need to be determined simultaneously: after the first look, the calculation of \((c_k^L, c_k^U)\) is conditional on the values of \((d_k^L, d_k^U)\) computed in the previous step, and vice versa.
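As an illustration of the MC idea, the first-stage rejection probability in Equation (4) can be estimated by simulating the joint distribution of the dual \(t\) statistics described earlier. The sketch below is ours, not code from gset; `pr.reject1` is a hypothetical name, `nu` denotes the degrees of freedom at stage 1, and `se.theta` stands for \(\sigma_{\hat{\theta}}\):

```r
# MC sketch: estimate Pr(R_1 | theta = L) for a candidate boundary c,
# using the constraint c_1^U = -c_1^L; under theta = L, T_1(L) is central t
# and T_1(U) is non-central t with ncp (L - U)/sigma_theta.hat
pr.reject1 <- function(c, L, U, se.theta, nu, n.sim = 1e5) {
  Z  <- rnorm(n.sim)                       # numerator noise
  W  <- sqrt(rchisq(n.sim, nu) / nu)       # s / sigma_theta.hat
  TL <- Z / W                              # T_1(L) given theta = L
  TU <- (Z + (L - U) / se.theta) / W       # T_1(U) given theta = L
  mean(TL > c & TU < -c)                   # estimate of Pr(R_1 | theta = L)
}
# the boundary c_1^L then solves pr.reject1(c) = alpha(t_1), e.g., via a
# root finder over the user-supplied search interval crange
```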
In the last stage, the equivalence and futility boundaries must meet so that a final dichotomous decision on whether equivalence is achieved can be made. If the originally planned \(n\) is not large enough, the two types of boundaries will not meet. One simple approach is to force \(|d_K^L| \equiv |d_K^U|=|c^L_K| \equiv|c^U_K|\), with the side effect of inflating the Type II error rate. A better alternative is to calculate the minimum required \(n\) that guarantees that the equivalence and futility boundaries meet in the last stage. In gset, this minimum required \(n\) is denoted by \(n_{minmax}\), “max” because \(n_{minmax}\) would be the maximum number of subjects used in a study that makes it to the last stage \(K\). In gset, \(n_{minmax}\) is searched for by a simple bisection approach, with a user-supplied search region.
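A minimal sketch of such a bisection search is given below, assuming (as in the text) that the boundaries meet once the sample size is large enough; `bisect.n` and `boundaries.meet` are hypothetical names, not gset functions:

```r
# Bisection sketch for n_minmax: shrink the user-supplied search region
# (lower, upper] until the smallest n whose equivalence and futility
# boundaries meet at the last look is isolated
bisect.n <- function(lower, upper, boundaries.meet) {
  while (upper - lower > 1) {
    mid <- floor((lower + upper) / 2)
    if (boundaries.meet(mid)) upper <- mid else lower <- mid
  }
  upper   # smallest n in the region for which the boundaries meet
}
```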
The package gset contains eight functions (Table 1). Among them, four compute the equivalence and futility boundaries (`equivonly`, `nonbinding`, `binding`, `nminmax`). The futility critical values calculated by `nonbinding` and `binding` in the last stage are, by default, forced to equal the equivalence critical values so that a dichotomous decision on equivalence can be made in the last stage; users can set the argument `force = FALSE` to lift this constraint. The sample size at which the futility and equivalence critical values naturally coincide in the last stage is calculated by the function `nminmax`, with the nominal Type I and II error rates maintained; in other words, with \(n_{minmax}\), the futility and equivalence boundaries agree naturally in the last stage. Different sample sizes affect only the futility boundaries, not the equivalence boundaries. One function (`nfix`) computes the sample size of an equivalence study in the traditional non-sequential setting. Two functions generate the stagewise boundary plots (`figureE`, `figureEF`); and one function (`oc`) examines the operating characteristics of a given GSD in equivalence studies, including the empirical Type I error rate, empirical power, expected sample size, and the probabilities of stopping at the interim looks due to equivalence or futility.
Function | Description
---|---
`equivonly` | computes equivalence boundaries for GSDs in equivalence studies that only stop for equivalence
`nonbinding` | computes equivalence and non-binding futility boundaries for GSDs in equivalence studies
`binding` | computes equivalence and binding futility boundaries for GSDs in equivalence studies
`nminmax` | calculates the sample size of an equivalence study in the sequential setting, as well as the equivalence and futility boundaries (either binding or non-binding) under the calculated sample size
`nfix` | calculates the sample size of an equivalence study in the traditional non-sequential setting
`figureE` | generates the stagewise equivalence boundary plots
`figureEF` | generates the stagewise equivalence and futility boundary plots
`oc` | examines the operating characteristics of a given GSD in equivalence studies

Table 1: The functions in package gset.
Table 2 lists the arguments to be supplied by users for calculating equivalence and futility boundaries. To calculate the sample size of an equivalence study in the traditional non-sequential setting via `nfix`, besides `l`, `u`, `theta`, `sigma`, `type1`, and `type2` as listed in Table 2, users also need to supply the sampling ratio between the two groups, `r`, and a 2-dimensional vector containing the end-points of the interval from which the sample size will be solved, `nrange`; the default is `c(0, 1000)`. To calculate the sample size of an equivalence study in the GSD setting via `nminmax`, besides `l`, `u`, `theta`, `sigma`, `t.vec`, `type1`, `type2`, `gamma`, and `n.sim` as given in Table 2, users also need to specify a logical argument `binding` (whether the futility boundaries are binding) and `n1.lower`, `n1.upper`, `n2.lower`, `n2.upper`, which represent the lower and upper bounds of the intervals from which \(n_{minmax}\) in groups 1 and 2 will be solved using a bisection method. The boundary plots are generated directly from the functions `equivonly`, `nonbinding`, and `binding` by default (users can suppress the plots by specifying `plot = FALSE`), or they can be generated by the functions `figureE` and `figureEF`, which take the boundary outputs from `equivonly`, `nonbinding`, `binding`, and `nminmax` as their input.
Argument | Description
---|---
`l` | the lower equivalence bound as given in the equivalence hypothesis
`u` | the upper equivalence bound as given in the equivalence hypothesis
`theta` | the true mean difference between the 2 groups
`sigma` | between-subject standard deviation of the response variable \(y\) for two independent samples; within-subject standard deviation of \(y\) for paired samples
`n1` | size of group 1
`n2` | size of group 2
`t.vec` | cumulative interim look time points assuming a constant accrual rate. For example, if a study has 4 equally spaced looks, then `t.vec = 1:4/4`. `t.vec` can be any vector as long as it is monotonically increasing and the last element is 1
`type1` | overall Type I error rate
`type2` | overall Type II error rate
`gamma` | the gamma parameter in the gamma cumulative error spending function. `gamma` is a scalar for `equivonly`, and a 2-dimensional vector for `binding`, `nonbinding`, and `nminmax`; it can be any value; the default is \(-4\), which produces an O’Brien-Fleming type error spending function
`crange` | a 2-dimensional vector containing the end-points of the interval from which the equivalence boundaries will be solved; the default is `c(-10, 10)`
`drange` | a 2-dimensional vector containing the end-points of the interval from which the futility boundaries will be solved; the default is `c(-10, 10)`
`force` | whether to force the futility boundaries to equal the equivalence boundaries at the last look; the default is `force = TRUE`
`plot` | whether to generate the boundary plot; the default is `plot = TRUE`
`ll` | a parameter in the boundary plot; the short arm of the \(t(L)\) and \(t(U)\) axes
`ul` | a parameter in the boundary plot; the long arm of the \(t(L)\) and \(t(U)\) axes
`n.sim` | number of randomly simulated samples in the MC computation of the boundaries; the default is `n.sim` \(=10^5\)
`seed` | seed used in the MC computation; if not specified, the seed is set randomly

Table 2: The user-supplied arguments of the gset functions.
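To see how the arguments in Table 2 fit together, here is a sketch of a typical calling sequence; the argument values are illustrative, and the positional order mirrors the calls used in the examples below:

```r
library(gset)

#### non-sequential sample size; positional order as in the examples below:
#### r, l, u, theta, sigma, type1, type2
n.fix <- nfix(1, -0.2, 0.2, 0, 0.4, 0.05, 0.2)

#### sequential sample size n_minmax plus boundaries; positional order:
#### l, u, theta, sigma, n1, n2, t.vec, type1, type2 (add binding = TRUE
#### for binding futility)
bound <- nminmax(-0.2, 0.2, 0, 0.4, n.fix$n1, n.fix$n2, 1:4/4, 0.05, 0.2)
```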
As for the error spending functions employed by gset, the gamma error spending function introduced by Hwang et al. (1990) is used for both the Type I and Type II error rates. The function, using the Type I error rate \(\alpha\) as an example, is \(f(t)=\alpha(1-e^{-t\gamma})(1-e^{-\gamma})^{-1}\) if \(\gamma \ne0\), or \(\alpha t\) if \(\gamma=0\). The values of \(\gamma\) can differ between the \(\alpha\)- and \(\beta\)-spending. The error spending function is versatile and yields a wide range of spending patterns by varying \(\gamma\) on a continuous scale. For example, \(\gamma =-4\) corresponds to more conservative spending at the early stages that gradually becomes more liberal toward the end (producing O’Brien-Fleming-like boundaries), and \(\gamma =1\) corresponds to more spending at the early stages than at the later stages (producing Pocock-like boundaries).
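As a quick check, the spending function is straightforward to evaluate directly; with \(\gamma=-4\) and \(\alpha=0.05\) it reproduces the stagewise Type I error spending reported in the examples below (the function name `err.spend` is ours, not a gset function):

```r
# Gamma cumulative error spending function of Hwang et al. (1990)
err.spend <- function(t, err, gamma) {
  if (gamma == 0) err * t
  else err * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}
err.spend(1:4/4, err = 0.05, gamma = -4)
# [1] 0.001602930 0.005960146 0.017804287 0.050000000
```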
In this section, we illustrate the implementation of the package gset with three examples. In all examples, the base design is a complete crossover design, and there are 4 looks in each GSD. The equivalence bounds in the hypothesis are \((L, U)=(-0.2, 0.2)\), the assumed true difference is \(\theta=0\), the within-subject standard deviation is \(\sigma=0.4\), and the overall Type I and Type II error rates are \(0.05\) and \(0.2\), respectively. The O’Brien-Fleming type error spending function is used for both the Type I and Type II error rates (\(\gamma = -4\)), which is the default. Since the critical values and the sample size in gset are calculated via MC simulation, there will be some MC error in the results; in other words, the results will be slightly different when re-running the commands. To decrease the MC error, the number of simulations can be increased; the default `n.sim` is \(10^5\). Users can fix the random seed using `set.seed` to reproduce their results across different runs.
```r
library(gset)

#### specify the parameters
L     <- -0.2
U     <- 0.2
theta <- 0
sigma <- 0.4
alpha <- 0.05
beta  <- 0.2
K     <- 4
r     <- 1

#### sample size in the non-sequential setting
n.fix <- nfix(r, L, U, theta, sigma, alpha, beta)
```
The output is given below. The sample size is 69 for a non-sequential crossover design.
```
$n1
[1] 69

$n2
[1] 69
```
Example 1: If a study considers only stopping for equivalence, then the following command computes the equivalence boundaries and generates the boundary plots.
```r
bound1 <- equivonly(L, U, sigma, n.fix$n1, n.fix$n2, 1:K/K, alpha, beta)
#### the boundary plot can be regenerated by using figureE(bound1, K)
```
The output is given below. It contains the cumulative Type I error spending and the equivalence critical values at each look. If \(T(L)>1.858\) and \(T(U)<-1.858\) at the first interim look of the GSD, the study stops for equivalence; otherwise, the study continues. The critical values at the other looks are used similarly. The boundary plots are given in Figure 1.
```
$typeI
[1] 0.001602930 0.005960146 0.017804287 0.050000000

$equivL
[1] 1.857541 2.151003 2.212045 1.739736

$equivU
[1] -1.857541 -2.151003 -2.212045 -1.739736
```
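For instance, the first-look rule can be checked against interim estimates as follows; this is a sketch in which the values of `theta.hat` and `s1` are made up for illustration:

```r
# Applying the look-1 rule with the boundaries stored in bound1
theta.hat <- 0.02                               # hypothetical interim estimate
s1 <- 0.05                                      # hypothetical estimated SE
TL <- (theta.hat - L) / s1                      # T_1(L) = 4.4
TU <- (theta.hat - U) / s1                      # T_1(U) = -3.6
TL > bound1$equivL[1] & TU < bound1$equivU[1]   # TRUE: stop for equivalence
```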
The operating characteristics of the GSD with the calculated equivalence boundaries can be investigated by applying the commands `oc(L, U, theta = L, sigma, K, 69, 69, bound1, futility = FALSE)` (under \(H_0\)) and `oc(L, U, theta = 0, sigma, K, 69, 69, bound1, futility = FALSE)` (under \(H_1\)). The outputs are not provided due to space limitations.
Example 2: If a study considers stopping for equivalence and futility but would like the flexibility to continue even if the futility boundaries are crossed at an interim look, then the following commands can be used to obtain the critical values for a GSD with non-binding futility. By default, the futility boundaries in the last stage are forced to equal the equivalence boundaries; users can use the argument `force = FALSE` to remove the constraint.
```r
bound2 <- nonbinding(L, U, theta, sigma, n.fix$n1, n.fix$n2, 1:K/K, alpha, beta)
### the boundary plot can be regenerated by using figureEF(bound2, K)
```
The output is given below. It contains the cumulative error spending and the equivalence and non-binding futility boundaries at each look. Note that the equivalence critical values with non-binding futility should, in theory, be the same as the equivalence critical values in a study with equivalence boundaries only. Comparing with the results from `equivonly` above, we can see that the critical values are close but not exactly the same due to MC error; if the same random seed had been used, the results would have been identical (see the sketch after the output below).
```
$typeI
[1] 0.001602930 0.005960146 0.017804287 0.050000000

$typeII
[1] 0.006411721 0.023840584 0.071217148 0.200000000

$equivL
[1] 1.829343 2.160757 2.197812 1.721998

$equivU
[1] -1.829343 -2.160757 -2.197812 -1.721998

$futilL
[1] -1.2607491 -0.2314749 0.6712118 1.7219983

$futilU
[1] 1.2607491 0.2314749 -0.6712118 -1.7219983
```
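As noted above, fixing the seed before each call would align the MC draws; the sketch below assumes that the two functions consume the random numbers in the same way:

```r
# Sketch: with a common seed, the equivalence boundaries from equivonly and
# nonbinding should agree up to the shared MC draws (assumption, see text)
set.seed(2014)
bound1 <- equivonly(L, U, sigma, n.fix$n1, n.fix$n2, 1:K/K, alpha, beta)
set.seed(2014)
bound2 <- nonbinding(L, U, theta, sigma, n.fix$n1, n.fix$n2, 1:K/K, alpha, beta)
```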
The boundary plots are given in Figure 2. Since \(T(L)>T(U)\), the region above the identity line \(T(U)=T(L)\) is impossible. In this particular example, the futility critical values at the first and second interim looks appear above the identity line (in the second quadrant, specifically), implying that the trial cannot stop for futility at the first two looks.
The operating characteristics of the GSD with the calculated equivalence and futility boundaries can be investigated by applying the commands `oc(L, U, theta = L, sigma, K, 69, 69, bound2, futility = TRUE)` (under \(H_0\)) and `oc(L, U, theta = 0, sigma, K, 69, 69, bound2, futility = TRUE)` (under \(H_1\)). The outputs are given in Table 3. The empirical Type I error rate (0.048) is maintained at the nominal level (5%). At \(n=69\), the study is slightly underpowered (empirical power = 78.7%, below the desired level of 80%). The expected sample size under \(H_1\) is smaller than, but close to, that under \(H_0\), and both are smaller than the fixed sample size \(n=69\). The probability of stopping due to equivalence at look 3 when \(H_1\) is true is sizable (25.4%), but the corresponding probabilities at the other interim looks are small. This is partly due to the conservative spending of the Type I error rate in the early stages of the O’Brien-Fleming type error spending function, which makes it harder to reject early, especially given the slight underpowering.
Table 3: The operating characteristics of the GSD with non-binding futility in Example 2, under \(H_0\) and under \(H_1\).
Under \(H_0\):

```
$reject.rate
[1] 0.0479

$accept.rate
[1] 0.9521

$En1
[1] 66.9

$En2
[1] 66.9

$prob.stop
[1] 0.008025 0.022650 0.055450 0.913875

$prob.stopE
[1] 0.002075 0.004375 0.010900 0.030550

$prob.stopF
[1] 0.005950 0.018275 0.044550 0.883325
```
Under \(H_1\):

```
$reject.rate
[1] 0.787275

$accept.rate
[1] 0.223075

$En1
[1] 62.3

$En2
[1] 62.3

$prob.stop
[1] 0.011450 0.052500 0.299125 0.636925

$prob.stopE
[1] 0.005325 0.034550 0.253925 0.493475

$prob.stopF
[1] 0.006125 0.017950 0.045200 0.153800
```
To calculate the sample size that yields the desired power (80%), the following command can be used. The equivalence boundaries should remain the same (except for some MC numerical error) as those under \(n=69\), but the futility boundaries would change.
```r
bound3 <- nminmax(L, U, theta, sigma, 69, 69, 1:K/K, alpha, beta)
```
The outputs are given below.
```
$n1minmax
[1] 75

$n2minmax
[1] 75

$typeI
[1] 0.001602930 0.005960146 0.017804287 0.050000000

$typeII
[1] 0.006411721 0.023840584 0.071217148 0.200000000

$equivL
[1] 1.852941 2.193359 2.208099 1.729116

$equivU
[1] -1.852941 -2.193359 -2.208099 -1.729116

$futilL
[1] -1.2006119 -0.1403367 0.8067496 1.7291157

$futilU
[1] 1.2006119 0.1403367 -0.8067496 -1.7291157
```
The new sample size is 75. The boundary plot, which can be obtained using `figureEF(bound3, K)`, is similar to Figure 2 and not provided due to space limitations. The operating characteristics of the GSD with \(n=75\) can also be investigated by applying the commands `oc(L, U, theta = L, sigma, K, 75, 75, bound3, futility = FALSE)` (under \(H_0\)) and `oc(L, U, theta = 0, sigma, K, 75, 75, bound3, futility = FALSE)` (under \(H_1\)). Due to space limitations, the output is not shown. The empirical power with \(n=75\) increases to 82.7%. The probabilities of stopping due to equivalence in the early stages also increase under \(H_1\). The empirical Type I error rate remains controlled at 0.05 under \(H_0\).
Example 3: If a study plans to stop whenever the equivalence or the futility boundaries are crossed at an interim look, then the following command can be used. By default, the futility boundaries in the last stage are forced to equal the equivalence boundaries; users can use the argument `force = FALSE` to remove the constraint.
```r
bound4 <- binding(L, U, theta, sigma, n.fix$n1, n.fix$n2, 1:K/K, alpha, beta)
### the boundary plot can be regenerated by using figureEF(bound4, K)
```
The output is given below. The equivalence critical values with binding futility are different from those in studies with non-binding futility (Example 2). The boundary plots are given in Figure 3. The futility critical values at the first and second interim looks again appear above the identity line, so the trial cannot stop for futility at the first two looks.
```
$typeI
[1] 0.001602930 0.005960146 0.017804287 0.050000000

$typeII
[1] 0.006411721 0.023840584 0.071217148 0.200000000

$equivL
[1] 1.806721 2.155557 2.220632 1.730838

$equivU
[1] -1.806721 -2.155557 -2.220632 -1.730838

$futilL
[1] -1.2525572 -0.2457909 0.6858693 1.7308381

$futilU
[1] 1.2525572 0.2457909 -0.6858693 -1.7308381
```
The operating characteristics of the GSD with the equivalence and binding futility boundaries can be investigated by applying the commands `oc(L, U, theta = L, sigma, K, 69, 69, bound4, futility = TRUE, binding = TRUE)` (under \(H_0\)) and `oc(L, U, theta = 0, sigma, K, 69, 69, bound4, futility = TRUE, binding = TRUE)` (under \(H_1\)). The output is given in Table 4. The empirical Type I error rate (0.049) is maintained at the nominal level (5%). At \(n=69\), the GSD is somewhat underpowered (empirical power = 76.8%, below the nominal level of 80%). It is interesting to note that the expected sample size is much smaller when \(H_0\) is true than when \(H_1\) is true, exactly the opposite of the non-binding case, where the difference runs the other way and is not as dramatic. This is because the study must stop once the futility boundaries are crossed, which happens with high probability under \(H_0\).
Table 4: The operating characteristics of the GSD with binding futility in Example 3, under \(H_0\) and under \(H_1\).
Under \(H_0\):

```
$reject.rate
[1] 0.0491

$accept.rate
[1] 0.9509

$En1
[1] 46.7

$En2
[1] 46.7

$prob.stop
[1] 0.114075 0.306725 0.354850 0.224350

$prob.stopE
[1] 0.002075 0.004200 0.011050 0.031775

$prob.stopF
[1] 0.112000 0.302525 0.343800 0.192575
```
Under \(H_1\):

```
$reject.rate
[1] 0.76775

$accept.rate
[1] 0.23225

$En1
[1] 61.9

$En2
[1] 61.9

$prob.stop
[1] 0.011675 0.052475 0.275500 0.660350

$prob.stopE
[1] 0.005325 0.036075 0.227675 0.498675

$prob.stopF
[1] 0.006350 0.016400 0.047825 0.161675
```
To calculate the sample size that yields the desired power (80%) in Example 3, the following command can be used. The equivalence boundaries should remain the same (except for some MC numerical error) as those under \(n=69\), but the futility boundaries would change.
```r
bound5 <- nminmax(L, U, theta, sigma, 69, 69, 1:K/K, alpha, beta, binding = TRUE)
```
The outputs are given below. The new sample size is 73. The boundary plot, which can be obtained using `figureEF(bound5, K)`, is not provided due to space limitations. The operating characteristics of the GSD with \(n=73\) can be investigated by applying the commands `oc(L, U, theta = L, sigma, K, 73, 73, bound5, futility = FALSE, binding = TRUE)` (under \(H_0\)) and `oc(L, U, theta = 0, sigma, K, 73, 73, bound5, futility = FALSE, binding = TRUE)` (under \(H_1\)). Due to space limitations, the output is not shown. The empirical power with \(n=73\) increases to 80.3%.
```
$n1minmax
[1] 73

$n2minmax
[1] 73

$typeI
[1] 0.001602930 0.005960146 0.017804287 0.050000000

$typeII
[1] 0.006411721 0.023840584 0.071217148 0.200000000

$equivL
[1] 1.881127 2.187860 2.217254 1.716097

$equivU
[1] -1.881127 -2.187860 -2.217254 -1.716097

$futilL
[1] -1.2035253 -0.1731546 0.7543735 1.7160975

$futilU
[1] 1.2035253 0.1731546 -0.7543735 -1.7160975
```
We have introduced the R package gset, which computes the stagewise critical values and sample size for testing equivalence hypotheses in GSDs. We outlined the underlying theory and the computational approach that gset is based on, and illustrated the usage of the package with several GSD examples. gset can compute the critical values for GSDs that stop only for equivalence, or that stop for equivalence and futility, either binding or non-binding. It also produces 2-dimensional boundary plots that give a direct visualization of the stagewise stopping boundaries. The operating characteristics of a proposed GSD can be examined in gset via the computation of the empirical Type I error rate, empirical power, stopping probabilities at the interim looks, and expected sample sizes.
gset is the first R package that specifically targets GSDs with equivalence hypotheses. Furthermore, it is based on the exact bivariate \(t\) test statistics, making it a fitting choice for GSDs with small stagewise sample sizes and small stagewise Type I error rates. gset is based on the traditional GSD framework; it does not accommodate sample size re-estimation during a study as interim data become available, a feature of “flexible” GSDs. gset assumes that the analyzed variables follow a normal distribution; if there is severe deviation from normality, a transformation can be applied before applying the package.