Besides the type 1 and type 2 error rates and the clinically relevant effect size, the sample size of a clinical trial depends on so-called nuisance parameters, whose concrete values are usually unknown when a clinical trial is planned. When the uncertainty about the magnitude of these parameters is high, an internal pilot study design with a blinded sample size recalculation can be used to achieve the target power even when the initially assumed value for the nuisance parameter is wrong. In this paper, we present the R-package blindrecalc that helps with planning a clinical trial with such a design by computing the operating characteristics and the distribution of the total sample size under different true values of the nuisance parameter. We implemented methods for continuous and binary outcomes in the superiority and the non-inferiority setting.
Determining the sample size that is necessary to achieve a certain
target power is a fundamental step in the planning phase of every
clinical trial. The sample size depends on the type 1 error rate
Consider, as an example, the meta-analysis by Nakata et al. (2018) that compares minimally invasive preservation with splenectomy during distal pancreatectomy. Among other outcomes, the overall morbidity of the two groups is compared, and data from 13 studies are reported (cf. Figure 2(c) in Nakata et al. (2018)). Across these 13 studies, the overall morbidity rates pooled over both groups range from 0.10 to 0.62. This illustrates the high uncertainty about the “true” overall morbidity rate, which is the nuisance parameter in this setting.
In these cases, an internal pilot study design with blinded sample size recalculation can be used. In such a design, the nuisance parameter is estimated in a blinded way (i.e., without using information about the group assignment of the patients) once outcome data from a certain number of patients are available, and the sample size is recalculated using this information (Wittes and Brittain 1990). While in principle blinded sample size recalculation could be done without any a priori sample size calculation, it is still advisable to calculate an initial sample size based on the best guess for the nuisance parameter available in the planning phase and to determine when to recalculate the sample size based on this initial calculation. This avoids conducting the recalculation too early (so that there is still great uncertainty about the magnitude of the nuisance parameter when the recalculation is performed) or too late (so that there may be no room left for adjusting the sample size because the recalculated sample size is already exceeded). Using this method to recalculate the sample size is an attractive option because the cost in terms of additional sample size is very small (depending on the outcome) and in most scenarios the type 1 error rate is unaffected by the blinded sample size recalculation. Hence, whenever there is uncertainty about the value of a nuisance parameter and the logistics of the trial allow it, blinded sample size recalculation can be used. This is by now even recommended by regulatory authorities. For instance, the Committee for Medicinal Products for Human Use (CHMP) (2006) states that “(w)henever possible, methods for blinded sample size reassessment (…) should be used”.
Methods to reassess the sample size in a blinded manner in an internal pilot study design have been developed for a variety of outcomes. Based on the early work by Stein (1945), Wittes and Brittain (1990) introduced the internal pilot study design for continuous outcomes. Their work was extended in different manners by different authors (cf. among others Birkett and Day (1994), Denne and Jennison (1999), and Kieser and Friede (2000)). In all these papers, the main task is to re-estimate the variance of a continuous outcome in a blinded way. These ideas can be applied to binary outcomes as well where the re-estimated nuisance parameter is the overall response rate over both treatment arms. Associated methods were, for instance, presented by Gould (1992) and Friede and Kieser (2004) for superiority trials and by Friede et al. (2007) for non-inferiority trials.
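The two blinded estimates mentioned above can be illustrated with a few lines of base R. The following is a minimal sketch with simulated data, not code from blindrecalc: the first-stage size, the standard deviation, the mean difference, and the response rate are illustrative assumptions, and the helper `blinded_var` is hypothetical. For balanced allocation, the one-sample variance of the pooled data has expectation sigma^2 + n1 * delta^2 / (4 * (n1 - 1)), which the simulation is meant to approximate.

```r
# Simulated first stage of a two-arm trial with normal outcomes.
# All numbers are illustrative assumptions, not values from the paper.
set.seed(1)
n1    <- 40    # total first-stage sample size (balanced allocation)
sigma <- 1     # true within-group standard deviation
delta <- 0.5   # true mean difference between the groups

# Blinded (one-sample) variance estimate: group labels are never used.
blinded_var <- function() {
  x <- c(rnorm(n1 / 2, mean = 0, sd = sigma),
         rnorm(n1 / 2, mean = delta, sd = sigma))
  var(x)
}

est  <- replicate(5000, blinded_var())
# Expectation of the one-sample variance under balanced allocation
theo <- sigma^2 + n1 * delta^2 / (4 * (n1 - 1))
c(simulated = mean(est), theoretical = theo, true_variance = sigma^2)

# Binary outcome: the blinded estimate is the pooled response rate.
y     <- rbinom(n1, size = 1, prob = 0.3)
p_hat <- mean(y)
```

The simulated mean should lie slightly above the true variance, because the pooled data also contain the between-group difference.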
However, despite the clear benefit of a blinded sample size
recalculation and a great number of publications on that topic,
blindrecalc is, to the best of the authors' knowledge, the first
R-package on CRAN, and thus freely available software, that helps with
the planning of a clinical trial with such a design by computing the
operating characteristics and the distribution of the total sample size
of the study. The package can be used for pre-planned and midcourse implemented
blinded sample size reassessments in order to evaluate the potential
scenarios the blinded sample size re-estimation may imply. For
continuous outcomes, we implemented the
The structure of the paper is as follows: In the Statistical methods section, we explain the general procedure for conducting a trial with an internal pilot study and how to obtain a blinded estimate of the nuisance parameter for continuous and binary outcomes. The structure of the package is introduced in Package structure. We demonstrate how blindrecalc can be utilized to plan a trial with an internal pilot study design and blinded sample size recalculation in Usage and example. In Development principles, we outline the principles of the development process and how we ensure the quality of our code. Finally, a brief Conclusion completes this paper.
The general procedure for planning and conducting a trial with a blinded
sample size recalculation is as follows: At first, an initial sample
In the following, we briefly introduce the implemented tests and how to obtain a blinded estimate of the nuisance parameter in each case.
Assume a clinical two-arm trial with normally distributed outcomes where
a higher value is deemed to be favorable, with mean values
In this framework, the nuisance parameter is the unknown variance
In the superiority case, i.e., if
Interestingly, the cost of this procedure in terms of sample size is
quite low. Since the one-sample variance estimate slightly overestimates
the variance, an increase in sample size arises. However, this increase
amounts to only 8 patients with
Lu (2016) gives closed formulas for the exact distribution of the test
statistic of the two-sample
In a superiority trial with binary outcomes where a higher response
probability is assumed to be favorable, the one-sided null and
alternative hypothesis are
It is well known that the chi-squared test in a fixed design does not maintain the nominal significance level, hence the same can be expected for a chi-squared test with a blinded sample size recalculation. In fact, Friede and Kieser (2004) showed that the actual levels of the test with and without recalculating the sample size are very close.
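The actual level of the test in a fixed design can be computed exactly by enumerating all possible outcomes. The sketch below does this in base R for the one-sided z-version of the chi-squared test with pooled variance estimate and equal group sizes. It is an illustration under these assumptions, not the implementation used in blindrecalc; the function name `actual_level` and the chosen inputs are hypothetical.

```r
# Exact type 1 error rate of the one-sided chi-squared (pooled z) test in a
# fixed design with n_group patients per arm and true rate p in both arms.
actual_level <- function(n_group, p, alpha = 0.025) {
  x    <- 0:n_group                     # possible numbers of responders per arm
  prob <- dbinom(x, n_group, p)         # their probabilities under H0
  # All outcome combinations (rows: experimental arm, columns: control arm)
  diff   <- outer(x, x, "-") / n_group        # difference of observed rates
  pooled <- outer(x, x, "+") / (2 * n_group)  # pooled observed rate
  se <- sqrt(pooled * (1 - pooled) * 2 / n_group)
  z  <- ifelse(se > 0, diff / se, 0)    # no rejection if all outcomes are equal
  reject <- z >= qnorm(1 - alpha)
  sum(outer(prob, prob)[reject])        # total probability of the rejection region
}

# Actual level for several true response rates
sapply(c(0.1, 0.3, 0.5), actual_level, n_group = 62)
```

Comparing the returned values with `alpha` shows for which response rates the nominal level is exceeded in this setting.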
In a non-inferiority trial, the null and alternative hypothesis are
Like the chi-squared test, the Farrington-Manning test is not an exact
test and can exceed the nominal significance level. Friede et al. (2007) showed
that in general no further inflation of the type 1 error rate is caused
by blinded re-estimation of the sample size. Nevertheless, it is
possible for the chi-squared test as well as for the Farrington-Manning
test to choose the nominal significance level smaller than
When a clinical trial with an internal pilot study is planned, it is
essential to know the characteristics of the applied design. To this
end, the performance in terms of achieved power levels, type 1 error
rates, and sample size distribution has to be known for different values
of the nuisance parameter and the first-stage sample size
blindrecalc makes use of R’s S4 class system. This allows the application of the same methods for different design classes and facilitates the usage of the package. Furthermore, this approach makes the package easily extendable without any changes in the current source code.
The usage of blindrecalc is intended to be as intuitive as possible.
To obtain characteristics of a blinded sample size recalculation
procedure, two steps have to be taken. First, the user has to define a
design object to indicate which test and which characteristics, such as
the desired type 1 and type 2 error rates, are to be applied. To this
end, three setup functions exist, among them setupChiSquare and
setupFarringtonManning, each of which defines a design object of the
class corresponding to the respective test.
Secondly, the trial characteristic of interest can be calculated.
Currently, the following methods are implemented: The method toer
allows the computation of the actual type 1 error rate for different
values of the nuisance parameter and the sample size of the internal
pilot study. By means of adjusted_alpha, the adjusted significance
level can be calculated that can be applied as nominal significance
level when strict type 1 error rate control is desired. The method pow
computes the achieved power of the design under a given set of nuisance
parameters or internal pilot sample sizes. With n_fix, the sample size
of the corresponding fixed design can be computed. Finally, the method
n_dist provides plots and summaries of the distribution of the sample
size. For all these methods (except for n_dist), the logical parameter
recalculation specifies whether a fixed design or a design with
blinded sample size recalculation is analyzed.
For each test, there is a setup function (e.g., setupChiSquare for
the chi-squared test) that creates an object of the class of the test.
Each setup function takes the same arguments:
alpha: The one-sided type 1 error rate.
beta: The type 2 error rate.
r: The allocation ratio between experimental and control group (a default is provided).
delta: The difference in effect size between alternative and null.
n_max: The maximal total sample size (a default is provided).
A further argument specifies whether the alternative hypothesis contains
greater (default) or smaller values than the null.
In this example, the nuisance parameter is the overall response rate.
design <- setupChiSquare(alpha = 0.025, beta = 0.2, delta = 0.2)
The sample size for a fixed design given one or multiple values of the
nuisance parameter (argument nuisance) can then be calculated with the
function n_fix:
n_fix(design, nuisance = c(0.2, 0.3, 0.4, 0.5))
#> [1] 124 164 186 194
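For intuition, these numbers can be approximated without the package. The following base R sketch assumes a standard approximate sample size formula for the one-sided chi-squared test with equal allocation, in which the group rates are taken as the overall rate plus and minus half the effect. The function `n_fix_sketch` is hypothetical and is not the package's implementation; for the inputs used above it happens to return the same values.

```r
# Approximate total sample size of the fixed design for a one-sided
# chi-squared test with 1:1 allocation (assumed formula, see lead-in).
n_fix_sketch <- function(p, delta = 0.2, alpha = 0.025, beta = 0.2) {
  p_e <- p + delta / 2   # assumed response rate in the experimental group
  p_c <- p - delta / 2   # assumed response rate in the control group
  n_group <- (qnorm(1 - alpha) * sqrt(2 * p * (1 - p)) +
                qnorm(1 - beta) * sqrt(p_e * (1 - p_e) + p_c * (1 - p_c)))^2 /
    delta^2
  2 * ceiling(n_group)   # round up per group, then double
}

n_fix_sketch(c(0.2, 0.3, 0.4, 0.5))
#> [1] 124 164 186 194
```

The required sample size grows as the overall response rate approaches 0.5, where the variance of a binary outcome is largest.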
The function toer calculates the actual level of a design with blinded
sample size recalculation or of a fixed design (logical argument
recalculation) given either one or more values of the total sample
size in a fixed design or of the first-stage sample size in a
recalculation design (argument n1), or one or more values of the
nuisance parameter. Note that all functions are vectorized in only one
of the two arguments n1 and nuisance. In this example, it is assumed
that the internal pilot study contains half of the fixed sample size
that would be needed if the overall response rate were 0.2:
n <- n_fix(design, nuisance = 0.2)
p <- seq(0.1, 0.9, by = 0.01)
toer_fix <- toer(design, n1 = n, nuisance = p, recalculation = FALSE)
toer_ips <- toer(design, n1 = n/2, nuisance = p, recalculation = TRUE)
In Figure 1, the type 1 error rate as a function of the
nuisance parameter is depicted for the designs with and without sample
size recalculation. Note that, as mentioned in Section Binary
outcomes, the level of significance exceeds the pre-defined level.
The function adjusted_alpha can be used to calculate an adjusted
significance level, such that the nominal significance level is
preserved.
adj_sig <- adjusted_alpha(design, n1 = n/2, nuisance = p, precision = 0.0001,
recalculation = TRUE)
design@alpha <- adj_sig
toer_adj <- toer(design, n1 = n/2, nuisance = p, recalculation = TRUE)
In this example, the adjusted significance level equals 0.0232 for the
trial with internal pilot study, i.e., using this value as nominal level
ensures that the actual significance level does not exceed the
pre-defined level.
In the setting of binary outcomes, adjusting the level such that the
nominal type 1 error rate is protected for any realization of the
nuisance parameter in its domain gamma
To calculate the power of either the internal pilot study design or the
fixed design, the function pow can be used. Again, the function is
vectorized in either n1 or nuisance. This function can be used to
compare the power values of the two designs under different actual
values of the nuisance parameter.
pow_fix <- pow(design, n1 = n, nuisance = p, recalculation = FALSE)
pow_ips <- pow(design, n1 = n/2, nuisance = p, recalculation = TRUE)
As we can see in Figure 2, the power achieved by the internal pilot study design is very close to the target power of 0.8 in most cases. Only when the overall response rate is very close to 0 or 1 is the target power exceeded. The fixed design, on the other hand, is much more sensitive to the actual value of the nuisance parameter, and the actual power can be far too large or far too small if the sample size was calculated under wrong assumptions.
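The sensitivity of the fixed design can be made plausible with a normal approximation. The sketch below is an illustration in base R and not the exact calculation performed by pow: it assumes the one-sided chi-squared test with 62 patients per group (the size needed at a significance level of 0.025 if the overall response rate were 0.2) and evaluates the approximate power when the true overall rate differs. The function `approx_power` is hypothetical.

```r
# Approximate power of the fixed design (normal approximation, 1:1 allocation)
approx_power <- function(p, n_group, delta = 0.2, alpha = 0.025) {
  p_e <- p + delta / 2   # response rate in the experimental group
  p_c <- p - delta / 2   # response rate in the control group
  pnorm((delta * sqrt(n_group) - qnorm(1 - alpha) * sqrt(2 * p * (1 - p))) /
          sqrt(p_e * (1 - p_e) + p_c * (1 - p_c)))
}

# Planned under an overall rate of 0.2, evaluated under other true rates
p_true <- c(0.2, 0.3, 0.4, 0.5)
round(approx_power(p_true, n_group = 62), 2)
```

Under this approximation, the power is close to 0.8 only at the planning value and drops clearly below the target as the true rate moves towards 0.5.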
Finally, the distribution of the total sample size can be computed under
different assumptions on the nuisance parameter with the function
n_dist. This is particularly useful for the planning of internal pilot
study designs since it allows the investigation of what could happen in
a certain clinical trial and helps the applicant to prepare for
different scenarios.
p <- seq(0.2, 0.8, by = 0.1)
n_dist(design, n1 = n/2, nuisance = p, plot = TRUE)
#>         p = 0.2  p = 0.3  p = 0.4  p = 0.5  p = 0.6  p = 0.7 p = 0.8
#> Min.     62.000  78.0000 124.0000 158.0000 124.0000  78.0000  62.000
#> 1st Qu. 108.000 152.0000 182.0000 196.0000 182.0000 152.0000 108.000
#> Median  124.000 170.0000 192.0000 198.0000 192.0000 170.0000 124.000
#> Mean    125.021 164.6057 188.4801 196.5075 188.4801 164.6057 125.021
#> 3rd Qu. 138.000 178.0000 196.0000 200.0000 196.0000 178.0000 138.000
#> Max.    190.000 200.0000 200.0000 200.0000 200.0000 200.0000 190.000
By default, n_dist prints a summary of the sample size distribution
for each nuisance parameter. With plot = TRUE, a series of boxplots is
drawn (cf. Figure 3). Since the maximum sample size is
obtained if the overall response rate is estimated to be 0.5 in the
sample size recalculation, this maximum can occur under any true value
of the nuisance parameter (except for 0 and 1), albeit with very small
probability. For this reason, sample sizes that occur with a probability
of less than 0.01% are ignored. This is not the case for continuous
outcomes since there, the sample size distributions are determined by
For continuous outcomes, i.e., the (shifted) t-test, the methods take
two additional arguments: iters, defining the number of simulation
iterations, and a second argument setting the random seed for the
simulation.
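For binary outcomes, the kind of exact sample size distribution that n_dist summarizes can be imitated by enumerating all possible first-stage results. The following base R sketch rests on several assumptions that are not taken from the package: the first-stage responders follow binomial distributions with rates p plus and minus half the effect, the sample size is recalculated with an approximate formula for the chi-squared test, group rates are clamped to the interval from 0 to 1, and the final sample size is never smaller than the first-stage size n1 (assumed to be even). The functions `n_required` and `n_dist_sketch` are hypothetical.

```r
# Approximate total sample size for a given (estimated) overall rate p
n_required <- function(p, delta, alpha = 0.025, beta = 0.2) {
  p_e <- pmin(p + delta / 2, 1)   # clamp group rates to [0, 1]
  p_c <- pmax(p - delta / 2, 0)
  n_arm <- (qnorm(1 - alpha) * sqrt(2 * p * (1 - p)) +
              qnorm(1 - beta) * sqrt(p_e * (1 - p_e) + p_c * (1 - p_c)))^2 /
    delta^2
  2 * ceiling(n_arm)
}

# Exact distribution of the recalculated total sample size
n_dist_sketch <- function(n1, p, delta = 0.2, n_max = Inf) {
  n_arm <- n1 / 2
  # First-stage responders per arm under the assumed true rates
  pr_e <- dbinom(0:n_arm, n_arm, p + delta / 2)
  pr_c <- dbinom(0:n_arm, n_arm, p - delta / 2)
  # Distribution of the pooled number of responders (convolution)
  pr_total <- sapply(0:n1, function(k) {
    i <- max(0, k - n_arm):min(k, n_arm)
    sum(pr_e[i + 1] * pr_c[k - i + 1])
  })
  p_hat <- (0:n1) / n1   # blinded estimate of the overall response rate
  n_new <- pmin(pmax(n1, n_required(p_hat, delta)), n_max)
  tapply(pr_total, n_new, sum)   # probability of each total sample size
}

d <- n_dist_sketch(n1 = 62, p = 0.2)
range(as.numeric(names(d)))      # smallest and largest possible sample size
sum(as.numeric(names(d)) * d)    # expected total sample size
```

Because the pooled number of responders is discrete, only finitely many total sample sizes can occur, which is why an exact calculation is feasible in the binary case.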
The utilization of R’s object-oriented programming capabilities implies
that the example that was presented for the chi-squared test could very
similarly be applied to the Farrington-Manning test or the
All calculations for binary outcomes are exact and require nested for-loops. Since for-loops are known to be very slow in R, all computation-intensive functions for the chi-squared test and the Farrington-Manning test are implemented in C++ via the Rcpp package (Eddelbuettel and François 2011) to speed up the calculations significantly.
blindrecalc is developed open-source, and the entire source code is publicly available. This allows anyone to contribute to blindrecalc and, furthermore, provides maximal transparency. To ensure a certain quality of the provided code, blindrecalc is checked by unit tests, and the test coverage is measured with the package covr (Hester 2020). The unit tests compare numbers for the sample size, type 1 error rate, and power calculated with blindrecalc with numbers from peer-reviewed publications and, furthermore, check the technical functionality of the package such as vectorization and display of error messages. Thus, the unit tests monitor not only the technical accuracy of the package’s results but also their content-related correctness. The current version blindrecalc 0.1.3 achieves a code coverage of 100%, i.e., each line of the source code is executed by at least one unit test.
In this paper, we introduced the R-package blindrecalc that can be used to plan clinical trials with a blinded sample size recalculation in an internal pilot study design when either continuous or binary outcomes in a superiority or non-inferiority setting are of interest. We introduced the basic methodology of internal pilot studies and explained how the package can be used to calculate the operating characteristics of a trial with such a design.
The scope of blindrecalc can easily be extended due to its modular character. Blinded sample size recalculation can be applied to many different types of clinical trials. For instance, there is research on further kinds of outcomes (e.g., see Friede and Schmidli (2010) for count data) and on different study designs (e.g., see Golkowski et al. (2014) for bioequivalence trials). The implementation of internal pilot studies for such cases in blindrecalc is an exciting area of future work.
The first two authors contributed equally to this manuscript.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Baumann, et al., "blindrecalc - An R Package for Blinded Sample Size Recalculation", The R Journal, 2022
BibTeX citation
@article{RJ-2022-001, author = {Baumann, Lukas and Pilz, Maximilian and Kieser, Meinhard}, title = {blindrecalc - An R Package for Blinded Sample Size Recalculation}, journal = {The R Journal}, year = {2022}, note = {}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {137-145} }