The cluster randomized trial (CRT) is a randomized controlled trial in which randomization is conducted at the cluster level (e.g., school or hospital) and outcomes are measured for each individual within a cluster. Often, the number of clusters available to randomize is small (
Cluster randomized trials (CRTs) randomize clusters of individuals, such as schools, hospitals or clinics (Brown and Li 2015). The CRT design is chosen when there are concerns of treatment contamination, when it is logistically easier to conduct the trial using cluster randomization and when intervention of interest is delivered at the group level (Turner et al. 2017a). CRTs have been used in many disciplines including social sciences, public policy, medicine and implementation science (Hayes and Moulton 2009).
In this paper, we focus on the two-arm parallel cluster randomized
trial. Usually, there are a total of
Several design strategies are available to avoid reliance only on statistical adjustment that accounts for baseline covariate imbalance in the analysis phase. Two most popular ones are pair-matching and stratification (Ivers et al. 2012), both of which are examples of restricted randomization. Matching pairs of clusters according to similarity in the baseline covariate profile (e.g., location), and performs randomization within each pair. Stratification is similar to matching but instead of only considering pairs of clusters, the procedure forms strata of 2 or more clusters where each stratum includes clusters with similar baseline covariate profiles. As with matching, the clusters within each stratum are then randomized into the two arms and, when there are an even number of clusters in each stratum, there is perfect balance of strata across treatment and control arms. There are several limitations of these procedures. The power of a pair-matched study might decrease due to a small number of pairs and a small correlation between the matching covariates and the outcome (Diehr et al. 1995). Loss to follow-up of a single cluster may require the exclusion of the matched pair in the analysis, and reduces study power (Ivers et al. 2012). In addition, the intracluster correlation coefficient is not easy to compute from the matched pairs (Klar and Donner 1997; Donner and Klar 2004; Campbell et al. 2012). In a CRT with a small number of clusters, stratified randomization is only possible with a small number of stratification covariates. Otherwise, a single cluster might be in a stratum and will cause an imbalance between the arms (Ivers et al. 2012). Given that CRTs often identify and recruit all clusters at the start of the trial, minimization, a common restricted randomization for individually randomized trials, is rarely applicable to CRTs. To deal with these limitations especially with a small number of clusters and more than a few baseline covariates to balance, alternative restricted randomization methods are necessary.
Covariate-constrained randomization is an alternative restricted randomization procedure (Raab and Butcher 2001; Ivers et al. 2012). Unlike matching and stratification, covariate-constrained randomization uses a measure called a balance metric to quantify the difference in mean covariate values between the two arms for a given randomization scheme across all baseline covariates that we wish to balance. The simple randomization space is then constrained by keeping the subset of randomization schemes with which covariates are considered sufficiently balanced by the balance metric. A final scheme is then selected from this constrained space, and tends to exhibit better baseline balance on average than a scheme randomly selected without constraints. Compared with pair-matching and stratification, covariate-constrained randomization may be preferred due to its capacity to accommodate multiple covariates, both categorical and continuous. Further, the ICC calculation remains unaffected under constrained randomization.
Although covariate-constrained randomization is a promising design
strategy for CRTs, it is not commonly used in practice. One possible
reason is that it requires more programming than simple randomization,
pair-matching or stratification. Therefore, to facilitate its
implementation in the design and analysis of cluster randomized trials,
we have developed the cvrall
and cvrcov
functions in the
cvcrand package. The
cvrall
function performs constrained randomization based on covariate
balance measured by a scalar balance metric and can assign weights to
reflect the relative importance of candidate covariates. The cvrcov
function performs constrained randomization based on multivariate
balance defined through each single covariate, similar to the routine
provided in Greene (2017).
From an analysis perspective, when a CRT is designed using
covariate-constrained randomization, this design feature should be
reflected in the analysis of the individual-level outcome data collected
during the trial (Li et al. 2016, 2017). To do so, a
permutation-based approach can be used. The permutation test, discussed
in Gail et al. (1996), should account for the variability within the
constrained randomization space. In other words, the resulting clustered
permutation test obtains the p-value for the treatment effect by
referencing the observed test statistics to the permutation distribution
within the constrained space. We provide the cptest
function in the
cvcrand package to facilitate the implementation of this permutation
test.
To demonstrate the utility of the cvcrand package, we describe the
concepts of covariate-constrained randomization and the clustered
permutation test using an example of a real cluster randomized trial
presented in Dickinson et al. (2015). This CRT aims to compare a
collaborative centralized reminder approach with a practice-based
reminder approach for increasing the immunization rate in children aged
19 to 35 months from
Variable name | Variable description |
---|---|
location |
location (“rural” or “urban”) |
inciis |
percentage of children aged 19-35 months in the Colorado Immunization Information System (CIIS) |
numberofchildrenages1935months |
number of children aged 19-35 months |
uptodateonimmunizations |
percentage of up-to-date on immunizations |
africanamerican |
percentage of African American |
hispanic |
percentage of Hispanic ethnicity |
income |
average income ($) |
incomecat |
category of average income (“low”, “medium”, and “high”) |
pediatricpracticetofamilymedicin |
pediatric practice-to-family medicine practice ratio |
communityhealthcenters |
number of community health centers |
Covariate-constrained randomization, henceforth referred to simply as constrained randomization, is a promising balancing technique for cluster randomized trials (CRTs), especially for those with a limited number of clusters (Hayes and Moulton 2009). Constrained randomization usually involves the following steps: (i) specifying the baseline covariates that one wishes to balance; (ii) enumerating all possible randomization schemes or randomly simulating a large number of randomization schemes within the simple randomization space (duplicates are removed if the schemes are randomly simulated); (iii) retaining a constrained randomization space with a subset of schemes where sufficient balance across baseline covariates is achieved according to some pre-specified balance metric; (iv) randomly selecting a scheme from the constrained randomization space for implementation.
Stratification can be viewed as a special case of constrained randomization. For instance, we could consider stratifying on a single binary baseline covariate, geographic location (rural or urban), for the immunization trial introduced previously. Suppose that 6 counties are located in the rural area and that 10 counties are located in the urban area. Stratified randomization ensures that half of the clusters in each stratum, defined by distinct values of the geographic location variable, are assigned to treatment and the rest to control. If we measure balance by the absolute differences in the average covariate values between arms, it follows that the stratified randomization space coincides with a constrained randomization space with zero balance scores when each stratum contains an even number of clusters.
Constrained randomization generalizes stratification and extends
naturally to situations where there are several, possibly continuous,
baseline covariates. The generalization is featured by defining a
balance metric accommodating multiple covariates. A balance metric gives
a quantitative assessment about the balance between the two arms for
each randomization scheme, and essentially any sensible balance metric
can be used. We first develop the cvrall
function that balances
covariates by scalar balance scores as in Raab and Butcher (2001) and
Li et al. (2016, 2017). Suppose we wish to balance
An alternative
To reflect the relative importance of different covariates, one may
specify different weights in the
Another important element of constrained randomization is the cutoff
value, which we denote by
Ideally, the cutoff value cvrall
, unless specified
otherwise by the user. Finally, we note that in our cvrall
function,
one could also specify the exact number of schemes kept in the
constrained randomization instead of the cutoff quantile value, through
the numschemes
argument.
In addition to constraining the randomization space via a scalar summary
score, we further developed the cvrcov
function to implement
constrained randomization with baseline balance defined directly through
each covariate. This covariate-by-covariate constrained randomization
places separate constraints on each covariate and ensures that the final
allocation scheme satisfies marginal balance of each covariate. In
particular, we follow the routine developed by Greene (2017) and
constrain the arm mean difference (or arm total difference) to be no
larger than a pre-specified value or a certain percentage of overall
mean (or mean arm total). The covariate-by-covariate balance allows
user-specified constraints on different covariates and is more flexible,
but simulating the constrained randomization space usually requires more
computations since the balance metric does not reduce to simple forms as
the
To better understand the constrained randomization space, we also
include a check on the randomization validity (Bailey and Rowley 1987).
Constraining the randomization may induce linkage or correlation between
clusters so that certain pairs of clusters may always be allocated to
the same arm (cluster coincidence) or never be allocated to the same arm
(cluster separation), both of which lead to loss of randomization
validity. To assess the degree of loss of validity, the cvrall
and
cvrcov
functions provide summary statistics on cluster pairs that
always or never appear together in the same arm, similar to the routine
by Greene (2017). Such descriptive statistics may inform the
appropriate selection of a constrained space.
Finally, enumerating all possible schemes in the entire simple
randomization space may be computationally demanding, when there are
quite a few clusters to randomize (e.g., more than 20). In that case,
the cvrall
and cvrcov
functions in our package will randomly
simulate a large number of randomization schemes and remove duplicates
if any. By default, this large number is set to be 50,000, unless
specified otherwise by the user through the size
option. With this
default setting, when the total number of schemes in the simple
randomization space is no greater than 50,000, the enumeration method
will be used. Otherwise, 50,000 schemes will be randomly simulated from
the simple randomization space and duplicates will be removed to
approximate the simple randomization space.
After using constrained randomization in the design of a CRT, a
permutation test can be used to test the intervention effect. We
implement the clustered permutation test used in Gail et al. (1996) and
Li et al. (2016) in the cptest
function. Specifically, we denote the
outcome of the
The permutation test is implemented in a two-step procedure. In the
first step, an outcome regression model is fitted for response
Suppose there are
cvrall
We used the cvrall
function to perform constrained randomization based
on the CRT data published in Dickinson et al. (2015). To focus ideas, we
selected five variables in Table 1 to balance in the
design stage. These variables include location
(categorical), inciis
(continuous), uptodateonimmunizations
(continuous) and hispanic
(continuous). We further considered incomecat
as a categorical
variable to illustrate the use of cvrall
in the presence of a factor
variable. Of note, the cvrall
function automatically converts the
categorical variables into dummy variables when implementing the
constrained randomization. For instance, here we categorized the
county-level covariate incomecat
into three levels based on sample
tertiles: “low”, “medium”, and “high”. Two dummy variables are then
introduced to represent these three categories. The “high" level is by
default considered as the reference level by alphanumerical order of the
first letter. Similarly, when the permutation test is executed in the
cptest
function, each categorical covariates will be transformed into
dummy variables before performing the analysis as well. It is also
important to point out that there is more than one way to define dummy
variables because any one of the levels of the categorical variable
could be chosen as the reference level. In the cvrall
function, if the
variable is not specified as a factor with a specific reference level,
we defined the reference level to be the first level by alphanumerical
order. However, if one would like to specify other reference levels, it
is possible to preprocess the data to manually create dummy variables
before invoking the cvrall
routines, or to specify the variables as
factors with the specific reference levels.
In this trial, we would like to randomize 8 counties into the arm with a
collaborative centralized reminder approach and 8 into the other arm
with a practice-based approach. So we specified ntotal_cluster = 16
and ntrt_cluster = 8
for the total number of clusters and the number
of clusters in the treatment arm. Since the total number of possible
schemes is x=
argument references the data frame of the covariates that
will be used in the calculation of balance scores and hence be balanced
by constrained randomization.
Design_result <- cvrall(clustername = Dickinson_design$county,
balancemetric = "l2",
x = data.frame(Dickinson_design[ , c("location", "inciis",
"uptodateonimmunizations", "hispanic", "incomecat")]),
ntotal_cluster = 16,
ntrt_cluster = 8,
categorical = c("location", "incomecat"),
savedata = "dickinson_constrained.csv",
bhist = TRUE,
cutoff = 0.1,
seed = 12345,
check_validity = TRUE)
Here we used the balance scores calculated by the balancemetric = "l2"
. The cateogrical variables were
specified with categorical = c("location", "incomecat")
. Location has
two levels: "rural"
and code"urban"; the level "rural"
is the
reference level. As income category is a three-level categorical
variable of "low"
, "med"
, and "high"
, the level "high"
is
considered as the reference level and 2 dummy variables were created.
Since we specified the cutoff value as cutoff = 0.1
, the constrained
randomization space only included the schemes with
We saved the constrained randomization space in a file named
dickinson_constrained.csv
in the current working directory. In this
file, the first column is an indicator variable of whether the scheme is
the final one selected by the program. The remaining columns records the
constrained randomization matrix; each column of the matrix corresponds
to a cluster, and each row of the matrix corresponds to an allocation
scheme coded by cutoff = 1
, the constrained randomization matrix has
12,870 rows and 16 columns. We provide the option to save the
constrained randomization space to a local directory so that it could be
used as an input for the permutation inference during the data analysis
stage, which usually happens at a later calendar time.
To facilitate the understanding of the constrained randomization
process, we could specify bhist = TRUE
to generate a histogram
displaying the distribution of all balance scores with a red line
indicating the cutoff value (the bscores
object,
regardless of the bhist =
option. As indicated below, the bscores
object contains the cutoff value, the balance score corresponding to the
selected scheme, and other quantiles of the balance score distribution.
> Design_result$bscores
1 score (selected scheme) 6.764
2 cutoff score 7.638
3 Mean 24.000
4 SD 15.775
5 Min 1.161
6 5% 5.826
7 10% 7.638
8 20% 10.849
9 25% 12.221
10 30% 13.840
11 50% 20.578
12 75% 31.621
13 95% 55.486
14 Max 116.656
In order to be transparent about the constrained randomization
procedure, we also included additional summary messages in the following
objects: assignment_message
, scheme_message
,
cutoff_message
and choice_message
. These objects summarize the
sample size and randomization ratio, the number of schemes used to
calculate the balance score distribution, the balance metric and cutoff
value, as well as the balance score of the selected scheme,
respectively. For example, the sample size and randomization ratio are
indicated in the following message:
> Design_result$assignment_message
[1] "You have indicated that you want to assign 8 clusters to treatment and 8 to control"
The final randomization scheme is included in the allocation
object.
In addition, we also provided a data frame containing the final
randomization scheme in the data_CR
element. The data frame includes
the covariate values for each cluster in addition to the information on
cluster allocation.
> Design_result$data_CR
arm clustername location inciis uptodateonimmunizations hispanic incomecat
1 0 1 Rural 94 37 44 Low
2 0 2 Rural 85 39 23 High
3 0 3 Rural 85 42 12 Low
4 1 4 Rural 93 39 18 High
5 1 5 Rural 82 31 6 High
6 0 6 Rural 80 27 15 Med
7 1 7 Rural 94 49 38 Low
8 0 8 Rural 100 37 39 Low
9 1 9 Urban 93 51 35 Med
10 1 10 Urban 89 51 17 Med
11 0 11 Urban 83 54 7 High
12 1 12 Urban 70 29 13 Med
13 1 13 Urban 93 50 13 High
14 0 14 Urban 85 36 10 Med
15 1 15 Urban 82 38 39 Low
16 0 16 Urban 84 43 28 Med
To assess whether the selected constrained randomization scheme balances
the baseline covariates, we provided a baseline table summarized under
the selected randomization scheme. The baseline table indicates that the
covariates are approximately balanced across the two arms, although more
“urban” clusters are assigned to the collaborative centralized reminder
approach. The baseline table is provided in the baseline_table
element, and is illustrated below.
> Design_result$baseline_table
arm = 0 arm = 1
n 8 8
location = Urban (%) 3 (37.5) 5 (62.5)
inciis (mean (sd)) 87.00 (6.59) 87.00 (8.45)
uptodateonimmunizations (mean (sd)) 39.38 (7.65) 42.25 (9.18)
hispanic (mean (sd)) 22.25 (13.77) 22.38 (12.94)
incomecat (%)
High 2 (25.0) 3 (37.5)
Low 3 (37.5) 2 (25.0)
Med 3 (37.5) 3 (37.5)
Finally, we considered the validity of the randomization and used the
check_validity =
argument to summarize the cluster coincidence
(cluster pairs assigned to the same arm) and cluster separation (cluster
pairs assigned to different arms) within the constrained space. If
check_validity = TRUE
, we could obtain the relevant descriptive
statistics in the cluster_coin_des
object. The four rows in this
object summarize the count and fraction of clusters appearing together,
as well as count and fraction of clusters appearing in the different
arms across the constrained randomization space. Recall that under
simple randomization, no linkage or correlation is introduced between
clusters and so each cluster pair has a
> Design_result$cluster_coin_des
Mean Std Dev Minimum 25th Pctl Median 75th Pctl Maximum
samecount 600.600 88.807 368.000 551.750 603.000 648.500 804.000
samefrac 0.467 0.069 0.286 0.429 0.469 0.504 0.625
diffcount 686.400 88.807 483.000 638.500 684.000 735.250 919.000
difffrac 0.533 0.069 0.375 0.496 0.531 0.571 0.714
cvrall
Of note, the cvrall
function could perform constrained randomization
with a stratification factor to ensure exact balance on that
stratification factor. We still considered the above trial example, but
now we wish to perform constrained randomization within each strata
defined by the binary location
variable. In other words, two strata of
eight counties each will be defined depending on location
, and
constrained randomization is then performed based on the additional four
covariates within each stratum. Motivated by the weighted location
and ensure exact balance on that
variable, while keeping the weights for other variables as weights = c(1000, 1, 1, 1, 1)
). Intuitively, a large weight assigned
to a covariate sharply penalizes any imbalance of that covariate,
therefore the resulting randomization space approximates the one
obtained by stratifying on location
. The example syntax is provided
below.
# Stratification on location, with constrained randomization on other
# specified covariates.
Design_stratified_result <- cvrall(clustername = Dickinson_design$county,
balancemetric = "l2",
x = data.frame(
Dickinson_design[,
c("location", "inciis",
"uptodateonimmunizations", "hispanic",
"incomecat")]),
ntotal_cluster = 16,
ntrt_cluster = 8,
categorical = c("location", "incomecat"),
weights = c(1000, 1, 1, 1, 1),
cutoff = 0.1,
seed = 12345)
Depending on the choice of cutoff
value, the above syntax may not lead
to a randomization space exactly the same as the one obtained after
stratifying on location
. The cvrall
function also allows one to
directly stratify on the location
variable using the stratify
option, as shown next. We omitted the baseline covariate table obtained
from stratified constrained randomization, but just comment that final
scheme ensures exact balance on the location
variable so that each arm
has now 4 urban counties and 4 rural counties.
# An alternative and equivalent way to stratify on location
Design_stratified_result <- cvrall(clustername = Dickinson_design$county,
balancemetric = "l2",
x = data.frame(
Dickinson_design[ ,
c("location", "inciis",
"uptodateonimmunizations", "hispanic",
"incomecat")]),
ntotal_cluster = 16,
ntrt_cluster = 8,
categorical = c("location", "incomecat"),
stratify = "location",
cutoff = 0.1,
seed = 12345)
cvrcov
We additionally provided the cvrcov
function to perform
covariate-by-covariate constrained randomization, similar to the routine
provided by Greene (2017). This approach is particularly attractive for
its flexibility in directly balancing each covariate. We still
considered our example trial where we randomized 8 counties into the
each arm for illustration. We specified ntotal_cluster = 16
and
ntrt_cluster = 8
for the total number of clusters and the number of
clusters in the treatment arm. Since the total number of possible
schemes is 50,000
), we enumerated all 12870
schemes.
As the covariate-by-covariate constrained randomization acts on the
numeric values of each variable, we transformed the values of the
location
to be numeric with "Rural"
being 1
and "Urban"
being
0
. For illustrative purposes, we also used the numeric average income
values rather than its categories in this example. The x =
argument
points to the data frame containing the covariates that will be balanced
by constrained randomization routine.
Syntax | Explanation |
---|---|
any |
no constraints, any arm means or arm totals are acceptable |
s5 |
arm totals must differ in absolute value by no more than |
sf.5 |
arm totals must differ in absolute value by no more than |
m10 |
arm means must differ in absolute value by no more than 10 |
mf0.2 |
arm means must differ in absolute value by no more than |
mf.5 |
arm means must differ in absolute value by no more than |
The cvrcov
function works the same way as the cvrall
function,
except for that the former requires additional syntax to specify the
balancing constraints for each covariate. The syntax used to balance
each covariate is the same those used in Greene (2017). Specifically,
if the first letter is specified as m
, the balancing constraint acts
on means, whereas if the first letter is s
, the balancing constraint
acts on sums or totals. If the second letter is f
, the balancing
constraint will be compared to a fractional of a population quantity
(overall mean or mean arm total), otherwise the constraint will be
compared to an actual value. A numeric constraint will follow the
specified letters and indicates the tightness of the constraint.
Additional examples are provided in Table 2.
Dickinson_design_numeric <- Dickinson_design
Dickinson_design_numeric$location = (Dickinson_design$location == "Rural") * 1
Design_cov_result <- cvrcov(clustername = Dickinson_design_numeric$county,
x = data.frame(Dickinson_design_numeric[ , c("location", "inciis",
"uptodateonimmunizations",
"hispanic", "income")]),
ntotal_cluster = 16,
ntrt_cluster = 8,
constraints = c("s5", "mf.5", "any", "mf0.2", "mf0.2"),
categorical = c("location"),
savedata = "dickinson_cov_constrained.csv",
seed = 12345,
check_validity = TRUE)
We specified constraints = c("s5", "mf.5", "any", "mf0.2", "mf0.2")
for the five covariates respectively. As indicated above, s5
indicates
that the allocation scheme should ensure that the arm totals differ in
absolute value by no more than 5. Synaxt mf.5
indicates that the
allocation scheme should ensure that the arm means differ by no more
than inciis
, among others. We saved
the resulting constrained randomization space as
dickinson_cov_constrained.csv
.
Similar to cvrall
, the cvrcov
routine included additional summary
messages in the following objects: assignment_message
and
scheme_message
. These two objects summarize the sample size and
randomization ratio, the number of schemes enumerated or simulated
before applying the constraints. In addition, a data frame containing
the selected final allocation scheme is saved in the data_CR
element
as follows.
> Design_cov_result$data_CR
arm id location inciis uptodateonimmunizations hispanic income
1 0 1 1 94 37 44 35988
2 1 2 1 85 39 23 67565
3 0 3 1 85 42 12 35879
4 0 4 1 93 39 18 63617
5 1 5 1 82 31 6 59118
6 0 6 1 80 27 15 57179
7 1 7 1 94 49 38 29738
8 1 8 1 100 37 39 37350
9 1 9 0 93 51 35 52923
10 0 10 0 89 51 17 58302
11 0 11 0 83 54 7 93819
12 0 12 0 70 29 13 54839
13 1 13 0 93 50 13 63857
14 1 14 0 85 36 10 53502
15 0 15 0 82 38 39 39570
16 1 16 0 84 43 28 52457
To evaluate whether the selected constrained randomization scheme balances the baseline covariates, we provided a baseline table summarized under the that selected randomization scheme. The baseline table indicates that the covariates are well balanced across the two arms, with an equal number of “urban" clusters assigned to each reminder approach.
> Design_cov_result$baseline_table
arm = 0 arm = 1
n 8 8
location = 1 (%) 4 (50.0) 4 (50.0)
inciis (mean (sd)) 84.50 (7.76) 89.50 (6.35)
uptodateonimmunizations (mean (sd)) 39.62 (9.44) 42.00 (7.43)
hispanic (mean (sd)) 20.62 (13.38) 24.00 (13.09)
income (mean (sd)) 54899.12 (19130.82) 52063.75 (12800.82)
The cvrcov
function permits the check of randomization validity
(Bailey and Rowley 1987), and summarizes the cluster coincidence and
separation statistics in the cluster_coin_des
object. The result
indicates that all cluster pairs appear together in the same arm at
least cvrcov
function summarizes the
information of the constrained space in the overall_allocations
and
overall_summary
objects, which are suppressed here due to limited
space. In short, the summary information informs that the there are in
total 12,870 allocations and 5,776 (
> Design_cov_result$cluster_coin_des
Mean Std Dev Minimum 25th Pctl Median 75th Pctl Maximum
samecount 2695.467 197.148 2138.000 2567.000 2720.000 2824.500 3182.000
samefrac 0.467 0.034 0.370 0.444 0.471 0.489 0.551
diffcount 3080.533 197.148 2594.000 2951.500 3056.000 3209.000 3638.000
difffrac 0.533 0.034 0.449 0.511 0.529 0.556 0.630
cptest
Since the immunization study is an ongoing trial, we used simulated
outcome data to demonstrate the clustered permutation test with the
above example where constrained randomization was performed using
cvrall
based on the 5 covariates (the selected scheme had a balance
score of 6.764). The same syntax applies to the constrained
randomization results obtained from cvrcov
and so is not considered
further here. Suppose that the researchers were able to assess 300
children in each county, and the trial is randomized according to the
selected final scheme. For illustration, we chose the covariates to be
adjusted in the test
To generate the correlated binary outcome of whether the children is
eventually up-to-date on immunizations (1
) or not (0
), we used a
generalized linear mixed model (GLMM) with a logistic link to induce
correlation by including a random intercept at the county level. The
intraclass correlation coefficient (ICC) is usually used to quantify the
degree of association between individual outcomes in a cluster (county).
We used the latent response definition of binary ICC defined by variance
components in the GLMM (Eldridge et al. 2009). The ICC was set to be 0.01,
which is a reasonable value for population health studies
(Hannan et al. 1994). The outcome variable depends on the
county-level covariates used in performing the constrained
randomization, as previously mentioned, and we simulated a treatment
effect so that the collaborative reminder approach increases up-to-date
immunization rates compared to the practice-based reminder approach
(odds ratio equals to
We performed the clustered permutation test using the cptest
function
for the binary outcome of the status of up-to-date on immunizations. As
indicated in Li et al. (2016), valid permutation test under constrained
randomization should only shuffle the treatment label within the
constrained space, and so it is important to save and input the
constrained randomization space in the design stage (the file named
dickinson_constrained.csv
). The permutation test is performed by first
regressing the outcome on the five covariates, inciis
,
uptodateonimmunizations
, hispanic
, location
, and incomecat
. As
the last two covariates are categorical, the cptest()
function creates
dummy variables and set reference levels according to alphanumerical
order, matching the steps in cvrall
. Of note, had different reference
levels been selected for the constrained randomization design procedure,
the corresponding dummy coding should be reflected in the analysis phase
when the clustered permutation test is used. We specified outcometype
to be “binary” so that logistic regression is performed to compute the
residuals. An example syntax of the function is given as follows.
Analysis_result <- cptest(outcome = Dickinson_outcome$outcome,
clustername = Dickinson_outcome$county,
z = data.frame(Dickinson_outcome[, c("location", "inciis",
"uptodateonimmunizations", "hispanic", "incomecat")]),
cspacedatname = "dickinson_constrained.csv",
outcometype = "binary",
categorical = c("location","incomecat"))
The covariates to be adjusted for in the permutation test is indicated
in the z =
option, which matches the covariate matrix used in
cvrall
for constrained randomization. If one wishes to an unadjusted
permutation test, one could leave out the z =
option as it is an
optional argument. The output of Analysis_result
includes the final
scheme selected by design (FinalScheme
object), the p-value of the
test (p-value
object) and a sentence to describe the p-value
(pvalue_statement
object). We omitted the code output here for
brevity, but comment that, in this example, the p-value equals to
0.042
, indicating that there is a significant difference in the effect
of the interventions on the outcome of up-to-date on immunizations, if
testing is performed at the 5% significance level. Again, if the
constrained randomization is performed by cvrcov
, we could use the
cptest
function in a similar way once we provided the constrained
permutation matrix obtained from cvrcov
in the cspacedatname =
argument.
The cvcrand package contains three main functions for the design and
analysis of cluster randomized trials. Given that it is common for such
trials to enroll a small number of clusters and that this gives rise to
chance imbalance in covariates that are predictive of the outcome, the
cvrall
and cvrcov
functions can be used to implement
covariate-constrained randomization in the design phase to ensure better
balance. The cvrall
function uses a balance metric to quantify balance
across multiple cluster-level covariates, whereas the cvrcov
allows
for covariate-by-covariate balance and could potentially be more
flexible. For analysis of the individual-level outcome data collected in
the CRT, the cptest
function could help perform the clustered
permutation test, which accommodate both continuous and binary outcomes
and should be treated as a flexible alternative to model-based analysis.
There are several limitations of the cvcrand package. First, the
cvrall
and cvrcov
only deal with two-arm parallel cluster randomized
trials and may not be directly applied to balance covariates in other
designs such as the stepped wedge designs
(Hussey and Hughes 2007; Li et al. 2018). Second, although the cptest
function performs a valid analysis for individual-level outcome data
when there is an equal number of clusters per arm, the test may be
anti-conservative when there is an unequal number of clusters per arm
(Gail et al. 1996). Furthermore, our cptest
routine does not provide a
confidence interval for the intervention effect estimate, and additional
programming is required to obtain a permutation confidence interval.
Essentially, the permutation test will be inverted to numerically search
for the interval limits, as is done in Gail et al. (1996) for an unadjusted
test under simple randomization. For the adjusted test under constrained
randomization, the following steps could be carried out: (i) hypothesize
an treatment effect cptest
function can handle both continuous and binary outcomes, we have not yet
extended the function to accommodate count outcomes. In summary, these
limitations reflect the current research on constrained randomization.
We plan to update the cvcrand package as the theory and knowledge of
these procedures develop in the future.
The authors would like to thank Alyssa Platt, Joe Egger, and Ryan Simmons of the Duke Global Health Institute Research Design and Analysis Core for testing and providing feedback on the programs. This research was funded in part by National Institutes of Health grant R01 HD075875 (PI: Dr. Joanna Maselko).
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Yu, et al., "cvcrand: A Package for Covariate-constrained Randomization and the Clustered Permutation Test for Cluster Randomized Trials", The R Journal, 2019
BibTeX citation
@article{RJ-2019-027, author = {Yu, Hengshi and Li, Fan and Gallis, John A. and Turner, Elizabeth L.}, title = {cvcrand: A Package for Covariate-constrained Randomization and the Clustered Permutation Test for Cluster Randomized Trials}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {191-204} }