Longitudinal studies are ubiquitous in medical and clinical research. Sample size computations are critical to ensure that these studies are sufficiently powered to provide reliable and valid inferences. There are several methodologies for calculating sample sizes for longitudinal studies, and the choice depends on many considerations, including study design features, outcome type and distribution, and the proposed analytical methods. We briefly review the literature and describe sample size formulas for continuous longitudinal data. We then apply the methods to example studies comparing treatment and control groups in randomized trials that assess treatment effects on clinical outcomes. We also introduce the longpower R package and a Shiny app that we developed to help researchers obtain required sample sizes for longitudinal studies by entering their own pilot estimates. For Alzheimer’s studies, the app can estimate the required pilot parameters using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Illustrative examples demonstrate how the package and app can be used to generate sample size and power curves. The package and app are designed to help researchers easily assess the operating characteristics of study designs for Alzheimer’s clinical trials and other research studies with continuous longitudinal outcomes. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).
Longitudinal designs are generally preferred over cross-sectional designs because they provide richer data and greater statistical power. As such, many biomedical studies employ longitudinal designs to study changes in outcomes over time at the individual, group, or population level. Early in the design of a longitudinal experimental or natural history study, it is imperative to ensure that the study is adequately powered for its aims. Inadequate sample sizes lead to invalid or inconclusive inference and wasted resources (Yan and Su 2006; Lu et al. 2009). On the other hand, oversampling also wastes resources and exposes participants to unnecessary risks associated with the intervention (Lu et al. 2009). Thus, optimal sample size and power analysis have become important prerequisites for any quantitative research design. Not only are they required during the design phase of research, they have also become mandatory for ethical, scientific, or economic justification in submissions to institutional review boards and research funding agencies.
Determining the right sample size for a study is not a straightforward task. Despite the plethora of sample size formulas for repeated measures (Rochon 1991; Lui 1992; Overall and Doyle 1994; Guo et al. 2013), cluster repeated measures (Liu et al. 2002), multivariate repeated measures (Vonesh and Schork 1986; Guo and Johnson 1996), and longitudinal research designs (Lefante 1990), gathering pilot estimates of the necessary parameters and finding the right software to carry out the computation can be challenging. Researchers commonly rely on formulas for very basic cross-sectional studies and then adjust for attrition and other longitudinal design effects. Although such approaches can yield reasonable approximations, the ideal approach is to use a formula derived directly from the longitudinal model that researchers plan to use to analyze the study data.
Guo et al. (2013) describe practical methods for selecting an appropriate sample size for repeated measures that address missing data and the inclusion of more than one covariate to control for differences in response at baseline. Sample size formulas are refined depending on the specific situation and design features. For example, Hedeker et al. (1999) considered sample size for longitudinal designs comparing two groups while accounting for participant attrition or drop-out. Basagaña et al. (2011) derived sample size formulas for continuous longitudinal data with the time-varying exposure variables typical of observational studies, and showed that ignoring time-varying exposure can lead to substantial overestimation of the minimum required sample size, which is economically disadvantageous. In non-traditional longitudinal designs, such as designs for longitudinal mediation analysis, further refinements to sample size formulas are needed to ensure that sufficient sample sizes are obtained (Pan et al. 2018). However, these formulas usually require additional parameters such as the exposure mean, variance, and intraclass correlations (Basagaña et al. 2011), the mediation effect and number of repeated measures (Pan et al. 2018), covariance structures (Rochon 1991), non-linear trends (Yan and Su 2006), and missingness, attrition, or dropout rates (Roy et al. 2007; Lu et al. 2008), among others. Advanced sample size methods simultaneously handle several practical issues associated with the research design and complications that may arise during data collection. However, such methods are only available in commercial software (NCSS Statistical Software 2021; nQuery 2021).
Several R packages on CRAN compute sample sizes for mixed-effects models and for other specific designs, depending on the area of application. For example, Martin et al. (2011) proposed a simulation-based power calculation and an R package, pamm, for random regression models, a form of mixed-effects model in which individuals or groups vary in their slopes. In their approach, power analysis is performed to detect a specified level of individual-by-environment interaction in evolution and ecology applications, by simulating power to detect a given covariance structure. Other simulation-based packages for power analysis are SIMR by Green and MacLeod (2016), for linear and generalized linear mixed models, and clusterPower by Kleinman (2021), for cluster-randomized and cross-over designs. Schoenfeld (2019) developed a power and sample size package called LPower, following Diggle et al. (1994), to perform power analysis for longitudinal designs accounting for attrition and different random effect specifications. The approach requires the specification of a design matrix and the variance-covariance matrix of the repeated measures (Yi and Panzarella 2002). For pharmacokinetic study designs, Kloprogge and Tarning (2015) developed the PharmPow power calculation package for mixed study designs, including crossover and parallel designs. More recently, R packages have become available for power analysis in other designs; for example, powerMediation (Qiu 2021) for mediation effects, mean change in longitudinal studies with two time points, the slope in simple Poisson regression, etc.; powerEQTL (Dong et al. 2021) for unbalanced one-way ANOVA in bulk tissue and single-cell eQTL analysis; WebPower (Zhang et al. 2021) for basic and advanced power analysis of correlation, proportion,
As the analysis model and associated sample size formula become more sophisticated, estimating the parameters required by the formula becomes more challenging. A major hurdle is the availability of pilot data to inform these parameters. To assist Alzheimer’s researchers with this challenge, we have developed a power and sample size Shiny app for Alzheimer’s clinical trials. The app implements formulas for the linear mixed-effects model and the mixed model for repeated measures [MMRM; Lu et al. (2008)], allowing users to input their own pilot estimates or to let the app generate pilot estimates using data from the Alzheimer’s Disease Neuroimaging Initiative [ADNI; Weiner et al. (2015)].
Continuous outcomes in clinical trial data collected longitudinally are commonly analyzed using linear mixed models [LMM; Laird and Ware (1982)] or the MMRM (Mallinckrodt et al. 2001, 2003). Before such a trial begins, it is necessary to estimate the sample size required to detect a given treatment effect with the desired power and Type I error rate. Various sample size approaches for longitudinal data have been proposed. We review a few of the methods most commonly applied in Alzheimer’s disease trials with continuous outcomes.
Several sample size approaches have been developed by different authors. Diggle et al. (2002) proposed sample size formulas for two approaches to continuous longitudinal outcomes: one assumes a constant mean over time and compares the average response over time between groups, and the other assumes a linear change over time and compares the mean rate of change, or slope, between groups. In either case, the null hypothesis is that there is no difference between groups. Suppose that
To test for a difference in slopes, or average rate of change, between the two groups, consider the parameterization:
Another sample size approach for correlated data was derived by Liu and Liang (1997) to detect differences in the average response between two groups. This approach derives the sample size within the generalized estimating equation framework (Liang and Zeger 1986), so different outcome types can be handled. A special case is a continuous response measured repeatedly over time and modeled using a linear model. Consider the model
Another approach is based on the LMM analysis (Fitzmaurice et al. 2004) comparing the mean rate of change between groups (Ard and Edland 2011; Zhao and Edland 2020). Consider the LMM
An alternative approach is to treat time as a categorical variable. This approach, common in trials of treatments for Alzheimer’s disease and in other therapeutic areas, is often referred to as the MMRM (Mallinckrodt et al. 2001, 2003; Lane 2008). Fitting the MMRM is similar to fitting other linear mixed-effects models for longitudinal or repeated measures, except that time is modelled without structure, as a categorical variable, and a within-participant correlation structure is specified to account for the association among the repeated measurements. The MMRM provides an estimate of the mean response at each time point for each group, and the resulting mean trajectory over time is unconstrained. The primary test statistic is usually the estimated group difference at the final time point. The null hypothesis is again that there is no difference between groups.
Suppose that
The aforementioned sample size approaches have been implemented in an R package, longpower (Donohue and Edland 2020), available on CRAN at https://cran-r-project.org/web/packages/longpower/index.html. The package also contains functions that translate pilot mixed-effects model parameters (e.g., random intercept and/or slope variances) into marginal model parameters, so that the formulas of Diggle et al. (2002), Liu and Liang (1997), or Lu et al. (2008) can be applied to produce sample size calculations for two-sample longitudinal designs assuming known variance.
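The translation from mixed-effects to marginal parameters rests on the standard identity that a random intercept and slope model implies the marginal covariance V = Z G Z' + sig2.e I, where Z has columns (1, t) and G is the covariance matrix of the random effects. The base-R sketch below illustrates that identity under the assumption of independent residuals with constant variance; `marginal_cov()` is a hypothetical helper, not a longpower function, and the package's own conversion routines may differ in detail.

```r
# Hypothetical helper (not a longpower function). Assumes a random intercept
# and slope model with independent residuals of constant variance:
#   Y_ij = (fixed effects) + b0_i + b1_i * t_j + e_ij
marginal_cov <- function(times, sig2.int, sig.b0b1, sig2.s, sig2.e) {
  Z <- cbind(1, times)                              # random-effects design
  G <- matrix(c(sig2.int, sig.b0b1,
                sig.b0b1, sig2.s), nrow = 2)        # Var of (b0_i, b1_i)
  Z %*% G %*% t(Z) + sig2.e * diag(length(times))  # V = Z G Z' + sig2.e * I
}

# Illustrative values taken from the LMM column of Table 1
times <- seq(0, 1.5, by = 0.5)
V <- marginal_cov(times, sig2.int = 55, sig.b0b1 = 29, sig2.s = 22, sig2.e = 10)
sqrt(diag(V))   # marginal standard deviation at each visit
cov2cor(V)      # marginal correlation matrix (not exchangeable in general)
```

Because the marginal variance grows with time and the correlation decays with the gap between visits, a single exchangeable correlation is only an approximation of this structure.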
The interactive Shiny (Chang et al. 2021) application, available at https://atrihub.shinyapps.io/power/, is an interface to the longpower package that makes it easy to compute sample sizes and conduct power analyses for longitudinal study designs with two-group comparisons of a continuous outcome. The app likewise implements the sample size formulas of Liu and Liang (1997) and Diggle et al. (1994, 2002) using functions in the longpower package. A novel feature of the app is that it can generate the required pilot estimates by sourcing data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
ADNI is a population-based longitudinal cohort study that follows participants to collect clinical, cognitive, imaging (including MRI and PET), genetic, and biochemical biomarker data. The study was designed to discover, optimize, standardize, and validate clinical trial measures and biomarkers used in AD clinical research. This multi-site longitudinal study began in 2004 and runs at about 63 sites in the US and Canada. All data generated by ADNI are entered into a repository hosted at the Laboratory of Neuroimaging (LONI) at the University of Southern California, the LONI Image & Data Archive (IDA). The data can be accessed freely upon request. Apart from the many uses of the data for advancing knowledge for AD trials (Weiner et al. 2015), this large data resource can be used to improve study design. Specifically, in this paper, the data are used to generate pilot estimates for the computation of sample size and power.
We consider an Alzheimer’s disease example using ADAS-Cog (Rosen et al. 1984) pilot estimates from the ADNI database. Suppose that we want to compute the sample size required to detect a difference of 1.5 in the mean rate of change at the 5% level of significance with 80% power. The LMM fit to ADNI data gives an estimated random slope variance for group A (the placebo or control group) of 24 and a residual variance of 10. Assuming study visits at (0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50) years, the sample size under the ‘edland’ approach can be obtained with the edland.linear.power() function in the longpower R package as follows.
```r
> library(longpower)
> t = seq(0,1.5,0.25)
> edland.linear.power(delta=1.5, t=t, sig2.s = 24, sig2.e = 10, sig.level=0.05,
+   power = 0.80)

     Zhao and Edland, in process

              N = 414.6202
              n = 207.3101, 207.3101
          delta = 1.5
              t = 0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50
              p = 0, 0, 0, 0, 0, 0, 1
            p_2 = 0, 0, 0, 0, 0, 0, 1
       sig2.int = 0
       sig.b0b1 = 0
         sig2.s = 24
         sig2.e = 10
     sig2.int_2 = 0
     sig.b0b1_2 = 0
       sig2.s_2 = 24
       sig2.e_2 = 10
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: N is *total* sample size and n is sample size in *each* group
```
Alternatively, one can use lmmpower() and specify the argument method="edland":

```r
> lmmpower(delta=1.5, t=t, sig2.s = 24, sig2.e = 10, sig.level=0.05,
+   power = 0.80, method="edland")

     Zhao and Edland, in process

              N = 414.6202
              n = 207.3101, 207.3101
          delta = 1.5
              t = 0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50
              p = 0, 0, 0, 0, 0, 0, 1
            p_2 = 0, 0, 0, 0, 0, 0, 1
       sig2.int = 0
       sig.b0b1 = 0
         sig2.s = 24
         sig2.e = 10
     sig2.int_2 = 0
     sig.b0b1_2 = 0
       sig2.s_2 = 24
       sig2.e_2 = 10
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: N is *total* sample size and n is sample size in *each* group
```
The Diggle and Liu & Liang approaches can be applied with the diggle.linear.power() and liu.liang.linear.power() functions, respectively. The lmmpower() function can be used for either approach with the appropriate specification of the ‘method’ argument.
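With complete data (all participants retained to the last visit, as in p = 0, 0, 0, 0, 0, 0, 1 above), the variance of a participant's estimated slope is sig2.s + sig2.e / sum((t - mean(t))^2), and the per-group sample size follows from the usual two-sample normal approximation. The base-R sketch below recomputes the edland.linear.power() result from that closed form and inverts it to approximate power, which is the kind of curve the app plots. The helper names are hypothetical, not longpower functions, and the sketch assumes equal allocation, a two-sided test, and no dropout; it does not reproduce longpower's handling of attrition.

```r
# Hypothetical helpers, not longpower functions. Assumes complete data
# (no dropout), equal allocation, a two-sided test, and a random-slope LMM.
# With complete, balanced data a participant's slope estimate has variance
# sig2.s + sig2.e / sum((t - mean(t))^2); intercept terms do not enter.
slope_var <- function(times, sig2.s, sig2.e) {
  sig2.s + sig2.e / sum((times - mean(times))^2)
}

# Per-group n to detect a difference `delta` in mean slopes
n_per_group <- function(delta, times, sig2.s, sig2.e,
                        sig.level = 0.05, power = 0.80) {
  z <- qnorm(1 - sig.level / 2) + qnorm(power)
  2 * z^2 * slope_var(times, sig2.s, sig2.e) / delta^2
}

# Approximate power for n per group (ignores rejection in the wrong tail)
power_at_n <- function(n, delta, times, sig2.s, sig2.e, sig.level = 0.05) {
  se <- sqrt(2 * slope_var(times, sig2.s, sig2.e) / n)
  pnorm(abs(delta) / se - qnorm(1 - sig.level / 2))
}

times <- seq(0, 1.5, 0.25)
n_edland <- n_per_group(delta = 1.5, times = times, sig2.s = 24, sig2.e = 10)
n_edland  # per-group n; compare with n = 207.3101 reported above

# A few points on a power curve
ns <- c(100, 150, 200, 250, 300)
round(sapply(ns, power_at_n, delta = 1.5, times = times,
             sig2.s = 24, sig2.e = 10), 3)
```

Writing the sample size and power as two small functions of the same slope variance makes it easy to see why adding visits (a larger spread of t) helps only through the residual term, while the random slope variance sets a floor on the required sample size.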
The second illustrative example is the hypothetical clinical trial discussed in Diggle et al. (2002). Suppose that we are interested in testing the effect of a new treatment on reducing blood pressure in a clinical trial. The investigator plans to randomize participants in equal numbers to a control group and an active treatment group, with three visits and assessments planned at years 0, 2, and 5. The code below computes the sample size per group needed to detect a difference in slopes of 0.5 with a one-sided test and 80% power, for several values of the correlation (rho) and variance (sigma2):
```r
> n <- 3
> t <- c(0,2,5)
> rho <- c(0.2, 0.5, 0.8)
> sigma2 <- c(100, 200, 300)
> tab = outer(rho, sigma2,
+   Vectorize(function(rho, sigma2){
+     ceiling(diggle.linear.power(
+       delta=0.5,
+       t=t,
+       sigma2=sigma2,
+       R=rho,
+       alternative="one.sided",
+       power = 0.80)$n[1])}))
> colnames(tab) = paste("sigma2 =", sigma2)
> rownames(tab) = paste("rho =", rho)
> tab
          sigma2 = 100 sigma2 = 200 sigma2 = 300
rho = 0.2          313          625          938
rho = 0.5          196          391          586
rho = 0.8           79          157          235
```
The code above reproduces the table on page 29 of Diggle et al. (2002). We also reproduce the table on page 30 of Diggle et al. (2002), for detecting a difference in average response between two groups, using the method of Liu and Liang (1997):
```r
> # covariate vectors or matrices associated with the parameter of interest
> u = list(u1 = rep(1,n), u2 = rep(0,n))
> # respective covariate vectors or matrices associated with the nuisance parameter
> v = list(v1 = rep(1,n), v2 = rep(1,n))
> rho = c(0.2, 0.5, 0.8) # correlations
> delta = c(20, 30, 40, 50)/100 # effect sizes
> tab = outer(rho, delta, Vectorize(function(rho, delta){
+   ceiling(liu.liang.linear.power(
+     delta=delta, u=u, v=v,
+     sigma2=1,
+     R=rho, alternative="one.sided",
+     power=0.80)$n[1])}))
> colnames(tab) = paste("delta =", delta)
> rownames(tab) = paste("rho =", rho)
> tab
          delta = 0.2 delta = 0.3 delta = 0.4 delta = 0.5
rho = 0.2         145          65          37          24
rho = 0.5         207          92          52          33
rho = 0.8         268         120          67          43
```
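Under an exchangeable correlation, both tables also follow from the closed-form expressions in Diggle et al. (2002): for a slope difference delta, n = 2 (z_a + z_Q)^2 sigma2 (1 - rho) / (m s_x^2 delta^2) with s_x^2 = sum((t - mean(t))^2) / m, and for a standardized difference d in average response, n = 2 (z_a + z_Q)^2 {1 + (m - 1) rho} / (m d^2). The base-R sketch below recomputes the two tables from these formulas as an independent check; it assumes equal allocation, a one-sided 5% test, and 80% power, as in the calls above, and does not use longpower.

```r
# Closed-form formulas from Diggle et al. (2002), assuming an exchangeable
# correlation rho, equal allocation, one-sided alpha = 0.05 and power = 0.80.
# Base R only; longpower is not used.
z2 <- (qnorm(0.95) + qnorm(0.80))^2
times <- c(0, 2, 5)
m <- length(times)
sx2 <- sum((times - mean(times))^2) / m   # s_x^2 in Diggle et al.'s notation

# Difference in slopes of delta = 0.5:
#   n = 2 (z_a + z_Q)^2 sigma2 (1 - rho) / (m s_x^2 delta^2)
n_slope <- function(rho, sigma2, delta = 0.5)
  ceiling(2 * z2 * sigma2 * (1 - rho) / (m * sx2 * delta^2))

# Difference in average response, standardized effect d:
#   n = 2 (z_a + z_Q)^2 {1 + (m - 1) rho} / (m d^2)
n_mean <- function(rho, d)
  ceiling(2 * z2 * (1 + (m - 1) * rho) / (m * d^2))

tab_slope <- outer(c(0.2, 0.5, 0.8), c(100, 200, 300), n_slope)
tab_mean  <- outer(c(0.2, 0.5, 0.8), c(0.2, 0.3, 0.4, 0.5), n_mean)
tab_slope
tab_mean
```

Note the opposite roles of rho in the two formulas: a higher correlation reduces the sample size for comparing slopes, because within-participant noise cancels in the slope, but increases it for comparing average responses, because repeated measures then carry less independent information.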
The sample size formula for the MMRM approach is also implemented in the longpower package’s power.mmrm() function. To illustrate, consider a hypothetical example in which group A has an exchangeable correlation matrix with off-diagonal entries of 0.25, a retention vector of (1, 0.90, 0.80, 0.70), and a standard deviation of 1. Assuming the same values for group B, the sample size required to detect an effect size of 0.5 at the 5% level of significance with 80% power is computed as follows:
```r
> Ra <- matrix(0.25, nrow = 4, ncol = 4)
> diag(Ra) <- 1 # exchangeable correlation matrix for group A
> ra <- c(1, 0.90, 0.80, 0.70) # retention in group A
> sigmaa <- 1 # standard deviation for group A
> power.mmrm(Ra = Ra, ra = ra, sigmaa = sigmaa, delta = 0.5, power = 0.80)

     Power for Mixed Model of Repeated Measures (Lu, Luo, & Chen, 2008)

              n1 = 86.99175
              n2 = 86.99175
      retention1 = 1.0, 0.9, 0.8, 0.7
      retention2 = 1.0, 0.9, 0.8, 0.7
           delta = 0.5
       sig.level = 0.05
           power = 0.8
     alternative = two.sided
```
If the allocation ratio is 2, the argument lambda = 2 can be added to the call.
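The MMRM calculation hinges on a single variance factor, phi, for the final-visit mean. As we understand the formula of Lu et al. (2008), under monotone dropout and data missing at random, the information matrix is a retention-weighted sum of the inverses of the leading blocks of the correlation matrix, and phi is the last diagonal element of its inverse. The base-R sketch below computes phi for the example above and plugs it into the two-group formula. `mmrm_phi()` is a hypothetical helper, not longpower's implementation, and the sketch covers only equal allocation (it does not model lambda).

```r
# Hypothetical helper, not longpower's implementation. Assumes monotone dropout,
# data missing at random, and the information formula of Lu et al. (2008):
#   I = sum_j (r_j - r_{j+1}) * [inverse of R[1:j, 1:j], zero-padded to J x J],
# with r_{J+1} = 0, and phi = (I^{-1})[J, J].
mmrm_phi <- function(R, r) {
  J <- length(r)
  r <- c(r, 0)                           # append r_{J+1} = 0
  I <- matrix(0, J, J)
  for (j in seq_len(J)) {
    Rj_inv <- matrix(0, J, J)
    Rj_inv[1:j, 1:j] <- solve(R[1:j, 1:j, drop = FALSE])
    I <- I + (r[j] - r[j + 1]) * Rj_inv  # weight: share whose last visit is j
  }
  solve(I)[J, J]
}

Ra <- matrix(0.25, nrow = 4, ncol = 4)
diag(Ra) <- 1                            # exchangeable correlation, as above
ra <- c(1, 0.90, 0.80, 0.70)             # retention, as above
phi <- mmrm_phi(Ra, ra)

# Equal allocation and identical groups (sigma = 1), delta = 0.5,
# two-sided alpha = 0.05, power = 0.80
z2 <- (qnorm(0.975) + qnorm(0.80))^2
n1 <- z2 * (phi * 1^2 + phi * 1^2) / 0.5^2
n1  # compare with n1 = 86.99175 from power.mmrm() above
```

With full retention, phi would equal 1 and the formula would reduce to the usual two-sample comparison at the final visit; the value above 1 here reflects the information lost to dropout, partly recovered through the correlation with earlier visits.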
We demonstrate how the app is used to perform power analysis with user-specified values and a specific sample size approach, using the values shown in Table 1. For the MMRM method, we assume an exchangeable correlation structure.
Parameter | LMM | MMRM |
---|---|---|
Start time | 0 | 1 |
End time | 1.5 | 4 |
Time step | 0.5 | 1 |
Type I error rate | 0.05 | 0.05 |
Effect size | 1.5 | - |
Estimate of variance of random intercept | 55 | - |
Estimate of variance of random slope | 22 | - |
Estimate of covariance of random intercept and slope | 29 | - |
Estimate of the error variance | 10 | - |
Standard deviation of observation in group A | - | 1 |
Standard deviation of observation in group B | - | 1 |
Exchangeable correlation | - | 0.25 |
Allocation ratio | 1 | 2 |
For the ADNI-based pilot estimate generator, we select the full range of values for the selected variables (see Figure 3).
The app has three main menus on the sidebar. The first menu has two sub-menus that let the user perform power and sample size calculations for longitudinal designs when the necessary pilot estimates are known. The first sub-menu, the default, accepts user-specified inputs and generates a graphical output and a summary of the inputs. In the ‘input’ box, shown in Figure 1a, users choose their inputs through widgets such as select inputs, numeric inputs, sliders, and radio buttons. The ‘Analysis Type’ select input lets the user choose whether to compute power or sample size. Inputs that do not apply to the current selection or to the selected sample size method are grayed out; for example, the allocation ratio widget is grayed out when the ‘diggle’ or ‘liuliang’ method is selected. The ‘output’ box, shown in Figure 1b, displays a graph of power versus sample size, a note on the sample size method, and a summary of the inputs selected by the user.

The second sub-menu enables the user to conduct power analysis based on the MMRM methodology. Select inputs, sliders, numeric inputs, and radio buttons in the ‘input’ box display the current values of the model parameters (see Figure 2a). Because the MMRM requires the specification of an association structure and a vector of retention rates, the app also displays vector and matrix input widgets whose size depends on the number of time points the user chooses for the study. These widgets are not reactive, so after changing the number of time points the user must click the ‘Update/Enter’ action button to update them. The corresponding graphical and summary outputs are displayed reactively in the ‘output’ box, immediately below the ‘input’ box (see Figure 2b).
The second menu item also has two sub-menus, for the LMM and MMRM methods respectively. In this menu, the app generates pilot estimates from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data, which are then fed into the power calculations.
In the ‘baseline selection’ box, the user selects the variables that define the study population. By default, all variables are selected, so that all the ADNI data are used. A variable can be deselected with a click on the left side of the box. After selecting variables, the user clicks the ‘submit selected criteria’ action button to activate the corresponding widgets in the next box. The ‘Inclusion/exclusion criteria’ box consists of slider and select input widgets that let the user choose a range of values for each selected population characteristic (see Figure 3). The app then produces baseline summaries for the selected population by gender, education, ethnicity, and race. For continuous variables, the number of observations, number of missing observations, mean, median, and lower and upper quantiles are displayed; for categorical variables, the total and level-specific numbers of observations and percentages are displayed. These bivariate summaries give the user a clear picture of the data under the selected inclusion and exclusion criteria.
Next, the user chooses a primary outcome from among several commonly used in Alzheimer’s disease studies. Pilot estimates for the power analysis are obtained by fitting an LMM to the selected outcome, optionally adjusted for user-selected covariates. Data on the selected outcome, from baseline to the selected number of study years, are presented in individual-level and smoothed mean profile plots. The user then selects the sample size method, type of test, Type I error rate, percentage change, and allocation ratio for the power analysis (see Figure 4). Finally, the graphical output of the power analysis and a summary of the inputs used are displayed in the ‘output’ box, together with a summary of the fitted LMM (see Figure 5).
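The LMM-based pilot step can be imitated outside the app. The sketch below is not the app's code and uses no ADNI data: it simulates hypothetical stand-in pilot data, fits a random intercept and slope model with nlme::lme(), extracts the random slope and residual variances, and treats the "percentage change" as an assumed 25% slowing of the mean rate of change before applying the complete-data slope formula. The data-generating values, the 25% figure, and that reading of percentage change are all assumptions for illustration.

```r
library(nlme)  # recommended package shipped with R

set.seed(1)
# Hypothetical stand-in for pilot data (NOT ADNI): 150 participants, visits
# every 6 months for 1.5 years, mean change of 4 points per year
n_sub <- 150
times <- seq(0, 1.5, by = 0.5)
b0 <- rnorm(n_sub, sd = sqrt(55))  # random intercepts
b1 <- rnorm(n_sub, sd = sqrt(22))  # random slopes (independent, for simplicity)
pilot <- data.frame(id = rep(seq_len(n_sub), each = length(times)),
                    time = rep(times, n_sub))
pilot$y <- 20 + b0[pilot$id] + (4 + b1[pilot$id]) * pilot$time +
  rnorm(nrow(pilot), sd = sqrt(10))

# Random intercept and slope LMM fitted to the pilot data
fit <- lme(y ~ time, random = ~ time | id, data = pilot)

G <- getVarCov(fit)                # 2 x 2 random-effects covariance matrix
sig2.s <- as.numeric(G[2, 2])      # random slope variance
sig2.e <- fit$sigma^2              # residual variance
slope <- fixef(fit)[["time"]]      # mean rate of change in the pilot cohort

# Assumed: treatment slows the mean rate of change by 25%
delta <- 0.25 * abs(slope)
v <- sig2.s + sig2.e / sum((times - mean(times))^2)  # complete-data slope variance
n_pilot <- 2 * (qnorm(0.975) + qnorm(0.80))^2 * v / delta^2
ceiling(n_pilot)                   # per-group n, two-sided 5% test, 80% power
```

Expressing the effect as a percentage of the pilot slope ties the target effect to the natural rate of progression in the selected population, so narrowing the inclusion criteria changes both the variance components and the effect size at once.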
The interface of the second sub-menu is very similar to the first except that the sample size methodology is based on the MMRM. Additionally, the model fitting options include specifying an association structure, allocation ratio, and percentage retention for the two groups.
The final menu on the sidebar is the ‘About’ menu, which provides brief information about the dashboard, a description of the data, acknowledgment of the ADNI resources and of the packages used to develop the dashboard, and contact information for the developers of the dashboard.
In this manuscript, we have presented the longpower R package and a Shiny app dashboard that facilitate sample size and power analysis for longitudinal study designs with two-group comparisons of a continuous outcome. The app implements the sample size formulas of Liu and Liang (1997), Diggle et al. (1994, 2002), Lu et al. (2008), and Ard and Edland (2011) using functions in the longpower package. The package also handles models in which time is treated either as continuous (e.g., with random intercepts and slopes) or as categorical (MMRM).
The longpower package was created to give R users easy access to sample size formulas for longitudinal data that were already available in the literature. Many of the earlier papers on the topic provided no software, so each reader had to spend considerable effort programming the formulas. Collecting these formulas into an R package makes the methods more accessible and easier to compare. The package includes unit tests to ensure that the software reproduces published results, and alternative approaches for the same study design are validated against each other.
A novel feature of the app is the ability to source pilot data for Alzheimer’s disease trials to generate the required parameter estimates. We focus on Alzheimer’s data as our primary area of interest, but future work could bring in data from other disease areas. Other future directions include accommodating other outcome types and keeping up with the evolving landscape of model parameterizations.
We are grateful to the ADNI study volunteers and their families. This
work was supported by Biomarkers Across Neurodegenerative Disease
(BAND-14-338179) grant from the Alzheimer’s Association, Michael J. Fox
Foundation, and Weston Brain Institute; and National Institute on Aging
grant R01-AG049750. Data collection and sharing for this project was
funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI)
(National Institutes of Health Grant U01 AG024904) and DOD ADNI
(Department of Defense award number W81XWH-12-2-0012). ADNI is funded by
the National Institute on Aging, the National Institute of Biomedical
Imaging and Bioengineering, and through generous contributions from the
following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery
Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers
Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.;
Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its
affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO
Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.;
Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity;
Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx
Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation;
Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company;
and Transition Therapeutics. The Canadian Institutes of Health Research
is providing funds to support ADNI clinical sites in Canada. Private
sector contributions are facilitated by the Foundation for the National
Institutes of Health (www.fnih.org). The grantee organization is the
Northern California Institute for Research and Education, and the study
is coordinated by the Alzheimer’s Therapeutic Research Institute at the
University of Southern California. ADNI data are disseminated by the
Laboratory for Neuro Imaging at the University of Southern California.
Data used in preparation of this article were obtained from the
Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
(adni.loni.usc.edu). As such, the investigators within the ADNI
contributed to the design and implementation of ADNI and/or provided
data but did not participate in analysis or writing of this report. A
complete listing of ADNI investigators can be found at:
http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Conflict of Interest: None declared.
pamm, SIMR, clusterPower, LPower, PharmPow, powerMediation, powerEQTL, WebPower, longpower
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Iddi & Donohue, "Power and Sample Size for Longitudinal Models in R -- The longpower Package and Shiny App", The R Journal, 2022
BibTeX citation
@article{RJ-2022-022, author = {Iddi, Samuel and Donohue, Michael C}, title = {Power and Sample Size for Longitudinal Models in R -- The longpower Package and Shiny App}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {264-282} }