atable: Create Tables for Clinical Trial Reports

Examining distributions of variables is the first step in the analysis of a clinical trial before more specific modelling can begin. Reporting these results to stakeholders of the trial is an essential part of a statistician’s work. The atable package facilitates these steps by offering easy-to-use but still flexible functions.


Introduction
Reporting the results of clinical trials is such a frequent task that guidelines have been established that recommend certain properties of clinical trial reports; see Moher et al. (2010). In particular, Item 17a of CONSORT states that "Trial results are often more clearly displayed in a table rather than in the text". Item 15 of CONSORT suggests "a table showing baseline demographic and clinical characteristics for each group".
The atable package facilitates this recurring task of data analysis by providing a short approach from data to publishable tables. The atable package satisfies the requirements of CONSORT statements Item 15 and 17a by calculating and displaying the statistics proposed therein, i.e. mean, standard deviation, frequencies, p-values from hypothesis tests, test statistics, effect sizes and confidence intervals thereof. Only minimal post-processing of the table is needed, which supports reproducibility.
The atable package is intended to be modifiable: it can apply arbitrary descriptive statistics and hypothesis tests to the data. For this purpose, atable builds on R's S3-object system. R already has many functions that perform single steps of the analysis process (and they perform these steps well). Some of these functions are wrapped by atable in a single function to narrow the possibilities for end users who are not highly skilled in statistics and programming. Additionally, users who are skilled in programming will appreciate atable because they can delegate this repetitive task to a single function and then concentrate their efforts on more specific analyses of the data at hand.

Context
The atable package supports the analysis and reporting of randomised parallel group clinical trials. Data from clinical trials can be stored in data frames with rows representing 'patients' and columns representing 'measurements' for these patients or characteristics of the trial design, such as location or time point of measurement. These data frames will generally have hundreds of rows and dozens of columns. The columns have different purposes:
• Group columns contain the treatment that the patient received, e.g. new treatment, control group, or placebo.
• Split columns contain strata of the patient, e.g. demographic data such as age, sex or time point of measurement.
• Target columns are the actual measurements of interest, directly related to the objective of the trial. In the context of ICH E9 (ICH E9, 1999), these columns are called 'endpoints'.
The task is to compare the target columns between the groups, separately for every split column. This is often the first step of a clinical trial analysis to obtain an impression of the distribution of data. The atable package completes this task by applying descriptive statistics and hypothesis tests and arranges the results in a table that is ready for printing.
Additionally, atable can produce tables of blank data.frames with arbitrary fill-ins (e.g. X.xx) as placeholders for proposals or report templates.
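To make the three column roles concrete, the following base-R sketch builds a small data frame and compares the target column between the groups within each stratum. The data and the column names trt, visit and pain are hypothetical and chosen only for illustration; atable itself is not used here.

```r
# Hypothetical toy data: one row per patient and time point.
trial <- data.frame(
  trt   = factor(rep(c("placebo", "drug"), times = 4)),        # group column
  visit = factor(rep(c("baseline", "month 1"), each = 4)),     # split column
  pain  = c(6.1, 5.8, 6.4, 6.0, 5.9, 4.2, 6.2, 4.6)            # target column
)

# Describe the target column per group, separately for every stratum.
# Rows of the result are strata (visit), columns are groups (trt).
means <- tapply(trial$pain, list(trial$visit, trial$trt), mean)
sds   <- tapply(trial$pain, list(trial$visit, trial$trt), sd)

print(round(means, 2))
print(round(sds, 2))
```

This only covers the descriptive part of the task; the hypothesis tests and the arrangement into a printable table are what atable adds on top.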

Usage
To exemplify the usage of atable, we use the dataset arthritis of multgee (Touloumis, 2015). This dataset contains observations of the self-assessment score of arthritis, an ordered variable with five categories, collected at baseline and three follow-up times during a randomised comparative study of alternative treatments of 302 patients with rheumatoid arthritis.
The R Journal Vol. 11/01, June 2019 ISSN 2073-4859
LaTeX is not the only supported output format. All possible formats are:
• LaTeX (as shown in this document), further processed with e.g. latex of Hmisc, kable of knitr or xtable of xtable.
• R's console. Human readable format meant for explorative interactive analysis.
The output format is declared by the argument format_to of atable, or globally via atable_options.
The settings package (van der Loo, 2015) allows global declaration of various options of atable.

Modifying atable
The current implementation of tests and statistics (see table 3) is not suitable for all possible datasets. For example, the parametric t-test or the robust estimator median may be more adequate for some datasets. Additionally, dates and times are currently not handled by atable.
It is intended that some parts of atable can be altered by the user. Such modifications are accomplished by replacing the underlying methods or adding new ones while preserving the structures of arguments and results of the old functions. The workflow of atable (and the corresponding function in parentheses) is as follows:
1. calculate statistics (statistics)
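Because atable builds on S3, replacing or adding a method changes one step of the workflow without touching the rest. The mechanism can be sketched in base R with a hypothetical generic describe, which is not part of atable and only stands in for a single workflow step:

```r
# Hypothetical generic standing in for one workflow step.
describe <- function(x, ...) UseMethod("describe")

# Methods are selected by the class of x.
describe.numeric <- function(x, ...) {
  list(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}
describe.factor <- function(x, ...) {
  table(x)
}

age <- c(41, 55, 62, 38, 70)
sex <- factor(c("f", "m", "f", "f", "m"))

before <- describe(age)   # dispatches to describe.numeric
counts <- describe(sex)   # dispatches to describe.factor

# Replacing the numeric method keeps the interface (argument x, list result)
# but changes what is calculated.
describe.numeric <- function(x, ...) {
  list(median = median(x, na.rm = TRUE), mad = mad(x, na.rm = TRUE))
}
after <- describe(age)
```

The caller's code, describe(age), is the same before and after the replacement; only the method behind it differs.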

Replace existing methods
The atable package offers three possibilities to replace existing methods:
• pass a function to atable_options. This affects all following calls of atable.
• pass a function to atable. This affects only a single call of atable and takes precedence over atable_options.
• replace a function in atable's namespace. This is the most general possibility, as it is applicable to all R packages, but it also needs more code than the other two and is not as easily reverted.
We now define three new functions to exemplify these three possibilities.
Then modify via atable_options:
    atable_options(statistics.numeric = new_statistics_numeric)
Then modify by passing new_format_statistics_numeric as an argument to atable, together with the actual analysis. See table 4 for the results.
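As an illustration of the kind of functions meant here, the following is a minimal sketch that calculates the quantities reported in table 4 (median, MAD, t-test and KS-test). The function bodies are assumptions: in particular the class name statistics_numeric, the result columns tag and value, and the name new_two_sample_htest_numeric are chosen for this sketch and should be checked against the package documentation before use with atable.

```r
# Descriptive statistics for a numeric vector, returned as a named list.
new_statistics_numeric <- function(x, ...) {
  out <- list(Median = median(x, na.rm = TRUE),
              MAD    = mad(x, na.rm = TRUE),
              Mean   = mean(x, na.rm = TRUE),
              SD     = sd(x, na.rm = TRUE))
  class(out) <- c("statistics_numeric", class(out))  # assumed class name
  out
}

# Turn the statistics into one row per printed line of the table.
new_format_statistics_numeric <- function(x, ...) {
  labels <- c("Median; MAD", "Mean; SD")
  data.frame(
    tag   = factor(labels, levels = labels),
    value = c(paste(round(x$Median, 1), round(x$MAD, 1), sep = "; "),
              paste(round(x$Mean, 1),   round(x$SD, 1),  sep = "; ")),
    stringsAsFactors = FALSE
  )
}

# Two-sample tests comparing the first two levels of group.
new_two_sample_htest_numeric <- function(value, group, ...) {
  lv <- levels(group)
  x <- value[group == lv[1]]
  y <- value[group == lv[2]]
  list(p_ks = stats::ks.test(x, y)$p.value,
       p_t  = stats::t.test(x, y)$p.value)
}

# Direct calls on hypothetical data, independent of atable.
age <- c(34, 41, 47, 52, 58, 63, 66, 71)
grp <- factor(rep(c("placebo", "drug"), each = 4))
stats_out <- new_statistics_numeric(age)
formatted <- new_format_statistics_numeric(stats_out)
tests_out <- new_two_sample_htest_numeric(age, grp)
```

Calling the functions directly like this is a convenient way to check their arguments and results before handing them to atable.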

Add new methods
In the current implementation of atable, the generics have no method for class Surv of survival (Therneau, 2015). We define two new methods: the distribution of survival times is described by its mean survival time and corresponding standard error; the Mantel-Haenszel test compares two survival curves. These two functions are defined in the user's workspace, the global environment. It is sufficient to define them there, as R's scoping rules will eventually find them after going through the search path, see Wickham (2014).
Now, we need data with class Surv to apply the methods. The dataset ovarian of survival contains the survival times of a randomised trial comparing two treatments for ovarian cancer. Variable futime is the survival time, fustat is the censoring status, and variable rx is the treatment group.
    library(survival)
    # set classes
    ovarian <- within(survival::ovarian, {time_to_event = survival::Surv(futime, fustat)})
Then, call atable to apply the statistics and hypothesis tests. See table 6 for the results.
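The two methods described in this section can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: the generic name two_sample_htest and the names of the returned list elements are assumed, and the functions are called directly here so that the sketch runs without atable.

```r
library(survival)

# Mean survival time and its standard error for a Surv object.
statistics.Surv <- function(x, ...) {
  fit <- survival::survfit(x ~ 1)
  tab <- summary(fit, rmean = "common")$table
  # Labels differ slightly between survival versions ("rmean" or "*rmean").
  pick <- function(pattern) unname(tab[grep(pattern, names(tab))][1])
  list(mean_survival_time = pick("^\\*?rmean$"),
       SE                 = pick("^\\*?se\\(rmean\\)$"))
}

# Two-sample comparison of survival curves; rho = 0 selects the
# log-rank (Mantel-Haenszel) test in survdiff.
two_sample_htest.Surv <- function(value, group, ...) {
  fit <- survival::survdiff(value ~ group, rho = 0)
  df  <- sum(fit$exp > 0) - 1
  list(p = 1 - stats::pchisq(fit$chisq, df), stat = fit$chisq)
}

# Direct calls on the ovarian data.
time_to_event <- survival::Surv(survival::ovarian$futime, survival::ovarian$fustat)
rx <- factor(survival::ovarian$rx)

desc <- statistics.Surv(time_to_event)
test <- two_sample_htest.Surv(time_to_event, rx)
```

Once generics of the same names are available from atable, functions named in this way are picked up by S3 dispatch for columns of class Surv.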

Discussion
A single function call does the job, and in conjunction with report-generating packages such as knitr, accelerates the analysis and reporting of clinical trials.
Other R packages exist to accomplish this task:
• furniture (Barrett et al., 2018)
• tableone (Yoshida and Bohn, 2018)
• stargazer (Hlavac, 2018): focus is more on reporting regression models; no grouping variables, so no two-sample hypothesis tests included; descriptive statistics are comparable to atable
• DescTools (Signorell, 2018): comparable functions are Desc (only describes data.frames, no hypothesis tests) and PercTable (contingency tables only).
furniture and tableone have high overlap with atable, and thus we compare their advantages relative to atable in greater detail.
Advantages of furniture::table1 are:
• interacts well with magrittr's pipe %>% (Bache and Wickham, 2014), as mentioned in the examples of ?table1. This facilitates reading the code.
• handles objects defined by dplyr's group_by to define grouping variables (Wickham et al., 2019). atable has no methods defined for these objects.
Advantages of tableone::CreateTableOne are:
• allows arbitrary column names and prints these names in the resulting table unaltered. This is useful for generating human-readable reports. Blanks and parentheses are allowed for reporting e.g. 'Sex (Male) x%'. Also, non-ASCII characters are allowed. This facilitates reporting in languages that have little or no overlap with ASCII. atable demands syntactically valid names defined by make.names.
• counting missing values is easily switched on and off by an argument of tableone::CreateTableOne. In atable a redefinition of a function is needed.
• allows pairwise comparison tests when data is grouped into more than two classes. atable allows only multivariate tests.
Advantages of atable are:
• options may be changed locally via arguments of atable and globally via atable_options,
• easy expansion via S3 methods,
• formula syntax,
• distinction between split_cols and group_col,
• accepts empty data.frames. This is useful when looping over a list of possibly empty data frames in subgroup analysis, see table 5,
• allows creating tables from a blank data.frame with arbitrary fill-ins (e.g. X.xx) as placeholders for proposals or report templates, also see table 5.
Changing options is exemplified in section 2.4: passing options to atable allows the user to modify a single atable-call; changing atable_options will affect all subsequent calls and thus spares the user passing these options to every single call.
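The precedence between a local argument and a global option can be sketched in base R. The function summarise_numeric and the option name demo.statistics_numeric are hypothetical and serve only to show the lookup order; atable's own option handling relies on the settings package and is not reproduced here.

```r
# A function passed as an argument wins over one registered globally,
# which in turn wins over the built-in default.
summarise_numeric <- function(x, statistics_fun = NULL) {
  default_fun <- function(v) c(mean = mean(v), sd = sd(v))
  global_fun  <- getOption("demo.statistics_numeric")  # hypothetical option
  fun <- if (!is.null(statistics_fun)) {
    statistics_fun
  } else if (!is.null(global_fun)) {
    global_fun
  } else {
    default_fun
  }
  fun(x)
}

x <- c(1, 2, 3, 4, 100)

r_default <- summarise_numeric(x)                       # built-in default

old <- options(demo.statistics_numeric = function(v) c(median = median(v)))
r_global <- summarise_numeric(x)                        # global option applies
r_local  <- summarise_numeric(x, statistics_fun = function(v) c(max = max(v)))
options(old)                                            # revert the global change

r_reverted <- summarise_numeric(x)                      # default again
```

The global setting spares repeating the argument in every call, while the local argument remains available for one-off deviations.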
Descriptive statistics, hypothesis tests and effect sizes are automatically chosen according to the class of the target column.R's S3-object system allows a straightforward implementation and extension of this feature, see section 2.4.
atable supports the following concise and self-explanatory formula syntax: atable(target_cols ~ group_col, ...) tests whether the interventions group_col have an influence on the endpoint target_cols. Statisticians will also recognise the notation of conditional probability:

P(target_cols | split_cols).
This denotes the distribution of target_cols given split_cols. atable borrows the pipe | from conditional probability: atable(target_cols ~ group_col | split_cols) shows the distribution of the endpoint target_cols within the interventions group_col given the strata defined by split_cols.
atable distinguishes between split_cols and group_col: group_col denotes the randomised intervention of the trial. We want to test whether it has an influence on the target_cols; split_cols are variables that may have an influence on target_cols, but we are not interested in that influence in the first place. Such variables, for example, sex, age group, and time point of measurement, arise often in clinical trials. See table 2: the variable time is such a supplementary stratification variable: it has an effect on the arthritis score, but that is not the effect of interest; we are interested in the effect of the intervention on the arthritis score.
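How a formula of the form target_cols ~ group_col | split_cols separates into its three roles can be shown with base R alone. The helper split_formula is hypothetical and is not how atable necessarily parses its input; it only illustrates that the pipe | is an ordinary operator inside an R formula.

```r
# Decompose a formula into target, group and split variables.
split_formula <- function(f) {
  lhs <- f[[2]]   # left of ~
  rhs <- f[[3]]   # right of ~
  is_conditional <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  list(
    target_cols = all.vars(lhs),
    group_col   = if (is_conditional) all.vars(rhs[[2]]) else all.vars(rhs),
    split_cols  = if (is_conditional) all.vars(rhs[[3]]) else character(0)
  )
}

with_split    <- split_formula(score ~ trt | time)
without_split <- split_formula(score ~ trt)
several       <- split_formula(age + sex ~ trt | time + site)

str(with_split)
```

Because | binds more loosely than + in R, several target or split columns can be combined with + on either side without extra parentheses.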
The package can be used in other research contexts as a preliminary unspecific analysis.Displaying the distributions of variables is a task that arises in every research discipline that collects quantitative data.

Table 1 :
Demographics of dataset arthritis.

Several functions that create a LaTeX representation (Mittelbach et al., 2004) of the table exist: latex of Hmisc (Harrell Jr et al., 2018), kable of knitr (Xie, 2018) or xtable of xtable (Dahl et al., 2018). latex is used for this document. Table 1 reports the number of observations per group. The distribution of numeric variable age is described by its mean and standard deviation, and the distributions of categorical variable sex and ordered variable baselinescore are presented as percentages and counts. Additionally, missing values are counted per variable. Descriptive statistics, hypothesis tests and effect sizes are automatically chosen according to the class of the target column; see table 3 for details. Because the data is from a randomised study, hypothesis tests comparing baseline variables between the treatment groups are omitted.
The target variable is score, variable trt acts as the grouping variable, and variable time splits the dataset before analysis:
    the_table <- atable(score ~ trt | time, arthritis)
Table 2 reports the number of observations per group and time point. The distribution of ordered variable score is presented as counts and percentages. Missing values are also counted per variable and group. The p-value and test statistic of the comparison of the two treatment groups are shown.

Table 2 :
Hypothesis tests of dataset arthritis.

Table 3 :
R classes, scale of measurement and atable. The table lists the descriptive statistics and hypothesis tests applied by atable to the three R classes factor, ordered and numeric. The table also reports the corresponding scale of measurement. atable treats the classes character and logical as the class factor.
These five functions may be altered by the user by replacing existing or adding new methods to already existing S3-generics. Two examples are as follows:

Table 4 :
Modified atable now calculates the median, MAD, t-test and KS-test for numeric variables. The median is greater than the mean in both the drug and placebo group, indicating a skewed distribution of age. Additionally, the KS-test is significant at the 5% level, while the t-test is not.
See table 5 for the results. This table also shows that atable accepts empty data frames without errors.

Table 5 :
atable applied to an empty data frame with placeholder statistics for numeric variables. The placeholder function is applied to the numeric variable, printing X.xx in the table. The empty factor variable is summarized in the same way as non-empty factors: by returning percentages and counts; in this case yielding 0/0 = NaN percent and counts of 0 in every category, as expected. Note that the empty data frame still needs non-empty column names.
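The behaviour described for the empty factor follows directly from base R arithmetic, as this short sketch shows. The data frame, its column names and the function placeholder_numeric are hypothetical and only mimic the situation of table 5 without calling atable.

```r
# An empty data frame that still has named, typed columns.
empty <- data.frame(
  sex = factor(character(0), levels = c("female", "male")),
  age = numeric(0)
)

# Counts are 0 in every category, so the percentages are 0/0 = NaN.
counts  <- table(empty$sex)
percent <- 100 * counts / sum(counts)

# A placeholder "statistic" that ignores its input and returns fill-ins.
placeholder_numeric <- function(x, ...) {
  list(Mean = "X.xx", SD = "X.xx")
}
fill_in <- placeholder_numeric(empty$age)

print(counts)
print(percent)
```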

Table 6 :
Hypothesis tests of the dataset ovarian.