Accelerometers are a valuable tool for measuring physical activity (PA) in epidemiological studies. However, considerable processing is needed to convert time-series accelerometer data into meaningful variables for statistical analysis. This article describes two recently developed R packages for processing accelerometer data. The package accelerometry contains functions for performing various data processing procedures, such as identifying periods of non-wear time and bouts of activity. The functions are flexible, computationally efficient, and compatible with uniaxial or triaxial data. The package nhanesaccel is specifically for processing data from the National Health and Nutrition Examination Survey (NHANES), years 2003–2006. Its primary function generates measures of PA volume, intensity, frequency, and patterns according to user-specified data processing methods. This function can process the NHANES 2003–2006 dataset in under one minute, which is a drastic improvement over existing software. This article highlights important features of packages accelerometry and nhanesaccel and demonstrates typical usage for PA researchers.
The National Health and Nutrition Examination Survey (NHANES) is one of a select few national studies to include objective physical activity (PA) monitoring with accelerometers (Hagstromer et al. 2007; Troiano et al. 2008; Colley et al. 2011). In the 2003–2004 and 2005–2006 waves of the study, a total of 14,631 Americans age \(\geq\) 6 years wore uniaxial ActiGraph GT1M accelerometers at the hip for seven consecutive days. The result is a rich dataset for describing PA levels in Americans and assessing correlates of PA.
In 2007, SAS programs for processing the accelerometer data from NHANES were made available on the National Cancer Institute’s (NCI) website (National Cancer Institute 2007). Using these programs, researchers around the world could access the time-series accelerometer data from NHANES and run SAS programs to derive meaningful PA variables to use in statistical analyses. Over the past seven years, the NCI’s SAS programs have facilitated great progress in understanding the distribution of PA behaviors in America (Troiano et al. 2008; Tudor-Locke et al. 2010), and have helped to identify numerous cross-sectional associations between PA and health outcomes (Sisson et al. 2010; Carson and Janssen 2011; Healy et al. 2011).
While the NCI’s SAS programs provide several indicators of overall PA and moderate-to-vigorous PA (MVPA), researchers are increasingly writing customized scripts in programs such as MATLAB (The MathWorks, Inc. 2014), LabVIEW (National Instruments 2014), and R for more flexibility in data processing (Tudor-Locke et al. 2011). This approach allows researchers to generate more descriptive variables, such as time spent in extended periods of sedentariness or hour-by-hour count averages (Bankoski et al. 2011); to implement non-wear algorithms that might better discriminate between true non-wear and sedentary time (Choi et al. 2012); and to apply activity intensity cut-points specific to certain populations, such as children and older adults (Copeland and Esliger 2009).
As the level of complexity in data processing methods increases, the learning curve for researchers interested in working with accelerometer data also increases. More generally, researchers with data from studies other than NHANES are tasked with writing their own software to convert the time-series data to a form suitable for statistical analysis. Such efforts are time-consuming and probably inefficient, as many different researchers are likely to write their own code for very similar purposes. The size of accelerometer files causes additional complexity. For example, the accelerometer files in NHANES 2003–2006 are 3.87 GB and contain over 1.25 billion data points.
The purpose of this paper is to describe recently developed software for processing accelerometer data. The software takes the form of two R packages: accelerometry for general functions (Van Domelen 2014), and nhanesaccel for processing NHANES 2003–2006 data (Van Domelen et al. 2014).
The package accelerometry has a collection of functions for performing various tasks in accelerometer data processing, as well as composite functions for converting minute-to-minute uniaxial or triaxial accelerometer data into meaningful PA variables. This package is intended for researchers who wish to process accelerometer data from studies other than NHANES.
The package was developed according to the design goals in Table 1. Computing speed was considered particularly important because users may have data on thousands or tens of thousands of participants. Such a large volume of data can take hours to process using inefficient software.
Goal | Description |
1 | Allow flexibility in data processing (non-wear algorithm, bouts, etc.). |
2 | Able to process data from uniaxial or triaxial accelerometers worn at the |
hip, wrist, or other position. | |
3 | Run efficiently so that large datasets can be processed quickly. |
4 | Generate a wide variety of physical activity variables, including those that |
can be used to study weekday/weekend and time-of-day effects. | |
5 | Use reasonable defaults for function inputs so that users with minimal |
knowledge of accelerometry can still process their data adequately. | |
6 | Free. |
Using R with embedded C++ code was a natural choice to meet goals 3 and 6, and the functions in the package were written to achieve goals 1–2 and 4–5. The package contains eleven functions for performing various steps in accelerometer data processing. These functions are summarized in Table 2. For seven of the functions, C++ code was added via the Rcpp package (Eddelbuettel and François 2011; Eddelbuettel 2013) to improve efficiency.
Function | Description | C++ |
accel.artifacts |
Replace extreme counts with mean of neighboring values. | Yes |
accel.bouts |
Identify bouts of physical activity. Can specify minimum | Yes |
count value, minimum length, and tolerance. | ||
accel.intensities |
Compute number of minutes with counts in various user- | No |
defined intensity levels and counts accumulated from each | ||
intensity level. | ||
accel.sedbreaks |
Count number of sedentary breaks. | Yes |
accel.weartime |
Identify periods when participant is not wearing the | Yes |
accelerometer. | ||
blockaves |
Calculate average of non-overlapping segments of data. | Yes |
movingaves |
Calculate moving averages. | Yes |
rle2 |
Run length encoding. Often much faster than base function | Yes |
rle , and records value, length, and indices for each run. |
||
inverse.rle2 |
Convert matrix created by rle2 back to vector form. |
No |
accel.process.uni |
Composite function to process uniaxial accelerometer data. | No |
accel.process.tri |
Composite function to process triaxial accelerometer data. | No |
Most users will only interact with accel.process.uni
or
accel.process.tri
. These functions internally call the other
functions, and have inputs to control the non-wear and activity bout
algorithms, count cut-points for activity intensities, and other
options. Notably, accel.process.tri
allows the user to choose which
accelerometer axis to use for non-wear detection, activity bouts, and
intensity classification. This is important because vertical-axis counts
are almost always used for intensity classification, but triaxial data
is more sensitive to small movements and therefore well-suited for
non-wear detection (Choi et al. 2012).
Version 2.2.4 of accelerometry is currently available on CRAN. The package requires R version 3.0.0 or above due to its dependency on Rcpp, which itself requires R 3.0.0 or above. A data dictionary for the variables generated by the functions in accelerometry is available online (Van Domelen 2013).
Suppose a researcher wishes to process triaxial data collected from an
ActiGraph GT3X+ device (ActiGraph, Pensacola, FL) worn at the hip for a
particular study participant. The first step is to load a data file with
the accelerometer count values in one-minute intervals (“epochs”) into
R. After transferring data from the accelerometer to a computer via USB
cable, the researcher can use Actigraph’s ActiLife software (ActiGraph 2014)
to create a .csv file and load it into R using read.csv
, or use the
function gt3xAccFile
in the R package
pawacc
(Geraci et al. 2012; Geraci 2014) to load the accelerometer file directly into R.
Notably, data collected in shorter than one-minute epochs can still be
processed using accelerometry, it just has to be converted to
one-minute epochs first. This can be done in ActiLife or in R using one
of several available functions (e.g. blockaves
in accelerometry,
aggAccFile
in pawacc, or dataCollapser
in
PhysicalActivity;
Choi et al. (2011a)).
Once the data is in R, the user should have a four-column matrix or data
frame where the columns represent counts in the vertical axis,
anteroposterior (AP) axis, and mediolateral (ML) axis, as well as steps.
A sample matrix with this format is included in the accelerometry
package. The next step is to install accelerometry and call
accel.process.tri
. The following code installs the package and
processes the sample dataset in two ways: first using default settings,
and then using a different algorithm for non-wear detection, and
requesting a larger set of PA variables.
# Install and load accelerometry package
install.packages("accelerometry")
library("accelerometry")
# Load four-column matrix with counts and steps over 7 days
data(tridata)
# Generate basic PA variables using default settings
<- accel.process.tri(counts.tri = tridata[, 1:3], steps = tridata[, 4])
dailyPA1
# Request full set of PA variables, and use triaxial vector magnitude for non-wear
# detection rather than vertical axis, with 90-minute rather than 60-minute window
<- accel.process.tri(counts.tri = tridata[, 1:3], steps = tridata[, 4],
dailyPA2 brevity = 3, nonwear.axis = "mag", nonwear.window = 90)
# Check variable names for dailyPA1 and dailyPA2
colnames(dailyPA1)
colnames(dailyPA2)
# Print contents of dailyPA1 and first 15 variables in dailyPA2
dailyPA11:15]
dailyPA2[,
# Calculate average for cpm_vert from dailyPA1 and dailyPA2
mean(dailyPA1[, "cpm_vert"])
mean(dailyPA2[, "cpm_vert"])
Comparing the two datasets, we see that dailyPA1
has 15 variables for
each of the seven days of monitoring, while dailyPA2
has 124. The
brevity
input was not specified in the first function call, so the
default was used (brevity = 1
). This resulted in only basic variables
being generated: participant ID (id
), day of week (day
), whether the
day was deemed valid for analysis (valid_day
), wear time minutes
(valid_min
), total counts in each accelerometer axis (counts_vert
,
…, counts_mag
), average counts during wear time for each
accelerometer axis (cpm_vert
, …, cpm_mag
), and number of steps
(steps
). In contrast, setting brevity = 3
for dailyPA2
resulted in
an additional 109 variables being generated. These extra variables
quantify various aspects of daily activity that may be of interest to
more experienced PA researchers. For example, one may want to test
whether prolonged periods of sedentariness (sed_bouted_60min
) are
associated with health outcomes independent of total PA (cpm_vert
), or
compare daily patterns of PA across various demographics (cpm_hour1
,
…, cpm_hour24
). A data dictionary for these variables is available
online (Van Domelen 2013).
Looking at dailyPA1
, it is interesting that counts_vert
and
counts_ap
were very similar on all days except day 6, when counts_ap
was more than three times greater than counts_vert
. It seems likely
that the participant engaged in some non-walking activity on this day
that involved more anteroposterior (i.e. front-back) motion than
vertical motion. This is supported by the participant having an
unusually low number of steps on this day, but only slightly lower than
average counts_ap
.
A 60-minute non-wear algorithm based on only vertical-axis counts was
used to create dailyPA1
, while a 90-minute algorithm based on triaxial
counts was used to create dailyPA2
. The first algorithm identified
some non-wear time on days 5 and 6, while the second non-wear algorithm
did not mark any time on any day as non-wear. Some disagreement between
different non-wear algorithms is to be expected, and different
researchers may have different preferences based on validation studies
or their own experiences processing accelerometer data. Regardless, it
makes little difference which algorithm is used for this particular
participant. Average counts per minute during wear time (cpm_vert
) for
the seven days of monitoring, a standard measure of total PA, was 142.7
using the first non-wear algorithm and 142.0 using the second.
Adding return.form = 1
to the function calls to accel.process.tri
would result in daily averages being returned rather than separate
values for each day of monitoring. This format is usually more useful
for statistical analysis.
For further information on accelerometry, the package documentation
includes a description of all of the functions and their inputs and has
some additional examples (Van Domelen 2014). The code from this section is also
available in the package as a demo that can be executed by running
demo("acceljournal")
.
The package nhanesaccel is intended for researchers who want to use
the objectively measured PA data in NHANES 2003–2006. Its primary
function, nhanes.accel.process
, converts minute-to-minute
accelerometer data from NHANES participants into useful PA variables for
statistical analysis.
Design goals for nhanesaccel are shown in Table 3. Speed and flexibility were primary considerations for this package.
Goal | Description |
---|---|
1 | Process NHANES 2003–2006 accelerometer data in less than 10 minutes. |
2 | Generate file for statistical analysis without directly modifying any code. |
3 | Produce .csv file that can be imported into any software package for analysis. |
4 | Do not require user to download 3.87 GB data files to the personal computer. |
5 | Calculate adjusted NHANES sample weights based on subset of participants |
with usable data, which are needed for appropriate statistical analysis. | |
6 | Free. |
7 | Option to replicate methods used in the NCI’s SAS programs, allowing |
researchers to use NCI’s methods while taking advantage of goals 1–4 and 6. |
The main function in nhanesaccel is nhanes.accel.process
. This
function relies heavily on the functions in accelerometry, and has
similar inputs as accel.process.uni
for controlling data processing
methods. There are 31 function inputs controlling methods for non-wear
detection, activity bouts, inclusion criteria, and the number of PA
variables generated. These are described in the package documentation
files (Van Domelen et al. 2014). A data dictionary for the variables generated by
nhanes.accel.process
is available online (Van Domelen 2013).
The use of C++ code in the functions in accelerometry made it possible
for nhanes.accel.process
in nhanesaccel to process the full dataset
of 14,631 participants in NHANES 2003–2006 in less than one minute. By
including NHANES data in the package, there is no need for the user to
download the large raw data files from the NHANES website. Adjusted
sample weights, which are needed to ensure that the subset of
participants with usable accelerometer data are weighted to match the
demographics of the United States as a whole, are calculated by the
function nhanes.accel.reweight
. This function operates very similarly
to the NCI’s SAS program reweight.pam
(National Cancer Institute 2007), and is called internally
by nhanes.accel.process
.
Version 2.1.1 of nhanesaccel is currently available on R-Forge. Due to its size (88.1 MB) it cannot be hosted on CRAN. The package requires R version 3.0.0 or above.
Users can install nhanesaccel via the install.packages
function in
R. Mac and Linux users and Windows users not running the most recent
version of R (3.1 as of November 28, 2014) need to set type = "source"
to install from source files rather than binaries.
# Install accelerometry package (if not already installed)
install.packages("accelerometry")
# Install nhanesaccel on Windows running most recent version of R
install.packages("nhanesaccel", repos = "http://R-Forge.R-project.org")
# Install nhanesaccel on Mac or Linux or on Windows running earlier version of R
install.packages("nhanesaccel", repos = "http://R-Forge.R-project.org",
type = "source")
Once the package is installed, the user can process the NHANES
2003–2006 dataset by loading the package and then calling
nhanes.accel.process
.
# Load nhanesaccel package
library("nhanesaccel")
# Process NHANES 2003-2006 data using default settings
<- nhanes.accel.process()
nhanes1
# Examine summary data for first 5 participants
1:5, ] nhanes1[
After a short period of time, the function returns a data frame named
nhanes1
with daily averages for basic PA variables. Alternatively, the
user may want to write a .csv file that can be imported into a different
statistical software package for analysis. This can be achieved by
adding the input write.csv = TRUE
to the nhanes.accel.process
function call. Unless otherwise specified via the directory
input, the
.csv file would be written to the current working directory, which can
be seen by running getwd()
and set by running setwd("<location>")
.
At this point the user could close R and switch to his or her preferred software package for statistical analysis. For example, the user could load the .csv file into SAS, merge in the Demographics files (which can be downloaded from the NHANES website), and use survey procedures in SAS to test for associations between demographic variables and PA.
Here we briefly illustrate typical data processing and analysis in R. Suppose we wish to test the hypothesis that American youth are more active on weekdays than on weekend days. First, we process the accelerometer data from NHANES 2003–2006 using three non-default inputs. The first two inputs require that participants have at least one valid weekday and one valid weekend day (rather than just one valid day overall), and the third requests that PA averages are calculated for weekdays and weekend days separately.
# Process NHANES 2003-2006 data, requiring at least one valid weekday and weekend day
<- nhanes.accel.process(valid.week.days = 1, valid.weekend.days = 1,
nhanes2 weekday.weekend = TRUE)
# Get dimension and variable names for nhanes2
dim(nhanes2)
names(nhanes2)
The data frame nhanes2
has 14,631 rows and 17 columns. Each row
summarizes the PA of one NHANES participant. The first few variables are
participant ID (seqn
), NHANES year (wave
; 1 for 2003–2004, 2 for
2005–2006), number of valid days of monitoring (valid_days
,
valid_week_days
, and valid_weekend_days
), and whether the
participant has valid data for statistical analysis (include
). Then we
have daily averages for accelerometer wear time (valid_min
), counts
(counts
), and counts per minute of wear time (cpm
), for all valid
days and for weekdays (wk_
prefix) and weekend days (we_
prefix)
separately.
The variable cpm
is a standard measure of total PA. We will compare
weekday cpm (wk_cpm
) and weekend cpm (we_cpm
) in participants age 6
to 18 years to address our hypothesis. To get age for each participant,
we merge a demographics dataset (included in nhanesaccel) to the
nhanes2
data frame. Then we calculate for each participant the percent
difference between wk_cpm
and we_cpm
. Positive values for cpm_diff
indicate greater cpm on weekdays than on weekend days.
# Load demographics data and merge with nhanes2
data(dem)
<- merge(x = nhanes2, y = dem)
nhanes2
# Calculate percent difference between weekday and weekend CPM for each participant
$cpm_diff <- (nhanes2$wk_cpm - nhanes2$we_cpm) /
nhanes2$wk_cpm + nhanes2$we_cpm) / 2) * 100 ((nhanes2
Now we are almost ready for statistical analysis. Because NHANES uses a
complex multi-stage probability sampling design, analyzing the data as a
simple random sample would result in incorrect inference. We need to use
functions in the survey
package (Lumley 2004, 2014) rather than base R functions like t.test
and glm
. For those who are not familiar with concepts in survey
analysis, reference materials and tutorials are available
(Lumley 2004, 2010, 2012).
Here we create a survey object using design features of NHANES.
# Create survey object called hanes
<- svydesign(id = ~ sdmvpsu, strata = ~ sdmvstra, weight = ~ wtmec4yr_adj,
hanes data = nhanes2, nest = TRUE)
Next we generate a figure comparing weekday and weekend PA in
participants age 6 to 18 years. We use functions in the survey package
to calculate mean (95% confidence interval) for cpm_diff
in one-year
age increments, then plot the results.
# Calculate mean (SE) and 95% CI's for cpm_diff for ages 6 to 18 years
<- svyby(~ cpm_diff, by = ~ ridageyr, design = subset(hanes, ridageyr <= 18),
mean.diff FUN = svymean, na.rm = TRUE)
<- confint(mean.diff)
ci
# Plot means and CI's for cpm_diff by age
plot(x = 6:18, y = mean.diff[, 2], main = "CPM on Weekdays vs. Weekends",
ylim = c(-30, 30), pch = 19, ylab = "Perc. diff. (mean +/- 95% CI)",
xlab = "Age (years)", cex = 0.8, cex.axis = 0.85, cex.main = 1.5,
cex.lab = 1.1, xaxt = "n")
axis(side = 1, at = 6:18, cex.axis = 0.85)
segments(x0 = 6:18, y0 = ci[, 1], x1 = 6:18, y1 = ci[, 2], lwd = 1.3)
abline(h = 0, lty = 2)
Figure 1 shows that PA is similar on weekdays and weekends from age 6 to 12 years, but 10–20% greater on weekdays from age 13 to 18 years (all p < 0.05). These results support our hypothesis that American youth are more active on weekdays than on weekend days, but it appears that this is only the case for teenagers. Further analyses could try to identify the reason for the weekday/weekend difference in adolescents, for example by incorporating data on sports participation and time spent watching television for each age group.
Going back to the data processing step, users may wish to implement some non-default settings. To illustrate, the following code generates the full set of PA variables (203 variables), requires at least four valid days of monitoring, and uses a 90-minute rather than 60-minute minimum window for non-wear time.
# Process NHANES 2003-2006 data with four non-default inputs
<- nhanes.accel.process(brevity = 3, valid.days = 4, nonwear.window = 90,
nhanes3 weekday.weekend = TRUE)
The next example provides code to replicate the methods used in the
NCI’s SAS programs to process NHANES 2003–2006 data. In the first
function call, we explicitly set each function input as needed to
replicate the NCI’s methods; in the second, we use the nci.methods
argument as a shortcut.
# Specify count cut-points for moderate and vigorous intensity in youth
<- c(1400, 1515, 1638, 1770, 1910, 2059, 2220, 2393, 2580, 2781, 3000, 3239)
youthmod <- c(3758, 3947, 4147, 4360, 4588, 4832, 5094, 5375, 5679, 6007, 6363, 6751)
youthvig
# Process NHANES 2003-2006 data using NCI's methods
<- nhanes.accel.process(waves = 3, brevity = 2, valid.days = 4,
nci1 youth.mod.cuts = youthmod, youth.vig.cuts = youthvig,
cpm.nci = TRUE, days.distinct = TRUE, nonwear.tol = 2,
nonwear.tol.upper = 100, nonwear.nci = TRUE,
weartime.maximum = 1440, active.bout.tol = 2,
active.bout.nci = TRUE, artifact.thresh = 32767,
artifact.action = 3)
# Repeat, but use nci.methods input for convenience
<- nhanes.accel.process(waves = 3, brevity = 2, nci.methods = TRUE)
nci2
# Verify that nci1 and nci2 are equivalent
all(nci1 == nci2, na.rm = TRUE)
While we prefer using the default function inputs, there are some compelling reasons to use the NCI’s methods instead. Doing so is simple, easy to justify in manuscripts, and ensures consistency with prior studies that have used the NCI’s SAS programs. On the other hand, certain aspects of the NCI’s methods are suboptimal. For example, the non-wear algorithm is applied to each day of monitoring separately, which results in misclassification of non-wear time as sedentary time when participants remove the accelerometer between 11 pm and midnight to go to sleep (Winkler et al. 2012). Of course, it is up to the user to decide whether to use the default settings, the NCI’s methods, or some other combination of function inputs. Certainly many different approaches can be justified.
The code from this section of the paper is included in nhanesaccel as
a demo that can be executed by running demo("hanesjournal")
.
One of the challenges of working with the NHANES accelerometer dataset is its size. A program written in R or MATLAB using standard loops to generate variables for nearly 15,000 participants can take hours to process. Efficiency was therefore a high priority in the development of nhanesaccel.
Processing times for nhanesaccel and the NCI’s SAS programs were
compared using R version 3.1.0 and SAS Version 9.4, respectively, on a
Dell OptiPlex 990 desktop computer with Intel Core i7-2600 CPU at 3.4
GHz and 8 GB of RAM. The system.time
function in R was used to record
processing times for nhanesaccel, and “real time” output from the SAS
log file was used for the NCI’s programs. In both cases, elapsed times
rather than CPU times were recorded. Results are shown in
Table 4.
nhanesaccel | nhanesaccel | NCI’s SAS | |
(defaults) | (NCI methods) | programs | |
8.64 (0.02) s | 17.34 (0.04) s | 25.03 (0.79) min |
The package nhanesaccel was 174 times faster than the NCI’s SAS programs under default settings, and 87 times faster using settings that replicate the NCI’s methods. For nhanesaccel, the “NCI methods” settings have longer processing times than the defaults because the NCI’s algorithms for non-wear and activity bout detection are more computationally intensive than the defaults.
Equivalence between nhanesaccel (NCI methods) and NCI’s SAS programs was confirmed by direct comparison of the files generated. The only difference was that the NCI’s SAS programs had data on nine participants with potentially unreliable data, whereas nhanesaccel excluded those participants entirely. Excluding this data is considered acceptable by the NCI (National Cancer Institute 2007).
Both the NCI’s SAS programs and nhanesaccel require some one-time steps prior to processing. The SAS programs require downloading and unzipping the data files, converting from .xpt format to SAS datasets, and downloading four SAS scripts and making minor changes to each. This process takes about ten minutes. As for nhanesaccel, installation can take a minute or two due to its large size.
Processing the full NHANES 2003–2006 using nhanesaccel takes approximately twice the time to process just NHANES 2003–2004. In five trials, mean (standard deviation) processing times were 17.57 (0.01) s using defaults and 40.26 (0.06) s using the NCI’s methods.
The nhanesaccel package should increase accessibility to the
objectively measured PA data in NHANES 2003–2006. The software is free,
efficient, and allows flexibility in data processing without writing
original code to process over a billion data points. Proficiency in R is
not required, as researchers can obtain a file for statistical analysis
in SAS or another software package with just three lines of R code
(install package, load package, call nhanes.accel.process
). For
researchers with data from studies other than NHANES, the functions in
accelerometry should greatly simplify the process of converting
time-series count data into meaningful PA variables.
There are several limitations of the packages presented in this article.
First, the functions are not immediately useful for processing
accelerometer data collected in less than one-minute epochs. Researchers
with 1-s, 30-Hz, or 80-Hz data could convert the data to one-minute
epochs and then use the functions in accelerometry, but doing so would
sacrifice any additional information contained in the higher resolution
data. Also, the default settings for the functions in accelerometry
and nhanesaccel were chosen for data collected from accelerometers
worn at the hip. Users who wish to process wrist data could still use
the functions, but would have to take care to adjust function
parameters, particularly int.cuts
.
Another limitation is that the NHANES 2003–2006 dataset is 8–11 years old, and the data has been available for about six years. However, NHANES 2003–2006 remains the most recent study with objectively measured PA on a nationally representative sample of Americans. Although population activity levels may have changed since 2003–2006, associations between PA and outcomes should be relatively constant over time. Additionally, data on deaths will continue to be updated, which will enable future studies on PA and 5- or 10-year mortality. There are likely still many opportunities for useful contributions to the PA literature from this dataset.
There are several other R packages for processing accelerometer data.
The PhysicalActivity package (Choi et al. 2011b) has functions for converting
accelerometer data to longer epochs (dataCollapser
), plotting
accelerometer data (plotData
), and implementing a non-wear algorithm
(wearingMarking
). Briefly, this algorithm uses a 90- rather than
60-minute window, and allows one or two non-zero count values provided
that they are surrounded by 30 minutes of zero counts on both sides.
Variants of this non-wear algorithm can be implemented by adjusting
function inputs. The GGIR
package (Hees et al. 2014) has functions for processing high-frequency triaxial
data from certain model accelerometers, and has a function for
generating summary statistics. The variables that GGIR generates are
mostly indicators of overall PA, such as mean acceleration over the full
day or during time periods within the day. Finally, pawacc has
functions for converting to longer epochs, detecting non-wear and
activity bouts, generating summary statistics, and plotting data. The
pawacc package can also be used to read accelerometer files into R
without using ActiGraph’s ActiLife software. To our knowledge,
nhanesaccel is the only shared software currently available for
processing the data from NHANES 2003–2006 other than the NCI’s SAS
programs.
Looking forward, there is a great need for software to process high-frequency accelerometer data. In particular, wrist-worn accelerometers are being used in NHANES 2011–2014, with the devices recording triaxial data at 80-Hz (Troiano 2012). Analyzing this data will be challenging, as there will be over 145 million data points per participant, or about two trillion data points total. Aside from the sheer volume of data, methods for generating meaningful activity variables from high-frequency triaxial data are still being developed. It will be interesting to see how this data will be made available to the public, and whether the NCI will provide data processing software similar to the SAS programs for NHANES 2003–2006. Provided that the raw data is made available in some form, we intend to add functions to nhanesaccel or develop a separate R package to process NHANES 2011–2014 data.
We hope that accelerometry and nhanesaccel will make it easier for researchers to work with accelerometer data, and particularly the NHANES 2003–2006 dataset. Researchers who have any problems using either package, or have suggestions for additional features, should not hesitate to contact us.
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.
accelerometry, Rcpp, pawacc, PhysicalActivity, survey, GGIR
HighPerformanceComputing, NumericalMathematics, OfficialStatistics, Survival, Tracking
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Domelen & Pittard, "Flexible R Functions for Processing Accelerometer Data, with Emphasis on NHANES 2003--2006", The R Journal, 2015
BibTeX citation
@article{RJ-2014-024, author = {Domelen, Dane R. Van and Pittard, W. Stephen}, title = {Flexible R Functions for Processing Accelerometer Data, with Emphasis on NHANES 2003--2006}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {6}, issue = {2}, issn = {2073-4859}, pages = {52-62} }