arctools_intro

The arctools package allows to generate summaries of the minute-level physical activity (PA) data. The default parameters are chosen for the Actigraph activity counts collected with a wrist-worn device; however, the package can be used for other minute-level PA data with the corresponding timepstamps vector.

Below, we demonstrate the use of arctools with the attached, exemplary minute-level Actigraph PA counts data.

Using arctools package to compute physical activity summaries

The arctools functions process one file with accelerometry data at a time.

Reading PA data

Four CSV data sets with minute-level activity counts data are attached to the arctools package. The data file names are stored in extdata_fnames object that becomes available once the arctools package is loaded.

Below, we defined fpath to be a path to one of the minute-level activity counts data files. fread() reads minute-level activity counts data file while conveniently skipping first few rows with meta data, and then as.data.frame() converts the read data into a data frame object. The read-in data is assigned to dat variable. head() and tail() get first few and last few rows of dat, respectively.

library(arctools)
library(data.table)
library(dplyr)
library(ggplot2)
library(lubridate)

## Read one of the data sets
fpath <- system.file("extdata", extdata_fnames[1], package = "arctools")
dat   <- as.data.frame(fread(fpath))
rbind(head(dat, 3), tail(dat, 3))
#>       Axis1 Axis2 Axis3 vectormagnitude           timestamp
#> 1      1021  1353  2170            2754 2018-07-13 10:00:00
#> 2      1656  1190  2212            3009 2018-07-13 10:01:00
#> 3      2540  1461  1957            3524 2018-07-13 10:02:00
#> 10078     0     0     0               0 2018-07-20 09:57:00
#> 10079     0     0     0               0 2018-07-20 09:58:00
#> 10080     0     0     0               0 2018-07-20 09:59:00

The data columns are:

  • Axis1 - sensor’s X axis minute-level counts data,
  • Axis2 - sensor’s Y axis minute-level counts data,
  • Axis3 - sensor’s Z axis minute-level counts data,
  • vectormagnitude - minute-level counts data defined as sqrt(Axis1^2 + Axis2^2 + Axis3^2),
  • timestamp - time-stamps corresponding to minute-level measures.
## Plot activity counts
## Format timestamp data column from character to POSIXct object
ggplot(dat, aes(x = ymd_hms(timestamp), y = vectormagnitude)) + 
  geom_line(size = 0.3, alpha = 0.8) + 
  labs(x = "Time", y = "Activity counts") + 
  theme_gray(base_size = 10) + 
  scale_x_datetime(date_breaks = "1 day", date_labels = "%b %d")
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: 7 failed to parse.
#> Warning: Removed 7 rows containing missing values or values outside the scale range
#> (`geom_line()`).

Computing summaries with activity_stats method

acc    <- dat$vectormagnitude
acc_ts <- ymd_hms(dat$timestamp)
#> Warning: 7 failed to parse.

activity_stats(acc, acc_ts)
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days     tac     tlac    ltac
#> 1      9            4                    1439 2826648 6429.838 14.8546
#>        astp       satp time_spent_active time_spent_nonactive
#> 1 0.1781782 0.09579564             499.5                939.5
#>   no_of_active_bouts no_of_nonactive_bouts mean_active_bout mean_nonactive_bout
#> 1                 89                    90          5.61236            10.43889

Output explained

To explain activity_stats method output, we first define the terms activity count, active/non-active minute, active/non-active bout, and valid day.

  • Activity count (AC) - a minute-level metric of PA volume.
  • Active minute - a minute with AC equal or above a fixed threshold; for wrist-worn Actigraph
    we use AC>=1853 (method’s default).
  • Non-active (sedentary) minute - a minute with AC below a fixed threshold; for wrist-worn Actigraph
    we use AC<1853 (method’s default).
  • Active bout - a sequence of 1 or more consecutive active minute(s).
  • Non-active bout - a sequence of 1 or more consecutive non-active minute(s).
  • Valid day - a day with no more than 10% of the non-wear time (see Details in ?activity_stats).

Meta information:

  • n_days - number of days (unique day dates) of data collection.
  • n_valid_days - number of days (unique day dates) of data collection determined as valid days.
  • wear_time_on_valid_days - average number of wear-time minutes across valid days.

Summaries of PA volumes metrics:

  • tac - TAC, Total activity counts per day - sum of AC measured on valid days divided by the number of valid days.
  • tlac - TLAC, Total-log activity counts per day - sum of log(1+AC) measured on valid days divided by the number of valid days. Here ‘log’ denotes the natural logarithm.
  • ltac - LTAC, Log-total activity counts - natural logarithm of TAC.
  • time_spent_active - Average number of active minutes per valid day.
  • time_spent_nonactive - Average number of sedentary minutes per valid day.

Summaries of PA fragmentation metrics:

  • astp - ASTP, active to sedentary transition probability on valid days.
  • satp - SATP, sedentary to active transition probability on valid days.
  • no_of_active_bouts - Average number of active minutes per valid day.
  • no_of_nonactive_bouts - Average number of sedentary minutes per valid day.
  • mean_active_bout - Average duration (in minutes) of an active bout on valid days.
  • mean_nonactive_bout - Average duration (in minutes) of a sedentary bout on valid days.

Additionalactivity_stats method options

Summarizing PA within a fixed set of minutes only

The subset_minutes argument allows to specify a subset of a day’s minutes where activity summaries should be computed. There are 1440 minutes in a 24-hour day where 1 denotes 1st minute of the day (from 00:00 to 00:01), and 1440 denotes the last minute (from 23:59 to 00:00).

Here, we summarize PA observed between 12:00 AM and 6:00 AM.

subset_12am_6am <- 1 : (6 * 1440/24)
activity_stats(acc, acc_ts, subset_minutes = subset_12am_6am) 
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days tac_0to6only tlac_0to6only
#> 1      9            4                    1439      65477.5      322.1523
#>   ltac_0to6only astp_0to6only satp_0to6only time_spent_active_0to6only
#> 1      11.08946     0.5581395     0.0201005                      10.75
#>   time_spent_nonactive_0to6only no_of_active_bouts_0to6only
#> 1                        348.25                           6
#>   no_of_nonactive_bouts_0to6only mean_active_bout_0to6only
#> 1                              7                  1.791667
#>   mean_nonactive_bout_0to6only
#> 1                        49.75

By default, column names have a suffix added to denote that a subset of minutes was used (here, _0to6only). This can be disabled by setting adjust_out_colnames to FALSE.

subset_12am_6am = 1 : (6/24 * 1440)
subset_6am_12pm = (6/24 * 1440 + 1) : (12/24 * 1440) 
subset_12pm_6pm = (12/24 * 1440 + 1) : (18/24 * 1440) 
subset_6pm_12am = (18/24 * 1440 + 1) : (24/24 * 1440) 
out <- rbind(
  activity_stats(acc, acc_ts, subset_minutes = subset_12am_6am, adjust_out_colnames = FALSE),
  activity_stats(acc, acc_ts, subset_minutes = subset_6am_12pm, adjust_out_colnames = FALSE),
  activity_stats(acc, acc_ts, subset_minutes = subset_12pm_6pm, adjust_out_colnames = FALSE),
  activity_stats(acc, acc_ts, subset_minutes = subset_6pm_12am, adjust_out_colnames = FALSE))
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
rownames(out) <- c("12am-6am", "6am-12pm", "12pm-6pm", "6pm-12am")
out
#>          n_days n_valid_days wear_time_on_valid_days       tac      tlac
#> 12am-6am      9            4                    1439   65477.5  322.1523
#> 6am-12pm      9            4                    1439 1089788.5 2139.4534
#> 12pm-6pm      9            4                    1439  994104.8 2194.8539
#> 6pm-12am      9            4                    1439  677277.5 1773.3781
#>              ltac      astp      satp time_spent_active time_spent_nonactive
#> 12am-6am 11.08946 0.5581395 0.0201005             10.75               348.25
#> 6am-12pm 13.90149 0.1501377 0.1540616            181.50               178.50
#> 12pm-6pm 13.80960 0.1751337 0.1864162            187.00               173.00
#> 6pm-12am 13.42584 0.2037422 0.1032325            120.25               239.75
#>          no_of_active_bouts no_of_nonactive_bouts mean_active_bout
#> 12am-6am               6.00                  7.00         1.791667
#> 6am-12pm              27.25                 27.50         6.660550
#> 12pm-6pm              32.75                 32.25         5.709924
#> 6pm-12am              24.50                 24.75         4.908163
#>          mean_nonactive_bout
#> 12am-6am           49.750000
#> 6am-12pm            6.490909
#> 12pm-6pm            5.364341
#> 6pm-12am            9.686869

Summarizing PA within a subset of weekdays only

The subset_weekdays argument allows to specify days of a week within which activity summaries are to be computed; it takes values between 1 (Sunday) to 7 (Saturday). Default is NULL (all days of a week are used).

Here, we summarize PA within weekday days only. Note that in the method output, the n_days and n_valid_days columns only count the days from the selected week days subset; for example, below, n_days number of unique day dates in data is 6 despite the range of data collection without subsetting ranges 8 days.

# day of a week indices 2,3,4,5,6 correspond to Mon,Tue,Wed,Thu,Fri 
subset_weekdays <- c(2:6)
activity_stats(acc, acc_ts, subset_weekdays = subset_weekdays) 
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days tac_weekdays23456only
#> 1      6            3                    1439               2865711
#>   tlac_weekdays23456only ltac_weekdays23456only astp_weekdays23456only
#> 1               6444.155               14.86833              0.1757294
#>   satp_weekdays23456only time_spent_active_weekdays23456only
#> 1             0.09540762                            502.6667
#>   time_spent_nonactive_weekdays23456only no_of_active_bouts_weekdays23456only
#> 1                               936.3333                             88.33333
#>   no_of_nonactive_bouts_weekdays23456only mean_active_bout_weekdays23456only
#> 1                                89.33333                           5.690566
#>   mean_nonactive_bout_weekdays23456only
#> 1                              10.48134

Note the subset_weekdays argument can be combined with other arguments, i.e. subset_minutes to subset of a day’s minutes where activity summaries should be computed.

# day of a week indices 7,1 correspond to Sat,Sun
subset_weekdays <- c(7,1)
activity_stats(acc, acc_ts, subset_weekdays = subset_weekdays, subset_minutes = subset_6am_12pm) 
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days tac_6to12only_weekdays17only
#> 1      2            1                    1439                       917368
#>   tlac_6to12only_weekdays17only ltac_6to12only_weekdays17only
#> 1                      2071.864                      13.72926
#>   astp_6to12only_weekdays17only satp_6to12only_weekdays17only
#> 1                     0.1840491                     0.1522843
#>   time_spent_active_6to12only_weekdays17only
#> 1                                        163
#>   time_spent_nonactive_6to12only_weekdays17only
#> 1                                           197
#>   no_of_active_bouts_6to12only_weekdays17only
#> 1                                          30
#>   no_of_nonactive_bouts_6to12only_weekdays17only
#> 1                                             30
#>   mean_active_bout_6to12only_weekdays17only
#> 1                                  5.433333
#>   mean_nonactive_bout_6to12only_weekdays17only
#> 1                                     6.566667

Summarizing PA with a fixed set of minutes excluded

The exclude_minutes argument allows specifying a subset of a day’s minutes excluded for computing activity summaries.

Here, we summarize PA while excluding observations between 11:00 PM and 5:00 AM.

subset_11pm_5am <- c(
  (23 * 1440/24 + 1) : 1440,   ## 11:00 PM - midnight
  1 : (5 * 1440/24)            ## midnight - 5:00 AM
) 
activity_stats(acc, acc_ts, exclude_minutes = subset_11pm_5am) 
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days tac_23to5removed
#> 1      9            4                    1439          2735749
#>   tlac_23to5removed ltac_23to5removed astp_23to5removed satp_23to5removed
#> 1           6052.84          14.82192         0.1702018         0.1395057
#>   time_spent_active_23to5removed time_spent_nonactive_23to5removed
#> 1                         483.25                            596.75
#>   no_of_active_bouts_23to5removed no_of_nonactive_bouts_23to5removed
#> 1                           82.25                              83.25
#>   mean_active_bout_23to5removed mean_nonactive_bout_23to5removed
#> 1                       5.87538                         7.168168

Summarizing PA with in-bed time excluded

The in_bed_time and out_bed_time arguments allow to provide day-specific in-bed periods to be excluded from analysis.

Here, we summarize PA excluding in-bed time estimated by ActiLife software.

ActiLife-estimated in-bed data

The ActiLife-estimated in-bed data file is attached to the arctools package. The sleep data columns include:

  • Subject Name - subject IDs corresponding to AC data, stored in extdata_fnames,
  • In Bed Time - ActiLife-estimated start of in-bed interval for each day of the measurement,
  • Out Bed Time - ActiLife-estimated end of in-bed interval.
## Read sleep details data file
SleepDetails_fname <- "BatchSleepExportDetails_2020-05-01_14-00-46.csv"
SleepDetails_fpath <- system.file("extdata", SleepDetails_fname, package = "arctools")
SleepDetails       <- as.data.frame(fread(SleepDetails_fpath))

## Filter sleep details data to keep ID1 file 
SleepDetails_sub <-
    SleepDetails %>%
    filter(`Subject Name` == "ID_1") %>%
    select(`Subject Name`, `In Bed Time`, `Out Bed Time`) 
str(SleepDetails_sub)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ Subject Name: chr  "ID_1" "ID_1" "ID_1" "ID_1" ...
#>  $ In Bed Time : chr  "7/13/2018 9:18:00 PM" "7/14/2018 10:41:00 PM" "7/16/2018 7:46:00 PM" "7/17/2018 11:30:00 PM" ...
#>  $ Out Bed Time: chr  "7/14/2018 4:50:00 AM" "7/15/2018 5:40:00 AM" "7/17/2018 4:32:00 AM" "7/18/2018 6:32:00 AM" ...

We transform dates stored as character into POSIXct object, and then use in/out-bed dates vectors in activity_stats method.

in_bed_time  <- mdy_hms(SleepDetails_sub[, "In Bed Time"])
out_bed_time <- mdy_hms(SleepDetails_sub[, "Out Bed Time"])

activity_stats(acc, acc_ts, in_bed_time = in_bed_time, out_bed_time = out_bed_time) 
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#> Warning in midnight_to_midnight(acc = acc_ts, acc_ts = acc_ts): Removed 6
#> timestamp duplicate observations.
#>   n_days n_valid_days wear_time_on_valid_days tac_inbedremoved
#> 1      9            4                    1439          2746582
#>   tlac_inbedremoved ltac_inbedremoved astp_inbedremoved satp_inbedremoved
#> 1          6062.753          14.82587         0.1703551         0.1580934
#>   time_spent_active_inbedremoved time_spent_nonactive_inbedremoved
#> 1                         485.75                            529.75
#>   no_of_active_bouts_inbedremoved no_of_nonactive_bouts_inbedremoved
#> 1                           82.75                              83.75
#>   mean_active_bout_inbedremoved mean_nonactive_bout_inbedremoved
#> 1                      5.870091                         6.325373

Components of activity_stats method

The primary method activity_stats is composed of several steps implemented in their respective functions. Below, we demonstrate how to produce activity_stats results step by step with these functions.

We reuse the objects:

  • acc - a numeric vector; minute-level activity counts data,
  • acc_ts - a POSIXct vector; minute-level time of acc data collection.
df <- data.frame(acc = acc, acc_ts = acc_ts)
rbind(head(df, 3), tail(df, 3))
#>        acc              acc_ts
#> 1     2754 2018-07-13 10:00:00
#> 2     3009 2018-07-13 10:01:00
#> 3     3524 2018-07-13 10:02:00
#> 10078    0 2018-07-20 09:57:00
#> 10079    0 2018-07-20 09:58:00
#> 10080    0 2018-07-20 09:59:00

Expand the length of minute-level AC vector to integer number of full 24-hour periods by NA-padding with midnight_to_midnight

  • In the returned vector, the first observation corresponds to the minute of 00:00-00:01 on the first day of data collection, and the last observation corresponds to the minute of 23:50-00:00 on the last day of data collection.
  • Entries corresponding to non-measured minutes are filled with NA.

Here, collected data cover total of 7*24*1440 = 10080 minutes (from 2018-07-13 10:00:00 to 2018-07-20 09:59:00), but spans 8*24*1440 = 11520 minutes of full midnight-to-midnight days (from 2018-07-13 00:00:00 to 2018-07-20 23:59:00).

acc <- midnight_to_midnight(acc = acc, acc_ts = acc_ts)
#> Warning in midnight_to_midnight(acc = acc, acc_ts = acc_ts): Removed 6
#> timestamp duplicate observations.

## Vector length on non NA-obs, vector length after acc 
c(length(acc[!is.na(acc)]), length(acc))
#> [1] 10073 12960

Get wear/non-wear flag with get_wear_flag

Function get_wear_flag computes wear/non-wear flag (1/0) for each minute of activity counts data. Method implements wear/non-wear detection algorithm closely following that of Choi et al. (2011). See ?get_wear_flag for more details and function arguments.

  • The returned vector has value 1 for wear and 0 for non-wear flagged minute.
  • If there is an NA entry in a data input vector, then the returned vector will have a corresponding entry set to NA too.
wear_flag <- get_wear_flag(acc)

## Proportion of wear time across the days
wear_flag_mat <- matrix(wear_flag, ncol = 1440, byrow = TRUE)
round(apply(wear_flag_mat, 1, sum, na.rm = TRUE) / 1440, 3)
#> [1] 0.583 0.999 0.874 0.679 0.999 0.999 0.999 0.337 0.000

Get valid/non-valid day flag with get_valid_day_flag

Function get_valid_day_flag computes valid/non-valid day flag (1/0) for each minute of activity counts data. See ?get_valid_day_flag for more details and function arguments.

Here, 4 out of 8 days have more than 10% (144 minutes) of missing data.

valid_day_flag <- get_valid_day_flag(wear_flag)

## Compute number of valid days
valid_day_flag_mat <- matrix(valid_day_flag, ncol = 1440, byrow = TRUE)
apply(valid_day_flag_mat, 1, mean, na.rm = TRUE)
#> [1] 0 1 0 0 1 1 1 0 0

Impute missing data with impute_missing_data

Function impute_missing_data imputes missing data in valid days based on the “average day profile”, a minute-wise average of wear-time AC across valid days. See ?get_valid_day_flag for more details and function arguments.

## Copies of original objects for the purpose of demonstration
acc_cpy  <- acc
wear_flag_cpy <- wear_flag

## Artificially replace 1h (4%) of a valid day with non-wear 
repl_idx <- seq(from = 1441, by = 1, length.out = 60)
acc_cpy[repl_idx] <- 0
wear_flag_cpy[repl_idx] <- 0

## Impute data for minutes identified as non-wear in days identified as valid
acc_cpy_imputed <- impute_missing_data(acc_cpy, wear_flag_cpy, valid_day_flag)

## Compare mean activity count on valid days before and after imputation
c(mean(acc_cpy[which(valid_day_flag == 1)]), 
  mean(acc_cpy_imputed[which(valid_day_flag == 1)]))
#> [1] NA NA

Create PA characteristics with summarize_PA

Finally, method summarize_PA computes PA summaries. Similarly as activity_stats, it accepts arguments to subset/exclude minutes. See ?activity_stats for more details and function arguments.

summarize_PA(acc, acc_ts, wear_flag, valid_day_flag) 
#>   n_days n_valid_days wear_time_on_valid_days     tac     tlac    ltac
#> 1      9            4                    1439 2826648 6429.838 14.8546
#>        astp       satp time_spent_active time_spent_nonactive
#> 1 0.1781782 0.09579564             499.5                939.5
#>   no_of_active_bouts no_of_nonactive_bouts mean_active_bout mean_nonactive_bout
#> 1                 89                    90          5.61236            10.43889

It returns the same results as the activity_stats function:

activity_stats(dat$vectormagnitude, ymd_hms(dat$timestamp))
#> Warning: 7 failed to parse.
#> Warning in midnight_to_midnight(acc, acc_ts): Removed 6 timestamp duplicate
#> observations.
#>   n_days n_valid_days wear_time_on_valid_days     tac     tlac    ltac
#> 1      9            4                    1439 2826648 6429.838 14.8546
#>        astp       satp time_spent_active time_spent_nonactive
#> 1 0.1781782 0.09579564             499.5                939.5
#>   no_of_active_bouts no_of_nonactive_bouts mean_active_bout mean_nonactive_bout
#> 1                 89                    90          5.61236            10.43889