read_gt3x.Rmd
This document describes how the read.gt3x package can be used to read binary activity data into R. To access the read.gt3x package, use:
library(read.gt3x)
For source code and installation instructions, see the GitHub page.
The read.gt3x package includes two sample .gt3x files, which I’ll use to demonstrate reading the data. First we need the path to a single .gt3x file. We will use data embedded in the package:
gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x"
)
Longer and more extensive data can be downloaded with gt3x_datapath():
gt3xfile <- gt3x_datapath(1)
The read.gt3x() function takes a path to a single .gt3x file and returns the activity samples as an R matrix.
X <- read.gt3x(gt3xfile)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#> X Y Z
#> [1,] 0.000 0.008 0.996
#> [2,] 0.016 0.000 1.008
#> [3,] 0.020 -0.008 1.004
#> [4,] 0.016 -0.012 1.012
#> [5,] 0.016 -0.008 1.008
#> [6,] 0.008 -0.008 1.008
.gt3x files are zip archives that contain two files: log.bin and info.txt. log.bin is a binary file that contains the actual samples, and info.txt is a text file with the recording metadata. If you read the same files repeatedly, it can make sense to store the data as unzipped folders containing these two files. Otherwise read.gt3x() has to unzip each .gt3x archive to a temporary location every time you access the data.
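Because a .gt3x file is a zip archive, its contents can be listed without extracting anything. The following is a minimal sketch using base R's utils::unzip() with list = TRUE on the sample file shipped with the package. It assumes read.gt3x is installed, and the variable name contents is only illustrative.

```r
# Path to the sample recording bundled with read.gt3x
gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x"
)

# list = TRUE returns a data frame describing the archive members
# (columns Name, Length, Date) without extracting them
contents <- utils::unzip(gt3xfile, list = TRUE)
contents[, c("Name", "Length")]
```

The Length column gives the uncompressed size of each member, which is a quick way to gauge how large log.bin will be once extracted.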
read.gt3x() also accepts paths to unzipped gt3x folders. To demonstrate the usage, we’ll unzip the sample .gt3x files in the package, and then read them. The unzip.gt3x() helper function unzips all .gt3x files in a given directory. By default, the contents of a .gt3x file named “subject001.gt3x” are extracted to a folder named “subject001”. unzip.gt3x() returns a vector of paths to the unzipped gt3x folders. The location argument can be used to choose where to locate those folders.
datadir <- dirname(gt3xfile) # location of .gt3x files
gt3xfolders <- unzip.gt3x(datadir, location = tempdir())
#> Unzipping gt3x data to /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpZvFbRG
#> 1/1
#> Unzipping /private/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T/RtmpW9yqzD/temp_libpath85592ead099f/read.gt3x/extdata/TAS1H30182785_2019-09-17.gt3x
#> === info.txt, log.bin extracted to /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpZvFbRG/TAS1H30182785_2019-09-17
The read.gt3x() function accepts a path to an unzipped gt3x folder. Reading is a bit faster when the unzip step has already been performed.
gt3xfolder <- gt3xfolders[1]
X <- read.gt3x(gt3xfolder)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#> X Y Z
#> [1,] 0.000 0.008 0.996
#> [2,] 0.016 0.000 1.008
#> [3,] 0.020 -0.008 1.004
#> [4,] 0.016 -0.012 1.012
#> [5,] 0.016 -0.008 1.008
#> [6,] 0.008 -0.008 1.008
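When a directory holds several recordings, the vector of folder paths returned by unzip.gt3x() can be passed through lapply() to read them all in one go. This is a sketch that reuses only the functions shown above; the name activity_list is illustrative, and the code assumes every unzipped folder can be read by read.gt3x().

```r
library(read.gt3x)

# Locate the sample data shipped with the package and unzip it to a temp directory
gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x"
)
gt3xfolders <- unzip.gt3x(dirname(gt3xfile), location = tempdir())

# Read every unzipped folder and name each element after its folder
activity_list <- lapply(gt3xfolders, read.gt3x)
names(activity_list) <- basename(gt3xfolders)

# Number of samples in each recording
vapply(activity_list, nrow, integer(1))
```

Keeping the results in a named list makes it easy to trace each matrix back to its source folder.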
The data matrix returned by read.gt3x() is a bit smarter than it looks: its attributes store the (relative) timestamps of all observations, along with metadata about the recording.
str(X)
#> 'activity' num [1:33000, 1:3] 0 0.016 0.02 0.016 0.016 0.008 0.016 0.02 0.016 0.012 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:3] "X" "Y" "Z"
#> - attr(*, "time_index")= num [1:33000] 0 1 2 3 4 5 6 7 8 9 ...
#> - attr(*, "missingness")='data.frame': 10 obs. of 2 variables:
#> ..$ time : POSIXct[1:10], format: "2019-09-17 18:40:10" "2019-09-17 18:44:21" ...
#> ..$ n_missing: int [1:10] 400 10500 55400 112600 3300 100 100 500 100 24500
#> - attr(*, "total_records")= int 33000
#> - attr(*, "start_time_param")= num 1.57e+09
#> - attr(*, "start_time_info")= num 1.57e+09
#> - attr(*, "sample_rate")= int 100
#> - attr(*, "impute_zeroes")= logi FALSE
#> - attr(*, "add_light")= logi FALSE
#> - attr(*, "start_time")= POSIXct[1:1], format: "2019-09-17 18:40:00"
#> - attr(*, "stop_time")= POSIXct[1:1], format: "2019-09-18 19:00:00"
#> - attr(*, "last_sample_time")= POSIXct[1:1], format: "2019-09-17 19:20:05"
#> - attr(*, "subject_name")= chr "suffix_85"
#> - attr(*, "time_zone")= chr "-04:00:00"
#> - attr(*, "firmware")= chr "1.7.2"
#> - attr(*, "serial_prefix")= chr "TAS"
#> - attr(*, "acceleration_min")= chr "-8.0"
#> - attr(*, "acceleration_max")= chr "8.0"
#> - attr(*, "bad_samples")= logi FALSE
#> - attr(*, "old_version")= logi FALSE
#> - attr(*, "header")=List of 17
#> ..$ Serial Number : chr "TAS1H30182785"
#> ..$ Device Type : chr "Link"
#> ..$ Firmware : chr "1.7.2"
#> ..$ Battery Voltage : chr "4.18"
#> ..$ Sample Rate : num 100
#> ..$ Start Date : POSIXct[1:1], format: "2019-09-17 18:40:00"
#> ..$ Stop Date : POSIXct[1:1], format: "2019-09-18 19:00:00"
#> ..$ Last Sample Time : POSIXct[1:1], format: "2019-09-17 19:20:05"
#> ..$ TimeZone : chr "-04:00:00"
#> ..$ Download Date : POSIXct[1:1], format: "2019-09-17 19:20:05"
#> ..$ Board Revision : chr "8"
#> ..$ Unexpected Resets : chr "0"
#> ..$ Acceleration Scale: int 256
#> ..$ Acceleration Min : chr "-8.0"
#> ..$ Acceleration Max : chr "8.0"
#> ..$ Subject Name : chr "suffix_85"
#> ..$ Serial Prefix : chr "TAS"
#> ..- attr(*, "class")= chr [1:2] "gt3x_info" "list"
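The attributes listed by str() can be read with attr(). As a rough illustration of how relative timestamps might be turned into absolute ones, the sketch below builds a toy matrix carrying three of the attributes shown above and combines them. It assumes that time_index counts samples since start_time, which is consistent with the 100 Hz output here but is not confirmed as the package's internal rule; X_toy is a hypothetical stand-in, not package output.

```r
# Toy stand-in for the matrix returned by read.gt3x();
# the first three rows are copied from the output shown earlier
X_toy <- matrix(
  c(0.000, 0.016, 0.020,    # X
    0.008, 0.000, -0.008,   # Y
    0.996, 1.008, 1.004),   # Z
  ncol = 3,
  dimnames = list(NULL, c("X", "Y", "Z"))
)
attr(X_toy, "time_index")  <- c(0, 1, 2)
attr(X_toy, "sample_rate") <- 100L
attr(X_toy, "start_time")  <- as.POSIXct("2019-09-17 18:40:00", tz = "GMT")

# Assumed relationship: seconds since start = time_index / sample_rate
offset_sec <- attr(X_toy, "time_index") / attr(X_toy, "sample_rate")
timestamps <- attr(X_toy, "start_time") + offset_sec

# Bind the reconstructed timestamps to the samples
data.frame(time = timestamps, X_toy)
```

On a real recording the same attr() calls apply to X directly, and gaps recorded in the missingness attribute would show up as jumps in time_index.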
The read.gt3x package has an as.data.frame method for the activity matrix, which converts the matrix to a data frame and adds a “time” column giving the timestamp of each sample. The timestamps are stored in R with the GMT time zone, but note that this is misleading: in reality the timestamps correspond to the local time of the device!
X <- as.data.frame(X)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#> time X Y Z
#> 1 2019-09-17 18:40:00.00 0.000 0.008 0.996
#> 2 2019-09-17 18:40:00.00 0.016 0.000 1.008
#> 3 2019-09-17 18:40:00.01 0.020 -0.008 1.004
#> 4 2019-09-17 18:40:00.02 0.016 -0.012 1.012
#> 5 2019-09-17 18:40:00.03 0.016 -0.008 1.008
#> 6 2019-09-17 18:40:00.04 0.008 -0.008 1.008
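Because the “time” column holds local clock time labelled as GMT, comparing it with data from other sources may require shifting it to true UTC. The following is a minimal base R sketch on toy timestamps. It assumes that the time_zone attribute shown by str() ("-04:00:00" here) is the device's offset from UTC; offset_to_seconds() is a hypothetical helper written for this example, not part of read.gt3x.

```r
# Toy stand-in for the "time" column: local clock time labelled as GMT
time_labelled <- as.POSIXct("2019-09-17 18:40:00", tz = "GMT") + c(0, 0.01, 0.02)

# Hypothetical helper: convert an offset string such as "-04:00:00" to seconds
offset_to_seconds <- function(offset) {
  sign <- if (startsWith(offset, "-")) -1 else 1
  parts <- as.numeric(strsplit(sub("^[+-]", "", offset), ":", fixed = TRUE)[[1]])
  sign * sum(parts * c(3600, 60, 1))
}

offset_sec <- offset_to_seconds("-04:00:00")

# Assumption: local time = UTC + offset, so true UTC = labelled time - offset
time_utc <- time_labelled - offset_sec
format(time_utc, "%Y-%m-%d %H:%M:%S", tz = "GMT")
```

Working with a numeric shift rather than re-parsing formatted strings keeps the sub-second part of each timestamp intact.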