epirapoSurvival.RmdThis tutorial introduces the following functions in the epirapoSurvival package
construct_cohort_vac_outcome()construct_cohort_followup()summarise_cohort_followup()summarise_cohort_vac_outcome()The functions combine to allow general computations of events and follow-up time in different calendar time periods and vaccine effect states and allow for taking into account both incident and recurrent events and possible washout time.
Say we have the following cohort:
cohort <- tibble(HETU_ID = 1, SYNTYMAPAIVA = as.Date("1982-02-04"), KUOLINPVM = NA )
vacs <- tibble(HETU_ID = 1, RECORDDATE = as.Date("2021-05-03"), PRODUCT_ID = "COV")
outcome <- tibble(HETU_ID = 1, TAPAHTUMAPVM = as.Date("2021-06-20"))
cohort %>% knitr::kable()| HETU_ID | SYNTYMAPAIVA | KUOLINPVM |
|---|---|---|
| 1 | 1982-02-04 | NA |
with the following vaccinations:
vacs %>% knitr::kable()| HETU_ID | RECORDDATE | PRODUCT_ID |
|---|---|---|
| 1 | 2021-05-03 | COV |
and these outcomes:
outcome %>% knitr::kable()| HETU_ID | TAPAHTUMAPVM |
|---|---|
| 1 | 2021-06-20 |
Say we are interested in a followup period
start_date <- "2021-04-01"
end_date <- "2021-06-30"We first combine the data into a single data frame using
construct_cohort_vac_outcome()
cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome,
start_date = start_date,
end_date = end_date)
str(cohort_vac_outcome)
#> tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
#> $ HETU_ID : num 1
#> $ SYNTYMAPAIVA: Date[1:1], format: "1982-02-04"
#> $ KUOLINPVM : logi NA
#> $ entrypvm : Date[1:1], format: "2021-04-01"
#> $ exitpvm : Date[1:1], format: "2021-06-30"
#> $ TAPAHTUMA_1 : Date[1:1], format: "2021-06-20"
#> $ RECORDDATE_1: Date[1:1], format: "2021-05-03"
#> $ PRODUCT_ID_1: chr "COV"
#> - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#> - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#> - attr(*, "max_events")= int 1
#> - attr(*, "washout")= num 0We can then split the data to selected follow-up intervals. By default, the followup is split by month, but that can be overridden by assigning the datecuts argument to NULL. In addition, the follow-up is split by specified vaccine effect time points, which can be integer vectors (positive or negative) of length 1 or more. These define cut points relative to the vaccination date. The effect intervals are given separately for each vaccine dose.
A washout can be defined relative to the event date and can be any positive integer, including 0 (no washout) or Inf (only incident cases). The choice of Inf corresponds to usual survival analysis where the follow-up ends at the first event. Other choices correspond to situations where recurring events are considered, but there is a washout period of the specified length after each event, during which time the individual does not contribute follow-up time. The washout time periods can be filtered out from the results, which is default behaviour.
Age groups are also added by default, according to right open cutpoints given in the argument agecuts.
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
datecuts = NULL,
agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
vac1_effect = 21,
washout = 180,
exclude_washout = FALSE)
cohort_followup %>% knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | COVVAC_T3 | RISK_EFFECT | EVENT_T1 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 53 | 0 | FALSE | None | None | None | -32 | -999999 | -999999 | RISK:-Inf–1 | -80 | -999999 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-05-24 | 39 | 30-39 | 27 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -999946 | -999946 | RISK:-Inf–1 | -27 | -999946 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-06-20 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 48 | -999919 | -999919 | RISK:-Inf–1 | 0 | -999919 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-06-21 | 39 | 30-39 | 10 | 0 | TRUE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 49 | -999918 | -999918 | RISK:-Inf–1 | 1 | -999918 | 1982-02-04 | NA | NA | COV |
Please see the documentation of the function for details on the returned data.
Dates can be cut into specified intervals, the default choice in months. Age is computed for the beginning of each individual follow-up interval in the final result, so for a unified age treatment for all individuals it is recommended to use datecuts to split the follow-up by calendar time.
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
vac1_effect = 21,
washout = 180)
cohort_followup %>% select(LEX_ID, CALENDAR, everything()) %>% knitr::kable()| LEX_ID | CALENDAR | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | COVVAC_T3 | RISK_EFFECT | EVENT_T1 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 2021-04-01 | 39 | 30-39 | 30 | 0 | FALSE | None | None | None | -32 | -999999 | -999999 | RISK:-Inf–1 | -80 | -999999 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-05-01 | 2021-05-01 | 39 | 30-39 | 23 | 0 | FALSE | None | None | None | -2 | -999969 | -999969 | RISK:-Inf–1 | -50 | -999969 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-05-01 | 2021-05-24 | 39 | 30-39 | 8 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -999946 | -999946 | RISK:-Inf–1 | -27 | -999946 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-06-01 | 2021-06-01 | 39 | 30-39 | 19 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 29 | -999938 | -999938 | RISK:-Inf–1 | -19 | -999938 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-06-01 | 2021-06-20 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 48 | -999919 | -999919 | RISK:-Inf–1 | 0 | -999919 | 1982-02-04 | NA | NA | COV |
There can be any number of events and any number of vaccinations per individual.
The required outcome event dates can be named freely but if the name
is other than “TAPAHTUMAPVM”, it needs to be specified by the
outcomedate_column argument in the
construct_cohort_vac_outcome() function. They outcome
columns will be renamed as TAPAHTUMA_1,…,TAPAHTUMA_n in the results.
# a cohort of two individuals
cohort <- tibble(HETU_ID = c(1,2),
SYNTYMAPAIVA = c(as.Date("1982-02-04"), as.Date("1980-05-04")),
KUOLINPVM = c(NA, NA))
# the first had two events and the second had one event
outcome <- tibble(HETU_ID = c(1,1, 2),
diagdate = as.Date(c("2021-04-25", "2021-06-25", "2021-04-02")))
# the first was vaccinated twice
vacs <- tibble(HETU_ID = c(1,1),
RECORDDATE = as.Date(c("2021-04-20", "2021-05-20")),
PRODUCT_ID = c("COV", "COV"))
cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome,
start_date = start_date,
end_date = end_date,
outcomedate_column = "diagdate")
str(cohort_vac_outcome)
#> tibble [2 × 11] (S3: tbl_df/tbl/data.frame)
#> $ HETU_ID : num [1:2] 1 2
#> $ SYNTYMAPAIVA: Date[1:2], format: "1982-02-04" "1980-05-04"
#> $ KUOLINPVM : logi [1:2] NA NA
#> $ entrypvm : Date[1:2], format: "2021-04-01" "2021-04-01"
#> $ exitpvm : Date[1:2], format: "2021-06-30" "2021-06-30"
#> $ TAPAHTUMA_1 : Date[1:2], format: "2021-04-25" "2021-04-02"
#> $ TAPAHTUMA_2 : Date[1:2], format: "2021-06-25" NA
#> $ RECORDDATE_1: Date[1:2], format: "2021-04-20" NA
#> $ RECORDDATE_2: Date[1:2], format: "2021-05-20" NA
#> $ PRODUCT_ID_1: chr [1:2] "COV" NA
#> $ PRODUCT_ID_2: chr [1:2] "COV" NA
#> - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#> - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#> - attr(*, "max_events")= int 2
#> - attr(*, "washout")= num 0For the first three vaccinations, there are specific arguments vac1_effect, vac2_effect and vac3_effect. However, there is also the generic argument vac_effects, which is a list of any length and specifies effects (risk intervals) for any number of vaccinations. It is recommended to always use vac_effects instead of the vacx_effect arguments. The difference is that when using vac_effects, no extra placeholder columns related to vaccinations are created.
Otherwise, the number of events or vaccinations does not change how
construct_cohort_followup() is called or used. The washout
intervals are always identical for each event.
# using vac1_effect and vac2_effect arguments
cohort_followup1 <- construct_cohort_followup(cohort_vac_outcome,
vac1_effect = c(0, 10, 21),
vac2_effect = 7,
washout = 180,
datecuts = NULL,
exclude_washout = FALSE)
cohort_followup1 %>% knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | COVVAC_T3 | RISK_EFFECT | EVENT_T1 | EVENT_T2 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 19 | 0 | FALSE | None | None | None | -19 | -49 | -999999 | RISK:-Inf–1 | -24 | -85 | -999999 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-04-20 | 39 | 30-39 | 5 | 0 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 0 | -30 | -999980 | RISK:-Inf–1 | -5 | -66 | -999980 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 5 | -25 | -999975 | RISK:-Inf–1 | 0 | -61 | -999975 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-26 | 39 | 30-39 | 4 | 0 | TRUE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 6 | -24 | -999974 | RISK:-Inf–1 | 1 | -60 | -999974 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-30 | 39 | 30-39 | 11 | 0 | TRUE | VAC1:10-20 | COV_1:10-20 | COV:10-20 | 10 | -20 | -999970 | RISK:-Inf–1 | 5 | -56 | -999970 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-11 | 39 | 30-39 | 16 | 0 | TRUE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -9 | -999959 | RISK:-Inf–1 | 16 | -45 | -999959 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-27 | 39 | 30-39 | 29 | 0 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 37 | 7 | -999943 | RISK:-Inf–1 | 32 | -29 | -999943 | 1982-02-04 | NA | NA | COV+COV |
| 1 | 2021-06-25 | 39 | 30-39 | 1 | 1 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 66 | 36 | -999914 | RISK:-Inf–1 | 61 | 0 | -999914 | 1982-02-04 | NA | NA | COV+COV |
| 1 | 2021-06-26 | 39 | 30-39 | 5 | 0 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 67 | 37 | -999913 | RISK:-Inf–1 | 62 | 1 | -999913 | 1982-02-04 | NA | NA | COV+COV |
| 2 | 2021-04-01 | 40 | 40-49 | 1 | 0 | FALSE | None | None | None | -999999 | -999999 | -999999 | RISK:-Inf–1 | -1 | -999999 | -999999 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-02 | 40 | 40-49 | 1 | 1 | FALSE | None | None | None | -999998 | -999998 | -999998 | RISK:-Inf–1 | 0 | -999998 | -999998 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-03 | 40 | 40-49 | 89 | 0 | TRUE | None | None | None | -999997 | -999997 | -999997 | RISK:-Inf–1 | 1 | -999997 | -999997 | 1980-05-04 | NA | NA | NA |
# using vac_effects argument is recommended
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
vac_effects = list(c(0, 10, 21), 7),
washout = 180,
datecuts = NULL,
exclude_washout = FALSE)
cohort_followup %>% knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | RISK_EFFECT | EVENT_T1 | EVENT_T2 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 19 | 0 | FALSE | None | None | None | -19 | -49 | RISK:-Inf–1 | -24 | -85 | -999999 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-04-20 | 39 | 30-39 | 5 | 0 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 0 | -30 | RISK:-Inf–1 | -5 | -66 | -999980 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 5 | -25 | RISK:-Inf–1 | 0 | -61 | -999975 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-26 | 39 | 30-39 | 4 | 0 | TRUE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 6 | -24 | RISK:-Inf–1 | 1 | -60 | -999974 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-30 | 39 | 30-39 | 11 | 0 | TRUE | VAC1:10-20 | COV_1:10-20 | COV:10-20 | 10 | -20 | RISK:-Inf–1 | 5 | -56 | -999970 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-11 | 39 | 30-39 | 16 | 0 | TRUE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -9 | RISK:-Inf–1 | 16 | -45 | -999959 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-27 | 39 | 30-39 | 29 | 0 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 37 | 7 | RISK:-Inf–1 | 32 | -29 | -999943 | 1982-02-04 | NA | NA | COV+COV |
| 1 | 2021-06-25 | 39 | 30-39 | 1 | 1 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 66 | 36 | RISK:-Inf–1 | 61 | 0 | -999914 | 1982-02-04 | NA | NA | COV+COV |
| 1 | 2021-06-26 | 39 | 30-39 | 5 | 0 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 67 | 37 | RISK:-Inf–1 | 62 | 1 | -999913 | 1982-02-04 | NA | NA | COV+COV |
| 2 | 2021-04-01 | 40 | 40-49 | 1 | 0 | FALSE | None | None | None | -999999 | -999999 | RISK:-Inf–1 | -1 | -999999 | -999999 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-02 | 40 | 40-49 | 1 | 1 | FALSE | None | None | None | -999998 | -999998 | RISK:-Inf–1 | 0 | -999998 | -999998 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-03 | 40 | 40-49 | 89 | 0 | TRUE | None | None | None | -999997 | -999997 | RISK:-Inf–1 | 1 | -999997 | -999997 | 1980-05-04 | NA | NA | NA |
As mentioned, the washout can be any suitable positive integer and the washout time can be excluded from the results, which is the default behaviour.
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
vac_effects = list(c(0, 10, 21),7),
washout = 7,
datecuts = NULL,
exclude_washout = TRUE)
cohort_followup %>% knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | RISK_EFFECT | EVENT_T1 | EVENT_T2 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 19 | 0 | FALSE | None | None | None | -19 | -49 | RISK:-Inf–1 | -24 | -85 | -999999 | 1982-02-04 | NA | NA | NA |
| 1 | 2021-04-20 | 39 | 30-39 | 5 | 0 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 0 | -30 | RISK:-Inf–1 | -5 | -66 | -999980 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-04-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 5 | -25 | RISK:-Inf–1 | 0 | -61 | -999975 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-02 | 39 | 30-39 | 9 | 0 | FALSE | VAC1:10-20 | COV_1:10-20 | COV:10-20 | 12 | -18 | RISK:-Inf–1 | 7 | -54 | -999968 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-11 | 39 | 30-39 | 16 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -9 | RISK:-Inf–1 | 16 | -45 | -999959 | 1982-02-04 | NA | NA | COV |
| 1 | 2021-05-27 | 39 | 30-39 | 29 | 0 | FALSE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 37 | 7 | RISK:-Inf–1 | 32 | -29 | -999943 | 1982-02-04 | NA | NA | COV+COV |
| 1 | 2021-06-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 66 | 36 | RISK:-Inf–1 | 61 | 0 | -999914 | 1982-02-04 | NA | NA | COV+COV |
| 2 | 2021-04-01 | 40 | 40-49 | 1 | 0 | FALSE | None | None | None | -999999 | -999999 | RISK:-Inf–1 | -1 | -999999 | -999999 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-02 | 40 | 40-49 | 1 | 1 | FALSE | None | None | None | -999998 | -999998 | RISK:-Inf–1 | 0 | -999998 | -999998 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-09 | 40 | 40-49 | 83 | 0 | FALSE | None | None | None | -999991 | -999991 | RISK:-Inf–1 | 7 | -999991 | -999991 | 1980-05-04 | NA | NA | NA |
A column named RISK_EXPOSURE_DATE in the data frame given to
construct_cohort_followup() is used to indicate the date of
exposure to a time-dependent risk factor. The risk effect time intervals
are defined by the risk_effect argument, which is a
positive integer vector of length 1 or more, defining cut points
relative to the risk exposure date. The risk effect intervals are the
same for all individuals and all events. If there are no additional
risks, the risk_effect argument can be set to NULL, which is the default
value.
additional_risk <- data.frame(HETU_ID = 1,
RISK_EXPOSURE_DATE = as.Date("2021-05-22"))
cohort_vac_outcome <- cohort_vac_outcome %>%
left_join(additional_risk, by = "HETU_ID")
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
vac_effects = list(c(0, 10, 21),7),
risk_effect = c(0, 21),
washout = 7,
datecuts = NULL,
exclude_washout = FALSE)
cohort_followup %>% knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | RISK_EFFECT | EVENT_T1 | EVENT_T2 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 19 | 0 | FALSE | None | None | None | -19 | -49 | RISK:-Inf–1 | -24 | -85 | -51 | 1982-02-04 | NA | 2021-05-22 | NA |
| 1 | 2021-04-20 | 39 | 30-39 | 5 | 0 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 0 | -30 | RISK:-Inf–1 | -5 | -66 | -32 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-04-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 5 | -25 | RISK:-Inf–1 | 0 | -61 | -27 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-04-26 | 39 | 30-39 | 4 | 0 | TRUE | VAC1:0-9 | COV_1:0-9 | COV:0-9 | 6 | -24 | RISK:-Inf–1 | 1 | -60 | -26 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-04-30 | 39 | 30-39 | 2 | 0 | TRUE | VAC1:10-20 | COV_1:10-20 | COV:10-20 | 10 | -20 | RISK:-Inf–1 | 5 | -56 | -22 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-05-02 | 39 | 30-39 | 9 | 0 | FALSE | VAC1:10-20 | COV_1:10-20 | COV:10-20 | 12 | -18 | RISK:-Inf–1 | 7 | -54 | -20 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-05-11 | 39 | 30-39 | 11 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | 21 | -9 | RISK:-Inf–1 | 16 | -45 | -11 | 1982-02-04 | NA | 2021-05-22 | COV |
| 1 | 2021-05-22 | 39 | 30-39 | 5 | 0 | FALSE | VAC1:21-Inf | COV_1:21-Inf | COV+COV:21-Inf | 32 | 2 | RISK:0-20 | 27 | -34 | 0 | 1982-02-04 | NA | 2021-05-22 | COV+COV |
| 1 | 2021-05-27 | 39 | 30-39 | 16 | 0 | FALSE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 37 | 7 | RISK:0-20 | 32 | -29 | 5 | 1982-02-04 | NA | 2021-05-22 | COV+COV |
| 1 | 2021-06-12 | 39 | 30-39 | 13 | 0 | FALSE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 53 | 23 | RISK:21-Inf | 48 | -13 | 21 | 1982-02-04 | NA | 2021-05-22 | COV+COV |
| 1 | 2021-06-25 | 39 | 30-39 | 1 | 1 | FALSE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 66 | 36 | RISK:21-Inf | 61 | 0 | 34 | 1982-02-04 | NA | 2021-05-22 | COV+COV |
| 1 | 2021-06-26 | 39 | 30-39 | 5 | 0 | TRUE | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | 67 | 37 | RISK:21-Inf | 62 | 1 | 35 | 1982-02-04 | NA | 2021-05-22 | COV+COV |
| 2 | 2021-04-01 | 40 | 40-49 | 1 | 0 | FALSE | None | None | None | -999999 | -999999 | RISK:-Inf–1 | -1 | -999999 | -999999 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-02 | 40 | 40-49 | 1 | 1 | FALSE | None | None | None | -999998 | -999998 | RISK:-Inf–1 | 0 | -999998 | -999998 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-03 | 40 | 40-49 | 6 | 0 | TRUE | None | None | None | -999997 | -999997 | RISK:-Inf–1 | 1 | -999997 | -999997 | 1980-05-04 | NA | NA | NA |
| 2 | 2021-04-09 | 40 | 40-49 | 83 | 0 | FALSE | None | None | None | -999991 | -999991 | RISK:-Inf–1 | 7 | -999991 | -999991 | 1980-05-04 | NA | NA | NA |
The data format returned by construct_cohort_followup()
is suitable for survival analysis with Poisson regression. The
EVENT_COUNT indicator (which is actually always 0 or 1 here) can be
treated as the dependent variable, with PERSONDAYS as the
offset in a Poisson regression analysis, and parameter estimation done
with the glm() function. With this toy cohort of two
individuals the results are meaningless but the following gives the
idea.
# use the whole product series to mark the risk periods and
# define the reference to be the unvaccinated state
cohort_followup$exposure <- relevel(factor(cohort_followup$VACCINE_EFFECT_PRODUCTS), ref = "None")
m1 <- glm(EVENT_COUNT ~ exposure,
offset = log(PERSONDAYS),
data = cohort_followup,
family = "poisson")
summary(m1)
#>
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup,
#> offset = log(PERSONDAYS))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -4.700 1.000 -4.700 2.6e-06 ***
#> exposureCOV:0-9 2.398 1.414 1.696 0.090 .
#> exposureCOV:10-20 -15.047 3550.834 -0.004 0.997
#> exposureCOV:21-Inf -16.000 5717.532 -0.003 0.998
#> exposureCOV+COV:21-Inf -15.211 5717.532 -0.003 0.998
#> exposureCOV+COV:7-Inf 1.145 1.414 0.810 0.418
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 24.632 on 15 degrees of freedom
#> Residual deviance: 21.117 on 10 degrees of freedom
#> AIC: 39.117
#>
#> Number of Fisher Scoring iterations: 16The computations for multiple individuals can be summarised using the
summarise_cohort_followup() function. It takes as input the
followup data and a character vector of columns to aggregate over
cohort_followup_aggregated <- summarise_cohort_followup(cohort_followup, by = c("IKARYHMA",
"VACCINE_EFFECT",
"VACCINE_EFFECT_PRODUCT",
"VACCINE_EFFECT_PRODUCTS",
"RISK_EFFECT",
"WASHOUT_INDICATOR"))
cohort_followup_aggregated %>%
knitr::kable()| IKARYHMA | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | RISK_EFFECT | WASHOUT_INDICATOR | PERSONDAYS | EVENT_COUNT |
|---|---|---|---|---|---|---|---|
| 30-39 | None | None | None | RISK:-Inf–1 | FALSE | 19 | 0 |
| 30-39 | VAC1:0-9 | COV_1:0-9 | COV:0-9 | RISK:-Inf–1 | FALSE | 6 | 1 |
| 30-39 | VAC1:0-9 | COV_1:0-9 | COV:0-9 | RISK:-Inf–1 | TRUE | 4 | 0 |
| 30-39 | VAC1:10-20 | COV_1:10-20 | COV:10-20 | RISK:-Inf–1 | TRUE | 2 | 0 |
| 30-39 | VAC1:10-20 | COV_1:10-20 | COV:10-20 | RISK:-Inf–1 | FALSE | 9 | 0 |
| 30-39 | VAC1:21-Inf | COV_1:21-Inf | COV:21-Inf | RISK:-Inf–1 | FALSE | 11 | 0 |
| 30-39 | VAC1:21-Inf | COV_1:21-Inf | COV+COV:21-Inf | RISK:0-20 | FALSE | 5 | 0 |
| 30-39 | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | RISK:0-20 | FALSE | 16 | 0 |
| 30-39 | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | RISK:21-Inf | FALSE | 14 | 1 |
| 30-39 | VAC2:7-Inf | COV_2:7-Inf | COV+COV:7-Inf | RISK:21-Inf | TRUE | 5 | 0 |
| 40-49 | None | None | None | RISK:-Inf–1 | FALSE | 85 | 1 |
| 40-49 | None | None | None | RISK:-Inf–1 | TRUE | 6 | 0 |
An identical Poisson regression model can be estimated from the aggregated data. In this toy example the results can differ due to numerical instability.
# use the whole product series to mark the risk periods and
# define the reference to be the unvaccinated state
cohort_followup_aggregated$exposure <- relevel(factor(cohort_followup_aggregated$VACCINE_EFFECT_PRODUCTS), ref = "None")
m2 <- glm(EVENT_COUNT ~ exposure,
offset = log(PERSONDAYS),
data = cohort_followup_aggregated,
family = "poisson")
summary(m2)
#>
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup_aggregated,
#> offset = log(PERSONDAYS))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -4.700 1.000 -4.700 2.6e-06 ***
#> exposureCOV:0-9 2.398 1.414 1.696 0.090 .
#> exposureCOV:10-20 -17.047 9652.168 -0.002 0.999
#> exposureCOV:21-Inf -18.000 15541.864 -0.001 0.999
#> exposureCOV+COV:21-Inf -17.212 15541.864 -0.001 0.999
#> exposureCOV+COV:7-Inf 1.145 1.414 0.810 0.418
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 6.8854 on 11 degrees of freedom
#> Residual deviance: 3.3699 on 6 degrees of freedom
#> AIC: 21.37
#>
#> Number of Fisher Scoring iterations: 18It is possible to construct the followup with no events.
cohort_vac_noevents <- construct_cohort_vac_outcome(cohort = cohort,
vacs = vacs,
start_date = start_date,
end_date = end_date)
construct_cohort_followup(cohort_vac_noevents) %>%
knitr::kable()| LEX_ID | DATE | IKAVUOSI | IKARYHMA | PERSONDAYS | EVENT_COUNT | WASHOUT_INDICATOR | VACCINE_EFFECT | VACCINE_EFFECT_PRODUCT | VACCINE_EFFECT_PRODUCTS | COVVAC_T1 | COVVAC_T2 | COVVAC_T3 | RISK_EFFECT | EVENT_T1 | RISK_EXPOSURE_T | SYNTYMAPAIVA | KUOLINPVM | RISK_EXPOSURE_DATE | CALENDAR | PRODUCTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2021-04-01 | 39 | 30-39 | 19 | 0 | FALSE | None | None | None | -19 | -49 | -999999 | RISK:-Inf–1 | -999999 | -999999 | 1982-02-04 | NA | NA | 2021-04-01 | NA |
| 1 | 2021-04-20 | 39 | 30-39 | 11 | 0 | FALSE | VAC1:0-Inf | COV_1:0-Inf | COV:0-Inf | 0 | -30 | -999980 | RISK:-Inf–1 | -999980 | -999980 | 1982-02-04 | NA | NA | 2021-04-01 | COV |
| 1 | 2021-05-01 | 39 | 30-39 | 19 | 0 | FALSE | VAC1:0-Inf | COV_1:0-Inf | COV:0-Inf | 11 | -19 | -999969 | RISK:-Inf–1 | -999969 | -999969 | 1982-02-04 | NA | NA | 2021-05-01 | COV |
| 1 | 2021-05-20 | 39 | 30-39 | 12 | 0 | FALSE | VAC2:0-Inf | COV_2:0-Inf | COV+COV:0-Inf | 30 | 0 | -999950 | RISK:-Inf–1 | -999950 | -999950 | 1982-02-04 | NA | NA | 2021-05-01 | COV+COV |
| 1 | 2021-06-01 | 39 | 30-39 | 30 | 0 | FALSE | VAC2:0-Inf | COV_2:0-Inf | COV+COV:0-Inf | 42 | 12 | -999938 | RISK:-Inf–1 | -999938 | -999938 | 1982-02-04 | NA | NA | 2021-05-01 | COV+COV |
| 2 | 2021-04-01 | 40 | 40-49 | 30 | 0 | FALSE | None | None | None | -999999 | -999999 | -999999 | RISK:-Inf–1 | -999999 | -999999 | 1980-05-04 | NA | NA | 2021-04-01 | NA |
| 2 | 2021-05-01 | 40 | 40-49 | 31 | 0 | FALSE | None | None | None | -999969 | -999969 | -999969 | RISK:-Inf–1 | -999969 | -999969 | 1980-05-04 | NA | NA | 2021-05-01 | NA |
| 2 | 2021-06-01 | 41 | 40-49 | 30 | 0 | FALSE | None | None | None | -999938 | -999938 | -999938 | RISK:-Inf–1 | -999938 | -999938 | 1980-05-04 | NA | NA | 2021-05-01 | NA |
The function summarise_cohort_vac_outcome performs the
followup time computation and aggregation in chunks to avoid memory
issues. This is recommended for large cohorts. The split_by
argument defines the number of chunks (In this dummy example it has no
effect because the cohort size is one).
Note that with a large cohort, you can also use the n
argument of construct_cohort_followup() for testing
purposes to only compute over a small subset of the data (in this
example n must be 1 because there was only one individual in the
data).
summarise_cohort_vac_outcome(cohort_vac_outcome,
washout = 7,
datecuts = NULL,
vac_effects = list(21, 7),
exclude_washout = TRUE,
by = c("VACCINE_EFFECT_PRODUCTS", "IKARYHMA"),
n = 1,
split_by = 10,
verbose = TRUE) %>%
knitr::kable()
#> Processing strata 1/2
#> Processing strata 2/2| VACCINE_EFFECT_PRODUCTS | IKARYHMA | PERSONDAYS | EVENT_COUNT |
|---|---|---|---|
| None | 30-39 | 34 | 1 |
| COV:21-Inf | 30-39 | 11 | 0 |
| COV+COV:21-Inf | 30-39 | 5 | 0 |
| COV+COV:7-Inf | 30-39 | 30 | 1 |
| None | 40-49 | 85 | 1 |