About

This tutorial introduces the following functions in the epirapoSurvival package

The functions combine to allow general computations of events and follow-up time in different calendar time periods and vaccine effect states and allow for taking into account both incident and recurrent events and possible washout time.

A cohort, vaccinations and outcomes

Say we have the following cohort:

cohort <- tibble(HETU_ID = 1, SYNTYMAPAIVA = as.Date("1982-02-04"), KUOLINPVM = NA )
vacs <- tibble(HETU_ID = 1, RECORDDATE = as.Date("2021-05-03"), PRODUCT_ID  = "COV")
outcome <- tibble(HETU_ID = 1, TAPAHTUMAPVM = as.Date("2021-06-20"))
cohort %>% knitr::kable()
HETU_ID SYNTYMAPAIVA KUOLINPVM
1 1982-02-04 NA

with the following vaccinations:

vacs %>% knitr::kable()
HETU_ID RECORDDATE PRODUCT_ID
1 2021-05-03 COV

and these outcomes:

outcome %>% knitr::kable()
HETU_ID TAPAHTUMAPVM
1 2021-06-20

Say we are interested in a followup period

start_date <- "2021-04-01"
end_date <- "2021-06-30"

We first combine the data into a single data frame using construct_cohort_vac_outcome()

cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome, 
                                                   start_date = start_date, 
                                                   end_date = end_date)
str(cohort_vac_outcome)
#> tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ HETU_ID     : num 1
#>  $ SYNTYMAPAIVA: Date[1:1], format: "1982-02-04"
#>  $ KUOLINPVM   : logi NA
#>  $ entrypvm    : Date[1:1], format: "2021-04-01"
#>  $ exitpvm     : Date[1:1], format: "2021-06-30"
#>  $ TAPAHTUMA_1 : Date[1:1], format: "2021-06-20"
#>  $ RECORDDATE_1: Date[1:1], format: "2021-05-03"
#>  $ PRODUCT_ID_1: chr "COV"
#>  - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#>  - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#>  - attr(*, "max_events")= int 1
#>  - attr(*, "washout")= num 0

Splitting the followup and indicating events

We can then split the data to selected follow-up intervals. By default, the followup is split by month, but that can be overridden by assigning the datecuts argument to NULL. In addition, the follow-up is split by specified vaccine effect time points, which can be integer vectors (positive or negative) of length 1 or more. These define cut points relative to the vaccination date. The effect intervals are given separately for each vaccine dose.

A washout can be defined relative to the event date and can be any positive integer, including 0 (no washout) or Inf (only incident cases). The choice of Inf corresponds to usual survival analysis where the follow-up ends at the first event. Other choices correspond to situations where recurring events are considered, but there is a washout period of the specified length after each event, during which time the individual does not contribute follow-up time. The washout time periods can be filtered out from the results, which is default behaviour.

Age groups are also added by default, according to right open cutpoints given in the argument agecuts.

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             datecuts = NULL,
                                             agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
                                             vac1_effect = 21,
                                             washout = 180,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 COVVAC_T3 RISK_EFFECT EVENT_T1 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 39 30-39 53 0 FALSE None None None -32 -999999 -999999 RISK:-Inf–1 -80 -999999 1982-02-04 NA NA NA
1 2021-05-24 39 30-39 27 0 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -999946 -999946 RISK:-Inf–1 -27 -999946 1982-02-04 NA NA COV
1 2021-06-20 39 30-39 1 1 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 48 -999919 -999919 RISK:-Inf–1 0 -999919 1982-02-04 NA NA COV
1 2021-06-21 39 30-39 10 0 TRUE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 49 -999918 -999918 RISK:-Inf–1 1 -999918 1982-02-04 NA NA COV

Please see the documentation of the function for details on the returned data.

Dates can be cut into specified intervals, the default choice in months. Age is computed for the beginning of each individual follow-up interval in the final result, so for a unified age treatment for all individuals it is recommended to use datecuts to split the follow-up by calendar time.

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
                                             vac1_effect = 21,
                                             washout = 180)

cohort_followup %>% select(LEX_ID, CALENDAR, everything()) %>% knitr::kable()
LEX_ID CALENDAR DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 COVVAC_T3 RISK_EFFECT EVENT_T1 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 2021-04-01 39 30-39 30 0 FALSE None None None -32 -999999 -999999 RISK:-Inf–1 -80 -999999 1982-02-04 NA NA NA
1 2021-05-01 2021-05-01 39 30-39 23 0 FALSE None None None -2 -999969 -999969 RISK:-Inf–1 -50 -999969 1982-02-04 NA NA NA
1 2021-05-01 2021-05-24 39 30-39 8 0 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -999946 -999946 RISK:-Inf–1 -27 -999946 1982-02-04 NA NA COV
1 2021-06-01 2021-06-01 39 30-39 19 0 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 29 -999938 -999938 RISK:-Inf–1 -19 -999938 1982-02-04 NA NA COV
1 2021-06-01 2021-06-20 39 30-39 1 1 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 48 -999919 -999919 RISK:-Inf–1 0 -999919 1982-02-04 NA NA COV

Multiple vaccinations, multiple events, multiple effect periods

There can be any number of events and any number of vaccinations per individual.

The required outcome event dates can be named freely but if the name is other than “TAPAHTUMAPVM”, it needs to be specified by the outcomedate_column argument in the construct_cohort_vac_outcome() function. They outcome columns will be renamed as TAPAHTUMA_1,…,TAPAHTUMA_n in the results.


# a cohort of two individuals
cohort <- tibble(HETU_ID = c(1,2), 
                 SYNTYMAPAIVA = c(as.Date("1982-02-04"), as.Date("1980-05-04")),
                 KUOLINPVM = c(NA, NA))

# the first had two events and the second had one event
outcome <- tibble(HETU_ID = c(1,1, 2),
                  diagdate = as.Date(c("2021-04-25", "2021-06-25", "2021-04-02")))

# the first was vaccinated twice
vacs <- tibble(HETU_ID = c(1,1), 
               RECORDDATE = as.Date(c("2021-04-20", "2021-05-20")), 
               PRODUCT_ID  = c("COV", "COV"))

cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome, 
                                                   start_date = start_date, 
                                                   end_date = end_date,
                                                   outcomedate_column = "diagdate")

str(cohort_vac_outcome)
#> tibble [2 × 11] (S3: tbl_df/tbl/data.frame)
#>  $ HETU_ID     : num [1:2] 1 2
#>  $ SYNTYMAPAIVA: Date[1:2], format: "1982-02-04" "1980-05-04"
#>  $ KUOLINPVM   : logi [1:2] NA NA
#>  $ entrypvm    : Date[1:2], format: "2021-04-01" "2021-04-01"
#>  $ exitpvm     : Date[1:2], format: "2021-06-30" "2021-06-30"
#>  $ TAPAHTUMA_1 : Date[1:2], format: "2021-04-25" "2021-04-02"
#>  $ TAPAHTUMA_2 : Date[1:2], format: "2021-06-25" NA
#>  $ RECORDDATE_1: Date[1:2], format: "2021-04-20" NA
#>  $ RECORDDATE_2: Date[1:2], format: "2021-05-20" NA
#>  $ PRODUCT_ID_1: chr [1:2] "COV" NA
#>  $ PRODUCT_ID_2: chr [1:2] "COV" NA
#>  - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#>  - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#>  - attr(*, "max_events")= int 2
#>  - attr(*, "washout")= num 0

For the first three vaccinations, there are specific arguments vac1_effect, vac2_effect and vac3_effect. However, there is also the generic argument vac_effects, which is a list of any length and specifies effects (risk intervals) for any number of vaccinations. It is recommended to always use vac_effects instead of the vacx_effect arguments. The difference is that when using vac_effects, no extra placeholder columns related to vaccinations are created.

Otherwise, the number of events or vaccinations does not change how construct_cohort_followup() is called or used. The washout intervals are always identical for each event.


# using vac1_effect and vac2_effect arguments
cohort_followup1 <- construct_cohort_followup(cohort_vac_outcome,
                                             vac1_effect = c(0, 10, 21),
                                             vac2_effect = 7,
                                             washout = 180,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup1 %>% knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 COVVAC_T3 RISK_EFFECT EVENT_T1 EVENT_T2 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 39 30-39 19 0 FALSE None None None -19 -49 -999999 RISK:-Inf–1 -24 -85 -999999 1982-02-04 NA NA NA
1 2021-04-20 39 30-39 5 0 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 0 -30 -999980 RISK:-Inf–1 -5 -66 -999980 1982-02-04 NA NA COV
1 2021-04-25 39 30-39 1 1 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 5 -25 -999975 RISK:-Inf–1 0 -61 -999975 1982-02-04 NA NA COV
1 2021-04-26 39 30-39 4 0 TRUE VAC1:0-9 COV_1:0-9 COV:0-9 6 -24 -999974 RISK:-Inf–1 1 -60 -999974 1982-02-04 NA NA COV
1 2021-04-30 39 30-39 11 0 TRUE VAC1:10-20 COV_1:10-20 COV:10-20 10 -20 -999970 RISK:-Inf–1 5 -56 -999970 1982-02-04 NA NA COV
1 2021-05-11 39 30-39 16 0 TRUE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -9 -999959 RISK:-Inf–1 16 -45 -999959 1982-02-04 NA NA COV
1 2021-05-27 39 30-39 29 0 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 37 7 -999943 RISK:-Inf–1 32 -29 -999943 1982-02-04 NA NA COV+COV
1 2021-06-25 39 30-39 1 1 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 66 36 -999914 RISK:-Inf–1 61 0 -999914 1982-02-04 NA NA COV+COV
1 2021-06-26 39 30-39 5 0 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 67 37 -999913 RISK:-Inf–1 62 1 -999913 1982-02-04 NA NA COV+COV
2 2021-04-01 40 40-49 1 0 FALSE None None None -999999 -999999 -999999 RISK:-Inf–1 -1 -999999 -999999 1980-05-04 NA NA NA
2 2021-04-02 40 40-49 1 1 FALSE None None None -999998 -999998 -999998 RISK:-Inf–1 0 -999998 -999998 1980-05-04 NA NA NA
2 2021-04-03 40 40-49 89 0 TRUE None None None -999997 -999997 -999997 RISK:-Inf–1 1 -999997 -999997 1980-05-04 NA NA NA
# using vac_effects argument is recommended
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21), 7),
                                             washout = 180,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 RISK_EFFECT EVENT_T1 EVENT_T2 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 39 30-39 19 0 FALSE None None None -19 -49 RISK:-Inf–1 -24 -85 -999999 1982-02-04 NA NA NA
1 2021-04-20 39 30-39 5 0 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 0 -30 RISK:-Inf–1 -5 -66 -999980 1982-02-04 NA NA COV
1 2021-04-25 39 30-39 1 1 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 5 -25 RISK:-Inf–1 0 -61 -999975 1982-02-04 NA NA COV
1 2021-04-26 39 30-39 4 0 TRUE VAC1:0-9 COV_1:0-9 COV:0-9 6 -24 RISK:-Inf–1 1 -60 -999974 1982-02-04 NA NA COV
1 2021-04-30 39 30-39 11 0 TRUE VAC1:10-20 COV_1:10-20 COV:10-20 10 -20 RISK:-Inf–1 5 -56 -999970 1982-02-04 NA NA COV
1 2021-05-11 39 30-39 16 0 TRUE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -9 RISK:-Inf–1 16 -45 -999959 1982-02-04 NA NA COV
1 2021-05-27 39 30-39 29 0 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 37 7 RISK:-Inf–1 32 -29 -999943 1982-02-04 NA NA COV+COV
1 2021-06-25 39 30-39 1 1 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 66 36 RISK:-Inf–1 61 0 -999914 1982-02-04 NA NA COV+COV
1 2021-06-26 39 30-39 5 0 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 67 37 RISK:-Inf–1 62 1 -999913 1982-02-04 NA NA COV+COV
2 2021-04-01 40 40-49 1 0 FALSE None None None -999999 -999999 RISK:-Inf–1 -1 -999999 -999999 1980-05-04 NA NA NA
2 2021-04-02 40 40-49 1 1 FALSE None None None -999998 -999998 RISK:-Inf–1 0 -999998 -999998 1980-05-04 NA NA NA
2 2021-04-03 40 40-49 89 0 TRUE None None None -999997 -999997 RISK:-Inf–1 1 -999997 -999997 1980-05-04 NA NA NA

As mentioned, the washout can be any suitable positive integer and the washout time can be excluded from the results, which is the default behaviour.


cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21),7),
                                             washout = 7,
                                             datecuts = NULL,
                                             exclude_washout = TRUE)

cohort_followup %>% knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 RISK_EFFECT EVENT_T1 EVENT_T2 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 39 30-39 19 0 FALSE None None None -19 -49 RISK:-Inf–1 -24 -85 -999999 1982-02-04 NA NA NA
1 2021-04-20 39 30-39 5 0 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 0 -30 RISK:-Inf–1 -5 -66 -999980 1982-02-04 NA NA COV
1 2021-04-25 39 30-39 1 1 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 5 -25 RISK:-Inf–1 0 -61 -999975 1982-02-04 NA NA COV
1 2021-05-02 39 30-39 9 0 FALSE VAC1:10-20 COV_1:10-20 COV:10-20 12 -18 RISK:-Inf–1 7 -54 -999968 1982-02-04 NA NA COV
1 2021-05-11 39 30-39 16 0 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -9 RISK:-Inf–1 16 -45 -999959 1982-02-04 NA NA COV
1 2021-05-27 39 30-39 29 0 FALSE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 37 7 RISK:-Inf–1 32 -29 -999943 1982-02-04 NA NA COV+COV
1 2021-06-25 39 30-39 1 1 FALSE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 66 36 RISK:-Inf–1 61 0 -999914 1982-02-04 NA NA COV+COV
2 2021-04-01 40 40-49 1 0 FALSE None None None -999999 -999999 RISK:-Inf–1 -1 -999999 -999999 1980-05-04 NA NA NA
2 2021-04-02 40 40-49 1 1 FALSE None None None -999998 -999998 RISK:-Inf–1 0 -999998 -999998 1980-05-04 NA NA NA
2 2021-04-09 40 40-49 83 0 FALSE None None None -999991 -999991 RISK:-Inf–1 7 -999991 -999991 1980-05-04 NA NA NA

Additional time-dependent risks

A column named RISK_EXPOSURE_DATE in the data frame given to construct_cohort_followup() is used to indicate the date of exposure to a time-dependent risk factor. The risk effect time intervals are defined by the risk_effect argument, which is a positive integer vector of length 1 or more, defining cut points relative to the risk exposure date. The risk effect intervals are the same for all individuals and all events. If there are no additional risks, the risk_effect argument can be set to NULL, which is the default value.

additional_risk <- data.frame(HETU_ID = 1,
                   RISK_EXPOSURE_DATE = as.Date("2021-05-22"))

cohort_vac_outcome <- cohort_vac_outcome %>% 
  left_join(additional_risk, by = "HETU_ID")

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21),7),
                                             risk_effect = c(0, 21),
                                             washout = 7,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 RISK_EFFECT EVENT_T1 EVENT_T2 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE PRODUCTS
1 2021-04-01 39 30-39 19 0 FALSE None None None -19 -49 RISK:-Inf–1 -24 -85 -51 1982-02-04 NA 2021-05-22 NA
1 2021-04-20 39 30-39 5 0 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 0 -30 RISK:-Inf–1 -5 -66 -32 1982-02-04 NA 2021-05-22 COV
1 2021-04-25 39 30-39 1 1 FALSE VAC1:0-9 COV_1:0-9 COV:0-9 5 -25 RISK:-Inf–1 0 -61 -27 1982-02-04 NA 2021-05-22 COV
1 2021-04-26 39 30-39 4 0 TRUE VAC1:0-9 COV_1:0-9 COV:0-9 6 -24 RISK:-Inf–1 1 -60 -26 1982-02-04 NA 2021-05-22 COV
1 2021-04-30 39 30-39 2 0 TRUE VAC1:10-20 COV_1:10-20 COV:10-20 10 -20 RISK:-Inf–1 5 -56 -22 1982-02-04 NA 2021-05-22 COV
1 2021-05-02 39 30-39 9 0 FALSE VAC1:10-20 COV_1:10-20 COV:10-20 12 -18 RISK:-Inf–1 7 -54 -20 1982-02-04 NA 2021-05-22 COV
1 2021-05-11 39 30-39 11 0 FALSE VAC1:21-Inf COV_1:21-Inf COV:21-Inf 21 -9 RISK:-Inf–1 16 -45 -11 1982-02-04 NA 2021-05-22 COV
1 2021-05-22 39 30-39 5 0 FALSE VAC1:21-Inf COV_1:21-Inf COV+COV:21-Inf 32 2 RISK:0-20 27 -34 0 1982-02-04 NA 2021-05-22 COV+COV
1 2021-05-27 39 30-39 16 0 FALSE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 37 7 RISK:0-20 32 -29 5 1982-02-04 NA 2021-05-22 COV+COV
1 2021-06-12 39 30-39 13 0 FALSE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 53 23 RISK:21-Inf 48 -13 21 1982-02-04 NA 2021-05-22 COV+COV
1 2021-06-25 39 30-39 1 1 FALSE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 66 36 RISK:21-Inf 61 0 34 1982-02-04 NA 2021-05-22 COV+COV
1 2021-06-26 39 30-39 5 0 TRUE VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf 67 37 RISK:21-Inf 62 1 35 1982-02-04 NA 2021-05-22 COV+COV
2 2021-04-01 40 40-49 1 0 FALSE None None None -999999 -999999 RISK:-Inf–1 -1 -999999 -999999 1980-05-04 NA NA NA
2 2021-04-02 40 40-49 1 1 FALSE None None None -999998 -999998 RISK:-Inf–1 0 -999998 -999998 1980-05-04 NA NA NA
2 2021-04-03 40 40-49 6 0 TRUE None None None -999997 -999997 RISK:-Inf–1 1 -999997 -999997 1980-05-04 NA NA NA
2 2021-04-09 40 40-49 83 0 FALSE None None None -999991 -999991 RISK:-Inf–1 7 -999991 -999991 1980-05-04 NA NA NA

Regression analysis

The data format returned by construct_cohort_followup() is suitable for survival analysis with Poisson regression. The EVENT_COUNT indicator (which is actually always 0 or 1 here) can be treated as the dependent variable, with PERSONDAYS as the offset in a Poisson regression analysis, and parameter estimation done with the glm() function. With this toy cohort of two individuals the results are meaningless but the following gives the idea.

# use the whole product series to mark the risk periods and 
# define the reference to be the unvaccinated state
cohort_followup$exposure <- relevel(factor(cohort_followup$VACCINE_EFFECT_PRODUCTS), ref = "None")

m1 <- glm(EVENT_COUNT ~ exposure, 
    offset = log(PERSONDAYS), 
    data = cohort_followup, 
    family = "poisson")

summary(m1)
#> 
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup, 
#>     offset = log(PERSONDAYS))
#> 
#> Coefficients:
#>                        Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)              -4.700      1.000  -4.700  2.6e-06 ***
#> exposureCOV:0-9           2.398      1.414   1.696    0.090 .  
#> exposureCOV:10-20       -15.047   3550.834  -0.004    0.997    
#> exposureCOV:21-Inf      -16.000   5717.532  -0.003    0.998    
#> exposureCOV+COV:21-Inf  -15.211   5717.532  -0.003    0.998    
#> exposureCOV+COV:7-Inf     1.145      1.414   0.810    0.418    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 24.632  on 15  degrees of freedom
#> Residual deviance: 21.117  on 10  degrees of freedom
#> AIC: 39.117
#> 
#> Number of Fisher Scoring iterations: 16

Summarising the results

The computations for multiple individuals can be summarised using the summarise_cohort_followup() function. It takes as input the followup data and a character vector of columns to aggregate over

cohort_followup_aggregated <- summarise_cohort_followup(cohort_followup, by = c("IKARYHMA",
                                                  "VACCINE_EFFECT",
                                                  "VACCINE_EFFECT_PRODUCT",
                                                  "VACCINE_EFFECT_PRODUCTS", 
                                                  "RISK_EFFECT",
                                                  "WASHOUT_INDICATOR"))

cohort_followup_aggregated %>% 
  knitr::kable()
IKARYHMA VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS RISK_EFFECT WASHOUT_INDICATOR PERSONDAYS EVENT_COUNT
30-39 None None None RISK:-Inf–1 FALSE 19 0
30-39 VAC1:0-9 COV_1:0-9 COV:0-9 RISK:-Inf–1 FALSE 6 1
30-39 VAC1:0-9 COV_1:0-9 COV:0-9 RISK:-Inf–1 TRUE 4 0
30-39 VAC1:10-20 COV_1:10-20 COV:10-20 RISK:-Inf–1 TRUE 2 0
30-39 VAC1:10-20 COV_1:10-20 COV:10-20 RISK:-Inf–1 FALSE 9 0
30-39 VAC1:21-Inf COV_1:21-Inf COV:21-Inf RISK:-Inf–1 FALSE 11 0
30-39 VAC1:21-Inf COV_1:21-Inf COV+COV:21-Inf RISK:0-20 FALSE 5 0
30-39 VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf RISK:0-20 FALSE 16 0
30-39 VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf RISK:21-Inf FALSE 14 1
30-39 VAC2:7-Inf COV_2:7-Inf COV+COV:7-Inf RISK:21-Inf TRUE 5 0
40-49 None None None RISK:-Inf–1 FALSE 85 1
40-49 None None None RISK:-Inf–1 TRUE 6 0

An identical Poisson regression model can be estimated from the aggregated data. In this toy example the results can differ due to numerical instability.

# use the whole product series to mark the risk periods and 
# define the reference to be the unvaccinated state
cohort_followup_aggregated$exposure <- relevel(factor(cohort_followup_aggregated$VACCINE_EFFECT_PRODUCTS), ref = "None")

m2 <- glm(EVENT_COUNT ~ exposure, 
    offset = log(PERSONDAYS), 
    data = cohort_followup_aggregated, 
    family = "poisson")

summary(m2)
#> 
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup_aggregated, 
#>     offset = log(PERSONDAYS))
#> 
#> Coefficients:
#>                         Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)               -4.700      1.000  -4.700  2.6e-06 ***
#> exposureCOV:0-9            2.398      1.414   1.696    0.090 .  
#> exposureCOV:10-20        -17.047   9652.168  -0.002    0.999    
#> exposureCOV:21-Inf       -18.000  15541.864  -0.001    0.999    
#> exposureCOV+COV:21-Inf   -17.212  15541.864  -0.001    0.999    
#> exposureCOV+COV:7-Inf      1.145      1.414   0.810    0.418    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 6.8854  on 11  degrees of freedom
#> Residual deviance: 3.3699  on  6  degrees of freedom
#> AIC: 21.37
#> 
#> Number of Fisher Scoring iterations: 18

Follow-up with no events

It is possible to construct the followup with no events.

cohort_vac_noevents <- construct_cohort_vac_outcome(cohort = cohort, 
                                                   vacs = vacs, 
                                                   start_date = start_date, 
                                                   end_date = end_date)

construct_cohort_followup(cohort_vac_noevents) %>% 
  knitr::kable()
LEX_ID DATE IKAVUOSI IKARYHMA PERSONDAYS EVENT_COUNT WASHOUT_INDICATOR VACCINE_EFFECT VACCINE_EFFECT_PRODUCT VACCINE_EFFECT_PRODUCTS COVVAC_T1 COVVAC_T2 COVVAC_T3 RISK_EFFECT EVENT_T1 RISK_EXPOSURE_T SYNTYMAPAIVA KUOLINPVM RISK_EXPOSURE_DATE CALENDAR PRODUCTS
1 2021-04-01 39 30-39 19 0 FALSE None None None -19 -49 -999999 RISK:-Inf–1 -999999 -999999 1982-02-04 NA NA 2021-04-01 NA
1 2021-04-20 39 30-39 11 0 FALSE VAC1:0-Inf COV_1:0-Inf COV:0-Inf 0 -30 -999980 RISK:-Inf–1 -999980 -999980 1982-02-04 NA NA 2021-04-01 COV
1 2021-05-01 39 30-39 19 0 FALSE VAC1:0-Inf COV_1:0-Inf COV:0-Inf 11 -19 -999969 RISK:-Inf–1 -999969 -999969 1982-02-04 NA NA 2021-05-01 COV
1 2021-05-20 39 30-39 12 0 FALSE VAC2:0-Inf COV_2:0-Inf COV+COV:0-Inf 30 0 -999950 RISK:-Inf–1 -999950 -999950 1982-02-04 NA NA 2021-05-01 COV+COV
1 2021-06-01 39 30-39 30 0 FALSE VAC2:0-Inf COV_2:0-Inf COV+COV:0-Inf 42 12 -999938 RISK:-Inf–1 -999938 -999938 1982-02-04 NA NA 2021-05-01 COV+COV
2 2021-04-01 40 40-49 30 0 FALSE None None None -999999 -999999 -999999 RISK:-Inf–1 -999999 -999999 1980-05-04 NA NA 2021-04-01 NA
2 2021-05-01 40 40-49 31 0 FALSE None None None -999969 -999969 -999969 RISK:-Inf–1 -999969 -999969 1980-05-04 NA NA 2021-05-01 NA
2 2021-06-01 41 40-49 30 0 FALSE None None None -999938 -999938 -999938 RISK:-Inf–1 -999938 -999938 1980-05-04 NA NA 2021-05-01 NA

Memory efficiency

The function summarise_cohort_vac_outcome performs the followup time computation and aggregation in chunks to avoid memory issues. This is recommended for large cohorts. The split_by argument defines the number of chunks (In this dummy example it has no effect because the cohort size is one).

Note that with a large cohort, you can also use the n argument of construct_cohort_followup() for testing purposes to only compute over a small subset of the data (in this example n must be 1 because there was only one individual in the data).

summarise_cohort_vac_outcome(cohort_vac_outcome,
                             washout = 7,
                             datecuts = NULL,
                             vac_effects = list(21, 7),
                             exclude_washout = TRUE,
                             by = c("VACCINE_EFFECT_PRODUCTS", "IKARYHMA"),
                             n = 1,
                             split_by = 10,
                             verbose = TRUE) %>% 
  knitr::kable()
#> Processing strata 1/2
#> Processing strata 2/2
VACCINE_EFFECT_PRODUCTS IKARYHMA PERSONDAYS EVENT_COUNT
None 30-39 34 1
COV:21-Inf 30-39 11 0
COV+COV:21-Inf 30-39 5 0
COV+COV:7-Inf 30-39 30 1
None 40-49 85 1