Survival analysis and incidence computation with the epirapoSurvival R package • epirapoSurvival

About

This tutorial introduces the following functions in the epirapoSurvival package

The functions combine to allow general computations of events and follow-up time in different calendar time periods and vaccine effect states and allow for taking into account both incident and recurrent events and possible washout time.

A cohort, vaccinations and outcomes

Say we have the following cohort:

cohort <- tibble(HETU_ID = 1, SYNTYMAPAIVA = as.Date("1982-02-04"), KUOLINPVM = NA )
vacs <- tibble(HETU_ID = 1, RECORDDATE = as.Date("2021-05-03"), PRODUCT_ID  = "COV")
outcome <- tibble(HETU_ID = 1, TAPAHTUMAPVM = as.Date("2021-06-20"))

cohort %>% knitr::kable()

HETU_ID	SYNTYMAPAIVA	KUOLINPVM
1	1982-02-04	NA

with the following vaccinations:

vacs %>% knitr::kable()

HETU_ID	RECORDDATE	PRODUCT_ID
1	2021-05-03	COV

and these outcomes:

outcome %>% knitr::kable()

HETU_ID	TAPAHTUMAPVM
1	2021-06-20

Say we are interested in a followup period

start_date <- "2021-04-01"
end_date <- "2021-06-30"

We first combine the data into a single data frame using construct_cohort_vac_outcome()

cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome, 
                                                   start_date = start_date, 
                                                   end_date = end_date)
str(cohort_vac_outcome)
#> tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
#>  $ HETU_ID     : num 1
#>  $ SYNTYMAPAIVA: Date[1:1], format: "1982-02-04"
#>  $ KUOLINPVM   : logi NA
#>  $ entrypvm    : Date[1:1], format: "2021-04-01"
#>  $ exitpvm     : Date[1:1], format: "2021-06-30"
#>  $ TAPAHTUMA_1 : Date[1:1], format: "2021-06-20"
#>  $ RECORDDATE_1: Date[1:1], format: "2021-05-03"
#>  $ PRODUCT_ID_1: chr "COV"
#>  - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#>  - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#>  - attr(*, "max_events")= int 1
#>  - attr(*, "washout")= num 0

Splitting the followup and indicating events

We can then split the data to selected follow-up intervals. By default, the followup is split by month, but that can be overridden by assigning the datecuts argument to NULL. In addition, the follow-up is split by specified vaccine effect time points, which can be integer vectors (positive or negative) of length 1 or more. These define cut points relative to the vaccination date. The effect intervals are given separately for each vaccine dose.

A washout can be defined relative to the event date and can be any positive integer, including 0 (no washout) or Inf (only incident cases). The choice of Inf corresponds to usual survival analysis where the follow-up ends at the first event. Other choices correspond to situations where recurring events are considered, but there is a washout period of the specified length after each event, during which time the individual does not contribute follow-up time. The washout time periods can be filtered out from the results, which is default behaviour.

Age groups are also added by default, according to right open cutpoints given in the argument agecuts.

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             datecuts = NULL,
                                             agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
                                             vac1_effect = 21,
                                             washout = 180,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	COVVAC_T3	RISK_EFFECT	EVENT_T1	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	39	30-39	53	0	FALSE	None	None	None	-32	-999999	-999999	RISK:-Inf–1	-80	-999999	1982-02-04	NA	NA	NA
1	2021-05-24	39	30-39	27	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-999946	-999946	RISK:-Inf–1	-27	-999946	1982-02-04	NA	NA	COV
1	2021-06-20	39	30-39	1	1	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	48	-999919	-999919	RISK:-Inf–1	0	-999919	1982-02-04	NA	NA	COV
1	2021-06-21	39	30-39	10	0	TRUE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	49	-999918	-999918	RISK:-Inf–1	1	-999918	1982-02-04	NA	NA	COV

Please see the documentation of the function for details on the returned data.

Dates can be cut into specified intervals, the default choice in months. Age is computed for the beginning of each individual follow-up interval in the final result, so for a unified age treatment for all individuals it is recommended to use datecuts to split the follow-up by calendar time.

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             agecuts = c(0, 12, seq(20, 80, by = 10), Inf),
                                             vac1_effect = 21,
                                             washout = 180)

cohort_followup %>% select(LEX_ID, CALENDAR, everything()) %>% knitr::kable()

LEX_ID	CALENDAR	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	COVVAC_T3	RISK_EFFECT	EVENT_T1	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	2021-04-01	39	30-39	30	0	FALSE	None	None	None	-32	-999999	-999999	RISK:-Inf–1	-80	-999999	1982-02-04	NA	NA	NA
1	2021-05-01	2021-05-01	39	30-39	23	0	FALSE	None	None	None	-2	-999969	-999969	RISK:-Inf–1	-50	-999969	1982-02-04	NA	NA	NA
1	2021-05-01	2021-05-24	39	30-39	8	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-999946	-999946	RISK:-Inf–1	-27	-999946	1982-02-04	NA	NA	COV
1	2021-06-01	2021-06-01	39	30-39	19	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	29	-999938	-999938	RISK:-Inf–1	-19	-999938	1982-02-04	NA	NA	COV
1	2021-06-01	2021-06-20	39	30-39	1	1	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	48	-999919	-999919	RISK:-Inf–1	0	-999919	1982-02-04	NA	NA	COV

Multiple vaccinations, multiple events, multiple effect periods

There can be any number of events and any number of vaccinations per individual.

The required outcome event dates can be named freely but if the name is other than “TAPAHTUMAPVM”, it needs to be specified by the outcomedate_column argument in the construct_cohort_vac_outcome() function. They outcome columns will be renamed as TAPAHTUMA_1,…,TAPAHTUMA_n in the results.


# a cohort of two individuals
cohort <- tibble(HETU_ID = c(1,2), 
                 SYNTYMAPAIVA = c(as.Date("1982-02-04"), as.Date("1980-05-04")),
                 KUOLINPVM = c(NA, NA))

# the first had two events and the second had one event
outcome <- tibble(HETU_ID = c(1,1, 2),
                  diagdate = as.Date(c("2021-04-25", "2021-06-25", "2021-04-02")))

# the first was vaccinated twice
vacs <- tibble(HETU_ID = c(1,1), 
               RECORDDATE = as.Date(c("2021-04-20", "2021-05-20")), 
               PRODUCT_ID  = c("COV", "COV"))

cohort_vac_outcome <- construct_cohort_vac_outcome(cohort, vacs, outcome, 
                                                   start_date = start_date, 
                                                   end_date = end_date,
                                                   outcomedate_column = "diagdate")

str(cohort_vac_outcome)
#> tibble [2 × 11] (S3: tbl_df/tbl/data.frame)
#>  $ HETU_ID     : num [1:2] 1 2
#>  $ SYNTYMAPAIVA: Date[1:2], format: "1982-02-04" "1980-05-04"
#>  $ KUOLINPVM   : logi [1:2] NA NA
#>  $ entrypvm    : Date[1:2], format: "2021-04-01" "2021-04-01"
#>  $ exitpvm     : Date[1:2], format: "2021-06-30" "2021-06-30"
#>  $ TAPAHTUMA_1 : Date[1:2], format: "2021-04-25" "2021-04-02"
#>  $ TAPAHTUMA_2 : Date[1:2], format: "2021-06-25" NA
#>  $ RECORDDATE_1: Date[1:2], format: "2021-04-20" NA
#>  $ RECORDDATE_2: Date[1:2], format: "2021-05-20" NA
#>  $ PRODUCT_ID_1: chr [1:2] "COV" NA
#>  $ PRODUCT_ID_2: chr [1:2] "COV" NA
#>  - attr(*, "start_date")= Date[1:1], format: "2021-04-01"
#>  - attr(*, "end_date")= Date[1:1], format: "2021-06-30"
#>  - attr(*, "max_events")= int 2
#>  - attr(*, "washout")= num 0

For the first three vaccinations, there are specific arguments vac1_effect, vac2_effect and vac3_effect. However, there is also the generic argument vac_effects, which is a list of any length and specifies effects (risk intervals) for any number of vaccinations. It is recommended to always use vac_effects instead of the vacx_effect arguments. The difference is that when using vac_effects, no extra placeholder columns related to vaccinations are created.

Otherwise, the number of events or vaccinations does not change how construct_cohort_followup() is called or used. The washout intervals are always identical for each event.


# using vac1_effect and vac2_effect arguments
cohort_followup1 <- construct_cohort_followup(cohort_vac_outcome,
                                             vac1_effect = c(0, 10, 21),
                                             vac2_effect = 7,
                                             washout = 180,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup1 %>% knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	COVVAC_T3	RISK_EFFECT	EVENT_T1	EVENT_T2	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	39	30-39	19	0	FALSE	None	None	None	-19	-49	-999999	RISK:-Inf–1	-24	-85	-999999	1982-02-04	NA	NA	NA
1	2021-04-20	39	30-39	5	0	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	0	-30	-999980	RISK:-Inf–1	-5	-66	-999980	1982-02-04	NA	NA	COV
1	2021-04-25	39	30-39	1	1	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	5	-25	-999975	RISK:-Inf–1	0	-61	-999975	1982-02-04	NA	NA	COV
1	2021-04-26	39	30-39	4	0	TRUE	VAC1:0-9	COV_1:0-9	COV:0-9	6	-24	-999974	RISK:-Inf–1	1	-60	-999974	1982-02-04	NA	NA	COV
1	2021-04-30	39	30-39	11	0	TRUE	VAC1:10-20	COV_1:10-20	COV:10-20	10	-20	-999970	RISK:-Inf–1	5	-56	-999970	1982-02-04	NA	NA	COV
1	2021-05-11	39	30-39	16	0	TRUE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-9	-999959	RISK:-Inf–1	16	-45	-999959	1982-02-04	NA	NA	COV
1	2021-05-27	39	30-39	29	0	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	37	7	-999943	RISK:-Inf–1	32	-29	-999943	1982-02-04	NA	NA	COV+COV
1	2021-06-25	39	30-39	1	1	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	66	36	-999914	RISK:-Inf–1	61	0	-999914	1982-02-04	NA	NA	COV+COV
1	2021-06-26	39	30-39	5	0	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	67	37	-999913	RISK:-Inf–1	62	1	-999913	1982-02-04	NA	NA	COV+COV
2	2021-04-01	40	40-49	1	0	FALSE	None	None	None	-999999	-999999	-999999	RISK:-Inf–1	-1	-999999	-999999	1980-05-04	NA	NA	NA
2	2021-04-02	40	40-49	1	1	FALSE	None	None	None	-999998	-999998	-999998	RISK:-Inf–1	0	-999998	-999998	1980-05-04	NA	NA	NA
2	2021-04-03	40	40-49	89	0	TRUE	None	None	None	-999997	-999997	-999997	RISK:-Inf–1	1	-999997	-999997	1980-05-04	NA	NA	NA

# using vac_effects argument is recommended
cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21), 7),
                                             washout = 180,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	RISK_EFFECT	EVENT_T1	EVENT_T2	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	39	30-39	19	0	FALSE	None	None	None	-19	-49	RISK:-Inf–1	-24	-85	-999999	1982-02-04	NA	NA	NA
1	2021-04-20	39	30-39	5	0	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	0	-30	RISK:-Inf–1	-5	-66	-999980	1982-02-04	NA	NA	COV
1	2021-04-25	39	30-39	1	1	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	5	-25	RISK:-Inf–1	0	-61	-999975	1982-02-04	NA	NA	COV
1	2021-04-26	39	30-39	4	0	TRUE	VAC1:0-9	COV_1:0-9	COV:0-9	6	-24	RISK:-Inf–1	1	-60	-999974	1982-02-04	NA	NA	COV
1	2021-04-30	39	30-39	11	0	TRUE	VAC1:10-20	COV_1:10-20	COV:10-20	10	-20	RISK:-Inf–1	5	-56	-999970	1982-02-04	NA	NA	COV
1	2021-05-11	39	30-39	16	0	TRUE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-9	RISK:-Inf–1	16	-45	-999959	1982-02-04	NA	NA	COV
1	2021-05-27	39	30-39	29	0	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	37	7	RISK:-Inf–1	32	-29	-999943	1982-02-04	NA	NA	COV+COV
1	2021-06-25	39	30-39	1	1	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	66	36	RISK:-Inf–1	61	0	-999914	1982-02-04	NA	NA	COV+COV
1	2021-06-26	39	30-39	5	0	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	67	37	RISK:-Inf–1	62	1	-999913	1982-02-04	NA	NA	COV+COV
2	2021-04-01	40	40-49	1	0	FALSE	None	None	None	-999999	-999999	RISK:-Inf–1	-1	-999999	-999999	1980-05-04	NA	NA	NA
2	2021-04-02	40	40-49	1	1	FALSE	None	None	None	-999998	-999998	RISK:-Inf–1	0	-999998	-999998	1980-05-04	NA	NA	NA
2	2021-04-03	40	40-49	89	0	TRUE	None	None	None	-999997	-999997	RISK:-Inf–1	1	-999997	-999997	1980-05-04	NA	NA	NA

As mentioned, the washout can be any suitable positive integer and the washout time can be excluded from the results, which is the default behaviour.


cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21),7),
                                             washout = 7,
                                             datecuts = NULL,
                                             exclude_washout = TRUE)

cohort_followup %>% knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	RISK_EFFECT	EVENT_T1	EVENT_T2	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	39	30-39	19	0	FALSE	None	None	None	-19	-49	RISK:-Inf–1	-24	-85	-999999	1982-02-04	NA	NA	NA
1	2021-04-20	39	30-39	5	0	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	0	-30	RISK:-Inf–1	-5	-66	-999980	1982-02-04	NA	NA	COV
1	2021-04-25	39	30-39	1	1	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	5	-25	RISK:-Inf–1	0	-61	-999975	1982-02-04	NA	NA	COV
1	2021-05-02	39	30-39	9	0	FALSE	VAC1:10-20	COV_1:10-20	COV:10-20	12	-18	RISK:-Inf–1	7	-54	-999968	1982-02-04	NA	NA	COV
1	2021-05-11	39	30-39	16	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-9	RISK:-Inf–1	16	-45	-999959	1982-02-04	NA	NA	COV
1	2021-05-27	39	30-39	29	0	FALSE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	37	7	RISK:-Inf–1	32	-29	-999943	1982-02-04	NA	NA	COV+COV
1	2021-06-25	39	30-39	1	1	FALSE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	66	36	RISK:-Inf–1	61	0	-999914	1982-02-04	NA	NA	COV+COV
2	2021-04-01	40	40-49	1	0	FALSE	None	None	None	-999999	-999999	RISK:-Inf–1	-1	-999999	-999999	1980-05-04	NA	NA	NA
2	2021-04-02	40	40-49	1	1	FALSE	None	None	None	-999998	-999998	RISK:-Inf–1	0	-999998	-999998	1980-05-04	NA	NA	NA
2	2021-04-09	40	40-49	83	0	FALSE	None	None	None	-999991	-999991	RISK:-Inf–1	7	-999991	-999991	1980-05-04	NA	NA	NA

Additional time-dependent risks

A column named RISK_EXPOSURE_DATE in the data frame given to construct_cohort_followup() is used to indicate the date of exposure to a time-dependent risk factor. The risk effect time intervals are defined by the risk_effect argument, which is a positive integer vector of length 1 or more, defining cut points relative to the risk exposure date. The risk effect intervals are the same for all individuals and all events. If there are no additional risks, the risk_effect argument can be set to NULL, which is the default value.

additional_risk <- data.frame(HETU_ID = 1,
                   RISK_EXPOSURE_DATE = as.Date("2021-05-22"))

cohort_vac_outcome <- cohort_vac_outcome %>% 
  left_join(additional_risk, by = "HETU_ID")

cohort_followup <- construct_cohort_followup(cohort_vac_outcome,
                                             vac_effects = list(c(0, 10, 21),7),
                                             risk_effect = c(0, 21),
                                             washout = 7,
                                             datecuts = NULL,
                                             exclude_washout = FALSE)

cohort_followup %>% knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	EVENT_COUNT	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	RISK_EFFECT	EVENT_T1	EVENT_T2	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	PRODUCTS
1	2021-04-01	39	30-39	19	0	FALSE	None	None	None	-19	-49	RISK:-Inf–1	-24	-85	-51	1982-02-04	NA	2021-05-22	NA
1	2021-04-20	39	30-39	5	0	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	0	-30	RISK:-Inf–1	-5	-66	-32	1982-02-04	NA	2021-05-22	COV
1	2021-04-25	39	30-39	1	1	FALSE	VAC1:0-9	COV_1:0-9	COV:0-9	5	-25	RISK:-Inf–1	0	-61	-27	1982-02-04	NA	2021-05-22	COV
1	2021-04-26	39	30-39	4	0	TRUE	VAC1:0-9	COV_1:0-9	COV:0-9	6	-24	RISK:-Inf–1	1	-60	-26	1982-02-04	NA	2021-05-22	COV
1	2021-04-30	39	30-39	2	0	TRUE	VAC1:10-20	COV_1:10-20	COV:10-20	10	-20	RISK:-Inf–1	5	-56	-22	1982-02-04	NA	2021-05-22	COV
1	2021-05-02	39	30-39	9	0	FALSE	VAC1:10-20	COV_1:10-20	COV:10-20	12	-18	RISK:-Inf–1	7	-54	-20	1982-02-04	NA	2021-05-22	COV
1	2021-05-11	39	30-39	11	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	21	-9	RISK:-Inf–1	16	-45	-11	1982-02-04	NA	2021-05-22	COV
1	2021-05-22	39	30-39	5	0	FALSE	VAC1:21-Inf	COV_1:21-Inf	COV+COV:21-Inf	32	2	RISK:0-20	27	-34	0	1982-02-04	NA	2021-05-22	COV+COV
1	2021-05-27	39	30-39	16	0	FALSE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	37	7	RISK:0-20	32	-29	5	1982-02-04	NA	2021-05-22	COV+COV
1	2021-06-12	39	30-39	13	0	FALSE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	53	23	RISK:21-Inf	48	-13	21	1982-02-04	NA	2021-05-22	COV+COV
1	2021-06-25	39	30-39	1	1	FALSE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	66	36	RISK:21-Inf	61	0	34	1982-02-04	NA	2021-05-22	COV+COV
1	2021-06-26	39	30-39	5	0	TRUE	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	67	37	RISK:21-Inf	62	1	35	1982-02-04	NA	2021-05-22	COV+COV
2	2021-04-01	40	40-49	1	0	FALSE	None	None	None	-999999	-999999	RISK:-Inf–1	-1	-999999	-999999	1980-05-04	NA	NA	NA
2	2021-04-02	40	40-49	1	1	FALSE	None	None	None	-999998	-999998	RISK:-Inf–1	0	-999998	-999998	1980-05-04	NA	NA	NA
2	2021-04-03	40	40-49	6	0	TRUE	None	None	None	-999997	-999997	RISK:-Inf–1	1	-999997	-999997	1980-05-04	NA	NA	NA
2	2021-04-09	40	40-49	83	0	FALSE	None	None	None	-999991	-999991	RISK:-Inf–1	7	-999991	-999991	1980-05-04	NA	NA	NA

Regression analysis

The data format returned by construct_cohort_followup() is suitable for survival analysis with Poisson regression. The EVENT_COUNT indicator (which is actually always 0 or 1 here) can be treated as the dependent variable, with PERSONDAYS as the offset in a Poisson regression analysis, and parameter estimation done with the glm() function. With this toy cohort of two individuals the results are meaningless but the following gives the idea.

# use the whole product series to mark the risk periods and 
# define the reference to be the unvaccinated state
cohort_followup$exposure <- relevel(factor(cohort_followup$VACCINE_EFFECT_PRODUCTS), ref = "None")

m1 <- glm(EVENT_COUNT ~ exposure, 
    offset = log(PERSONDAYS), 
    data = cohort_followup, 
    family = "poisson")

summary(m1)
#> 
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup, 
#>     offset = log(PERSONDAYS))
#> 
#> Coefficients:
#>                        Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)              -4.700      1.000  -4.700  2.6e-06 ***
#> exposureCOV:0-9           2.398      1.414   1.696    0.090 .  
#> exposureCOV:10-20       -15.047   3550.834  -0.004    0.997    
#> exposureCOV:21-Inf      -16.000   5717.532  -0.003    0.998    
#> exposureCOV+COV:21-Inf  -15.211   5717.532  -0.003    0.998    
#> exposureCOV+COV:7-Inf     1.145      1.414   0.810    0.418    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 24.632  on 15  degrees of freedom
#> Residual deviance: 21.117  on 10  degrees of freedom
#> AIC: 39.117
#> 
#> Number of Fisher Scoring iterations: 16

Summarising the results

The computations for multiple individuals can be summarised using the summarise_cohort_followup() function. It takes as input the followup data and a character vector of columns to aggregate over

cohort_followup_aggregated <- summarise_cohort_followup(cohort_followup, by = c("IKARYHMA",
                                                  "VACCINE_EFFECT",
                                                  "VACCINE_EFFECT_PRODUCT",
                                                  "VACCINE_EFFECT_PRODUCTS", 
                                                  "RISK_EFFECT",
                                                  "WASHOUT_INDICATOR"))

cohort_followup_aggregated %>% 
  knitr::kable()

IKARYHMA	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	RISK_EFFECT	WASHOUT_INDICATOR	PERSONDAYS	EVENT_COUNT
30-39	None	None	None	RISK:-Inf–1	FALSE	19	0
30-39	VAC1:0-9	COV_1:0-9	COV:0-9	RISK:-Inf–1	FALSE	6	1
30-39	VAC1:0-9	COV_1:0-9	COV:0-9	RISK:-Inf–1	TRUE	4	0
30-39	VAC1:10-20	COV_1:10-20	COV:10-20	RISK:-Inf–1	TRUE	2	0
30-39	VAC1:10-20	COV_1:10-20	COV:10-20	RISK:-Inf–1	FALSE	9	0
30-39	VAC1:21-Inf	COV_1:21-Inf	COV:21-Inf	RISK:-Inf–1	FALSE	11	0
30-39	VAC1:21-Inf	COV_1:21-Inf	COV+COV:21-Inf	RISK:0-20	FALSE	5	0
30-39	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	RISK:0-20	FALSE	16	0
30-39	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	RISK:21-Inf	FALSE	14	1
30-39	VAC2:7-Inf	COV_2:7-Inf	COV+COV:7-Inf	RISK:21-Inf	TRUE	5	0
40-49	None	None	None	RISK:-Inf–1	FALSE	85	1
40-49	None	None	None	RISK:-Inf–1	TRUE	6	0

An identical Poisson regression model can be estimated from the aggregated data. In this toy example the results can differ due to numerical instability.

# use the whole product series to mark the risk periods and 
# define the reference to be the unvaccinated state
cohort_followup_aggregated$exposure <- relevel(factor(cohort_followup_aggregated$VACCINE_EFFECT_PRODUCTS), ref = "None")

m2 <- glm(EVENT_COUNT ~ exposure, 
    offset = log(PERSONDAYS), 
    data = cohort_followup_aggregated, 
    family = "poisson")

summary(m2)
#> 
#> Call:
#> glm(formula = EVENT_COUNT ~ exposure, family = "poisson", data = cohort_followup_aggregated, 
#>     offset = log(PERSONDAYS))
#> 
#> Coefficients:
#>                         Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)               -4.700      1.000  -4.700  2.6e-06 ***
#> exposureCOV:0-9            2.398      1.414   1.696    0.090 .  
#> exposureCOV:10-20        -17.047   9652.168  -0.002    0.999    
#> exposureCOV:21-Inf       -18.000  15541.864  -0.001    0.999    
#> exposureCOV+COV:21-Inf   -17.212  15541.864  -0.001    0.999    
#> exposureCOV+COV:7-Inf      1.145      1.414   0.810    0.418    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 6.8854  on 11  degrees of freedom
#> Residual deviance: 3.3699  on  6  degrees of freedom
#> AIC: 21.37
#> 
#> Number of Fisher Scoring iterations: 18

Follow-up with no events

It is possible to construct the followup with no events.

cohort_vac_noevents <- construct_cohort_vac_outcome(cohort = cohort, 
                                                   vacs = vacs, 
                                                   start_date = start_date, 
                                                   end_date = end_date)

construct_cohort_followup(cohort_vac_noevents) %>% 
  knitr::kable()

LEX_ID	DATE	IKAVUOSI	IKARYHMA	PERSONDAYS	WASHOUT_INDICATOR	VACCINE_EFFECT	VACCINE_EFFECT_PRODUCT	VACCINE_EFFECT_PRODUCTS	COVVAC_T1	COVVAC_T2	COVVAC_T3	RISK_EFFECT	EVENT_T1	RISK_EXPOSURE_T	SYNTYMAPAIVA	KUOLINPVM	RISK_EXPOSURE_DATE	CALENDAR	PRODUCTS
1	2021-04-01	39	30-39	19	FALSE	None	None	None	-19	-49	-999999	RISK:-Inf–1	-999999	-999999	1982-02-04	NA	NA	2021-04-01	NA
1	2021-04-20	39	30-39	11	FALSE	VAC1:0-Inf	COV_1:0-Inf	COV:0-Inf	0	-30	-999980	RISK:-Inf–1	-999980	-999980	1982-02-04	NA	NA	2021-04-01	COV
1	2021-05-01	39	30-39	19	FALSE	VAC1:0-Inf	COV_1:0-Inf	COV:0-Inf	11	-19	-999969	RISK:-Inf–1	-999969	-999969	1982-02-04	NA	NA	2021-05-01	COV
1	2021-05-20	39	30-39	12	FALSE	VAC2:0-Inf	COV_2:0-Inf	COV+COV:0-Inf	30	0	-999950	RISK:-Inf–1	-999950	-999950	1982-02-04	NA	NA	2021-05-01	COV+COV
1	2021-06-01	39	30-39	30	FALSE	VAC2:0-Inf	COV_2:0-Inf	COV+COV:0-Inf	42	12	-999938	RISK:-Inf–1	-999938	-999938	1982-02-04	NA	NA	2021-05-01	COV+COV
2	2021-04-01	40	40-49	30	FALSE	None	None	None	-999999	-999999	-999999	RISK:-Inf–1	-999999	-999999	1980-05-04	NA	NA	2021-04-01	NA
2	2021-05-01	40	40-49	31	FALSE	None	None	None	-999969	-999969	-999969	RISK:-Inf–1	-999969	-999969	1980-05-04	NA	NA	2021-05-01	NA
2	2021-06-01	41	40-49	30	FALSE	None	None	None	-999938	-999938	-999938	RISK:-Inf–1	-999938	-999938	1980-05-04	NA	NA	2021-05-01	NA

Memory efficiency

The function summarise_cohort_vac_outcome performs the followup time computation and aggregation in chunks to avoid memory issues. This is recommended for large cohorts. The split_by argument defines the number of chunks (In this dummy example it has no effect because the cohort size is one).

Note that with a large cohort, you can also use the n argument of construct_cohort_followup() for testing purposes to only compute over a small subset of the data (in this example n must be 1 because there was only one individual in the data).

summarise_cohort_vac_outcome(cohort_vac_outcome,
                             washout = 7,
                             datecuts = NULL,
                             vac_effects = list(21, 7),
                             exclude_washout = TRUE,
                             by = c("VACCINE_EFFECT_PRODUCTS", "IKARYHMA"),
                             n = 1,
                             split_by = 10,
                             verbose = TRUE) %>% 
  knitr::kable()
#> Processing strata 1/2
#> Processing strata 2/2

VACCINE_EFFECT_PRODUCTS	IKARYHMA	PERSONDAYS	EVENT_COUNT
None	30-39	34	1
COV:21-Inf	30-39	11	0
COV+COV:21-Inf	30-39	5	0
COV+COV:7-Inf	30-39	30	1
None	40-49	85	1