BMJ Open. 2022 Aug 23;12(8):e061126. doi: 10.1136/bmjopen-2022-061126

COVID-19 vaccination effectiveness rates by week and sources of bias: a retrospective cohort study

Anna Ostropolets, George Hripcsak
PMCID: PMC9402447  PMID: 35998962

Abstract

Objective

To examine COVID-19 vaccine effectiveness over six 7-day intervals after the first dose and assess underlying bias in observational data.

Design and setting

Retrospective cohort study using Columbia University Irving Medical Center data linked to state and city immunisation registries.

Outcomes and measures

We used large-scale propensity score matching with up to 54 987 covariates, fitted Cox proportional hazards models and constructed Kaplan-Meier plots for two main outcomes (COVID-19 infection and COVID-19-associated hospitalisation). We conducted manual chart review of cases in week 1 in both groups along with a set of secondary analyses for other index date, outcome and population choices.

Results

The study included 179 666 patients. We observed increasing effectiveness after the first dose of mRNA vaccines with week 6 effectiveness approximating 84% (95% CI 72% to 91%) for COVID-19 infection and 86% (95% CI 69% to 95%) for COVID-19-associated hospitalisation. When analysing unexpectedly high effectiveness in week 1, chart review revealed that vaccinated patients are less likely to seek care after vaccination and are more likely to be diagnosed with COVID-19 during the encounters for other conditions. Secondary analyses highlighted potential outcome misclassification for International Classification of Diseases, Tenth Revision, Clinical Modification diagnosis, the influence of excluding patients with prior COVID-19 infection and anchoring in the unexposed group. Long-term vaccine effectiveness in fully vaccinated patients matched the results of the randomised trials.

Conclusions

For vaccine effectiveness studies, observational data need to be scrutinised to ensure compared groups exhibit similar health-seeking behaviour and are equally likely to be captured in the data. While we found that studies may be capable of accurately estimating long-term effectiveness despite bias in early weeks, the early week results should be reported in every study so that we may gain a better understanding of the biases. Given the difference in temporal trends of vaccine exposure and patients’ baseline characteristics, indirect comparison of vaccines may produce biased results.

Keywords: COVID-19, Health informatics, EPIDEMIOLOGY


Strengths and limitations of this study.

  • This study thoroughly investigates weekly COVID-19 vaccine effectiveness using methods to reduce potential confounding (large-scale propensity score matching, negative control calibration) accompanied by manual chart review of the cases in week 1.

  • The study includes a range of secondary analyses for different patient populations, anchoring strategies and outcome definitions.

  • The study was carried out using routinely collected clinical practice data, which represent real-world patients but also carry a risk of misclassification.

Background

Randomised clinical phase III trials have demonstrated high efficacy for the four most commonly used COVID-19 vaccines against symptomatic COVID-19 infection, ranging from 66.9% and 70.4% for Ad26.COV2.S (Johnson & Johnson-Janssen) and ChAdOx1 (AstraZeneca) to 94.1% and 94.6% for BNT162b2 (Pfizer-BioNTech) and mRNA-1273 (Moderna) vaccines.1–4 Their rapid approval and widespread use require robust postmarketing studies that leverage large sample size, heterogeneous populations and longer follow-up available in observational data.

Recent observational studies from across the globe, using both test-negative and cohort designs, have shown effectiveness similar to that reported in the randomised clinical trials (RCTs),5–12 and have been followed by studies across different patient populations, variants and numbers of doses.13–17

Nevertheless, the challenges associated with observational data, such as incomplete data capture, outcome misclassification and selection of an appropriate comparator, can undermine study results if the resulting biases are not accounted for.18 For COVID-19 vaccines, questions have been raised about vaccine status misclassification,19 matching of vaccinated and unvaccinated populations,6 confounding by disease risk factors and ascertainment bias,20 21 among others.

One such question is COVID-19 vaccine effectiveness during the first 2 weeks following the first dose. Studies have shown conflicting results for the Pfizer-BioNTech vaccine, with estimates ranging from a moderate 52%3 to a very high 92.6%.22 Similarly, a recent study showed an unexplained high effectiveness of the Janssen vaccine during week 1.23 Other studies simply excluded the first week(s) from the time at risk.9 13 24–26 While a lack of effectiveness in week 1 has been suggested as an indicator of absent confounding in long-term vaccine effectiveness studies, the reasons for high week 1 effectiveness and its impact on the validity of conclusions about overall effectiveness remain unclear.9

The goal of this study was to examine COVID-19 vaccine effectiveness over six 7-day intervals after the first dose to assess underlying bias associated with the use of observational data for short-term vaccine effectiveness and its impact on long-term vaccine effectiveness estimates. We employed large-scale propensity score matching and many negative controls to reduce bias and leveraged a range of secondary analyses as well as manual review of the COVID-19 infection cases in week 1 to examine the health-seeking behaviour of vaccinated and unvaccinated patients.

Methods

Main design

For this retrospective observational cohort study, we used electronic health records (EHR) from the Columbia University Irving Medical Center (CUIMC) database (online supplemental appendix 1), which has an ongoing automated connection to the New York City and state public health department vaccine registries and includes all within-state vaccinations for our population. The data were converted to the Observational Medical Outcomes Partnership (OMOP) Common Data Model version 5, as in multiple previous studies.27

For our main analysis, we studied two mRNA vaccines (Pfizer-BioNTech and Moderna). The exposed group included patients indexed on the first dose of either vaccine who had no prior COVID-19 infection and no previous exposure to other COVID-19 vaccines. For the unexposed group, we selected unvaccinated patients and assigned them an index date (not necessarily coinciding with any medical event) that matched the index date of one of the exposed participants. Both groups had at least 365 days of prior observation and primarily resided in New York City according to their zip code. Patients who did not reside in New York were excluded to ensure reliable capture of vaccination data.
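The date-anchoring step can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes that each unvaccinated patient receives the index date of a randomly drawn exposed participant, and the function name, input layout and random draw are all hypothetical. Only the 365-day prior-observation rule comes from the text.

```python
import random
from datetime import date
from typing import Dict, List

MIN_PRIOR_OBS_DAYS = 365  # from the text: at least 365 days of prior observation


def anchor_on_date(unexposed: Dict[str, date],
                   exposed_index_dates: List[date],
                   seed: int = 0) -> Dict[str, date]:
    """Hypothetical helper: give each unvaccinated patient the index date of a
    randomly drawn exposed participant (random draw is an assumption), then
    keep only patients with >= 365 days of observation before that date.

    `unexposed` maps a patient id to that patient's observation start date.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    anchored = {}
    for patient_id, obs_start in unexposed.items():
        index_date = rng.choice(exposed_index_dates)
        if (index_date - obs_start).days >= MIN_PRIOR_OBS_DAYS:
            anchored[patient_id] = index_date
    return anchored
```

Because the anchor is a calendar date rather than a visit, an unvaccinated patient can be indexed on a day with no recorded care, which mirrors the "not necessarily coinciding with any medical event" wording above.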

Outcomes of interest included (a) COVID-19 infection, defined as a positive COVID-19 test (reverse transcriptase PCR assay) or a COVID-19 diagnostic code, and (b) COVID-19 hospitalisation, defined as an inpatient visit associated with a positive COVID-19 test or diagnosis in the 30 days before or during the visit. On further examination of the results, we added two other outcomes: (a) a positive COVID-19 test only and (b) COVID-19 hospitalisation associated with a positive COVID-19 test. A design overview is provided in online supplemental appendix 2; code lists and links to phenotype definitions are provided in online supplemental appendix 3.
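As a sketch of the hospitalisation outcome logic, assuming each inpatient visit has a start and an end date and that the 30-day look-back bound is inclusive (both assumptions, as is the function name):

```python
from datetime import date, timedelta
from typing import Iterable

LOOKBACK = timedelta(days=30)  # "within 30 days prior"; inclusive bound is assumed


def is_covid_hospitalisation(visit_start: date, visit_end: date,
                             covid_event_dates: Iterable[date]) -> bool:
    """Hypothetical check: True if any positive test or COVID-19 diagnosis
    falls in the 30 days before the inpatient visit starts or during it."""
    window_start = visit_start - LOOKBACK
    return any(window_start <= d <= visit_end for d in covid_event_dates)
```

Passing only positive-test dates as `covid_event_dates` gives the narrower "hospitalisation associated with a positive COVID-19 test" outcome that was added later.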

We calculated vaccine effectiveness during six consecutive 7-day intervals after the first dose. Within each interval, patients were followed up until an outcome, end of the period or death, whichever came earlier. Additionally, given the results for vaccine effectiveness during week 1 following the first dose, we conducted a chart review for patients with a COVID-19-positive test recorded in the above-mentioned period. We reviewed all cases for the vaccinated population as well as a random sample of the cases in the unvaccinated population and extracted the main complaint, COVID-19 history, including symptoms (fever, shortness of breath, sore throat, cough, etc.), severity, time from the first symptom to encounter and COVID-19 exposure.
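The weekly follow-up rule can be illustrated with the sketch below. It assumes that day 0 is the index date and belongs to week 1, that a patient who leaves follow-up in one interval does not enter later ones, and that an outcome on the same day as death counts as an outcome; none of these details is stated in the text, and the function is hypothetical.

```python
from typing import List, Optional, Tuple

N_WEEKS = 6
WEEK_DAYS = 7


def weekly_follow_up(outcome_day: Optional[int], death_day: Optional[int],
                     obs_end_day: int) -> List[Tuple[int, int, bool]]:
    """Split follow-up into six consecutive 7-day intervals.

    Days are counted from the index date (day 0, assumed to be in week 1).
    Returns (week, days_at_risk, had_outcome) for every interval the patient
    enters; follow-up stops at the outcome, death or end of observation,
    whichever comes first.
    """
    exits = [d for d in (outcome_day, death_day, obs_end_day) if d is not None]
    first_exit = min(exits)
    rows = []
    for week in range(1, N_WEEKS + 1):
        start = (week - 1) * WEEK_DAYS
        stop = week * WEEK_DAYS  # exclusive upper bound of the interval
        if first_exit < start:
            break  # patient left follow-up before this interval began
        end = min(stop, first_exit + 1)  # the exit day itself is still at risk
        # short-circuits when outcome_day is None
        had_outcome = outcome_day == first_exit and start <= outcome_day < stop
        rows.append((week, end - start, had_outcome))
    return rows
```

For example, an outcome on day 9 yields a full, event-free week 1 and three days at risk in week 2 ending in the outcome; each interval's rows could then feed a separate Cox model.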

Secondary analyses

We also conducted a set of secondary analyses. First, given that the published studies focused on patients without prior COVID-19 infection, we studied all eligible patients regardless of their previous COVID-19 status.

As the strategy for selecting the index date of the unvaccinated group (anchoring) has been reported to influence the incidence of outcomes and baseline characteristics,28 29 we additionally tested unexposed patients indexed on a healthcare encounter falling within 3 days of the index date of one of the exposed participants, who had at least 365 days of prior observation and resided in New York.
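A sketch of visit anchoring within the 3-day window follows. It assumes that the earliest qualifying visit is chosen (the text does not say which visit is used when several qualify). The function name is hypothetical, and the 365-day prior-observation check would be applied separately, as in the date-anchoring case.

```python
import bisect
from datetime import date
from typing import List, Optional

CORRIDOR_DAYS = 3  # from the text: within a 3-day window of an exposed index date


def visit_anchor(visit_dates: List[date],
                 exposed_dates_sorted: List[date]) -> Optional[date]:
    """Hypothetical helper: return the earliest visit lying within +/-3 days
    of at least one exposed index date, or None if no visit qualifies.

    `exposed_dates_sorted` must be sorted ascending for the binary search.
    """
    for visit in sorted(visit_dates):
        i = bisect.bisect_left(exposed_dates_sorted, visit)
        # the closest exposed dates sit immediately before and at position i
        for j in (i - 1, i):
            if 0 <= j < len(exposed_dates_sorted):
                if abs((exposed_dates_sorted[j] - visit).days) <= CORRIDOR_DAYS:
                    return visit
    return None
```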

Finally, we assessed vaccine effectiveness over all available follow-up in patients with at least one dose of a COVID-19 vaccine and in fully vaccinated patients, to compare the estimates with the results of the RCTs. Full vaccination was defined as beginning 14 days after the second dose of the Pfizer-BioNTech or Moderna vaccine or the first dose of the Janssen vaccine. For each comparison, we estimated hazard ratios (HRs) and constructed Kaplan-Meier plots as described below.
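The full-vaccination rule translates into a small date calculation. In this sketch the product labels are hypothetical strings, and mixed schedules are handled naively by counting any second dose, which is an assumption rather than the study's rule.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

PROTECTION_LAG = timedelta(days=14)  # from the text: 14 days after the qualifying dose


def full_vaccination_start(doses: List[Tuple[date, str]]) -> Optional[date]:
    """Return the date a patient counts as fully vaccinated, or None.

    `doses` holds (administration date, product) pairs using hypothetical
    labels "pfizer", "moderna" or "janssen".
    """
    if not doses:
        return None
    ordered = sorted(doses)  # chronological order of administration
    first_date, first_product = ordered[0]
    if first_product == "janssen":
        return first_date + PROTECTION_LAG  # single-dose schedule
    if len(ordered) >= 2:
        return ordered[1][0] + PROTECTION_LAG  # two-dose mRNA schedule
    return None  # one mRNA dose only: not yet fully vaccinated
```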

Statistical methods

For each analysis, we fitted a lasso regression model to calculate propensity scores and matched exposed and unexposed patients in a 1:1 ratio. For the large-scale propensity score model, we used all demographic information, index year and month, as well as the number of visits, condition and drug groups, procedures, device exposures, laboratory and instrumental tests and other observations over a long-term window (prior year) and a short-term window (prior month).30 31
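The matching step can be illustrated with a greedy 1:1 nearest-neighbour match on the logit of the propensity score. This is a hedged stand-in: the scores are taken as given (fitting the lasso model, done with Cyclops in the study, is out of scope), and the greedy order and the caliper of 0.2 standard deviations of the logit are assumptions for illustration, not settings reported in the text.

```python
import math
from statistics import pstdev
from typing import Dict, List, Tuple


def logit(p: float) -> float:
    """Log-odds of a propensity score strictly between 0 and 1."""
    return math.log(p / (1 - p))


def greedy_match_1to1(exposed: Dict[str, float], unexposed: Dict[str, float],
                      caliper_sd: float = 0.2) -> List[Tuple[str, str]]:
    """Hypothetical greedy matcher: pair each exposed patient (highest score
    first) with the nearest unmatched unexposed patient on the logit scale,
    provided the distance is within caliper_sd * SD(logit scores)."""
    all_logits = [logit(p) for p in [*exposed.values(), *unexposed.values()]]
    caliper = caliper_sd * pstdev(all_logits)
    available = {pid: logit(p) for pid, p in unexposed.items()}
    pairs = []
    for pid, p in sorted(exposed.items(), key=lambda kv: kv[1], reverse=True):
        target = logit(p)
        best = min(available, key=lambda u: abs(available[u] - target), default=None)
        if best is not None and abs(available[best] - target) <= caliper:
            pairs.append((pid, best))
            del available[best]  # each unexposed patient is used at most once
    return pairs
```

Exposed patients with no unexposed partner inside the caliper are dropped, which is why matched cohorts can be smaller than the source populations.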

For each outcome, we fitted a Cox proportional hazards model to estimate HRs and constructed Kaplan-Meier plots. Empirical calibration based on the negative control outcomes was used to identify and minimise any potential residual confounding by calibrating HRs and 95% confidence intervals (CIs).32 33 Vaccine effectiveness was calculated as 100%×(1−HR).
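The conversion from HR to effectiveness is simple arithmetic, but the CI bounds swap because effectiveness falls as the HR rises. In the sketch below, the HR inputs (0.16, 95% CI 0.09 to 0.28) are back-calculated from the reported week 6 estimate of 84% (95% CI 72% to 91%) for illustration; they are not values taken from the study's tables, and the function name is hypothetical.

```python
from typing import Tuple


def vaccine_effectiveness(hr: float, ci_low: float,
                          ci_high: float) -> Tuple[float, float, float]:
    """Convert an HR and its 95% CI into VE (%) = 100% x (1 - HR).

    Returns (VE, lower VE bound, upper VE bound); the upper HR bound maps
    to the lower VE bound and vice versa.
    """
    return (100.0 * (1.0 - hr),
            100.0 * (1.0 - ci_high),
            100.0 * (1.0 - ci_low))


ve, ve_low, ve_high = vaccine_effectiveness(0.16, 0.09, 0.28)
print(f"VE {ve:.0f}% (95% CI {ve_low:.0f}% to {ve_high:.0f}%)")
```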

All analyses were supported by the OHDSI Infrastructure (CohortMethod package available at https://ohdsi.github.io/CohortMethod/, FeatureExtraction available at https://ohdsi.github.io/FeatureExtraction/ and the Cyclops package for large-scale regularised regression34 available at https://ohdsi.github.io/Cyclops).

Diagnostics

We used multiple diagnostics to estimate potential bias and confounding, following best practices for evidence generation.35 First, we examined covariate and propensity score balance before proceeding with outcome modelling and effect estimation, to ensure an adequate sample size and to control for potential observed confounding.35 We plotted propensity scores to investigate the overlap between patient populations at baseline and examined the balance of all baseline characteristics to determine whether the exposed and unexposed cohorts were imbalanced at baseline and after propensity score matching. Exposed and unexposed cohorts were considered balanced if the standardised difference of means of all covariates after propensity score matching was less than 0.1.36
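For a binary covariate recorded as a proportion, the standardised difference of means divides the difference in proportions by the pooled standard deviation. A minimal sketch of that formula and of the 0.1 balance rule follows; the function names are hypothetical. As a check, the before-matching proportions for age 10–19 in the Pfizer-BioNTech rows of table 1 (4.2% vs 10.8%) give about −0.25, the value reported there.

```python
import math
from typing import Iterable


def std_diff_binary(p_exposed: float, p_unexposed: float) -> float:
    """Standardised difference of means for a binary covariate, using the
    pooled SD sqrt((p1(1 - p1) + p2(1 - p2)) / 2)."""
    pooled_sd = math.sqrt((p_exposed * (1 - p_exposed)
                           + p_unexposed * (1 - p_unexposed)) / 2)
    if pooled_sd == 0:
        return 0.0  # covariate is constant (all 0 or all 1) in both groups
    return (p_exposed - p_unexposed) / pooled_sd


def is_balanced(std_diffs: Iterable[float], threshold: float = 0.1) -> bool:
    """Balance criterion from the text: |standardised difference| < 0.1
    for every covariate."""
    return all(abs(d) < threshold for d in std_diffs)
```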

For negative control calibration, we used 93 negative controls (online supplemental appendix 4) with no known causal relationship with the COVID-19 vaccines. Negative controls were selected based on a review of existing literature, product labels and spontaneous reports and were reviewed by clinicians.37 We assessed residual bias from the negative control estimates.
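The idea behind negative control calibration can be sketched as follows: the negative control log HRs are used to estimate a normal distribution of systematic error, and each new estimate is then tested against that empirical null rather than against a null of exactly zero bias. This is a simplified, hypothetical version: it uses a method-of-moments fit, whereas published implementations such as the OHDSI EmpiricalCalibration R package fit the null distribution by maximum likelihood, and it computes only a calibrated p value, not calibrated CIs.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def fit_null(log_hrs: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of the systematic-error distribution.

    Mean bias mu is the average negative control log HR; the systematic-error
    variance is the observed spread minus the average sampling variance,
    floored at zero. Needs at least two negative controls.
    """
    mu = mean(log_hrs)
    sigma2 = max(variance(log_hrs) - mean([se * se for se in ses]), 0.0)
    return mu, sigma2


def calibrated_p(log_hr: float, se: float, mu: float, sigma2: float) -> float:
    """Two-sided p value of an estimate against the empirical null
    N(mu, sigma2 + se^2) instead of N(0, se^2)."""
    z = (log_hr - mu) / math.sqrt(sigma2 + se * se)
    return math.erfc(abs(z) / math.sqrt(2))
```

With no systematic error (mu = 0, sigma2 = 0) the calibrated p value reduces to the usual Wald p value; a positive mean bias among the negative controls pulls estimates of the same size back towards non-significance.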

Patient and public involvement

No patient was involved.

Results

Patient characteristics

In total, we identified 179 666 patients with at least one dose of a COVID-19 vaccine between January and May 2021: 121 771 for Pfizer-BioNTech, 52 728 for Moderna and 5167 for Janssen (table 1). The sample included patients from all age groups, with or without comorbidities, captured in inpatient and outpatient settings.

Table 1.

Patient baseline characteristics for patients with at least one dose of a COVID-19 vaccine and the unexposed patients, before and after propensity score matching

Columns: characteristic; before matching (vaccinated, unvaccinated, standardised difference); after matching (vaccinated, unvaccinated, standardised difference)
Pfizer-BioNTech COVID-19 vaccine
Patients (n) 121 771 164 997 101 109 101 111
Follow-up (days), median (IQR) 107 (80–137) 104 (71–137) 107 (78–149) 107 (79–140)
COVID-19 diagnosis or positive COVID-19 test (n) 822 1355
Positive COVID-19 test (n) 231 786
Age group (%)
 10–19 4.2 10.8 −0.25 4.8 4.3 0.02
 20–49 37.2 42.6 −0.11 40.3 40.1 0
 50–64 23.9 20.3 0.09 23.6 23.7 0
 65–74 18.8 12.6 0.17 15.8 16.6 −0.02
 75–84 11.3 8.9 0.08 10.6 10.7 0
 >84 4.1 3.8 0.02 4.2 4.1 0.01
Gender (%)
 Female 63.7 57.8 0.12 61.4 62 −0.01
Race (%)
 Asian 3.8 2.6 0.07 3.5 3.4 0.01
 Black or African American 12.4 14.2 −0.05 12.6 12.2 0.01
 White 40.5 35.1 0.11 39.3 39.5 0
Medical history (%)
 Chronic liver disease 0.6 0.6 0 0.5 0.5 0
 Chronic obstructive lung disease 1.3 1 0.02 1 1 0.01
 Dementia 1.2 1.1 0 1.1 1 0.01
 Depressive disorder 5.3 4 0.06 4 3.7 0.02
 Diabetes mellitus 7.1 5.2 0.08 5.7 5.4 0.01
 HIV infection 1.4 1.1 0.03 1.1 1 0
 Hyperlipidaemia 12.9 8.1 0.16 10.2 9.5 0.02
 Hypertensive disorder* 16 11.3 0.14 13.1 12.2 0.03
 Obesity 5.1 4.9 0.01 4.4 4.1 0.02
 Osteoarthritis 7.3 4.7 0.11 5.8 5.3 0.02
 Renal impairment† 3.7 3 0.04 2.9 2.7 0.01
 Cerebrovascular disease 1.7 1.4 0.02 1.5 1.4 0.01
 Heart disease‡ 8.6 7.1 0.06 7.5 7.1 0.02
 Malignant neoplastic disease 5.3 4.5 0.04 4.7 4.3 0.02
Charlson Comorbidity Index, mean (SD) 1.75 (3.18) 1.69 (3.09) −0.01 1.70 (3.11) 1.63 (3.03) −0.01
Influenza vaccination within a year prior 10.9 7.9 0.10 7.5 6.9 0.02
Moderna COVID-19 vaccine
Patients (n) 52 728 148 795 50 517 50 517
Follow-up (days), median (IQR) 127 (102–153) 123 (99–153) 126 (101–153) 126 (102–153)
COVID-19 diagnosis or positive COVID-19 test (n) 382 786
Positive COVID-19 test (n) 94 447
Age group (%)
 10–19 0.5 1.7 −0.12 0.5 0.4 0.01
 20–49 35.7 45.7 −0.20 36.9 37.4 −0.01
 50–64 21.2 23.3 −0.05 21.7 21.4 0.01
 65–74 21.3 14.4 0.18 20.6 20.5 0.00
 75–84 15.4 10 0.16 14.6 14.6 0.00
 >84 5.8 4.8 0.04 5.6 5.6 0.00
Gender (%)
 Female 64.4 58.7 0.12 64.2 64.7 −0.01
Race (%)
 Asian 4.2 2.8 0.07 4.2 4.4 −0.01
 Black or African American 8.7 14.2 −0.17 9 8.4 0.02
 White 48.3 34.4 0.29 46.9 47.9 −0.02
Medical history (%)
 Chronic liver disease 0.5 0.6 −0.02 0.5 0.5 0
 Chronic obstructive lung disease 1.4 1.1 0.02 1.2 1.2 0
 Dementia 1 1.2 −0.02 1 0.9 0.01
 Depressive disorder 4.7 3.9 0.04 4.2 4 0.01
 Diabetes mellitus 6.6 5.6 0.04 6.2 5.8 0.02
 HIV infection 0.9 1.2 −0.03 0.8 0.8 0
 Hyperlipidaemia 14.9 8.9 0.19 13 12.6 0.01
 Hypertensive disorder 16 12.4 0.1 14.7 13.9 0.02
 Obesity 4 4.4 −0.02 3.8 3.6 0.01
 Osteoarthritis 7.7 5.3 0.1 6.8 6.5 0.01
 Renal impairment 3.5 3.3 0.01 3.3 3 0.01
 Cerebrovascular disease 2.2 1.6 0.05 2 1.8 0.02
 Heart disease 10.1 7.6 0.09 9.2 8.7 0.02
 Malignant neoplastic disease 6.5 5 0.07 5.9 5.5 0.02
Charlson Comorbidity Index, mean (SD) 1.62 (2.81) 1.62 (3.00) 0.00 1.59 (2.80) 1.59 (2.99) 0.00
Influenza vaccination within a year prior 8.4 6.3 0.08 7.2 6.8 0.02
Janssen COVID-19 vaccine
Patients (n) 5167 52 643 5031 5031
Follow-up (days), median (IQR) 79 (72–95) 79 (72–95) 79 (72–95) 79 (72–95)
COVID-19 diagnosis or positive COVID-19 test (n) 31 37
Positive COVID-19 test (n) 8 16
Age group (%)
 10–19 0.8 0.8 0.00 0.8 0.8 0.00
 20–49 43.9 43 0.02 44.2 43.9 0.01
 50–64 31.7 31.7 0.00 31.8 31.3 0.01
 65–74 11.6 12.2 −0.02 11.5 12 −0.02
 75–84 7.6 7.9 −0.01 7.2 7.9 −0.03
 >84 4.3 4.3 0.00 4.2 4 0.01
Gender (%)
 Female 63.4 63.2 0.01 63.5 61.1 0.05
Race (%)
 Asian 3.6 1.7 0.12 3.7 3.6 0.01
 Black or African American 15.9 15.5 0.01 15.7 15.5 0
 White 37.4 35.7 0.03 37.4 37.5 0
Medical history (%)
 Chronic liver disease 1.1 0.7 0.05 1 1.2 −0.02
 Chronic obstructive lung disease 2.4 1.3 0.09 2 2.2 −0.01
 Dementia 2.6 1.1 0.11 2.2 2.2 0
 Depressive disorder 8 4.8 0.13 7.1 8 −0.03
 Diabetes mellitus 10.3 6.2 0.15 9.5 10.2 −0.02
 HIV infection 1.7 1.4 0.02 1.6 1.8 −0.01
 Hyperlipidaemia 14.3 10.2 0.13 13.4 14.3 −0.03
 Hypertensive disorder 21.4 13.8 0.2 20.1 21.7 −0.04
 Obesity 7.3 5.9 0.06 6.8 7.8 −0.04
 Osteoarthritis 8.4 6.2 0.08 7.8 8.8 −0.04
 Renal impairment 6.6 3.3 0.15 5.3 5.9 −0.02
 Cerebrovascular disease 2.7 1.7 0.07 2.3 2.4 −0.01
 Heart disease 11.8 8 0.13 10.3 11.7 −0.04
 Malignant neoplastic disease 5 4.9 0 4.8 5.2 −0.02
Charlson Comorbidity Index, mean (SD) 1.84 (3.34) 1.55 (2.96) −0.07 1.56 (3.04) 1.43 (2.79) −0.03
Influenza vaccination within a year prior 12.5 8.0 0.15 10.1 11.4 −0.04

*Hypertensive disorder includes primary and secondary hypertension.

†Renal impairment includes acute and chronic renal failure (prerenal and renal).

‡Heart disease includes cardiac arrhythmias, heart valve disorders, coronary arteriosclerosis, heart failure, cardiomyopathies, etc.

We observed that unexposed patients (table 1) were on average younger and had fewer comorbidities and less exposure to various drugs prior to matching. We were able to achieve balance on all covariates (up to 54 987 covariates, standardised difference of means less than 0.1) with propensity score matching. Figure 1 presents the covariate balance and propensity score balance plots showing that anchoring unvaccinated patients on a date allowed us to achieve better balance compared with anchoring patients on a visit.

Figure 1.

Diagnostics for the effectiveness study comparing the cohort vaccinated with at least one dose of Pfizer, Moderna or Janssen COVID-19 vaccines and unvaccinated cohort anchored on a date or on a visit: (A) covariate balance before and after propensity score matching, (B) preference score balance, and (C) effect of negative control calibration displaying effect estimate and SE. In (A), each dot represents the standardised difference of the means for a single covariate before and after stratification on the propensity score. In (C), each blue dot is a negative control. The area below the dashed line indicates estimates with p<0.05 and the orange area indicates estimates with calibrated p<0.05.

Patients vaccinated with Pfizer-BioNTech had a similar distribution of baseline characteristics compared with the patients vaccinated with Moderna but differed from the patients vaccinated with Janssen. On average, the latter group was younger, had more patients with race recorded as Black and had more comorbidities such as diabetes mellitus or hypertensive disorder (table 1).

When investigating vaccination pathways, we found that 112 963 patients (93% of patients with at least one dose of Pfizer-BioNTech) had two doses of Pfizer-BioNTech and 42 384 (80%) had two doses of Moderna. We found 344 and 291 patients with three doses of the corresponding vaccines, and 440 patients who received mixed combinations of the Pfizer-BioNTech, Moderna and Janssen vaccines.

Within our database, Moderna vaccinations peaked early, in January 2021 (figure 2), while Pfizer-BioNTech and Janssen vaccinations peaked in April. This was reflected in follow-up time: Moderna recipients had longer follow-up on average, with some individuals having up to 5.8 months of observation after the index date.

Figure 2.

Distribution of vaccination month for COVID-19 vaccines. Black dots represent the number of incident COVID-19 cases (defined as a positive test) in each month.

Main week-by-week effectiveness analysis

Figure 3 shows vaccine effectiveness over six 7-day intervals for patients vaccinated with at least one dose of Pfizer-BioNTech or Moderna (160 114 patients) compared with unvaccinated patients (115 689). Due to the small sample size, we were not able to obtain stable week-by-week estimates for Janssen.

Figure 3.

Effectiveness of Pfizer-BioNTech and Moderna vaccines over six 7-day intervals after first dose; % and 95% CI for COVID-19 infection (A) and COVID-19 hospitalisation (B).

While week 1 was characterised by unexpectedly high effectiveness (58%; 95% CI 45% to 69% against COVID-19 infection and 72%; 95% CI 57% to 83% against COVID-19-associated hospitalisation), we observed a plausible pattern of increasing effectiveness from week 2 onward, with effectiveness in week 6 approximating 84% (95% CI 72% to 91%) for COVID-19 infection and 86% (95% CI 69% to 95%) for COVID-19-associated hospitalisation.

We then examined the week 1 COVID-19 infection cases to explain the high effectiveness (figure 4). A chart review of week 1 positive COVID-19 tests revealed that a high proportion of unvaccinated patients sought care for COVID-19 symptoms or COVID-19 exposure (85% in total), compared with only 61% of vaccinated patients. Initial healthcare encounters in the vaccinated population were often related to other medical reasons, such as comorbid conditions or surgeries (39% compared with 14% in the unvaccinated population, online supplemental appendix 5). Moreover, the gap between symptom onset and the initial healthcare encounter was more pronounced in the vaccinated cohort, as these patients attributed their symptoms to temporary vaccine side effects rather than to COVID-19 infection.

Figure 4.

Chart review of COVID-19 cases (defined as a positive COVID-19 test) during week 1, vaccinated and unvaccinated patients.

When looking at the severity of COVID-19 symptoms at the initial encounter during week 1 after the index date, we observed that the unvaccinated cohort had a higher proportion of asymptomatic cases (39% compared with 18% in the vaccinated cohort) while the vaccinated population had more severe or mild cases (34% and 48%, respectively).

Secondary analysis

Because a cohort analysis allows us to construct Kaplan-Meier curves to assess effectiveness over time, we also examined effectiveness during the year after the first dose (online supplemental appendices 6–8). We observed similar trends, with all three vaccines being less effective during the first month after the first dose. After that, Pfizer-BioNTech and Moderna were highly effective against both COVID-19 infection and COVID-19-associated hospitalisation, while the Janssen vaccine exhibited a wide range of effectiveness (online supplemental appendix 9).

The results for fully vaccinated patients, with time at risk starting at full vaccination, matched the results of the clinical trials for the corresponding vaccines (detailed estimates are provided in online supplemental appendices 10 and 11).

Our initial design used a positive COVID-19 test or a diagnostic code as the outcome. On further case examination, we discovered that COVID-19 diagnostic codes in the CUIMC data were in part assigned to patients with negative COVID-19 tests on or immediately after the date of diagnosis. In such cases, the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) code U07.1 ‘Disease caused by Severe acute respiratory syndrome coronavirus 2’ was entered in the system for billing purposes (COVID-19 molecular or antibody tests) or for COVID-19 sequelae. We therefore used a positive COVID-19 test alone as our primary outcome, which led to higher effectiveness for all vaccines compared with using either a positive test or a diagnosis (online supplemental appendix 9).

Finally, exclusion of patients with prior COVID-19 infection in our main analysis resulted in higher effectiveness. Inclusion of patients regardless of their prior COVID-19 status led to a small decrease in observed effectiveness (online supplemental appendix 12) for both COVID-19 infection and hospitalisation in patients vaccinated with Moderna or Janssen.

Discussion

In this retrospective cohort study, we examined the effectiveness of COVID-19 mRNA vaccines over six 7-day intervals after the first dose and confirmed earlier findings of moderate vaccine effectiveness during the first 2 weeks. For week 1 following the first dose, we discovered previously unreported differential biases between vaccinated and unvaccinated populations that resulted in high estimated vaccine effectiveness. Other researchers have suggested that the difference between vaccinated and unvaccinated groups can be mitigated by adjusting for previous healthcare utilisation, such as the number of visits before baseline, comorbidities or prior vaccination behaviour.6 13 24 Nevertheless, the confounding we observed persisted even after controlling for a large number of covariates, including those above.

Vaccination directly influenced patients’ attitudes towards their symptoms, causing a delay in seeking care and raising the symptom severity threshold at which patients sought care or got tested. By contrast, vaccinated patients in other studies had higher rates of testing than unvaccinated patients.20 38 This suggests that patients’ attitudes towards the risk of infection and testing may vary geographically and over time. Similarly, testing frequency may depend on local policies and practices.

In unvaccinated patients, mild COVID-19-related symptoms were the reason to seek care; in vaccinated patients, such cases were mainly captured when patients sought outpatient or inpatient care for other conditions. For example, vaccinated patients could be hospitalised for elective surgery or delivery and test positive for COVID-19 on the day of admission or later. Differential symptom severity has previously been reported for other vaccines39 and may affect any observational study that uses hospitalisation as a surrogate for COVID-19 severity, as it can be hard to identify the main reason for hospitalisation accurately in structured data.

Previous research suggested that vaccinated patients do not show an increase in cases immediately after vaccination because people are unlikely to get vaccinated while sick.9 40 Our review of the week 1 cases adds to the evidence on the ‘healthy vaccinee’ effect by showing that vaccinated patients are more likely to attribute their symptoms to common vaccine side effects and are, therefore, less likely to seek care.

Nevertheless, even when this differential bias is present, the estimates of COVID-19 vaccine effectiveness in subsequent weeks still match the results of the RCTs. This indicates that high effectiveness during week 1 following vaccination does not necessarily undermine the estimates of subsequent vaccine effectiveness. On the other hand, we argue against using vaccine effectiveness estimates from a short period after vaccination as a negative control, because the differences between the groups observed in this study are likely to be time-varying and may diminish over time.41

Our secondary analyses revealed several challenges and potential biases that must be accounted for when conducting vaccine effectiveness studies on observational data. First, we observed that outcome definitions are prone to measurement error, which has not been studied thoroughly. Some published studies used ICD-10 or ICD-10-CM codes to identify COVID-19 outcomes.42–44 We found that the specifics of data capture and billing processes resulted in some patients being assigned COVID-19 diagnosis codes to bill for tests rather than to indicate active disease. Another reason for assigning the code was COVID-19 sequelae, in which case the actual infection could have occurred anywhere from a couple of weeks to 6 months earlier. Some researchers have previously reported a high positive predictive value of ICD-10 diagnostic codes for COVID-19; together with our findings, this suggests that coding practices vary between institutions and that index date misclassification should be scrutinised in each institution participating in an analysis to make valid inferences.45 46

Second, inclusion or exclusion of patients with prior COVID-19 infection influenced estimated effectiveness. We observed that inclusion of patients with prior COVID-19 leads to lower effectiveness for all vaccines regardless of the outcome definition.

Third, an appropriate index event (anchor) must be chosen for the unvaccinated cohort to represent a counterfactual for vaccination.29 47 In our study, we confirmed that an arbitrary date represents a better counterfactual for COVID-19 vaccination than a medical visit, as reflected in propensity score balance and covariate balance. Nevertheless, other institutions may have different vaccination pathways, such as vaccination on discharge, which can make a visit a better counterfactual for vaccination. More generally, completeness of vaccination data capture is a crucial feature that influences the robustness of a study. While CUIMC data ensure complete exposure capture by linking the EHR to the city and state registries, researchers should exercise caution when conducting studies on data sources with unknown vaccination capture.

In general, our findings support the RCTs and previously published postmarketing studies for all three vaccines. The larger sample size for patients vaccinated with COVID-19 mRNA vaccines gave us more power, which resulted in CIs that overlapped with, yet were narrower than, those of the RCTs. On the other hand, our study had fewer patients with the Janssen vaccine, which resulted in wider yet overlapping intervals compared with the Janssen vaccine RCT.1 2 7 Nevertheless, an indirect comparison of these vaccines may not be accurate due to the differences between the populations we observed in our study. First, patients vaccinated with Janssen were substantially different from mRNA patients: on average, they were younger, had a higher proportion of patients with race recorded as Black and had more comorbidities. Therefore, comparative effectiveness studies of Janssen and mRNA vaccines require robust techniques such as large-scale propensity score matching to ensure a valid comparison. Second, while Moderna and Pfizer patients had similar baseline characteristics, the temporal distribution of vaccinations in CUIMC data differed: the Moderna vaccine was administered early in 2021, with a peak in January, while Pfizer vaccination peaked in April. Given the varying baseline COVID-19 prevalence, a comparison of mRNA vaccines requires matching patients on calendar month to account for this potential bias. These vaccines also had different administration pathways in our system. Whereas the Pfizer vaccine was administered at the CUIMC/New York-Presbyterian sites to all patients over a prolonged period, Moderna vaccination was performed elsewhere and recorded for actively observed patients. Such patients were more likely to get tested or receive care outside of our healthcare system.

Limitations

Due to the observational nature of the study, the data sources may not have complete capture of patient conditions, as patients could seek care outside of the hospital system. While our outcome phenotype algorithms may be subject to measurement error, we provided additional analyses with alternative outcome definitions. Exposure misclassification was mitigated by free and readily available COVID-19 testing and COVID-19 vaccination at the CUIMC/New York-Presbyterian sites, as well as by data capture from the New York City and state immunisation registries. Along with the availability of testing, differences in baseline COVID-19 infection rates were mitigated by matching the exposed and unexposed groups on the index date and using the index month as a covariate in the propensity score model. We attempted to address potential differences between exposed and unexposed groups by including a large number of covariates in our propensity score model, such as the number of visits, procedure and drug utilisation, prior vaccination behaviour, race and others. Nevertheless, we did not have data on social interactions or adherence to preventive measures and policies, which could affect the likelihood of COVID-19 infection and testing.

The results of the study may not be generalisable to other countries or settings with different vaccine administration practices and policies. Finally, the study period did not allow us to stratify the results by SARS-CoV-2 variant, which limits the generalisability of our findings to other variants.

Conclusions

Observational data can be used to ascertain vaccine effectiveness if potential biases such as exposure and outcome misclassification are accounted for and an appropriate anchoring event is selected. When analysing vaccine effectiveness, researchers need to scrutinise the data to ensure that the compared groups exhibit similar health-seeking behaviour and are equally likely to be captured in the data, and they should report these findings. For COVID-19 vaccines specifically, using an arbitrary date as the index date in unvaccinated patients provides a better counterfactual for vaccination than anchoring on a healthcare encounter. Effectiveness in the first week(s) after vaccination should be reported, even though unexpectedly low or high effectiveness immediately after vaccination may not invalidate the study findings. Given the differences in temporal trends of vaccine exposure and in baseline characteristics, large-scale direct comparisons of vaccines are needed to examine comparative effectiveness.
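Reporting early-week effectiveness alongside later weeks is straightforward once week-specific hazard ratios (HRs) are available. The sketch below assumes effectiveness is expressed as (1 - HR) x 100, a common convention for cohort studies analysed with Cox models; the function names and the input dictionary are hypothetical, and the HRs in the example are made up for illustration.

```python
def ve_from_hr(hr, hr_lower, hr_upper):
    """Vaccine effectiveness (%) and its 95% CI from a hazard ratio and its 95% CI.

    VE = (1 - HR) * 100. VE falls as HR rises, so the upper HR bound gives
    the lower VE bound and vice versa."""
    return 100 * (1 - hr), 100 * (1 - hr_upper), 100 * (1 - hr_lower)


def weekly_ve_report(weekly_hrs):
    """One line per week, early weeks included, from {week: (hr, lower, upper)}."""
    lines = []
    for week in sorted(weekly_hrs):
        ve, lo, hi = ve_from_hr(*weekly_hrs[week])
        lines.append(f"Week {week}: VE {ve:.0f}% (95% CI {lo:.0f}% to {hi:.0f}%)")
    return lines


# Made-up hazard ratios, for illustration only.
for line in weekly_ve_report({1: (0.5, 0.25, 0.75), 6: (0.2, 0.1, 0.4)}):
    print(line)
```

Printing every week, rather than only the weeks after full vaccination, keeps implausible early estimates visible so that they can be examined as a signal of bias.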

Supplementary Material

Reviewer comments
Author's manuscript

Acknowledgments

We would like to acknowledge Patrick Ryan, an employee of Janssen Research and Development, Titusville, New Jersey, for his thoughtful feedback on the study.

Footnotes

Contributors: GH designed and supervised the study and acts as a guarantor. AO executed the study, interpreted the results and drafted the manuscript. GH and AO reviewed the manuscript, approved the final version and had final responsibility for the decision to submit for publication.

Funding: US National Library of Medicine (R01 LM006910), US Food and Drug Administration CBER BEST Initiative (75F40120D00039).

Competing interests: GH and AO received funding from the US National Institutes of Health (NIH) and the US Food and Drug Administration.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review: Not commissioned; externally peer reviewed.

Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information. Patient-level data cannot be shared without approval from data custodians due to local information governance and data protection regulations.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

The protocol for this research was approved by the Columbia University Institutional Review Board (AAAO7805). The study used deidentified data.

References

1. Sadoff J, Gray G, Vandebosch A, et al. Safety and efficacy of single-dose Ad26.COV2.S vaccine against Covid-19. N Engl J Med 2021;384:2187–201. 10.1056/NEJMoa2101544
2. Baden LR, El Sahly HM, Essink B, et al. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. N Engl J Med 2021;384:403–16. 10.1056/NEJMoa2035389
3. Polack FP, Thomas SJ, Kitchin N, et al. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. N Engl J Med 2020;383:2603–15. 10.1056/NEJMoa2034577
4. Voysey M, Clemens SAC, Madhi SA, et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet 2021;397:99–111. 10.1016/S0140-6736(20)32661-1
5. Thompson MG, Stenehjem E, Grannis S, et al. Effectiveness of Covid-19 vaccines in ambulatory and inpatient care settings. N Engl J Med 2021;385:1355–71. 10.1056/NEJMoa2110362
6. Tartof SY, Slezak JM, Fischer H, et al. Effectiveness of mRNA BNT162b2 COVID-19 vaccine up to 6 months in a large integrated health system in the USA: a retrospective cohort study. Lancet 2021;398:1407–16. 10.1016/S0140-6736(21)02183-8
7. Haas EJ, Angulo FJ, McLaughlin JM, et al. Impact and effectiveness of mRNA BNT162b2 vaccine against SARS-CoV-2 infections and COVID-19 cases, hospitalisations, and deaths following a nationwide vaccination campaign in Israel: an observational study using national surveillance data. Lancet 2021;397:1819–29. 10.1016/S0140-6736(21)00947-8
8. Kissling E, Hooiveld M, Sandonis Martín V, et al. Vaccine effectiveness against symptomatic SARS-CoV-2 infection in adults aged 65 years and older in primary care: I-MOVE-COVID-19 project, Europe, December 2020 to May 2021. Euro Surveill 2021;26. 10.2807/1560-7917.ES.2021.26.29.2100670
9. Dagan N, Barda N, Kepten E, et al. BNT162b2 mRNA Covid-19 vaccine in a nationwide mass vaccination setting. N Engl J Med 2021;384:1412–23. 10.1056/NEJMoa2101765
10. Chemaitelly H, Yassine HM, Benslimane FM, et al. mRNA-1273 COVID-19 vaccine effectiveness against the B.1.1.7 and B.1.351 variants and severe COVID-19 disease in Qatar. Nat Med 2021;27:1614–21. 10.1038/s41591-021-01446-y
11. Lopez Bernal J, Andrews N, Gower C, et al. Effectiveness of Covid-19 vaccines against the B.1.617.2 (delta) variant. N Engl J Med 2021;385:585–94. 10.1056/NEJMoa2108891
12. Bedston S, Akbari A, Jarvis CI, et al. COVID-19 vaccine uptake, effectiveness, and waning in 82,959 health care workers: a national prospective cohort study in Wales. Vaccine 2022;40:1180–9. 10.1016/j.vaccine.2021.11.061
13. Waxman JG, Makov-Assif M, Reis BY, et al. Comparing COVID-19-related hospitalization rates among individuals with infection-induced and vaccine-induced immunity in Israel. Nat Commun 2022;13:2202. 10.1038/s41467-022-29858-5
14. Gazit S, Shlezinger R, Perez G, et al. The incidence of SARS-CoV-2 reinfection in persons with naturally acquired immunity with and without subsequent receipt of a single dose of BNT162b2 vaccine: a retrospective cohort study. Ann Intern Med 2022;175:674–81. 10.7326/M21-4130
15. Feikin DR, Higdon MM, Abu-Raddad LJ, et al. Duration of effectiveness of vaccines against SARS-CoV-2 infection and COVID-19 disease: results of a systematic review and meta-regression. Lancet 2022;399:924–44. 10.1016/S0140-6736(22)00152-0
16. Tartof SY, Slezak JM, Puzniak L, et al. Effectiveness of a third dose of BNT162b2 mRNA COVID-19 vaccine in a large US health system: a retrospective cohort study. Lancet Reg Health Am 2022;9:100198. 10.1016/j.lana.2022.100198
17. Price AM, Olson SM, Newhams MM, et al. BNT162b2 protection against the omicron variant in children and adolescents. N Engl J Med 2022;386:1899–909. 10.1056/NEJMoa2202826
18. Dean NE, Hogan JW, Schnitzer ME. Covid-19 vaccine effectiveness and the test-negative design. N Engl J Med 2021;385:1431–3. 10.1056/NEJMe2113151
19. Polinski JM, Weckstein AR, Batech M, et al. Effectiveness of the single-dose Ad26.COV2.S COVID vaccine [Internet]. Infectious Diseases 2021. Available from: http://medrxiv.org/lookup/doi/ (cited 2021 Sep 23).
20. Ioannidis JPA. Factors influencing estimated effectiveness of COVID-19 vaccines in non-randomised studies. BMJ Evid Based Med 2022:bmjebm-2021-111901. 10.1136/bmjebm-2021-111901
21. Fell DB, Dimitris MC, Hutcheon JA, et al. Guidance for design and analysis of observational studies of fetal and newborn outcomes following COVID-19 vaccination during pregnancy. Vaccine 2021;39:1882–6. 10.1016/j.vaccine.2021.02.070
22. Skowronski D. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine: a letter to the editor. N Engl J Med 2021;384:1576.
23. Tabak YP, Sun X, Brennan TA, et al. Incidence and estimated vaccine effectiveness against symptomatic SARS-CoV-2 infection among persons tested in US retail locations, May 1 to August 7, 2021. JAMA Netw Open 2021;4:e2143346. 10.1001/jamanetworkopen.2021.43346
24. Barda N, Dagan N, Cohen C, et al. Effectiveness of a third dose of the BNT162b2 mRNA COVID-19 vaccine for preventing severe outcomes in Israel: an observational study. Lancet 2021;398:2093–100. 10.1016/S0140-6736(21)02249-2
25. Hall VJ, Foulkes S, Saei A, et al. COVID-19 vaccine coverage in health-care workers in England and effectiveness of BNT162b2 mRNA vaccine against infection (SIREN): a prospective, multicentre, cohort study. Lancet 2021;397:1725–35. 10.1016/S0140-6736(21)00790-X
26. Pilishvili T, Gierke R, Fleming-Dutra KE, et al. Effectiveness of mRNA Covid-19 vaccine among U.S. health care personnel. N Engl J Med 2021;385:e90.
27. OMOP Common Data Model [Internet]. GitHub. https://github.com/OHDSI/CommonDataModel (cited 2020 Feb 11).
28. Ostropolets A, Ryan PB, Schuemie MJ, et al. Characterizing anchoring bias in vaccine comparator selection due to health care utilization with COVID-19 and influenza: observational cohort study. JMIR Public Health Surveill 2022;8:e33099. 10.2196/33099
29. Ostropolets A, Li X, Makadia R, et al. Factors influencing background incidence rate calculation: systematic empirical evaluation across an international network of observational databases. Front Pharmacol 2022;13:814198. 10.3389/fphar.2022.814198
30. Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic data experiments. Int J Epidemiol 2018;47:2005–14. 10.1093/ije/dyy120
31. Fortin SP, Johnston SS, Schuemie MJ. Correction to: applied comparison of large-scale propensity score matching and cardinality matching for causal inference in observational research. BMC Med Res Methodol 2021;21:174. 10.1186/s12874-021-01365-z
32. Schuemie MJ, Ryan PB, Hripcsak G, et al. Improving reproducibility by using high-throughput observational studies with empirical calibration. Philos Trans A Math Phys Eng Sci 2018;376:20170356. 10.1098/rsta.2017.0356
33. Schuemie MJ, Ryan PB, DuMouchel W, et al. Interpreting observational studies: why empirical calibration is needed to correct p-values. Stat Med 2014;33:209–18. 10.1002/sim.5925
34. Suchard MA, Simpson SE, Zorych I, et al. Massive parallelization of serial inference algorithms for a complex generalized linear model. ACM Trans Model Comput Simul 2013;23:1–17. 10.1145/2414416.2414791
35. Schuemie MJ, Ryan PB, Pratt N, et al. Principles of large-scale evidence generation and evaluation across a network of databases (LEGEND). J Am Med Inform Assoc 2020;27:1331–7. 10.1093/jamia/ocaa103
36. Austin PC. Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Commun Stat Simul Comput 2009;38:1228–34. 10.1080/03610910902859574
37. The Knowledge Base Workgroup of the Observational Health Data Sciences and Informatics (OHDSI) collaborative. Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data. J Biomed Semant 2017;8:11.
38. Glasziou P, McCaffery K, Cvejic E, et al. Testing behaviour may bias observational studies of vaccine effectiveness [Internet]. Infectious Diseases 2022. Available from: http://medrxiv.org/lookup/doi/ (cited 2022 May 26).
39. Lewnard JA, Tedijanto C, Cowling BJ, et al. Measurement of vaccine direct effects under the test-negative design. Am J Epidemiol 2018;187:2686–97. 10.1093/aje/kwy163
40. Remschmidt C, Wichmann O, Harder T. Frequency and impact of confounding by indication and healthy vaccinee bias in observational studies assessing influenza vaccine effectiveness: a systematic review. BMC Infect Dis 2015;15:429. 10.1186/s12879-015-1154-y
41. Hitchings MDT, Lewnard JA, Dean NE, et al. Use of recently vaccinated individuals to detect bias in test-negative case–control studies of COVID-19 vaccine effectiveness. Epidemiology 2022 [Epub ahead of print]. https://journals.lww.com/10.1097/EDE.0000000000001484 (cited 2022 May 27).
42. Hadi YB, Thakkar S, Shah-Khan SM, et al. COVID-19 vaccination is safe and effective in patients with inflammatory bowel disease: analysis of a large multi-institutional research network in the United States. Gastroenterology 2021;161:1336–9. 10.1053/j.gastro.2021.06.014
43. Nunes B, Rodrigues AP, Kislaya I, et al. mRNA vaccine effectiveness against COVID-19-related hospitalisations and deaths in older adults: a cohort study based on data linkage of national health registries in Portugal, February to August 2021. Euro Surveill 2021;26. 10.2807/1560-7917.ES.2021.26.38.2100833
44. Wright BJ, Tideman S, Diaz GA, et al. Comparative vaccine effectiveness against severe COVID-19 over time in US hospital administrative data: a case-control study. Lancet Respir Med 2022;10:557–65. 10.1016/S2213-2600(22)00042-X
45. Bodilsen J, Leth S, Nielsen SL, et al. Positive predictive value of ICD-10 diagnosis codes for COVID-19. Clin Epidemiol 2021;13:367–72. 10.2147/CLEP.S309840
46. Lynch KE, Viernes B, Gatsby E, et al. Positive predictive value of COVID-19 ICD-10 diagnosis codes across calendar time and clinical setting. Clin Epidemiol 2021;13:1011–8. 10.2147/CLEP.S335621
47. Ostropolets A, Ryan PB, Schuemie MJ, et al. Differential anchoring effects of vaccination comparator selection: characterizing a potential bias due to healthcare utilization in COVID-19 versus influenza [Internet]. Epidemiology 2021. Available from: http://medrxiv.org/lookup/doi/ (cited 2021 Nov 7).

Supplementary Materials

Supplementary data

bmjopen-2022-061126supp001.pdf (5.4MB, pdf)

Articles from BMJ Open are provided here courtesy of BMJ Publishing Group