Abstract
Background
Large‐scale evaluation of COVID‐19 is likely to rely on the quality of ICD coding. However, little is known about the validity of ICD‐coded COVID‐19 diagnoses.
Objectives
To evaluate the performance of diagnostic codes in detecting COVID‐19 during pregnancy.
Methods
We used data from a national cohort of 78,283 individuals with a pregnancy ending between 11 March 2020 and 31 January 2021 in the OptumLabs® Data Warehouse (OLDW). OLDW is a longitudinal, real‐world data asset with de‐identified administrative claims and electronic health record data. We identified all services with an ICD‐10‐CM diagnostic code of U07.1 and all laboratory claims records for COVID‐19 diagnostic testing. We compared ICD‐coded diagnoses to testing results to estimate positive and negative predictive values (PPV and NPV). To evaluate impact on risk estimation, we estimated risk of adverse pregnancy outcomes by source of exposure information.
Results
Of 78,283 pregnancies, 5644 had a laboratory test result for COVID‐19. Testing was most common among older individuals, Hispanic individuals, those with higher socioeconomic status and those with a diagnosed medical condition or pregnancy complication. Among tested individuals, 52% of COVID‐19 cases were identified through ICD‐coded diagnosis alone, 19% from laboratory test results alone and 29% from both sources. Agreement between ICD‐coded diagnosis and laboratory testing records was high (91%; 95% confidence interval [CI] 90, 92). However, the PPV of ICD‐coded diagnosis was low (36%; 95% CI 33, 39). We observed up to a 50% difference in risk estimates of adverse pregnancy outcomes when exposure was based on laboratory testing results or diagnostic coding alone.
Conclusions
Nearly one‐in‐five COVID‐19 cases would be missed by using ICD‐coded diagnoses alone to identify COVID‐19 during pregnancy. Epidemiological studies exclusively relying on diagnostic coding or laboratory testing results are likely to be affected by exposure misclassification. Research and surveillance should draw upon multiple sources of COVID‐19 diagnostic information.
Keywords: COVID‐19, diagnostic coding, pregnancy, SARS‐CoV‐2, validity
Synopsis
Study question
How well do diagnostic codes measure COVID‐19 during pregnancy?
What’s already known
Many studies examining the perinatal health impacts of COVID‐19 have utilised diagnostic coding from medical records to identify infections. However, the validity of these codes among pregnant patients has not yet been evaluated.
What this study adds
In a large cohort of pregnancies, we found that diagnostic coding without laboratory testing information would miss nearly 20% of COVID‐19 cases, resulting in exposure misclassification. Reliance on either diagnostic coding or laboratory testing data exclusively can bias risk estimates for adverse pregnancy outcomes by up to 50%. To avoid misclassification, epidemiology studies should draw from multiple sources of COVID‐19 information.
1. BACKGROUND
Large‐scale epidemiological studies often draw from existing medical records. These studies commonly use diagnostic coding systems, such as the International Classification of Diseases (ICD) system, to detect conditions of interest. In the context of the ongoing pandemic, large studies are likely to rely on ICD‐coded diagnoses for documenting COVID‐19. Given growing interest in evaluating the effects of COVID‐19 on maternal and infant health, 1 , 2 pregnant persons are one such group where large‐scale evaluation utilising diagnostic coding is likely. However, the validity and reliability of these codes for measuring COVID‐19, particularly among pregnant persons, remain uncertain.
ICD‐10‐CM codes were released in March 2020 for emergency use in order to clinically document diagnoses of COVID‐19. 3 These codes include U07.1 (COVID‐19 with laboratory confirmation, ie virus identified) and U07.2 (COVID‐19 diagnosed clinically or epidemiologically without laboratory confirmation, ie virus not identified). While many countries have adopted both codes, in the United States, only U07.1 was adopted. Guidance on the use of U07.1 is to record only a confirmed diagnosis, defined as one (a) documented by the provider, (b) documented through a positive COVID‐19 test result or (c) documented through a presumptive positive COVID‐19 test result. 3 Analysis of national commercial insurance claims data presents an opportunity to evaluate the accuracy of ICD‐10‐CM codes for two reasons: (1) commercially insured cohorts are closed with extensive information on contact with the medical health system; and (2) since all individuals are insured, financial access to laboratory testing may have less influence on results.
Using national insurance claims data, we aimed to evaluate (a) the ability of ICD‐coded clinical diagnoses to detect COVID‐19 when compared to laboratory testing results; and (b) how reliance on different methods of COVID‐19 detection could impact measurement of associations between COVID‐19 and pregnancy outcomes.
2. METHODS
2.1. Cohort selection
We conducted a claims‐based cohort study using de‐identified administrative claims and electronic health record (EHR) data from the OptumLabs® Data Warehouse (OLDW). 4 The database includes longitudinal health information on patients and enrollees across the United States. Claims data in OLDW include medical and pharmacy claims, laboratory results and enrolment records for commercial enrollees. Pregnancies, pregnancy outcomes and gestational age were identified and estimated from medical claims data using a validated algorithm based on ICD‐9 codes, Current Procedural Terminology (CPT) codes, Healthcare Common Procedure Coding System (HCPCS) codes and ICD‐9 procedure codes, which was modified for use with ICD‐10‐CM codes (Table S1). 5 All pregnant individuals with a date of delivery after 11 March 2020 (ie declaration of pandemic start) were selected for analysis. We excluded ectopic and molar pregnancies, as well as pregnancies where the gestational age was inconsistent with the pregnancy outcome (eg spontaneous abortion at 32 weeks). To avoid right truncation in the cohort and ensure complete capture of medical events during the pregnancy period, we restricted analysis to those with an estimated date of conception before 30 April 2020. COVID‐19 information and pregnancy outcomes were complete in the dataset.
2.2. Variable measurement
We extracted physician, facility and laboratory claims records and EHR data for one year preceding and 30 days following the date of delivery. We identified COVID‐19 diagnoses from diagnosis codes (U07.1) in the physician or facility medical claims records. The presence of the diagnostic code indicated the individual had a COVID‐19 diagnosis, and we considered the absence of such a code as no COVID‐19 diagnosis. COVID‐19 laboratory tests and test results were identified using LOINC codes (Table S2) and free text information in the extracted laboratory claims record. We considered a positive COVID‐19 test result as an indication of COVID‐19 and a negative test result as an indication of no COVID‐19. All types of laboratory tests were considered, including serology, as we assumed a priori that infections at any time during pregnancy were potentially influential; however, we acknowledge that some infections detected through serology may have occurred prior to pregnancy, especially if testing was performed early in pregnancy. An encounter with a date of service within three days of the laboratory test date was considered to be the same medical event. For pregnancies with more than one unique laboratory test result, we included any positive test result. For those with consecutive negative test results, we selected any test result coinciding with a COVID‐19 diagnosis in the medical record. For those with consecutive negative laboratory test results and no clinical diagnosis, we selected one negative result at random.
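The selection rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual code: the `LabTest` record, the `select_test` and `same_medical_event` helpers, and the choice of the earliest qualifying test when several qualify are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LabTest:
    """Hypothetical record for one COVID-19 laboratory test."""
    test_date: date
    positive: bool


def same_medical_event(test_date: date, service_date: date, window_days: int = 3) -> bool:
    """True when an encounter falls within three days of the test date."""
    return abs((test_date - service_date).days) <= window_days


def select_test(tests: List[LabTest], diagnosis_dates: List[date],
                rng: random.Random) -> Optional[LabTest]:
    """Pick one representative test result for a pregnancy."""
    if not tests:
        return None  # untested pregnancy
    positives = [t for t in tests if t.positive]
    if positives:
        # Any positive result counts; taking the earliest is an assumption.
        return min(positives, key=lambda t: t.test_date)
    matched = [t for t in tests
               if any(same_medical_event(t.test_date, d) for d in diagnosis_dates)]
    if matched:
        # All negative: prefer a test coinciding with a U07.1 diagnosis.
        return min(matched, key=lambda t: t.test_date)
    # All negative and no coinciding diagnosis: choose one at random.
    return rng.choice(tests)


tests = [LabTest(date(2020, 5, 1), False), LabTest(date(2020, 6, 10), False)]
chosen = select_test(tests, [date(2020, 6, 12)], random.Random(0))
print(chosen)
```

Passing a seeded `random.Random` keeps the random choice among negative results reproducible across runs.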
Information on maternal age, race/ethnicity, residence, educational attainment and household income was derived from enrolment data and linked data supplied by an external vendor. Race/ethnicity was defined as Black, Hispanic, Asian or White and was assigned by an external vendor based on a structured, rule‐based system that combines analysis of first names, middle names, surnames and surname prefixes and suffixes with geographic reference files. Values were then categorised to comply with data de‐identification requirements. Education was estimated based on the average level of education achieved among residents within the census block group. Household income was derived using public and private consumer data for the street address of the enrollee. We identified pre‐existing health conditions using medical claims records (Table S3).
2.3. Statistical analysis
We estimated the per cent of individuals tested during pregnancy for COVID‐19, overall and among those with a COVID‐19 diagnosis. We compared the rate of testing by sociodemographic and health characteristics using log‐binomial regression models which controlled for calendar week of conception (as a cubic spline term) and estimated gestational age at pregnancy end.
Among those tested for COVID‐19, we estimated the per cent agreement between ICD‐coded diagnoses and laboratory testing results using Cohen's kappa coefficient. We additionally estimated the positive predictive value and negative predictive value of the U07.1 COVID‐19 diagnostic code. We estimated these values overall and by subgroups. To evaluate the influence of increased testing availability and changes in disease prevalence over time, we estimated values during three time periods: March 2020 to May 2020, June 2020 to September 2020 and October 2020 to January 2021. We additionally estimated values by method of laboratory detection (polymerase chain reaction vs. rapid antigen or serology), gestational age at infection (first, second or third trimester) and presence of medical risk factors including chronic medical conditions, pregnancy complications and advanced maternal age.
To evaluate how different sources of information could influence measures of association, we used Cox proportional hazard models treating COVID‐19 as a time‐varying exposure to estimate the risk of adverse birth outcomes associated with prenatal COVID‐19, using three different methods of identifying COVID‐19: laboratory testing only, diagnostic coding only or a combination of both. Models adjusted for maternal age, race/ethnicity, household income, pre‐existing medical conditions and week of conception (cubic spline).
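Treating COVID‐19 as a time‐varying exposure usually means splitting each pregnancy's follow‐up at the date of infection, so that time before infection counts as unexposed. The sketch below shows that data‐preparation step only, in the counting‐process (start, stop) layout that Cox software commonly expects. The function name `split_follow_up`, the use of days since conception as the time scale, and the tuple layout are assumptions for illustration, not details taken from the study.

```python
from typing import List, Optional, Tuple

# (start_day, stop_day, exposed, event); days are counted from conception (assumed).
Interval = Tuple[int, int, int, int]


def split_follow_up(end_day: int, event: bool,
                    infection_day: Optional[int]) -> List[Interval]:
    """Split one pregnancy into unexposed and exposed person-time."""
    if infection_day is None or infection_day >= end_day:
        # Never infected during follow-up: a single unexposed interval.
        return [(0, end_day, 0, int(event))]
    if infection_day <= 0:
        # Infected at or before the start of follow-up: exposed throughout.
        return [(0, end_day, 1, int(event))]
    # Infected during follow-up: the outcome can only occur in the second interval.
    return [(0, infection_day, 0, 0), (infection_day, end_day, 1, int(event))]


# A pregnancy infected on day 100 that ends with the outcome on day 250.
for row in split_follow_up(end_day=250, event=True, infection_day=100):
    print(row)
```

Each source of exposure information (laboratory testing, diagnostic coding, or either) would supply its own `infection_day`, which is how the same cohort can yield three different sets of intervals.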
2.3.1. Missing data
Missing covariate information was imputed using multiple imputation based on bootstrapping and the expectation‐maximisation algorithm, with 50 imputed data sets.
2.4. Ethics approval
Because this study involved analysis of pre‐existing, de‐identified data, it was considered exempt from Institutional Review Board approval.
3. RESULTS
Of the 86,111 pregnancies identified, 78,283 pregnancies had a date of conception before 30 April 2020 and were eligible for inclusion in the analysis (Figure 1); 5644 (7.2%, 95% CI 7.0%, 7.4%) pregnancies had a laboratory test result for COVID‐19. Laboratory testing peaked between June and July 2020 (Figure 2). The majority of laboratory tests (75.4%; n=4253) were performed using RT‐PCR; 1391 (24.6%) laboratory tests were performed by serology or rapid antigen testing. COVID‐19 testing rates increased with maternal age (RR ≥40 years vs. <24 years: 1.37, 95% CI 1.23, 1.51), and testing was more frequent among pregnancies with an asthma diagnosis (RR 1.26, 95% CI 1.18, 1.35), a diagnosed pregnancy complication (RR 1.09, 95% CI 1.04, 1.14) and among pregnancies delivered by caesarean delivery (RR 1.23, 95% CI 1.17, 1.29) (Table 1). Testing was less common among non‐Hispanic Black pregnant individuals (RR 0.91, 95% CI 0.83, 0.99) and those residing in rural areas compared with metropolitan areas (RR 0.58, 95% CI 0.42, 0.73). Among those with a clinical diagnosis of COVID‐19, similar factors were associated with COVID‐19 testing, with the exception of caesarean delivery, age and race/ethnicity.
TABLE 1.
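Characteristic | All pregnancies: number of pregnancies | All pregnancies: tested, n (%) | All pregnancies: aRR a (95% CI) | Pregnancies diagnosed with COVID‐19: number of pregnancies | Pregnancies diagnosed with COVID‐19: tested, n (%) | Pregnancies diagnosed with COVID‐19: aRR a (95% CI) |
---|---|---|---|---|---|---|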
Total | 78,283 | 5644 (7.2) | – | 2515 | 596 (23.7) | – |
Maternal age (years) | ||||||
<24 | 5684 | 345 (6.1) | 1.00 (Reference) | 277 | 62 (22.4) | 1.00 (Reference) |
25–29 | 19,496 | 1325 (6.8) | 1.14 (1.03, 1.26) | 773 | 195 (25.2) | 1.14 (0.89, 1.39) |
30–34 | 29,207 | 2157 (7.4) | 1.27 (1.16, 1.38) | 834 | 198 (23.7) | 1.11 (0.87, 1.36) |
35–39 | 18,348 | 1409 (7.7) | 1.38 (1.27, 1.49) | 481 | 116 (24.1) | 1.15 (0.89, 1.42) |
≥40 | 5548 | 408 (7.3) | 1.37 (1.23, 1.51) | 150 | 25 (16.7) | 0.79 (0.38, 1.21) |
Race/ethnicity | ||||||
White, non‐Hispanic | 48,969 | 3509 (7.2) | 1.00 (Reference) | 1346 | 309 (22.9) | 1.00 (Reference) |
Black, non‐Hispanic | 9205 | 603 (6.5) | 0.91 (0.83, 0.99) | 380 | 83 (21.8) | 0.97 (0.76, 1.18) |
Hispanic | 13,013 | 1043 (8.0) | 1.11 (1.04, 1.17) | 601 | 154 (25.6) | 1.15 (0.98, 1.32) |
Asian | 7096 | 489 (6.9) | 0.97 (0.88, 1.06) | 188 | 50 (26.6) | 1.13 (0.88, 1.39) |
Education | ||||||
≤High school graduate | 16,771 | 1747 (10.4) | 1.00 (Reference) | 699 | 167 (23.9) | 1.00 (Reference) |
Some college | 40,135 | 2714 (6.8) | 0.84 (0.77, 0.91) | 1222 | 284 (23.2) | 0.96 (0.77, 1.15) |
≥College graduate | 21,377 | 1183 (5.5) | 0.82 (0.76, 0.88) | 594 | 145 (24.4) | 0.93 (0.76, 1.11) |
Residence | ||||||
Metropolitan | 70,452 | 5266 (7.5) | 1.00 (Reference) | 2215 | 549 (24.8) | 1.00 (Reference) |
Micropolitan | 4297 | 224 (5.2) | 0.69 (0.56, 0.82) | 164 | 29 (17.7) | 0.66 (0.31, 0.99) |
Small town/Rural | 3534 | 154 (4.4) | 0.58 (0.42, 0.73) | 136 | 18 (13.2) | 0.51 (0.07, 0.95) |
Household income | ||||||
<$40,000 | 15,564 | 1079 (6.9) | 1.00 (Reference) | 568 | 120 (21.1) | 1.00 (Reference) |
$40–74,000 | 17,340 | 1183 (6.8) | 0.99 (0.91, 1.07) | 546 | 116 (21.2) | 0.99 (0.77, 1.22) |
$75–124,000 | 21,575 | 1449 (6.7) | 0.97 (0.89, 1.04) | 681 | 182 (26.7) | 1.27 (1.07, 1.47) |
$125–199,000 | 13,811 | 1067 (7.7) | 1.11 (1.03, 1.19) | 417 | 102 (24.5) | 1.15 (0.91, 1.38) |
≥$200,000 | 9993 | 866 (8.7) | 1.27 (1.18, 1.35) | 303 | 76 (25.1) | 1.20 (0.95, 1.44) |
Medical conditions | ||||||
Any medical condition | 9972 | 829 (8.3) | 1.16 (1.09, 1.23) | 366 | 96 (26.2) | 1.13 (0.94, 1.32) |
Asthma | 5294 | 480 (9.1) | 1.26 (1.18, 1.35) | 192 | 56 (29.2) | 1.27 (1.03, 1.50) |
Hypertension | 3652 | 267 (7.3) | 1.01 (0.89, 1.12) | 138 | 32 (23.2) | 0.96 (0.64, 1.27) |
Immune disorder | 160 | 14 (8.7) | 1.24 (0.75, 1.74) | 13 | <11 | – |
Neurological disorder | 237 | 15 (6.3) | 0.91 (0.42, 1.40) | <11 | <11 | – |
No medical condition | 68,311 | 4815 (7.0) | 1.00 (Reference) | 2149 | 500 (23.3) | 1.00 (Reference) |
Pregnancy complications | ||||||
Any complication | 23,971 | 1801 (7.5) | 1.09 (1.04, 1.14) | 820 | 196 (23.9) | 1.02 (0.87, 1.17) |
Pre‐eclampsia | 3443 | 269 (7.8) | 1.03 (0.91, 1.15) | 117 | 30 (25.6) | 1.03 (0.71, 1.36) |
Gestational diabetes | 8098 | 639 (7.9) | 1.01 (0.93, 1.09) | 279 | 59 (21.1) | 0.88 (0.64, 1.12) |
Bleeding in pregnancy | 4264 | 325 (7.6) | 1.10 (0.99, 1.20) | 149 | 53 (35.6) | 1.60 (1.37, 1.83) |
Haemorrhage in early pregnancy | 13,061 | 951 (7.3) | 1.13 (1.07, 1.20) | 442 | 107 (24.2) | 1.03 (0.85, 1.21) |
Hyperemesis | 3478 | 269 (7.7) | 1.04 (0.92, 1.15) | 118 | 30 (25.4) | 1.07 (0.74, 1.39) |
Antepartum haemorrhage | 4883 | 339 (6.9) | 1.01 (0.91, 1.11) | 190 | 60 (31.6) | 1.40 (1.18, 1.62) |
Postpartum haemorrhage | 3835 | 285 (7.4) | 0.96 (0.85, 1.07) | 150 | 31 (20.7) | 0.83 (0.50, 1.15) |
No complications | 54,312 | 3843 (7.1) | 1.00 (Reference) | 1695 | 400 (23.6) | 1.00 (Reference) |
Birth outcome | ||||||
Miscarriage | 7366 | 320 (4.3) | 1.35 (1.16, 1.55) | 80 | <11 | – |
Medical termination | 1187 | 22 (1.9) | 0.49 (0.06, 0.93) | 15 | <11 | – |
Stillbirth | 401 | 24 (6.0) | 0.87 (0.49, 1.26) | 14 | <11 | – |
Preterm birth | 4630 | 300 (6.5) | 0.88 (0.76, 1.00) | 190 | 41 (21.6) | 1.05 (0.80, 1.30) |
Spontaneous | 3526 | 224 (6.3) | 0.85 (0.72, 0.99) | 134 | 26 (19.4) | 0.78 (0.43, 1.13) |
Clinician‐initiated | 1104 | 76 (6.9) | 0.96 (0.73, 1.18) | 56 | 25 (44.6) | 1.87 (1.53, 2.22) |
Premature rupture of membranes | 8716 | 587 (6.7) | 0.88 (0.79, 0.96) | 277 | 60 (21.7) | 0.87 (0.64, 1.11) |
Caesarean delivery | 12,286 | 1087 (8.8) | 1.23 (1.17, 1.29) | 405 | 98 (24.2) | 0.99 (0.80, 1.18) |
a Relative rate of testing, controlling for timing of conception (cubic spline) and estimated gestational age at pregnancy end.
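As a quick plausibility check on Table 1, crude (unadjusted) rate ratios can be computed directly from the counts in the table. The sketch below is illustrative only: the `risk_ratio` helper is hypothetical, and the crude ratios ignore the adjustment for timing of conception and gestational age that the reported aRRs include, so agreement with the adjusted values is not guaranteed in general.

```python
def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Crude ratio of the testing proportion in group A to that in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Small town/rural vs. metropolitan residence (counts from Table 1).
rr_rural = risk_ratio(154, 3534, 5266, 70452)

# Black, non-Hispanic vs. White, non-Hispanic (counts from Table 1).
rr_black = risk_ratio(603, 9205, 3509, 48969)

print(f"Rural vs. metropolitan: {rr_rural:.2f}")
print(f"Black vs. White, non-Hispanic: {rr_black:.2f}")
```

For these two comparisons the crude ratios round to the same values as the adjusted estimates (0.58 and 0.91), which suggests the adjustment had little effect on them.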
Of the 5644 pregnant individuals with information on COVID‐19 testing, 736 had a record of COVID‐19: 380 (51.6%) had only a COVID‐19 diagnosis, 140 (19.0%) had only a positive laboratory test, and 216 (29.3%) had both a positive test and a COVID‐19 diagnosis code (Figure 3). The remaining 4908 pregnant individuals tested negative for COVID‐19 and had no clinical diagnosis of COVID‐19. Agreement between laboratory testing and diagnostic coding was high (90.8%; 95% CI 90.0, 91.5). When compared to laboratory testing information, clinical diagnoses of COVID‐19 had a positive predictive value of 36.2% (95% CI 33.3, 39.3) and a negative predictive value of 97.2% (95% CI 96.9, 97.5) (Table 2). Positive predictive values increased after May 2020 and, across trimesters, were highest for infections occurring during the third trimester (42.1%, 95% CI 38.1, 46.3). We observed no differences in positive predictive values by medical risk factors.
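The overall accuracy measures follow from the four counts reported above, with the laboratory result as the reference standard: 216 with both a diagnosis and a positive test, 380 with a diagnosis only, 140 with a positive test only, and 4908 with neither. The sketch below recomputes the point estimates; the function name `diagnostic_accuracy` is hypothetical, confidence intervals are omitted because the interval method is not stated, and the kappa value it prints is our own calculation rather than a figure reported in the study.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy measures for a 2x2 table (diagnosis code vs. laboratory result)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "agreement": observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


metrics = diagnostic_accuracy(tp=216, fp=380, fn=140, tn=4908)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The first five values match the overall row of Table 2 once expressed as percentages (90.8, 60.7, 92.8, 36.2 and 97.2). The low PPV reflects the 380 diagnosed individuals whose laboratory result was negative.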
TABLE 2.
Subgroup | Tested positive (n = 356), % | Diagnosis (n = 596), % | Agreement (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
---|---|---|---|---|---|---|---|
Overall | 6.6 | 10.5 | 90.8 (90.0, 91.5) | 60.7 (55.4, 65.8) | 92.8 (92.1, 93.5) | 36.2 (33.3, 39.3) | 97.2 (96.9, 97.5) |
By month of laboratory test | |||||||
Mar 2020–May 2020 | <3.1 a | 10.0 | 89.6 (87.5, 91.5) | 40.0 (19.1, 63.9) | 90.7 (88.6, 92.5) | 8.3 (4.9, 13.9) | 98.6 (98.0, 99.0) |
Jun 2020–Sept 2020 | 7.1 | 9.9 | 91.3 (90.4, 92.2) | 58.7 (52.5, 64.8) | 93.8 (93.0, 94.6) | 41.9 (37.9, 46.1) | 96.7 (96.3, 97.2) |
Oct 2020–Jan 2021 | 7.6 | 13.4 | 89.9 (87.9, 91.7) | 72.1 (60.9, 81.7) | 91.4 (89.4, 93.1) | 40.7 (34.9, 46.8) | 97.6 (96.5, 98.3) |
By laboratory testing method | |||||||
RT‐PCR | 8.1 | 11.0 | 90.8 (89.9, 91.7) | 61.2 (55.8, 66.4) | 93.4 (92.6, 94.2) | 44.9 (41.3, 48.5) | 96.5 (96.0, 96.9) |
Rapid antigen or serology | <3.1 a | 9.2 | 90.7 (89.1, 92.2) | 46.1 (19.2, 74.9) | 91.1 (89.5, 92.6) | 4.7 (2.6, 8.3) | 99.5 (99.1, 99.7) |
By gestation at time of laboratory test | |||||||
First trimester | 7.8 | 14.6 | 86.9 (83.8, 89.7) | 59.5 (43.3, 74.4) | 89.3 (86.2, 91.8) | 32.1 (24.8, 40.3) | 96.3 (94.7, 97.4) |
Second trimester | 6.2 | 10.8 | 89.1 (87.5, 90.5) | 49.1 (39.5, 58.7) | 91.7 (90.3, 93.0) | 28.2 (23.5, 33.5) | 96.5 (95.8, 97.0) |
Third trimester | 6.1 | 10.1 | 92.3 (91.4, 93.2) | 67.3 (60.4, 73.7) | 94.0 (93.1, 94.8) | 42.1 (38.1, 46.3) | 97.8 (97.3, 98.2) |
By health and medical factors | |||||||
High‐risk medical condition | |||||||
Yes | 6.4 | 11.6 | 89.0 (86.7, 91.1) | 54.7 (40.5, 68.4) | 91.4 (89.2, 93.3) | 30.2 (23.6, 37.7) | 96.7 (95.6, 97.5) |
No | 6.3 | 10.4 | 91.1 (90.3, 91.9) | 61.7 (56.0, 67.2) | 93.1 (92.3, 93.8) | 37.4 (34.2, 40.7) | 97.3 (96.9, 97.7) |
Diagnosis of a pregnancy complication | |||||||
Yes | 5.6 | 10.9 | 91.1 (89.7, 92.3) | 67.3 (57.3, 76.3) | 92.5 (91.1, 93.7) | 34.7 (30.0, 39.7) | 97.9 (97.3, 98.4) |
No | 6.6 | 10.4 | 90.7 (89.7, 91.6) | 58.0 (51.7, 64.2) | 93.0 (92.1, 93.8) | 37.0 (33.4, 40.8) | 96.9 (96.4, 97.3) |
Advanced maternal age | |||||||
Yes | 4.5 | 7.8 | 93.1 (91.8, 94.2) | 59.3 (47.8, 70.1) | 94.6 (93.5, 95.7) | 34.0 (28.3, 40.3) | 98.0 (97.5, 98.5) |
No | 7.2 | 11.9 | 89.7 (88.7, 90.7) | 61.1 (55.1, 66.9) | 91.9 (91.0, 92.8) | 36.9 (33.6, 40.4) | 96.8 (96.3, 97.3) |
Abbreviations: CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value.
a Exact percentages are suppressed to comply with requirements for data release.
Based on the two sources of information, although the incidence proportion of COVID‐19 would be higher if relying on laboratory testing alone (6.3%, 95% CI 5.7, 7.0) compared with diagnostic coding alone (3.2%, 95% CI 3.1, 3.3), the proportionate severity would be similar (Table 3). For measures of association, the direction was the same for the outcomes considered; however, the strength of association was not. For example, there was a weaker association between COVID‐19 and caesarean delivery when exposure was measured by laboratory testing alone (aHR 1.32, 95% CI 0.95, 1.84) compared with diagnostic coding alone (aHR 2.04, 95% CI 1.74, 2.39) (Table 3). We observed a stronger association between COVID‐19 and preterm birth when exposure was measured by laboratory testing alone (aHR 2.41, 95% CI 1.53, 3.79) compared with diagnostic coding alone (aHR 1.87, 95% CI 1.41, 2.47). Effect estimates for all other outcomes were within 0.23 units of each other.
TABLE 3.
Measure, by source of COVID‐19 information | Laboratory test result (n = 5644) | Diagnostic coding (n = 78,283) | Either laboratory testing or diagnostic coding (n = 78,283) |
---|---|---|---|
Number of COVID−19 cases | 356 | 2515 | 2655 |
Incidence proportion, % (95% CI) | 6.3 (5.7, 7.0) | 3.2 (3.1, 3.3) | 3.4 (3.3, 3.5) |
Proportionate severity, % (95% CI) a | 3.2 (1.9, 5.5) | 3.6 (2.9, 4.4) | 3.4 (2.8, 4.2) |
Association with birth outcome (aHR b [95% CI]) | |||
Preterm birth | 2.41 (1.53, 3.79) | 1.87 (1.41, 2.47) | 2.07 (1.65, 2.61) |
Premature rupture of membranes | 1.36 (0.84, 2.20) | 1.59 (1.30, 1.95) | 1.54 (1.27, 1.87) |
Spontaneous preterm labour | 1.97 (1.15, 3.38) | 1.87 (1.41, 2.47) | 1.80 (1.38, 2.35) |
Caesarean delivery | 1.32 (0.95, 1.84) | 2.04 (1.74, 2.39) | 1.99 (1.71, 2.31) |
Foetal growth restriction | 2.13 (1.41, 3.24) | 2.15 (1.80, 2.57) | 2.04 (1.72, 2.43) |
Abbreviation: CI, confidence interval.
a Proportionate severity indicates the proportion of COVID‐19 cases that were considered severe (ie required admission to intensive care unit, mechanical ventilation or extracorporeal membrane oxygenation, or diagnosis of acute respiratory distress syndrome).
b Hazard ratios accounting for COVID‐19 infection as a time‐varying exposure and adjusting for maternal age, race/ethnicity, household income, pre‐existing medical conditions and week of conception (cubic spline).
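The incidence proportions in Table 3, and the size of the gap between hazard ratios, can be recomputed from the reported figures. The sketch below is illustrative: the dictionary keys and the `relative_difference` helper are hypothetical, and expressing the gap relative to the laboratory‐based estimate is an assumption, since the study does not state how its "up to 50%" figure was derived.

```python
# (cases, denominator) for each source of COVID-19 information, from Table 3.
cases = {
    "laboratory": (356, 5644),
    "diagnostic_coding": (2515, 78283),
    "either": (2655, 78283),
}
incidence = {source: 100 * c / n for source, (c, n) in cases.items()}

# Cases from either source = coded + test-positive - found by both (216).
either_cases = 2515 + 356 - 216


def relative_difference(estimate: float, reference: float) -> float:
    """Per cent difference of an estimate relative to a reference value."""
    return 100 * (estimate - reference) / reference


# Caesarean delivery: aHR 2.04 (diagnostic coding) vs. 1.32 (laboratory testing).
caesarean_gap = relative_difference(2.04, 1.32)

for source, value in incidence.items():
    print(f"{source}: {value:.1f}%")
print(f"Caesarean aHR gap: {caesarean_gap:.0f}%")
```

Under this definition the caesarean estimates differ by a little over 50%, which is the largest gap among the outcomes in Table 3.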
4. COMMENT
4.1. Principal findings
This large, national study comparing clinical diagnoses to COVID‐19 laboratory testing results provides important insights into the quality of existing data for measuring COVID‐19 infections during pregnancy. In this commercially insured cohort, we found that the rate of COVID‐19 testing did not appear to be random and varied by sociodemographic and health factors. Nearly one‐in‐five COVID‐19 cases would be missed by ICD‐coded diagnoses alone, and 64% of tested individuals with a COVID‐19 diagnosis had a negative laboratory result. These results imply that reliance on diagnostic coding or laboratory testing results alone from medical records will result in misclassification of disease status and, depending on which measurement is applied, could change measures of association with perinatal health outcomes. Epidemiological studies relying on existing health data should be aware of these considerations and how they may influence their study findings.
4.2. Strengths of the study
Our study had several key strengths. This study drew from a large national cohort of commercially insured US pregnant individuals with comprehensive health and medical information. Because the population was commercially insured, we could perform our analyses in a setting where access to clinical testing was not limited. Furthermore, in this population, clinical information from multiple sources can be readily pooled.
4.3. Limitations of the data
Despite these strengths, our study had several limitations to consider. First, the reliance on commercial insurance claims records, while useful, meant that our findings may not necessarily generalise to uninsured or publicly insured populations. Second, similar to evaluations of influenza, we opted to use laboratory testing results as the ‘gold standard’ for our comparisons. However, clinical guidance allows for diagnosis of COVID‐19 in the absence of a positive laboratory test if there are clinical observations or epidemiological reasoning to indicate COVID‐19 infection. As a result, it is possible that several of our test‐negative individuals were false negatives, which would have biased our estimates of positive predictive values.
4.4. Interpretation
COVID‐19 diagnoses are relatively novel, and few studies have evaluated their validity. A recent Danish study conducted among 710 diagnosed patients with COVID‐19 between February and May 2020 showed that compared with medical record review, ICD‐coded COVID‐19 diagnoses had a positive predictive value of 99%. 6 An important distinction between this study and ours is that for our comparison of diagnoses and laboratory testing information, we restricted our sample to tested individuals to allow comparison. Medical chart review of those who test negative but have a clinical diagnosis would be helpful for elucidating reasons for diagnosis, but was not possible in the current study. A cohort study of 2201 individuals tested for COVID‐19 at the University of Utah Medical Center showed that ICD codes for fever, cough and dyspnoea had low sensitivity and negative predictive values. 7 However, this study did not evaluate the validity of COVID‐19 diagnostic codes. A single‐centre validation study among 3905 paediatric patients indicated U07.1 diagnosis had a sensitivity, specificity, positive predictive value and negative predictive value exceeding 90%. 8 Our findings do not align with these previous studies, which could be due to differences in patient population, clinical behaviours or a combination of both.
Our findings suggest existing data may be prone to several biases. First, in this insured cohort of pregnant individuals, although individuals would have had access to clinical testing, the rate of testing was not independent of sociodemographic and health factors. Given these factors have also been associated with more serious COVID‐19 infection, 9 this could indicate that detection of COVID‐19 is more common among those more prone to severe infection. We believe these results show that observational studies should consider the sociodemographic factors correlated with testing when using large observational databases to perform COVID‐19 research. Second, 64% of those with a clinical diagnosis had a negative COVID‐19 test. Clinical guidance is consistent with giving a diagnosis in the absence of a positive COVID‐19 test if clinical symptoms indicate infection. Furthermore, false‐negative rates range from 21% to 67%, depending on the duration of illness at the time of testing. 10 , 11 , 12 As a result, without medical chart review, it is difficult to say whether these diagnoses reflect true COVID‐19 cases. Since we relied on clinical records, our study would not have detected COVID‐19 among asymptomatic individuals who did not present for medical care or routine screening. As a result, it is possible that we underestimated the true incidence of COVID‐19 among pregnant persons.
Thirty‐nine per cent of COVID‐19 cases detected through laboratory testing had no diagnosis of COVID‐19, indicating that reliance on coded illnesses alone would miss these infections, including asymptomatic infections. Our analyses of how different sources of COVID‐19 information may influence effect estimates highlight that the use of laboratory testing information only or clinical diagnostic information only could result in somewhat different effect estimates for several key outcomes. We hypothesise, therefore, that differences in effect estimates across published studies 9 may be somewhat attributable to the methods used to measure COVID‐19. However, our results did not show consistently weaker or stronger effect estimates when clinical diagnosis was the sole means of identifying infection, suggesting that additional factors may contribute to variation in these estimates.
5. CONCLUSIONS
We believe our results highlight the importance of considering the source of information on COVID‐19 infection among pregnant individuals and how different sources of information may influence effect estimations. Regardless, utilisation of one data source alone is likely to miss a substantial portion of cases, resulting in measurement error and misclassification bias. It is important for epidemiological evaluations to consider these limitations when conducting research and interpreting findings.
CONFLICT OF INTERESTS
The authors have no conflicts of interest to disclose.
AUTHOR CONTRIBUTIONS
AKR extracted and prepared data and oversaw all aspects of project implementation. OA and SS contributed to the study design. OA and SS contributed to the development of the analytic plan. AKR performed the statistical analyses and prepared study results. AKR, OA and SS contributed to the interpretation of findings. AKR led the drafting of the report, and all co‐authors contributed to revising of the report and approved the final version.
ACKNOWLEDGEMENTS
The authors would like to acknowledge Nina Veeravalli, Christine Kha, and Rick Little at OptumLabs for their technical assistance and guidance in working with the data from the OptumLabs Data Warehouse. Open access publishing facilitated by Curtin University, as part of the Wiley ‐ Curtin University agreement via the Council of Australian University Librarians.
Regan AK, Arah OA, Sullivan SG. Performance of diagnostic coding and laboratory testing results to measure COVID‐19 during pregnancy and associations with pregnancy outcomes. Paediatr Perinat Epidemiol. 2022;36:508–517. doi: 10.1111/ppe.12863
Funding information
Access to study data was financially supported by OptumLabs® and University of California Los Angeles. The WHO Collaborating Centre for Reference and Research on Influenza is supported by the Australian Government Department of Health. The funder had no influence in the design or implementation of the study or the decision to publish the study findings.
DATA AVAILABILITY STATEMENT
During the conduct of the study, the first author (AKR) had full access to the study data and takes responsibility for the integrity of the data and the accuracy of the data analysis; however, the authors do not have ongoing access to the data analysed in this study, nor do they have permission to share the study data with other researchers.
REFERENCES
- 1. Zambrano LD, Ellington S, Strid P, et al. Update: characteristics of symptomatic women of reproductive age with laboratory‐confirmed SARS‐CoV‐2 infection by pregnancy status ‐ United States, January 22‐October 3, 2020. MMWR Morb Mortal Wkly Rep. 2020;69:1641‐1647.
- 2. Villar J, Ariff S, Gunier RB, et al. Maternal and neonatal morbidity and mortality among pregnant women with and without COVID‐19 infection: the INTERCOVID multinational cohort study. JAMA Pediatr. 2021;175(8):817.
- 3. CDC. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD‐10‐CM). 2021 [cited 2021 July 28]. https://www.cdc.gov/nchs/icd/icd10cm.htm
- 4. OptumLabs. OptumLabs and OptumLabs Data Warehouse (OLDW) Descriptions and Citation. July 2020.
- 5. Ailes EC, Simeone RM, Dawson AL, Petersen EE, Gilboa SM. Using insurance claims data to identify and estimate critical periods in pregnancy: an application to antidepressants. Birth Defects Res A Clin Mol Teratol. 2016;106:927‐934.
- 6. Bodilsen J, Leth S, Nielsen SL, Holler JG, Benfield T, Omland LH. Positive predictive value of ICD‐10 diagnosis codes for COVID‐19. Clin Epidemiol. 2021;13:367‐372.
- 7. Crabb BT, Lyons A, Bale M, et al. Comparison of international classification of diseases and related health problems, tenth revision codes with electronic medical records among patients with symptoms of coronavirus disease 2019. JAMA Netw Open. 2020;3:e2017703.
- 8. Blatz AM, David MZ, Otto WR, Luan X, Gerber JS. Validation of international classification of disease‐10 code for identifying children hospitalized with coronavirus disease‐2019. J Pediatr Infect Dis. 2021;10:547‐548.
- 9. Allotey J, Stallings E, Bonet M, et al. Clinical manifestations, risk factors, and maternal and perinatal outcomes of coronavirus disease 2019 in pregnancy: living systematic review and meta‐analysis. BMJ. 2020;370:m3320.
- 10. Kucirka LM, Lauer SA, Laeyendecker O, Boon D, Lessler J. Variation in false‐negative rate of reverse transcriptase polymerase chain reaction‐based SARS‐CoV‐2 tests by time since exposure. Ann Intern Med. 2020;173:262‐267.
- 11. Woloshin S, Patel N, Kesselheim AS. False negative tests for SARS‐CoV‐2 Infection — challenges and implications. New Engl J Med. 2020;383:e38.
- 12. Arevalo‐Rodriguez I, Buitrago‐Garcia D, Simancas‐Racines D, et al. False‐negative results of initial RT‐PCR assays for COVID‐19: a systematic review. PLoS One. 2020;15:e0242958.