PLoS One. 2021 Jan 7;16(1):e0244746. doi: 10.1371/journal.pone.0244746

Validating International Classification of Disease 10th Revision algorithms for identifying influenza and respiratory syncytial virus hospitalizations

Mackenzie A Hamilton 1,2, Andrew Calzavara 1, Scott D Emerson 1, Mohamed Djebli 1,2, Maria E Sundaram 1, Adrienne K Chan 2,3,4, Rafal Kustra 2, Stefan D Baral 5, Sharmistha Mishra 3,6,7,8, Jeffrey C Kwong 1,2,9,10,11,12,*
Editor: Judith Katzenellenbogen
PMCID: PMC7790248  PMID: 33411792

Abstract

Objective

Routinely collected health administrative data can be used to efficiently assess disease burden in large populations, but it is important to evaluate the validity of these data. The objective of this study was to develop and validate International Classification of Disease 10th revision (ICD-10) algorithms that identify laboratory-confirmed influenza or laboratory-confirmed respiratory syncytial virus (RSV) hospitalizations using population-based health administrative data from Ontario, Canada.

Study design and setting

Influenza and RSV laboratory data from the 2014–15, 2015–16, 2016–17 and 2017–18 respiratory virus seasons were obtained from the Ontario Laboratories Information System (OLIS) and were linked to hospital discharge abstract data to generate influenza and RSV reference cohorts. These reference cohorts were used to assess the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the ICD-10 algorithms. To minimize misclassification in future studies, we prioritized specificity and PPV in selecting top-performing algorithms.

Results

83,638 and 61,117 hospitalized patients were included in the influenza and RSV reference cohorts, respectively. The best influenza algorithm had a sensitivity of 73% (95% CI 72% to 74%), specificity of 99% (95% CI 99% to 99%), PPV of 94% (95% CI 94% to 95%), and NPV of 94% (95% CI 94% to 95%). The best RSV algorithm had a sensitivity of 69% (95% CI 68% to 70%), specificity of 99% (95% CI 99% to 99%), PPV of 91% (95% CI 90% to 91%) and NPV of 97% (95% CI 97% to 97%).

Conclusion

We identified two highly specific algorithms that best ascertain patients hospitalized with influenza or RSV. These algorithms may be applied to hospitalized patients if data on laboratory tests are not available, and will thereby improve the power of future epidemiologic studies of influenza, RSV, and potentially other severe acute respiratory infections.

Introduction

Routinely collected health administrative data are increasingly being used to assess disease burden and aetiology [1,2]. Algorithms applied to International Classification of Disease (ICD) codes documented in hospital discharge abstracts can be used to identify cases of a disease for the purposes of disease surveillance, but it is imperative to evaluate the validity of such algorithms to limit misclassification bias in epidemiologic studies.

While several studies have assessed the validity of ICD codes for identifying influenza and respiratory syncytial virus (RSV) within health administrative data [1–8], many of those studies had limitations. Some studies could only examine correlative patterns between true cases and ICD-coded cases at an aggregate level because they could not link data at the individual level [2,3,5,6]. Without individual-level data, there remains the risk of misclassification of individual cases, as well as challenges in characterizing the sensitivity, specificity, and predictive values of these algorithms. When individual-level data were available and validity parameters were reported, studies were generally limited by one or more of: small numbers of study centres, restricted participant age ranges, or inclusion of few respiratory virus seasons [1,4,7,8]. Consequently, the generalizability of these algorithms is uncertain.

The objective of this study was to develop and validate more generalizable ICD, 10th revision (ICD-10) case-finding algorithms to identify patients hospitalized with laboratory-confirmed influenza or laboratory-confirmed RSV using population-based health administrative data from Ontario, Canada.

Methods

Ethical considerations

This study used laboratory and health administrative data from Ontario, Canada (population 13.5 million in 2016) housed at ICES. ICES is a prescribed entity under section 45 of Ontario’s Personal Health Information Protection Act (PHIPA). Section 45 authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system. Projects conducted under section 45, by definition, do not require review by a Research Ethics Board. This project was conducted under section 45, and was approved by ICES’ Privacy and Legal Office.

Data sources

Ontario’s universal healthcare system captures virtually all healthcare interactions. To identify eligible patients for this study, we used data from the Ontario Laboratories Information System (OLIS), the Canadian Institute for Health Information’s Discharge Abstract Database (CIHI-DAD), and the Registered Persons Database (RPDB). These datasets were linked using unique encoded identifiers and analyzed at ICES.

OLIS is an electronic repository of Ontario’s laboratory test results, containing information on laboratory orders, patient demographics, provider information, and test results. The system captures data from hospital, commercial, and public health laboratories participating in OLIS. OLIS excludes: tests performed for purposes other than providing direct care to patients; tests that are ordered for out-of-province patients or providers; and tests for patients with health cards that are recorded as lost, stolen, expired, or invalid.

Implemented in 1988, CIHI-DAD captures administrative, clinical, and demographic information on all hospital discharges. Following a patient’s discharge from hospital, a trained medical coder assigns up to 25 ICD-10 diagnosis codes to the medical record (1 “most responsible” diagnosis code and up to 24 additional diagnosis codes), all of which are recorded in CIHI-DAD.

The RPDB provides basic demographic information on all individuals who have ever had provincial health insurance, including birth date, sex and postal code of residence. Ontario health insurance eligibility criteria are summarized in Table A of the S1 Appendix.

Generating influenza and RSV reference standard cohorts

Influenza and RSV polymerase chain reaction (PCR) laboratory data were obtained from OLIS for four respiratory virus seasons (2014–15, 2015–16, 2016–17, and 2017–18). This time frame was selected to include as many seasons as possible during a period when a relatively high and stable proportion of laboratories were reporting to OLIS. Respiratory virus seasons were defined as the most inclusive time frames that would capture influenza and RSV seasonal activity in Ontario between the 2014–15 and 2017–18 seasons, according to data from Public Health Ontario’s Respiratory Pathogen Bulletin [9]. Accordingly, influenza tests were collected from November to May and RSV tests from November to April. Only one test per person per season was included in the reference cohort: if an individual was tested multiple times in a season, we included the first positive test, or the first negative test if all tests were negative. Tests were excluded if they were linked to an individual who was missing information on birth date, sex, or postal code in the RPDB; was not eligible for provincial health insurance or resided out of province according to the RPDB; or had a death date registered before the specimen collection date.
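
To make the test-selection rule concrete, the following minimal sketch (in Python; the study’s analyses were performed in SAS, see below) applies it to an assumed table of test records. The table layout and column names (person_id, season, collected, result) are illustrative assumptions, not the study’s actual data structures.

```python
import pandas as pd

# Hedged sketch of the one-test-per-person-per-season rule described above:
# keep the first positive test, or the first negative test if all of a
# person's tests in the season were negative. Column names are assumptions.
def select_reference_tests(tests: pd.DataFrame) -> pd.DataFrame:
    tests = tests.sort_values("collected")  # order tests by collection date

    def pick_one(group: pd.DataFrame) -> pd.Series:
        positives = group[group["result"] == "positive"]
        return positives.iloc[0] if len(positives) else group.iloc[0]

    return (tests.groupby(["person_id", "season"], group_keys=False)
                 .apply(pick_one))
```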

Laboratory data were then linked to CIHI-DAD hospitalization data using patients’ unique encoded identifiers. Only patients with suspected community-acquired infections, defined by specimen collection within 3 days before or after hospital admission, were included in the analysis; this definition ensured that reference hospitalizations were more likely to be associated with community-acquired influenza or RSV infection. Individuals with suspected nosocomial infections, defined as hospitalizations with specimens collected more than 72 hours after admission [10], were excluded from the reference cohorts for the respective season. Overall, the “true positive” influenza and RSV reference cohorts comprised all hospitalized patients who tested positive for influenza or RSV, respectively, by PCR within 3 days of admission, and the “true negative” cohorts comprised all hospitalized patients who tested negative within the same window.
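
The linkage rules above reduce to a per-episode classification on three dates. A minimal sketch follows; the field names and the "unlinked" label are our own, chosen for illustration rather than taken from the study’s datasets.

```python
from datetime import date

# Hedged sketch of the episode classification described above.
def classify_episode(collected: date, admitted: date, discharged: date) -> str:
    if abs((collected - admitted).days) <= 3:
        return "community-acquired"  # specimen within 3 days of admission
    if admitted < collected <= discharged:
        return "nosocomial"          # >72 hours post admission, pre-discharge
    return "unlinked"                # not attributable to this hospitalization
```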

Statistical analysis

The reference cohorts were used to assess the validity of influenza and RSV case-finding algorithms. Algorithms were defined according to combinations of ICD-10 codes that have been previously described in the literature [1,4,5] (see Table B in S1 Appendix for the detailed list of ICD-10 codes). In brief, algorithms included virus-specific ICD-10 codes alone (influenza: J09, J10.0, J10.1, J10.8; RSV: J12.1, J20.5, J21.0, B97.4) or in combination with common acute respiratory infection outcome codes such as pneumonia (J12.8, J12.9), bronchitis (J20.8, J20.9), or bronchiolitis (J21.8, J21.9).
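
For reference, the code sets named above can be transcribed directly. The following sketch writes them as Python sets using the dotted codes as printed here; administrative extracts sometimes store codes without the dot, which would need to be verified before use.

```python
# ICD-10 code sets transcribed from the text; the grouping into named
# constants is ours. See Table B in S1 Appendix for the full list.
FLU_SPECIFIC        = {"J09", "J10.0", "J10.1", "J10.8"}   # virus identified
FLU_NOT_IDENTIFIED  = {"J11.0", "J11.1", "J11.8"}          # virus not identified
RSV_SPECIFIC        = {"J12.1", "J20.5", "J21.0", "B97.4"}
VIRAL_PNEUMONIA     = {"J12.8", "J12.9"}
ACUTE_BRONCHITIS    = {"J20.8", "J20.9"}
ACUTE_BRONCHIOLITIS = {"J21.8", "J21.9"}

# Example algorithms (FLU1/FLU2/RSV1/RSV2 are named in the Results):
ALGORITHMS = {
    "FLU1": FLU_SPECIFIC,
    "FLU2": FLU_SPECIFIC | FLU_NOT_IDENTIFIED,
    "RSV1": RSV_SPECIFIC,
    "RSV2": RSV_SPECIFIC | {"J22"},
}
```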

The validity of each algorithm was evaluated by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). First, validity parameters were calculated by evaluating the “most responsible diagnosis” code in the discharge abstract. If an ICD-10 code in the algorithm was recorded as the most responsible diagnosis in the discharge abstract, then it was classified as an algorithm-positive record. Next, validity parameters were calculated using all diagnosis codes available in the discharge abstract. If an ICD-10 code in the algorithm was recorded as any diagnosis code on the discharge abstract, then it was classified as an algorithm-positive record. Algorithms applied to the most responsible diagnosis code were consistently less accurate than the same algorithms applied to all diagnosis codes (see Tables A–D in S2 Appendix). Therefore, we present the results of the latter analyses only.
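
Both evaluation modes reduce to a membership test per discharge abstract followed by a 2×2 tabulation against the PCR reference standard. A minimal sketch, with the record fields assumed for illustration:

```python
# Each record is assumed to carry the PCR reference result and its codes:
# {"pcr_positive": bool, "most_responsible": str, "all_diagnoses": [str, ...]}
def algorithm_positive(record, code_set, any_position=True):
    codes = record["all_diagnoses"] if any_position else [record["most_responsible"]]
    return any(code in code_set for code in codes)

def confusion_counts(records, code_set, any_position=True):
    tp = fp = fn = tn = 0
    for rec in records:
        pred = algorithm_positive(rec, code_set, any_position)
        truth = rec["pcr_positive"]
        tp += pred and truth
        fp += pred and not truth
        fn += (not pred) and truth
        tn += (not pred) and not truth
    return tp, fp, fn, tn
```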

To minimize false-positive rates and misclassification of algorithm-positive cases, top-performing algorithms were selected according to specificity and PPV [11]. If multiple algorithms had similar specificity and PPV, we then prioritized sensitivity. Since PPV and NPV are susceptible to changes in disease prevalence [12], and thus may vary with patient age or month of hospital admission, we also validated the top-performing algorithms in the reference cohorts stratified by age and by month of hospital admission. Algorithms with consistently high specificity and PPV across strata were selected as the top performers.

We calculated 95% confidence intervals using the Clopper-Pearson exact method [13]. All analyses were conducted using SAS version 9.4 (SAS Institute, Cary, NC, USA).
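
The Clopper-Pearson interval has a standard closed form in terms of beta-distribution quantiles; a minimal Python equivalent of what the SAS analysis would have computed is sketched below.

```python
from scipy.stats import beta

# Exact (Clopper-Pearson) 95% CI for a binomial proportion of k successes
# in n trials, via the standard beta-quantile formulation [13].
def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

# e.g., FLU1 sensitivity (Table 2): clopper_pearson(10_755, 14_754)
# returns roughly (0.72, 0.74), matching the reported interval.
```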

Results

Influenza and RSV reference cohorts

We identified 133,422 and 96,624 PCR testing events for influenza and RSV, respectively, in OLIS during the 2014–15 to 2017–18 respiratory virus seasons (Fig 1). After exclusions, 83,638 (63%) and 61,117 (63%) events for influenza and RSV, respectively, were associated with a hospitalization within 3 days of specimen collection and thus comprised the reference cohorts. Reference cohort characteristics are summarized in Table 1. True positive cases, defined as hospitalizations associated with a positive PCR test, comprised 17.6% of the influenza cohort and 9.2% of the RSV cohort (Table 1). Patient age ranged from 0 to 105 years. In both reference cohorts, all age strata had at least 2,000 patients.

Fig 1. Flow diagram of patients included and excluded in the influenza and RSV algorithm development cohorts.

PCR, polymerase chain reaction; OLIS, Ontario Laboratory Information System; RSV, respiratory syncytial virus; OHIP, Ontario Health Insurance Plan. †Nosocomial infections were defined as hospitalizations associated with specimen collection dates more than 3 days post hospital admission and before hospital discharge. Patients with suspected nosocomial infections were excluded only for the season in which their first positive test event was classified as nosocomial.

Table 1. Characteristics of the influenza and RSV reference cohorts.

Characteristics Influenza Reference Cohort (N = 83,638) RSV Reference Cohort (N = 61,117)
Virus detected by PCR, n (%) 14,754 (17.6%) 5,614 (9.2%)
Season, n (%)    
    2014–2015 14,344 (17.2%) 9,267 (15.2%)
    2015–2016 14,931 (17.9%) 9,435 (15.4%)
    2016–2017 23,081 (27.6%) 20,360 (33.3%)
    2017–2018 31,282 (37.4%) 22,055 (36.1%)
Age Group, n (%)
    0–4 10,173 (12.2%) 7,260 (11.9%)
    5–19 2,890 (3.5%) 2,058 (3.4%)
    20–34 3,185 (3.8%) 2,400 (3.9%)
    35–49 4,890 (5.8%) 3,661 (6.0%)
    50–64 12,572 (15.0%) 9,297 (15.2%)
    65–74 14,332 (17.1%) 10,695 (17.5%)
    75–84 17,879 (21.4%) 12,999 (21.3%)
    85+ 17,717 (21.2%) 12,747 (20.9%)
Sex on RPDB, n (%)
    Female 41,997 (50.2%) 30,655 (50.2%)
    Male 41,641 (49.8%) 30,462 (49.8%)
Neighborhood Income Quintile, n (%)
    Missing Data 237 (0.3%) 179 (0.3%)
    1 (lowest) 22,238 (26.6%) 16,145 (26.4%)
    2 18,726 (22.4%) 13,998 (22.9%)
    3 16,190 (19.4%) 12,107 (19.8%)
    4 13,433 (16.1%) 9,580 (15.7%)
    5 (highest) 12,814 (15.3%) 9,108 (14.9%)
Risk factors for serious viral infection, n (%)
    Asthma 24,662 (29.5%) 18,239 (29.8%)
    Chronic Obstructive Pulmonary Disease 23,812 (28.5%) 17,183 (28.1%)
    Immunodeficiency 7,461 (8.9%) 5,693 (9.3%)
    Cancer 9,920 (11.9%) 7,527 (12.3%)
    Diabetes 29,384 (35.1%) 21,710 (35.5%)
    Hypertension 52,656 (63.0%) 38,696 (63.3%)
    Cardiac Ischemic Disease 17,065 (20.4%) 12,560 (20.6%)
    Congestive Heart Failure 25,031 (29.9%) 18,525 (30.3%)
    Ischemic Stroke or Transient Ischemic Attack 7,322 (8.8%) 5,408 (8.8%)
    Advanced Liver Disease 2,958 (3.5%) 2,313 (3.8%)
    Chronic Kidney Disease 19,316 (23.1%) 14,458 (23.7%)
    Dementia or frailty score > 15, n (%) 21,508 (25.7%) 16,084 (26.3%)
LTC Home Resident, n (%) 6,704 (8.0%) 5,001 (8.2%)
Received Influenza Vaccination, n (%) † 28,469 (34.0%) 20,450 (33.5%)
Prior Hospital Admissions, mean (SD) ‡ 1.77 (2.64) 1.84 (2.75)
Prior Physician Visits, mean (SD) § 14.71 (13.40) 15.09 (13.67)
Length of Hospital Stay, days, mean (SD) 8.94 (18.24) 9.22 (18.93)
Spent time in ICU, n (%) 15,753 (18.8%) 11,728 (19.2%)

Continuous variables are expressed as means and standard deviations. Categorical variables are expressed as absolute numbers and percentages. One hospitalization per person, per season was included in counts. RSV, respiratory syncytial virus; PCR, polymerase chain reaction; RPDB, Registered Persons Database; LTC, long-term care; ICU, intensive care unit.

† As recorded in the same season as hospitalization.

‡ Mean prior hospital admissions in the past 3 years.

§ Mean prior physician visits in the past year.

Algorithm validation

Most influenza and RSV ICD-10 algorithms had specificities ≥95% and NPVs ≥94%. Algorithm sensitivities and PPVs were more variable, ranging from 69% to 91% and 20% to 94%, respectively (Tables 2 and 3). We established two highly accurate ICD-10 algorithms that identified influenza hospitalizations: one comprising influenza-specific codes that denote laboratory confirmation of influenza (FLU1; ICD-10 codes: J09, J10.0, J10.1, J10.8) and another comprising influenza-specific codes with or without laboratory confirmation of influenza (FLU2; ICD-10 codes: J09, J10.0, J10.1, J10.8, J11.0, J11.1, J11.8). Specificity was ≥98% and PPV was ≥91% for both algorithms (Table 2).

Table 2. Validation of ICD-10 algorithms to identify hospitalized individuals with influenza infection.

ICD-10 Algorithm TP FP FN TN Sensitivity (95% CI) Specificity (95% CI) PPV (95% CI) NPV (95% CI)
Influenza-specific codesa, FLU1* 10,755 653 3,999 68,231 0.73(0.72–0.74) 0.99(0.99–0.99) 0.94(0.94–0.95) 0.94(0.94–0.95)
Influenza-specific + Influenza (virus not identified)b, FLU2* 12,245 1,201 2,509 67,683 0.83(0.82–0.84) 0.98(0.98–0.98) 0.91(0.91–0.92) 0.96(0.96–0.97)
Influenza-specific + ARI of multiple/unspecified sitesc 10,965 3,337 3,789 65,547 0.74(0.74–0.75) 0.95(0.95–0.95) 0.77(0.76–0.77) 0.95(0.94–0.95)
Influenza-specific + viral pneumoniad 10,819 1,341 3,935 67,543 0.73(0.73–0.74) 0.98(0.98–0.98) 0.89(0.88–0.90) 0.94(0.94–0.95)
Influenza-specific + bronchopneumoniae 11,517 18,690 3,237 50,194 0.78(0.77–0.79) 0.73(0.73–0.73) 0.38(0.38–0.39) 0.94(0.94–0.94)
Influenza-specific + acute bronchitisf 10,804 1,194 3,950 67,690 0.73(0.73–0.74) 0.98(0.98–0.98) 0.90(0.90–0.91) 0.94(0.94–0.95)
Influenza-specific + acute bronchiolitisg 10,792 2,117 3,962 66,767 0.73(0.72–0.74) 0.97(0.97–0.97) 0.84(0.83–0.84) 0.94(0.94–0.95)
Influenza-specific + ARI of multiple sites + acute bronchitis + acute bronchiolitis 12,321 3,201 2,433 65,683 0.84(0.83–0.84) 0.95(0.95–0.96) 0.79(0.79–0.80) 0.96(0.96–0.97)
Influenza-specific + viral infection (unspecified site)h 10,904 2,518 3,850 66,366 0.74(0.73–0.75) 0.96(0.96–0.96) 0.81(0.81–0.82) 0.95(0.94–0.95)
Influenza-specific + unspecified acute lower respiratory tract infectioni 10,792 1,033 3,962 67,851 0.73(0.72–0.74) 0.99(0.98–0.99) 0.91(0.91–0.92) 0.94(0.94–0.95)
Influenza-specific + all general ARI codesj 13,379 26,040 1,375 42,844 0.91(0.90–0.91) 0.62(0.62–0.63) 0.34(0.33–0.34) 0.97(0.97–0.97)

ICD-10, International Classification of Disease 10th Revision; ARI, acute respiratory infection; TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value

*Identified as a top-performing algorithm.

a—Influenza-specific (virus identified) ICD-10 codes: J09, J10.0, J10.1, J10.8

b—Influenza (virus not identified) ICD-10 codes: J11.0, J11.1, J11.8

c—Acute upper respiratory infections of multiple unspecified sites (virus unspecified/not identified) ICD-10 codes: J06.0, J06.8, J06.9

d—Viral pneumonia (virus unspecified/not identified) ICD-10 codes: J12.8, J12.9

e—Bronchopneumonia (organism unspecified) ICD-10 codes: J18.0, J18.8, J18.9

f—Acute bronchitis (organism unspecified) ICD-10 codes: J20.8, J20.9

g—Acute bronchiolitis (organism unspecified) ICD-10 codes: J21.8, J21.9

h—Viral infection (unspecified site) ICD-10 code: B34

i—Unspecified acute lower respiratory tract infection ICD-10 code: J22

j—General ARI ICD-10 codes: J11.0, J11.1, J11.8, J06.0, J06.8, J06.9, J12.8, J12.9, J18.0, J18.8, J18.9, J20.8, J20.9, J21.8, J21.9, B34, J22
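
As a worked check, the four validity parameters in the FLU1 row follow directly from its 2×2 counts:

```python
tp, fp, fn, tn = 10_755, 653, 3_999, 68_231   # FLU1 row of Table 2

sensitivity = tp / (tp + fn)   # 10,755 / 14,754 ≈ 0.73
specificity = tn / (tn + fp)   # 68,231 / 68,884 ≈ 0.99
ppv         = tp / (tp + fp)   # 10,755 / 11,408 ≈ 0.94
npv         = tn / (tn + fn)   # 68,231 / 72,230 ≈ 0.94
```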

Table 3. Validation of ICD-10 algorithms to identify hospitalized individuals with RSV infection.

ICD-10 Algorithm TP FP FN TN Sensitivity (95% CI) Specificity (95% CI) PPV (95% CI) NPV (95% CI)
RSV-specific codesa, RSV1* 3,881 403 1,733 55,100 0.69(0.68–0.70) 0.99(0.99–0.99) 0.91(0.90–0.91) 0.97(0.97–0.97)
RSV-specific + ARI of multiple/unspecified sitesb 4,038 2,079 1,576 53,424 0.72(0.71–0.73) 0.96(0.96–0.96) 0.66(0.65–0.67) 0.97(0.97–0.97)
RSV-specific + Influenza (virus not identified)c 3,895 1,551 1,719 53,952 0.69(0.68–0.71) 0.97(0.97–0.97) 0.72(0.70–0.73) 0.97(0.97–0.97)
RSV-specific + viral pneumoniad 3,938 867 1,676 54,636 0.70(0.69–0.71) 0.98(0.98–0.99) 0.82(0.81–0.83) 0.97(0.97–0.97)
RSV-specific + bronchopneumoniae 4,280 13,667 1,334 41,836 0.76(0.75–0.77) 0.75(0.75–0.76) 0.24(0.23–0.24) 0.97(0.97–0.97)
RSV-specific + acute bronchitisf 3,896 764 1,718 54,739 0.69(0.68–0.71) 0.99(0.99–0.99) 0.84(0.83–0.85) 0.97(0.97–0.97)
RSV-specific + acute bronchiolitisg 4,115 990 1,499 54,513 0.73(0.72–0.74) 0.98(0.98–0.98) 0.81(0.80–0.82) 0.97(0.97–0.97)
RSV-specific + ARI of multiple sites + acute bronchitis + acute bronchiolitis 4,276 3,007 1,338 52,496 0.76(0.75–0.77) 0.95(0.94–0.95) 0.59(0.58–0.60) 0.98(0.97–0.98)
RSV-specific + viral infection (unspecified site)h 3,944 1,650 1,670 53,853 0.70(0.69–0.71) 0.97(0.97–0.97) 0.71(0.69–0.72) 0.97(0.97–0.97)
RSV-specific + unspecified acute lower respiratory tract infectioni, RSV2* 3,896 598 1,718 54,905 0.69(0.68–0.71) 0.99(0.99–0.99) 0.87(0.86–0.88) 0.97(0.97–0.97)
RSV-specific + all general ARI codesj 4,769 18,867 845 36,636 0.85(0.84–0.86) 0.66(0.66–0.66) 0.20(0.20–0.21) 0.98(0.98–0.98)

ICD-10, International Classification of Disease 10th Revision; RSV, respiratory syncytial virus; ARI, acute respiratory infection; TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value.

*Identified as a top performing algorithm.

a—RSV-specific (virus identified) ICD-10 codes: J12.1, J20.5, J21.0, B97.4

b—Acute upper respiratory infections of multiple unspecified sites (virus unspecified/not identified) ICD-10 codes: J06.0, J06.8, J06.9

c—Influenza (virus not identified) ICD-10 codes: J11.0, J11.1, J11.8

d—Viral pneumonia (virus unspecified/not identified) ICD-10 codes: J12.8, J12.9

e—Bronchopneumonia (organism unspecified) ICD-10 codes: J18.0, J18.8, J18.9

f—Acute bronchitis (organism unspecified) ICD-10 codes: J20.8, J20.9

g—Acute bronchiolitis (organism unspecified) ICD-10 codes: J21.8, J21.9

h—Viral infection (unspecified site) ICD-10 code: B34

i—Unspecified acute lower respiratory tract infection ICD-10 code: J22

j—General ARI ICD-10 codes: J11.0, J11.1, J11.8, J06.0, J06.8, J06.9, J12.8, J12.9, J18.0, J18.8, J18.9, J20.8, J20.9, J21.8, J21.9, B34, J22

Similarly, we established two highly accurate ICD-10 algorithms that identified RSV hospitalizations: one comprising RSV-specific codes (RSV1; ICD-10 codes: J12.1, J20.5, J21.0, B97.4), and another comprising RSV-specific codes plus the unspecified acute lower respiratory tract infection code (RSV2; ICD-10 codes: J12.1, J20.5, J21.0, B97.4, J22). Specificity was 99% and PPV was ≥87% for both algorithms (Table 3).

Algorithm validation by age group and month of admission

Validity of the FLU1 and FLU2 algorithms did not vary substantially by age (Table 4). Both algorithms had specificities ≥98% and PPVs ≥89% across all age strata. More variability in FLU1 and FLU2 algorithm validity was observed when assessed by month of hospital admission (Table E in S2 Appendix). Specificity of both algorithms remained ≥98% during all months, whereas sensitivity and PPV decreased in November and May.

Table 4. Validation of top-performing ICD-10 influenza and RSV algorithms by age at hospital admission.

ICD-10 Algorithm TP FP FN TN Sensitivity (95% CI) Specificity (95% CI) PPV (95% CI) NPV (95% CI)
FLU1 Algorithma
    0–4 751 69 378 8,975 0.67(0.64–0.69) 0.99(0.99–0.99) 0.92(0.89–0.93) 0.96(0.96–0.96)
    5–19 379 24 223 2,264 0.63(0.59–0.67) 0.99(0.98–0.99) 0.94(0.91–0.96) 0.91(0.90–0.92)
    20–34 295 15 156 2,719 0.65(0.61–0.70) 0.99(0.99–1.00) 0.95(0.92–0.97) 0.95(0.94–0.95)
    35–49 518 28 228 4,116 0.69(0.66–0.73) 0.99(0.99–1.00) 0.95(0.93–0.97) 0.95(0.94–0.95)
    50–64 1,405 78 544 10,545 0.72(0.70–0.74) 0.99(0.99–0.99) 0.95(0.93–0.96) 0.95(0.95–0.95)
    65–74 1,772 115 652 11,793 0.73(0.71–0.75) 0.99(0.99–0.99) 0.94(0.93–0.95) 0.95(0.94–0.95)
    75–84 2,683 160 931 14,105 0.74(0.73–0.76) 0.99(0.99–0.99) 0.94(0.93–0.95) 0.94(0.93–0.94)
    85+ 2,952 164 887 13,714 0.77(0.76–0.78) 0.99(0.99–0.99) 0.95(0.94–0.95) 0.94(0.94–0.94)
FLU2 Algorithmb
    0–4 844 104 285 8,940 0.75(0.72–0.77) 0.99(0.99–0.99) 0.89(0.87–0.91) 0.97(0.97–0.97)
    5–19 438 39 164 2,249 0.73(0.69–0.76) 0.98(0.98–0.99) 0.92(0.89–0.94) 0.93(0.92–0.94)
    20–34 347 38 104 2,696 0.77(0.73–0.81) 0.99(0.98–0.99) 0.90(0.87–0.93) 0.96(0.96–0.97)
    35–49 598 64 148 4,080 0.80(0.77–0.83) 0.98(0.98–0.99) 0.90(0.88–0.92) 0.97(0.96–0.97)
    50–64 1,595 173 354 10,450 0.82(0.80–0.84) 0.98(0.98–0.99) 0.90(0.89–0.92) 0.97(0.96–0.97)
    65–74 2,003 213 421 11,695 0.83(0.81–0.84) 0.98(0.98–0.98) 0.90(0.89–0.92) 0.97(0.96–0.97)
    75–84 3,078 275 536 13,990 0.85(0.84–0.86) 0.98(0.98–0.98) 0.92(0.91–0.93) 0.96(0.96–0.97)
    85+ 3,342 295 497 13,583 0.87(0.86–0.88) 0.98(0.98–0.98) 0.92(0.91–0.93) 0.96(0.96–0.97)
RSV1 Algorithmc
    0–4 2,072 241 639 4,308 0.76(0.75–0.78) 0.95(0.94–0.95) 0.90(0.88–0.91) 0.87(0.86–0.88)
    5–19 71 13 71 1,903 0.50(0.42–0.59) 0.99(0.99–1.00) 0.85(0.75–0.91) 0.96(0.95–0.97)
    20–49 98 13 100 5,850 0.49(0.42–0.57) 1.00(1.00–1.00) 0.88(0.81–0.94) 0.98(0.98–0.99)
    50–64 257 25 190 8,825 0.57(0.53–0.62) 1.00(1.00–1.00) 0.91(0.87–0.94) 0.98(0.98–0.98)
    65–74 353 31 233 10,078 0.60(0.56–0.64) 1.00(1.00–1.00) 0.92(0.89–0.94) 0.98(0.97–0.98)
    75–84 470 52 261 12,216 0.64(0.61–0.68) 1.00(0.99–1.00) 0.90(0.87–0.92) 0.98(0.98–0.98)
    85+ 560 28 239 11,920 0.70(0.67–0.73) 1.00(1.00–1.00) 0.95(0.93–0.97) 0.98(0.98–0.98)
RSV2 Algorithmd
    0–4 2,079 271 632 4,278 0.77(0.75–0.78) 0.94(0.93–0.95) 0.88(0.87–0.90) 0.87(0.86–0.88)
    5–19 72 20 70 1,896 0.51(0.42–0.59) 0.99(0.98–0.99) 0.78(0.68–0.86) 0.96(0.96–0.97)
    20–49 100 24 98 5,839 0.51(0.43–0.58) 1.00(0.99–1.00) 0.81(0.73–0.87) 0.98(0.98–0.99)
    50–64 257 52 190 8,798 0.57(0.53–0.62) 0.99(0.99–1.00) 0.83(0.79–0.87) 0.98(0.98–0.98)
    65–74 353 69 233 10,040 0.60(0.56–0.64) 0.99(0.99–0.99) 0.84(0.80–0.87) 0.98(0.97–0.98)
    75–84 472 93 259 12,175 0.65(0.61–0.68) 0.99(0.99–0.99) 0.84(0.80–0.87) 0.98(0.98–0.98)
    85+ 563 69 236 11,879 0.70(0.67–0.74) 0.99(0.99–1.00) 0.89(0.86–0.91) 0.98(0.98–0.98)

ICD-10, International Classification of Disease 10th Revision; RSV, respiratory syncytial virus; TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value.

a—Influenza-specific ICD-10 codes with virus identified: J09, J10.0, J10.1, J10.8

b—Influenza-specific ICD-10 codes with and without virus identified: J09, J10.0, J10.1, J10.8, J11.0, J11.1, J11.8

c—RSV-specific ICD-10 codes with virus identified: J12.1, J20.5, J21.0, B97.4

d—RSV-specific ICD-10 codes with virus identified + unspecified acute lower respiratory tract infection ICD-10 code: J12.1, J20.5, J21.0, B97.4, J22

RSV1 and RSV2 algorithm validity was more variable across age strata (Table 4). Algorithm specificities were ≥94% across all age strata, while sensitivities were higher among children aged 0–4 years (e.g., RSV1 sensitivity = 76%) than among adults (e.g., RSV1 sensitivity = 49% for adults aged 20–49 years). Further, PPVs declined among patients aged 5–19 years, to lows of 85% for RSV1 and 78% for RSV2. RSV1 and RSV2 algorithm validity also varied by month of hospital admission (Table E in S2 Appendix). Algorithm specificities were ≥99% from November through April, while sensitivities and PPVs declined in April (RSV1: sensitivity = 56%, PPV = 89%; RSV2: sensitivity = 57%, PPV = 81%).

Overall, the FLU1 algorithm and the RSV1 algorithm maintained the highest specificity and PPV across all age strata and months of admission, and were therefore classified as the most valid algorithms to identify influenza and RSV hospitalizations.

Discussion

We established two highly specific ICD-10 algorithms to identify influenza and RSV hospitalizations using large, population-based reference cohorts of patients with laboratory-confirmed hospitalizations over four respiratory virus seasons. Based on the criteria of specificity and PPV, the most valid influenza algorithm included all influenza-specific ICD-10 codes that included laboratory confirmation (FLU1), while the most valid RSV algorithm included all RSV-specific ICD-10 codes (RSV1).

This finding was expected, given that our reference cohorts were defined using laboratory test results. Medical coding is performed at discharge, when testing results may be available; thus, medical coding and laboratory data are not necessarily independent.

FLU1 and RSV1 maintained high specificity and PPV when the reference cohorts were stratified by age. Thus, the algorithms can be applied to paediatric, adult, and elderly populations with low risk of misclassification bias. The specificity of the algorithms also remained high when assessed by month of hospitalization, although PPV was more variable. The PPV of FLU1 dropped to lows of 87% in November and 86% in May, while the PPV of RSV1 dropped to a low of 89% in April. These decreases were expected, as PPV is dependent on disease prevalence, and the decreases were concordant with typical declines in respiratory virus prevalence and activity in Ontario during those months [14]. Notably, the absolute number of false positives generated during times of low viral activity made up <8% of overall FLU1 false positives and <7% of overall RSV1 false positives. Therefore, while PPV declined during months of lower viral activity, the overall algorithm validity was not impacted.
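
The prevalence dependence can be made explicit with Bayes’ rule: holding sensitivity and specificity fixed, PPV = (sens × prev) / (sens × prev + (1 − spec) × (1 − prev)). The sketch below uses FLU1’s overall parameters; the low-season prevalence value is illustrative, not taken from the study data.

```python
# PPV as a function of prevalence, with FLU1's overall sensitivity (0.73)
# and specificity (0.99) held fixed.
def ppv(sens: float, spec: float, prev: float) -> float:
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(round(ppv(0.73, 0.99, 0.176), 2))  # 0.94 at the full-cohort prevalence
print(round(ppv(0.73, 0.99, 0.05), 2))   # 0.79 at an assumed low-season prevalence
```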

Our findings concur with previous literature indicating that ICD-10 codes have high specificity and moderate sensitivity for identifying influenza and RSV hospitalizations using health administrative data [1,4,7,8]. Where direct comparisons are possible, our quantitative measures of specificity align with previous findings, while our measures of sensitivity are lower. For example, Moore et al. found that an algorithm that included codes for influenza with or without laboratory confirmation (J10.0-J10.9, J11.0-J11.9) had a specificity of 98.6% and a sensitivity of 86.1% for children aged 0–9 years, whereas our FLU2 algorithm had a specificity of 99% and a sensitivity of 73–75% for children aged 0–19 years [4]. Furthermore, Pisesky et al. found that an algorithm comprising RSV-specific codes (J12.1, J20.5, J21.0, B97.4) had a specificity of 99.6% and a sensitivity of 97.9% for children aged 0–3 years, whereas our RSV1 algorithm had corresponding values of 95% and 76% for children aged 0–4 years [1].

Distinctions between the study populations may explain the observed differences in sensitivity. Pisesky et al. studied a population from a specialized hospital in Ottawa, Ontario [1], while Moore et al. studied a Western Australian population [4]. In contrast, our study was conducted in a larger cohort of patients using data from hospitals across the entire province of Ontario. ICD-10 codes may be used more or less frequently across jurisdictions and institutions, resulting in variable algorithm sensitivity. These discrepancies highlight the importance of validating algorithms within distinct populations.

While we established two highly specific algorithms that identify influenza and RSV hospitalizations, some limitations must be considered. First, our reference cohorts included only hospitalized patients who were tested by PCR for the respective pathogens. Untested patients with suspected respiratory infections may differ from tested patients: they may have less severe symptoms at hospitalization, may be more likely to live in long-term care facilities where outbreaks have occurred, or may be more likely to live in less-resourced settings where testing is limited. Untested patients may also have had more pressing medical concerns at hospitalization, such that testing was not a priority, or they may have been hospitalized at an overcrowded, high-volume site. Testing may further depend on the patient’s age at admission or on the protocols in place at the hospital. By including hospitals across the entire province of Ontario, we aimed to mitigate hospital-specific variability that could affect the generalizability of our results. However, differences between untested and tested patients must be considered when using the algorithms to assess risk factors that may be associated with the propensity to receive a test. For example, risk factors such as age, symptom severity, comorbidities, and residence in long-term care facilities may be associated with the propensity to receive a PCR test, and thus may bias estimates of effect when using these algorithms. Caution must also be applied when assessing risk factors in settings whose testing practices differ from those of general Ontario hospitals.

Another limitation is that our top-performing algorithms were selected to maximize specificity and PPV. This approach minimizes misclassification of cases rather than non-cases. Depending on the study objective, it may be more important to maximize sensitivity; for example, our algorithms substantially underestimate the number of true influenza and RSV cases in the Ontario population, and thus would not be suitable for estimating the population burden of influenza or RSV. We therefore report validity parameters for all algorithms tested, to facilitate selection of the best algorithm(s) for a particular study.

Use of these algorithms in non-Ontario-based cohorts also warrants caution. PPV and NPV are highly susceptible to changes in disease prevalence [12], and coding and testing practices may vary across jurisdictions, affecting all of the validity measures reported [15,16]. Thus, it may be necessary to re-validate these algorithms before applying them to other populations.

Our findings have important implications for future studies that aim to assess the aetiology of severe outcomes for influenza and RSV hospitalizations using broad health administrative data. Not all hospitals across Ontario currently submit laboratory data to OLIS. Further, OLIS data collection was limited between 2007 and 2012 as laboratories only gradually started submitting data upon implementation of OLIS in 2007. As CIHI-DAD is available for all hospitals across Ontario, these algorithms will allow us to create larger and more representative cohorts of patients hospitalized with influenza or RSV, increasing the power of future aetiological studies. Lastly, since historical CIHI-DAD data are available as early as 1988, these algorithms could be used to assess changes in disease prevalence and aetiology over time.

Conclusion

Using a population-based cohort of patients tested for influenza and RSV, we identified two highly specific algorithms that best ascertain paediatric, adult, and elderly patients hospitalized with influenza or RSV. These algorithms will improve future efforts to evaluate prognostic and aetiologic factors associated with influenza and RSV when reporting of laboratory data is limited. The same principles may be applicable for other severe acute respiratory infections.

Supporting information

S1 Appendix. Supplementary information for study methodology.

(DOCX)

S2 Appendix. Supplementary results.

(DOCX)

Acknowledgments

Parts of this material are based on data and information compiled and provided by the Ontario Ministry of Health and Long-Term Care (MOHLTC) and the Canadian Institute for Health Information (CIHI). The analyses, conclusions, opinions and statements expressed herein are solely those of the authors; no official endorsement by MOHLTC, or CIHI should be inferred.

Data Availability

Data sharing agreements between ICES and the Ontario Ministry of Health and Long Term Care outlined in Ontario’s Personal Health Information Protection Act legally prohibit ICES from making the dataset publicly available. Therefore, due to our legally binding agreements, we cannot publicly share the dataset from this study. Qualified researchers can obtain access to the data required to replicate the analyses conducted in this study. One can request access to the data at https://www.ices.on.ca/DAS, by email at das@ices.on.ca, or by phone at 1-844-848-9855.

Funding Statement

This study was funded by the Canadian Institutes of Health Research (JCK, PJT 159516, https://cihr-irsc.gc.ca/e/193.html; SM, VR5 172683; https://webapps.cihr-irsc.gc.ca/decisions/p/project_details.html?applId=430319&lang=en) and a St. Michael’s Hospital Foundation Research Innovation Council’s 2020 COVID-19 Research Award (SM; https://secure3.convio.net/smh/site/SPageNavigator/RIC2019.html). SM is supported by a Tier 2 Canada Research Chair in Mathematical Modelling and Program Science (CRC number 950-232643). JCK is supported by a Clinician-Scientist Award from the University of Toronto Department of Family and Community Medicine. This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Pisesky A, Benchimol EI, Wong CA, Hui C, Crowe M, Belair M-A, et al. Incidence of Hospitalization for Respiratory Syncytial Virus Infection amongst Children in Ontario, Canada: A Population-Based Study Using Validated Health Administrative Data. PLoS One. 2016 Mar 9;11(3):e0150416. doi: 10.1371/journal.pone.0150416
  • 2. Buda S, Tolksdorf K, Schuler E, Kuhlen R, Haas W. Establishing an ICD-10 code based SARI-surveillance in Germany—description of the system and first results from five recent influenza seasons. BMC Public Health. 2017 Jun 30;17(1):612. doi: 10.1186/s12889-017-4515-1
  • 3. Amodio E, Tramuto F, Costantino C, Restivo V, Maida C, Calamusa G, et al. Diagnosis of Influenza: Only a Problem of Coding? Med Princ Pract. 2014 Nov;23(6):568–573. doi: 10.1159/000364780
  • 4. Moore HC, Lehmann D, de Klerk N, Smith DW, Richmond PC, Keil AD, et al. How Accurate Are International Classification of Diseases-10 Diagnosis Codes in Detecting Influenza and Pertussis Hospitalizations in Children? J Pediatric Infect Dis Soc. 2014 Sep 1;3(3):255–260. doi: 10.1093/jpids/pit036
  • 5. Cai W, Tolksdorf K, Hirve S, Schuler E, Zhang W, Haas W, et al. Evaluation of using ICD-10 code data for respiratory syncytial virus surveillance. Influenza and Other Respiratory Viruses. 2019 Jun 17. doi: 10.1111/irv.12665
  • 6. Marsden-Haug N, Foster VB, Gould PL, Elbert E, Wang H, Pavlin JA. Code-based Syndromic Surveillance for Influenzalike Illness by International Classification of Diseases, Ninth Revision. Emerg Infect Dis. 2007 Feb;13(2):207–216. doi: 10.3201/eid1302.060557
  • 7. Feemster KA, Leckerman KH, Middleton M, Zerr DM, Elward AM, Newland JG, et al. Use of Administrative Data for the Identification of Laboratory-Confirmed Influenza Infection: The Validity of Influenza-Specific ICD-9 Codes. J Pediatric Infect Dis Soc. 2013 Mar 1;2(1):63–66. doi: 10.1093/jpids/pis052
  • 8. Keren R, Wheeler A, Coffin SE, Zaoutis T, Hodinka R, Heydon K. ICD-9 Codes for Identifying Influenza Hospitalizations in Children. Emerg Infect Dis. 2006 Oct;12(10):1603–1604. doi: 10.3201/eid1210.051525
  • 9. Ontario Respiratory Pathogen Bulletin [Internet]. Public Health Ontario. Available from: https://www.publichealthontario.ca/en/Data and Analysis/Infectious Disease/Respiratory Pathogens Weekly
  • 10. Chow EJ, Mermel LA. Hospital-Acquired Respiratory Viral Infections: Incidence, Morbidity, and Mortality in Pediatric and Adult Patients. Open Forum Infect Dis. 2017 Feb 3;4(1). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414085/
  • 11. Brenner H, Gefeller O. Use of the positive predictive value to correct for disease misclassification in epidemiologic studies. Am J Epidemiol. 1993 Dec 1;138(11):1007–1015. doi: 10.1093/oxfordjournals.aje.a116805
  • 12. Parikh R, Mathai A, Parikh S, Chandra Sekhar G, Thomas R. Understanding and using sensitivity, specificity and predictive values. Indian J Ophthalmol. 2008;56(1):45–50. doi: 10.4103/0301-4738.37595
  • 13. Clopper CJ, Pearson ES. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika. 1934;26(4):404–413.
  • 14. Kwong JC, Buchan SA, Chung H, Campitelli MA, Schwartz KL, Crowcroft NS, et al. Can routinely collected laboratory and health administrative data be used to assess influenza vaccine effectiveness? Assessing the validity of the Flu and Other Respiratory Viruses Research (FOREVER) Cohort. Vaccine. 2019;37(31):4392–4400. doi: 10.1016/j.vaccine.2019.06.011
  • 15. Otero Varela L, Knudsen S, Carpendale S, Eastwood C, Quan H. Comparing ICD-Data Across Countries: A Case for Visualization? In: 2019 IEEE Workshop on Visual Analytics in Healthcare (VAHC). 2019. p. 60–61.
  • 16. Sivashankaran S, Borsi JP, Yoho A. Have ICD-10 Coding Practices Changed Since 2015? AMIA Annu Symp Proc. 2020 Mar 4;2019:804–811.

Decision Letter 0

Judith Katzenellenbogen

11 Sep 2020

PONE-D-20-24822

Validating International Classification of Disease 10th Revision algorithms for identifying influenza and respiratory syncytial virus hospitalizations

PLOS ONE

Dear Dr. Hamilton,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Judith Katzenellenbogen, Ph D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2.  We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"This study was funded by the Canadian Institutes of Health Research (JCK; PJT

159516; https://cihr-irsc.gc.ca/e/193.html) and a St. Michael’s Hospital Foundation

Research Innovation Council’s 2020 COVID-19 Research Award (SM;

https://secure3.convio.net/smh/site/SPageNavigator/RIC2019.html). The funders had

no role in study design, data collection and analysis, decision to publish, or preparation

of the manuscript."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"SM is supported by a Tier 2 Canada Research Chair in Mathematical Modeling and Program Science. JCK is supported by a Clinician-Scientist Award from the University of Toronto Department of Family and Community Medicine. This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC)."

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is an important study addressing limitations in the existing literature about the validity of ICD-10 coded administrative data for influenza and RSV case ascertainment.

I have a few comments for the authors to address:

1. The algorithms do not take into account likely differences between hospitals. In a study with similar objectives for a different disease (reference below for your information), we found that inter-hospital differences were important in predicting the validity of case ascertainment through ICD-10 codes, because of implicit differences in de facto coding practices between hospitals and over time (i.e. ultimately medical coders). While there are coding guidelines, there is likely still some variability in how individual coders apply these rules in practice. At a minimum this hypothesis should be evaluated as part of algorithm development. In our study, we used hospital random effects to capture facility-specific differences, but other approaches are possible.

2. You state that the algorithms using all diagnosis positions were more accurate. Could you comment on the differences in accuracy between algorithms using all diagnosis positions and algorithms using the "most responsible diagnosis" only? I think the finding that algorithms should use all diagnosis positions is interesting in itself and should be supported with the evidence it is based on.

3. A large number of persons were excluded from the cohorts due to hospital acquired disease. I agree with the importance to exclude records for these episodes of disease to identify community acquired disease, but I am unsure of the need to exclude the persons' records entirely. Were all records for a person who ever had hospital acquired influenza excluded and if so, why did you make this choice? Intuitively, the records for these persons should be used, at a minimum for other seasons, especially since this cohort might be systematically different (possibly more comorbidities, older, overall more susceptible etc.) from people who never had hospital-acquired influenza.

Regarding data availability, the authors have specified why their data is not fully publicly available.

References:

Bond-Smith D, Seth R, de Klerk N, Nedkoff L, Anderson M, Hung J, Cannon J, Griffiths K, Katzenellenbogen JM. Development and Evaluation of a Prediction Model for Ascertaining Rheumatic Heart Disease Status in Administrative Data. Clin Epidemiol. 2020;12:717-730

https://doi.org/10.2147/CLEP.S241588

Reviewer #2: The aim of the study was to identify valid algorithms for identifying influenza- and RSV-associated infections resulting in hospitalisation. This is a noble and important goal, since oftentimes population-based researchers do not have access to laboratory data. The algorithms developed as part of this study will be useful for future research. I have just a few points where I thought additional consideration or clarification would be helpful:

1. Because the reference cohort was selected based on laboratory data, by nature, untested individuals are excluded. This is a major limitation of the study, since those who are tested vs. not can be very different. For example, tested individuals could have presented with much more severe infection or may represent certain population groups of interest. I appreciate the authors mention this (in brief) on page 15, but I feel this point requires more attention. It is not just residents of long-term care facilities who may be less likely to be tested, testing varies by a number of other factors. I think this should be discussed in more detail - particularly with regard to how this may influence the study results and generalizability.

2. Line 288: I didn’t quite follow the statement: “Differences in reference cohort definitions directly influence the resulting algorithm validity” – while this is certainly true, I didn’t understand how the authors linked this statement to the justification for inclusion of PCR-only results. Based on a positive immunofluorescence or culture results, we could still confidently say that an individual had influenza. While it makes sense to exclude serology, the absence of non-PCR testing for virus detection could introduce some selection bias. Is this what the authors meant? Please confirm (Also note that Western Australia (data published by Moore et al.) mostly performs testing by PCR, similar to Ontario.)

3. Could the authors provide more detail on the definition of influenza and RSV seasons? These were selected independently, which is appropriate, but I wasn’t sure whether RSV season typically extended months after influenza season – or if in some winters, RSV activity started before influenza season. Some further information here would be helpful, since as the authors point out – seasonality can strongly influence the validity of diagnostic coding.

4. It makes sense to me that the best performing algorithm are the codes where virus is identified, as the laboratory data and the coding are not necessarily independent. Medical coding is performed at discharge, when testing results may already be available. I think this is worth mentioning in the interpretation of findings.

Reviewer #3: This manuscript evaluates the validity of ICD-10 codes associated with influenza- and RSV-related hospital admissions and has identified algorithms (of ICD-10 codes) that best identify patients associated with these two infections. Although there have been several publications about the use of ICD codes to identify specific respiratory infection-related hospitalisations across the world, this paper uses the best data source and provides evidence for the Ontario region. I don’t have any concerns with regard to the data used or the analysis. However, as the authors have rightly pointed out in the manuscript, due to differences in coding (and even laboratory testing) practices between hospitals/jurisdictions/regions/countries, the generalisability of the identified algorithms to non-Ontario based cohorts is limited.

I just have one very minor comment: The authors have looked at four respiratory virus seasons: 2014–2015, 2015–2016, 2016–2017 and 2017–2018. However, reading the abstract and the methods/results gives the impression that only two seasons (2014–2015 and 2017–2018) were used or compared. The authors might want to make this clear.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Daniela Bond-Smith

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jan 7;16(1):e0244746. doi: 10.1371/journal.pone.0244746.r002

Author response to Decision Letter 0


5 Nov 2020

Dear Reviewers and Editors,

We thank you for your time, comments and expertise. Please find a point-by-point response to your comments below.

Academic Editor:

Comment 1: Please ensure that your manuscript meets PLOS ONE’s style requirements, including those for file naming.

Response 1: We have updated the manuscript to follow PLOS ONE’s style requirements. Notably, we have re-formatted the headings and the figure and table titles, updated the file names, and added a supporting information section at the end of the manuscript.

Comment 2: We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Response 2: The dataset from this study is held securely in coded form at ICES. Data sharing agreements between ICES and the Ontario Ministry of Health and Long-Term Care outlined in Ontario’s Personal Health Information Protection Act legally prohibit ICES from making the dataset publicly available, as it may contain personally identifiable information. Therefore, due to our legally binding agreements, we cannot publicly share the dataset from this study. Certain individuals may be granted access to the data if they meet pre-specified criteria for confidential access. One can request access to the data from this study at www.ices.on.ca/DAS. One can also contact ICES Data & Analytic Services by email at das@ices.on.ca, or by phone at 1-844-848-9855. This information has been outlined in our cover letter.

Comment 3: We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement.

Response 3: Thank you for highlighting this discrepancy. We have outlined an updated funding statement in our cover letter. One piece of funding-related text remains in our acknowledgments statement. Our organization, ICES, is an independent, non-profit research institute funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). Through our agreements with the MOHLTC, we are required to mention the MOHLTC’s funding support for ICES in our acknowledgments. We do not feel that it is appropriate to place this text in our funding statement because the MOHLTC did not directly fund this study.

Reviewer #1: This is an important study addressing limitations in the existing literature about the validity of ICD-10 coded administrative data for influenza and RSV case ascertainment.

Comment 1: The algorithms do not take into account likely differences between hospitals. In a study with similar objectives for a different disease (reference below for your information), we found that inter-hospital differences were important in predicting the validity of case ascertainment through ICD-10 codes, because of implicit differences in de facto coding practices between hospitals and over time (i.e., ultimately, individual medical coders). While there are coding guidelines, there is likely still some variability in how individual coders apply these rules in practice. At a minimum, this hypothesis should be evaluated as part of algorithm development. In our study, we used hospital random effects to capture facility-specific differences, but other approaches are possible.

Response 1: We thank the reviewer for this important comment. Inter-hospital differences across Ontario must be considered for individual-level predictive analyses. As the reviewer described, it is plausible that variability in coding and testing practices may influence algorithm validity at the single-center scale. Importantly, our algorithms are not meant to be applied to single centers, but rather to a province-wide hospital discharge abstract database to generate broad cohorts of Ontario patients hospitalized with influenza or respiratory syncytial virus. We have identified the most valid algorithms to be applied across the entire province. We acknowledge that the most valid algorithms for single centers may differ. This is an interesting research question to examine, but it extends beyond the current research objectives. We highlight this limitation in the discussion of our manuscript, stating it may be necessary to re-validate the algorithms when applying them to populations that differ from that in our study.
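
For illustration only, the kind of facility-level check the reviewer describes could be sketched as follows in Python. The data frame and column names (hospital_id, icd_flag, lab_confirmed) are hypothetical, not fields from our actual dataset:

    import pandas as pd

    def per_hospital_validity(df: pd.DataFrame) -> pd.DataFrame:
        # df: one row per hospitalization; icd_flag is the ICD-10 algorithm
        # result and lab_confirmed is the laboratory reference standard (0/1).
        def stats(g: pd.DataFrame) -> pd.Series:
            tp = int(((g["icd_flag"] == 1) & (g["lab_confirmed"] == 1)).sum())
            fp = int(((g["icd_flag"] == 1) & (g["lab_confirmed"] == 0)).sum())
            fn = int(((g["icd_flag"] == 0) & (g["lab_confirmed"] == 1)).sum())
            return pd.Series({
                "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
                "ppv": tp / (tp + fp) if tp + fp else float("nan"),
            })
        return df.groupby("hospital_id").apply(stats)

Facility-level estimates of this kind could then be inspected for heterogeneity before committing to a single province-wide algorithm.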

Comment 2: You state that the algorithms using all diagnosis positions were more accurate. Could you comment on the differences in accuracy between algorithms using all diagnosis positions and algorithms using "most responsible diagnosis" only? I think, the finding that algorithms should use all diagnosis positions is an interesting finding in itself and should be supported with the evidence that it is based on.

Response 2: We thank the reviewer for highlighting the lack of clarity in our statement. Algorithms using the most responsible diagnosis code had marginally better specificity and positive predictive values (PPV), and substantially worse sensitivity, as compared to algorithms that used all medical diagnosis codes. The claim that “algorithms using all diagnosis codes were more accurate than algorithms using the most responsible diagnosis code only” was based on calculations of Youden’s index and Cohen’s kappa. We found that the Youden’s index was always larger when applying algorithms to all diagnosis codes compared to the most responsible diagnosis code. We further found that all algorithms (with the exception of our most general influenza algorithm and most general RSV algorithm) had larger Cohen’s kappa values when applying them to all diagnosis codes compared to the most responsible diagnosis code. We have provided algorithm validity when applied to the most responsible diagnosis code in Tables A and B in the S2 Appendix. A comparison of Youden’s index and Cohen’s kappa for algorithms applied to the most responsible diagnosis code versus all diagnosis codes can be seen in Tables C and D of the S2 Appendix. The claim in the manuscript has also been modified to improve transparency.
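
For reference, both summary measures follow directly from the 2-by-2 confusion table of algorithm result against laboratory reference standard. The sketch below uses the standard definitions with hypothetical counts, not values from our study:

    # Hypothetical 2x2 confusion-table counts (not study data).
    tp, fp, fn, tn = 900, 60, 350, 9000
    n = tp + fp + fn + tn

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden_j = sensitivity + specificity - 1  # Youden's index

    # Cohen's kappa: chance-corrected agreement between the algorithm
    # and the laboratory reference standard.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)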

Comment 3: A large number of persons were excluded from the cohorts due to hospital acquired disease. I agree with the importance to exclude records for these episodes of disease to identify community acquired disease, but I am unsure of the need to exclude the persons' records entirely. Were all records for a person who ever had hospital acquired influenza excluded and if so, why did you make this choice? Intuitively, the records for these persons should be used, at a minimum for other seasons, especially since this cohort might be systematically different (possibly more comorbidities, older, overall more susceptible etc.) from people who never had hospital-acquired influenza.

Response 3: We thank the reviewer for highlighting this important point. In our study, individuals with hospital acquired disease were only excluded during the respiratory season in which they were identified as having hospital-acquired disease. Therefore, if these individuals were hospitalized with community-acquired disease in another season, their hospitalization event was included for that other season. We have provided additional clarity in our manuscript, and in the legend of Figure 1.

Individuals with hospital-acquired influenza or RSV were excluded for the entire respective season of infection because we did not have a way to differentiate secondary or tertiary tests associated with the initial nosocomial infection from secondary or tertiary tests that could have resulted from re-infection via interactions within their community.
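
Schematically, this exclusion operates at the person-season level rather than the person level. A simplified Python sketch, in which all data frame and column names are hypothetical rather than taken from our analytic code:

    import pandas as pd

    def exclude_nosocomial_seasons(events: pd.DataFrame,
                                   nosocomial: pd.DataFrame) -> pd.DataFrame:
        # Drop a person's hospitalization events only for the seasons in
        # which they had a hospital-acquired infection; their events from
        # all other seasons are retained.
        flagged = nosocomial[["person_id", "season"]].drop_duplicates()
        merged = events.merge(flagged, on=["person_id", "season"],
                              how="left", indicator=True)
        return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")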

Reviewer #2: The aim of the study was to identify valid algorithms for identifying influenza- and RSV-associated infections resulting in hospitalisation. This is a noble and important goal, since population-based researchers often do not have access to laboratory data. The algorithms developed as part of this study will be useful for future research. I have just a few points where I thought additional consideration or clarification would be helpful:

Comment 1: Because the reference cohort was selected based on laboratory data, by nature, untested individuals are excluded. This is a major limitation of the study, since those who are tested vs. not can be very different. For example, tested individuals could have presented with much more severe infection or may represent certain population groups of interest. I appreciate the authors mention this (in brief) on page 15, but I feel this point requires more attention. It is not just residents of long-term care facilities who may be less likely to be tested; testing varies by a number of other factors. I think this should be discussed in more detail, particularly with regard to how this may influence the study results and generalizability.

Response 1: We thank the reviewer for their positive comments and feedback. In the discussion of this study’s limitations, we have provided more detail on how the selection of our reference cohort may affect the generalizability of the study results and bias future effect estimates.

Comment 2: Line 288: I didn’t quite follow the statement: “Differences in reference cohort definitions directly influence the resulting algorithm validity” – while this is certainly true, I didn’t understand how the authors linked this statement to the justification for inclusion of PCR-only results. Based on a positive immunofluorescence or culture result, we could still confidently say that an individual had influenza. While it makes sense to exclude serology, the absence of non-PCR testing for virus detection could introduce some selection bias. Is this what the authors meant? Please confirm. (Also note that Western Australia (data published by Moore et al.) mostly performs testing by PCR, similar to Ontario.)

Response 2: We thank the reviewer for highlighting the lack of clarity. The statement in line 288 was not meant to highlight selection bias by excluding non-PCR testing. Instead, it was meant to summarize that our reference cohorts varied from previous studies based on: the age of the patients; the population from which the patients were identified; and the testing methods used to distinguish true positive patients from true negative patients. Differences in all validity parameters may arise due to variability in any of these factors. For example, younger patients may be more likely to receive a test regardless of the severity of their symptoms – this may increase the proportion of individuals in the true negative cohort, most likely influencing the negative predictive value. Testing sensitivity and specificity depend on when a specimen is collected, how the specimen is collected, the type of laboratory test used, and the specific location where the test is run. At a specialized center, patients may be more likely to receive a test earlier rather than later, improving the probability that an individual who truly has an infection falls in the “true positive” cohort – this would most likely improve algorithm positive predictive values. Finally, ICD-10 codes may be used more or less frequently across jurisdictions and at particular institutions, influencing the sensitivity and specificity of an ICD-10 algorithm in differing populations.

We realize that some of this discussion is not directly relevant to our analysis of the differing sensitivities observed. Further, it is unlikely that test sensitivity and specificity were drastically different between the specialized hospital in Ottawa, Western Australian hospitals, and Ontario hospitals. Therefore, we have edited this section to discuss only the differences in sensitivity that we observed.
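
To make the direction of these effects concrete, the predictive values implied by a fixed sensitivity and specificity can be recomputed for any prevalence of true infection among tested patients using Bayes’ theorem. A small illustrative sketch, using values in line with our reported influenza algorithm (sensitivity 73%, specificity 99%) and hypothetical prevalences:

    def predictive_values(sens, spec, prev):
        # PPV and NPV implied by sensitivity, specificity, and the
        # prevalence of true infection among tested patients.
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
        return ppv, npv

    for prev in (0.05, 0.15, 0.30):
        ppv, npv = predictive_values(0.73, 0.99, prev)
        print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")

As expected, PPV rises and NPV falls as the prevalence of infection among tested patients increases, which is why cohort composition alone can shift predictive values even when the underlying algorithm is unchanged.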

Comment 3: Could the authors provide more detail on the definition of influenza and RSV seasons? These were selected independently, which is appropriate, but I wasn’t sure whether RSV season typically extended months after influenza season – or if in some winters, RSV activity started before influenza season. Some further information here would be helpful, since as the authors point out – seasonality can strongly influence the validity of diagnostic coding.

Response 3: Respiratory virus seasonality was defined according to Public Health Ontario’s Respiratory Pathogen Bulletin. Using that source, we created the most inclusive time frames that would capture seasonal influenza and RSV activity in Ontario between the 2014-15 and 2017-18 seasons. We have updated this section of the manuscript to define seasonality more clearly.

Comment 4: It makes sense to me that the best performing algorithms are the codes where virus is identified, as the laboratory data and the coding are not necessarily independent. Medical coding is performed at discharge, when testing results may already be available. I think this is worth mentioning in the interpretation of findings.

Response 4: Thank you for your comment. We agree that our findings were expected and have included a paragraph mentioning this in the interpretation of our findings.

Reviewer #3: This manuscript evaluates the validity of ICD-10 codes associated with influenza- and RSV-related hospital admissions and has identified algorithms (of ICD-10 codes) that best identify patients associated with these two infections. Although there have been several publications about the use of ICD codes to identify specific respiratory infection-related hospitalisation across the world, this paper uses the best data source and provides evidence for the Ontario region. I don’t have any concerns with regard to the data used or the analysis. However, as the authors have rightly pointed out in the manuscript, due to differences in coding (and even laboratory testing) practices between hospitals/jurisdictions/regions/countries, the generalisability of the identified algorithms to non-Ontario-based cohorts is limited.

Comment 1: The authors have looked at four respiratory virus seasons 2014-2015, 2015-2016, 2016-2017 and 2017-2018. However, reading the abstract and the methods/results gives the impression that only two seasons (2014-2015 and 2017-2018) were used or compared. The authors might want to make this clear.

Response 1: We thank the reviewer for their positive comments. We have updated the abstract to clarify which seasons were included in our analyses.

We look forward to your response.

Yours truly,

Jeff Kwong, on behalf of the authors

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Judith Katzenellenbogen

16 Dec 2020

Validating International Classification of Disease 10th Revision algorithms for identifying influenza and respiratory syncytial virus hospitalizations

PONE-D-20-24822R1

Dear Dr. Hamilton,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Judith Katzenellenbogen, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: My comments have been adequately addressed. The paper lays out a methodological approach for developing a prediction algorithm, with findings that are highly specific to the data it was developed for. It provides a useful contribution for others who may wish to develop a similar algorithm, but its findings may not generalize beyond the original data.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Acceptance letter

Judith Katzenellenbogen

28 Dec 2020

PONE-D-20-24822R1

Validating International Classification of Disease 10th Revision algorithms for identifying influenza and respiratory syncytial virus hospitalizations

Dear Dr. Hamilton:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Judith Katzenellenbogen

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Supplementary information for study methodology.

    (DOCX)

    S2 Appendix. Supplementary results.

    (DOCX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    Data sharing agreements between ICES and the Ontario Ministry of Health and Long-Term Care outlined in Ontario’s Personal Health Information Protection Act legally prohibit ICES from making the dataset publicly available. Therefore, due to our legally binding agreements, we cannot publicly share the dataset from this study. Qualified researchers can obtain access to the data required to replicate the analyses conducted in this study. One can request access to the data at https://www.ices.on.ca/DAS, by email at das@ices.on.ca, or by phone at 1-844-848-9855.

