Author manuscript; available in PMC: 2022 Aug 20.
Published in final edited form as: Methods Inf Med. 2022 Apr 5;61(1-02):38–45. doi: 10.1055/a-1817-7008

A methodological approach to validate pneumonia encounters from radiology reports using Natural Language Processing (NLP).

AlokSagar Panny a, Harshad Hegde a, Ingrid Glurich a, Frank A Scannapieco b, Jayanth G Vedre c, Jeffrey J VanWormer d, Jeffrey Miecznikowski e, Amit Acharya a,f,*
PMCID: PMC9391313  NIHMSID: NIHMS1810525  PMID: 35381617

Abstract

Introduction:

Pneumonia is caused by microbes that establish an infectious process in the lungs. The gold standard for pneumonia diagnosis is radiologist-documented pneumonia-related features in radiology notes that are captured in electronic health records in an unstructured format.

Objective:

The study objective was to develop a methodological approach for assessing validity of a pneumonia diagnosis based on identifying presence or absence of key radiographic features in radiology reports with subsequent rendering of diagnostic decisions into a structured format.

Methods:

A pneumonia-specific Natural Language Processing (NLP) pipeline was strategically developed applying cTAKES to validate pneumonia diagnoses following development of a pneumonia feature-specific lexicon. Radiographic reports of study-eligible subjects identified by International Classification of Diseases (ICD) codes were parsed through the NLP pipeline. Classification rules were developed to assign each pneumonia episode into one of three categories: “positive”, “negative” or “not classified: requires manual review” based on tagged concepts that support or refute diagnostic codes.

Results:

A total of 91,998 pneumonia episodes diagnosed in 65,904 patients were retrieved retrospectively. Approximately 89% (81,707/91,998) of the total pneumonia episodes were documented by 225,893 chest x-ray reports. The NLP pipeline classified 33% (26,800/81,707) of these episodes as 'pneumonia-positive', 19% (15,401/81,707) as 'pneumonia-negative', and 48% (39,209/81,707) as 'episode classification pending further manual review'. NLP pipeline performance metrics included accuracy (76.3%), sensitivity (88%), and specificity (75%).

Conclusion:

The pneumonia-specific NLP pipeline exhibited good performance comparable to other pneumonia-specific NLP systems developed to date.

Keywords: Pneumonia, Natural Language Processing, Knowledge Bases

1. Introduction

Pneumonia is caused by opportunistic or pulmonary pathogens that establish infectious/inflammatory processes in the lungs. The condition is dynamic, evolving with progression of the ongoing infectious process, such that diagnosis and management are informed mainly by clinical symptomology supported by observable radiologic manifestations that emerge over time with advancing disease [1]. Documentation of the absence or presence of these radiologic phenotypic features by a trained radiologist on x-rays or Computed Tomography (CT) images is the gold standard for validation of a pneumonia diagnosis [2,3]. Typically, putative cases are managed and treated empirically without the benefit of microbiological investigations, especially on initial presentation and assessment of symptomology.

Informatics and statistical approaches have been attempted to simplify and facilitate pneumonia diagnosis. However, validating a pneumonia diagnosis based solely on abstraction of structured data from electronic health records (EHR) is challenging. For example, among 175 putative cases who had received International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic codes, Drahos et al. [4] reported a positive predictive value (PPV) of 88% following validation involving manual review of infection-related symptoms, laboratory test results, and prescribed medications to confirm the pneumonia diagnoses. While their study suggested that ICD codes represent useful structured data elements to support initial identification of potential pneumonia cases and epidemiologic surveillance, etiological studies would require manual validation, including closer examination of phenotypic radiological features not readily available as structured data.

The gold standard for validating a pneumonia diagnosis is radiographic interpretation and confirmation documented by the radiologist in free-text documents. Notably, these data are not readily extractable as structured data from most EHRs. Evolution of pneumonia over the time frame defining an 'episode' is documented by serial radiographs that capture emergent cardinal features and clinical symptomology. The radiological features of pneumonia represent a finite number of definable terms based on emergent clinical features related to the pathological processes underlying this condition. Creation of a lexicon cataloguing these features is essential for development of a pneumonia-specific natural language processing (NLP) pipeline. Documenting the presence or absence of key radiographic features in text-based radiology reports following application of NLP is essential to validating or negating a pneumonia diagnosis.

Other reports have described creation of NLP tools/systems to validate or negate a pneumonia case based on textual documents available in the EHR, including the ONYX [5], MedLEE [6], MetaMap [7], and Multi-threaded Clinical Vocabulary Server (MCVS) [8] NLP systems. However, use of the cTAKES NLP tool to develop a pneumonia-specific NLP pipeline/system for pneumonia validation in text-based radiology reports in the EHR has not been reported previously. The authors previously developed a cTAKES-based NLP pipeline for classifying smoking status reported in text-based documents [9] and postulated that a pneumonia-specific lexicon for terms found in radiology notes could similarly be developed in cTAKES to classify pneumonia status. Comparison of the performance of our tool with previously developed tools is detailed in the discussion section.

This study describes the methodological approach used to create a pneumonia-specific NLP pipeline capable of classifying true pneumonia status among all putative pneumonia cases initially identified by ICD9/10 codes captured in the EHR of Marshfield Clinic Health System (MCHS). Pneumonia status validation was a task defined in the context of a funded study whose focus was a pneumonia subtype classification algorithm based on the setting in which pneumonia emerged (e.g., hospital-acquired vs. healthcare-acquired vs. community-acquired). Ultimately, the overarching study goal was to examine potential associations between pneumonia subtype and key oral health phenotypes definable from clinical data documented in electronic dental records. The design of the NLP pipeline, specifically developed to evaluate radiographic terminology documented as part of the radiologic notes/findings, is described, as is its application to detect 'presence' or 'absence' of key phenotypic characteristics, or the need for further manual review to achieve pneumonia classification status based on availability of key variables.

2. Methods

2.1. Cohort identification and data retrieval

This study was conducted in the MCHS patient population. MCHS is a large integrated healthcare system operating 50 outpatient clinics and seven hospitals that serve 34 largely rural communities throughout central, northern, and western Wisconsin. Patients with a pneumonia encounter occurring within the temporal window from 1/1/2007 to 12/30/2019 were identified retrospectively using ICD9 (480.0–487.0) or ICD10 (J12–J18.9) diagnostic codes. 'Rule of one' (patients with at least one pneumonia encounter) or 'rule of two' (patients with two or more pneumonia encounters within a defined period) criteria were applied based on the frequency of pneumonia diagnostic codes documented in the EHR. As outlined in Figure S1 (see Appendix), for a given patient, an incident pneumonia episode was identified by one or more pneumonia encounters occurring within a defined 90-day temporal window.

In patients with multiple pneumonia encounters separated in time, an encounter occurring ≥90 days after the index diagnosis marking a prior pneumonia event was defined, in consultation with the critical care pulmonologist (JGV), as a 'new' episode of pneumonia. Pneumonia-associated symptomology or diagnosis occurring less than 90 days after the index event (initial diagnosis of pneumonia) was classified as a 'recurrent' pneumonia encounter and defined as continuation of the prior pneumonia episode.
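The 90-day episode windowing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; encounter dates are assumed to have already been extracted from the EHR.

```python
from datetime import date, timedelta

# An encounter >= 90 days after the index date of the prior episode starts a
# 'new' episode; an encounter closer than that is a 'recurrent' encounter
# continuing the prior episode.
EPISODE_WINDOW = timedelta(days=90)

def group_into_episodes(encounter_dates):
    """Group a patient's pneumonia encounter dates into episodes."""
    episodes = []
    for d in sorted(encounter_dates):
        if episodes and d - episodes[-1][0] < EPISODE_WINDOW:
            episodes[-1].append(d)   # recurrent: continues the prior episode
        else:
            episodes.append([d])     # new episode; d becomes its index date
    return episodes
```

For example, encounters on 1/1, 2/1, and 6/1 of the same year yield two episodes, since the third encounter falls more than 90 days after the first episode's index date.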

2.2. Eligibility Criteria

Inclusion/exclusion criteria were defined to retain putative pneumonia cases with sufficient supporting documentation and to exclude cases lacking it.

Specifically, ‘eligible’ patients selected were assigned to one of two cohorts, based on their alignment with the following inclusion criteria:

  • Cohort 1: At least two pneumonia-related encounters (i.e. rule of 2) documented by ICD9/10 diagnosis codes during a given pneumonia episode (see Figure 1).

  • Cohort 2: One pneumonia encounter (i.e., rule of 1) during a given pneumonia episode documented by diagnosis codes (ICD9/10) and prescribed antibiotics (Table 1) and/or had a chest X-ray within ±30 days of their pneumonia encounter (see Figure 1).

Fig. 1.

Fig. 1

Summary of the total number of pneumonia episodes within each cohort based on definitions of eligibility criteria for inclusion in analyses.

Table 1:

Medication Variables

Class | Medication (antibiotics)
Macrolides | Azithromycin, clarithromycin, erythromycin
Fluoroquinolones | Moxifloxacin, gemifloxacin, levofloxacin, ciprofloxacin
Cephalosporins | Cefotaxime, cefepime, ceftazidime, ceftriaxone
Carbapenems | Doripenem, ertapenem, imipenem/cilastatin, meropenem
Penicillins | Ampicillin/sulbactam, piperacillin/tazobactam
Aminoglycosides | Amikacin, gentamicin, tobramycin, linezolid, vancomycin, colistin

The two cohorts were mutually exclusive. Patients with only one documented pneumonia encounter and no chest X-ray or antibiotics prescribed within ±30 days of their pneumonia encounter were excluded from further analyses. Cases meeting inclusion criteria were validated by parsing the radiology reports through the NLP pipeline.
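The cohort assignment and exclusion logic above can be summarized as follows. This is an illustrative sketch under stated assumptions (the function signature and field names are not the authors' schema); dates are assumed to be parsed from EHR records.

```python
from datetime import date, timedelta

# Supporting evidence (chest X-ray or qualifying antibiotic) must fall
# within +/-30 days of the episode's pneumonia encounter.
XRAY_WINDOW = timedelta(days=30)

def assign_cohort(n_encounters, xray_dates, antibiotic_dates, index_date):
    """Return 1, 2, or None (excluded) for one pneumonia episode."""
    if n_encounters >= 2:                      # rule of two -> cohort 1
        return 1
    near = lambda dates: any(abs(d - index_date) <= XRAY_WINDOW for d in dates)
    if n_encounters == 1 and (near(xray_dates) or near(antibiotic_dates)):
        return 2                               # rule of one + evidence -> cohort 2
    return None                                # excluded from analysis
```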

2.3. NLP Pipeline

The NLP pipeline was designed with the ultimate goal of electronically validating true cases of pneumonia among the 81,707 putative episodes that initially met inclusion criteria based on documentation by a total of 225,893 chest x-ray reports. Inclusion of an episode required documentation by at least one chest X-ray report. Pneumonia episodes were grouped into one of two cohorts and parsed through the NLP pipeline. Creating the NLP pipeline was an essential first step in achieving pneumonia status classification, as described below.

2.3.1. Lexicon

The NLP pipeline is outlined in Figure S2 (see Appendix). A dictionary of keywords (lexicon) enumerating positive and negative qualifiers for a confirmed pneumonia diagnosis was developed as the critical component of NLP pipeline functionality (Table S1, see Appendix). Terminology included in the dictionary was informed by a detailed literature review to obtain a list of key words for validating 'presence' or 'absence' of pneumonia in the radiology report. Further, the data abstraction manual developed by Dublin et al. [5] was reviewed, and a list of key words was extracted that could classify terminology in radiology reports as 'consistent' or 'inconsistent' with the presence of pneumonia. Complex features were manually reviewed. The keywords in the dictionary were mapped to a list of concept unique identifiers (CUIs) representing the primary radiological findings in chest X-ray reports validating presence/absence of pneumonia and its phenotypic features. The 'Clinical Text Analysis and Knowledge Extraction System' (cTAKES) [10], an open-source NLP system developed by Mayo Clinic that combines rule-based and machine learning (ML) techniques [11] to extract information from clinical documents, was used to analyze text-based interpretations of radiographic reports. The cTAKES platform was used to extract CUIs from the Unified Medical Language System (UMLS) database [12] to develop a domain-specific lexicon (Figure S3, see Appendix).
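The keyword-to-CUI mapping can be pictured with a small fragment like the one below. This is not the study's lexicon (which appears in Table S1), the CUI values are placeholders for illustration, and the naive substring tagger merely stands in for cTAKES concept extraction.

```python
# Illustrative lexicon fragment: qualifier keywords mapped to UMLS CUIs.
# Terms and CUI strings here are examples, not the study's full dictionary.
POSITIVE_QUALIFIERS = {"pneumonia": "C0032285", "consolidation": "C0521530"}
NEGATIVE_QUALIFIERS = {"atelectasis": "C0004144"}

def tag_concepts(report_text):
    """Naive keyword tagger standing in for cTAKES concept extraction."""
    text = report_text.lower()
    pos = [cui for term, cui in POSITIVE_QUALIFIERS.items() if term in text]
    neg = [cui for term, cui in NEGATIVE_QUALIFIERS.items() if term in text]
    return pos, neg
```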

2.3.2. Classification rules

A set of rules was further developed to support classification of each pneumonia episode into one of three categories: 1) 'positive', 2) 'negative', or 3) 'not classified: requires manual review', based on tagged concepts that validate or refute the diagnostic code. Text review of radiology notes was limited to the 'Findings' and 'Impression' sections, where radiologists summarized their observations and interpretations of the chest x-ray features important in classifying the episodes (see Figure 2). The 'Clinical indications' section of the notes was not included because it mostly represented the differential diagnosis of the clinician that prompted the chest x-ray order placed to further investigate suspicion of pneumonia.

Fig. 2.

Fig. 2

Examples of mapping between “positive” and “negative” qualifiers from radiology text and concept unique identifiers (CUIs) from UMLS.

A set of three rules was defined to identify putative pneumonia cases based on ‘positive’ and ‘negative’ qualifiers (concepts) from the radiology reports. These three rules included:

I. Rule of Hierarchy:

The presence of at least one positive qualifier in the radiographic notes classifies the episode as 'positive' for pneumonia. A hierarchical approach was followed in which a positive qualifier supersedes a negative qualifier in classifying episodes. Absence of either qualifier renders the episode 'unclassified'.

II. Rule of Majority:

In contrast to the hierarchical approach, episode classification was based on the 'majority' of qualifiers in the radiographic note classified as 'positive' or 'negative', counted across both the 'Findings' and 'Impression' sections.

III. Rule of Majority (‘I’):

Episode classification was based on the ‘majority’, or number of positive or negative qualifiers in the radiographic note, limited only to those found in the ‘Impression’ section of the radiological note.

This contrasts with the rules of hierarchy and majority that considered qualifiers in both ‘Findings’ and ‘Impression’ sections.

In the event of equivalent representation of both positive and negative qualifiers in a given episode, that episode was classified as positive.
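The three rules, including the tie-breaking behavior just described, can be sketched in one function. This is an illustrative rendering of the rules as stated above, not the authors' post-processor code; inputs are qualifier counts per section.

```python
def classify_episode(pos_findings, neg_findings, pos_impr, neg_impr, rule):
    """Classify one episode from qualifier counts in the two sections.

    rule: "hierarchy", "majority", or "majority_I" (Impression only).
    """
    pos, neg = pos_findings + pos_impr, neg_findings + neg_impr
    if rule == "hierarchy":
        # Any positive qualifier supersedes any negative qualifier.
        if pos:
            return "positive"
        return "negative" if neg else "not classified"
    if rule == "majority_I":
        # Restrict counting to the 'Impression' section.
        pos, neg = pos_impr, neg_impr
    if pos == 0 and neg == 0:
        return "not classified"
    # Ties between positive and negative qualifiers classify as positive.
    return "positive" if pos >= neg else "negative"
```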

A set of decision rules was adapted from the abstraction guide provided by Dublin et al. [5] to address inconsistencies in radiology reports. For example, in the sentence "There is parenchymal density present at the right lung base that is certainly consistent with a history of pneumonia/atelectasis", the NLP tool identifies both 'positive' (pneumonia) and 'negative' (atelectasis) concepts, thus lending ambiguity to definitive classification of such records. The decision rules shown in Table 2 were adapted to address the above-stated ambiguity.

Table 2:

Decision rules to classify radiology reports having both positive and negative concepts.

Pneumonia: Negative | Pneumonia: Positive
"pneumonia or atelectasis" | "pneumonia and atelectasis"
"pneumonia vs atelectasis" |
"pneumonia / atelectasis" |
"pneumonia and/or atelectasis" |
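One way to operationalize the Table 2 decision rules is with regular-expression patterns over the ambiguous phrase, as sketched below. This is an illustrative approach, not the authors' implementation; the patterns cover only the pneumonia/atelectasis pairing shown in the table.

```python
import re

# Phrases pairing the positive and negative concept with "or", "vs", "/",
# or "and/or" are resolved as pneumonia-negative (Table 2, left column);
# a plain "and" conjunction is resolved as pneumonia-positive.
NEGATIVE_PATTERNS = [
    r"pneumonia\s+or\s+atelectasis",
    r"pneumonia\s+vs\.?\s+atelectasis",
    r"pneumonia\s*/\s*atelectasis",
    r"pneumonia\s+and\s*/\s*or\s+atelectasis",
]

def resolve_ambiguity(phrase):
    """Return 'negative', 'positive', or None if no decision rule fires."""
    p = phrase.lower()
    if any(re.search(pat, p) for pat in NEGATIVE_PATTERNS):
        return "negative"
    if re.search(r"pneumonia\s+and\s+atelectasis", p):
        return "positive"
    return None
```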

2.3.3. Validation of NLP pipeline

To validate pneumonia case classification based on the defined rules, a subset of 90 radiological reports, related to 51 putative pneumonia encounters classified into 38 pneumonia episodes meeting inclusion criteria, was randomly chosen and analyzed (dataset 1). Subjects selected for parsing through the NLP pipeline to test the accuracy of the NLP-based pneumonia classification rules each had multiple pneumonia encounters occurring during multiple pneumonia episodes and multiple chest X-ray reports. Concurrently, the same 90 radiological reports used to test the NLP classification rules were independently reviewed manually by two reviewers. The first reviewer (JGV), a pulmonologist and intensivist, classified each pneumonia event based on extensive clinical expertise; pulmonologist review was deemed the 'gold standard'. Blinded to the classification of reviewer 1, a second reviewer (AP), a researcher with training in informatics and dentistry, independently classified the same set of records utilizing the CUI reference list (concepts) created to support NLP-assisted pneumonia classification. Inter-rater reliability calculated between the two reviewers indicated strong agreement (Kappa = 0.945). A further subset of 43 radiology reports, related to 22 putative pneumonia encounters grouped into 14 distinct pneumonia episodes, was randomly chosen (dataset 2). These records were parsed by the NLP tool and also independently classified by reviewer 2 (AP), blinded to the NLP classification, and the NLP outcomes for dataset 2 were compared with the outcomes of reviewer 2. A 95% confidence interval (CI) was calculated by the normal approximation to the binomial distribution. Fisher's exact test was used to test the calls from each rule for significance of association with the gold standard within each dataset.
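The normal (Wald) approximation to the binomial confidence interval used above takes a simple closed form:

```python
import math

def binomial_ci(successes, n, z=1.96):
    """95% CI for a proportion by the normal approximation (z = 1.96)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] so the interval stays a valid proportion.
    return max(0.0, p - half), min(1.0, p + half)
```

For example, 88 correct calls out of 100 yields an interval of roughly (81.6%, 94.4%).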

2.3.4. Classification of Pneumonia episodes

After the two rounds of validation and finalization of the classification rule set, all pneumonia encounters with chest x-ray reports of eligible patients with a putative pneumonia diagnosis, classified into one or more pneumonia episodes during the temporal window of the study (cohorts 1 and 2), were parsed through the cTAKES NLP pipeline. The final set of rules was then applied in the post-processor to generate output. Text in radiological reports containing potential pneumonia descriptors was parsed through the NLP pipeline, coding the output into cTAKES references. The output from cTAKES is a set of eXtensible Markup Language (XML) files containing information about concept occurrences and the CUIs associated with terms present in the notes. The XML files produced for individual radiological notes were collectively analyzed by a post-processor developed in Python 3 [13].
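A post-processing step of this kind might collect CUIs from the cTAKES XML output as sketched below. Note that cTAKES element and attribute names vary across versions, so the generic attribute scan here is an assumption for illustration, not the authors' parser.

```python
import xml.etree.ElementTree as ET

def extract_cuis(xml_path):
    """Collect every 'cui' attribute found anywhere in a cTAKES output file."""
    cuis = set()
    for elem in ET.parse(xml_path).getroot().iter():
        cui = elem.attrib.get("cui")
        if cui:
            cuis.add(cui)
    return cuis
```

The resulting CUI sets per note would then feed the classification rules described in section 2.3.2.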

2.4. Ethical considerations

The study underwent expedited review and was approved by the Institutional Review Board for Human Subjects Research of MCHS. The study qualified for expedited review because it involved only secondary review of existing clinical data without direct human subject involvement.

3. Results

3.1. Validation of NLP tool and classification rules.

Table 3 lists the performance metrics of the NLP pipeline for each of the defined rules compared with outcomes of manual review for both validation datasets 1 and 2. For each rule and each dataset, the association with the gold standard examined by Fisher's exact test was significant at the 0.05 level, indicating that the NLP rule calls have a significantly better-than-random pneumonia call rate.

Table 3:

NLP tool metrics displaying statistic (95% CI) and p-values from Fisher’s exact test for each dataset and rule.

Validation Datasets (N=Total # of radiology reports) Rules of classification
Dataset1 (N=90) Rule of Hierarchy % (95% CI) (p=0.026) Rule of Majority % (95% CI) (p=0.001) Rule of Majority ('I'*) % (95% CI) (p=0.002)
Accuracy 59.7% (48.4%, 71.1%) 76.3% (66.6%, 86.2%) 73.6% (63.4%, 83.8%)
Sensitivity 88.0% (64.6%, 100%) 88.0% (64.6%, 100%) 88.0% (64.6%, 100%)
Specificity 56.2% (44.1%, 68.4%) 75.0% (64.4%, 85.6%) 71.9% (60.9%, 82.9%)
False negative rate 12.0% (0.00%, 35.4%) 12.0% (0%, 35.4%) 12.0% (0%, 35.4%)
False positive rate  43.80% (31.6%, 55.9%) 25.0% (14.4%, 35.6%) 28.1% (17.1%, 39.1%)
Dataset2 (N=43) Rule of Hierarchy (p=0.002) Rule of Majority (p<0.001) Rule of Majority (‘I’) (p<0.001)
Accuracy 67.5% (52.5%, 82.7%) 83.0% (71.9%, 95.7%) 81.0% (68.5%, 93.7%)
Sensitivity 100% (100%, 100%) 90.0% (71.4%, 100%) 90.0% (71.4%, 100%)
Specificity 55.5% (36.8%, 74.3%) 81.4% (66.8%, 96.1%) 77.8% (62.1%, 93.5%)
False negative rate 0% (0%, 0%) 10.0% (0%, 28.6%) 10.0% (0%, 28.6%)
False positive rate  44.5% (25.7%, 63.2%) 18.6% (3.9%, 33.2%) 22.2% (6.5%, 37.9%)
* I: Positive or negative qualifiers in the radiographic report limited only to those found in the "Impression" section.

** Total number of radiology reports designated as "not classified: requires manual review": 18% (24/133).

Accuracy of the NLP pipeline applying ‘rule of majority’ was 76.3% and 83% for dataset 1 and dataset 2, respectively (Table 3), when compared with classification by the independent reviewers. Collectively, among radiology reports in dataset 1 and dataset 2, 18% (24/133) were designated as ‘not classified: requires manual review’. The sensitivity and specificity of the NLP tool applying rule of majority were 88% and 75% respectively, for dataset 1. Performance of the NLP tool applying ‘rule of majority’ to dataset 2 showed sensitivity and specificity of 90% and 81.4%, respectively.
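The metrics reported in Table 3 all derive from a 2x2 confusion matrix of NLP calls against manual-review classification, as shown below. The counts in the usage example are illustrative (chosen to reproduce the dataset 1 sensitivity and specificity), not the study's actual cell counts.

```python
def metrics(tp, fp, fn, tn):
    """Performance metrics from confusion-matrix counts (NLP vs. manual review)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_negative_rate": fn / (tp + fn),  # 1 - sensitivity
        "false_positive_rate": fp / (tn + fp),  # 1 - specificity
    }
```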

3.2. NLP classification outcome

A total of 65,904 patients with a putative pneumonia diagnosis, classified into 91,998 pneumonia episodes, were identified retrospectively applying the inclusion/exclusion criteria within the temporal window spanning 1/1/2007 to 12/30/2019. As summarized in Figure 1, among the 91,998 episodes associated with a total of 225,893 chest x-ray reports, 89% (81,707/91,998) parsed through the NLP pipeline were supported by at least one chest X-ray report. Of the total number of episodes that met eligibility criteria (n=81,707), 60% (49,008/81,707) derived from cohort 1 and 40% (32,699/81,707) derived from cohort 2. In cohort 1, 39% (18,902/49,008) of the episodes were classified as 'pneumonia positive', 16% (7,898/49,008) as 'pneumonia negative', and 45% (22,208/49,008) as 'classification pending: requires manual review'. In cohort 2, 25% (8,195/32,699) of the episodes were classified as 'pneumonia positive', 23% (7,503/32,699) as 'pneumonia negative', and 52% (17,001/32,699) as 'not classifiable: pending further manual review'.

Of the total pneumonia episodes determined as ‘positive’ in cohort 1 and cohort 2, 73.5% (13,906/18,902) and 68.6% (5,625/8,195), respectively, also had at least one qualifying antibiotic prescription during the study. Among the total pneumonia episodes classified as ‘pneumonia negative’ and ‘not classified/requires manual review’ in cohort 1, 70% (5,519/7,898) and 63.5% (14,106/22,208), respectively, documented at least one qualifying antibiotic prescription during the study period. In cohort 2, 66.5% (4,990/7,503) of pneumonia episodes classified as ‘pneumonia negative’ and 64.1% (10,903/17,001) classified as ‘not classified/requires manual review’, had at least one documented qualifying antibiotic prescription.

4. Discussion

The application of NLP to confirm pneumonia status described herein was developed to support an ongoing study whose objective was classification of pneumonia based on the environment in which it evolved, or subtype (i.e., community/healthcare-acquired versus hospital-acquired, ventilator-associated, or aspiration pneumonia). A prerequisite of pneumonia sub-classification is validation of a true pneumonia diagnosis. This report describes the development of an NLP pipeline to support validation of 'true' pneumonia cases based on radiological features that are hallmarks of this condition. The study design extends the rule-based logic used in the Drahos et al. [4] study by initially applying ICD9/10 codes to electronically identify putative pneumonia cases. Following exhaustive definition of the radiographic features, an open-source NLP application was applied to radiology reports of putative cases to extract terminology that could validate or refute the presence of phenotypic characteristics of pneumonia.

Initial validation of the tool was achieved by testing its performance on two separate subsets of pneumonia records (datasets 1 and 2) that were parsed through the cTAKES NLP pipeline and also manually reviewed and classified by expert reviewers blinded to the NLP pipeline classification. Compared to expert manual review and classification, the cTAKES NLP pipeline accurately classified 76.3% (95% CI = 66.6%, 86.2%) and 83% (95% CI = 71.9%, 95.7%) of pneumonia chest X-ray reports in dataset 1 and dataset 2, respectively. Comparison of the NLP classification outcome with the clinical 'gold standard' (classification by an expert pulmonologist reviewer) yielded 88% sensitivity and 75% specificity, demonstrating strong performance and capacity for providing clinical decision support in ICUs for interpreting chest x-ray reports of patients with equivocal pneumonia diagnoses. The NLP tool exhibited high sensitivity and specificity, comparable to findings from previous studies that similarly applied radiology review to pneumonia validation. The ONYX NLP system developed by Dublin et al. [5] to validate pneumonia cases had sensitivity and specificity of 92% and 87%, respectively. Mendonça et al., who evaluated the feasibility of the MedLEE NLP system for identifying healthcare-associated pneumonia, reported sensitivity and specificity of 71% and 99%, respectively [6]. In another study to determine the accuracy of NLP for identifying patients with 'true' pneumonias, Elkin et al. used the Multi-threaded Clinical Vocabulary Server (MCVS) NLP system [8] and reported an accuracy of 96.25%, with sensitivity and specificity of 100% and 90.3%, respectively. A study team at Kaiser Permanente of Northern California evaluated the feasibility of NLP to identify pneumonia in chest radiographs of critically ill patients admitted to an intensive care unit [14] and reported sensitivity and specificity of 89.9% and 87.5%, respectively.

The pneumonia-specific cTAKES NLP tool described herein also had a negative predictive value (NPV) of 95.6%, comparable to that reported previously by Dublin et al. [5]. However, the tool showed a moderate positive predictive value (PPV) of 64.2%. Moreover, outcomes of NLP classification exhibited a low false negative rate of 11% and a moderate false positive rate of 22%.

4.1. Limitations

A limitation of this study was that the decision rules designed to assess the NLP output involved complex inferencing, and it was challenging to develop multiple decision rules to address ambiguities in the radiological text. However, since our main objective was to validate positive cases of pneumonia for a larger study attempting to develop a rule-based algorithm for classifying pneumonia by the setting in which it occurred, the number of episodes validated as 'positive' across cohorts 1 and 2 remained substantive (n=27,097).

Another limitation was that, in its present form, the pneumonia-specific cTAKES NLP tool was not able to utilize additional clinical knowledge (e.g., clinical notes of the patient) available in the EHR. These limitations impacted the overall performance of the NLP tool, as evidenced by the 22% false positive rate and the relatively high number of putative pneumonia cases classified as needing additional manual review. Future research will explore the utilization of clinical notes to improve NLP decision rules. Use of machine learning techniques to further decrease the rate of false positives and false negatives in classifying pneumonia episodes will also be investigated in future studies.

4.2. Clinical/Public Health Relevance

The study describes development of an NLP application uniquely built on cTAKES, a freely available open-source platform. The tool incorporates both ICD coding and an exhaustive lexicon of pneumonia-specific radiological terminology created for vetting phenotypic characteristics of pneumonia to validate or refute its presence. The methodological approaches and the lexicon are provided, offering this technology as a reproducible clinical NLP tool for pneumonia classification. The tool has applicability in the public health sector for epidemiological investigation of the environmental origin of pneumonia emergence. The cTAKES NLP pipeline proved useful for accomplishing the further study objective for which it was originally created, producing a dataset of validated positive pneumonia episodes eligible for further rule-based sub-classification by the environmental setting in which pneumonia emerged. Outcomes of pneumonia sub-classification based on setting of emergence, assisted by the cTAKES NLP pipeline, are described elsewhere [15].

5. Conclusions

We developed an accurate pneumonia-specific cTAKES NLP pipeline for classification of putative pneumonia episodes experienced by the MCHS patient population. The performance of our tool was high, consistent with pneumonia-specific tools developed in prior studies, and the approach is scalable to other healthcare systems. Notably, the cTAKES NLP pipeline offers a phenotypically refined informatics approach to EHR-based data for future studies requiring validation of pneumonia classification. The tool is particularly applicable in the context of defining pneumonia sub-classification or etiology. Future expansion of the tool to clinical notes is planned to further extend its capacity for pneumonia validation.

Supplementary Material

1

Acknowledgement

The authors thank Dr. Michael Jackson (Associate Investigator, Kaiser Permanente Washington Health Research Institute) and Dr. Sascha Dublin (Senior Scientific Investigator, Kaiser Permanente Washington Health Research Institute) for providing us with the abstractor guide for analyzing the pneumonia chest radiographs. The authors also thank Brooke Ellen Delgoffe and Erica Scotty from Office for Research Computing and Analytics, Marshfield Clinic Research Institute for their help with data abstraction.

Funding sources.

Research reported in this publication was supported by the National Institute of Dental & Craniofacial Research of the National Institutes of Health under Award Number 1R03DE027020-01A1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

Conflict of interest statement.

The authors do not have any conflicts of interest.

References:

  • 1. Glurich I, Shimpi N, Scannapieco F, Vedre J, Acharya A. Interdisciplinary care model: pneumonia and oral health. In: Acharya A, Powell V, Torres-Urquidy M, Posteraro R, Thyvalikakath T, editors. Integration of Medical and Dental Care and Patient Data. 2nd ed. Springer, Cham; 2019. p. 123–39. doi:10.1007/978-3-319-98298-4_9 [Accessed 2021 Jul 30]
  • 2. Franco J. Community-acquired pneumonia. Radiol Technol. 2017;88(6):621–38.
  • 3. Franquet T. Imaging of community-acquired pneumonia. J Thorac Imaging. 2018:282–94.
  • 4. Drahos J, VanWormer JJ, Greenlee RT, Landgren O, Koshiol J. Accuracy of ICD-9-CM codes in identifying infections of pneumonia and herpes simplex virus in administrative data. Ann Epidemiol. 2013;23(5):291–3.
  • 5. Dublin S, Baldwin E, Walker RL, et al. Natural language processing to identify pneumonia from radiology reports. Pharmacoepidemiol Drug Saf. 2013;22(8):834–41.
  • 6. Mendonça EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform. 2005;38(4):314–21.
  • 7. Chapman WW, Fiszman M, Dowling JN, Chapman BE, Rindflesch TC. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. 2004.
  • 8. Elkin PL, Froehling D, Wahner-Roedler D, et al. NLP-based identification of pneumonia cases from free-text radiological reports. AMIA Annu Symp Proc. 2008;2008:172.
  • 9. Hegde H, Shimpi N, Glurich I, Acharya A. Tobacco use status from clinical notes using natural language processing and rule-based algorithm. Technol Health Care. 2018;26(3):445–56.
  • 10. Savova GK, Masanz JJ, Ogren PV, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–13.
  • 11. Hegde H, Shimpi N, Panny A, Glurich I, Christie P, Acharya A. Development of non-invasive diabetes risk prediction models as decision support tools designed for application in the dental clinical environment. Inform Med Unlocked. 2019;17:100254.
  • 12. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue).
  • 13. Van Rossum G. Python reference manual. Amsterdam; 1995. https://ir.cwi.nl/pub/5008 [Accessed 2021 Jul 30]
  • 14. Liu V, Clark MP, Mendoza M, et al. Automated identification of pneumonia in chest radiograph reports in critically ill patients. BMC Med Inform Decis Mak. 2013;13:90.
  • 15. Hegde H, Glurich I, Panny A, et al. Identifying pneumonia sub-types from electronic health records using rule-based algorithms. Methods Inf Med. 2022 Mar 17. doi:10.1055/a-1801-2718. Epub ahead of print.
