Author manuscript; available in PMC 2014 Aug 1.
Published in final edited form as: Pharmacoepidemiol Drug Saf. 2013;22(8). doi: 10.1002/pds.3418

Natural Language Processing to Identify Pneumonia from Radiology Reports

Sascha Dublin 1,2, Eric Baldwin 1, Rod L Walker 1, Lee M Christensen 6, Peter J Haug 5,6, Michael L Jackson 1, Jennifer C Nelson 1,3, Jeffrey Ferraro 5,6, David Carrell 1, Wendy W Chapman 4
PMCID: PMC3811072  NIHMSID: NIHMS471897  PMID: 23554109

Abstract

Purpose

To develop Natural Language Processing (NLP) approaches to supplement manual outcome validation, specifically to validate pneumonia cases from chest radiograph reports.

Methods

We trained one NLP system, ONYX, using previously manually reviewed radiograph reports from children and adults. We then assessed its validity on a test set of 5,000 reports. We aimed to substantially decrease manual review, not replace it entirely, so we classified reports as 1) consistent with pneumonia, 2) not consistent with pneumonia, or 3) requiring manual review due to complex features. We developed processes tailored either to optimize accuracy or to minimize manual review. Using logistic regression, we jointly modeled the sensitivity and specificity of ONYX in relation to patient age, comorbidity, and care setting. We estimated positive and negative predictive value (PPV and NPV) assuming the pneumonia prevalence observed in the source data.

Results

Tailored for accuracy, ONYX identified 25% of reports as requiring manual review (34% of true pneumonias and 18% of non-pneumonias). For the remainder, ONYX's sensitivity was 92% (95% confidence interval [CI] 90-93%), specificity 87% (86-88%), PPV 74% (72-76%), and NPV 96% (96-97%). Tailored to minimize manual review, ONYX classified 12% as needing manual review and, for the remainder, had sensitivity 75% (72-77%), specificity 95% (94-96%), PPV 86% (83-88%), and NPV 91% (90-91%).

Conclusions

For pneumonia validation, ONYX can replace almost 90% of manual review while maintaining low to moderate misclassification rates. It can be tailored for different outcomes and study needs and thus warrants exploration in other settings.

Keywords: pneumonia, Natural Language Processing, sensitivity, specificity, validity

INTRODUCTION

Pneumonia is common and can have severe consequences in older adults. A growing literature suggests that some medications increase pneumonia risk.1-5 Pharmacoepidemiologic studies often identify pneumonia cases within large databases using International Classification of Diseases, version 9 (ICD-9) codes or the equivalent. However, ICD-9 codes lack accuracy for pneumonia: in validation studies, their sensitivity has ranged from 48 to 80% and positive predictive value (PPV) from 73 to 81%.6,7 Misclassification of outcomes can limit statistical power and bias study results. Some studies have reviewed medical records to validate cases,1,3,8 but this approach is costly and time-consuming. Automated methods for outcome validation would be very helpful and could also be used for clinical decision support or public health surveillance.

With the growing use of electronic medical records, automated outcome validation may be possible using Natural Language Processing (NLP), in which a computer processes free text to create structured variables. Pneumonia is suited to this approach because the diagnosis requires a positive chest radiograph,9,10 and chest radiograph reports have fairly standard format and language. Several studies have used NLP to identify pneumonia from clinical texts,11-14 with sensitivity from 64 to 100% and specificity from 85 to 99%. Most prior studies examined relatively few reports and included few pneumonia cases. Little is known about accuracy in the outpatient setting, where pneumonia is often diagnosed. No prior study evaluated NLP as a filter for chart review.

Our aim was to develop NLP approaches to validate pneumonia cases from electronic radiology reports. Because we knew that certain reports are challenging to classify, we aimed to replace a large portion of manual review with NLP, but not all. We used reports that were previously manually reviewed to train one NLP tool, ONYX,15 and assess its accuracy. We tailored our approach for different scenarios to explore trade-offs between efficiency and accuracy.

METHODS

Setting

Group Health (GH) is an integrated healthcare delivery system in the Northwest United States with extensive electronic health data. GH members have coverage through employer-based plans, individual plans, Medicare, and Medicaid. The racial and ethnic composition is similar to the surrounding region, including 79% Caucasian, 3% African-American, 8% Asian/Pacific Islander, 1% Native American, 5% Hispanic and 3% other race.

This study was approved by the GH Human Subjects Review Committee with a waiver of consent.

Data Sources

The gold standard measure of pneumonia came from medical record reviews performed for the Pneumonia Surveillance Study (PSS).16 Presumptive cases were identified from ICD-9 codes (480-487.0 or 507.0) for GH members of all ages between 1998 and 2004.16 Trained abstractors reviewed about 93,000 electronic chest radiograph reports (Table 1) to determine whether an infiltrate was present or whether the radiologist interpreted the report as showing pneumonia. To improve consistency and ensure that abnormal findings were likely to represent pneumonia, abstractors were given detailed instructions (manual available on request). For instance, infiltrates described as streaky, nodular, mass-like, or consistent with atelectasis did not qualify. Inter-rater agreement was 94% (kappa 0.84) and intra-rater agreement 95% (kappa 0.87). For the current analyses, the gold standard for pneumonia was that a report described an infiltrate meeting study criteria or contained a radiologist's interpretation that pneumonia was present.

Table 1.

Characteristics of the source population of radiology reports and the test set of 5,000 reports used for validation

Values are n (%). The first three columns summarize the source dataset*; the last column is the test set‡.

Characteristic | All reports (N=93,110) | Positive for pneumonia† (N=26,345) | Negative for pneumonia† (N=66,765) | Test set (N=5,000)
Patient age (years)
 0-4 | 7,130 (8) | 3,084 (12) | 4,046 (6) | 423 (8)
 5-19 | 7,839 (8) | 3,426 (13) | 4,413 (7) | 439 (9)
 20-44 | 12,248 (13) | 3,311 (13) | 8,937 (13) | 609 (12)
 45-64 | 24,984 (27) | 5,840 (22) | 19,144 (29) | 1,256 (25)
 65-74 | 14,466 (16) | 3,538 (13) | 10,928 (16) | 840 (17)
 75-84 | 17,940 (19) | 4,752 (18) | 13,188 (20) | 987 (20)
 85+ | 8,503 (9) | 2,394 (9) | 6,109 (9) | 446 (9)
Pneumonia case | 26,345 (28) | 26,345 (100) | 0 (0) | 2,200 (44)
Comorbidities
 Congestive heart failure | 14,424 (15) | 3,712 (14) | 10,712 (16) | 877 (18)
 Chronic lung disease | 29,079 (31) | 8,240 (31) | 20,839 (31) | 1,767 (35)
 Cancer | 12,285 (13) | 3,255 (12) | 9,030 (14) | 747 (15)
Setting of care
 Outpatient | 86,028 (92) | 23,926 (91) | 62,102 (93) | 4,598 (92)
 Inpatient | 6,366 (7) | 2,191 (8) | 4,175 (6) | 362 (7)
 Missing | 716 (1) | 228 (1) | 488 (1) | 40 (1)

* A dataset of 93,110 chest radiograph reports that were previously manually reviewed for a study of pneumococcal conjugate vaccine.16

† According to manual review.

‡ Oversampled for true pneumonia cases (according to manual review) and comorbidities.

In addition to using PSS data, we also trained ONYX using reports from a second study that used similar methods.

Information about subjects’ age, care setting (outpatient vs. inpatient), and comorbidities (congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), or cancer) came from GH automated data. Comorbidities were defined based on ICD-9 codes from the prior 12 months. We used automated data to identify the individual radiologist reading the report.
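As a rough illustration of this kind of derivation (a sketch only, not the study's actual programs; the ICD-9 prefixes, table layouts, and column names below are hypothetical), comorbidity flags could be built by joining diagnosis records to each report with a 12-month lookback:

```python
import pandas as pd

# Placeholder ICD-9 prefixes purely for illustration; the study's actual code
# lists are not given in this paper.
COMORBIDITY_PREFIXES = {"chf": ["428"],
                        "chronic_lung_disease": ["491", "492", "496"],
                        "cancer": ["153", "162", "174"]}

def flag_comorbidities(reports: pd.DataFrame, diagnoses: pd.DataFrame) -> pd.DataFrame:
    """reports: one row per radiograph (patient_id, report_date);
    diagnoses: one row per coded diagnosis (patient_id, dx_date, icd9).
    Flags a comorbidity if any matching code appears in the 12 months before the report."""
    merged = reports.merge(diagnoses, on="patient_id", how="left")
    in_window = ((merged["dx_date"] <= merged["report_date"]) &
                 (merged["dx_date"] > merged["report_date"] - pd.Timedelta(days=365)))
    for name, prefixes in COMORBIDITY_PREFIXES.items():
        pattern = "|".join(prefixes)
        merged[name] = in_window & merged["icd9"].astype(str).str.match(pattern, na=False)
    # Collapse back to one row per report, keeping True if any qualifying code was found.
    return (merged.groupby(["patient_id", "report_date"], as_index=False)
                  [list(COMORBIDITY_PREFIXES)].any())
```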

NLP Tools: ONYX and ConText

ONYX is an open-source NLP system (available at http://code.google.com/p/onyx-mplus/) that integrates knowledge about syntax (the structure of sentences) and semantics (the meaning of words) to interpret free text and produce structured output.15 ONYX can be trained on documents from a particular domain (e.g., the pulmonary domain), and it helps the user create training cases. Its output is a set of concepts identified from individual sentences. For instance, from the phrase “ill-defined density in the right lower lobe,” ONYX generates a concept of “localized infiltrate” with a location of “right lower lobe.”

Decision rules are applied to ONYX's output to classify the report as a whole. We designed rules to classify reports as 1) consistent with pneumonia, 2) inconsistent with pneumonia, or 3) needing manual review (when a report had certain prespecified complex features). Because studies' needs differ, we developed two classifiers. Both are based on the PSS abstraction manual, and both treat ONYX as a filter for chart review, but the extent of filtering differs. The first classifier is designed to prioritize accuracy (minimizing false positives and false negatives) at the cost of more reports needing manual review. This approach might be useful for a study with greater resources to support manual review. The second classifier is designed to minimize manual review, as might be preferred when resources are limited, although we thought this efficiency might come at the cost of higher misclassification.

The first classifier identifies reports as needing manual review if any of the following are present: 1) seemingly inconsistent statements about pneumonia (e.g. infiltrates present and also absent); 2) atelectasis and pneumonia identified in a single report; or 3) state change language (e.g. infiltrate “improving” or “resolved”). Online Appendix Table 1 provides more detail. For the second classifier we made the following changes: 1) reports containing inconsistent statements were classified based on the most frequently occurring concept (pneumonia vs. not pneumonia); 2) reports with both atelectasis and pneumonia were classified as not pneumonia; and 3) ONYX’s output was processed by an algorithm called ConText to more accurately detect negated and resolved findings and to distinguish clinical history from current findings.17
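To make the report-level logic concrete, below is a minimal, hypothetical sketch of these decision rules. The concept representation is invented for illustration (it is not ONYX's actual output schema), only a few of the criteria in Online Appendix Table 1 are shown, and ConText's handling of negation and history is not reproduced:

```python
# Hypothetical concept labels; ONYX's real output vocabulary is richer.
PNEUMONIA_CONCEPTS = {"localized infiltrate", "pneumonia"}
STATE_CHANGE_TERMS = {"improving", "resolving", "resolved"}

def classify_report(concepts, minimize_review=False):
    """Classify one report from a list of concept dicts, e.g.
    {"concept": "localized infiltrate", "negated": False, "state_change": None}.
    minimize_review=False approximates Classifier 1; True approximates Classifier 2."""
    positive = [c for c in concepts
                if c["concept"] in PNEUMONIA_CONCEPTS and not c["negated"]]
    negative = [c for c in concepts
                if c["concept"] in PNEUMONIA_CONCEPTS and c["negated"]]
    atelectasis = any(c["concept"] == "atelectasis" for c in concepts)
    state_change = any(c.get("state_change") in STATE_CHANGE_TERMS for c in concepts)
    inconsistent = bool(positive) and bool(negative)

    if not minimize_review:
        # Classifier 1: divert complex reports to manual review.
        if inconsistent or (positive and atelectasis) or state_change:
            return "needs manual review"
    else:
        # Classifier 2: resolve some ambiguities automatically.
        if inconsistent:
            return ("consistent with pneumonia" if len(positive) > len(negative)
                    else "not consistent with pneumonia")
        if positive and atelectasis:
            return "not consistent with pneumonia"

    return "consistent with pneumonia" if positive else "not consistent with pneumonia"

# Example: an infiltrate plus atelectasis is diverted to manual review under Classifier 1.
print(classify_report([{"concept": "localized infiltrate", "negated": False, "state_change": None},
                       {"concept": "atelectasis", "negated": False, "state_change": None}]))
```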

ONYX is based on a family of NLP systems12,18 developed largely for chest radiograph interpretation. Our initial training data, drawn from these systems, consisted of phrases extracted from chest radiograph reports and their relationships to relevant concepts. Next, ONYX was applied to 30 GH reports, and the results were manually reviewed and corrected. This process allows ONYX to “learn” from mistakes, creating new associations that are applied to future texts. We then applied ONYX to 1,000 PSS reports and targeted further training at ONYX's mistakes. In addition to editing errors in ONYX's output, we made changes to ONYX's processing engine, such as adding a new semantic or syntactic grammar rule. Finally, ONYX's performance was assessed in an independent test set of 5,000 PSS reports oversampled for pneumonia (based on manual review) and comorbidities to improve the precision of the test statistic estimates for patient subgroups.

Statistical Analyses

For each classifier, we calculated the proportion of reports that ONYX classified as requiring manual review and then estimated sensitivity and specificity for the remaining reports. Sensitivity is the proportion of true pneumonia reports that ONYX correctly identified as showing pneumonia. Specificity is the proportion of non-pneumonia reports that ONYX correctly identified as not showing pneumonia. Because we wanted to examine how these measures varied by patient characteristics, we did not calculate them directly but instead modeled them using multivariable logistic regression. Specifically, we jointly modeled ONYX's true positive rate (sensitivity) and false positive rate (1 minus specificity) as a function of patient age, comorbidities, and care setting.19 Interaction terms between each characteristic and pneumonia status allowed estimation of both the true and false positive rates in a single model. We used weights to account for oversampling and generalized estimating equations to obtain standard errors that account for potential correlation among multiple reports from the same patient.20 Using the coefficients from this model, we estimated ONYX's overall sensitivity and specificity in a population with the same characteristics as the source population (using the predictive margins method).21,22 Finally, from the estimated sensitivity and specificity, we calculated PPV and NPV assuming the pneumonia prevalence of the source population. Specifying prevalence is important because PPV increases as prevalence rises (the proportion of true positives increases relative to false positives), whereas NPV decreases as prevalence rises.
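The dependence of PPV and NPV on prevalence follows from Bayes' theorem. A minimal sketch (not the authors' SAS code) illustrates this final step using the Classifier 1 point estimates reported in the Results:

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Classifier 1 point estimates at the source-population prevalence of 28%:
print(f"PPV = {ppv(0.92, 0.87, 0.28):.2f}")  # ~0.73, consistent with the reported 74%
print(f"NPV = {npv(0.92, 0.87, 0.28):.2f}")  # ~0.97, consistent with the reported 96% (96-97%)
```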

We also estimated ONYX’s performance for individual radiologists who had read at least 100 reports from the test set.

Results showed that Classifier 2 had markedly higher specificity and PPV than Classifier 1. This was in part expected because of the new decision rule that arbitrarily classified certain ambiguous reports as not showing pneumonia. As a post hoc analysis, we explored the factors contributing to Classifier 2’s higher specificity by comparing the classifiers’ performance on the subset of reports that both could classify and examining Classifier 2’s accuracy for the reports that only it was able to classify.

We examined the implications of our findings for a hypothetical study seeking to validate 10,000 potential pneumonia cases, assuming the same prevalence as the source population. We calculated the number of reports that would need manual review and the numbers of false negatives and false positives under three scenarios: manual review without ONYX's assistance, ONYX with Classifier 1 (tailored for accuracy), and ONYX with Classifier 2 (tailored to decrease manual review).
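As an illustration of how such counts can be derived, the sketch below recomputes the ONYX scenario totals from the review fractions, sensitivity, and specificity reported in the Results; it is an approximation based on rounded point estimates rather than the authors' exact calculations.

```python
def hypothetical_study(n=10_000, prev=0.28,
                       review_pos=0.34, review_neg=0.18,
                       sens=0.92, spec=0.87):
    """Defaults approximate ONYX with Classifier 1; use review_pos=0.18,
    review_neg=0.07, sens=0.75, spec=0.95 to approximate Classifier 2."""
    pos, neg = n * prev, n * (1 - prev)              # 2,800 true pneumonias, 7,200 without
    manual = review_pos * pos + review_neg * neg     # charts still read by abstractors
    auto_pos, auto_neg = pos * (1 - review_pos), neg * (1 - review_neg)
    return {"manual review": round(manual),
            "true positives": round(sens * auto_pos),
            "false negatives": round((1 - sens) * auto_pos),
            "true negatives": round(spec * auto_neg),
            "false positives": round((1 - spec) * auto_neg)}

print(hypothetical_study())
# {'manual review': 2248, 'true positives': 1700, 'false negatives': 148,
#  'true negatives': 5136, 'false positives': 768}
```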

Analyses were performed using SAS software, version 9.2 (SAS Institute, Inc., Cary, NC).

RESULTS

Table 1 shows characteristics of chest radiograph reports in the source dataset of 93,110 reports and the test set of 5,000 reports. Patients’ mean age in the test set was 55 years, and 92% of reports came from the outpatient setting. Outpatient reports predominate because GH patients requiring hospitalization are cared for in outside hospitals, so their inpatient radiograph reports are not available.

When ONYX and the pneumonia classifier were tailored to maximize accuracy (Classifier 1), 25% of reports were identified as requiring manual review (34% of true pneumonias and 18% of non-pneumonias). This proportion varied by age, care setting, and presence of chronic lung disease (see Online Appendix Figure 1). For the remaining reports, ONYX had an estimated sensitivity of 92% (95% CI, 90-93%), specificity 87% (86-88%), PPV 74% (72-76%), and NPV 96% (96-97%); see Figures 1A-1C. Sensitivity differed slightly by age but not comorbidity or care setting. Specificity was lowest in those aged 65 or above and was lower for people with COPD than those without COPD.

Figure 1. ONYX's performance under conditions aimed at improving accuracy (allows more reports to be designated as needing manual review).

A. Specificity*
B. Sensitivity*
C. Positive predictive value*†

*Compared to gold standard from manual review of reports.
†PPV estimated assuming prevalence of pneumonia in the source dataset (28%).
‡Comorbid conditions ascertained from health plan automated diagnosis data (International Classification of Diseases, version 9, codes).

Tailored to minimize manual review (Classifier 2), ONYX classified 12% of reports as needing manual review, including 18% of true pneumonias and 7% of non-pneumonias. For the remaining reports, sensitivity was 75% (72-77%) and specificity 95% (94-96%). Sensitivity varied considerably by age but specificity varied little, and neither varied substantially by comorbidity or care setting (Figures 2A-2C). The estimated PPV was 86% (83-88%) and NPV, 91% (90-91%).

Figure 2. ONYX's performance under conditions aimed at decreasing the proportion of reports needing manual review.

A. Specificity*
B. Sensitivity*
C. Positive predictive value*†

*Compared to gold standard from manual review of reports.
†PPV estimated assuming prevalence of pneumonia in the source dataset (28%).
‡Comorbid conditions ascertained from health plan automated diagnosis data (International Classification of Diseases, version 9, codes).

Online Appendix Table 2 shows results of analyses exploring reasons for Classifier 2's higher specificity. Classifier 2 had high specificity both for the reports that both classifiers could classify and for the reports that only it could classify.

Figure 3 shows results for the 15 radiologists who had read at least 100 reports, graphed separately for each classifier. A few had particularly high (or low) values.

Figure 3. ONYX's accuracy for individual radiologists*

A. ONYX with Classifier 1 (prioritizes accuracy)
B. ONYX with Classifier 2 (prioritizes decreasing the amount of manual review); because Classifier 2 diverts fewer reports to manual review, additional reports are included in the analyses for Figure 3B that were not included in Figure 3A.

*Limited to the 15 radiologists who read 100 or more reports in the test set. Each number represents one radiologist, and each radiologist is assigned the same number in both graphs.

Table 2 shows the estimated outcomes if a hypothetical study used each of 3 approaches to validate 10,000 potential cases.

Table 2.

Trade-offs between efficiency and accuracy in a hypothetical study with 10,000 potential cases and true pneumonia prevalence the same as in the source population*

Values are counts of reports.

Scenario | Manual review of all records† | ONYX with Classifier 1 (prioritizes accuracy) | ONYX with Classifier 2 (prioritizes fewer manual reviews)
Number of charts requiring manual review | 10,000 | 2,248 | 1,008
 With true pneumonia | 2,800 | 952 | 504
 Without pneumonia | 7,200 | 1,296 | 504
ONYX: total incorrectly classified | 0 | 916 | 909
 False positives | 0 | 768 | 335
 False negatives | 0 | 148 | 574
ONYX: total correctly classified (no manual review needed) | 0 | 6,836 | 8,083
 True positives | 0 | 1,700 | 1,722
 True negatives | 0 | 5,136 | 6,361

* Assumes that reports classified by NLP as either consistent or inconsistent with pneumonia do not undergo further manual review.

† Compared to the gold standard of manual review; thus, by definition, there are no false positives or negatives when all reports undergo manual review.

DISCUSSION

We found that one NLP tool, ONYX, accurately identified a large proportion of pneumonia cases from electronic chest radiograph reports and could dramatically reduce manual review for outcome validation. Because research projects have varying needs, we created two versions of our tools. The first (aiming to maximize accuracy) had sensitivity 92%, specificity 87% and PPV 74% but designated 25% of reports as “needing manual review”. The second (aiming to decrease manual review) had sensitivity 75%, specificity 95%, and PPV 86%, with 12% of reports needing manual review. With this second classifier, ONYX could eliminate almost 90% of manual medical record review while maintaining a low false positive rate (5%) and moderate false negative rate (25%).

The amount of outcome misclassification that is acceptable will vary according to a project’s goals and resources. In some contexts, a false negative rate of 25% may be too high. It could lead to underestimation of pneumonia incidence or selection of cases that are not representative. In these settings, our first classifier may be preferable, with its false negative rate of only 8%. The trade-off is that more resources will be needed for manual review. Still, the false negative rates for both classifiers should be placed in context: most pharmacoepidemiologic studies define pneumonia from ICD-9 codes or the equivalent, which have false negative rates of 20-52%.6,7 With NLP research proceeding at a rapid pace, the accuracy of tools like ONYX will probably continue to improve.

Differences in accuracy between Classifiers 1 and 2 likely stem from several causes, including the addition of ConText, an NLP algorithm designed to improve handling of negation and historical information. Adding ConText should improve both sensitivity and specificity. Another factor was that we changed a decision rule so that certain ambiguous reports were automatically classified as not pneumonia. This decision, which was somewhat arbitrary, favors specificity over sensitivity, because shifting reports into the “not pneumonia” category inevitably produces more false negatives (reducing sensitivity) as well as more true negatives (improving specificity).

ONYX’s accuracy varied modestly by patient age and comorbidity. For Classifier 1, sensitivity was consistently high for all groups. Specificity was more variable, with lower values for older people and those with chronic lung disease. Classifier 2 had high specificity across all groups but lower sensitivity, especially in the oldest patients. Older patients and those with chronic lung disease may be more likely to have chronic lung abnormalities, and NLP may have difficulty distinguishing chronic, stable abnormal findings from acute infiltrates.

Compared with prior NLP studies, our results indicate similar or somewhat better accuracy, although it is difficult to compare studies because of differences in the gold standards and patient populations, including pneumonia prevalence (which influences PPV and NPV). Also, prior studies used NLP to classify all reports, even those with complex features that our study diverted for manual review. This difference may make our tools' accuracy appear higher. Mendonca et al. applied an NLP system, MedLEE, to chest radiograph reports from the neonatal intensive care unit.14 Sensitivity was 71%, specificity 99%, and positive predictive value only 7.5% compared to medical record review. The low PPV arose because pneumonia prevalence was very low (2%) and the gold standard required signs and symptoms not captured on chest radiographs (and thus not available to MedLEE). Fiszman et al. applied SymText to 292 reports (pneumonia prevalence, 38%) and reported accuracy similar to ours: sensitivity 95%, specificity 85%, and PPV 78%.12 Their gold standard was physicians' interpretation of the reports. Elkin et al. applied the Multithreaded Clinical Vocabulary Server to 400 reports (pneumonia prevalence, 3.5%).13 Compared to physician review of the reports, NLP had sensitivity of 100%, specificity 90%, and PPV 70%. Prior studies included relatively few reports. Most did not describe the care setting (inpatient vs. outpatient), and none reported accuracy for subgroups.

Strengths of our study include the large test set and the availability of clinical information including age, comorbidity, and care setting. These resources allowed more precise estimates of accuracy as well as subgroup-specific estimates. The preponderance of outpatient exams makes our findings more relevant to the general population. To our knowledge, ours is the first study to explore tailoring NLP tools to different study needs.

Thus far, we have applied NLP only to chest radiograph reports and not other clinical data relevant to pneumonia. Radiologists’ interpretation of chest radiographs can be inconsistent, with studies reporting kappa of 0.37 for infiltrate in adults23 and 0.58 in children.24 Established case definitions, e.g. the CDC/National Healthcare Safety Network definition for hospital-acquired pneumonia,10 typically require a positive radiograph in addition to clinical symptoms and findings. One could argue that NLP should draw on symptoms, vital signs, and laboratory findings in addition to radiology reports. Murff et al. took such an approach.11 Compared to medical record review, their NLP algorithms had sensitivity of 64% (95% CI, 58-70%) and specificity of 95% (94-96%) for post-operative pneumonia. While a multi-faceted approach may be desirable, challenges to implementation exist, including that many of the data required are rarely measured in outpatients. Still, a crucial building block is an accurate method for classifying chest radiographs, which are central to most pneumonia definitions.

Our study has limitations. We measured comorbid illnesses from administrative data, which do not have perfect accuracy. Our data included relatively few inpatient cases, resulting in less precise estimates for this subgroup. Reports came from a single health care system, which may limit generalizability. Radiologists from different institutions or geographic regions may use different language, so an important next step will be to assess transferability; we plan to evaluate our NLP tools' accuracy for reports from other institutions and for other conditions. In the current study, considerable effort was needed to construct a classifier to create a binary pneumonia variable from ONYX's output. We developed a rule-based classifier based on expert opinion, but machine learning is an alternative approach that could improve efficiency. It would also be valuable to explore the use of machine learning alone, without an NLP tool such as ONYX, to detect pneumonia from radiograph reports. Schuemie et al. recently described such efforts; they found that the best approaches achieved sensitivity of 82-93% with PPV of 81-90%.25 We currently use only features of the radiology report to determine which reports need manual review. Future work could explore whether incorporating patient characteristics (e.g., age or comorbidity) can improve this determination.

In conclusion, vast amounts of clinical data are becoming available as free text within electronic medical records. These data could be valuable for many purposes but are currently expensive and time-consuming to access. New technologies such as NLP offer tremendous opportunities. Studies such as ours provide insight into the potential of NLP to improve research and clinical care.

Supplementary Material

Appendix Fig S1

Online Appendix Table 1. Pneumonia classifier: further explanation of criteria used to identify reports needing manual review

Online Appendix Table 2. Factors Contributing to Changes in Performance Measures Between Classifier 1 and Classifier 2

Key points.

  • When disease outcomes are identified from diagnosis codes in automated databases, misclassification can limit statistical power and bias results.

  • Natural Language Processing (NLP) can extract information from free-text electronic medical records in an automated fashion, which could improve validity and efficiency.

  • We found that one NLP system, ONYX, works well to identify pneumonia from free-text radiology reports: after training, ONYX could replace nearly 90% of manual medical record review, with sensitivity 75%, specificity 95%, and positive predictive value 86%.

  • ONYX is available open-source and can be adapted for different outcomes and study needs.

ACKNOWLEDGMENT

We thank Dr. Lisa Jackson, Principal Investigator of the Pneumonia Surveillance Study, for sharing the PSS data and commenting on an early draft of the manuscript.

Funding sources and related paper presentations: Dr. Dublin was funded by a Paul Beeson Career Development Award from the National Institute on Aging, grant K23AG028954, by the Branta Foundation, and by Group Health Research Institute internal funds. The Beeson award is also supported by the Hartford and Starr Foundations and Atlantic Philanthropies. Dr. Carrell was funded by National Cancer Institute grant RC1CA146917. Dr. Chapman was funded by National Institutes of Health grants 5R01GM090187 and U54HL108460. Group Health Research Institute internal funds covered the data collection and analysis.

Footnotes

Conflicts of Interest: Dr. Dublin has received a Merck/American Geriatrics Society New Investigator Award.

Prior presentation: This work was presented as an oral presentation at the Health Maintenance Organization Research Network Annual Conference in Boston, Massachusetts on March 24, 2011, where it received an Early Career Investigator Award.

Disclaimer: This work does not necessarily reflect the views of the National Institute on Aging or National Institutes of Health.

REFERENCES

  • 1. Trifiro G, Gambassi G, Sen EF, et al. Association of community-acquired pneumonia with antipsychotic drug use in elderly patients: a nested case-control study. Ann Intern Med. 2010;152:418-425. doi: 10.7326/0003-4819-152-7-201004060-00006.
  • 2. Dublin S, Walker RL, Jackson ML, et al. Use of opioids or benzodiazepines and risk of pneumonia in older adults: a population-based case-control study. J Am Geriatr Soc. 2011;59:1899-1907. doi: 10.1111/j.1532-5415.2011.03586.x.
  • 3. Laheij RJ, Sturkenboom MC, Hassing RJ, et al. Risk of community-acquired pneumonia and use of gastric acid-suppressive drugs. JAMA. 2004;292:1955-1960. doi: 10.1001/jama.292.16.1955.
  • 4. Gulmez SE, Holm A, Frederiksen H, et al. Use of proton pump inhibitors and the risk of community-acquired pneumonia: a population-based case-control study. Arch Intern Med. 2007;167:950-955. doi: 10.1001/archinte.167.9.950.
  • 5. Herzig SJ, Howell MD, Ngo LH, et al. Acid-suppressive medication use and the risk for hospital-acquired pneumonia. JAMA. 2009;301:2120-2128. doi: 10.1001/jama.2009.722.
  • 6. Aronsky D, Haug PJ, Lagor C, et al. Accuracy of administrative data for identifying patients with pneumonia. Am J Med Qual. 2005;20:319-328. doi: 10.1177/1062860605280358.
  • 7. van de Garde EM, Oosterheert JJ, Bonten M, et al. International classification of diseases codes showed modest sensitivity for detecting community-acquired pneumonia. J Clin Epidemiol. 2007;60:834-838. doi: 10.1016/j.jclinepi.2006.10.018.
  • 8. Jackson ML, Nelson JC, Weiss NS, et al. Influenza vaccination and risk of community-acquired pneumonia in immunocompetent elderly people: a population-based, nested case-control study. Lancet. 2008;372:398-405. doi: 10.1016/S0140-6736(08)61160-5.
  • 9. Mandell LA, Wunderink RG, Anzueto A, et al. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis. 2007;44(Suppl 2):S27-72. doi: 10.1086/511159.
  • 10. Horan TC, Andrus M, Dudeck MA. CDC/NHSN surveillance definition of health care-associated infection and criteria for specific types of infections in the acute care setting. Am J Infect Control. 2008;36:309-332. doi: 10.1016/j.ajic.2008.03.002.
  • 11. Murff HJ, FitzHenry F, Matheny ME, et al. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA. 2011;306:848-855. doi: 10.1001/jama.2011.1204.
  • 12. Fiszman M, Chapman WW, Aronsky D, et al. Automatic detection of acute bacterial pneumonia from chest X-ray reports. J Am Med Inform Assoc. 2000;7:593-604. doi: 10.1136/jamia.2000.0070593.
  • 13. Elkin PL, Froehling D, Wahner-Roedler D, et al. NLP-based identification of pneumonia cases from free-text radiological reports. AMIA Annu Symp Proc. 2008:172-176.
  • 14. Mendonca EA, Haas J, Shagina L, et al. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform. 2005;38:314-321. doi: 10.1016/j.jbi.2005.02.003.
  • 15. Christensen LM, Harkema H, Haug P, et al. ONYX: a system for the semantic analysis of clinical text. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing; Boulder, CO: Association for Computational Linguistics; 2009. pp. 19-27.
  • 16. Nelson JC, Jackson M, Yu O, et al. Impact of the introduction of pneumococcal conjugate vaccine on rates of community acquired pneumonia in children and adults. Vaccine. 2008;26:4947-4954. doi: 10.1016/j.vaccine.2008.07.016.
  • 17. Harkema H, Dowling JN, Thornblade T, et al. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42:839-851. doi: 10.1016/j.jbi.2009.05.002.
  • 18. Christensen L, Haug PJ, Fiszman M. MPLUS: a probabilistic medical language understanding system. Proc Workshop on Natural Language Processing in the Biomedical Domain. 2002:29-36.
  • 19. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press; 2003.
  • 20. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13-22.
  • 21. Lane PW, Nelder JA. Analysis of covariance and standardization as instances of prediction. Biometrics. 1982;38:613-621.
  • 22. Graubard BI, Korn EL. Predictive margins with survey data. Biometrics. 1999;55:652-659. doi: 10.1111/j.0006-341x.1999.00652.x.
  • 23. Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia. PORT Investigators. Chest. 1996;110:343-350. doi: 10.1378/chest.110.2.343.
  • 24. Hansen J, Black S, Shinefield H, et al. Effectiveness of heptavalent pneumococcal conjugate vaccine in children younger than 5 years of age for prevention of pneumonia: updated analysis using World Health Organization standardized interpretation of chest radiographs. Pediatr Infect Dis J. 2006;25:779-781. doi: 10.1097/01.inf.0000232706.35674.2f.
  • 25. Schuemie MJ, Sen E, 't Jong GW, et al. Automating classification of free-text electronic health records for epidemiological studies. Pharmacoepidemiol Drug Saf. 2012. doi: 10.1002/pds.3205.
