Author manuscript; available in PMC: 2019 Jun 1.
Published in final edited form as: J Allergy Clin Immunol. 2018 Feb 10;141(6):2292–2294.e3. doi: 10.1016/j.jaci.2017.12.1003

Ascertainment of Asthma Prognosis Using Natural Language Processing from Electronic Medical Records

Sunghwan Sohn 1,*, Chung-Il Wi 2,3,*, Stephen T Wu 4, Hongfang Liu 1, Euijung Ryu 1, Elizabeth Krusemark 2,3, Alicia Seabright 3, Gretchen A Voge 5, Young J Juhn 2,3
PMCID: PMC5994178  NIHMSID: NIHMS951150  PMID: 29438770

Summary

An NLP algorithm successfully determined asthma prognosis (i.e., no remission, long-term remission, and intermittent remission) by taking into account asthma symptoms documented in EMRs, addressing the limitations of billing code-based asthma outcome assessment.

Keywords: Algorithm, informatics, outcome, remission, relapse, electronic medical record, large-scale, children

To the Editor

Childhood asthma, the most common chronic illness in children, frequently (10-70%) remits before adulthood.1,2 This observation raises the question of why some children outgrow asthma and others do not. Addressing this question requires assessing asthma prognosis longitudinally. In a real-world setting, however, it is challenging to perform periodic longitudinal surveys and follow-up visits for a large population in order to assess long-term asthma outcomes and prognosis. While structured code-based criteria can be improved with supplementary information on asthma symptoms obtained from labor-intensive medical record review,3 this approach still poses challenges for large-scale studies that require review of large volumes of medical records. Recently, we demonstrated promising results in which a natural language processing (NLP) algorithm extracts asthma-related episodes and symptoms embedded in the free text of EMRs, automatically ascertaining asthma status based on predetermined asthma criteria (PAC).4

In this study, we extended our prior work by developing and validating an NLP algorithm to determine asthma prognosis after asthma onset (no remission, long-term remission, and intermittent remission) based on asthma-related events embedded in the clinical notes of an EMR system. Asthma onset was defined as the earliest constellation of symptoms found in EMRs (birth to the last follow-up date of the subjects) that met the PAC. We defined remission as the absence of asthma events for at least three consecutive years after asthma onset (i.e., index date of asthma). Asthma events include any of 1) clinic visit for asthma with a physician diagnosis of asthma, 2) asthma symptoms, such as cough plus wheezing, prolonged exhaling, exercise-induced symptoms, or chest tightness/pain; night cough; dyspnea, or 3) current use of asthma medication. Relapse was defined as the occurrence of any of these events after achieving remission. We further categorized remission as intermittent remission (remission followed by relapse) or long-term remission (remission without any relapse during our study period, i.e., from asthma onset to the last follow-up date). A patient who never achieved remission after asthma onset was considered to have persistent asthma (no remission). Hence, each patient’s asthma prognosis can be assigned to one of three categories at any given time (i.e., persistent asthma, intermittent remission, and long-term remission).
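The definitions above can be turned into a small rule-based sketch. This is not the published algorithm: the function name `classify_prognosis`, the helper `add_years`, and the choice to measure the three-year window between consecutive event dates (or between the last event and the last follow-up date) are assumptions made for illustration.

```python
from datetime import date

REMISSION_YEARS = 3  # from the letter: at least three consecutive event-free years


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify_prognosis(onset: date, event_dates: list, last_follow_up: date) -> str:
    """Assign one of the three prognosis categories over the whole follow-up.

    Assumption: remission is reached when the gap between one asthma event
    (or onset) and the next event (or the end of follow-up) is >= 3 years.
    """
    # Onset is the first point on the timeline; later events follow in date order.
    timeline = [onset] + sorted(d for d in event_dates if onset < d <= last_follow_up)
    remitted = False
    relapsed = False
    for i, current in enumerate(timeline):
        is_last = i == len(timeline) - 1
        next_point = last_follow_up if is_last else timeline[i + 1]
        if add_years(current, REMISSION_YEARS) <= next_point:
            remitted = True
            if not is_last:
                # An asthma event occurred after an event-free window: relapse.
                relapsed = True
    if not remitted:
        return "no remission"
    return "intermittent remission" if relapsed else "long-term remission"
```

For example, a child with onset on 2005-01-01, events in mid-2005 and early 2006, and follow-up through 2012 would fall into long-term remission under these assumptions, whereas an additional event in 2009 would move the same child to intermittent remission.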

Figure 1 shows the high-level design of asthma remission and relapse ascertainment by the NLP algorithm and by manual review. The NLP algorithm (right branch) extracted descriptive patterns of asthma events (e.g., wheezing, asthma medication) and temporal information (e.g., date, duration) from manually annotated data (left branch), associated them with each other, and applied rules to compute asthma event timelines in order to determine remission and relapse. For each subject, all clinical notes after asthma onset were processed to determine the complete history of asthma remission and relapse with date information. The performance of the NLP algorithm was then compared with manual chart review, which served as the gold standard for validation. Our NLP algorithm was built in the framework of the open-source NLP pipeline MedTagger (http://ohnlp.org/index.php/MedTagger) developed at Mayo Clinic.5 We used the EMRs of 35 subjects with asthma (a training cohort from a previous study of the NLP-PAC algorithm6) to develop a prototype NLP algorithm for asthma prognosis. We then used the EMRs of an independent cohort of 72 subjects with asthma enrolled in a previous study7 that assessed the risk of asthma among late preterm infants born in Olmsted County, Minnesota, between 2002 and 2006; half (36 subjects) served as a formative cohort to refine the NLP algorithm and the other half (36 subjects) as a validation cohort. Criterion validity of the NLP algorithm was assessed by determining the concordance of asthma prognosis between the NLP algorithm and manual chart review.
Construct validity of the NLP algorithm was assessed by determining the correlation between asthma remission status (yes/no) ascertained by the NLP algorithm and factors which have been reported to be associated with asthma remission, such as gender or other atopic conditions.1,2,8 This study was approved by the Institutional Review Boards (IRBs) at Mayo Clinic and the Olmsted Medical Center.
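As a rough illustration of the extraction step described above (finding asthma event mentions in clinical notes and attaching dates to them), the following sketch uses plain regular expressions. It does not use MedTagger or reproduce its rules: the keyword patterns, the medication name, the function `extract_events`, and the simplification of using the note date as the event date are all hypothetical.

```python
import re
from datetime import date

# Hypothetical keyword patterns for illustration only; the patterns of the
# published algorithm are not given in the letter.
EVENT_PATTERNS = {
    "diagnosis": re.compile(r"\basthma\b", re.I),
    "medication": re.compile(r"\balbuterol\b", re.I),
    "symptom": re.compile(r"\b(wheez\w*|chest tightness|night cough|dyspnea)\b", re.I),
}


def extract_events(notes):
    """Return a date-sorted list of (note_date, event_type, matched_text).

    `notes` is an iterable of (note_date, text) pairs. Using the note date as
    the event date is a simplification; the letter describes extracting
    temporal expressions (date, duration) from the text itself.
    """
    events = []
    for note_date, text in notes:
        for event_type, pattern in EVENT_PATTERNS.items():
            for match in pattern.finditer(text):
                events.append((note_date, event_type, match.group(0).lower()))
    return sorted(events)
```

The sorted output forms a simple event timeline that downstream rules for remission and relapse could consume.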

Figure 1. An overview of manual vs. NLP asthma prognosis ascertainment using an example.

In a validation cohort of 36 children with asthma, 1,194 asthma-related events were documented in EMRs. eTable 1 (see online repository) shows that the training, formative, and validation cohorts have overall similar characteristics.

During the follow-up period since asthma onset (median age [interquartile range]: 8.3 [6.8-9.9]), 33 children with asthma (91%) showed concordance between manual chart review and NLP for the three categories (Table 1). When the dates of remission and relapse were also considered, concordance was similar (i.e., 32 cases (88%) matched within one month). NLP extraction of individual asthma events (i.e., asthma diagnosis, symptoms, and medications) produced macro- and micro-averaged F-measures of 0.96 and 0.95, respectively, showing high concordance with manual chart review. eFigure 1 (see online repository) shows patient-level agreement between NLP and manual chart review for comprehensive asthma prognosis profiles, i.e., each patient’s sequence of remission and relapse. In all subgroups categorized by the sequence of remission and relapse during the follow-up period, positive predictive value and sensitivity ranged between 0.83 and 1.0. Associations of known factors linked to asthma remission were similar between asthma prognoses defined by NLP and those defined by manual chart review (see eTable 2 in online repository). For example, by both NLP and manual chart review, children who did not achieve remission were more likely to have allergic rhinitis (33% and 36%) than those who did achieve remission (5% and 0%).
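The micro- and macro-averaged figures reported above differ only in where the averaging happens: micro averaging pools the counts across patients before computing each metric, while macro averaging computes each metric per patient and then averages. A minimal sketch follows; the function names and the per-patient counts in the usage note are made up for illustration and are not study data.

```python
def sensitivity_ppv_f(tp, fp, fn):
    """Sensitivity, PPV, and F-measure from true/false positive and false negative counts."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
    return sens, ppv, f


def micro_average(counts):
    """Pool (tp, fp, fn) over all patients, then compute each metric once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return sensitivity_ppv_f(tp, fp, fn)


def macro_average(counts):
    """Compute each metric per patient, then take the unweighted mean."""
    per_patient = [sensitivity_ppv_f(*c) for c in counts]
    n = len(per_patient)
    return tuple(sum(m[i] for m in per_patient) / n for i in range(3))
```

With two hypothetical patients, `[(8, 2, 0), (1, 0, 1)]`, the patient with many events dominates the micro average, whereas both patients count equally in the macro average.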

Table 1. Concordance of asthma prognosis between the NLP algorithm and manual chart review (validation cohort, n=36)

Status of asthma prognosis (weighted kappa = 0.82); rows are by NLP, columns are by manual chart review

| By NLP | No remission | Long-term remission | Intermittent remission | Total |
|---|---|---|---|---|
| No remission | 17 | 0 | 1 | 18 |
| Long-term remission | 1 | 10 | 0 | 11 |
| Intermittent remission | 1 | 0 | 6 | 7 |
| Total | 19 | 10 | 7 | 36 |

Individual asthma event extraction

| | Sensitivity | PPV | F-measure* |
|---|---|---|---|
| Micro average | 0.950 | 0.957 | 0.953 |
| Macro averageǂ | 0.943 | 0.980 | 0.961 |

Micro average: computed by using a global count of each metric and averaging these sums.

ǂ Macro average: obtained by first calculating each metric per patient and taking the average of each metric.

* F-measure: harmonic mean of sensitivity and positive predictive value (PPV), defined by 2×(sensitivity×PPV)/(sensitivity+PPV).
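The agreement statistics for the 3×3 table can be recomputed from its cell counts. The sketch below is an illustration rather than the authors' analysis code: the letter does not state which weighting scheme or category order was used for the weighted kappa, so linear weights and the table's own row order (no remission, long-term remission, intermittent remission) are assumptions.

```python
def weighted_kappa(matrix, weights="linear"):
    """Weighted kappa for a square agreement table (rows: rater A, columns: rater B)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * matrix[i][j] for i, j in cells) / n
    expected = sum(disagreement(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / (n * n)
    return 1 - observed / expected


# Cell counts from Table 1 (rows: by NLP, columns: by manual chart review).
TABLE_1 = [
    [17, 0, 1],
    [1, 10, 0],
    [1, 0, 6],
]

raw_agreement = sum(TABLE_1[i][i] for i in range(3)) / 36  # 33 of 36 on the diagonal
kappa_linear = weighted_kappa(TABLE_1)
```

Under these assumptions the linear-weighted value comes out at roughly 0.83, close to the reported 0.82; a different weighting scheme or category order gives a different number.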

Our NLP approach to determining asthma prognosis was based on evidence from clinical documents in EMRs, and thus addresses a limitation of billing code-based asthma outcome assessment, which does not account for asthma symptoms. It improves our ability to detect the biological heterogeneity of asthma prognoses and their related predictors more precisely in large-scale studies. If our NLP algorithm for asthma prognosis is replicated and properly implemented in EMR systems, it could have important implications for clinical practice (e.g., automated chart review), research (e.g., replication of study findings in a real-world setting), and even public health (e.g., identification of high-risk populations with asthma). A limitation of this study is the relatively small sample size, a consequence of the labor-intensive manual annotation necessary to construct a large data set (e.g., 1,194 asthma-related events in our study). However, we believe that our multi-stage development and validation process (i.e., development, formation, and validation)9 produces a reasonable performance estimate. The NLP algorithm misinterpreted some hypothetical sentences or patient instruction materials in EMRs (e.g., “watch closely for any shortness of breath or persistent cough, or difficulty treating”) as the patient’s actual symptoms, which could be improved with further enhancement of the algorithm.
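One simple way such instructional or hypothetical sentences might be screened out before symptom extraction is a cue-phrase filter. The sketch below is only an illustration of that idea, not the enhancement the authors have in mind: the cue list and the function `is_instructional` are hypothetical, and a real system would need a validated set of cues.

```python
import re

# Illustrative cue phrases only; not taken from the published algorithm.
INSTRUCTION_CUES = re.compile(
    r"\b(watch(?: closely)? for|return if|call if|in case of)\b", re.I
)


def is_instructional(sentence: str) -> bool:
    """Flag sentences that read as patient instructions rather than observed symptoms."""
    return bool(INSTRUCTION_CUES.search(sentence))
```

Sentences flagged this way could be excluded from symptom matching, at the cost of occasionally dropping a genuine symptom report that happens to contain a cue phrase.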

In conclusion, our NLP-based ascertainment of asthma prognosis has strong potential to become a tool enabling large-scale asthma studies and population management strategies for asthma care in the era of EMRs. We believe that this NLP tool would be feasible in other clinical settings with modest adjustments, as demonstrated by our previous study for asthma ascertainment.4 Additional studies in an independent cohort will be needed for replication of results.

Supplementary Material

Acknowledgments

We thank Mrs. Kelly Okeson for her administrative assistance.

Funding source: This study was supported by NIH grant (R21AI116839-01). It was made possible by Rochester Epidemiology Project (R01-AG34676) from the National Institute on Aging and CTSA Grant Number UL1 TR000135 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NIH.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Financial Disclosure: Dr. Young Juhn is the Principal Investigator of the Innovative Asthma Research Methods Award from Genentech. The authors have indicated they have no financial relationships relevant to this article to disclose.

Conflict of Interest: The authors have indicated they have no conflict of interest relevant to this article to disclose.

References

1. Tai A, Tran H, Roberts M, et al. Outcomes of childhood asthma to the age of 50 years. J Allergy Clin Immunol. 2014;133:1572–8.e3. doi: 10.1016/j.jaci.2013.12.1033.
2. Covar RA, Strunk R, Zeiger RS, et al. Predictors of remitting, periodic, and persistent childhood asthma. J Allergy Clin Immunol. 2010;125:359–66.e3. doi: 10.1016/j.jaci.2009.10.037.
3. Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma-Summary Report 2007. J Allergy Clin Immunol. 2007;120:S94–138. doi: 10.1016/j.jaci.2007.09.043.
4. Wi CI, Sohn S, Ali M, et al. Natural Language Processing for Asthma Ascertainment in Different Practice Settings. J Allergy Clin Immunol Pract. 2017. doi: 10.1016/j.jaip.2017.04.041. In press.
5. Liu H, Bielinski S, Sohn S, et al. An information extraction framework for cohort identification using electronic health records. AMIA Summits Transl Sci Proc. San Francisco, CA; 2013. pp. 149–53.
6. Wu ST, Sohn S, Ravikumar K, et al. Automated chart review for asthma cohort identification using natural language processing: an exploratory study. Annals of Allergy, Asthma & Immunology. 2013;111:364–9. doi: 10.1016/j.anai.2013.07.022.
7. Voge GA, Carey WA, Ryu E, King K, Wi C, Juhn Y. What accounts for the association between late preterm births and risk of asthma? Allergy and Asthma Proceedings. doi: 10.2500/aap.2017.38.4021. In press (accepted on 10/14/2016).
8. Sears MR. Predicting asthma outcomes. J Allergy Clin Immunol. 2015;136:829–36. doi: 10.1016/j.jaci.2015.04.048.
9. Wu S, Wi C, Sohn S, Liu H, Juhn Y. Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events. Language Resources and Evaluation (LREC) conference. Portorož, Slovenia; 2016.
