Abstract
A natural language processing (NLP) algorithm to extract microbial keratitis morphology measurements from the electronic health record (EHR) was 75-96% sensitive and 91-96% specific. NLP accurately extracts data from the corneal exam free-text EHR field.
Electronic health records (EHR) are rapidly being adopted among practicing ophthalmologists, with rates doubling between 2011 and 2016.1 Natural language processing (NLP) is a strategy to capture visit information from EHRs. NLP describes a family of algorithms that capture key words from free text and convert them to a structured format for analysis. These algorithms may use statistical and linguistic methods to derive meaning in the face of word variation, word ambiguity, and documentation inconsistency. In EHR-based research, extracting key information often requires chart review, which is time-consuming and prone to error. Through automation with algorithms, chart review can be performed at scale in a fraction of the time. Ophthalmic researchers have used NLP to identify key words for glaucoma diagnoses, medication names, ophthalmic procedures, and surgical complications.2–4 They have also used it to obtain numeric values for visual acuity.5 New opportunities exist to leverage the rich information in EHRs and physicians’ notes to better understand ophthalmic diseases and improve clinical care.
The Intelligent Research in Sight (IRIS) registry and other registries have de-identified data for clinical and research purposes,6,7 but optimized tools are needed to extract information in free-text fields accurately. NLP could augment many aspects of ophthalmic clinical research from identification of cohorts and exposures to quantifying disease progression and outcomes. The purpose of this study was to evaluate NLP as a tool for accurately extracting quantifiable measures of disease from EHRs in an ophthalmology sample, specifically patients with microbial keratitis (MK). MK disease morphology changes rapidly; disease progression is typically documented as a free-text narrative in the examination note.
Patients with MK were identified using International Classification of Diseases codes from the Epic EHRs of the University of Michigan (UM, August 1, 2012 to March 30, 2018) and Henry Ford Health System (HFHS, April 1, 2016 to May 1, 2018). The data represented text documentation from eight physicians and 10 cornea fellows at UM and three faculty physicians at HFHS. An NLP algorithm was created on a training dataset of MK patients from UM (7220 encounters of 1689 patients). The algorithm searched the ‘cornea’ free-text field in the ophthalmic examination section of the progress note for MK measurements, separately extracting two key features of MK, the epithelial defect and the stromal infiltrate, as millimeter measurements in two dimensions. A study team member, with oversight and clarifications by a cornea specialist (MAW), performed a chart review (CR) of the full clinical encounter for the same measurements. Chart review was performed on a random sample of 100 MK patients (400 encounters) within the training sample who had more severe disease, requiring four visits within the first 14 days of care. To optimize the algorithm prior to validation, CR and NLP results were compared for agreement of extracted measurements, and the algorithm was iteratively updated to decrease errors. The fully trained algorithm (Online Supplement 1, available at www.aaojournal.org) was compared to CR on a unique, hold-out validation set from UM (400 encounters of 100 unique patients from the training dataset) and an external validation set from HFHS without any sample restrictions (59 encounters of 59 patients).

Across our three samples (UM training, UM validation, HFHS validation), the corneal examination section of the EHR contained on average 132-150 characters, 26-27 words, 1-2 sentences, and 2-3 measurement values. Patients were on average 45-58 years old, 51-69% female, and 52-85% White. Compared to CR, the sensitivity of the NLP algorithm to extract MK measurements ranged from 87-96% for the UM training sample, 82-88% for the UM validation sample, and 75-76% for the HFHS validation sample (Table 1). Specificity ranged from 91-96% and positive predictive value ranged from 91-98% across all samples. Physicians did not document the epithelial defect or stromal infiltrate size in the EHR in 27-58% of encounters across samples.
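To make the extraction step concrete, the sketch below illustrates the general kind of pattern matching such an algorithm can apply to the cornea free-text field. It is an illustrative assumption only, not the authors' validated algorithm (which is provided in Online Supplement 1); the keywords, regular expressions, and example note text are hypothetical.

```python
import re

# Illustrative sketch only -- not the validated algorithm from Online Supplement 1.
# Assumes two-dimensional measurements are written near a feature keyword,
# e.g. "ED 2.1 x 1.5 mm" or "stromal infiltrate 3.0x2.5mm".
DIMENSIONS = r"(\d+(?:\.\d+)?)\s*(?:mm)?\s*[xX]\s*(\d+(?:\.\d+)?)\s*mm"
PATTERNS = {
    "epithelial_defect": re.compile(r"\b(?:epithelial defect|ED)\b\D{0,20}?" + DIMENSIONS, re.I),
    "stromal_infiltrate": re.compile(r"\b(?:stromal infiltrate|SI|infiltrate)\b\D{0,20}?" + DIMENSIONS, re.I),
}

def extract_mk_measurements(cornea_text: str) -> dict:
    """Return {feature: (dimension_1_mm, dimension_2_mm)} for each feature found
    in the cornea free-text field; features with no match are omitted."""
    measurements = {}
    for feature, pattern in PATTERNS.items():
        match = pattern.search(cornea_text)
        if match:
            measurements[feature] = (float(match.group(1)), float(match.group(2)))
    return measurements

# Hypothetical cornea field entry:
print(extract_mk_measurements("ED 2.1 x 1.5 mm over central stromal infiltrate 3.0x2.5mm"))
# {'epithelial_defect': (2.1, 1.5), 'stromal_infiltrate': (3.0, 2.5)}
```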
Table 1.
Sensitivity and specificity of a natural language processing algorithm (NLP) to extract microbial keratitis measurements from the electronic health record.
| NLP Ulcer Measurement | UM Training Sample (n=400 encounters): CR Present | CR Absent | UM Validation Sample (n=400 encounters): CR Present | CR Absent | HFHS Validation Sample (n=59 encounters): CR Present | CR Absent |
|---|---|---|---|---|---|---|
| ED 1st Dimension | | | | | | |
| Present | 291 (283, 97%) | 8 | 265 (257, 97%) | 4 | 27 (26, 96%) | 1 |
| Absent | 4 | 97 | 28 | 103 | 7 | 24 |
| Sensitivity | 95.9% (93.7, 98.2) | | 87.7% (84.0, 91.5) | | 76.5% (62.2, 90.7) | |
| Specificity | 92.4% (87.3, 97.5) | | 96.3% (92.7, 99.9) | | 96.0% (88.3, 100.0) | |
| PPV | 97.3% (95.4, 99.1) | | 98.5% (97.0, 100.0) | | 96.3% (89.2, 100.0) | |
| F Measure | 0.966 | | 0.928 | | 0.853 | |
| ED 2nd Dimension | | | | | | |
| Present | 291 (280, 96%) | 8 | 265 (256, 97%) | 4 | 26 (25, 96%) | 2 |
| Absent | 4 | 97 | 28 | 103 | 7 | 24 |
| Sensitivity | 94.9% (92.4, 97.4) | | 87.4% (83.6, 91.2) | | 75.8% (61.1, 90.4) | |
| Specificity | 92.4% (87.3, 97.5) | | 96.3% (92.7, 99.9) | | 92.3% (82.1, 100.0) | |
| PPV | 97.2% (95.3, 99.1) | | 98.5% (97.0, 100.0) | | 92.6% (82.7, 100.0) | |
| F Measure | 0.960 | | 0.926 | | 0.834 | |
| SI 1st Dimension | | | | | | |
| Present | 205 (195, 95%) | 11 | 201 (186, 93%) | 16 | 26 (21, 81%) | 2 |
| Absent | 16 | 168 | 23 | 160 | 2 | 29 |
| Sensitivity | 88.2% (84.0, 92.5) | | 83.0% (78.1, 88.0) | | 75.0% (59.0, 91.0) | |
| Specificity | 93.9% (90.3, 97.4) | | 90.9% (86.7, 95.2) | | 93.6% (84.9, 100.0) | |
| PPV | 94.7% (91.6, 97.7) | | 92.1% (88.4, 95.8) | | 91.3% (79.8, 100.0) | |
| F Measure | 0.913 | | 0.873 | | 0.824 | |
| SI 2nd Dimension | | | | | | |
| Present | 205 (193, 94%) | 11 | 201 (184, 92%) | 16 | 26 (21, 81%) | 2 |
| Absent | 16 | 168 | 23 | 160 | 2 | 29 |
| Sensitivity | 87.3% (82.9, 91.7) | | 82.1% (77.1, 87.2) | | 75.0% (59.0, 91.0) | |
| Specificity | 93.9% (90.3, 97.4) | | 90.9% (86.7, 95.2) | | 93.6% (84.9, 100.0) | |
| PPV | 94.6% (91.5, 97.7) | | 92.0% (88.2, 95.8) | | 91.3% (79.8, 100.0) | |
| F Measure | 0.908 | | 0.821 | | 0.824 | |
UM, University of Michigan; HFHS, Henry Ford Health System; NLP, Natural Language Processing; CR, Chart Review; ED, Epithelial Defect; SI, Stromal Infiltrate; PPV, Positive Predictive Value
Sensitivity, specificity, and PPV are displayed with 95% confidence intervals. In the Present rows, cells under CR Present show the number of measurements NLP extracted, with the number and percentage extracted correctly in parentheses, # (# correct, %); all other cells show counts (#).
Note: Sensitivity, specificity, and PPV count as true positives only those measurements that NLP extracted (present) and extracted correctly (# correct).
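For clarity, the short sketch below shows one reading of the note above that is consistent with the reported values, under the assumption that a true positive is a measurement NLP both extracted and extracted correctly; it reproduces the ED 1st dimension results for the UM training sample.

```python
# Sketch of the metrics implied by the table note, assuming:
#   TP = NLP extracted a measurement and it matched chart review (CR)
#   FN = CR documented a measurement that NLP missed or extracted incorrectly
#   FP = NLP extracted a measurement where CR documented none
#   TN = neither NLP nor CR recorded a measurement
def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # F measure as the harmonic mean of PPV (precision) and sensitivity (recall)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f_measure": f_measure}

# ED 1st dimension, UM training sample: 283 of 291 NLP-present/CR-present extractions
# were correct; 8 NLP-present/CR-absent; 4 NLP-absent/CR-present; 97 NLP-absent/CR-absent.
print(accuracy_metrics(tp=283, fp=8, fn=(291 - 283) + 4, tn=97))
# sensitivity ~= 0.959, specificity ~= 0.924, ppv ~= 0.973, f_measure ~= 0.966
```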
Our NLP algorithm showed good sensitivity for extracting quantified MK morphology measurements from free text in the corneal examination section of the EHR. Potential applications for quantifying MK measurements are both clinical and scientific: to improve clinical care and to better understand the pathogenesis of MK. Data visualization of MK morphology over time may aid clinical decision making, similar to the data visualizations used to manage wet macular degeneration. Additionally, quantified data from large datasets can help researchers understand MK pathophysiology, disease progression, and response to treatments.
In this study, unique word choices and disjointed sentence structures remained difficult for the algorithm to analyze. The algorithm could be modified to extract text from clinical drawing notes, potentially improving performance. Additionally, physicians did not quantify MK morphology in a portion of the sample, revealing limitations both to clinical care and to our method of extracting quantified MK measurements. National leaders should define terms and identify features to measure and document at each encounter. To continue algorithm optimization on existing clinical documentation, additional external validation samples will need to be tested to improve generalizability. The structure of the data in the original EHR, whether structured or unstructured and whether an isolated text field or part of a general note, may affect algorithm accuracy. Additional limitations are described in Online Supplement 2, available at www.aaojournal.org.
The goal of this study was to show the accuracy of NLP for extracting quantified measurements from the EHR for a specific ophthalmic condition; however, the usefulness of NLP is not exclusive to our application. We expect ophthalmic researchers to expand their use of NLP given its increased use across medicine. In other medical disciplines, NLP has allowed researchers to accurately differentiate clinical phenotypes and detect key clinical features. Ophthalmic researchers now use NLP techniques to test hypotheses for rare exposures or conditions and to parse key clinical information to ask nuanced research questions with data that is often missing from large healthcare database analyses. After an ophthalmologist types notes into the EHR to document the eye health of a patient, researchers using NLP can interpret those notes to understand eye health for populations.
Supplementary Material
Acknowledgments
Financial support: This work was supported by a grant from the National Institutes of Health, Bethesda, MD (Woodward; Clinical Scientist Award K23EY023596). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Acronyms:
- NLP
Natural Language Processing
- EHR
Electronic health records
- MK
Microbial keratitis
- UM
University of Michigan
- HFHS
Henry Ford Health System
Footnotes
Meeting Presentation: This material has been accepted for presentation at the American Society of Cataract and Refractive Surgery (ASCRS) in San Diego in May 2019.
Conflict of interest: The authors have no proprietary or commercial interest in any of the materials discussed in this article.
This article contains additional online-only material. Supplemental material is available at www.aaojournal.org.
References
- 1. Lim MC, Boland MV, McCannel CA, et al. Adoption of electronic health records and perceptions of financial and clinical outcomes among ophthalmologists in the United States. JAMA Ophthalmol. 2018;136:164–170.
- 2. Barrows RC Jr, Busuioc M, Friedman C. Limited parsing of notational text visit notes: ad-hoc vs. NLP approaches. Proc AMIA Symp. 2000;51–55.
- 3. Stein JD, Rahman M, Andrews C, et al. Evaluation of an Algorithm for Identifying Ocular Conditions in Electronic Health Record Data. JAMA Ophthalmol. 2019 Feb 21 [Epub ahead of print].
- 4. Liu L, Shorstein NH, Amsden LB, Herrinton LJ. Natural language processing to ascertain two key variables from operative reports in ophthalmology. Pharmacoepidemiol Drug Saf. 2017;26:278–385.
- 5. Su GL, Tsui I, Lee CS, Baughman DM, Lee AY. Validation of the Total Visual Acuity Extraction Algorithm (TOVA) for Automated Extraction of Visual Acuity Data From Free Text, Unstructured Clinical Records. Transl Vis Sci Technol. 2017;6:2.
- 6. Repka MX, Lum F, Burugapalli B. Strabismus, Strabismus Surgery, and Reoperation Rate in the United States: Analysis from the IRIS Registry. Ophthalmology. 2018;125:1646–1653.
- 7. Rao P, Lum F, Wood K, et al. Real-World Vision in Age-Related Macular Degeneration Patients Treated with Single Anti-VEGF Drug Type for 1 Year in the IRIS Registry. Ophthalmology. 2018;125:522–528.