Natural language processing (NLP) is well positioned to revolutionize clinical and health services research. Researchers in these fields have long relied on medical chart review (whether paper or electronic) to capture relevant aspects of clinical care. However, over the past decades, and most remarkably over the past 5 years, rapid advances in the ability of NLP algorithms to understand language have opened the possibility for these algorithms to replace chart review. NLP algorithms have evolved from rules-based models that search for predefined expressions to deep learning–based large language models (LLMs), pretrained on vast amounts of text, that incorporate contextual information and demonstrate an impressive capability to understand and generate language. This technological evolution has led to a recent explosion of publications in the medical literature that harness this new capability to advance medical research.1
Approximately 80% of clinical information in electronic health records (EHRs) is stored as unstructured free-text data.2 Unstructured EHR data facilitate communication among the care team and with patients (eg, via open notes and secure messaging) but are largely inaccessible for analysis. NLP algorithms offer access to this vast amount of information by converting unstructured clinical text into structured data that are ready for analysis.3 NLP-based information extraction from EHRs has many promising applications, including (1) replacing the need for manual chart review for research and quality improvement; (2) identifying patient cohorts (ie, phenotyping), which facilitates research recruitment and delineates disease characteristics, progression, treatment response, and outcomes; (3) contributing to patient safety by identifying adverse drug events or drug-drug interactions mentioned in EHRs (ie, pharmacovigilance); and (4) providing real-time data to clinicians to inform medical decisions.
In this issue of Hospital Pediatrics, Aronson et al present a preview of how NLP can revolutionize EHR-based pediatric hospital research.4 This study describes the development and testing of a rules-based NLP algorithm to identify clinical notes from the emergency department (ED) that contain mentions of temperature measurements prior to the ED visit for infants aged younger than 90 days. The authors’ choice to use a rules-based NLP approach rather than a more advanced LLM has several advantages. First, the simple nature of the information being extracted (fever) in one specific clinical setting (ED) lends itself well to a rules-based approach and avoids the need for machine learning expertise and infrastructure, which could hinder implementation and broad dissemination. Second, the proposed use case provides increased efficiency compared with manual chart review, even if a human reviewer needs to perform a second review. Third, being based on predefined rules provides transparency, which facilitates trust and aids in implementation. On the other hand, LLMs offer several advantages over rules-based algorithms. First, they often provide higher accuracy, which reduces the need for a human reviewer as a second stage. Second, LLMs are capable of summarizing and labeling free text for nuanced or complicated concepts (eg, confidential content in teens, adverse event reports).5,6 Third, LLMs enable a “one-to-many” dynamic in which one model can be applied to many studies or use cases, allowing organizations to build efficient NLP capabilities.
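To make the contrast between the two approaches concrete, the sketch below shows, in simplified form, what a rules-based extractor for pre-visit fever mentions might look like. It is a hypothetical illustration only: the regular expressions, fever thresholds, negation cues, and function names are our assumptions and are far less extensive than the validated rules developed by Aronson et al.

```python
import re

# Hypothetical, simplified illustration of a rules-based approach to flag
# emergency department notes that mention a pre-visit (home) temperature.
# Patterns, thresholds, and negation handling are illustrative assumptions,
# not the rules used by Aronson et al.

# Explicit numeric temperature, optionally preceded by "temp"/"temperature".
TEMP_PATTERN = re.compile(
    r"\b(?:temp(?:erature)?\s*(?:of|was|:)?\s*)?"
    r"(\d{2,3}(?:\.\d)?)\s*(?:°\s*)?(f|c)\b",
    re.IGNORECASE,
)

# Keywords suggesting fever reported before the ED visit.
FEVER_KEYWORDS = re.compile(r"\b(fever|febrile|felt warm|tactile temp)\b", re.IGNORECASE)

# Very incomplete list of negation cues that suppress a match.
NEGATION = re.compile(r"\b(no|denies|without|afebrile)\b", re.IGNORECASE)


def is_fever_temp(value: float, unit: str) -> bool:
    """Return True if a numeric temperature meets a fever threshold."""
    return value >= 38.0 if unit.lower() == "c" else value >= 100.4


def note_mentions_prior_fever(note_text: str) -> bool:
    """Flag a note if any sentence mentions a fever-range temperature or a
    fever keyword without an obvious negation cue in the same sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+|\n", note_text):
        if NEGATION.search(sentence):
            continue
        for match in TEMP_PATTERN.finditer(sentence):
            if is_fever_temp(float(match.group(1)), match.group(2)):
                return True
        if FEVER_KEYWORDS.search(sentence):
            return True
    return False


if __name__ == "__main__":
    example = "Parents report temp of 101.2 F at home last night. No cough."
    print(note_mentions_prior_fever(example))  # True
```

Even this toy example hints at why rules-based systems require iterative refinement against manually reviewed notes: negation, misspellings, and ambiguous phrases quickly multiply the number of rules needed.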
While the primary aim of the study by Aronson et al was to assess the accuracy of a specific NLP algorithm, the broader goal was to provide researchers with a more comprehensive way to identify febrile infants for inclusion in future EHR-based studies, hopefully strengthening the evidence base for how to manage this common presenting problem. Given the barriers to conducting prospective clinical trials, pediatric researchers are increasingly using administrative or EHR data to answer important clinical questions across a broad range of clinical populations. A brief PubMed search demonstrates the rapid growth of these types of studies; publications using the Pediatric Health Information System database, for example, increased from 70 to 172 per year over the past decade. Unfortunately, structured data often cannot capture the full clinical context of each patient, and relying solely on these data poses a risk of patient misclassification, which can significantly impact a study’s results and conclusions.7 NLP offers a way to combine the strengths of large-scale EHR-based research with those of manual chart review in a more efficient manner.
Applied to clinical research, NLP could define more accurate cohorts for comparative effectiveness studies, either by including additional patients who would otherwise have been missed or by excluding patients who should be ineligible. In the current study, the authors increased their cohort of febrile infants by 46% by applying NLP to unstructured data, compared with traditional cohort identification through structured data alone. In another example, prior database studies assessing the clinical impact of oseltamivir in hospitalized children with influenza unintentionally included patients who had received oseltamivir as outpatients in the “unexposed” cohort, because this information often lives only in free-text documentation.7 Both cases demonstrate how NLP could strengthen the validity and generalizability of these studies’ conclusions. Applied to quality improvement studies, NLP could aid in understanding adherence to guidelines and clinical pathways, as well as provide context for health care utilization.8,9 Being able to efficiently distinguish appropriate from inappropriate deviations from guidelines, or to characterize variability in care, could help quality leaders target interventions in a more specific and effective way.
NLP can transform research in many populations of hospitalized children, where nuances in a patient’s history could impact medical decision-making in a way not easily captured by structured data such as diagnosis codes. Examples include (1) influenza, where duration of illness and prior receipt of oseltamivir may impact the decision to treat with antivirals in the hospital; (2) Kawasaki disease, where duration of fever and number of presenting clinical criteria could alter risk stratification for response to first-line intravenous immunoglobulin; (3) community-acquired pneumonia, where failure of an adequate course of oral antibiotics may influence the decision to broaden antibiotics on admission; and (4) bronchiolitis, where concurrent atopy or prior documented response to bronchodilators could justify inpatient use of albuterol. These examples represent priorities for pediatric hospital medicine research, where NLP could provide a richer understanding of the patient population for research or quality improvement.10,11
Regardless of the population of interest, several principles should be followed when seeking to implement NLP tools in pediatric research, many of which are exemplified by Aronson et al. First, use the simplest tool that meets your goals, saving cost and increasing transparency. Second, when selecting a use case for NLP, it is best to start by increasing the efficiency of existing workflows, aligning with the institution’s priorities and goals.12 Third, consider model generalizability across clinical settings and organizations. In this multisite study, language in clinical notes likely varied across institutions; when assessing generalizability, it is important to ensure appropriate representation of notes from all sites and to report whether model performance varied by site. Fourth, consider and assess for bias that the model may perpetuate if disparities in care exist.13 In this study, although the use of a rules-based algorithm to extract concrete information (fever) minimizes concerns for model bias, it is important to assess whether documentation of fever varies across patient subgroups, because such differences could lead to underrepresentation of certain populations when this NLP tool is used for cohort identification.
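One practical way to act on the third and fourth principles is to stratify the validation sample and report performance separately by site and by patient subgroup. The sketch below is a minimal, hypothetical example of such a stratified error analysis; the record structure, site labels, and subgroup variable are assumptions for illustration and are not data from the study.

```python
from collections import defaultdict

# Hypothetical stratified error analysis for an NLP cohort-identification tool.
# Each record pairs the algorithm's output with a manual chart-review label;
# "site" and "subgroup" (eg, documented preferred language) are assumed fields.
records = [
    {"site": "A", "subgroup": "English", "nlp": 1, "chart_review": 1},
    {"site": "A", "subgroup": "Spanish", "nlp": 0, "chart_review": 1},
    {"site": "B", "subgroup": "English", "nlp": 1, "chart_review": 0},
    # ...in practice, hundreds of manually reviewed notes per site
]


def stratified_performance(rows, key):
    """Compute sensitivity and positive predictive value within each stratum."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for r in rows:
        c = counts[r[key]]
        if r["nlp"] and r["chart_review"]:
            c["tp"] += 1
        elif r["nlp"] and not r["chart_review"]:
            c["fp"] += 1
        elif not r["nlp"] and r["chart_review"]:
            c["fn"] += 1
    results = {}
    for stratum, c in counts.items():
        sens = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else None
        ppv = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else None
        results[stratum] = {"sensitivity": sens, "ppv": ppv}
    return results


# Report performance by site and by subgroup; large gaps between strata flag
# potential generalizability or bias problems worth investigating.
print(stratified_performance(records, "site"))
print(stratified_performance(records, "subgroup"))
```

Large differences in sensitivity or positive predictive value across sites or subgroups do not by themselves establish algorithmic bias, but they indicate where documentation practices or rule coverage deserve a closer look before the tool is used for cohort identification.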
Lastly, poor documentation by clinicians is an important limitation that must be acknowledged when deploying an NLP-based tool to extract information from the EHR. Poor documentation continues to limit clinical and health services research that relies on the EHR as the best approximation of the care provided to patients, whether researchers perform manual chart review or use NLP tools. An emerging application of NLP that is currently being tested, termed ambient AI, may overcome this long-standing problem. Several startup companies, in collaboration with EHR vendors, are testing tools that serve as artificial intelligence–based digital scribes, summarizing the clinical encounter based on the conversation between the clinician and patient. Although the main driver for implementing this technology is the promise of reducing documentation burden, it also holds great promise to remove the limitation of poor documentation, which has hindered researchers’ ability to accurately capture the care provided to patients. Currently, this technology performs best in encounters with 1 clinician and 1 patient. A challenge for digital scribe tools that is specific to the pediatric population is correctly identifying the many participants in a clinical encounter, including clinicians, family members, and patients.14
The paper by Aronson et al provides a glimpse into the future, in which NLP will undoubtedly transform the way we do pediatric research. With appropriate precautions, NLP holds the promise of more efficient and precise pediatric research that drives better health outcomes for children and their families.
FUNDING:
Dr Bannett was supported by the Stanford Maternal and Child Health Research Institute and by the National Institute of Mental Health of the National Institutes of Health under grant number K23MH128455. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funders did not have any part in preparation, review, approval of the commentary, or decision to submit the commentary for publication.
Footnotes
CONFLICT OF INTEREST DISCLOSURES: The authors have no conflicts of interest or financial relationships relevant to this article to disclose.
REFERENCES
1. Wu S, Roberts K, Datta S, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020;27(3):457–470. doi:10.1093/jamia/ocz200
2. Martin-Sanchez F, Verspoor K. Big data in medicine is driving big changes. Yearb Med Inform. 2014;9(1):14–20. doi:10.15265/IY-2014-0020
3. Wang Y, Wang L, Rastegar-Mojarad M, et al. Clinical information extraction applications: a literature review. J Biomed Inform. 2018;77:34–49. doi:10.1016/j.jbi.2017.11.011
4. Aronson PL, Kuppermann N, Mahajan P, et al. Natural language processing to identify infants ≤90 days with fevers prior to presentation. Hosp Pediatr. 2025. doi:10.1542/hpeds.2024-008051
5. Rabbani N, Brown C, Bedgood M, et al. Evaluation of a large language model to identify confidential content in adolescent encounter notes. JAMA Pediatr. 2024;178(3):308–310. doi:10.1001/jamapediatrics.2023.6032
6. Johnson J, Brown C, Lee G, Morse K. Accuracy of a proprietary large language model in labeling obstetric incident reports. Jt Comm J Qual Patient Saf. 2024;S1553-7250(24)00233-2. doi:10.1016/j.jcjq.2024.08.001
7. Bassett HK, Coon ER, Mansbach JM, Snow K, Wheeler M, Schroeder AR. Misclassification of both influenza infection and oseltamivir exposure status in administrative data. JAMA Pediatr. 2024;178(2):201–203. doi:10.1001/jamapediatrics.2023.5731
8. Lindsay ME, de Oliveira S, Sciacca K, Lindvall C, Ananth PJ. Harnessing natural language processing to assess quality of end-of-life care for children with cancer. JCO Clin Cancer Inform. 2024;8(8):e2400134. doi:10.1200/CCI.24.00134
9. Pillai M, Posada J, Gardner RM, Hernandez-Boussard T, Bannett Y. Measuring quality-of-care in treatment of young children with attention-deficit/hyperactivity disorder using pre-trained language models. J Am Med Inform Assoc. 2024;31(4):949–957. doi:10.1093/jamia/ocae001
10. Kaiser SV, Rodean J, Coon ER, Mahant S, Gill PJ, Leyenaar JK. Common diagnoses and costs in pediatric hospitalization in the US. JAMA Pediatr. 2022;176(3):316–318. doi:10.1001/jamapediatrics.2021.5171
11. Gill PJ, Anwar MR, Thavam T, et al; Pediatric Research in Inpatient Settings (PRIS) Network. Identifying conditions with high prevalence, cost, and variation in cost in US children’s hospitals. JAMA Netw Open. 2021;4(7):e2117816. doi:10.1001/jamanetworkopen.2021.17816
12. Morse KE, Bagley SC, Shah NH. Estimate the hidden deployment cost of predictive models to improve patient care. Nat Med. 2020;26(1):18–19. doi:10.1038/s41591-019-0651-8
13. Bear Don’t Walk OJ IV, Reyes Nieva H, Lee SS-J, Elhadad N. A scoping review of ethics considerations in clinical natural language processing. JAMIA Open. 2022;5(2):ooac039. doi:10.1093/jamiaopen/ooac039
14. Quiroz JC, Laranjo L, Kocaballi AB, Berkovsky S, Rezazadegan D, Coiera E. Challenges of developing a digital scribe to reduce clinical documentation burden. NPJ Digit Med. 2019;2(1):114. doi:10.1038/s41746-019-0190-1
