Clinical natural language processing for secondary uses

Yanjun Gao; Diwakar Mahajan; Özlem Uzuner; Meliha Yetisgen

doi:10.1016/j.jbi.2024.104596

J Biomed Inform. Author manuscript; available in PMC: 2024 Jun 28.

Published in final edited form as: J Biomed Inform. 2024 Jan 24;150:104596. doi: 10.1016/j.jbi.2024.104596

PMCID: PMC11212507 NIHMSID: NIHMS2004092 PMID: 38278312

1. Introduction

Electronic health records (EHRs) have the potential to significantly enhance quality improvement efforts and surveillance initiatives, and to transform clinical research, by capturing comprehensive information about patient status and all aspects of care. The unstructured clinical narratives in EHRs contain critical details, including medical problems, treatments, diagnostic tests, patient demographics, and social determinants of health, as well as the reasoning behind care decisions and outcomes. However, computerized methods cannot readily use this information until it is transformed into a structured format that supports downstream applications. Natural Language Processing (NLP) plays a crucial role in converting clinical narratives into structured representations, which can complement the structured coded data used for diagnosis, care, triage, health outcomes, and clinical research. In this special issue, we present the 19 accepted papers and highlight innovative clinical NLP methods and their contributions to clinical applications.

As shown in Table 1, ten papers are related to the 2022 National NLP Clinical Challenges (n2c2) Workshop at the AMIA 2022 Annual Symposium. Seven papers [1–7] address n2c2 2022 Shared Task 1, Contextualized Medication Event Extraction, and three papers [8–10] address n2c2 2022 Shared Task 3, Progress Note Understanding – Assessment and Plan Reasoning (PNU-AP). The remaining nine papers address other clinical NLP problems and applications. Together, they cover a broad range of research areas, including clinical text classification [11–13], named entity recognition (NER) [14], event extraction [15], terminology extraction [16], sentiment analysis [17], clinical text segmentation [18], and semantic representation [19]. Beyond English EHRs, the collection includes studies on German [14,16], Chinese [13,15], and Spanish [18] EHRs.

Table 1.

Papers presented in the JBI Special Issue on Clinical NLP for Secondary Uses.

Author	Research area	Study target	Study population	Study data	Contribution	n2c2 shared task
Medication event extraction and classification (n2c2/CMED challenge)
Mahajan et al. [1]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth Natural Language Processing shared task corpus [20,21]	n2c2/CMED	An overview of n2c2 shared task 1 on CMED	Yes
Tsujimura et al. [2]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	A BERT-based pretrained model with a classification layer and BILOU tags for entity extraction; a novel ensemble of span-based and QA-based models for entity classification	Yes
Vasilakes et al. [3]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	A span-based BERT model for entity extraction and a BERT-based model with Levitated Context Markers (LCMs) for entity classification	Yes
Chen et al. [4]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	An evaluation of GatorTron [24] for entity extraction and classification	Yes
Gan et al. [5]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	A BERT-based pretrained model with a CRF layer for entity extraction, and special markers with a sliding input window for entity classification	Yes
Schäfer et al. [6]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	A BERT-based pretrained model with a sliding input window	Yes
Ramachandran et al. [7]	Entity extraction and classification	Medication event extraction and classification	Patients in the 2014 i2b2/UTHealth corpus [20,21]	n2c2/CMED	A multi-stage, multi-task approach using BERT models to tag medication mentions and then encode medication changes with relevant context	No
Diagnostic relation prediction between Assessment and Plan sections (n2c2/PNU-AP challenge)
Gao et al. [8]	Relation prediction	Diagnostic relation prediction between Assessment and Plan sections	Patients in MIMIC-III [25] (2001–2012)	n2c2/PNU-AP	An overview of n2c2 shared task 3 on PNU-AP	Yes
Gao et al. [9]	Relation prediction	Diagnostic relation prediction between Assessment and Plan sections	Patients in MIMIC-III [25] (2001–2012)	n2c2/PNU-AP	An approach that incorporates an external medical ontology and section order information into a pipeline with pre-trained language models to predict section relations	Yes
Socrates et al. [10]	Relation prediction	Diagnostic relation prediction between Assessment and Plan sections	Patients in MIMIC-III [25] (2001–2012)	n2c2/PNU-AP	An approach that incorporates human-in-the-loop NER models in medical concept annotation to enhance training data for pre-trained language models	Yes
Papers related to clinical text classification
Liu et al. [11]	Clinical text classification	Multi-modal EHR modeling for risk and diagnosis prediction	Patients in MIMIC-III [25] (2001–2012)	MIMIC-Extract [27] benchmark with vital signs, lab results, and clinical notes from MIMIC-III	A new model robust to missing modalities in EHR data, with results on four risk prediction and diagnosis prediction tasks	No
Chen et al. [12]	Clinical text classification	Multi-modal EHR modeling for autism prediction	Patients admitted to DUHS, born between 2013 and 2019	DUHS EHR	An ensemble model combining models built separately on structured data and clinical text for autism prediction	No
Lu et al. [13]	Clinical text classification	Prompt-based learning for pre-trained language models to predict diseases	Pediatric and adult patients	Pediatric EHR and adult EHR from Guangzhou Women and Children’s Medical Center	A novel pre-trained language model training framework that incorporates external medical knowledge, and a prompt-engineering method that enhances few-shot learning for model generalization	No
Publications on other NLP topics
Frei and Kramer [14]	Named entity recognition	Automated German clinical dataset annotation	Patients in the Kerndatensatz [28] corpus	Medical entities from the clinical narratives in the Kerndatensatz corpus	A prompt-based approach leveraging pre-trained large language models for automated dataset annotation, and models trained on the created dataset	No
Pan et al. [15]	Medical event extraction	Tumor event extraction from CT reports	Patients in the CCKS 2020 corpus [29]	Medical records with labeled tumor data from patients in China	A multi-task approach that casts event extraction as machine reading comprehension, using BERT and a BiLSTM for question answering and answer span prediction	No
Kugic et al. [16]	Terminology extraction	Terminology expansion on a German clinical corpus	Problem list entries from KAGes	Manually assigned ICD-10 codes for the de-identified problem list entries	An embedding-based method that identifies and extracts medical terminology	No
Denecke and Reichenpfader [17]	Sentiment analysis	Scoping review of papers on sentiment analysis of EHRs	29 academic publications from PubMed and IEEE Xplore	Papers focusing on sentiment analysis of clinical narratives	A literature review of sentiment analysis for clinical applications, summarizing methods, data sources, use cases, and state-of-the-art performance	No
Iglesia et al. [18]	Clinical text segmentation	Spanish corpus annotation for clinical text section identification	Patients in the CodiEsp corpus [30]	Sections in the clinical notes from the CodiEsp dataset	An annotated Spanish corpus and a new metric for the clinical note section identification task	No
Lee et al. [19]	Semantic representation	Semantic representation of ICD codes	Patients in the UK Biobank and the CDW of SMC in South Korea	ICD codes in patient records for disease relations	An approach for converting ICD codes into mathematical vectors	No

Abbreviations: electronic health records (EHR). National Natural Language Processing Clinical Challenges (n2c2). Contextualized Medication Event Extraction (CMED). Progress Note Understanding – Assessment and Plan Reasoning (PNU-AP). Duke University Health System (DUHS). Institute of Electrical and Electronics Engineers (IEEE). Medical Information Mart for Intensive Care III (MIMIC-III). International Classification of Diseases, 10th Revision (ICD-10). Clinical Data Warehouse (CDW). Samsung Medical Center (SMC). China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020.

2. Papers related to contextualized medication event extraction and classification

The papers submitted for n2c2 Shared Task 1, Contextualized Medication Event Extraction, focus on entity extraction and classification in clinical text. The study target is medication event extraction and classification, the study population is patients in the 2014 i2b2/UTHealth Natural Language Processing shared task corpus [20,21], and the study data is the Contextualized Medication Event Dataset (CMED) [22]. Paper [1] provides an overview of the shared task, and papers [2–6] are participants’ system papers. Although paper [7] addresses the same tasks on the CMED dataset, it was not part of the shared task. Participants primarily used variants of BERT [23], a powerful pre-trained language model, adapted to clinical text. The teams differed mainly in how they modeled the problem (for example, as token classification or as question answering (QA)) and in how much context the model was allowed to observe.

Mahajan et al. [1] summarized the 2022 n2c2 Track 1, Contextualized Medication Event Extraction. The challenge comprised three subtasks, requiring participants to develop systems that identify all medication mentions in a clinical note (Medication Extraction subtask), determine whether a medication change has been or is being discussed (Event Classification subtask), and classify change events along five contextual dimensions: Action, Negation, Temporality, Certainty, and Actor (Context Classification subtask). The paper describes the overall task, the subtasks, and the data, provides participation details, analyzes system performance, and presents final results for each subtask. It also discusses common errors and shortcomings of the systems and suggests directions for future research.
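
To make the three-level annotation concrete, the sketch below models one medication mention with its event label and its context labels as a Python dataclass. The class name `MedicationMention`, its fields, and the label strings `"Disposition"`, `"Increase"`, and `"Physician"` are assumptions for illustration; only the five dimension names come from the task description above, and this is not the official CMED file format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The five contextual dimensions named in the task description.
CONTEXT_DIMENSIONS = ("Action", "Negation", "Temporality", "Certainty", "Actor")


@dataclass
class MedicationMention:
    """One medication mention (hypothetical schema, not the official CMED format)."""
    start: int                   # character offset where the mention begins
    end: int                     # character offset just past the mention
    text: str
    event: Optional[str] = None  # Event Classification label (illustrative strings)
    context: Dict[str, str] = field(default_factory=dict)  # Context Classification labels

    def set_context(self, dimension: str, value: str) -> None:
        # Only the five task-defined dimensions are accepted.
        if dimension not in CONTEXT_DIMENSIONS:
            raise ValueError(f"unknown context dimension: {dimension}")
        self.context[dimension] = value


note = "Increase lisinopril to 20 mg daily."
start = note.index("lisinopril")
mention = MedicationMention(start, start + len("lisinopril"), "lisinopril")
mention.event = "Disposition"  # hypothetical label for "a change is discussed"
mention.set_context("Action", "Increase")
mention.set_context("Actor", "Physician")
print(mention.context)  # {'Action': 'Increase', 'Actor': 'Physician'}
```

Keeping the context labels in a dictionary keyed by dimension mirrors the task design, in which context is only annotated for mentions that describe a change.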

Tsujimura et al. [2] proposed three models: a striding NER model that uses a sliding window for entity extraction, and a span-based model and a multi-turn QA-based model for entity classification. The sliding window lets the striding NER model draw on more of the surrounding context. The QA-based model builds its prompts from the human annotation guidelines, reducing the need for prompt tuning. QA-style input is promising in the data-scarce clinical domain because the questions offer a way to incorporate domain knowledge.
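
The striding idea can be sketched as follows, assuming a hypothetical `tag_window` callable that stands in for a BERT token classifier. Overlapping windows advance by `stride` tokens, and each token keeps the label from the window in which it sits furthest from an edge. This merging rule is one common choice and is not necessarily the rule Tsujimura et al. used.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Split [0, n_tokens) into overlapping [start, end) windows advanced by `stride` tokens."""
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            return spans
        start += stride


def striding_tag(tokens: Sequence[str],
                 tag_window: Callable[[Sequence[str]], List[str]],
                 window: int = 8, stride: int = 4) -> List[str]:
    """Tag a long token sequence window by window.

    Each token keeps the label predicted in the window where it lies furthest from
    a window edge, i.e. where the model saw the most context on both sides.
    """
    labels: List[str] = [""] * len(tokens)
    best_margin = [-1] * len(tokens)
    for start, end in make_windows(len(tokens), window, stride):
        window_labels = tag_window(tokens[start:end])
        for offset, label in enumerate(window_labels):
            margin = min(offset, end - start - 1 - offset)  # distance to nearer edge
            if margin > best_margin[start + offset]:
                best_margin[start + offset] = margin
                labels[start + offset] = label
    return labels


def toy_tagger(chunk: Sequence[str]) -> List[str]:
    # Hypothetical stand-in for a BERT token classifier.
    return ["B-Drug" if tok == "lisinopril" else "O" for tok in chunk]


tokens = "pt on lisinopril 10 mg ; increase lisinopril to 20 mg daily".split()
print(striding_tag(tokens, toy_tagger, window=4, stride=2))
```

A smaller stride gives each token more chances to be seen with context on both sides, at the cost of running the model on more windows.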

Vasilakes et al. [3] used a BERT-based span detection model for entity extraction and proposed Levitated Context Markers (LCMs), an adaptation of levitated markers, for entity classification. LCMs use shared position IDs and a directional attention mask, allowing pre-trained transformers to capture global context while focusing on task-specific subspans without extensive input markup.
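
The sketch below illustrates the two mechanisms named above, shared position IDs and a directional attention mask, in the generic levitated-marker style: marker tokens are appended after the text, reuse the position IDs of the span boundaries, and may attend to the text, while text tokens cannot attend to the markers. The function name and the exact visibility rules (for example, that markers of different spans cannot see each other) are assumptions; the LCM variant of Vasilakes et al. may differ in detail.

```python
from typing import List, Sequence, Tuple


def levitated_marker_inputs(n_text: int, spans: Sequence[Tuple[int, int]]):
    """Build position IDs and an attention mask for levitated span markers.

    Text tokens occupy positions 0..n_text-1. For each (start, end) span (inclusive
    token indices) a start marker and an end marker are appended after the text.
    mask[i][j] == 1 means token i may attend to token j.
    """
    n_total = n_text + 2 * len(spans)
    position_ids: List[int] = list(range(n_text))
    mask = [[0] * n_total for _ in range(n_total)]
    # Text tokens attend only to text, so markers never change the text encoding.
    for i in range(n_text):
        for j in range(n_text):
            mask[i][j] = 1
    for k, (start, end) in enumerate(spans):
        start_marker = n_text + 2 * k
        end_marker = start_marker + 1
        # Markers share the position IDs of the span boundaries they "float" above.
        position_ids += [start, end]
        for m in (start_marker, end_marker):
            for j in range(n_text):
                mask[m][j] = 1          # markers read the whole text
            mask[m][start_marker] = 1   # and their own marker pair,
            mask[m][end_marker] = 1     # but not other spans' markers
    return position_ids, mask


positions, mask = levitated_marker_inputs(5, [(1, 2), (3, 4)])
print(positions)  # [0, 1, 2, 3, 4, 1, 2, 3, 4]
```

Because the text never attends to the markers, many candidate spans can be packed into one input without altering how the text itself is encoded.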

Chen et al. [4] experimented with GatorTron [24], a language model pretrained on large amounts of clinical text, and modeled the problem as span and sentence classification. Although GatorTron achieves better baseline performance than other off-the-shelf, domain-specific language models pretrained on smaller corpora, it does not surpass more carefully designed, problem-specific approaches.

Gan et al. [5] added a Conditional Random Field (CRF) head on top of BERT for the medication extraction subtask; for the classification subtasks, they inserted special markers into the text and framed them as sequence classification and multi-label classification tasks. Schäfer et al. [6] treated both medication extraction and classification as token classification tasks with a variety of BERT-based models. Ramachandran et al. [7] combined BERT with a CRF and semi-supervised learning for entity extraction. They also explored several techniques for entity classification, including an ensemble of multiple models and a multi-task setup with entity-specific markers; the latter yielded promising results.
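
Both Gan et al. and Ramachandran et al. place a CRF on top of BERT. At inference time, a linear-chain CRF selects the tag sequence that maximizes the sum of per-token emission scores and tag-to-tag transition scores, which Viterbi decoding finds exactly. The sketch below assumes the emission and transition scores are already available as plain lists; the toy numbers are invented, and real systems typically use a CRF layer from a deep learning library rather than this hand-written decoder.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][y]: score of tag y at token t (e.g., from a BERT token classifier)
    transitions[a][b]: score of moving from tag a to tag b
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for b in range(n_tags):
            # Best previous tag for ending in tag b at token t.
            best_prev = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][b] + emissions[t][b])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda y: score[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


O, B, I = 0, 1, 2
transitions = [[0.0, 0.0, -10.0],  # O -> I is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.0, 0.0],
             [0.0, 0.4, 0.5]]
print(viterbi_decode(emissions, transitions))  # [0, 1]
```

In the example, the penalized O-to-I transition keeps the decoder from emitting an I tag without a preceding B, which a greedy per-token argmax would do here.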

3. Papers related to progress note understanding – Assessment and plan reasoning

Papers submitted for n2c2 Shared Task 3 (PNU-AP) focused on relation prediction. The study target was diagnostic relation prediction between the Assessment and Plan sections, the study population was patients in MIMIC-III [25] (2001–2012), and the study data was the n2c2/PNU-AP dataset.

Gao et al. [8] summarized the 2022 n2c2 Track 3, “Progress Note Understanding – Assessment and Plan Reasoning” (PNU-AP). The task targeted the Assessment and Plan sections of daily progress notes in EHRs. Participants were challenged to develop NLP systems that predict causal relations between patient status (Assessment section) and treatment plans (Plan section), with the aim of supporting diagnostic decisions by prioritizing diagnoses in lengthy documents. The paper describes the data, evaluation methods, participation details, and system performance, and presents the challenge results, a significant step toward using NLP to extract relevant information from healthcare documentation.

Gao et al. [9] proposed a method that enhances pre-trained language models by incorporating external medical ontologies and taking into account the order of sections within clinical notes. This integration aims to improve the model’s capacity to understand and predict the relationships between different sections of a clinical document.

Socrates et al. [10] developed an approach that integrates human-in-the-loop NER models to improve medical concept annotation. The human-assisted annotations enrich the training data for pre-trained language models, improving their ability to predict relations within clinical text. Both methods [9,10] show how human expertise and external medical knowledge can augment the predictive capabilities of AI models for complex clinical narratives.

4. Papers presenting current trends and foci in clinical NLP

The publications outside the n2c2 shared tasks represent current trends and foci in clinical NLP research, such as prompt-based learning for pre-trained language models, multi-modal EHR modeling, and automated data annotation.

4.1. Papers related to clinical text classification

Three papers focus on clinical text classification. Two of them [11,12] model multi-modal EHR data, and one [13] uses prompt-based learning. Liu et al. [11] introduced a neural network based on the transformer architecture [26] that integrates structured data, such as lab results, with the natural language of clinical narratives to perform six clinical prediction tasks. To handle input modalities flexibly, they devised modality-specific tokens and an attention module that models cross-modality attention within the input. Their findings showed that this approach captured crucial signals for clinical prediction irrespective of the input modality.

Chen et al. [12] focused on improving autism prediction in children under one year of age using structured EHR data and clinical narratives. The study developed separate models on structured data and on clinical narratives, and then an ensemble model integrating both sources. The findings highlighted the significant added value of clinical text for early autism prediction, suggesting that features learned from clinician narratives could be crucial for understanding early development in autism.
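
A simple way to combine separately trained structured-data and text models is late fusion by weighted averaging of their predicted probabilities. The function below is a hypothetical sketch of that idea with an assumed weight `w_text`; Chen et al. may have combined their models differently, for example with a learned meta-classifier.

```python
from typing import List, Sequence


def late_fusion(p_structured: Sequence[float], p_text: Sequence[float],
                w_text: float = 0.5) -> List[float]:
    """Weighted average of per-patient probabilities from two separately trained models."""
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    if len(p_structured) != len(p_text):
        raise ValueError("both models must score the same patients in the same order")
    return [(1.0 - w_text) * ps + w_text * pt for ps, pt in zip(p_structured, p_text)]


# Hypothetical model outputs for three patients.
print(late_fusion([0.2, 0.7, 0.1], [0.6, 0.9, 0.05], w_text=0.5))
```

Tuning `w_text` on a validation set is one way to gauge how much the text model adds over structured data alone.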

Lu et al. [13] proposed a prompt-based framework, Medical Knowledge Enhanced Prompt Learning (MedKPL), which incorporates an open-source external knowledge base through prompt template design to improve the performance of pre-trained language models in diagnosis classification. Using knowledge-enhanced prompt learning, they showed that pre-trained language models generalize better to unseen diseases.
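
Prompt-based classification of this kind usually wraps the input in a cloze template and maps words predicted at the mask position to class labels through a verbalizer. The sketch below assumes the mask-position word scores have already been produced by a masked language model; the template wording, the `build_prompt` and `verbalize` helpers, and the averaging rule are hypothetical and are not MedKPL’s actual templates or knowledge sources.

```python
from typing import Dict, List


def build_prompt(note: str, knowledge: str, mask_token: str = "[MASK]") -> str:
    # Hypothetical cloze template: patient text, injected knowledge, then the slot to fill.
    return f"{note} Related knowledge: {knowledge} The diagnosis is {mask_token}."


def verbalize(mask_scores: Dict[str, float], verbalizer: Dict[str, List[str]]) -> str:
    """Return the class whose label words score highest, on average, at the mask position.

    mask_scores maps vocabulary words to a masked language model's scores at [MASK];
    words missing from mask_scores count as -inf.
    """
    def class_score(words: List[str]) -> float:
        return sum(mask_scores.get(w, float("-inf")) for w in words) / len(words)

    return max(verbalizer, key=lambda label: class_score(verbalizer[label]))


prompt = build_prompt("Fever and productive cough for 3 days.",
                      "Pneumonia commonly presents with fever and cough.")
# Invented mask-position scores standing in for a real model's output.
scores = {"pneumonia": 2.0, "lung": 1.0, "bronchitis": 1.2}
verbalizer = {"pneumonia": ["pneumonia", "lung"], "bronchitis": ["bronchitis"]}
print(verbalize(scores, verbalizer))  # pneumonia
```

Because the label words are ordinary vocabulary items, a new disease can be added by extending the verbalizer rather than retraining a classification head, which is one reason prompt learning suits few-shot and unseen-class settings.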

4.2. Papers related to other topics

The six papers in this section cover a range of NLP tasks. Notably, many of them work with non-English EHRs. Together with the other non-English studies in this issue, they pave the way for clinical NLP research beyond English EHR datasets, including Chinese [13,15], German [14,16], and Spanish [18] EHR data.

Frei and Kramer [14] tackled the automatic labeling of datasets for model training, noting that language models are resource-intensive and manual annotation is expensive. They used a few-shot learning approach with specialized prompts that emphasize medical entities to semi-automatically generate synthetic datasets for NER, and applied transfer learning to train pre-trained language models. Experiments on in-domain and out-of-domain datasets showed that models trained on the semi-automatically created datasets improved NER performance, demonstrating the efficacy of the proposed approach.

Pan et al. [15] introduced a novel method for extracting medical events from unstructured clinical examination reports by leveraging Machine Reading Comprehension. The method comprises two subtasks: Question Answerability Judgment and Span Selection. The authors used a BERT model as the answerability judge, and a BERT model with a bidirectional LSTM to identify and represent key information and predict the answer span in the text. The approach was shown to surpass existing methods and achieved new state-of-the-art results on a labeled tumor dataset from a cohort in China.
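
The two MRC subtasks can be combined at decoding time as sketched below: an answerability probability gates the output, and span selection picks the start and end tokens with the highest summed score, subject to end ≥ start and a maximum span length. The logits are assumed to come from the BERT judge and the BERT-BiLSTM span predictor described above; the function names, the 0.5 threshold, and `max_len` are assumptions.

```python
from typing import Optional, Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_len: int = 30) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (start, end) token pair with the highest start+end score, end >= start."""
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            if s + end_logits[j] > best_score:
                best, best_score = (i, j), s + end_logits[j]
    return best, best_score


def answer_question(answerable_prob: float, start_logits: Sequence[float],
                    end_logits: Sequence[float], threshold: float = 0.5,
                    max_len: int = 30) -> Optional[Tuple[int, int]]:
    """Two-stage decoding: the answerability judgment gates span selection."""
    if answerable_prob < threshold:
        return None  # the judge says the queried event attribute is absent
    span, _ = best_span(start_logits, end_logits, max_len)
    return span


start_logits = [0.1, 2.0, 0.3, 0.0]
end_logits = [0.0, 0.5, 1.5, 0.2]
print(answer_question(0.9, start_logits, end_logits))  # (1, 2)
print(answer_question(0.2, start_logits, end_logits))  # None
```

Separating the judgment from span selection lets the system abstain cleanly when a report does not mention the queried attribute, instead of forcing the span model to return its least-bad guess.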

Kugic et al. [16] addressed concept normalization in clinical documentation using an embedding-based representation scheme combined with co-occurrence analysis. The main contribution of this paper is a novel method for extracting synonyms, hyponyms, and hypernyms that integrates co-occurrence analysis with an embedding-based approach, bridging the lexical gap between standardized medical terms in terminologies such as ICD-10 and the corresponding jargon-laden clinical text. This helps align routine clinical documentation with standardized terminologies, improving the accuracy and utility of term normalization in healthcare settings.
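
One way to pair the two signals is to rank candidate terms by embedding cosine similarity and report how often each candidate co-occurs with the query term across documents, so that a terminologist can review both. The sketch below does this on invented toy vectors; the helper names and the choice to attach co-occurrence as a count rather than fold it into a combined score are assumptions, not Kugic et al.’s method.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cooccurrence_counts(documents: Iterable[Iterable[str]]) -> Counter:
    """Count, for each unordered term pair, how many documents contain both terms."""
    counts: Counter = Counter()
    for doc in documents:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


def expansion_candidates(term: str, embeddings: Dict[str, Sequence[float]],
                         counts: Counter, top_k: int = 5) -> List[Tuple[str, float, int]]:
    """Rank other terms by embedding similarity, attaching co-occurrence counts as extra evidence."""
    scored = []
    for other, vec in embeddings.items():
        if other == term:
            continue
        pair = tuple(sorted((term, other)))
        scored.append((other, cosine(embeddings[term], vec), counts[pair]))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


embeddings = {  # invented 2-d vectors for illustration
    "HTN": [1.0, 0.0], "hypertension": [0.9, 0.1], "diabetes": [0.0, 1.0]}
docs = [["HTN", "hypertension"], ["diabetes", "HTN"]]
print(expansion_candidates("HTN", embeddings, cooccurrence_counts(docs)))
```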

Denecke and Reichenpfader [17] presented a scoping review of sentiment analysis on clinical narratives, aiming to summarize existing research and identify open gaps. The review synthesized findings from 29 unique studies, covering applications of sentiment analysis to clinical outcomes as well as new domain-specific approaches. The authors encouraged future research to prioritize a gold-standard sentiment lexicon tailored to clinical narratives, the augmentation or creation of high-quality corpora, and investigation of the efficacy of state-of-the-art machine learning methods for sentiment analysis.

Iglesia et al. [18] focused on the Spanish EHR Section Identification task, a fundamental challenge in understanding clinical records. They introduced a newly annotated dataset for this task. Beyond dataset creation, they explored evaluation metrics for clinical text segmentation and proposed a novel evaluation algorithm, a metric tailored specifically to clinical text. Their work supports a more complete understanding of clinical text segmenter performance based on error types and the degree of errors, rather than relying solely on accuracy, which is less explanatory.

Lee et al. [19] introduced ICD2Vec, a universal framework for converting ICD codes into mathematical vectors that capture non-linear relationships across diseases. The framework encodes diseases as vectors that exhibit arithmetic and semantic relationships and aligns them with the corresponding ICD codes; ICD2Vec was validated by comparing the biological relationships between ICD codes with the cosine similarity of their vectors. In addition, the work introduced a new risk score and validated its clinical utility in large cohorts from the UK and South Korea. The results showed a significant correlation between ICD2Vec and ICD codes, suggesting the potential of ICD2Vec for widespread application in research and clinical practice.
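
The similarity comparison described above can be illustrated with cosine similarity between code vectors. The 3-dimensional vectors below are invented for illustration and are not ICD2Vec outputs; real embeddings are learned from data and have many more dimensions. The `most_similar` helper is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(code: str, vectors: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Rank the other codes by cosine similarity to `code` and return the top k."""
    others = [c for c in vectors if c != code]
    others.sort(key=lambda c: cosine(vectors[code], vectors[c]), reverse=True)
    return others[:k]


# Invented toy vectors, NOT ICD2Vec output.
toy_vectors = {
    "E10": [0.8, 0.0, 0.3],  # type 1 diabetes mellitus
    "E11": [0.9, 0.1, 0.0],  # type 2 diabetes mellitus
    "I10": [0.1, 0.9, 0.0],  # essential (primary) hypertension
    "J45": [0.0, 0.1, 0.9],  # asthma
}
print(most_similar("E11", toy_vectors))  # ['E10', 'I10']
```

Validating an embedding of this kind amounts to checking that such similarity rankings agree with known clinical or biological relatedness between diseases.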

5. Conclusion

The accepted papers highlight innovative strides in clinical natural language processing. The research presented in the n2c2 shared tasks showcases diverse methodologies built on pre-trained language models such as BERT, each contributing unique innovations to medication event extraction and diagnostic relation prediction. On the n2c2 Contextualized Medication Event Extraction task, the papers reveal a convergence toward more sophisticated, context-aware NLP systems. Researchers have improved entity extraction accuracy, tailored model optimization to clinical nuances, and developed complex multi-task models. On the n2c2 Progress Note Understanding – Assessment and Plan Reasoning task, the papers demonstrate significant advances by combining medical ontologies with sophisticated human-in-the-loop annotation methods. This integration offers substantial improvements in the automated interpretation of complex diagnostic relations within EHRs.

Beyond the n2c2 shared tasks, the Special Issue presents a broader exploration of NLP methodologies applied to diverse clinical problems, ranging from familiar tasks such as text classification, corpus annotation, and sentiment analysis to less examined tasks such as clinical text segmentation evaluation and diagnosis prediction.

On clinical text classification, the papers introduce new approaches that expand the scope of the field. These methods encompass multi-modal modeling, prompt engineering, and new frameworks for training language models, each contributing to more sophisticated NLP applications in clinical contexts. Other papers in the Special Issue explore a wide array of NLP topics, ranging from named entity recognition and medical event extraction to sentiment analysis and semantic representation. These studies employ methods such as prompt-based language modeling, multi-task approaches, embedding techniques, and novel metrics for tasks such as dataset annotation, terminology extraction, and clinical text segmentation. Together, they demonstrate the diverse applications of NLP in understanding and using complex clinical data.

These contributions collectively signal a shift toward more nuanced, knowledge-informed approaches that seek to maximize the strengths of AI while maintaining human input. The progress showcased in this Special Issue highlights the growing capability of AI in clinical settings and underscores the potential for these technologies to become integral components of decision-making systems aimed at improving patient outcomes. Importantly, the issue also sheds light on challenges and limitations in the current landscape of clinical NLP, including the need for high-quality annotated datasets, interoperability between health information systems, and generalizable, scalable NLP systems.

By presenting this special issue, the Guest Editorial Board hopes to provide insight and inspiration for future research and progress in clinical NLP for secondary uses.

6. Statement of significance

Electronic health records (EHRs) are pivotal in enhancing healthcare quality and advancing clinical research by providing detailed patient information. These records feature unstructured clinical narratives rich in details like medical issues, treatments, diagnostics, demographics, and social health determinants. Crucially, these narratives encompass the rationale behind care decisions and outcomes. However, their unstructured nature limits computerized usage, necessitating transformation into structured formats for various applications. Natural Language Processing (NLP) is key in this, turning narratives into structured data to support diagnosis, care, triage, and research. This special issue showcases 19 papers on innovative clinical NLP methods and their application in clinical settings.

Footnotes

CRediT authorship contribution statement

Yanjun Gao: Conceptualization, Data curation, Formal analysis, Writing – original draft, Writing – review & editing. Diwakar Mahajan: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing. Özlem Uzuner: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing. Meliha Yetisgen: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1].Mahajan D, Liang JJ, Tsou C-H, Uzuner Ö, Overview of the 2022 n2c2 shared task on contextualized medication event extraction in clinical notes, J. Biomed. Inform (2023) 104432. [DOI] [PMC free article] [PubMed] [Google Scholar]
[2].Tsujimura T, Yamada K, Ida R, Miwa M, Sasaki Y, Contextualized medication event extraction with striding NER and multi-turn QA, J. Biomed. Inf 144 (2023) 104416, 10.1016/j.jbi.2023.104416 [published Online First: Epub Date]. [DOI] [PubMed] [Google Scholar]
[3].Vasilakes J, Georgiadis P, Nguyen NTH, Miwa M, Ananiadou S, Contextualized medication event extraction with levitated markers, J. Biomed. Inf 141 (2023) 104347, 10.1016/j.jbi.2023.104347 [published Online First: Epub Date]. [DOI] [PubMed] [Google Scholar]
[4].Chen A, Yu Z, Yang X, Guo Y, Bian J, Wu Y, Contextualized medication information extraction using transformer-based deep learning architectures, J. Biomed. Inf 142 (2023) 104370, 10.1016/j.jbi.2023.104370 [published Online First: Epub Date]. [DOI] [PMC free article] [PubMed] [Google Scholar]
[5].Gan Q, Hu M, Peterson KS, et al. , A deep learning approach for medication disposition and corresponding attributes extraction, J. Biomed. Inf 143 (2023) 104391, 10.1016/j.jbi.2023.104391 [published Online First: Epub Date]. [DOI] [PMC free article] [PubMed] [Google Scholar]
[6].Schäfer H, Idrissi-Yaghir A, Bewersdorff J, Frihat S, Friedrich CM, Zesch T, Medication event extraction in clinical notes: contribution of the WisPerMed team to the n2c2 2022 challenge, J. Biomed. Inf 143 (2023) 104400, 10.1016/j.jbi.2023.104400 [published Online First: Epub Date]. [DOI] [PubMed] [Google Scholar]
[7].Ramachandran GK, Lybarger K, Liu Y, et al. , Extracting medication changes in clinical narratives using pre-trained language models, J. Biomed. Inf 139 (2023) 104302, 10.1016/j.jbi.2023.104302 [published Online First: Epub Date]. [DOI] [PMC free article] [PubMed] [Google Scholar]
[8].Gao Y, Dligach D, Miller T, Churpek MM, Uzuner O, Afshar M, Progress note understanding—assessment and plan reasoning: overview of the 2022 N2C2 track 3 shared task, J. Biomed. Inform 2023 (2022) 104346. [DOI] [PMC free article] [PubMed] [Google Scholar]
[9].Gao J, He S, Hu J, Chen G, A hybrid system to understand the relations between assessments and plans in progress notes, J. Biomed. Inform 141 (2023) 104363. [DOI] [PMC free article] [PubMed] [Google Scholar]
[10].Socrates V, Gilson A, Lopez K, Chi L, Taylor RA, Chartash D, Predicting relations between SOAP note sections: the value of incorporating a clinical information model, J. Biomed. Inform 141 (2023) 104360. [DOI] [PMC free article] [PubMed] [Google Scholar]
[11].C. D Liu Jinghui, Nguyen A, Verspoor K, Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities, J. Biomed. Inform (2023). [DOI] [PubMed] [Google Scholar]
[12].Chen J, Engelhard M, Henao R, Berchuck S, Eichner B, Perrin EM, Sapiro G, Dawson G, Enhancing early autism prediction based on electronic records using clinical narratives, J. Biomed. Inf (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
[13].Lu Y, Liu X, Du Z, Gao Y, Wang G, MedKPL: a heterogeneous knowledge enhanced prompt learning framework for transferable diagnosis, J. Biomed. Inform (2023). [DOI] [PubMed] [Google Scholar]
[14].Frei J, Kramer F, Annotated dataset creation through large language models for non-English medical NLP, J. Biomed. Inform (2023). [DOI] [PubMed] [Google Scholar]
[15].Pan Q, Zhao F, Chen X, Chen D, A method for extracting tumor events from clinical CT examination reports, J. Biomed. Inf 142 (2023) 104371, 10.1016/j.jbi.2023.104371 [published Online First: Epub Date]. [DOI] [PubMed] [Google Scholar]
[16].P. B Kugic Amila, Schulz S, Kreuzthaler M, Embedding-based terminology expansion via secondary use of large clinical real-world datasets, J. Biomed. Inform (2023). [DOI] [PubMed] [Google Scholar]
[17].R. D Denecke Kerstin, Sentiment analysis of clinical narratives: a scoping review, J. Biomed. Inform (2023). [DOI] [PubMed] [Google Scholar]
[18].de la Iglesia I, Chocró P, de Maeztu G, Gojenola K, Atutx A, An open source corpus and automatic tool for section identification in Spanish health records, J. Biomed. Inf (2023). [DOI] [PubMed] [Google Scholar]
[19].Lee YC, Jung S-H, Kumar A, Shim I, Song M, Kim MS, Kim K, Myung W, Park W-Y, Won H-H, ICD2Vec: mathematical representation of diseases, J. Biomed. Inf (2023). [DOI] [PubMed] [Google Scholar]
[20].Kumar V, Stubbs A, Shaw S, Uzuner Ö, Creation of a new longitudinal corpus of clinical narratives, J. Biomed. Inf 58 (2015) S6–S10, 10.1016/j.jbi.2015.09.018 [published Online First: Epub Date]. [DOI] [PMC free article] [PubMed] [Google Scholar]
[21].Stubbs A, Uzuner Ö, Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus (1532–0480 (Electronic)). [DOI] [PMC free article] [PubMed]
[22].Mahajan D, Liang JJ, Tsou CH, Toward understanding clinical context of medication change events in clinical narratives, in: AMIA Annual Symposium Proceedings 2021, Vol. 2021, American Medical Informatics Association, p. 833. [PMC free article] [PubMed] [Google Scholar]
[23].Devlin J, Chang MW, Lee K, Toutanova K, Bert: pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018. Oct 11. [Google Scholar]
[24].Yang X et al. , Gatortron: a large clinical language model to unlock patient information from unstructured electronic health records, arXiv preprint arXiv: 2203.03540, 2022. [Google Scholar]
[25].Johnson AE, Pollard TJ, Shen L, et al. , MIMIC-III, a freely accessible critical care database, Sci. Data 3 (1) (2016) 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
[26].Vaswani A, Shazeer N, Parmar N, et al. , Attention is all you need, Adv. Neural Inf. Proces. Syst 30 (2017). [Google Scholar]
[27].MIMIC-Extract: a data extraction, preprocessing, and representation pipeline for MIMIC-III, in: Proceedings of the ACM Conference on Health, Inference, and Learning, 2020.
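[28].für Kommunikation FdZ, Standardisierung von Forschungsinformationen im Projekt Kerndatensatz Forschung [Standardization of research information in the Kerndatensatz Forschung (core research dataset) project].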
[29].Li X, Wen Q, Lin H, Jiao Z, Zhang J, Overview of CCKS 2020 task 3: named entity recognition and event extraction in Chinese electronic medical records, Data Intell. 3 (3) (2021) 376–388, doi:10.1162/dint_a_00093.
[30].Miranda-Escalada A, Gonzalez-Agirre A, CodiEsp: Clinical Case Coding in Spanish Shared Task (eHealth CLEF 2020), eHealth CLEF, 2020.
