Summary
Objectives: Owing to the rapid progress of natural language processing (NLP), the role of NLP in the medical field has gained considerable attention from both the NLP and medical informatics communities. Although numerous medical NLP papers are published annually, there is still a gap between basic NLP research and practical product development. This gap raises questions such as what medical NLP has achieved in each medical field and what the barriers to the practical use of NLP are. This paper aims to answer these questions.
Methods: We explore the literature on potential NLP products/services applied to various medical/clinical/healthcare areas.
Results: This paper introduces clinical applications (bedside applications) of NLP by clinical department: internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, and obstetrics and gynecology. We also clarify technical problems that must be addressed to encourage NLP-based bedside applications.
Conclusions: These results contribute to discussions regarding potentially feasible NLP applications and highlight research gaps for future studies.
Keywords: Natural language processing, medical application, chatbot, randomized controlled trial, social media
1 Introduction
Electronic health/medical records (referred to as EHR in this study) are rapidly replacing paper-based records in hospitals worldwide. Natural language processing (NLP) techniques have gained importance in the medical field. Because NLP is a hot topic in computer science, the number of medical NLP studies is increasing dramatically each year.
Despite the large number of studies, only a few practical studies have validated medical NLP applications in real-world settings. Studies using randomized controlled trials (RCTs), which provide the highest level of medical evidence, are rare. A PubMed search for “NLP” + “RCT” or “Clinical trial” returned only a few studies [ 1 2 3 4 ]. Instead of RCTs, several studies employed retrospective designs using EHR big data for tasks such as disease screening, case classification, and incident detection [ 5 6 7 8 ]. However, unlike medical imaging software, these systems have not been commercialized as products. A similar trend can be observed among the artificial intelligence (AI) systems approved by the Food and Drug Administration (FDA). Most were audiology devices, and no medical systems related to NLP were found.
In summary, NLP has been actively studied, but there is still a gap between basic research and practical product development. This raises several questions, including what medical NLP has achieved in each medical field and what the barriers to the practical use of NLP are. To answer these questions, this study investigates what clinical/medical NLP has achieved in different clinical/medical fields.
This review aims to provide a guide for NLP specialists who are not well versed in medical informatics. Its scope is studies that have the potential to contribute directly to daily clinical practice, which we call bedside applications, in departments such as internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, and obstetrics and gynecology. This paper introduces existing ready-to-use systems in the above fields and summarizes their current methodology and performance. Finally, we mention potential future NLP applications not only for hospital use but also for patient use.
2 Bedside Applications
We provide an overview of how far NLP can be applied to outpatient and inpatient diagnosis, treatment, or management in each department. Historically, shared tasks have been one of the effective ways for researchers to drive fundamental innovations in clinical NLP [ 9 ]. A shared task is a competitive platform where organizers present a technically challenging and clinically meaningful task along with the dataset, gold standards, and evaluation criteria. In the early days, simple tasks were chosen, such as classifying patient records based on smoking status [ 10 ]. These days, shared tasks deal with far more complex problems, such as temporal relationship recognition among clinical events in discharge summaries [ 11 ], risk factor identification in longitudinal series of progress notes [ 12 ], and clinical decision support [ 13 14 15 ]. Over time, researchers have demonstrated the reproducibility of the solutions and techniques found in shared tasks, which has promoted advancements in clinical NLP.
We surveyed how far NLP applications have been proven to be replicable in real-world clinical practice. We placed no restrictions on hospital departments when searching for publications. We referred to (i) reviews and systematic reviews published in 2017 or later and (ii) original research articles published in 2020 or later on NLP applications for each hospital department. We searched PubMed using the keyword “natural language processing” for reviews and systematic reviews, and “natural language processing” together with a hospital department name for original research articles. Because this article is not a systematic review, we focused on studies that can directly contribute to daily clinical practice. Although NLP is also helpful in research-oriented applications, such as cohort building with patient identification or phenotyping [ 16 ], evidence generation using clinical free-text [ 17 18 19 ], or semi-automation of meta-analysis [ 20 ] and systematic review [ 21 22 23 ], these are beyond the scope of this article.
2.1 Applications in Different Departments
NLP-based technology has enabled information extraction (IE) from various unstructured free-text documents such as clinic letters, progress notes, discharge summaries, and test reports. This technology can improve care quality in multiple departments, which has been demonstrated mainly in retrospective studies and sometimes in prospective studies [ 24 25 26 27 ]. NLP performance has also been validated in multicenter studies [ 28 , 29 ]. See also Table 1 for details of the NLP systems introduced below.
Table 1.

Summary of bedside NLP application studies. BART = Bidirectional Auto-Regressive Transformer, BERT = Bidirectional Encoder Representations from Transformers, CNN = Convolutional Neural Network, EL = Entity Linking, GBDT = Gradient Boosting Decision Tree, LSTM = Long Short-Term Memory, NER = Named Entity Recognition, NN = Neural Network, RCT = Randomized Controlled Trial, RoBERTa = Robustly Optimized BERT Pretraining Approach, SVM = Support Vector Machine, T5 = Text-to-Text Transfer Transformer
*Papers published in 2020 or later.
Internal Medicine
NLP aids in the prevention, early diagnosis, treatment, and prognostic prediction of a wide range of diseases, such as cardiovascular, endocrine, metabolic, hepatobiliary, and neurological diseases [ 30 ].
- (i) Disease prevention. NLP can identify risk factors, estimate risk, or predict events of disease development or readmissions [ 12 , 31 , 32 ]. Wang et al. used a rule-based approach to automatically calculate CHA2DS2-VASc and HAS-BLED, the stroke and bleeding risk scores, respectively, for patients with atrial fibrillation. They also identified patients with a high risk of cerebral stroke with positive predictive values of 0.92–1.00 [ 33 ]. Buchan et al. analyzed clinical notes of patients without a history of coronary artery disease (CAD) using named entity recognition (NER) and a support vector machine (SVM), and identified patients who later developed CAD with an F1-score of 0.774 [ 34 ]; 
- (ii) Early diagnosis. NLP can help clinicians recognize diseases outside their specialty that might otherwise be misdiagnosed or overlooked without proper transfer. Chase et al. achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.94 in classifying patients with and without multiple sclerosis using NER and Naïve Bayes classifiers. They also identified patients suspected of undiagnosed multiple sclerosis [ 35 ]; 
- (iii) Treatment support. Clinical decision support tools that summarize patient clinical information and suggest treatment are beginning to be realized. Seol et al. integrated a clinical decision support tool into the EHR system for pediatric asthma outpatients, which warns of the risk of acute exacerbation and recommends an optimal treatment plan based on free-text and structured data in the EHR [ 25 ]. An RCT demonstrated improved patient outcomes and a significantly reduced physician workload for manual chart review. 
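As an illustration of the rule-based risk scoring mentioned in item (i), the following minimal sketch computes a CHA2DS2-VASc score from risk factors found in a note. Only the point weights follow the standard definition of the score. The keyword patterns, the function names, and the example note are hypothetical and far simpler than the rules of Wang et al. [ 33 ]; in particular, the sketch ignores negation and abbreviations.

```python
import re

# Hypothetical keyword patterns (assumption): a real system would also need
# negation handling, abbreviation expansion, and section awareness.
RISK_PATTERNS = {
    "chf": r"\b(congestive heart failure|heart failure)\b",
    "hypertension": r"\bhypertension\b",
    "diabetes": r"\bdiabetes\b",
    "stroke_tia": r"\b(stroke|transient ischemic attack|thromboembolism)\b",
    "vascular": r"\b(myocardial infarction|peripheral artery disease|aortic plaque)\b",
}

# Point weights of the standard CHA2DS2-VASc definition.
WEIGHTS = {"chf": 1, "hypertension": 1, "diabetes": 1, "stroke_tia": 2, "vascular": 1}


def extract_risk_factors(note):
    """Return the set of risk factors whose pattern occurs in the note."""
    text = note.lower()
    return {name for name, pattern in RISK_PATTERNS.items() if re.search(pattern, text)}


def cha2ds2_vasc(note, age, female):
    """Sum text-derived points with the age and sex points from structured data."""
    score = sum(WEIGHTS[factor] for factor in extract_risk_factors(note))
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1
    if female:
        score += 1
    return score


# Invented example note, not taken from any cited dataset.
note = "History of hypertension and diabetes. Prior stroke in 2015."
print(cha2ds2_vasc(note, age=78, female=True))
```

Here the text contributes 4 points (hypertension, diabetes, and prior stroke), and the structured fields contribute 3 more (age 75 or older, female sex).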
Pre-surgery
NLP has the potential to aid in identifying clinical conditions of preoperative, perioperative, and postoperative patients [ 36 , 37 ]. In preoperative settings, NLP can (i) evaluate surgical indications and (ii) reduce the workload of preoperative assessment. Wissel et al. implemented an automatic NLP scoring system in the EHR system that uses an SVM to identify outpatients with epilepsy who have indications for surgery. The system achieved an ROC-AUC of 0.79 in recommending operation [ 24 ]. Fonferko-Shadrach et al. developed an NLP system to review clinic letters and automatically extract symptoms, diagnosis, and medication history of preoperative patients. The system was based on an existing entity linking tool and demonstrated an F1-score of 0.911 [ 38 ].
Post-surgery
Perioperatively and postoperatively, NLP contributes to continuous quality improvement efforts. NLP can identify complications and their details in unstructured free-text clinical records, even if they are not codified with ICD-10 (International Classification of Diseases, 10th revision) [ 29 , 39 ]. Bucher et al. identified surgical site infections (SSIs) with an NLP pipeline that parses and extracts information from clinical notes, reaching an ROC-AUC of 0.912. The system also determined SSI subgroups based on the depth, the wound condition, and the outcome [ 29 ]. Furthermore, surgical outcomes can also be automatically extracted from unstructured free-text using NLP, which aids labor-intensive manual chart review. In orthopedics, hip dislocation after total hip arthroplasty can be detected [ 40 ]. Tibbo et al. developed an NLP system to automatically determine the Vancouver classification of periprosthetic femur fractures with a sensitivity of 0.786 and a specificity of 0.948 [ 41 ].
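For readers less familiar with the evaluation measures quoted throughout this section, the sketch below shows how sensitivity, specificity, positive predictive value, and F1-score are derived from a confusion matrix. The counts are invented for illustration and are not taken from any cited study; the function name is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # positive predictive value, also called precision
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


# Invented counts: 50 true cases (40 detected) and 150 non-cases (10 false alarms).
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F1-score ignores true negatives, it can be low for rare conditions even when specificity is high, which is one reason the studies above report different measures.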
Oncology
Oncology is another department where NLP plays an important role [ 30 , 42 ].
- (i) IE and cancer registration. NLP helps information retrieval on genetic, histological, and clinical characteristics of cancer, which is essential in clinical decision making and surveillance for effective public health interventions [ 43 , 44 ]. The information includes histological type, differentiation, Ki-67 index, TNM (classification of malignant tumors) staging, test findings, treatment, family history, and performance status. Benjamin et al. automatically extracted quantitative information of biomarkers from breast cancer pathology reports. They achieved an accuracy of 0.98 with a rule-based approach on top of an existing NER tool MetaMap [ 45 , 46 ]; 
- (ii) Clinical decision support. Precision medicine is a tailor-made clinical practice considering individual patient demographics and cancer genetic characteristics. NLP can recommend optimal treatment plans by searching biomedical articles and clinical trial repositories using patient information as a query [ 13 14 15 , 47 ]. Li et al. released a chatbot-style open access clinical decision support tool [ 48 ]. 
Radiology
NLP can contribute to multiple stages of the radiological clinical workflow [ 49 50 51 ].
- (i) Patient safety. NLP can help screen patients for contraindications to diagnostic imaging. Valtchinov et al. identified implants with contraindication to magnetic resonance imaging (MRI) in clinical notes with accuracies of 0.83–0.91 with NER [ 52 ]; 
- (ii) Imaging protocol recommendation. NLP can determine the use of contrast agents or optimal imaging protocols based on free-text in ordering comments or clinical records [ 53 54 55 56 ]. Chillakuru et al. developed a machine learning-based NLP system to recommend the use of contrast agents for brain and spinal MRI with accuracies of 0.83–0.85, for which an online demo is available. The system is based on term frequency-inverse document frequency vectorization, Gradient Boosting Decision Tree (GBDT), word embeddings, and shallow neural networks [ 54 ]. Some other scan optimization tools are commercially available [ 55 ]; 
- (iii) Automated radiology reporting. As the workload of diagnostic radiologists rapidly grows [ 57 ], automated radiology report generation in cooperation with computer vision AI is attracting attention [ 58 ]. Most studies have dealt with chest X-rays thus far, and further application to computed tomography (CT), MRI, and nuclear medicine is expected; 
- (iv) Surveillance. Radiology reports sometimes point out incidental findings. NLP can help prevent such findings from being missed by the attending physician by automatically sending alerts [ 49 50 51 ]. 
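The term frequency-inverse document frequency (TF-IDF) vectorization mentioned in item (ii) can be sketched as follows. The ordering comments are invented, whitespace tokenization is a simplification, and the smoothed IDF formula is one common variant that is assumed here; it is not necessarily the one used by Chillakuru et al. [ 54 ]. In a full pipeline, the resulting vectors would be passed to a classifier such as a GBDT.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, vectors


# Invented free-text ordering comments.
orders = [
    "rule out brain tumor",
    "rule out spinal stenosis",
    "follow up brain tumor after resection",
]
vocab, vectors = tfidf_vectors(orders)
for term, weight in zip(vocab, vectors[1]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

In the second comment, "spinal" and "stenosis" receive higher weights than "rule" and "out" because they occur in fewer documents.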
Pathology
NLP is helpful for both pathologists, whose responsibility is increasing in the era of personalized medicine, and clinicians, who refer to the diagnosis for treatment planning.
- (i) Support diagnosis. NLP can support pathologists by providing a better computer-based image retrieval system incorporating pathology reports [ 59 ] or by automated pathology reporting [ 60 ]; 
- (ii) Support clinical practice. Information on pathological diagnosis is used afterward by clinicians for better treatment strategy. NLP helps convert unstructured pathology reports into a structured form [ 45 , 57 , 61 ]. Kim et al. automatically extracted descriptions of a specimen, procedure, and pathologic diagnosis from pathology reports regardless of clinical departments. Their deep learning-based system, which uses Bidirectional Encoder Representations from Transformers (BERT), achieved accuracies of 0.9795–0.9839 [ 57 , 62 ]. At a more fine-grained level, Odisho et al. extracted seventeen types of information from prostate cancer pathology reports and achieved a weighted F1-score of 0.972 for categorical data and a mean accuracy of 0.930 for numerical data. They applied document classification with a convolutional neural network (CNN) to categorical data and token classification with a random forest to numerical data [ 61 ]. 
Psychiatry
In psychiatry, NLP can be used for IE from unstructured EHR and speech analysis on patient speech data [ 63 , 64 ]. NLP can help in the screening, early diagnosis, or severity estimation of various diseases such as depression [ 63 ], bipolar disorder [ 65 ], dementia [ 66 67 68 ], psychosis [ 69 , 70 ], and schizophrenia [ 71 ]. Dai et al. showed that NLP can automatically diagnose psychiatric diseases from free-text discharge summaries. Their system achieved a micro F1-score of 0.584 using multiple classifiers based on pre-trained Robustly Optimized BERT Pretraining Approach (RoBERTa) models [ 72 , 73 ]. More fundamentally, NLP can contribute to psychiatric diagnostics. The Research Domain Criteria (RDoC), a potential counterpart of the Diagnostic and Statistical Manual of Mental Disorders (DSM), aims to integrate brain research knowledge into psychiatric disease classification [ 74 ], for which NLP shared tasks were held in 2016 and 2019 [ 75 , 76 ].
Rehabilitation
NLP is used in speech therapy by incorporating it into electronic devices for augmentative and alternative communication (AAC) [ 77 , 78 ]. Moreover, NLP has the potential to better integrate rehabilitation as a whole into the healthcare process by enabling the integration of the International Classification of Functioning, Disability, and Health (ICF) into EHRs, although there are still problems to overcome [ 79 ].
Obstetrics and Gynecology
Publications on bedside NLP applications were found in obstetrics and gynecology, although limited in number. Moon et al. showed the effectiveness of a rule-based NLP approach to highlight information discrepancies on surgical history due to misinterpretation during hospital transfer or improper copy and paste [ 80 ]. Sterckx et al. developed a GBDT-based birth risk prediction system to support preterm birth treatment. NER-based features improved prediction performance when combined with structured data, with an F1-score above 0.80 for predicting birth within 24 hours [ 81 ]. Barber et al. used NLP for prognostic prediction of ovarian cancer surgery, where postoperative readmission within 30 days was predicted with an ROC-AUC of 0.70 using preoperative CT radiology reports [ 82 ].
Other Departments
NLP application is limited in ophthalmology and anesthesiology, where most AI systems are devoted to automated image diagnosis [ 83 ] or intraoperative monitoring with numerical data [ 84 ]. However, some studies combine NLP for unstructured free-text documents and AI for structured EHR data to predict patient prognosis [ 85 ]. NLP also has the potential to automatically pick up patient risk factors preoperatively.
As indicated above, NLP can improve the quality and efficiency of bedside clinical practice mainly by IE from unstructured free-text for various departments and diseases, a part of which has already been put to practical use.
2.2 Cross-cutting Applications
Some NLP applications are not limited to specific hospital departments but are helpful across many of them. We introduce such applications in this subsection.
Text Simplification
Clinical texts can sometimes be difficult for patients or clinicians in other departments due to jargon or abbreviations. Automated text simplification with NLP can improve both patient-staff and staff-staff communication [ 86 , 87 ]. Moen et al. developed an NLP system to suggest replacements for abbreviations in Finnish clinical texts that are difficult for patients. The system achieved top-1 accuracy of 0.3464 with an unsupervised approach using cosine similarity of word embeddings [ 87 ].
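The unsupervised idea described above, ranking candidate replacements by the cosine similarity of word embeddings, can be sketched as follows. The three-dimensional vectors, the English abbreviation, and the candidate list are toy assumptions for illustration only; Moen et al. [ 87 ] worked with Finnish text and trained embeddings. Top-1 accuracy is the fraction of abbreviations whose highest-ranked candidate is the correct expansion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(abbreviation, candidates, embeddings):
    """Order candidates by similarity to the abbreviation, most similar first."""
    target = embeddings[abbreviation]
    return sorted(candidates, key=lambda c: cosine(target, embeddings[c]), reverse=True)


def top1_accuracy(predictions, gold):
    """Fraction of items whose top-ranked prediction equals the gold answer."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy embeddings (assumption): real embeddings have hundreds of dimensions.
embeddings = {
    "bp": [0.9, 0.1, 0.0],
    "blood pressure": [0.8, 0.2, 0.1],
    "body temperature": [0.1, 0.9, 0.2],
    "biopsy": [0.2, 0.1, 0.9],
}
candidates = ["biopsy", "body temperature", "blood pressure"]
ranking = rank_candidates("bp", candidates, embeddings)
print(ranking[0])
print(top1_accuracy([ranking[0]], ["blood pressure"]))
```

Because no labeled training data is needed, the approach is inexpensive to deploy, which helps put the modest top-1 accuracy reported above into context.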
Writing Support
Writing support with NLP can address a more fundamental problem: hard-to-read clinical texts often result from healthcare professionals' lack of time for documentation.
- (i) Auto-completion. Auto-completion is a real-time suggestion of the next word or clinical concept while a healthcare professional writes a clinical document. Gopinath et al. developed an auto-completion system for the emergency department that suggests clinical conditions, symptoms, medications, and laboratory test items during the documentation of progress notes. The system reduced the keystroke burden by 67% [ 88 ]; 
- (ii) Auto-structuring. Some clinical documents such as progress notes or nursing notes are required to be in a structured form. NLP allows healthcare professionals to write such documents in an unstructured narrative by automatic editing and structuring. Moen et al. structured Finnish nursing notes into paragraphs whose headings were selected from standardized taxonomy with an accuracy of 0.71 using a Long Short-Term Memory (LSTM)-based sentence classification [ 89 ]. Furthermore, patient-staff conversations can be automatically structured once transcribed [ 90 , 91 ]; 
- (iii) Digital scribe. A digital scribe differs from dictation but resembles auto-structuring, except that it uses voice input. That is, clinicians only have to record an outpatient conversation with some additional voice commands, and the NLP system analyzes and summarizes the conversation and converts it into a clinical document in a predefined format [ 92 93 94 95 ]. Wang et al. developed a digital scribe system, which was 2.17–3.12 times faster than typing and dictation during patient encounter documentation [ 95 ]. 
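To make the notion of keystroke burden in item (i) concrete, the sketch below measures keystroke savings for a simple prefix-based auto-completion. It is only an illustration of the measurement: the vocabulary, the typed terms, and the assumption that accepting a suggestion costs one keystroke are all hypothetical, and the system of Gopinath et al. [ 88 ] relies on contextual suggestions rather than plain prefix matching.

```python
def suggest(prefix, vocabulary):
    """Return the first vocabulary term that starts with the typed prefix, if any."""
    for term in vocabulary:
        if term.startswith(prefix):
            return term
    return None


def keystrokes_with_autocomplete(term, vocabulary):
    """Count keystrokes when the user accepts the suggestion as soon as it is correct.

    Assumption: accepting a suggestion costs exactly one keystroke.
    """
    for n in range(1, len(term) + 1):
        if suggest(term[:n], vocabulary) == term:
            return min(n + 1, len(term))
    return len(term)  # never suggested: the term is typed in full


# Hypothetical vocabulary and note content.
vocabulary = ["chest pain", "cholecystitis", "shortness of breath", "metformin"]
typed = ["chest pain", "metformin", "shortness of breath"]

baseline = sum(len(term) for term in typed)
assisted = sum(keystrokes_with_autocomplete(term, vocabulary) for term in typed)
reduction = 1 - assisted / baseline
print(f"{baseline} -> {assisted} keystrokes ({reduction:.0%} reduction)")
```

The toy reduction is inflated by the tiny vocabulary; with a realistic vocabulary, more characters are needed before the intended term is ranked first.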
3 Problems to be Addressed
3.1 Standard Annotation Schemes
Most NLP-based IE techniques adopted in the studies we referred to thus far use supervised machine learning, which requires high-quality, large datasets for training. Creating such datasets relies on manual annotation and thus increases the cost.
The formats and conventions of writing clinical documents differ not only in document types (e.g., EHRs, radiology reports, and nursing notes), but also in hospitals, departments, and even individual doctors. This textual diversity requires medical NLP researchers to create dedicated corpora for different applications by designing distinct annotation schemes. For instance, doctors often write disease name abbreviations in EHRs owing to the nature of personal note-taking, while radiology reports contain slightly more standardized terms because they are exchanged between diagnosing doctors and radiologists. Distributions of the appearing clinical terms in different types of clinical notes of different departments also deviate substantially, leading to uneven performance even when using an identical model architecture [ 96 ].
To adapt to a wide range of clinical note types with a single annotation scheme, some studies propose general-purpose annotation guidelines that define popular medical entities (e.g., diseases, drugs, tests, remedies, and body parts), as well as semantic relationships among them (e.g., “a medicine ‘is-prescribed-for’ a disease” and “a symptom ‘was-found-in’ an anatomical part”) [ 96 97 98 99 ]. However, this approach increases the complexity of the resulting annotation schemes, making it expensive to train annotators. The guideline of one such scheme has more than 30 pages [ 100 ]; a temporal IE corpus provides a 63-page guideline document [ 101 ].
The complexity of annotation schemes can also generate ambiguous boundaries between multiple entity types. For example, a general-purpose corpus [ 99 ] defines a ‘Disease’ entity and a ‘Signs or Symptoms’ entity separately, and the inter-annotator agreement for these entities was relatively low, probably because the annotators confused the two.
3.2 Task Formulation
There are always several ways to formulate a medical/clinical problem as an NLP task. The difference in task formulation affects overall performance and how an annotated corpus is created. Careful design of an NLP task setting translated from clinical needs matters. Taking adverse drug event (ADE) detection as an example, we have at least three options in its task formulation: NER, relation extraction (RE), and text classification. We represent these different approaches in Figure 1 . The example sentence implies that a medication “nivolumab” prescribed for “laryngeal cancer” adversely caused “liver damage.” As we discuss below, each approach has its own benefits and drawbacks. This trade-off suggests that we must carefully design NLP approaches for given medical/clinical IE issues.
Fig. 1.

Different task formulations for the same task (a case of the adverse drug event task).
Named Entity Recognition
One way of identifying an ADE is to label which disease entities were adversely caused by medication. We can adopt NER approaches, e.g., by directly labeling “ADE” entities [ 102 103 104 ]. In our example (the top row of Figure 1 ), this approach distinguishes “liver damage” as an ADE from a non-ADE disease entity “laryngeal cancer.”
Another approach is to assign the value “ADE” as an attribute to the corresponding disease entities that are already labeled by standard medical NER. In medical NER, some attributes can be assigned to an entity type, such as factuality (whether or not a disease was found in the patient) and schedule (when a medication was prescribed) [ 96 , 105 , 106 ]. In a shared-task workshop called Real-MedNLP, Subtask 3 ADE proposes such a task formulation, where the medication and disease/symptom entities found in a document are to be labeled ADE TRIGGER and ADE, respectively.
Although these simple approaches do not encode the information about which drug caused the adverse symptom (i.e., causal ADE relationships), they still work for initial screening. Probably because the models need to recognize a longer context to detect ADE entities, they perform worse (around 0.6 F1-score [ 104 , 107 ]) than typical disease name recognition models (around 0.9 F1-score in the BC5CDR dataset [ 108 ]).
Relation Extraction
ADE detection tends to be defined as RE [ 103 , 104 , 107 , 109 ] so that the causality information of possible ADEs is directly encoded. In our example (the middle row of Figure 1 ), the drug entity “nivolumab” should be connected to “liver damage” by an ADE-causing relation (“CAUSED”). Additionally, the details of medication treatments are often annotated, i.e., by labeling drug-attribute relations from a drug entity to expressions such as its amount or frequency of prescription.
However, it is not trivial even for professional clinicians to decide whether a disorder written in a document was actually caused by some drug, which may make annotation difficult [ 107 ]. In fact, the performance in detecting ADE relations, at around 0.5 F1-score, was substantially lower than that of drug-attribute relation extraction, where most models achieved around 0.9 F1-score [ 103 , 107 , 109 ].
Text Classification
Another simple approach to ADE detection is classification-based IE, which detects ADE information mentioned in a document by sentence- [ 110 , 111 ] or document-level [ 111 ] classification. For instance, Ujiie et al. [ 111 ] proposed a machine learning-based method to first classify each sentence of case reports into ADE-suggesting or not, and then to identify the documents that report any ADEs based on the sentence-level classification results. In our case (the bottom row of Figure 1), the second sentence is to be marked as ADE-suggesting since it mentions an ADE (“liver damage”), and hence the whole document containing the two sentences is to be labeled ADE-reporting.
This coarse-grained approach allows end-users who report ADEs from clinical documents to locate the positions in a document that suggest potential ADEs. Document-level classification seems to work better than sentence-level classification (around 0.8 vs. 0.5 F1-score in [ 111 ]), probably due to the difficulty of inter-sentence relation understanding.
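The three formulations discussed in this subsection can be contrasted on a single toy document, as sketched below. The two sentences are invented to match the description of the Figure 1 example (the exact wording of the figure is not reproduced here), and the label names other than “CAUSED” are hypothetical. The annotations are written by hand to show what each formulation asks a model to predict; no model is involved.

```python
# Invented two-sentence document matching the description of the example.
sentences = [
    "Nivolumab was started for laryngeal cancer.",
    "Liver damage appeared after the second cycle.",
]
document = " ".join(sentences)


def span(text, mention, label):
    """Build a character-offset annotation for the first occurrence of a mention."""
    start = text.index(mention)
    return {"start": start, "end": start + len(mention), "text": mention, "label": label}


# 1) NER: each mention gets a label; only the adverse event is labeled ADE.
entities = [
    span(document, "Nivolumab", "DRUG"),
    span(document, "laryngeal cancer", "DISEASE"),
    span(document, "Liver damage", "ADE"),
]

# 2) Relation extraction: the drug is linked to the event it caused.
relations = [
    {"head": entities[0]["text"], "type": "CAUSED", "tail": entities[2]["text"]},
]

# 3) Text classification: one label per sentence (1 = ADE-suggesting), and the
#    document is ADE-reporting if any of its sentences is ADE-suggesting.
sentence_labels = [0, 1]
document_label = int(any(sentence_labels))

print(entities)
print(relations)
print(sentence_labels, document_label)
```

The amount of annotation decreases from the second formulation to the third, but so does the information available to the end-user: only the RE output states which drug is suspected of causing which event.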
3.3 Real-time Nature, UI, UX of NLP
Despite its potential, the effectiveness of NLP applications has rarely been examined prospectively, except in a few studies such as decision support for surgery candidacy [ 24 ]. There is a huge gap between retrospective studies and prospective studies. To bridge this gap, a real-time NLP platform including a clinician-friendly graphical interface [ 25 , 112 ] is required.
4 Future NLP and Conclusions
Sections 2 and 3 described clinical NLP systems in the hospital. Beyond its use in hospitals, NLP applications can be combined with a variety of smart devices, such as smartphones, smart speakers, and smartwatches. In the final part of this review, we highlight emerging out-of-hospital NLP applications that may grow in the near future. Their core service concepts are twofold: (1) for the patient and (2) for medical staff.
Peer support and conversation agents are core NLP targets for patients. Peer support is based on human-to-human communication. Nowadays, direct human communication is gradually being replaced by virtual communication. Rouzfarakh et al., for example, formed a WhatsApp peer support group for burn patients to share their experiences [ 113 ]. Zhang et al. developed a WeChat platform for parents of children with congenital heart diseases [ 114 ]. Yonek et al. performed a Facebook-based RCT for tobacco and heavy alcohol use [ 115 ]. Yang et al. explored the effect of WeChat follow-up management on improving parents' mental status and quality of life (QoL) in premature newborns with patent ductus arteriosus [ 116 ]. Thus, these previous studies focus more on forming virtual communication spaces where one can connect with peers, and on exploring their effectiveness, without NLP techniques. As the next step, NLP could be applied to peer recommendation, communication facilitation support, effectiveness measurement of peer support, etc.
In place of human conversation partners, NLP systems (conversation agents or chatbots) can provide mental encouragement to patients. Conversation agents have been developed for patients with depression [ 117 ] and smokers [ 118 ], while some other agents are dedicated to promoting physical activity and a healthy diet [ 119 ], communication support for children with autism spectrum disorder [ 120 ], and QoL control for inflammatory bowel disease (IBD) [ 121 ]. A clinical issue in such chatbot development is how to ensure patient safety [ 122 ]. New solutions to this problem are being explored. For example, a system named Addiction-Comprehensive Health Enhancement Support System [ 123 ] implemented a panic button: if the patient presses it, the system sends an emergency message to pre-registered contacts.
For the hospital, education and navigation are the main NLP targets. Communication skill training is a typical example for both doctors [ 124 ] and nurses [ 125 ]. Medical navigation, not only geographic but also information-oriented, is useful in medical applications. A successful example is to provide relevant information inside clinical departments [ 126 ]. Chu et al. developed a Question-and-Answer (QA) system that informs hospital staff of the location of mobile medical equipment (electrocardiography machines) that is moved around a hospital [ 127 ].
To conclude this paper, we return to the two questions raised in Section 1: what has medical NLP achieved in each medical field, and what are the barriers to the practical use of NLP? On the one hand, NLP-powered approaches have already been applied to most bedside needs. The performance of such approaches reached around 0.9 ROC-AUC, demonstrating the “in-vitro” feasibility of NLP for bedside applications. On the other hand, we observed several limitations in the real-world use of NLP: too much variety in corpus-annotation schemes and task formulations leads to low portability of existing solutions, and the lack of user-interface/experience evaluations leaves clinicians concerned about “in-vivo” usability. The potential coverage of medical NLP is broader than direct bedside applications, as introduced in this section. Realization of successful medical NLP applications may need a much larger-scale, interdisciplinary collaboration involving bedside staff, patients, UI/UX scholars, wearable Internet of Things devices, and NLP researchers.
References
- 1.
- 2.
- 3. Boyé M, Grabar N, Thi Tran M. Contrastive conversational analysis of language production by Alzheimer's and control people. Stud Health Technol Inform 2014;205:682-6.
- 4.
- 5. Mendonça EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 2005 Aug;38(4):314-21.
- 6.
- 7. Shiner B, Neily J, Mills PD, Watts BV. Identification of Inpatient Falls Using Automated Review of Text-Based Medical Records. J Patient Saf 2020 Sep;16(3):e174-e178.
- 8. Salmasian H, Freedberg DE, Abrams JA, Friedman C. An automated tool for detecting medication overuse based on the electronic health records. Pharmacoepidemiol Drug Saf 2013 Feb;22(2):183-9.
- 9. Chapman WW, Nadkarni PM, Hirschman L, D'Avolio LW, Savova GK, Uzuner Ö. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 2011 Sep-Oct;18(5):540-3.
- 10. Uzuner Ö, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008 Jan-Feb;15(1):14-24.
- 11. Sun W, Rumshisky A, Uzuner Ö. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc 2013 Sep-Oct;20(5):806-13.
- 12. Stubbs A, Kotfila C, Xu H, Uzuner Ö. Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2. J Biomed Inform 2015 Dec;58 Suppl(Suppl):S67-S77.
- 13. Roberts K, Demner-Fushman D, Voorhees EM, Bedrick S, Hersh WR. Overview of the TREC 2020 Precision Medicine Track. Text Retr Conf 2020 Nov;1266.
- 14.
- 15.
- 16.Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform 2019 Jan-Feb;16(1):139-53. [DOI] [PMC free article] [PubMed]
- 17.Ross EG, Shah N, Leeper N. Statin Intensity or Achieved LDL? Practice-based Evidence for the Evaluation of New Cholesterol Treatment Guidelines. PLoS One 2016 May 26;11(5):e0154952. [DOI] [PMC free article] [PubMed]
- 18.Leeper NJ, Bauer-Mehren A, Iyer SV, Lependu P, Olson C, Shah NH. Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes. PLoS One 2013 May 23;8(5):e63499. [DOI] [PMC free article] [PubMed]
- 19.
- 20.Norman CR, Leeflang MMG, Porcher R, Névéol A. Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy. Syst Rev 2019 Oct 28;8(1):243. [DOI] [PMC free article] [PubMed]
- 21.Gartlehner G, Wagner G, Lux L, Affengruber L, Dobrescu A, Kaminski-Hartenthaler A, Viswanathan M. Assessing the accuracy of machine-assisted abstract screening with DistillerAI: a user study. Syst Rev 2019 Nov 15;8(1):277. [DOI] [PMC free article] [PubMed]
- 22.Schmidt L, Olorisade BK, McGuinness LA, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: A living review protocol. F1000Res 2020 Mar 25;9:210. [DOI] [PMC free article] [PubMed]
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.Tian Z, Sun S, Eguale T, Rochefort CM. Automated Extraction of VTE Events From Narrative Radiology Reports in Electronic Health Records: A Validation Study. Med Care 2017 Oct;55(10):e73-e80. [DOI] [PMC free article] [PubMed]
- 29.
- 30.Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019 Apr 27;7(2):e12239. [DOI] [PMC free article] [PubMed]
- 31.
- 32.
- 33.Wang SV, Rogers JR, Jin Y, Bates DW, Fischer MA. Use of electronic healthcare records to identify complex patients with atrial fibrillation for targeted intervention. J Am Med Inform Assoc 2017 Mar 1;24(2):339-44. [DOI] [PMC free article] [PubMed]
- 34.Buchan K, Filannino M, Uzuner Ö. Automatic prediction of coronary artery disease from clinical narratives. J Biomed Inform 2017 Aug;72:23-32. [DOI] [PMC free article] [PubMed]
- 35.Chase HS, Mitrani LR, Lu GG, Fulgieri DJ. Early recognition of multiple sclerosis using natural language processing of the electronic health record. BMC Med Inform Decis Mak 2017 Feb 28;17(1):24. [DOI] [PMC free article] [PubMed]
- 36.
- 37.Wyatt JM, Booth GJ, Goldman AH. Natural Language Processing and Its Use in Orthopaedic Research. Curr Rev Musculoskelet Med 2021 Dec;14(6):392-6. [DOI] [PMC free article] [PubMed]
- 38.
- 39.Selby LV, Narain WR, Russo A, Strong VE, Stetson P. Autonomous detection, grading, and reporting of postoperative complications using natural language processing. Surgery 2018 Dec;164(6):1300-5. [DOI] [PMC free article] [PubMed]
- 40.Borjali A, Magnéli M, Shin D, Malchau H, Muratoglu OK, Varadarajan KM. Natural language processing with deep learning for medical adverse event detection from free-text medical narratives: A case study of detecting total hip replacement dislocation. Comput Biol Med 2021 Feb;129:104140. [DOI] [PubMed]
- 41.
- 42.Sollini M, Bartoli F, Marciano A, Zanca R, Slart RHJA, Erba PA. Artificial intelligence and hybrid imaging: the best match for personalized medicine in oncology. Eur J Hybrid Imaging 2020 Dec 9;4(1):24. [DOI] [PMC free article] [PubMed]
- 43.Datta S, Bernstam EV, Roberts K. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019 Dec;100:103301. [DOI] [PMC free article] [PubMed]
- 44.Tucker TC, Durbin EB, McDowell JK, Huang B. Unlocking the potential of population-based cancer registries. Cancer 2019 Nov 1;125(21):3729-3737. [DOI] [PMC free article] [PubMed]
- 45.
- 46.Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17-21. [PMC free article] [PubMed]
- 47.
- 48.Li J, Chen H, Wang Y, Chen MM, Liang H. Next-Generation Analytics for Omics Data. Cancer Cell 2021 Jan 11;39(1):3-6. [DOI] [PMC free article] [PubMed]
- 49.Pons E, Braun LM, Hunink MG, Kors JA. Natural Language Processing in Radiology: A Systematic Review. Radiology 2016 May;279(2):329-43. [DOI] [PubMed]
- 50.
- 51.
- 52.Valtchinov VI, Lacson R, Wang A, Khorasani R. Comparing Artificial Intelligence Approaches to Retrieve Clinical Reports Documenting Implantable Devices Posing MRI Safety Risks. J Am Coll Radiol 2020 Feb;17(2):272-9. [DOI] [PubMed]
- 53.
- 54.
- 55.DSS Inc. Radiology Decision Support (RadWise®). Available from: https://www.dssinc.com/products/integrated-clinical-products/radwise-radiology-decision-support/
- 56.Letourneau-Guillon L, Camirand D, Guilbert F, Forghani R. Artificial Intelligence Applications for Workflow, Process Optimization and Predictive Analytics. Neuroimaging Clin N Am 2020 Nov;30(4):e1-e15. [DOI] [PubMed]
- 57.
- 58.Monshi MMA, Poon J, Chung V. Deep learning in generating radiology reports: A survey. Artif Intell Med 2020 Jun;106:101878. [DOI] [PMC free article] [PubMed]
- 59.
- 60.Pavlopoulos J, Kougia V, Androutsopoulos I. A survey on biomedical image captioning. In: Proceedings of the second workshop on shortcomings in vision and language; 2019. p.26-36.
- 61.Odisho AY, Park B, Altieri N, DeNero J, Cooperberg MR, Carroll PR, Yu B. Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation. JAMIA Open 2020 Oct 14;3(3):431-8. [DOI] [PMC free article] [PubMed]
- 62.Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); 2019. p. 4171”86.
- 63.DeSouza DD, Robin J, Gumus M, Yeung A. Natural Language Processing as an Emerging Tool to Detect Late-Life Depression. Front Psychiatry 2021 Sep 6;12:719125. [DOI] [PMC free article] [PubMed]
- 64.
- 65.
- 66.
- 67.Petti U, Baker S, Korhonen A. A systematic literature review of automatic Alzheimer's disease detection from speech and language. J Am Med Inform Assoc 2020 Nov 1;27(11):1784-97. [DOI] [PMC free article] [PubMed]
- 68.de la Fuente Garcia S, Ritchie CW, Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer's Disease: A Systematic Review. J Alzheimers Dis 2020;78(4):1547-74. [DOI] [PMC free article] [PubMed]
- 69.
- 70.Corcoran CM, Cecchi GA. Using Language Processing and Speech Analysis for the Identification of Psychosis and Other Disorders. Biol Psychiatry Cogn Neurosci Neuroimaging 2020 Aug;5(8):770-9. [DOI] [PMC free article] [PubMed]
- 71.Ratana R, Sharifzadeh H, Krishnan J, Pang S. A Comprehensive Review of Computational Methods for Automatic Prediction of Schizophrenia With Insight Into Indigenous Populations. Front Psychiatry 2019 Sep 12;10:659. [DOI] [PMC free article] [PubMed]
- 72.
- 73.
- 74.Cuthbert BN. The RDoC framework: facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry 2014 Feb;13(1):28-35. [DOI] [PMC free article] [PubMed]
- 75.Uzuner Ö, Stubbs A, Filannino M. A natural language processing challenge for clinical records: Research Domains Criteria (RDoC) for psychiatry. J Biomed Inform 2017 Nov;75S:S1-S3. [DOI] [PMC free article] [PubMed]
- 76.Anani M, Kazi N, Kuntz M, Kahanda I. RDoC task at BioNLP-OST 2019. In: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 216-26.
- 77.Elsahar Y, Hu S, Bouazza-Marouf K, Kerr D, Mansor A. Augmentative and Alternative Communication (AAC) Advances: A Review of Configurations for Individuals with a Speech Disability. Sensors (Basel) 2019 Apr 22;19(8):1911. [DOI] [PMC free article] [PubMed]
- 78.Higginbotham DJ, Lesher GW, Moulton BJ, Roark B. The application of natural language processing to augmentative and alternative communication. Assist Technol 2011 Spring;24(1):14-24. [DOI] [PubMed]
- 79.Maritz R, Aronsky D, Prodinger B. The International Classification of Functioning, Disability and Health (ICF) in Electronic Health Records. A Systematic Literature Review. Appl Clin Inform 2017 Dec 20;8(3):964-80. [DOI] [PMC free article] [PubMed]
- 80.
- 81.
- 82.Barber EL, Garg R, Persenaire C, Simon M. Natural language processing with machine learning to predict outcomes after ovarian cancer surgery. Gynecol Oncol 2021 Jan;160(1):182-6. [DOI] [PMC free article] [PubMed]
- 83.Lin WC, Chen JS, Chiang MF, Hribar MR. Applications of Artificial Intelligence to Electronic Health Record Data in Ophthalmology. Transl Vis Sci Technol 2020 Feb 27;9(2):13. [DOI] [PMC free article] [PubMed]
- 84.Connor CW. Artificial Intelligence and Machine Learning in Anesthesiology. Anesthesiology 2019 Dec;131(6):1346-59. [DOI] [PMC free article] [PubMed]
- 85.Gaskin GL, Pershing S, Cole TS, Shah NH. Predictive modeling of risk factors and complications of cataract surgery. Eur J Ophthalmol 2016 Jun 10;26(4):328-37. [DOI] [PMC free article] [PubMed]
- 86.
- 87.
- 88.Gopinath D, Agrawal M, Murray L, Horng S, Karger D, Sontag D. Fast, Structured clinical documentation via contextual autocomplete. In: Proceedings of the Machine Learning for Healthcare Conference; 2020. p. 842”70.
- 89.
- 90.Krishna K, Khosla S, Bigham JP, Lipton ZC. Generating SOAP notes from doctor-patient conversations using modular summarization techniques. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics; 2021. p. 4958”72.
- 91.
- 92.Blackley SV, Huynh J, Wang L, Korach Z, Zhou L. Speech recognition for clinical documentation from 1990 to 2018: a systematic review. J Am Med Inform Assoc 2019 Apr 1;26(4):324-38. [DOI] [PMC free article] [PubMed]
- 93.Coiera E, Kocaballi B, Halamka J, Laranjo L. The digital scribe. NPJ Digit Med 2018 Oct 16;1:58. [DOI] [PMC free article] [PubMed]
- 94.van Buchem MM, Boosman H, Bauer MP, Kant IMJ, Cammel SA, Steyerberg EW. The digital scribe in clinical practice: a scoping review and research agenda. NPJ Digit Med 2021 Mar 26;4(1):57. [DOI] [PMC free article] [PubMed]
- 95.Wang J, Lavender M, Hoque E, Brophy P, Kautz H. A patient-centered digital scribe for automatic medical documentation. JAMIA Open 2021 Feb 17;4(1):ooab003. [DOI] [PMC free article] [PubMed]
- 96.Yada S, Joh A, Tanaka R, Cheng F, Aramaki E, Kurohashi S. Towards a versatile medical-annotation guideline feasible without heavy medical knowledge: starting from critical lung diseases. In: Proceedings of the 12th Language Resources and Evaluation Conference; 2020. p. 4565-72.
- 97.
- 98.Patel P, Davey D, Panchal V, Pathak P. Annotation of a large clinical entity corpus. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; 2018. p. 2033-42.
- 99.Campillos L, Deléger L, Grouin C, Hamon T, Ligozat AL, Névéol A. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annOtated Text corpus (MERLOT). Language Resources and Evaluation 2018;52(2):571-601.
- 100.Yada S, Aramaki E, Tanaka R, Cheng F, Kurohashi S. Medical/Clinical text annotation guidelines. figshare. Book; 2021. Available from: https://doi.org/10.6084/m9.figshare.16418811.v2
- 101.
- 102.Karimi S, Metke-Jimenez A, Kemp M, Wang C. Cadec: A corpus of adverse drug event annotations. J Biomed Inform 2015 Jun;55:73-81. [DOI] [PubMed]
- 103.Roberts K, Demner-Fushman D, Tonning JM. Overview of the TAC 2017 Adverse Reaction Extraction from Drug Labels Track. In: Proceedings of the Tenth Text Analysis Conference 2017.
- 104.Henry S, Buchan K, Filannino M, Stubbs A, Uzuner Ö. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J Am Med Inform Assoc 2020 Jan 1;27(1):3-12. [DOI] [PMC free article] [PubMed]
- 105.Aramaki E, Yano K, Wakamiya S. MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts. Stud Health Technol Inform 2017;245:285-8. [PubMed]
- 106.Huang Y, Lowe HJ. A novel hybrid approach to automated negation detection in clinical radiology reports. J Am Med Inform Assoc 2007 May-Jun;14(3):304-11. [DOI] [PMC free article] [PubMed]
- 107.Jagannatha A, Liu F, Liu W, Yu H. Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0). Drug Saf 2019 Jan;42(1):99-111. [DOI] [PMC free article] [PubMed]
- 108.
- 109.
- 110.Negi K, Pavuri A, Patel L, Jain C. A novel method for drug-adverse event extraction using machine learning. Informatics in Medicine Unlocked 2019;17:100190.
- 111.Ujiie S, Yada S, Wakamiya S, Aramaki E. Identification of Adverse Drug Event-Related Japanese Articles: Natural Language Processing Analysis. JMIR Med Inform 2020 Nov 27;8(11):e22661. [DOI] [PMC free article] [PubMed]
- 112.
- 113.Rouzfarakh M, Deldar K, Froutan R, Ahmadabadi A, Mazlom SR. The effect of rehabilitation education through social media on the quality of life in burn patients: a randomized, controlled, clinical trial. BMC Med Inform Decis Mak 2021 Feb 22;21(1):70. [DOI] [PMC free article] [PubMed]
- 114.Zhang QL, Xu N, Huang ST, Chen Q, Cao H. WeChat-Assisted Preoperative Health Education Reduces Burden of Care on Parents of Children with Simple Congenital Heart Disease: a Prospective Randomized Controlled Study. Braz J Cardiovasc Surg 2021 Oct 17;36(5):663-9. [DOI] [PMC free article] [PubMed]
- 115.Yonek JC, Meacham MC, Ramo D, Delucchi K, Tolou-Shams M, Satre DD. The Relationship of E-Cigarette Use to Tobacco Use Outcomes Among Young Adults Who Smoke and Use Alcohol. J Addict Med 2021 Sep-Oct 01;15(5):421-4. [DOI] [PMC free article] [PubMed]
- 116.Yang B, Liu JF, Xie WP, Cao H, Chen Q. The effects of WeChat follow-up management to improve the parents' mental status and the quality of life of premature newborns with patent ductus arteriosus. J Cardiothorac Surg 2021 Aug 21;16(1):235. [DOI] [PMC free article] [PubMed]
- 117.Abd-Alrazaq AA, Alajlani M, Alalwan AA, Bewick BM, Gardner P, Househ M. An overview of the features of chatbots in mental health: A scoping review. Int J Med Inform 2019 Dec;132:103978. [DOI] [PubMed]
- 118.Almusharraf F, Rose J, Selby P. Engaging Unmotivated Smokers to Move Toward Quitting: Design of Motivational Interviewing-Based Chatbot Through Iterative Interactions. J Med Internet Res 2020 Nov 3;22(11):e20251. [DOI] [PMC free article] [PubMed]
- 119.
- 120.Cooper A, Ireland D. Designing a Chat-Bot for Non-Verbal Children on the Autism Spectrum. Stud Health Technol Inform 2018;252:63-8. [PubMed]
- 121.Zand A, Sharma A, Stokes Z, Reynolds C, Montilla A, Sauk J, Hommes D. An Exploration Into the Use of a Chatbot for Patients With Inflammatory Bowel Diseases: Retrospective Cohort Study. J Med Internet Res 2020 May 26;22(5):e15589. [DOI] [PMC free article] [PubMed]
- 122.Glowacki EM, Bernhardt JM, McGlone MS. Tailored texts: An application of regulatory fit to text messages designed to reduce high-risk drinking. Health Informatics J 2020 Sep;26(3):1742763. [DOI] [PubMed]
- 123.
- 124.Reiswich A, Haag M. Evaluation of Chatbot Prototypes for Taking the Virtual Patient's History. Stud Health Technol Inform 2019;260:73-80. [PubMed]
- 125.Shorey S, Ang E, Yap J, Ng ED, Lau ST, Chui CK. A Virtual Counseling Application Using Artificial Intelligence for Communication Skills Training in Nursing Education: Development Study. J Med Internet Res 2019 Oct 29;21(10):e14658. Erratum in: J Med Internet Res 2019 Nov 26;21(11):e17064. [DOI] [PMC free article] [PubMed]
- 126.Lee H, Kang J, Yeo J. Medical Specialty Recommendations by an Artificial Intelligence Chatbot on a Smartphone: Development and Deployment. J Med Internet Res 2021 May 6;23(5):e27460. [DOI] [PMC free article] [PubMed]
- 127.Chu ET, Huang ZZ. DBOS: A Dialog-Based Object Query System for Hospital Nurses. Sensors (Basel) 2020 Nov 19;20(22):6639. [DOI] [PMC free article] [PubMed]
