Abstract
Clinicians’ guideline-concordant documentation in asthma care varies significantly. However, assessing clinicians’ documentation is not feasible using structured data alone; it requires labor-intensive chart review of electronic health records. Although national asthma guidelines are available, it remains challenging to use them as a real-time tool for providing feedback on adherence to documentation guidelines for asthma care improvement. Certain guideline elements, such as teaching or reviewing inhaler techniques, are difficult to capture with handcrafted rules because they require contextual understanding of clinical narratives. This study examined a deep learning-based natural language model, Bidirectional Encoder Representations from Transformers (BERT), coupled with distant supervision to identify inhaler techniques from clinical narratives. The BERT model with distant supervision outperformed the rule-based approach and improved on BERT without distant supervision.
Keywords: deep learning, natural language processing, documentation variations, adherence to asthma guidelines
I. Introduction
Asthma is the most common chronic illness among children as well as one of the five most burdensome adult chronic diseases in the United States (US), causing significant morbidity and cost [1–3]. In asthma care, clinicians’ documentation and guideline adherence (i.e., documentation of the patients’ asthma-related conditions specified in asthma guidelines) vary significantly [4–6]. However, assessing clinicians’ current documentation to support improvement of guideline-concordant care is not feasible using structured data alone (administrative data or billing codes); it requires manual chart review of electronic health records (EHRs), which is labor-intensive and costly.
The 2007 National Asthma Education and Prevention Program (NAEPP) asthma guidelines provide guidance for improved asthma management using asthma control, factors with control, and medications [6]. Since these NAEPP guideline elements are not available through structured data, advanced techniques such as natural language processing (NLP) are required to mitigate the burden of manual chart review. Some guideline elements are relatively straightforward (e.g., asthma medications, daytime/nighttime symptoms) and can be identified by handcrafted rules based on keywords and description patterns. However, certain guideline elements require contextual (semantic) understanding to be captured from EHR free text. One typical case is inhaler techniques (i.e., teaching patients how to use an asthma inhaler or reviewing their inhaler use). Consider two examples: “A patient received asthma education and instruction in appropriate metered-dose inhaler technique.” and “Discussed 3rd neb treatment here versus one upon home.” The first is a true case of inhaler techniques, whereas the second does not concern teaching or reviewing inhaler techniques but rather discusses the efficacy of neb treatment; such cases have to be differentiated from true inhaler techniques.
Recent advances in deep learning produced Bidirectional Encoder Representations from Transformers (BERT), a pretrained NLP framework that has given promising results on a variety of NLP tasks. A deep learning model requires a large amount of data to learn a task well. This is often a critical bottleneck in the clinical domain because of the high cost of generating labeled data. Distant supervision, which produces labeled data using rules or heuristics to train the model, has been used successfully to alleviate this issue [7, 8].
This study applied BERT to examine the feasibility of identifying asthma inhaler techniques from EHR free text (clinical notes) and further investigated the effect of distant supervision. The performance of the BERT model was also compared with that of a rule-based approach.
II. Background
Rule-based NLP techniques have been successfully applied in asthma research with high performance [9–13]. The rule-based approach has been widely used in the clinical domain to implement existing criteria with expert knowledge. It is relatively flexible to customize and tolerant of imbalanced data. However, a rule-based approach needs significant effort to reach high performance [14, 15]. Recent advances in deep learning have resulted in breakthrough performance and have also enabled pretrained language models to generalize to different tasks [16]. BERT is a current state-of-the-art language model that is pretrained to learn deep bidirectional representations from a large amount of unlabeled text, and the pretrained BERT has been used successfully in many NLP tasks, including text classification [17]. To further enhance the utility of deep learning, additional approaches can be used: distant supervision, which creates weakly labeled data to increase the training data without manual labeling [7, 8], and cost sensitivity, which addresses data imbalance [18]. This study compares the performance of a rule-based model and a BERT model in identifying inhaler techniques in clinical notes and examines the issues of those models through actual examples. Moreover, we implemented BERT with distant supervision coupled with cost sensitivity to deal with costly manually labeled data and an imbalanced data distribution.
III. Materials and methods
A. Data
The study data consist of two data sets. 1) Manual chart reviewed data: 1,039 clinical notes of 300 patients with an asthma diagnosis, randomly selected from the Olmsted County birth cohort (2016–2018). We used notes from 200 patients as a training set (n=724) and notes from the other 100 patients as a test set (n=315). Two physicians performed chart review and annotated guideline elements based on the 2007 NAEPP guidelines. 2) Weakly labeled data (distant supervision): 27,363 clinical notes from 800 patients with an asthma diagnosis, randomly selected from the Olmsted County birth cohort (2008–2018), used as a training set. The guideline elements were labeled by handcrafted rules (see subsection B) instead of manual chart review (i.e., weakly labeled). These data were used to train the BERT model with distant supervision. We used the same test set as for the manual chart reviewed data to compare the performance of inhaler technique identification. For both data sets, we used the contents of the History of Present Illness and Impression/Report/Plan sections, since the majority of teaching and reviewing of inhaler techniques is documented in these sections.
B. Rule-based model
The rules were developed using common patterns based on textual markers (i.e., keywords relevant to asthma guideline elements) and evaluated against manual chart review as the gold standard. The keywords were initially provided by domain experts and then iteratively updated and refined as we developed rules on the training set (Table I). We implemented the rules within the framework of MedTaggerIE [19], a clinical NLP pipeline developed by Mayo Clinic. The performance of the NLP algorithm was evaluated at the document level (i.e., whether or not a guideline element is recorded in the clinical note).
Table I.
Rules to identify inhaler techniques.
| Keywords | Rules |
|---|---|
| 1) observ*, reassess*, review*, demonstrat*, check*, educat*, teach*, taught, explain*, reinforce*, discuss*, instruction, constraints, how to use | I. Combination of keyword 1 and 2 (e.g., reviewed inhaler use) |
| 2) inhaler, MDI, neb, nebulizer, optichamber, spacer | |
| 3) techniques, administrations, dosing, guidance | II. Combination of keyword 1, 3, and 4 (e.g., teach daily medication technique) |
| 4) asthma/rescue/daily/preventive/control medication, ICS, list of maintenance & rescue medications | |
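
The keyword combinations in Table I can be illustrated with a minimal Python sketch. This is not the MedTaggerIE implementation: the regular expressions, the sentence-level matching, and the function name `matches_inhaler_rule` are assumptions for illustration, and only a subset of the Table I keywords is included.

```python
import re

# Keyword groups abridged from Table I (illustrative subset, not the full rule set).
ACTION = (
    r"(?:observ|reassess|review|demonstrat|check|educat|teach|explain|reinforce|discuss)\w*"
    r"|taught|instruction|how to use"
)
DEVICE = r"inhaler|mdi|neb|nebulizer|optichamber|spacer"
METHOD = r"techniques?|administrations?|dosing|guidance"
MEDICATION = r"(?:asthma|rescue|daily|preventive|control) medication|ics"


def _has(pattern: str, text: str) -> bool:
    """Return True if any keyword in the group occurs as a whole word."""
    return re.search(rf"\b(?:{pattern})\b", text, flags=re.IGNORECASE) is not None


def matches_inhaler_rule(sentence: str) -> bool:
    """Apply Rule I (keywords 1 + 2) or Rule II (keywords 1 + 3 + 4) to one sentence."""
    rule_1 = _has(ACTION, sentence) and _has(DEVICE, sentence)
    rule_2 = (
        _has(ACTION, sentence)
        and _has(METHOD, sentence)
        and _has(MEDICATION, sentence)
    )
    return rule_1 or rule_2


if __name__ == "__main__":
    for text in [
        "reviewed inhaler use",
        "teach daily medication technique",
        "Discussed 3rd neb treatment here versus one upon home",
        "He met with our nurse to review his technique",
    ]:
        print(matches_inhaler_rule(text), "-", text)
```

Because matching depends only on keyword co-occurrence, the third sentence fires Rule I even though it does not describe teaching or reviewing inhaler use, and the fourth fires neither rule because it names no device or medication.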
C. BERT model
The base model ‘bert-base-uncased’ was used because of its good performance with lower computational requirements. The model was built by adding a dropout layer (p=0.1) and a linear classification layer with a cross entropy loss function to perform binary classification (i.e., presence or absence of inhaler technique). The BERT tokenizer was used to tokenize clinical notes, and all input sentences were padded (maximum sequence length = 256), since every sentence in our data had fewer than 256 tokens. Each sentence was labeled as presence or absence of inhaler technique, and document-level classification was then performed by examining the sentences within a given document: a document was labeled as presence of inhaler technique if any of its sentences was labeled as presence, and as absence otherwise. We implemented a cyclical learning rate with a triangular mode scheduler ranging between a lower bound of 2e-5 and an upper bound of 5e-5 with a step size of 2500, initialized at a learning rate of 3e-5. The model was trained, validated (10 epochs), and tested on the manual chart reviewed data. A portion of the training set (12%) was used as the validation set. The data sets are highly imbalanced: only 0.4% of the sentences in the training set and 0.7% of the sentences in the test set indicate the presence of inhaler techniques. The cost sensitivity approach [18] was used to deal with the highly imbalanced data; we set class weights in the cross entropy loss function of our BERT model to penalize misclassification of minority samples more heavily.
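
Two steps in this subsection can be sketched in plain Python: the triangular cyclical learning rate and the roll-up from sentence-level predictions to a document-level label. The function names are hypothetical. The triangular formula follows the commonly used definition of the schedule, in which a cycle starts at the lower bound, so it does not model the reported 3e-5 initialization and is not necessarily the exact scheduler configuration used in the study.

```python
import math
from typing import Iterable


def triangular_lr(
    step: int,
    base_lr: float = 2e-5,
    max_lr: float = 5e-5,
    step_size: int = 2500,
) -> float:
    """Triangular cyclical learning rate: rises linearly from base_lr to max_lr
    over step_size steps, then falls back over the next step_size steps."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def document_label(sentence_predictions: Iterable[int]) -> int:
    """A document is positive (1) if any sentence is predicted positive, else 0."""
    return int(any(p == 1 for p in sentence_predictions))


if __name__ == "__main__":
    for step in (0, 1250, 2500, 5000):
        print(step, triangular_lr(step))
    print(document_label([0, 0, 1, 0]))  # one positive sentence makes the note positive
    print(document_label([0, 0, 0]))     # no positive sentence makes the note negative
```

Under this any-sentence rule, a single false positive sentence is enough to mislabel a whole note, so sentence-level precision carries directly into the document-level results.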
D. BERT model with distant supervision (BERT-DS)
Fig. 1 shows an overview of the process. The weakly labeled data, in which the presence or absence of inhaler technique was labeled by the rule-based model (subsection B), were used to train the BERT model (as in subsection C). This training set (sentences) is highly imbalanced, with only 0.14% of sentences indicating the presence of inhaler techniques, which necessitates the use of cost sensitivity to penalize misclassification of the minority class more heavily. The cost-sensitive weights were set experimentally to [0.52, 5.52]. The model was trained for 6 epochs with a learning rate of 3e-5 and tested on the same manual chart reviewed test data as the regular BERT model (subsection C).
Fig. 1.

BERT with distant supervision (BERT-DS)
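
The effect of the cost-sensitive weights can be illustrated with a small worked computation. The sketch below computes a per-sentence weighted cross entropy in plain Python using the weights [0.52, 5.52] reported above. The mapping of index 0 to absence and index 1 to presence, the function name, and the example logits are assumptions for illustration; the study's implementation may differ, for example in how losses are averaged across a batch.

```python
import math
from typing import Sequence

# Assumed class order: index 0 = absence, index 1 = presence of inhaler technique.
CLASS_WEIGHTS = (0.52, 5.52)


def weighted_cross_entropy(
    logits: Sequence[float],
    target: int,
    weights: Sequence[float] = CLASS_WEIGHTS,
) -> float:
    """Class-weighted cross entropy for a single example."""
    # Numerically stable log-softmax: subtract the maximum logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_prob = logits[target] - log_z
    return -weights[target] * log_prob


if __name__ == "__main__":
    # Two equally confident mistakes (hypothetical logits):
    missed_positive = weighted_cross_entropy([2.0, 0.0], target=1)
    false_alarm = weighted_cross_entropy([0.0, 2.0], target=0)
    print(f"missed positive: {missed_positive:.3f}")
    print(f"false alarm:     {false_alarm:.3f}")
    print(f"ratio:           {missed_positive / false_alarm:.2f}")
```

With these weights, a missed positive sentence costs about 10.6 times as much as an equally confident false alarm (5.52 / 0.52), which is how the loss pushes the model toward higher recall on the minority class.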
IV. Results
The performance of inhaler technique identification was compared among three different models: rule-based, BERT (trained on the small manual chart reviewed data), and BERT-DS (distant supervision; trained on the large weakly labeled data). We calculated precision, recall, F1-score, and accuracy (document-level) on the test set as evaluation metrics.
Table II shows the performance of the different models. All models produced high precision (≥0.90), but recall was lower than precision. The use of cost sensitivity increased recall for both BERT and BERT-DS. The BERT model trained on the small data had overall performance (F1-score and accuracy) similar to that of the rule-based model. When BERT was trained on the larger weakly labeled data (BERT-DS), it achieved a higher F1-score and accuracy than both the rule-based model and the BERT model trained on the small data.
Table II.
Performance of inhaler technique identification. Numbers in parentheses are without cost sensitivity.
| Metrics | Rules | BERT | BERT-DS |
|---|---|---|---|
| Precision | 0.90 | 1.0 (1.0) | 1.0 (1.0) |
| Recall | 0.82 | 0.73 (0.57) | 0.82 (0.73) |
| F1-score | 0.86 | 0.84 (0.72) | 0.90 (0.84) |
| Accuracy | 96% | 96% (94%) | 97% (96%) |
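
As a quick consistency check on Table II, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 from the rounded precision and recall values reported with cost sensitivity. The function name is an assumption, and small discrepancies are possible in general because the inputs are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) with cost sensitivity, as reported in Table II.
TABLE_II = {
    "Rules": (0.90, 0.82),
    "BERT": (1.0, 0.73),
    "BERT-DS": (1.0, 0.82),
}

if __name__ == "__main__":
    for model, (p, r) in TABLE_II.items():
        print(f"{model}: F1 = {f1_score(p, r):.2f}")
```

The recomputed values agree with the F1 row of Table II (0.86, 0.84, and 0.90).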
Actual cases of inhaler technique identification were compared across the models and analyzed (Table III). Both BERT and BERT-DS correctly classified cases that the rule-based model flagged as false positives (FP) (Example A). These cases require understanding the semantics of free text, which is difficult for handcrafted rules. BERT-DS identified cases missed by BERT trained on the small data (Example B), but the opposite also occurred (Example C). There were also cases in which all models failed to identify inhaler techniques (Example D). More diverse training data seem to be required for BERT to identify these challenging cases.
Table III.
Comparison of inhaler technique identification among different models (BERT and BERT-DS are with cost sensitivity).
| Examples | Rules | BERT | BERT-DS |
|---|---|---|---|
| A. Absence of inhaler technique | FP | Correct | Correct |
| • “Discussed 3rd neb treatment here versus one upon home” | |||
| • “We reviewed her medications and discussed labeling the Flovent … for a daily inhaler and albuterol” | |||
| • “Discussed better treatment of her allergies by use of daily Claritin … during the summer months and use of albuterol inhaler …” | |||
| B. Presence of inhaler technique | Correct | FN | Correct |
| • “Discussed 6 puffs equals the same as a nebulizer” | |||
| • “Discussed with mother regarding access to albuterol inhaler and they would like to try” | |||
| C. Presence of inhaler technique | FN | Correct | FN |
| • “He met with our nurse to review his technique” | |||
| • “Our nurse, went in before me and checked on his technique and compliance” | |||
| D. Presence of inhaler technique | FN | FN | FN |
| • “Mom voices no concerns about his technique in using the inhaler” | |||
| • “It was confirmed that Mother is giving XXX the medication correctly and using a spacer” | |||
FP: false positive, FN: false negative
V. Discussion
Reviewing or teaching asthma inhaler techniques is one of the NAEPP guideline elements used to assess clinicians’ adherence in asthma care. Assessing it requires either manual chart review or advanced AI techniques that understand the semantics of clinical free text well enough to automate the process. This study addresses this need by developing rule-based and deep learning-based (BERT) models. A rule-based model is transparent and easily customizable but labor-intensive, requiring expert knowledge. It is less likely to suffer from imbalanced data but has difficulty capturing the semantics of true inhaler techniques, which causes false positive cases (lower precision than the BERT models). This is because rules are based on the presence of keywords (e.g., discuss, neb) and lack the contextual understanding needed to discern unrelated cases, as seen in Table III, Example A. The BERT model demonstrated the capability to overcome this limitation of the rule-based model, but it requires sufficiently large data to outperform the rules. As can be seen in Table II, the BERT model trained on the small data set (the same data as the rule-based model) performed similarly to the rule-based model, although BERT avoids labor-intensive rule development. Also, BERT, like other machine learning models, suffered from imbalanced data and required a way to deal with it (e.g., applying cost sensitivity). Distant supervision is a promising way to generate a large training set without costly manual annotation. BERT-DS using weakly labeled data outperformed both the rule-based model and plain BERT trained on the small data, and may improve further when applied to larger data.
Interestingly, the patterns or semantics learned by BERT on the small data seem to differ from those learned by BERT-DS, as is evident from several cases in Table III, Examples B and C. This may be because the weakly labeled data generated by rules are biased toward the explicit presence of review/teaching indications and inhaler keywords (e.g., reviewed inhaler use, teach rescue medication techniques), whereas the small data (manual chart reviewed data) contain a relatively larger portion of implicit cases. All models failed to identify certain cases that contain no explicit mention of an inhaler or indication of review or teaching and instead require semantic inference (Table III, Example D). More data with diverse semantic expressions would be needed for BERT to learn these cases better. A hybrid approach combining rules and BERT may be considered, in which rules filter in only informative data (i.e., sentences that contain potential indications of inhaler techniques) for training BERT; this may diminish the issue of imbalanced data and improve performance.
VI. Conclusion
A deep learning model, BERT with distant supervision (i.e., trained on weakly labeled data), demonstrated the capability to identify inhaler techniques, which require semantic understanding of clinical narratives, and outperformed both the rule-based model and BERT trained on the small data. Cost sensitivity effectively handled the imbalanced class distribution and improved identification performance. With a distant supervision approach, we may alleviate the costly manual chart review needed to generate the training data required by most deep learning-based models. The proposed approach may be a potential alternative to a rule-based model, and the two may be combined to further improve performance.
Acknowledgements
This study was supported by NIAID R21 AI142702, NHLBI R01 HL126667 and NIA R01 AG068007.
Contributor Information
Bhavani Singh Agnikula Kshatriya, Division of Digital Health Sciences, Mayo Clinic, Rochester MN, USA.
Elham Sagheb, Division of Digital Health Sciences, Mayo Clinic, Rochester MN, USA.
Chung-Il Wi, Community Pediatrics and Adolescent Medicine, Mayo Clinic, Rochester MN, USA.
Jungwon Yoon, Department of Pediatrics, Myongji Hospital, South Korea.
Hee Yun Seol, Pusan National University Yangsan Hospital, South Korea.
Young Juhn, Community Pediatrics and Adolescent Medicine, Mayo Clinic, Rochester MN, USA.
Sunghwan Sohn, Division of Digital Health Sciences, Mayo Clinic, Rochester MN, USA.
References
- [1] Centers for Disease Control and Prevention, “Vital signs: asthma prevalence, disease characteristics, and self-management education: US, 2001–2009,” Morbidity and mortality weekly report, vol. 60, no. 17, p. 547, 2011.
- [2] Lethbridge-Çejku M and Vickerie JL, “Summary health statistics for US adults; National Health Interview Survey, 2003,” 2004.
- [3] Stanton MW and Rutherford M, The high concentration of US health care expenditures. Agency for Healthcare Research and Quality; Rockville (MD), 2006.
- [4] Mold JW et al., “Implementing asthma guidelines using practice facilitation and local learning collaboratives: a randomized controlled trial,” The Annals of Family Medicine, vol. 12, no. 3, pp. 233–240, 2014.
- [5] Yee AB, Fagnano M, and Halterman JS, “Preventive asthma care delivery in the primary care office: missed opportunities for children with persistent asthma symptoms,” Academic pediatrics, vol. 13, no. 2, pp. 98–104, 2013.
- [6] Yawn BP, Rank MA, Cabana MD, Wollan PC, and Juhn YJ, “Adherence to asthma guidelines in children, tweens, and adults in primary care settings: a practice-based network assessment,” in Mayo Clinic Proceedings, 2016, vol. 91, no. 4: Elsevier, pp. 411–421.
- [7] Wang Y et al., “A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction,” arXiv preprint arXiv:1804.07814, 2018.
- [8] Su P, Li G, Wu C, and Vijay-Shanker K, “Using distant supervision to augment manually annotated data for relation extraction,” PloS one, vol. 14, no. 7, p. e0216913, 2019.
- [9] Wi C-I et al., “Natural Language Processing for Asthma Ascertainment in Different Practice Settings,” The Journal of Allergy and Clinical Immunology: In Practice, vol. 6, no. 1, pp. 126–131, 2018.
- [10] Wi C-I et al., “Application of a Natural Language Processing Algorithm to Asthma Ascertainment: An Automated Chart Review,” American Journal of Respiratory And Critical Care Medicine, vol. 196, no. 4, 2017.
- [11] Kaur H et al., “Automated chart review utilizing natural language processing algorithm for asthma predictive index,” BMC pulmonary medicine, vol. 18, no. 1, p. 34, 2018.
- [12] Sohn S et al., “Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions,” Journal of the American Medical Informatics Association, 2017.
- [13] Sohn S et al., “Ascertainment of Asthma Prognosis Using Natural Language Processing from Electronic Medical Records,” Journal of Allergy and Clinical Immunology, vol. 141, no. 6, pp. 2292–2294, 2018.
- [14] Juhn Y and Liu H, “Artificial intelligence approaches using natural language processing to advance EHR-based clinical research,” Journal of Allergy and Clinical Immunology, vol. 145, no. 2, pp. 463–469, 2020.
- [15] Wang Y et al., “A clinical text classification paradigm using weak supervision and deep representation,” BMC medical informatics and decision making, vol. 19, no. 1, p. 1, 2019.
- [16] Howard J and Ruder S, “Universal language model fine-tuning for text classification,” arXiv preprint arXiv:1801.06146, 2018.
- [17] Devlin J, Chang M-W, Lee K, and Toutanova K, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- [18] Madabushi HT, Kochkina E, and Castelle M, “Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data,” arXiv preprint arXiv:2003.11563, 2020.
- [19] Liu H et al., “An information extraction framework for cohort identification using electronic health records,” presented at the AMIA Summits Transl Sci Proc, San Francisco, CA, Mar 2013.
