Journal of the American Medical Informatics Association (JAMIA). 2019 Sep 24; 26 (12): 1584–1591. doi: 10.1093/jamia/ocz158

Extracting entities with attributes in clinical text via joint deep learning

Xue Shi 1, Yingping Yi 2, Ying Xiong 1, Buzhou Tang 1,4, Qingcai Chen 1, Xiaolong Wang 1, Zongcheng Ji 3, Yaoyun Zhang 3, Hua Xu 3
PMCID: PMC7647140  PMID: 31550346

Abstract

Objective

Extracting clinical entities and their attributes is a fundamental task of natural language processing (NLP) in the medical domain. This task is typically recognized as 2 sequential subtasks in a pipeline, clinical entity or attribute recognition followed by entity-attribute relation extraction. One problem of pipeline methods is that errors from entity recognition are unavoidably passed to relation extraction. We propose a novel joint deep learning method to recognize clinical entities or attributes and extract entity-attribute relations simultaneously.

Materials and Methods

The proposed method integrates 2 state-of-the-art methods for named entity recognition and relation extraction, namely bidirectional long short-term memory with conditional random field and bidirectional long short-term memory, into a unified framework. In this method, relation constraints between clinical entities and attributes and weights of the 2 subtasks are also considered simultaneously. We compare the method with other related methods (ie, pipeline methods and other joint deep learning methods) on an existing English corpus from SemEval-2015 and a newly developed Chinese corpus.

Results

Our proposed method achieves the best F1 of 74.46% on entity recognition and the best F1 of 50.21% on relation extraction on the English corpus, and 89.32% and 88.13%, respectively, on the Chinese corpus, outperforming the other methods on both tasks.

Conclusions

The joint deep learning–based method could improve both entity recognition and relation extraction from clinical text in both English and Chinese, indicating that the approach is promising.

Keywords: clinical entity or attribute recognition, clinical entity-attribute relation extraction, joint deep learning

INTRODUCTION

Extracting clinical entities and their attributes, which includes 2 subtasks of clinical entity or attribute recognition and clinical entity-attribute relation extraction, is a fundamental task of natural language processing (NLP) in the medical domain. The 2 subtasks are usually tackled sequentially in a pipeline framework, where clinical entities and attributes are recognized at first, and then relations between clinical entities and attributes are extracted. The clinical entity or attribute recognition is a typical named entity recognition (NER) task, treated as a sequence labeling problem. A large number of state-of-the-art machine learning methods, such as conditional random fields (CRF),1 structural support vector machines,2 and deep learning methods,3 have been proposed for NER. The entity-attribute relation extraction is usually recognized as a classification problem, and a variety of machine learning methods, such as support vector machines and deep learning methods, have been proposed for this task.

One issue of the pipeline framework is that errors from entity recognition are unavoidably passed to relation extraction. To avoid error propagation, joint learning methods have been proposed to recognize named entities and their relations simultaneously in recent years, and have shown better performance than pipeline methods. Among them, joint deep learning methods, such as Miwa and Bansal’s method,4 are state-of-the-art methods up-to-date. Inspired by Miwa and Bansal’s method, we propose a novel joint deep learning method for clinical entity or attribute recognition and entity-attribute relation extraction. The method considers relation constraints between clinical entities and attributes, and introduces a combination coefficient to leverage the 2 subtasks. Experimental results on an existing English clinical corpus from SemEval-2015 Task 14 and a newly developed Chinese corpus show that the proposed method outperforms other existing state-of-the-art methods on both corpora, indicating the joint deep learning approach is promising in clinical NER and relation extraction tasks across languages.

RELATED WORK

Clinical entity or attribute recognition

Early systems for clinical entity recognition are often rule-based, such as MedLEE.5 In the past decade, machine learning methods, such as CRF6 and structural support vector machines,7 have been proposed for clinical NER and have shown better performance on several annotated clinical corpora.1,8,9 These machine learning methods treat clinical entity recognition as a sequence labeling problem. More recently, deep learning methods have been widely used for clinical entity recognition and achieve better performance with minimal feature engineering. For example, Jagannatha and Yu10 applied long short-term memory (LSTM) and gated recurrent unit networks to recognize clinical entities such as medications and diseases and their associated attributes, and obtained better performance than CRF. Liu et al3 presented a clinical entity recognition system based on bidirectional LSTM with CRF (Bi-LSTM-CRF), which is competitive with other state-of-the-art systems on the corpora of the 2010, 2012, and 2014 i2b2 NLP challenges.

Clinical relation extraction

Relation extraction in the clinical domain is usually recognized as a classification problem: determining whether there is a relation between a pair of clinical entities (or an entity-attribute pair) and, if so, the type of that relation. Machine learning methods using various features, including lexical, semantic, syntactic, and medical domain ontology features,11,12 such as support vector machines,10,13 have been applied to clinical relation extraction. Recently, deep learning methods, such as convolutional neural networks (CNNs)14,15 and recurrent neural networks,16 such as LSTM,17,18 have been proposed for clinical relation extraction and have achieved some success. For example, Luo19 applied LSTM to extract relations between clinical concepts from clinical notes, with performance comparable to the state-of-the-art systems of the i2b2/VA relation classification challenge. Luo et al20 further proposed segment CNNs, which identify relations between 2 concepts by simultaneously learning separate representations of text segments in a sentence: preceding, concept 1, middle, concept 2, and succeeding. Segment CNNs obtained state-of-the-art performance on the i2b2/VA challenge corpus.

Joint clinical entity recognition and relation extraction

There have been few studies on joint entity recognition and relation extraction in the medical domain. However, a number of joint learning methods have been proposed for entity recognition and relation extraction in the open domain in recent years. They fall into 2 categories: joint learning methods relying on feature engineering21–24 and joint deep learning methods. The former include linear programming–based methods,22,23 the joint graphical method,23 and the incremental joint method.24 The latter handle entity recognition and relation extraction in a unified deep learning framework. For example, Miwa and Bansal4 proposed a novel end-to-end model by stacking a Bi-LSTM for entity recognition and a bidirectional tree-structured LSTM for relation extraction. This model jointly represents both entities and relations with shared parameters and uses entity information in relation extraction via entity pretraining and scheduled sampling. However, it relies on a good syntactic parser, which may not exist in some specific domains. Zheng et al25 presented a hybrid neural network that contains an encoder-decoder Bi-LSTM (Bi-LSTM-ED) module to recognize entities and a CNN module to extract relations; the contextual information of entities obtained in the Bi-LSTM-ED module is shared with the CNN module to improve relation classification. Zheng et al26 proposed a novel tagging scheme that converts entity recognition and relation extraction into a single sequence labeling task. By designing a tag schema that presents the information of entities and relations simultaneously, the model can directly extract triplets composed of 2 entities and their relation, rather than extracting entities and relations separately. However, this tag schema has the disadvantage that it cannot handle the case in which one entity has relations with multiple different entities.

MATERIALS AND METHODS

Tasks and datasets

English task and corpus

The SemEval-2015 challenge organized a task on clinical entity recognition and template slot filling (ie, Task 14), which is similar to clinical entity and attribute extraction. However, besides clinical entity recognition, that challenge focused on normalizing attribute values rather than extracting mentions of attributes from text. Therefore, we construct a new corpus based upon the dataset of SemEval-2015 Task 14 by converting cues in the text into mentions of attributes and linking them to clinical entities as relations. In our study, the 298 clinical records in the training set of SemEval-2015 Task 14 are also used as the training set, and the 133 clinical records in the development set of SemEval-2015 Task 14 are used as the test set. To avoid relations spanning more than 1 sentence, we simply split all records into sentences at consecutive “\n”s.

Chinese task and corpus

The Chinese entity-attribute extraction task is to extract information as shown in Figure 1. The given Chinese sentence “患者四肢皮肤有划伤, 双侧肱二、三头肌反射、膝腱反射正常” (There are scratches on the patient’s arms and legs; bilateral biceps and triceps reflexes and patellar reflexes are normal.) contains 3 clinical entities, namely a problem (“划伤” [scratches]) and 2 lab tests (“双侧肱二、三头肌反射” [bilateral biceps and triceps reflexes] and “膝腱反射” [patellar reflexes]), and 2 attributes (“四肢皮肤” [arms and legs] and “正常” [normal]). The first attribute, “四肢皮肤,” modifies the problem “划伤” (denoted by attrOf), and the second attribute, “正常,” modifies the 2 lab tests (denoted by valueOf).

Figure 1. Example sentence of Chinese clinical entities with their attribute extraction.

Following a similar guideline to the one used in the SemEval task, we annotate a Chinese corpus of 290 manually de-identified clinical documents (241 admission notes and 49 discharge summaries) that are randomly selected from a Tier 3A hospital in China. Two Chinese annotators with a medical background are recruited for the annotation after appropriate training. A small number of documents (n = 10) are annotated by both annotators to calculate the interannotator agreement, which is 0.85 for entities and 0.87 for relations, measured by kappa. In this study, 192 admission notes are used as the training set, and the remaining 49 admission notes and 49 discharge summaries are used as the test set. The statistics of the corpora used for clinical entity or attribute recognition and entity-attribute relation extraction are listed in Table 1, in which #X denotes the number of X (eg, #Document is the number of documents).

Table 1.

Statistics of the corpora used for clinical entity or attribute recognition and entity-attribute relation extraction

Corpus    Category     Type          Training   Test    Total
English   #Document                  298        133     431
          #Sentence                  2534       1714    4248
          #Entity      disorder      10 720     7693    18 413
          #Attribute   negation      1512       1054    2566
                       body          2160       1382    3542
                       severity      916        412     1328
                       change        397        269     666
                       uncertain     800        408     1208
                       conditional   300        273     573
                       subject       76         99      175
                       generic       79         78      157
          #Relation    attrOf        8013       5360    13 373
Chinese   #Document                  192        98      290
          #Sentence                  6150       2804    8954
          #Entity      problem       10 758     5198    15 956
                       labtest       5427       3230    8657
                       procedure     993        752     1745
                       medicine      503        468     971
          #Attribute   value         4475       2459    6934
                       negation      4773       1744    6517
                       body          2238       757     2995
                       temporal      803        362     1165
                       severity      409        175     584
                       change        150        83      233
                       uncertain     136        116     252
          #Relation    attrOf        9608       3841    13 449
                       valueOf       4347       2426    6773

System description

Inspired by Miwa and Bansal’s method,4 we propose a novel joint deep learning method to extract clinical entities with attributes from clinical text, denoted by JBCB (joint learning method using Bi-LSTM-CRF for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction). It consists of 4 components: (1) sentence encoding, (2) entity or attribute recognition, (3) entity-attribute relation (ie, <attribute, modify, entity>) extraction, and (4) joint learning. Figure 2 illustrates the architecture of JBCB through an example, in which the negation “无” (ie, “no”) is an attribute (denoted by attrOf) of the problem “浊音” (ie, “dullness”). The following sections present each component in detail.

Figure 2. Architecture of the proposed joint deep learning method. LSTM: long short-term memory.

Sentence encoding

Following previous studies,23 we use Bi-LSTM27,28 to encode an input sentence via the following 3 layers. First, an input layer converts the input sentence into a sequence of word embeddings by dictionary lookup; each word's representation includes 2 parts: a word embedding and a part-of-speech (POS) tag embedding. Second, the Bi-LSTM layer takes the embedding sequence as input and returns another sequence that represents the context of each word at every position, from 2 parallel directions. Third, a concatenation layer concatenates the outputs of the forward and backward LSTMs at every position. Formally, given an input sentence s = w_1 w_2 … w_n, with each word w_t (1 ≤ t ≤ n) represented as x_t = [emb(w_t); emb(p_t)] (the concatenation of its word embedding and POS tag embedding, where “;” denotes the concatenation operation), the Bi-LSTM produces a forward sequence h^f = h^f_1 h^f_2 … h^f_n and a backward sequence h^b = h^b_1 h^b_2 … h^b_n. The 2 output sequences are then concatenated into h = h_1 h_2 … h_n, where h_t = [h^f_t; h^b_t]. At step t, an LSTM unit, composed of 3 multiplicative gates (an input gate, a forget gate, and an output gate) that modulate how much of the current input, the previous state, and the output should be carried into the current time step (as shown in Figure 3), takes x_t, the previous hidden state vector h_{t-1}, and the previous cell vector c_{t-1} as input and computes the current hidden state vector h_t and the current cell vector c_t by the following formulas:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)  (1)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)  (2)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)  (3)
u_t = tanh(W_u x_t + U_u h_{t-1} + b_u)  (4)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}  (5)
h_t = o_t ⊙ tanh(c_t),  (6)

where σ(·), tanh(·), and ⊙ denote the logistic function, the hyperbolic tangent function, and element-wise multiplication, respectively; the W and U are weight matrices; and the b are bias vectors.
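As an illustration, the gate equations (1)-(6) can be traced for a single scalar unit. This is our own toy sketch, not the paper's implementation (which uses vector states and weight matrices); it only shows how the gates combine:

```python
import math

def sigmoid(x):
    """Logistic function sigma(.) used in Eqs. (1)-(3)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step with scalar states, mirroring Eqs. (1)-(6).

    W, U, b are dicts keyed by gate name ('i', 'f', 'o', 'u'); in the real
    model each would be a weight matrix or bias vector.
    """
    i_t = sigmoid(W['i'] * x_t + U['i'] * h_prev + b['i'])    # input gate, Eq. (1)
    f_t = sigmoid(W['f'] * x_t + U['f'] * h_prev + b['f'])    # forget gate, Eq. (2)
    o_t = sigmoid(W['o'] * x_t + U['o'] * h_prev + b['o'])    # output gate, Eq. (3)
    u_t = math.tanh(W['u'] * x_t + U['u'] * h_prev + b['u'])  # candidate, Eq. (4)
    c_t = i_t * u_t + f_t * c_prev                            # cell state, Eq. (5)
    h_t = o_t * math.tanh(c_t)                                # hidden state, Eq. (6)
    return h_t, c_t
```

Because the output gate multiplies a tanh of the cell state, the hidden state always stays in (-1, 1), whatever the cell state grows to.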

Figure 3. Structure of a long short-term memory unit.

Entity or attribute recognition

Clinical entity or attribute recognition is treated as a sequence labeling problem, and the “BILOU” tag schema is used to represent the boundaries of entities, where B/I/L denote the beginning, inside, and last words of an entity, respectively; U denotes a single-word entity; and O denotes that a word does not belong to any entity. For example, the sentence “患者四肢皮肤有划伤, 双侧肱二、三头肌反射、膝腱反射正常” in Figure 1 is represented as “患/O 者/O 四/B-body 肢/I-body 皮/I-body 肤/L-body 有/O 划/B-problem 伤/L-problem , /O 双/B-labtest 侧/I-labtest 肱/I-labtest 二/I-labtest 、/I-labtest 三/I-labtest 头/I-labtest 肌/I-labtest 反/I-labtest 射/L-labtest 、/O 膝/B-labtest 腱/I-labtest 反/I-labtest 射/L-labtest 正/B-value 常/L-value”.
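As a concrete illustration of the BILOU schema, a minimal decoder from a predicted tag sequence back to typed spans might look as follows (the function and span representation are ours, not the paper's):

```python
def bilou_to_spans(tags):
    """Decode a BILOU tag sequence into (start, end, type) spans, end exclusive.

    B-/I-/L- mark multi-token entities, U- single-token entities, O outside.
    Incomplete runs (eg, a B- with no matching L-) are silently dropped.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag == 'O':
            start, etype = None, None
        elif tag.startswith('U-'):
            spans.append((i, i + 1, tag[2:]))
            start, etype = None, None
        elif tag.startswith('B-'):
            start, etype = i, tag[2:]
        elif tag.startswith('L-') and start is not None and tag[2:] == etype:
            spans.append((start, i + 1, etype))
            start, etype = None, None
        # I- tags simply continue the current entity
    return spans
```

For instance, the tags `['O', 'B-body', 'I-body', 'L-body', 'O', 'U-problem']` decode to a 3-character body span and a 1-character problem span.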

A 2-layer neural network, as shown on the top left of Figure 2, is applied for clinical entity or attribute recognition. It takes the hidden state sequence h = h_1 h_2 … h_n as input and predicts the most probable tag sequence z = z_1 z_2 … z_n as follows:

z = argmax_y exp(score(h, y)) / Σ_{y'} exp(score(h, y')),  (7)

where score(h, y) = Σ_{t=1}^{n} E_{t,y_t} + Σ_{t=0}^{n-1} T_{y_t,y_{t+1}}, E_{t,y_t} = w_{y_t}^T h_t is the score of emitting y_t at time step t, and T_{y_t,y_{t+1}} is the score of transitioning from y_t to y_{t+1}. To keep consistent with Miwa and Bansal’s method, we add a hidden layer before the CRF layer.
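The score of a tag sequence in Eq. (7) is just the sum of per-position emission scores and adjacent-tag transition scores. A small sketch of that sum (our own notation, with dict lookups standing in for the learned emission and transition parameters):

```python
def crf_score(emissions, transitions, tags):
    """Compute score(h, y) from Eq. (7) for one tag sequence.

    emissions[t][y] plays the role of E_{t,y_t} (emission score of tag y
    at position t); transitions[(y, y2)] plays the role of T_{y,y2}.
    """
    score = sum(emissions[t][y] for t, y in enumerate(tags))  # emission terms
    score += sum(transitions[(tags[t], tags[t + 1])]          # transition terms
                 for t in range(len(tags) - 1))
    return score
```

The CRF layer normalizes these scores over all possible tag sequences; decoding picks the sequence with the highest score, typically via the Viterbi algorithm.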

Entity-attribute relation extraction

A 3-layer neural network, called Bi-LSTM (the right part of Figure 2), is proposed for entity-attribute relation classification. At the first layer, it takes a pair consisting of a target entity and a target attribute, together with the context between them, as input and obtains the corresponding representation using the Bi-LSTM described in the Sentence Encoding section. Each input word w_t is represented by [h_t; l_t], where h_t is the t-th item of the sentence encoding and l_t is the embedding of the predicted tag of w_t. The representation of the context fragment between the target entity and attribute is h_fr = [h_fr→; h_fr←], where h_fr→ and h_fr← are the last outputs of the forward and backward LSTMs. The target entity and attribute are each represented by the average of the hidden state vectors of the words they contain, that is, h_r1 = (1/|EA1|) Σ_{j∈EA1} h_j and h_r2 = (1/|EA2|) Σ_{j∈EA2} h_j, where EA1 and EA2 are the indices of the words in the first and second entity or attribute. The target entity-attribute pair with context is thus represented by h_r = [h_fr; h_r1; h_r2]. At the second layer, h_r is fed into a transform function to obtain a hidden state h_r^relation, and the probability of each possible relation p_ri (1 ≤ i ≤ N_r) between the target entity and attribute is then calculated by the softmax function at the third layer as follows:

h_r^relation = tanh(W_rh h_r + b_rh)  (8)
y_r = W_r h_r^relation + b_r  (9)
p_ri = exp(y_ri) / Σ_{j=1}^{N_r} exp(y_rj),  (10)

where the W are weight matrices, the b are bias vectors, and N_r is the total number of entity-attribute relation types, including null, which indicates no relation.

In a specific domain, there are constraints to limit relations between entities and attributes. For example, in the clinical domain, a “value” can be an attribute of a “labtest,” but cannot be an attribute of a “problem.” In this study, we further utilize clinical entity-attribute relation constraints to filter <attribute, modify, entity> candidates that could not appear. We list all constraints in Supplementary Appendix A.
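Such constraints amount to a simple type-pair filter over candidate pairs. The sketch below is ours; the allowed pairs shown are illustrative assumptions, and the paper's actual constraint list is in Supplementary Appendix A:

```python
# Hypothetical constraint table mapping (attribute type, entity type) pairs
# that are permitted to form an <attribute, modify, entity> candidate.
# These pairs are illustrative only, not the paper's full constraint list.
ALLOWED = {
    ('value', 'labtest'),
    ('negation', 'problem'),
    ('body', 'problem'),
}

def candidate_pairs(attributes, entities):
    """Keep only entity-attribute candidates whose type pair is permitted."""
    return [(a, e) for a in attributes for e in entities
            if (a['type'], e['type']) in ALLOWED]
```

In the sentence of Figure 1, a filter of this kind discards pairs such as a “value” attribute attached to a “problem” entity before they ever reach the relation classifier.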

Joint learning for entity with attribute extraction

In this study, we adopt cross-entropy as the loss function. The loss functions for entity or attribute recognition (L_e) and entity-attribute relation extraction (L_r) can be written as:

L_e = -Σ_{i=1}^{|D_s|} Σ_{t=1}^{|S_i|} p_ti log p'_ti  (11)
L_r = -Σ_{i=1}^{|D_r|} p_ri log p'_ri,  (12)

where |D_s| is the number of sentences in the training set, |S_i| is the length of sentence S_i, p_ti is the probability of the gold standard label of w_t in the i-th sentence, p'_ti is the probability of the predicted label of w_t, |D_r| is the number of entity-attribute relations in the training set, p_ri is the gold standard label of the i-th entity-attribute relation, and p'_ri is the probability of the predicted label for the i-th entity-attribute relation. For joint learning, L_e and L_r are combined into the final loss function:

L = α L_e + (1 - α) L_r,  (13)

where α is the combination coefficient. The larger α is, the greater influence the entity or attribute recognition task has; the smaller α is, the greater influence the entity-attribute relation extraction task has.
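A minimal sketch of the combined loss in Eqs. (11)-(13), assuming the model has already produced, for each gold label, the probability it assigned to that label (the function names are ours):

```python
import math

def cross_entropy(gold_label_probs):
    """Negative log-likelihood of the gold labels, as in Eqs. (11)-(12).

    gold_label_probs: predicted probability assigned to each gold label.
    """
    return -sum(math.log(p) for p in gold_label_probs)

def joint_loss(ner_gold_probs, rel_gold_probs, alpha):
    """Eq. (13): L = alpha * L_e + (1 - alpha) * L_r."""
    return (alpha * cross_entropy(ner_gold_probs)
            + (1 - alpha) * cross_entropy(rel_gold_probs))
```

With a large α the gradient is dominated by the tagging loss; with a small α, by the relation loss, which is how the coefficient leverages the 2 subtasks.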

Experiments

We compare the proposed method with the following 2 pipeline baseline methods, BCR and BCB, and a joint method JBB:

  1. BCR uses Bi-LSTM-CRF for clinical entity or attribute recognition and a simple rule-based method for entity-attribute relation extraction. The rule-based method combines each attribute with the nearest entity.

  2. BCB uses Bi-LSTM-CRF for clinical entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction.

  3. JBB is a variant of Miwa and Bansal’s method that uses Bi-LSTM instead of the bidirectional tree-structured LSTM for relation extraction.
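The rule-based relation extractor in baseline 1 (BCR) can be sketched as follows, assuming each mention carries a character offset; the dict fields and function name are our own illustrative representation:

```python
def nearest_entity_relations(entities, attributes):
    """Link each attribute to its nearest entity (the BCR-style rule).

    Each mention is a dict with an 'id' and a character 'start' offset;
    distance is measured between start offsets.
    """
    relations = []
    for attr in attributes:
        nearest = min(entities, key=lambda e: abs(e['start'] - attr['start']))
        relations.append((attr['id'], 'attrOf', nearest['id']))
    return relations
```

This rule needs no training, but it fails whenever an attribute modifies a non-adjacent entity, or modifies several entities at once, as “正常” does in Figure 1.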

In addition, we investigate the effects of combination coefficient α and clinical entity-attribute relation constraints on joint learning methods.

Preprocessing

The Stanford CoreNLP toolkit (version 3.8)29 is utilized for preprocessing, including tokenization, POS tagging, etc.

Evaluation

The performance of methods for clinical entity or attribute recognition and entity-attribute relation extraction is measured by the standard micro-averaged precision (P), recall (R), and F1 score (F1).
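Micro-averaging pools the true-positive, false-positive, and false-negative counts over all types before computing the scores; a minimal sketch (our own helper, not the paper's evaluation script):

```python
def micro_prf(tp, fp, fn):
    """Micro-averaged precision, recall, and F1 from pooled counts.

    tp/fp/fn are true-positive, false-positive, and false-negative counts
    summed over all entity, attribute, or relation types.
    """
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Because counts are pooled, frequent types such as disorder dominate the micro-averaged score, while rare types such as generic contribute little.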

Parameters

The hyperparameters of all deep learning models are set the same as follows except combination coefficient α, which is optimized by searching from 0.1 to 0.9 with an increment of 0.1 (the optimal value of α is 0.4 on the English corpus and 0.6 on the Chinese corpus):

  • Embedding dimensions: dim(emb(w_t)) = 50, dim(emb(p_t)) = dim(emb(d_t)) = dim(emb(l_t)) = 25

  • Learning rate: 1e-6

  • Regularization parameter: 1e-6

  • Dropout probability: 0.3

  • Bi-LSTM (encoding): dim(h^f_t) = dim(h^b_t) = 100, dim(h_t^entity) = 100

  • Bi-LSTM (entity-attribute relation extraction): dim(h_fr→) = dim(h_fr←) = 100, dim(h_r^relation) = 100

Chinese word embeddings are learned by word2vec31 from all data of the 2017 China Conference on Knowledge Graph and Semantic Computing (CCKS) clinical named entity recognition (CNER) challenge.30 English word embeddings are learned by word2vec from the MIMIC-III (Medical Information Mart for Intensive Care III) clinical database.32 The other embeddings are randomly initialized. We first randomly divide the training set into 2 parts (80% for training and 20% for validation to optimize parameters), then train models on the whole training set using the optimized parameters.

RESULTS

The performance of different pipeline and joint methods is shown in Table 2, where the highest P, R, and F1 on the 2 corpora are highlighted in boldface. Overall, the joint methods show better performance than the pipeline methods on the 2 corpora. For example, the best joint method (ie, JBCB) achieves the highest F1 of 89.32% on clinical entity or attribute recognition and the highest F1 of 88.13% on entity-attribute relation extraction on the Chinese corpus, outperforming the best pipeline method (ie, BCB) by 0.48% and 1.03%, respectively. Similar to previous works, Bi-LSTM-CRF outperforms Bi-LSTM for NER in our implementations of both pipeline and joint learning methods.33

Table 2.

Comparison of different pipeline and joint methods

Corpus    Method   Entity or attribute recognition   Entity-attribute relation extraction
                   P        R        F1               P        R        F1
English   BCR      0.7708a  0.7171   0.7430           0.3028   0.5320a  0.3859
          BCB      0.7708a  0.7171   0.7430           0.6342   0.3989   0.4897
          JBB      0.7280   0.6901   0.7085           0.6678a  0.3993   0.4998
          JBCB     0.7504   0.7389a  0.7446a          0.6647   0.4034   0.5021a
Chinese   BCR      0.8924   0.8843a  0.8884           0.8125   0.8681   0.8394
          BCB      0.8924   0.8843a  0.8884           0.8537   0.8891a  0.8710
          JBB      0.8716   0.8693   0.8705           0.8919a  0.8652   0.8783
          JBCB     0.9027a  0.8839   0.8932a          0.8836   0.8790   0.8813a

BCB: a pipeline method using Bi-LSTM-CRF for clinical entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; BCR: a pipeline method using Bi-LSTM-CRF for clinical entity or attribute recognition and a simple rule-based method for entity-attribute relation extraction; F1: F1-score; JBB: joint learning method using Bi-LSTM for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; JBCB: joint learning method using Bi-LSTM-CRF for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; P: precision; R: recall.

aThe best performance in each column.

The detailed results of JBCB for each entity or attribute type and relation type are shown in Table 3. On the Chinese corpus, among the 4 types of entities, JBCB performs best on labtest, with an F1 of 93.83%, and worst on procedure, with an F1 of 77.46%. For attribute recognition, it performs best on negation, with an F1 of 97.57%, and worst on severity, with an F1 of 44.16%. On the English corpus, JBCB achieves an F1 of 79.95% on disorder; among the 8 attribute types, it performs best on severity, with an F1 of 77.43%, and worst on generic, with an F1 of 21.24%.

Table 3.

Detailed results of JBCB for each entity or attribute type and entity-attribute relation type

Corpus    Subtask                                 Type          P        R        F1
English   Entity or attribute recognition         disorder      0.7767   0.8237   0.7995
                                                  negation      0.7543   0.7894   0.7714
                                                  body          0.6030   0.5640   0.5828
                                                  severity      0.7564   0.7932   0.7743
                                                  change        0.5382   0.6543   0.5906
                                                  uncertain     0.4428   0.4363   0.4395
                                                  conditional   0.5890   0.5092   0.5462
                                                  subject       0.7394   0.7354   0.7374
                                                  generic       0.3429   0.1538   0.2124
          Entity-attribute relation extraction    attrOf        0.6647   0.4034   0.5021
Chinese   Entity or attribute recognition         problem       0.8676   0.8683   0.8680
                                                  labtest       0.9438   0.9328   0.9383
                                                  procedure     0.8490   0.7174   0.7746
                                                  medicine      0.8333   0.6545   0.7777
                                                  value         0.9350   0.9240   0.9294
                                                  negation      0.9732   0.9782   0.9757
                                                  body          0.8769   0.9036   0.8900
                                                  temporal      0.9105   0.8149   0.8601
                                                  severity      0.5113   0.3886   0.4416
                                                  change        0.6615   0.5181   0.5811
                                                  uncertain     0.6891   0.7069   0.6979
          Entity-attribute relation extraction    valueOf       0.9128   0.9151   0.9140
                                                  attrOf        0.8696   0.8562   0.8605

F1: F1-score; JBCB: joint learning method using Bi-LSTM-CRF for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; P: precision; R: recall.

In addition, we investigate the effects of the combination coefficient α and the clinical entity-attribute relation constraints on the 2 joint deep learning methods by removing each one, or both, on the Chinese corpus. From Table 4, we can see that when the combination coefficient or the relation constraints are removed, F1 decreases to some degree on both clinical entity or attribute recognition and entity-attribute relation extraction; entity or attribute recognition is more sensitive to the combination coefficient, whereas relation extraction is more sensitive to the relation constraints. For example, when the combination coefficient is removed (ie, α is set to 0.5), the F1 of JBB on NER decreases by 0.4%, while that on relation extraction decreases by no more than 0.1%. When the relation constraints are removed, the F1 of JBB on NER decreases by 0.15%, while that on relation extraction decreases by 0.47%. Even so, the joint methods are still superior to the corresponding pipeline methods shown in Table 2.

Table 4.

Effects of combination coefficient α and clinical entity-attribute relation constraints on joint deep learning methods on the Chinese corpus

Settings                       Entity or attribute recognition   Relation classification
                               P        R        F1               P        R        F1
JBB                            0.8716   0.8693   0.8705           0.8919   0.8652   0.8783
  - Combination coefficient    0.8664   0.8667   0.8665           0.8901   0.8655   0.8776
  - Relation constraints       0.8685   0.8696   0.8690           0.8772   0.8700   0.8736
  - All                        0.8676   0.8687   0.8682           0.8749   0.8703   0.8726
JBCB                           0.9027   0.8839   0.8932           0.8836   0.8790   0.8813
  - Combination coefficient    0.8967   0.8862   0.8914           0.8797   0.8780   0.8788
  - Relation constraints       0.8871   0.8947   0.8918           0.8818   0.8708   0.8763
  - All                        0.8901   0.8636   0.8918           0.8823   0.8686   0.8754

F1: F1-score; JBB: joint learning method using Bi-LSTM for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; JBCB: joint learning method using Bi-LSTM-CRF for entity or attribute recognition and Bi-LSTM for entity-attribute relation extraction; P: precision; R: recall.

DISCUSSION

Similar to previous works,20,21 the joint deep learning methods outperform the pipeline methods for extracting clinical entities with attributes in this study. The advantage of joint deep learning is that some errors in one subtask may be fixed by the other subtask. Two examples of errors in BCB fixed by JBCB are given in Supplementary Appendix B.

The reason we replace the bidirectional tree-structured LSTM of Miwa and Bansal’s method for relation extraction with Bi-LSTM is that no public parser performs well enough on clinical text. Using Miwa and Bansal’s method, we obtain an F1 of 86.96% on NER and an F1 of 86.20% on relation extraction on the Chinese corpus, lower than JBB by 0.09% and 1.63%, respectively. Please see Supplementary Appendix C for details. Moreover, CNN-based methods are not investigated for relation extraction in this study, because previous work has shown that LSTM and CNNs obtain comparable performance on relation extraction tasks.34 As CRF takes full advantage of the dependencies across output tags for NER, the methods adopting Bi-LSTM-CRF for NER outperform those adopting Bi-LSTM, as shown in Table 2.

The reason the clinical entity-attribute relation constraints improve the joint deep learning models is that they filter out a large number of negative relation samples (ie, entity-attribute pairs of interest without a relation). For example, when the constraints are not considered, 17 of the 20 candidate samples formed from the 3 entities and 2 attributes in the sentence in Figure 1 are negative, and all of these negative samples can be filtered out by the constraints. When the constraints are added into the joint deep learning methods, more than 90% of the negative samples in the training set are filtered out.

The large difference in performance between the 2 corpora is likely because the distance (the number of words in English, of characters in Chinese) between the entity and the attribute of a relation is much longer in English clinical text than in Chinese clinical text: the average distance between an entity and one of its attributes is 4.15 in English clinical text versus 2.19 in Chinese clinical text.

To assess whether the differences in F1 scores between methods are statistically significant, we conduct the following statistical test. From the test set, we randomly select 1000 sentences with replacement 100 times, generating 100 bootstrap datasets. We then calculate F1 scores for each pair of methods on each dataset and apply the Wilcoxon signed rank test,35 a nonparametric test for paired samples, to determine whether the F1 scores of the 2 methods differ significantly (P < .05). For example, the test between JBCB and BCB shows that the difference between their F1 scores on entity-attribute relation extraction is statistically significant, with a P value of 3.90 × 10−18.
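The resampling step of this procedure can be sketched as follows; `score_fn` stands in for a system's F1 computation (our own simplification), and the Wilcoxon signed rank test would then be applied to the 2 paired score lists:

```python
import random

def bootstrap_scores(test_sentences, score_fn, n_rounds=100, sample_size=1000, seed=42):
    """Resample the test set with replacement and score each bootstrap dataset.

    Returns one score per round; running this with the same seed for two
    systems yields paired samples suitable for a Wilcoxon signed rank test.
    """
    rng = random.Random(seed)
    return [score_fn(rng.choices(test_sentences, k=sample_size))
            for _ in range(n_rounds)]
```

Using the same seed for both systems guarantees that round i of each list is computed on the identical bootstrap sample, which is what makes the test paired.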

Although the proposed joint deep learning method (eg, JBCB) performs better than the other methods in comparison, it still produces a number of errors. We examine the errors on the Chinese corpus and find the following. First, for NER, misrecognized entities or attributes account for over 36% of all errors; around 34% of entities or attributes are correctly recognized but labeled with incorrect semantic types; and around 30% are partially recognized with correct semantic types. Second, for relation extraction, more than 50% of relations are not extracted because of misrecognized entities or attributes; about 43% of relations are wrongly extracted because of incorrect entities or attributes; and no more than 7% of relations are missed even though the corresponding entities or attributes are correctly recognized. The main causes of entity or attribute recognition errors are ambiguities from numbers (the example of Error Type I), types (the example of Error Type II), and language expressions (the example of Error Type III). Most missed relations involve entity or attribute types with few instances and may suffer from data imbalance. Overall, most entity-attribute relation extraction errors are caused by entity or attribute recognition errors. Error samples are listed in Supplementary Appendix D.

In future work, there are 2 directions for further improvement: (1) integrating neural language models such as BERT (Bidirectional Encoder Representations from Transformers),36 which leverages the attention mechanism and has obtained state-of-the-art performance on a set of benchmark NLP tasks in the open domain, into our method; and (2) considering all entity-attribute pairs in a sentence simultaneously for clinical entity-attribute relation extraction.

CONCLUSION

In this study, we propose a novel joint deep learning method to extract clinical entities together with their attributes. It integrates Bi-LSTM-CRF and Bi-LSTM into a unified framework, introduces constraints to restrict clinical entity-attribute relations, and uses a combination coefficient to balance entity or attribute recognition against entity-attribute relation extraction. Experiments on 2 corpora demonstrate its effectiveness. Moreover, the proposed method is also applicable to similar tasks in other domains.

FUNDING

This work is supported in part by grants: National Natural Science Foundation of China grant nos. U1813215, 61876052, and 61573118 (to BT); Special Foundation for Technology Research Program of Guangdong Province grant no. 2015B010131010 (to XW); Strategic Emerging Industry Development Special Funds of Shenzhen grant nos. JCYJ20170307150528934 (to XW) and JCYJ20180306172232154 (to BT); and Innovation Fund of Harbin Institute of Technology grant HIT.NSRIF.2017052 (to BT).

ACKNOWLEDGMENTS

We gratefully acknowledge Beijing Baidu Netcom Science Technology Co., Ltd.

AUTHOR CONTRIBUTIONS

The work presented here was carried out in collaboration among all authors. XS, YY, YX, and BT designed the study, developed the methods, and conducted experiments on the Chinese dataset; XS, ZJ, and YZ contributed to the experiments on the English dataset; BT, XW, QC, and HX provided guidance and reviewed the manuscript critically. All authors have approved of the final manuscript.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONFLICT OF INTEREST STATEMENT

HX and The University of Texas Health Science Center at Houston have research-related financial interests in Melax Technologies, Inc.

REFERENCES

  • 1. Uzuner Ö, South BR, Shen S, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc 2011; 18 (5): 552–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Tang B, Cao H, Wu Y, et al. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Med Inform Decis Mak 2013; 13 (Suppl 1): S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Liu Z, Yang M, Wang X, et al. Entity recognition from clinical texts via recurrent neural network. BMC Med Inform Decis Mak 2017; 17 (S2): 67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Miwa M, Bansal M. End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv 2016 Jun 8 [E-pub ahead of print].
  • 5. Friedman C, Alderson PO, Austin JH, et al. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1994; 1 (2): 161–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Li D, Savova G, Kipper-Schuler K. Conditional random fields and support vector machines for disorder named entity recognition in clinical texts. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing; 2008: 94–5.
  • 7. Tang B, Cao H, Wu Y, et al. Clinical entity recognition using structural support vector machines with rich features. In: Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics; 2012: 13–20.
  • 8. Elhadad N, Pradhan S, Gorman S, et al. SemEval-2015 task 14: analysis of clinical text. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015); 2015: 303–10.
  • 9. Kelly L, Goeuriot L, Suominen H, et al. Overview of the CLEF eHealth evaluation lab 2016. In: International Conference of the Cross-Language Evaluation Forum for European Languages. New York, NY: Springer; 2016: 255–66.
  • 10. Jagannatha AN, Yu H. Bidirectional RNN for medical event detection in electronic health records. In: Proceedings of the Conference Association for Computational Linguistics North American Chapter Meeting; 2016: 473. [DOI] [PMC free article] [PubMed]
  • 11. Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithms—state-of-the-art of extracting biomedical relations. Brief Bioinform 2017; 18 (1): 160–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. De Bruijn B, Cherry C, Kiritchenko S, et al. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc 2011; 18 (5): 557–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Roberts A, Gaizauskas R, Hepple M. Extracting clinical relationships from patient narratives. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing; Association for Computational Linguistics; 2008: 10–8.
  • 14. dos Santos C, Xiang B, Zhou B. Classifying relations by ranking with convolutional neural networks. arXiv 2015 May 24 [E-pub ahead of print].
  • 15. Zeng D, Liu K, Lai S, Zhou G, Zhao J. Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers; 2014: 2335–44.
  • 16. Zhang D, Wang D. Relation classification via recurrent neural network. arXiv 2015 Dec 25.
  • 17. Xu Y, Mou L, Li G, et al. Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; 2015. 1785–94.
  • 18. Zhang S, Zheng D, Hu X, et al. Bidirectional long short-term memory networks for relation classification. In: Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation; 2015: 73–8.
  • 19. Luo Y. Recurrent neural networks for classifying relations in clinical notes. J Biomed Inform 2017; 72: 85–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Luo Y, Cheng Y, Uzuner Ö, et al. Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes. J Am Med Inform Assoc 2018; 25 (1): 93–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Roth D, Yih W. Global inference for entity and relation identification via a linear programming formulation. In: Introduction to Statistical Relational Learning; 2007: 553–80.
  • 22. Yang B, Cardie C. Joint inference for fine-grained opinion extraction. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics; Sofia, Bulgaria. 2013: 1640–9.
  • 23. Singh S, Riedel S, Martin B, et al. Joint inference of entities, relations, and coreference. In: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction. New York, NY: ACM; 2013: 1–6.
  • 24. Li Q, Ji H. Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics; Baltimore, Maryland, USA. 2014: 402–12.
  • 25. Zheng S, Hao Y, Lu D, et al. Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 2017; 257: 59–66. [Google Scholar]
  • 26. Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B. Joint extraction of entities and relations based on a novel tagging scheme. arXiv 2017 Jun 7.
  • 27. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000; 12 (10): 2451–71. [DOI] [PubMed] [Google Scholar]
  • 28. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997; 9 (8): 1735–80. [DOI] [PubMed] [Google Scholar]
  • 29. Manning C, Surdeanu M, Bauer J, et al. The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics; 2014: 55–60.
  • 30. Hu J, Shi X, Liu Z, et al. HITSZ CNER: A hybrid system for entity recognition from Chinese clinical text. In: Proceedings of CEUR Workshop; 2017: 25–30. [Google Scholar]
  • 31. Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, et al. , eds. Advances in Neural Information Processing Systems 26. Red Hook, NY: Curran Associates, Inc; 2013: 3111–9. [Google Scholar]
  • 32. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data 2016; 3: 160035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015 Aug 9.
  • 34. Sahu SK, Anand A.. Drug-drug interaction extraction from biomedical texts using long short-term memory network. J Biomed Inform 2018; 86: 15–24. [DOI] [PubMed] [Google Scholar]
  • 35. Wilcoxon F. Individual comparisons by ranking methods In: Kotz S, Johnson NL, eds. Breakthroughs in Statistics (Springer Series in Statistics). New York, NY: Springer; 1992: 196–202. [Google Scholar]
  • 36. Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv 2019 May 24.


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
