AMIA Annual Symposium Proceedings. 2021 Jan 25;2020:756–762.

Neural Multi-Task Learning for Adverse Drug Reaction Extraction

Feifan Liu 1, Xiaoyu Zheng 1, Hong Yu 2, Jennifer Tjia 1
PMCID: PMC8075418  PMID: 33936450

Abstract

A reliable and searchable knowledge database of adverse drug reactions (ADRs) is highly important and valuable for improving patient safety at the point of care. In this paper, we proposed a neural multi-task learning system, NeuroADR, to extract ADRs as well as relevant modifiers from free-text drug labels. Specifically, the NeuroADR system exploited a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, where interactions among the learned deep encoder representations from different subtasks are explored. Different from the conventional HMTL approach, NeuroADR adopted a novel task decomposition strategy to generate auxiliary subtasks for more inter-task interactions and integrated a new label encoding schema for better handling discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.

Introduction

Drug labels are intended to provide health care professionals with clear and concise prescribing information that will enhance patient safety at the point of care1. This important information, however, is locked in an unstructured format that greatly limits its use in real-life clinical practice; automatic extraction of ADRs and their relevant properties from narrative drug labels has therefore drawn increasing attention in the pharmacovigilance community2, the natural language processing community3, and government4. In 2017, the U.S. Food and Drug Administration (FDA) and the U.S. National Library of Medicine (NLM) jointly organized a shared task entitled “Adverse Drug Reaction Extraction from Drug Labels” at the Text Analysis Conference (TAC-ADR), which further advanced text mining techniques for ADR extraction from drug labels5. Our study focuses on the named entity recognition (NER) task, i.e., extracting ADRs and related concept modifiers (Severity, Factors, DrugClass, Negation, Animal), and the relation extraction (RE) task, i.e., identifying the relations (Negated, Hypothetical, Effect) between ADRs and related concept modifiers. For example, in the text “Grade 3 cutaneous reactions”, “Grade 3” is a Severity, “cutaneous reactions” is an ADR, and there is an “Effect” relation between them.

Most existing systems for ADR extraction exploit deep learning approaches, which have shown promising results in many natural language processing (NLP) tasks6. For instance, Saldana explored convolutional neural networks (CNNs) for detecting ADR-relevant sentences7, and Alimova et al. utilized an interactive attention neural network (IAN) to detect ADRs from biomedical texts8. Effectively training deep neural networks, however, usually requires millions of labeled samples, which are often prohibitively expensive to obtain in many real-life applications9. To address this challenge, semi-supervised methods based on co-training10 and neural network pre-training11 have been proposed for extracting adverse drug reaction mentions from tweets. A popular alternative is Multi-Task Learning (MTL)12, which has been widely applied and has led to successes across many applications of machine learning, including speech recognition13, NLP14, and computer vision15. MTL has also been applied to ADR extraction from social media texts16,17.

Several prior works have developed hierarchical MTL18–20, which integrates supervised feedback from each task at a different level of a task-specific hierarchy and achieves better performance than traditional MTL approaches. Hierarchical MTL can be seen as a seamless way to combine multi-task and cascaded learning, which is especially helpful for NLP tasks in which low-level tasks feed into high-level ones18. More recently, Sanh et al. demonstrated the effectiveness of hierarchically supervised multi-task learning on four related semantic tasks21 without complex regularization schemes. Although conventional MTL approaches based on shared components have been successfully applied in the biomedical domain22,23, exploring the emerging hierarchical MTL in biomedical applications remains an untapped but promising area: it is not yet well understood how effective hierarchical MTL is, or what adaptations are needed to increase its potential without compromising generalizability in the biomedical domain. In this work, we propose a new hierarchical MTL system, NeuroADR, for efficient ADR extraction from narrative drug labels. Unlike the top-performing system at TAC-ADR 2017, NeuroADR is end-to-end trainable and achieves comparable performance without relying on any handcrafted heuristic rules.

Our contributions are the following: (1) We propose a new hierarchical MTL architecture combining different subtasks related to ADR extraction, the first study to evaluate state-of-the-art hierarchical MTL in the biomedical domain. (2) The NeuroADR model exploits task decomposition and coupling strategies that allow more inductive-transfer connections among tasks at different hierarchy levels for optimal learning capacity. (3) We propose a new label encoding schema to better handle discontinuous entities. (4) We evaluate and give insights on the impact of each adaptation strategy.

Methods

In this section, we describe the NeuroADR architecture in terms of the different tasks and components involved in the hierarchical multi-task learning process. The main idea is that low-level tasks (e.g., the NER tasks) are supervised at the lower levels of the hierarchy while more complex tasks (e.g., the RE task) are kept at deeper layers, so that supervision from low-level tasks can provide feedback to deeper-level supervision. Note that NER subtasks of different levels of complexity can be placed at different levels of the hierarchy, so that simple subtasks (e.g., NER-ADR-C) can provide feedback to more complex subtasks (e.g., NER-ADR-D), as described below. The architecture of our model is shown in Figure 1, where red lines indicate the decoding process for each task and blue lines indicate the flow of hidden representations among tasks.

Figure 1. Diagram of the NeuroADR Architecture.

Word Embeddings

As shown in Figure 1, we explore three word representations: (1) BERT (Bidirectional Encoder Representations from Transformers), the state-of-the-art encoder, which has been shown to outperform ELMo on various NLP tasks24; (2) Domain Embedding: a customized word embedding25 trained with the skip-gram setting on all PubMed open access articles, 99,700 EHR notes, and English Wikipedia articles from 2015; the training corpus contains about 3 billion tokens and the embedding dimension is 200; (3) CharCNN: convolutional neural network (CNN) based character-level word embeddings26. For any word wi in a given input sentence (s = w1, w2, …, wn), the embedding layer concatenates the three embeddings as its final embedding (ewi).
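As an illustration, the per-token concatenation can be sketched as follows (a minimal sketch with toy dimensions and illustrative names, not the authors' code; the real system uses, e.g., a 200-dimensional domain embedding):

```python
# Hypothetical sketch: concatenating the three per-token representations
# (BERT, domain embedding, CharCNN) into one final embedding e_wi.
# Dimensions here are toy values for illustration only.

def concat_embeddings(bert_vec, domain_vec, char_vec):
    """Concatenate the three per-token vectors into the final embedding."""
    return bert_vec + domain_vec + char_vec  # list concatenation

# Toy dimensions: BERT = 4, domain = 3, CharCNN = 2.
bert = [0.1, 0.2, 0.3, 0.4]
domain = [0.5, 0.6, 0.7]
char = [0.8, 0.9]
e_wi = concat_embeddings(bert, domain, char)
```

The resulting vector's dimension is simply the sum of the three component dimensions.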

Task Decomposition

As mentioned earlier, we focus on the NER and RE tasks in the context of ADR extraction. Motivated by Liebel et al.27, who showed that adding auxiliary tasks as additional regularization can boost the performance of multi-task learning, we decompose the original NER task into three tasks in the NeuroADR architecture (Figure 1): (1) NER-ADR-C, for recognizing continuous ADR named entities; (2) NER-ADR-D, for handling both continuous and discontinuous ADR named entities with a new encoding described later; (3) NER-Modifier, for recognizing the 5 types of modifier entities (Severity, Factors, DrugClass, Negation, Animal). The rationale is that Modifier entities may prefer different hidden representations from ADR entities, and representations that account for discontinuous entities may differ from those that do not. Through decomposition, these less complex but related subtasks fit well in the multi-task setting. The encoder for each task passes the word embeddings, concatenated with the encoder outputs from the preceding tasks in the hierarchy (denoted by blue arrows in Figure 1), through a multi-layer BiLSTM28 and feeds the encoded sequence (eNER) into a final conditional random fields (CRF) layer for inferring the NER output.
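The cascading of encoder inputs through the hierarchy can be sketched as follows (a simplified stand-in, not the authors' implementation: `encode` replaces the multi-layer BiLSTM with a string-tagging stub, and each task here receives the word embeddings plus the immediately preceding encoder's output):

```python
# Illustrative sketch of the hierarchical encoder cascade: each task's
# encoder input is the word embeddings concatenated with the output of
# the preceding task's encoder, so later tasks see representations that
# already carry earlier tasks' supervision.

def encode(task, inputs):
    # Stub standing in for a multi-layer BiLSTM encoder.
    return [f"{task}({x})" for x in inputs]

def hierarchical_forward(word_embs, tasks):
    outputs = {}
    current_input = list(word_embs)
    for task in tasks:
        encoded = encode(task, current_input)
        outputs[task] = encoded
        # The next task sees word embeddings concatenated ("|") with
        # this encoder's output.
        current_input = [f"{w}|{e}" for w, e in zip(word_embs, encoded)]
    return outputs

outs = hierarchical_forward(["w1", "w2"],
                            ["NER-ADR-C", "NER-ADR-D", "NER-Modifier"])
```

Tracing `outs` shows that the NER-ADR-D representation embeds NER-ADR-C's output, and NER-Modifier's embeds both, mirroring the blue arrows in Figure 1.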

Task Coupling

In the TAC-ADR task setting, Modifier concepts in drug labels are annotated as ground truth only if they are related to at least one ADR. Therefore, blindly recognizing all ADRs and Modifiers is prone to more false positives. The idea of task coupling is to perform modifier entity recognition and relation extraction in one shot. It is implemented as a typical sequence labeling task, where labels combining the modifier type with the relation type are predicted. What is special about this step is that each training example is specific to one target ADR, so only modifiers related to that ADR according to the ground truth receive entity labels, and all other tokens are labeled “O” (outside). Similar to Xu et al.29, we couple the modifier NER task with the RE task (“RE-Coupling” in Figure 1): the RE-Coupling module is trained to simultaneously extract mentions of modifier concepts associated with a given ADR entity and classify the relation type between the recognized modifier and that ADR. To this end, if there is more than one ADR in a sentence, we generate a separate training instance for each ADR entity in the sentence. Similar to the other tasks, the encoder of the RE-Coupling task takes the word embeddings and the encoder outputs provided by the other tasks in the hierarchy and feeds the encoded sequence (eRE-Coupling) into the CRF layer. We use traditional BIOUL (Begin, Inside, Outside, Unit, Last) tags combined with the modifier type and relation type as labels for modifier entities, so that modifier entities and their associated relations can be inferred simultaneously. For example, in “no vomiting”, the label for “no” would be “U-Negation_Negated”, indicating there is a “Negation_Negated” relation between “no” and the target ADR “vomiting”.
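The per-ADR instance generation described above can be sketched as follows (a hedged illustration: the data structures and the helper `build_coupling_instances` are our own assumptions, not the authors' code; only the label scheme comes from the paper):

```python
# Sketch of per-ADR training-instance generation for RE-Coupling.
# Each ADR in a sentence yields one sequence-labeling instance; only
# modifiers related to that target ADR receive combined
# "<BIOUL tag>-<ModifierType>_<RelationType>" labels, all else is "O".

def build_coupling_instances(tokens, adrs, relations):
    """adrs: list of (start, end) ADR spans; relations maps
    (adr_span, modifier_span) -> (modifier_type, relation_type)."""
    instances = []
    for adr in adrs:
        labels = ["O"] * len(tokens)
        for (a, (m_start, m_end)), (m_type, r_type) in relations.items():
            if a != adr:
                continue  # only modifiers of the target ADR get labels
            if m_end - m_start == 1:  # single-token modifier -> Unit tag
                labels[m_start] = f"U-{m_type}_{r_type}"
            else:  # multi-token modifier -> Begin, Inside..., Last
                labels[m_start] = f"B-{m_type}_{r_type}"
                for i in range(m_start + 1, m_end - 1):
                    labels[i] = f"I-{m_type}_{r_type}"
                labels[m_end - 1] = f"L-{m_type}_{r_type}"
        instances.append((adr, labels))
    return instances

# The paper's example: "no vomiting", target ADR = "vomiting".
tokens = ["no", "vomiting"]
instances = build_coupling_instances(
    tokens,
    adrs=[(1, 2)],                                   # "vomiting"
    relations={((1, 2), (0, 1)): ("Negation", "Negated")},
)
```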

New Label Encoding for Discontinuous Entities

Traditional BIOUL labels for NER can only handle entities that are continuous sequences of words. However, over 7% of entity mentions in the TAC-ADR data are discontinuous. To handle disjoint concepts, the BIOHD scheme {B, I, O, HB, HI, DB, DI} was proposed30, where HB/HI mark words shared by multiple concepts (head components) and DB/DI mark words that belong to discontinuous concepts but are not shared (discontinuous components). However, the decoding process under this scheme is challenging and ambiguous: it must determine whether or not to combine each pair of identified HB/HI and DB/DI components. In this study, we propose a new label encoding that handles discontinuous entities by adding a Head label (H) for shared head words, a Discontinuous label (D) for discontinuous components, and a Tail label (T) for shared tail words, leading to the encoding schema (B, I, O, U, L, H, T, D) combined with the entity types. Figure 2 shows an example of the new encoding, BIOUL-HTD. With the new encoding, the Head and Tail labels make it clear that the shared “changes” should not be connected with the discontinuous “a-”, whereas this is ambiguous under the previous BIOHD encoding.
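One plausible decoding rule under the new scheme can be sketched as follows (our own illustrative assumption, not the authors' decoder, handling only the shared-tail case for brevity): each maximal D run is combined with the shared tail marked by T labels to reconstruct a full mention, as in Figure 2's "a-wave amplitude decrease" / "b-wave amplitude decrease".

```python
# Illustrative decoder sketch for BIOUL-HTD (assumptions, not the
# authors' code): D runs mark unshared discontinuous components, and
# T runs mark tail words shared by several entities; each D component
# combines with the shared tail to form one full entity mention.

def runs(tags, prefix):
    """Return (start, end) spans of maximal runs whose tags share prefix."""
    spans, start = [], None
    for i, t in enumerate(tags + ["O"]):  # sentinel flushes a final run
        if t.startswith(prefix) and start is None:
            start = i
        elif not t.startswith(prefix) and start is not None:
            spans.append((start, i))
            start = None
    return spans

def decode_discontinuous(tokens, tags):
    tail_spans = runs(tags, "T-")
    shared_tail = tail_spans[0] if tail_spans else None
    entities = []
    for s, e in runs(tags, "D-"):
        words = tokens[s:e]
        if shared_tail:
            words = words + tokens[shared_tail[0]:shared_tail[1]]
        entities.append(" ".join(words))
    return entities

tokens = ["a-wave", "and", "b-wave", "amplitude", "decrease"]
tags = ["D-ADR", "O", "D-ADR", "T-ADR", "T-ADR"]
ents = decode_discontinuous(tokens, tags)
```

Because D and T carry distinct roles, the decoder never has to guess whether two runs belong together, which is exactly the ambiguity that BIOHD's HB/HI vs. DB/DI pairing suffers from.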

Figure 2. Example of a New Label Encoding. Entities in this example are: (a) electroretinographic changes; (b) ERG changes; (c) a-wave amplitude decrease; (d) b-wave amplitude decrease.

Experiments

Dataset and Evaluation Metrics

The training data from TAC-ADR 2017 consists of 101 manually annotated drug labels containing 15,722 mentions and 3,228 relations. The official test set contains 99 drug labels with 13,735 mentions and 2,039 relations annotated. The official evaluation script calculates the primary metric, the micro-averaged F1 score with exact match. Due to space limits, we refer readers to the TAC-ADR task overview5 for a more detailed data description.
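The primary metric can be re-implemented in a few lines (a minimal sketch of micro-averaged F1 over exact-match spans, not the official evaluation script; the tuple fields are illustrative):

```python
# Minimal sketch of micro-averaged F1 with exact-match spans.
# A prediction counts as a true positive only if its document, span
# boundaries, and type all match a gold annotation exactly.

def micro_f1(gold, pred):
    """gold/pred: sets of (doc_id, start, end, type) tuples."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("d1", 0, 2, "ADR"), ("d1", 5, 6, "Severity"), ("d2", 1, 3, "ADR")}
# One prediction has a wrong span end, so it counts as a false positive
# (and its gold mention as a false negative) under exact match.
pred = {("d1", 0, 2, "ADR"), ("d1", 5, 7, "Severity"), ("d2", 1, 3, "ADR")}
score = micro_f1(gold, pred)
```

"Micro-averaged" means counts are pooled over all types and documents before computing precision and recall, so frequent types such as ADR dominate the score.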

Overall Performance of NeuroADR

We implemented the HMTL model proposed by Sanh et al.21 on our tasks using their standard NER and RE setup. For a fair comparison, we use our new label encoding so that discontinuous entities can also be handled. We also separately trained the NER modules (NER-ADR-C, NER-ADR-D, and NER-Modifier) and the RE module (RE-Coupling) of the NeuroADR architecture to produce the pipeline results (sequentially performing NER and RE, with the output of the former as the input of the latter) shown in Table 1. NeuroADR outperforms both the Sanh et al. system21 and the pipeline system across all metrics, reaching the best F1 of 81.53% for NER and 38.58% for RE.

Table 1.

Overall performance of NeuroADR.

Model               NER Precision  NER Recall  NER F1   RE Precision  RE Recall  RE F1
(Sanh, 2018)        78.84          78.28       78.56    44.54         28.87      35.03
Pipeline Approach   81.48          80.06       80.77    35.41         36.75      36.07
NeuroADR            82.45          80.63       81.53    42.05         35.64      38.58

Ablation Study of NeuroADR Performance

To better understand the effects of the different adaptations proposed in NeuroADR, we conducted an ablation study, shown in Table 2. Starting from the full NeuroADR architecture, we made only one change at a time: (1) BERT vs. ELMo (row A): replacing BERT with ELMo31; (2) Domain Embedding vs. GloVe32 (row B): replacing the domain embedding with GloVe; (3) BIOUL-HTD vs. BIOHD (row C): replacing the proposed new encoding with the previous one30; (4) removing the auxiliary task NER-Modifier (row D); (5) merging NER-ADR-C and NER-Modifier into a single NER task (row E); (6) replacing RE-Coupling with the RE module of 21 (row F).

Table 2.

Ablation study of NeuroADR performance.

Model      NER Precision  NER Recall  NER F1   RE Precision  RE Recall  RE F1
NeuroADR   82.45          80.63       81.53    42.05         35.64      38.58
Row A      82.43          77.00       79.62    38.15         27.99      32.29
Row B      80.87          81.40       81.14    38.74         37.86      38.30
Row C      82.82          79.18       80.96    41.14         34.69      37.64
Row D      81.83          79.69       80.75    40.60         32.43      36.06
Row E      83.28          78.33       80.73    40.90         30.13      34.70
Row F      82.36          80.14       81.24    46.81         32.01      38.02

All the strategies exploited in NeuroADR contribute positively to system performance, especially adopting BERT (row A), task decomposition (rows D and E), and the new encoding (row C). Domain embedding (row B) and task coupling (row F) also help boost performance.

Performance of NeuroADR per NER and RE Type

The statistics and performance for each NER and RE type are shown in Table 3. In the training set, ADR mentions account for over 85% of all mentions, which led to much better performance (F1 of 84.17%) compared with the modifier mentions. The performance for Drug Class is quite low (F1 of 40.70%), partially due to the complexity of its mention expressions (e.g., capitalized abbreviations tend to be recognized as Drug Class). It is worth mentioning that the model achieved its highest precision of 86.76% on Animal mentions, with an F1 of 76.26%. As for RE, performance on the three relation types is relatively low, yielding F1 scores of 30%–44%. The Effect relation shows the worst performance even though a good amount of training examples is available. In addition to the challenges posed by language variation and ambiguity, there are some errors in the ground truth. For example, in the drug label of Adempas, the sentence “Adempas may cause fetal harm” appears three times, but in the ground truth the terms “may” and “fetal harm” are labeled only once, which adversely affects performance on both the NER and RE tasks.

Table 3.

Performance on the testing data per type with counts in both training and testing.

NER            Precision  Recall  F1     # in Testing  # in Training
ADR            84.52      83.83   84.17  12,693        13,795
Negation       72.93      56.64   63.76  173           98
Severity       65.14      57.45   61.05  947           934
Animal         86.76      68.02   76.26  86            44
Factor         69.73      70.26   69.99  562           602
Drug Class     50.93      33.89   40.70  164           249
Relation
Negated        45.41      33.03   38.24  288           163
Hypothetical   45.81      42.48   44.08  1,486         1,611
Effect         35.64      27.67   30.89  1,181         1,454

Discussion and Conclusion

We proposed a hierarchical multi-task learning system, NeuroADR, for extracting ADR information from drug labels. We found that task decomposition and the new encoding can improve performance in the multi-task setting. Our model’s NER performance is slightly lower than the best result in TAC-ADR 2017 (81.53% vs. 82.48%); however, our HMTL-based model is end-to-end trainable and does not rely on any external linguistic tools or handcrafted heuristic rules, as the best system29 does. The proposed model has great potential if the heuristic-based pre-processing can be integrated. As shown in Table 3, our model achieved the best state-of-the-art performance on recognizing Factor (F1 of 69.99% vs. 68%) and Negation (F1 of 63.76% vs. 62.3%) entities5. RE performance was low across all teams, and our model’s performance (38.58%) ranked third.

Error analysis shows that the Drug Class entity performed worst with an F1 score of 40.70%, followed by the Severity entity with an F1 score of 61.05%. Abbreviations introduced some confusion for the system when detecting Drug Class, e.g., "LABA" in "LABA, such as vilanterol, …" is a drug class but was missed by our model, while "AED" in "… was added to the current AED therapy" was incorrectly picked up as a Drug Class. We also found some annotation errors, e.g., the ground truth annotated "another 5-HT3 receptor antagonist alone" as Drug Class, while our model recognized only "5-HT3 receptor antagonist", which makes more sense. Among relations, the "Effect" relation type obtained the lowest F1 score of 30.89%, with most errors being false negatives, resulting in a low recall of 27.67%.

There are limitations to this study. First, we only evaluated our model on one dataset, and further validation is needed. However, the dataset we used is relatively complex, including a substantial number of discontinuous entities, and NeuroADR demonstrates competitive performance compared with state-of-the-art methods. Second, we made an empirical, ad-hoc design for the hierarchy structure that engages the different subtasks in an HMTL setting, which may have limited the potential mutual benefits of the multiple learning tasks. We will explore a more systematic approach, e.g., incorporating a strategy similar to fully adaptive feature sharing33, so that the decomposed subtasks can be learned dynamically.


References

  • 1. Crowe B, Chuang-Stein C, Lettis S, Brueckner A. Reporting Adverse Drug Reactions in Product Labels. Drug Inf J. 2016;50(4):455–463. doi:10.1177/2168479016628574.
  • 2. Szarfman A, Tonning JM, Doraiswamy PM. Pharmacovigilance in the 21st century: new systematic tools for an old problem. Pharmacotherapy. 2004;24(9):1099–1104. doi:10.1592/phco.24.13.1099.38090.
  • 3. Li Q, Deleger L, Lingren T, et al. Mining FDA drug labels for medical conditions. BMC Medical Informatics and Decision Making. 2013;13(1):53. doi:10.1186/1472-6947-13-53.
  • 4. Duggirala HJ, Tonning JM, Smith E, et al. Use of data mining at the Food and Drug Administration. J Am Med Inform Assoc. 2016;23(2):428–434. doi:10.1093/jamia/ocv063.
  • 5. Roberts K, Demner-Fushman D, Tonning JM. Overview of the TAC 2017 Adverse Reaction Extraction from Drug Labels Track. TAC. 2017.
  • 6. Young T, Hazarika D, Poria S, Cambria E. Recent Trends in Deep Learning Based Natural Language Processing. 2017. https://arxiv.org/abs/1708.02709v8
  • 7. Miranda DS. Automated Detection of Adverse Drug Reactions in the Biomedical Literature Using Convolutional Neural Networks and Biomedical Word Embeddings. arXiv:1804.09148. 2018. http://arxiv.org/abs/1804.09148
  • 8. Alimova I, Tutubalina E. Detecting Adverse Drug Reactions from Biomedical Texts with Neural Networks. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics; 2019. pp. 415–421. doi:10.18653/v1/P19-2058.
  • 9. Zhang Y, Yang Q. An overview of multi-task learning. Natl Sci Rev. 2018;5(1):30–43. doi:10.1093/nsr/nwx105.
  • 10. Gupta S, Gupta M, Varma V, Pawar S, Ramrakhiyani N, Palshikar GK. Co-training for Extraction of Adverse Drug Reaction Mentions from Tweets. arXiv:1802.05121. 2018. http://arxiv.org/abs/1802.05121
  • 11. Gupta S, Pawar S, Ramrakhiyani N, Palshikar GK, Varma V. Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction. BMC Bioinformatics. 2018;19(8):212. doi:10.1186/s12859-018-2192-4.
  • 12. Caruana R. Multitask Learning. Machine Learning. 1997;28(1):41–75. doi:10.1023/A:1007379606734.
  • 13. Deng L, Hinton G, Kingsbury B. New types of deep neural network learning for speech recognition and related applications: an overview. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013. pp. 8599–8603. doi:10.1109/ICASSP.2013.6639344.
  • 14. Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning. ACM; 2008. pp. 160–167. http://dl.acm.org/citation.cfm?id=1390177
  • 15. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497. 2015. http://arxiv.org/abs/1506.01497
  • 16. Chowdhury S, Zhang C, Yu PS. Multi-Task Pharmacovigilance Mining from Social Media Posts. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18. 2018. pp. 117–126. doi:10.1145/3178876.3186053.
  • 17. Gupta S, Gupta M, Varma V, Pawar S, Ramrakhiyani N, Palshikar GK. Multi-task Learning for Extraction of Adverse Drug Reaction Mentions from Tweets. In: Pasi G, Piwowarski B, Azzopardi L, Hanbury A, editors. Advances in Information Retrieval. Lecture Notes in Computer Science. Springer International Publishing; 2018. pp. 59–71.
  • 18. Søgaard A, Goldberg Y. Deep multi-task learning with low level tasks supervised at lower layers. 2016. pp. 231–235. doi:10.18653/v1/P16-2038.
  • 19. Hashimoto K, Xiong C, Tsuruoka Y, Socher R. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2017. pp. 1923–1933. doi:10.18653/v1/D17-1206.
  • 20. Sanabria R, Metze F. Hierarchical Multi Task Learning With CTC. 2018. https://arxiv.org/abs/1807.07104v5
  • 21. Sanh V, Wolf T, Ruder S. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. arXiv:1811.06031. 2018. http://arxiv.org/abs/1811.06031
  • 22. Rawat BPS, Li F, Yu H. Naranjo Question Answering Using End-to-End Multi-task Learning Model. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’19. ACM; 2019. pp. 2547–2555. doi:10.1145/3292500.3330770.
  • 23. Tran T, Kavuluru R, Kilicoglu H. A Multi-Task Learning Framework for Extracting Drugs and Their Interactions from Drug Labels. arXiv:1905.07464. 2019. http://arxiv.org/abs/1905.07464
  • 24. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805. 2018. http://arxiv.org/abs/1810.04805
  • 25. Jagannatha AN, Yu H. Bidirectional RNN for Medical Event Detection in Electronic Health Records. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics; 2016. pp. 473–482. http://www.aclweb.org/anthology/N16-1056
  • 26. Chiu J, Nichols E. Named Entity Recognition with Bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics. 2016;4:357–370.
  • 27. Liebel L, Körner M. Auxiliary Tasks in Multi-task Learning. arXiv:1805.06334. 2018. http://arxiv.org/abs/1805.06334
  • 28. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural Architectures for Named Entity Recognition. 2016. https://arxiv.org/abs/1603.01360v3
  • 29. Xu J, Lee H-J, Ji Z, Wang J, Wei Q, Xu H. UTH_CCB System for Adverse Drug Reaction Extraction from Drug Labels at TAC-ADR 2017. TAC. 2017.
  • 30. Dandala B, Mahajan D, Devarakonda MV. IBM Research System at TAC 2017: Adverse Drug Reactions Extraction from Drug Labels. TAC. 2017.
  • 31. Peters ME, Neumann M, Iyyer M, et al. Deep contextualized word representations. 2018. https://arxiv.org/abs/1802.05365v2
  • 32. Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014. pp. 1532–1543. http://www.aclweb.org/anthology/D14-1162
  • 33. Lu Y, Kumar A, Zhai S, Cheng Y, Javidi T, Feris R. Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:1131–1140. doi:10.1109/CVPR.2017.126.
