Abstract
Electronic Health Records contain a wealth of clinical information that can potentially be used for a variety of clinical tasks. Clinical narratives describe both the presence and the absence of medical conditions and clinical findings. It is essential to be able to distinguish between the two, since negated and non-negated events often have very different prognostic value. In this paper, we present a feature-enriched neural network-based model for negation scope detection in biomedical texts. The system achieves robust, high performance on two different types of text, scientific abstracts and radiology reports: it sets a new state-of-the-art result on the scientific abstracts part of the BioScope1 corpus without requiring gold cue information for the negation scope detection task, and achieves a competitive result on the radiology report corpus.
Introduction
Negation detection in clinical notes is crucial for any practical medical decision support that utilizes unstructured natural language data. Similarly, accurate negation detection in biomedical research publications can lead to improvement in a number of clinically relevant tasks: a short automatically generated summary of the negative versus positive results mentioned in the paper would allow fast randomized controlled trial (RCT) matching, better information indexing and more accurate paper summarization, reducing the time a medical professional spends to find information related to a given clinical case.
Traditionally, the negation detection task is divided into two subtasks: negation cue identification and negation scope detection, where a cue is defined as a word or a sequence of words that indicates the presence of a negation in the text, and a scope is a sequence of one or more words that are negated by a given cue2, 3. While detecting the presence of negation in domain-specific texts can often be done by simple keyword matching2, identifying the scope of the negation is a significantly more challenging problem. Consider the sentence: Patient had not eaten for the past two days, felt faint, and then collapsed. The sentence contains three medically relevant “events” that might be used as input to a clinical decision support system: eating behavior, feeling faint and loss of consciousness. Only one of them (normal eating behavior) is negated, while the other two constitute real observed symptoms that can be used as input for a number of downstream tasks.
The majority of state-of-the-art negation scope detection systems rely on the availability of the cue information at inference time, limiting the applicability of the system to non-cue annotated texts4–6.
In this work, we present a Long Short-Term Memory (LSTM)-based system13 that effectively utilizes syntactic information and does not require any human-derived cue annotation at inference time. In cases where gold cue annotation is available, the system can use it as an additional feature and achieves results on the BioScope corpus comparable to state-of-the-art methods.
The main contributions of this work can be summarized as follows:
We propose a syntactic feature-rich augmented hierarchical LSTM model for negation scope detection in biomedical texts that does not depend on prior negation cue identification.
We provide an extensive comparison with state-of-the-art approaches and demonstrate a significant boost in performance, specifically in recall, on biomedical abstracts.
We report on domain transfer experiments that tentatively support the partial transferability of the syntax-induced recall gain for negation scope detection on unlabeled medical data.
Related Work
Negation scope detection algorithms can be broadly divided into two categories: approaches that are mainly rule-based and utilize both surface-level rules (i.e., regular expressions defined for a set of surface input strings) and deeper syntactic/semantic rules, and statistical machine learning approaches that utilize surface-level features as input.
Rule-based approaches: Several existing works demonstrated the viability of purely rule-based approaches to negation scope detection, especially for domain-restricted texts: NegEx (Chapman et al.)2 and its extended version (Harkema et al.)7, which rely on a set of regular expressions triggered by a set of dictionary terms, and the negation classification system by Elkin et al.8. All these works reported high precision and recall on their respective test datasets.
Machine learning approaches: While rule-based approaches may achieve high performance on restricted datasets, they do not generalize well to other document types, as they may require customization to adapt to a new corpus (i.e., new keywords). This shortcoming led to the development of statistical machine learning-based systems that do not require active human expert participation to adapt to a new dataset. Earlier works exemplifying this statistical approach to negation scope detection include the ensemble method of Morante et al.9, which combines a Support Vector Machine (SVM) and a Conditional Random Field (CRF) to predict the starting and ending tokens of the negation, a hybrid approach by Zhu et al.10 and CRF-based models by Agarwal and Yu11 and Councill et al.12.
More recently, several researchers adopted neural network-based approaches, including the Convolutional Neural Network (CNN)-based scope detection model by Qian et al.6 and the cue-embedding LSTM model by Fancellu et al.4.
Data
This work uses data from a publicly available corpus referred to as BioScope1. The corpus consists of three different types of text: biological paper abstracts from the Genia corpus (1,273 abstracts in total), radiology reports from Cincinnati Children’s Hospital Medical Center (1,954 reports in total) and full scientific articles in the bioinformatics domain (nine articles in total). Each negated/speculative sequence of tokens in the corpus is annotated with a unique id and contains a negation/speculation cue: a token or a set of tokens that indicates the presence of negated or speculative information in the text. Uncertainty annotations were not used in the current experimental setup. Every token is given one of three labels: in scope, outside scope or negation cue. Following the labeling schema used for the BioScope dataset, the negation cue token is also included as part of the negation scope.
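The three-label token scheme described above can be sketched as follows. This is an illustrative helper, not the BioScope XML format itself; the span representation and function name are our own assumptions:

```python
# Hypothetical sketch: assigning the three BioScope-style labels to tokens.
# scope_span / cue_span are (start, end) token-index ranges, end-exclusive.
# Following the BioScope schema, cue tokens are also part of the scope.

def label_tokens(tokens, scope_span, cue_span):
    labels = []
    for i, _ in enumerate(tokens):
        if cue_span[0] <= i < cue_span[1]:
            labels.append("CUE")   # negation cue (also inside the scope)
        elif scope_span[0] <= i < scope_span[1]:
            labels.append("IN")    # inside the negation scope
        else:
            labels.append("OUT")   # outside any negation scope
    return labels

tokens = ["Patient", "had", "not", "eaten", "for", "two", "days", "."]
# cue = "not" (index 2); scope = "not eaten" (indices 2-4, end-exclusive)
print(label_tokens(tokens, scope_span=(2, 4), cue_span=(2, 3)))
```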
In this work, we considered two parts of the BioScope corpus: abstracts and clinical sentences. We perform in-domain training and testing for each of them to highlight the differences between the distributional characteristics of the domain-specific texts. As shown in Table 1, corpus statistics differ significantly depending on the source of the text, with the abstracts corpus exhibiting longer sentences with a more complex syntactic structure and longer negation scopes.
Table 1:
Corpus statistics for different types of documents
| | Clinical Texts | Bio. Abstracts |
|---|---|---|
| Percentage of sentences containing negation | 13.55% | 13.45% |
| Percentage of sentences < 10 tokens | 75.85% | 3.17% |
| Percentage of sentences 11-30 tokens | 23.99% | 66.42% |
| Percentage of sentences > 30 tokens | 0.16% | 30.39% |
| Average length of a sentence | 7.73 | 26.43 |
| Average length of a scope | 4.98 | 9.43 |
Methods
We set up the task at hand as token-level binary classification: for each token in the input, we predict whether it is within the scope of a negation or not. Figure 1 shows an overview of the proposed system for negation scope detection. Our final model is composed of the following high-level layers:
A word embedding layer that takes a token as an input and retrieves a corresponding continuous fixed-dimensional vector representation of the token.
A Character level word embedding layer that collects all embedded characters of a given token, passes them through a bi-directional LSTM and outputs a character-based token representation by concatenating the forward and backward outputs of the layer for a given token as shown in Figure 2.
A feature embedding layer that maps the pre-computed syntactic features of each token into a continuous vector space.
A final bi-directional LSTM representation layer that takes a concatenation of pre-trained distributional embeddings, character-based word embeddings, and syntactic features embeddings and outputs the final context dependent continuous representation of a token.
A final representation layer optimized for label prediction. We pass the flattened LSTM-produced representation through a fully-connected layer followed by a softmax layer. This layer produces the probability of each token in the sentence belonging to each class, which is later used to make the classification decision.
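As a rough illustration, the layered architecture above might be sketched in PyTorch as follows. All module names are our own, and this is an exposition sketch rather than the authors' released code; the dimensions follow the hyper-parameters listed later in the Experimental Setup section:

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Character-level bi-LSTM producing a character-based word representation."""
    def __init__(self, n_chars, char_dim=20, char_lstm_dim=25):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, char_lstm_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, char_ids):                 # (n_words, max_chars)
        _, (h, _) = self.lstm(self.embed(char_ids))
        # concatenate final forward and backward hidden states per word
        return torch.cat([h[0], h[1]], dim=-1)   # (n_words, 2 * char_lstm_dim)

class ScopeTagger(nn.Module):
    """Word + char + syntactic-feature embeddings -> bi-LSTM -> softmax."""
    def __init__(self, n_words, n_chars, n_feats, word_dim=100,
                 feat_dim=10, lstm_dim=100, n_classes=2):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, word_dim)  # pre-trained in the paper
        self.char_enc = CharWordEncoder(n_chars)
        self.feat_embed = nn.Embedding(n_feats, feat_dim)  # one table per feature in practice
        in_dim = word_dim + 2 * 25 + feat_dim
        self.lstm = nn.LSTM(in_dim, lstm_dim // 2,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(lstm_dim, n_classes)

    def forward(self, word_ids, char_ids, feat_ids):
        x = torch.cat([self.word_embed(word_ids),
                       self.char_enc(char_ids),
                       self.feat_embed(feat_ids)], dim=-1)
        h, _ = self.lstm(x.unsqueeze(0))                   # add a batch dimension
        return torch.softmax(self.out(h.squeeze(0)), dim=-1)  # per-token class probs
```

A sentence of n tokens yields an (n, 2) matrix of in-scope/out-of-scope probabilities.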
Figure 1:

An overview of the proposed framework for the negation scope detection.
Figure 2:

Character-based word-embedding representation generation using a bi-directional LSTM.
The main building block of the constructed model is the LSTM. Given a sequence of vectors x1, x2, · · · xn, at each time step t1, t2, · · · tn, an LSTM takes inputs xt, ht−1, ct−1 and produces two outputs: (i) a hidden state ht and (ii) a memory cell ct, by applying the following set of transformations to the inputs:

it = σ(Wi · [ht−1, xt] + bi)
c̃t = tanh(Wc · [ht−1, xt] + bc)
ct = (1 − it) ⊙ ct−1 + it ⊙ c̃t
ot = σ(Wo · [ht−1, xt] + bo)
ht = ot ⊙ tanh(ct)

where Wi, Wc, Wo are trainable weight matrices and bi, bc, bo are bias vectors, the symbols σ(·) and tanh(·) represent the element-wise sigmoid and hyperbolic tangent, respectively, and ⊙ is the Hadamard product of two vectors.
Following Schuster and Paliwal13, we compute representations for both the forward and backward context of the token, concatenating the two representations into a single final one.
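A single LSTM step as described above can be written out numerically. The sketch below assumes the coupled input-forget formulation implied by the three named weight matrices; dimensions are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, W_c, W_o, b_i, b_c, b_o):
    """One LSTM time step with a coupled input-forget gate."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    i_t = sigmoid(W_i @ z + b_i)                # input gate
    c_tilde = np.tanh(W_c @ z + b_c)            # candidate memory
    c_t = (1.0 - i_t) * c_prev + i_t * c_tilde  # Hadamard products
    o_t = sigmoid(W_o @ z + b_o)                # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t
```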
Given that LSTM models tend to gravitate towards learning simple surface features as opposed to deep syntactic structure in the absence of direct supervision14, we provide explicit syntax-derived information to the system right before computing the final LSTM-based token representation. We compute the following syntactic features for every token using the spaCy toolkit.1
Part-Of-Speech (POS) of a given token as defined by Penn Treebank tagging scheme;16
Depth of the token in the syntactic dependency tree;
A string of the Google Universal POS tags17 of the three direct ancestors of the token in the dependency tree: this feature is meant to capture local constituent-like information about the token;
Type of dependency between the token and its parent, representing limited dependency tree information about the token.
Consider the following sentence: 10-years-old female with cough. After parsing, the dependency tree of this sentence is derived as shown in Figure 3. The token 10-years is marked as a cardinal number (POS = CD); the token is two ancestors away from the root node (DEPTH = 2); the immediate ancestors of the token are an adjective and a noun (PATH = ADJ − NOUN); and the dependency between the tokens 10-years and old is of the adjectival modifier type (DEP = AMOD).
Figure 3:

Syntactic feature construction from the computed dependency tree.
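The four per-token features for the worked example can be reproduced from a hand-coded parse. This is a hypothetical sketch over a manually written head/tag table (in the actual system the parse comes from spaCy):

```python
# Hand-coded dependency parse of "10-years-old female with cough".
# Each token records: (text, head index, Penn POS, universal POS, dep label).
tokens = [
    ("10-years", 1, "CD", "NUM",  "amod"),   # modifies "old"
    ("old",      2, "JJ", "ADJ",  "amod"),   # modifies "female"
    ("female",   2, "NN", "NOUN", "ROOT"),   # root (head = itself)
    ("with",     2, "IN", "ADP",  "prep"),
    ("cough",    3, "NN", "NOUN", "pobj"),
]

def ancestors(i):
    """Indices of all ancestors of token i, nearest first."""
    out = []
    while tokens[i][1] != i:      # the root is its own head
        i = tokens[i][1]
        out.append(i)
    return out

def features(i):
    anc = ancestors(i)
    return {
        "POS": tokens[i][2],                              # Penn Treebank tag
        "DEPTH": len(anc),                                # distance to root
        "PATH": "-".join(tokens[j][3] for j in anc[:3]),  # universal POS of up to 3 ancestors
        "DEP": tokens[i][4],                              # dependency to parent
    }

print(features(0))   # token "10-years": CD, depth 2, ADJ-NOUN, amod
```

The output for token 0 matches the worked example in the text: POS = CD, DEPTH = 2, PATH = ADJ-NOUN, DEP = amod.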
To get a continuous representation of the features, we embed each of them into a 10-dimensional continuous vector space. Unlike the syntactic path features discussed in Qian et al.6, the features we provide do not depend on the presence of gold negation cue information in the original dataset and can be computed for unannotated texts, allowing the system to run without requiring any human interaction.
Experimental Setup
To evaluate the importance of syntactic features for negation scope identification, we run our system both with syntactic feature embeddings and as a pure character-word hierarchical LSTM. We also run our system with the additional gold-cue feature provided at both training and testing time, to compare our system with other neural network-based models and to test whether the gold negation cue information confers the same information about the boundaries of the negation scope as local syntactic information. The proposed model was implemented in Python using the PyTorch framework.15
Following the arguments of Reimers and Gurevych18, we refrain from using a single train/test split for evaluation and instead perform 10-fold cross-validation training/prediction experiments with three random seeds to obtain a more reliable estimate of the model’s performance, reporting the mean and standard deviation across all folds for each dataset type. As we do not perform any hyper-parameter tuning or model selection based on the validation fold results, our scores can be considered a fair proxy for performance on unseen data. We run our model with the following hyper-parameter settings.
Character embedding dimension: 20
Pretrained word embedding dimension: 100
Pretrained word embedding type: GloVe19
Character-based word embedding LSTM dimension: 25
Syntactic features embeddings dimension: 10 for each feature
Final LSTM embedding dimensionality: 100
Learning rate: 0.1
Number of Epochs: 40
The current hyper-parameter setting is adopted from prior works20,21. Judging by our results, this choice of parameters also yields competitive performance on the negation scope detection task in the current study.
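The evaluation protocol above (10-fold cross-validation repeated with three seeds, reporting mean and standard deviation) can be sketched as follows; `train_and_score` is a hypothetical stand-in for training the tagger on nine folds and scoring F1 on the held-out fold:

```python
import random
import statistics

def cross_validate(sentences, train_and_score, k=10, seeds=(0, 1, 2)):
    """Run k-fold CV once per seed; return mean and std of all fold scores."""
    scores = []
    for seed in seeds:
        rng = random.Random(seed)
        order = list(range(len(sentences)))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]   # k disjoint folds
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            scores.append(train_and_score(train, held_out))
    return statistics.mean(scores), statistics.stdev(scores)
```

With k=10 and three seeds this yields 30 scores per dataset, from which the mean and standard deviation in Tables 2-4 would be computed.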
Due to class imbalance (the number of negated tokens is much smaller than the number of non-negated tokens), we conduct the evaluation per token on the negative (in-scope) class.
Results
The performance of negation scope detection is measured in terms of precision, recall and F1-score. A true positive is counted when both the predicted label and the ground truth label for a token are in scope. A prediction is counted as a false positive or a false negative if the predicted label for a given token is in scope while the ground truth label is outside scope, or vice versa, respectively.
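The token-level metric described above can be made concrete with a short sketch; the label strings are illustrative:

```python
# Token-level precision/recall/F1 on the in-scope (negated) class only.
def token_prf(gold, pred):
    tp = sum(g == p == "IN" for g, p in zip(gold, pred))
    fp = sum(g == "OUT" and p == "IN" for g, p in zip(gold, pred))
    fn = sum(g == "IN" and p == "OUT" for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = ["OUT", "IN", "IN", "IN", "OUT"]
pred = ["OUT", "IN", "IN", "OUT", "IN"]
print(token_prf(gold, pred))   # one FP and one FN: P = R = F1 = 2/3
```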
Tables 2 and 3 summarize the performance of our system versus the performance of previously proposed methods for the negation scope detection task on the BioScope corpus, without and with access to the gold negation cue, respectively. Results from the prior works in the tables are gathered from the corresponding publications. Among all prior works, we were only able to re-run the evaluation using the same training/testing breakdown as used for our model for the work by Fancellu et al.5, as the authors provided access to their code.
Table 2:
Performance comparison of the proposed work versus state-of-the-art systems on the BioScope corpus as reported in the literature. None of the systems had access to “gold negation cue” inputs during training and testing. Unk. split: unknown split; CV: cross-validation; w/o: without.
| Model | Test. Schema | Clinical Text Precision | Clinical Text Recall | Clinical Text F1 | Bio. Abstract Precision | Bio. Abstract Recall | Bio. Abstract F1 |
|---|---|---|---|---|---|---|---|
| NegEx2 | Unk. split | 90.8 ± 5.4 | 85.9 ± 6.9 | 88.0 ± 4.0 | 63.8 ± 4.7 | 72.7 ± 3.8 | 67.8 ± 3.6 |
| Morante (2009)9 | 10-fold CV | 86.4 | 82.1 | 84.2 | 81.7 | 83.5 | 82.6 |
| Zhu (2010)10 | 10-fold CV | 82.2 | 80.6 | 81.4 | 78.2 | 78.8 | 78.5 |
| Agarwal (2010)11 | Unk. split. | 94.7 ± 4.2 | 95.0 ± 3.2 | 94.8 ± 3.2 | 84.7 ± 3.5 | 84.1 ± 3.8 | 84.3 ± 3.2 |
| Ballesteros (2012)22 | Unk. split | 95.8 | 90.6 | 93.1 | 73.5 | 80.7 | 84.8 |
| Our model (w/o synt. features) | 10-fold CV2 | 94.4 ± 4.7 | 93.7 ± 4.5 | 94.0 ± 2.5 | 85.9 ± 5.2 | 84.1 ± 4.2 | 85.1 ± 3.3 |
| (95% CI) | | [89.9 − 97.5] | [92.5 − 100] | [92.6 − 96.2] | [80.7 − 90.5] | [81.3 − 89.0] | [82.3 − 86.0] |
| Our model (with synt. features) | 10-fold CV2 | 95.1 ± 4.4 | 93.8 ± 6.0 | 94.4 ± 3.1 | 85.4 ± 6.7 | 90.8 ± 5.0 | 87.9 ± 4.5 |
| (95% CI) | | [92.0 − 98.4] | [90.1 − 99.7] | [92.7 − 95.8] | [80.1 − 91.5] | [85.8 − 93.6] | [85.6 − 92.5] |
Table 3:
Performance comparison of the proposed work versus other state-of-the-art work on the BioScope corpus as reported in the literature. All frameworks had access to “gold negation cue” inputs during both training and testing. Unk. split: unknown split; CV: cross-validation; w/o: without.
| Model | Test. Schema | Clinical Text Precision | Clinical Text Recall | Clinical Text F1 | Bio. Abstract Precision | Bio. Abstract Recall | Bio. Abstract F1 |
|---|---|---|---|---|---|---|---|
| Morante (2009)9 | 10-fold CV | 91.6 | 92.5 | 92.5 | 90.7 | 90.7 | 90.7 |
| Fancellu (2016)5 | Unk. split | unk | unk | 97.7 | unk | unk | 91.3 |
| Qian (2016)6 | 10-fold CV | 92.0 | 97.0 | 94.4 | 89.5 | 90.5 | 89.9 |
| Fancellu (2017)4 | Unk. split | unk | unk | 97.9 | unk | unk | 92.1 |
| Fancellu (2016)5 re-run | 10-fold CV2 | 97.3 ± 2.8 | 96.7 ± 3.2 | 97.2 ± 2.0 | 90.6 ± 3.2 | 83.7 ± 1.0 | 87.6 ± 2.3 |
| Our Model (w/o synt. features) | 10-fold CV2 | 97.1 ± 3.9 | 96.1 ± 3.4 | 96.8 ± 2.3 | 90.8 ± 4.4 | 90.1 ± 2.5 | 90.4 ± 2.3 |
| (95% CI) | | [94 − 100] | [93.0 − 97.0] | [95.7 − 98.1] | [85.7 − 92.7] | [88.6 − 92.2] | [89.4 − 92.4] |
| Our Model (with synt. features) | 10-fold CV2 | 96.8 ± 3.8 | 96.8 ± 3.7 | 96.8 ± 2.1 | 91.3 ± 4.7 | 93.9 ± 3.6 | 92.6 ± 3.4 |
| (95% CI) | | [94.3 − 100] | [93.0 − 100] | [94.6 − 98.0] | [86.7 − 93.8] | [90.0 − 93.8] | [89.9 − 95.5] |
As can be observed from the tables, our model achieves competitive performance on both corpora of the BioScope dataset in terms of F1-score, with the best performance on the pure scope detection task (no gold negation cue information available) for the biological abstracts data.
Discussion
Considering the results reported in Tables 2 and 3, it is interesting to note that providing syntactic information resulted in consistently improved performance on the biological abstracts dataset but not on the clinical texts set. This is consistent with the previously reported “high complexity” of the biological abstracts dataset: abstract sentences tend to be longer and contain subclauses, while medical note sentences are, as a rule, short statements. While low-level morphological features are generally captured by the learned character embeddings23, final context-aware LSTM-based word representations do not appear to learn syntax and mostly rely on combinatorial skip-gram-like context, unless directly supervised for that particular objective14.
Furthermore, judging by the independent cumulative model performance, the improvements achieved by the inclusion of the gold negation cue information and the syntactic information seem to represent different types of additional information, with gold negation cue information helping to detect the presence of the negation and syntactic information providing additional clues for the end-of-scope decision.
In order to facilitate future evaluations and comparisons, we make the document split information publicly available: unfortunately, the majority of previously published results are reported on unknown test splits, which, considering the small dataset size, might not reflect model performance adequately (in our cross-validation experiments, “easy” and “hard” test split results differ by about 2% to 3% in terms of F1-score, which constitutes the difference between state-of-the-art and average model performance on the current dataset). We would also like to encourage the community to report the tokenization algorithm and pre-processing pipeline explicitly, since the tokenization scheme can influence model behavior significantly, as exemplified by our re-run of Fancellu (2016) using our tokenization algorithm.
The majority of high-performing negation scope systems rely on access to the cue phrase in the input text: gold negation cue information results in 3 to 5 percent higher F1-scores on average. Automatic cue phrase identification is considered an “easy task,” with most systems achieving an F1-score of 95% or higher for cue detection on the scientific papers dataset9, 11 and 97% or higher on the clinical notes dataset9, 11. We investigated whether providing automatically detected cues of very high accuracy, instead of access to manually curated gold negation cues, affects negation scope detection performance. We developed a cue detection framework using a hierarchical LSTM model. Table 4 summarizes the performance of negation scope detection in three different scenarios: without access to the gold negation cue, with access to the automatically detected negation cue and, finally, with access to the gold negation cue.
Table 4:
Comparing the impact of automatic versus manual cue identification on negation scope detection.
| Model | Clinical Text Precision | Clinical Text Recall | Clinical Text F1 | Bio. Abstract Precision | Bio. Abstract Recall | Bio. Abstract F1 |
|---|---|---|---|---|---|---|
| No Negation Cues | 95.1 ± 4.4 | 93.8 ± 6.0 | 94.4 ± 3.1 | 85.4 ± 6.7 | 90.8 ± 5.0 | 87.9 ± 4.5 |
| With Negation Cue Prediction | 93.5 ± 4.5 | 94.7 ± 7.3 | 94.0 ± 3.9 | 85.6 ± 6.1 | 89.2 ± 5.0 | 87.7 ± 4.4 |
| With Gold Negation Cues | 96.8 ± 3.8 | 96.8 ± 3.7 | 96.8 ± 2.1 | 91.3 ± 4.7 | 93.9 ± 3.6 | 92.6 ± 3.4 |
As can be observed from the table, there is no significant difference in negation scope detection performance between the system with no access to the gold negation cue and the system with access to the automatically detected negation cue. This suggests that the negation scope detection system already learns cue information to some extent, at least as well as a cue prediction algorithm. On the other hand, providing the gold negation cue information results in a significant improvement in both recall and precision. As expected, with the gold negation cue included, the model has an “easier” job identifying negated sentences and makes fewer mistakes in determining scopes.
In order to evaluate the generalizability of the proposed framework, we conducted an additional series of experiments in which we trained our model on the BioScope corpus (including both clinical texts and biological abstracts) and tested on an independent test set of medical texts extracted from the publicly available dataset included with the NegEx work2. This corpus was manually annotated for negation cues and scopes. The NegEx corpus has a different distribution (different hospital, general clinical notes) than the training set. We considered the following training scenarios:
Train without syntactic features (Baseline) referred to as NO FEAT;
Train with syntactic features referred to as ALL FEAT;
Train with syntactic features and the gold negation cue referred to as ALL FEAT + Gold Cues.
Consider the results reported in the “40 epochs” column of Table 5. The ALL FEAT + Gold Cues model still demonstrates high recall; however, precision dropped significantly compared to the results reported on the BioScope data, resulting in a significantly lower F1-score. Furthermore, comparing the results of ALL FEAT and ALL FEAT + Gold Cues demonstrates a significant boost in recall due to the inclusion of the gold negation cue as an additional feature; however, precision does not seem to be affected in the same way. This is inconsistent with our observation from Table 4. After reviewing the false positives and false negatives, it became clear that a significant portion of the errors is due to sentence parsing errors. Here is an example:
Table 5:
Performance of the negation scope detection on an independent negation dataset: a subset of the NegEx corpus. NO FEAT: without syntactic features; ALL FEAT: with syntactic features
| Model | Best Perf. Precision | Best Perf. Recall | Best Perf. F1 | 40 Epochs Precision | 40 Epochs Recall | 40 Epochs F1 |
|---|---|---|---|---|---|---|
| NO FEAT | 74.6% | 83.1% | 78.6% | 63.8% | 76.3% | 69.5% |
| ALL FEAT | 73.7% | 89.0% | 80.7% | 64.7% | 85.8% | 73.8% |
| ALL FEAT + Gold Cues | 74.9% | 93.6% | 83.0% | 60.9% | 92.1% | 73.4% |
<<PROCEDUREIMAGES>> COMPLICATIONS: None POSTOPERATIVE DIAGNOSIS(ES): 1) INTERNAL HEMORRHOIDS.
In this string, “None POSTOPERATIVE DIAGNOSIS(ES)” was parsed as one sentence due to the lack of sentence delimiters such as new lines. As a result, the negation cue, None, was considered to scope over the following phrase. The majority of errors that do not depend on pre-processing are, as expected, produced by the system’s inability to find the end of a given scope: the training data is much cleaner and, as noted by Fancellu et al.4, all systems learn to rely on proper punctuation for scope closure at least to some extent. As such, we focus on completely or almost completely “missed” scopes to test our hypothesis on the cue-discovery nature of the recall gain. For the sake of comparison, we also include the performance of the algorithm proposed by Fancellu (2016), similarly trained on the BioScope corpus and tested on the NegEx corpus. Despite its high recall, this algorithm suffers from very low precision.
In Table 5, we report two training scenarios. The first is a typical scenario in which training is performed for a fixed number of epochs (here, 40 epochs). The 40-epochs scenario represents a model that is effectively overfitting on the training data and, as such, is expected to generalize worse to an independent test set with a different text distribution (target dataset). The second is the best-performing model, determined by selecting the model among the 40 epochs that yields the best result on the test set (in our case, the best performance was achieved at the fourth epoch). We refer to the first scenario as “train till convergence” and to the second as “know when to stop”. Results are summarized in Table 5. Considering the ALL FEAT + Gold Cues model, an interesting observation from the table is that there seems to be no gain in recall between the “know when to stop” and “train till convergence” scenarios; however, overfitting to the training corpus resulted in significantly lower precision.
Unknown negation cues remain challenging to detect for all three systems in Table 5; however, the ALL FEAT model appears to be better at unknown cue prediction for more regular negation patterns. Consider the following set of examples (the negation cue is shown in bold font and the negation scope in italic font):
She denies any ear pain, sore throat, hemoptysis, shortness-of-breath, dyspnea on exertion, chest discomfort anorexia, nausea, weight-loss, mass, adenopathy or pain. 2325-2328-examplesFromLiterature
He denies any tobaco, alchohol, or drug abuse although he occasionally drinks beer.2549-2551-examplesFromNegex08
He denies any orthopnea, lower extremity edema, or calf pain 1430-1432-examplesFromNegex11
The token “denies” has a very low occurrence count in the training dataset (the BioScope corpus). As a result, the NO FEAT model failed to detect the negation entirely, while ALL FEAT was able both to detect the negation and to label the scope boundaries correctly in all three examples given above. ALL FEAT appears to rely on the syntactic patterns of the sentence as well as implicitly predicted cue information when determining the scope of the negation; as such, it is able to “fall back” to syntax-enhanced scope discovery when the gold cue information is unavailable, unlike systems that do not have syntactic information available.
Conclusions
In this work, we present a gold-negation-cue-independent hierarchical LSTM-based model that utilizes local syntactic features for negation scope detection in biomedical texts. Our comparison with state-of-the-art approaches utilizing the gold negation cue demonstrated promising performance on the publicly available BioScope corpus.
Syntactic information appears to facilitate the cue discovery process, which boosts the model’s recall and results in a general improvement on datasets that do not provide explicitly annotated gold negation cue information. This improvement appears to hold in naive domain transfer settings, though more experiments are required.
References
- 1.Vincze V, Szarvas G, Farkas R, Móra G, Csirik J. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics. 2008 Dec;9(11):S9.
- 2.Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proceedings of the AMIA Symposium; American Medical Informatics Association; 2001. p. 105.
- 3.Morante R, Blanco E. *SEM 2012 shared task: Resolving the scope and focus of negation. Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM); Association for Computational Linguistics; 2012. pp. 265–274.
- 4.Fancellu F, Lopez A, Webber B, He H. Detecting negation scope is easy, except when it isn’t. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers; 2017. pp. 58–63.
- 5.Fancellu F, Lopez A, Webber B. Neural networks for negation scope detection. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2016. pp. 495–504.
- 6.Qian Z, Li P, Zhu Q, Zhou G, Luo Z, Luo W. Speculation and negation scope detection via convolutional neural networks. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016. pp. 815–825.
- 7.Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. Journal of Biomedical Informatics. 2009 Oct;42(5):839–51.
- 8.Elkin PL, Brown SH, Bauer BA, Husser CS, Carruth W, Bergstrom LR, Wahner-Roedler DL. A controlled trial of automated classification of negation from clinical notes. BMC Medical Informatics and Decision Making. 2005 Dec;5(1):13.
- 9.Morante R, Daelemans W. A metalearning approach to processing the scope of negation. Proceedings of the Thirteenth Conference on Computational Natural Language Learning; Association for Computational Linguistics; 2009. pp. 21–29.
- 10.Zhu Q, Li J, Wang H, Zhou G. A unified framework for scope learning via simplified shallow semantic parsing. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics; 2010. pp. 714–724.
- 11.Agarwal S, Yu H. Biomedical negation scope detection with conditional random fields. Journal of the American Medical Informatics Association. 2010 Nov;17(6):696–701.
- 12.Councill IG, McDonald R, Velikovich L. What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. Proceedings of the Workshop on Negation and Speculation in Natural Language Processing; Association for Computational Linguistics; 2010. pp. 51–59.
- 13.Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing. 1997 Nov;45(11):2673–81.
- 14.Linzen T, Dupoux E, Goldberg Y. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. arXiv preprint arXiv:1611.01368. 2016.
- 15.Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. 2017.
- 16.Marcus MP, Marcinkiewicz MA, Santorini B. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics. 1993 Jun;19(2):313–30.
- 17.Petrov S, Das D, McDonald R. A universal part-of-speech tagset. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012); 2012.
- 18.Reimers N, Gurevych I. Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. arXiv preprint arXiv:1707.09861. 2017.
- 19.Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. pp. 1532–1543.
- 20.Boag W, Sergeeva E, Kulshreshtha S, Szolovits P, Rumshisky A, Naumann T. CliNER 2.0: Accessible and accurate clinical concept extraction. arXiv preprint arXiv:1803.02245. 2018.
- 21.Dernoncourt F, Lee JY, Uzuner O, Szolovits P. De-identification of patient notes with recurrent neural networks. Journal of the American Medical Informatics Association. 2017 May;24(3):596–606.
- 22.Ballesteros M, Díaz A, Francisco V, Gervás P, De Albornoz JC, Plaza L. UCM-2: a rule-based approach to infer the scope of negation via dependency parsing. Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM); Association for Computational Linguistics; 2012. pp. 288–293.
- 23.Gaddy D, Stern M, Klein D. What’s going on in neural constituency parsers? An analysis. arXiv preprint arXiv:1804.07853. 2018.
