Abstract
The identification of relationships between clinical concepts in patient records is a preliminary step for many important applications in medical informatics, ranging from quality of care to hypothesis generation. In this work, we describe an approach that facilitates the automatic recognition of relationships defined between two different concepts in text. Unlike the traditional bag-of-words representation, a relationship is represented here with a scheme of five distinct context-blocks based on the position of the concepts in the text. This scheme was applied to eight different relationships, between medical problems, treatments and tests, on a set of 349 patient records from the 4th i2b2 challenge. Results show that the context-block representation was very successful (F-measure = 0.775) compared to the bag-of-words model (F-measure = 0.402). The advantage of this representation scheme was the correct management of word position information, which may be critical in identifying certain relationships.
Keywords: relationship representation, patient records, relationship identification, clinical text, medical concepts
I. Introduction
The era of electronic health records brings the need for automatic recognition of clinical concepts and of the relationships that tie them together. Patient records contain comprehensive accounts of the patients’ visits to the hospital. Such information can be invaluable for pharmaco-vigilance, detection of adverse effects, comparative effectiveness studies, etc.
The first step in the automatic processing of a clinical document is the recognition of text phrases which refer to clinically relevant concepts such as: medical problems, treatments and tests. Medical problems are observations about the patient’s clinical health. Treatments include procedures, or medications administered to patients. Tests include lab procedures or measurements prescribed to patients. The second step is the identification of relationships among the recognized concepts. A relationship between two clinical concepts identifies how problems relate to treatments, tests and other medical problems in text. Figure 1 shows a diagram of the possible relationships, adapted from the 4th i2b2 challenge. Table 1 gives examples for each relationship. In this study, we focus on the identification of relationships as defined in the 4th i2b2 challenge (https://www.i2b2.org/NLP/Relations).
Figure 1.

The diagram of clinical relationships. Concepts appear in blue boxes, while the relationships between them appear in colored diamonds.
TABLE I.
EXAMPLES OF RELATIONSHIPS BETWEEN MEDICAL CONCEPTS IN PATIENT RECORDS
Medical problems are shown in red, tests are shown in green and treatments are shown in orange.
Although there exists a significant body of literature on relationship extraction in biomedicine, it mostly addresses the extraction of relationships between biological entities (e.g. protein-protein interaction) in the scientific literature [1–6]. Fewer studies address the extraction of relationships between diseases, symptoms, and medications in patient records. The proposed methods are typically based on co-occurrence statistics, semantic interpretation, or machine learning. Chen and colleagues [7] proposed an automatic method for extracting disease-drug pairs by applying an existing Natural Language Processing (NLP) system (MedLEE) to identify associations. More recently, similar methods were applied to the association between symptoms and diseases [8] and to distinguishing drug adverse effects from disease-related symptoms [9].
Although co-occurrence based methods can identify the existence of relationships between clinical entities, they are unable to further characterize specific relations. Another significant research effort addressing the extraction of relationships between biomedical entities has resulted in the development of the semantic representation program SemRep, which exploits linguistic analysis of biomedical text and domain knowledge in the Unified Medical Language System (UMLS) [10]. This tool achieved competitive performance in [11] for extracting drug-disease treatment relationships from biomedical text.
We address the relationship identification task with machine learning techniques, which have been shown effective in mining patient smoking and medication status [12–13] from unstructured patient records. Unlike previous studies that focused on a single entity, our task involved relationship identification between two different kinds of entities.
In this study, we developed a context-based scheme of representing relationships. We defined the relationship between two concepts as a structure of five context-blocks (see Fig. 2). We characterized each of these five different blocks, and built a machine learning model that makes an informed decision based on the present characteristics. We achieved an F-measure of 0.768 on average among all eight different kinds of relationships that we studied.
Figure 2.

Relationship representation between two concepts as five context-blocks: introductory block—the set of words from the beginning of the sentence to the occurrence of the first concept, 1st concept block—the set of words that comprise the first concept (not necessarily the first in the sentence), connective block—the set of words that tie the two concepts in the relationship, 2nd concept block—the set of words that comprise the second concept, and conclusive block—the set of words from the 2nd concept to the end of the sentence.
II. DATA
Our participation in the 4th i2b2 challenge allowed us to have access to a corpus of 349 fully de-identified medical records from four different hospitals. This corpus was manually annotated for concept, assertion type and relationship information at the sentence level.
Table 2 shows the number of positive and negative examples for each relationship type. Positive examples were extracted from the corpus annotations. Negative examples were created using all the pairs of annotated concepts for all the sentences in the corpus. Each pair of (problem, problem) concepts contributed one candidate to the PIP relationship, each pair of (problem, test) concepts contributed one candidate for each of the TeCP and TeRP relationships, and each pair of (problem, treatment) concepts contributed one candidate for each of TrAP, TrNAP, TrIP, TrWP and TrCP relationships.
TABLE II.
DATA DESCRIPTION. THE AGGREGATED NUMBER OF EXAMPLES EXTRACTED FROM 349 PATIENT RECORDS
| Relationship | PIP | TeCP | TeRP | TrAP | TrNAP | TrIP | TrCP | TrWP |
|---|---|---|---|---|---|---|---|---|
| Number of positive examples | 1239 | 303 | 1733 | 1421 | 106 | 107 | 296 | 56 |
| Total number of candidate examples | 8588 | 3571 | 3571 | 4315 | 4315 | 4315 | 4315 | 4315 |
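The candidate-generation step described above can be sketched as follows. The function name, the type labels, and the input format are illustrative assumptions, not the challenge's actual annotation format:

```python
from itertools import combinations

# Which candidate relationships each concept-type pair contributes to,
# following the pairing scheme described above.
PAIR_TO_RELS = {
    ("problem", "problem"):   ["PIP"],
    ("problem", "test"):      ["TeCP", "TeRP"],
    ("problem", "treatment"): ["TrAP", "TrNAP", "TrIP", "TrWP", "TrCP"],
}

def candidate_relationships(concepts):
    """concepts: list of (text, type) annotations within one sentence.
    Returns one candidate per applicable relationship per concept pair."""
    out = []
    for (t1, ty1), (t2, ty2) in combinations(concepts, 2):
        key = tuple(sorted((ty1, ty2)))
        for rel in PAIR_TO_RELS.get(key, []):
            out.append((rel, t1, t2))
    return out
```

Every candidate not annotated as positive in the corpus then serves as a negative example for that relationship.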
III. METHODS
We describe the relationship representation scheme, the relationship features and the evaluation measures.
A. The context-blocks relationship representation scheme
We represent a relationship between two concepts as a scheme of five, not necessarily consecutive, context-blocks, as shown in Figure 2. This structure—Introductory, 1st Concept, Connective, 2nd Concept and Conclusive blocks—is naturally marked by the location of the two concepts in the sentence. As an operational decision, the introductory and conclusive blocks contained a maximum of five words.
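As an illustration, the block segmentation can be sketched over a tokenized sentence. The token-span input format and the choice of keeping the five words nearest each concept are assumptions here, not the authors' implementation:

```python
def context_blocks(tokens, c1, c2, window=5):
    """Split a tokenized sentence into the five context blocks.

    c1 and c2 are (start, end) token spans (end exclusive) of the two
    concepts; `window` caps the introductory and conclusive blocks at
    five words, as described above (which five is our assumption).
    """
    (s1, e1), (s2, e2) = sorted([c1, c2])
    return {
        "introductory": tokens[max(0, s1 - window):s1],
        "concept1":     tokens[s1:e1],
        "connective":   tokens[e1:s2],
        "concept2":     tokens[s2:e2],
        "conclusive":   tokens[e2:e2 + window],
    }
```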
B. Features
We considered 4,961 unique un-stemmed word features extracted from our corpus. We experimented both with word stemming and stop-words elimination [14]. Next, we identified 2,061 distinct Concept Unique Identifiers (CUI features) and 101 distinct Semantic Type categories (SemType features) from mapping our data to UMLS using MetaMap [15]. Finally, the assertion categories (assertion features) were defined in the 4th i2b2 challenge as: absent, conditional, present, hypothetical, possible, and associated-with-someone-else, and were extracted from the provided corpus annotations.
C. Machine learning model and iterative feature selection
For each relationship, we built a separate machine learning model that recognized the true relationships from the given candidates. The model took into account the context-block representation, so that each block was represented with its own bag-of-words model. The classification algorithm of choice was a linear SVM. We built a five-fold cross-validation setting with balanced positive and negative instances for each fold.
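One simple way to give each block its own bag-of-words is to prefix every word with its block name, so that the same word in different blocks produces distinct features for the linear SVM. This prefixing scheme is our illustration, not necessarily the authors' encoding:

```python
def block_features(blocks):
    """blocks: dict mapping block name -> list of words in that block.
    Returns the set of block-qualified word features."""
    return {f"{name}::{word}" for name, words in blocks.items() for word in words}
```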
Our approach was to train our SVM learner repeatedly, eliminating a fixed number of the lowest-weighted features after each step. A new model was then learned on the remaining features. We reduced the feature set by 500 features at a time, until the system’s performance no longer improved.
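The elimination loop can be sketched as below. The training and scoring callables are placeholders for the linear SVM and the cross-validated F-measure; this is a sketch of the procedure, not the authors' code:

```python
import numpy as np

def iterative_elimination(X, y, train_svm, score, step=500):
    """SVM-based iterative feature elimination (illustrative sketch).

    train_svm(X, y) -> weight vector of a linear model over X's columns;
    score(X, y)     -> performance estimate (e.g. cross-validated F-measure).
    Drops the `step` features with the smallest absolute weights each
    round, stopping once performance no longer improves.
    """
    keep = np.arange(X.shape[1])                  # original feature indices
    best_keep, best_score = keep, score(X[:, keep], y)
    while keep.size > step:
        w = train_svm(X[:, keep], y)
        order = np.argsort(np.abs(w))             # lowest-weighted first
        keep = keep[order[step:]]                 # eliminate the bottom `step`
        s = score(X[:, keep], y)
        if s <= best_score:
            break
        best_keep, best_score = keep, s
    return best_keep
```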
Finally, given a test sentence annotated for concepts and assertions, all relevant relationship models were applied to each pair of concepts. Each resulting score was then converted to a probability value, and the two concepts were predicted to have the relationship whose model yielded the highest probability.
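The decision step can be sketched as below. The logistic squashing is one common choice and an assumption on our part, since the score-to-probability conversion method is not specified above:

```python
import math

def predict_relationship(scores):
    """scores: dict mapping relationship name -> raw SVM decision value.
    Converts each score to a probability (logistic function, assumed)
    and returns the highest-probability relationship."""
    probs = {rel: 1.0 / (1.0 + math.exp(-s)) for rel, s in scores.items()}
    return max(probs, key=probs.get)
```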
D. Evaluation metrics
We used precision, recall, and F-measure. These values were averaged over the five folds of cross-validation. For a system balanced in both precision and recall, F-measure was used to pick the best models. When the same F-measure was obtained, we broke ties by choosing the model with the fewest features. We performed per-relationship and per-record evaluation. The former measured the system’s performance on the relationship candidates, regardless of the patient records they were collected from. The latter measured the system’s performance for each patient record. Here, the number of records varied per relationship, because only the records with at least one annotated positive example were considered.
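For reference, the three metrics follow the standard definitions; a minimal sketch:

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and balanced F-measure from counts of
    true positives, false positives, and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```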
We used these baselines to compare the models:
String matching: For each relationship we manually examined sample sentences and identified a set of relationship characteristic phrases.
Naïve bag-of-words: This SVM model used all the available features without taking into account the context-block representation of a relationship.
IV. RESULTS
We conducted a wide range of experiments to identify the best relationship models; we present a summary of them here.
A. Context-blocks relationship model performs best
Table III shows the per-relationship evaluation results for the PIP identification using the string matching model, the naïve bag-of-words model and the context-blocks relationship representation model. It also presents results when assertion, CUI and SemType features are added to the word features in the Concept blocks.
TABLE III.
PER-RELATIONSHIP EVALUATION FOR THE PIP RELATIONSHIP, USING STRING MATCHING AND SVM MODELS
| Relationship Model | String Matching | SVM-naïve bag of words | SVM-blocks representation, word features | SVM-blocks word + assertion | SVM-blocks word + CUI | SVM-blocks word + SemType |
|---|---|---|---|---|---|---|
| Precision | 0.177 | 0.254 | 0.601 | 0.598 | 0.646 | 0.590 |
| Recall | 0.511 | 0.960 | 0.796 | 0.784 | 0.746 | 0.797 |
| F-measure | 0.263 | 0.402 | 0.685 | 0.679 | 0.692 | 0.678 |
B. Assertions, Concept identifiers and Semantic Types are important for different relationships
Table IV shows per-relationship evaluation for all eight clinical relationships. For each relationship, the best model combining the word features with the assertion, CUI and SemType features was selected based on the F-measure values. These results illustrate that different relationships benefited from different additional concept features.
TABLE IV.
PER-RELATIONSHIP EVALUATION FOR THE BEST MODELS OF ALL RELATIONSHIPS
| Relationship | PIP | TeCP | TeRP | TrAP | TrNAP | TrIP | TrCP | TrWP |
|---|---|---|---|---|---|---|---|---|
| Precision | 0.646 | 0.584 | 0.805 | 0.727 | 0.800 | 0.619 | 0.644 | 0.800 |
| Recall | 0.746 | 0.742 | 0.932 | 0.872 | 0.604 | 0.607 | 0.588 | 0.358 |
| F-measure | 0.692 | 0.654 | 0.864 | 0.793 | 0.688 | 0.613 | 0.615 | 0.494 |
| The best model | Word, CUI | Word, Assertion | Word features | Word, CUI, Assertion, SemType | Word, CUI | Word, SemType | Word, CUI, Assertion, SemType | Word, Assertion |
C. Feature selection refined relationship identification
We applied the SVM iterative feature selection to each relationship model selected in Table IV. After feature selection, 1,000 features were retained for each relationship. Table V presents the F-measures obtained before and after feature selection for each relationship, using five-fold cross-validation. Metrics are computed using both per-relationship and per-record evaluation.
TABLE V.
PER-RELATIONSHIP AND PER-RECORD F-MEASURES COMPUTED PRIOR TO AND AFTER FEATURE SELECTION
| Feature Selection | Evaluation metrics | PIP | TeCP | TeRP | TrAP | TrNAP | TrIP | TrCP | TrWP |
|---|---|---|---|---|---|---|---|---|---|
| Prior | Per Relationship | 0.692 | 0.654 | 0.864 | 0.793 | 0.688 | 0.613 | 0.615 | 0.494 |
| After | Per Relationship | 0.775 | 0.759 | 0.912 | 0.866 | 0.754 | 0.683 | 0.735 | 0.673 |
| Prior | Per Record | 0.761 | 0.803 | 0.819 | 0.768 | 0.685 | 0.703 | 0.695 | 0.569 |
| After | Per Record | 0.823 | 0.883 | 0.855 | 0.831 | 0.743 | 0.814 | 0.793 | 0.621 |
D. Context blocks important for relationship identification
We studied the feature composition of the selected models for each relationship category. We found that specific words were selected in specific context blocks. Consider, for example, the TeCP and TeRP relationships. The word “revealed” was weighted positively in the Connective block of the TeRP relationship, but it was weighted negatively in the Connective block of the TeCP relationship. Stop-words were also highly weighted features, both positively and negatively, in all relationship models.
V. DISCUSSION
In this study, we defined the relationship between two concepts as a structure of five distinct context-blocks: the introductory block, the first concept block, the connective block, the second concept block, and the conclusive block. Such a representation was successful in identifying eight relationships between medical problems, treatments and tests in patient records. The performance degraded considerably when the context-blocks structure was removed for the same relationships, with the same set of features and the same classification algorithm (the naïve bag-of-words model). The context-blocks representation captured the individual word positions, and treated them accordingly. For example, for the TrCP relationship the word “without” was a highly weighted negative feature in the introductory block and a highly weighted positive feature in the connective block.
Also, contrary to general belief, stop-words were very valuable in this study. For example, the word “no” was a highly weighted negative feature in the introductory block of the TrAP relationship, while being a highly weighted positive feature in the same block of the TrNAP relationship. Similarly, the words “for”, “but”, “because”, and other stop-words, were observed to fulfill analogous roles.
A limitation of this study is that, in order for a relationship to be identified between two co-occurring concepts, those two concepts need to be identified first. A reliable concept recognizer is thus a prerequisite for relationship identification. Next, other types of relationships may also be defined between medical concepts. Moreover, this model only considered the text within a sentence. Such a simplification, by definition, limits the sensitivity of the produced results. Future work should include natural language processing techniques to obtain a better understanding of the text, as well as to resolve pronouns and perform inference.
VI. CONCLUSIONS
In this work, we defined a relationship identification scheme between two concepts in text. In this scheme, the relationship is represented as a structure of five context-blocks: the introductory, first concept, connective, second concept, and conclusive blocks. This scheme automatically captured word-position information, which is critical for certain relationships. We also found that assertion information was useful in identifying clinical tests conducted to investigate medical problems, and treatments which cause medical problems to worsen. Semantic types were useful in identifying treatments that improved a medical problem, and UMLS concept identifiers were relevant in identifying two medical problems related to each other. Our system benefited from the inclusion of stop-words, especially when found in the introductory and connective blocks of the relationship representation.
Acknowledgments
Funding: This research was supported by the Intramural Research Program of the NIH, National Library of Medicine.
Contributor Information
Rezarta Islamaj Doğan, Email: islamaj@mail.nih.gov.
Aurélie Névéol, Email: neveola@mail.nih.gov.
Zhiyong Lu, Email: luzh@mail.nih.gov.
References
- 1.Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinform. 2005 March;6(1):57–71. doi: 10.1093/bib/6.1.57. [DOI] [PubMed] [Google Scholar]
- 2.Craven M. Learning to Extract Relations from MEDLINE. AAAI-99 Workshop on Machine Learning for Information Extraction; 1999. pp. 25–30. [Google Scholar]
- 3.Leitner F, Mardis SA, Krallinger M, Cesareni G, Hirschman LA, Valencia A. An overview of BioCreative II.5. IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):385–99. doi: 10.1109/tcbb.2010.61. [DOI] [PubMed] [Google Scholar]
- 4.Krallinger M, Leitner F, Rodriguez-Penagos C, Valencia A. Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol. 2008 Sep;9(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Björne J, Ginter F, Pyysalo S, Tsujii J, Salakoski T. Complex event extraction at PubMed scale. Bioinformatics. 2010 Jun;26(12):i382–90. doi: 10.1093/bioinformatics/btq180. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bundschus M, Dejori M, Stetter M, Tresp V, Kriegel HP. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics. 2008 April;9:207. doi: 10.1186/1471-2105-9-207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C. Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc. 2008 Jan-Feb;15(1):87–98. doi: 10.1197/jamia.M2401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wang X, Chused A, Elhadad N, Friedman C, Markatou M. Automated knowledge acquisition from clinical narrative reports. AMIA Annu Symp Proc. 2008 Nov;6:783–7. [PMC free article] [PubMed] [Google Scholar]
- 9.Wang X, Chase H, Markatou M, Hripcsak G, Friedman C. Selecting information in electronic health records for knowledge acquisition. J Biomed Inform. 2010 Aug;43(4):595–601. doi: 10.1016/j.jbi.2010.03.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Rindflesch TC, Fiszman M. The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform. 2003 Dec;36(6):462–77. doi: 10.1016/j.jbi.2003.11.003. [DOI] [PubMed] [Google Scholar]
- 11.Rindflesch TC, Pakhomov SV, Fiszman M, Kilicoglu H, Sanchez VR. Medical facts to support inferencing in natural language processing. AMIA Annu Symp Proc. 2005:634–8. [PMC free article] [PubMed] [Google Scholar]
- 12.Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc. 2008 Jan-Feb;15(1):14–24. doi: 10.1197/jamia.M2408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Pakhomov SV, Ruggieri A, Chute CG. Maximum entropy modeling for mining patient medication status from free text. Proc AMIA Symp. 2002:587–91. [PMC free article] [PubMed] [Google Scholar]
- 14.Shatkay H, Pan F, Rzhetsky A, Wilbur WJ. Multidimensional classification of biomedical text: Toward automated practical provision of high-utility text to diverse users. Bioinformatics. 2008;24(18):2086–2093. doi: 10.1093/bioinformatics/btn381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229–36. doi: 10.1136/jamia.2009.002733. [DOI] [PMC free article] [PubMed] [Google Scholar]

