Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources

Yan Xu; Ji Hua; Zhaoheng Ni; Qinlang Chen; Yubo Fan; Sophia Ananiadou; Eric I-Chao Chang; Junichi Tsujii

doi:10.1371/journal.pone.0108396

. 2014 Oct 24;9(10):e108396. doi: 10.1371/journal.pone.0108396

Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources

Yan Xu ^1,², Ji Hua ¹, Zhaoheng Ni ¹, Qinlang Chen ¹, Yubo Fan ¹, Sophia Ananiadou ³, Eric I-Chao Chang ^2,^*, Junichi Tsujii ^2,^*

Editor: Ramin Homayouni⁴

PMCID: PMC4208750 PMID: 25343498

Abstract

References to anatomical entities in medical records consist not only of explicit references to anatomical locations, but also other diverse types of expressions, such as specific diseases, clinical tests, clinical treatments, which constitute implicit references to anatomical entities. In order to identify these implicit anatomical entities, we propose a hierarchical framework, in which two layers of named entity recognizers (NERs) work in a cooperative manner. Each of the NERs is implemented using the Conditional Random Fields (CRF) model, which use a range of external resources to generate features. We constructed a dictionary of anatomical entity expressions by exploiting four existing resources, i.e., UMLS, MeSH, RadLex and BodyPart3D, and supplemented information from two external knowledge bases, i.e., Wikipedia and WordNet, to improve inference of anatomical entities from implicit expressions. Experiments conducted on 300 discharge summaries showed a micro-averaged performance of 0.8509 Precision, 0.7796 Recall and 0.8137 F1 for explicit anatomical entity recognition, and 0.8695 Precision, 0.6893 Recall and 0.7690 F1 for implicit anatomical entity recognition. The use of the hierarchical framework, which combines the recognition of named entities of various types (diseases, clinical tests, treatments) with information embedded in external knowledge bases, resulted in a 5.08% increment in F1. The resources constructed for this research will be made publicly available.

Introduction

Since anatomical locations play a crucial role in organizing information and knowledge in the clinical domain, the identification of expressions which refer to them in text has been identified as an important target of Natural Language Processing (NLP) in the clinical domain. In this paper, we use the term anatomical entities to refer to expressions which correspond to anatomical locations such as body parts, organs, and their subparts. Such expressions may be explicit or implicit. Earlier studies on anatomical entity recognition focused only on expressions which refer to explicit anatomical entities [1]–[5]. However, implicit references are abundant in clinical records. Indeed, clinical experts can hardly realize whether expressions are explicit or implicit [6]. For example, a clinical record may report that a patient has had an ECG test. Whilst the term “ECG” itself does not refer to an anatomical entity explicitly, the mention of an ECG test does suggest to an expert that the patient has problems with his or her heart.

In this research, entities such as ECG are defined as implicit anatomical entities. Although such entities belong to different semantic classes from anatomical entities, they are nonetheless strongly associated with anatomical entities (e.g. heart). Mentions of such implicit anatomical entities are as important as those of explicit anatomical entities, since they provide clinical experts with clues about patients' conditions with respect to specific anatomical locations. Although several tools have been developed for clinical information extraction, none of them has focused on the recognition of implicit anatomical entities.

Recognition of implicit anatomical entities presents two challenges with respect to current technology. Firstly, since implicit entities themselves belong to diverse semantic classes, expressions which refer to them appear in different contexts, depending on their semantics classes. Thus, we cannot construct one single recognizer which assumes that they appear in homogeneous local contexts. Secondly, to determine which semantic classes correspond to implicit anatomical entities requires the domain knowledge of clinical experts.

In order to identify implicit anatomical entities, we have developed a hierarchical framework, in which different layers of named entity recognizers (NERs) work in a cooperative manner. In order to resolve the first problem, we used as the first layer of NER an existing tool capable of recognizing multiple semantic classes, i.e., the multi-class recognizer which we developed for I2B2 challenge tasks [7]. This tool recognizes three major classes of entities (i.e., Diseases, Clinical Tests and Clinical Treatments) that could potentially constitute implicit references to anatomical entities. The second NER layer recognizes explicit anatomical entity mentions, whilst the third layer determines which of the candidate entities from the first layer actually represent implicit references to anatomical entities. All layers of the framework are based on the Conditional Random Field (CRF) models [8]–[10].The third layer exploits Wikipedia and WordNet as knowledge resources. Entities recognized by the multi-class recognizer (first level) are checked against the knowledge resources. If the resources specify an explicit link exists between the candidate implicit entity and a specific anatomical entity, then specific features used by the CRF model are set.

A comprehensive dictionary of expressions is known to improve the performance of named entity recognizers. We have thus supplemented the use of the above external resources with the construction of a dictionary of known anatomical entity expressions using a number of existing resources, i.e. the Unified Medical Language System (UMLS) [11], Medical Subject Headings (MeSH) [12], RadLex [13] and table of BodyParts3D [14]. The dictionary matching results are used as features by both the explicit and implicit anatomical entity recognizers.

Abbreviations, which are abundant in clinical records, are one of the major causes of difficulties for NERs used within clinical applications. This is because a large proportion of abbreviations occurring in clinical records are local and ad hoc in nature. i.e., they are only used in a given text and their full forms appear in the same text. Due to their local nature, we cannot include them in a dictionary. Instead, we assume that their full forms can be discovered using existing abbreviation detection techniques [7], [15]. Since abbreviation detection is not the focus of this research, we make use of coreference chains that are already annotated in our corpus to find the full forms of abbreviations. Each abbreviation is replaced by its full form prior to the application of the NERs. The coreference feature is explained in greater detail in the Methods section.

Related Work

Named entity recognition is the first step of information extraction (IE), which maps information in text to the knowledge of a domain. The Medical Language Extraction and Encoding System (MedLEE) was one of the earliest systems developed to carry out named entity extraction on clinical text. Their extraction method used is based on the use of semantic lexicons and hand-written rules [16]–[19]. Hripcsak [16] found that NLP has a potential to extract clinically important information from narrative reports, as an aid to automated decision-support and clinical research. Friedman [18] further improved extraction of relevant clinical information and UMLS coding using NLP. In 2009 and 2010, the I2B2 challenge tasks [20], [21] constituted the first serious attempt of focus attention on named entity extraction in the clinical domain. The tasks included extraction of medical concepts (problem, treatment and test). Since the I2B2 organizers provided a reasonably large annotated corpus, most of the groups participating in the I2B2 challenge tasks used machine learning-based approaches. In particular, state-of-the-art NER performance has been achieved by systems based on the Conditional Random Field (CRF) model. Recent trends in NER and ER (Event Recognition) in the biomedical domain is surveyed by Ananiadou et al. [22].

There have been a number of efforts to building dictionaries of anatomical entities and associated ontologies. In 2003 and 2008, Rosse et al. [23], [24] proposed a fundamental model of anatomy and proposed a reference ontology of anatomy. While MedLEE performs anatomical entity extraction [25], [26], the underlying dictionary does not link to any reference ontology. In 2010, Naderi et al. [27] presented the organism tagger, focusing on recognizing various subcategories of organisms. Their system is a hybrid rule-based/machine learning system. Machine learning-based anatomical entity recognition has been studied by Pyysalo et al. [6], and they constructed a dictionary using resources available in the open biomedical ontologies (OBO) repository. However, these previous studies dealt only with explicit anatomical entities listed in the reference ontology; they have not exploited information embedded in external resources to identify implicit anatomical entities.

In the general domain of NERs, there have been numerous attempts to use external resources to improve the performance of systems. For example, Kazama et al. [28] explored the use of Wikipedia as such an external knowledge base. Cucerzan [29] also used Wikipedia to disambiguate named entities in a large-scale system for texts in the general domain. In the clinical domain, Rink et al. [30] used Wikipedia to produce features for relation classification among medical concepts, and achieved the best performance in the relation extraction task of the 2010 I2B2 challenge. Xu et al. [31] used diverse external resources (e.g., Wikipedia, WordNet, Probase) to produce features for co-reference recognition, and achieved the best performance in the coreference task of the 2011 I2B2 challenges. Xu et al. [32] also used web resources to improve their sentiment classifier in the 2011 I2B2 sentiment analysis challenge. The present work is a natural extension of the previous attempts at anatomical entity recognition. In particular, our work focuses on how to use relational information embedded in ontological resources, such as entities and their anatomical locations.

Methods

In order to examine how anatomical entities are referred to in clinical records (i.e., discharge summaries in this study), we first asked a clinical expert to annotate expressions which he considered to be “anatomical entities”. As a result, we found that a large number of expressions annotated as anatomical entities did not explicitly refer to anatomical locations. Therefore, we decided to distinguish such implicit entities from explicit anatomical entities in our annotation scheme. These two types of anatomical entities are also treated separately by our entity recognizer.

The architecture of the system is shown in Figure 1 . As is common practice in NERs, we apply standard language NLP tools (i.e., POS tagging, parsing, and character string processing) to extract features which have been found effective in NERs operating in other domains. We refer to these commonly-used standard set of features as the baseline features.

In addition to the baseline features, the second layer CRF recognizer uses features derived from the first layer recognizer, in addition to external knowledge sources (i.e., Wikipedia [33] and WordNet [34]), which we will discuss in detail in the following sections.

Annotation

Consistency and comprehensiveness of annotation greatly affect the performance of the system and the credibility of experimental results. In order to ensure the quality of annotation, we performed several iterations of preliminary annotation prior to the final annotation effort.

Our annotated dataset is the same set of 300 discharge summaries used by the I2B2 challenges, which consists of 28642 sentences. The final annotated corpus includes 16690 explicit anatomical entity tokens and 5564 implicit anatomical entities tokens. The link is following: https://drive.google.com/file/d/0B1A1rRX4lVdxbmhUVFUyWlRQOFk/edit?usp=sharing.

Annotation was performed by three annotators, two with a biomedical engineering background and one with a clinical background.

Annotation guidelines

Expressions of anatomical location in biomedical text are often categorized into one of five different levels, i.e., systems, organs, tissue, cells, chemicals (e.g. ions and molecules) [35]. Since cells and chemicals commonly exist in every part of the human body and are not useful for the current study, we only annotated expressions referring to the top three levels: systems, organs, and tissue.

An explicit anatomical entity is defined as an expression which directly denotes a specific body component of the system, organ, or tissue level. In other words, we consider the medical domains which can describe the human body at such levels as explicit anatomical entities in clinical texts. Such explicit anatomical entities are not limited to nouns or noun phrases. Adjectives or adjectival phrases such as “pulmonary” are also treated as explicit entities.

Implicit anatomical entities comprise a wide range of medical terms. In this study, medical terms that belong to the following categories are defined as potential implicit anatomical entities: (1) Medical problems (e.g., diseases) which occur in specific parts of the body or are caused by abnormalities of specific body components. For example, “pneumonia” implicitly refers to the lung, while “Hypertension” implicitly refers to vessels, as it is mostly physiologically caused by blood vessel abnormalities, such as narrowing of arteries and arterioles. (2) Clinical treatments specifically aimed at certain body components, such as “mastectomy”. (3) Clinical tests that are closely related to body components such as “ECG”.

Apart from expressions belonging to these three classes, expressions were also annotated as implicit anatomical entities if they express relations with body components or contain useful clinical information. For example, the set of adjectives which have structures of “positional prefix or word+body component” were also annotated. Expressions of this type refer to one or more peripheral areas around the body component, such as “supraclavicular” and “infraclavicular”.

According to their corresponding full forms, abbreviations are treated as either explicit or implicit. If the full form of an abbreviation contains an explicit anatomical entity, it is annotated as explicit. For example, “cp” (chest pain), is identified as an explicit anatomical entity. If an explicit anatomical entity is not included in the full form, then the abbreviation will be treated as an implicit entity.

Note that words like “neurology” are associated with components of human body but do not refer to specific components. Thus, we do not annotate them as anatomical entities. Special attention is given to different usages of the same terms. For example, “visual” mostly refers to the observer (e.g., visual inspection) but it can be used to denote an anatomical entity of the patient. While the former usage is not annotated as an anatomical entity, the latter is. Figure 2 shows the example of annotations in discharge summaries.

Annotation Flow

The three annotators annotated 10 discharge summaries independently of each other and then discussed differences among their annotations. Based on the results of discussion, each annotator independently produced their own set of guidelines. The next round of annotation was performed independently based on these individual sets of guidelines. The same cycle of discussion, revisions of individual guidelines, and independent annotation was repeated until reasonable convergence of annotations was reached. The three annotators then compiled their individual sets of guidelines into a unified set of guidelines.

Using thee unified guidelines, a further two rounds of annotation were performed. The first round was carried out by the two annotators (A1 and A2) with the biomedical engineering background, independently of each other. The annotations produced by A1 and A2 were checked by the third annotator (A3) with the clinical background. If there was a disagreement between A1 and A2, A3 took the role of adjudicator and explained his judgment to A1 and A2. The guidelines were further revised based on the outcome of this process. The final version of the guidelines was used in the annotation of the complete set of 300 discharge summaries. Both A1 and A2 performed the annotation work independently on the whole set of discharge summaries, while A3 made the final decision in case of disagreements between A1 and A2.

Inter-annotator agreement

We used the kappa coefficient [36] to measure the inter-annotator agreement. Table 1 summarizes the inter-annotator agreement between A1 and A2. Table 2 shows the inter-annotator agreement between each annotator and the gold standard. The gold standard constitutes the corpus following adjudication by A3 on the differences between the annotations of A1 and A2.

Table 1. Inter-annotator agreement between A1 and A2.

	Explicit	Implicit
True positive	13011	3287
False positive	951	491
False negative	522	312
k	0.9284	0.8812

Open in a new tab

Table 2. Inter-annotator agreement between each annotator and the gold standard.

		Explicit	Implicit
A1	True positive	13787	3577
	False positive	1495	465
	False negative	924	302
	k	0.8901	0.8937
A2	True positive	13116	3328
	False positive	1679	596
	False negative	972	389
	k	0.8768	0.8590

Open in a new tab

As shown in Table 1 and Table 2 , some level of disagreement still existed between A1 and A2 during the final annotation stage. However, the differences were very small. The gold standard may still contain annotation errors, since adjudication by A3 was performed only when A1 and A2 gave different annotations. However, considering the small k between A1 and A2, the remaining errors in the gold standard are expected to be very few in number, and the adjudicated corpus is accurate enough to be used in practice.

In the following experiments, we use the gold standard as the training and the test data sets using five-fold cross-validation.

CRF Model and Features

As mentioned above, we employed the CRF model in this work due to its wide and successful application in other NER tasks. As the baseline features, we used a standard set of features which have been found useful in previously reported NERs of diverse types. Subsequently, we added a set of features specific to explicit and implicit anatomical entities. As illustrated in Figure 1, the system consists of two layers of NERs, which are both trained using the CRF model.

An NER based on the CRF model sees the entity recognition problem as a sequence labeling problem. Each recognizer assigns one of the three labels Begin/Inside/Outside (BIO) to each word in a sentence. B and I labels, also called B-tag and I-tag, mean that the corresponding word constitutes beginning or intermediate word of a named entity, respectively. An O-tag means that the word does not constitute part of a named entity. A CRF model assigns one of these tags to each word in a sentence, successively from left to right, by observing the word itself and the local context in which the word appears. A word and its local context are represented by features attached to both the focused word and the words in its context. The performance of a CRF-based recognizer is determined by the set of features which are used to characterize words. Table 3 shows the features used in our system, which are explained in detail in the following sections.

Table 3. List of features in this task.

Category	Features
Baseline features	Original
	Capital Upper
	Upper
	Normalized form
	Prefix and Suffix
	Concept dictionary matching
	Concept type
	Stanford Parser POS
	Enju Parser POS
Ontological features	4 dictionaries matching (DF1)
	Position matching(DF2)
Coreference features	Coreference dictionary matching (CF)
World knowledge features	Wiki word matching (WF1)
	Wiki word matching (WF2)
	Wiki word matching (WF3)
Hierarchical feature	Hierarchical feature(HF)

Open in a new tab

Baseline features

The baseline features are those which have been commonly used in entity recognizers in previous studies. These features are computed by using standard NLP tools. They are:

Original word feature: the word itself.
Capital upper feature (Binary feature): 1 if the initial character of the word is an upper case letter, otherwise 0.
Upper case feature (Binary feature): 1 if all characters in the word are upper case letters, otherwise 0.
Part of Speech (POS) feature: the POS of a word (noun, verb, adjective, preposition, etc.), as determined by the Enju parser [37] and Stanford parser [38].
Original forms of name entities are used as a feature obtained by the basement of Enju parser.
Prefix and Suffix features: used to allow recognition of morphological variants of words to be mapped to a normalized form. For example, “abdominal” (abdomen). The prefix feature was derived from combinations (in order) of the first eight characters in a word and suffix feature was derived from combinations (in order) of the last eight characters in a word.

Figure 3 gave a detailed explanation of baseline features for the named entity “bronchitis”, which can express the standard feature formation in CRF model.

Ontological features: DF1 and DF2

Due to the ambiguity of named entity expressions, existence of nested expressions, and incompleteness of dictionaries, the performance of a recognizer solely based on dictionary matching is known to be unsatisfactory. However, it is also known that the results of dictionary matching against a comprehensive resource as features of a CRF-based recognizer can be very effective for improving the performance of the recognizer. That is, an ontological feature of a word is set to 1 when the word appears as part of a word sequence which matches an entry in a dictionary.

We prepared two dictionaries to compute ontological features (DF1 and DF2) used in our recognizer. The effectiveness of such an ontological feature largely depends on the comprehensiveness of the dictionary. To construct a comprehensive dictionary of anatomical entities, we have used four resources: UMLS, MeSH, RadLex, and BodyPart3D. The first dictionary (Dictionary-1) was constructed by extracting relevant entries from each of these four resources (further details are provided below). Dictionary-1 is expected to cover explicit anatomical entities. As Table 4 shows, the actual coverage of Dictionary-1 is much higher than any one of the four individual resources. The coverage in this table refers to the percentage of expressions annotated as explicit named entities in the gold standard dataset which appear in each resource. DF1 is the ontological feature based on this dictionary.

Table 4. Numbers of entities in dictionaries.

Dictionary	Number of explicit tokens matched in dictionary	Coverage of explicit named entities
UMLS	3012	18.05%
MeSH	2174	13.03%
RadLex	3238	19.40%
BodyParts3D	1595	9.56%
Total without duplications (Dictinary-1)	4019	24.08%

Open in a new tab

In Dictionary-1, 77504 entities are extracted from UMLS, belonging to the semantic types “bpoc” (body part and organ), “tisu” (tissue), “blor” (body location or region) and “bsoj” (body space or junction). From MeSH (Medical Subject Headings), 622 entities belonging to relevant categories are extracted. We extracted all entities (a total of 11406) classified under the type “anatomy_metaclass” in RadLex. From BodyPart3D, 1524 anatomical entities were extracted. We have removed duplications of entities extracted from these resources. As a result, Dictionary-1 contains 86,002 entities in total.

Dictionary-2 contains positional adjectives or adjectival phrases, which can be combined with anatomical entity expressions to create larger units of anatomical expressions. DF2 is the feature associated with this dictionary.

Positional words and phrases may be combined with anatomical expressions to produce new anatomical expressions. The position matching dictionary (Dictionary-2) contains of total of 43 such expressions and 1524 entities. Since the set of such expressions is a closed one, we enumerated them by manual inspection of discharge summaries. Dictionary-2 contains words such as “left” (left arm), “bilateral” (bilateral knees) and “distal” (distal ulnar). Abbreviations of positional words were also included, such as “bilat” (bilateral). The dictionary contributes to the accuracy of boundary detection of anatomical expressions (e.g. “bilateral knees” and “left hand” are recognized as anatomical expressions, instead of “knee” and “hand”).

Coreference features

Coreference chains were exploited to alleviate the problem caused by abbreviations. If an abbreviation and its full name appear in the same discharge summary, the abbreviation is called a local abbreviation. That is, a local abbreviation is introduced in the current text. As explained previously, the nature of local abbreviations means that we cannot provide a dictionary of them in advance. However, there have been several research efforts focusing on coreference of local abbreviations and their full forms [7], [15]. These works have showed that coreference relations between local abbreviations and their full forms can be recognized with relatively high accuracy. In the current work, instead of implementing these algorithms, we used coreference links already provided in the gold standard I2B2 corpus. If a coreference chain contains at least one expression recognized as an anatomical entity, the co-reference feature of all expressions in the chain is set to 1. Local abbreviations are treated in the same way as their full forms if the full forms are anatomical entities.

World knowledge features: WF1, WF2, and WF3

For implicit anatomical entity recognition, we have to solve two separate problems. The first problem is to identify a set of entities that belong to other semantic classes (diseases, clinical tests, clinical treatments, etc.) but are strongly associated with specific anatomical entities. Since there are no dictionaries that define which members of the above semantic classes correspond to implicit anatomical entities, we cannot use simple dictionary matching as we do for explicit anatomical entities (i.e. DF1).

The second problem is identification of the boundaries of implicit entities. Because implicit entities themselves belong to different semantic classes, the contexts in which they appear differ, depending on their semantic classes. A CRF recognizer which treats implicit entities as a single class would not be able to recognize them accurately.

The second problem leads to the two-layer architecture of our system. Instead of applying a single CRF model directly, we first apply the multi-class CRF recognizer [35], which we developed for I2B2 challenge tasks. The multi-class recognizer identifies spans entities belonging to three different classes (i.e., diseases, clinical tests, and clinical treatments). We refer here to the named entities extracted by this recognizer as medical concepts. These named entities constitute implicit anatomical entity candidates.

To solve the first of the above problems, we use two knowledge sources, i.e., Wikipedia and WordNet, to try to determine where a link exists between the medical concept and an explicit named entity. Since neither Wikipedia nor WordNet is a structured knowledge base, they do not express structured associations between implicit entities and explicit entities. Instead, they just provide free text definitions of medical concepts. Thus, we have to judge whether these free text concept definitions imply associations with specific anatomical entities. In the current system, we use simple heuristics for this judgment. That is, we check whether the definition of a medical concept includes any anatomical entity appearing in the Anatomy Dictionary (Dictionary-1). If so, the medical concept is taken represent an implicit anatomical entity. In WordNet, the complete definition of the concept is considered, while in Wikipedia, we treat the first three sentences of the entry for the concept as the definition.

These two steps, i.e., the application of the multi-class entity recognizer and recognition of candidates for implicit anatomical entities from the external resources, can be seen as a sophisticated dictionary matching process for implicit anatomical entities. That is, entities recognized by the multi-class recognizer are checked with corresponding entries in Wikipedia and WordNet to see whether they are associated with entries in the anatomy dictionary (Dictionary-1). If such associations are found, the features (WF2 for Wikipedia and WF3 for WordNet) of component words of the entries are set to 1. In the same way as the ontological feature of DF1 for explicit anatomical entities, these features are used as features of the CRF recognizers for both explicit entities and implicit entities. Since many expressions used in discharge summaries are not formal medical terms (e.g. ex for extremity), they do not have corresponding entries in Wikipedia. In order to alleviate such mismatches between expressions in discharge summaries and entries in Wikipedia, we used another feature, WF1, as explained below.

Wiki word matching (WF1): Regardless of the results of the first layer of the CRF recognizer, we consider all the words in a discharge summary sentence (except for stop words, such as “the”, “and”, “in”, etc.) and, if the word has a corresponding entry in Wikipedia, we check whether any words in the definition appear in Dictionary-1. If so, the WF1 feature of the word is set to 1, otherwise the feature is set to 0.

Wiki word of concept matching (WF2): For this feature, instead of considering all the words in discharge summaries, we consider only those corresponding to medical concepts (as determined by our multi-class CRF recognizer). For each medical concept word, we determine whether there is a corresponding entry in Wikipedia. If so, we check whether the definition matches any entries in Dictionary-1. If so, the feature WF2 of all the words in the expressions are set to 1, otherwise 0. Note that named entity expressions recognized by the first layer recognizer may consist of more than one word.

WordNet word of concept matching (WF3): We take named entity expressions recognized as such by the first layer CRF recognizer, and, if there is a corresponding entry in WordNet, we check whether the any words in the words in definition appear in Dictionary-1. If so, the feature WF2 of all the words in the expressions will be set to 1, otherwise 0.

A hierarchical framework for implicit anatomical entity recognition

Our framework approaches the problem of recognizing anatomical entities using two separate CRF recognizers, one for explicit anatomical entities and the other for implicit entities. The system consists of the following three steps:

[Step 1] Recognition of entities of belonging three semantic classes (Diseases, Clinical Tests and Treatments), which may constitute implicit anatomical entities. This step is carried out by our multi-class CRF recognizer (first layer recognizer) developed for I2B2 challenges.
[Step 2] Recognition of explicit anatomical entities by the explicit anatomical entity recognizer (second layer recognizer).
[Step 3] Recognition of implicit anatomical entities by the implicit anatomical entity recognizer (third layer recognizer).

As input to the recognizers in [Step 2] and [Step 3], the system computes several types of features, i.e. baseline features, dictionary-based features (DF1 and DF2), knowledge-based features (WF1, WF2 and WF3) and a hierarchical feature (HF). To compute HF, we apply the explicit anatomical entity recognizer ([Step 2]) to definitions in Wikipedia. Specifically, the first three sentences in every entry from Wikipedia were extracted and used as test data for the explicit anatomy entity recognizer. We can judge the result of hierarchical feature by seeing if any explicit named entity was recognized. Unlike other features, the hierarchical feature (HF) is only used to recognize implicit anatomical named entities.

Figure 4 illustrates how the hierarchical feature (HF) is built into the framework. “Arteries”, an explicit entity, appears in the definition of “Hypertension” in Wikipedia, and is recognized by the explicit anatomical entity recognizer ([Step 2]). Thus the feature value of HF for the word “hypertension” is set to “1”. This feature contributes to the negative judgment by the explicit anatomical entity recognizer and the positive judgment by the implicit anatomical entity recognizer.

The recognizers in [Step 2] and [Step 3] are trained using 5-fold cross-validation on the annotated corpus. Since the corpus also contains annotations of diseases, clinical tests and treatments (as annotated by the I2B2 organizers), we compute the features of WF2 and WF3 for entities annotated as such by the I2B2 organizers. These computed features are used for training the recognizers. For testing purposes, we performed two experiments, one using the I2B2 gold standard annotation and the other using the results of the multi-class recognizer ([Step 1]), to compute WF2 and WF3.

Experiments

In order to evaluate the effect of features and the size of training data, we conducted six groups of experiments, split into two sets. In order to evaluate our novel methods independently of the performance of the multi-class recognizers developed for I2B2, we firstly conducted preliminary experiments using the gold standard I2B2 annotated text as our source of diseases, clinical tests and treatment annotations. We then carried out a second set of experiments, in which the multi-class recognizer was used to recognize diseases, clinical tests and treatments ([Step 1]). The results of the two different sets of sets of experiments allow us to assess the influence of errors introduced by the multi-class recognizer on anatomical entity recognition.

The performance of system is evaluated using the three standard performance metrics.i.e., precision (P), recall (R) and F1 (F) [39]. We use the strictest criterion for evaluation on boundary detection, which means that both of the right and the left boundaries of anatomical entities must be correctly recognized.

The first group of experiments (using the gold standard disease, clinical tests and treatment annotations from I2B2) was designed to evaluate the effectiveness of each of our resource-derived features. The results are shown in Table 5 . A further group of experiments in Table 6 shows the cumulative effects of combining together features on the performance of the recognizers. Table 7 shows the effect of the training data size on the performance. We varied the size of the test data from 50 to 300 discharge summaries while the training data size was always 240, which was extracted from the 300 summaries. In each experiment, five-fold cross-validation was used to evaluate the performance.