AMIA Annu Symp Proc. 2013 Nov 16;2013:1032–1041.

Semantic Annotation of Clinical Events for Generating a Problem List

Danielle L Mowery 1, Pamela Jordan 1, Janyce Wiebe 1, Henk Harkema 1, John Dowling 1, Wendy W Chapman 2
PMCID: PMC3900128  PMID: 24551392

Abstract

We present a pilot study of an annotation schema representing problems and their attributes, along with their relationship to temporal modifiers. We evaluated the ability of humans to annotate clinical reports using the schema and assessed the contribution of the semantic annotations in determining the status of a problem mention as active, inactive, proposed, resolved, negated, or other. Our hypothesis is that the schema captures semantic information useful for generating an accurate problem list. Clinical named entities such as reference events, time points, time durations, aspectual phase, and ordering words, and their relationships, including modification and ordering relations, can be annotated by humans with low to moderate recall. Once identified, most attributes can be annotated with low to moderate agreement. Some attributes (Experiencer, Existence, and Certainty) are more informative than others (Intermittency and Generalized/Conditional) for predicting a problem mention’s status. The support vector machine classifier outperformed Naïve Bayes and Decision Tree for predicting a problem’s status.

Introduction

In medicine, clinical narratives such as emergency department reports provide a concise overview of the patient’s progress with respect to a clinical encounter. The clinical narrative is a flexible medium that supports documentation of signs, symptoms and diseases experienced by a patient accompanied by tests, procedures and treatments administered by care providers to manage the patient’s problem status. These narratives have a rich history of use in electronic medical record systems1 and are written to convey important clinical events that inform clinicians providing quality care. Natural language processing (NLP) is an approach used to identify, encode and extract these clinical events from clinical narratives to support a variety of use cases. NLP techniques can be used to extract patient medication lists2,3, identify adverse drug effects4, and generate problem lists5,6. Our long-term goal is to develop an NLP system that supports information extraction of clinical named entities and events for patient care environments including automatically generating patient problem lists for care providers and visually displaying medical record information for clinical researchers.

One important step in building an automated NLP system that supports these uses is the development of an annotation schema that explicitly describes the information the NLP system should identify. Typically, humans annotate using the schema, and the resulting annotations guide development of an automated extraction system. Before going to the effort of building an NLP system to annotate according to the schema, it is useful to evaluate inter-annotator agreement using the schema and to test the informativeness of the schema information for the end goal: if the schema features are not useful for an NLP system, these features should not be encoded. In this paper, we will 1) introduce an annotation schema that supports clinical information extraction, 2) determine how well annotators apply the annotation schema to clinical reports, and 3) evaluate the informativeness of these annotation schema features for predicting a problem mention’s status. After revising the schema based on this study, we will annotate a larger corpus and develop NLP methods to extract information according to the schema.

Background

Traditionally, annotation schemas are used to capture information to be manually and/or automatically identified, structured, or extracted from clinical narratives. Researchers have developed these schemas to model semantic information at the document, section, sentence, and mention levels. Mention-level annotations can model a clinical named entity or event at the clause or phrase level, such as the type of clinical condition represented by a noun phrase. Researchers develop annotation schemas to model salient clinical named entity and event mentions (NEs) in clinical narratives such as patients, disorders, drugs, procedures, and temporal concepts. In addition to specifying the semantic categories of information to annotate in the report, the schemas often include attributes describing the NEs in context, addressing questions of who, what, when, where, and how. Other schemas aim to encode semantic relationships that occur between mention pairs including is-a and associated-with relationships. In the following section, we will review schemas developed for NEs, attributes, and relationships in clinical text. The works reviewed are not meant to be representative of all annotation schemas, but provide context for the schema we developed, which leveraged existing schemas.

Annotated Clinical Corpora

Several annotated clinical corpora have been developed in recent years to model the information contained in clinical reports, including CLEF7, the 2010 i2b2/VA Challenge8, and TimeML9. As part of a partnership with the Royal Marsden Hospital, the CLEF project uses information extraction to support clinical research and evidence-based medicine7. Named entities and events captured include conditions, locus, drug, etc. Condition and Locus have attributes such as negation and laterality, respectively. Relationships between named entities include coreferring and causal relations. As part of a shared task, the 2010 i2b2/VA Challenge, discharge summaries were annotated with clinical named entities (problems, tests, and treatments), their assertion attributes (present, absent, possible, etc.), and causal relations (improves, reveals, etc.)8. One of the first efforts to adapt Saurí et al.’s TimeML schema9 to clinical corpora was undertaken by Savova et al.10. Their adaptation includes named entities representing TIMEX3 expressions and events, attributes capturing tense, class, degree, and modality, and relationships expressed as TLINKs and ALINKs.

Our aim was to develop a schema that integrates NEs, attributes, and relationships that are important for automatically identifying active problems that should be added to a patient problem list. We borrowed heavily from these existing schemas, adding new elements when they did not already exist. We also aimed to align our annotated elements with current annotation initiatives in the NLP community, including SHARP’s Common Type System11 and ShARe’s Semantic Schema12, to support the development of a generalizable NLP problem list generator applicable to data from different institutions.

Methods

In the next section, we introduce the annotation schema, describe a pilot study to evaluate the schema, and describe a proof of concept study using attributes in the schema as features for predicting a problem mention’s status.

Annotation Schema Introduction

The schema we developed addresses information important for interpreting a patient’s clinical conditions: 1) NEs, 2) attributes and their values, and 3) relationships between NEs.

NEs

We developed our annotation schema to encode information related to a patient’s disorders; therefore, other NE mentions are only annotated according to the schema if the NE is necessary for interpretation of the disorder.

For each NE, we define boundaries – start and end offsets – for the NE span in the text with square brackets followed by a subscript indicating the annotation type e.g., [chest pain]CO is a spanned clinical condition mention.

  • Conditions (CO): All problems represented by the UMLS semantic group disorders: signs, symptoms, diagnoses, and test results. “Patient had minor [chest pain]CO.” Condition entities were annotated according to the guidelines described in Chapman et al. 200513 and Chapman et al. 200614.

  • Reference events (RE): events that place the condition in a particular setting or clinical context. “Patient [was referred to cardiology]RE for [chest pain]CO.” Reference events are restricted to common care events (admissions, transfers, consults, discharge) and events (motorcycle crash) associated with temporal concepts (“[CVA]CO from [motorcycle crash]RE [in 1990]TI”) that are useful for determining when the clinical condition occurred.

  • Time points (TP): a particular instance in time. “[Three days ago]TP he had a [stroke]CO.”

  • Time durations (TD): an interval or period of time. “[For the last three days]TD he denied [extreme fatigue]CO.”

  • Aspectual phase (AP): the stage or phase of the event at a particular point in time (e.g., initial, middle, or end). “The [onset]AP of her [nausea]CO occurred after eating a fish dinner.” Aspectual phase describes the aspect of the interval representing the life cycle of an NE.

  • Ordering word (OR): an expression positioning two events with respect to each other or a point of perspective in the discourse. “[After]OR [admission]RE, she [vomited]CO”. A point of perspective can be a reference to other explicit events outside the current sentence boundaries or implicit time perspectives (narrative time, aforementioned time reference) in the discourse.

Figure 1 shows a section of an ED report annotated for these NEs and relationships.

Figure 1. NE mentions spanned with relationships in clinical text.
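To make the span convention concrete, the following minimal sketch (our own illustration in Python; the study itself used Protégé with the Knowtator plugin rather than a data structure like this) represents a typed NE mention by its character offsets and schema type:

```python
from dataclasses import dataclass

# NE types defined by the schema: condition, reference event, time point,
# time duration, aspectual phase, and ordering word.
NE_TYPES = {"CO", "RE", "TP", "TD", "AP", "OR"}

@dataclass
class NEMention:
    """A typed named entity/event mention anchored by character offsets."""
    start: int    # start offset of the span in the report text
    end: int      # end offset (exclusive)
    ne_type: str  # one of NE_TYPES, e.g. "CO" for a clinical condition

    def text(self, report: str) -> str:
        return report[self.start:self.end]

report = "Patient was referred to cardiology for chest pain."
chest_pain = NEMention(start=39, end=49, ne_type="CO")  # [chest pain]CO
referral = NEMention(start=8, end=34, ne_type="RE")     # [was referred to cardiology]RE
assert chest_pain.text(report) == "chest pain"
```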

Attributes

Identifying and spanning the NE mentions in the text alone does not provide the necessary information for understanding the contextual characteristics of the mention in the sentence. In this section, we review the attributes for each NE mention.

Condition Attributes:

Every condition mention was annotated with the following attributes and their possible values (bolded values are the values applied to the example sentence); a small data-structure sketch follows the list below.

  • ➢ Experiencer – who is experiencing the condition.

    Ex. The patient’s mother had [breast cancer]CO. – Experiencer: other, patient

  • ➢ Existence – whether a condition was present or not in the context of the mention.

    Ex. He denies [chest pain]CO – Existence: no, yes

  • ➢ Change – whether there is variation in degree or quality of the condition.

    Ex. She has had recurrent episodes of [viral meningitis]CO – Change: unmarked, changing, unchanging, decreasing, increasing, worsening, improving, recurrence

  • ➢ Intermittency – whether the condition is episodic in nature.

    Ex. White female who complains of [maroon stools]CO two times. – Intermittency: unmarked, yes, no

  • ➢ Certainty – the amount of certainty expressed about whether a condition exists or not.

    Ex. I have no suspicion for [bacterial infection]CO – Certainty: unmarked, high, moderate, low

  • ➢ Mental State – whether an outward thought or feeling about a condition is mentioned.

    Ex. It seems to me there is some active upper [GI bleeding]CO. – Mental State: yes, no

  • ➢ Generalized/Conditional – whether a condition is in a non-particular or conditional context.

    Ex. The patient has [chest pain]CO at rest. – Generalized/Conditional: yes, no

  • ➢ Relation to Current Visit – position of the condition’s time interval relative to the current encounter.

    Ex. Past medical history: [Chronic Obstructive Pulmonary Disease]CO – Relation to Current Visit: Before, Meets_Overlaps, After

For all conditions in which Relation to Current Visit equaled Before, we applied the following attributes:

  • ➢ MagBeforeCurrentVisit – the magnitude of the condition’s onset before the current encounter.

    Ex. He has had [abdominal pain]CO for the last two days – MagBeforeCurrentVisit: 2, notClear, N/A, DateGiven

  • ➢ UtsBeforeCurrentVisit – the units of the condition’s onset before the current encounter.

    Ex. He has had [abdominal pain]CO for the last two days – UtsBeforeCurrentVisit: days, notClear, N/A, DateGiven, hours, weeks, months, years
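As a rough illustration of how a fully attributed condition mention might be stored (a sketch under our own assumptions, not the project’s Knowtator schema definition), consider the sentence “He has had [abdominal pain]CO for the last two days”:

```python
# Attribute value sets for condition mentions, as listed above.
CONDITION_ATTRIBUTES = {
    "Experiencer": ["patient", "other"],
    "Existence": ["yes", "no"],
    "Change": ["unmarked", "changing", "unchanging", "decreasing",
               "increasing", "worsening", "improving", "recurrence"],
    "Intermittency": ["unmarked", "yes", "no"],
    "Certainty": ["unmarked", "high", "moderate", "low"],
    "Mental State": ["yes", "no"],
    "Generalized/Conditional": ["yes", "no"],
    "Relation to Current Visit": ["Before", "Meets_Overlaps", "After"],
    # applied only when Relation to Current Visit is "Before"
    "MagBeforeCurrentVisit": ["<number>", "notClear", "N/A", "DateGiven"],
    "UtsBeforeCurrentVisit": ["days", "hours", "weeks", "months", "years",
                              "notClear", "N/A", "DateGiven"],
}

# "He has had [abdominal pain]CO for the last two days."
abdominal_pain = {
    "text": "abdominal pain",
    "Experiencer": "patient",
    "Existence": "yes",
    "Change": "unmarked",
    "Intermittency": "unmarked",
    "Certainty": "unmarked",
    "Mental State": "no",
    "Generalized/Conditional": "no",
    "Relation to Current Visit": "Before",
    "MagBeforeCurrentVisit": 2,
    "UtsBeforeCurrentVisit": "days",
}
```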

Reference Event Attributes:

We defined a subset of common key events, or reference events, describing where a condition occurred, including common events of ambulatory care visits (admission, discharge, transfer). Our previous study20 indicated there is also a need for non-clinical events that indirectly link temporal concepts to condition mentions. For instance, in the sentence “[In 2000]TP, the patient [had a serious fall]RE resulting in a [shattered knee cap]CO.”, we know that the knee injury occurred in the year 2000, but the fall (a non-clinical event) is the linguistic link between the condition and the temporal concept. In our annotation schema, reference events do not have attributes, but will eventually be annotated with semantic types (e.g., admission event).

Point Attributes:

Time points provide a reference for when an NE occurred. Time points generally refer to the beginning or end of intervals and are sometimes relative to the date of the emergency department visit. The attributes defined for time points will be used to map the time point to the beginning or end of an interval in later processing (a rough illustration follows the attribute list below). We defined the Point attribute values using a subset of the temporal values defined by Zhou et al.15 and similarly used by Irvine et al.16

Ex. [Approximately one week ago]TP he had episodes of [fever]CO.

  • ➢ Distance Expressed – whether the temporal concept contains a length of time – yes, no

  • ➢ Point Type – what type of temporal concept – Date and Time, Relative Date and Time, Fuzzy Time, Point of Perspective, Time Pronoun
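The interval mapping itself happens in later processing and is not described here; purely as an illustration of the idea (not the authors’ implementation), a relative time point whose distance is expressed could be resolved against the ED visit date along these lines:

```python
from datetime import date, timedelta

def resolve_relative_point(magnitude: int, unit: str, visit_date: date) -> date:
    """Illustrative only: map a relative time point such as 'three days ago'
    (Point Type: Relative Date and Time, Distance Expressed: yes) to a
    calendar date anchored on the emergency department visit date."""
    unit_days = {"days": 1, "weeks": 7, "months": 30, "years": 365}
    return visit_date - timedelta(days=magnitude * unit_days[unit])

# "[Three days ago]TP he had a [stroke]CO." for a visit on 2013-01-10
print(resolve_relative_point(3, "days", date(2013, 1, 10)))  # 2013-01-07
```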

Duration Attributes:

Time durations give an indication of how long an NE lasted. The attributes for durations include common characteristics (beginning, length, and end) used to represent intervals, and the Duration attribute values are the same as the Point attribute values.

Ex. He has had [abdominal pain]CO [for the last two days]TD.

  • ➢ Length Expressed – whether the temporal concept contains the length of the interval – yes, no

  • ➢ Beginning Type – what type of temporal concept is at the start of the interval – Date and Time, Relative Date and Time, Fuzzy Time, Point of Perspective, Time Pronoun

  • ➢ Ending Type – what type of temporal concept is at the end of the interval – Date and Time, Relative Date and Time, Fuzzy Time, Point of Perspective, Time Pronoun

Aspectual Phase Attributes:

Aspectual phase words denote the stage of NEs at a particular time in the narrative. For aspectual mentions, our annotation schema defines attribute values consistent with the TimeML specification9.

Ex. Her [fever]CO has [abated]AP.

  • ➢ Phase Type – whether beginning, middle or end of event – Initiation, Continuation, Culmination

Ordering Word Attributes:

Ordering words denote the sequential position of reference events and conditions with respect to one another. We used a simplified set of Allen’s temporal interval relations17, similar to Saurí et al.’s TimeML TLINKs9, to annotate the ordering type between NEs. We instructed annotators to determine the ordering type that most closely represents the ordering word (e.g., follows is semantically similar to “after”), then assign the NEs as arguments 1 and 2 to semantically represent what was meant (e.g., “syncopal episode” after “weak”).

Ex. [Syncopal episode]CO [yesterday]TP [after]OR feeling quite [weak]CO.

  • ➢ Ordering Type – temporal position of one entity/event to another – Precedes, During, Follows

Relationships

In the early phase of our project, we recognized that the large number of explicit and implicit relationships among NEs could present a cognitive burden on even the most skilled annotator. The focus of our task is to model relationships that describe the condition in a given context relative to a particular place (reference events), time (temporal concepts), stage (aspectual phase), and order (ordering words). As such, we only annotated two relationship types, modifies and orders, between mentions using four simple and restrictive rules.

  • ➢ Rule 1: Reference event modifies condition. Only instantiate a condition as a reference event when it serves as a direct link to a clinical condition.

    [annotation graphic illustrating Rule 1]

  • ➢ Rule 2: Temporal concept (Points and Durations) modifies conditions and reference events.

    [annotation graphic illustrating Rule 2]

  • ➢ Rule 3: Aspectual phase modifies conditions and reference events.

    [annotation graphic illustrating Rule 3]

  • ➢ Rule 4: Ordering expression orders all combinations of pairs of conditions, reference events and points of perspective with respect to each other.

    [annotation graphic illustrating Rule 4]

The arrows in the rule illustrations above carry the same meaning as the arrows defined in Figure 1, and the directional constraints between mentions are illustrated in Figure 2 below, followed by a small sketch encoding these constraints.

Figure 2. Allowed relationships between NEs.
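One compact way to encode the four rules and the directional constraints of Figure 2, for example to validate proposed annotations automatically, is sketched below (our own illustration; the label PP for a point of perspective is introduced here only for convenience and is not one of the schema’s NE types):

```python
# Allowed (modifier type, modified type) pairs per Rules 1-3.
ALLOWED_MODIFIES = {
    ("RE", "CO"),                                # Rule 1: reference event -> condition
    ("TP", "CO"), ("TP", "RE"),                  # Rule 2: time point -> condition/event
    ("TD", "CO"), ("TD", "RE"),                  # Rule 2: time duration -> condition/event
    ("AP", "CO"), ("AP", "RE"),                  # Rule 3: aspectual phase -> condition/event
}

ORDERABLE = {"CO", "RE", "PP"}                   # Rule 4 arguments (PP: point of perspective)

def is_allowed(relation: str, source_type: str, target_type: str) -> bool:
    """Check a proposed relationship annotation against the schema rules."""
    if relation == "orders":
        return source_type in ORDERABLE and target_type in ORDERABLE
    if relation == "modifies":
        return (source_type, target_type) in ALLOWED_MODIFIES
    return False

assert is_allowed("modifies", "TP", "CO")        # "[in 1990]TP" modifies "[CVA]CO"
assert not is_allowed("modifies", "CO", "TP")    # direction matters
```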

1. Annotation Study Design

We conducted a pilot annotation study approved by the University of Pittsburgh Institutional Review Board. We randomly selected 30 de-identified emergency department (ED) reports from the University of Pittsburgh Medical Center. One author (JD) annotated these reports for clinical conditions. A second annotator reviewed the annotations and came to consensus with JD on any missing or spurious annotations. Five authors (DM, PJ, JW, HH and WC) initially developed the annotation schema based on a literature review of linguistic phenomena, on existing schemas, and on error analyses of the ConText algorithm18 and temporal classifiers developed by our lab19,20. Once the annotation guidelines were written, two authors (DM and PJ) annotated the remaining NEs, attributes and relationships in the 30 ED reports. Using the Knowtator annotation tool21, the authors reached consensus, with two authors (JW and WC) serving as adjudicators for any disagreements. The resulting annotations and guidelines were updated iteratively through this process, serving as the reference standard and training materials for the annotation study described next.

Medical and nonmedical students were recruited from the University of Pittsburgh using flyers distributed throughout the campus. Over the course of three days, we obtained informed consent, trained subjects on our annotation schema, and reviewed the annotation software (Protégé 3.3.1 with the Knowtator plugin). In an attempt to reduce the likelihood of annotator fatigue due to the schema’s complexity, we assigned the majority attribute value for the previously annotated conditions as default values. Annotators were instructed to change the default value to semantically represent the mention in the text and to annotate additional NEs in the sentence containing the pre-annotated condition as needed. Annotators were given three weeks to independently complete annotation of the 30 ED reports. To determine each annotator’s accuracy at the task, we compared each annotator’s completed dataset against the reference standard with a Python (v2.5) script we developed, as follows:

NEs

We evaluated agreement for identifying NEs (other than the previously annotated clinical condition NEs) by assessing annotated mentions against the reference standard NE annotations. We considered exact and overlapping spans to be true positive NE annotations if the overlapping annotations were assigned the same NE type.

We counted the number of true positives (TP: mention occurs in the reference standard), false positives (FP: mention does not occur in the reference standard) and false negatives (FN: reference standard mention was not annotated). We computed recall to determine the proportion of reference standard NEs annotators identified and precision to determine the proportion of annotated NEs also present in the reference standard22. We could not count the number of true negative annotations (i.e., a string correctly not annotated as an NE); therefore, we applied the F1 score as a surrogate for kappa, since the F1 score approaches kappa as the number of true negatives becomes large22. We report the mean and standard deviation for each metric.
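The study used its own Python (v2.5) comparison script, which is not reproduced here; the sketch below is an illustrative reimplementation of the lenient-match scoring just described:

```python
def score_mentions(annotated, reference):
    """Count lenient (overlapping span, same NE type) matches and return
    recall, precision, and F1 against the reference standard (illustrative)."""
    def overlaps(a, r):
        return a["type"] == r["type"] and a["start"] < r["end"] and r["start"] < a["end"]

    tp = sum(1 for r in reference if any(overlaps(a, r) for a in annotated))
    fn = len(reference) - tp                     # reference NEs never annotated
    fp = sum(1 for a in annotated if not any(overlaps(a, r) for r in reference))

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return recall, precision, f1

reference = [{"start": 39, "end": 49, "type": "CO"}]
annotated = [{"start": 45, "end": 49, "type": "CO"}]   # "pain" overlaps "chest pain"
print(score_mentions(annotated, reference))            # (1.0, 1.0, 1.0)
```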

Attributes

For each leniently matched NE, we applied Cohen’s kappa, a chance-corrected measure of agreement between two annotators23, to attributes with a finite set of values (e.g., Experiencer has two values: patient or other). For NE pair relationships with a lenient match, we measured agreement on their relationship attributes in the same way.
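For reference, a minimal computation of Cohen’s kappa over the attribute values two annotators assign to the same matched NEs might look as follows (an illustrative sketch, not the evaluation script used in the study):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' attribute values on matched NEs."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[v] * counts_b.get(v, 0) for v in counts_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Experiencer values on five leniently matched condition mentions
print(cohens_kappa(["patient", "patient", "patient", "patient", "other"],
                   ["patient", "patient", "patient", "patient", "patient"]))  # 0.0
```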

Relationships

For relationships, we report the average count of relationships created by annotators. We counted the number of true positives (TP: annotated NE pair with arguments occurs in the reference standard), false positives (FP: annotated NE pair with arguments does not occur in the reference standard) and false negatives (FN: NE pair with arguments in the reference standard was not annotated). We calculated recall and precision of relationship pair identification. We also calculated the F1 score as a surrogate for kappa.

We report the mean and standard deviation for each NE, attribute, and relationship metric.

2. Problem Mention Status Study Design

Using the reference standard generated for attributes of clinical conditions and aspectual phase, we conducted a proof of concept study to evaluate the informativeness of the semantic annotations when predicting a problem mention’s status. According to our model of problem lists, a mention of a clinical condition or problem can have one of six possible status labels:

  • Active (A): a condition mention occurring with high certainty in the patient, with an onset within two weeks of the admission, and being actively managed during the current episode of care.

  • Inactive (I): a condition mention chronically experienced by the patient, but not being managed during the current episode of care.

  • Proposed (P): a condition mention being considered as occurring or diagnosed with less than high certainty.

  • Resolved (R): a condition mention that occurred during the current episode of care, but was either successfully treated or culminated on its own.

  • Negated (N): a condition mention being denied or that never occurred.

  • Other (O): any other condition mention not classified with the five previous status labels.

Two biomedical informaticians (postdoctoral researchers) annotated each condition mention with a status label. One domain expert (physician) adjudicated (Adj) the disagreements, creating the final reference standard. We measured inter-annotator agreement using Cohen’s kappa.

We split the dataset into training (70%) and test (30%) sets. Using Weka 3.6.8, we selected three supervised learning classifiers (Decision Tree, Naïve Bayes, and Support Vector Machine) to predict a problem mention’s status. We used condition and aspectual phase attributes as input features. We evaluated the condition input features with a feature selection study. Using 10-fold cross validation on the training set, we ran a best-first, bidirectional search optimizing accuracy to learn the informativeness of each condition input feature for each classifier. We report the proportion of folds that identified each attribute as informative for high accuracy on the training set. We then built each classifier on the full training set, using only the input features observed as useful in one or more training folds, to classify unseen problem mention statuses on the held-out test set. We report the performance of the classifiers on both training and test sets using Weighted Average Accuracy, Area under the Receiver Operating Characteristic (ROC) Curve, Recall, Precision, and F1 score.
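For readers who want to reproduce the general setup, the sketch below approximates it with scikit-learn instead of Weka 3.6.8 (toy stand-in data, a simple 2-fold check in place of the 10-fold best-first bidirectional feature-selection search, and default classifier settings; it is not the authors’ pipeline):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Toy stand-in data: in the study, each instance holds a condition mention's
# condition and aspectual phase attribute values; the label is its adjudicated status.
instances = (
    [{"Existence": "yes", "Certainty": "unmarked", "Relation to Current Visit": "Meets_Overlaps"}] * 12
    + [{"Existence": "no", "Certainty": "unmarked", "Relation to Current Visit": "Meets_Overlaps"}] * 4
    + [{"Existence": "yes", "Certainty": "low", "Relation to Current Visit": "Meets_Overlaps"}] * 4
)
labels = ["A"] * 12 + ["N"] * 4 + ["P"] * 4

X = DictVectorizer(sparse=False).fit_transform(instances)        # one-hot encode attribute values
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.30, random_state=0, stratify=labels)  # 70% train / 30% test

for clf in (DecisionTreeClassifier(), MultinomialNB(), SVC(kernel="linear")):
    cv_scores = cross_val_score(clf, X_train, y_train, cv=2)     # the study used 10-fold CV
    test_score = clf.fit(X_train, y_train).score(X_test, y_test)
    print(type(clf).__name__, round(cv_scores.mean(), 2), round(test_score, 2))
```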

Results

In this section, we report results of our annotation study and of our problem mention status classification study.

1. Annotation Study

We report annotator agreement with the reference standard for NEs, their attributes, and the relationships between them. Of the 14 annotators initially recruited, 10 completed the annotation study; the results below are based on these 10 students.

NEs

Our dataset comprises 30 emergency department reports. The reference standard (RS) has a total of 555 NEs with a distribution of 283 conditions (51%), 93 reference events (17%), 66 time points (12%), 55 durations (10%), 32 aspectual phase (6%) and 26 ordering words (5%) (Table 1). On average, annotators spanned fewer mentions than the reference standard, with mean counts of 64 reference events (13%), 59 time points (12%), 37 durations (8%), 31 aspectual phase (6%) and 15 ordering words (3%).

Table 1.

Mean proportion of spanned NEs compared to reference standard.

| Metric | Reference Events | Points | Durations | Aspectual Phase | Ordering Words |
| RS Counts | 93 | 66 | 55 | 32 | 26 |
| Counts | 64.4 ± 23.8 | 59 ± 13.4 | 37.4 ± 7.1 | 31.5 ± 8.3 | 14.6 ± 9.4 |
| TP | 44.6 ± 10.8 | 47.4 ± 7.3 | 29.6 ± 5.9 | 20.9 ± 5.7 | 8.1 ± 3.8 |
| FP | 19.8 ± 16.3 | 11.6 ± 7.8 | 7.8 ± 2 | 10.6 ± 4.5 | 6.5 ± 6.6 |
| FN | 48.4 ± 10.8 | 18.6 ± 7.3 | 25.4 ± 5.9 | 11.1 ± 5.7 | 17.9 ± 3.8 |
| F1 Score | 56.3 ± 9.4 | 75.9 ± 6.3 | 63.6 ± 8.0 | 64.8 ± 13.9 | 38.2 ± 14.0 |
| Recall | 48.0 ± 11.7 | 73.2 ± 10.5 | 53.8 ± 10.7 | 65.3 ± 17.8 | 31.2 ± 14.7 |
| Precision | 72.8 ± 15.4 | 80.8 ± 9.3 | 79.1 ± 4.5 | 66.2 ± 10.8 | 66.2 ± 22.2 |

Attributes

Condition Attributes:

For condition mentions, the average kappa agreement between an annotator and the reference standard varied from low kappa for Intermittency (0.39 ± 0.1) and Generalized/Conditional (0.46 ± 0.3), to moderate kappa for Magnitude before Current Visit (0.5 ± 0.2), Certainty (0.52 ± 0.1), Units before Current Visit (0.56 ± 0.2), Mental State (0.59 ± 0.2), Change (0.63 ± 0.1), and Relation to Current Visit (0.64 ± 0.1), to high kappa for Existence (0.8 ± 0.1) and Experiencer (1.0 ± 0).

Point Attributes:

For point mentions, annotators correctly identified an average of 47 matches with the reference standard. Annotators achieved moderate kappa for Point Type (0.6 ± 0.2) and Distance Expressed (0.7 ± 0.2) attribute values.

Duration Attributes:

For duration mentions, annotators correctly identified an average of 30 matches with the reference standard. Annotators achieved low to moderate kappa for Beginning Type (0.40 ± 0.2), Ending Type (0.50 ± 0.3), and Length Expressed (0.70 ± 0.2).

Aspectual Phase Attributes:

For aspectual phase mentions, annotators correctly identified an average of 21 matches with the reference standard. Annotators achieved high kappa for Phase Type (0.96 ± 0.3).

Ordering Word Attributes:

For ordering word mentions, annotators correctly identified an average of 8 TP matches with the reference standard. Annotators averaged moderate agreement for Ordering Type (0.4 ± 0.4).

Relationships

Reference Event Relationships:

On average, annotators correctly identified 45 (59%) modifying relationships. For correctly identified modifying relationships, annotators achieved high recall and precision in identifying the condition mentions modified by reference events (Table 2).

Table 2.

Mean relationships and arguments identified by annotators.

| Metric | Reference Events | Points | Durations | Aspectual Phase | Ordering Words |
| RS | 76 | 65 | 57 | 35 | 30 |
| Counts | 44.6 ± 10.8 | 47.4 ± 7.3 | 29.6 ± 5.9 | 20.9 ± 5.7 | 8.1 ± 3.8 |
| F1 Score | 94.6 ± 4.9 | 77.1 ± 4.7 | 88.1 ± 5.4 | 91.5 ± 2.9 | 19.5 ± 9.4 |
| Recall | 92.1 ± 8 | 77.6 ± 6.8 | 88.6 ± 4.9 | 91.9 ± 4.2 | 16.0 ± 9.4 |
| Precision | 97.6 ± 1.8 | 76.9 ± 4.1 | 87.6 ± 6.5 | 91.3 ± 3.6 | 35.3 ± 26.9 |

Point Relationships:

On average, annotators correctly identified 47 (72%) modifying relationships. For correctly identified modifying relationships, annotators produced substantial recall and precision for identifying reference events and conditions being modified.

Duration Relationships:

On average, annotators correctly identified 30 (53%) modifying relationships. For correctly identified modifying relationships, annotators produced substantial recall and precision for identifying reference events and conditions being modified.

Aspectual Phase Relationships:

On average, annotators correctly identified 21 (60%) modifying relationships. For correctly identified modifying relationships, annotators produced high recall and precision for identifying reference events and conditions being modified.

Ordering Word Relationship:

On average, annotators correctly identified 8 (27%) ordering relationships. For correctly identified ordering relationships, annotators had low recall and precision identifying events being ordered.

2. Problem Mention Status Study

Kappa agreement between pairs was A1-A2 (23.6%), A1-Adj (33.4%), and A2-Adj (77.3%). The most prevalent status was Active among annotators (Table 3). The majority of disagreements between A1-A2 were Inactive/Active.

Table 3.

Count and prevalence (%) of status label by annotator.

| Annotator | Active (A) | Inactive (I) | Proposed (P) | Resolved (R) | Negated (N) | Other (O) |
| A1 | 110 (39%) | 101 (36%) | 5 (2%) | 29 (10%) | 31 (11%) | 7 (2%) |
| A2 | 198 (70%) | 28 (10%) | 7 (2%) | 0 (0%) | 28 (10%) | 22 (8%) |
| Adj | 181 (64%) | 21 (7%) | 7 (2%) | 22 (8%) | 26 (9%) | 25 (9%) |

The most prevalent attribute values observed for conditions in the reference standard were Experiencer: patient (98%), Existence: yes (89%), Certainty: unmarked (95%), Mental State: no (95%), Intermittency: unmarked (83%), Change: unmarked (82%), Generalized/Conditional: no (88%), Relation to Current Visit: Meets_Overlaps (63%), Magnitude Before Current Visit (not shown): notClear (55%) and Units Before the Current Visit (not shown): notClear (55%) (Table 4). These distributions reflect the prevalence of the feature values available for classification.

Table 4.

Reference standard - distribution of condition attribute values used in feature vectors.

| Attribute | Value distribution (most to least prevalent) |
| Experiencer | patient (98%), other (2%) |
| Existence | yes (89%), no (11%) |
| Certainty | unmarked (95%), moderate (3%), low (>1%), high (<1%) |
| Mental State | no (95%), yes (5%) |
| Intermittency | unmarked (83%), yes (16%), no (>1%) |
| Change | unmarked (82%), worsening (5%), unchanging (4%), improving (3%), increasing (3%), decreasing (2%), recurrence (1%) |
| Generalized/Conditional | no (88%), yes (12%) |
| Relation to Current Visit | Meets_Overlaps (63%), Before (29%), After (8%) |

We observed that the condition attributes Experiencer and Existence were consistently (100% of folds) informative for asserting a problem mention’s status across classifiers (Table 5). Naïve Bayes and Support Vector Machine determined all attributes relevant for at least one fold; in contrast, Decision Tree determined only 5 of 10 attributes relevant.

Table 5.

Count (#) of Folds/10 that an attribute was determined relevant.

| Condition attribute | Decision Tree | Naïve Bayes | Support Vector Machine |
| Experiencer | 10 (100%) | 10 (100%) | 10 (100%) |
| Existence | 10 (100%) | 10 (100%) | 10 (100%) |
| Change | 0 (0%) | 8 (80%) | 10 (100%) |
| Intermittency | 0 (0%) | 3 (30%) | 4 (40%) |
| Certainty | 7 (70%) | 8 (80%) | 10 (100%) |
| Mental State | 0 (0%) | 2 (20%) | 9 (90%) |
| Generalized/Conditional | 0 (0%) | 1 (10%) | 3 (30%) |
| Relation to Current Visit | 10 (100%) | 4 (40%) | 9 (90%) |
| Magnitude&Units > 2 wks | 0 (0%) | 6 (60%) | 6 (60%) |
| Aspectual Phase | 1 (10%) | 8 (80%) | 10 (100%) |

Our training set of 198 (70%) conditions had a distribution of Active 127 (64%), Inactive 15 (8%), Proposed 6 (3%), Resolved 15 (8%), Negated 18 (9%), and Other 17 (9%); our test set of 85 (30%) conditions had a distribution of Active 54 (64%), Inactive 6 (7%), Proposed 2 (3%), Resolved 7 (8%), Negated 8 (9%), and Other 8 (9%). All classifiers outperformed a majority class baseline (Active: 64% overall accuracy) (Table 6). For Weighted Average Accuracy, test set performance was 4–9 points lower than training set performance across classifiers. For Weighted Accuracy and F1 Score, the Support Vector Machine demonstrated higher performance than Decision Tree and Naïve Bayes. Across classifiers, performance was higher for Active and Negated and lower for Inactive and Resolved.

Table 6.

Classifier performances on training and test data.

| Classifier | Status | Wt. Accuracy (Train / Test) | ROC (Train / Test) | Recall (Train / Test) | Precision (Train / Test) | F1 Score (Train / Test) |
| Decision Tree | A | – | 80.9 / 83.8 | 93.7 / 90.7 | 77.8 / 77.8 | 85.0 / 83.8 |
| | I | – | 79.3 / 89.9 | 0.0 / 0.0 | 0.0 / 0.0 | 0.0 / 0.0 |
| | P | – | 82.0 / 67.8 | 33.3 / 50.0 | 50.0 / 50.0 | 40.0 / 50.0 |
| | R | – | 56.9 / 73.4 | 0.0 / 0.0 | 0.0 / 0.0 | 0.0 / 0.0 |
| | N | – | 99.0 / 98.1 | 100.0 / 100.0 | 90.0 / 72.7 | 94.7 / 84.2 |
| | O | – | 94.2 / 75.4 | 94.1 / 62.5 | 80.0 / 55.6 | 86.5 / 58.8 |
| | Wt. Ave | 78.3 / 74.1 | 81.8 / 83.5 | 78.3 / 74.1 | 66.5 / 62.7 | 71.8 / 67.9 |
| Naïve Bayes | A | – | 85.6 / 84.1 | 89.0 / 83.3 | 77.9 / 75.0 | 83.1 / 78.9 |
| | I | – | 83.4 / 87.6 | 6.7 / 0.0 | 9.1 / 0.0 | 7.7 / 0.0 |
| | P | – | 95.2 / 98.8 | 16.7 / 0.0 | 50.0 / 0.0 | 25.0 / 0.0 |
| | R | – | 85.1 / 80.5 | 13.3 / 0.0 | 40.0 / 0.0 | 20.0 / 0.0 |
| | N | – | 99.4 / 100.0 | 94.4 / 100.0 | 89.5 / 72.7 | 91.9 / 84.2 |
| | O | – | 97.8 / 86.4 | 70.6 / 50.0 | 75.0 / 50.0 | 72.7 / 50.0 |
| | Wt. Ave | 73.7 / 67.1 | 88.0 / 86.1 | 73.7 / 67.1 | 69.8 / 59.2 | 70.7 / 62.8 |
| Support Vector Machine | A | – | 81.7 / 76.0 | 92.9 / 87.0 | 84.9 / 81.0 | 88.7 / 83.9 |
| | I | – | 85.6 / 81.0 | 13.3 / 0.0 | 100.0 / 0.0 | 23.5 / 0.0 |
| | P | – | 99.3 / 99.1 | 83.3 / 50.0 | 71.4 / 50.0 | 76.9 / 50.0 |
| | R | – | 87.9 / 76.8 | 46.7 / 28.6 | 70.0 / 50.0 | 56.0 / 36.4 |
| | N | – | 99.7 / 99.1 | 100.0 / 100.0 | 90.0 / 72.7 | 94.7 / 84.2 |
| | O | – | 99.2 / 78.0 | 100.0 / 75.0 | 85.0 / 60.0 | 91.9 / 66.7 |
| | Wt. Ave | 84.3 / 75.3 | 86.1 / 79.3 | 84.3 / 75.3 | 85.0 / 69.3 | 81.8 / 71.7 |

Discussion and Future Work

1. Annotation Study:

We introduced an annotation schema for clinical information extraction of events for generating an accurate problem mention status. We learned that agreement for annotating condition attributes ranges from moderate to high. Annotators had some difficulty identifying other clinical named entities and their relationships, suffering from low to moderate recall, and had moderate agreement on the attributes of these other named entities. This observation is not surprising, as the literature shows agreement suffers with more than two categories, especially for less prevalent categories24. Indeed, a study of the CLEF schema reports moderate F1 scores in the 60s for entity and relationship annotations from clinical narratives7. We plan to increase training, apply NLP system pre-annotations, and use eHOST’s Oracle function to improve recall and classification of NEs in future studies25.

2. Problem Mention Status Study:

From our feature selection study, we learned that attributes like Experiencer, Existence, and Certainty are consistently more informative than other attributes for predicting mention status across classifiers. From our classification study, we observed that classifiers (Naïve Bayes and Support Vector Machine) that use rarely occurring attributes like Change, Mental State, and Intermittency perform better than a classifier (Decision Tree) that does not use them. We suspect our classifiers performed poorly predicting the Inactive, Resolved, and Proposed status labels due to subtle differences in definition between status labels (Inactive and Resolved) and few instances in the dataset. In comparable studies, such as the i2b2 assertion classification task, other researchers have demonstrated that adding lexical, syntactic, section, and other semantic annotations can boost performance26. We plan to add such annotations, including Unified Medical Language System concept unique identifiers and discourse annotations, to our schema before applying our experiments to the SHARP and ShARe corpora. We will expand our study to other report types such as discharge summaries, radiology, electrocardiogram, echocardiogram, and progress notes. We are actively annotating document-level problem annotations and their statuses for these corpora.

Acknowledgments

We thank the NLM and NIH for funding this study through grants 5T15LM007059 and R01LM009427, and our anonymous reviewers and annotators, including but not limited to Mike Conway, Son Doan, and Mindy Ross.

References

  • 1. Tange HJ, Hasman A, de Vries Robbé PF, Schouten HC. Medical narratives in electronic medical records. Int J Med Inform. 1997;46(1):7–29. doi: 10.1016/s1386-5056(97)00048-8.
  • 2. Hua X, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc. 2009;17(1):19–24. doi: 10.1197/jamia.M3378.
  • 3. Sohn S, Murphy S, Masanz J, Kocher J, Savova G. Classification of Medication Status Change in Clinical Narratives. AMIA Annu Symp Proc. 2010:762–7.
  • 4. Aramaki E, Miura Y, Tonoike M, Ohkuma T, Masuichi H, Waki K, et al. Extraction of Adverse Drug Effects from Clinical Records. IOS Press: IMIA and SAHIA; 2010.
  • 5. Meystre S, Haug P. Automation of a Problem List using Natural Language Processing. BMC Medical Informatics and Decision Making. 2005;5(30). doi: 10.1186/1472-6947-5-30.
  • 6. Solti I, Aaronson B, Fletcher G, Solti M, Gennari J, Cooper M, Payne T. Building an Automated Problem List based on Natural Language Processing: Lessons Learned in the Early Phase of Development. 2008:687–691.
  • 7. Roberts A, Gaizauskas R, Hepple M, Davis N, Demetriou G, Guo Y, et al. The CLEF Corpus: Semantic Annotation of Clinical Text. AMIA Annu Symp Proc. 2007:625–9.
  • 8. Uzuner O, South B, Shen S, DuVall S. 2010 i2b2/VA Challenge on Concepts, Assertions, and Relations in Clinical Text. J Am Med Inform Assoc. 2011;18:552–6. doi: 10.1136/amiajnl-2011-000203.
  • 9. Saurí R, Littman J, Knippen B, Gaizauskas R, Setzer A, Pustejovsky J. TimeML Annotation Guidelines Version 1.2.1. 2006. Available from: http://www.timeml.org/site/publications/timeMLdocs/annguide_1.2.1.pdf.
  • 10. Savova G, Bethard S, Styler W, Martin J, Palmer M, Masanz J, Ward W. Towards Temporal Relation Discovery from the Clinical Narrative. AMIA Annu Symp Proc. 2009:568–572.
  • 11. Wu ST, Kaggal VC, Dligach D, Masanz JJ, Chen P, Becker L, Chapman WW, Savova GK, Liu H, Chute CG. A common type system for clinical natural language processing. J Biomed Semantics. 2013;4(1). doi: 10.1186/2041-1480-4-1.
  • 12. Elhadad N, Chapman WW, O’Gorman T, Palmer M, Savova G. The ShARe Schema for the Syntactic and Semantic Annotation of Clinical Texts. Under review.
  • 13. Chapman WW, Dowling JN, Wagner MM. Generating a Reliable Reference Standard Set for Syndromic Case Classification. J Am Med Inform Assoc. 2005;12:618–29. doi: 10.1197/jamia.M1841.
  • 14. Chapman W, Dowling J. Inductive creation of an annotation schema for manually indexing clinical conditions from emergency department reports. J Biomed Inform. 2006;39(2):196–208. doi: 10.1016/j.jbi.2005.06.004.
  • 15. Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. J Biomed Inform. 2006;39(4):424–39. doi: 10.1016/j.jbi.2005.07.002.
  • 16. Irvine A, Haas S, Sullivan T. TN-TIES: A System for Extracting Temporal Information from Emergency Department Triage Notes. AMIA Annu Symp Proc. 2008:328–32.
  • 17. Allen J. Towards a General Theory of Action and Time. Artif Intell. 1984;23(2):123–54.
  • 18. Chapman WW, Chu D, Dowling JN. ConText: An Algorithm for Identifying Contextual Features from Clinical Text. Association for Computational Linguistics; Prague, Czech Republic: 2007.
  • 19. Mowery D, Harkema H, Chapman WW. Temporal Annotation of Clinical Text. BioNLP Workshop 2008: Current Trends in Biomedical Natural Language Processing; Columbus, OH: 2008. pp. 1–2.
  • 20. Mowery D, Harkema H, Dowling JN, Lustgarten JL, Chapman WW. Distinguishing Historical from Current Problems in Clinical Reports – Which Textual Features Help? BioNLP. 2009:10–18.
  • 21. Ogren P. Knowtator: a Protégé plug-in for annotated corpus construction. Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology; New York, New York: Association for Computational Linguistics; 2006. pp. 273–5.
  • 22. Hripcsak G, Rothschild AS. Agreement, the F-Measure, and Reliability in Information Retrieval. J Am Med Inform Assoc. 2005;12:296–8. doi: 10.1197/jamia.M1733.
  • 23. Artstein R, Poesio M. Inter-coder Agreement for Computational Linguistics. Comp Ling. 2008;34(4):555–96.
  • 24. Poesio M, Vieira R. A corpus-based investigation of definite description use. Comp Ling. 1998;24(2):183–216.
  • 25. South BR, Shuying S, Leng J, Forbush TB, DuVall SL, Chapman WW. A prototype tool set to support machine-assisted annotation. BioNLP. 2012:130–139.
  • 26. Clark C, Aberdeen J, Coarr, Tresner-Kirsch D, Wellner B, Yeh A, Hirschman L. MITRE system for clinical assertion status classification. J Am Med Inform Assoc. 2011;18:563–567. doi: 10.1136/amiajnl-2011-000164.
