Abstract
Patients report their symptoms and subjective experiences in their own words. These expressions may be clinically meaningful yet are difficult to capture using automated methods. We annotated subjective symptom expressions in 750 clinical notes from the Veterans Affairs EHR. Within each document, subjective symptom expressions were compared to mentions of symptoms in clinical terms and to the assigned ICD-9-CM codes for the encounter. A total of 543 subjective symptom expressions were identified, of which 66.5% were categorized as mental/behavioral experiences and 33.5% as somatic experiences. Only two subjective expressions were coded using ICD-9-CM. Subjective expressions were restated in semantically related clinical terms in 297 (54.7%) instances. Nearly one third (31%) of subjective expressions were not coded or restated in standard terminology. The results highlight the diversity of symptom descriptions and the opportunities to further develop natural language processing to extract symptom expressions that are unobtainable by other automated methods.
Introduction
Symptoms are subjective experiences that are reported by patients to their providers. From the clinical perspective, the rule of thumb is that a symptom is what the patient complains of whereas a sign is what the clinician observes. Some findings, such as rash or fever, may be reported by the patient and also directly observed by the clinician.
Elicitation and assessment of symptoms is a key aspect of the patient-clinician interaction. Symptoms play a critical role in phenotypic classification, clinical diagnosis, disease detection, therapeutic decision-making, assessment of disease severity, and evaluation of treatment response. The challenge for epidemiologists, health services researchers, and others involved in health management is that the accessibility of symptoms for population analysis is severely limited. The most common technique to extract symptoms and other clinical details from patient records is manual chart review, which is expensive, time-consuming, and not scalable. Alternative methods are also problematic. Although diagnosis codes are sometimes used to study symptoms, they are not designed for this purpose, as they only code symptoms when an appropriate diagnosis is not documented. Patient surveys are used in selected circumstances but are also expensive and circumscribed in scope.
Natural language processing offers the potential to efficiently and accurately extract reports of symptoms from electronic records [1–4]. However, the nature of the language used to describe symptoms in clinical notes has received little scrutiny. Figures of speech and idioms are parts of everyday speech. When patients use these expressions to report symptoms, the clinician is faced with the decision of whether to translate the lay descriptions into clinical jargon or to record both the patient’s expressions and the medical term analogue. For instance, “everything is spinning around” might be converted into “experiencing vertigo” and “burning up” may be rendered as “fever”.
Clinical terms are succinct representations of the clinical reasoning process with diagnostic connotations recognized by other clinicians. However, key issues may be ‘lost in translation’. In contrast to clinical terms, subjective descriptions provide details of the patient’s experience. The information is more individualized and often conveys the nature and severity of the symptom and its impact on the patient. An improved understanding of the expressions used to characterize symptoms is useful for the development of natural language processing tools. Capturing subjective descriptions may facilitate implementation of cognitive support systems, improve phenotypic classification, and advance personalized health care.
Our goals in this analysis were to characterize the prevalence of subjective symptom expressions in different types of notes, to classify them into semantic groups, and to compare them with conventional symptom assertions grounded in clinical terminologies. In addition, for each sampled document (and corresponding visit), the assigned ICD-9-CM diagnosis codes were examined to determine the presence of a condition which plausibly included the subjective symptom expression as one of its known manifestations.
Methods
Symptoms were operationally defined as anything that plausibly represented the patient’s subjective experience. For example, in the statement “patient has a cough”, cough would be annotated as a symptom even though it is unknown whether the cough was directly observed by the clinician or solely reported by the patient. We defined subjective symptom expressions as phrases that entirely or partially capture the voice of the patient when describing symptoms, including figures of speech, idioms, or lay terms. Words and phrases that reflected common medical usage were considered symptom terms.
Patients in this study were veterans who received care at a Department of Veterans Affairs clinic or hospital between 1/1/2009 and 12/31/2009. All text notes for this cohort available in the VA Corporate Data Warehouse (CDW) were obtained, including both inpatient and outpatient encounters from any specialty. In total, 750 documents were randomly selected for review and annotation. Notes were grouped by their note titles into either “Primary Care / Specialist” or “Mental Health / Social Work.” The “Primary Care / Specialist” group includes notes from primary care clinics, specialty clinics, physical and occupational therapy, and inpatient stays in medical units. The “Mental Health / Social Work” grouping includes notes from inpatient and outpatient psychiatry, psychology, social work, and case management.
We conducted two distinct reviews by health care providers from our research team. The first was an annotation effort to locate symptoms and subjective symptom expressions. The second was a review of the resulting annotations to classify them into symptom types, determine if the subjective symptom descriptions were coded using ICD-9-CM, and determine if the provider mentioned the symptom using words that could be mapped to a standard terminology.
Four health care providers with annotation experience were recruited and used their clinical experience and judgment to distinguish between standard symptoms and subjective symptom expressions. Reviewers were trained using 50 documents set aside for that purpose until an Inter-Annotator Agreement (IAA) above 80% was consistently achieved for subjective symptom expressions, which had the lowest IAA due to their subjective nature. The 750 documents were randomly allocated into batches of 20, and each batch was randomly assigned to 2 independent reviewers. Any disagreement between the two reviewers was then adjudicated by a third reviewer to create the reference standard.
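The allocation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reviewer labels, the fixed seed, and the uniform random choice among reviewer pairs are hypothetical, and the source does not describe the actual randomization procedure or how the adjudicating reviewer was selected.

```python
import random
from itertools import combinations


def allocate_batches(doc_ids, reviewers, batch_size=20, seed=0):
    """Shuffle documents, cut them into batches, and assign each batch
    to two distinct reviewers (assumed: pairs drawn uniformly at random)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    batches = [shuffled[i:i + batch_size]
               for i in range(0, len(shuffled), batch_size)]
    pairs = list(combinations(reviewers, 2))  # every possible pair of reviewers
    return [(batch, rng.choice(pairs)) for batch in batches]


# Hypothetical reviewer labels; 750 documents as in the study.
assignments = allocate_batches(range(750), ["R1", "R2", "R3", "R4"])
```

With 750 documents and batches of 20, this produces 38 batches, the last of which holds the remaining 10 documents.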
The subjective symptom expressions identified by reviewers were grouped into categories and sub-categories based on the nature of the description. Somatic experiences were defined as expressions that referred to a body part, substance, function, or sensation. Representations of altered function like general malaise (“I feel sick all over”) were classified as somatic experiences. Body sensations were considered to be perceptions. Altered or unpleasant sensations such as pain or numbness constituted another type of symptom within the somatic category. Since all symptoms are subjective, the distinction between an experience of the body and an experience of the mind was made on the basis of its representation in the language artifacts.
Mental/behavioral experiences were defined as descriptions that referred to cognitive processes, emotions, interpersonal processes, and dysfunctional behaviors. Cognitive processes included references to concentration and memory. Emotions included references to mood, such as sadness, and states of mind, such as nervousness or anxiety. Dysfunctional behaviors encompassed references to substance abuse and compulsions. Thoughts of hurting someone were grouped together into a specific sub-category. When the expression suggested the presence of a perception or belief that was untrue (e.g., “the UN is plotting to get me”), it was classified as a false belief.
Explicitly stated information was used to make the appropriate classification. Inferences about mental health states were not drawn unless overtly mentioned. An expression such as “was awake all night” was a somatic experience of loss of sleep while “was awake all night guarding my house” was a mental/behavioral experience because it directly refers to the emotional state of hyper-arousal.
Each subjective symptom expression was compared to other symptom terms in the same document to assess semantic relatedness and determine if the author had stated the subjective expression in common clinical terms. For example, the subjective symptom “feels like someone hitting my head with a hammer” would be linked to a mention of “headache” in the document. Each subjective symptom expression was also compared to the ICD-9-CM codes assigned to the encounter associated with the clinical note. When a code denoted a symptom with a meaning similar to the subjective expression, the instance was considered “explicitly coded”. For example, “feels like a vise on my head” would be considered explicitly coded if the encounter had a code for headache. If the expression was a plausible manifestation of conditions represented by the diagnosis codes, the instance was considered “contained by”. For example, “I’m not good for anything anymore” would be considered a plausible manifestation of Depressive Disorder.
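The comparison scheme above amounts to recording two independent reviewer judgments per expression: one against the ICD-9-CM codes and one against symptom terms in the same note. A minimal sketch of that bookkeeping follows. The class name, field names, and label strings are hypothetical, the flag values in the examples are illustrative rather than taken from the study data, and the judgments themselves were made by human reviewers, not by code.

```python
from dataclasses import dataclass


@dataclass
class ReviewedExpression:
    text: str
    restated_in_st: bool     # reviewer linked it to a symptom term in the same note
    explicitly_coded: bool   # an encounter code denotes a symptom of similar meaning
    contained_by_code: bool  # a coded diagnosis plausibly includes it as a manifestation


def coverage_label(expr):
    """Return (ICD-9-CM status, symptom-term status) for one reviewed expression."""
    if expr.explicitly_coded:
        icd_status = "explicitly coded"
    elif expr.contained_by_code:
        icd_status = "contained by"
    else:
        icd_status = "not coded"
    st_status = "restated" if expr.restated_in_st else "not restated"
    return icd_status, st_status


# Illustrative flag values only (expressions quoted from the Methods text).
vise = ReviewedExpression("feels like a vise on my head", True, True, False)
worth = ReviewedExpression("I'm not good for anything anymore", False, False, True)
print(coverage_label(vise))
print(coverage_label(worth))
```

Checking the explicit code first reflects the ordering implied in the text, where an explicit symptom code is the stronger form of coverage.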
Results
750 clinical notes were reviewed and annotated. The documents represented 750 Veterans, 89% male (n=664), average age 48.6 (range: 22–97), 45% married (n=337), 54% OEF/OIF (n=400), 16% Vietnam (n=122). These notes were grouped into Mental Health / Social Work Notes (23%) and Primary / Specialty Care Notes (77%). There were 5031 (90.3%) symptom terms and 543 (9.7%) subjective symptom expressions. Only 170 notes contained subjective symptom expressions. The prevalence of subjective expressions and symptom terms was higher in Mental Health / Social Work notes than in Primary / Specialty Care notes.
We classified the 543 subjective symptom expressions into 2 major semantic categories and 8 sub-categories. 361 (66%) of the subjective expressions were classified in the semantic group of cognitive / mental / behavioral symptoms. The remaining 182 (34%) were classified as somatic symptoms. In Mental Health / Social Work notes containing subjective symptoms, most (78%) were in the cognitive / mental / behavioral category. The proportion of somatic subjective expressions was higher in Primary / Specialty care notes (41%).
Expressions of emotional distress were the most frequent. They accounted for almost half (44%) of the subjective expressions in Mental Health / Social Work notes and 26% of those in Primary / Specialty care notes.
In general, subjective expressions of somatic nature were less often restated (41%) in symptom terms compared to those in the cognitive category (62%). Well-restated categories include cognitive dysfunction (78%), harm to self/others (69%) and emotional distress (60%).
We also examined how well subjective expressions in each category were contained by ICD-9-CM codes. As with restatement in symptom terms, a higher proportion of cognitive subjective symptoms (37%) than somatic symptoms (31%) were contained by ICD-9. Harm to self/others, the second most frequently restated category, was one of the categories least frequently contained by ICD-9 codes. Only 2 symptom expressions, both describing headache, were explicitly coded.
The challenge in obtaining clinical information from the symptom expressions is further illustrated in Table 4. If ICD-9 alone were used, only 36% of expressions would be represented, either explicitly or as a plausible manifestation of a coded diagnosis. Symptom terms offered better coverage, but almost half (45%) would still be missed. Even when both ICD-9 codes and symptom terms restated by physicians were used, 31% of subjective symptom expressions were not captured.
Table 4.
Distribution of Subjective Symptom Expressions (SSE) contained by ICD-9 or restated.
| SSE Contained by ICD-9 | SSE Restated in ST: Yes | SSE Restated in ST: No | Total |
|---|---|---|---|
| Yes | 114 (21%) | 79 (15%) | 193 (36%) |
| No | 183 (34%) | 167 (31%) | 350 (64%) |
| Total | 297 (55%) | 246 (45%) | 543 |
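The coverage figures quoted in the text follow directly from the four cell counts in Table 4. The short Python sketch below reproduces that arithmetic; the dictionary keys and helper name are illustrative choices, and only the counts come from the table.

```python
# Cell counts from Table 4: (contained by ICD-9?, restated in symptom terms?)
counts = {
    ("icd_yes", "st_yes"): 114,
    ("icd_yes", "st_no"): 79,
    ("icd_no", "st_yes"): 183,
    ("icd_no", "st_no"): 167,
}

total = sum(counts.values())
icd_covered = sum(n for (icd, _), n in counts.items() if icd == "icd_yes")
st_covered = sum(n for (_, st), n in counts.items() if st == "st_yes")
neither = counts[("icd_no", "st_no")]


def pct(n):
    """Share of all subjective symptom expressions, rounded to a whole percent."""
    return round(100 * n / total)


print(f"ICD-9 alone covers {icd_covered} of {total} ({pct(icd_covered)}%)")
print(f"Symptom terms alone cover {st_covered} ({pct(st_covered)}%), "
      f"missing {total - st_covered} ({pct(total - st_covered)}%)")
print(f"Captured by neither source: {neither} ({pct(neither)}%)")
```

The marginal totals (193, 297) and the uncaptured share (167 of 543) match the 36%, 55%, and 31% reported in the text.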
Discussion
To obtain the symptoms represented by subjective symptom expressions without extensive human review, one could use coded data or NLP. Only 2 symptom expressions of headache were explicitly coded, and we found that only 35% of the subjective symptom expressions were plausible manifestations of the coded diagnoses. It is important to note that these manifestations cannot always be inferred from the diagnosis and lack specificity, which makes this approach less than desirable. With the diagnosis code of “sleep disorder”, one cannot infer the patient’s actual complaint of “my mind races all night”, nor what type of sleep disorder that patient has. A limitation of this study was that 176 of the 750 documents had no coding for the encounter. In some settings, the coding may be more complete and give better results. Because the VA currently uses ICD-9-CM, we could not determine how the use of ICD-10 will influence coding of symptoms in the future.
Using standardized terminologies in information extraction systems holds more promise. Approximately 55% of the subjective symptom expressions were restated in standard medical terms in the note. If an NLP system were able to correctly map and classify all symptom terms in the notes, 55% of the symptoms described in subjective expressions could be extracted from the note, albeit with less detail, leaving 45% of the subjective symptom expressions unobtainable. The ability of mapping systems to correctly identify and label these mentions varies by system and is another factor to consider when using terminologies. The document corpus and annotated data generated by this study are restricted to VA research personnel in accordance with federal law, but the lexicon of subjective symptom expressions may be made available to aid future research in this important area.
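As a rough illustration of how a lexicon of subjective symptom expressions might be used, the Python sketch below performs normalized exact-phrase lookup. It is not the method used in this study: the mini-lexicon is hypothetical (its pairs are borrowed from the examples in Table 3a), and the function names and normalization rules are assumptions.

```python
import re

# Hypothetical mini-lexicon: expression -> symptom term (pairs from Table 3a examples).
LEXICON = {
    "always forgetting where i put things": "Memory Problems",
    "my legs are puffed up like marshmallow": "BLE Edema",
    "i'm good for nothing anymore": "Depressed mood",
}


def normalize(text):
    """Lowercase, straighten apostrophes, and collapse runs of whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def find_expressions(note):
    """Return (expression, symptom term) pairs whose phrase occurs in the note."""
    norm = normalize(note)
    return [(phrase, term) for phrase, term in LEXICON.items() if phrase in norm]


print(find_expressions("Pt states: My legs are  puffed up like marshmallow."))
print(find_expressions("Pt states his legs are swollen like balloons."))
```

The second call returns an empty list, which shows the limitation of exact matching: a paraphrase of the same complaint is missed, consistent with the diversity of language discussed below.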
The subjective symptom expressions were diverse in language and content, and often took the form of examples that built a narrative of the symptom. A patient may say “I always size up every guy for weapons”, an example of hyper-vigilance, a term the patient may not know or accept. The diversity of language likely also stems from the complex nature of these symptoms. Mental and emotional experiences in particular are difficult to describe and often manifest in different ways, making them hard to communicate without providing examples.
The expressions often contained elements of symptom severity. “It feels like I’m being stabbed in the chest by a hot knife” clearly conveys that the chest pain is severe, while the phrase “my knee doesn’t quite feel right” conveys a lower-severity joint symptom. Future work could utilize the severity expressed in these phrases to enhance symptom understanding. Finally, the expressions communicated the symptom’s impact on the patient and often had implications for the patient’s mental and physical functional state. Phrases like “I never make it to the bathroom on time” or “I can’t leave my house because crowds make me freak out” contain important information about the patient’s ability to perform basic life functions. Thus, subjective symptom expressions may be useful in gauging and monitoring quality of life measures.
Conclusion
Symptoms expressed by the patient represent an important aspect of the clinical encounter; however, using symptoms in retrospective research is challenging. When manual chart review is not feasible, one is left with coded data and current NLP techniques. Analysis showed that 31% of the subjective symptom expressions were neither coded to any degree nor restated in symptom terms that could be mapped to a standard terminology, highlighting the need for information extraction systems that are designed and trained to capture these symptoms. Even when coding or extraction of restated symptom terms could identify the symptom concept, much information is left locked in the notes. Subjective symptom expressions contained a depth of information about the nature and severity of the symptom and of the patient experience. Because these expressions are rich and complex, natural language processing techniques must be adapted to capture them in order to generate a true patient “phenotype”.
Table 1.
Prevalence of subjective symptom expressions (SSE) and symptoms terms (ST) by note type.
| Note Type | Total Notes | Count of Notes with SSE (%) | SSE Count | Mean SSE per Note (range) | ST Count | Mean ST per Note (range) |
|---|---|---|---|---|---|---|
| Mental/Social | 171 | 65 (38.24%) | 213 | 1.25 (0–16) | 1486 | 8.74 (0–67) |
| Primary/Specialty | 579 | 105 (61.76%) | 330 | 0.57 (0–35) | 3545 | 6.14 (0–69) |
| Total | 750 | 170 | 543 | | 5031 | |
Table 2.
Classification of subjective symptom expressions (SSE) by note type.
| SSE Category | Mental Health / Social Work Notes: Count of SSE | Mental Health / Social Work Notes: % of SSE | Primary / Specialty Care Notes: Count of SSE | Primary / Specialty Care Notes: % of SSE |
|---|---|---|---|---|
| Cognitive / Mental / Behavioral | 167 | 78.40% | 194 | 58.79% |
| Cognitive Dysfunction | 15 | 7.04% | 34 | 10.30% |
| False Perceptions/beliefs | 8 | 3.76% | 10 | 3.03% |
| Emotional Distress | 94 | 44.13% | 87 | 26.36% |
| Harm to self/others | 13 | 6.10% | 23 | 6.97% |
| Behavioral Dysfunction | 37 | 17.37% | 40 | 12.12% |
| Somatic | 46 | 21.60% | 136 | 41.21% |
| Loss of Somatic Function | 35 | 16.43% | 52 | 15.76% |
| Unpleasant or Altered Sensation | 5 | 2.35% | 47 | 14.24% |
| Somatic, non-sensory, not functional | 6 | 2.82% | 37 | 11.21% |
Table 3a.
Characteristics of subjective symptom expressions (SSE) restated in corresponding symptom terms (ST).
| Category | Count of SSE | Count of Restated SSEs (%) | Example of SSE | Example of Corresponding ST | ICD-9 Contained (%) |
|---|---|---|---|---|---|
| Mental / Behavioral | 361 | 222 (61.50%) | | | 135 (37.40%) |
| Cognitive Dysfunction | 49 | 38 (77.55%) | Always forgetting where I put things | Memory Problems | 16 (32.65%) |
| False Perceptions/beliefs | 18 | 9 (50.00%) | Talks often with Abraham Lincoln | Hallucinations | 0 (0.00%) |
| Emotional Distress | 181 | 109 (60.22%) | I'm good for nothing anymore | Depressed mood | 90 (49.72%) |
| Harm to self/others | 36 | 25 (69.44%) | I sometimes put a gun to my head | Suicidal | 7 (19.44%) |
| Behavioral Dysfunction | 77 | 41 (53.25%) | I never leave my room so I don't have to talk to people | Social Isolation | 22 (28.57%) |
| Somatic | 182 | 75 (41.21%) | | | 57 (31.32%) |
| Loss of Somatic Function | 87 | 44 (50.57%) | I can't even get from my bed to the kitchen | Inability to ambulate | 33 (37.93%) |
| Unpleasant or Altered Sensation | 52 | 22 (42.31%) | My head feels like it was hit by a sledgehammer | Severe Headache | 13 (25.00%) |
| Somatic, not sensory/functional | 43 | 9 (20.93%) | My legs are puffed up like marshmallow | BLE Edema | 11 (25.58%) |
Acknowledgments
This work was supported by the Department of Veterans Affairs, Office of Research and Development, Health Services Research and Development, ProWATCH: Epidemiology of Medically Unexplained Syndromes project # HIR 10-001.
References
- 1. Matheny ME, Fitzhenry F, Speroff T, Green JK, Griffith ML, Vasilevskis EE, et al. Detection of infectious symptoms from VA emergency department and primary care clinical documentation. Int J Med Inform. 2012 Mar;81(3):143–56. doi: 10.1016/j.ijmedinf.2011.11.005.
- 2. Jaszuk MS, Grażyna, Walczak, Andrzej, Puzio Leszek. Building a Model of Disease Symptoms Using Text Processing and Learning from Examples. Federated Conference on Computer Science and Information Systems; 2011.
- 3. Sotelsek-Margalef AV-R, Julio MIDAS. An Information-Extraction Approach to Medical Text Classification. Procesamiento del lenguaje Natural. 2008;(41):97–104.
- 4. Uzuner O, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc. 2011;18(5):552–6. doi: 10.1136/amiajnl-2011-000203.
- 5. Zeng QT, Tse T. Exploring and developing consumer health vocabularies. J Am Med Inform Assoc. 2006 Jan-Feb;13(1):24–9. doi: 10.1197/jamia.M1761.
- 6. Tse T, Soergel D. Exploring medical expressions used by consumers and the media: an emerging view of consumer health vocabularies. AMIA Annu Symp Proc. 2003:674–8.
- 7. Pakhomov SV, et al. Agreement between patient-reported symptoms and their documentation in the medical record. Am J Manag Care. 2008;14(8):530–9.
