Kidney360. 2025 Jan 29;6(5):776–783. doi: 10.34067/KID.0000000694

Natural Language Processing Identifies Underdocumentation of Symptoms in Patients on Hemodialysis

Yang Dai 1, Huei Hsun Wen 2, Joanna Yang 1, Neepa Gupta 3, Connie Rhee 4,5,6, Carol R Horowitz 7, Dinushika Mohottige 2,7, Girish N Nadkarni 1,2,8, Steven Coca 2, Lili Chan 1,2,8
PMCID: PMC12136641  PMID: 39879098

Abstract

Key Points

  • Natural language processing can be used to identify patient symptoms from the electronic health records with good performance when compared with manual chart review.

  • Natural language processing–extracted patient symptom burden does not reflect patient burden due to under-recognition and underdocumentation by health care professionals.

Background

Patients on hemodialysis have a high burden of emotional and physical symptoms. These symptoms are often under-recognized. Natural language processing (NLP) can be used to identify patient symptoms from electronic health records (EHRs). However, whether symptom documentation matches patient-reported burden is unclear.

Methods

We conducted a prospective study of patients seen at an ambulatory nephrology practice from September 2020 to April 2021. We collected symptom surveys from patients, nurses, and physicians. We then developed an NLP algorithm to identify symptoms from the patients' EHRs and validated its performance using manual chart review and patient surveys as reference standards. Using patient surveys as the reference standard, we compared symptom identification by (1) physicians, (2) nurses, (3) physicians or nurses, and (4) NLP.

Results

We enrolled 97 patients in our study; 53% were female, 49% were non-Hispanic Black, and 41% were Hispanic. The most common symptoms reported by patients were fatigue (61%), cramping (59%), dry skin (53%), muscle soreness (43%), and itching (41%). Physicians and nurses significantly under-recognized patients' symptoms (sensitivity 0.51 [95% confidence interval (CI), 0.40 to 0.61] and 0.63 [95% CI, 0.52 to 0.72], respectively). Nurses were better at identifying symptoms when patients reported more severe symptoms. There was no difference in results by patients' sex or ethnicity. NLP had a sensitivity of 0.92, specificity of 0.95, positive predictive value of 0.75, and negative predictive value of 0.99 with manual EHR review as the reference standard, and a sensitivity of 0.58 (95% CI, 0.47 to 0.68), specificity of 0.73 (95% CI, 0.48 to 0.89), positive predictive value of 0.92 (95% CI, 0.82 to 0.97), and negative predictive value of 0.24 (95% CI, 0.14 to 0.38) with patient surveys as the reference standard.

Conclusions

Although patients on hemodialysis report a high prevalence of symptoms, symptoms are under-recognized and underdocumented. NLP was accurate at identifying symptoms when they were documented. Larger studies in representative populations are needed to assess the generalizability of these results.

Keywords: dialysis, ESKD, hemodialysis, patient self-assessment, artificial intelligence, biostatistics

Visual Abstract


Introduction

In the United States, there are over 800,000 patients with kidney failure. Most of these patients are treated with in-center hemodialysis.1 Patients on hemodialysis have a high symptom burden, with over 50% of patients experiencing hemodialysis-associated symptoms, such as fatigue, itching, and cramping.2–4 Patient-reported symptoms, including depression, are associated with quality of life, morbidity, and mortality.3,5–7 A recent study identified several symptoms, including fatigue and pain, as high-priority outcomes for patients, caregivers, and health care professionals.8 Unfortunately, although patients on in-center hemodialysis see a dialysis nurse/technician at every hemodialysis treatment and a physician or advanced practice provider nearly weekly, these highly prevalent hemodialysis-associated symptoms are under-recognized and undertreated.5,9,10

There are multiple barriers to symptom recognition, including poor communication between patients and health care professionals, limited time in the fast-paced hemodialysis unit, and the lack of standardized symptom assessment.11 In one study, Weisbord et al. reported that renal health care professionals' sensitivity for identifying symptoms in patients on hemodialysis was <50%, with a positive predictive value (PPV) of <75%, for 25 of 30 symptoms.5 This under-recognition contributes to the undertreatment of symptoms, with only 58% of patients with pain receiving a prescription for an analgesic.10 Increasing recognition has been demonstrated to improve patient symptom scores, particularly for pain and itching.9

Health care professional documentation of symptoms in the electronic health record (EHR) increases awareness of symptoms across the different health care professionals. Patients on hemodialysis have frequent health care encounters, resulting in a large number of progress notes in the EHR. However, because symptoms are often documented in free text, identifying them has previously required laborious manual chart review. Identifying symptoms from documentation may facilitate intervention planning for these symptoms. EHR-based symptom reporting is also sometimes essential for linking patient-specific items to diagnostic codes and subsequent payment for therapies (e.g., treatments for refractory pruritus). Natural language processing (NLP)12 can extract information from text in an automated and structured manner that is more efficient than manual note review. NLP may also reduce the burden on patients of regularly completing patient-reported outcome measure (PROM) surveys and bypass known patient-level challenges of PROM-focused surveys, including health literacy and physical limitations.

Previous studies, however, have reported mixed results for the efficacy of NLP on EHRs for detecting patient symptoms. In our prior work, we found that NLP applied to EHRs had higher sensitivity than International Classification of Diseases codes for symptom identification in a cohort of patients on hemodialysis, using manual chart review for validation.13 However, patients' self-reported symptoms may not match what is documented in EHRs,14,15 so EHRs may not effectively capture patient symptoms. To our knowledge, no prior studies have evaluated the concordance between symptoms reported by patients and those reported by physicians and nurses or recorded in progress notes for patients on maintenance in-center hemodialysis. Therefore, we aimed to evaluate the performance of an NLP algorithm for identifying patient symptoms and to assess how nursing and physician recognition and documentation may affect the utility of NLP for this purpose.

Methods

Study Population

We conducted a prospective study at an outpatient hemodialysis unit in New York City from September 14, 2020, to April 1, 2021. We included patients if they were older than 18 years, had been on in-center hemodialysis for more than 30 days, and received hemodialysis three times a week. We included only patients who could provide informed consent and were able to answer surveys without assistance (as determined by the patient's treating physician or nurse) in either English or Spanish, as previously described.16 This study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki. All patients provided informed consent, and the Mount Sinai Institutional Review Board approved this study under protocol number STUDY-19-00468.

Measures

We surveyed patients during the last 15 minutes of their dialysis treatments, to ensure that we captured any intradialytic symptoms and events, over a 4-week period, for a total of 12 surveys per patient. Patients who were hospitalized after enrollment were included if they were able to complete the remaining surveys after hospital discharge. These patients were approached 2 weeks after hospital discharge on the day of the week that the survey was due (e.g., if the last survey was completed on a Tuesday before hospitalization, the survey was resumed on a Thursday). The patients' nurses were also surveyed at the end of every hemodialysis treatment over the 4-week period, for a total of 12 surveys, and their physicians were surveyed once during week 3 of the study period. We chose these survey timings to closely follow the timing of health care professional documentation.

We based our survey on the validated and widely used Dialysis Symptom Index, which asks patients to report the presence and severity of 30 different symptoms on a five-point Likert scale (in which a response of “0” equates to “no” and a response of “4” equates to “yes: very much”).17 Dialysis Symptom Index scores range from 0 to 120, with higher scores indicating higher overall symptom severity.18 We limited the symptoms surveyed to those that would fluctuate between dialysis treatments. We asked patients whether, over the past 24 hours, they had experienced any of the following symptoms: fatigue, muscle cramps, dry skin, muscle soreness, itching, bone pain, cough, dry mouth, restless leg syndrome, dizziness, shortness of breath, headache, decreased appetite, nausea, constipation, edema, chest pain, difficulty concentrating, vomiting, or diarrhea. The survey was administered by a trained bilingual research coordinator, either on paper or electronically on a laptop, per patient preference. Patients who could not complete the survey on paper or electronically had the survey read to them by the coordinator. The survey was administered in either English or Spanish, per patient preference.
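For illustration, the scoring scheme described above can be expressed in a few lines of Python. This is a minimal sketch under our own assumptions about data layout (a dictionary mapping symptom names to 0–4 severity ratings); it is not the study's survey-processing code.

```python
# Minimal sketch of Dialysis Symptom Index-style scoring. Assumes responses
# are stored as {symptom: severity}, where 0 = "no" and 4 = "yes: very much".
# With 30 symptoms rated 0-4 each, totals span the 0-120 range described above.

def dsi_total_score(responses: dict) -> int:
    """Sum per-symptom severity ratings into an overall severity score."""
    for symptom, severity in responses.items():
        if not 0 <= severity <= 4:
            raise ValueError(f"severity for {symptom!r} must be 0-4, got {severity}")
    return sum(responses.values())

# Hypothetical patient: moderate fatigue, mild itching, all other symptoms absent.
example = {"fatigue": 3, "itching": 1, "cramping": 0}
print(dsi_total_score(example))  # 4
```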

In addition to surveys, we asked participants to provide their sex, self-reported race (Asian, Black, Multiracial, Native American, Other, Pacific Islander, White), self-reported ethnicity (Hispanic or not Hispanic), dialysis access, and medical history. We confirmed this information with a review of the participant's dialysis EHR. The dialysis EHR is separate from the health system EHR and only includes notes from encounters in the ambulatory hemodialysis unit. From the EHR, we extracted progress notes from 1 month before enrollment and until 1 month after the study period. We also included laboratory values of hemoglobin and urea reduction ratio that were obtained on monthly laboratory tests during the month of survey participation. In addition, we extracted interdialytic weight gain, systolic BP (prehemodialysis, nadir, and posthemodialysis), and ultrafiltration rate of every treatment from the EHR.

NLP on Progress Notes

We developed an NLP method to identify symptoms within patient progress notes written by hemodialysis physicians, nurses, nutritionists, and social workers. The symptoms of interest were the five symptoms most frequently reported by patients in at least one survey during the study period. The patient progress notes spanned the period from 1 month before the first survey until 1 month after the last survey. We used the Python NLP library spaCy for symptom identification, specifically spaCy's rule-based matching functionality, which finds words and phrases using user-defined patterns.19 Rule-based matching was chosen because the progress notes were largely in a semistructured format with only a limited amount of free text. We elected not to use fuzzy matching for this dataset because fuzzy matching introduces the possibility of false-positive matches. Finally, we used the negspaCy pipeline component to detect negations of symptoms. Manual review of a subset of notes found that each symptom was generally expressed in only a limited number of ways in the progress notes. A list of tokens was generated for each symptom from manual review of a subset of progress notes and a review of the Unified Medical Language System synonym lists.20 See Supplemental Table 1 for the list of tokens used for each symptom.
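As a rough illustration of the pipeline described above, the sketch below combines spaCy pattern matching with negspaCy negation detection. The token lists here are invented stand-ins (the study's full lists are in Supplemental Table 1), and the exact pipeline configuration, including use of an entity ruler to expose matched spans to negspaCy, is our assumption rather than the authors' published setup.

```python
import spacy
from negspacy.negation import Negex  # registers the "negex" pipeline factory

# Requires: pip install spacy negspacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Invented token lists for three symptoms; the study's lists were built from
# manual note review and UMLS synonym lists (Supplemental Table 1).
symptom_terms = {
    "FATIGUE": ["fatigue", "tired", "tiredness"],
    "ITCHING": ["itching", "itchy", "pruritus"],
    "CRAMPING": ["cramp", "cramps", "cramping"],
}

# An entity ruler turns case-insensitive token matches into entities,
# so that negspaCy can then flag negated mentions.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": label, "pattern": [{"LOWER": term}]}
    for label, terms in symptom_terms.items()
    for term in terms
])
nlp.add_pipe("negex", last=True)

doc = nlp("No itching today. Reports cramping during treatment.")
for ent in doc.ents:
    print(ent.text, ent.label_, "negated" if ent._.negex else "affirmed")
# itching ITCHING negated
# cramping CRAMPING affirmed
```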

The performance of NLP was evaluated in two ways: (1) using manual review of the progress notes as the reference standard to ensure appropriate identification of symptoms and (2) using patients' surveys as the reference standard. Two raters performed a manual review of the progress notes of 20 randomly selected patients (ten non-Hispanic and ten Hispanic). The raters went through each note and marked which of the five symptoms the note mentioned. When the raters disagreed on a note, they reviewed it jointly until reaching consensus. Inter-rater reliability at the symptom and patient levels was measured using Cohen's kappa. Using the manual review symptom labels as the reference, the NLP model was evaluated on sensitivity, specificity, PPV, and negative predictive value (NPV).

Statistical Analyses

To measure how well physicians, nurses, other multidisciplinary health care professionals, and the NLP/EHR model captured patients' symptoms, we used the patient surveys as the reference to obtain the sensitivity, specificity, PPV, and NPV for each of the raters. Patients who consented to the study but were unable to complete all 12 surveys were excluded from the final analyses. For the reference, a patient was recorded as experiencing a given symptom if the patient reported the symptom on any one of the surveys during the study period. Physicians and nurses were recorded as identifying a symptom in a patient if they identified the patient as experiencing the symptom on any one of the surveys during the study period. To evaluate for differences in results by time period, we conducted a sensitivity analysis in which the analysis was repeated on a weekly basis. The NLP/EHR model was recorded as identifying a symptom in a patient if it found the symptom in any one of the patient's progress notes. Different cutoffs for patient-reported symptom severity and frequency were used in defining the reference to investigate whether increased symptom severity or frequency made physicians, nurses, and the NLP/EHR model more likely to identify the symptom. Cohen's kappa was used to measure agreement between patient-reported symptoms and physician/NLP-reported symptoms. To compare patient, nurse, physician, and NLP/EHR results with each other, we used McNemar's test. To measure whether ethnicity (Hispanic versus non-Hispanic), sex (male versus female), or symptom severity (severity <3 versus severity ≥3) made a significant difference in how accurately raters (physician, nurse, and NLP/EHR) detected symptoms, we used Pearson's chi-square test of independence. All analyses were performed using Python's SciPy statistical module. The code for this analysis is available at https://github.com/Nadkarni-Lab/Symptom_Documentation.
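To make the test parameters concrete, the sketch below computes sensitivity, specificity, PPV, NPV, and Cohen's kappa from the 2×2 counts of one rater against the patient-survey reference, along with an exact McNemar test on the discordant cells of a paired comparison. SciPy matches the authors' stated tooling, but the function names and example counts are ours: the counts approximate the nurse cramping results reported in the Results section, and the McNemar cells are purely hypothetical.

```python
from scipy.stats import binomtest

def test_parameters(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table in which
    the patient survey is the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def cohens_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement between a rater and the patient survey."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Expected agreement from the marginal totals of the 2x2 table.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

def mcnemar_exact(b, c):
    """Exact McNemar test: b and c are the discordant cells of a paired
    2x2 table (detected by one rater but not the other)."""
    return binomtest(b, b + c, 0.5).pvalue

# Counts approximating nurse detection of cramping (37 of 57 detected,
# 4 false positives, n=97); kappa comes out near the 0.52 in Table 2.
print(test_parameters(tp=37, fp=4, fn=20, tn=36))
print(round(cohens_kappa(tp=37, fp=4, fn=20, tn=36), 2))  # 0.52
# Hypothetical discordant cells for a nurse-vs-physician comparison.
print(mcnemar_exact(b=12, c=3))
```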

Results

Patient Characteristics

Of 209 patients, 166 were eligible for participation. Ultimately, 97 patients completed the study (Supplemental Figure 1). More patients from the Monday–Wednesday–Friday shifts (54%) than from the Tuesday–Thursday–Saturday shifts (39%) participated in the study. The mean age of participants was 56 (SD ±14) years, 53% were female, 49% were Black, 41% were Hispanic, and 74% of patients used a fistula for dialysis access. Patients had a high comorbidity burden: 83% had hypertension, 26% had a history of coronary artery disease, and 39% had diabetes. Mean hemoglobin was 10.3 (SD ±1.1) g/dl, and the mean urea reduction ratio was 72% (SD ±6) (Table 1). The total number of notes was 6013, and the mean number of notes per patient was 60±11.

Table 1.

Patient characteristics at study enrollment

Characteristic No. (%)
Sex
 Female 51 (53)
 Male 46 (47)
Mean age±SD 56±14
Race and ethnicity
 Hispanic 40 (41)
 Non-Hispanic Black 48 (50)
 Non-Hispanic White 5 (5)
 Other 4 (4)
Dialysis access
 Arteriovenous fistula 72 (74)
 Arteriovenous graft 13 (13)
 Central venous catheter 12 (12)
Comorbidities
 Hypertension 80 (83)
 Diabetes mellitus 38 (39)
 Coronary artery disease 25 (26)
 Prior stroke 13 (13)
 Liver disease/cirrhosis 10 (10)
 Current or past cancer 10 (10)
Laboratory results, mean±SD
 Hemoglobin (g/dl) 10.3±1.1
 URRa (%) 72±6
Mean number of notes per patient 60±11

URR, urea reduction ratio.

a Urea reduction ratio calculated as (pretreatment blood urea nitrogen [BUN] − posttreatment BUN)/pretreatment BUN.
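For illustration (values hypothetical), a pretreatment BUN of 60 mg/dl and a posttreatment BUN of 17 mg/dl yield a URR of (60 − 17)/60 ≈ 0.72, or 72%, equal to the cohort mean.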

Symptom Prevalence from Patient Surveys

During the study period, the top five most prevalent symptoms reported on at least one survey were fatigue (59 patients, 61%), cramping (57 patients, 59%), dry skin (51 patients, 53%), muscle soreness (42 patients, 43%), and itching (40 patients, 41%; Figure 1).

Figure 1. Number of patients with symptoms, as identified by patients, nurses, health care professionals, and NLP. *Indicates a P value < 0.01 and **a P value < 0.001 by McNemar's test for comparison of patient survey with physician survey, nurse survey, and NLP. EHR, electronic health record; NLP, natural language processing.

Nurse-Reported Symptoms

Results of the nurse surveys showed that nurses correctly identified 17 of the 59 patients (29%) with fatigue, 37 of the 57 patients (65%) with cramping, 2 of the 51 patients (4%) with dry skin, 8 of the 42 patients (19%) with muscle soreness, and 7 of the 40 patients (18%) with itching (Figure 1). In some cases, nurses identified patients as having a symptom when the patient did not report the symptom. The number of false positives for nurses ranged from 0 (0%) for dry skin and itching to 4 (4.1%) for cramping and muscle soreness. Using patient surveys as the reference, nurse-reported symptoms had a sensitivity of 0.63 (95% confidence interval [CI], 0.52 to 0.72), specificity of 1.00 (95% CI, 0.79 to 1.00), PPV of 1.00 (95% CI, 0.93 to 1.00), and NPV of 0.31 (95% CI, 0.20 to 0.46) (Figure 2, A and B). Cohen's kappa values between nurse- and patient-reported symptoms were 0.22 for fatigue, 0.52 for cramps, 0.04 for dry skin, 0.13 for muscle soreness, and 0.20 for itching (Table 2).

Figure 2. Overall test parameters of different evaluators. Sensitivity, specificity, PPV, and NPV of nurse, physician, nurse/physician, and NLP for identifying symptoms in patients (A) overall and (B) by individual symptom. NPV, negative predictive value; PPV, positive predictive value.

Table 2.

Cohen's kappa for NLP, nursing surveys, and physician surveys compared with patient surveys

Evaluator Fatigue Cramp Dry Skin Muscle Soreness Itching
Nurse 0.22 0.52 0.04 0.13 0.20
Physician 0.12 0.07 0.05 0.02 0.19
Nurse+physician 0.28 0.50 0.02 0.28 0.19
NLP 0.02 0.28 0.00 0.05 0.24

NLP, natural language processing.

Physician-Reported Symptoms

Results of the physician surveys showed that physicians correctly identified 19 of the 59 patients (32%) with fatigue, 13 of the 57 patients (23%) with cramping, 16 of the 51 patients (31%) with dry skin, 6 of the 42 patients (14%) with muscle soreness, and 17 of the 40 patients (42%) with itching (Figure 1). In some cases, physicians identified patients as having a symptom when the patient did not report the symptom. The number of false positives for physicians ranged from 6 (6%) for cramping to 14 (14%) for itching. Using patient surveys as the reference, physician-reported symptoms had a sensitivity of 0.51 (95% CI, 0.40 to 0.61), specificity of 0.43 (95% CI, 0.21 to 0.67), PPV of 0.84 (95% CI, 0.72 to 0.92), and NPV of 0.13 (95% CI, 0.06 to 0.25) (Figure 2A). A symptom-level breakdown of these measures is shown in Figure 2B. Cohen's kappa values between physician- and patient-reported symptoms were 0.12 for fatigue, 0.07 for cramps, 0.05 for dry skin, 0.02 for muscle soreness, and 0.19 for itching (Table 2).

Nurse or Physician-Reported Symptoms

When nurse and physician surveys were combined, 39 of the 59 patients (66%) with fatigue, 48 of the 57 patients (84%) with cramping, 30 of the 51 patients (59%) with dry skin, 25 of the 42 patients (60%) with muscle soreness, and 35 of the 40 patients (88%) with itching were correctly identified (Figure 1). Using patient surveys as the reference, nurse/physician-reported symptoms had a sensitivity of 0.80 (95% CI, 0.70 to 0.87), specificity of 0.43 (95% CI, 0.21 to 0.67), PPV of 0.89 (95% CI, 0.80 to 0.94), and NPV of 0.13 (95% CI, 0.06 to 0.25) (Figure 2A). A symptom-level breakdown of these measures is shown in Figure 2B. For most symptoms, combining nurse and physician surveys resulted in higher agreement. Cohen's kappa values between nurse/physician- and patient-reported symptoms were 0.28 for fatigue, 0.50 for cramps, 0.02 for dry skin, 0.28 for muscle soreness, and 0.19 for itching (Table 2).

NLP Performance Using Manual Review of EHR as Reference

A two-person manual review of the progress notes of 20 randomly selected patients had an initial Cohen's kappa of 0.75 and a Cohen's kappa of 1 after joint review. Using the post–joint review symptom labels as the reference, the NLP algorithm had a sensitivity of 0.92, specificity of 0.95, PPV of 0.75, and NPV of 0.99.

NLP Performance Using Patient Surveys as Reference

The NLP algorithm run on patients' progress notes correctly identified 9 of the 59 patients (15%) with fatigue, 22 of the 57 patients (39%) with cramping, 0 of the 51 patients (0%) with dry skin, 5 of the 42 patients (12%) with muscle soreness, and 13 of the 40 patients (33%) with itching (Figure 1). In some cases, the NLP algorithm identified patients as having a symptom when the patient did not report the symptom. The number of false positives for the NLP algorithm ranged from 0 (0%) for dry skin to 7 (7.2%) for itching. Using patient surveys as the reference, NLP-identified symptoms had a sensitivity of 0.58 (95% CI, 0.47 to 0.68), specificity of 0.73 (95% CI, 0.48 to 0.89), PPV of 0.92 (95% CI, 0.82 to 0.97), and NPV of 0.24 (95% CI, 0.14 to 0.38) (Figure 2, A and B). Cohen's kappa values between NLP- and patient-reported symptoms were 0.02 for fatigue, 0.28 for cramps, 0.00 for dry skin, 0.05 for muscle soreness, and 0.24 for itching (Table 2).

Performance Comparison between Physicians, Nurses, and NLP/EHR

Using McNemar's test to assess whether symptom detection differed significantly across raters, we found that nurses and physicians differed significantly in detecting cramps (nurse 65% versus physician 23%, P < 0.001), dry skin (nurse 4% versus physician 31%, P < 0.0001), and itching (nurse 18% versus physician 42%, P < 0.0001). The NLP/EHR model differed significantly from nurses in detecting cramps (nurse 65% versus NLP 39%, P < 0.01) and itching (nurse 18% versus NLP 33%, P < 0.01) but did not differ significantly from physicians for any symptom (Figure 1).

Subgroup Analyses

To test our hypothesis that nurses and physicians would be better at ascertaining more severe symptoms, we filtered for symptoms with a severity rating of ≥3. Taking symptom severity into account improved overall symptom detection sensitivity for nurses but not for physicians (P < 0.001 versus P = 0.91; Supplemental Figure 2 and Supplemental Table 2). Specificity was similar across raters.

To evaluate whether symptom detection varied with how frequently the patient reported the symptom, we conducted a subgroup analysis comparing patients who reported the symptom on ≥50% of surveys with those who reported it on <50%. Notably, PPV was higher and NPV was lower in patients with lower symptom frequency than in those with higher frequency (Supplemental Figure 3). This indicates that, in patients with a low frequency of symptoms, if nurses, physicians, and NLP identified the symptom, the probability of the patient having the symptom was high, whereas if nurses, physicians, and NLP did not find the symptom, the probability of the patient reporting the symptom was low. The results were reversed for patients with high frequency.

We examined test parameters by patient sex to evaluate potential sex differences in recognition and documentation. Sensitivity and specificity were higher for female patients, but the differences were not statistically significant (Supplemental Figure 4 and Supplemental Table 2).

We examined performance by patient ethnicity—Hispanic versus non-Hispanic individuals. We did not find a significant difference in test parameters by ethnicity (Supplemental Figure 5 and Supplemental Table 2).

Sensitivity Analysis

When agreement was assessed on a weekly basis, we found minimal week-to-week variability in agreement on nurses' surveys (Supplemental Table 3). When patient surveys were limited to the week of the physician survey, agreement was slightly decreased across all symptoms. Overall, results did not differ materially from those of the monthly analysis.

Discussion

In this article, we tested an NLP algorithm to identify patient symptoms from the EHR. Although the NLP algorithm had high sensitivity, specificity, and NPV and moderate PPV compared with manual EHR review, the agreement between the NLP algorithm and patient surveys was low. On evaluation of nurses' and physicians' recognition and documentation of patients' symptoms, both nurses and physicians under-recognized and underdocumented patient-reported symptoms. There were some differences in performance between nurses and physicians for certain symptoms. There were significant differences in symptom recognition by severity and frequency, but not by patient sex or ethnicity. Therefore, although NLP can accurately identify patient symptoms from the EHR, its utility is limited by the low documentation of symptoms.

Health care professional under-recognition of symptoms in patients on hemodialysis is well established. Weisbord et al. surveyed 75 patients and 18 health care professionals and found that for 27 of 30 symptoms, the sensitivity of health care professional responses was <50%.5 However, our study fills a critical gap because prior work by Weisbord et al. did not include nurses' symptom recognition. In our analyses, we found that there was considerable variation in agreement depending on the specific symptom, with physicians outperforming nurses or vice versa depending on the symptom. This may be related to the acuity of the symptom being assessed. Nurses are likely to recognize symptoms that occur during the dialysis treatment session, such as cramping, and may potentially miss symptoms that are more chronic and affect the patients outside of the dialysis unit, such as dry skin. In addition, while nurses were asked to complete the surveys at the end of each dialysis treatment session, physicians were only asked to complete the survey once. Although this mimics real-world health care professional interactions, this may allow physician recall bias to affect our results.

In our study, the primary language of some of our Hispanic participants was Spanish, which may have led to lower recognition because of language barriers between patients and health care professionals. Prior studies have also documented racial, ethnic, and language-related variations in symptom prevalence, severity, and recognition.18,21 In our subgroup analysis by ethnicity, we did not find significant differences in health care professional recognition. However, we did not directly assess the patients' English proficiency or the health care professionals' Spanish proficiency. Additional larger studies in more diverse populations are needed to investigate facilitators of accurate symptom recognition despite language discordance.

We have previously demonstrated that NLP is better than International Classification of Diseases codes for identifying patient symptoms from the EHR.13 In that study, we did not have patient surveys to compare with the NLP results. In this study, we verified that NLP can accurately identify symptoms from patients' progress notes. However, the number of patients identified with symptoms using NLP in this study was low compared with our prior study. This can be attributed to the lower health care professional recognition and even lower documentation of symptoms in the EHR. In addition, although our original study included multiple years of inpatient and outpatient notes, this study included only 3 months of outpatient dialysis notes. Inclusion of notes from a longer duration or from other health care professionals may improve the performance of our NLP algorithm in identifying patients with hemodialysis-associated symptoms. We found that itching, cramping, and muscle soreness were identified more frequently by NLP than fatigue and dry skin. We speculate that this may be related to the dialysis unit note templates, which may preferentially screen for some symptoms over others. Additional prompts for critical PROMs may be needed to enhance patient–clinician communication and targeted interventions for these symptoms.

Despite a mandate by the Centers for Medicare & Medicaid Services to conduct annual health-related quality-of-life surveys in patients on dialysis, health care professional recognition and documentation of symptoms remain low. These surveys are provided to patients in paper format, which reduces the likelihood of health care professionals seeing the results. Additional barriers include a lack of guidance on completing these surveys and a lack of guidance for clinicians on how to intervene. Two studies have examined electronic monitoring of symptoms and found that, while patients and health care professionals considered the monitoring feasible and acceptable, implementation was fraught with logistical challenges.22,23 Treatment of hemodialysis-associated symptoms is complex. Symptoms such as fatigue may benefit from evaluation of hemoglobin levels, iron stores, dialysis adequacy, and nutrition.24 Pruritus can be managed with gabapentin or novel agents, such as difelikefalin.25 However, treatment cannot begin without recognition of the symptoms. Additional studies are needed to identify mechanisms to enhance EHR-based broad symptom reporting and improve patient–clinician communication.

Our study has the following limitations. Although we included all progress notes available in the EHR over a 3-month period, we did not evaluate for potential copy-forwarding of progress notes, which may affect the proportion of symptoms identified by NLP. As previously discussed, we surveyed each physician only once, while nurses and patients were surveyed at the end of every dialysis treatment session for a total of 12 surveys. This may contribute to the lower physician recognition of symptoms we identified; however, this survey frequency is concordant with the typical physician rounding structure. Our cohort is from a single hemodialysis unit with a limited sample size, and patients predominantly identified as Black or Hispanic. Although this reflects the patient population we care for in New York City, it does not reflect the racial and ethnic breakdown of patients on hemodialysis nationally, and our results may not generalize to other populations. Validation in larger and more diverse cohorts is needed to determine the generalizability of our findings and methods. Despite these limitations, this is the first study, and the largest, to assess patient symptoms with repeated measures, through different health care professionals, and through documentation.

In conclusion, we found under-recognition and underdocumentation of symptoms in patients treated with in-center hemodialysis. Although we developed an NLP algorithm that could accurately identify patient symptoms within the EHR, the lack of documentation of symptoms limits its clinical applications. Further studies are needed to determine whether standardized electronic assessments would improve the identification and management of patients' symptoms.


Acknowledgments

S. Coca is an Associate Editor for Kidney360. He was not involved in the peer review and decision-making process for this manuscript. This research was supported by a grant from the Renal Research Institute. The sponsor was not involved in the study design, execution, analysis, or writing of this study and had no role in the decision to publish.

Footnotes

See related editorial, “Enhancing Nephrology Research with Natural Language Processing and Artificial Intelligence: A Case for More Comprehensive Symptom Documentation,” on pages 689–691.

Disclosures

Disclosure forms, as provided by each author, are available with the online version of the article at http://links.lww.com/KN9/A849.

Funding

L. Chan: National Institute of Diabetes and Digestive and Kidney Diseases (K23DK124645) and Renal Research Institute.

Author Contributions

Conceptualization: Lili Chan.

Data curation: Huei Hsun Wen.

Formal analysis: Lili Chan, Yang Dai, Neepa Gupta, Joanna Yang.

Funding acquisition: Lili Chan, Steven Coca.

Investigation: Lili Chan, Huei Hsun Wen.

Methodology: Lili Chan, Yang Dai, Girish N. Nadkarni.

Software: Yang Dai.

Supervision: Lili Chan.

Visualization: Lili Chan, Yang Dai, Joanna Yang.

Writing – original draft: Lili Chan, Yang Dai.

Writing – review & editing: Lili Chan, Steven Coca, Yang Dai, Neepa Gupta, Carol R. Horowitz, Dinushika Mohottige, Girish N. Nadkarni, Connie Rhee, Huei Hsun Wen.

Data Sharing Statement

Partial restrictions to the data and/or materials apply. Deidentified data will be provided upon reasonable request.

Supplemental Material

This article contains the following supplemental material online at http://links.lww.com/KN9/A848.

Supplemental Table 1. Tokens used to identify each symptom.

Supplemental Table 2. Sensitivity, specificity, PPV, and NPV of nurses, physicians, and NLP for different subgroups. P values denote the significance level of sensitivity and specificity across subgroups by the same evaluator type.

Supplemental Table 3. Cohen's Kappa for nursing surveys and physician surveys compared with patient surveys by week.

Supplemental Figure 1. Study flow diagram.

Supplemental Figure 2. Test performance of physicians, nurses, and NLP by severity of symptom.

Supplemental Figure 3. Test performance of physicians, nurses, and NLP by frequency of symptom.

Supplemental Figure 4. Test performance of physicians, nurses, and NLP by sex of patients.

Supplemental Figure 5. Test performance of physicians, nurses, and NLP by ethnicity of patients.

References

1. United States Renal Data System. 2021 USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases; 2021.
2. Amro A, Waldum B, Dammen T, Miaskowski C, Os I. Symptom clusters in patients on dialysis and their association with quality-of-life outcomes. J Ren Care. 2014;40(1):23–33. doi: 10.1111/jorc.12051
3. Weisbord SD, Fried LF, Arnold RM, et al. Prevalence, severity, and importance of physical and emotional symptoms in chronic hemodialysis patients. J Am Soc Nephrol. 2005;16(8):2487–2494. doi: 10.1681/ASN.2005020157
4. Caplin B, Kumar S, Davenport A. Patients' perspective of haemodialysis-associated symptoms. Nephrol Dial Transplant. 2011;26(8):2656–2663. doi: 10.1093/ndt/gfq763
5. Weisbord SD, Fried LF, Mor MK, et al. Renal provider recognition of symptoms in patients on maintenance hemodialysis. Clin J Am Soc Nephrol. 2007;2(5):960–967. doi: 10.2215/CJN.00990207
6. Kimmel PL, Peterson RA, Weihs KL, et al. Multiple measurements of depression predict mortality in a longitudinal study of chronic hemodialysis outpatients. Kidney Int. 2000;57(5):2093–2098. doi: 10.1046/j.1523-1755.2000.00059.x
7. Hedayati SS, Bosworth HB, Briley LP, et al. Death or hospitalization of patients on chronic hemodialysis is associated with a physician-based diagnosis of depression. Kidney Int. 2008;74(7):930–936. doi: 10.1038/ki.2008.311
8. Tong A, Manns B, Hemmelgarn B, et al.; SONG-HD Investigators. Establishing core outcome domains in hemodialysis: report of the Standardized Outcomes in Nephrology–Hemodialysis (SONG-HD) consensus workshop. Am J Kidney Dis. 2017;69(1):97–107. doi: 10.1053/j.ajkd.2016.05.022
9. Jawed A, Moe SM, Moorthi RN, Torke AM, Eadon MT. Increasing nephrologist awareness of symptom burden in older hospitalized end-stage renal disease patients. Am J Nephrol. 2020;51(1):11–16. doi: 10.1159/000504333
10. Claxton RN, Blackhall L, Weisbord SD, Holley JL. Undertreatment of symptoms in patients on maintenance hemodialysis. J Pain Symptom Manage. 2010;39(2):211–218. doi: 10.1016/j.jpainsymman.2009.07.003
11. Flythe JE. Integrating PROMs in routine dialysis care: the devil is in the (implementation) details. Clin J Am Soc Nephrol. 2022;17(11):1580–1582. doi: 10.2215/CJN.10840922
12. Koleck TA, Tatonetti NP, Bakken S, et al. Identifying symptom information in clinical notes using natural language processing. Nurs Res. 2021;70(3):173–183. doi: 10.1097/nnr.0000000000000488
13. Chan L, Beers K, Yau A, et al. Natural language processing of electronic health records is superior to billing codes to identify symptom burden in hemodialysis patients. Kidney Int. 2020;97(2):383–392. doi: 10.1016/j.kint.2019.10.023
14. Weng CY. Data accuracy in electronic medical record documentation. JAMA Ophthalmol. 2017;135(3):232–233. doi: 10.1001/jamaophthalmol.2016.5562
15. Pakhomov SV, Jacobsen SJ, Chute CG, Roger VL. Agreement between patient-reported symptoms and their documentation in the medical record. Am J Manag Care. 2008;14(8):530–539. PMID: 18690769
16. Chauhan K, Wen HH, Gupta N, Nadkarni G, Coca S, Chan L. Higher symptom frequency and severity after the long interdialytic interval in patients on maintenance intermittent hemodialysis. Kidney Int Rep. 2022;7(12):2630–2638. doi: 10.1016/j.ekir.2022.09.032
17. Weisbord SD, Fried LF, Arnold RM, et al. Development of a symptom assessment instrument for chronic hemodialysis patients: the Dialysis Symptom Index. J Pain Symptom Manage. 2004;27(3):226–240. doi: 10.1016/j.jpainsymman.2003.07.004
18. You AS, Kalantar SS, Norris KC, et al. Dialysis Symptom Index burden and symptom clusters in a prospective cohort of dialysis patients. J Nephrol. 2022;35(5):1427–1436. doi: 10.1007/s40620-022-01313-0
19. Honnibal M, Montani I, Van Landeghem S, Boyd A. spaCy: Industrial-Strength Natural Language Processing in Python; 2020. doi: 10.5281/zenodo.1212303
20. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue):D267–D270. doi: 10.1093/nar/gkh061
21. Garcia ME, Hinton L, Gregorich SE, et al. Primary care physician recognition and documentation of depressive symptoms among Chinese and Latinx patients during routine visits: a cross-sectional study. Health Equity. 2021;5(1):236–244. doi: 10.1089/heq.2020.0104
22. Viecelli AK, Duncanson E, Bennett PN, et al.; Symptom Monitoring With Feedback Trial (SWIFT) Investigators. Perspectives of patients, nurses, and nephrologists about electronic symptom monitoring with feedback in hemodialysis care. Am J Kidney Dis. 2022;80(2):215–226.e1. doi: 10.1053/j.ajkd.2021.12.007
23. Schick-Makaroff K, Wozniak LA, Short H, et al. How the routine use of patient-reported outcome measures for hemodialysis care influences patient-clinician communication: a mixed-methods study. Clin J Am Soc Nephrol. 2022;17(11):1631–1645. doi: 10.2215/CJN.05940522
24. Jhamb M, Weisbord SD, Steel JL, Unruh M. Fatigue in patients receiving maintenance dialysis: a review of definitions, measures, and contributing factors. Am J Kidney Dis. 2008;52(2):353–365. doi: 10.1053/j.ajkd.2008.05.005
25. Fishbane S, Jamal A, Munera C, Wen W, Menzaghi F; KALM-1 Trial Investigators. A phase 3 trial of difelikefalin in hemodialysis patients with pruritus. N Engl J Med. 2020;382(3):222–232. doi: 10.1056/NEJMoa1912770
