Abstract
This cross-sectional study applies natural language processing to electronic health records from a large health care delivery system to examine performance status documentation among patients newly diagnosed with colorectal cancer.
Assessing a patient’s functional status is critical for determining cancer prognosis, treatment, and clinical trial eligibility.1 Performance status (PS) measures summarize a patient’s ability to independently perform activities of daily living (ADLs). We applied natural language processing (NLP) to electronic health records (EHRs) to examine PS documentation among patients newly diagnosed with colorectal cancer in a large health care delivery system.
Methods
The institutional review board of Massachusetts General Hospital and Partners HealthCare approved this cross-sectional study, waiving patient written informed consent for data obtained from the Research Patient Data Registry. The review board approves all research queries prior to allowing data retrieval from the Research Patient Data Registry, which has a provision to allow patients to refuse data collection for research.
Data Source and Study Population
Our data source was the Research Patient Data Registry, which stores digital clinical data for 6.5 million patients from Partners HealthCare–affiliated health care providers in Massachusetts. The Research Patient Data Registry contains more than 2 billion data points, including EHRs. We used International Classification of Diseases, Ninth or Tenth Revision, Clinical Modification diagnosis codes to identify patients aged 21 through 75 years who were newly diagnosed with colorectal cancer between 2010 and 2017. We required 1 or more additional colorectal cancer diagnosis codes between 30 and 365 days after the initial code.2 We retrieved EHRs associated with these patients for the 12 months following the initial cancer diagnosis date.
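The inclusion rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function name `is_eligible`, its arguments, and the choice to treat the 30-day and 365-day boundaries as inclusive are all assumptions.

```python
from datetime import date

def is_eligible(age_at_diagnosis, code_dates):
    """Hypothetical helper: apply the cohort rule to one patient.

    code_dates holds the dates of all colorectal cancer diagnosis codes
    for the patient. Inclusive boundaries are an assumption.
    """
    if not code_dates:
        return False
    first = min(code_dates)  # initial diagnosis code
    in_age_range = 21 <= age_at_diagnosis <= 75
    in_study_years = 2010 <= first.year <= 2017
    # At least 1 additional code 30 to 365 days after the initial code.
    confirmed = any(30 <= (d - first).days <= 365 for d in code_dates)
    return in_age_range and in_study_years and confirmed
```

For example, a 60-year-old with codes on January 1 and March 1, 2012 would qualify, whereas a second code only 14 days after the first would not.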
Natural Language Processing
We used ClinicalRegex NLP software to search EHRs for functional status documentation.3,4 Our ontology for identifying functional status documentation included 5 keyword categories: (1) Eastern Cooperative Oncology Group, ECOG, Zubrod; (2) Karnofsky, Lansky, KPS; (3) performance status; (4) PS; and (5) activities of daily living, activity of daily living, ADL, ADLs, and ADLS.
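To make the keyword ontology concrete, the following is a minimal sketch of keyword flagging with Python's standard `re` module. It is not the ClinicalRegex implementation: the regular expressions, the category labels, the function `flag_note`, and the decision to match "PS" case-sensitively are assumptions made for illustration.

```python
import re

# Keyword categories as listed in the text; the regular expressions are an
# illustrative assumption, not the patterns used in the study.
ONTOLOGY = {
    "ECOG": r"\b(?:Eastern Cooperative Oncology Group|ECOG|Zubrod)\b",
    "Karnofsky": r"\b(?:Karnofsky|Lansky|KPS)\b",
    "performance status": r"\bperformance status\b",
    "PS": r"\bPS\b",
    "ADL": r"\b(?:activit(?:y|ies) of daily living|ADLs?)\b",
}

# "PS" is matched case-sensitively (an assumption) to avoid hits on ordinary
# lowercase text; the other categories ignore case.
PATTERNS = {
    name: re.compile(pattern, 0 if name == "PS" else re.IGNORECASE)
    for name, pattern in ONTOLOGY.items()
}

def flag_note(text):
    """Return the keyword categories found in one clinical note."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]
```

A note reading "ECOG PS 1. Independent with ADLs." would be flagged in the ECOG, PS, and ADL categories. A note reading "Patient on PS ventilation" would also be flagged for PS, which illustrates why flagged notes still need manual review.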
We manually reviewed approximately 930 randomly selected clinical notes across categories flagged by NLP to refine and validate the ontology. Our data abstraction tool examined whether: (1) numeric scores were reported for PS; (2) qualitative descriptions of PS were provided; (3) ADL performance was indicated only with binary terms (independent vs not independent); (4) descriptions of at least 1 individual ADL were provided; and (5) ADL comments related to another context (eg, encouraged patient to rest between ADLs).
Results
We identified 3180 patients aged 21 through 75 years with at least 2 diagnosis codes within a 12-month period for colorectal cancer between 2010 and 2017 (Table 1). In the year following cancer diagnosis, these patients were associated with 113 462 clinical notes; the mean (SD) number of notes per patient was 39 (34). Only 1557 of 3061 (50.9%) patients had any PS documentation in the first 3 months following diagnosis; just 365 (11.9%) had separate ADL documentation (Table 2).
Table 1. Demographic Characteristics of Population of Patients With Colorectal Cancer (N = 3180).
Demographic characteristic | No. (%) |
---|---|
Age, y | |
Mean (SD) | 59.8 (10.0) |
Median | 61 |
Gender | |
Male | 1710 (53.8) |
Female | 1470 (46.2) |
Race | |
White | 2692 (84.7) |
Black or African American | 150 (4.7) |
Asian | 106 (3.3) |
American Indian/Alaskan Native | 4 (0.1) |
Native Hawaiian or other Pacific Islander | 2 (0.1) |
Other | 84 (2.6) |
Unknown | 142 (4.5) |
Ethnicity | |
Hispanic | 105 (3.3) |
Non-Hispanic | 2949 (92.7) |
Unknown | 126 (4.0) |
Table 2. Functional Status Documentation.
Time since initial cancer diagnosis, mo | PS documentation: ECOG, No. (%) | PS documentation: Karnofsky, No. (%) | PS documentation: any measure (a), No. (%) | Documentation of activities of daily living, No. (%) | Documentation of any PS measure and activities of daily living (a), No. (%) |
---|---|---|---|---|---|
0-3 (n = 3061) | 1319 (43.1) | 23 (0.8) | 1557 (50.9) | 365 (11.9) | 352 (11.5) |
4-6 (n = 2417) | 1025 (42.4) | 12 (0.5) | 1366 (56.5) | 249 (10.3) | 246 (10.2) |
7-9 (n = 2121) | 844 (39.8) | 8 (0.4) | 1008 (47.5) | 210 (9.9) | 205 (9.7) |
10-12 (n = 1886) | 695 (36.9) | 11 (0.6) | 838 (44.4) | 168 (8.9) | 165 (8.7) |
Abbreviations: ECOG, Eastern Cooperative Oncology Group; PS, performance status.
(a) Natural language processing search combined keywords from 4 keyword categories: ECOG, Karnofsky, PS, and performance status.
Of 729 results identified in the 4 performance status keyword categories, manual review found that 677 (92.9%) reported a numeric score; 44 (6.0%) had no score but used keywords in other relevant contexts (eg, improved PS); and 8 (1.1%) used keywords in an unrelated context (eg, PS ventilation). Of 200 records identified using ADL keywords, 92 (46.0%) reported ADL function using binary terms (eg, independent or not independent); 89 (44.5%) used ADL keywords in other contexts (eg, ADL not asked); and 19 (9.5%) described performance of 1 or more individual ADLs (eg, toileting: independent).
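The percentages from the manual review follow directly from the reported counts. The short Python sketch below recomputes them as an arithmetic check; the function name `percentages` and the dictionary labels are illustrative, while the counts are those reported in the text.

```python
def percentages(counts):
    """Convert category counts to percentages of the total, to 1 decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Counts reported from manual review; labels are shorthand for this sketch.
ps_review = {"numeric score": 677, "other relevant context": 44, "unrelated context": 8}
adl_review = {"binary terms": 92, "other contexts": 89, "individual ADL described": 19}

print(sum(ps_review.values()), percentages(ps_review))
print(sum(adl_review.values()), percentages(adl_review))
```

The totals are 729 and 200, and the rounded percentages match those reported: 92.9, 6.0, and 1.1 for the performance status categories and 46.0, 44.5, and 9.5 for the ADL categories.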
Discussion
Functional status documentation is critical for cancer care. Nevertheless, only half of patients in this study had any PS documentation in the first 3 months following colorectal cancer diagnosis; roughly one-tenth (11.9%) had ADLs separately documented.
Despite the importance of this information, studies of functional status documentation in cancer care are limited. One study of inpatient palliative oncology care consultations found that only 6% of consultation notes contained any functional status documentation within a year.5
This study has limitations, notably concerning generalizability to other health care settings and different EHR systems. Electronic health records containing uniform modules for functional status reporting may have better documentation.5 We could not assess functional status documentation relating to clinical trial participation. We also could not determine whether PS documentation was ascertained at each clinical encounter or whether text was cut and pasted from previous EHRs. Future research should investigate approaches to improve documentation of functional status among patients with cancer.
References
- 1. West HJ, Jin JO. Performance status in patients with cancer. JAMA Oncol. 2015;1(7):998. doi:10.1001/jamaoncol.2015.3113
- 2. Whyte JL, Engel-Nitz NM, Teitelbaum A, Gomez Rey G, Kallich JD. An evaluation of algorithms for identifying metastatic breast, lung, or colorectal cancer in administrative claims data. Med Care. 2015;53(7):e49-e57. doi:10.1097/MLR.0b013e318289c3fb
- 3. Lindvall C, Lilley EJ, Zupanc SN, et al. Natural language processing to assess end-of-life quality indicators in cancer patients receiving palliative surgery. J Palliat Med. 2019;22(2):183-187. doi:10.1089/jpm.2018.0326
- 4. Agaronnik ND, Lindvall C, El-Jawahri A, He W, Iezzoni LI. Challenges of developing a natural language processing method with electronic health records to identify persons with chronic mobility disability. Arch Phys Med Rehabil. Published online May 21, 2020.
- 5. Bruggeman AR, Heavey SF, Ma JD, Revta C, Roeland EJ. Lack of documentation of evidence-based prognostication in cancer patients by inpatient palliative care consultants. J Palliat Med. 2015;18(4):382-385. doi:10.1089/jpm.2014.0331