Author manuscript; available in PMC: 2022 Jun 8.
Published in final edited form as: Artif Intell Med Conf Artif Intell Med (2005-). 2021 Jun 8;12721:491–496. doi: 10.1007/978-3-030-77211-6_58

Identifying Symptom Clusters Through Association Rule Mining

Mikayla Biggs 1, Carla Floricel 3, Lisanne Van Dijk 2, Abdallah S R Mohamed 2, C David Fuller 2, G Elisabeta Marai 3, Xinhua Zhang 3, Guadalupe Canahuate 1
PMCID: PMC8444285  NIHMSID: NIHMS1738130  PMID: 34541584

Abstract

Cancer patients experience many symptoms throughout their cancer treatment and sometimes suffer from lasting effects post-treatment. Patient-Reported Outcome (PRO) surveys provide a means of monitoring patients’ symptoms during and after treatment. Symptom cluster (SC) research seeks to understand these symptoms and their relationships in order to define new treatment and disease management methods that improve patients’ quality of life. This paper introduces association rule mining (ARM) as a novel alternative for identifying symptom clusters. We compare the results to prior research and find that, while some of the SCs are similar, ARM uncovers more nuanced relationships between symptoms, such as anchor symptoms that serve as connections between interference and cancer-specific symptoms.

Keywords: Association rule mining, Symptom clusters, PRO

1. Introduction

Cancer patients experience a range of symptoms during and after treatment [1–3]. Research on these symptoms, their prevalence, relationships, and progression can improve disease prognosis and inform the choice of treatment [4,5]. Symptom cluster (SC) research aims to identify co-occurring symptoms (e.g., pain, fatigue, dry mouth) and to understand the underlying mechanisms that drive these clusters [6]. This research is facilitated by increasingly available Patient-Reported Outcome (PRO) data, collected via questionnaires in which patients rate the occurrence and severity of their symptoms.

The M.D. Anderson Symptom Inventory (MDASI) [7] and its head-and-neck (HN) cancer module [8] are short, validated questionnaires that patients complete at each visit. The 28 MDASI-HN survey questions fall into three groups: 13 core items for symptoms common to all cancers, nine items specific to HN cancer, and six items regarding symptom interference with daily activity. Patients rate each item on a 0–10 scale, from “not present” to “as bad as you can imagine” for the core and HN items, and from “did not interfere” to “interfered completely” for the interference items. Preliminary SCs in the MDASI-HN data have been identified using factor and cluster analysis [9,10].

This paper introduces association rule mining (ARM) [11] as an alternative for identifying symptom clusters. To the best of our knowledge, this is the first ARM application in the SC domain. This work’s main contribution is to offer an alternative methodology for defining new and interesting relationships for SC research using PRO data. We model each PRO response as a patient transaction and process PROs during and after treatment to identify acute and late symptom clusters, respectively. We furthermore model the severity of the symptoms. We present a graph-based visualization for the most significant association rules to identify symptom clusters for both acute and late stages. Finally, we evaluate this methodology on a real HN cancer patient dataset.

2. Modeling Symptom Clusters with ARM

The ARM approach can use any PRO; in this work, we focus on the MDASI-HN questionnaire. The MDASI is a multi-symptom patient-reported outcome measure that assesses both the severity of cancer symptoms and symptom interference with daily life. Table 1 lists the symptoms covered by the MDASI-HN survey, using the short labels by which we refer to them for readability.

Table 1. The 28 MDASI-HN symptoms organized into 3 symptom categories

| Category | Symptom labels |
|---|---|
| Common cancer | Pain, fatigue, nausea, sleep, distress, SOB, memory, appetite, drowsy, drymouth, sad, vomit |
| Head & Neck | Numb, mucus, swallow, choke, voice, skin, constipation, taste, mucositis, teeth |
| Interference | General_activity, mood, work, enjoy, relations, walking |

ARM has two steps: the first identifies frequent item-sets (FIS) in the data, and the second generates association rules from the FIS. The Apriori algorithm finds the frequent item-sets, and the resulting rules are evaluated with a set of core metrics. For a rule A → B, support is a measure of absolute frequency, i.e., the fraction of transactions that contain both A and B. Confidence (A → B) is a measure of relative frequency: it tells us how often A and B occur together, given the number of times A occurs. Lift indicates the strength of a rule over the chance co-occurrence of A and B. The higher the lift, the stronger the association. A lift greater than 1.0 implies that the antecedent and the consequent occur together more often than would be expected if the two were independent. A lift of 1.0 means that they co-occur as often as expected under independence and are not associated. For example, for the rule {fatigue} → {drowsy} with 50% support and 80% confidence, we could say that these two symptoms are experienced together by 50% of the patients, and that “if a patient experiences fatigue, they are 80% likely to experience drowsiness.”
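As an illustration of the three metrics, the following minimal Python sketch computes support, confidence, and lift for a single rule using their standard definitions. The toy transactions and the function names (`support`, `rule_metrics`) are hypothetical and chosen only so that {fatigue} → {drowsy} reproduces the 50% support and 80% confidence of the example above; they are not taken from the study data.

```python
from typing import Dict, FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(
    transactions: List[Transaction],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_rule = support(transactions, a | c)  # A and B together
    supp_a = support(transactions, a)
    supp_c = support(transactions, c)
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_rule / supp_a           # how often B accompanies A
    lift = confidence / supp_c                # 1.0 means independence
    return {"support": supp_rule, "confidence": confidence, "lift": lift}


# Hypothetical toy data: 8 transactions, fatigue in 5, drowsy in 5, both in 4.
TOY_TRANSACTIONS: List[Transaction] = [
    frozenset(t)
    for t in (
        {"fatigue", "drowsy"},
        {"fatigue", "drowsy", "pain"},
        {"fatigue", "drowsy"},
        {"fatigue", "drowsy", "taste"},
        {"fatigue", "pain"},
        {"drowsy"},
        {"pain", "taste"},
        {"taste"},
    )
]

if __name__ == "__main__":
    print(rule_metrics(TOY_TRANSACTIONS, {"fatigue"}, {"drowsy"}))
```

On this toy data the rule has support 4/8 and confidence 4/5. Because drowsy alone appears in 5 of 8 transactions, the lift is about 1.28, i.e., modestly above what independence would give.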

Since symptom severity is non-binary, we generate two items for each symptom and use the labels low and severe to distinguish them. Within a questionnaire, a symptom with a rating greater than 0 is considered to be occurring. An occurring symptom is low if the patient rated its severity below five and severe otherwise. Each patient is modeled as a transaction built from a single PRO, and the items “bought” together in that transaction are the patient’s concurrent symptoms, each labeled low or severe. We consider symptom clusters at two different time points. Acute symptoms are those experienced during treatment (about six weeks from the start of treatment). For late symptoms, we consider PROs completed up to 18 months post-treatment. Missing symptom scores (NaN) were replaced with 0. Patients with no PRO recorded during the acute or late phase were excluded from the analysis of that time frame.
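The encoding described above can be sketched in a few lines of Python. This is an illustrative reading of the rules in the text (missing scores become 0, ratings above 0 count as occurring, ratings below five are low, and five or above are severe). The function name `encode_response` and the `symptom_low` / `symptom_severe` item naming are assumptions made for the example, not the authors' implementation.

```python
import math
from typing import FrozenSet, Mapping

SEVERE_THRESHOLD = 5  # ratings below this are "low", otherwise "severe"


def encode_response(ratings: Mapping[str, float]) -> FrozenSet[str]:
    """Turn one questionnaire (symptom -> 0-10 rating) into a transaction."""
    items = set()
    for symptom, rating in ratings.items():
        value = float(rating)
        if math.isnan(value):
            value = 0.0           # missing scores are treated as 0
        if value <= 0:
            continue              # symptom is not occurring
        level = "low" if value < SEVERE_THRESHOLD else "severe"
        items.add(f"{symptom}_{level}")  # hypothetical item naming
    return frozenset(items)


if __name__ == "__main__":
    # Hypothetical single response, not taken from the study data.
    response = {"fatigue": 3, "pain": 7, "taste": 0, "drowsy": float("nan")}
    print(sorted(encode_response(response)))
```

Each symptom contributes at most one item per transaction, so a low and a severe item for the same symptom never co-occur within a single response.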

3. Experimental Results

The dataset used for these experiments consists of MDASI-HN responses for a cohort of 823 patients. The patient surveys were split into acute and late time points, with two items per symptom (low and severe) used to capture symptom severity. A total of 643 patients had at least one acute PRO, and 745 patients had at least one late PRO. Figure 1 shows the overall support of each symptom, for low and severe ratings, during the acute and late time frames. As shown, in the acute stage, many patients experienced both low and severe symptoms during treatment. In contrast, symptoms experienced in the late stage have a lower severity than those in the acute phase. We used a minimum support of 20% for both the acute and late stages, applying a single cutoff to keep the two analyses consistent.
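To show how a minimum-support cutoff shapes the set of frequent item-sets, here is a small level-wise Apriori sketch in plain Python. It is a simplified textbook version written for illustration, not the implementation used in the experiments. The toy transactions and the function name `apriori` are hypothetical, and the default `min_support=0.20` simply mirrors the cutoff stated above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float = 0.20) -> Dict[Itemset, float]:
    """Return every item-set whose support is at least `min_support`."""
    n = len(transactions)

    def supp(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    frequent: Dict[Itemset, float] = {}
    current = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while current:
        # Keep only the candidates of size k that meet the cutoff.
        level = {c: s for c in current if (s := supp(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy transactions, not taken from the study data.
TOY: List[Itemset] = [
    frozenset({"fatigue_low", "drowsy_low"}),
    frozenset({"fatigue_low", "drowsy_low", "pain_low"}),
    frozenset({"fatigue_low", "pain_low"}),
    frozenset({"taste_severe"}),
    frozenset({"fatigue_low", "drowsy_low"}),
]

if __name__ == "__main__":
    for itemset, s in sorted(apriori(TOY, 0.4).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), s)
```

Raising the cutoff on the toy data from 0.20 to 0.40 drops the rare `taste_severe` item and every item-set that depends on an infrequent pair, which is the pruning behavior that keeps the search tractable.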

Fig. 1. Symptom severity in the (a) acute and (b) late stages.

Acute: more than half of patients experience low-severity symptoms, while a sizable 20% experience severe symptoms. Late: patients experience mostly low-rated symptoms, with the highest prevalence in fatigue, drymouth, swallow, and taste.

Table 2 shows the top five association rules with the highest lift for the acute and late stages. The top rule for the acute stage involves pain, taste, and mucositis. This association is clinically plausible, since mucositis presents as small, painful oral ulcers that could interfere with oral functions such as taste; other studies, however, have shown pain to cluster more closely with fatigue than with mucositis [10,12]. For late symptoms, the top two rules in Table 2 consist of interference symptoms rated with low severity. In the acute-stage rules, HN-related and common cancer symptoms were more prevalent than in the late-stage rules. Notably, rules involving drowsy and fatigue with low severity are among the top rules for both the acute and late stages. Previous studies have also supported the association between these two symptoms as a symptom cluster [9,10]. Caution is advised when interpreting ARM relationships, as rules do not indicate causality but rather the probability of co-occurrence.

To help visualize the symptom clusters, we adopt a graph representation for association rules [13]. Figure 2 shows the top 20 association rules sorted by lift for acute and late symptoms. The circles encode rules, with size and color representing the support and lift metrics, respectively. The blue rectangles encode symptoms. An arrow pointing towards a circle means that the associated symptom is an antecedent of the association rule. If the arrow points towards a symptom, that symptom is the consequent of the association rule.
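The graph encoding described above can be approximated with a plain edge list: each rule becomes its own node, antecedent symptoms point into it, and it points to its consequent symptoms. The sketch below is a hypothetical illustration in Python, not the visualization tool cited in the text. The sample rules reuse the late-stage confidence and lift values from Table 2, while the names `Rule`, `rules_to_edges`, and `rules_per_symptom`, and the `_low` suffix, are assumptions made for the example.

```python
from collections import Counter
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float
    lift: float


def rules_to_edges(rules: List[Rule], top_n: int = 20) -> List[Tuple[str, str]]:
    """Directed edges symptom -> rule node -> symptom for the top rules by lift."""
    ranked = sorted(rules, key=lambda r: r.lift, reverse=True)[:top_n]
    edges: List[Tuple[str, str]] = []
    for idx, rule in enumerate(ranked):
        rule_node = f"rule_{idx}"
        for symptom in sorted(rule.antecedent):
            edges.append((symptom, rule_node))   # arrow into the rule circle
        for symptom in sorted(rule.consequent):
            edges.append((rule_node, symptom))   # arrow out to the consequent
    return edges


def rules_per_symptom(edges: List[Tuple[str, str]]) -> Counter:
    """Count how many rule nodes each symptom touches (candidate anchors)."""
    counts: Counter = Counter()
    for src, dst in edges:
        symptom = dst if src.startswith("rule_") else src
        counts[symptom] += 1
    return counts


# Late-stage rules from Table 2 (confidence, lift); "_low" marks low severity.
LATE_RULES = [
    Rule(frozenset({"general_activity_low"}), frozenset({"work_low"}), 0.79, 2.96),
    Rule(frozenset({"enjoy_low"}), frozenset({"mood_low"}), 0.75, 2.84),
    Rule(frozenset({"fatigue_low", "swallow_low"}), frozenset({"pain_low"}), 0.77, 2.35),
    Rule(frozenset({"pain_low", "fatigue_low"}), frozenset({"swallow_low"}), 0.80, 2.28),
    Rule(frozenset({"drowsy_low"}), frozenset({"fatigue_low"}), 0.83, 2.19),
]

if __name__ == "__main__":
    edges = rules_to_edges(LATE_RULES)
    print(rules_per_symptom(edges).most_common(3))
```

Symptoms that touch several rule nodes are the ones that visually link otherwise separate groups in such a graph. Within these five rules alone, `fatigue_low` takes part in three.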

Table 2. Top association rules for acute and late symptoms.

Top five rules for each stage with the highest lift. The symptom’s subscripts l and s stand for low and severe ratings, respectively.

| Acute antecedent | Acute consequent | Acute confidence | Acute lift | Late antecedent | Late consequent | Late confidence | Late lift |
|---|---|---|---|---|---|---|---|
| {pain_s, taste_s} | {mucositis_s} | .85 | 2.82 | {general_activity_l} | {work_l} | .79 | 2.96 |
| {mucus_s, taste_s} | {swallow_s} | .77 | 2.71 | {enjoy_l} | {mood_l} | .75 | 2.84 |
| {swallow_s, taste_s} | {mucus_s} | .89 | 2.70 | {fatigue_l, swallow_l} | {pain_l} | .77 | 2.35 |
| {mucus_s, taste_s} | {drymouth_s} | .75 | 2.64 | {pain_l, fatigue_l} | {swallow_l} | .80 | 2.28 |
| {drowsy_l} | {fatigue_l} | .76 | 2.19 | {drowsy_l} | {fatigue_l} | .83 | 2.19 |

Fig. 2. Symptoms Association Rule Graph.

The graph encoding shows the top 20 association rules for (a) acute and (b) late symptoms. In the acute stage, there is a large cluster of severe symptoms. In the late stage, drowsy and fatigue appear to be anchor symptoms connecting a cluster of interference symptoms with a cluster of cancer symptoms.

For acute symptoms, two clusters are consistent with previously reported clusters for HN cancer [10]. For late symptoms, there are four identifiable clusters. Interestingly, drowsy and fatigue seem to be anchor symptoms between interference and HN-related symptoms, a relationship that more traditional approaches to symptom cluster research cannot capture. Furthermore, we found that pain is associated with both mucositis and fatigue. These findings highlight that, with ARM, a symptom can appear in more than one cluster, which provides a more accurate model of the complex relationships between symptoms. In contrast, methods that partition symptoms into clusters, such as hierarchical clustering, would tend to group highly occurring symptoms together early.

4. Conclusion

We introduce association rule mining as a powerful approach to identify patient symptom clusters and uncover interesting relationships between symptoms. Our approach models PRO data as transactions, visualizes the most significant association rules as symptom clusters, and captures the severity of symptoms in both the acute and late stages. When applied to PRO data from head and neck cancer patients, this approach correctly identified higher symptom prevalence and severity during treatment and a gradual decrease after treatment. The new acute symptom clusters found include severely rated HN-related and common cancer symptoms. The late symptom clusters found include more interference symptoms and low-severity symptoms. Our analysis supports previously identified SCs and identifies new anchor symptom clusters that connect interference and HN-related symptoms, offering new opportunities for targeted interventions that could positively affect cancer patients’ quality of life. In the future, we plan to include clinical variables such as staging, dose, and organs at risk [14,15] in the ARM analysis to determine whether patient characteristics are related to individual symptoms or symptom clusters.

References

  • 1. Christopherson KM, et al. for the MD Anderson Head and Neck Cancer Symptom Working Group; Spatial-Non-spatial Multi-Dimensional Analysis of Radiotherapy Treatment/Toxicity Team (SMART3). Chronic radiation-associated dysphagia in oropharyngeal cancer survivors: Towards age-adjusted dose constraints for deglutitive muscles. Clin Transl Radiat Oncol 2019 Jun 15;18:16–22. doi: 10.1016/j.ctro.2019.06.005.
  • 2. Wentzel A, Hanula P, van Dijk LV, Elgohari B, Mohamed ASR, Cardenas CE, Fuller CD, Vock DM, Canahuate G, Marai GE. Precision toxicity correlates of tumor spatial proximity to organs at risk in cancer patients receiving intensity-modulated radiotherapy. Radiother Oncol 2020 Jul;148:245–251. doi: 10.1016/j.radonc.2020.05.023. Epub 2020 May 16.
  • 3. Wentzel A, Hanula P, Luciani T, Elgohari B, Elhalawani H, Canahuate G, Vock D, Fuller CD, Marai GE. Cohort-based T-SSIM Visual Computing for Radiation Therapy Prediction and Exploration. IEEE Trans Vis Comput Graph 2020 Jan;26(1):949–959. doi: 10.1109/TVCG.2019.2934546. Epub 2019 Aug 22.
  • 4. Marai GE, Ma C, Burks AT, Pellolio F, Canahuate G, Vock DM, Mohamed ASR, Fuller CD. Precision Risk Analysis of Cancer Therapy with Interactive Nomograms and Survival Plots. IEEE Trans Vis Comput Graph 2019 Apr;25(4):1732–1745. doi: 10.1109/TVCG.2018.2817557. Epub 2018 Mar 20.
  • 5. Multidisciplinary Larynx Cancer Working Group. Conditional Survival Analysis of Patients With Locally Advanced Laryngeal Cancer: Construction of a Dynamic Risk Model and Clinical Nomogram. Sci Rep 2017 Mar 9;7:43928. doi: 10.1038/srep43928.
  • 6. Miaskowski C, Barsevick A, Berger A, Casagrande R, Grady PA, Jacobsen P, Kutner J, Patrick D, Zimmerman L, Xiao C, Matocha M, Marden S. Advancing Symptom Science Through Symptom Cluster Research: Expert Panel Proceedings and Recommendations. J Natl Cancer Inst 2017 Jan 24;109(4):djw253. doi: 10.1093/jnci/djw253.
  • 7. Cleeland CS, Mendoza TR, Wang XS, Chou C, Harle MT, Morrissey M, Engstrom MC. Assessing symptom distress in cancer patients: the M.D. Anderson Symptom Inventory. Cancer 2000 Oct 1;89(7):1634–46.
  • 8. Rosenthal DI, Mendoza TR, Chambers MS, Asper JA, Gning I, Kies MS, Weber RS, Lewin JS, Garden AS, Ang KK, S Wang X, Cleeland CS. Measuring head and neck cancer symptom burden: the development and validation of the M. D. Anderson symptom inventory, head and neck module. Head Neck 2007 Oct;29(10):923–31. doi: 10.1002/hed.20602.
  • 9. Skerman HM, Yates PM, Battistutta D. Multivariate methods to identify cancer-related symptom clusters. Res Nurs Health 2009 Jun;32(3):345–360. doi: 10.1002/nur.20323.
  • 10. Rosenthal DI, Mendoza TR, Fuller CD, Hutcheson KA, Wang XS, Hanna EY, Lu C, Garden AS, Morrison WH, Cleeland CS, Gunn GB. Patterns of symptom burden during radiotherapy or concurrent chemoradiotherapy for head and neck cancer: a prospective analysis using the University of Texas MD Anderson Cancer Center Symptom Inventory-Head and Neck Module. Cancer 2014 Jul 1;120(13):1975–84. doi: 10.1002/cncr.28672. Epub 2014 Apr 7.
  • 11. Agrawal R, Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94). 1994, 487–499. doi: 10.5555/645920.672836.
  • 12. Kirkova J, Aktas A, Walsh D, Davis MP. Cancer symptom clusters: clinical and research methodology. J Palliat Med 2011 Oct;14(10):1149–66. doi: 10.1089/jpm.2010.0507. Epub 2011 Aug 23.
  • 13. Hahsler M. arulesViz: Interactive Visualization of Association Rules with R. R Journal 2017;9:163–175. doi: 10.32614/RJ-2017-047.
  • 14. Tosado J, Zdilar L, Elhalawani H, Elgohari B, Vock DM, Marai GE, Fuller C, Mohamed ASR, Canahuate G. Clustering of Largely Right-Censored Oropharyngeal Head and Neck Cancer Patients for Discriminative Groupings to Improve Outcome Prediction. Sci Rep 2020 Mar 2;10(1):3811. doi: 10.1038/s41598-020-60140-0.
  • 15. Luciani T, Wentzel A, Elgohari B, Elhalawani H, Mohamed A, Canahuate G, Vock DM, Fuller CD, Marai GE. A spatial neighborhood methodology for computing and analyzing lymph node carcinoma similarity in precision medicine. J Biomed Inform 2020;112S:100067. doi: 10.1016/j.yjbinx.2020.100067. Epub 2020 Feb 3.
