Author manuscript; available in PMC: 2024 Feb 1.
Published in final edited form as: Arthritis Care Res (Hoboken). 2022 Sep 15;75(2):210–219. doi: 10.1002/acr.24973

Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-based Cohort of Patients with Rheumatoid Arthritis

Cynthia S Crowson 1,2, Tina M Gunderson 1, John M Davis III 2, Elena Myasoedova 1,2, Vanessa L Kronzer 2, Caitrin M Coffey 2, Elizabeth J Atkinson 1
PMCID: PMC9763549  NIHMSID: NIHMS1818406  PMID: 35724274

Abstract

Objective:

To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using four methods and compare to patients without RA.

Methods:

In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA on 1-1-2015 were identified. Age, sex and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for 5 years before 1-1-2015. Using 2 codes ≥30 days apart, 44 previously defined morbidities and 11 non-overlapping chronic disease categories based on Clinical Classifications Software were defined. Unsupervised machine learning methods of interest included hierarchical clustering, factor analysis, k-means clustering, and network analysis.

Results:

Two groups of 1643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. Clusters were associated with age and sex. Differences between the four clustering methods were driven by comorbidities that are rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling impugns the usefulness and reliability of these methods. Clusters of common comorbidities between RA and non-RA cohorts were similar.

Conclusion:

Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of clustering methods suggests caution when interpreting clustering using one method.

Keywords: Rheumatoid arthritis, comorbidity

INTRODUCTION

The majority of patients with rheumatoid arthritis (RA) also suffer from other comorbidities, which negatively impact their health outcomes.(1, 2) Due to the rising incidence of multimorbidity in the general population, systematic study of the co-occurrence of comorbidities has been identified as a research priority by the US Department of Health and Human Services, which was recently reinforced in Nature.(3, 4) Clustering comorbidities is a way to group them into homogeneous categories within which the comorbidities are strongly related to each other. Several statistical methods for clustering dichotomous variables (i.e., presence or absence of comorbidities) are available, but most researchers use only a single method. Whether different methods produce different clusters, and which methods perform best, have not been evaluated for clustering of comorbidities.

Comorbidity patterns among patients with RA may differ from those in patients without RA because comorbidities can result from the disease itself, from RA treatments, and/or from the consequences of RA (e.g., physical limitations) that can provoke new comorbidities and alter outcomes. Thus, investigations of comorbidity patterns in the general population will likely fail to identify the RA-specific comorbidity patterns due to the low (~1-2%) prevalence of RA and the complexity of the autoimmune responses involved in the disease. For example, we found an RA-specific association between osteoporotic fractures and cardiovascular disease (CVD), which was not found among comparators without RA and may be related to adverse effects of glucocorticoid use in patients with RA.(5) Clustering comorbidities in patients with RA may provide new mechanistic insights regarding co-occurring comorbidities and may help identify associated comorbidities in order to facilitate early recognition and prevention efforts. We aimed to 1) identify clusters of comorbidities in patients with RA using several clustering approaches and 2) compare clusters identified among patients with RA to clusters identified in patients without RA. We hypothesized that clusters of comorbidities in patients with RA would differ from comorbidity clusters in patients without RA.

METHODS

The study included residents of 8 counties in Minnesota (Olmsted, Dodge, Mower, Goodhue, Wabasha, Freeborn, Steele and Waseca) with prevalent RA on January 1, 2015. The study was assembled using the resources of the Rochester Epidemiology Project (REP). The REP is a unique medical record linkage system that provides complete access to inpatient and outpatient medical records of all residents of Olmsted County, MN from all local health care providers for more than 50 years. Its history and utility for epidemiological investigations have been described in detail elsewhere.(6) Beginning in 2010, the REP expanded its catchment area to a 27-county region.(7) The complete inpatient and outpatient medical records for each potential case were manually reviewed by experienced nurse abstractors, and all incident patients fulfilled the 1987 American College of Rheumatology (ACR) classification criteria for RA.(8) For those who moved to Olmsted County with pre-existing RA, physician diagnosis with disease modifying anti-rheumatic drug use was accepted if documentation of criteria fulfilment at the original diagnosis of RA was not available. For each patient with RA, a subject without RA of similar age, sex and county of residency on January 1, 2015 was randomly selected to form the non-RA comparison cohort.

Diagnostic codes from all healthcare providers in the counties of interest were retrieved for 5 years prior to the prevalence date. The 55 previously identified comorbidities of interest in patients with RA were defined using 2 ICD-9 codes ≥30 days apart based on previously published lists.(9, 10) Obesity was defined as subjects having either 2 codes ≥30 days apart or having a BMI ≥30 kg/m² on January 1, 2015.
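The two-code rule described above can be illustrated with a short Python sketch. This is not the authors' code: the function names are hypothetical, and the sketch assumes the code dates passed in have already been restricted to the 5-year window before the prevalence date and to the codes for a single comorbidity.

```python
from datetime import date


def meets_two_code_rule(code_dates, min_gap_days=30):
    """Return True if at least two code dates are >= min_gap_days apart.

    code_dates: dates on which a qualifying diagnostic code was recorded
    (assumed to be pre-filtered to the 5-year look-back window).
    """
    if len(code_dates) < 2:
        return False
    # The widest gap is between the earliest and latest dates, so if any
    # pair qualifies, this pair does.
    return (max(code_dates) - min(code_dates)).days >= min_gap_days


def is_obese(code_dates, bmi=None, bmi_cutoff=30.0):
    """Obesity: either the two-code rule or BMI >= 30 kg/m² on the index date."""
    return meets_two_code_rule(code_dates) or (bmi is not None and bmi >= bmi_cutoff)
```

For example, codes recorded on 2012-01-01 and 2012-01-31 are exactly 30 days apart and satisfy the rule, whereas codes on 2012-01-01 and 2012-01-30 do not.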

Statistical Methods

Descriptive statistics (medians, percentages, etc.) were used to summarize the data. Comparisons between cohorts were performed using chi-square and rank-sum tests. Comorbidities were clustered using four approaches, each run separately for the RA and non-RA cohorts. Default options and recommended similarity metrics were used for each approach, mimicking how these tools might be commonly used. First, hierarchical clustering was conducted using the ClustOfVar R package.(11) With this approach, measures of similarity and dissimilarity for qualitative measures (here, dichotomous) are based on the PCAMIX method; eight clusters were arbitrarily chosen for both cohorts.(12) The second approach was factor analysis using tetrachoric correlation to define similarity between pairs of comorbidities. Velicer’s Minimum Average Partial (MAP) criterion was used to identify the optimal number of factors.(13) This approach uniquely allowed comorbidities to appear in more than one factor. To minimize the overlap, loadings with an absolute value < .3 were not displayed. For comparisons with the other approaches, each comorbidity was assigned to the factor on which it had the highest loading value. Next, k-means clustering analysis was also performed using the ClustOfVar package; for comparative purposes, the number of clusters was again set to eight. The final approach was network analysis using the R packages ggraph and tidygraph. Cramer’s V was used as the metric to define association between pairs of comorbidities. Only pairwise relationships greater than specified standard cutoffs (i.e., .05=0.22 and .10=0.32) were displayed. Alluvial plots (ggalluvial package) were used to visually compare clusters obtained using different approaches. More than one random seed was used for k-means clustering and network analysis to examine stability of the clusters. Stability of hierarchical clustering and factor analysis methods was examined using 100 bootstrap samples for each cohort. For the analysis of bootstrap samples, comorbidities occurring in <5% of patients with RA were excluded to reduce stability issues related to low prevalence. Analyses were performed using R 4.0.3 (R Foundation for Statistical Computing). This study was approved by institutional review boards of Mayo Clinic (IRB #17-002593) and Olmsted Medical Center (IRB #017-OMC-17).
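The step of turning factor loadings into a single cluster label per comorbidity can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. It assumes "highest loading" means largest absolute loading, and it assigns a primary factor even when no loading reaches the display cutoff; neither detail is stated in the text. The function name and the example loadings in the usage note are hypothetical.

```python
def assign_primary_factor(loadings, display_cutoff=0.3):
    """Summarize a factor loading table.

    loadings: {comorbidity: [loading on factor 1, loading on factor 2, ...]}
    Returns two dicts keyed by comorbidity:
      displayed - 1-based factor numbers whose |loading| >= display_cutoff
      primary   - 1-based factor number with the largest |loading|
                  (assumption: ties go to the lowest-numbered factor)
    """
    displayed = {}
    primary = {}
    for name, values in loadings.items():
        displayed[name] = [
            i + 1 for i, v in enumerate(values) if abs(v) >= display_cutoff
        ]
        best = max(range(len(values)), key=lambda i: abs(values[i]))
        primary[name] = best + 1
    return displayed, primary
```

With made-up loadings such as `{"depression": [0.10, 0.64, 0.33]}`, depression would be displayed under factors 2 and 3 but assigned to factor 2 for cross-method comparison, which is how a comorbidity can contribute to several factors yet still receive one label.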

RESULTS

A total of 1643 patients with RA and 1643 non-RA subjects were included in the study (Supplemental Table 1). The RA cohort had a median (Q1, Q3) age of 64 (54, 74) years, was 72% female, and 92% white. The median number of comorbidities was 5 (3, 9). The non-RA cohort had a median age of 64 (54, 74) years, was 72% female, and 91% white. The median number of comorbidities was 4 (1, 7). The median length of prior history (truncated at 5 years) was 5.0 for both cohorts with 96% of RA and 91% of non-RA having a full 5 years of prior medical history available.

Clustering comorbidities in patients with RA

Figure 1 shows the results of hierarchical clustering using the RA cohort. The first cluster is mostly bone-related; the second cluster includes circulatory system disorders as well as gout, anemia and renal disease; the third cluster includes CVD risk factors, hypothyroidism and vision loss. The fourth cluster is a heterogeneous group of conditions without a clear theme. It includes several genitourinary conditions that are more common in men, as well as neurological conditions, gastrointestinal disorders, cancer, interstitial lung disease (ILD), and others. The fifth cluster includes primarily mental/behavioral conditions, and cluster 6 includes blood disorders. Cluster 7 includes pain-related conditions as well as vitamin D deficiency, obesity, and sleep problems. The eighth cluster is mostly lung conditions but excludes ILD. Figure 2 graphically displays each comorbidity with respect to the mean age and percent female for those who have that comorbidity. Those with a comorbidity in cluster 4 were the oldest subjects and on average cluster 5 included the youngest subjects. Cluster 4 includes conditions more frequently occurring in men and cluster 7 includes conditions occurring more frequently in women.

Figure 1.

Dendrogram based on hierarchical clustering of variables applied to patients with rheumatoid arthritis divided into eight clusters. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.

Figure 2.

Comorbidities according to the mean age and percent female for those who have that comorbidity within the rheumatoid arthritis cohort. The names of the comorbidities are color coded based on the hierarchical clustering. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.

Factor analysis was also performed to cluster comorbidities in patients with RA. The optimal number of factors was found to be three, and Figure 3 shows the primary loadings for each factor. Unlike hierarchical clustering, the factors are ordered by the amount of variability they explain, and factor analysis allows a single comorbidity to contribute to multiple factors. Factor 1 aligns mostly with hierarchical clusters 2 and 3, which include CVD and CVD risk factors, respectively. However, factor 1 also includes a variety of other conditions with no clear theme. Factor 2 corresponds primarily to hierarchical clusters 5 and 7 (i.e., mostly mental/behavioral and pain-related conditions), but also includes restless-leg syndrome, dementia, and liver disease. Factor 3 includes a wide variety of comorbidities that have no clear theme and were not clustered together in the hierarchical clustering.

Figure 3.

Loadings for three factors identified using factor analysis for the rheumatoid arthritis cohort. The colors of the bars correspond to the hierarchical clustering analysis to facilitate comparisons between methods. Loadings less than an absolute value of 0.3 were not displayed. Note some comorbidities are included in multiple factors and others are not included in any factor. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.

K-means clustering applied to comorbidities among patients with RA produced clusters that were similar to the hierarchical clusters. Figure 4 shows an alluvial plot illustrating the mapping of each comorbidity between the clusters identified using K-means clustering and those identified using hierarchical clustering. Most comorbidities in K-means cluster 1 mapped to hierarchical cluster 2, but hypercalcemia and skin ulcers combined with hierarchical cluster 8 instead. K-means cluster 2 included lung diseases, such as asthma and chronic obstructive pulmonary disease (COPD), from hierarchical cluster 3 and ILD and solid cancers from hierarchical cluster 6. K-means cluster 3 included all comorbidities in hierarchical cluster 4 except osteoarthritis and also included anxiety and depression from hierarchical cluster 1 and dementia, peptic ulcer disease, restless leg syndrome and urinary incontinence from hierarchical cluster 6. K-means cluster 4 includes mental/behavioral conditions, and K-means cluster 5 includes only liver disease and thrombocytopenia. The theme of K-means cluster 7 is unclear, as it groups allergic rhinitis, sinusitis and skin disorders with hematologic cancers and leukopenia. K-means cluster 8 includes exclusively male conditions of erectile dysfunction and benign prostatic hyperplasia. It is important to note that the K-means clustering approach has a random component and thus different runs of the algorithm produced different clustering of the comorbidities (Supplemental Figure 1).
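The mapping between two sets of cluster labels can also be summarized numerically. The Python sketch below is an illustration only and is not part of the authors' analysis, which compared clusterings visually with alluvial plots. It builds the cross-tabulation that underlies such a plot and computes the Rand index, a label-invariant agreement measure introduced here as an assumption. Both functions assume the two assignments cover the same comorbidities.

```python
from collections import Counter
from itertools import combinations


def cross_tab(a, b):
    """Count comorbidities in each (cluster in a, cluster in b) cell.

    a, b: {comorbidity: cluster label} for two clustering runs.
    These counts are the flows an alluvial plot would draw.
    """
    return Counter((a[name], b[name]) for name in a)


def rand_index(a, b):
    """Fraction of comorbidity pairs on which two clusterings agree.

    A pair agrees if it is grouped together in both runs or separated in
    both runs; cluster numbering itself does not matter.
    """
    pairs = list(combinations(sorted(a), 2))
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)
```

A Rand index of 1 means the two runs produced the same partition even if the cluster numbers differ; values below 1 would quantify the kind of reshuffling seen between runs with different random seeds.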

Figure 4.

Mapping between the k-means clusters and the prior hierarchical clusters identified using the rheumatoid arthritis cohort. For both approaches, eight clusters were chosen. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.

The final clustering approach was network analysis, which relies on some measure of similarity between pairs of variables (e.g., comorbidities). In our example we used Cramer’s V, an association measure that is similar to a correlation coefficient for binary variables. Instead of pre-specifying the number of desired clusters, this approach requires specification of a cutoff for the association measure. Figure 5 shows diagrams using two different cutoffs (Cramer’s V=.22 and V=.32). The thickness of the line between pairs of comorbidities indicates the strength of the relationship (e.g., the line between hyperlipidemia and hypertension is much thicker than the line between peripheral vascular disease [PVD] and coronary artery disease [CAD]). The algorithm then picks a number of clusters. In panel A, depression and anxiety were clustered with bipolar disorders, post-traumatic stress disorder, restless leg syndrome, obesity and sleep disorders; however, in panel B, only the strong relationship between anxiety and depression remained. Without some level of filtering, the figure becomes unreadable (i.e., the middle part of panel A). Retention of a random starting value (seed) is also important with this approach for reproducibility purposes. Different random seeds produce different clusters (data not shown).
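The pairwise measure and cutoff step can be illustrated with a small Python sketch. It is not the authors' ggraph/tidygraph code and does not reproduce the step that groups nodes into clusters. It assumes each comorbidity is coded as a 0/1 sequence over the same patients and computes Cramer's V without a continuity correction; for a 2×2 table this equals the absolute value of the phi coefficient. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cramers_v_binary(x, y):
    """Cramer's V for two equal-length 0/1 sequences (|phi| for a 2x2 table)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # One variable is constant, so V is undefined; treated as 0 here
        # (an assumption of this sketch).
        return 0.0
    return abs(n11 * n00 - n10 * n01) / sqrt(denom)


def edges_above(data, cutoff):
    """Return (name_a, name_b, V) for every pair with V > cutoff.

    data: {comorbidity: list of 0/1 indicators, one per patient}
    """
    edges = []
    for a, b in combinations(sorted(data), 2):
        v = cramers_v_binary(data[a], data[b])
        if v > cutoff:
            edges.append((a, b, v))
    return edges
```

Calling `edges_above(data, 0.22)` and `edges_above(data, 0.32)` on the same data would give the denser and sparser edge sets corresponding to the two panels, with V available as the line thickness.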

Figure 5.

Clusters identified using network analysis among subjects with rheumatoid arthritis. Panel A includes comorbidities that have pairwise Cramer’s V > .22 while panel B only includes Cramer’s V > .32. Thicker lines correspond to stronger relationships. Cramer’s V is a measure of the relative strength of an association between two variables and ranges from 0 to 1. Abbreviations: alcohol=Alcohol abuse, aller_rhin=Allergic rhinitis, arrhythm=Cardiac arrhythmia, back=Chronic back pain, bipolar=Bipolar disorder, cad=Coronary artery disease, copd=Chronic obstructive pulmonary disease, dm=Diabetes mellitus, drug=Drug abuse, hemat_ca=Hematologic cancer, hf=Heart failure, htn=Hypertension, oa=Osteoarthritis, ptsd=Post-traumatic stress disorder, pvd=Peripheral vascular disease, rls=Restless leg syndrome, sinusitis=Chronic sinusitis, skin_ulcer= Chronic skin ulcers, sleep=Sleep disorders, solid_ca=Solid cancer, stroke=Cerebrovascular disease, valvular=Valvular heart disease.

A full comparison of the clusters using the four approaches is available in Supplemental Table 2. Comorbidities that consistently clustered together across all 4 clustering methods included 1) anemia, cardiac arrhythmias, CAD, PVD, heart failure, valvular heart disease, renal disease, cerebrovascular disease, pulmonary circulation disorders, 2) hypertension, hyperlipidemia, severe vision reduction, and diabetes mellitus, 3) anxiety and depression, 4) alcohol and drug abuse, 5) post-traumatic stress disorder and bipolar disorder, and 6) obesity and sleep disorders.

Stability of comorbidity clusters in patients with RA

Since changing the random seeds impacted the clusters in both the k-means and network analysis methods, the stability of the clusters was also investigated in hierarchical clustering and factor analysis using bootstrap sampling. Comorbidities with low prevalence (<5%) were excluded from these analyses. Substantial instability of clusters in patients with RA was demonstrated for hierarchical clustering (left panel of Figure 6). Comorbidities that consistently clustered together (>90% of bootstrap samples) were 1) anxiety, depression, and headache, 2) headaches and gynecological disorders, 3) obesity and sleep disorders, 4) osteoporosis, anemia and hypercalcemia, 5) back problems and osteoarthritis, 6) diabetes mellitus, hypertension, and hyperlipidemia and 7) CAD, PVD, cardiac arrhythmias and valvular heart disease. Substantial instability of clusters in patients with RA was also demonstrated for factor analysis (left panel of Supplemental Figure 2). Comorbidities that consistently clustered together (>90% of bootstrap samples) were 1) anxiety, depression, headache, asthma, skin disorders and GERD, and 2) cardiac arrhythmias and valvular heart disease.
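The ">90% of bootstrap samples" criterion can be sketched in Python as a co-clustering frequency. This is a hedged illustration rather than the authors' implementation: it assumes the clustering has already been rerun on each bootstrap sample and that the resulting labels are supplied as one dictionary per sample. The helper names are hypothetical, and `bootstrap_indices` only shows the resampling-with-replacement step.

```python
import random
from itertools import combinations


def bootstrap_indices(n_patients, rng):
    """Draw n_patients row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_patients) for _ in range(n_patients)]


def coclustering_frequency(assignments):
    """Proportion of bootstrap samples in which each pair shares a cluster.

    assignments: list of {comorbidity: cluster label}, one dict per
    bootstrap sample, all with the same comorbidities as keys.
    """
    names = sorted(assignments[0])
    freq = {}
    for a, b in combinations(names, 2):
        together = sum(1 for labels in assignments if labels[a] == labels[b])
        freq[(a, b)] = together / len(assignments)
    return freq


def stable_pairs(freq, threshold=0.9):
    """Pairs that co-cluster in more than `threshold` of the samples."""
    return sorted(pair for pair, f in freq.items() if f > threshold)
```

Because cluster labels are arbitrary within each bootstrap sample, comparing whether a pair shares a label, rather than comparing the labels themselves, keeps the summary meaningful across samples.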

Figure 6.

Heat maps from 100 bootstrap samples for hierarchical clusters of comorbidities with at least 5% prevalence from the cohorts with rheumatoid arthritis (RA; left panel) and without RA (non-RA; right panel). Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.

Clustering comorbidities in patients without RA

Given the instability demonstrated for clustering in patients with RA, formal comparison of clusters between the RA and non-RA cohorts was not pursued. Clustering of patients without RA was performed using bootstrap samples with both hierarchical clustering and factor analysis methods. Comorbidities with low prevalence (<5%) in the RA cohort were excluded from these analyses; all excluded comorbidities had <5% prevalence in the non-RA cohort. Substantial instability of clusters in patients without RA was demonstrated for hierarchical clustering (right panel of Figure 6). Comorbidities that consistently clustered together (>90% of bootstrap samples) were 1) anxiety and depression, 2) obesity and sleep disorders, 3) back problems, osteoarthritis, osteoporosis, and neuropathy, 4) diabetes mellitus, hypertension, and hyperlipidemia and 5) CAD, PVD, arrhythmia and valvular heart disease. Substantial instability of clusters in patients without RA was also demonstrated for factor analysis (right panel of Supplemental Figure 2). Comorbidities that consistently clustered together (>90% of bootstrap samples) were 1) anxiety, depression, headache, and asthma, 2) CAD, PVD, cardiac arrhythmia and valvular heart disease, 3) anemia and hypercalcemia, and 4) diabetes mellitus, hyperlipidemia and obesity.

DISCUSSION

Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases in both the RA and non-RA cohorts, as expected. Clusters were associated with age and sex. The cluster of mental/behavioral comorbidities contained conditions with the youngest mean age, and the cluster that contained dementia comprised conditions affecting older patients. Comorbidities that commonly affected women were also clustered together. Differences between the four clustering methods were driven by comorbidities that are rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. However, the differences between clusters identified using different methods, the lack of clear themes for some clusters, and the instability of clusters when using different random seeds or bootstrap sampling called into question the usefulness and reliability of these methods. Differences in clustering between RA and non-RA cohorts were numerous, but when comparing comorbidities that consistently clustered together in each cohort, differences between RA and non-RA cohorts were minimal.

To our knowledge, clustering of comorbidities has not been reported previously in patients with RA. One preliminary report from England et al. reported clusters of comorbidities based on factor analysis using data from MarketScan and Veterans’ Health Administration.(14) The clusters of comorbidities based on factor analysis among men and among women in each data source revealed consistent results regarding mental/behavioral/pain-related and CVD clusters in each analysis with some inconsistencies regarding where pulmonary and metabolic conditions were clustered. Clustering of comorbidities in several populations revealed similar results with clusters for CVD, mental/behavioral disorders and musculoskeletal disorders reported commonly.(15-19) Others have utilized clustering methods in patients with RA to define patient subgroups based on clinical features and disease activity measures.(20, 21) Clustering of comorbidities as well as clustering of patients based on their comorbidities has been reported in other rheumatic diseases (e.g., gout) to identify clinical phenotypes.(22, 23) Clustering of comorbidities, instead of patients, is potentially useful for gaining mechanistic insights regarding comorbidity patterns. Other authors have suggested clustering comorbidities may be useful for designing clinical guidelines for primary care of patients with multimorbidity. This notion is well aligned with the concept that investigation of comorbidity clusters could lead to linkages that help alert clinicians regarding which patients have increased risk for which comorbidities (e.g., the known linkage between diabetes mellitus and CVD risk). Our study hoped to provide unique linkages for comorbidities in patients with RA to help disentangle the complexity of multimorbidity in these patients, but such insights were not found. This finding suggests that despite having more comorbidities, patients with RA do not have different comorbidity patterns than patients without RA.

Our study compared four different methods for clustering comorbidities. Comparing the results of these methods demonstrated instability across methods. Instability within each method was also demonstrated as different random seeds or bootstrap samples produced variation in the clusters. Most reports use a single method when clustering and may not explain why that method was chosen. Binary data on the presence or absence of comorbidities require careful consideration of which method to utilize, as some clustering methods do not work well with binary data. In addition to the choice of method, each method involves analysis choices regarding the number of clusters or the distance measure, and some even require a random seed, which can yield different results when modified.(24, 25) These results suggest caution when interpreting results of clustering based on a single method, which is a long-standing admonition in the statistical literature.(26-29) Von Luxburg et al. provide important considerations for evaluating the success of clustering analyses, and they stress the importance of context when evaluating different methods for clustering.(30) In this exploratory, unsupervised clustering context, instability was noted in each clustering method and none provided strong evidence of the unique linkages of previously unknown relationships between comorbidities that were hypothesized.

The minimal differences in comorbidity clusters between patients with and without RA suggest patients with RA do not have different comorbidity profiles, despite having more comorbidities than patients without RA. This is consistent with previous reports of earlier development of age-related comorbidities in patients with RA, and the concept of accelerated aging, which has previously been demonstrated.(31-33) The strong relationship between age and the comorbidities that clustered together further supports this. Aging acts through a number of biological mechanisms at the cellular or tissue level such as inflammation, altered turnover and repair, mitochondrial dysfunction, epigenetic alteration, telomere attrition, impaired signaling, and stem cell exhaustion that may lead to multi-system loss of reserve and function. This multi-system loss expresses itself as increased disease susceptibility, reduced functional reserve, reduced healing capacity and stress resistance, unstable health, and failure to thrive, and may lead to geriatric syndromes such as gait disorders, urinary incontinence, sleep disorders, cognitive impairment, and disability.(15) When the impairment reaches a clinical threshold, it manifests as multimorbidity. Thus, the accumulation of medical conditions with age is a proxy measure for the rate of aging and for the loss of resilience.(34, 35) However, the increased prevalence of individual conditions could also be explained by other factors (e.g., inflammation, drug side effects, autoimmunity, damage, fibrosis), which are all intrinsic to the RA disease process or its treatment.

This study has multiple strengths, including the population-based cohorts of patients with and without RA, the comprehensive resources of the REP that capture all medical care from all providers in the community, and the long length of prior medical history to assess comorbidities. The use of diagnostic codes to define morbidities and the retrospective study design are limitations of this study. The positive predictive value using a definition of 2+ diagnostic codes ≥30 days apart varies across morbidities. However, a recent study showed >90% positive predictive value for many morbidities with the exception of dementia, which had a positive predictive value <50%.(36) Only diagnoses that came to medical attention and were documented in the medical records were included. However, the focus on chronic morbidities minimizes the risk of missing morbidities of interest. In addition, some morbidities of interest may not have been included, despite the fairly comprehensive list of morbidities considered in this manuscript, and information on the severity of the morbidities was not available. Furthermore, analyses were not stratified by clinical RA phenotypes (e.g., seropositivity) or RA treatment, severity or activity. Finally, the study population is >90% white, which may limit generalizability to more diverse populations.

In conclusion, different clustering methods produce different clusters of comorbidities. Common comorbidities showed more stability in clustering together using multiple approaches, while rare comorbidities displayed inconsistent clustering across approaches. Age is a dominant factor that influences the clustering of comorbidities since many comorbidities are related to aging. Patients with RA did not demonstrate important differences in comorbidity clusters compared to patients without RA. This lack of differences in comorbidity clusters along with earlier onset of comorbidities in patients with RA supports the concept of accelerated aging in RA, but other explanations are also plausible. The instability of clustering results within and between methods suggests caution when interpreting results of clustering based on a single method; using multiple clustering methods may provide more information about reproducibility of the results. Further research is needed to identify subgroups of patients with RA who are at the highest risk of being in each of the comorbidity clusters in order to target preventive efforts.

Supplementary Material

Supplement

Significance and Innovations.

  • Clustering of comorbidities has not been reported previously in patients with RA and could provide unique linkages for comorbidities in patients with RA to help disentangle the complexity of multimorbidity in these patients.

  • Different clustering methods produced different clusters of comorbidities. Common comorbidities showed more stability in clustering together using multiple approaches, while rare comorbidities displayed inconsistent clustering across approaches. The instability of results within and between clustering methods suggests caution when interpreting results of clustering based on a single method.

  • Age is a dominant factor that influences the clustering of comorbidities since many comorbidities are related to aging.

  • Patients with RA did not demonstrate important differences in comorbidity clusters compared to patients without RA, despite the higher comorbidity burden in patients with RA.

Financial support:

This work was funded by grants from the National Institutes of Health, NIAMS (R01 AR46849) and NIA (R01 AG068192). Research reported in this publication was supported by the National Institute on Aging of the National Institutes of Health under Award Number R01AG034676. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

The authors declare no conflict of interest.
