European Journal of Rheumatology. 2022 Feb 8;9(1):3–7. doi: 10.5152/eurjrheum.2020.21225

Clinical subgroup clustering analysis in a systemic lupus erythematosus cohort from Western Pennsylvania

Patrick Coit 1,2, Lacy Ruffalo 3, Amr H Sawalha 1,3,4,5
PMCID: PMC10089140  PMID: 34554910

Abstract

Objective:

Systemic lupus erythematosus (SLE) is a complex heterogenous autoimmune disease that can affect multiple organs. We performed clinical clustering analysis to describe a lupus cohort from the University of Pittsburgh Medical Center.

Methods:

A total of 724 patients who met the American College of Rheumatology (ACR) classification criteria for SLE were included in this study. Clustering was performed using the ACR classification criteria and the partitioning around medoid method. Correlation analysis was performed using the Spearman's Rho test.

Results: Patients with SLE in our cohort fall into three distinct clinical disease subsets. Patients in cluster 1 were significantly more likely to develop renal and hematologic involvement, and African–American and male lupus patients were overrepresented in this cluster. Clusters 2 and 3 identified a milder disease, with a significantly lower likelihood of organ complications. Patients in cluster 2 are characterized by malar rash and photosensitivity, while patients in cluster 3 are characterized by oral ulcers, which are present in ∼90% of patients within this cluster. The presence of photosensitivity or oral ulcers appears to be protective against the development of lupus nephritis in our cohort.

Conclusion: We describe a large cohort of SLE from Western Pennsylvania and identify three distinct clinical disease subgroups. Clustering analysis might help to better manage and predict disease complications in heterogenous diseases like lupus.

Keywords: Lupus, cohort, clustering, subsets

Introduction

Systemic lupus erythematosus (SLE or lupus) is a chronic remitting–relapsing autoimmune disease characterized by the production of antinuclear antibodies. Lupus is heterogenous and can affect multiple organ systems.1 Although it more commonly affects women, lupus tends to be more severe in men.2 In the United States, lupus is more common and more severe in patients of African–American descent compared to European–American patients with the disease.3

The etiology of lupus is not fully understood. Genetic and environmental factors are thought to be involved in the pathogenesis of lupus.4 Further, a clear role for epigenetic dysregulation in the pathogenesis of lupus has been established.5,6 The clinical heterogeneity of lupus is suggested to reflect variability in the underlying genetic background, epigenetic modifications, and immunologic dysregulation among individual lupus patients.7-10 While lupus is unified by the presence of autoantibodies directed against self-nuclear antigens, clinical and molecular heterogeneity of the disease is an important factor hindering the success of clinical trials in lupus.11

In this report, we describe a subset of lupus patients enrolled in the University of Pittsburgh Medical Center Lupus Cohort who meet the American College of Rheumatology (ACR) classification criteria for SLE.12 We implement a subgroup clinical clustering analysis and characterize three distinct clinical subsets of lupus in our cohort.

Methods

Patients

We studied a subset of patients included in our UPMC Lupus Cohort who met the American College of Rheumatology classification criteria for SLE.12 All patients were evaluated in our clinics between January 2018 and March 2020. A total of 724 patients were studied. The study was approved by the institutional review board at the University of Pittsburgh.

Clustering

The 11 ACR classification criteria for SLE were used as input for calculating a distance matrix with Gower's distance method, as implemented in the cluster (v2.1.0) package in R. This method accommodates non-numeric data.13 All ACR criteria were entered as asymmetric binary values. The number of clusters (k) was determined a priori using the NbClust (v3.0) package.14 This package computes a collection of 30 clustering indices, which together recommended k = 3 as optimal. Clustering of Gower's distance matrix was performed using the partitioning around medoids (PAM) method in the cluster package, which builds each cluster around a single object (the medoid) with minimal dissimilarity to all objects within its cluster.15 PAM operates on the same principles as the k-means algorithm but is more robust to outliers.16 Assigned clusters had a combined average silhouette of 0.24 (cluster 1 = 0.27, cluster 2 = 0.25, and cluster 3 = 0.19). Silhouette values can range from −1 to +1, with a higher value indicating better cohesion of the objects within the cluster.17 Cluster assignments for each sample were used to test for differences in the distribution of sex, race/ethnicity, and the presence of ACR criteria across clusters.
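To make the clustering steps concrete, the following is a minimal Python sketch, not the authors' R code. It rests on three assumptions: that Gower's distance with every variable coded as asymmetric binary reduces to the Jaccard distance (joint absences are ignored), that a naive starting set of medoids is acceptable in place of PAM's greedy BUILD step, and that six made-up patient records with four criteria each are enough to illustrate the idea. All function names and data below are hypothetical.

```python
from itertools import product


def jaccard_distance(a, b):
    """Gower dissimilarity when every variable is asymmetric binary.

    Joint absences (0, 0) are ignored, so the result is the Jaccard distance.
    """
    compared = sum(1 for x, y in zip(a, b) if x or y)
    if compared == 0:
        raise ValueError("distance is undefined when both records are all zeros")
    mismatched = sum(1 for x, y in zip(a, b) if x != y)
    return mismatched / compared


def total_cost(dist, medoids):
    """Sum of each record's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Simplified PAM: swap medoids with non-medoids while the cost drops."""
    n = len(dist)
    medoids = list(range(k))  # naive start (assumption); PAM proper uses BUILD
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best - 1e-12:
                medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def mean_silhouette(dist, labels):
    """Average silhouette width over all records (range -1 to +1)."""
    n = len(dist)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / n


# Hypothetical records: 1 = criterion met, 0 = not met.
patients = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1],
]
dist = [[0.0 if i == j else jaccard_distance(p, q)
         for j, q in enumerate(patients)] for i, p in enumerate(patients)]
medoids, labels = pam(dist, k=2)
avg_width = mean_silhouette(dist, labels)
print(medoids, labels, round(avg_width, 2))
```

On this toy input the first three and last three records end up in separate clusters. A medoid is always an actual record, which is why PAM can work directly from a dissimilarity matrix rather than from coordinates.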

Statistical analysis

Pearson's chi-square test was performed to compare sex and the presence of ACR criteria across clusters. Fisher's exact test was performed to compare race/ethnicity across clusters. P values for the differences between the presence of the 11 ACR criteria across clusters were adjusted using the Benjamini–Hochberg method to account for multiple testing. Sex and race/ethnicity P values were reported unadjusted. Odds ratios and Fisher's exact test P values were calculated for sex and the presence of ACR criteria across clusters using the epitools (v0.5-10.1) package in R without correction for multiple testing.18 A significance threshold of P < .05 was used for all statistical testing. Correlation analysis was performed using the nonparametric Spearman's Rho test, with Benjamini–Hochberg FDR-adjusted P values reported to correct for multiple testing, using the correlation (v0.4.0) package in R.19
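The Benjamini–Hochberg adjustment mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical and the four raw P values are invented. It follows the standard step-up rule, the same adjustment that R's p.adjust offers under method "BH".

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print([round(p, 3) for p in adjusted])
```

Adjusted values are never smaller than the raw ones and preserve their ordering, so a criterion that is significant after adjustment was also significant before it.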

Results

We evaluated a total of 724 lupus patients included in the lupus cohort at the University of Pittsburgh Medical Center. These patients represent a subset of our lupus cohort who meet the American College of Rheumatology classification criteria for SLE and were evaluated at our center between January 2018 and March 2020.

Our study population included 672 female and 52 male lupus patients; 73% (n = 529) were European–American, 23% (n = 168) African–American, 2% (n = 16) Asian, and <2% (n = 11) other or not reported (Table 1). The average and median ages of our patients were 48 and 47 years, respectively (range 19–86).

Table 1.

Clinical characteristics of three subgroups of patients with lupus in our cohort.

| Characteristic | All patients (n = 724) | Cluster 1 (n = 270) | Cluster 2 (n = 179) | Cluster 3 (n = 275) | P value |
|---|---|---|---|---|---|
| Race/ethnicity | | | | | |
| White | 529 (73%) | 157 (58%) | 146 (82%) | 226 (82%) | 2.04 × 10^−9 |
| Black | 168 (23%) | 98 (37%) | 29 (16%) | 41 (15%) | - |
| Asian | 16 (2%) | 9 (3%) | 3 (2%) | 4 (1%) | - |
| Other/not reported | 11 (<2%) | 6 (2%) | 1 (<1%) | 4 (1%) | |
| Sex | | | | | |
| Female | 672 (93%) | 242 (90%) | 166 (93%) | 264 (96%) | .0158 |
| Male | 52 (7%) | 28 (10%) | 13 (7%) | 11 (4%) | - |
| Manifestations | | | | | |
| Malar rash | 218 (30%) | 39 (14%) | 125 (70%) | 54 (20%) | 1.77 × 10^−39 |
| Discoid rash | 95 (13%) | 48 (18%) | 18 (10%) | 29 (11%) | .0201 |
| Photosensitivity | 353 (49%) | 29 (11%) | 142 (79%) | 182 (66%) | 7.11 × 10^−56 |
| Oral ulcers | 322 (44%) | 39 (14%) | 37 (21%) | 246 (89%) | 6.14 × 10^−79 |
| Arthritis | 623 (86%) | 218 (81%) | 163 (91%) | 242 (88%) | .00574 |
| Serositis | 232 (32%) | 91 (34%) | 49 (27%) | 92 (33%) | .334 |
| Renal disorder | 119 (16%) | 82 (30%) | 19 (11%) | 18 (7%) | 5.79 × 10^−14 |
| Neurologic disorder | 30 (4%) | 14 (5%) | 5 (3%) | 11 (4%) | .455 |
| Hematologic disorder | 277 (38%) | 204 (76%) | 19 (11%) | 54 (20%) | 7.11 × 10^−56 |
| Immunologic disorder | 496 (69%) | 250 (93%) | 147 (82%) | 99 (36%) | 1.22 × 10^−48 |
| Positive ANA | 674 (93%) | 262 (97%) | 160 (89%) | 252 (92%) | .00562 |

To further characterize the patterns of disease involvement in our lupus patients, we performed a medoid clustering analysis using the 11 ACR classification criteria for lupus. The analysis revealed that our lupus patients group into three distinct clinical clusters (Figure 1).

Figure 1.


Clustering analysis of 724 lupus patients reveals three disease subsets. Clusters were determined using the partitioning around medoids method applied to a Gower's distance matrix of 11 ACR criteria reported for all patients.

Lupus cluster 1 includes 270 (37%) patients with overrepresentation of organ-specific manifestations. This includes renal involvement in 30% of patients, compared to 11% and 7% in clusters 2 and 3, respectively (P = 5.79 × 10^−14), hematologic involvement (76%, compared to 11% and 20% in clusters 2 and 3, respectively, P = 7.11 × 10^−56), and discoid rash (18%, compared to 10% and 11% in clusters 2 and 3, respectively, P = .02). As shown in Table 1, among all lupus patients in our cohort who have renal involvement (n = 119), hematologic involvement (n = 277), and discoid rash (n = 95), 69%, 74%, and 51%, respectively, are in cluster 1. Not unexpectedly, the majority of our African–American lupus patients (98 of 168 patients) were in this cluster, which is also enriched with our male lupus patients (28 of 52 male patients in our cohort) (Table 1).

Patients in cluster 2 (25%, n = 179) were more likely to have nonchronic cutaneous involvement including malar rash (70% of patients, P = 1.77 × 10^−39) and photosensitivity (79% of patients, P = 7.11 × 10^−56), and arthritis (91% of patients, P = .0057). Cluster 3 (38%, n = 275) is characterized by oral ulcers in the vast majority of patients (89%, n = 246) and has the lowest rate of renal involvement among all three clusters (7%) (Table 1).
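As a worked check of the Pearson chi-square comparison, the sex-by-cluster counts in Table 1 can be run through the test by hand. The Python sketch below is illustrative and is not the authors' code. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(−x/2), which avoids any statistics library. The result comes out close to the P = .0158 reported for sex in Table 1.

```python
from math import exp

# Counts from Table 1, ordered as cluster 1, cluster 2, cluster 3.
female = [242, 166, 264]
male = [28, 13, 11]

observed = [female, male]
row_totals = [sum(row) for row in observed]
col_totals = [f + m for f, m in zip(female, male)]
grand_total = sum(row_totals)

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for r, row in enumerate(observed):
    for c, count in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total
        chi_square += (count - expected) ** 2 / expected

# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and with
# 2 degrees of freedom the chi-square survival function is exp(-x / 2).
degrees_of_freedom = (len(observed) - 1) * (len(female) - 1)
p_value = exp(-chi_square / 2)
print(round(chi_square, 2), round(p_value, 4))
```

The closed form applies only when the degrees of freedom equal 2; other table shapes need the general chi-square distribution.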

We next determined the odds of developing specific lupus features for patients in any given cluster (Table 2). Patients in cluster 1 had 3.7- and 6.25-fold higher odds of lupus renal involvement compared to clusters 2 and 3, respectively (P = 5.16 × 10^−7 and 2.59 × 10^−13), and 25- and 12.5-fold higher odds of hematologic involvement (P = 6.25 × 10^−45 and 5.63 × 10^−41). Cluster 2 patients had 13.6- and 31.5-fold higher odds of malar rash and photosensitivity, respectively, compared to cluster 1 (P = 1.85 × 10^−33 and 1.69 × 10^−51). Meanwhile, patients in cluster 3 had ∼50-fold higher odds of oral ulcers (OR = 49.54, P = 1.20 × 10^−76) and were protected from lupus nephritis (OR = 0.16, P = 2.59 × 10^−13) compared to patients in cluster 1 (Table 2).

Table 2.

Odds ratios for differences in clinical characteristics and manifestations between the lupus subgroups identified in our study. Odds ratio values and 95% confidence intervals (CI) in clusters 2 and 3 versus cluster 1 are depicted.

| Variable | Cluster | Odds ratio (vs. cluster 1) | Lower 95% CI | Upper 95% CI | P |
|---|---|---|---|---|---|
| Sex (female) | Cluster 2 | 1.48 | 0.72 | 3.20 | .317 |
| | Cluster 3 | 2.77 | 1.30 | 6.31 | 4.43 × 10^−3 |
| Malar rash | Cluster 2 | 13.60 | 8.40 | 22.48 | 1.85 × 10^−33 |
| | Cluster 3 | 1.45 | 0.90 | 2.34 | .112 |
| Discoid rash | Cluster 2 | 0.52 | 0.27 | 0.95 | .0289 |
| | Cluster 3 | 0.55 | 0.32 | 0.92 | .0191 |
| Photosensitivity | Cluster 2 | 31.50 | 18.23 | 56.06 | 1.69 × 10^−51 |
| | Cluster 3 | 16.16 | 10.08 | 26.61 | 3.60 × 10^−43 |
| Oral ulcers | Cluster 2 | 1.54 | 0.91 | 2.61 | .0952 |
| | Cluster 3 | 49.54 | 29.23 | 87.04 | 1.20 × 10^−76 |
| Arthritis | Cluster 2 | 2.43 | 1.31 | 4.72 | 2.94 × 10^−3 |
| | Cluster 3 | 1.75 | 1.06 | 2.90 | .0245 |
| Serositis | Cluster 2 | 0.74 | 0.48 | 1.14 | .177 |
| | Cluster 3 | 0.99 | 0.68 | 1.43 | 1 |
| Renal disorder | Cluster 2 | 0.27 | 0.15 | 0.48 | 5.16 × 10^−7 |
| | Cluster 3 | 0.16 | 0.09 | 0.28 | 2.59 × 10^−13 |
| Neurologic disorder | Cluster 2 | 0.53 | 0.15 | 1.51 | .242 |
| | Cluster 3 | 0.76 | 0.31 | 1.85 | .545 |
| Hematologic disorder | Cluster 2 | 0.04 | 0.02 | 0.07 | 6.25 × 10^−45 |
| | Cluster 3 | 0.08 | 0.05 | 0.12 | 5.63 × 10^−41 |
| Immunologic disorder | Cluster 2 | 0.37 | 0.19 | 0.69 | 8.72 × 10^−4 |
| | Cluster 3 | 0.05 | 0.03 | 0.08 | 3.99 × 10^−47 |
| Positive ANA | Cluster 2 | 0.26 | 0.10 | 0.63 | 1.78 × 10^−3 |
| | Cluster 3 | 0.34 | 0.13 | 0.79 | 8.70 × 10^−3 |
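The renal disorder rows of Table 2 can be approximated from the counts in Table 1 with a plain cross-product odds ratio and a Wald confidence interval. The Python sketch below is an assumption-laden illustration, not the authors' method: the function name is hypothetical, and the published values were computed with epitools, which may use a different estimator, so small differences from Table 2 (most visibly for large odds ratios such as oral ulcers) are to be expected.

```python
from math import exp, log, sqrt


def odds_ratio_wald(group_yes, group_no, reference_yes, reference_no, z=1.96):
    """Cross-product odds ratio with a Wald 95% confidence interval."""
    estimate = (group_yes * reference_no) / (group_no * reference_yes)
    se = sqrt(1 / group_yes + 1 / group_no + 1 / reference_yes + 1 / reference_no)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper


# Renal disorder counts from Table 1: (with, without) in each cluster.
cluster1 = (82, 270 - 82)
cluster2 = (19, 179 - 19)
cluster3 = (18, 275 - 18)

or_c2, low_c2, high_c2 = odds_ratio_wald(*cluster2, *cluster1)
or_c3, low_c3, high_c3 = odds_ratio_wald(*cluster3, *cluster1)
print(round(or_c2, 2), round(low_c2, 2), round(high_c2, 2))
print(round(or_c3, 2), round(low_c3, 2), round(high_c3, 2))

# Reciprocals express the same comparison with cluster 1 as the numerator.
print(round(1 / or_c2, 1), round(1 / or_c3, 1))
```

Both point estimates round to the 0.27 and 0.16 shown in Table 2. The reciprocal of the unrounded cluster 3 estimate is slightly below the 6.25 quoted in the text, which appears to have been derived from the rounded value 0.16.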

A correlation analysis between the 11 ACR criteria was performed in our lupus patients. We detected a significant positive correlation between fulfilling the immunologic disorder criterion and both renal involvement and hematologic disorder (P < .001 and < .01, respectively). The presence of either photosensitivity or oral ulcers in our lupus patients was negatively correlated with the presence of renal disorder, hematologic disorder, and immunologic disorder (P < .001 for all correlations) (Figure 2).

Figure 2.


Correlation matrix of 11 ACR criteria reported for 724 lupus patients included in our study. Correlation values were calculated using Spearman's Rho test. P values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate method, and adjusted P values are reported. *P < .05; **P < .01; and ***P < .001.
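For two binary criteria, Spearman's Rho is simply the Pearson correlation of tie-averaged ranks, which in turn equals the phi coefficient of the 2 × 2 table. The following Python sketch illustrates this with eight invented patients; the function names and data are hypothetical, and it does not reproduce the correlation package used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's Rho: Pearson correlation applied to tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: 1 = criterion present, 0 = absent, one entry per patient.
photosensitivity = [1, 1, 1, 1, 0, 0, 0, 0]
renal_disorder = [0, 0, 0, 1, 1, 1, 0, 1]

rho = spearman(photosensitivity, renal_disorder)

# Phi coefficient from the 2 x 2 table of the same data, for comparison.
pairs = list(zip(photosensitivity, renal_disorder))
a, b = pairs.count((1, 1)), pairs.count((1, 0))
c, d = pairs.count((0, 1)), pairs.count((0, 0))
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(rho, 3), round(phi, 3))
```

A negative value here means the two criteria tend not to occur together, which is the direction reported for photosensitivity and renal disorder in the cohort.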

Discussion

Systemic lupus erythematosus is a heterogenous remitting–relapsing chronic autoimmune disease. In this report, we describe a lupus cohort from a single tertiary referral center in Western Pennsylvania. Clustering analysis based on the ACR classification criteria for systemic lupus erythematosus identified three distinct clinical lupus clusters. Of our lupus patients, 37% are within a cluster of more severe disease characterized by renal and hematologic involvement, 25% are in a cluster characterized by malar rash and photosensitivity, and the remaining 38% are in a cluster characterized by the presence of oral ulcers. Patients in the latter two clusters have less severe lupus with a significantly lower frequency of organ complications such as renal involvement. Intriguingly, our data suggest that the presence of photosensitivity or oral ulcers in lupus patients is protective against the development of lupus nephritis.

Clinical clustering in heterogenous diseases helps to identify disease subsets and might have value in predicting patterns of disease involvement and expected disease severity and organ complications.20 In lupus, our data suggest three clinical disease subsets with distinct patterns of clinical manifestations and differences in the odds of developing organ involvement. These data might have implications in the management of lupus patients.

Gene expression signatures have been previously shown to correlate with clinical subsets in lupus patients.21 Whether differences in the molecular mechanisms underlying lupus influence or determine the clinical clustering we observed in our patients remains to be determined. If that were to be the case, then perhaps clinical clustering might be a useful tool to reduce disease heterogeneity in lupus clinical trials with the premise that this might improve the likelihood of achieving successful outcomes in lupus trials.22

Limitations of this study include that our results are derived from a single cohort of lupus patients and might not necessarily reflect the clinical subsets of lupus in other lupus cohorts from different geographic locations or different ancestral groups of patients. Expanding these observations and examining clinical clustering in lupus patients from across different ancestries and locations are certainly warranted. In addition, examining the differences in damage accrual among the lupus clusters identified will be of interest in future studies. Indeed, clustering analysis based on damage manifestations in a large lupus cohort revealed higher mortality in lupus patients within two clusters characterized by cardiovascular and musculoskeletal damage.23

In summary, we describe lupus patients from our lupus cohort at the University of Pittsburgh Medical Center and identify distinct clinical subsets of lupus characterized by a specific pattern of disease and organ involvement. These data might have implications for the clinical care of lupus patients. Further, clinical clustering might be a useful tool to reduce disease heterogeneity and improve outcomes in clinical trials in lupus and similar complex autoimmune diseases.

Footnotes

Ethics Committee Approval: Ethics committee approval was received for this study from the institutional review board at the University of Pittsburgh (Approval Date: July 19, 2019; Approval Number: 19060329).

Informed Consent: Informed consent was not obtained due to the nature of this study.

Peer-review: Externally peer-reviewed.

Author Contributions: Concept - A.H.S.; Design - P.C., A.H.S.; Supervision - A.H.S.; Resources - A.H.S.; Data Collection and/or Processing - L.R.; Analysis and/or Interpretation - P.C., A.H.S.; Literature Search - P.C., A.H.S.; Writing Manuscript - P.C., A.H.S.; Critical Review - P.C., L.R., A.H.S.

Acknowledgments: We are grateful to Swati Bhosale, MHA and the Clinical Quality Analytics Team in the Department of Medicine, University of Pittsburgh for their help with this study.

Conflict of Interest: The authors have no conflict of interest to declare.

Financial Disclosure: This work was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health grant number R01AI097134 and the Lupus Research Alliance.

References

  1. Tsokos GC. Autoimmunity and organ damage in systemic lupus erythematosus. Nat Immunol. 2020;21(6):605-14.
  2. Hughes T, Adler A, Merrill JT, et al. Analysis of autosomal genes reveals gene-sex interactions and higher total genetic risk in men with systemic lupus erythematosus. Ann Rheum Dis. 2012;71(5):694-99.
  3. Coit P, Ognenovski M, Gensterblum E, Maksimowicz-McKinnon K, Wren JD, Sawalha AH. Ethnicity-specific epigenetic variation in naive CD4+ T cells and the susceptibility to autoimmunity. Epigenet Chromatin. 2015;8:49.
  4. Harley JB, Chen X, Pujato M, et al. Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity. Nat Genet. 2018;50(5):699-707.
  5. Weeding E, Sawalha AH. Deoxyribonucleic acid methylation in systemic lupus erythematosus: Implications for future clinical practice. Front Immunol. 2018;9:875.
  6. Li H, Tsokos MG, Bickerton S, et al. Precision DNA demethylation ameliorates disease in lupus-prone mice. JCI Insight. 2018;3(16):e120880.
  7. Sanchez E, Nadig A, Richardson BC, et al. Phenotypic associations of genetic susceptibility loci in systemic lupus erythematosus. Ann Rheum Dis. 2011;70(10):1752-57.
  8. Coit P, Renauer P, Jeffries MA, Merrill JT, McCune WJ, Maksimowicz-McKinnon K, et al. Renal involvement in lupus is characterized by unique DNA methylation changes in naive CD4+ T cells. J Autoimmun. 2015;61:29-35.
  9. Mok A, Solomon O, Nayak RR, et al. Genome-wide profiling identifies associations between lupus nephritis and differential methylation of genes regulating tissue hypoxia and type 1 interferon responses. Lupus Sci Med. 2016;3(1):e000183.
  10. Renauer P, Coit P, Jeffries MA, et al. DNA methylation patterns in naive CD4+ T cells identify epigenetic susceptibility loci for malar rash and discoid rash in systemic lupus erythematosus. Lupus Sci Med. 2015;2(1):e000101.
  11. Merrill JT, Manzi S, Aranow C, et al. Lupus community panel proposals for optimising clinical trials: 2018. Lupus Sci Med. 2018;5(1):e000258.
  12. Hochberg MC. Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum. 1997;40(9):1725.
  13. Gower JC. A general coefficient of similarity and some of its properties. Biometrics. 1971;27(4):857-71.
  14. Charrad M, Ghazzali N, Boiteau V, Niknafs A. NbClust: An R package for determining the relevant number of clusters in a data set. J Stat Soft. 2014;61(6).
  15. Reynolds AP, Richards G, de la Iglesia B, Rayward-Smith VJ. Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. J Math Model Algor. 2006;5(4):475-504.
  16. Kaufman L, Rousseeuw PJ. Clustering by means of Medoids. Amsterdam: North-Holland; 1987.
  17. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53-65.
  18. Aragon TJ. epitools: Epidemiology Tools. 2020.
  19. Makowski D, Ben-Shachar M, Patil I, Lüdecke D. Methods and algorithms for correlation analysis in R. JOSS. 2020;5(51):2306.
  20. Stafford IS, Kellermann M, Mossotto E, Beattie RM, MacArthur BD, Ennis S. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases. NPJ Digit Med. 2020;3:30.
  21. Bradley SJ, Suarez-Fueyo A, Moss DR, Kyttaris VC, Tsokos GC. T cell transcriptomes describe patient subtypes in systemic lupus erythematosus. PLoS One. 2015;10(11):e0141171.
  22. Dall’Era M, Bruce IN, Gordon C, Manzi S, McCaffrey J, Lipsky PE. Current challenges in the development of new treatments for lupus. Ann Rheum Dis. 2019;78(6):729-35.
  23. Pego-Reigosa JM, Lois-Iglesias A, Rua-Figueroa I, et al. Relationship between damage clustering and mortality in systemic lupus erythematosus in early and late stages of the disease: Cluster analyses in a large cohort from the Spanish Society of Rheumatology Lupus Registry. Rheumatology (Oxford). 2016;55(7):1243-50.
