Abstract
Objectives
Four intrinsic molecular subsets (inflammatory, fibroproliferative, limited, normal-like) have previously been identified in SSc and are characterized by unique gene expression signatures and pathways. The intrinsic subsets have been linked to improvement with specific therapies. Here, we investigated associations between baseline demographics and intrinsic molecular subsets in a meta-analysis of published datasets.
Methods
Publicly available gene expression data from skin biopsies of 311 SSc patients measured by DNA microarray were classified into the intrinsic molecular subsets. RNA-sequencing data from 84 participants from the ASSET trial were used as a validation cohort. Baseline clinical demographics and intrinsic molecular subsets were tested for statistically significant associations.
Results
Males were more likely to be classified in the fibroproliferative subset (P = 0.0046). SSc patients who identified as African American/Black were 2.5 times more likely to be classified as fibroproliferative compared with White/Caucasian patients (P = 0.0378). ASSET participants seropositive for anti-RNA pol I and RNA pol III autoantibodies were enriched in the inflammatory subset (P = 5.8 × 10⁻⁵ and P = 9.3 × 10⁻⁵, respectively), while anti-Scl-70 showed a non-significant trend toward enrichment in the fibroproliferative subset. Mean modified Rodnan Skin Score (mRSS) was significantly higher in the inflammatory and fibroproliferative subsets than in the normal-like subset (P = 0.0027). The average disease duration for the inflammatory subset was shorter than for the fibroproliferative and normal-like subsets (P = 8.8 × 10⁻⁴).
Conclusions
We identified multiple statistically significant differences in baseline demographics between the intrinsic subsets that may represent underlying features of disease pathogenesis (e.g. chronological stages of fibrosis) and have implications for treatments that are more likely to work in certain SSc populations.
Keywords: SSc, machine learning, gene expression, meta-analysis, clinical associations
Rheumatology key messages.
The SSc intrinsic molecular subsets are enriched for specific clinical covariates and demographics.
Autoantibody serology, disease duration, mRSS, pulmonary disease and race show significant associations.
Intrinsic subset stratification may reduce molecular heterogeneity in SSc patient cohorts for clinical trials.
Introduction
SSc is a potentially deadly autoimmune disease of unknown aetiology and complex clinical phenotype. It is characterized by skin fibrosis, internal organ dysfunction, vascular damage, and immunological abnormalities. SSc clinical subtypes are defined as limited cutaneous (lcSSc) or diffuse cutaneous (dcSSc) according to the extent of skin involvement [1] and typically correlate with disease severity [2]. Four intrinsic molecular subsets (inflammatory, fibroproliferative, limited, normal-like) have been defined in SSc, characterized by unique biological processes and skin gene expression signatures [3–5]. The molecular subtypes have been demonstrated across multiple tissues [6, 7] and validated in multiple studies [3, 4, 8–10] demonstrating the systemic nature of the disease.
Intrinsic subset is consistent across different skin biopsy sites within a single patient, regardless of clinically affected or unaffected status [9]. The inflammatory subset is defined by increased expression of inflammatory, stress and defence responses [4], while cell cycle and mitosis are highly expressed in the fibroproliferative subset. The normal-like subset comprises biopsies from SSc patients whose skin gene expression closely resembles that of healthy controls, notably missing inflammatory and proliferative signatures [9, 11]. The limited subset consists of patients with lcSSc and is the least molecularly characterized. The intrinsic subsets are clinically meaningful and have been linked to improvement and long-term outcomes with different treatments [5, 8, 12, 13].
Studies that first assigned SSc intrinsic subsets used unsupervised, agglomerative methods to determine the number of subsets and each sample’s membership in a subset [3, 4, 8, 14, 15]. We developed a supervised machine learning classifier [16] to assign individual samples to intrinsic SSc molecular subsets using objective molecular genomic data and extend the use of this method here to classify publicly available gene expression data from SSc skin samples.
Although overall survival and treatment strategies are improving, SSc remains a challenge to treat. Refined patient stratification could increase treatment success rate [17], because statistical power in clinical trials is compromised by clinical and molecular heterogeneity. The use of genomic data and intrinsic subsets may help improve patient outcomes by identifying targeted therapies with higher success. For example, the inflammatory subset has been associated with response to immune-modulating therapies [8, 12, 18]. Identifying clinical variables associated with intrinsic subsets may allow clinical trials to refine inclusion criteria to decrease genetic heterogeneity in study cohorts, thereby increasing power to identify effective SSc treatments.
Most published genomic studies are underpowered to detect clinical associations due to limited sample size. To directly address this issue, we performed a genomic meta-analysis of intrinsic subsets in SSc to identify clinical covariates associated with SSc intrinsic subsets that may provide important insight into disease treatment or pathogenesis.
Methods
DNA microarray data preprocessing
Raw gene expression data for each study (Supplementary Table S1, available at Rheumatology online) were downloaded from National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) and processed independently. Missing values in the gene expression data were imputed with GenePattern [19] using k-nearest neighbours and default settings. Probes were collapsed to genes by the maximum expression value using the annotation file for each dataset/platform. Genes were median-centred across arrays within the dataset. Samples were classified using GLMnet as described [16]. Due to data distribution differences between Affymetrix data and the training data for GLMnet, feature specific quantile normalization (FSQN) was performed prior to classification [16, 20, 21].
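The probe-collapsing and median-centring steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that missing values have already been imputed, that "maximum expression value" means the per-array maximum across the probes annotated to a gene, and that the data are held in plain Python dictionaries. The function names and the toy probe and gene identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def collapse_probes_max(probe_values, probe_to_gene):
    """Collapse probe-level rows to gene-level rows using the per-array maximum."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without an annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the arrays (columns) of all probes mapped to one gene
    return {gene: [max(column) for column in zip(*rows)]
            for gene, rows in grouped.items()}


def median_centre(gene_values):
    """Subtract each gene's median across arrays from that gene's values."""
    centred = {}
    for gene, values in gene_values.items():
        centre = median(values)
        centred[gene] = [v - centre for v in values]
    return centred


# Hypothetical toy dataset: three probes measured on three arrays.
probe_values = {
    "probe_1": [1.0, 5.0, 3.0],
    "probe_2": [2.0, 4.0, 6.0],
    "probe_3": [7.0, 8.0, 9.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_values = collapse_probes_max(probe_values, probe_to_gene)
centred = median_centre(gene_values)
```

Centring within each dataset, rather than across pooled datasets, mirrors the statement that each study was processed independently.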
RNA-seq data preprocessing
RNA-sequencing was performed on skin biopsies from 84 participants in ASSET (Abatacept Systemic SclErosis Trial). Normalized reads per kilobase of transcript per million mapped reads (RPKM) were classified into intrinsic subsets using FSQN and a support vector machine [16]. Gene expression from forearm biopsies at baseline was used for classification, with the exception of one patient whose baseline sample failed quality control metrics and for whom the 3-month forearm sample was used instead.
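The idea behind feature-specific quantile normalization can be illustrated with a short sketch. It shows only the general principle, namely that each gene in the target dataset is mapped by rank onto the distribution of the same gene in the training data. The published FSQN implementation [16, 20, 21] may differ in detail, for example in how ties and interpolation are handled. The function names `quantile_map` and `fsqn_sketch` and the toy values are hypothetical.

```python
def quantile_map(target, reference):
    """Map the values of one gene onto the empirical quantiles of a reference."""
    ref_sorted = sorted(reference)
    n, m = len(target), len(ref_sorted)
    # Indices of the target values in ascending order (ties keep input order).
    order = sorted(range(n), key=lambda i: target[i])
    mapped = [0.0] * n
    for rank, idx in enumerate(order):
        # Position of this rank on the reference scale, from 0 to m - 1.
        pos = rank / (n - 1) * (m - 1) if n > 1 else (m - 1) / 2
        lower = int(pos)
        upper = min(lower + 1, m - 1)
        frac = pos - lower
        # Linear interpolation between neighbouring reference quantiles.
        mapped[idx] = ref_sorted[lower] * (1 - frac) + ref_sorted[upper] * frac
    return mapped


def fsqn_sketch(target_by_gene, reference_by_gene):
    """Normalize every gene independently; only genes present in both are kept."""
    return {gene: quantile_map(values, reference_by_gene[gene])
            for gene, values in target_by_gene.items()
            if gene in reference_by_gene}


# Hypothetical toy values: one gene on two platforms with different scales.
target = {"GENE_A": [10.0, 30.0, 20.0]}
reference = {"GENE_A": [1.0, 2.0, 3.0]}
normalized = fsqn_sketch(target, reference)
```

Because the mapping is done gene by gene, the rank order of samples within each gene is preserved while the scale is matched to the training data.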
Clinical data processing
Age at the time of the skin biopsy was coded in years. Disease duration was coded in months and defined as time between biopsy and first non-Raynaud’s symptom attributed to SSc. Sex was coded as male, female or unknown/not reported. Race was coded: White (identifying as Caucasian or White), Black (Black, African American or African), Asian (Asian or Southeast Asian), other (American Indian, Alaska Native or other), or unknown/not reported. Ethnicity was coded independently from race as Hispanic (identifying as Hispanic/Latino), non-Hispanic or unknown/not reported. Patients who identified as more than one race were included in counts for all designated categories. Pulmonary function test results were used in the meta-analysis only if they were reported as forced vital capacity (FVC) % predicted and diffusing capacity of the lungs for carbon monoxide (DLCO) % predicted. Autoantibodies were coded individually as positive/negative/missing for Scl-70, anti-RNA polymerase III, or CENPB in the meta-analysis. Anti-RNA polymerase I was also available for the ASSET cohort.
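The race coding rule, including the counting of multiracial patients in every category they reported, can be sketched as below. The category mapping follows the definitions given above; the function names, the handling of unrecognized labels and the four example patients are hypothetical and serve only as illustration.

```python
from collections import Counter

# Mapping from self-reported labels to the coded categories described above.
RACE_CATEGORIES = {
    "caucasian": "White", "white": "White",
    "black": "Black", "african american": "Black", "african": "Black",
    "asian": "Asian", "southeast asian": "Asian",
    "american indian": "Other", "alaska native": "Other", "other": "Other",
}


def code_race(self_reported):
    """Return the set of coded categories for one patient's reported races."""
    coded = set()
    for label in self_reported:
        key = label.strip().lower()
        if key in RACE_CATEGORIES:
            coded.add(RACE_CATEGORIES[key])
    # Assumption: an empty or unrecognized report is coded as unknown.
    return coded or {"Unknown/not reported"}


def count_races(patients):
    """Count categories; a multiracial patient contributes to each category."""
    counts = Counter()
    for reported in patients:
        counts.update(code_race(reported))
    return counts


# Hypothetical example: four patients, one reporting two races.
patients = [["White"], ["Caucasian", "American Indian"], ["African American"], []]
counts = count_races(patients)
```

Under this rule the category counts can sum to more than the number of patients, which is why race percentages in such a table need not total 100%.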
Statistical analyses
We did not impute or test missing values in the clinical data. Associations between baseline clinical demographics and intrinsic molecular subsets were tested pairwise using Fisher’s exact test or the chi-square test for categorical variables and the Wilcoxon rank sum test for continuous variables. For comparisons of continuous variables across intrinsic subsets, ANOVA was used, with Tukey’s correction for multiple hypotheses in pairwise comparisons. P-values <0.05 were considered significant. Analyses were performed using R v4.0.3 (R Foundation for Statistical Computing, Vienna, Austria).
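For readers unfamiliar with one-way ANOVA, the F statistic that underlies the comparisons of continuous variables can be computed as in the following sketch. It is illustrative only: the analyses in this study were run in R, the function name `one_way_anova_f` is hypothetical, and the three groups of values are invented toy numbers, not study data. Converting F to a P-value requires the F distribution, which is outside the Python standard library and is therefore omitted here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented toy values for three groups (not study data).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
```

A large F indicates that the group means differ by more than the within-group scatter would suggest; pairwise procedures such as Tukey's HSD then identify which groups differ.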
Results
We identified publicly available DNA microarray gene expression datasets generated from SSc skin to form a discovery cohort for clinical features associated with intrinsic subset assignment. Genomic studies were excluded if (i) there was no published individual-level patient clinical information, (ii) we were unable to obtain clinical information about the study participants from the investigators, or (iii) there were fewer than five individuals in the study. Following these criteria, we were able to include 13 genomic datasets in our meta-analysis (Supplementary Tables S1 and S2, available at Rheumatology online). We restricted our analyses to individuals with a diagnosis of SSc who were classified as either lcSSc or dcSSc. SSc patients with morphea, sine scleroderma and polymyositis overlap were excluded. For each individual, only the baseline (pre-treatment) forearm sample was retained for further analysis. A baseline back/flank sample was used if there was no forearm sample. These criteria resulted in a study population of 311 SSc patients for our meta-analysis (Table 1). The majority of the SSc patients included in this analysis were female (72.67%) and classified as dcSSc (74.60%). The mean age of subjects in this study was 50.3 years, and the mean (s.d.) SSc disease duration was 38.77 (63.45) months (approximately 3.2 years).
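The per-patient sample selection rule described above (baseline forearm sample preferred, baseline back/flank sample as a fallback) can be sketched as follows. The record layout, field names and sample identifiers are hypothetical; the study's actual data handling is not described at this level of detail.

```python
# Preferred biopsy sites, in order of priority.
SITE_PRIORITY = ("forearm", "back/flank")


def select_baseline_samples(samples):
    """Pick one baseline sample per patient, preferring forearm over back/flank."""
    chosen = {}
    for sample in samples:
        if not sample["baseline"] or sample["site"] not in SITE_PRIORITY:
            continue  # post-treatment samples and other sites are ignored
        current = chosen.get(sample["patient"])
        if (current is None or
                SITE_PRIORITY.index(sample["site"]) <
                SITE_PRIORITY.index(current["site"])):
            chosen[sample["patient"]] = sample
    return chosen


# Hypothetical records for three patients.
samples = [
    {"patient": "P1", "site": "back/flank", "baseline": True, "sample_id": "S1"},
    {"patient": "P1", "site": "forearm", "baseline": True, "sample_id": "S2"},
    {"patient": "P2", "site": "back/flank", "baseline": True, "sample_id": "S3"},
    {"patient": "P3", "site": "forearm", "baseline": False, "sample_id": "S4"},
]
selected = select_baseline_samples(samples)
```

Keeping exactly one sample per patient avoids treating repeated biopsies from the same individual as independent observations.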
Table 1.
Clinical demographics of the overall discovery and validation study populations
| Characteristic | SSc patients (n = 311) | ASSET cohort (n = 84) |
|---|---|---|
| Age, mean (s.d.), years | 50.33 (12.30) | 50.92 (12.70) |
| Sex, n (%) | ||
| Female | 226 (72.67) | 62 (73.81) |
| Male | 58 (18.65) | 22 (26.19) |
| Unknown/not reported | 27 (8.68) | 0 (0.0) |
| Race, n (%) | ||
| White | 138 (44.37)a | 71 (84.52)a |
| Black | 25 (8.04) | 7 (8.33)a |
| Asian | 5 (1.61) | 6 (7.14)a |
| Other | 2 (0.64)a | 1 (1.19) |
| Unknown/not reported | 142 (45.66) | 1 (1.19) |
| Ethnicity, n (%) | ||
| Hispanic | 17 (5.45) | 10 (11.90) |
| Non-Hispanic | 64 (20.58) | 73 (86.90) |
| Unknown/not reported | 230 (73.95) | 1 (1.19) |
| Clinical subtype, n (%) | ||
| Limited cutaneous | 79 (25.40) | 0 (0.0) |
| Diffuse cutaneous | 232 (74.60) | 84 (100.0) |
| SSc disease duration, mean (s.d.), months | 38.77 (63.45) | 18.46 (10.39) |
| mRSS, mean (s.d.) | 20.02 (10.59) | 22.18 (7.50) |
| FVC % predicted, mean (s.d.) | 79.82 (19.44) | 84.89 (14.93) |
| DLCO % predicted, mean (s.d.) | 64.24 (20.11) | 77.60 (18.42) |
| Autoantibodies, n (%) | ||
| Anti-Scl-70 | 55 (17.68) | 16 (19.05) |
| Anti-RNA polymerase III | 53 (17.04) | 36 (42.86) |
| Anticentromere | 17 (5.45) | 3 (3.57) |
a One patient in the meta-analysis identified as White and American-Indian. In the validation cohort, one patient identified as both White and Black and one patient identified as both White and Asian.
Clinical demographics are associated with intrinsic subsets
In our meta-analysis study population, 311 patients were individually assigned to an intrinsic subset based on gene expression using a pre-trained classifier [16]. We tested corresponding baseline demographic data to identify clinical associations with the intrinsic subsets in a meta-analysis (Table 2, Fig. 1) and validation cohort (Supplementary Table S3, available at Rheumatology online). Of the 311 patients with SSc, 117 (37.6%) were assigned to the inflammatory subset, 105 (33.8%) were assigned to the fibroproliferative subset, 84 (27.0%) were assigned to the normal-like subset, and 5 (1.6%) were assigned to the limited subset. Of the 84 participants from the ASSET study used as a validation cohort, 33 (39.3%) were assigned to the inflammatory subset, 18 (21.4%) were assigned to fibroproliferative subset and 33 (39.3%) were assigned to the normal-like subset. No patients with lcSSc were included in the ASSET study.
Table 2.
Demographics across the discovery cohort SSc intrinsic subsets (n = 311)
| Characteristic | Inflammatory (n = 117) | Fibroproliferative (n = 105) | Normal-like (n = 84) | Limited (n = 5) |
|---|---|---|---|---|
| Age, mean (s.d.), years | 52.92 (11.22) | 47.41 (11.06) | 49.50 (12.46) | 60.6 (4.62) |
| Sex, n | ||||
| Female | 88 | 66 | 67 | 5 |
| Male | 16 | 29 | 13 | 0 |
| Unknown/not reported | 13 | 10 | 4 | 0 |
| Race, n | ||||
| White | 58 | 41 | 35a | 4 |
| Black | 7 | 13 | 4 | 1 |
| Asian | 3 | 2 | 0 | 0 |
| Other | 0 | 0 | 2a | 0 |
| Unknown/not reported | 49 | 49 | 44 | 0 |
| Ethnicity, n | ||||
| Hispanic or Latino | 4 | 6 | 7 | 0 |
| Non-Hispanic or non-Latino | 27 | 18 | 19 | 0 |
| Unknown | 86 | 81 | 58 | 5 |
| Clinical subtype, n | ||||
| Diffuse cutaneous | 103 | 79 | 50 | 0 |
| Limited cutaneous | 14 | 26 | 34 | 5 |
a One patient in the normal-like subset in the meta-analysis identified as White and American-Indian.
Fig. 1.
Clinical demographics of the study population
Clinical demographics of gender and age distribution stratified by intrinsic subsets in the SSc meta-analysis (A, B) and validation cohorts (C, D), respectively.
There were more females than males in our study population (Fig. 1A), and distribution of intrinsic subsets differed between the sexes (P = 0.030, Fisher’s exact test). Males were 2.41 times more likely to be fibroproliferative than females (P = 0.0046, Fisher’s exact test). Females and males were equally likely to be classified as inflammatory, normal-like or limited. There was also a significant association with gender in the ASSET cohort (P = 0.015, Fisher’s exact test). Males were 3.99 times more likely to be classified as fibroproliferative (P = 0.015, Fisher’s exact test) while females and males were equally likely to be assigned to the inflammatory and normal-like subsets (Fig. 1C).
The average age of individuals differed significantly between the intrinsic subsets (P = 0.0041, ANOVA). The average age of patients was 52.92 years in the inflammatory subset, 47.41 years in the fibroproliferative subset, 49.50 years in the normal-like subset and 60.6 years in the limited subset (Fig. 1B). Overall, fibroproliferative patients were significantly younger than inflammatory patients (P = 0.011, Tukey’s HSD test, n = 259), but no other pairwise comparisons of age were statistically significant. In the ASSET validation cohort, the difference in average age between inflammatory patients (53.21 years) and fibroproliferative patients (46.56 years) was not statistically significant (P = 0.203, ANOVA; Fig. 1D). However, the absolute difference in means was larger than in the discovery cohort, suggesting that the smaller sample size of the ASSET cohort is likely responsible for the lack of statistical significance.
We investigated the distribution of intrinsic subsets within and between self-reported races (n = 168). Most patients in our study identified as White/Caucasian (n = 138, 82.1%); 25 patients identified as African American/Black (14.9%), five patients identified as Asian and one patient identified as both White and American-Indian. Seventeen patients identified as Hispanic or Latino, and 64 patients identified as non-Hispanic or non-Latino. Some studies coded race and ethnicity together; we did not infer race or ethnicity in studies where it was not explicitly reported. Thus, race and ethnicity had high rates of missing information. There was no significant relationship between ethnicity and intrinsic subset for either the meta-analysis or the validation cohort (P = 0.38, P = 0.91, Fisher’s exact test).
There was no significant association when considering all races and all intrinsic subsets in the meta-analysis (P = 0.08, chi-square test), although this analysis was significant in the ASSET cohort (P = 0.0033, Fisher’s exact test). The lack of a significant difference in the meta-analysis could be due to vastly different sample sizes between races, because notable trends were conserved in both study populations, especially for African American/Black patients and the fibroproliferative subset. Of the patients who identified as White/Caucasian, 29.71% (41/138) were classified as fibroproliferative, whereas 52% (13/25) of patients identifying as African American/Black were classified as fibroproliferative. Compared with White/Caucasian SSc patients, African American/Black patients were 2.5 times more likely to be classified as fibroproliferative (P = 0.0378, Fisher’s exact test).
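The 2 × 2 comparisons reported for sex and race can be reproduced approximately from the counts in Table 2. The sketch below is a plain-Python two-sided Fisher's exact test, written for illustration only; the study used R, and the function name `fisher_exact_2x2` is hypothetical. It returns the sample odds ratio (ad/bc). R's `fisher.test` reports a conditional maximum-likelihood estimate instead, so the odds ratios here may differ slightly from the published values, and the P-values are not guaranteed to match exactly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided P-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Counts from Table 2: fibroproliferative versus all other subsets.
# Sex: 29 of 58 males and 66 of 226 females were fibroproliferative.
or_sex, p_sex = fisher_exact_2x2(29, 29, 66, 160)
# Race: 13 of 25 Black and 41 of 138 White patients were fibroproliferative.
or_race, p_race = fisher_exact_2x2(13, 12, 41, 97)
```

The "times more likely" figures quoted in the text are therefore best read as odds ratios rather than ratios of proportions; for race, the ratio of proportions (52% versus 29.71%) is about 1.75.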
In the ASSET cohort, patients with SSc who identified as African American or Black were also much more likely to be classified as fibroproliferative compared with other SSc patients (P = 0.0062, Fisher’s exact test). Patients with SSc who identified as White or Caucasian are more likely to be classified as inflammatory or normal-like (P = 0.0037, Fisher’s exact test).
Association with autoantibodies
Serum autoantibody information was missing for a substantial number of lcSSc patients in our cohort, and there are known associations of autoantibodies with clinical subtype, so we restricted autoantibody analyses only to dcSSc patients. Of inflammatory patients, 47.5% (29/61) tested positive for anti-RNA polymerase III autoantibodies. Of normal-like patients 40.9% (9/22) tested positive for anti-RNA polymerase III. Of fibroproliferative patients 48.6% (18/37) tested positive for Scl-70 autoantibodies. However, the autoantibody analyses did not reach statistical significance in the meta-analysis cohort (P = 0.24, Fisher’s exact test).
Autoantibodies in the ASSET clinical trial were measured at a single centre in a consistent manner, providing a complete, uniformly measured dataset. In the validation cohort, there was a statistically significant difference in anti-RNA polymerase III between the intrinsic subsets (P = 9.25 × 10⁻⁵, Fisher’s exact test). Of inflammatory patients, 72.7% (24/33) tested positive for anti-RNA polymerase III, compared with only 22.6% of normal-like patients (7/31) and 29.4% of fibroproliferative patients (5/17). The ASSET clinical trial also tested for anti-RNA polymerase I autoantibodies, and there was a statistically significant difference between the intrinsic subsets (P = 5.81 × 10⁻⁵, Fisher’s exact test). The inflammatory subset was more likely to be positive for anti-RNA polymerase I (66.7%, 22/33) compared with the normal-like (16.1%, 5/31) and fibroproliferative (23.5%, 4/17) subsets. There were no significant differences in anti-Scl-70 (P = 0.111, Fisher’s exact test) or anticentromere (P = 0.6043, Fisher’s exact test) between the intrinsic subsets. Although it did not reach statistical significance, the fibroproliferative subset trended toward enrichment for anti-Scl-70 autoantibodies in both the meta-analysis and ASSET datasets (Table 3 and Supplementary Table S4, available at Rheumatology online).
Table 3.
Phenotypic Measures across the discovery cohort of dcSSc intrinsic subsets (n = 232)
| Characteristic | Inflammatory (n = 103) | Fibroproliferative (n = 79) | Normal-like (n = 50) |
|---|---|---|---|
| Disease duration, mean (s.d.), months | 14.85 (15.29) | 35.59 (51.16) | 41.39 (45.50) |
| mRSS, mean (s.d.) | 24.67 (9.47) | 23.13 (8.12) | 19.29 (8.78) |
| FVC % predicted, mean (s.d.) | 81.04 (17.89) | 73.00 (20.50) | 71.79 (14.15) |
| DLCO % predicted, mean (s.d.) | 66.09 (19.29) | 59.58 (20.03) | 63.18 (22.15) |
| Autoantibodies, n | |||
| Anti-Scl-70 | 16 | 18 | 6 |
| Anti-RNA polymerase III | 29 | 12 | 9 |
| Anticentromere | 4 | 1 | 1 |
| Unknown/not reported | 54 | 48 | 34 |
dcSSc: diffuse cutaneous SSc; DLCO: diffusing capacity of the lungs for carbon monoxide; FVC: forced vital capacity; mRSS: modified Rodnan Skin Score.
Measures of SSc severity and correlation to intrinsic subsets
We tested measures of SSc severity between intrinsic subsets in patients with dcSSc in the meta-analysis (Table 3, Fig. 2) and the validation cohort (Supplementary Table S4, available at Rheumatology online). mRSS is a standard outcome measure for skin involvement, calculated by assessing skin thickness (scored 0–3) across 17 body sites. mRSS differed significantly between intrinsic subsets (P = 0.0027, ANOVA) (Fig. 2A). Normal-like patients exhibited the lowest average mRSS (19.29), significantly lower than that of inflammatory (P = 0.0017, Tukey’s HSD test) and fibroproliferative patients (P = 0.047, Tukey’s HSD test). There was no difference between the average mRSS for inflammatory (average = 24.67) and fibroproliferative (average = 23.13) patients (P = 0.48, Tukey’s HSD test). The inflammatory and fibroproliferative subsets in the ASSET cohort also showed significantly higher average mRSS than the normal-like subset (P = 1.53 × 10⁻⁵ and P = 0.0060, respectively; ANOVA with Tukey’s HSD test), confirming the meta-analysis results (Fig. 3A).
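As a worked illustration of the score described above, the following sketch computes an mRSS under the standard convention that the total is the sum of the 17 site scores, giving a range of 0 to 51. The function name `mrss_total` and the example scores are hypothetical.

```python
def mrss_total(site_scores):
    """Sum 17 skin thickness scores (each 0, 1, 2 or 3) into an mRSS total."""
    if len(site_scores) != 17:
        raise ValueError("mRSS requires scores for exactly 17 body sites")
    if any(score not in (0, 1, 2, 3) for score in site_scores):
        raise ValueError("each site score must be an integer from 0 to 3")
    return sum(site_scores)


# Hypothetical patient: seven sites scored 2, five scored 1, five scored 0.
example_scores = [2] * 7 + [1] * 5 + [0] * 5
example_total = mrss_total(example_scores)
```

On this scale, the subset means reported in Table 3 (approximately 19 to 25) sit in the middle of the possible range.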
Fig. 2.
Phenotypic severity by intrinsic subset in the meta-analysis population
Measures of phenotypic severity in dcSSc patients of the discovery cohort stratified by intrinsic subset (A) mRSS, (B) FVC, (C) DLCO, (D) disease duration. dcSSc: diffuse cutaneous SSc; DLCO: diffusing capacity of the lungs for carbon monoxide; FVC: forced vital capacity; mRSS: modified Rodnan Skin Score.
Fig. 3.
Phenotypic severity by intrinsic subset in the validation cohort
Measures of phenotypic severity in dcSSc patients of the validation cohort (ASSET) stratified by intrinsic subset (A) mRSS, (B) FVC, (C) DLCO, (D) disease duration. dcSSc: diffuse cutaneous SSc; DLCO: diffusing capacity of the lungs for carbon monoxide; FVC: forced vital capacity; mRSS: modified Rodnan Skin Score.
FVC and DLCO are two standard measures of lung involvement in SSc (Fig. 2B, C). Lower values indicate more severe disease. There were no statistically significant differences between intrinsic subsets for DLCO % predicted (P = 0.49, ANOVA) or FVC % predicted (P = 0.067, ANOVA) in the meta-analysis. Although not statistically significant, we observed a consistent trend of preserved lung function (mean [s.d.] DLCO and FVC, respectively) in inflammatory patients (66.09 [19.29], 81.04 [17.89]) and slightly reduced lung function in normal-like (63.18 [22.15], 71.79 [14.15]) and fibroproliferative patients (59.58 [20.03], 73.00 [20.50]). A similar trend was observed for FVC in the ASSET cohort but did not reach statistical significance (P = 0.229, ANOVA) (Fig. 3B). There were no significant differences for DLCO (% corrected) in the ASSET cohort (P = 0.135, ANOVA) (Fig. 3C).
Evidence of a temporal relationship between intrinsic subsets
We investigated temporal spacing between the intrinsic subsets by quantifying disease duration in months from first non-Raynaud’s symptom attributed to SSc. Patients with lcSSc exhibited longer average disease duration than patients with dcSSc (P = 1.55 × 10⁻⁴, Wilcoxon test). To reduce confounding by clinical subtype, we restricted the analysis of disease duration to patients with dcSSc. We identified a statistically significant difference in average SSc disease duration between the inflammatory, fibroproliferative and normal-like intrinsic subsets (P = 8.8 × 10⁻⁴, ANOVA) (Fig. 2D). Patients in the inflammatory subset had a shorter average disease duration (14.85 months) than both the fibroproliferative subset (P = 0.0073, Tukey’s HSD test) and the normal-like subset (P = 0.0042, Tukey’s HSD test). Disease duration for the fibroproliferative subset (35.59 months) was shorter than, but not statistically different from, that of the normal-like subset (41.39 months; P = 0.78, Tukey’s HSD test). This difference in temporal distributions may reflect chronological stages of fibrosis, as has been suggested in myocardial tissue remodelling and pulmonary fibrosis [22, 23]. There was no significant difference in disease duration in the ASSET cohort (P = 0.416, ANOVA), as expected given the clinical trial recruitment criteria (disease duration <36 months) (Fig. 3D). Because of the significant differences in disease duration in the meta-analysis cohort, we further investigated the relationship between disease duration, lung function measures and intrinsic subset but did not identify any significant findings, likely due to small sample size (Supplementary Materials; Figs S1, S2, Tables S5 and S6, available at Rheumatology online).
Discussion
In this study, we performed the first large-scale genomic meta-analysis identifying clinical covariates associated with intrinsic subsets defined by gene expression in SSc. We utilized machine learning to aggregate clinical data and summarize genomic information from multiple studies performed over time, on different platforms and in multiple independent laboratories. In total, we analysed data from 311 individuals with SSc across 13 studies, combined with the 84 individuals in the validation cohort, for a total of 395 patients with SSc. We solidify and confirm associations of intrinsic subset with disease duration, mRSS and interstitial lung disease (ILD) that were suggested in individual datasets. Analysis of this larger set of individuals also allowed us to identify novel associations with race and autoantibodies that had not been previously reported.
Our results indicate that fibroproliferative patients may be younger and more likely to be male. Individuals of Black and African American ancestry were more likely to fall into this group and have anti-Scl-70 antibodies. Inflammatory patients were more likely to be older, female, Caucasian, and seropositive for anti-RNA polymerase I and III. These results show an enrichment for certain demographics within the intrinsic subsets, but stratification by autoantibodies, gender or race alone is not sufficient to predict an individual’s molecular subset. Some findings were not significant in both cohorts, despite consistent trends, likely due to differences in clinical subtype distribution and sample size between the discovery and validation populations.
Notably, we identified differences in SSc disease severity between the intrinsic subsets. Inflammatory and fibroproliferative patients are more likely to have higher mRSS compared with normal-like patients. Patients in the fibroproliferative subset may have decreased lung function, a phenotype that has previously been noted [24]. This finding is particularly important given increased prevalence of African American/Black patients with SSc belonging to the fibroproliferative subset. This is the first study to find a significant association of race and intrinsic subset. Decreased lung function (regardless of SSc disease duration) [23, 24] and increased TGFβ gene expression signatures [24] have been reported in African American/Black SSc patients. This result further supports a plausible link between genomic signatures and phenotypic outcomes. These findings may have clinical implications for identifying effective treatments for this population, such as stem cell transplantation [25].
It has been suggested that normal-like patients may represent later stage disease [26], and our study supports a temporal relationship between the intrinsic subsets. These data support the inflammatory subset as earlier disease, and fibroproliferative as having an intermediate disease duration. However, based on longitudinal data, we believe the inflammatory and fibroproliferative subsets do not readily interconvert [4]. The normal-like subset may represent a later disease stage in which the early inflammatory and fibroproliferative stages have previously burned out. Prior studies have been unsuccessful in capturing patients’ changing subset over time [4], except in the context of treatment [12], and then typically only toward normal-like. Notably, in studies such as ASSET that only enrolled early dcSSc patients, inflammatory, fibroproliferative and normal-like subsets are all represented within the baseline biopsies.
We did not control for prior treatment in this study and that is a limitation to this analysis; however, in the ASSET trial, no patients were on background immunomodulatory therapy at the baseline visit. Other limitations of this study include using datasets that were originally used to derive the intrinsic subsets. In addition, we acknowledge the potential for double-counting of individuals who participated in multiple studies due to de-identified data reporting, although we expect this to be a small number of individuals. A major strength of this study is that only baseline samples are considered, and many of the samples were from ‘pre-treatment’ individuals in clinical trials who experienced wash-out time prior to sample collection. Thus, we believe the results of this analysis may be indicative of natural disease history and supportive of an immune-fibrotic axis in SSc [7, 9].
By leveraging data from multiple studies, we increased statistical power and identified multiple novel associations between clinical variables and intrinsic subsets in SSc. These associations may explain aspects of SSc pathogenesis and raise interesting biological questions, such as how fibroproliferative processes impact lung function and manifest in certain populations. These data provide additional clinical context for the intrinsic subsets, insights into molecular heterogeneity in early SSc, and a possible rationale for the marked variability in outcome measures in recent trials, and may influence trial design. Future recruitment criteria for clinical trials may consider associated clinical variables to increase the probability of recruiting patients in the intrinsic subsets of interest. In trials where the focus is early skin disease, the design may consider stratifying or recruiting patients with inflammatory gene expression. Conversely, for those with ILD, the fibroproliferative signature could be used for stratification. Future trials should validate these findings.
Funding: This work was funded by the National Institutes of Health (National Institute of Allergy and Infectious Diseases Clinical and Autoimmunity Center of Excellence grant 5-UM1-AI-110557 to the University of Michigan for ASSET data and National Institute of Arthritis and Musculoskeletal and Skin Diseases grants K24-AR-063120 to D.K., 5P50AR060780-10 to M.L.W., and R01-AR-073270 to M.H.), Scleroderma Research Foundation (M.L.W.), Burroughs-Wellcome PUP Big Data in the Life Sciences Training Program (M.L.W., J.M.F.), the National Institutes of Health BD2K T32 5T32LM012204 (J.M.F.), National Institutes of Health T32 (D.T.) and the Dr Ralph and Marian Falk Medical Research Trust (M.L.W.).
Disclosure statement: V.M. is an employee at Celdara Medical LLC, which is developing gene expression biomarkers in systemic sclerosis. L.C. has received grants/contracts from Boehringer Ingelheim; consulting fees from Boehringer Ingelheim, Mitsubishi Tanabe, Genentech, Kyverna, Eicos; and payment/honoraria from Boehringer Ingelheim. C.D. has received personal fees from Acceleron, Actelion, Corbus, Boehringer Ingelheim, Horizon, Roche and Sanofi, and grants and personal fees from CSL Behring, GSK and Inventiva. J.G. has received grants/contracts from Cumberland Pharmaceuticals, Eicos Sciences and Genentech via UCLA. M.H. has received payment/honoraria/support for travel from Abbvie and Boehringer Ingelheim. D.K. has received grants/contracts from BMS, Pfizer, Horizon and Bayer; consulting fees from Bayer, BMS, Horizon, CSL Behring, Corbus, GSK, Theraly, Boehringer Ingelheim, Genentech/Roche, Chemomab, Prometheus and Astra Zeneca; and is a stockholder of Eicos Sciences, Inc. M.W. has received consulting fees from Celdara Medical LLC, Bristol-Myers Squibb, Corbus Pharmaceuticals, UCB Biopharma, Third Rock Ventures and Acceleron; and payment/honoraria from Abbvie. All other authors have no disclosures.
Contributor Information
Jennifer M Franks, Department of Biomedical Data Science; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Diana M Toledo, Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Viktor Martyanov, Department of Biomedical Data Science; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Yue Wang, Department of Biomedical Data Science; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Suiyuan Huang, Department of Biostatistics, University of Michigan, Ann Arbor, MI.
Tammara A Wood, Department of Biomedical Data Science; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Cathie Spino, Department of Biostatistics, University of Michigan, Ann Arbor, MI.
Lorinda Chung, Palo Alto Health Care System, Palo Alto, Stanford, CA, USA.
Christopher P Denton, Division of Medicine, University College London, London, UK.
Emma Derrett-Smith, Division of Medicine, University College London, London, UK.
Jessica K Gordon, Hospital for Special Surgery, New York, NY.
Robert Spiera, Hospital for Special Surgery, New York, NY.
Robyn Domsic, University of Pittsburgh, Pittsburgh, PA.
Monique Hinchcliff, Yale University, New Haven, CT.
Dinesh Khanna, Department of Biostatistics, University of Michigan, Ann Arbor, MI; Division of Rheumatology, Department of Medicine, University of Michigan, Ann Arbor, MI, USA.
Michael L Whitfield, Department of Biomedical Data Science; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Data availability statement
All genomic data used in this study are publicly available via the Gene Expression Omnibus (GEO) from the National Center for Biotechnology Information (NCBI). GEO accession numbers for each dataset are reported in Supplementary Table S1, available at Rheumatology online.
Supplementary data
Supplementary data are available at Rheumatology online.
References
- 1. LeRoy EC, Black C, Fleischmajer R. et al. Scleroderma (systemic sclerosis): classification, subsets and pathogenesis. J Rheumatol 1988;15:202–5. [PubMed] [Google Scholar]
- 2. Varga J, Denton CP, Wigley FM, Alanore Y, Kuwana M (eds). Scleroderma: from pathogenesis to comprehensive management. 2nd edn.Cham: Springer, 2017. [Google Scholar]
- 3. Milano A, Pendergrass SA, Sargent JL. et al. Molecular subsets in the gene expression signatures of scleroderma skin. PLoS One 2008;3:e2696. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Pendergrass SA, Lemaire R, Francis IP. et al. Intrinsic gene expression subsets of diffuse cutaneous systemic sclerosis are stable in serial skin biopsies. J Invest Dermatol 2012;132:1363–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Hinchcliff M, Toledo DM, Taroni JN. et al. Mycophenolate mofetil treatment of systemic sclerosis reduces myeloid cell numbers and attenuates the inflammatory gene signature in skin. J Invest Dermatol 2018;138:1301–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Taroni JN, Martyanov V, Huang CC. et al. Molecular characterization of systemic sclerosis esophageal pathology identifies inflammatory and proliferative signatures. Arthritis Res Ther 2015;17:194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Taroni JN, Greene CS, Martyanov V. et al. A novel multi-network approach reveals tissue-specific cellular modulators of fibrosis in systemic sclerosis. Genome Med 2017;9:27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Hinchcliff M, Huang CC, Wood TA. et al. Molecular signatures in skin associated with clinical improvement during mycophenolate treatment in systemic sclerosis. J Invest Dermatol 2013;133:1979–89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Mahoney JM, Taroni J, Martyanov V, Wood TA. et al. Systems level analysis of systemic sclerosis shows a network of immune and profibrotic pathways connected with genetic polymorphisms. PLoS Comput Biol 2015;11:e1004005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Whitfield ML, Finlay DR, Murray JI. et al. Systemic and cell type-specific gene expression patterns in scleroderma skin. Proc Natl Acad Sci USA 2003;100:12319–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Johnson ME, Mahoney JM, Taroni J. et al. Experimentally-derived fibroblast gene signatures identify molecular pathways associated with distinct subsets of systemic sclerosis patients in three independent cohorts. PLoS One 2015;10:e0114017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Gordon JK, Martyanov V, Franks JM. et al. Belimumab for the treatment of early diffuse systemic sclerosis: results of a randomized, double-blind, placebo-controlled, pilot trial. Arthritis Rheumatol 2018;70:308–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Gordon JK, Martyanov V, Magro C. et al. Nilotinib (Tasigna) in the treatment of early diffuse systemic sclerosis: an open-label, pilot clinical trial. Arthritis Res Ther 2015;17:213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Sorlie T, Perou CM, Tibshirani R. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 2001;98:10869–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Perou CM, Sorlie T, Eisen MB. et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52. [DOI] [PubMed] [Google Scholar]
- 16. Franks JM, Martyanov V, Cai G. et al. A machine learning classifier for assigning individual patients with systemic sclerosis to intrinsic molecular subsets. Arthritis Rheumatol 2019;71:1701–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Denton CP, Khanna D.. Systemic sclerosis. Lancet 2017;390:1685–99. [DOI] [PubMed] [Google Scholar]
- 18. Chakravarty EF, Martyanov V, Fiorentino D. et al. Gene expression changes reflect clinical response in a placebo-controlled randomized trial of abatacept in patients with diffuse cutaneous systemic sclerosis. Arthritis Res Ther 2015;17:159. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Reich M, Liefeld T, Gould J. et al. GenePattern 2.0. Nat Genet 2006;38:500–1. [DOI] [PubMed] [Google Scholar]
- 20. Franks JM, Cai G, Whitfield ML.. Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data. Bioinformatics 2018;34:1868–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Khanna D, Spino C, Johnson S. et al. Abatacept in early diffuse cutaneous systemic sclerosis: results of a phase II investigator-initiated, multicenter, double-blind, randomized, placebo-controlled trial. Arthritis Rheumatol 2020;72:125–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Suthahar N, Meijers WC, Silljé HHW, de Boer RA.. From inflammation to fibrosis-molecular and cellular mechanisms of myocardial tissue remodelling and perspectives on differential treatment opportunities. Curr Heart Fail Rep 2017;14:235–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Gifford AH, Matsuoka M, Ghoda LY, Homer RJ, Enelow RI.. Chronic inflammation and lung fibrosis: pleotropic syndromes but limited distinct phenotypes. Mucosal Immunol 2012;5:480–4. [DOI] [PubMed] [Google Scholar]
- 24. Sargent JL, Milano A, Bhattacharyya S. et al. A TGFβ-responsive gene signature is associated with a subset of diffuse scleroderma with increased disease severity. J Invest Dermatol 2010;130:694–705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Franks JM, Martyanov V, Wang Y. et al. Machine learning predicts stem cell transplant response in severe scleroderma. Ann Rheum Dis 2020;79:1608–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Assassi S, Swindell WR, Wu M. et al. Dissecting the heterogeneity of skin gene expression patterns in systemic sclerosis. Arthritis Rheumatol 2015;67:3016–26. [DOI] [PMC free article] [PubMed] [Google Scholar]