Author manuscript; available in PMC: 2026 Apr 15.
Published in final edited form as: Rheumatology (Oxford). 2026 Feb 4;65(2):keag029. doi: 10.1093/rheumatology/keag029

Heterogeneity in the association of genetic risk for rheumatoid arthritis and resultant rheumatoid arthritis phenotypes

Thomas R Riley IV 1,2, Austin M Wheeler 3,4, Bryant R England 3,4, Grant W Cannon 5, Brian Sauer 5, Gary A Kunkel 5, Katherine D Wysham 6, Beth I Wallace 7, Paul A Monach 8, Andreas Reimold 9, Gail S Kerr 10, Isaac D Smith 11, John Steuart Richards 12, Iris Lee 13, Geoffrey M Thiele 3,4, Rui Xiao 1,14, Scott M Damrauer 1,2, Michael G Levin 1,2, Michael D George 1, Ted R Mikuls 3,4, Joshua F Baker 1,2
PMCID: PMC13078286  NIHMSID: NIHMS2145944  PMID: 41548231

Abstract

Objectives

The impact of the genetic risk for rheumatoid arthritis (RA) on resultant disease phenotype in RA is incompletely understood. Using individual genetic variants associated with RA and a polygenic score (PGS), we hypothesized that those with higher genetic risk for RA would demonstrate a more severe disease course.

Methods

We genotyped participants from the Veterans Affairs RA registry (VARA), a prospective cohort of RA. We evaluated associations between PTPN22 R620W genotype, HLA-DRB1 shared epitope (0, 1, or 2 copies of a high-risk HLA allele), and a non-HLA PGS with RA disease activity scores and disease characteristics using linear and logistic regression, adjusted for sex, age, and principal components of population structure.

Results

Among 2557 VARA participants, 50 (2%) were homozygous for PTPN22 R620W, and 1603 (62%) had a PGS greater than the 50th percentile when compared to a reference population. PTPN22 R620W, shared epitope (SE), and the PGS were strongly associated with seropositivity (PGS OR = 1.41, 95% CI 1.26-1.58, p < 0.001). At enrollment, compared to those with no R620W alleles, those with 2 alleles had higher disease activity [CDAI β = 6.10, 95% CI 1.23-10.97, p = 0.014]; no difference was observed for those with one R620W allele or SE. PGS was not significantly associated with disease activity.

Conclusion

Genetic variation renders heterogeneous associations with disease activity and phenotype in RA. Individual genetic variants and pathway-specific genetic risk associated with RA may be more informative than the pooled genetic risk in understanding disease phenotype and disease activity.

Keywords: Rheumatoid arthritis, polygenic score, PTPN22, disease activity score, phenotype, genetics

Background

Over 120 genetic variants are reported to be associated with susceptibility to rheumatoid arthritis (RA). Prior studies suggest that individual genetic markers and polygenic scores (PGS) for RA are associated with seropositivity and the risk of erosive disease (1-4). However, the association between genetic risk and disease activity in RA remains poorly understood. Improving our understanding of how genetic factors influence disease activity and RA phenotype could help to inform treatment and prognosis and allow therapies in RA to be targeted more precisely.

Genetic variation at the HLA-DRB1 shared epitope (SE) and at PTPN22 R620W are established risk factors for RA. The HLA-DRB1 SE is defined by HLA-DRB1 alleles that represent the strongest genetic risk factor for RA, with studies showing a 3-fold higher risk of RA in those with SE (5); SE is also associated with the development of erosive disease and a favorable treatment response to TNF inhibitors in patients with RA (6). The PTPN22 R620W variant is a single nucleotide polymorphism (SNP) that increases downstream signaling from the T cell receptor. This variant is of particular interest because multiple genome-wide association studies (GWAS) have shown it to be a strong risk factor for the development of different autoimmune conditions, including RA, and it may be subject to gene-gene interactions with HLA-DRB1 (3,7,8). Unlike SE, PTPN22 R620W was not associated with erosive disease in prior studies; however, it is unknown whether this variant is associated with higher disease activity or more refractory disease in patients with RA (9,10). Aside from single genetic variants, PGS may provide a more comprehensive assessment of risk by incorporating multiple genetic variants associated with the risk for RA from prior GWAS (3,11,12).

To establish how the genetic risk for RA affects disease activity scores, we evaluated the association between a multi-ancestry PGS that excludes the HLA region, PTPN22 R620W, and SE with disease activity in US veterans diagnosed with RA (3). We hypothesized that these approaches to characterize the genetic risk for RA would be associated with more severe disease features including higher rates of seropositivity, higher rates of radiographic damage, higher disease activity at enrollment, and long-term disease activity at follow-up, independent of disease activity at enrollment.

Methods

Study Population and Clinical Data

The study population is the Veterans Affairs RA registry (VARA), a national longitudinal prospective cohort of veterans with RA from VA rheumatology clinics at 19 VA sites (13). Participants were ≥ 18 years old and satisfied the 1987 American College of Rheumatology RA classification criteria (14); enrollment began in 2003. Participants provided written informed consent prior to enrollment, and each site received IRB approval. Clinical data including the components of disease activity assessments were recorded by clinicians at the time of enrollment and during routine clinical visits (15). Biospecimens were collected and banked at enrollment including serum, plasma, and DNA. Anti-citrullinated protein antibodies (ACPA) and rheumatoid factor (RF) were measured from serum samples collected at enrollment. ACPA positivity was defined by an anti-CCP ELISA > 5 U/ml. RF positivity was defined by RF nephelometry > 15 IU/mL.

The VARA registry incorporates Veterans Health Administration records, which include prescribed and dispensed disease-modifying anti-rheumatic drugs (DMARDs), body mass index (BMI, kg/m²), and routine clinical labs (15). DMARDs were extracted from prescription claims captured in the VA Corporate Data Warehouse (CDW). Prescription claims were organized into courses, defined as a continuous treatment with at least 1 dispensing without a gap of >90 days between the expected end of the days’ supply and the start of the subsequent dispensing episode (16). Demographics and health behaviors were collected by self-report at enrollment including sex, date of birth, and smoking history. Race and ethnicity were extracted from CDW. Race options included American Indian or Alaska Native, Asian American, Black/African American, Pacific Islander or Native Hawaiian, White, and other. For ethnicity, chosen separately from race, options included Hispanic or non-Hispanic (17).
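The course definition above can be illustrated with a minimal sketch. It assumes each dispensing is a (fill date, days' supply) pair for a single drug; the function name `build_courses` and the example fills are hypothetical and are not taken from the registry's code.

```python
from datetime import date, timedelta

MAX_GAP_DAYS = 90  # gap threshold stated in the text


def build_courses(dispensings):
    """Group (fill_date, days_supply) pairs for one drug into treatment courses.

    A new course starts when a fill begins more than MAX_GAP_DAYS after the
    expected end of the supply already on hand. Returns (start, end) tuples.
    """
    courses = []
    for fill_date, days_supply in sorted(dispensings):
        supply_end = fill_date + timedelta(days=days_supply)
        if courses and (fill_date - courses[-1][1]).days <= MAX_GAP_DAYS:
            # Same course: extend it to the latest expected end of supply.
            start, end = courses[-1]
            courses[-1] = (start, max(end, supply_end))
        else:
            courses.append((fill_date, supply_end))
    return courses


# Hypothetical fills for one participant and one drug.
fills = [
    (date(2020, 1, 1), 30),
    (date(2020, 2, 15), 30),  # 15 days after the expected end: same course
    (date(2020, 9, 1), 30),   # well over 90 days after the expected end: new course
]
courses = build_courses(fills)
```

In this toy example the first two fills merge into one course and the third starts a second course.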

Outcome measures

The primary outcomes for this study were composite disease activity measures, the 28-joint Disease Activity Score with C-reactive protein [DAS28(CRP)] and the Clinical Disease Activity Index [CDAI], among those with available data (18,19). The primary outcomes were assessed at enrollment and, in longitudinal models, at each registry visit. Secondary disease phenotype outcomes included seropositive status, the presence of erosive disease, and the presence of rheumatoid nodules, assessed at cohort enrollment. Rheumatoid nodules and radiographic changes consistent with erosive disease were reported by site investigators at VARA enrollment.

Exploratory outcomes included the component disease activity measures. Patient global assessment (0-100mm), provider global assessment (0-100mm), tender joint count (0-28), and swollen joint count (0-28) were extracted from the medical record (13). Erythrocyte sedimentation rate (ESR, mm/hr), and C-reactive protein (CRP, mg/dL) were extracted from the CDW laboratory data package. CRP values were log-transformed and ESR values were log- and skew-transformed to approximate a normal distribution. Complete case analysis was performed for each outcome (Supplementary Table 1).
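For readers less familiar with the composite scores, the sketch below shows how the components listed above combine under the standard published CDAI and DAS28(CRP) formulas. It is an illustration only: the function names are hypothetical, the global assessments are assumed to be on the 0-100 mm scale described above, CRP is assumed to be in mg/L as the DAS28(CRP) formula requires, and the registry's own calculation may differ in detail.

```python
import math


def cdai(tjc28, sjc28, patient_global_mm, provider_global_mm):
    """Clinical Disease Activity Index (range 0-76).

    Global assessments recorded on 0-100 mm scales are rescaled to 0-10.
    """
    return tjc28 + sjc28 + patient_global_mm / 10.0 + provider_global_mm / 10.0


def das28_crp(tjc28, sjc28, crp_mg_per_l, patient_global_mm):
    """DAS28 with CRP; assumes CRP in mg/L and patient global on 0-100 mm."""
    return (
        0.56 * math.sqrt(tjc28)
        + 0.28 * math.sqrt(sjc28)
        + 0.36 * math.log(crp_mg_per_l + 1.0)
        + 0.014 * patient_global_mm
        + 0.96
    )


# Hypothetical visit: 4 tender joints, 2 swollen joints, CRP 10 mg/L,
# patient global 50 mm, provider global 30 mm.
example_cdai = cdai(4, 2, 50, 30)
example_das28 = das28_crp(4, 2, 10.0, 50)
```

Because the CDAI has no laboratory term, it can be computed for visits without a CRP value, which is one reason the two scores may be available for different numbers of participants.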

Genotyping

Genotyping for participants was performed at the Children’s Hospital of Philadelphia Genetics Core using the Infinium Global Screening Array-24 v2.0 array [Illumina, Inc.; San Diego, CA, USA] (17,20). The assay directly genotypes PTPN22 R620W (rs2476601, G>A), which passed quality control measures and the Hardy-Weinberg Equilibrium (HWE) test (p > 0.05). Participants had SE determination through either direct sequencing of HLA-DRB1 using the AlleleSEQR HLA-DRB1 kit or a PCR-based sequence-specific oligonucleotide probe system; participants were considered to have SE if they carried any of the following alleles: *0101, *0102, *0104, *0105, *0401, *0404, *0405, *0408, *0409, *1001, *1402 and *1406 (2,21).
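The SE coding (0, 1, or 2 copies) follows directly from the allele list above. The sketch below is a hypothetical illustration that assumes each participant's HLA-DRB1 typing is available as two allele strings; the helper names and the accepted input formats are assumptions, not the registry's implementation.

```python
# Shared epitope alleles as listed in the text (four-digit HLA-DRB1 names).
SE_ALLELES = {
    "0101", "0102", "0104", "0105", "0401", "0404",
    "0405", "0408", "0409", "1001", "1402", "1406",
}


def normalize_allele(allele):
    """Reduce strings such as '*0401' or 'HLA-DRB1*04:01' to '0401'."""
    core = allele.split("*")[-1].replace(":", "")
    return core[:4]


def count_se_alleles(allele_1, allele_2):
    """Return the number of shared epitope alleles carried (0, 1, or 2)."""
    return sum(normalize_allele(a) in SE_ALLELES for a in (allele_1, allele_2))
```

For example, `count_se_alleles("*0401", "*0301")` gives 1, and a participant homozygous for `*0401` is counted as carrying 2 copies.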

The PGS used in this analysis was previously generated from a large international multi-ancestry GWAS of RA (PGS Catalog PGS002745) (3). The HLA region on chromosome 6 was excluded from this GWAS and was therefore not included in the PGS. When this PGS was applied to the VARA cohort, the log-odds were used as weights, and the score was standardized using its mean and standard deviation and normalized for each ancestry group. Scores were calculated and normalized by the mean and residuals in the most similar reference population using the Polygenic Score Catalog Calculator, a publicly available pipeline for calculating PGS (22).
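As a rough illustration of what a weighted score involves, the sketch below computes a PGS as a sum of effect-allele dosages weighted by log-odds and then standardizes it against a reference mean and standard deviation. The variant identifiers, weights, and reference values are made up for the example, and the simple z-score is a simplification: the PGS Catalog Calculator applies its own ancestry-aware normalization, which is not reproduced here.

```python
def raw_pgs(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) using log-odds weights.

    Variants missing from `dosages` contribute zero in this sketch.
    """
    return sum(weights[v] * dosages.get(v, 0.0) for v in weights)


def standardize(score, ref_mean, ref_sd):
    """Express a raw score in standard deviation units of a reference population."""
    return (score - ref_mean) / ref_sd


# Hypothetical variants and log-odds weights (not from PGS002745).
weights = {"rsA": 0.20, "rsB": -0.10, "rsC": 0.05}
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}

score = raw_pgs(dosages, weights)                      # 0.40 - 0.10 + 0.00
z_score = standardize(score, ref_mean=0.10, ref_sd=0.10)
```

Under this convention, a one-unit change in the standardized score corresponds to the "1 SD increase in PGS" used as the exposure in the regression models.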

A key confounder in genetic epidemiology studies is population structure (23,24). The differences in allele frequency related to the genetic background of participants may lead to the identification of associations based on genetic background rather than the effect of the variants of interest. As genetic background cannot be directly measured, principal component analysis on the quality-controlled genotypes was performed with reference samples from the 1000 Genomes Phase 3 reference panel and the Human Genome Diversity Panel accessed through the PGS Catalog Calculator to identify groups of individuals genetically similar to reference populations (17,22) (Supplementary Fig. 1). Similarity to a reference population was assigned using FRAPOSA as implemented in the PGS Catalog Calculator (22,25). A participant’s PGS percentile was defined based on the distribution of the score in the most similar reference population. The top five principal components of population structure explained 90% of the variance and were included in analyses as covariates.
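The percentile definition can be sketched as an empirical rank within the reference distribution. This is an assumption about the mechanics rather than the calculator's actual code; the function name and the reference scores are hypothetical.

```python
from bisect import bisect_right


def empirical_percentile(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical standardized scores from the most similar reference population.
reference = [-1.5, -0.5, 0.0, 0.5, 1.5]
pct = empirical_percentile(0.2, reference)
above_median = pct > 50.0
```

A participant flagged as `above_median` here corresponds to the "PGS greater than the 50th percentile" grouping reported in the Results.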

Statistical Analysis

Descriptive statistics were calculated for VARA participants stratified by quartile of PGS, PTPN22 R620W genotype, and number of SE alleles. Continuous variables were compared using analysis of variance for testing across each quartile or genotype. Categorical variables were assessed using Pearson’s Chi-squared testing.

Linear and logistic regression models were used to assess the relationship of the genetic factors with each outcome. Population-normalized and standardized PGS was examined as a continuous exposure. PTPN22 R620W genotype was coded as 0 for R620/R620 or GG, 1 for R620/W620 or GA, and 2 for W620/W620 or AA. Given established associations with erosive disease and seropositivity, SE was used as a positive control and was coded as 0, 1, or 2 high-risk HLA alleles (6). We used a genotypic model (2 degrees of freedom), which does not assume a priori any specific relationship between the allelic dose and outcome (26). In these models, outcomes were assessed at the time of enrollment. Models were adjusted for the top 5 PCs of population structure, age, and sex.
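The difference between a genotypic (2 degrees of freedom) coding and an additive coding can be seen in a small unadjusted example. The sketch below uses invented outcome values and omits the covariates used in the study; without covariates, the genotypic estimates reduce to differences in group means against the 0-copy reference, while the additive model fits a single per-allele slope.

```python
from statistics import mean


def genotypic_effects(genotypes, outcomes):
    """Unadjusted genotypic model: mean difference for 1 and 2 copies vs 0 copies."""
    by_genotype = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, outcomes):
        by_genotype[g].append(y)
    reference = mean(by_genotype[0])
    return mean(by_genotype[1]) - reference, mean(by_genotype[2]) - reference


def additive_slope(genotypes, outcomes):
    """Unadjusted additive model: least-squares slope per additional allele copy."""
    g_bar, y_bar = mean(genotypes), mean(outcomes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, outcomes))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    return sxy / sxx


# Invented data with a recessive-like pattern: only homozygotes score higher.
genotypes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
outcomes = [10, 12, 11, 9, 10, 8, 9, 10, 8, 16]

het_effect, hom_effect = genotypic_effects(genotypes, outcomes)
slope = additive_slope(genotypes, outcomes)
```

In this toy data the genotypic model returns -1 for one copy and +6 for two copies, whereas the additive slope is about 1.67 per allele, which both understates the homozygote difference and assigns the wrong direction to heterozygotes.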

To evaluate the prognostic value of higher genetic risk over follow-up, independent of the current disease activity, generalized estimating equations (GEE) were used to assess the effect of the genetic factors on long-term disease activity over the follow-up period in the longitudinal cohort. GEE models account for the correlation due to repeated measures by specifying a correlation structure and providing robust standard error estimates that adjust for the correlation. In this analysis, an exchangeable correlation structure was applied. The GEE models were adjusted for the top 5 PCs of population structure, age, sex, and baseline value of the outcome measure.
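An exchangeable structure assumes that any two visits from the same participant share one common correlation. The sketch below builds that working correlation matrix and shows a deliberately simplified moment estimate of the common correlation; it assumes residuals have already been standardized and omits the degrees-of-freedom correction that GEE software normally applies, so it illustrates the idea rather than the fitted models.

```python
def exchangeable_corr(n_visits, alpha):
    """Working correlation matrix: 1 on the diagonal, alpha everywhere else."""
    return [
        [1.0 if i == j else alpha for j in range(n_visits)]
        for i in range(n_visits)
    ]


def estimate_alpha(residuals_by_subject):
    """Average product of standardized residuals over all within-subject visit pairs."""
    total, n_pairs = 0.0, 0
    for residuals in residuals_by_subject:
        for i in range(len(residuals)):
            for j in range(i + 1, len(residuals)):
                total += residuals[i] * residuals[j]
                n_pairs += 1
    return total / n_pairs


# Hypothetical standardized residuals for two participants (2 and 3 visits).
alpha_hat = estimate_alpha([[1.0, 0.5], [-1.0, -0.5, 0.0]])
working_corr = exchangeable_corr(3, alpha_hat)
```

Because the robust standard errors do not rely on the working structure being exactly right, a misspecified exchangeable assumption mainly costs efficiency rather than validity.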

To understand the potential for gene-phenotype and gene-gene interaction, we tested the interaction of each genetic exposure with disease duration and with seropositivity (gene-phenotype) and the interaction of PTPN22 R620W and the PGS with SE (gene-gene). These analyses aimed to determine whether the effect of genetic risk varies with particular disease phenotypes, considering potential differences in the biologic basis for the development of RA in those who were seropositive compared to those who were seronegative.

Several sensitivity analyses were also performed (Supplemental Text). We chose not to adjust for multiple comparisons in this study, due to the correlated nature of disease activity scores and their components. All statistical analyses were performed on the Veterans Affairs Informatics and Computing Infrastructure, a HIPAA-secure computing system. Analytic software included PLINK [v1.9], R [v4.3.3], and STATA [v18.0] (27,28).

Results

Participant characteristics

In 2557 participants with genotyping data available, 1603 (62.7%) had an RA PGS greater than the 50th percentile (Table 1, Fig. 1, Supplementary Fig. 2). The PGS was significantly higher in VARA participants than in the reference population (p < 0.0001). For PTPN22, 529 (21%) were heterozygous and 50 (2%) were homozygous for the R620W allele (Supplementary Table 2). In 2328 participants with genotyping and SE data, 1169 (50.1%) had one copy of SE and 420 (18.0%) had two copies (Supplementary Table 3). Table 1 shows the patient characteristics of the population stratified by PGS quartiles. Those with higher PGS had an earlier age of onset of disease; those in the highest risk quartile of the PGS had disease diagnosed 4 years earlier than those in the lowest risk quartile of the PGS (p < 0.001). Those with higher PGS were more likely to have PTPN22 R620W alleles (p < 0.001) but not SE alleles (p = 0.069).

Table 1.

Patient Characteristics by Percentile of Polygenic Score in the most similar reference populations.

Total 1st quartile 2nd quartile 3rd quartile 4th quartile
N=2,557 N=436 N=518 N=655 N=948
Age (yr), median (IQR) 64 (58-72) 65 (58-73) 65 (58-73) 64 (58-72) 64 (58-71)
Female 279 (10.9%) 57 (13.1%) 66 (12.8%) 68 (10.4%) 88 ( 9.3%)
Most Similar Reference Population
 Africa (AFR) 422 (16.5%) 112 (25.7%) 113 (21.8%) 88 (13.4%) 109 (11.5%)
 Americas (AMR) 176 (6.9%) 23 (5.3%) 22 (4.2%) 55 (8.4%) 76 (8.0%)
 Central & Southeast Asia (CSA) 2 ( 0.1%) 1 ( 0.2%) 0 ( 0.0%) 0 ( 0.0%) 1 ( 0.1%)
 East Asia (EAS) 15 ( 0.6%) 0 ( 0.0%) 1 ( 0.2%) 6 ( 0.9%) 8 ( 0.8%)
 Europe (EUR) 1,941 (75.9%) 300 (68.8%) 382 (73.7%) 505 (77.1%) 754 (79.5%)
 Middle East (MID) 1 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 1 ( 0.2%) 0 ( 0.0%)
Disease duration, yr 7.9 (2.4-17.2) 6.6 (1.8-15.3) 7.0 (1.9-15.5) 7.4 (2.2-17.7) 9.5 (3.4-19.3)
Age of onset of RA, yr 63 (53.6-71.8) 65 (55.6-72.1) 64 (55.6 – 72.9) 63.5 (53.8 – 72.3) 61.6 (52.1 – 70.1)
Body Mass Index (BMI) 28.1 (24.8-31.8) 27.4 (24.3-31.4) 28.2 (25.3-32.2) 27.8 (24.7-31.4) 28.2 (24.8-32.2)
Cigarette Smoking History
 Current Smoker 520 (20.3%) 91 (20.9%) 120 (23.2%) 118 (18.0%) 191 (20.1%)
 Former Smoker 640 (25.0%) 115 (26.4%) 118 (22.8%) 165 (25.2%) 242 (25.5%)
 Never Smoker 1,342 (52.5%) 219 (50.2%) 274 (52.9%) 358 (54.7%) 491 (51.8%)
 Unknown/Not Reported 55 (2.2%) 11 (2.5%) 6 (1.2%) 14 (2.1%) 24 (2.5%)
CRP (mg/L) 5.2 (2.0-11.9) 5.0 (1.9-14.1) 5.2 (1.8-11.7) 5.3 (2.1-11.3) 5.1 (2.0-11.8)
DAS28(CRP) 3.36 (2.39-4.54) 3.47 (2.39-4.70) 3.40 (2.36-4.68) 3.32 (2.42-4.57) 3.29(2.38-4.38)
SE Positivity (1 or 2 copies) 1,589 (68.3%) 255 (64.6%) 317 (67.2%) 405 (67.1%) 612 (71.4%)
PTPN22 R620W
 0 alleles 1,978 (77.4%) 371 (85.1%) 423 (81.7%) 507 (77.4%) 677 (71.4%)
 1 allele 529 (20.7%) 60 (13.8%) 89 (17.2%) 137 (20.9%) 243 (25.6%)
 2 alleles 50 ( 2.0%) 5 ( 1.1%) 6 ( 1.2%) 11 ( 1.7%) 28 ( 3.0%)
ACPA Positive 1,972 (77.1%) 294 (67.4%) 395 (76.3%) 506 (77.3%) 777 (82.0%)
RF Positive 1,957 (76.5%) 299 (68.6%) 390 (75.3%) 513 (78.3%) 755 (79.6%)
RF and CCP Seronegative 381 (14.9%) 104 (23.9%) 79 (15.3%) 92 (14.0%) 106 (11.2%)
Current Therapy
 Methotrexate 1,304 (51.0%) 236 (54.1%) 268 (51.7%) 343 (52.4%) 457 (48.2%)
 Sulfasalazine 320 (12.5%) 50 (11.5%) 55 (10.6%) 88 (13.4%) 127 (13.4%)
 Leflunomide 244 (9.5%) 37 (8.5%) 45 (8.7%) 52 (7.9%) 110 (11.6%)
 Hydroxychloroquine 820 (32.1%) 157 (36.0%) 163 (31.5%) 188 (28.7%) 312 (32.9%)
 Any bDMARD or tsDMARD 724 (28.3%) 116 (26.6%) 152 (29.3%) 172 (26.3%) 284 (30.0%)
 Any TNF Inhibitor 635 (24.8%) 105 (24.1%) 136 (26.3%) 151 (23.1%) 243 (25.6%)
 Corticosteroid Use 854 (33.4%) 144 (33.0%) 188 (36.3%) 208 (31.8%) 314 (33.1%)
Comorbid Osteoarthritis 2,074 (81.1%) 364 (83.5%) 412 (79.5%) 528 (80.6%) 770 (81.2%)
Comorbid Spine Disease 1,223 (47.8%) 217 (49.8%) 259 (50.0%) 324 (49.5%) 423 (44.6%)
RDCI 3 (2-5) 4 (2-5) 3 (2-5) 3 (2-5) 3 (2-5)

Abbreviations: CRP – C-reactive protein; DAS28(CRP) – Disease Activity Score 28 with CRP; SE – shared epitope; ACPA – anti-citrullinated protein antibody; RF – rheumatoid factor; TNF – Tumor Necrosis Factor; IFN – Interferon; IL – Interleukin; DMARD – disease modifying anti-rheumatic drug; bDMARD – biologic disease modifying anti-rheumatic drug; tsDMARD – targeted synthetic DMARD; RDCI – Rheumatic Disease Comorbidity Index

Figure 1.

Distribution of genetic risk in VARA.

Distribution of the non-HLA polygenic score by combined serologic status (RF-positive and/or ACPA-positive), normalized and standardized to the reference population.

All three genetic exposures were associated with seropositivity measured at registry enrollment (Fig. 1, Fig. 2, Table 2). PGS, 2 copies of PTPN22 R620W, and SE were associated with higher odds of seropositivity (OR of PGS: 1.41, 95% CI 1.26, 1.58, p < 0.001). The presence of the SE was associated with erosive disease at study enrollment [SE 1 allele: OR 1.40, 95% CI 1.13, 1.74, p = 0.002; SE 2 alleles: OR 1.46, 95% CI 1.10, 1.93, p = 0.009]; higher PGS and the presence of R620W were not.

Figure 2.

Effect of SE, PTPN22 R620W, and non-HLA PGS on seropositivity and erosive disease at VARA enrollment.

Effects were determined by logistic regression in a genotype model, adjusted for age, sex, and top 5 PCs of population structure. All genetic models demonstrated increased odds for seropositivity, whereas only SE was significantly associated with erosive disease.

Table 2:

Association of genetic risk with disease activity and characteristics in participants with rheumatoid arthritis at enrollment

Outcome 1 vs 0 R620W allele 2 vs 0 R620W allele 1 vs 0 SE 2 vs 0 SE 1 SD increase in PGS

OR (95% CI) OR (95% CI) OR (95% CI) OR (95% CI) OR (95% CI)

Rheumatoid Nodule 1.01 (0.79-1.28) 1.26 (0.65-2.44) 1.25 (0.99-1.59) 1.20 (0.88-1.62) 1.09 (0.99-1.21)
Erosive Disease 0.85 (0.68-1.06) 1.55 (0.78-3.06) 1.40 (1.13-1.74) * 1.46 (1.10-1.93) * 1.08 (0.98-1.18)
ACPA seropositivity 1.37 (1.07-1.74) * 3.88 (1.39-10.87) * 2.48 (1.98-3.09) ** 5.33 (3.73-7.62) ** 1.32 (1.20-1.45) **
RF seropositivity 1.14 (0.90-1.45) 1.71 (0.79-3.69) 1.59 (1.27-1.99) ** 2.36 (1.71-3.25) ** 1.26 (1.14-1.38) **
RF or ACPA seropositivity 1.29 (0.97-1.71) 4.82 (1.16-19.98) * 2.28 (1.76-2.95) ** 4.54 (2.98-6.93) ** 1.41 (1.26-1.58) **

β (95% CI) β (95% CI) β (95% CI) β (95% CI) β (95% CI)

CDAI −1.30 (−2.97-0.37) 6.10 (1.23-10.97) * −1.44 (−3.08-0.20) −1.78 (−3.94-0.38) −0.65 (−1.31-0.02)
DAS28(CRP) −0.06 (−0.21-0.09) 0.47 (0.03-0.91) * −0.02 (−0.17-0.13) −0.05 (−0.24-0.15) −0.06 (−0.12-0.00)
CRP 0.05 (−0.09-0.18) 0.23 (−0.15-0.61) 0.11 (−0.02-0.23) 0.17 (0.00-0.34) −0.01 (−0.06-0.05)
ESR 0.005 (−0.10-0.11) 0.39 (0.09-0.69) * 0.02 (−0.09-0.12) 0.03 (−0.11-0.16) 0.03 (−0.01-0.07)
Patient Global Assessment −2.04 (−4.71-0.63) 5.21 (−2.55-12.96) −0.25 (−2.82-2.33) −1.07 (−4.46-2.32) −0.83 (−1.90-0.25)
Provider Global Assessment −3.07 (−5.92-−0.21) * 8.48 (0.15-16.81) * −2.30 (−5.09-0.49) −3.30 (−6.97-0.37) −1.52 (−2.66-−0.39) *
Swollen Joint Count −0.01 (−0.53-0.51) 1.75 (0.28-3.21) * 0.06 (−0.45-0.57) −0.05 (−0.72-0.63) −0.04 (−0.25-0.17)
Tender Joint Count −0.31 (−0.98-0.35) 2.01 (0.14-3.88) * −0.30 (−0.95-0.35) −0.37 (−1.23-0.49) −0.31 (−0.58-−0.04) *

The effect of genotype on disease characteristics and current therapy at time of enrollment was determined by logistic regression. The effect of genotype on disease activity scores and components at cohort enrollment was determined by linear regression. All the regression models were adjusted for sex, age, and top 5 PCs of population structure. Bold face indicates a significant p-value; * indicates p < 0.05; ** indicates p < 0.001.

Current treatments and comorbidities, including non-RA musculoskeletal comorbidities, were similar between the groups by PGS quartile (Table 1). Participants with SE had 26-40% higher odds of being on biologic therapy at cohort enrollment and over time (enrollment: 1 SE allele, OR 1.26, 95% CI 1.02-1.57, p = 0.005; 2 SE alleles, OR 1.40, 95% CI 1.06-1.86, p = 0.005; Supplementary Fig. 3). PTPN22 R620W and PGS were not significantly associated with biologic therapy at enrollment, but those with 2 copies of PTPN22 R620W had higher odds of being on biologic therapy over all registry observations (OR 1.57, 95% CI 1.02-2.42, p = 0.04).

Associations with disease activity

Compared to those with 0 copies of R620W, those with 2 copies had higher CDAI and DAS28(CRP) at enrollment (CDAI β = 6.10, 95% CI 1.23, 10.97, p = 0.014; all others in Table 2, Fig. 3, Supplementary Table 4). In exploratory analysis, those with 2 R620W alleles had higher ESR, provider global assessment, swollen joint count, and tender joint count (Table 2). No differences in disease activity scores were observed for those with one R620W allele or SE (Fig. 3). The PGS was negatively associated with disease activity scores and components, though this difference achieved statistical significance only for the provider global assessment and tender joint count (Fig. 3, Table 2).
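The reported estimates, confidence intervals, and p-values are linked through the usual Wald relationship, which can be used as a consistency check. The sketch below assumes a symmetric normal-approximation interval (on the log scale for odds ratios); the function name is hypothetical, and the models' actual standard errors may differ slightly from these back-calculated values.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_ci(estimate, lower, upper, log_scale=False):
    """Back-calculate the standard error, z statistic, and two-sided p-value.

    Set log_scale=True for ratio measures such as odds ratios.
    """
    if log_scale:
        estimate, lower, upper = (math.log(v) for v in (estimate, lower, upper))
    se = (upper - lower) / (2.0 * Z_975)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# CDAI difference for 2 vs 0 R620W alleles, as reported in the text.
se_cdai, z_cdai, p_cdai = wald_from_ci(6.10, 1.23, 10.97)

# Odds ratio for seropositivity per 1 SD increase in PGS, as reported in the text.
se_or, z_or, p_or = wald_from_ci(1.41, 1.26, 1.58, log_scale=True)
```

Applied to the CDAI estimate this gives a p-value of approximately 0.014, matching the reported value, and the seropositivity odds ratio corresponds to a p-value far below 0.001.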

Figure 3.

Effect of SE, PTPN22 R620W, and non-HLA PGS on CDAI at enrollment.

Participants with 2 copies of PTPN22 R620W demonstrated a higher CDAI (p = 0.014). A trend toward a negative association with CDAI is observed for a 1 standard deviation change in PGS. Effect at enrollment determined by linear regression in a genotype model, adjusted for age, sex, and top 5 PCs of population structure.

Longitudinal models including all registry visits did not demonstrate consistent associations between the PGS and follow-up disease activity scores after adjusting for baseline disease activity. DAS28(CRP) scores were lower in follow-up for those with 2 copies of SE (β: −0.10, 95% CI −0.20, −0.00020, p = 0.05), and CDAI was lower in follow-up for those with 1 copy of SE (β: −1.1, 95% CI −1.9, −0.31, Supplementary Table 5). There was no association between the PTPN22 R620W and longitudinal disease activity scores or disease activity components (Fig. 4, Supplementary Table 5).

Figure 4.

Subgroup Analysis for PTPN22 R620W and CDAI.

Subgroup analysis performed by stratification in linear regression in a genotype model, adjusted for age, sex, and top 5 PCs of population structure. Longitudinal disease activity was analyzed using generalized estimating equations adjusting for these same factors as well as enrollment CDAI. Participants with 2 copies of PTPN22 R620W demonstrated a higher CDAI at enrollment, but not in longitudinal disease activity corrected for the enrollment CDAI. This effect was predominant in those with seropositive RA, those with 1 or 2 shared epitope HLA alleles, and those with disease duration ≥ 3 years. There was not a significant effect of genotype in early disease. Among those with seronegative RA, participants with 1 PTPN22 R620W allele had significantly lower disease activity. Participants with 2 PTPN22 R620W alleles who were seronegative or lacked the shared epitope showed a trend toward lower disease activity that was not statistically significant.

Effect modification and subgroup analyses

Higher genetic risk appeared to be more strongly associated with adverse disease outcomes in those with longer disease duration. For example, there was a significant interaction (p for interaction 0.001) between PGS and disease duration for the presence of erosive disease. In stratified analyses, a higher PGS was significantly associated with lower disease activity and lower rates of erosive disease among those with < 3 years disease duration but not among those with longer disease duration. In contrast, a higher PGS was associated with higher odds of erosive disease in those with > 3 years duration (Supplementary Table 6). The association between SE and erosive disease was also observed only in those with > 3 years disease duration. Likewise, the association between 2 copies of PTPN22 R620W and higher disease activity was observed only in those with > 3 years disease duration and not in those with shorter disease duration (Fig. 4, Supplementary Table 6).

The association between 2 PTPN22 R620W alleles and higher disease activity was also primarily observed among seropositive patients (Fig. 4, Supplementary Table 7). Those with 2 R620W alleles and who were seropositive had significantly higher DAS28(CRP). In contrast, those who were seronegative with either 1 or 2 R620W alleles had lower disease activity scores than those with the reference genotype (Fig. 4). The association between the presence of 2 R620W alleles and higher disease activity was also stronger in the presence of SE (CDAI: effect with SE 9.17, 95% CI 3.72, 14.62, p for interaction 0.011, Fig. 4, Supplementary Table 8). No interactions were observed between seropositivity and the PGS, and there was no evidence of effect modification between the PGS and SE (Supplementary Tables 7 & 8).

Discussion

This work demonstrates that genetic variants previously identified to be associated with disease susceptibility are associated with different RA phenotypes. Those with high genetic risk were diagnosed at a younger age, and, as reported elsewhere, we observed that all measures of genetic risk were associated with seropositivity. However, there were key differences in associations for other disease characteristics including patterns of disease activity, erosive changes, and medication use. These results have important implications for genetically subtyping RA, characterizing the genetic architecture of RA, and informing further research into the genetic factors contributing to disease severity in RA.

The primary advance of the current study is the identification of associations for PTPN22 R620W with disease activity. Subsequent analysis for gene-gene and gene-phenotype interactions identified effect modification between PTPN22 R620W and SE, disease duration, and seropositivity. These interactions suggest that PTPN22 may be associated with increased disease activity, primarily in those who are SE positive, are seropositive, and have had longer disease duration. The effect modification by disease duration may indicate that the effects of PTPN22 R620W take time to accrue in RA. The lack of association between 2 copies of PTPN22 R620W and erosive disease is consistent with prior work (9); however, considering that the absolute magnitude of the effect size is similar to that of the association of SE with erosive disease, it may be that these analyses remain underpowered.

It was previously unknown how the polygenic risk for RA might be associated with disease activity in RA. The current study suggests that a PGS for RA identifies earlier onset and seropositive disease but does not identify patients who are likely to experience greater disease activity or more refractory disease after diagnosis. This may suggest that the pathways outside the HLA region involved in the development of RA are not necessarily those responsible for perpetuating disease activity. This is supported by prior work that failed to demonstrate a significant association between polygenic risk for RA and response to methotrexate or TNF inhibitors (6,29,30).

A methodological advancement in this work is the use of the genotypic model to test the effects of PTPN22 alleles. In sensitivity analyses (Supplemental Text, Supplementary Table 9), a traditional additive model, as commonly performed in GWAS, would not have identified this association (26). By assuming that each additional copy of a variant allele has an additive dose effect, an additive model may not accurately capture the true effect if the presence of two alleles has a different impact than one allele for an outcome, such as in the setting of a multiplicative effect or an autosomal recessive allele. This work suggests that revisiting the modeling approach of key genetic markers with strong pathophysiologic rationale in post-GWAS studies can ensure that the genetic effects are being accurately quantified and may expand the utility of genetic markers (26). The magnitude of effect of 2 copies of PTPN22 R620W is large and approaches the meaningful clinical difference for CDAI (31). This could support the hypothesis that this PTPN22 variant results in increased activation of T cells with low-affinity T cell receptors; increased T cell activation could subsequently lead to increased disease activity (32). However, homozygosity for PTPN22 R620W is infrequent, occurring in 2% of this cohort, which would imply that only a small proportion of patients with RA may be affected. Murine male immune cells have also been shown to have higher expression of Ptpn22 compared to murine female immune cells (33). As our study population has a high proportion of male participants of European ancestry, this sex difference in gene expression may have served to magnify the effect of genetic variation in PTPN22 on RA phenotypes.

The modest inverse associations of the PGS and SE with disease activity warrant additional discussion. Biologically, it is plausible that in this prevalent cohort, these genetic markers may be indicative of a clinical phenotype that is more likely to have a favorable response to existing RA treatments; this would be supported by our observations that SE was associated with higher odds of biologic use. Alternatively, given the association with seropositivity and erosive disease, these features may direct rheumatologists toward earlier aggressive treatment. It is also important to consider the potential impact of collider bias. In collider bias, those who develop the disease despite a low genetic risk may have other exposures or risk factors that may represent stronger drivers of disease activity and severity. Similarly, it is possible that the presence of comorbidities such as osteoarthritis or fibromyalgia could lead to the misclassification of RA among those with low genetic risk. In the context of these comorbidities, those with low genetic risk may appear to have poor response to therapy and persistent symptoms because of other chronic pain conditions. Reassuringly, we did not observe significant differences in baseline prevalence of osteoarthritis or degenerative disc disease among those with different levels of risk based on the PGS.

This work informs the hypothesis that the genetics of disease susceptibility in RA play a heterogeneous role in disease phenotype and disease activity. Further study is needed to identify other variants that are meaningfully associated with disease activity and to refine the ability of genetic variants to stratify patients by their risk of higher disease activity and severity, with the ultimate goal of developing clinically useful tools that can support diagnosis at disease onset and identify and predict high-risk disease (34,35).

There are important limitations to consider. The VARA cohort is nationally representative of the US veteran population, which tends to be older and to include a higher proportion of male patients than the general RA population. Replicating these findings in other large cohorts with younger patients is important to further validate these results. Similarly, given that many participants enter VARA with prevalent disease, understanding how genetic risk influences disease activity and phenotype in cohorts with early or incident disease may provide further insight, particularly since our stratified analysis suggested a greater effect of genetic risk for those with longer disease duration. Our longitudinal analysis did not suggest that genetic risk models had additional prognostic value beyond the initial disease activity assessment. Despite the large overall sample size of this study, the relative infrequency of R620W homozygosity (2%) means that the study was somewhat underpowered to fully characterize this group, particularly for the association with erosive disease.
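
The precision of the homozygote estimate can be checked with simple arithmetic on the values reported in the abstract (CDAI β=6.10, 95% CI 1.23-10.97, p=0.014). The sketch below back-calculates the standard error from the confidence interval and recomputes a two-sided p-value. It assumes a symmetric interval and a normal approximation, which may differ slightly from the method used by the regression software; the function names are hypothetical.

```python
from math import erfc, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate / se under a normal approximation."""
    return erfc(abs(estimate / se) / sqrt(2))


# Values as reported in the abstract for 2 vs. 0 copies of R620W.
beta, lower, upper = 6.10, 1.23, 10.97

se = se_from_ci(lower, upper)
p_value = two_sided_p(beta, se)

print(f"SE = {se:.2f} CDAI units, p = {p_value:.3f}")
```

The implied standard error is about 2.5 CDAI units, and the recomputed p-value rounds to the reported 0.014. An interval this wide around the homozygote estimate is consistent with the limited precision available for this small group.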

While this cohort has a large population of non-white participants, the majority were individuals genetically similar to European reference populations (Supplementary Table 10). Evaluating these effects in more diverse cohorts would be informative since a PGS may perform differently across populations, not only because of differences in allele frequency but also because of differences in linkage-disequilibrium structure (3).

In conclusion, genetic risk models associated with susceptibility to RA had heterogeneous effects on disease activity and phenotype. Higher genetic risk was associated with higher rates of seropositivity and earlier disease onset. In addition, PTPN22 R620W homozygosity was associated with greater disease activity at enrollment in patients with RA, whereas the presence of SE was not. A non-HLA polygenic score was weakly and inversely associated with some, but not all, disease activity scores. These findings support the hypothesis that the genetics of disease susceptibility may also influence disease activity. However, it seems likely that a broad estimate of genetic risk will not be as informative as particular genetic variants that play a role in key pathways perpetuating inflammation.

Supplementary Material

Supplementary Tables
Supplementary Text

Key Messages:

  • Polygenic risk for RA is associated with seropositivity and earlier age of onset of RA.

  • Those with 2 copies of PTPN22 R620W had higher disease activity at enrollment.

  • Effects of PTPN22 were greatest in seropositive RA or in those with high-risk HLA-DRB1 alleles.

Acknowledgements

Dr. Baker would like to acknowledge funding through a Veterans Affairs Clinical Science Research & Development Career Merit Award (I01 CX001703). Dr. Riley would like to acknowledge funding through the NIH Pharmacoepidemiology Training grant (T32-GM075766) and the Arthritis Foundation. Dr. Baker, Dr. Riley, and Dr. George would like to thank the Penn Rheumatology Clinical Research Group and Penn Center for Clinical Epidemiology and Biostatistics for their feedback on this work. We would like to thank the ACR and ACR attendees for feedback on preliminary data for the current work presented during ACR Convergence 2025. The contents of this work do not represent the views of the Department of Veterans Affairs or the United States Government.

Conflicts of Interest and Funding:

This work was supported by a Veterans Affairs (VA) Merit Award (I01 CX001703). JFB is also supported by Rehabilitation Research & Development Merit Awards (RX003644; RX004770). BRE is supported by the VA CSR&D (CX002203) and the Rheumatology Research Foundation. KDW is supported by VA CSR&D (CX002351). TRM is supported by the VA BLRD (BX003635), U.S. Department of Defense (PR200793), and National Institute of General Medical Sciences (U54 GM115458). TRR is supported by the National Institutes of Health Pharmacoepidemiology T32 (5T32GM075766-17) and the Arthritis Foundation. AMW is supported by the Rheumatology Research Foundation. MGL is supported by the Doris Duke Foundation (award 2023-0224) and US Department of Veterans Affairs (IK2-BX006551). BIW is supported by VA CSR&D (CX002430). IDS is supported by the Duke Center for Research to Advance Healthcare Equity (Grant # 3U54MD012530-05S2).

Data availability statement:

Participant level data-sharing is restricted by the US government. Summary statistics are available upon request, and the authors welcome requests for collaboration.

References

  • 1. Honda S, Ikari K, Yano K, Terao C, Tanaka E, Harigai M, et al. Association of Polygenic Risk Scores With Radiographic Progression in Patients With Rheumatoid Arthritis. Arthritis Rheumatol 2022;74:791–800.
  • 2. Zhao M, Mauer L, Sayles H, Cannon GW, Reimold A, Kerr GS, et al. HLA-DRB1 haplotypes, shared epitope, and disease outcomes in US veterans with rheumatoid arthritis. J Rheumatol 2019;46:685–693.
  • 3. Ishigaki K, Sakaue S, Terao C, Luo Y, Sonehara K, Yamaguchi K, et al. Multi-ancestry genome-wide association analyses identify novel genetic mechanisms in rheumatoid arthritis. Nat Genet 2022;54:1640–1651.
  • 4. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 2021;53:1415–1424. Available at: 10.1038/s41588-021-00931-x.
  • 5. Raychaudhuri S, Sandor C, Stahl EA, Freudenberg J, Lee HS, Jia X, et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet 2012;44:291–296.
  • 6. Viatte S, Plant D, Han B, Fu B, Yarwood A, Thomson W, et al. Association of HLA-DRB1 haplotypes with rheumatoid arthritis severity, mortality, and treatment response. JAMA 2015;313:1645–1656.
  • 7. Merkel PA, Xie G, Monach PA, Ji X, Ciavatta DJ, Byun J, et al. Identification of Functional and Expression Polymorphisms Associated With Risk for Antineutrophil Cytoplasmic Autoantibody–Associated Vasculitis. Arthritis Rheumatol 2017;69:1054–1066.
  • 8. Li YR, Li J, Zhao SD, Bradfield JP, Mentch FD, Maggadottir SM, et al. Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases. Nat Med 2015;21:1018–1027.
  • 9. Taylor LH, Twigg S, Worthington J, Emery P, Morgan AW, Wilson AG, et al. Metaanalysis of the association of smoking and PTPN22 R620W genotype on autoantibody status and radiological erosions in rheumatoid arthritis. J Rheumatol 2013;40:1048–1053.
  • 10. Stastny P. Association of the B-cell Alloantigen DRw4 with Rheumatoid Arthritis. N Engl J Med 1978;298:869–871.
  • 11. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018;50:1219–1224. Available at: 10.1038/s41588-018-0183-z.
  • 12. Wheeler AM, Riley TR, Merriman TR. Genetic Risk Scores for the Clinical Rheumatologist. J Clin Rheumatol 2024;31:26–32.
  • 13. Mikuls TR, Baker JF, Cannon GW, England BR, Kerr G, Reimold A. The Veterans Affairs Rheumatoid Arthritis Registry: A unique population in rheumatoid arthritis research. Semin Arthritis Rheum 2024;70:152580. Available at: 10.1016/j.semarthrit.2024.152580.
  • 14. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315–324.
  • 15. Cannon GW, Rojas J, Reimold A, Mikuls TR, Bergman D, Sauer BC. Extraction of Rheumatoid Arthritis Disease Activity Measures From Electronic Health Records Using Automated Processing Algorithms. ACR Open Rheumatol 2019;1:632–639.
  • 16. Cannon GW, Mikuls TR, Hayden CL, Ying J, Curtis JR, Reimold AM, et al. Merging Veterans Affairs rheumatoid arthritis registry and pharmacy data to assess methotrexate adherence and disease activity in clinical practice. Arthritis Care Res 2011;63:1680–1690.
  • 17. Wheeler AM, Baker JF, Riley T, Yang Y, Roul P, Wysham KD, et al. Development and internal validation of a clinical and genetic risk score for rheumatoid arthritis-associated interstitial lung disease. Rheumatol 2024;00:1–8.
  • 18. Inoue E, Yamanaka H, Hara M, Tomatsu T, Kamatani N. Comparison of Disease Activity Score (DAS)28-erythrocyte sedimentation rate and DAS28-C-reactive protein threshold values. Ann Rheum Dis 2007;66:407–409.
  • 19. Wells G, Becker JC, Teng J, Dougados M, Schiff M, Smolen J, et al. Validation of the 28-joint Disease Activity Score (DAS28) and European League Against Rheumatism response criteria based on C-reactive protein against disease progression in patients with rheumatoid arthritis, and comparison with the DAS28 based on erythr. Ann Rheum Dis 2009;68:954–960.
  • 20. Riley TRI, Wheeler AM, Cannon GW, Sauer B, Wysham KD, England BR, et al. Genetically-determined variation in C-reactive protein impacts disease activity assessment in rheumatoid arthritis. Rheumatol 2025;Online ahe:1–27.
  • 21. Wheeler AM, Baker JF, Poole JA, Ascherman DP, Yang Y, Kerr GS, et al. Genetic, social, and environmental risk factors in rheumatoid arthritis-associated interstitial lung disease. Semin Arthritis Rheum 2022;57:152098. Available at: 10.1016/j.semarthrit.2022.152098.
  • 22. Lambert SA, Wingfield B, Gibson JT, Gil L, Ramachandran S, Yvon F, et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat Genet 2024;56. Available at: 10.1038/s41588-024-01937-x.
  • 23. National Academies of Sciences Engineering and Medicine. Using population descriptors in genetics and genomics research: A new framework for an evolving field. Washington, DC: The National Academies Press; 2023.
  • 24. Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 2024;625:92–100.
  • 25. Zhang D, Dey R, Lee S. Fast and robust ancestry prediction using principal component analysis. Bioinformatics 2020;36:3439–3446.
  • 26. Guindo-Martínez M, Amela R, Bonàs-Guarch S, Puiggròs M, Salvoro C, Miguel-Escalada I, et al. The impact of non-additive genetic associations on age-related complex diseases. Nat Commun 2021;12:1–14. Available at: 10.1038/s41467-021-21952-4.
  • 27. Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Nature Publishing Group UK; 2022. Available at: 10.1038/s41598-022-14395-4.
  • 28. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 2015;4:1–16.
  • 29. Jiang X, Askling J, Saevarsdottir S, Padyukov L, Alfredsson L, Viatte S, et al. A genetic risk score composed of rheumatoid arthritis risk alleles, HLA-DRB1 haplotypes, and response to TNFi therapy - results from a Swedish cohort study. Arthritis Res Ther 2016;18:1–10. Available at: 10.1186/s13075-016-1174-z.
  • 30. Sysojev AÖ, Saevarsdottir S, Diaz-Gallo LM, Silberberg GN, Alfredsson L, Klareskog L, et al. Genome-wide investigation of persistence with methotrexate treatment in early rheumatoid arthritis. Rheumatol (United Kingdom) 2024;63:1221–1229.
  • 31. Ward MM, Guthrie LC, Alba MI. Clinically Important Changes in Individual and Composite Measures of Rheumatoid Arthritis. Ann Rheum Dis 2015;74:1691–1696.
  • 32. Anderson W, Whitman FB-, Linsley PS, Cerosaletti K, Buckner JH, Rawlings DJ. PTPN22 R620W gene editing in T cells enhances low-avidity TCR responses. Elife 2023;22:1–21.
  • 33. Lee J, Yurkovetskiy LA, Reiman D, Frommer L, Strong Z, Chang A, et al. Androgens contribute to sex bias of autoimmunity in mice by T cell-intrinsic regulation of Ptpn22 phosphatase expression. Nat Commun 2024. Available at: 10.1038/s41467-024-51869-7.
  • 34. Knevel R, Cessie S le, Terao CC, Slowikowski K, Cui J, Huizinga TW, et al. Using genetics to prioritize diagnoses for rheumatology outpatients with inflammatory arthritis. Sci Transl Med 2020;12.
  • 35. Hum RM, Sharma SD, Stadler M, Viatte S, Ho P, Nair N, et al. Using Polygenic Risk Scores to Aid Diagnosis of Patients With Early Inflammatory Arthritis: Results From the Norfolk Arthritis Register. Arthritis Rheumatol 2023;76:696–703.
