Summary
Juvenile idiopathic arthritis (JIA) is a complex rheumatic disease encompassing several clinically defined subtypes of varying severity. The etiology of JIA remains largely unknown, but genome-wide association studies (GWASs) have identified up to 22 genes associated with JIA susceptibility, including a well-established association with HLA-DRB1. Continued investigation of heritable risk factors has been hindered by disease heterogeneity and low disease prevalence. In this study, we utilized shared genomic segments (SGS) analysis on whole-genome sequencing of 40 cases from 12 multi-generational pedigrees significantly enriched for JIA. Subsets of cases are connected by a common ancestor in large extended pedigrees, increasing the power to identify disease-associated loci. SGS analysis identifies genomic segments shared among disease cases that are likely identical by descent and anchored by a disease locus. This approach revealed statistically significant signals for major histocompatibility complex (MHC) class I and class III alleles, particularly HLA-A∗02:01, which was observed at a high frequency among cases. Furthermore, we identified an additional risk locus at 12q23.2–23.3, containing genes primarily expressed by naive B cells, natural killer cells, and monocytes. The recognition of additional risk beyond HLA-DRB1 provides a new perspective on immune cell dynamics in JIA. These findings contribute to our understanding of JIA and may guide future research and therapeutic strategies.
Juvenile idiopathic arthritis is the most common rheumatic disease in pediatrics, with multiple genetic risk factors. In this study, we utilized shared genomic segments analysis on whole-genome sequencing of 40 cases from 12 multi-generational pedigrees significantly enriched for JIA. This approach revealed risk haplotypes within the MHC and beyond.
Introduction
Juvenile idiopathic arthritis (JIA) is the most common pediatric rheumatic disease, with an estimated prevalence of 45 cases per 100,000 children in North America.1 Broadly, the American College of Rheumatology describes JIA as joint inflammation of unknown origin lasting at least 6 weeks in a patient under 16 years old. JIA is a phenotypically heterogeneous disease with subtypes primarily distinguished by the number of affected joints and the presence of additional symptoms like rash, uveitis, or fever. The three primary subtypes are oligoarticular JIA (involving four or fewer joints), polyarticular JIA (five or more joints), and systemic JIA (arthritis, fever, and other organ system involvement).2
Genome-wide association studies (GWASs) and linkage analysis have identified a limited number of genetic risk loci for JIA, some of which overlap with rheumatoid arthritis (RA).3 Notably, the human leukocyte antigen (HLA) super-locus confers the greatest genetic risk for the development of JIA and is primarily attributed to a small number of major histocompatibility complex (MHC) class II alleles.4,5,6,7 However, associations with other MHC molecules, such as HLA-A, have been observed.6,8,9,10 Comparative studies of RA and JIA have revealed that shared risk loci exhibit higher magnitudes of association in JIA,3,11 supporting the notion of a stronger genetic component in pediatric disease than in adult counterparts. The discovery and characterization of JIA risk loci hold promise for advancing our understanding of RA pathogenesis, which can contribute to improved outcomes for patients across both diseases.
Large multiplex families are invaluable to genetic risk studies as they tend to exhibit reduced genetic heterogeneity between cases and are enriched for susceptibility variants in coding and regulatory regions.12,13 To assemble such a cohort, we utilized the Utah Population Database (UPDB), a computerized resource containing genealogical information for individuals spanning up to 16 generations, linked with state and medical records.14 This dataset enabled the identification of UPDB ancestors whose descendants displayed a higher incidence of JIA compared with the population prevalence, thereby establishing high-risk pedigrees.
In this study, we utilized a cohort of 40 affected individuals from 12 high-risk multi-generational pedigrees to identify genomic regions that are likely identical by descent (IBD) using shared genomic segments (SGS) analysis. SGS analysis is a “linkage-like” method that utilizes a high-density set of genetic markers. Consecutively shared alleles are used to detect identity by state (IBS) between distantly related cases that are inferred to be likely IBD when the shared genomic segment is significantly longer than expected by chance. These statistically significant tracts of IBS are hypothesized to contain the disease risk locus shared by the cases and the ancestral founder. The algorithm also iterates over subsets of subjects within a pedigree to allow for polygenic risk in complex traits, such as JIA. SGS offers advantages over traditional likelihood-based linkage analysis and GWASs by utilizing affected individuals from extended pedigrees separated by many meioses for which sharing by chance becomes increasingly unlikely, such that a relatively small number of cases embedded in a large pedigree can yield statistically significant results.15,16 We applied this approach to 10 suitably powered, high-risk families to uncover heritable risk factors associated with the pathogenesis of JIA. By employing this technique on a cohort connected by large extended pedigrees, we aim to shed light on the genetic architecture of JIA and contribute to a better understanding of its development and progression.
Material and methods
Ascertainment of samples
DNA was extracted from whole blood from 40 subjects diagnosed with JIA, including two sibling pairs (Table S1). Subjects were identified in the UPDB, and pedigrees were drawn for cases that share a common ancestor within the UPDB.
The UPDB was used to calculate the familial standardized incidence ratio (FSIR)17 of large extended pedigrees with JIA, as well as the number of meioses separating the genotyped cases in each pedigree. The 12 pedigrees used in this study have a statistically significantly elevated incidence of JIA (Table S2) and 10 pedigrees were deemed well-powered for shared genomic segments analysis, as a minimum of 15 total independent meioses separating the founder from all cases in a pedigree is recommended for pedigrees used in SGS analyses.15
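The meiosis count used to judge whether a pedigree is well powered can be illustrated with a minimal Python sketch. It assumes that the relevant quantity is the number of distinct parent-to-child transmissions on the lines of descent from the founder to the genotyped cases; the function name, the `parent_of` mapping, and the toy pedigree are hypothetical and are not taken from the UPDB or the SGS software.

```python
def count_meioses(parent_of, founder, cases):
    """Count distinct parent-to-child transmissions linking a founder to cases.

    parent_of maps each individual to the parent through whom they descend
    from the founder (an assumption of this sketch: one line of descent each).
    """
    edges = set()
    for case in cases:
        node = case
        while node != founder:
            parent = parent_of[node]  # KeyError if the case is not a descendant
            edges.add((parent, node))  # shared ancestors are counted only once
            node = parent
    return len(edges)


# Hypothetical toy pedigree: founder F with two branches and three cases.
toy_parent_of = {
    "A": "F", "A1": "A", "case_x": "A1",
    "A2": "A", "case_y": "A2",
    "B": "F", "B1": "B", "B2": "B1", "case_z": "B2",
}
toy_meioses = count_meioses(toy_parent_of, "F", ["case_x", "case_y", "case_z"])
well_powered = toy_meioses >= 15  # recommended minimum cited in the text
```

In this toy pedigree the three cases are separated from the founder by nine transmissions in total, so it would fall below the recommended minimum of 15.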
Whole-genome sequencing and variant calling
Sample libraries were prepared using TruSeq DNA PCR-free libraries (Illumina) and run on the Illumina HiSeq X Ten System to a target depth of 60×. Genome alignment to GRCh37, variant calling, and joint genotyping with the phase three 1000 Genomes Project18 call set were performed by the Utah Center for Genetic Discovery using Sentieon.19 Variant Effect Predictor (VEP) v97.320 was used to annotate calls with gnomAD hg37 v2.121 allele frequencies, genome features, and predicted pathogenic impact. Post-genotyping quality control was performed to produce a high-quality final call set; this included the removal of sites with poor genotyping quality (GQ < 29.5) and the retention of sites with genotypes called in at least 98% of the cohort. Ancestry, sex, and relationship estimation was performed by Peddy22 to detect sample swaps or contamination and to provide appropriate ancestry matching to controls. QC checks included concordance between described and predicted sex, outlier heterozygosity indicative of contamination, and greater relatedness between samples than expected (IBS0 and IBS2 values). Variants corresponding to Illumina’s Global Screening Array-24 v2.0 chip were lifted from the original VCF using BCFtools v1.923 for use in SGS analysis. Over the whole set of 515,000 markers, there was a median of 2.6% missing markers across each autosome. There was a notable paucity of markers on chromosome 12 (9.17% missing markers) concentrated at 12p13.33, which precluded the detection of shorter SGS segments at that locus.
SGS analysis
A detailed description of SGS methodology and statistical power has been published previously.15 In brief, SGS identifies genomic regions between distantly related cases in a large pedigree that are identical by state and longer than one would expect by chance. These segments are likely to be IBD and indicate regions harboring risk alleles transmitted by a common ancestor. Shared segments consist of consecutive SNPs at which a pair of subjects shares at least one allele, and a segment is broken when several nearby SNPs show opposing homozygous genotypes (one subject homozygous for the reference allele and the other for the alternate allele). For pedigrees with n ≥ 3 affected individuals, sharing is determined across all individuals as well as each subset of individuals for each position in the genome. Siblings within extended pedigrees are both included in SGS analysis but contribute relatively little to the overall statistical power to detect shared segments compared with more distantly related subjects. The maximum length of a contiguous group of SNPs for which at least one allele is shared between cases represents an optimal segment for the pedigree at a particular locus. A similar procedure was performed that compares optimal segments in pairs of pedigrees to identify shared locations of segments, previously described as “Duo” analysis.24 This technique recovers statistical power in pedigrees with fewer meioses separating subjects.
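The sharing rule can be illustrated with a minimal Python sketch; it is not the published SGS implementation. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), a run is broken at a single marker with opposing homozygotes (the published method tolerates isolated discordant markers), and all function names and the toy genotypes are hypothetical.

```python
from itertools import combinations


def shares_allele(genotypes):
    """True if all subjects could carry one common allele at this marker.

    Genotypes are alternate-allele counts (0, 1, 2). None marks a missing
    call and is treated as compatible with sharing (a sketch assumption).
    """
    called = [g for g in genotypes if g is not None]
    return not (0 in called and 2 in called)


def longest_shared_run(matrix, subjects):
    """Return (start, end, length) of the longest run of consecutive markers
    at which all listed subjects share an allele; end is exclusive."""
    best = (0, 0, 0)
    run_start = None
    n_markers = len(matrix[subjects[0]])
    for i in range(n_markers + 1):
        ok = i < n_markers and shares_allele([matrix[s][i] for s in subjects])
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best[2]:
                best = (run_start, i, i - run_start)
            run_start = None
    return best


def optimal_runs_by_subset(matrix, min_size=2):
    """Longest shared run for every subset of at least min_size subjects."""
    names = sorted(matrix)
    return {
        subset: longest_shared_run(matrix, subset)
        for k in range(min_size, len(names) + 1)
        for subset in combinations(names, k)
    }


# Hypothetical genotypes for three cases at eight consecutive markers.
toy = {
    "case_a": [1, 0, 1, 2, 1, 1, 0, 2],
    "case_b": [1, 1, 2, 2, 1, 0, 0, 1],
    "case_c": [0, 1, 1, 1, 2, 1, 2, 1],
}
runs = optimal_runs_by_subset(toy)
```

Here the pair `case_a` and `case_b` shares across all eight markers, whereas any subset including `case_c` is broken at the seventh marker, which shows why evaluating subsets can reveal longer segments than requiring sharing across every case.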
For each optimal genomic segment, nominal significance is determined empirically. Haplotypes of ancestry-matched individuals (n = 134 members of generation 1 of the Utah Center d'Etude du Polymorphisme Humain [CEPH] pedigrees25,26) are randomly assigned to pedigree founders and used to generate the null distribution of SGSs using the pedigree structure of the affected individuals. Recombination and Mendelian inheritance are simulated according to the Rutgers genetic map,27 and the resulting genotypes of the simulated affected individuals in the pedigree represent sharing that can occur by chance. The proportion of simulated segments that are at the same position and are the same length as or longer than those generated from affected individual data is used to calculate the nominal empirical p value. A median of 1 million simulations are performed per pedigree to generate a null distribution of genome-wide empirical p values. The Theory of Large Deviations is applied to model genome-wide fluctuations in linkage and establish pedigree-specific thresholds (T) for genome-wide significance (a false-positive rate of 0.05 per genome) and for suggestive significance.28 SGS results are also reported as μ(T), which represents the significance level in relation to threshold (T) and allows for ranking of genome-wide significant results across the entire cohort.
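The nominal empirical p value can be sketched in a few lines of Python. The sketch assumes that the simulated segment lengths at the locus have already been produced by the gene-drop simulation, which is not reproduced here; the function names and the toy lengths are hypothetical.

```python
def empirical_p_value(observed_length, simulated_lengths):
    """Proportion of simulated segments at a locus that are at least as long
    as the observed segment (plain proportion, as described in the text)."""
    if not simulated_lengths:
        raise ValueError("at least one simulation is required")
    hits = sum(1 for length in simulated_lengths if length >= observed_length)
    return hits / len(simulated_lengths)


def smallest_nonzero_p(n_simulations):
    """Resolution limit: the smallest non-zero p value n simulations can give."""
    return 1 / n_simulations


# Hypothetical segment lengths (in markers) from ten null simulations.
toy_null = [3, 5, 8, 2, 13, 4, 6, 1, 9, 7]
toy_p = empirical_p_value(9, toy_null)
```

Because the p value is a proportion of simulations, its resolution is bounded by the number of simulations performed, which is why very small nominal p values require on the order of a million or more simulated pedigrees.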
Haplotype estimation
SGS determines the most significant shared segment between all affected individuals in a pedigree, but the length of the shared segment varies between subsets of subjects. Between a pair of sharers, the estimated haplotype can extend beyond the core shared segment between all sharers. Patterns of sharing around the genomic segment are used to estimate the boundaries of the ancestral haplotype for each individual (Figure S1).
Whole-genome sequencing data were used to further investigate shared low-frequency or rare variants and used as points of interest for haplotype reconstruction in significant shared segments >1 Mb in length. Bcftools23 and PLINK 1.929 were used to isolate SGS markers and genomic segments and convert the resulting .vcf to .ped and .info files for import into Haploview.30 Segments exceeding 1 Mb were split into approximately 500-kb windows for ease of interpretation. Patterns of linkage disequilibrium and haplotype blocks were estimated for each region over all affected and unaffected individuals.
Genome features and variant curation
Statistically significant genomic segments were annotated with Ensembl BioMart31 for genes, GeneHancer32 for enhancer elements, and the UCSC Table Browser data retrieval tool33 for transcription factor binding sites and repetitive elements. For each pedigree with a significant shared genomic segment, all variants within the segment were filtered for a gnomAD21 allele frequency (AF) of <0.05 via Slivar34 and allele count, to narrow potential risk variants to those that are low frequency and shared between all cases in the pedigree. The resulting set of variants and genome features were converted to .bed format for filtering via Bedtools v2.25,35 and variants represented in CRAMs were reviewed in IGV.
HLA imputation
PLINK 1.929 was used to convert the post-QC VCF to bed format and extract the MHC region (Chr6: 28,000,000–34,000,000) from the whole-genome call set of 40 cases and 134 controls. Subsequent phasing and imputation were performed via HLA-TAPAS36 with the exception of the shared segment on chromosome 12. This region was phased with BEAGLE37 using ancestry-matched Thousand Genomes Project CEU data. Post-imputation QC retained only variants with imputation R2 values above 0.7 to eliminate poorly imputed variants.
HLA genotyping was performed using HLAScan,38 which uses BAM files as input to re-assemble the MHC region and report the estimated alleles for all major HLA groups (A, B, C, DPA1, DQA1, DRB1, etc.). This method employs uniquely mapped reads and alleles with uniform coverage to perform HLA typing and performs best on samples with 60× coverage and above. Read depth at the HLA was sufficient in all samples except for samples 1182 and 1183. Concordance between this method and HLA-TAPAS was approximately 92.8% across HLA-A, HLA-B, HLA-C, and HLA-DRB1, and indicated high-confidence calls to four-digit resolution. HLA-B typing accounted for the majority of discordant calls (84.3% concordance), whereas HLA-A typing was 95.8% concordant between the two methods, consistent with the reported accuracy of typing these alleles by HLA-TAPAS. Discordant calls were excluded from analysis. HLAScan38 provided two additional alleles across two individuals assayed that were not imputed by HLA-TAPAS.36 A Fisher’s exact test was performed to determine if differences in allele group frequencies between cases and controls were statistically significant, and p values were adjusted for multiple comparisons via the Benjamini-Hochberg procedure.
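The allele-frequency comparison can be sketched as follows, as a minimal standard-library Python illustration of a two-sided Fisher's exact test followed by Benjamini-Hochberg adjustment. The allele labels and counts are hypothetical placeholders rather than data from this cohort, and the software actually used for the published tests is not stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (allele copies, other copies) in cases and controls.
toy_tables = {
    "allele_1": (9, 11, 8, 32),
    "allele_2": (3, 17, 5, 35),
    "allele_3": (2, 18, 4, 36),
}
raw_p = {name: fisher_exact_two_sided(*t) for name, t in toy_tables.items()}
adj_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
```

Each table counts allele copies rather than individuals, in line with frequencies reported as a proportion of allele counts.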
Short tandem repeat calling
HipSTR (version 0.6.2)39 was used for short tandem repeat (STR) analysis. For a reference STR dataset, we used the GRCh37 STR reference file from HipSTR for all analyses. HipSTR was run on each of the significant SGS segments, and each family was analyzed individually. We used filtering methods included with HipSTR to filter for loci that had a minimum quality score >0.9, maximum flanking indels <0.15, and a maximum call stutter <0.15. To determine if STRs in potentially impactful regions of the genome were more frequent in JIA patients, we compared these to the previously generated allele frequencies for STRs from the CEPH grandparents.40
Expression analysis
We obtained normalized counts in transcripts per million (TPM) from two publicly available datasets. The Database of Immune Cell Expression (DICE)41 contains normalized counts for 15 immune cell types and/or states, and GTEx42 provides normalized counts for cultured fibroblasts. Investigation of genes expressed by immune cells was limited to genes within the boundaries of significant SGS regions and filtered for immune cell expression >0 TPM.
Ethics approval and consent to participate
All work was performed under approved University of Utah IRB protocol 9524. Ethics committees at the University of Utah approved this research. All participants provided written informed consent.
Results
All 40 samples, representing 12 pedigrees, passed variant and sample quality control. Ten pedigrees had ≥ 15 meioses, the suggested minimum number for statistical power for SGS analysis (Table S2). SGS was run on these 10 pedigrees, resulting in three genome-wide significant regions and 64 suggestive regions (Figure S2). The genome-wide significant results include a 1.8-Mb segment at 6p22.1 shared by all three subjects in pedigree K6, a 0.7-Mb segment at 6p21.33-p21.32 shared by all five subjects in pedigree K5, and a 2.1-Mb segment at 12q23.2-q23.3 shared by three of the four subjects in pedigree K3 (Table 1). Interestingly, the three subjects representing the risk locus at 12q23.2-q23.3 are concordant for oligoarticular JIA, whereas the non-sharing subject was diagnosed with rheumatoid factor (RF)-positive polyarticular JIA. The three subjects within pedigree K6 are all concordant for oligoarticular JIA, and the five subjects within pedigree K5 exhibit greater phenotypic variability with oligoarticular, polyarticular, and enthesitis-related JIA all present (Figure 1).
Table 1.
Genome-wide significant segments
| Pedigree ID | Locus | Mb | Region | p value | μ(T) |
|---|---|---|---|---|---|
| K6 | 6p22.1 | 1.8 | Chr6:28213918-30025285 | 2.00 × 10−6 | 5.59 × 10−3 |
| K5 | 6p21.33-p21.32 | 0.68 | Chr6:31481146-32158319 | 5.00 × 10−7 | 7.65 × 10−3 |
| K3 | 12q23.2-q23.3 | 2.1 | Chr12:101969277-104092538 | 2.16 × 10−6 | 4.10 × 10−2 |
Genome-wide significant results from SGS analysis applied to 10 high-risk pedigrees, ordered by significance. μ(T) represents the significance level in relation to threshold (T), with the genome-wide significance threshold set to 0.05.28 This allows for inter-pedigree comparisons of SGS results given pedigree-specific thresholds.
Figure 1.
Shared genomic segments analysis results from extended high-risk juvenile idiopathic arthritis (JIA) pedigree K5
(A) Pruned pedigree containing five JIA cases connected by a common ancestor. Affected status of intervening individuals is unknown.
(B) Manhattan plot of SGS optimal segment p values across the genome. The genome-wide significance threshold (μ = 0.05) is 2.04 × 10−6. The suggestive threshold (μ = 1.0) is 2.20 × 10−5.
SGS does not directly utilize WGS and instead utilizes a pre-determined, unphased set of SNP markers lifted from WGS data. Following SGS, we returned to WGS and performed HLA imputation and statistical phasing to investigate regions of interest and visualize a potential risk haplotype. In-phase shared haplotype blocks appear in each genome-wide significant segment, suggestive of DNA inherited from a common ancestor as predicted by SGS. For instance, among the five subjects sharing a 0.7-Mb segment at 6p21.33-p21.32, several in-phase predicted haplotype blocks are shared spanning Chr6:31572481-31618761. This region includes the MHC class III genes AIF1 and PRRC2A, which neighbor TNF, a gene encoding a pro-inflammatory cytokine (TNF-alpha) that has been a successful target in therapies for RA.43 In addition, a 61.7-kb haplotype encompassing HLA-A is shared across the three subjects representing the genome-wide significant segment at 6p22.1. When expanded to the unaffected individuals, which are representative of individuals of European ancestry, and the entire set of affected individuals, the core haplotype encompassing HLA-A appears at a frequency of 42.5% in subjects and 38.4% in control individuals. Given the haplotype frequencies observed, a rare pathogenic variant that alters the expression of genes or otherwise confers risk of JIA may have recently appeared on this haplotype in a common ancestor. In the absence of targeted MHC sequencing data, such a variant was not genotyped in the pedigree K6 sharers. This analysis allowed us to narrow a region identified by SGS, which tends to be much longer than the average human haplotype, and to estimate the boundaries of the core risk haplotype.
Both genome-wide significant segments on chromosome 6 fall within the MHC region. The segment located at 6p22.1 contains several HLA class I genes, including HLA-A. The segment at 6p21.33-p21.32 includes several MHC class III genes. In contrast, HLA Class II genes such as HLA-DRB1 did not exhibit genomic segment sharing. This region had sufficient markers to detect significant sharing between subjects, and SGS segments that either partially or entirely contained HLA class II genes did not exceed genome-wide significance thresholds for any extended pedigrees (Table S5).
Duo SGS analysis, a pairwise pedigree approach to SGS, recovers statistical power in smaller extended pedigrees and identifies IBD in the same genomic location as other pedigrees.24 This analysis produced 27 additional genome-wide significant associations between pedigrees at 6p22.1 and 6p21.33-p21.32, and one additional 1.4- to 2.2-Mb genome-wide significant segment at 5q14.1 among five subjects across two pedigrees (two subjects and three subjects, respectively). The boundaries of the segments from the duo results implicating the MHC are similar to the boundary coordinates of the genome-wide significant single-pedigree results at the same loci. Pedigrees K5 and K6 are included in a majority of the genome-wide significant pairwise results, but do not drive the associations exclusively. The consistent replication of the results within the MHC across several additional pedigrees suggests that MHC class I and III molecules are relevant to JIA risk (Table S3).
JIA is more prevalent in females vs. males44 and this bias is observed at a 3:1 ratio in our cohort. We sought to determine if candidate risk haplotypes were observed with equal prevalence in male and female subjects. However, with only 10 male JIA subjects, the capacity to perform robust statistical analysis was limited. We performed a jackknife estimate to determine if sex covaries with the risk haplotype in subjects and/or control individuals. Relatedness as a variable was not a concern; apart from one sibling pair, the subjects are effectively unrelated—a key component to SGS analysis. We observe that the candidate HLA-A haplotype is carried more frequently in female subjects than male subjects (0.73 vs. 0.5). The jackknife estimate of bias for females and males was effectively zero (females:1.11 × 10−16, males: 0), and the variance was approximately 0.03 in both categories. The HLA-A risk haplotype is slightly more prevalent in female control individuals vs. male control individuals (0.67 vs. 0.59). The jackknife estimate of bias was approximately zero (females: 5.55 × 10−15, males: 2.22 × 10−16), with a variance of 0.003 between both female and male control individuals. The same test was also used to investigate the MHC class III candidate risk haplotype. In affected individuals, the risk haplotype is more prevalent in females (0.67 vs. 0.5). Control individuals also show a higher frequency of the candidate MHC class III haplotype in females vs. males (0.54 vs. 0.33). Across all categories, the jackknife estimate of bias was essentially zero (female cases: 1.11 × 10−16, male cases: 0, female controls: −7.77 × 10−16, male controls: 5.55 × 10−17). Variance was approximately 0.02 in female and male subjects, and 0.02 in female and male control individuals. We then opted to perform a Fisher’s exact test to determine the statistical significance of the risk haplotype frequencies between male and female subjects. 
We were unable to detect a statistically significant difference in the proportion of male and female subjects carrying the candidate HLA-A haplotype (p = 0.246, odds ratio [OR]: 2.67; 95% confidence interval [CI]: 0.48–15.39). Similarly, we were underpowered to detect a difference in the MHC III candidate risk haplotype frequencies (p = 0.4568, OR: 1.96; 95% CI: 0.36–10.89).
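The jackknife estimates reported above can be illustrated with a minimal Python sketch for a carrier proportion. The carrier flags are hypothetical and the function name is illustrative; this is not the analysis code used in the study.

```python
def jackknife_proportion(carriers):
    """Leave-one-out jackknife for a proportion.

    carriers is a list of 0/1 flags (1 = carries the haplotype).
    Returns (estimate, jackknife bias, jackknife variance).
    """
    n = len(carriers)
    total = sum(carriers)
    estimate = total / n
    # Proportion recomputed with each individual left out in turn.
    leave_one_out = [(total - x) / (n - 1) for x in carriers]
    loo_mean = sum(leave_one_out) / n
    bias = (n - 1) * (loo_mean - estimate)
    variance = (n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)
    return estimate, bias, variance


# Hypothetical group of ten individuals, five of whom carry the haplotype.
toy_flags = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_estimate, toy_bias, toy_variance = jackknife_proportion(toy_flags)
```

For a sample proportion the jackknife bias is zero analytically, so bias values on the order of 10−16 are what floating-point rounding alone would produce, and the jackknife variance reduces to p(1 − p)/(n − 1), which shrinks as the group grows.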
Given the importance of shared MHC segments in these pedigrees, HLA imputation and four-digit typing was performed across all cases and controls included in the SGS analysis. HLAscan produced genotyping calls for classic HLA alleles across the cohort and was used to investigate the relative frequencies of HLA alleles. Follow-up work to determine phased HLA alleles was performed via SNP2HLA, a module within HLA-TAPAS, to estimate shared HLA haplotypes between cases with statistically significant genomic segments. There was a high degree of similarity of HLA allele genotyping between the two methods at four-digit resolution. HLA alleles were of particular interest in the single-pedigree result at 6p22.1, shared among three subjects. These individuals share the HLA-A∗02:01 allele, and two of the three subjects appear to share a longer HLA haplotype including A∗02:01-C∗05:01-B∗44:02 and A∗24:02-C∗03:03-B∗15:01. Over the entire cohort, there is an increased frequency of HLA-A∗02:01 in the subjects vs. control individuals, with HLA-A∗02:01 present at an AF of 0.41 in affecteds and 0.35 in control individuals (p value = 0.419). HLA-A∗02:01 is present in nearly all subjects implicated by SGS at 6p22 but is also present in subjects not directly implicated by SGS (Table S9). There is also an increase in the frequency of HLA-DRB1∗08:01, DRB1∗11:01, and DRB1∗11:03 alleles among subjects, which are not directly implicated by SGS but are well-established risk alleles for JIA (Figure 2). Due to the heterogeneity of HLA alleles and modest sample size, this study is underpowered to establish statistical significance for the differences in frequencies observed in HLA allele groups.
Figure 2.
HLA-A and HLA-DRB1 allele group frequencies between JIA cases and ancestry-matched controls
Frequencies are reported as a proportion of allele counts.
Within the genome-wide significant shared segments, rare single nucleotide variants (AF < 0.001) in coding regions or regulatory elements shared between affected individuals in a pedigree were not detected. However, a limited number of low-frequency (gnomAD AF < 0.05) single nucleotide variants (SNVs) and STRs were shared. Three subjects in pedigree K3 share low-frequency SNVs that fall within GeneHancer32 predicted enhancer elements and transcription factor binding sites (TFBSs) within the 2.1-Mb segment at 12q23.2-q23.3. A T>C variant at position Chr12:102137552 (rs73163751, gnomAD AF = 0.023) falls within an annotated enhancer (GH12J101742) predicted to interact with 14 transcription factors including several with immunological relevance: NF-κB, RUNX3, and PAX5 (Table S4). NF-κB, for example, is a transcription factor involved in the regulation of several critical biological processes including proliferation, development, and inflammation.45 This SNV lies on a haplotype shared by three of the four subjects within the significant segment at 12q23.2-q23.3, consistent with the SGS result. This SNV is present in only one additional subject in the cohort, who is not implicated by SGS (Table S9). In addition, several STRs were investigated. At 6p21.33, a 6-base pair (bp) dinucleotide expansion within the first intron of PRRC2A is shared by all five subjects within pedigree K5 and present in several additional subjects in the cohort (Table S9). This locus intersects an enhancer element (GH06J031619) that binds 21 transcription factors, including NF-κB. Several of the same transcription factors are also predicted to interact with an enhancer (GH06J028349) that contains a 2-bp deletion of a dinucleotide repeat within an intron of ZSCAN31 shared among the three subjects of pedigree K6. This locus appears to harbor significant regulatory activity, with 53 transcription factor annotations including JUND, IRF1, and NF-κB.
The genomic segments investigated in this study each contain several genes, and in the absence of a clear pathogenic variant and gene candidate, variation of gene expression between tissues generates hypotheses concerning the disease risk contained within a shared genomic segment. CD4+ T cells are the most frequently implicated cell type in JIA pathophysiology, but the roles of additional immune cell types such as macrophages have also been investigated.46 To capture a wide array of immune cell types, we utilized the Database of Immune Cell Expression (DICE) to assess expression data for 15 immune cell types and states in genes that lie within genome-wide significant regions. The 2-Mb segment at 12q23.2-q23.3 shared by three distantly related subjects contains the gene DRAM1, which encodes an autophagy induction protein that is primarily expressed by monocytes. Other genes that appear to be primarily expressed by a specific cell type include CHPT1, primarily expressed by naive B cells, and GNPTAB, primarily expressed by NK cells (Figure 3). These genes within statistically significant SGS segments are expressed by immune cell types that may contribute to the activation of T cells and are much less commonly investigated in the context of JIA pathophysiology.
Figure 3.
Immune cell expression of genes located at 12q23.2–23.3 using data from the Database of Immune Cell Expression (DICE) and GTEx (fibroblasts).
Discussion
Due to the relative rarity of JIA, very few studies have leveraged cohorts of JIA families to identify genetic risk factors. A large cohort of high-risk JIA families was utilized in a linkage analysis including 121 affected sibling pairs but was limited to genotype data for HLA-DR alleles and a set of 386 microsatellite markers.47 In contrast, this work includes 511K SNP markers, significantly improving upon the resolution of previous methods. The extended, high-risk pedigrees utilized in this work enabled the detection of long tracts of DNA shared between distantly related JIA cases that are likely IBD and anchored by a segregating risk locus. Our study implicates three genomic regions, two within the MHC region and one at 12q23.2-q23.3. Exploration of genes and variants within these regions suggests genetic risk within immunologically relevant TFBSs, MHC class III molecules, and HLA-A. Thompson et al. confirmed risk within the HLA (LOD 2.26) and several other loci with LOD scores <3.0, indicating significant heterogeneity in their cohort and the phenotype.47 Stratifying subjects on JIA subtype or on the HLA allele HLA-DRB1∗08 produced LOD scores >3, suggesting genetically distinct subtypes of disease.47 An MHC fine-mapping study found that HLA susceptibility loci differed between RF-positive and -negative subjects,10 providing further rationale for separating RF− subjects (oligoarticular and polyarticular) from RF+ polyarticular JIA subjects. SGS analysis is well suited for the phenotypic heterogeneity of JIA found in our cohort and highlights genetic risk that is unique and shared across subtypes.
The MHC region has been estimated to contribute up to 13% of the genetic risk of JIA.4 HLA-DRB1, an MHC class II molecule, is the most consistently implicated risk factor for JIA, indicating that the adaptive immune system is a major component of disease pathogenesis. Interestingly, HLA-DRB1 also influences adult RA risk but does not share all the same risk allele groups as JIA. JIA HLA-DRB1 risk is heterogeneous and includes HLA-DRB1∗08, DRB1∗11, and DRB1∗13. Seropositive polyarticular JIA risk is positively associated with HLA-DRB1∗04, the predominant DRB1 risk allele for adult RA. Conversely, HLA associations for oligoarticular arthritis and RF− polyarticular arthritis are most correlated with seronegative adult RA.10 Adult RA risk is also associated with DRB1∗01 and DRB1∗14, which are not associated with JIA. We observe notably increased frequencies of DRB1∗08 (case AF = 0.11, control AF = 0.03, p = 0.081) and DRB1∗11 (case AF = 0.11, control AF = 0.04, p = 0.212) in case individuals vs. control individuals, although the difference fails to reach statistical significance after multiple testing correction (Figure 2). The well-established JIA risk haplotype4,6 DRB1∗08:01-DQA1∗04:01-DQB1∗04:02 is also found at higher frequency in affected individuals vs. control individuals but does not lie in regions likely inherited from a common ancestor. SGS analysis detects regions that are likely IBD from a distant common ancestor, and the heterogeneity of DRB1 risk alleles between affected individuals suggests that these alleles are on distinct haplotypes not inherited from a common ancestor. Therefore, SGS would not recognize DRB1 alleles as a shared genetic risk factor despite their relevance to JIA pathogenesis.
MHC class I molecules are expressed by almost all somatic tissues and are critical for triggering adaptive immune responses. Our results include two non-overlapping genomic segments found within the MHC region containing MHC class I and III genes. A classic MHC class I gene, HLA-A, is contained within the most significant shared region identified by SGS. This 1.8-Mb region contains several other HLA and MHC class I genes; however, HLA-A is by far the most highly expressed gene in this region across all immune cell types and fibroblasts (Figure S3) and has been previously suggested to play a role in JIA risk. However, the biological mechanism for the role HLA-A plays in JIA pathogenesis is not clear. We speculate that autoantigens bound to HLA-A are recognized by autoreactive cytotoxic T cells and lead to disease, as hypothesized in type 1 diabetes and vitiligo.48,49 The three individuals who share the genome-wide significant segment on 6p22 all share HLA-A∗02:01, which appears in the cohort at a higher frequency than in controls (Figure 2). This is notable, as pairwise comparisons of pedigrees, which recover statistical power among smaller pedigree structures, replicate the single-pedigree MHC class I signal.
Previous studies have found associations between HLA-A∗02 and oligoarticular JIA, both in transmission disequilibrium tests8,50 and in HLA association tests.6,9 Risk in the HLA is still predominantly conferred by HLA-DRB1 alleles,4 but HLA-A also contributes to the overall risk contributed by the HLA.8,10 SNPs near HLA-A were found to have ORs of up to 1.54 (95% CI: 1.39–1.7, p value: 2.22 × 10−16), indicating that genetic risk harbored around this locus moderately increases risk of developing JIA. This is a notable association despite SNPs near HLA-DQB1 having ORs of up to 6 (95% CI: 5.3–6.81, p value: 3.14 × 10−174).4 A fine-mapping study of the MHC in JIA subjects found a statistically significant association for a valine at HLA-A amino acid 95,10 but our subjects are all homozygous for the reference allele encoding a serine at this locus. Interestingly, a study of serological and genotyped adult RA patients found that frequencies of HLA-A∗02 were higher for those in the younger-onset category.51 However, no association was detected with a larger cohort of seropositive adult RA patients,52 further suggesting that the effect of HLA-A∗02 may be relevant and unique to a particular subtype of disease. A combinatorial effect of MHC class I and II alleles may be pertinent for understanding the genetic mechanisms that distinguish the varying severity seen between JIA subtypes.
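To make the relationship between the quoted odds ratios, confidence intervals, and p values concrete, the following sketch backs out the standard error of the log odds ratio from each interval and recomputes a z statistic and two-sided p value. It assumes the intervals are symmetric Wald intervals on the log-odds scale, which the cited study may not have used; the function name `wald_summary` is illustrative only.

```python
from math import erfc, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(odds_ratio, ci_lower, ci_upper):
    """Back out SE(log OR), the z statistic, and a two-sided p value.

    Assumes the interval is a symmetric Wald interval on the log-odds scale.
    """
    se = (log(ci_upper) - log(ci_lower)) / (2 * Z_95)
    z = log(odds_ratio) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return se, z, p


# Odds ratios and intervals as quoted in the text above.
for label, stats in {
    "SNPs near HLA-A": (1.54, 1.39, 1.7),
    "SNPs near HLA-DQB1": (6.0, 5.3, 6.81),
}.items():
    se, z, p = wald_summary(*stats)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.1f}, p ~ {p:.1e}")
```

Because the published values are rounded, the recomputed p values can only roughly track the reported ones. The sketch mainly shows that both signals are estimated with similar precision and differ chiefly in effect size.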
Genomic features and gene expression patterns associated with significant SGS guide our understanding of potential pathogenic mechanisms at those loci. For instance, SGS implicates a region of IBD containing MHC class III molecules in a single pedigree as well as between pedigrees, suggesting that processes of the immune system regulated at this locus are relevant to JIA risk. This region contains genes associated with the “TNF locus” (e.g., AIF1, PRRC2A/BAT2, TNF, LTB, NFKBIL1) that have been previously associated with several autoimmune disorders including adult RA53,54,55,56,57 but have not yet been associated with JIA. The density of genes at this locus complicates the identification of a specific candidate gene or risk variant. However, a 6-bp expansion of an STR in the first intron of PRRC2A is found in all sharers in pedigree K5 and at a frequency of 42.31% among all 40 subjects in the cohort. This expansion was also observed at a frequency of approximately 39.62% in 130 control individuals, suggesting that it may not explain disease risk within this region. In addition, imputation and statistical phasing around PRRC2A indicate that the haplotype shared by affected individuals in pedigree K5 appears at a frequency of 35% in affected individuals and 39.9% in control individuals. The high mutability of STRs could contribute to disease risk on several haplotypes.40 It is worth noting that microsatellite repeats in PRRC2A are suspected to influence the age of onset of type 1 diabetes,58 another autoimmune disease often diagnosed in pediatric patients. Given that STRs have been shown to differentially bind transcription factors, even in the absence of known motifs, dysregulation of gene expression at this locus could contribute to autoimmune disease pathogenesis.59 Future work may find genetic differences between the risk conferred by MHC class III genes in RA vs. JIA, perhaps stemming from non-coding variation resulting in differential regulation of immune response genes.
Regulatory elements are also important in understanding the development of JIA, as several genetic studies of autoimmune disease identify risk variants that act through transcriptional regulation impacting immune cell regulation and development.60 For example, a gene-mapping study of vitiligo, an autoimmune disease that results in the loss of melanocytes, found strong linkage disequilibrium between HLA-A∗02:01 and an SNP (rs60131261) 20 kb downstream. This risk haplotype includes an ENCODE transcriptional enhancer that resulted in increased HLA-A expression on the surface of peripheral blood mononuclear cells.49 Low-frequency variants and STR variation within enhancer elements in statistically significant segments share interactions with NF-κB, which was identified as a key transcription factor contributing to JIA risk in a large GWAS and fine-mapping study.61 SNPs within regulatory elements can result in allele-specific transcription factor binding, and in fact, autoimmune disease-associated SNPs within enhancers that alter transcription factor binding and gene expression in T cells have been identified.62,63 These findings strongly suggest that these haplotypes harbor pathogenic genetic variation that could result in sensitization of inflammatory pathways, such as NF-κB signaling, and they identify candidates for further computational and functional follow-up.
When investigating functionally relevant variation within our SGS, we hypothesize that low-frequency or rare genetic risk anchors a risk haplotype segregating in a large pedigree. We speculate that these risk variants could be in LD with GWAS SNPs identified in larger, unrelated cohorts. Three total SNPs from JIA and RA GWASs fell within the boundaries of a genome-wide significant or suggestively significant genomic segment (Table 2). Most notable is rs76870128, a JIA GWAS SNP that lies in a distal enhancer upstream of CEP70.61 The genomic region containing this SNP was also implicated in a large linkage analysis with a suggestive LOD score of 2.16.47 This SNV was not present in our subjects; however, RA GWAS SNVs rs2105325 and rs11605042 were observed in subjects at allele frequencies of 70% and 53%, respectively (Table S10). Allele frequencies of 75% and 48% are reported in ancestry-matched Europeans in gnomAD (v4.0.0). Using an approach that takes advantage of the extended pedigree structures within the sample cohort, our study replicates previous JIA risk associations within the MHC region and implicates novel risk genes with a considerably smaller sample size relative to GWASs and other association studies.
Table 2.
Genome-wide association study SNPs of JIA and RA intersect with SGS results
| Chromosome | SGS segment start | SGS segment end | SGS p value | GWAS SNP | GWAS p value | GWAS odds ratio |
|---|---|---|---|---|---|---|
| 3 | 136879530 | 139738005 | 0.00135 | JIA (rs76870128) | 2.66 × 10−6 | 1.61 |
| 1 | 173272892 | 176945623 | 0.00123 | RA (rs2105325) | 2.476 × 10−11 | 1.09 |
| 11 | 61812942 | 79026374 | 7.19 × 10−6 | RA (rs11605042) | 5.361 × 10−9 | 0.94 |
| 11 | 70728944 | 74187502 | 0.00142 | RA (rs11605042) | 5.361 × 10−9 | 0.94 |
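The intersection summarized in Table 2 amounts to asking which GWAS SNP positions fall inside which SGS segments. A minimal pure-Python sketch of that check follows, using the segment coordinates from Table 2. The SNP names and positions are made-up placeholders (real rsID coordinates are not given in the text), and this is a stand-in for illustration rather than the authors' pipeline, which may have used a tool such as Bedtools (listed under Web resources).

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int


class Snp(NamedTuple):
    name: str
    chrom: str
    pos: int


def snps_in_segments(segments, snps):
    """Return (snp, segment) pairs where the SNP lies inside the segment (inclusive)."""
    hits = []
    for snp in snps:
        for seg in segments:
            if snp.chrom == seg.chrom and seg.start <= snp.pos <= seg.end:
                hits.append((snp, seg))
    return hits


# Segment boundaries copied from Table 2.
segments = [
    Segment("3", 136879530, 139738005),
    Segment("1", 173272892, 176945623),
    Segment("11", 61812942, 79026374),
    Segment("11", 70728944, 74187502),
]

# Placeholder SNPs with hypothetical positions, NOT real rsID coordinates.
placeholder_snps = [
    Snp("placeholder_chr11", "11", 72000000),
    Snp("placeholder_chr1", "1", 174000000),
    Snp("placeholder_outside", "3", 100000000),
]

hits = snps_in_segments(segments, placeholder_snps)
for snp, seg in hits:
    print(f"{snp.name} falls in chr{seg.chrom}:{seg.start}-{seg.end}")
```

A SNP can be reported more than once when segments overlap, which is why rs11605042 appears in two rows of Table 2: the two chromosome 11 segments are nested.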
This study identifies shared haplotypes transmitted by a common ancestor to distantly related subjects using SGS analysis and hypothesizes that a segregating disease locus anchors that haplotype. Our SGS analysis included 10 large high-risk JIA pedigrees, leading to the identification of three genome-wide significant regions on chromosomes 6 and 12. The replication of these findings in pairwise pedigree comparisons underscores the risk associated with MHC class I and III molecules. Historically, GWASs have predominantly assigned statistical signals of genetic risk at the MHC region to MHC class II molecules such as HLA-DRB1. However, this study suggests a more complex role of HLA in JIA risk. The genomic regions identified here narrow the genomic search space for pathogenic variation relevant to JIA pathogenesis. Further investigation is warranted to determine precise mechanisms of action and genetic interactions between HLA alleles in each JIA subtype, and to estimate the frequencies of these haplotypes in a diverse array of ancestral populations.
Data and code availability
The Shared Genome Segment (SGS) analysis software is freely available and can be accessed online: https://uofuhealth.utah.edu/huntsman/labs/camp/analysis-tool/shared-genomic-segment.php. Detailed pedigree structures necessary for these analyses were acquired from the UPDB. These are considered potentially identifiable by the Resource for Genetic and Epidemiologic Research (RGE), the ethical oversight committee for the UPDB. As a result, access to these data requires review by the RGE committee (contact Jahn Barlow, jahn.barlow@utah.edu). Upon RGE approval, pedigree structures in a format ready to be used by the SGS software will be provided. Whole-genome sequencing data generated during this study are accessible on Sequence Read Archive (SRA). The accession number for the sequence data reported in this paper is PRJNA1054595 (“Whole-genome sequencing of juvenile idiopathic arthritis cases”).
Web resources
BCFtools, http://www.htslib.org/download/
Beagle, http://faculty.washington.edu/browning/beagle/
Database of Immune Cell Expression (DICE), https://dice-database.org/
Bedtools, https://bedtools.readthedocs.io/en/latest/
Biomart, https://www.ensembl.org/biomart/martview/
gnomAD browser, https://gnomad.broadinstitute.org/
GTEx Portal, https://www.gtexportal.org/home/
Haploview, https://www.broadinstitute.org/haploview/downloads
HipSTR, https://hipstr-tool.github.io/HipSTR/
HLAscan, https://github.com/SyntekabioTools/HLAscan
HLA-TAPAS, https://github.com/immunogenomics/HLA-TAPAS/tree/master/SNP2HLA
Peddy, https://github.com/brentp/peddy
PLINK 1.9, https://www.cog-genomics.org/plink/
Sentieon, https://www.sentieon.com/
Shared Genomic Segments Analysis, https://gitlab.com/camplab/sgs
Slivar, https://github.com/brentp/slivar
UCSC Table Browser, https://genome.ucsc.edu/cgi-bin/hgTables
Variant Effect Predictor (VEP), https://ensembl.org/info/docs/tools/vep/index.html
Acknowledgments
We thank the Pedigree and Population Resource of Huntsman Cancer Institute, University of Utah (funded in part by the Huntsman Cancer Foundation) for its role in the ongoing collection, maintenance, and support of the Utah Population Database (UPDB). We also acknowledge partial support for the UPDB through grant P30 CA2014 from the National Cancer Institute, University of Utah, and from the University of Utah’s program in Personalized Health and Utah Clinical and Translational Science Institute. We thank the staff at the UPDB for their support in the identification of the JIA pedigrees. We greatly appreciate Rob Sargent and Myke Madsen for technical and programming support performing the SGS analyses, and Nicola Camp for guidance on feasibility and SGS concepts. The authors gratefully acknowledge the Utah Genome Project and the Chan Soon-Shiong Family Foundation for providing the funds for sequencing the study samples.
The research reported in this publication was supported in part by NIH R35GM118335 (to L.B.J.) and the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number TL1TR002540. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Declaration of interests
The authors declare no competing interests.
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2024.100277.
Contributor Information
Cecile N. Avery, Email: cecile.avery@utah.edu.
Lynn B. Jorde, Email: lbj@genetics.utah.edu.
References
- 1. Harrold L.R., Salman C., Shoor S., Curtis J.R., Asgari M.M., Gelfand J.M., Wu J.J., Herrinton L.J. Incidence and Prevalence of Juvenile Idiopathic Arthritis Among Children in a Managed Care Population, 1996–2009. J. Rheumatol. 2013;40:1218–1225. doi: 10.3899/JRHEUM.120661.
- 2. Zaripova L.N., Midgley A., Christmas S.E., Beresford M.W., Baildam E.M., Oldershaw R.A. Juvenile idiopathic arthritis: from aetiopathogenesis to therapeutic approaches. Pediatr. Rheumatol. Online J. 2021;19. doi: 10.1186/S12969-021-00629-8.
- 3. Jia J., Li J., Yao X., Zhang Y.H., Yang X., Wang P., Xia Q., Hakonarson H., Li J. Genetic architecture study of rheumatoid arthritis and juvenile idiopathic arthritis. PeerJ. 2020;2020. doi: 10.7717/PEERJ.8234.
- 4. Hinks A., Cobb J., Marion M.C., Prahalad S., Sudman M., Bowes J., Martin P., Comeau M.E., Sajuthi S., Andrews R., et al. Dense genotyping of immune-related disease regions identifies 14 new susceptibility loci for juvenile idiopathic arthritis. Nat. Genet. 2013;45:664–669. doi: 10.1038/NG.2614.
- 5. Ombrello M.J., Remmers E.F., Tachmazidou I., Grom A., Foell D., Haas J.P., Martini A., Gattorno M., Özen S., Prahalad S., et al. HLA-DRB1∗11 and variants of the MHC class II locus are strong risk factors for systemic juvenile idiopathic arthritis. Proc. Natl. Acad. Sci. USA. 2015;112:15970–15975. doi: 10.1073/PNAS.1520779112.
- 6. Hollenbach J.A., Thompson S.D., Bugawan T.L., Ryan M., Sudman M., Marion M., Langefeld C.D., Thomson G., Erlich H.A., Glass D.N. Juvenile idiopathic arthritis and HLA class I and class II interaction and age of onset effects. Arthritis Rheum. 2010;62:1781–1791. doi: 10.1002/ART.27424.
- 7. Angeles-Han S.T., McCracken C., Yeh S., Jang S.R., Jenkins K., Cope S., Bohnsack J., Hersh A., Thompson S.D., Prahalad S. HLA Associations in a Cohort of Children With Juvenile Idiopathic Arthritis With and Without Uveitis. Invest. Ophthalmol. Vis. Sci. 2015;56:6043–6048. doi: 10.1167/IOVS.15-17168.
- 8. Zeggini E., Donn R.P., Ollier W.E.R., Thomson W., British Paediatric Rheumatology Study Group. Becker M., Bell A., Craft A., Crawley E., David J., et al. Evidence for linkage of HLA loci in juvenile idiopathic oligoarthritis: independent effects of HLA-A and HLA-DRB1. Arthritis Rheum. 2002;46:2716–2720. doi: 10.1002/ART.10551.
- 9. Yanagimachi M., Miyamae T., Naruto T., Hara T., Kikuchi M., Hara R., Imagawa T., Mori M., Kaneko T., Goto H., et al. Association of HLA-A∗02:06 and HLA-DRB1∗04:05 with clinical subtypes of juvenile idiopathic arthritis. J. Hum. Genet. 2011;56:196–199. doi: 10.1038/jhg.2010.159.
- 10. Hinks A., Bowes J., Cobb J., Ainsworth H.C., Marion M.C., Comeau M.E., Sudman M., Han B., Juvenile Arthritis Consortium for Immunochip. Becker M.L., et al. Fine-mapping the MHC locus in juvenile idiopathic arthritis (JIA) reveals genetic heterogeneity corresponding to distinct adult inflammatory arthritic diseases. Ann. Rheum. Dis. 2017;76:765–772. doi: 10.1136/ANNRHEUMDIS-2016-210025.
- 11. Hersh A.O., Prahalad S. Immunogenetics of juvenile idiopathic arthritis: A comprehensive review. J. Autoimmun. 2015;64:113–124. doi: 10.1016/j.jaut.2015.08.002.
- 12. Wijsman E.M. The role of large pedigrees in an era of high-throughput sequencing. Hum. Genet. 2012;131:1555–1563. doi: 10.1007/s00439-012-1190-2.
- 13. Hanson H.A., Leiser C.L., Madsen M.J., Gardner J., Knight S., Cessna M., Sweeney C., Doherty J.A., Smith K.R., Bernard P.S., et al. Family Study Designs Informed by Tumor Heterogeneity and Multi-Cancer Pleiotropies: The Power of the Utah Population Database. Cancer Epidemiol. Biomarkers Prev. 2020;29:807–815. doi: 10.1158/1055-9965.EPI-19-0912.
- 14. Smith K.R., Fraser A., Reed D.L., Barlow J., Hanson H.A., West J., Knight S., Forsythe N., Mineau G.P. The Utah Population Database. A Model for Linking Medical and Genealogical Records for Population Health Research. Hist. Life Course Stud. 2022;12:58–77. doi: 10.51964/HLCS11681.
- 15. Waller R.G., Darlington T.M., Wei X., Madsen M.J., Thomas A., Curtin K., Coon H., Rajamanickam V., Musinsky J., Jayabalan D., et al. Novel pedigree analysis implicates DNA repair and chromatin remodeling in multiple myeloma risk. PLoS Genet. 2018;14. doi: 10.1371/JOURNAL.PGEN.1007111.
- 16. Knight S., Abo R.P., Abel H.J., Neklason D.W., Tuohy T.M., Burt R.W., Thomas A., Camp N.J. Shared Genomic Segment Analysis: The Power to Find Rare Disease Variants. Ann. Hum. Genet. 2012;76:500–509. doi: 10.1111/J.1469-1809.2012.00728.X.
- 17. Kerber R.A. Method for calculating risk associated with family history of a disease. Genet. Epidemiol. 1995;12:291–301. doi: 10.1002/GEPI.1370120306.
- 18. Sudmant P.H., Rausch T., Gardner E.J., Handsaker R.E., Abyzov A., Huddleston J., Zhang Y., Ye K., Jun G., Fritz M.H.Y., et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75–81. doi: 10.1038/nature15394.
- 19. Freed D., Aldana R., Weber J.A., Edwards J.S. The Sentieon Genomics Tools - A fast and accurate solution to variant calling from next-generation sequence data. Preprint at bioRxiv. 2017;7. doi: 10.1101/115717.
- 20. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
- 21. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 22. Pedersen B.S., Quinlan A.R. Who’s Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy. Am. J. Hum. Genet. 2017;100:406–413. doi: 10.1016/j.ajhg.2017.01.017.
- 23. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M., Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10. doi: 10.1093/GIGASCIENCE/GIAB008.
- 24. Waller R.G., Madsen M.J., Gardner J., Sborov D.W., Camp N.J. Duo Shared Genomic Segment analysis identifies a genome-wide significant risk locus at 18q21.33 in myeloma pedigrees. J Transl Genet Genom. 2021;5. doi: 10.20517/JTGG.2021.09.
- 25. Dausset J., Cann H., Cohen D., Lathrop M., Lalouel J.M., White R. Centre d’Etude du polymorphisme humain (CEPH): Collaborative genetic mapping of the human genome. Genomics. 1990;6:575–577. doi: 10.1016/0888-7543(90)90491-C.
- 26. Prescott S.M., Lalouel J.M., Leppert M. From linkage maps to quantitative trait loci: the history and science of the Utah genetic reference project. Annu. Rev. Genom. Hum. Genet. 2008;9:347–358. doi: 10.1146/ANNUREV.GENOM.9.081307.164441.
- 27. Matise T.C., Chen F., Chen W., De La Vega F.M., Hansen M., He C., Hyland F.C.L., Kennedy G.C., Kong X., Murray S.S., et al. A second-generation combined linkage physical map of the human genome. Genome Res. 2007;17:1783–1786. doi: 10.1101/GR.7156307.
- 28. Lander E., Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 1995;11:241–247. doi: 10.1038/NG1195-241.
- 29. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., De Bakker P.I.W., Daly M.J., Sham P.C. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 30. Barrett J.C., Fry B., Maller J., Daly M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265. doi: 10.1093/BIOINFORMATICS/BTH457.
- 31. Cunningham F., Allen J.E., Allen J., Alvarez-Jarreta J., Amode M.R., Armean I.M., Austine-Orimoloye O., Azov A.G., Barnes I., Bennett R., et al. Ensembl 2022. Nucleic Acids Res. 2022;50:D988–D995. doi: 10.1093/NAR/GKAB1049.
- 32. Fishilevich S., Nudel R., Rappaport N., Hadar R., Plaschkes I., Iny Stein T., Rosen N., Kohn A., Twik M., Safran M., et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database. 2017;2017. doi: 10.1093/DATABASE/BAX028.
- 33. Karolchik D., Hinrichs A.S., Furey T.S., Roskin K.M., Sugnet C.W., Haussler D., Kent W.J. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32. doi: 10.1093/NAR/GKH103.
- 34. Pedersen B.S., Brown J.M., Dashnow H., Wallace A.D., Velinder M., Tristani-Firouzi M., Schiffman J.D., Tvrdik T., Mao R., Best D.H., et al. Effective variant filtering and expected candidate variant yield in studies of rare human disease. NPJ Genom. Med. 2021;6:60–68. doi: 10.1038/s41525-021-00227-3.
- 35. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 36. Luo Y., Kanai M., Choi W., Li X., Sakaue S., Yamamoto K., Ogawa K., Gutierrez-Arcelus M., Gregersen P.K., Stuart P.E., et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat. Genet. 2021;53:1504–1516. doi: 10.1038/s41588-021-00935-7.
- 37. Browning B.L., Tian X., Zhou Y., Browning S.R. Fast two-stage phasing of large-scale sequence data. Am. J. Hum. Genet. 2021;108:1880–1890. doi: 10.1016/J.AJHG.2021.08.005.
- 38. Ka S., Lee S., Hong J., Cho Y., Sung J., Kim H.N., Kim H.L., Jung J. HLAscan: Genotyping of the HLA region using next-generation sequencing data. BMC Bioinf. 2017;18:258–269. doi: 10.1186/S12859-017-1671-3.
- 39. Willems T., Zielinski D., Yuan J., Gordon A., Gymrek M., Erlich Y. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods. 2017;14:590–592. doi: 10.1038/NMETH.4267.
- 40. Steely C.J., Watkins W.S., Baird L., Jorde L.B. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol. 2022;23:253–272. doi: 10.1186/S13059-022-02818-4.
- 41. Schmiedel B.J., Singh D., Madrigal A., Valdovino-Gonzalez A.G., White B.M., Zapardiel-Gonzalo J., Ha B., Altay G., Greenbaum J.A., McVicker G., et al. Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression. Cell. 2018;175:1701–1715.e16. doi: 10.1016/j.cell.2018.10.022.
- 42. GTEx Consortium. Laboratory Data Analysis & Coordinating Center (LDACC), Analysis Working Group. Statistical Methods groups, Analysis Working Group. Enhancing GTEx (eGTEx) groups. NIH Common Fund. Jo B., Mohammadi P., Park Y.S., Parsana P., et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
- 43. Palladino M.A., Bahjat F.R., Theodorakis E.A., Moldawer L.L. Anti-TNF-α therapies: the next generation. Nat. Rev. Drug Discov. 2003;2:736–746. doi: 10.1038/nrd1175.
- 44. Cattalini M., Soliani M., Caparello M.C., Cimaz R. Sex Differences in Pediatric Rheumatology. Clin. Rev. Allergy Immunol. 2019;56:293–307. doi: 10.1007/S12016-017-8642-3.
- 45. Vallabhapurapu S., Karin M. Regulation and Function of NF-κB Transcription Factors in the Immune System. Annu. Rev. Immunol. 2009;27:693–733. doi: 10.1146/ANNUREV.IMMUNOL.021908.132641.
- 46. Crinzi E.A., Haley E.K., Poppenberg K.E., Jiang K., Tutino V.M., Jarvis J.N. Analysis of chromatin data supports a role for CD14+ monocytes/macrophages in mediating genetic risk for juvenile idiopathic arthritis. Front. Immunol. 2022;13. doi: 10.3389/FIMMU.2022.913555.
- 47. Thompson S.D., Moroldo M.B., Guyer L., Ryan M., Tombragel E.M., Shear E.S., Prahalad S., Sudman M., Keddache M.A., Brown W.M., et al. A genome-wide scan for juvenile rheumatoid arthritis in affected sibpair families provides evidence of linkage. Arthritis Rheum. 2004;50:2920–2930. doi: 10.1002/ART.20425.
- 48. Wu X., Xu X., Gu R., Wang Z., Chen H., Xu K., Zhang M., Hutton J., Yang T. Prediction of HLA class I-restricted T-cell epitopes of islet autoantigen combined with binding and dissociation assays. Autoimmunity. 2012;45:176–185. doi: 10.3109/08916934.2011.622014.
- 49. Hayashi M., Jin Y., Yorgov D., Santorico S.A., Hagman J., Ferrara T.M., Jones K.L., Cavalli G., Dinarello C.A., Spritz R.A. Autoimmune vitiligo is associated with gain-of-function by a transcriptional regulator that elevates expression of HLA-A∗02:01 in vivo. Proc. Natl. Acad. Sci. USA. 2016;113:1357–1362. doi: 10.1073/PNAS.1525001113.
- 50. Moroldo M.B., Donnelly P., Saunders J., Glass D.N., Giannini E.H. Transmission disequilibrium as a test of linkage and association between HLA alleles and pauciarticular-onset juvenile rheumatoid arthritis. Arthritis Rheum. 1998;41:1620–1624. doi: 10.1002/1529-0131(199809)41:9<1620::AID-ART12>3.0.CO;2-L.
- 51. Tsuchiya K., Kimura A., Kondo M., Nishimura Y., Sasazuki T. Combination of HLA-A and HLA class II alleles controls the susceptibility to rheumatoid arthritis. Tissue Antigens. 2001;58:395–401. doi: 10.1034/J.1399-0039.2001.580608.X.
- 52. Raychaudhuri S., Sandor C., Stahl E.A., Freudenberg J., Lee H.S., Jia X., Alfredsson L., Padyukov L., Klareskog L., Worthington J., et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 2012;44:291–296. doi: 10.1038/ng.1076.
- 53. Greetham D., Ellis C.D., Mewar D., Fearon U., an Ultaigh S.N., Veale D.J., Guesdon F., Wilson A.G. Functional characterization of NF-κB inhibitor-like protein 1 (NFκBIL1), a candidate susceptibility gene for rheumatoid arthritis. Hum. Mol. Genet. 2007;16:3027–3036. doi: 10.1093/HMG/DDM261.
- 54. Yau A.C.Y., Tuncel J., Haag S., Norin U., Houtman M., Padyukov L., Holmdahl R. Conserved 33-kb haplotype in the MHC class III region regulates chronic arthritis. Proc. Natl. Acad. Sci. USA. 2016;113:E3716–E3724. doi: 10.1073/PNAS.1600567113.
- 55. Brinkman B.M., Huizinga T.W., Kurban S.S., Van Der Velde E.A., Schreuder G.M., Hazes J.M., Breedveld F.C., Verweij C.L. Tumour necrosis factor alpha gene polymorphisms in rheumatoid arthritis: association with susceptibility to, or severity of, disease? Rheumatology. 1997;36:516–521. doi: 10.1093/RHEUMATOLOGY/36.5.516.
- 56. Tamiya G., Shinya M., Imanishi T., Ikuta T., Makino S., Okamoto K., Furugaki K., Matsumoto T., Mano S., Ando S., et al. Whole genome association study of rheumatoid arthritis using 27 039 microsatellites. Hum. Mol. Genet. 2005;14:2305–2321. doi: 10.1093/HMG/DDI234.
- 57. Harney S.M.J., Vilariño-Güell C., Adamopoulos I.E., Sims A.M., Lawrence R.W., Cardon L.R., Newton J.L., Meisel C., Pointon J.J., Darke C., et al. Fine mapping of the MHC Class III region demonstrates association of AIF1 and rheumatoid arthritis. Rheumatology. 2008;47:1761–1767. doi: 10.1093/RHEUMATOLOGY/KEN376.
- 58. Hashimoto M., Nakamura N., Obayashi H., Kimura F., Moriwaki A., Hasegawa G., Shigeta H., Kitagawa Y., Nakano K., Kondo M., et al. Genetic contribution of the BAT2 gene microsatellite polymorphism to the age-at-onset of insulin-dependent diabetes mellitus. Hum. Genet. 1999;105:197–199. doi: 10.1007/S004390051089.
- 59. Horton C.A., Alexandari A.M., Hayes M.G.B., Marklund E., Schaepe J.M., Aditham A.K., Shah N., Suzuki P.H., Shrikumar A., Afek A., et al. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science. 2023;381:eadd1250. doi: 10.1126/SCIENCE.ADD1250.
- 60. Harroud A., Hafler D.A. Common genetic factors among autoimmune diseases. Science. 2023;380:485–490. doi: 10.1126/SCIENCE.ADG2992.
- 61. López-Isac E., Smith S.L., Marion M.C., Wood A., Sudman M., Yarwood A., Shi C., Gaddi V.P., Martin P., Prahalad S., et al. Combined genetic analysis of juvenile idiopathic arthritis clinical subtypes identifies novel risk loci, target genes and key regulatory mechanisms. Ann. Rheum. Dis. 2021;80:321–328. doi: 10.1136/ANNRHEUMDIS-2020-218481.
- 62. Schwartz A.M., Demin D.E., Vorontsov I.E., Kasyanov A.S., Putlyaeva L.V., Tatosyan K.A., Kulakovskiy I.V., Kuprash D.V. Multiple single nucleotide polymorphisms in the first intron of the IL2RA gene affect transcription factor binding and enhancer activity. Gene. 2017;602:50–56. doi: 10.1016/J.GENE.2016.11.032.
- 63. Abramov S., Boytsov A., Bykova D., Penzar D.D., Yevshin I., Kolmykov S.K., Fridman M.V., Favorov A.V., Vorontsov I.E., Baulin E., et al. Landscape of allele-specific transcription factor binding in the human genome. Nat. Commun. 2021;12:2751–2766. doi: 10.1038/s41467-021-23007-0.