Author manuscript; available in PMC: 2021 Apr 16.
Published in final edited form as: Cell. 2020 Mar 26;181(2):362–381.e28. doi: 10.1016/j.cell.2020.02.057

Evolutionary Selection and Constraint on Human Knee Chondrocyte Regulation Impacts Osteoarthritis Risk

Daniel Richard 1,11, Zun Liu 1,11, Jiaxue Cao 1,2, Ata M Kiapour 3, Jessica Willen 1, Siddharth Yarlagadda 1, Evelyn Jagoda 1, Vijaya B Kolachalama 4,5,6, Jakob T Sieker 3,7, Gary H Chang 4, Pushpanathan Muthuirulan 1, Mariel Young 1, Anand Masson 8, Johannes Konrad 3, Shayan Hosseinzadeh 3, David E Maridas 9, Vicki Rosen 9, Roman Krawetz 8, Neil Roach 1, Terence D Capellini 1,10,12,*
PMCID: PMC7179902  NIHMSID: NIHMS1570746  PMID: 32220312

SUMMARY

During human evolution, the knee adapted to the biomechanical demands of bipedalism by altering chondrocyte developmental programs. This adaptive process was likely not without deleterious consequences to health. Today, osteoarthritis occurs in 250 million people, with risk variants enriched in non-coding sequences near chondrocyte genes, loci that likely became optimized during knee evolution. We explore this relationship by epigenetically profiling joint chondrocytes, revealing ancient selection and recent constraint and drift on knee regulatory elements, which also overlap osteoarthritis variants that contribute to disease heritability by tending to modify constrained functional sequence. We propose a model whereby genetic violations to regulatory constraint, tolerated during knee development, lead to adult pathology. In support, we discover a causal enhancer variant (rs6060369) present in billions of people at a risk locus (GDF5-UQCC1), showing how it impacts mouse knee-shape and osteoarthritis. Overall, our methods link an evolutionarily novel aspect of human anatomy to its pathogenesis.

Graphical Abstract


In Brief

Epigenetic profiling of chondrocytes reveals selection and constraint on knee regulatory elements including those overlapping osteoarthritis risk variants, including a functionally verified common enhancer variant sufficient to affect knee shape in mouse.

INTRODUCTION

A derived feature of humans and our hominin ancestors is a bipedal gait (Darwin, 1888). From the last common ancestor with chimpanzees, natural selection shaped the hominin hind limb to accommodate the biomechanical demands of bipedality, influencing human knee anatomy (Jungers, 1988; Morrison, 1970). For example, the condyles of the distal femur expanded, dissipating the higher forces generated during bipedal walking/running. The proximal plateau and condyles of the tibia became more symmetrical, its epicondyles buttressed with bone mass, permitting more equal weight distribution and greater knee stabilization (Bramble and Lieberman, 2004; Lovejoy, 2007). These and other derived features were present by ~1.6 million years ago in Homo erectus, and today, variation in human knees is reduced compared to earlier Homo and differs from patterns in African apes (Frelat et al., 2017; Lovejoy, 2007; Tardieu, 1999).

Derived human traits have been shaped by selection and its targeting of early development (Capra et al., 2013; Gokhman et al., 2017; Kanton et al., 2019; Tardieu, 1999). The development of knee features, many present by birth, is tied to the regulation of chondrocytes that prefigure joints (Andersen, 1961; Gardner and O’Rahilly, 1968; Gray and Gardner, 1950; Mérida-Velasco et al., 1997; O’Rahilly, 1951). During development, chondrocytes differentiate within the early knee to form femoral and tibial articular cartilage, while adjacent to the joint they arise to form condyle and plateau epiphyseal cartilages (Decker et al., 2014). These cartilages remain protected from ossification until postnatal life to respond to mechanical loading and hormonal signals during growth (Tardieu, 2010). Adaptive changes to these processes underlie anatomical features of the unique human knee. To this end, modifications to chondrocyte networks, in the form of regulatory changes, are the likely candidates as they typically are modular in effect (King and Wilson, 1975; McLean et al., 2011; Varki and Altheide, 2005). Here, we first investigate the evolutionary basis of human knee development by examining accessible/open chromatin sequences, regions often involved in gene regulation (Shlyueva et al., 2014), via epigenomic profiling of long bone chondrocytes in human and mouse. We find evidence of the effects of ancient selection on knee-specific regulatory elements that likely underlie derived human morphology.

Natural selection undoubtedly shaped knee morphology but not without impacts to health. In the elderly today, there is a high prevalence of knee osteoarthritis (OA), a condition in which joint- and epiphyseal-derived tissues deteriorate. Knee OA risk includes non-genetic factors, such as joint mechanics (Barr et al., 2016; Neogi and Felson, 2016; Neogi et al., 2013), obesity (Felson et al., 1988; Reyes et al., 2016), inflammation (Houard et al., 2013; Richette et al., 2011), and longevity (Berenbaum et al., 2018). However, knee OA risk is ~40% heritable (Loughlin, 2015), a component that interacts with these modern conditions. Genome-wide association studies (GWAS) have revealed over 80 OA loci with ~95% of risk variants present in non-coding sequences, enriched near genes involved in chondrocyte and bone development (Klein et al., 2019; Styrkarsdottir et al., 2018; Tachmazidou et al., 2019; Zengini et al., 2018). These findings suggest a key role for knee chondrocyte regulatory elements in mediating OA risk. Here, we build on our evolutionary analyses and use chondrocyte chromatin datasets to study how regulatory variation, shaped by ancient selection but also more recent evolutionary forces (genetic drift and antagonistic pleiotropy), influences OA risk. We develop a model of how nucleotide changes, occurring within evolutionarily constrained developmental regulatory sequences, violate this constraint and underlie genetic OA risk, and find supporting evidence using at-risk patient data and functional validation studies in human chondrocytes and the mouse model. Our methods link a derived aspect of human anatomy to its pathogenesis, an approach applicable to other features to explore disease links.

RESULTS

Epigenetic Profiling of Long Bone Chondrocytes

We first used the assay for transposase-accessible chromatin with sequencing (ATAC-seq) (Buenrostro et al., 2013) to profile open chromatin regions (hereafter, “elements”) in chondrocytes from E15.5 mouse and stage-matched E59 human proximal/distal femur and tibia samples (Figure 1A; Table S1; STAR Methods). For the mouse, we cataloged a “general knee set” comprising elements from the distal femur and proximal tibia. This set was filtered for shared proximal femur, distal tibia, and embryonic brain elements to make a “knee-specific set” (Table S1), consisting of elements unique to the distal femur (“DF-specific”), proximal tibia (“PT-specific”), and both knee components (“knee-common [KC]-specific”). An identical approach was used on homologous humeri and radii, and the resulting “elbow-specific” sets show minimal overlap with knee-specific sets (Table S2), highlighting each set’s anatomical specificity. Comparing datasets to those using fluorescence-activated cell sorting (FACS) on Col2a1-eCFP-reporter mice, we found strong and significant overlap (Table S2), indicating that gross dissection identifies chondrocyte cell populations typically marked by COL2A1. Given the lack of reliable FACS markers to isolate chondrocytes from procured human tissues, this helps ensure the reliability of dissection on rare human samples. We thus generated ATAC-seq sets on E59 human long bones using the same pipelines, and intersected human and mouse elements from equivalent tissues (e.g., distal femur), finding a significant degree of overlap (Tables S1 and S2). Overlapping mouse and human element sets show substantial one-to-one orthology (Table S1). Although there is regulatory divergence between species (Yue et al., 2014), this overlap is sizable despite disparate collection sites/protocols and uncontrollable differences in time points.

Mouse and overlapping E59 human elements intersect with biological signals related to chondrocytes and skeletogenesis using GREAT (McLean et al., 2010) (Table S1). We found significant overlap with regulatory activity in other mouse (Guo et al., 2017) and human E47 limb samples (Cotney et al., 2013) and bone marrow-derived chondrocytes (BMDCs) (ENCODE Project Consortium, 2012) (Table S2). We further observed enrichments for motifs similar to chondrogenesis-associated transcription factors (TFs), which significantly overlap with chromatin immunoprecipitation sequencing (ChIP-seq) datasets (Table S2).
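The set-filtering logic described above (general knee set, then removal of elements shared with proximal femur, distal tibia, and brain, then a split into DF-specific, PT-specific, and KC-specific groups) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is specified in the STAR Methods; the any-overlap criterion, the function names, and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_specific(target: List[Interval], exclude: List[Interval]) -> List[Interval]:
    """Keep target elements that overlap no element of the exclusion set."""
    return [t for t in target if not any(overlaps(t, e) for e in exclude)]


def split_knee_sets(df: List[Interval], pt: List[Interval], exclude: List[Interval]):
    """Return (DF-specific, PT-specific, knee-common) sets after filtering.

    Assumption: a single overlapping base counts as 'shared'.
    """
    df_kept = filter_specific(df, exclude)
    pt_kept = filter_specific(pt, exclude)
    df_only = filter_specific(df_kept, pt_kept)
    pt_only = filter_specific(pt_kept, df_kept)
    common = [d for d in df_kept if any(overlaps(d, p) for p in pt_kept)]
    return df_only, pt_only, common


# Hypothetical toy peaks, not real ATAC-seq calls.
distal_femur = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
proximal_tibia = [("chr1", 520, 580), ("chr3", 10, 90)]
non_knee = [("chr1", 150, 250)]  # stands in for proximal femur, distal tibia, brain

df_specific, pt_specific, kc_specific = split_knee_sets(distal_femur, proximal_tibia, non_knee)
print(df_specific, pt_specific, kc_specific)
```

The quadratic scan is adequate for a toy example; real peak sets would normally be handled with an interval-tree or sorted-sweep tool.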

Figure 1. Sequence Features of Chondrocyte Epigenetic Profiles.

(A) Diagram of ATAC-seq tissues. Numbers indicate replicate-consolidated peak calls before filtering with brain data. Inset: genomic distribution of lifted-over DF (top) and PT (bottom) elements.

(B) Per-bp conservation scores for 20 primates (phyloP20ways) aggregated for ATAC-seq sets and compared via Wilcoxon rank-sum test.

(C) Overlaps of human acceleration regions and ATAC-seq sets. KC, knee-common; GM, GM12878; DF, distal femur-specific; PT, proximal tibia-specific; EB, embryonic brain. Overlaps shown relative to set size (per bp of sequence) for background (gray) and target (colored) sets.

(D) Relationship between human-chimp sequence identity (red line, right y axis) and phyloP20ways score averaged over an ATAC-seq peak (blue line, left y axis) for DF-specific elements. Dashed curve and horizontal lines indicate average phyloP20ways scores for random background region set. Shaded regions reflect low/high %ID regions used in GREAT. Significance codes: not significant (ns), < 0.05 (*), < 0.01 (**), < 1e–5 (***).

See also Figure S1 and Tables S1, S2, and S4.

Inter-species Sequence Variation in Element Sets

To understand how accessible elements involved in human knee development have been modified evolutionarily, we broadly examined how primates with diverse locomotor repertoires (Polk et al., 2009) modified these elements. We examined mouse and overlapping E59 human knee element sets for evidence of sequence evolution using phyloP (Pollard et al., 2010), a measure of nucleotide conservation/acceleration. Across primates, knee elements had elevated conservation, with strongest conservation toward each element’s center (Figure S1), likely due to conserved TF sites (Buenrostro et al., 2013). When comparing DF-, PT-, and KC-specific sets, conservation was highest in KC-specific elements (i.e., those that are likely pleiotropic across the knee) (Figure 1B; Table S2). Conservation was reduced in elements unique to each bone’s end, especially for the proximal tibia. These patterns were also observed for overlapping E59 human elements (Table S2).
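Figure 1B compares aggregated conservation scores between element sets with a Wilcoxon rank-sum test. As a rough illustration of what that comparison computes, the sketch below implements the rank-sum statistic with a normal approximation using only the Python standard library. It omits the tie correction to the variance, the score lists are invented, and the function name `rank_sum_test` is hypothetical; a statistics package would be the usual choice for real data.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple


def rank_sum_test(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation.

    Returns (U for x, z score, two-sided p value). Ties receive average ranks;
    the variance is NOT tie-corrected (a simplification).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Invented per-element mean conservation scores for two sets.
set_a_scores = [0.10, 0.15, 0.20, 0.22, 0.30]
set_b_scores = [0.35, 0.40, 0.42, 0.50, 0.55]
u_stat, z_score, p_value = rank_sum_test(set_a_scores, set_b_scores)
print(u_stat, z_score, p_value)
```

Here every score in the first set is lower than every score in the second, so U for the first set is zero and the z score is negative.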

We next examined signatures of evolution shaping the human knee. Knee sets were intersected with human sequences exhibiting accelerated nucleotide changes (i.e., human accelerated regions) relative to chimps and other apes, often indicative of positive selection (Bird et al., 2007; Bush and Lahn, 2008; Gittelman et al., 2015; Pollard et al., 2006a; Prabhakar et al., 2006). Knee-specific elements significantly overlapped these regions (Figure 1C; Table S4), a result largely recapitulated using overlapping E59 human elements despite smaller set sizes. This suggests that ancient positive selection acted on knee elements. Mouse brain elements, mapped to their human orthologs, were also enriched for overlaps, as reported for the human brain (Pollard et al., 2006a, 2006b), while enrichment was not seen for B-lymphocyte elements. When examining en masse accelerated sequences in knee-specific elements using GREAT, we identified enrichments for “ossification,” “skeletal phenotypes,” and “limb developmental expression” (Table S4). Two example loci are shown in Figure S1.

We next considered sequence change between humans and chimps. Nucleotide similarity (%ID) positively tracked with cross-primate sequence conservation for each element set (Figure 1D; Table S2). To understand the biological significance of different levels of human-chimp %ID, we ran GREAT on conserved sequences (i.e., third quartile of %ID) from DF- and PT-specific sets (Table S2) and saw enrichments for “regulation of ossification,” “skeletal system development,” and “chondrocyte differentiation.” Because divergence in morphology can result from sequence changes in existing regulatory switches (Cotney et al., 2013), we ran GREAT on first quartile %ID sequences, which tended to retain conservation in primates (Figure 1D, left shaded area), and found enrichments for “mild short stature” and “negative regulation of ossification,” among other terms. Weaker/insignificant enrichments were seen in top/bottom 10% of sequences, with reduced strength due to smaller set sizes and the decreased power of GREAT to find enrichments (Table S2).
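The partitioning of elements by human-chimp nucleotide similarity can be sketched as follows. This is an illustrative reading only: it assumes pre-aligned sequences with `-` for gaps, computes %ID over ungapped columns, and treats "first quartile" and "third quartile" as the elements at or below the lower cut point and at or above the upper cut point, respectively. The element names and sequences are invented.

```python
import statistics
from typing import Dict, List, Tuple


def percent_identity(human: str, chimp: str) -> float:
    """Percent of aligned, ungapped columns at which the two sequences match."""
    pairs = [(h, c) for h, c in zip(human.upper(), chimp.upper())
             if h != "-" and c != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned columns")
    return 100.0 * sum(h == c for h, c in pairs) / len(pairs)


def split_by_quartile(pid: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (low-%ID elements, high-%ID elements) using quartile cut points."""
    q1, _, q3 = statistics.quantiles(list(pid.values()), n=4)
    low = sorted(name for name, value in pid.items() if value <= q1)
    high = sorted(name for name, value in pid.items() if value >= q3)
    return low, high


# Hypothetical aligned element sequences (human, chimp).
alignments = {
    "elem1": ("ACGTACGTAC", "ACGTACGTAC"),
    "elem2": ("ACGTACGTAC", "ACGTACGTAA"),
    "elem3": ("ACGTACGTAC", "ACGAACGTAA"),
    "elem4": ("ACGTACGTAC", "TCGAACGTAA"),
    "elem5": ("ACGT-CGTAC", "ACGTACGTAC"),
}
pid_by_element = {name: percent_identity(h, c) for name, (h, c) in alignments.items()}
low_id, high_id = split_by_quartile(pid_by_element)
print(pid_by_element)
print(low_id, high_id)
```

Each resulting subset would then be passed to an enrichment tool such as GREAT, which is outside the scope of this sketch.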

Given that human-specific enhancer mutations can impact development (Boyd et al., 2015), we considered the potential for base-pair alterations (i.e., chimp-human substitutions) in knee elements to alter sequence motifs, as shown for brain enhancers (Zehra and Abbasi, 2018). Identified motifs in knee sets (Table S3) were combined with those of known chondrogenesis-related TFs (Liu et al., 2017a) and used to identify the biased occurrence of alterations predicted to modify certain motifs (Figure 2A; Table S3; STAR Methods). Across knee-specific sets, several chondrocyte TF motifs show a strong tendency for protection from modification (i.e., they are biased against possessing substitutions). For example, putative sequence motifs (i.e., similar to a known TF motif) for FOXP1/FOXP2, factors important for knee morphology (Xu et al., 2018; Zhao et al., 2015), were biased against disruptive changes (Figure 2A, left). Human FOXP1/FOXP2 ChIP-seq data also show strong overlap with knee-specific elements (Table S2). Conversely, we observed alterations biasing toward TF motifs (Table S3). For example, predicted motifs for KLF5, involved in chondrogenesis (Shinoda et al., 2008), displayed such biases in the DF-specific set (Figure 2A, right), which also significantly overlaps (p < 0.05) KLF5 binding data (Tables S2 and S3; see Figure 2B for an example). We also observed in knee-specific sets a bias for alterations at predicted CTCF motifs, whose loss-of-function results in skeletal defects and spontaneous OA (Hijazi, 2018). Knee-specific sets also significantly overlap with CTCF binding data (Table S2; see Figure 2C for an example).

Figure 2. Sequence Modification of Putative Regulatory Elements.

(A) Histograms showing depletion and enrichment of human-chimp base pair alterations intersecting predicted FOXP1 and KLF5 TF motifs, respectively, in DF-specific elements relative to randomized sets (p < 0.01). Counts of altered bases in background (gray) and element (red) sets shown as a fraction of total set size. See Table S3.

(B) KLF5 motif example in a DF-specific element intronic to PBX1, a TF regulating HOX expression and endochondral ossification (Capellini et al., 2006; Selleri et al., 2001). Predicted disruption by a human-derived T/G nucleotide change is shown. A region of acceleration in a knee element was identified upstream of PBX1 (Table S3). Red shading indicates altered base relative to motif logo, blue indicates genomic position of sequence. ATAC-seq regions, phyloP20ways conservation tracks, and UCSC sequence alignment are shown.

(C) Example of a human-chimp alteration predicted to improve a CTCF motif within a PT-specific element intronic to NCOR2, involved in skeletal biology (Blake et al., 2017). Format similar to (B).

Examination of Intra-species Sequence Diversity in Element Sets

As selection also shapes within-species sequence variation (Vitti et al., 2013), we acquired variant data for humans (Auton et al., 2015), chimps and gorillas (Prado-Martinez et al., 2013), examining whether mouse and overlapping E59 human element sets differ in patterns of diversity within and between species. Considering human variation we found that diversity in PT- and DF-specific elements is constrained compared to genomic backgrounds, as well as to promoter-TSS and intronic elements (Figure 3A; Table S5; STAR Methods). Comparing knee-sets directly, we observed that DF- and PT-specific elements have significantly reduced diversity compared to KC-specific elements, with more marked constraint in PT relative to DF (Figure 3B). These patterns between specific knee element sets were not observed for orthologous forelimb sets, nor unfiltered “general” knee sets (Figures 3B and S2; Tables S5 and S6).

Figure 3. Intra-species Variation in Regulatory Elements and Human Variation in Knee Morphology.

(A) Counts of common human variants per bp of sequence for element sets compared to random region sets along with other genomic features; labels correspond to Table S5.

(B) Common human variants intersecting elements in the elbow (left) and knee (right) specific sets were counted and compared across sets.

(C) Comparisons of chimp sequence diversity across knee-specific sets.

(D–F) Common variants in human, chimp, and gorilla intersecting a given ATAC-seq peak counted for all variable sequences in knee-specific sets, expressed as “SNPs per sequence”—mean/median values shown in dashed/bold lines, respectively.

(D) Distal femur-specific (DF).

(E) Proximal tibia-specific (PT).

(F) Knee-common-specific (KC).

(G) Measurements of human medial/lateral tibia and femur via MRI dataset. Volume of cartilage (left) and total area of subchondral bone (right) for medial/lateral portions of both bones were measured across all subjects. Individual points are plotted alongside an outlined density curve, quartiles indicated in dashed lines. Significance codes: not significant (ns), < 0.05 (*), < 0.01 (**), < 1e–5 (***).

See also Figure S2 and Tables S5 and S6.

Corresponding sets in chimps and gorillas were examined for similar patterns of diversity between DF-, PT-, and KC-specific elements (Figure 3C). No significant differences were observed, similarly true for the forelimb and “general knee” sets (Table S5). Comparing patterns of diversity between humans and African apes, we observed reduced variation within DF- and PT-specific elements in humans compared to corresponding chimp and gorilla elements (Figures 3D and 3E). We note that no significant differences in diversity levels in KC-specific elements between species were observed (Figure 3F; Table S5). These patterns were recapitulated in overlapping E59 human elements (Figures S2A–S2C and S2J–S2L; Tables S5 and S6).

However, the human pattern we observed (Figures 3B–3F) could reflect a general increase in genetic diversity in apes, notably chimps (Prado-Martinez et al., 2013). We therefore performed an analysis of sequence set constraint by generating chimp genomic backgrounds and functional annotations and comparing diversity with knee-element sets. We found elevated background diversity across the chimp genome, indicated by the left skew in the distribution. Regardless of skew, knee-specific sets in chimp behave markedly differently; specifically, they each exhibit less constraint than promoter-TSS and intronic elements (Figures 3A and S2P; Table S5). Overall, these data reveal a unique human pattern of constraint among DF- and PT-specific elements (i.e., reduction of genetic diversity relative to KC elements and functional annotations, not in African apes), suggesting earlier results were capturing, at least in part, these signals.

Examination of Human Knee Morphological Diversity

Given these sequence constraint patterns, we sought a corresponding phenotypic signal in human knee morphology. Using morphometric data from the OA Biomarkers Consortium Project (OABCP) MRI images, we confirmed reduced variation in all proximal tibia measurements compared to matched distal femur features (Figure 3G; Table S5). Intersecting OAI sequence data with knee elements, we found significantly less genetic variation in PT-specific compared to DF-specific elements (Table S5), supporting a genotype-phenotype link between reduced knee regulatory sequence variation genome-wide and reductions in morphological diversity.

Examination of the Biological Impacts of Sequence Variation in Element Sets

We next examined how genetic diversity in knee elements generally impacts phenotypic and regulatory variation within species. Human and chimp sequences in knee sets partitioned by variation were examined using GREAT. For the least variable (first quartile) human and chimp elements, similar enrichments were observed for terms such as “cartilage development” and “chondrocyte differentiation.” Conversely, the most variable (third quartile) sequences showed different enrichments for humans and chimps. The most variable human elements yielded enrichments for “collagen catabolism/metabolism,” “anchoring collagen,” and “osteoarthritis,” functional terms not identified with variable chimp elements (Table S5).

These findings in variable human sequences led us to examine the relationship between common variation in knee sets and OA loci identified via GWAS. We aggregated 95 lead variants across 83 OA GWAS loci (Table S7) and found that such variants were enriched in knee-specific element sets, including E59 human elements (Figure 4A; Table S7). We saw no enrichment of variants in less-specific human E47 limb datasets, BMDCs, or the brain. These findings suggest a specific link between human variation in knee chondrocyte regulatory sequences and OA risk.
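Figure 4A reports variant enrichments as Z scores over randomized region sets. The general idea can be sketched as a permutation test: count variants falling in the element set, repeat the count for length-matched regions placed at random, and standardize the observed count against that null. The single-chromosome toy genome, uniform random placement, seed, and all coordinates below are assumptions for illustration; the actual randomization scheme is described in the STAR Methods.

```python
import random
import statistics
from typing import List, Tuple

Region = Tuple[int, int]  # half-open (start, end) on one toy chromosome


def count_hits(variants: List[int], regions: List[Region]) -> int:
    """Number of variant positions that fall inside any region."""
    return sum(any(s <= v < e for s, e in regions) for v in variants)


def shuffled_regions(regions: List[Region], genome_len: int, rng: random.Random) -> List[Region]:
    """Place each region at a uniformly random start, preserving its length."""
    placed = []
    for s, e in regions:
        length = e - s
        start = rng.randrange(0, genome_len - length + 1)
        placed.append((start, start + length))
    return placed


def enrichment_z(variants: List[int], regions: List[Region], genome_len: int,
                 n_perm: int = 1000, seed: int = 0) -> Tuple[int, float, float]:
    """Return (observed hits, mean null hits, z score over randomized sets)."""
    rng = random.Random(seed)
    observed = count_hits(variants, regions)
    null = [count_hits(variants, shuffled_regions(regions, genome_len, rng))
            for _ in range(n_perm)]
    mean_null = statistics.fmean(null)
    sd_null = statistics.pstdev(null)
    z = (observed - mean_null) / sd_null if sd_null > 0 else float("nan")
    return observed, mean_null, z


# Invented coordinates: half of the variants sit inside the element set.
elements = [(1000, 1100), (3000, 3100), (5000, 5100), (7000, 7100), (9000, 9100)]
risk_variants = [1010, 1050, 3010, 3050, 5010, 5050, 7010, 7050, 9010, 9050,
                 200, 1500, 2500, 4000, 4500, 6000, 6500, 8000, 8500, 9500]
observed, mean_null, z = enrichment_z(risk_variants, elements, genome_len=10_000)
print(observed, round(mean_null, 2), round(z, 2))
```

A positive z score indicates more variants in the element set than expected from random placement; converting z to a p value would use the standard normal distribution.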

Figure 4. Human Variation in Knee Elements and Its Impacts on OA Risk.

(A) Enrichments for OA risk variants in general knee, knee-specific, human E59 hind limb, and brain region sets, along with human E47 and BMDC H3K27ac ChIP-seq regions. Calculated Z score enrichments over randomized sets shown as −log (p value); red line indicates significance threshold (p < 0.05).

(B) Distribution of predicted motifs intersected by OA risk variants, counted per TF. Significantly enriched factors are indicated—FOXP1 significant following p value correction (p < 0.05).

(C) Distribution of predicted motifs intersecting OA risk variants in knee elements for a set of chondrogenesis-related TFs.

(D) Overlaps of region sets and signals of recent selection calculated for general knee, knee-specific, brain, and human H3K27ac ChIP-seq region sets, along with OA risk-variants and the regions intersecting them. Clustered set overlaps also shown. Hypergeometric tests represented as −log (p value), with sign denoting enrichment/depletion; red lines indicate respective significance cut-offs (p < 0.05).

(E) Top: formalized model for the role of evolutionary history in modern heritable OA risk. Ancient selection acting on ancestral sequence diversity in regulatory elements establishes a derived knee configuration, which is subsequently maintained through ancient purifying selection (i.e., functional constraint). More recently, genetic drift, in combination with antagonistic selection for other traits, increases the frequency of alternative alleles in functionally constrained sites. Bottom: the presence of moderate mutational load in highly constrained elements (e.g., prox. tibia elements), or high mutational load in less-constrained elements (e.g., dist. femur elements), stands to disrupt knee homeostasis and promote pathology risk, while low mutational loads (or low sequence constraint) are more tolerated (i.e., harbor lesser risk of pathology).

(F) The number of alternative alleles falling in chondrocyte regulatory elements counted per-individual for the 1KG3 population (blue) and the OAI patient cohort (red), shown as density distributions with mean values (dashed lines); significance bar indicates Student’s t test result (p < 0.05).

See also Figure S3 and Tables S3, S7, and S8.

As GWAS variants are thought to alter gene expression via TF modifications (Zhang and Lupski, 2015), we examined if OA variants in knee elements modify sequence motifs (Table S7). Although we observed that human-chimp substitutions in knee-specific elements bias against altering FOXP1/FOXP2 motifs, OA variants bias toward altering them (adj. p value < 0.05) (Figure 4B; Table S7). We also observed these OA variants intersecting motifs for chondrogenesis-associated TFs (Figure 4C; Table S7); in particular, pooled knee- and PT-specific elements have variants tending to overlap predicted CTCF and KLF5 motifs (adj. p value < 0.05). Furthermore, knee-specific elements capture a significant proportion of all OA variants intersecting these motifs (>3-fold enrichment, adj. p value <0.05, Table S7). We examined if common variants in knee element sets (i.e., SNPs with minor allele frequency [MAF] ≥ 0.05) frequently modify chondrogenesis-related TF motifs but found a lack of significant bias (Table S3), pointing to the specific enrichment of signals for OA GWAS variants. Nevertheless, at some functional OA-annotated loci (i.e., found via GREAT) common variants were predicted to alter relevant motifs in knee-specific elements. Two examples are shown in Figures S3A and S3B.

Given the polygenic nature of OA (Hunter and Bierma-Zeinstra, 2019), smaller effect-size variants in knee elements may cumulatively contribute to alter chondrocyte regulation, and ultimately risk heritability. We assessed the contributions that variants in knee elements, chondrocyte elements generally, and other genomic features have to OA risk heritability. Using linkage disequilibrium score regression (LDSC) (Finucane et al., 2015), we found orthologous chondrocyte elements pooled from long bones and human E59 hind limb elements captured variants explaining a significant proportion of OA heritability (adj. p < 0.05, Table S7), a feature not seen using brain or B-lymphocyte elements. Variants captured by more refined knee-specific element sets did not reach LDSC significance, owing to limits in ascertaining significant partitioned heritability from very small sets (Finucane et al., 2015). Interestingly, annotations pertaining to sequence conservation, i.e., Genomic Evolutionary Rate Profiling (GERP) score (Davydov et al., 2010), primate phastCons (Hubisz et al., 2011), and predicted allele age (Rasmussen et al., 2014) also associated with OA heritability.

Examination of Recent Evolutionary Forces

We next wanted to determine whether more recent genetic drift and positive selection shaped human variant frequencies in knee-specific elements and influenced the observed disease associations. We considered the behavior of variants in knee elements, investigating pairwise Fst (Weir and Cockerham, 1984) across 18 populations (STAR Methods). Principal-component analysis (PCA) was used to visualize the separation of knee variants based on shared behaviors in population stratification (Figure S3C; Table S8). The first two PCs represented population differences between Eurasia/Africa (PC1) and Europe/Asia (PC2) populations (Figure S3D), capturing the majority (>60%) of observed variation. K-means clustering of variants based on Fst resulted in two groups per knee set (Table S8; STAR Methods): cluster 1 variants have a narrower Fst range across populations (Figures S3E and S3F), while cluster 2 trends toward increased Eurasia/Africa divergence. Using GREAT, we determined whether clusters have divergent functional enrichments (Table S8). Cluster 2 variants were functionally associated with “skeletal development” and related chondrocyte terms, while cluster 1 variants tended to occur near genes with annotations such as “arthritis,” “arthralgia,” “osteoarthritis,” and “knee joint.”
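To make the per-variant input to the PCA and clustering concrete, the sketch below computes a pairwise Fst value for every pair of populations at one biallelic site. Note the substitution: the analysis above cites the Weir and Cockerham (1984) estimator, whereas this sketch uses the simpler heterozygosity-based form Fst = (H_T - H_S) / H_T, which ignores sample-size corrections. The population labels and allele frequencies are invented.

```python
from itertools import combinations
from typing import Dict, Tuple


def fst_two_pops(p1: float, p2: float) -> float:
    """Heterozygosity-based Fst for one biallelic site in two populations.

    Simplified stand-in for the Weir-Cockerham estimator (assumption):
    equal population weights, no sample-size correction.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def pairwise_fst(freqs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Fst for every unordered pair of populations at a single variant."""
    return {(a, b): fst_two_pops(freqs[a], freqs[b])
            for a, b in combinations(sorted(freqs), 2)}


# Hypothetical alternative-allele frequencies for one variant.
variant_freqs = {"POP_A": 0.2, "POP_B": 0.8, "POP_C": 0.2}
fst_vector = pairwise_fst(variant_freqs)
print(fst_vector)
```

Stacking one such vector per variant gives a variants-by-population-pairs matrix, which is the kind of input on which PCA and k-means clustering operate.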

We also assessed whether recent positive selection shaped the patterns of clustered variants, and OA variants in general. We found that only 18% of regions containing clustered knee elements (cluster 1) are known to have undergone positive selection in the last 30,000 years (Jagoda et al., 2018; Pagani et al., 2016), a significant depletion (1.25 fold-decrease, adjusted p value: 5.26e−11) (Figure 4D; Table S8). No cluster 2 set was enriched in selection windows; rather, they trended toward depletion (Figure 4D). We then examined loci for which OA GWAS variants fell in knee elements and selection windows, identifying four (Tables S7 and S8). For three loci (UNC5C-BMPR1B, ENPP1/3, LSM5) the putative OA risk variant occurred on the non-selected, recombined haplotype (Figures S3G–S3J). However, at GDF5, OA risk variants were found to reside on the positively selected haplotype (Capellini et al., 2017; Miyamoto et al., 2007). OA GWAS variants, in general, were not enriched in selection regions (Figure 4D), and by one test (Grossman et al., 2013) were strongly depleted (Table S8). Last, TF motif analyses on clustered low- and high-Fst variants did not show strong biases for intersections, indicating a consistent lack of directionality in regulatory modification in line with genetic drift effects (Table S3).
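The enrichment and depletion calls for selection windows in Figure 4D are described as hypergeometric tests. A minimal sketch of both tails is given below, using exact binomial coefficients from the standard library. The mapping of the parameters to this analysis (N total regions, K regions under selection, n regions in the set of interest, k of those under selection) is an assumed framing, and no values from the study are reproduced.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement from N items, K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k: int, N: int, K: int, n: int, upper: bool = True) -> float:
    """Upper tail P(X >= k) tests enrichment; lower tail P(X <= k) tests depletion."""
    lowest = max(0, n - (N - K))   # fewest successes possible in n draws
    highest = min(K, n)            # most successes possible in n draws
    if upper:
        support = range(max(k, lowest), highest + 1)
    else:
        support = range(lowest, min(k, highest) + 1)
    return sum(hypergeom_pmf(i, N, K, n) for i in support)


# Toy example: 10 regions, 5 under selection, 5 regions in the set of interest.
p_enriched = hypergeom_tail(5, N=10, K=5, n=5, upper=True)   # all 5 selected
p_depleted = hypergeom_tail(0, N=10, K=5, n=5, upper=False)  # none selected
print(p_enriched, p_depleted)
```

In the toy example both tails equal 1/252, since drawing all five selected regions and drawing none of them are equally unlikely under the null.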

Formal Model of Chondrocyte Knee Developmental Regulation, Evolution, and OA Risk

Our analyses revealed that DF- and PT-specific regulatory elements exhibit evidence in humans of ancient positive selection (Figure 1C), followed by more recent sequence constraint not observed in African apes (Figures 3A–3E). Furthermore, the allele frequencies of human common variants in these elements, and OA risk variants in general, appear to have been shaped by genetic drift on the background of constraint (Tables S7 and S8), rather than recent positive selection (Figure 4D). These variants have consistent links to OA (Table S8), likely by impacting TF motifs under constraint (Figures 4B and 4C), and contribute to overall heritability of risk (Table S7). We propose a model in which violations to constraint in functional conserved sequences, tolerated during knee development, have pathological consequences later in life (Figure 4E). Two evolutionary mechanisms may cause violation to constraint thereby increasing genetic risk for OA: drift and antagonistic selection. We provide genetic and functional evidence below addressing how both processes may drive heightened OA risk genome-wide and on a locus-specific level.

Functional Sequence Variation within At-Risk Individuals

Because drift is anticipated to increase the frequency of violations underlying OA risk, patients should have higher loads of violating mutations than the general population. To test this, we compared the OAI patient cohort, consisting of individuals suffering from, or identified as being at high risk of developing, OA, with 1000 Genomes populations. We considered variants falling in knee-specific elements and asked whether, across the set of variable sites, patients tended to possess more alternative alleles. We observed a significant increase in the average number of alternative alleles possessed by subsets of the OAI cohort (Figure 4F), results replicated using another population control (Pagani et al., 2016) (Table S7). To confirm that it is specifically sequence violation within constrained knee regulatory elements that is related to the heightened OA risk in this cohort, we compared the number of alternative alleles falling in B-lymphocyte elements as a control, observing different variant behaviors (Table S7).
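The burden comparison described here (alternative alleles counted per individual across variable sites in regulatory elements, then compared between cohorts with a Student's t test, as in Figure 4F) can be sketched as below. Genotypes are assumed to be coded as alternative-allele counts of 0, 1, or 2 per site; the genotype matrices are invented, and only the t statistic is computed because the standard library has no t-distribution function for the p value.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def alt_allele_burden(genotypes: List[List[int]]) -> List[int]:
    """Per-individual total of alternative alleles (rows = individuals, 0/1/2 per site)."""
    return [sum(individual) for individual in genotypes]


def student_t(a: List[float], b: List[float]) -> float:
    """Two-sample Student's t statistic with pooled (equal-variance) standard error."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical genotypes at four element-overlapping sites.
patient_genotypes = [[0, 1, 2, 1], [1, 1, 2, 2], [2, 1, 1, 2]]
control_genotypes = [[0, 0, 1, 1], [1, 0, 1, 1], [0, 1, 1, 2]]

patient_burden = alt_allele_burden(patient_genotypes)
control_burden = alt_allele_burden(control_genotypes)
t_stat = student_t(patient_burden, control_burden)
print(patient_burden, control_burden, t_stat)
```

A positive statistic means the first group carries more alternative alleles on average; repeating the comparison on a control element set (for example, B-lymphocyte elements) follows the same steps.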

Functional Interrogation of an OA Risk Locus

Another mechanism that may lead to constraint violations is antagonistic pleiotropy, in which positive selection increases a beneficial allele along with linked deleterious alleles. Earlier, we uncovered one such locus, Growth Differentiation Factor Five (GDF5). GDF5 is a BMP with quintessential roles in knee development across mammals (Basit et al., 2008; Rountree et al., 2004; Settle et al., 2003), and yet is the most reproducibly associated OA locus (Miyamoto et al., 2007; Zengini et al., 2018). At GDF5, selection on reduced height via a regulatory variant (rs4911178) in GROW1, a growth plate enhancer, likely increased frequencies of linked OA risk variants (Capellini et al., 2017). As our model predicts high-frequency sequence violations contribute to OA risk, we explored this exemplar locus in-depth.

We intersected GDF5 GWAS OA variants (Zengini et al., 2018) with general E15.5 mouse/E59 human knee elements and found putatively causal variants, rs4911178 (G/A) in GROW1 and rs6060369 (C/T) (Figure 5A). Rs4911178 and GROW1 do not impact knees when deleted in mice (Capellini et al., 2017), implicating rs6060369 as causative. Rs6060369 overlaps a knee element, R4, located in downstream sequence shown in rescue experiments to regulate expression and mediate knee morphology of Gdf5 null mice (Chen et al., 2016; Pregizer et al., 2018). To characterize R4 activity, we made stably expressing R4 lacZ reporter mice and found, at E14.5/15, expression in the early knee, including the condylar regions and notch (but not trochlea), and postnatally, expression restricted to femoral condyle and tibial articular chondrocytes and cruciate ligaments (Figure 5B). Compiling human functional genomics data on E47 limbs, E59 distal femur chondrocytes, and BMDCs, we found R4 activity at each time point (Figure 5A), indicating a similar regulatory time course as in mouse.
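The intersection step described above amounts to asking which variant positions fall inside open-chromatin intervals. The following is a minimal sketch of that operation, assuming BED-style 0-based, half-open intervals that do not overlap within a chromosome. The variant names and coordinates are hypothetical placeholders, not the study's data, and this is not presented as the pipeline the authors ran.

```python
from bisect import bisect_right

def variants_in_elements(variants, elements):
    """Return names of variants whose position lies inside any element.

    variants: list of (name, chrom, pos) with 0-based positions.
    elements: list of (chrom, start, end), half-open [start, end),
              assumed non-overlapping within each chromosome.
    """
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    hits = []
    for name, chrom, pos in variants:
        intervals = by_chrom.get(chrom, [])
        # Index of the last interval whose start is <= pos.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(name)
    return hits

# Hypothetical toy coordinates for illustration only.
toy_elements = [("chr20", 100, 200), ("chr20", 500, 650)]
toy_variants = [
    ("varA", "chr20", 150),  # inside the first element
    ("varB", "chr20", 300),  # between elements
    ("varC", "chr20", 500),  # on an inclusive start boundary
    ("varD", "chr20", 650),  # on an exclusive end boundary
    ("varE", "chr1", 150),   # chromosome with no elements
]
toy_hits = variants_in_elements(toy_variants, toy_elements)
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of elements, which matters when the variant list is genome-wide.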

Figure 5. Functional Characterization of the GDF5 Locus and R4 Enhancer in the Mouse.

(A) UCSC Genome Browser view of human GDF5 locus with intersections of OA variants, knee ATAC-seq regions (for human and mouse tissue), GDF5 enhancers, and H3K27ac ChIP-seq signals from human embryonic limbs (E47) and BMDCs. Rs4911178 (left red line) and rs6060369 (right red line) overlap functional knee sequences.

(B) R4-driven lacZ expression in inferior-most distal femur (DF) and superior-most proximal tibia (PT) (left) tissues and in adult distal femora (right two images). AC, articular cartilage; FN, femoral notch; IC, inferior condyles; CL, cruciate ligaments; TG, trochlear groove. Scale bars, E14.5 = 250 μm; adult = 1 mm.

(C) P30 knee anatomy in C57BL/6J R4 null mice. #, same trends for medial condyle. R4+/+ (wild-type [WT]), R4−/− homozygous (HOM).

(D) 1-year knee anatomy.

(E) OARSI scores on WT and HOM knees at P30 and 1 year. Triangles, heterotopic ossification.

(F) 3D renditions (top) and histology (bottom) of WT/HOM knees with minimum (blue), mean (green), and maximum (red) OARSI scores. Heterotopic ossification observed in HOM knees with highest OARSI scores and most cartilage damage. Scale bars, 50 μm.

(G) X-ray 3D cartilage scanning of WT/HOM distal femur condyles (top) and proximal tibia platforms (bottom) at 1 year, showing OA lesions (white arrows). Scale bars, 1 mm.

(H) Linear regressions between knee morphology and OARSI score at 1 year. In (C)–(E) colored dots correspond to specimens shown in (F). Significance code: <0.05 (*).

See also Figures S4 and S5 and Table S9.

We then deleted R4 in mice, and using allele-specific expression analyses (ASE) found that loss of one R4 copy downregulated Gdf5 expression in E15.5 distal femur chondrocytes (deletion-allele expression 69.6% ± 0.05% of control expression; n = 4/genotype; p = 0.001). At P30, R4 loss (R4−/−) led to significant distal femur changes, specifically, smaller condyle curvature radius and notch sizes, among other changes (Figures 5C, 7, and S4A–S4C; Table S9). R4+/− mice displayed a similar but non-significant trend (Table S9). Despite changes, a normal articular cartilage and joint cavity formed in R4−/− mice (Figure 5F, left). By 1 year, R4−/− mice had exacerbated defects in the same distal femur features (Figure 5D, p < 0.05) and new alterations to the proximal tibia (Table S9). These changes are related to genetic variants and not aging as morphological comparisons occurred between genotypes at each time point separately. Strikingly, OA developed in R4−/− mice as observed via histology, Osteoarthritis Research Society International (OARSI) score (Mann-Whitney test, p = 0.041), and cartilage imaging (Figures 5E–5G and S5). This included a loss of glycosaminoglycans throughout articular surfaces with minor fibrillations of isolated surfaces in most individuals and severe damage up to complete denudation of isolated joint surfaces in some individuals (Figure 5F). Using X-ray 3D scanning, we observed OA lesions on the femoral condyle and proximal tibia (Figure 5G). In the most affected mice, heterotopic ossification was observed, a phenotype that occurs in late human OA (Figure 5F). Several morphologic features of R4−/− knees (e.g., condyle curvature, notch size, etc.) were correlated with OA severity; features which have been shown to change during human OA progression (Barr et al., 2016; Hunter et al., 2016; Neogi et al., 2013) (Figures 5H and S4D). However, as these changes were also detected in R4−/− mice at P30, such shape alterations preceded and may have caused OA.
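The OARSI score comparison above relies on a Mann-Whitney test, which compares two groups by rank without assuming normality. As a minimal sketch of how such a test works, the code below computes the U statistic and an exact two-sided p-value by enumerating every relabeling of the pooled scores. Exact enumeration is an assumption chosen for clarity and is practical only for small groups; the scores are hypothetical toy values, not measurements from the study.

```python
from itertools import combinations

def u_statistic(x, y):
    """U for x: count of (xi, yj) pairs with xi > yj, with ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_mann_whitney(x, y):
    """Return (U, exact two-sided p-value) by enumerating all relabelings."""
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    center = n_x * len(y) / 2.0  # expected value of U under the null
    observed_u = u_statistic(x, y)
    observed_dev = abs(observed_u - center)
    extreme = 0
    total = 0
    for idx in combinations(range(n), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(u_statistic(group_x, group_y) - center) >= observed_dev - 1e-12:
            extreme += 1
    return observed_u, extreme / total

# Hypothetical toy scores for illustration only.
toy_wt = [0.5, 1.0, 1.5, 2.0]
toy_hom = [2.5, 3.0, 3.5, 4.0]
toy_u, toy_p = exact_mann_whitney(toy_hom, toy_wt)
```

With complete separation of two groups of four, only 2 of the 70 possible relabelings are as extreme as the observed one, so the smallest attainable two-sided p-value is 2/70. This illustrates why group size bounds the significance a rank test can reach.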

Figure 7. Functional Characterization of the rs6060369 OA Risk Allele in Humanized Replacement Mice.

(A) ASE assays on E14.5 distal femur chondrocytes from C57BL/6J/129SVJ R4rs6060369-T, rs6060369-A replacement mice (right, p = 0.0005) and C57BL/6J/129SVJ R4 heterozygous (R4+/−) mice (left, p = 0.001).

(B) 3D comparative analysis indicating the locations of largest morphological differences between (left) WT R4rs6060369-A, rs6060369-A and HOM R4rs6060369-T, rs6060369-T hind limbs, as well as between (right) R4+/+ and R4−/− hind limbs (zoom-in images focus on inferior distal femur [top], superior proximal tibia [bottom]). Areas with largest variations are highlighted in red (WT > HOM) and dark blue (WT < HOM), with minimal variation in green/yellow.

(C) μCT measurements of indicated features in base pair replacement and R4 mice at postnatal days P30, P56, and 1 year. See also Figure S4 and Table S9.

We next explored R4 function and its OA variant (rs6060369) in human and mouse chondrocytes in vitro. R4 deletion in human T/C-28a2 articular chondrocytes caused downregulation of GDF5 expression (p < 0.005), but not nearby genes (Figure 6A). Deletion of a 41 bp subregion containing rs6060369 had a similar effect (p < 0.05, Figure 6B). Luciferase reporter assays in these cells for rs6060369 found that the risk allele “T” drove reduced expression (Figure 6C, p = 0.000044; Fisher-combined p = 8.79E–09). We computationally predicted that Pituitary homeobox 1 (PITX1), a major TF in knee formation (Nemec et al., 2017) and OA factor (Butterfield et al., 2019; Picard et al., 2007), binds to this variant position. Using ChIP on T/C-28a2 cells (Figure 6D), E15.5 distal femur and proximal tibia cartilage, and PITX1 ChIP-seq data (Figure 6E), we validated PITX1 binding at this position, indicating that it likely mediates the cis-acting effects of rs6060369 in R4 on GDF5 expression.
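The "Fisher-combined" p-value reported above for the luciferase assays refers to Fisher's method for pooling independent tests. As a minimal sketch of that method: the statistic X = −2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its upper tail has a closed form, so no statistics library is needed. The per-experiment p-values are not given in this passage, so the inputs below are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    With X = -2 * sum(ln p_i) and k = len(p_values), the chi-square
    survival function at X with 2k degrees of freedom equals
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term = 1.0   # running value of (X/2)^i / i!
    total = 1.0  # partial sum, starting from the i = 0 term
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Hypothetical inputs: two independent experiments, each with p = 0.05.
toy_combined = fisher_combined_p([0.05, 0.05])
```

Two experiments that are each only marginally significant combine to a smaller p-value (roughly 0.017 for the toy inputs), which is how consistent replicates can yield a combined value far below any single assay.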

Figure 6. Functional Characterization of the R4 Enhancer in Human and Mouse Chondrocytes.

(A) Expression by qRT-PCR of CEP250 (not significant [n.s.]), GDF5 (p < 0.005), and UQCC1 (n.s.) in human T/C-28a2 chondrocytes lacking R4.

(B) Expression of CEP250 (n.s.), GDF5 (p < 0.05), and UQCC1 (n.s.) in cells lacking 41 bp containing rs6060369 in R4.

(C) In vitro reporter analyses of R4-driven luciferase activity comparing constructs with the OA risk “T” or non-risk “C” variant in T/C-28a2 cells.

(D) PITX1 ChIP on a R4 sub-region containing rs6060369 in T/C-28a2 cells showing input (left) and pull-down (right) in image.

(E) UCSC Genome Browser view of mouse R4 corresponding to knee ATAC-seq region and PITX1 binding via ChIP-seq (Infante et al., 2013).

We finally edited the rs6060369 orthologous position in C57BL/6J mice to contain the human risk “T” allele. ASE on mice with one R4 “T” allele showed downregulated Gdf5 expression in E15.5 distal femur chondrocytes (replacement “T” allele expression 82.3% ± 0.05% of control expression; n = 8/genotype; p = 0.0005, Figure 7A), revealing an endophenotype within an anatomically relevant context. Phenotyping P56 humanized mice revealed that several distal femur features, also dysmorphic in R4−/− mice (e.g., notch width; lateral condyle sagittal curvature radius), display alterations in R4rs6060369-T, rs6060369-T mice compared to controls, and in the same direction of effect (Figures 7B and 7C; Table S9). While adjusted comparisons did not yield many significant differences, matched unadjusted analyses demonstrate marked differences in condyle curvature and width, and tibial spine size (Table S9). These precise enhancer-mediated alterations are evident in heatmaps of morphological change between wildtype and R4rs6060369-T, rs6060369-T limbs (Figure 7B). As a subset of affected measures are correlated with OARSI score in R4−/− mice (Figure 5H), and as knee morphology and OA are complex polygenic phenotypes, these findings constitute evidence that the derived “T” at rs6060369 is a causal variant. Moreover, its location in a conserved TF binding site within a functional element that modifies knee development supports our model’s expectations that violations to constraint can confer elevated OA risk.

DISCUSSION

The Evolutionary Developmental Regulation of the Human Knee

To examine the developmental genetic changes involved in the evolution of the human knee, we profiled open chromatin regions from human and mouse developmental samples, comparing patterns of inter- and intra-species sequence evolution across hind- and forelimb elements. Open chromatin regions specific to distal femur and proximal tibia, reflecting site-specific regulatory elements involved in knee chondrogenesis, exhibited signals of the effects of ancient selection during primate and hominin evolution. These include reduced sequence conservation among primates relative to pleiotropic elements, enrichments for human accelerated regions, and significantly reduced diversity relative to pleiotropic elements and certain genome features (e.g., TSS and promoter elements) within humans but not chimps. These findings augment our limited understanding of prenatal knee development, pointing to a series of adaptive regulatory modifications to human knee formation.

Furthermore, we observed that proximal tibia elements display a stronger reduction in genetic diversity than distal femur, suggestive of greater purifying selection acting on this part of the knee to maintain its anatomical integrity. Importantly, we observed corresponding reductions in morphological variation in features of the proximal tibia relative to distal femur in the OAI dataset. Because we could only study KL = 1 patients (i.e., the lowest score depicting initial OA onset), we cannot rule out confounding issues regarding the time course of OA on different anatomical knee subdomains, although there is currently no consensus as to whether one particular bone end is the primary site of OA and morphological change. These findings on proximal tibia constraint in human genetic and morphological data, together with reduced sequence conservation across primates, suggest a more evolutionarily labile feature for locomotor adaptations which, when a more optimal configuration is achieved, becomes “locked-in” and is subject to negative selection pressures to maintain its morphology.

We were unable to examine gained human-specific regions, which could also show evidence of regulatory evolution in the human lineage. Similarly, the assumption of conserved chimp regulatory activity for overlapping elements present in mouse and human may not hold in all cases. Resolving these questions, however, would require chimp developmental samples, which is infeasible and unethical. As the majority of regions identified were shared across chondrocyte zones and uninformative on specific anatomical evolution and patterns, use of chondrocytes derived from chimp iPS cells, while providing a general “chondrocyte” signal, would not inform on anatomically specific open chromatin regions.

Nevertheless, given the expectation that changes to human knee development are mediated at the regulatory level, accelerated regions and more targeted nucleotide changes in existing knee-specific elements (i.e., substitutions modifying TF binding) likely participated in altering conserved regulatory networks. For example, biased occurrence of human-chimp changes intersecting motifs of chondrogenesis-associated factors suggests that some network components were modified (KLF5 and CTCF) while ancient functional constraints precluded the exploitation of others (FOXP1/FOXP2). Once modified, these regulatory elements fell under new functional constraints to maintain the derived human knee, manifesting in reduced genetic diversity within modern humans at these sites relative to orthologous ape sequences and pleiotropically acting knee elements.

Evolutionary Insights on the Genetic Risk of OA and Causal Variant Discovery

Despite evidence of ancient selection, human variants in knee elements show little evidence of recent positive selection (i.e., within the past 30,000 years), display weak population-level differentiation, exhibit no biases in their intersection of predicted TF motifs, and are enriched near genes that impact OA. These point to the predominant effects of genetic drift on functionally constrained regulatory elements in joint disease. To this end, we found that when regulatory constraint is violated (e.g., through the emergence of a variant disrupting a TF motif) there is greater likelihood for pathogenic consequences. Knee elements more variable in humans, but not chimps, are linked with OA gene annotations, knee-specific elements are enriched for OA GWAS variants, and notably, while predicted FOXP1/FOXP2 motifs are protected against human-chimp substitutions, they were also those likely to be disrupted by OA risk variants. We find it interesting that conditional excision of FoxP2 using Prx1-cre mice leads to knee defects, changes in articular cartilage thickness, and OA (Xu et al., 2018). This suggests that numerous genetic perturbations to constrained FOXP binding sites might underlie human OA risk. Importantly, we also observed impacts at transforming growth factor β (TGF-β) and BMP loci, which not only exhibit signals of adaptive evolution but are involved in knee development (Lyons and Rosen, 2019), OA (Wang et al., 2014; Wu et al., 2016), and possess GWAS variants overlapping knee-specific elements (e.g., BMPR1B and GDF5).

We formulate a model of human knee evolution that impacts OA risk at the regulatory level (Figure 4E). Normally, it would be expected that variants violating constraint in particular regulatory sequences (i.e., those integral to knee development, structure, and/or maintenance) would be selected against due to the fitness consequences of an inefficient bipedal gait. However, recently these violations may have become tolerated due to buffers (e.g., improved health, medical care, footwear, etc.). If so, their selective removal on a genome-wide level would become less efficient (Carnes and Olshansky, 1993), permitting retention and accumulation. Additionally, prehistorically tolerated variants in these elements (i.e., those that only mildly influenced OA risk in the past) may have become more potent risk effectors given complex interactions with modern factors, including obesity, inflammation, and activity changes (Hunter and Bierma-Zeinstra, 2019). In both contexts, given typical late-life onset of OA, the deleterious consequences of variants on morphology and joint homeostasis may emerge later, as joint stress compounds over decades, contributing to elevated disease risk. We observed that chondrocyte open-chromatin regions collectively capture a significant portion of OA heritability, and the OAI patient cohort exhibited a greater proportion of alternative alleles in knee elements specifically. These indicate an increased common variant load in constrained elements, consistent with our model (Figure 4E).

However, it remains unclear how OA prevalence differs between primates. Studies on baboons reveal similar prevalence to humans (Macrini et al., 2013), whereas in wild and captive apes OA prevalence is lower than humans and captive baboons (Lowenstine et al., 2016). This suggests that the nature of OA in humans is quite different compared to African apes. Understanding prevalence is also complicated, because wild catarrhine primates typically show high levels of past trauma, which can be associated with OA and obscure genetic effects (Lowenstine et al., 2016). We did find that primate sequence conservation and the predicted age of variants significantly associate with OA heritability. The latter has been suggested to represent negative selection acting on complex diseases, wherein recent risk variants have had less time to be removed (Gazal et al., 2017). These findings support our model and suggest that part of human OA heritability may result from how genetic variants violate a phylogenetic sequence constraint retained in primates.

Our evolutionary model suggests that violations to constraint may arise not only via drift but also through antagonistic pleiotropy. Recent selection acting on other traits may oppose the functional constraints of a derived knee. Furthermore, recent selection may target the same loci used to sculpt the hominin knee because of their recurring importance to development, growth, and homeostasis of the musculoskeletal system (Salazar et al., 2016; Chan et al., 2010; Capellini et al., 2017) and underlying complex modular cis-regulatory systems (Chen et al., 2016; DiLeone et al., 1998; Guenther et al., 2008; Indjeian et al., 2016). While we did not find enrichment of OA variants nor knee-specific regulatory elements in regions of recent positive selection, several loci were identified that fell in selection windows and may represent examples of pleiotropy-driven sequence violation. Here, we show that one OA risk variant (rs6060369), present on a haplotype under recent positive selection for height (Capellini et al., 2017), is located in a classic BMP locus involved in knee development, GDF5, and in a knee enhancer (R4) located using our ATAC-seq strategy. This enhancer and its variant (rs6060369) alter knee regulation in vivo and in vitro and contribute to OA susceptibility.

The R4 enhancer is found in humans, mice, and anoles (Wang et al., 2018), and thus is part of a conserved BMP network in knee development. Enhancer loss in human chondrocytes downregulates GDF5 expression, whereas its loss in mice (R4−/−) downregulates Gdf5 expression locally and alters knee shape at the specific anatomical locations of activity in prenatal and early postnatal development. Note, the localized knee defects seen in R4−/− mice are only a subset of those plaguing Gdf5 null knees (Pregizer et al., 2018), revealing the modular effects that regulatory elements have on morphology and disease risk. While the R4−/− knee is functional, these mice do not require any experimental insult (e.g., medial meniscus surgical destabilization) to recapitulate several features of human OA progression.

The human OA risk “T” allele at rs6060369 on the selected haplotype now exhibits frequencies between 40%–70% across Eurasia, making it present in billions of people, and confers a 1.3- to 1.8-fold increase in risk (Miyamoto et al., 2007; Zengini et al., 2018). As it resides at a PITX1 binding site, whose binding occurs in human and mouse, this risk variant alters an evolutionary-constrained transcriptional complex governing R4 function, which our model would predict to be causally linked to OA risk. Mice hetero/homozygous for the human risk “T” allele display reductions to Gdf5 expression in vivo and present with alterations in some of the same structures and directions of effect as found in R4 null mice. Yet, this variant modifies knee shape not so dramatically as to disrupt development or early postnatal locomotion. We suggest that by slightly altering knee shape in mice and humans, abnormal joint biomechanics and excessive cartilage wear and degeneration over time may lead to OA, especially when compounded with other risk factors. However, we cannot rule out other non-biomechanical influences for this variant on knee phenotypes. We believe both novel-engineered mouse models will provide effective and realistic tools for testing OA treatments in the context of genetic and non-genetic risk factors. Moreover, that the single base-pair change has such an observable effect on phenotype lends support for GWAS variant testing in the mouse, especially when it permits the proper 3D and physiological context to observe an effect. Overall, the model proposed here represents a generalized means by which hypotheses linking functional sequences, evolutionary history, and modern-day disease may be made. We present genetic and functional analyses, which seek to test such hypotheses in the context of heritable OA risk, and suggest an evolutionary framework in considering the molecular underpinnings of complex diseases as a useful avenue for scientific research.
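The allele-frequency range quoted above for the “T” allele (40%–70%) can be translated into an approximate carrier fraction with a short calculation. This sketch assumes Hardy-Weinberg equilibrium, which is an assumption added here for illustration and is not stated in the text; the function name is hypothetical.

```python
def carrier_fraction(allele_freq):
    """Fraction of individuals with at least one copy, assuming Hardy-Weinberg.

    Non-carriers are homozygous for the other allele, with expected
    frequency (1 - p)^2, so carriers make up 1 - (1 - p)^2.
    """
    return 1.0 - (1.0 - allele_freq) ** 2

# Bounds of the frequency range quoted in the text.
carriers_low = carrier_fraction(0.40)   # about 0.64
carriers_high = carrier_fraction(0.70)  # about 0.91
```

Under this assumption, roughly two-thirds to nine-tenths of individuals in such populations would carry at least one copy, which is consistent with the statement that the allele is present in billions of people.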

STAR★METHODS

LEAD CONTACT AND MATERIALS AVAILABILITY

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Terence Capellini (tcapellini@fas.harvard.edu).

During the course of this study, three separate experimental mouse lines (R4 enhancer lacZ line, R4+/− enhancer null, and R4rs6060369-T/+ single allelic replacement) were generated and are available from the Lead Contact after completion of a written Material Transfer Agreement between Harvard University and the requesting institution. Additionally, all ATAC-seq data raw sequencing fastq files and processed peak bed files have been deposited on NCBI GEO (GSE122877). The mouse lines and ATAC-seq datasets, described in this manuscript, are also outlined in the Key Resources Table.

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
PITX1 (G-4) X antibody Santa Cruz Biotechnology, NJ, USA sc-271435; RRID: AB_10658969
Bacterial and Virus Strains
E. coli DH10B ThermoFisher Scientific 18297010
Biological Samples
Human E59 embryonic sample University of Washington Birth Defects Research Laboratory H27242
Human Knee MRI Image Data OA Biomarkers Consortium Project for the National Institutes of Health (FNIH) Project http://oai.epi-ucsf.org./datarelease/
Chemicals, Peptides, and Recombinant Proteins
Collagenase II VWR 80056–222
Nextera Tn5 transposase Illumina #FC-121–1030
Critical Commercial Assays
SuperScript III First Strand cDNA Synthesis Reaction kit Life Technologies 18090010
Pyrosequencing PSQ96 HS System QIAGEN PSQ H96A
Direct-zol RNA Miniprep kit Zymo Research Corporation R2072
Applied Biosystems Power SYBR master mix Thermo Fisher Scientific 4368577
Lipofectamine 2000 Invitrogen 11668–019
Zymo-SpinTM ChIP Kit Zymo Research Corporation D5210
NEBNext High-Fidelity PCR Master Mix NEB M0541L
KAPA Library Quantification Complete Kit KAPA KK4824
Deposited Data
E15.5 Mouse Appendicular Skeleton ATAC-seq This paper GEO: GSE122877
E59 Human Appendicular Skeleton ATAC-seq This paper GEO: GSE122877
Mouse reference genome NCBI build GRCm38.p6 Genome Reference Consortium https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.26
Human reference genome NCBI build 37, GRCh37 Genome Reference Consortium https://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
BMDC ChIP-seq (H3K27ac) Roadmap Epigenomics GEO: GSE17312
Human E47 Limb Bud H3K27ac ChIP-seq Cotney et al., 2013 GEO: GSM1039552 GSM1039553
GM12878 ATAC-seq Buenrostro et al., 2013 GEO: GSE47753
Pitx1 ChIP-seq Infante et al., 2013 GEO: GSE41591
ENCODE TF ChIP-seq ENCODE Project Consortium, 2012 See Table S2, Sheets 4–6 for accession codes
Mouse CTCF ChIP-seq DeMare et al., 2013 GEO: GSE42237
Mouse Sox9 ChIP-seq Ohba et al., 2015 GEO: GSE69109
Mouse Runx2 ChIP-seq Meyer et al., 2014 GEO: GSM1027496
Mouse RARb He et al., 2013 GEO: GSE53736, GSM1299599
Mouse Shox2 ChIP-seq Ye et al., 2016 GEO: GSM2177161
PhyloP20ways (hg38) UCSC http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP20way/
PhyloP100ways (hg38) UCSC http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP100way/
Multiz20way Alignment UCSC http://hgdownload.cse.ucsc.edu/goldenpath/hg38/multiz20way/
1000 Genomes Phase 3 Data 1000 Genomes Project Consortium ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
Great Apes Diversity Project (GADP) Data Prado-Martinez et al., 2013 https://eichlerlab.gs.washington.edu/greatape/data/
RepeatMasker (hg19) UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables
UK Biobanks Summary Statistics - Self-reported OA http://www.nealelab.is/uk-biobank/ http://ldsc.broadinstitute.org/gwashare, http://www.nealelab.is/uk-biobank/
LDSC Baseline LD Model, 1KG3 reference allele frequencies, plink files and weights Alkes Group https://data.broadinstitute.org/alkesgroup/LDSCORE
Estonian Biocenter Human Genome Diversity Panel (EGDP) Pagani et al., 2016 https://evolbio.ut.ee/CGgenomes_VCF/
Human-Chimp Whole-Genome Alignment (hg38 - panTro4) UCSC http://hgdownload.cse.ucsc.edu/goldenPath/hg38/vsPanTro4/
Experimental Models: Cell Lines
Human embryonic kidney (HEK293T) ATCC CRL-11268
Human T/C-28a2 chondrocytes Mary Goldring SCC042
Mouse NIH 3T3 ATCC CRL-1658
Experimental Models: Organisms/Strains
R4 enhancer lacZ line Harvard Genome Modification Facility PHC21
R4+/− enhancer null mouse line Harvard Genome Modification Facility R4(R37)
R4rs6060369-T/+ single allelic replacement mouse line Applied StemCell MC140
C57BL/6J, wildtype Jackson Laboratories 000664
129X1SVJ, wildtype Jackson Laboratories 000691
FVB/NJ, wildtype Jackson Laboratories 001800
Oligonucleotides
See Table S9 for generated primers used in study This Paper N/A
See Table S9 for generated sgRNAs used in study This Paper N/A
sg1: ACAGGGTGGGAGCGCTCAAT Applied StemCell N/A
sg2: GGGTGGGAGCGCTCAATAGG Applied StemCell N/A
Chr2-Reg-III-PM.F CAGGATTCGATG GCTGCTATTACTGAATG Applied StemCell N/A
Chr2-Reg-III-PM.R CACCTCAGCA GAGAGCG Applied StemCell N/A
Allele-specific Expression primers EpigenDx N/A
Recombinant DNA
PX458 plasmid Addgene 101731
pGL4.23 Firefly luciferase Promega E8411
pGL4.74 Renilla luciferase Promega E6921
Software and Algorithms
Bowtie version 2.3.2 Langmead and Salzberg, 2012 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Samtools version 1.5 Li, 2011a http://samtools.sourceforge.net/
MACS2 version 2.1.1.2 Zhang et al., 2008 https://github.com/taoliu/MACS
Irreproducible Discovery Rate Li et al., 2011 https://www.encodeproject.org/software/idr/, https://codeload.github.com/nboley/idr/tar.gz/2.0.0
Bedtools version 2.26.0 Quinlan and Hall, 2010 https://bedtools.readthedocs.io/en/latest/
HOMER version 4.10 Heinz et al., 2010 http://homer.ucsd.edu/homer/
R base version 3.4.4 R Development Core Team, 2008 https://www.r-project.org/
regioneR version 1.8.1 Gel et al., 2016 http://bioconductor.org/packages/release/bioc/html/regioneR.html
GREAT version 3.0.0 McLean et al., 2010 http://great.stanford.edu/public/html/
car version 3.0.3 Fox and Weisberg, 2019 https://cran.r-project.org/web/packages/car/index.html
fitdistrplus version 1.0–14 Delignette-Muller and Dutang, 2015 https://cran.r-project.org/web/packages/fitdistrplus/index.html
Tabix version 1.7.24 Li, 2011a http://www.htslib.org/doc/tabix.html
bcftools version 1.8 Li, 2011a https://samtools.github.io/bcftools/bcftools.html
vcftools 0.1.15 Danecek et al., 2011 https://vcftools.github.io/
ProxyFinder Raychaudhuri Lab https://github.com/immunogenomics/harmjan/tree/master/ProxyFinder
LDSC Finucane et al., 2015 https://github.com/bulik/ldsc
Pegas version 0.11 Paradis, 2010 https://cran.r-project.org/web/packages/pegas/index.html
ComplexHeatmap version 1.17.1 Gu et al., 2016 https://github.com/jokergoo/ComplexHeatmap
Factoextra version 1.0.5 Kassambara and Mundt, 2017 https://cran.r-project.org/web/packages/factoextra/index.html
Cluster version 2.0.7–1 Maechler et al., 2018 https://cran.r-project.org/web/packages/cluster/index.html
Plink version 1.90 Chang et al., 2015 www.cog-genomics.org/plink/1.9/
Haploview version 4.2 Barrett et al., 2005 https://www.broadinstitute.org/haploview/haploview
RaxML version 7.3.0 Stamatakis, 2006 https://github.com/stamatak/standard-RAxML
motifbreakR version 1.6.0 Coetzee et al., 2015 https://bioconductor.org/packages/release/bioc/html/motifbreakR.html
MotifDb version 1.6.0 Shannon and Richards, 2014 http://bioconductor.org/packages/release/bioc/html/MotifDb.html
Haploreg version 4.162 Ward and Kellis, 2012 https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php
Uniprobe Hume et al., 2015 http://the_brain.bwh.harvard.edu/uniprobe/about.php/
Crossmap version 0.3.3 Zhao et al., 2014 http://crossmap.sourceforge.net/
BEAGLE5 Browning et al., 2018 https://faculty.washington.edu/browning/beagle/beagle.html
Other
Allele-specific Expression Assays EpigenDx N/A
MicroCT imaging Center for Skeletal Research, MGH N/A
Primer oligonucleotide synthesis Integrated DNA Technologies, Inc N/A
Human oligonucleotide synthesis TwistBioscience N/A
ATAC-sequencing Harvard Bauer Core Facility N/A
CRISPR Mi-seq sequencing MGH CCIB DNA Core N/A
Mouse R4 element targeting Applied StemCell & Harvard Genome Modification Facility N/A

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Animals: Three mouse lines were generated:

  1. The R4 enhancer lacZ line contains the mouse R4 enhancer cloned upstream of a minimal promoter and the lacZ reporter gene and was generated via transgenesis on the FVB/NJ Mus musculus background by the Harvard University Genome Modification Facility (HUGMF). Upon receipt of founder mice, the line was continually backcrossed on FVB/NJ for at least 4 generations. We followed standard breeding and husbandry protocols to maintain and expand the line; and both males and females were used in pre- and post-natal experimental studies examining expression at joint sites at E14.5, E15.5, P0, P56 and P180 (6 months). All breeding, husbandry, euthanasia, and experimental protocols strictly followed IACUC-approved protocols (Capellini: 13–04-161–2) at Harvard University.

  2. The R4+/− enhancer null mouse line contains a deletion of the R4 enhancer and was generated on the C57BL/6J Mus musculus background by the Lead Contact and his laboratory. Upon receipt of founder mice from the HUGMF, each of three separate founder lines (constituting lines 3, 7, and 8) was continually backcrossed on C57BL/6J for at least 4 generations. We followed standard breeding and husbandry protocols to maintain and expand each line. Males from all three lines were used in post-natal experimental studies examining morphology and histology at P30 and P365 (1 year), with cohorts of different genotypes per sex being subjected to the above mentioned methods. All breeding, husbandry, euthanasia, and experimental protocols strictly followed IACUC-approved protocols (Capellini: 13–04-161–2) at Harvard University.

  3. The R4rs6060369-T/+ single allelic replacement mouse line contains a single “T” allelic base-pair replacement of the orthologous human rs6060369 variant in the R4 enhancer and was generated on the C57BL/6J Mus musculus background by Applied StemCell. Upon receipt of founder mice, the line was continually backcrossed on C57BL/6J for at least 4 generations. We followed standard breeding and husbandry protocols to maintain and expand the line; and both males and females were used in post-natal experimental studies examining morphology and histology at P56, with cohorts of different genotypes per sex being subjected to the above mentioned methods. All breeding, husbandry, euthanasia, and experimental protocols strictly followed IACUC-approved protocols (Capellini: 13–04-161–2) at Harvard University.

Cell lines: Human embryonic kidney (HEK293FT, female) cells were acquired from Dr. Pardis Sabeti (Harvard University and Broad Institute). T/C-28a2 human chondrocyte cells (female) were acquired from Dr. Li Zeng (Tufts University) courtesy of Dr. Mary Goldring (The Hospital for Special Surgery). Both cell lines were cultured at 5% CO2 at 37°C in Dulbecco’s Modified Eagle’s Medium (DMEM), 10% fetal bovine serum (FBS), and 1% penicillin-streptomycin (P/S). Media was replaced every 2–3 days and the cells were sub-cultured every 5 days. After receipt of each cell line from source institution, cells were passaged and used in experimental assays without additional STR authentication or mycoplasma testing.

Human Samples: The human product of conception at gestational day (E) 59, female, was collected from late first-trimester termination through the Laboratory of Developmental Biology at the University of Washington in full compliance with the ethical guidelines of the National Institutes of Health and with the approval of the University of Washington Institutional Review Boards for the collection and distribution of human tissues for research, and Harvard University for the receipt and use of such materials. The Laboratory of Developmental Biology obtained written consent from all tissue donors. The University of Washington Birth Defects Research Laboratory was supported by NIH award number 5R24HD000836 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development. Harvard University IRB determined this sample constitutes Non-Human Subjects Determination Status (Capellini: IRB16–1504). The Lead Contact (Capellini) received no federal funds (e.g., NIH) to acquire, receive, process, or utilize this sample. The human sample was briefly washed in HBSS and transported at 4°C during shipment. Upon arrival the sample was dissected under a light dissection microscope in identical fashion to all mouse samples reported above, and subjected to the ATAC-seq protocol described below, and following approved Harvard University IRB (Capellini: IRB16–1504) and COMS (Capellini: 18–103) protocols.

Bone and cartilage measurements of 600 knees were derived from the OA Biomarkers Consortium Project of the Foundation for the National Institutes of Health (FNIH), which used male and female patient knee MRI images from the Osteoarthritis Initiative (OAI) (see http://oai.epi-ucsf.org and https://fnih.org/what-we-do/biomarkers-consortium/programs/osteoarthritis-project). The OAI is a public–private partnership between the NIH (contracts N01-AR-2–2258, N01-AR-22259, N01-AR-2–2260, N01-AR-2–2261, and N01-AR-2–2262) and private funding partners (Merck Research Laboratories, Novartis Pharmaceuticals, GlaxoSmithKline, and Pfizer, Inc.) and is conducted by the OAI Study Investigators. Private sector funding for the OAI is managed by the Foundation for the NIH. The OAI was also funded by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (grant HHSN-268201000019C). Harvard University IRB determined that the manner in which these samples will be utilized constitutes Non-Human Subjects Determination Status (Capellini: IRB19–0019).

METHOD DETAILS

Animal Models

R4+/− enhancer null line:

The following steps were used to generate R4 enhancer null mice:

1. sgRNA guide design for targeting R4

We used http://zlab.bio/guide-design-resources to design sgRNAs, and ordered sgRNA oligo from Integrated DNA Technologies, Inc. The guide sequence information is provided in Table S9.

2. sgRNAs cloning, cell transfection, and deletion of the R4 element in vitro.

We cloned sgRNAs targeting R4 into the PX458 plasmid following a published protocol (Ran et al., 2013) (step 5-part B) and tested the efficiency of the sgRNAs in vitro in mouse NIH 3T3 cells. Briefly, NIH 3T3 cells were maintained in DMEM (GIBCO) supplied with 10% Fetal Calf Serum (GIBCO) and 1% Pen/Strep (0.025%). Cells were passaged once every 3 days. We seeded 0.3 × 10⁶ cells per well of a 6-well plate 1 day prior to transfection. On the day of transfection, cell culture media was replaced with fresh complete media. Next, 1 μg sgRNA1-PX458 and 1 μg sgRNA2-PX458 were combined with 250 μL Opti-MEM (Invitrogen), while 6 μL Lipofectamine 2000 (Thermo Fisher Scientific) and 250 μL Opti-MEM (Invitrogen) were combined separately. Both reactions were incubated at room temperature for 5 minutes, and then combined and incubated for 25 minutes. We added the entire 500 μL into each well, and placed the transfected cells at 37°C with 5% CO2. After 48 hours of transfection, we observed the plates under a GFP-microscope (Zeiss, Inc), noting strong GFP signals at ~70% efficiency. Cells were then harvested, and DNA was extracted using the E.Z.N.A Tissue DNA Kit according to the manufacturer’s protocol. Next, we tested the efficiency of the sgRNAs using touch-down PCR. PCR components included 10 μL Taq PCR Master Mix (BioBasic), 1 μL (5 μM) Forward Primer, 1 μL (5 μM) Reverse Primer, and 1 μL ddH2O. Touch-down PCR cycling conditions were: initial 94°C for 4 minutes; 35 cycles of 94°C for 30 s, 58–52°C for 30 s (ramping down 1°C every 5 cycles), and 72°C for 1 minute; and a final 72°C for 10 minutes. Primer information is listed in Table S9. PCR products were then run on a 1% agarose gel containing 0.5 μg/mL of Ethidium bromide at 100V for 1 hour. The DNA bands visualized on the gel indicated the targeted, deleted region of 1.56kb encompassing the R4 element. After gel purification using the E.Z.N.A Gel Extraction Kit, amplicons were sent to Eton Bioscience Inc for sequencing. 
Sequencing confirmed that the R4 element was deleted in vitro.
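
The touch-down annealing step described above (58–52°C, dropping 1°C every 5 cycles over 35 cycles) can be written out cycle by cycle. The following is a minimal illustrative sketch, not part of the original protocol; the function name and its parameters are hypothetical and simply encode the stated cycling conditions.

```python
def touchdown_annealing_schedule(start_c=58, end_c=52, step_c=1, cycles_per_step=5):
    """Return the annealing temperature (deg C) used in each PCR cycle.

    Defaults encode the stated conditions: start at 58 C, drop 1 C
    every 5 cycles, and finish at 52 C.
    """
    schedule = []
    temp = start_c
    while temp >= end_c:
        # Hold the current annealing temperature for a block of cycles.
        schedule.extend([temp] * cycles_per_step)
        temp -= step_c
    return schedule


schedule = touchdown_annealing_schedule()
print(len(schedule))                      # total number of cycles
print(sorted(set(schedule), reverse=True))  # distinct annealing temperatures
```

Seven temperature steps of five cycles each account for the 35 cycles stated in the protocol.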

3. Deletion of the R4 element in vivo

Following previously published protocols (Capellini et al., 2017; Ran et al., 2013), in vitro transcription of sgRNAs targeting R4 was performed, with products then sent to HUGMF for microinjection into wild-type C57BL/6J pronuclei. Afterward, we recovered 4 F0 mice, and in all cases deletion of the 1.56kb region encompassing R4 in founders was confirmed on DNA extracted from mouse tails using the strategy described above with PCR primers listed in Table S9. These F0 founders were then crossed once again to wild-type C57BL/6J mice to confirm allelic transmission, upon which we recovered three separate functional R4+/− F1 lines (constituting lines 3, 7, and 8). All three lines were then backcrossed for 4 generations to C57BL/6J wild-type mice to purify each line and remove possible off-target effects. Additionally, given the finding that sgRNAs can cause local alterations (deletions/insertions/rearrangements/substitutions) within a 5kb vicinity of the intended target site (Kosicki et al., 2018), homozygous null mice were screened using local primers targeting this interval followed by Sanger sequencing, which revealed no unintended modifications.

R4rs6060369-T/+ single allelic line:

Initially, this variant position was examined across all sequenced strains of mice in order to find evidence of the “T” variant change. Interestingly, in C57BL/6J mice as well as all other 88 mouse strains reported in the MGI strain database, the “T” variant is absent; instead, these mice naturally harbor a different, fixed ancestral variant “A” compared to the primate ancestral “C” allele found in some humans. We next examined patterns of conservation across sequenced vertebrates to identify instances of the derived “T” allele as compared to the mouse “A” allele and human ancestral “C” allele. As the R4 element is highly conserved through Coelacanth, 83 sequenced Sarcopterygii had detectable orthology at this element, with 43 species possessing the fixed mouse “A” variant, 36 species possessing the human ancestral “C” variant (including catarrhine primates), 1 possessing a deletion of the element (cape elephant shrew in Afrotheria), and only 3 species [squirrel monkey (a single point mutation), brush tailed rat (a two base mutation), X. tropicalis (general poor sequence orthology)] possessing some form of the “T” variant. This indicated that in general the “T” variant is especially novel in humans compared to all sequenced catarrhines including hominoids, and extremely rare across sequenced vertebrates. Therefore, we sought to replace the orthologous “A” variant, which appears to be tolerated evolutionarily across multiple Sarcopterygii and is fixed within laboratory mice, with the derived human risk allele “T.”

Targeting of the mouse ortholog of the human rs6060369 variant for replacement with the human risk variant at this locus was performed in collaboration with Applied StemCell. Specifically, the goal was to use CRISPR-Cas9 gene targeting to introduce a single base-pair change at chr2:155,863,176 in the C57BL/6J mouse strain, in which the ancestral “A” variant was substituted by the human risk “T.” To achieve this, a mixture containing in vitro transcribed active sgRNAs (sg1: ACAGGGTGGGAGCGCTCAAT and sg2: GGGTGGGAGCGCTCAATAGG), a single-stranded oligo deoxynucleotide (ssODN), and Cas9 protein was first microinjected into C57BL/6J embryos acquired from The Jackson Laboratory. Next, initial founder mice were screened for mutations in the region by extracting DNA from tail tissues and then PCR amplifying the target region using two primers (Chr2-Reg-III-PM.F CAGGATTCGATGGCTGCTATTACTGAATG and Chr2-Reg-III-PM.R CACCTCAGCAGAGAGCG). All PCR amplifications were prepared in 25 μL using MyTaq™ Red Mix (Bioline, Cat# BIO-25044) and were carried out using the following program: 95°C for 2 minutes; 35 cycles of 95°C for 15 s, 60°C for 15 s, 72°C for 1+ minutes depending on amplicon size; and 72°C for 5 minutes. Finally, next generation sequencing (NGS) libraries were prepared and subsequently sequenced using Illumina PCR genotyping to identify the chr2:155,863,176A > T point mutation.

After microinjecting greater than 300 embryos, and screening of five potential founders, we recovered one F0 mouse with the “T” replacement confirmed using NGS on DNA extracted from its tail. As assessed by sequencing, this F0 founder lacked any other artifactual alteration to the locus. This F0 was then crossed once again to wild-type C57BL/6J mice to confirm allelic transmission, upon which we recovered a number of transmitting F1 C57BL/6J R4rs6060369-+, rs6060369-T males and females. These mice were then backcrossed for four generations to C57BL/6J wild-type mice to remove possible off-target effects. Additionally, given the finding that sgRNAs can cause local alterations (deletions/insertions/ rearrangements/substitutions) within a 5kb vicinity of the intended target site (Kosicki et al., 2018), homozygous replacement mice were screened using local primers targeting this interval followed by Sanger sequencing, resulting in no non-intended modifications. Subsequently, C57BL/6J R4rs6060369-+, rs6060369-T mice were intercrossed to generate C57BL/6J R4rs6060369-+, rs6060369-+, R4rs6060369-+, rs6060369-T, and R4rs6060369-T, rs6060369-T mice for all downstream functional experiments.

Phenotypic Assessment (of mouse lines)

MicroCT and morphometrics:

To quantify phenotypes in R4+/+, R4+/−, and R4−/− mice, right femur and tibia of 15 male P30 mice (5 wild-type, 5 heterozygous, and 5 homozygous) and 35 male 1 year old mice (6 wild-type, 15 heterozygous and 14 homozygous) were scanned using high-resolution Micro-Computed Tomography (μCT40, SCANCO Medical AG, Brüttisellen, Switzerland) at the Center for Skeletal Research (CSR) at Massachusetts General Hospital, and in the laboratory of Vicki Rosen at the Harvard School of Dental Medicine. To quantify phenotypes in R4rs6060369-+, rs6060369-+, R4rs6060369-+, rs6060369-T, and R4rs6060369-T, rs6060369-T mice, right femur and tibia of 20 male/female P56 mice (6–8 per genotype) were also scanned. Scan parameters were: 12 μm3 isotropic voxel size, 70 kVp peak X-ray tube intensity, 114 mA X-ray tube current, and 200 ms integration time. While there is an effect of sex on knee osteoarthritis risk in humans, with females generally showing higher risk than males, at the GDF5 locus there has been no identified sex-specific effect on osteoarthritis risk (Miyamoto et al., 2007; Zengini et al., 2018). We therefore chose to only include males in our study. Digital Imaging and Communications in Medicine (DICOM) images were exported for measurements of several femoral and tibial anatomical indices in OsiriX MD v7.5 (Pixmeo SARL, Bernex, Switzerland), using clinically established protocols (Charles et al., 2013; Hashemi et al., 2008; Howell et al., 2010; Schwartz, 1989; Sonnery-Cottet et al., 2011) (Figure S5A). Micro-CT images were also used to generate 3D models of each bone using a bone segmentation process in commercially available image processing software (Mimics v17.0, Materialise). The 3D models were then imported into the 3-matic software package (v9.0, Materialise) and co-registered using a global n-point registration technique. The registered models from the wild-type and homozygous mice were then used to generate 3D heatmaps indicating the geometrical differences between the genotypes at P30 and at 1 year. The heatmaps were generated by calculating the distance between the corresponding points in co-registered models, where dark blue indicates the maximum deviation in the negative direction and red indicates the maximum deviation in the positive direction.

3D X-ray microscopy (XRM):

Hind limbs (n = 6 R4+/+; n = 10 R4−/−) were fixed in 10% neutral buffered formalin (NBF) for 24h, then carefully disarticulated under a dissection microscope to expose the articular cartilage of the femur and tibia, and subsequently stored in 70% ethanol solution. Femora and tibiae were incubated in 1% phosphotungstic acid (PTA) contrast enhancer at room temperature for 12–18h before imaging (Das Neves Borges et al., 2014). Samples were attached to and enclosed in a custom specimen chamber and imaged in a high-resolution X-ray microscopy (XRM) scanner (Zeiss, Xradia 520 Versa) employing a low-energy (40 kVp voltage, 3W power) X-ray source. XRM scans (2001 projections, 360° scan) were performed using a 4X objective, with a resolution of 4.4 μm/pixel. The 3D XRM datasets of the isolated femurs and tibiae were imported into Amira software Version 6.0.1 (FEI, Portland, USA) for image reconstruction and visualization of cartilage injury in the HOM and WT mice.

X-gal staining:

Whole-mount staining for β-galactosidase activity was performed as previously described (Capellini et al., 2017). Gestational day (E) 14.5 R4 enhancer lacZ positive embryos were hemisected and fixed in 4% paraformaldehyde (PFA) (Sigma, 158127) at 4°C, according to gestational-day guidelines. Fixed embryos were washed three times in wash buffer and stained for 16–24 hours in the dark with 1 mg/ml X-gal (Sigma, B4252) in staining buffer at room temperature. Adult R4 enhancer lacZ positive mice were skinned and eviscerated, and the limbs were removed. Each limb then had its most superficial muscles removed and its joint capsules gently pierced to permit solutions to enter the joint cavity. Fixation and staining times were adjusted accordingly for these post-natal specimens. After staining, embryos and post-natal limbs were briefly rinsed in wash buffer and post-fixed in 4% PFA at 4°C for 5 hours. For sectioning, X-gal stained embryos were placed first in sucrose and then embedded in gelatin/sucrose solution and cryo-sectioned at 25 μm. Sections were counterstained with Nuclear Fast Red (Vector labs, #H-3403).

Assessment of osteoarthritis:

To assess the cartilage status in R4 enhancer null mice, right hind limbs of 12 male P30 mice (3 wild-type and 9 homozygous) and right hind limbs of 20 male 1 year old mice (6 wild-type and 14 homozygous) were obtained for histological analysis. Given that we observed significant changes in shape at P30 and 1 year in R4−/− mice (with only a trend of such effect in R4+/− mice), we chose to only examine wild-type and R4−/− mice for signs of osteoarthritis at P30 and 1 year. This also held true for R4rs6060369+/ mice compared to R4rs6060369 homozygotes. After removal of the skin and excess muscle, all limbs were fixed in 10% neutral buffered formalin for 24 hours. The 1 year old mice were stored in 70% ethanol until completion of the micro-computed tomography. Note that each litter of mice consisted of all genotypes, and each litter was stored in 70% ethanol for the exact same duration. The limbs were then decalcified in 14% Ethylenediaminetetraacetic acid (EDTA) at pH 7.5 for 4 and 8 days for P30 and 1 year old specimens, respectively. The specimens were then dehydrated by incubation in 70% ethanol, 95% ethanol, 100% ethanol, 1:1 ethanol-xylene-solution, 100% xylene (all at room temperature), and paraffin (at 60°C) (Sigma). The latter was exchanged once before embedding. Each incubation period was 12 hours for P30 and 1 day for 1 year old specimens. The specimens were then bisected in a frontal plane and embedded. The formalin-fixed paraffin-embedded tissues were then cut into 6 μm sections and a representative section from the center of each knee was selected for alcian blue staining at pH 2.5 with nuclear fast red counterstain. Alcian blue was selected for the evaluation of cartilage integrity in this study, as it is commonly used to stain skeletal preparations in developmental biology investigations (in combination with Alizarin Red S), and thus allows highlighting cartilaginous elements in a coherent color across all of our experiments. The P30 specimens were stained in one batch at Boston Children’s Hospital, while the 1 year old specimens were stained at Tufts Medical Center because one of the co-authors (JS) moved institutions. All 1 year specimens were initially stained in one batch. If a specimen’s cartilage surface was not fully visualized due to artifactual folding of the cartilage on the slide, another adjacent section of the specimen was submitted for staining (batch 2). If required, a second additional section was submitted for staining (batch 3). For all three batches the identical staining protocol was utilized and no difference in staining intensity was observed between the three batches. Embedding, sectioning, and staining were performed without knowledge of the genotype.

Alcian blue stained sections were then scored by one reader (JTS) who was blinded to the genotype, according to the Osteoarthritis Research Society International (OARSI) histopathology initiative recommendations for the histological assessment of osteoarthritis in the mouse (Glasson et al., 2010). The intra-observer reproducibility has been shown to be high for this scoring system (Glasson et al., 2010). Briefly, medial tibial plateau (MTP), medial femoral condyle (MFC), lateral tibial plateau (LTP), and lateral femoral condyle (LFC) were assigned a semiquantitative score ranging from 0 to 6, with 0 representing a normal articular cartilage and 6 representing vertical clefts/erosion to the calcified cartilage extending > 75% of the articular surface. In addition, a score of 0.5 indicates the loss of alcian blue staining without structural changes. The sum score of all four joint surfaces was determined (referred to as OARSI Score in the manuscript) and selected as primary outcome. In addition, osteophytes were scored according to the OARSI recommendations, by assigning one semiquantitative score to the entire joint ranging from 0 to 3, with 0 indicating absence of osteophytes and 3 indicating severe osteophyte formation. Comparisons were only made between genotypes at single time points and not across time points.

One P30 limb (wild-type) disintegrated during processing and was omitted. A total of four individual joint surfaces were obscured despite re-staining and the respective scores were omitted from the sum OARSI Score (two wild-type and two homozygous).
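
The scoring scheme above (four joint surfaces, each scored 0 to 6, summed into one OARSI Score, with obscured surfaces omitted) can be illustrated with a short sketch. This is an illustrative example only: the function name, the use of `None` to mark an obscured surface, and the example values are assumptions, not the authors' scoring code or data.

```python
# Joint surfaces scored in the text: medial/lateral tibial plateau and femoral condyle.
SURFACES = ("MTP", "MFC", "LTP", "LFC")


def oarsi_sum_score(scores):
    """Sum per-surface scores (each 0-6); surfaces marked None are omitted.

    `scores` maps surface name -> numeric score, or None when the surface
    was obscured and could not be scored (hypothetical convention).
    """
    total = 0.0
    for surface in SURFACES:
        value = scores.get(surface)
        if value is None:
            continue  # obscured surface: leave it out of the sum
        if not 0 <= value <= 6:
            raise ValueError(f"{surface} score {value} is outside the 0-6 range")
        total += value
    return total


# Hypothetical knee: one obscured surface, one with staining loss only (0.5).
example_knee = {"MTP": 2, "MFC": 0.5, "LTP": None, "LFC": 1}
example_total = oarsi_sum_score(example_knee)
print(example_total)
```

Because omitted surfaces lower the maximum attainable sum, knees with missing surfaces are not strictly comparable to fully scored knees; the sketch makes that omission explicit rather than imputing a value.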

Allele-specific expression analyses (ASE):

Timed matings were established between C57BL/6J R4+/− heterozygous mice and 129X1/SvJ R4+/+ wild-type mice. Likewise, timed matings were established between C57BL/6J R4rs6060369-+, rs6060369-T heterozygous mice and 129X1/SvJ R4 rs6060369-+, rs6060369-+ wild-type mice. Pregnant females were sacrificed according to IACUC-approved protocols to acquire E15.5 embryos. Right and left hind limbs were stripped of all soft tissues, and distal femoral and proximal tibia chondrogenic tissues were dissected from each limb, with each tissue (e.g., left and right distal femur) placed in TRIzol reagent (15596–026, Ambion by Life Technologies) with a homogenizer bead, and then homogenized for 2 minutes with a tissue homogenizer (QIAGEN). Samples were then stored at −80°C. RNA was then isolated with TRIzol reagent and a Direct-zol™ RNA Miniprep Kit (supplied with DNase I, Zymo) (for R4 enhancer tests: n = 4 distal femoral biological replicates and n = 4 proximal tibia biological replicates; for R4 rs6060369 variant tests: n = 8 distal femur biological replicates and n = 8 proximal tibia biological replicates). Samples were then run on a Bioanalyzer to ensure RNA integrity numbers greater than 8. These RNA samples were then reverse transcribed with a SuperScript III First Strand cDNA Synthesis Reaction kit (18090010, Life Technologies) according to the manufacturer’s recommendations. Independently, tails from each embryo were used for R4 enhancer and R4 rs6060369 genotyping as described above.

cDNA samples were then sent to EpigenDx for allele-specific expression-assay design and execution. SNPs in the coding regions of Gdf5 (rs27340038), Cep250 (rs27339949), and Uqcc1 (rs3684985) were identified by EpigenDx. Pyrosequencing for SNP genotyping (PSQ H96A, QIAGEN Pyrosequencing) is a real-time sequencing–based DNA analysis that quantitatively determines the genotypes of single or multiple mutations in a single reaction. Briefly, 1 ng of sample cDNA was used for PCR amplification. PCR was performed with 10X PCR buffer (QIAGEN) with 3.0 mM MgCl2, 200 μM each dNTP, 0.2 μM each of the forward and reverse primers (available through EpigenDx), and 0.75 U of HotStar DNA polymerase (QIAGEN) per 30 μL reaction. The PCR cycling conditions were 94°C for 15 minutes; 45 cycles of 94°C for 30 s, 60°C for 30 s, 72°C for 30 s; and 72°C for 5 minutes. One primer of each PCR-primer pair was biotinylated to convert the PCR product to single-stranded DNA-sequencing templates with streptavidin beads and the PyroMark Q96 Vacuum Workstation. 10 μL of the PCR products was bound to streptavidin beads, and the single strand containing the biotinylated primer was isolated and combined with a specific sequencing primer (available through EpigenDx). The primed single-stranded DNA was sequenced with a Pyrosequencing PSQ96 HS System (QIAGEN Pyrosequencing) according to the manufacturer’s instructions. The genotypes of each sample were analyzed with Q96 software AQ module (QIAGEN Pyrosequencing).

qRT-PCR analysis (GDF5, CEP250, UQCC1):

Total RNA was extracted from T/C-28a2 cells (n = 3 biological replicates, with 4 technical replicates per experiment per condition) and prepared using the Trizol Reagent (Thermo Fisher Scientific, Springfield Township, New Jersey) and Direct-zol RNA Miniprep kit (ZYMO). Two micrograms of total RNA were used to synthesize first-strand cDNA using SuperScript III First-Strand Synthesis System (Thermo Fisher Scientific). qRT-PCR analysis was then performed with specific primers and Applied Biosystems Power SYBR master mix (Thermo Fisher Scientific) with GAPDH house-keeping gene as an internal control. Primers used for qRT-PCR are listed in Table S9.
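
The text states that GAPDH served as the internal control but does not name the quantification method. As an illustration only, the sketch below assumes the commonly used comparative Ct (2^−ΔΔCt) approach; that choice, the function name, and the example Ct values are all assumptions and are not taken from the manuscript.

```python
def relative_expression(target_ct, reference_ct, control_target_ct, control_reference_ct):
    """Relative expression by the comparative Ct method (assumed, not stated in the text).

    delta Ct       = Ct(target gene) - Ct(reference gene, e.g. GAPDH)
    delta-delta Ct = delta Ct(treated sample) - delta Ct(control sample)
    fold change    = 2 ** (-delta-delta Ct)
    """
    delta_ct_sample = target_ct - reference_ct
    delta_ct_control = control_target_ct - control_reference_ct
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle later in the treated
# sample relative to GAPDH, which corresponds to half the control expression.
fold_change = relative_expression(
    target_ct=24.0, reference_ct=18.0,
    control_target_ct=23.0, control_reference_ct=18.0,
)
print(fold_change)
```

This form assumes roughly 100% amplification efficiency for both primer pairs; if efficiencies differ, an efficiency-corrected model would be the more appropriate choice.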

Cloning Methods

R4 lacZ reporter line:

Cloning of constructs used to generate R4 lacZ stable-expressing reporter mouse lines is described above.

R4+/− enhancer null line:

Cloning of constructs used to generate CRISPR-Cas9 sgRNA for targeting R4 in vitro and in vivo is described above.

R4rs6060369-T/+ single allelic line:

Cloning of constructs used to generate CRISPR-Cas9 sgRNA for targeting rs6060369 ortholog in vivo was performed by Applied StemCell (see above).

sgRNAs targeting R4/rs6060369 in cell lines:

All sgRNAs flanking the human R4 regulatory element or a smaller 41 bp region within R4 containing the osteoarthritis-risk variant rs6060369 were designed using MIT CRISPR Tools (http://zlab.bio/guide-design-resources), synthesized by Integrated DNA Technologies, Inc (Coralville, Iowa), and cloned into the PX458 vector following published protocols (Ran et al., 2013). The sequence of all sgRNAs along with their chromosomal locations (hg19) are listed in Table S9.

Transfections using rs6060369 enhancers:

To investigate the functional effects of the osteoarthritis-risk and non-risk variants at rs6060369, each human variant R4 enhancer sequence (i.e., differing only at the non-risk “C” or risk “T” allele) was cloned into the pGL4.23 Firefly luciferase reporter construct, initially acquired from Promega (Madison, Wisconsin). A control pGL4.74 Renilla luciferase plasmid was also acquired from Promega. Primers were first designed to amplify the R4 regulatory element from human synthesized sequences (TwistBioscience). These primers possessed KpnI/HindIII linker sequences for cloning into the pGL4.23 vector and are listed in Table S9. The following PCR protocol was used to amplify each region of interest: 98°C for 30 s; 34 cycles at 98°C for 20 s, 60°C for 20 s, 72°C for 30 s; and a 5 minute final extension at 72°C. Each amplicon was then ligated into a KpnI/HindIII digested pGL4.23 Firefly luciferase vector. The digestion occurred in NEBuffer 2.1 (NEB B7202S) and ligation was performed using T4 DNA Ligase (NEB M0202S), following standard protocols (New England Biolabs, Beverly, Massachusetts). Insert-containing and empty (without regulatory insert) construct ligates were then transformed into E. coli DH10B cells. These cells were then streaked onto Luria Bertani (LB) agar plates containing ampicillin (50 μg/ml), from which single colonies were picked, screened (using the same PCR protocol as above), and sequenced. Plasmid DNA from selected colonies was purified using the E.Z.N.A. Endo-Free Plasmid DNA Maxi Kit (Omega D6929–03). Each construct then underwent Sanger sequencing to confirm sequence identity and orientation of each insert as well as the entire pGL4.23 Firefly vector sequence.

Cell lines and culture conditions:

Human embryonic kidney (HEK293FT) cells were acquired from Dr. Pardis Sabeti (Harvard University and Broad Institute). T/C-28a2 human chondrocyte cells were acquired from Dr. Li Zeng (Tufts University) courtesy of Dr. Mary Goldring (The Hospital for Special Surgery). Both cell lines were cultured at 5% CO2 at 37°C in Dulbecco’s Modified Eagle’s Medium (DMEM), 10% fetal bovine serum (FBS), and 1% penicillin-streptomycin (P/S). Media was replaced every 2–3 days and the cells were sub-cultured every 5 days.

Human Cell Line Transfection

Transfection and targeting R4 enhancer:

Guide RNAs, sg1 and sg2, flanking an ~2kb region encompassing the R4 element, and guide RNAs, sg3 and sg4, flanking a 41 bp region containing osteoarthritis risk variant rs6060369 were initially tested for ability to induce efficient deletions of the human element in cultured HEK293FT cells (n = 2 biological replicates per assay). After two days of culture at 37°C, transfected HEK293FT cells were examined under a GFP-microscope (Zeiss) to verify successful transfection and GFP expression. DNA was then extracted using E.Z.N.A Tissue DNA Kit, and the R4 regulatory element region was amplified using PCR with primers flanking the sgRNA locations (listed in Table S9) using program above. PCR amplified products were isolated from 1% agarose gel (E.Z.N.A Gel Extraction Kit) and Sanger sequenced to verify deletions corresponding to the R4 element and the 41 bp region.

After confirmation that all sgRNAs worked efficiently in HEK293FT cells, we performed in vitro deletion of the R4 enhancer element in the T/C-28a2 chondrocyte cell line (Finger et al., 2003; Kokenyesi et al., 2000). The same sgRNAs were used as above. T/C-28a2 cells were maintained in DMEM (GIBCO, Gaithersburg, Maryland) supplied with 10% FBS (GIBCO) and 1% Pen/Strep (0.025%), and seeded in a six-well plate 1 day prior to transfection. After culturing at 37°C, we scanned cells under a GFP-microscope to verify successful transfection efficiency (i.e., > 70% of cells were GFP positive) and GFP expression. DNA was then extracted using E.Z.N.A Tissue DNA Kit, and the R4 regulatory element region was amplified using PCR primers flanking each sgRNA location (listed in Table S9). PCR amplified products were isolated from 1% agarose gel (E.Z.N.A Gel Extraction Kit). Mi-Seq sequencing at the MGH CCIB DNA Core was used to verify successful targeting of the larger R4 regulatory region with a modification efficiency of ~10% and smaller 41 bp R4 regulatory region containing rs6060369 with a modification efficiency of ~9%. RNA was also extracted from control and CRISPR-Cas9 targeted T/C-28a2 cells. RNA was DNase-treated and converted to cDNA for qPCR (see below). Three biological replicate experiments were performed per condition.

Transfecting rs6060369 enhancer constructs:

Prior to transfection, T/C-28a2 cells were seeded in 96-well dishes at a density of 3 × 10⁴ cells/well and cultured in DMEM with 10% FBS and 1% P/S for 24 hours. Total volume of media in each well was brought to 100 μl. To compare expression between human constructs containing the risk “T” and non-risk “C” allele at rs6060369, three independent transfection experiments (n = 3), each containing eight technical replicates (i.e., individual wells of a 96-well plate) per construct, were performed. Firefly luciferase reporter vector concentrations of 50 ng, 100 ng, 150 ng, and 200 ng were initially tested for efficient transfection. 100 ng showed the highest transfection efficiency and consistency and was used for all T/C-28a2 experiments. Cells were then transfected transiently with 5 ng pGL4.74 Renilla luciferase vector and either 100 ng of Firefly luciferase reporter vector or 100 ng of empty pGL4.23 Firefly luciferase vector. The transfection reagent used in the experiment was Lipofectamine 2000 (Invitrogen 11668–019). The Lipofectamine:DNA ratio was 4:1 (as optimized in the lab and following manufacturer instructions); thus, there were 0.4 μL of transfection reagent per 100 ng of DNA. Forty-eight hours after the transfection, luciferase activity was measured on a SpectraMax L Microplate Reader (Molecular Devices; Cat# SpectraMax L Config), following the 96-well Dual-Glo Luciferase Assay System (Promega E2940) protocols. Plate reading conditions were as follows: 1 minute dark adapt, 5 s integration, and max range settings.
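
Two pieces of arithmetic are implicit in the paragraph above: the reagent volume implied by the 4:1 Lipofectamine:DNA ratio, and the usual dual-luciferase normalization of Firefly signal to the co-transfected Renilla control. The sketch below is illustrative only. The normalization to the empty pGL4.23 vector is a common convention assumed here rather than stated in the text, and all function names and readings are hypothetical.

```python
def lipofectamine_volume_ul(dna_ng, ul_per_ug=4.0):
    """Reagent volume (uL) for a given DNA mass (ng) at a 4:1 uL-per-ug ratio."""
    return dna_ng / 1000.0 * ul_per_ug


def mean_firefly_over_renilla(wells):
    """Mean Firefly/Renilla ratio across wells given as (firefly, renilla) pairs."""
    ratios = [firefly / renilla for firefly, renilla in wells]
    return sum(ratios) / len(ratios)


def fold_over_empty_vector(construct_wells, empty_wells):
    """Normalized construct activity relative to the empty vector (assumed convention)."""
    return mean_firefly_over_renilla(construct_wells) / mean_firefly_over_renilla(empty_wells)


# 100 ng of reporter DNA at 4 uL reagent per ug of DNA.
reagent_ul = lipofectamine_volume_ul(100)

# Hypothetical luminescence readings as (firefly, renilla) per well.
construct = [(2000.0, 100.0), (3000.0, 100.0)]
empty = [(500.0, 100.0), (500.0, 100.0)]
fold = fold_over_empty_vector(construct, empty)
print(reagent_ul, fold)
```

Dividing by the Renilla signal within each well corrects for well-to-well differences in transfection efficiency and cell number before technical replicates are averaged.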

Chromatin Immunoprecipitation (ChIP) Assay

PITX1 ChIP in T/C-28a2 cells:

Cell line T/C-28a2 was cultured in DMEM containing 10% FBS and 1% PenStrep with 5% CO2 at 37°C for 48 h. The cells were harvested at 48 h by treatment with GIBCO Trypsin-EDTA (0.05%) solution (Life Technologies Corporation, NY, USA) and subsequently washed twice with 1X phosphate-buffered saline (PBS). The ChIP assay was performed using the Zymo-Spin™ ChIP Kit (Zymo Research Corporation, USA). Briefly, the cells were counted using a hemocytometer and resuspended (5 × 10⁶ cells) in 1 mL of 1X PBS. For cross-linking proteins to DNA, formaldehyde was added to the cell suspension at a final concentration of 1% (v/v) and incubated for 7 min at room temperature (RT) with gentle shaking. The cross-linking reaction was then stopped by addition of 0.125 M glycine with gentle shaking for 5 min at RT followed by centrifugation at 3,000 × g for 1 min at 4°C. The cross-linked cell pellet was resuspended in 1 mL of ice-cold 1X PBS containing a mixture of 1 mM phenylmethylsulphonyl fluoride (PMSF) and 1X protease inhibitor cocktail (PIC). The cell pellet was then lysed with 500 μL of ice-cold nuclei prep buffer and the cell lysate was sonicated in the chilled chromatin shearing buffer containing 1 mM PMSF and 1X PIC using a Covaris s220X Focused Ultrasonicator (Covaris, Inc. USA) to generate DNA fragments between 100 and 500 bp. Once the cell debris was removed by centrifugation, the sheared chromatin in the supernatant was diluted in ChIP dilution buffer (1:10), and 1% of this sheared chromatin (as input DNA) was collected, purified, and subjected to genomic PCR with the primer sets described in Table S9. Next, 100 μL of sheared chromatin was incubated overnight at 4°C with PITX1 (G-4) X antibody (Santa Cruz Biotechnology, NJ, USA). Immune complexes were collected by magnetic separation using ZymoMag Protein A beads and the complexes were incubated with 6 μL of 5M NaCl at 65°C followed by proteinase K digestion to reverse DNA-protein cross-linking. ChIP-DNA was then purified using a Zymo-Spin™ IC Column as per the manufacturer’s instructions. Four biological replicate ChIP assays (n = 4) were performed.

The strand-specific PCR primers used for amplifying the 257 bp region containing rs6060369 are indicated in Table S9. PCR amplification was performed in a 50 μL reaction mixture containing 2 μL of input/ChIP DNA by addition of 0.02 U/μl Q5 High-Fidelity DNA Polymerase. A hot start was performed at 98°C for 30 s; followed by 35 cycles at 98°C for 10 s, 55°C for 30 s, and 72°C for 30 s; with final extension at 72°C for 1 min. The PCR product was separated on 1% agarose gel containing 0.05 μg/mL of Ethidium Bromide (EtBr) and photographed using VWR photoimager Dual UV transilluminator system (VWR International, USA).

Pitx1 ChIP in vivo in mouse:

PITX1 ChIP on the mouse R4 element was performed as follows: Timed matings were performed between homozygous C57BL/6J R4rs6060369-T,rs6060369-T and wild-type C57BL/6J R4rs6060369-A,rs6060369-A mice. Pregnant females were sacrificed according to IACUC-approved protocols. Embryonic hindlimbs were dissected at E15.5, taking the distal femur and proximal tibia, each separately placed into a microcentrifuge tube containing 200 μL of 5% FBS supplemented DMEM medium. The collected tissues were subjected to 1% collagenase, type 2 (Cat. No. LS004176, Worthington Biochemical Corp. NJ, USA) digestion for 2 h at 37°C with rocking and gentle mixing every 30 minutes to generate a single-cell chondrocyte suspension. The dissociated single cells were then washed twice with 1X PBS, and the ChIP assay was subsequently performed on these cells using the methods described above. Three biological replicate ChIP assays (n = 3) were performed on each tissue. The strand-specific PCR primers used for amplifying the 273 bp region containing the orthologous replacement rs6060369 base-position in the mouse genome are indicated in Table S9. PCR amplification was performed in a 50 μL reaction mixture containing 2 μL of input/ChIP DNA by the addition of 0.02 U/μl Q5® High-Fidelity DNA Polymerase. A hot start was performed (98°C for 30 s), followed by 35 cycles at 98°C for 10 s, 55°C for 30 s, and 72°C for 30 s with final extension at 72°C for 1 min. The PCR product was separated on 1% agarose gel containing 0.05 μg/mL of Ethidium Bromide (EtBr) and photographed using VWR photoimager Dual UV transilluminator system (VWR International, USA).

ATAC-seq Data Collection and Analysis

Raw sequencing fastq files and processed peak bed files have been deposited in NCBI GEO (GSE122877).

Mouse samples:

FVB/NJ male and female mice were used to establish timed matings, and at E15.5 pregnant females were euthanized to acquire embryos. At this time point, chondrocytes are easily extracted from surrounding extracellular matrix for ATAC-seq with negligible effects on the epigenome (Guo et al., 2017). Embryos were dissected under a microscope in 1X PBS on ice and the proximal and distal portions of the right and left femur and tibia of the hind limb and right and left humerus and radius of the forelimb were stripped clear of soft tissues. Each proximal or distal cartilaginous end, comprising the articular chondrocytes, epiphyseal chondrocytes, and metaphyseal chondrocytes, was then micro-dissected from the bony diaphysis (e.g., see Figure 1A) and separately pooled from a single litter, consisting on average of eight animals. Two to three biological replicates were collected in line with previous ATAC-seq studies (Gehrke et al., 2015; Guo et al., 2017). All samples were collected in micro-centrifuge Eppendorf tubes containing 200 μL 5% FBS/DMEM. To generate a single-cell chondrocyte suspension, each pooled sample was then subjected to 1% Collagenase II (VWR 80056–222, Radnor, Pennsylvania) digestion for 2 hours at 37°C with rocking, mixing every 30 minutes. After placing on ice, samples were next filtered using a micro-centrifuge filter set-up by gently mashing the residual tissues through the filter followed by rinsing with 5% FBS/DMEM. Samples were then spun down at 500 g at 4°C for 5 minutes. All cell counting was performed using trypan blue and a hemocytometer, and subsequent ATAC-seq steps were performed on those samples that had cell death rates well below 10%. On average we acquired 500,000–1,000,000 living cells per harvest. Next, cells were re-suspended in concentrations of 50,000 cells in 1X PBS. Cell samples were then subjected to the ATAC-seq protocol as described previously (Buenrostro et al., 2013, 2015), modifying the protocol by using 2 μl of transposase per reaction. The transposase reaction product was then purified using the Omega MicroElute DNA Clean Up Kit following the manufacturer’s protocols, eluted in 10 μl of warmed ddH2O, and stored at −20°C. We also performed experiments on transgenic Col2a1-ECFP reporter mice (compliments of Dr. Cliff Tabin at Harvard Medical School) (Chokalingam et al., 2009), which have an enhanced cyan fluorescent protein reporter under the control of the promoter of Col2a1. Identical to experiments on FVB/NJ mice, E15.5 proximal and distal skeletal elements of the femur were micro-dissected from pooled Col2a1-ECFP+ littermates, then subjected to collagenase treatment to make single cell suspensions (as above), and finally fluorescently sorted for CFP at the Harvard University Bauer Core Facility (HUBCF) to acquire replicates of 50,000 purified Col2a1-ECFP+ cells per bone-end for ATAC-seq (n = 4 Biological Replicates).

All samples were next subjected to PCR amplification and barcoding following Buenrostro et al. (2013, 2015). Ten microliters of transposed DNA were then placed in a reaction containing NEBNext High-Fidelity PCR Master Mix, ddH2O, and primers. Following amplification, samples were transferred to new tubes and treated using the OMEGA Bead Purification Protocol following the manufacturer’s instructions. The samples were eluted in 30 μl of TE, nano-dropped, diluted to 5 ng/μl and run on a Bioanalyzer. Prior to sequencing, sample concentrations were determined using the KAPA Library Quantification Complete Kit (KK4824). Samples were then sent out to the Harvard University Bauer Core Facility for sequencing on one lane of the Illumina NextSeq 500. Sequencing yielded ~400 million reads per lane and an average of 50 million per sample. Quality control statistics and primer information are presented in Table S1.

Human samples:

Human products of conception were collected from late first-trimester and early second-trimester terminations through the Laboratory of Developmental Biology in full compliance with the ethical guidelines of the National Institutes of Health and with the approval of the University of Washington Institutional Review Boards for the collection and distribution of human tissues for research, and Harvard University for the receipt and use of such materials. The Laboratory of Developmental Biology obtained written consent from all tissue donors. The University of Washington Birth Defects Research Laboratory was supported by NIH award number 5R24HD000836 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The human sample was briefly washed in HBSS and transported at 4°C during shipment. One rare intact human developmental sample at E59 days was acquired from the Laboratory of Developmental Biology. Upon arrival the sample was dissected under a light dissection microscope in identical fashion to all mouse samples reported above. All forelimb and hindlimb long bone elements were intact but then manually separated from one another, and each proximal or distal cartilaginous end corresponding to the cell populations mentioned above was then micro-dissected from the bony diaphysis. Samples were then processed accordingly using the same ATAC-seq pipeline as done for mouse samples. Proximal and distal bone end samples for each long bone (i.e., femur, tibia, humerus, radius) were then sent out for sequencing at the HUBCF running a pooled library for 3 rounds of sequencing on the Illumina NextSeq 500 in order to obtain at least 50 million reads per sample. Quality control statistics and primer information are presented in Table S1.

Computational Methods

Initial ATAC-seq read processing:

Mouse: Sequence read quality was checked with FastQC, and reads were subsequently aligned to the mouse reference mm10 genome assembly with Bowtie2 v2.3.2 (Langmead and Salzberg, 2012) using default parameters for paired-end alignment. Reads were filtered for duplicates and subsequently used for peak calling using MACS2 (Zhang et al., 2008) software (version 2.1.1.2), using the following flags for ‘callpeak’: --nomodel --shift 100 --extsize 200. Peaks (referred to as “elements” in main text) reproducible across biological replicates were screened using an IDR threshold of < 0.05, as defined by the IDR statistical test (Li et al., 2011b) (version 2.0.3). Briefly, the IDR method looks for overlaps in peak calls across pairs of replicate samples through comparing ranked peak lists (using MACS2 q-value) to define a reproducibility score curve. All paired ranks are assigned a pointwise score based on this curve, subsequently sorted, and all peaks falling below an ‘irreproducible discovery rate’ (IDR) threshold of 0.05 are taken as our final reproducible peak set.
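The final thresholding step can be illustrated with a minimal Python sketch. It assumes that each peak record carries its global IDR as a −log10 value in the fourth field; this tuple layout, the function names, and the example peaks are hypothetical and are not taken from the processing scripts used in this study.

```python
def idr_from_neg_log10(neg_log10_idr):
    """Convert a -log10(IDR) score back to an IDR value."""
    return 10.0 ** (-neg_log10_idr)


def reproducible_peaks(peaks, threshold=0.05):
    """Keep peaks whose IDR falls below the threshold (0.05 in the text).

    Each peak is assumed to be (chrom, start, end, neg_log10_idr).
    """
    return [p for p in peaks if idr_from_neg_log10(p[3]) < threshold]


# Hypothetical peaks: IDR values of 0.01, 0.1 and ~0.0003.
example_peaks = [
    ("chr1", 100, 600, 2.0),
    ("chr1", 900, 1400, 1.0),
    ("chr2", 50, 550, 3.5),
]
kept = reproducible_peaks(example_peaks)
```

In this toy input the second peak (IDR = 0.1) is discarded and the other two are retained.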

For initial comparisons with human ATAC-seq data (see below), called peaks were lifted-over from mm10 to hg19 using the UCSC ‘liftover’ utility (Kent et al., 2002) with the following flags: ‘-minMatch=0.1 -bedPlus=4’, with the relevant liftover chain file similarly obtained from UCSC.

Human: Sequenced reads across runs were pooled for each sample and subsequently aligned to the hg19 reference genome using bowtie2 (version 2.3.2) using default paired-end parameters. Duplicate reads were filtered, and the remaining reads were used for peak calling with MACS2 software (version 2.1.1.2), using the following flags for ‘callpeak’: --nomodel --shift 100 --extsize 200.

IDR replicate consistency checks:

Following the guidelines for high-quality, replicable ATAC-seq datasets as defined by the ENCODE Consortium (Davis et al., 2018) (https://www.encodeproject.org/atac-seq/), we performed additional quality checks to assess both within- and between-sample concordance. Briefly, each individual sample was randomly subset into two halves (‘pseudoreplicates’), which were subsequently compared using IDR (applying a 0.05 IDR threshold) to check for within-sample reproducibility in peak-calling. Subsequently, pseudoreplicates were pooled across samples (i.e., pseudo-rep1 from sample A was pooled with pseudo-rep1 from sample B) and these pooled pseudo-samples were compared using IDR. The metric for internal replicate consistency is termed the ‘self-consistency ratio’, while the pseudoreplicate-compared (cross-sample) consistency is termed the ‘rescue ratio’. ENCODE guidelines indicate that high-quality samples should have values for these ratios less than two; we observe this for testing done on two distal-femur and proximal-tibia samples (see Table S1, Sheet 10), confirming the reproducibility of our ATAC-seq datasets.
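The text does not spell out how the two ratios are computed. The sketch below assumes the ENCODE-style convention in which each ratio is the larger of two reproducible-peak counts divided by the smaller; the function names and the peak counts are hypothetical.

```python
def consistency_ratio(n_peaks_a, n_peaks_b):
    """Larger peak count divided by the smaller (assumed ENCODE-style ratio)."""
    if min(n_peaks_a, n_peaks_b) <= 0:
        raise ValueError("peak counts must be positive")
    return max(n_peaks_a, n_peaks_b) / min(n_peaks_a, n_peaks_b)


def passes_quality_check(self_consistency, rescue, limit=2.0):
    """Both ratios should be below two according to the guideline cited."""
    return self_consistency < limit and rescue < limit


# Hypothetical counts of peaks passing IDR < 0.05.
self_ratio = consistency_ratio(42000, 35000)    # pseudoreplicates within samples
rescue_ratio = consistency_ratio(50000, 40000)  # pooled pseudoreplicates vs. replicates
ok = passes_quality_check(self_ratio, rescue_ratio)
```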

Generation of element sets:

Mouse ATAC-seq element sets from both the grossly dissected proximal and distal ends of the femur, tibia, humerus, and radius were intersected with ATAC-seq data our group previously generated on mouse embryonic brain tissue (Guo et al., 2017) using bedtools (Quinlan and Hall, 2010) (version 2.26.0) in order to remove non-specific open chromatin regions. Filtered element sets for the distal femur and proximal tibia were intersected to generate a “general knee set,” with shared elements merged using the ‘merge’ function in bedtools and default parameters. Additionally, distal-femur and proximal-tibia brain-filtered sets were filtered for proximal-femur and distal-tibia elements, respectively, with these further-refined sets subsequently intersected to generate a “knee-specific set,” with shared elements merged as before. Similarly, proximal femur-specific and distal tibia-specific elements were also generated. We followed this same strategy to generate a “general elbow set” as well as an “elbow-specific set,” the latter consisting of elbow-separated and common element sets. As for the hind limb, we also generated proximal humerus-specific and distal radius-specific elements for the forelimb. All processed element sets were subsequently lifted-over from mm10 to hg19 as described above. To confirm consistent one-to-one orthology behavior of these sets, elements were lifted-over from mm10 to hg19 (liftOver utility flags ‘-minMatch=0.1 -bedPlus=4’), then subsequently lifted-back to mm10 (same flags). The original mm10 sets were compared to those lifted-back (Table S1, Sheet 30), indicating that, for the vast majority of sequences (> 99%), one-to-one orthology in liftover was achieved. 
In order to account for variation in called peak sizes when performing several analyses (e.g., calculating human sequence diversity between sets and motif alteration analyses), peaks in these final sets were assigned a fixed size of 500 bp (based on the frequency distribution of called peak sizes across all sets) and centered on the middle of a called peak.
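The fixed-size re-centering described above can be sketched as follows. This is a minimal illustration, assuming 0-based half-open BED-style coordinates and integer rounding of the midpoint; the function name is hypothetical.

```python
def fixed_window(start, end, size=500):
    """Return a window of `size` bp centered on the midpoint of a called peak.

    Coordinates are assumed to be 0-based, half-open (BED-style); the window
    start is clipped at zero so that it never becomes negative.
    """
    midpoint = (start + end) // 2
    new_start = max(0, midpoint - size // 2)
    return new_start, new_start + size


# A 300 bp peak is widened; a 900 bp peak is narrowed.
widened = fixed_window(1000, 1300)
narrowed = fixed_window(2000, 2900)
```

Both calls return intervals of exactly 500 bp, which is what makes per-bp scores comparable across elements.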

The genomic distribution of these element sets was assessed with HOMER (Heinz et al., 2010) (version 4.10) ‘annotatePeaks.pl’, which assigns regions to the closest annotated TSS and subsequently categorizes based on their positioning relative to genes (e.g., intronic, exonic, intergenic, etc.). The distribution of assignments for lifted-over distal femur-specific and proximal tibia-specific elements is shown in Figure 1A, with distributions for additional element sets shown in Table S1, Sheet 31.

Overlaps between human and lifted-over mouse ATAC-seq peaks were assessed using bedtools (version 2.26.0) ‘intersect’ with default settings (see Table S1, Sheet 29), overlapping the peak sets generated from brain-filtered mouse samples (e.g., distal femur, filtered for embryonic brain) as well as the knee-specific sets (distal femur-specific, proximal tibia-specific, knee-common-specific) made using the filtering steps described above. Sets were matched for tissue-type (e.g., mouse and human distal femur), with resulting overlap sets used in subsequent analyses.

Additional epigenetic dataset processing and intersections:

Col2a1-ECFP ATAC-seq data was obtained and processed as described above and then intersected with ATAC-seq datasets acquired on grossly dissected long bone chondrocytes as described above. GM12878 ATAC-seq data was obtained from GEO datasets (GSE47753) (Buenrostro et al., 2013) as raw .fastq files (for 50K samples); reads were subsequently mapped to hg19 using the same ATAC-seq processing pipeline detailed above, with IDR replication performed for n = 4 replicates. Bone Marrow-Derived Chondrocytes (BMDC) ChIP-seq datasets were downloaded as .bed.gz reads from GEO datasets (Superset GSE17312), and used to call peaks with MACS2 (version 2.1.1.2), using the ‘--nomodel --shift 0 --extsize X’ options, with extension size chosen using the ‘predictd’ utility from MACS2. BMDC replicates were subsequently consolidated using irreproducible discovery rate (IDR) (version 2.0.3) analysis with a cut-off of 0.05. Human embryonic limb bud H3K27ac data (Cotney et al., 2013) was obtained from GEO datasets (GSM1039552, GSM1039553) as .bed.gz files, and used to call peaks with MACS2 (with IDR replicate-consolidation) as above for the BMDC data. ENCODE (ENCODE Project Consortium, 2012) TF ChIP-seq was obtained as processed .bed files (called peaks) (see Table S2, Sheets 5–7 for accession codes) from the ENCODE website. Pitx1 ChIP-seq (Infante et al., 2013) data was obtained from GEO datasets (GSE41591) as called peaks in .bed format; replicated peaks were intersected and merged using bedtools (version 2.26.0). All datasets which are indicated as originating from mouse (see Table S2) were lifted-over from their original mouse genome coordinates to hg19 using the liftOver utility as described above. E59 human bone-end chondrogenesis ATAC-seq datasets (see above) were matched to their equivalent mouse tissue (e.g., mouse distal femur and human distal femur) for overlaps. 
All datasets were tested for enriched overlap with knee elements using regioneR (Gel et al., 2016) (version 1.8.1) using the ‘permTest’ function, generating 1000 randomized region sets as a background using the ‘circularRandomizeRegions’ option and the ‘count.once’ flag, with all other options set to defaults. Significance was assessed at p < 0.05 (Table S2, Sheets 1–7).
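The logic of the permutation-based enrichment test can be illustrated with a short Python sketch. It is not the regioneR implementation: it assumes an upper-tailed test and the common convention of adding one to both numerator and denominator, and the overlap counts are hypothetical.

```python
def permutation_p_value(observed, randomized):
    """Empirical upper-tail p value from a permutation background.

    `observed` is the overlap count for the real element set and
    `randomized` holds the counts obtained from the randomized region sets.
    """
    n_as_extreme = sum(1 for value in randomized if value >= observed)
    return (n_as_extreme + 1) / (len(randomized) + 1)


# With 1000 randomizations and no random set reaching the observed count,
# the smallest attainable p value is 1/1001.
p_min = permutation_p_value(50, [10] * 1000)
p_mid = permutation_p_value(5, list(range(1, 11)))
```

Under this convention, 1000 randomizations bound the reportable p value from below at roughly 0.001, which is compatible with the p < 0.05 threshold used here.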

Inter-species conservation analyses:

To examine inter-species conservation, phyloP (Pollard et al., 2010) scores for 20 primate species (phyloP20ways) were obtained from UCSC (Karolchik et al., 2014) and used to extract per-bp conservation scores for all elements in the general knee, knee-specific, general elbow, and elbow-specific sets, as well as those knee-specific elements that overlapped human E59 ATAC-seq data. To make elements comparable for these analyses, region coordinates were fixed to a constant size of 500 bp (see above). Given the potential for base-pair modifications to alter enhancer activity (Maas and Fallon, 2005; Prabhakar et al., 2008), coupled with the expectation that selection forces likely do not act across the entire length of a multi-TFBS enhancer, instead modifying individual binding sites (Wittkopp and Kalay, 2011), per-bp scores for each region were aggregated (within each set, averaged per-bp across all sequences in a set, for a total of 500 data-points) in order to compare pan-species sequence conservation across sets. These sets of points were subsequently used to perform a Wilcoxon Rank-Sum test as implemented in R (R Development Core Team, 2008) (version 3.4.4) to test for the significance of differences in conservation; the resulting one-sided p values were corrected for multiple testing using Benjamini-Hochberg (Benjamini and Hochberg, 1995) correction. Scores were also averaged over the length of elements (500 bp, per-region in a set) and compared as for the per-bp results. Significance was determined at adjusted p < 0.05; see Table S2, Sheet 8 for comparison information.
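For readers unfamiliar with the Benjamini-Hochberg adjustment applied to these p values, a minimal Python sketch of the standard step-up procedure is given below. The function name and the example p values are hypothetical; the analyses in this study were run in R.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical one-sided p values from four set comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adjusted]
```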

For analysis of trends in phyloP score over the length of a peak (Figure S1), per-bp conservation scores were averaged per-bp across all ATAC-seq regions in a set and plotted as a function of distance along a 1kb window (centered in the middle of each ATAC-seq region). A randomly-generated set of 1kb regions (matched for set size) sampled genome-wide was also generated and average per-bp conservation scores calculated in order to provide a background set. For the analysis comparing primate conservation to human-chimpanzee divergence, aligned sequences for human and chimpanzee were extracted from a primate alignment (multiz20way) obtained from UCSC (Karolchik et al., 2014), with sequence identity (%ID) for a given ATAC-seq region calculated as the number of nucleotide matches divided by total sequence length. These values were compared to per-bp phyloP20ways scores averaged over the length of the same region, and subsequently plotted in order of increasing %ID (as seen in Figure 1F). Regions sorted by %ID were sliced into top and bottom 25%, as well as 10%, groups for each sequence set and were analyzed using GREAT (McLean et al., 2010) (version 3.0.0) (Table S2, Sheets 9 – 20). Region coordinates (hg38), extracted sequences, calculated %ID, and averaged phyloP20ways score for each sequence set analyzed are available in Table S2, Sheets 21–23.
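The sequence identity calculation can be written out as a small Python sketch. It follows the definition above (matches divided by total aligned length) and additionally assumes that alignment columns containing a gap character never count as a match; that gap handling and the function name are assumptions made for illustration.

```python
def percent_identity(human, chimp):
    """Percent identity between two aligned sequences of equal length.

    Matches are counted per alignment column; gap characters ('-') are
    assumed not to count as matches, and the denominator is the full
    aligned length.
    """
    if not human or len(human) != len(chimp):
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(
        1
        for h, c in zip(human.upper(), chimp.upper())
        if h == c and h != "-"
    )
    return 100.0 * matches / len(human)


identity = percent_identity("ACGTACGT", "ACGTACGA")
```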

Note about GREAT (McLean et al., 2010): GREAT takes an input set of genomic regions along with a defined ontology of gene annotations; first, it defines regulatory domains for all genes genome-wide, then measures the fraction of the genome covered by the regulatory domains of genes associated with a particular annotation (e.g., ‘cartilage development’). These fractions are used as the expectation in a binomial test counting the number of input genomic regions falling within a given set of regulatory domains, which results in the reported significance of association between an input region set and a particular gene ontology term. GREAT also performs a more traditional gene-based hypergeometric test to test for significance of region set-ontology association. The program returns a set of enriched ontologies sorted by the joint rankings of FDR-corrected binomial and hypergeometric tests, as reported here in our supplemental tables. For this study, we chose terms in the GREAT output with relevance to chondrocyte biology – bone end-specific anatomy is not well annotated, particularly at the genetic level, and it is well known that there is a bias toward GO annotations for other phenotypes (like immune function), particularly for phenotypes involving better-studied genes which tend to have increased GO term representation (Gillis and Pavlidis, 2013).
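The binomial test described in this note can be sketched in Python as an upper-tail probability. This is an illustration of the statistical idea only, not GREAT's code; the function name and the example numbers are hypothetical. Probabilities are accumulated in log space so that the sketch also works for region sets numbering in the thousands.

```python
import math


def binomial_upper_tail(n_regions, n_hits, genome_fraction):
    """P(X >= n_hits) for X ~ Binomial(n_regions, genome_fraction).

    `genome_fraction` is the share of the genome covered by the regulatory
    domains of genes carrying a given annotation.
    """
    if not 0.0 < genome_fraction < 1.0:
        raise ValueError("genome_fraction must lie strictly between 0 and 1")
    log_p = math.log(genome_fraction)
    log_q = math.log(1.0 - genome_fraction)
    total = 0.0
    for k in range(n_hits, n_regions + 1):
        log_pmf = (
            math.lgamma(n_regions + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_regions - k + 1)
            + k * log_p
            + (n_regions - k) * log_q
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


# Hypothetical case: 12 of 100 input regions fall in domains covering 5% of the genome.
p_value = binomial_upper_tail(100, 12, 0.05)
```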

A set of human sequences displaying evidence of nucleotide acceleration in a variety of different contexts was aggregated from multiple studies (Bird et al., 2007; Bush and Lahn, 2008; Gittelman et al., 2015; Pollard et al., 2006a; Prabhakar et al., 2006) and intersected with the general knee, knee-specific, general elbow, and elbow-specific sets (using original called-peak coordinates), knee-specific and elbow-specific elements that overlapped human E59 ATAC-seq data, along with ATAC-seq data obtained from embryonic brain tissue, using bedtools (version 2.26.0). In order to correct for both the variable number and size of peaks in different sets, intersections were calculated as intersections/bp of sequence in a given set. A background distribution was established by generating 10,000 random region sets consisting of 4015 (average set size for specific knee sequence sets) regions from across the genome with a constant length of 500 bp using the bedtools ‘random’ function, and intersecting each set with accelerated regions. Intersections/bp values for both target and background sets were plotted as a histogram (Figure 1C); the distribution of these values was assessed using the ‘qqnorm’ (R base) and ‘qqPlot’ (car version 3.0.3) (Fox and Weisberg, 2019) functions, which suggested substantial deviations from a normal distribution. Accordingly, the ‘fitdistrplus’ package (version 1.0–14) (Delignette-Muller and Dutang, 2015) was used to determine an appropriate distribution to fit the data. Because a number of candidate distributions (e.g., gamma, beta, log-normal) can only be fit to positive values, background sets containing no accelerated region overlaps were removed for curve fitting (n = 9427 after filtering). 
The ‘descdist’ function was initially used to assess curve behavior; subsequently goodness-of-fit statistics (from the ‘gofstat’ function) for gamma, beta, exponential, and log-normal distributions were compared, with the beta-distribution subsequently selected (this choice also being appropriate given the fractional nature of the datapoints) (Mun, 2008). Beta distribution parameters (‘shape1’ and ‘shape2’ in the R implementation of ‘pbeta’) were fit using a bootstrap method (‘bootdist’ from ‘fitdistrplus’), with the median parameter estimates from 1000 samples used to define the distribution for significance testing of target set intersections/bp values with ‘pbeta’ (upper-tail p values). Results were subsequently adjusted using BH correction (Table S4, Sheet 14).

Elements intersecting regions of acceleration were associated with the closest annotated TSS with HOMER (version 4.10) ‘annotatePeaks.pl’ (Table S4, Sheets 2 – 13). All intersecting hindlimb elements were aggregated and used with GREAT to identify genome-wide signals (Table S4, Sheet 15).

Intra-species conservation analysis:

Variation data from the 1000 Genomes Project phase 3 (1KGP) (Auton et al., 2015) (n = 2504 individuals) in .vcf.gz format was obtained and intersected with the general knee, knee-specific, general elbow, and elbow-specific ATAC-seq region sets, as well as those knee-specific elements that overlapped human E59 ATAC-seq data (elements fixed to a size of 500 bp) using tabix (Li, 2011a) (version 1.7.24) to obtain variants occurring within these putative regulatory regions. Sequence data from chimpanzee (n = 25) and gorilla (n = 31) was similarly obtained from the Great Ape Genome Diversity Project (GADP) (Prado-Martinez et al., 2013). Peak sets were lifted-over from hg19 to hg18 for use with the GADP datasets with the UCSC ‘liftover’ utility and relevant liftover chain file. Resulting subset VCF files were converted to tab format with the following Unix command, using bcftools (version 1.8) (Li, 2011b):

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%SAMPLE=%TGT]\n' -o out.vcf. in.vcf.
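A line of the resulting tab-delimited output can be read with a few lines of Python. The sketch below assumes five leading fields (chromosome, position, ID, reference, alternate) followed by one sample=genotype entry per individual, and it tolerates optional spaces around the equals sign. The variant ID, position, and sample names in the example are hypothetical, and the custom scripts used in this study are not reproduced here.

```python
def parse_query_line(line):
    """Parse one tab-delimited line produced by the query format above."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    genotypes = {}
    for entry in fields[5:]:
        sample, _, genotype = entry.partition("=")
        genotypes[sample.strip()] = genotype.strip()
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # multi-allelic sites list several alternates
        "genotypes": genotypes,
    }


# Hypothetical record with two samples.
record = parse_query_line("chr20\t12345\trs0000001\tC\tT\tS1=C/T\tS2 = T/T\n")
```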

Variant data for all datasets were down-sampled to n = 25 (with replacement, 5 resamples for gorilla and 200 re-samples for the human set) in order to match sample size for all comparisons based on the least-sampled species (chimp), using a custom R script.
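The down-sampling step can be sketched in Python as repeated draws of 25 individuals with replacement. The analysis itself used a custom R script that is not reproduced here; the function name, the seed, and the sample identifiers below are hypothetical.

```python
import random


def resample_individuals(sample_ids, n=25, n_resamples=200, seed=0):
    """Draw `n_resamples` sets of `n` individuals, sampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    return [rng.choices(sample_ids, k=n) for _ in range(n_resamples)]


# Hypothetical identifiers standing in for the 2504 1KGP individuals.
human_ids = [f"H{i:04d}" for i in range(2504)]
human_draws = resample_individuals(human_ids, n=25, n_resamples=200)

# Gorilla (n = 31) is resampled five times in the text.
gorilla_ids = [f"G{i:02d}" for i in range(31)]
gorilla_draws = resample_individuals(gorilla_ids, n=25, n_resamples=5)
```

Per-element variant counts would then be computed within each draw and averaged across draws.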

Variants were defined using a minor allele frequency (MAF) threshold of ≥ 0.05 for all datasets, filtering tab-formatted files using a custom Python script. Counts data was defined as the number of variants intersecting a given element and averaged over resampled variant sets (see below). Counts data for human, chimp, and gorilla were then compared within a given element set (e.g., proximal tibia-specific set) to compare intra-species diversity of putative regulatory regions. Hurdle modeling was used to test for significant differences in both total number of sequences containing variants (hurdle) as well as degree of variation between species (counts); this was implemented using the ‘hurdle’ function from the pscl (Jackman, 2017; Zeileis et al., 2008) package in R (version 1.5.2). A binomial model was applied for the initial hurdle/zero-counts step, with the subsequent counts modeling done using a negative binomial regression model. Tukey post hoc testing was performed using the emmeans package in R (version 1.2.1) for both hurdle/zero-counts and counts models, with significance assessed at adjusted p value < 0.05 (Table S5, Sheets 1–26). Additionally, element sets were compared to one another (e.g., distal femur-specific versus proximal tibia-specific) within a given species using the same methods. Variant per-sequence counts were visualized as boxplots using ggplot2 (version 2.2.1) with logged values (Figures 3B–3F and S2).
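The MAF filter can be illustrated with a short Python sketch operating on translated genotypes such as "A/G". It is not the custom script used in this study. For sites with more than two alleles it assumes that the minor allele is the second most common one, which is a choice made for this example; the function names are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele among called genotypes."""
    counts = Counter()
    for genotype in genotypes:
        for allele in genotype.replace("|", "/").split("/"):
            if allele not in (".", ""):  # skip missing calls
                counts[allele] += 1
    if len(counts) < 2:
        return 0.0  # monomorphic (or entirely missing) site
    ordered = sorted(counts.values(), reverse=True)
    return ordered[1] / sum(ordered)


def is_variant(genotypes, threshold=0.05):
    """Apply the MAF >= 0.05 definition of a variant used in the text."""
    return minor_allele_frequency(genotypes) >= threshold


# In a down-sample of 25 diploid individuals, one heterozygote gives 1/50 = 0.02.
rare = is_variant(["A/A"] * 24 + ["A/G"])
common = is_variant(["A/A"] * 22 + ["A/G"] * 3)
```

With 25 individuals, at least three copies of the minor allele (3/50 = 0.06) are needed to pass the threshold.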

In order to check for consistency across sub-samples when taking averaged values for the above analyses, hurdle modeling was performed using human counts data calculated from the first and third quartiles of counts for a particular sequence across the sub-sampled groups to confirm that statistical relationships between species (for a given element set) and within species (comparing element sets) were reproduced. Additionally, subsampling was re-performed using a sub-sample size of n = 100 (for 200 resamples) to confirm that statistical relationships are robust to sub-sample size of the human set. These results are similarly presented in Table S5. Sequences within each region set were ranked according to human SNP/bp value averaged over sub-samples of 1KG3 data and sliced into top and bottom 25% most/least-variable (respectively) sections. Slices were subsequently analyzed with GREAT (Table S5, Sheets 31–42). Similar slicing was done for chimp SNP/bp values to generate equivalent most/least-variable sections (Table S5, Sheets 43–54).
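The ranking and slicing of regions into most- and least-variable quartiles can be sketched as follows. The sketch assumes that the slice size is rounded down when the set size is not divisible by four; the region names and SNP/bp values are hypothetical.

```python
def top_and_bottom_slices(snp_per_bp, fraction=0.25):
    """Split regions into most- and least-variable slices by SNP/bp.

    `snp_per_bp` maps a region name to its SNP/bp value averaged over
    sub-samples. Returns (most_variable, least_variable) lists of names.
    """
    ranked = sorted(snp_per_bp, key=snp_per_bp.get)  # ascending SNP/bp
    k = int(len(ranked) * fraction)
    least_variable = ranked[:k]
    most_variable = ranked[len(ranked) - k:]
    return most_variable, least_variable


# Eight hypothetical 500 bp regions with 0 to 7 variants each.
values = {f"region{i}": i / 500 for i in range(8)}
most, least = top_and_bottom_slices(values)
```

Each slice would then be submitted separately to GREAT.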

In order to look at sequence constraint of putative regulatory regions within humans, the set of knee-specific elements (i.e., distal femur-specific, proximal tibia-specific, and knee-common-specific sets) were pooled and a set of 1000 randomly-generated region sets, consisting of 8491 sequences (size of the pooled set) of 500 bp, was generated using the bedtools ‘random’ function. These sets were subsequently pooled, sorted, and merged using bedtools, with the resulting bed file used to extract variants from the 1KG3 set with tabix (version 1.7.24). The pooled set of knee-specific elements was also used to extract variants from the 1KG3 set. Additionally, several genomic features were extracted from the HOMER set of genomic annotations (hg19) provided with the program, including the following region sets: intronic, promoter-TSS, TTS, and exon. Further, regions from RepeatMasker were also obtained from the UCSC Table Browser (Karolchik et al., 2004). These additional sets were also used to extract variants from the 1KG3 set. The resulting files were filtered for duplicate variants and subsequently MAF ≥ 0.05 with bcftools (Li, 2011b) (version 1.8). Variants falling within particular elements in the random background, target (e.g., knee-specific), and genomic annotation sets were then extracted using tabix. The number of variants extracted for each set was counted using vcftools (Danecek et al., 2011) (version 0.1.15) ‘--counts2 --stdout’ arguments. Variant counts were then adjusted to account for the number of bp within a given set. The background distribution of these values was investigated using the ‘qqnorm’ (R base) and ‘qqPlot’ (car package) functions to look for visible deviations from normality, for which no obvious deviations were observed. Values were standardized and statistical significance was assessed using a CDF of the standard normal distribution as implemented in the ‘pnorm’ function in R (version 3.4.4). 
P values for significant deviations from the background distribution were corrected for the number of sets (n = 10) tested using a BH correction. Significance was defined as adjusted p < 0.05 (Table S5, Sheet 27).
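The standardization and normal-CDF step can be sketched in Python using only the standard library. The sketch reports the lower-tail probability, which is the default behavior of ‘pnorm’ in R; whether the lower or upper tail was used for a given set is not stated here and is treated as an assumption, as is the use of the sample standard deviation. The numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Lower-tail probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardized_p_value(target, background):
    """Z-score of `target` against the background, with its lower-tail p value.

    `target` and `background` are variant counts per bp; the upper-tail
    p value is simply 1 - p.
    """
    z = (target - mean(background)) / stdev(background)
    return z, normal_cdf(z)


# Hypothetical variants/bp for five background sets and one target set.
background = [0.0040, 0.0042, 0.0041, 0.0043, 0.0039]
z_score, p_lower = standardized_p_value(0.0035, background)
```

A strongly negative z-score, as in this toy example, would indicate fewer common variants per bp than expected from the random background.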

A similar analysis of sequence constraint of putative regulatory regions within chimpanzees was also performed. Knee-specific elements were pooled and lifted-over to hg18 using the ‘liftOver’ utility; a set of 1000 randomly-generated region sets, consisting of 8431 sequences (size of lifted-over pooled set) of 500 bp, was generated using the bedtools ‘random’ function. Randomized sequence sets were subsequently pooled, sorted, and merged using bedtools, with the resulting bed file used to extract variants from the GADP set with tabix (version 1.7.24). Additionally, several genomic features were extracted from a downloaded set of chimpanzee HOMER genomic annotations (panTro4), including the following region sets: intronic, promoter-TSS, TTS, and exon. These additional sets were lifted-over to hg18 (flags as indicated above) and used to extract variants from the GADP set. The resulting files were filtered for duplicate variants and subsequently MAF ≥ 0.05 with bcftools (Li, 2011b) (version 1.8). Variants falling within particular elements in the random background, target (e.g., knee-specific), and genomic annotation sets were then extracted using tabix. The number of variants extracted for each set was counted using vcftools (Danecek et al., 2011) (version 0.1.15) ‘--counts2 --stdout’ arguments. Variant counts were then adjusted to account for the number of bp within a given set. The background distribution of these values was investigated using the ‘qqnorm’ (R base) and ‘qqPlot’ (car package) functions to look for visible deviations from normality, for which an obvious left-skew was observed (Figure S2P). For comparison with the above human analysis, background values were standardized and statistical significance was assessed using a CDF of the standard normal distribution as implemented in the ‘pnorm’ function in R (version 3.4.4). 
P values for significant deviations from the background distribution were corrected for the number of sets (n = 10) tested using a BH correction. Significance was defined as adjusted p < 0.05. See Table S5, Sheet 28. Given the left-skew distribution, the ‘fitdistrplus’ package (version 1.0–14) (Delignette-Muller and Dutang, 2015) was used to determine an additional distribution to fit the data, as described for the accelerated region enrichment analysis above, with a gamma distribution selected on the basis of goodness-of-fit statistics. Distribution parameters were fit using a bootstrap method (‘bootdist’ from ‘fitdistrplus’), with the median parameter estimates from 1000 samples used to define the distribution for significance testing of adjusted variant counts with ‘pgamma’. Results differed negligibly from those obtained using a normal CDF (Table S5, Sheet 28).

Human knee morphometric analysis:

Bone and cartilage measurements of 600 knees were derived from the Foundation for the National Institutes of Health (FNIH) OA Biomarkers Consortium Project (OABCP), which used the patient knee MRI images from the Osteoarthritis Initiative (OAI) (see http://oai.epi-ucsf.org./datarelease/ and https://fnih.org/what-we-do/biomarkers-consortium/programs/osteoarthritis-project). Some of these measurements, including the total area of medial and lateral subchondral bone, were performed by iMorphics (Manchester, UK), a company focused on advanced image analysis technology for the analysis and interpretation of 3D medical images. The total area of the subchondral bone was measured according to the definition by Eckstein et al., with peripheral osteophytes excluded and central osteophytes included (Eckstein et al., 2006). These measurements were obtained based on the 3D bone shapes that were automatically segmented from 3D DESS MRI images using active appearance models (Hunter et al., 2016).

The measurements of the volume of medial and lateral cartilage were performed using KneeIQ, a fully automatic computer-based framework developed by Biomediq (Copenhagen, Denmark). Biomediq is a company that specializes in analyses of knee MRI for scoring of cartilage quantity and quality and bone structure in relation to osteoarthritis. The framework comprises a rigid multi-atlas registration method to detect the region of interest for each anatomical structure, and a supervised voxel classification method based on a k-NN classifier for structure-wise classification (Folkesson et al., 2007). The corresponding Kellgren-Lawrence (KL) grade/score of each subject was derived from OAI’s baseline clinical dataset (0.2.3). Morphometric data for proximal tibia/distal femur features were taken from all patients with an annotated KL grade of one (n = 75) and tested for homogeneity of variance using Fligner-Killeen and Levene’s tests as implemented in R (version 3.4.4). Significance was defined as p < 0.05; test statistics and resulting p values are shown in Table S5, Sheet 29.

Description of OAI patient cohort dataset:

The OAI dataset processed consisted of 4129 individuals identified as either having, or being at risk of developing, osteoarthritis. This set contains groups of 3366 Caucasian and 763 Black/African-American individuals (on the basis of self-reported ethnicity); in order to control for potential effects of demographic history, only those individuals in the Caucasian group were considered for genetic analyses (see below). This group ranged in age from 45–79 (average of 62) and consisted of 1498 males and 1868 females, of whom 916 presented with no radiographic OA at entry, 774 had OA in one knee (KL grade > 0), and 1676 had bilateral OA (KL grade > 0).

Sequence variation in OAI cohort:

Genotyping data for study participants in the OAI dataset were obtained through dbGaP with proper permissions (n = 4129); self-identified race was used to separate the dataset into White/Caucasian (n = 3366) and Black/African-American (n = 763) groups. Given the disparate sample sizes of these two groups, and to avoid potential demographic signals on genotype, only individuals in the White/Caucasian group were further analyzed. Genotyped SNPs were extracted using plink (Chang et al., 2015) version 1.9, and subsequently subset to those intersecting 20kb-padded ATAC-seq peaks (pooled from all ATAC-seq sets generated, lifted-over to hg18). Subset hg18 genotyping files were then lifted-over to hg19 using Crossmap (Zhao et al., 2014) version 0.3.3 using chain files obtained from UCSC. BEAGLE5 (Browning et al., 2018) was used to impute variants from the 1KG3 European reference panel using the ‘conform-gt.jar’ and ‘beagle.28Sep18.793.jar’ utilities, leaving all settings at their defaults. Imputed VCF files were intersected with knee-specific ATAC-seq region sets (elements fixed to a size of 500bp) using tabix (Li, 2011) (version 1.7.24) to obtain variants occurring within these putative regulatory regions. Subset files were converted to tab format with the following Unix command: bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%SAMPLE=%TGT]\n' -o out.vcf in.vcf. The patient data were down-sampled to n = 25 (with replacement for 200 re-samples), using a custom R script, to make this analysis comparable to that done on the 1KG3 dataset in the intra-species diversity analysis (see above). Variants were defined using a minor allele frequency (MAF) threshold of >= 0.05 for all subsamples, filtering tab-formatted files using a custom python script. Counts data were defined as the number of variants intersecting a given element and were averaged over re-sampled variant sets. 
Counts data between proximal tibia-specific, distal femur-specific and knee-common-specific elements were subsequently compared using hurdle modeling as described above, with significance assessed at adjusted p value < 0.05 (Table S5, Sheet 30). To check for consistency across sub-samples when taking averages for hurdle modeling, the same comparisons were done using the first and third quartiles of counts for a particular sequence across sub-sampled sets, to confirm that the statistical relationships between proximal-tibia and distal-femur elements observed are reproduced.
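The down-sampling and MAF-filtering steps can be sketched as below. This is an illustrative reading of the procedure, not the authors' custom R and python scripts: the function names and the genotype encoding (alternate-allele counts of 0, 1, or 2 per individual) are assumptions, and the sketch assumes that individuals are drawn without replacement within each re-sample and independently across re-samples.

```python
import random


def minor_allele_frequency(genotypes):
    """genotypes: alternate-allele counts (0, 1 or 2), one per individual."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def mean_common_variant_count(element_genotypes, n_individuals, n=25,
                              resamples=200, maf_min=0.05, seed=0):
    """Average, over re-samples, the number of variants in one element
    that reach MAF >= maf_min among n randomly drawn individuals.

    element_genotypes: one list per variant, each of length n_individuals.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(resamples):
        picked = rng.sample(range(n_individuals), n)
        total += sum(
            1
            for variant in element_genotypes
            if minor_allele_frequency([variant[i] for i in picked]) >= maf_min
        )
    return total / resamples
```

The per-element averages returned by such a function would then serve as the counts data compared between element sets.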

Analysis of OA GWAS variants:

Acquisition of lead variants, as presented in Table S7 (Sheet 1), resulted in a final aggregate set of 95 leads. Leads were aggregated from a number of studies (arcOGEN Consortium et al., 2012; Castaño Betancourt et al., 2012, 2016; Casalone et al., 2018; Day-Williams et al., 2011; Evangelou et al., 2011, 2014; Hindorff et al., 2009; Kerkhof et al., 2010; Klein et al., 2019; Liu et al., 2017b; Miyamoto et al., 2008; Nakajima et al., 2010; Panoutsopoulou et al., 2011; Ramos et al., 2014; Styrkarsdottir et al., 2014, 2017; Zengini et al., 2018; Zhai et al., 2009). Linkage disequilibrium (LD) between lead osteoarthritis SNPs was calculated using vcftools (version 0.1.15) and the EUR subset of 1KG3 (Auton et al., 2015); leads with R2 greater than 0.5 were collapsed, retaining the lead with the lowest reported association p value, for a final set of 83 independent osteoarthritis loci. Proxy SNPs were subsequently obtained using ProxyFinder (https://github.com/immunogenomics/harmjan/tree/master/ProxyFinder) on 1KG3 EUR data, using an LD-cutoff of 0.5, a minimum MAF of 0.05, and a window size of 500kb, with all other settings left to defaults. This resulted in a set of 4691 proxy SNPs (Table S7, Sheet 2), which were combined with the leads and subsequently used for intersections with bedtools (version 2.26.0). For each element set (e.g., proximal tibia-specific) overlapping regions were merged with bedtools, with randomized background regions subsequently generated to match the number and average size of peaks within the target set using the bedtools ‘random’ function; each custom background (for each target set) consists of 10,000 sequence sets, with each randomized set intersected with the osteoarthritis variant set to generate a background distribution. These distributions were investigated using the ‘qqnorm’ (R base) and ‘qqPlot’ (car package) functions to look for visible deviations from normality, for which no obvious deviations were observed. 
The number of variants intersecting the target set was then compared to this background following standardization; statistical significance was assessed using a CDF of the standard normal distribution as implemented in the ‘pnorm’ function in R (version 3.4.4). Significance was defined as p < 0.05; relevant test statistics and values are in Table S7, Sheet 8.
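The collapsing of lead variants in LD can be illustrated with a short sketch. The text states only that leads with R2 greater than 0.5 were collapsed, retaining the lead with the lowest association p value; the greedy ordering used here is an assumption, as are the function name and the input data structures.

```python
def collapse_leads(pvalues, r2, threshold=0.5):
    """Greedy LD collapsing of lead SNPs (illustrative assumption).

    pvalues: dict mapping SNP id -> reported association p value.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2;
        pairs missing from the dict are treated as unlinked (r^2 = 0).
    Returns the retained SNP ids, ordered by increasing p value.
    """
    kept = []
    for snp in sorted(pvalues, key=pvalues.get):
        linked = any(
            r2.get(frozenset((snp, other)), 0.0) > threshold for other in kept
        )
        if not linked:
            kept.append(snp)
    return kept
```

Note that under this greedy scheme a SNP is compared only against leads already retained, so a variant linked solely to a discarded lead is itself kept.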

Partitioned heritability of OA:

Summary statistics for the UK Biobank dataset relating to self-reported osteoarthritis incidence were obtained from LDHub (Zheng et al., 2017) (file ‘20002_1465.ukbb.sumstats.gz’, http://ldsc.broadinstitute.org/gwashare) and analyzed using the LDSC software (https://github.com/bulik/ldsc) (Finucane et al., 2015). Summary data were converted to an appropriate format for the ‘munge_sumstats.py’ script using custom python code. The latest version of the ‘baseline-LD’ model available from the Alkes group (version 2.2) (Hormozdiari et al., 2018) (https://data.broadinstitute.org/alkesgroup/LDSCORE) was downloaded. Initial heritability analysis (using the ‘ldsc.py’ script, ‘--h2’ mode) was performed using 1000 Genomes reference allele frequencies provided by the Alkes group, along with weights extracted from the baseline-LD model using a custom script, yielding an observed scale h2 estimate of 0.0188, in line with the value reported for this trait by LDHub (http://ldsc.broadinstitute.org/lookup). For partitioned heritability analysis, custom scripts were used to generate a modified version of the baseline-LD model with a reduced set of features including our element sets (see Table S7, Sheet 13 for full list of features), which was subsequently used to re-calculate LD scores with the ‘ldsc.py’ script with the following options: plink files for the EUR subset of 1000G (obtained from the Alkes group, see above link), ‘--ld-wind-cm 1’, constraining to the same set of SNPs used in the original baseline-LD model. Partitioned heritability of features was calculated using the ‘ldsc.py --h2’ mode, using re-calculated LD scores, extracted weights, reference 1KG3 frequencies, the ‘--overlap-annot’ flag, and all other settings left at their defaults. P values for the proportion of heritability captured by particular features were corrected for the number of features tested (n = 31) using BH correction. 
Results are shown in Table S7, Sheet 13, with statistical significance defined as adjusted p value < 0.05.

Sequence diversity in 1KG3, EGDP and OAI populations:

Knee-specific element sets (i.e., distal-femur-specific, proximal-tibia-specific, and knee-common-specific) were pooled and subsequently used to extract variants from OAI VCF files (as described above) using tabix (Li, 2011a) (version 1.7.24). Given that imputed variant information for the OAI dataset was generated using 1KG3 as a reference, and thus would likely bias apparent differences in alternative-allele occurrence between these two datasets, only variants which were directly genotyped in the OAI study were retained for subsequent analysis. These variants were subsequently subset to those for which information from 1KG3 was also available, resulting in a final set of 1946 variants. The number of reference and alternative alleles (0, 1 or 2) for all variants were summed per-individual using a custom Python script. In order to account for demographic history that may bias trends in allele frequencies between the OAI and 1KG3 populations, the CEU subset of 1KG3 (n = 99) was selected to match the Caucasian subset (n = 3366) of the OAI population. Given the disparities in sample size between the two datasets, subsampling was performed for 200 subsamples for both populations; the alternative-allele counts for individuals within each subsample were averaged, with the final set of averaged data-points compared between OAI and 1KG3 sets using the Student’s t test (one-sided, alternative of greater OAI diversity). Subsampling was done using n = 20 as well as n = 200 to confirm that results held under both conditions. In order to ensure robustness of comparison tests to subsample-averaging, first and third quartile values for alternative-allele counts within each subsample were also compared using the Student’s t test. To confirm that the comparison results were not sensitive to spurious sets of randomized sampling, the above algorithm was applied 1000 times, with t test statistics and p values averaged across replicates. 
Averages and standard deviations for replicated comparisons are reported in Table S7, Sheet 14, along with all other statistics mentioned above. In order to replicate these results in another independent background population data from the EGDP (Pagani et al., 2016) was obtained, with variant files (both OAI and EGDP) subset to match those variants directly genotyped in both datasets. This final intersect set consisted of 1896 variants, for which alternative-allele counts were summed per-individual. To try to minimize biases due to demographic history the Caucasian subset of the OAI population (n = 3366) was compared to the EGDP dataset filtered to include only those individuals of European continent origin (i.e., Northern, Eastern, Western, Southern Europe) (n = 101). Subsampling, comparisons, and robustness checks were performed as described above for the 1KG3-OAI comparison. As an additional check to confirm the observed increase in OAI sequence variation relative to both background populations, the 1KG3 and EGDP subsets were combined (n = 200) and the above described statistical comparison algorithm was applied to compare the subset OAI population to this combined background population.
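The subsampling comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, subsamples are drawn without replacement, and only the pooled-variance Student's t statistic and its degrees of freedom are computed (obtaining the one-sided p value would additionally require the t distribution, e.g., ‘pt’ in R).

```python
import random
from statistics import mean, variance


def subsample_means(per_individual_counts, n, subsamples=200, seed=0):
    """Mean alternative-allele count in each of `subsamples` random
    subsamples of n individuals drawn from one population."""
    rng = random.Random(seed)
    return [mean(rng.sample(per_individual_counts, n))
            for _ in range(subsamples)]


def student_t(x, y):
    """Pooled-variance Student's t statistic for mean(x) - mean(y),
    returned together with its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5
    return t, nx + ny - 2
```

Here `x` would hold the subsample means for the OAI cohort and `y` those for the background population, so that a positive t statistic corresponds to the alternative of greater OAI diversity.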

In order to confirm that this sequence behavior was a feature of knee-specific chondrocyte regulatory elements, rather than driven by systemic differences in allele frequencies between the OAI cohort and background populations, an independent element set (GM12878 ATAC-seq, described above) was used to subset the OAI dataset, impute, and subsequently filter for directly-genotyped variants falling within GM12878 elements as described above and in section ‘Sequence Variation in OAI Cohort’. This variant set was then filtered to include only those for which genotyping data in the 1KG3 and EGDP datasets was similarly available, resulting in a final consolidated variant set of 3932, for which per-individual alternative-allele counts were generated as above for the three populations. Subsampling, comparisons, and robustness checks for these count data were performed as above, the results of which are shown in Table S7, Sheet 14.

In order to generate the density histogram shown in Figure 4F, subsample-averaged alternative-allele counts for OAI and 1KG3 subsamples (200 datapoints with n = 20 per subsample) were converted to density curves using the ‘density’ function (R base), using min and max values defined by the OAI/1KG3 superset to enforce a shared x axis. These curves were subsequently averaged over 1000 replicates, filtered for extreme x-values (i.e., those for which density approached 0), and plotted. The indicated significance bar represents the robustness-checked t test results presented in Table S7, Sheet 14 for the OAI-1KG3 knee-specific element comparison.

Selection and genetic drift analyses:

Variants from the entire 1KGP Phase 3 dataset were extracted for knee-specific element sets using tabix (version 1.7.24), filtered for duplicates and MAF > 0.05 with bcftools (version 1.8), and subsequently separated into individual 1000 Genomes populations, to the exclusion of admixed populations including all Americas (with the exception of GIH), Finns (FIN) and Iberians (IBS) for a total of 18 populations across Europe, Asia and Africa. These populations include the following 1KG3 codes: BEB, CDX, CEU, CHB, CHS, ESN, GBR, GIH, GWD, ITU, JPT, KHV, LWK, MSL, PJL, STU, TSI, YRI. Fst calculations for all SNPs extracted for a given set were performed using the ‘Fst’ function of the pegas (Paradis, 2010) package in R (version 0.11), which implements the Weir and Cockerham formula for Fst calculation (Weir and Cockerham, 1984). Values were calculated for all population comparisons (n = 153), with the resulting matrix used to perform principal components analysis (PCA) using the prcomp function as implemented in R (version 3.4.4) (Figure S5C). To gauge the divergence trends captured by principal components (PCs), Pearson correlations between PCs and population comparisons (e.g., CDX and GIH) were calculated and subsequently clustered (using hierarchical clustering of population comparisons, e.g., BEB-CHS) for visualization using the ComplexHeatmap (Gu et al., 2016) R package (version 1.17.1) (Figure S3D). For variant clustering, the ‘fviz_nbclust’ function from the factoextra (Kassambara and Mundt, 2017) package (version 1.0.5) was used to estimate ideal ‘k’ for CLARA clustering (as implemented in cluster [Maechler et al., 2018], 2.0.7–1) of each set of SNPs using the silhouette method (also from cluster), which determines the ideal cluster number that minimizes within-group Euclidean distances, stepping through a range of possible k values (k = 1–10). Silhouette plots were generated for all knee-specific sets separately, consistently indicating an ideal ‘k’ of 2. 
After cluster generation the clustered pairwise-Fst matrices were visualized using ComplexHeatmap and hierarchical clustering for both population comparisons as well as individual SNPs, as seen in Figures S3E and S3F.
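The figure of 153 population comparisons follows from taking all unordered pairs of the 18 populations, as the sketch below confirms. The accompanying per-SNP function is a deliberately simplified two-population Fst based on expected heterozygosities with equal population weights; it is included only to show the shape of a SNP-by-comparison matrix and is not the Weir and Cockerham estimator implemented in pegas. All function names are hypothetical.

```python
from itertools import combinations

POPULATIONS = ["BEB", "CDX", "CEU", "CHB", "CHS", "ESN", "GBR", "GIH", "GWD",
               "ITU", "JPT", "KHV", "LWK", "MSL", "PJL", "STU", "TSI", "YRI"]

# All unordered population pairs: C(18, 2) comparisons.
PAIRS = list(combinations(POPULATIONS, 2))


def simple_fst(p1, p2):
    """Simplified Fst = (H_T - H_S) / H_T from the allele frequencies of
    one SNP in two populations (equal weights; NOT Weir and Cockerham)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic across both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def fst_row(freqs):
    """One matrix row for a SNP: Fst for every population comparison.
    freqs: dict mapping population code -> allele frequency."""
    return [simple_fst(freqs[a], freqs[b]) for a, b in PAIRS]
```

Stacking one such row per SNP gives a matrix with 153 columns, the layout on which PCA and clustering of population comparisons would operate.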

ATAC-seq - selection test intersection:

To test whether knee elements intersect regions known to be under selection in humans, we utilized selection data from the 1KGP and the Estonian Genome Diversity Project (EGDP) (Jagoda et al., 2018). For the 1KGP, selected regions were acquired from Grossman and colleagues (Grossman et al., 2013), who used the Genome-Wide Composite of Multiple Signals Test (CMSGW) to identify regions of the genome under selection (FDR 19%) in the Yoruba (YRI), Northern and Western European (CEU), Han Chinese (CHB), and Japanese (JPT) 1KGP populations (see Table S1 of Grossman et al. [2013]). The coordinates of these selected regions from Grossman and colleagues were lifted over from the hg18 to the hg19 human genome build using the UCSC Genome Browser LiftOver tool, and any overlap of at least one base pair with an ATAC-seq element was reported.

Selected regions from the EGDP came from two sources. Tests for regions under recent (< 30kya) positive selection were reported by Pagani and colleagues (Pagani et al., 2016), who identified regions of recent selection using three selection tests: iHS (Voight et al., 2006), nSL (Ferrer-Admetlla et al., 2014), and Tajima’s D (Tajima, 1989). They employed these tests on individuals from 12 populations in the EGDP: West and Central Africa, Middle East, South and West Europe, East and North Europe, Volga-Uralic, South Asia, West Siberia, South Siberia and Mongolia, Central Siberia, Northeast Siberia, Mainland East and Southeast Asia, and Island Southeast Asia. Tests for regions under positive selection in humans but in the more distant past - i.e., after the split between Africans and non-Africans but before the split between East and West Eurasian populations (50–30kya) - were identified by Jagoda et al. (2018) using the 3P-CLR statistic (Racimo, 2016). Jagoda and colleagues employed this statistic on EGDP individuals from the South and West Europe, East and North Europe, Mainland East and Southeast Asia and West and Central Africa populations. Both the recent and ancient selection tests were conducted on 200kb windows, with windows in the top 1% of signals considered significant. To determine the overlap between knee elements and these regions, elements were binned into 200kb windows. An element was scored as overlapping a recent selection signal if the 200kb window containing it had been in the top 1% of one of the three recent tests conducted by Pagani et al. (2016). An element was scored as overlapping an ancient selection signal if the 200kb window containing it had been in the top 1% of one of the 3P-CLR tests conducted by Jagoda et al. (2018). 
Enrichment/depletion for intersections with selection windows was assessed using hypergeometric tests (‘phyper’ from R Base); comparisons were defined as the number of unique genomic windows intersected by any element, and the number of said windows also being under selection, compared to the total number of genomic windows and the total genome-wide number of windows under selection. P values were adjusted for the number of sets compared (n = 20); relevant test-statistics, p values, and fold-changes are presented in Table S8, Sheets 20–21, as well as Figure 4D. Selected regions defined by Grossman et al. (2013) were tested for enriched overlap with element sets using regioneR (version 1.8.1) using the ‘permTest’ function, generating 1000 randomized region sets as a background using the ‘circularRandomizeRegions’ option and the ‘count.once’ flag, with all other options set to defaults. Significance was assessed at p < 0.05 (Table S8, Sheet 22).
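The window binning and hypergeometric test for the EGDP selection windows can be sketched as follows. The function names are hypothetical, and the sketch shows only the upper-tail (enrichment) probability; in R this corresponds to `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.

```python
from math import comb


def window_index(position, window_size=200_000):
    """Index of the fixed-size genomic window containing a position."""
    return position // window_size


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric, where
    N = total genomic windows,
    K = windows under selection genome-wide,
    n = unique windows intersected by any element,
    k = intersected windows that are also under selection."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)
```

For an element set, `n` would be the number of distinct `window_index` values (per chromosome) hit by its elements, and `k` the subset of those windows falling in the top 1% of a given selection test.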

Haploblock analysis:

1MB regions around each selection window of interest were defined and used to extract variant data from EGDP (Pagani et al., 2016) for populations which were identified as being subject to selection in the given locus using bcftools (version 1.8), with duplicate SNPs removed. Files were subsequently re-processed with vcftools (version 0.1.15) and PLINK (Chang et al., 2015) (version 1.90) (http://www.cog-genomics.org/plink/1.9/) for use with Haploview (Barrett et al., 2005) (version 4.2) with a -minMAF cutoff of 0.05 and the default Gabriel (Gabriel et al., 2002) block generation algorithm, all other flags being left at their default settings. To visually assess whether a given osteoarthritis lead and associated proxies were present on the selected or alternate haploblock in the population, a multiple alignment based on SNP sequences (MAF >= 0.05) for all individuals was constructed either across a haploblock or across the full 1MB region. RaxML (Stamatakis, 2006) (version 7.3.0) was used to generate a maximum-likelihood tree with the GTR-CAT model along with bootstrap (n = 100) support. Chimpanzee reference sequences for all SNPs used in the multiple alignment were taken from panTro4 and used to assign ancestral (red) and derived (blue) states for all aligned sequences. See Figures S3G–S3J.

Motif analysis – de-novo:

Sequence sets for the knee-specific regions (with a fixed length of 500 bp) were generated using reference sequences from mm10 and hg19 for original and lifted-over regions (respectively). HOMER (version 4.10) de novo motif analysis was performed on each sequence set using a 10x random shuffling as a background set. De novo motifs were compared to a vertebrate motif library included with HOMER, which incorporates the JASPAR (Mathelier et al., 2016) database; matches are scored using Pearson’s correlation coefficient of vectorized motif matrices (PWMs), with neutral frequencies (0.25) substituted for non-overlapping (e.g., gapped) positions. Best-matching motif PWMs are shown in Table S3, Sheets 1–6.

Motif analysis – inter-species:

For analysis of inter-species alterations to the predicted binding sites of certain TFs, a set of TFs bearing resemblance to the de-novo motifs identified in our element sets was combined with a previously-defined set of TFs associated with chondrogenesis (Liu et al., 2017a) for a total of n = 45 transcription factors for which defined PWMs were available from the compendium provided by MotifDb (Shannon and Richards, 2014), which contains matrices from various sources including the JASPAR database.

A human-chimpanzee whole-genome alignment was obtained from UCSC and used to extract orthologous reference hg38 and panTro4 sequences for all regions in each set (lifted-over from hg19 to hg38), regions fixed to a constant size of 500 bp (as done for the de-novo analysis above), with custom code. Aligned sequences were compared to identify human/chimpanzee divergences within elements of a given set. These divergences were subsequently analyzed using motifbreakR (Coetzee et al., 2015) (version 1.6.0). Briefly, this software first takes the set of PWMs for TFs of interest and scans the genomic region around altered bases (here, inter-species nucleotide divergences) for sequence matches to a particular PWM (via log-odds scoring). Once potential binding sites are identified via this in-silico scoring, these sites are subsequently re-scored using the altered base (here, replacing the hg38 sequence with the panTro4 equivalent); effect of an altered base is calculated as relative difference in log-odds scores for site, with positive differences indicating improved matching of a sequence to a TF PWM, negative differences indicating diminished matching. Thus, for human-chimp divergences intersecting sequence matches for a particular TF’s PWM, an ‘improving’ or ‘disrupting’ effect classification can be applied.
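The scan-and-rescore logic described above can be illustrated with a simplified sketch. This is not motifbreakR's implementation: the function names, the pseudocount, the uniform background, the choice of the best-scoring window, and the use of a plain (rather than relative) score difference are all assumptions made for illustration.

```python
from math import log2

BASES = "ACGT"


def log_odds_score(pwm, site, background=0.25, pseudocount=0.01):
    """Sum of per-position log-odds for `site` under a PWM.
    pwm: list of [pA, pC, pG, pT] probability columns."""
    score = 0.0
    for column, base in zip(pwm, site):
        p = (column[BASES.index(base)] + pseudocount) / (1 + 4 * pseudocount)
        score += log2(p / background)
    return score


def variant_effect(pwm, ref_seq, alt_seq, variant_index):
    """Score every PWM-width window overlapping the altered base, keep the
    window with the best match in either sequence, and classify the change.
    Returns (score_difference, label), or None if no window fits."""
    width = len(pwm)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(ref_seq) - width)
    best = None
    for start in range(first, last + 1):
        ref_score = log_odds_score(pwm, ref_seq[start:start + width])
        alt_score = log_odds_score(pwm, alt_seq[start:start + width])
        if best is None or max(ref_score, alt_score) > max(best):
            best = (ref_score, alt_score)
    if best is None:
        return None
    diff = best[1] - best[0]
    if diff > 0:
        return diff, "improving"
    if diff < 0:
        return diff, "disrupting"
    return diff, "neutral"
```

In the inter-species setting, `ref_seq` would be the hg38 sequence and `alt_seq` the same sequence carrying the panTro4 base at the divergent position.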

This scanning and scoring method was applied to human-chimp divergences within elements from the three knee-specific sets, with the final number of divergences intersecting sequence matches for a particular TF PWM counted.

A random background of 500 region sets (each consisting of 464 lifted-over human sequences, the size of the smallest set, the knee-common-specific set) with a size of 500 bp was generated from genome-wide sampling; human/chimpanzee sequences for these randomized sets were subsequently compared to generate randomized sets of human-chimp divergences. These sets were analyzed with the same scanning/scoring method, with the number of intersecting divergences for a particular TF calculated as for the target sets. Values for both target and randomized sets were corrected as divergences per bp of sequence to account for differences in set size and plotted as histograms (Figure 2A). These distributions were investigated using the ‘qqnorm’ (R base) and ‘qqPlot’ (car package) functions to look for visible deviations from normality, for which no obvious deviations were observed. Values were standardized and statistical significance was assessed using a CDF of the standard normal distribution as implemented in the ‘pnorm’ function in R (version 3.4.4). P values for significant deviations from the background distribution were corrected for the number of TFs tested using a BH correction. Significance was defined as adjusted p < 0.05 (Table S3, Sheets 7–9).

Motif analysis – intra-species:

Human sequence variants (SNPs) identified with a MAF > = 0.05 in the full 1KGP Phase 3 (Auton et al., 2015) set which overlap knee elements were analyzed for intersections with TF PWM sequence matches (i.e., predicted TF binding sites) as described above, using the same set of 45 TF PWMs. Alongside these sets of SNPs, variants clustered by pairwise population Fst (see above) were analyzed with the same methodology. SNPs per bp of sequence values were calculated for each element set (as well as clustered variant sets) to account for differences in set size. The same background region set used in the inter-species motif analysis was used, with counts of human variants intersecting predicted TF sequence matches adjusted for set size. Statistical deviations from the background distribution were similarly assessed using a CDF of the normal distribution as implemented in ‘pnorm’, with BH correction applied to account for the number of TFs tested. Significance was defined as adjusted p < 0.05 (Table S3, Sheets 10–20).

Motif analysis – OA variants:

All defined TF PWMs available from MotifDb (version 1.6.0) were retrieved, constraining results to single PWMs for unique TFs (taking JASPAR-sourced PWMs when available), for a final set of 1686 PWMs used. OA lead variants and their associated proxies (after filtering for inconsistent ref/alt classifications with respect to hg19 reference sequence, a final set of 4172 variants was used) were analyzed using motifbreakR (version 1.6.0), as described above, for intersections with predicted TF sequence motifs (or, ‘binding sequences’) for the TF superset. The distribution of variant intersections with predicted binding sequences is shown in Figure 4B, with intersections counted per TF PWM, for the entire OA lead/proxy set. This distribution was assessed using the ‘qqnorm’ (R base) and ‘qqPlot’(car version 3.0.3) functions – these suggested substantial deviations from a normal distribution. Accordingly, the ‘fitdistrplus’ package (version 1.0–14) was used to determine an appropriate distribution to fit the data. The ‘descdist’ function was initially used to assess curve behavior; given this, goodness-of-fit statistics (from the ‘gofstat’ function) for gamma, negative-binomial, exponential, log-normal and Weibull distributions were compared, with the gamma distribution subsequently selected given the observed left-skew of the logged distribution. Gamma distribution parameters (‘shape’ and ‘rate’ in the R implementation of ‘pgamma’) were fit using a bootstrap method (‘bootdist’ from ‘fitdistrplus’), with the median parameter estimates from 1000 samples used to define the distribution for significance testing of particular TF PWMs with ‘pgamma’ (upper-tail p values). 
Significant deviations from this superset distribution were tested for the set of TFs which showed significant biases in their intersection by human-chimp nucleotide divergences (adjusted p value < 0.05; n = 28, see Table S7, Sheets 7–9), with p values accordingly adjusted using BH correction (Table S7, Sheet 10). Additionally, deviations of particular TF PWMs from the superset distribution were also tested for OA variants falling within element sets (e.g., chondrocyte-pooled, knee-specific, GM12878, embryonic brain)– gamma distribution parameters for different element sets (e.g., knee-specific pooled, distal femur-specific) were fit using the ‘bootdist’ function as above, with enrichment p values similarly adjusted for n = 28 TF PWMs tested (Table S7, Sheets 10–11).

Predicted binding sequences for chondrogenesis-associated and de-novo identified motifs (n = 45, used above in inter- and intra-species analyses) intersected by element-captured OA variants were similarly identified with motifbreakR (Table S7, Sheet 9) and are shown as a per-TF histogram (for the pooled knee-specific set, Figure 4C).

To test whether knee elements capture a significant proportion of OA variants intersecting binding sequences for a particular TF (e.g., elements tend to capture OA variants predicted to intersect KLF5 sites) hypergeometric testing was performed using the ‘phyper’ function from R base, testing the set of inter-species biased TFs used earlier (n = 28). Comparisons were defined as: (a) the number of predicted binding sites (for target TF) intersected by element-captured variants, and the total number of predicted binding sites (for all TFs tested) intersected by element-captured variants, (b) the number of predicted binding sites (for target TF) intersected by all OA variants, and the total number of predicted binding sites (for all TFs tested) intersected by all OA variants. P values were adjusted for the number of TFs tested (n = 28); relevant test-statistics, p values and fold-changes are presented in Table S7, Sheet 12.

TF binding sites HaploregV4.1 and UniProbe:

To identify upstream TFs physically bound at the R4 enhancer and rs6060369 variant position, we used HaploReg (Ward and Kellis, 2012) (version 4.1) with default parameters (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php). To identify additional factors, we also queried the UniProbe database (Hume et al., 2015) (http://the_brain.bwh.harvard.edu/uniprobe/about.php/), which is based on replicate experimental measurements of in vitro binding interactions between expressed TFs and all possible 8-mer target oligonucleotides. The HaploReg analysis identified 5 TFs physically bound according to ENCODE ChIP-seq datasets: CJUN, JUND, POL2, STAT3, and ZNF263. For the UniProbe analysis (Table S9, Sheets 7–9), two 15 base-pair sequences differing only at rs6060369 (non-risk “C” versus risk “T” allele) were analyzed using Score Threshold: 0.2 and Species: Homo sapiens. This produced 7 distinct TF predictions for the non-risk allele and 19 distinct TF predictions for the risk allele. For a TF binding site to be considered reduced or gained, the site for that TF had to exhibit an enrichment score change of at least 0.10, which corresponded to significant changes in Z-score. Both lists of potential upstream regulators, exhibiting either true physical binding (HaploReg) or marked reduction/gain in binding affinity (UniProbe), were then screened using expression and phenotypic data to identify those TFs also expressed in, or functionally required in, hind limb joints specifically. To carry out this analysis we used data in VisiGene (Karolchik et al., 2014) (http://genome.ucsc.edu/cgi-bin/hgVisigene), Eurexpress (Diez-Roux et al., 2011) (http://www.eurexpress.org/ee/), Genepaint (Visel et al., 2004) (http://www.genepaint.org/Frameset.html), and the Mouse Genome Informatics expression and phenotypic databases (Blake et al., 2017) (http://www.informatics.jax.org/). Of the listed factors, Pitx1 met the above criteria, as it is expressed in hind limbs, is required for normal development of hind limb long bones, particularly knees, in humans and mice (Nemec et al., 2017), and has roles in osteoarthritis (Picard et al., 2007).
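The allele comparison in the UniProbe step can be illustrated with a short sketch. It assumes that each allele's query returns a mapping from TF name to enrichment score, and it labels only TFs scored for both alleles; how TFs predicted for a single allele were handled is not described in the text and is left out here. The TF names and scores below are placeholders, not values from Table S9.

```python
def classify_binding_changes(non_risk_scores, risk_scores, min_change=0.10):
    """Label TFs scored for both alleles by the change in enrichment score (risk minus non-risk).

    min_change mirrors the 0.10 enrichment-score cutoff quoted in the text.
    TFs present in only one of the two inputs are skipped (an assumption of this sketch).
    """
    calls = {}
    for tf in sorted(set(non_risk_scores) & set(risk_scores)):
        delta = risk_scores[tf] - non_risk_scores[tf]
        if delta >= min_change:
            calls[tf] = "gained"
        elif delta <= -min_change:
            calls[tf] = "reduced"
        else:
            calls[tf] = "unchanged"
    return calls


# Placeholder scores for hypothetical factors TF_A to TF_D.
non_risk = {"TF_A": 0.25, "TF_B": 0.45, "TF_C": 0.30}
risk = {"TF_A": 0.40, "TF_B": 0.30, "TF_C": 0.33, "TF_D": 0.41}
calls = classify_binding_changes(non_risk, risk)
```

Candidates labelled "gained" or "reduced" would then go forward to the expression and phenotype screen described above.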

QUANTIFICATION AND STATISTICAL ANALYSIS

MicroCT analyses:

In all cases, the measurers (A.M.K. and S.H.) were blinded to the specimens’ genotype. Quantified anatomical indices were compared between genotypes using multivariate linear regression analysis, with anatomy as the dependent variable and genotype and line as independent variables. A Sidak post hoc test was used to correct for multiple comparisons between genotypes. Analyses were conducted in SPSS (IBM Corp., Armonk, NY). P values are two-sided and statistical significance was assessed at alpha = 0.05 (Figure S4; Table S9, Sheets 1–4).
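For reference, the Sidak adjustment for m comparisons replaces each raw p value with 1 − (1 − p)^m. The sketch below shows that formula only; it is not a reproduction of the SPSS procedure, and the three raw p values are invented to stand for the pairwise genotype comparisons (WT vs. HET, WT vs. HOM, HET vs. HOM).

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p values for a family of m = len(p_values) comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Invented raw p values for three pairwise genotype comparisons.
adjusted = sidak_adjust([0.05, 0.01, 0.20])
```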

Assessment of osteoarthritis:

A Mann-Whitney test was used to compare the OARSI scores between the wild-type (n = 5 at P30; n = 6 at 1 year) and homozygous (n = 5 at P30; n = 14 at 1 year) knees. Linear regression was used to investigate the associations between OARSI score and quantified anatomical indices of the femur and tibia. The analyses were conducted in SPSS (IBM Corp., Armonk, NY). P values are two-sided and the statistical significance was assessed at alpha = 0.05.
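The Mann-Whitney comparison can be sketched as below. This illustration uses a tie-corrected normal approximation without continuity correction, which is an assumption of the sketch: SPSS may report an exact p value for groups this small, so the numbers need not match the published ones. The scores in the example are invented.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p value from a tie-corrected normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U counts, over all cross-group pairs, how often x exceeds y (ties count one half).
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    # Tie correction: t is the size of each group of identical values in the pooled data.
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        # All pooled values identical: no evidence of a difference.
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2.0))


# Invented scores standing in for wild-type and homozygous knees.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With samples as small as n = 5 the normal approximation is coarse, so an exact test is preferable in practice.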

Allele-specific expression:

Pyrosequencing results for each SNP were used to calculate the allelic ratios of C57BL/6J (R4 null allele or R4 rs6060369 “T” allele in the heterozygous state) to 129X1/SVJ (wild-type allele). Each ratio of cDNA products found in heterozygous animals was then normalized by the ratio of WT C57BL/6J to 129X1/SVJ genomic products, amplified from known 1:1 mixtures of each sequence. A permutation test, a nonparametric method, was used to assess the significance of the difference between the wild-type and heterozygous allelic expression ratios, using the {perm} package in R.
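The normalization and test described above can be sketched as follows. The study used the {perm} package in R; the version here is a stand-in that enumerates every relabelling of the pooled ratios and uses the absolute difference in group means as the statistic, both of which are assumptions of this sketch. Function names and example ratios are hypothetical.

```python
from itertools import combinations


def normalized_allelic_ratio(cdna_ratio, gdna_ratio):
    """Divide the cDNA allelic ratio by the ratio measured on a known 1:1 genomic mixture."""
    return cdna_ratio / gdna_ratio


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means between two groups."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance so splits equal to the observed one are counted despite rounding.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Invented normalized ratios for wild-type and heterozygous animals.
p_value = exact_permutation_p([1.0, 1.1, 0.9], [0.5, 0.6, 0.4])
```

With three animals per group there are only 20 possible splits, so the smallest attainable two-sided p value is 0.1; full enumeration is practical only for small groups such as these.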

Gene expression experiments:

For each gene (GDF5, CEP250, and UQCC1), expression data were normalized relative to housekeeping gene expression and compared between control and experimental conditions (i.e., R4 enhancer deletion or R4 41 bp variant position deletion). All data are presented as the mean ± SEM. Individual pairwise comparisons between control and experimental conditions were analyzed by two-sample, two-tailed Student’s t test unless otherwise noted, with p < 0.05 regarded as significant. The n values are listed in the figure legends (n = 3 biological replicates per comparison).

rs6060369 luciferase reporter assay:

Upon acquiring luciferase data after transfection of rs6060369 R4 enhancer constructs, we first normalized each Firefly luciferase value per well by its corresponding Renilla luciferase value per well, and then compared the mean expression of the reference allele construct (normalized by empty vector) to the mean expression of the alternate allele construct (normalized by empty vector). For each, the means ± SEM of multiple independent measurements were calculated. The unpaired two-tailed Student’s t test was used to determine the significance of differences between means. The p values from three independent experiments were combined using Fisher’s combined probability test.
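The two numerical steps above, per-well normalization and combination of p values, can be sketched as follows. Fisher's method uses the statistic X = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; because the degrees of freedom are even, the tail probability has a closed form and needs no statistics library. The function names, well readings and p values are illustrative assumptions, not data from the study.

```python
from math import exp, factorial, log


def relative_luciferase(firefly, renilla, empty_vector_mean):
    """Firefly/Renilla ratio per well, expressed relative to the mean empty-vector ratio."""
    return [(f / r) / empty_vector_mean for f, r in zip(firefly, renilla)]


def fisher_combined_p(p_values):
    """Fisher's combined probability test for k independent p values (each must be > 0)."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals X / 2
    # Chi-square survival function with 2k degrees of freedom, closed form for even df.
    return exp(-half_stat) * sum(half_stat**i / factorial(i) for i in range(k))


# Invented readings for two wells and invented p values for three experiments.
ratios = relative_luciferase([200.0, 400.0], [10.0, 20.0], empty_vector_mean=5.0)
combined = fisher_combined_p([0.05, 0.05, 0.05])
```

Fisher's method assumes the experiments are independent, and a p value of exactly zero would need special handling because its logarithm is undefined.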

DATA AND CODE AVAILABILITY

The data supporting the findings of this study, as presented in Figures 1, 2, 3, 4, 5, 6, 7, S1, S2, S3, S4, and S5, are available within Tables S1, S2, S3, S4, S5, S6, S7, S8, and S9 included in this publication, as well as from the Lead Contact upon reasonable request. The accession number for the raw ATAC-seq data and processed peak bed files reported in this paper is GEO: GSE122877.

Relevant URLs

Jackson Laboratory, https://www.jax.org/; MIT CRISPR Tools, http://zlab.bio/guide-design-resources; HaploRegV.4.1, https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php/; UniPROBE Database, http://the_brain.bwh.harvard.edu/uniprobe/about.php/; Visigene, http://genome.ucsc.edu/cgi-bin/hgVisigene; Eurexpress, http://www.eurexpress.org/ee/; GenePaint, http://www.genepaint.org/Frameset.html; Mouse Genome Informatics, http://www.informatics.jax.org/; 1000 Genomes Project raw data, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr20.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz; GADP data, https://eichlerlab.gs.washington.edu/greatape/data/; EGDP data, https://evolbio.ut.ee/CGgenomes_VCF/; SAMtools, http://samtools.sourceforge.net/; ProxyFinder, https://github.com/immunogenomics/harmjan/tree/master/ProxyFinder; OAI, http://oai.epi-ucsf.org./datarelease/; FNIH, https://fnih.org/what-we-do/biomarkers-consortium/programs/osteoarthritis-project; LDSC, https://github.com/bulik/ldsc; LDHub Files, http://ldsc.broadinstitute.org/gwashare; LDHub results, http://ldsc.broadinstitute.org/lookup; Alkes Group LDSCORE files, https://data.broadinstitute.org/alkesgroup/LDSCORE

Supplementary Material

1. Figure S1. Related to Figure 1.

(A-C): phyloP scores (phyloP20ways) were averaged per-bp across all specific-knee ATAC-seq regions in a peak set and plotted in respective colors as a function of distance along a 1kb window centered on the peak middle; a random background distribution of conservation scores is indicated in gray (see STAR Methods). (A) Specific Distal Femur (B) Specific Proximal Tibia (C) Specific Knee-common. (D-E): HAR enrichment gene loci screenshots. (D) A general distal-femur element overlaps a human-accelerated region (HAR) (Prabhakar et al., 2006) upstream of BMP2, a TGF-β family member with a well-established role in chondrogenic differentiation that is expressed in the knee (Rosen, 2009) and has been linked to osteoarthritis (Gamer et al., 2018; Valdes et al., 2006). ATAC-seq regions (“General DF”), ENCODE DNase hypersensitivity sites and phyloP100ways conservation tracks are shown on bottom. (E) A proximal-tibia-specific element overlaps an accelerated region (Bird et al., 2007) intronic to IQGAP1, a protein complex scaffold (Hedman et al., 2015) expressed in embryonic mouse cartilage (Cupit et al., 2004) which impacts skeletal development, particularly in the tibia (Dickinson et al., 2016). Format as in (D), with ATAC-seq track “PT” shown.

2. Figure S2. Related to Figure 3.

Intra-species sequence diversity in humans and apes. The number of intra-species variants intersecting a given ATAC-seq peak was counted for all variable sequences in a set within a given species and expressed as ‘SNPs per sequence’. Mean and median values of distributions are indicated in dashed and bold lines, respectively. Table S5 lists numerical values for each comparison. (A) General Knee Distal Femur, (B) General Knee Proximal Tibia, (C) General Knee Common, (D) General Elbow Distal Humerus, (E) General Elbow Proximal Radius, (F) General Elbow Common, (G) Elbow Distal Humerus-Specific, (H) Elbow Proximal Radius-Specific, (I) Elbow-Common-Specific. (J) Knee Distal Femur-Specific, overlapping human E59 Distal Femur data. (K) Knee Proximal Tibia-Specific, overlapping human E59 Proximal Tibia data. (L) Knee-Common-Specific, overlapping pooled human E59 Distal Femur and Proximal Tibia data. (M) Elbow Distal Humerus-Specific, overlapping human E59 Distal Humerus data. (N) Elbow Proximal Radius-Specific, overlapping human E59 Proximal Radius data. (O) Elbow-Common-Specific, overlapping pooled human E59 Distal Humerus and Proximal Radius data. (P) Counts of common chimpanzee variants (via the GADP dataset) per bp of sequence for element sets were compared to random region sets along with other genomic features for enrichment/depletion analysis; labels correspond to results in Table S5. Significance codes: not significant (ns), < 0.05 (*), < 0.01 (**), < 1e-5 (***).

3. Figure S3. Related to Figure 4.

Examining modern human sequence variation in knee sets. (A-B): Motif alteration in putative regulatory elements near osteoarthritis-related genes. Red shading indicates altered base relative to motif logo, blue shading below indicates genomic position of sequence. ATAC-seq regions and phyloP100ways conservation tracks are shown on bottom. (A) Common human variant (rs265053: C/T) within an intron of UNC5C, improves a predicted CEBPB sequence motif downstream of BMPR1B (Baugé et al., 2013; Zhai et al., 2015). (B) Common human variant (rs2280153: G/A) improves a predicted SOX9 sequence motif within the WISP3 promoter (Kannu et al., 2009; Sen et al., 2004). (C) Common human variants within the indicated peak sets distributed across the first two PCs of an Fst-based PCA. Two sub-groups based on k-means clustering are indicated in red (Cluster 1) and blue (Cluster 2). Variance explained by each PC is indicated on axes. (D) Representative correlation heatmap between PCs and population comparisons across all variants in the knee-specific pooled set. Pearson’s correlations between PCs and population pairs are clustered, with populations colored by continent: red - Asia, green - Europe, blue - Africa. (E, F) Calculated Fst for common variants across populations. Fst values for a single variant across multiple pairwise population comparisons are shown, with each row representing a single variant; the particular pairwise comparison is indicated below, color scheme as for (D). Red/blue scale for Fst values is indicated on far right. Both variants and pairwise comparisons are grouped by hierarchical clustering. (E) Cluster 1 variants within the knee-specific pooled set. (F) Cluster 2 variants within the knee-specific pooled set. (G-J) Visual genotype plots for selected populations in a given locus. SNP sequences for all individuals are arranged according to a maximum-likelihood tree. Red and blue indicate ancestral and derived assignments, respectively. Solid orange lines indicate SNPs which are gapped in chimpanzee, solid green lines indicate individuals carrying alleles besides the ancestral/derived annotations. Vertical yellow dashed lines indicate proxy variants, vertical green dashed lines indicate lead variants, when present in indicated region. For plots with a distinguishable haplotype (G, H) the alternate haplotype is indicated with a thick green line. (G) Visual genotypes for selected population (Volga-Ural region of Russia) (Capellini et al., 2017) within a ~150kb haploblock in the GDF5-UQCC1 locus. Distribution of genotypes indicates the presence of a high-frequency haploblock carrying the lead GWAS variant “A” at rs143383 (Miyamoto et al., 2007; Valdes et al., 2011) and linked variants (rs4911178, rs6060369). (H) Visual genotypes for selected population (South-East Asia) within a ~64kb haploblock in the UNC5C-BMPR1B locus; variant rs2626053 (arcOGEN Consortium et al., 2012) appears on a low-frequency haploblock. (I) Visual genotypes for selected population (South-Western Europe) in the ENPP1/3 locus; given the lack of clearly-defined haploblock structure in the locus, a 200kb region centered on rs3850251 (arcOGEN Consortium et al., 2012; Klein et al., 2019) was used to generate this plot, with linked rs7744039, rs7773292, rs9493095 indicated. (J) Visual genotypes for selected population (Central Siberia) in the LSM5 locus; given the lack of clearly-defined haploblock structure in the locus, a 200kb region centered on rs4141788, linked to rs7785659 (arcOGEN Consortium et al., 2012), was used to generate this plot.

4. Figure S4. Related to Figures 5 and 7.

Additional morphometric analyses on R4 enhancer null mice. (A) Quantified anatomical indices of the femur and tibia used in this study. Femoral length (FL), bicondylar width of the femur (BCW), trochlear groove depth (TD), width of the lateral femoral condyle (LCW), width of the medial femoral condyle (MCW), intercondylar notch width (NW), trochlear groove sulcus angle (SA), width of the tibial plateau (TPW), height of the medial tibial spine (MTSH), height of the lateral tibial spine (LTSH), posterior slope of the tibial plateau (PSTP), and tibial length (TL). Curvature radius of the femoral condyle and the posterior slope of the tibial plateau were measured across both medial and lateral compartments. (B,C) Morphological defects in 1-year-old R4 enhancer null mice (HOM), compared to heterozygous mice (HET) and wild-type mice (WT). A number of measurements were carried out at 1 year of age (see (A)). This figure displays only those which revealed significant differences between control and R4 enhancer null mice in distal femur (B) and proximal tibia (C) structures. See Table S9 for complete results of statistical comparisons for all measurements as well as significance values for each comparison. (D) Correlation between OARSI scores and morphometric measures of the distal femur and proximal tibia in R4 enhancer mice. Correlation (R²) and p value results for Pearson’s correlation tests are indicated on each graph.

5. Figure S5. Related to Figure 5.

Additional histological sections on R4 enhancer null mice at P30 and 1 year. (Left) Comparison of WT and HOM knee joints at P30, showing microCT renditions (top row), high-magnification images of sections of the unaffected knee joint in both genotypes as reported in Figure 5 (second row), and low-magnification images of the same medial sections but at different planes (third and fourth rows). (Right) Comparison of WT and two HOM knee joints of differing phenotypic severity (HOM-1 mild osteoarthritis (green); HOM-2 severe osteoarthritis (red)) at 1 year, showing microCT renditions (top row), including an X-ray image of HOM-2 specimen with heterotopic ossification (top row, far right), high-magnification images showing affected medial and lateral sections (when affected) as reported in Figure 5 (second row), low-magnification images of the same medial and lateral sections of the joint but in two different planes (third and fourth rows), and total joint images for all three specimens (fifth row). Note the loss of articular cartilage matrix in the lateral compartment and clustering and loss of cells in the cartilage of the medial compartment in HOM knees. Scale bars, 50 μm (second row); 500 μm (third/fourth rows); 250 μm (fifth row).


Highlights.

  • Human/mouse long bone chondrocyte ATAC-seq reveals site-specific regulatory elements

  • Ancient positive and purifying selection shape human knee regulation and morphology

  • Modern human drift and pleiotropy violate regulatory constraint conferring OA risk

  • Validation at GDF5 reveals a single regulatory variant impacting both shape and OA

ACKNOWLEDGMENTS

The authors thank Drs. Bruce, Brooks, and Bouxsein for μCT from the Center for Skeletal Research (NIH P30 AR066261) (Massachusetts General Hospital); EpigenDx for ASE; Applied StemCell for mice; Drs. Sabeti (Harvard University [HU]), Zeng (Tufts University), and Goldring (The Hospital for Special Surgery) for cell lines; the HUGMF for targeting; the HUBCF for sequencing; Tufts Medical Center and Boston Children’s Hospital for histology; Dr. Worthington (HU) for statistics guidance; Drs. Price (Harvard T.H. Chan School of Public Health), Tabin (Harvard Medical School), Doxey (University of Waterloo), Lowe (Duke University), Bowes (iMorphics), Yau (Marcus Institute for Aging Research), Felson (Boston University), Pregizer (Boston Children’s Hospital), and Pilbeam (HU) and the Capellini lab members for insight and manuscript review; and three reviewers for their insight into this work. This work was supported by NIH/NIAMS (1R01AR070139 to T.D.C. and R01AR065462 to A.M.K.); the Orthopaedic Foundation at Boston Children’s Hospital (to A.M.K.); NIH/NCATS BU-CTSI (1UL1TR001430 to V.B.K. and G.H.C.); an American Heart Association Scientist Development Grant (17SDG33670323 to V.B.K. and G.H.C.); the Hariri Institute for Computing and Computational Science and Engineering at Boston University (Hariri Research Award to V.B.K. and G.H.C.); NIH (5U01AG018820 to V.B.K. and G.H.C.); HU PRISE (to S.Y.); NSF (BCS1518596 to T.D.C. and N.R.) for mouse research; and HU Dean’s Competitive Fund and Milton Fund for Human Research (to T.D.C.).

Footnotes

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.cell.2020.02.057.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

  1. Andersen H (1961). Histochemical studies on the histogenesis of the knee joint and superior tibio-fibular joint in human foetuses. Acta Anat. (Basel) 46, 279–303.
  2. arcOGEN Consortium; arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, Day-Williams AG, Lopes MC, Boraska V, Esko T, et al. (2012). Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study. Lancet 380, 815–823.
  3. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, and Abecasis GR; 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature 526, 68–74.
  4. Barr AJ, Dube B, Hensor EMA, Kingsbury SR, Peat G, Bowes MA, Sharples LD, and Conaghan PG (2016). The relationship between three-dimensional knee MRI bone shape and total knee replacement-a case control study: data from the Osteoarthritis Initiative. Rheumatology (Oxford) 55, 1585–1593.
  5. Barrett JC, Fry B, Maller J, and Daly MJ (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265.
  6. Basit S, Naqvi SK-H, Wasif N, Ali G, Ansar M, and Ahmad W (2008). A novel insertion mutation in the cartilage-derived morphogenetic protein-1 (CDMP1) gene underlies Grebe-type chondrodysplasia in a consanguineous Pakistani family. BMC Med. Genet 9, 102.
  7. Baugé C, Girard N, Lhuissier E, Bazille C, and Boumediene K (2013). Regulation and Role of TGFβ Signaling Pathway in Aging and Osteoarthritis Joints. Aging Dis 5, 394–405.
  8. Benjamini Y, and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300.
  9. Berenbaum F, Wallace IJ, Lieberman DE, and Felson DT (2018). Modern-day environmental factors in the pathogenesis of osteoarthritis. Nat. Rev. Rheumatol 14, 674–681.
  10. Bird CP, Stranger BE, Liu M, Thomas DJ, Ingle CE, Beazley C, Miller W, Hurles ME, and Dermitzakis ET (2007). Fast-evolving noncoding sequences in the human genome. Genome Biol 8, R118.
  11. Blake JA, Eppig JT, Kadin JA, Richardson JE, Smith CL, and Bult CJ; the Mouse Genome Database Group (2017). Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res 45 (D1), D723–D729.
  12. Boyd JL, Skove SL, Rouanet JP, Pilaz L-J, Bepler T, Gordân R, Wray GA, and Silver DL (2015). Human-chimpanzee differences in a FZD8 enhancer alter cell-cycle dynamics in the developing neocortex. Curr. Biol 25, 772–779.
  13. Bramble DM, and Lieberman DE (2004). Endurance running and the evolution of Homo. Nature 432, 345–352.
  14. Browning BL, Zhou Y, and Browning SR (2018). A One-Penny Imputed Genome from Next-Generation Reference Panels. Am. J. Hum. Genet 103, 338–348.
  15. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, and Greenleaf WJ (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218.
  16. Buenrostro JD, Wu B, Chang HY, and Greenleaf WJ (2015). ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol 109, 1–9.
  17. Bush EC, and Lahn BT (2008). A genome-wide screen for noncoding elements important in primate evolution. BMC Evol. Biol 8, 17.
  18. Butterfield NC, Curry KF, Steinberg J, Dewhurst H, Komla-Ebri D, Mannan NS, Adoum A-T, Leitch VD, Logan JG, Waung JA, et al. (2019). Accelerating functional gene discovery in osteoarthritis. bioRxiv. 10.1101/836221.
  19. Capellini TD, Di Giacomo G, Salsi V, Brendolan A, Ferretti E, Srivastava D, Zappavigna V, and Selleri L (2006). Pbx1/Pbx2 requirement for distal limb patterning is mediated by the hierarchical control of Hox gene spatial distribution and Shh expression. Development 133, 2263–2273.
  20. Capellini TD, Chen H, Cao J, Doxey AC, Kiapour AM, Schoor M, and Kingsley DM (2017). Ancient selection for derived alleles at a GDF5 enhancer influencing human growth and osteoarthritis risk. Nat. Genet 49, 1202–1210.
  21. Capra JA, Erwin GD, McKinsey G, Rubenstein JLR, and Pollard KS (2013). Many human accelerated regions are developmental enhancers. Philos. Trans. R. Soc. Lond. B Biol. Sci 368, 20130025.
  22. Carnes BA, and Olshansky SJ (1993). Evolutionary Perspectives on Human Senescence. Popul. Dev. Rev 19, 793.
  23. Casalone E, Tachmazidou I, Zengini E, Hatzikotoulas K, Hackinger S, Suveges D, Steinberg J, Rayner NW, arcOGEN Consortium, Wilkinson JM, Panoutsopoulou K, and Zeggini E (2018). A novel variant in GLIS3 is associated with osteoarthritis. Ann. Rheum. Dis 77, 620–623.
  24. Castaño-Betancourt MC, Cailotto F, Kerkhof HJ, Cornelis FM, Doherty SA, Hart DJ, Hofman A, Luyten FP, Maciewicz RA, Mangino M, et al. (2012). Genome-wide association and functional studies identify the DOT1L gene to be involved in cartilage thickness and hip osteoarthritis. Proc. Natl. Acad. Sci. USA 109, 8218–8223.
  25. Castaño-Betancourt MC, Evans DS, Ramos YF, Boer CG, Metrustry S, Liu Y, den Hollander W, van Rooij J, Kraus VB, Yau MS, et al. (2016). Novel genetic variants for cartilage thickness and hip osteoarthritis. PLoS Genet 12, e1006260.
  26. Chan YF, Marks ME, Jones FC, Villarreal G Jr., Shapiro MD, Brady SD, Southwick AM, Absher DM, Grimwood J, Schmutz J, et al. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305.
  27. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, and Lee JJ (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7.
  28. Charles MD, Haloman S, Chen L, Ward SR, Fithian D, and Afra R (2013). Magnetic resonance imaging-based topographical differences between control and recurrent patellofemoral instability patients. Am. J. Sports Med 41, 374–384.
  29. Chen H, Capellini TD, Schoor M, Mortlock DP, Reddi AH, and Kingsley DM (2016). Heads, shoulders, elbows, knees, and toes: modular Gdf5 enhancers control different joints in the vertebrate skeleton. PLoS Genet 12, e1006454.
  30. Chokalingam K, Hunter S, Gooch C, Frede C, Florer J, Wenstrup R, and Butler D (2009). Three-dimensional in vitro effects of compression and time in culture on aggregate modulus and on gene expression and protein content of collagen type II in murine chondrocytes. Tissue Eng. Part A 15, 2807–2816.
  31. Coetzee SG, Coetzee GA, and Hazelett DJ (2015). motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics 31, 3847–3849.
  32. Cotney J, Leng J, Yin J, Reilly SK, DeMare LE, Emera D, Ayoub AE, Rakic P, and Noonan JP (2013). The evolution of lineage-specific regulatory activities in the human embryonic limb. Cell 154, 185–196.
  33. Cupit LD, Schmidt VA, Miller F, and Bahou WF (2004). Distinct PAR/IQGAP expression patterns during murine development: implications for thrombin-associated cytoskeletal reorganization. Mamm. Genome 15, 618–629.
  34. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al.; 1000 Genomes Project Analysis Group (2011). The variant call format and VCFtools. Bioinformatics 27, 2156–2158.
  35. Darwin C (1888). The Descent of Man and Selection in Relation to Sex (Murray).
  36. Das Neves Borges P, Forte AE, Vincent TL, Dini D, and Marenzana M (2014). Rapid, automated imaging of mouse articular cartilage by microCT for early detection of osteoarthritis and finite element modelling of joint mechanics. Osteoarthritis Cartilage 22, 1419–1428.
  37. Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA, Jain K, Baymuradov UK, Narayanan AK, et al. (2018). The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 46 (D1), D794–D801.
  38. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, and Batzoglou S (2010). Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol 6, e1001025.
  39. Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, et al.; arcOGEN Consortium (2011). A variant in MCF2L is associated with osteoarthritis. Am. J. Hum. Genet 89, 446–450.
  40. Decker RS, Koyama E, and Pacifici M (2014). Genesis and morphogenesis of limb synovial joints and articular cartilage. Matrix Biol 39, 5–10.
  41. Delignette-Muller ML, and Dutang C (2015). fitdistrplus: An R Package for Fitting Distributions. J. Stat. Softw 64, 1–34.
  42. DeMare LE, Leng J, Cotney J, Reilly SK, Yin J, Sarro R, and Noonan JP (2013). The genomic landscape of cohesin-associated chromatin interactions. Genome Res 23, 1224–1234.
  43. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H, et al.; International Mouse Phenotyping Consortium; Jackson Laboratory; Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS); Charles River Laboratories; MRC Harwell; Toronto Centre for Phenogenomics; Wellcome Trust Sanger Institute; RIKEN BioResource Center (2016). High-throughput discovery of novel developmental phenotypes. Nature 537, 508–514.
  44. Diez-Roux G, Banfi S, Sultan M, Geffers L, Anand S, Rozado D, Magen A, Canidio E, Pagani M, Peluso I, et al. (2011). A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol 9, e1000582.
  45. DiLeone RJ, Russell LB, and Kingsley DM (1998). An extensive 3′ regulatory region controls expression of Bmp5 in specific anatomical structures of the mouse embryo. Genetics 148, 401–408.
  46. Eckstein F, Ateshian G, Burgkart R, Burstein D, Cicuttini F, Dardzinski B, Gray M, Link TM, Majumdar S, Mosher T, et al. (2006). Proposal for a nomenclature for magnetic resonance imaging based measures of articular cartilage in osteoarthritis. Osteoarthritis Cartilage 14, 974–983.
  47. ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Evangelou E, Valdes AM, Kerkhof HJ, Styrkarsdottir U, Zhu Y, Meulen-belt I, Lories RJ, Karassa FB, Tylzanowski P, Bos SD, et al. ; arcOGEN Consortium; Translation Research in Europe Applied Technologies for Osteoarthritis (TreatOA) (2011). Meta-analysis of genome-wide association studies confirms a susceptibility locus for knee osteoarthritis on chromosome 7q22. Ann. Rheum. Dis 70, 349–355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Evangelou E, Kerkhof HJ, Styrkarsdottir U, Ntzani EE, Bos SD, Esko T, Evans DS, Metrustry S, Panoutsopoulou K, Ramos YF, et al. ; arcOGEN Consortium (2014). A meta-analysis of genome-wide association studies identifies novel variants associated with osteoarthritis of the hip. Ann. Rheum. Dis 73, 2130–2136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Felson DT, Anderson JJ, Naimark A, Walker AM, and Meenan RF (1988). Obesity and knee osteoarthritis. The Framingham Study. Ann. Intern. Med 109, 18–24. [DOI] [PubMed] [Google Scholar]
  51. Ferrer-Admetlla A, Liang M, Korneliussen T, and Nielsen R (2014). On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol 31, 1275–1291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Finger F, Schörle C, Zien A, Gebhard P, Goldring MB, and Aigner T (2003). Molecular phenotyping of human chondrocyte cell lines T/C-28a2, T/C-28a4, and C-28/I2. Arthritis Rheum 48, 3395–3403. [DOI] [PubMed] [Google Scholar]
  53. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, Anttila V, Xu H, Zang C, Farh K, et al. ; ReproGen Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; RACI Consortium (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet 47, 1228–1235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Folkesson J, Dam EB, Olsen OF, Pettersen PC, and Christiansen C (2007). Segmenting articular cartilage automatically using a voxel classification approach. IEEE Trans. Med. Imaging 26, 106–115. [DOI] [PubMed] [Google Scholar]
  55. Fox J, and Weisberg S (2019). An R Companion to Applied Regression, Third Edition (SAGE Publications; ). [Google Scholar]
  56. Frelat MA, Shaw CN, Sukhdeo S, Hublin J-J, Benazzi S, and Ryan TM (2017). Evolution of the hominin knee and ankle. J. Hum. Evol 108, 147–160. [DOI] [PubMed] [Google Scholar]
  57. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, et al. (2002). The structure of haplotype blocks in the human genome. Science 296, 2225–2229. [DOI] [PubMed] [Google Scholar]
  58. Gamer LW, Pregizer S, Gamer J, Feigenson M, Ionescu A, Li Q, Han L, and Rosen V (2018). The Role of Bmp2 in the Maturation and Maintenance of the Murine Knee Joint. J. Bone Miner. Res 33, 1708–1717. [DOI] [PubMed] [Google Scholar]
  59. Gardner E, and O’Rahilly R (1968). The early development of the knee joint in staged human embryos. J. Anat 102, 289–299. [PMC free article] [PubMed] [Google Scholar]
  60. Gazal S, Finucane HK, Furlotte NA, Loh P-R, Palamara PF, Liu X, Schoech A, Bulik-Sullivan B, Neale BM, Gusev A, and Price AL (2017). Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet 49, 1421–1427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Gehrke AR, Schneider I, de la Calle-Mustienes E, Tena JJ, Gomez-Marin C, Chandran M, Nakamura T, Braasch I, Postlethwait JH, Gómez-Skarmeta JL, and Shubin NH (2015). Deep conservation of wrist and digit enhancers in fish. Proc. Natl. Acad. Sci. USA 112, 803–808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Gel B, Díez-Villanueva A, Serra E, Buschbeck M, Peinado MA, and Malinverni R (2016). regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests. Bioinformatics 32, 289–291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Gillis J, and Pavlidis P (2013). Assessing identity, redundancy and confounds in Gene Ontology annotations over time. Bioinformatics 29, 476–482. 10.1093/bioinformatics/bts727. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Gittelman RM, Hun E, Ay F, Madeoy J, Pennacchio L, Noble WS, Hawkins RD, and Akey JM (2015). Comprehensive identification and analysis of human accelerated regulatory DNA. Genome Res 25, 1245–1255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Glasson SS, Chambers MG, Van Den Berg WB, and Little CB (2010). The OARSI histopathology initiative - recommendations for histological assessments of osteoarthritis in the mouse. Osteoarthritis Cartilage 18 (Suppl 3), S17–S23. [DOI] [PubMed] [Google Scholar]
  66. Gokhman D, Agranat-Tamir L, Housman G, Nissim-Rafinia M, Nieves-Colón M, Gu H, Ferrando-Bernal M, Gelabert P, Lipende I, Bontrop R, et al. (2017). Recent Regulatory Changes Shaped Human Facial and Vocal Anatomy. bioRxiv. 10.1101/106955. [DOI] [Google Scholar]
  67. Gray DJ, and Gardner E (1950). Prenatal development of the human knee and superior tibiofibular joints. Am. J. Anat 86, 235–287. [DOI] [PubMed] [Google Scholar]
  68. Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. ; 1000 Genomes Project (2013). Identifying recent adaptations in large-scale genomic data. Cell 152, 703–713. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Gu Z, Eils R, and Schlesner M (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849. [DOI] [PubMed] [Google Scholar]
  70. Guenther C, Pantalena-Filho L, and Kingsley DM (2008). Shaping skeletal growth by modular regulatory elements in the Bmp5 gene. PLoS Genet 4, e1000308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Guo M, Liu Z, Willen J, Shaw CP, Richard D, Jagoda E, Doxey AC, Hirschhorn J, and Capellini TD (2017). Epigenetic profiling of growth plate chondrocytes sheds insight into regulatory genetic variation influencing height. eLife 6, e29329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Hashemi J, Chandrashekar N, Gill B, Beynnon BD, Slauterbeck JR, Schutt RC Jr., Mansouri H, and Dabezies E (2008). The geometry of the tibial plateau and its influence on the biomechanics of the tibiofemoral joint. J. Bone Joint Surg. Am 90, 2724–2734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. He Y, Gong L, Fang Y, Zhan Q, Liu HX, Lu Y, Guo GL, Lehman-McKeeman L, Fang J, and Wan YJY (2013). The role of retinoic acid in hepatic lipid homeostasis defined by genomic binding and transcriptome profiling. BMC Genomics 14, 575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Hedman AC, Smith JM, and Sacks DB (2015). The biology of IQGAP proteins: beyond the cytoskeleton. EMBO Rep 16, 427–446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, and Glass CK (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Hijazi A (2018). Investigating the role of CCCTC-binding factor in osteoarthritis pathogenesis. Osteoarthritis Cartilage 26, S154. [Google Scholar]
  77. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and Manolio TA (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Hormozdiari F, Gazal S, van de Geijn B, Finucane HK, Ju CJ-T, Loh P-R, Schoech A, Reshef Y, Liu X, O’Connor L, et al. (2018). Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet 50, 1041–1047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Houard X, Goldring MB, and Berenbaum F (2013). Homeostatic mechanisms in articular cartilage and role of inflammation in osteoarthritis. Curr. Rheumatol. Rep 15, 375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Howell SM, Howell SJ, and Hull ML (2010). Assessment of the radii of the medial and lateral femoral condyles in varus and valgus knees with osteoarthritis. J. Bone Joint Surg. Am 92, 98–104. [DOI] [PubMed] [Google Scholar]
  81. Hubisz MJ, Pollard KS, and Siepel A (2011). PHAST and RPHAST: phylogenetic analysis with space/time models. Brief. Bioinform 12, 41–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Hume MA, Barrera LA, Gisselbrecht SS, and Bulyk ML (2015). UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 43, D117–D122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Hunter DJ, and Bierma-Zeinstra S (2019). Osteoarthritis. Lancet 393, 1745–1759. [DOI] [PubMed] [Google Scholar]
  84. Hunter D, Nevitt M, Lynch J, Kraus VB, Katz JN, Collins JE, Bowes M, Guermazi A, Roemer FW, and Losina E; FNIH OA Biomarkers Consortium (2016). Longitudinal validation of periarticular bone area and 3D shape as biomarkers for knee OA progression? Data from the FNIH OA Biomarkers Consortium. Ann. Rheum. Dis 75, 1607–1614. [DOI] [PubMed] [Google Scholar]
  85. Indjeian VB, Kingman GA, Jones FC, Guenther CA, Grimwood J, Schmutz J, Myers RM, and Kingsley DM (2016). Evolving New Skeletal Traits by cis-Regulatory Changes in Bone Morphogenetic Proteins. Cell 164, 45–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Infante CR, Park S, Mihala AG, Kingsley DM, and Menke DB (2013). Pitx1 broadly associates with limb enhancers and is enriched on hindlimb cis-regulatory elements. Dev. Biol 374, 234–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Jackman S (2017). pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory (R package version 1.5.2; United States Studies Centre, University of Sydney). https://github.com/atahk/pscl/. [Google Scholar]
  88. Jagoda E, Lawson DJ, Wall JD, Lambert D, Muller C, Westaway M, Leavesley M, Capellini TD, Mirazón Lahr M, Gerbault P, et al. (2018). Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans. Mol. Biol. Evol 35, 623–630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Jungers WL (1988). Relative joint size and hominoid locomotor adaptations with implications for the evolution of hominid bipedalism. J. Hum. Evol 17, 247–265. [Google Scholar]
  90. Kannu P, Bateman JF, Belluoccio D, Fosang AJ, and Savarirayan R (2009). Employing molecular genetics of chondrodysplasias to inform the study of osteoarthritis. Arthritis Rheum 60, 325–334. [DOI] [PubMed] [Google Scholar]
  91. Kanton S, Boyle MJ, He Z, Santel M, Weigert A, Sanchís-Calleja F, Guijarro P, Sidow L, Fleck JS, Han D, et al. (2019). Organoid single-cell genomic atlas uncovers human-specific features of brain development. Nature 574, 418–422. [DOI] [PubMed] [Google Scholar]
  92. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, and Kent WJ (2004). The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32, D493–D496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, et al. (2014). The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 42, D764–D770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Kassambara A, and Mundt F (2017). Factoextra: extract and visualize the results of multivariate data analyses (R package version 1.0.4).
  95. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, and Haussler D (2002). The human genome browser at UCSC. Genome Res 12, 996–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Kerkhof HJ, Lories RJ, Meulenbelt I, Jonsdottir I, Valdes AM, Arp P, Ingvarsson T, Jhamai M, Jonsson H, Stolk L, et al. (2010). A genome-wide association study identifies an osteoarthritis susceptibility locus on chromosome 7q22. Arthritis Rheum 62, 499–510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. King M-C, and Wilson AC (1975). Evolution at two levels in humans and chimpanzees. Science 188, 107–116. [DOI] [PubMed] [Google Scholar]
  98. Klein JC, Keith A, Rice SJ, Shepherd C, Agarwal V, Loughlin J, and Shendure J (2019). Functional testing of thousands of osteoarthritis-associated variants for regulatory activity. Nat. Commun 10, 2434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Kokenyesi R, Tan L, Robbins JR, and Goldring MB (2000). Proteoglycan production by immortalized human chondrocyte cell lines cultured under conditions that promote expression of the differentiated phenotype. Arch. Biochem. Biophys 383, 79–90. [DOI] [PubMed] [Google Scholar]
  100. Kosicki M, Tomberg K, and Bradley A (2018). Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol 36, 765–771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Li H (2011a). Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27, 718–719. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Li H (2011b). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Li Q, Brown JB, Huang H, and Bickel PJ (2011b). Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat 5, 1752–1779. [Google Scholar]
  105. Liu C-F, Samsa WE, Zhou G, and Lefebvre V (2017a). Transcriptional control of chondrocyte specification and differentiation. Semin. Cell Dev. Biol 62, 34–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Liu Y, Yau MS, Yerges-Armstrong LM, Duggan DJ, Renner JB, Hochberg MC, Mitchell BD, Jackson RD, and Jordan JM (2017b). Genetic Determinants of Radiographic Knee Osteoarthritis in African Americans. J. Rheumatol 44, 1652–1658. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Loughlin J (2015). Genetic contribution to osteoarthritis development: current state of evidence. Curr. Opin. Rheumatol 27, 284–288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Lovejoy CO (2007). The natural history of human gait and posture. Part 3. The knee. Gait Posture 25, 325–341. [DOI] [PubMed] [Google Scholar]
  109. Lowenstine LJ, McManamon R, and Terio KA (2016). Comparative Pathology of Aging Great Apes: Bonobos, Chimpanzees, Gorillas, and Orangutans. Vet. Pathol 53, 250–276. [DOI] [PubMed] [Google Scholar]
  110. Lyons KM, and Rosen V (2019). BMPs, TGFβ, and border security at the interzone. Curr. Top. Dev. Biol 133, 153–170. [DOI] [PubMed] [Google Scholar]
  111. Maas SA, and Fallon JF (2005). Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev. Dyn 232, 345–348. [DOI] [PubMed] [Google Scholar]
  112. Macrini TE, Coan HB, Levine SM, Lerma T, Saks CD, Araujo DJ, Bredbenner TL, Coutts RD, Nicolella DP, and Havill LM (2013). Reproductive status and sex show strong effects on knee OA in a baboon model. Osteoarthritis Cartilage 21, 839–848. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Maechler M, Rousseeuw P, Struyf A, Hubert M, and Hornik K (2018). cluster: Cluster Analysis Basics and Extensions. R Packages Version 1.
  114. Mathelier A, Fornes O, Arenillas DJ, Chen CY, Denay G, Lee J, Shi W, Shyr C, Tan G, Worsley-Hunt R, et al. (2016). JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 44 (D1), D110–D115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, and Bejerano G (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol 28, 495–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. McLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, et al. (2011). Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471, 216–219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Mérida-Velasco JA, Sánchez-Montesinos I, Espín-Ferra J, Rodríguez-Vázquez JF, Mérida-Velasco JR, and Jiménez-Collado J (1997). Development of the human knee joint. Anat. Rec 248, 269–278. [DOI] [PubMed] [Google Scholar]
  118. Meyer MB, Benkusky NA, and Pike JW (2014). The RUNX2 cistrome in osteoblasts: characterization, down-regulation following differentiation, and relationship to gene expression. J. Biol. Chem 289, 16016–16031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Miyamoto Y, Mabuchi A, Shi D, Kubo T, Takatori Y, Saito S, Fujioka M, Sudo A, Uchida A, Yamamoto S, et al. (2007). A functional polymorphism in the 5′ UTR of GDF5 is associated with susceptibility to osteoarthritis. Nat. Genet 39, 529–533. [DOI] [PubMed] [Google Scholar]
  120. Miyamoto Y, Shi D, Nakajima M, Ozaki K, Sudo A, Kotani A, Uchida A, Tanaka T, Fukui N, Tsunoda T, et al. (2008). Common variants in DVWA on chromosome 3p24.3 are associated with susceptibility to knee osteoarthritis. Nat. Genet 40, 994–998. [DOI] [PubMed] [Google Scholar]
  121. Morrison JB (1970). The mechanics of the knee joint in relation to normal walking. J. Biomech 3, 51–61. [DOI] [PubMed] [Google Scholar]
  122. Mun J (2008). Advanced Analytical Models: over 800 Models and 300 Applications from the Basel II Accord to Wall Street and Beyond (Wiley). [Google Scholar]
  123. Nakajima M, Takahashi A, Kou I, Rodriguez-Fontenla C, Gomez-Reino JJ, Furuichi T, Dai J, Sudo A, Uchida A, Fukui N, et al. (2010). New sequence variants in HLA class II/III region associated with susceptibility to knee osteoarthritis identified by genome-wide association study. PLoS ONE 5, e9723. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Nemec S, Luxey M, Jain D, Huang Sung A, Pastinen T, and Drouin J (2017). Pitx1 directly modulates the core limb development program to implement hindlimb identity. Development 144, 3325–3335. [DOI] [PubMed] [Google Scholar]
  125. Neogi T, and Felson DT (2016). Osteoarthritis: Bone as an imaging biomarker and treatment target in OA. Nat. Rev. Rheumatol 12, 503–504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Neogi T, Bowes MA, Niu J, De Souza KM, Vincent GR, Goggins J, Zhang Y, and Felson DT (2013). Magnetic resonance imaging-based three-dimensional bone shape of the knee predicts onset of knee osteoarthritis: data from the osteoarthritis initiative. Arthritis Rheum 65, 2048–2058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. O’Rahilly R (1951). The early prenatal development of the human knee joint. J. Anat 85, 166–170. [PMC free article] [PubMed] [Google Scholar]
  128. Ohba S, He X, Hojo H, and McMahon AP (2015). Distinct Transcriptional Programs Underlie Sox9 Regulation of the Mammalian Chondrocyte. Cell Rep 12, 229–243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, Clemente F, Hudjashov G, DeGiorgio M, Saag L, et al. (2016). Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Panoutsopoulou K, Southam L, Elliott KS, Wrayner N, Zhai G, Beazley C, Thorleifsson G, Arden NK, Carr A, Chapman K, et al. ; arcOGEN Consortium (2011). Insights into the genetic architecture of osteoarthritis from stage 1 of the arcOGEN study. Ann. Rheum. Dis 70, 864–867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Paradis E (2010). pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26, 419–420. [DOI] [PubMed] [Google Scholar]
  132. Picard C, Azeddine B, Moldovan F, Martel-Pelletier J, and Moreau A (2007). New emerging role of pitx1 transcription factor in osteoarthritis pathogenesis. Clin. Orthop. Relat. Res 462, 59–66. [DOI] [PubMed] [Google Scholar]
  133. Polk JD, Williams SA, and Peterson JV (2009). Body size and joint posture in primates. Am. J. Phys. Anthropol 140, 359–367. [DOI] [PubMed] [Google Scholar]
  134. Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, et al. (2006a). Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2, e168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Pollard KS, Salama SR, Lambert N, Lambot M-A, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, et al. (2006b). An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443, 167–172. [DOI] [PubMed] [Google Scholar]
  136. Pollard KS, Hubisz MJ, Rosenbloom KR, and Siepel A (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20, 110–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Prabhakar S, Noonan JP, Pääbo S, and Rubin EM (2006). Accelerated evolution of conserved noncoding sequences in humans. Science 314, 786. [DOI] [PubMed] [Google Scholar]
  138. Prabhakar S, Visel A, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Morrison H, Fitzpatrick DR, Afzal V, et al. (2008). Human-specific gain of function in a developmental enhancer. Science 321, 1346–1350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. (2013). Great ape genetic diversity and population history. Nature 499, 471–475. [DOI] [PMC free article] [PubMed] [Google Scholar]
  140. Pregizer SK, Kiapour AM, Young M, Chen H, Schoor M, Liu Z, Cao J, Rosen V, and Capellini TD (2018). Impact of broad regulatory regions on Gdf5 expression and function in knee development and susceptibility to osteoarthritis. Ann. Rheum. Dis 77, 450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  141. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  142. R Development Core Team (2008). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing). [Google Scholar]
  143. Racimo F (2016). Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202, 733–750. [DOI] [PMC free article] [PubMed] [Google Scholar]
  144. Ramos YF, Metrustry S, Arden N, Bay-Jensen AC, Beekman M, de Craen AJ, Cupples LA, Esko T, Evangelou E, Felson DT, et al. ; arcOGEN Consortium; TreatOA Collaborators (2014). Meta-analysis identifies loci affecting levels of the potential osteoarthritis biomarkers sCOMP and uCTX-II with genome wide significance. J. Med. Genet 51, 596–604. [DOI] [PubMed] [Google Scholar]
  145. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, and Zhang F (2013). Genome engineering using the CRISPR-Cas9 system. Nat. Protoc 8, 2281–2308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  146. Rasmussen MD, Hubisz MJ, Gronau I, and Siepel A (2014). Genome-wide inference of ancestral recombination graphs. PLoS Genet 10, e1004342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Reyes C, Leyland KM, Peat G, Cooper C, Arden NK, and Prieto-Alhambra D (2016). Association between overweight and obesity and risk of clinically diagnosed knee, hip, and hand osteoarthritis: a population-based cohort study. Arthritis Rheumatol 68, 1869–1875. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Richette P, Poitou C, Garnero P, Vicaut E, Bouillot J-L, Lacorte J-M, Basdevant A, Clément K, Bardin T, and Chevalier X (2011). Benefits of massive weight loss on symptoms, systemic inflammation and cartilage turnover in obese patients with knee osteoarthritis. Ann. Rheum. Dis 70, 139–144. [DOI] [PubMed] [Google Scholar]
  149. Rosen V (2009). BMP2 signaling in bone development and repair. Cytokine Growth Factor Rev 20, 475–480. [DOI] [PubMed] [Google Scholar]
  150. Rountree RB, Schoor M, Chen H, Marks ME, Harley V, Mishina Y, and Kingsley DM (2004). BMP receptor signaling is required for postnatal maintenance of articular cartilage. PLoS Biol 2, e355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  151. Salazar VS, Gamer LW, and Rosen V (2016). BMP signalling in skeletal development, disease and repair. Nat. Rev. Endocrinol 12, 203–221. [DOI] [PubMed] [Google Scholar]
  152. Schwartz HJ (1989). Effect of chronic chromolyn sodium therapy in a beautician with occupational asthma. J. Occup. Med 31, 112–114. [PubMed] [Google Scholar]
  153. Selleri L, Depew MJ, Jacobs Y, Chanda SK, Tsang KY, Cheah KSE, Rubenstein JLR, O’Gorman S, and Cleary ML (2001). Requirement for Pbx1 in skeletal patterning and programming chondrocyte proliferation and differentiation. Development 128, 3543–3557. [DOI] [PubMed] [Google Scholar]
  154. Sen M, Cheng Y-H, Goldring MB, Lotz MK, and Carson DA (2004). WISP3-dependent regulation of type II collagen and aggrecan production in chondrocytes. Arthritis Rheum 50, 488–497. [DOI] [PubMed] [Google Scholar]
  155. Settle SH Jr., Rountree RB, Sinha A, Thacker A, Higgins K, and Kingsley DM (2003). Multiple joint and skeletal patterning defects caused by single and double mutations in the mouse Gdf6 and Gdf5 genes. Dev. Biol 254, 116–130. [DOI] [PubMed] [Google Scholar]
  156. Shannon P, and Richards M (2014). MotifDb: An annotated collection of protein-DNA binding sequence motifs. R Package. Version 1.
  157. Shinoda Y, Ogata N, Higashikawa A, Manabe I, Shindo T, Yamada T, Kugimiya F, Ikeda T, Kawamura N, Kawasaki Y, et al. (2008). Kruppel-like factor 5 causes cartilage degradation through transactivation of matrix metalloproteinase 9. J. Biol. Chem 283, 24682–24689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  158. Shlyueva D, Stampfel G, and Stark A (2014). Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet 15, 272–286. [DOI] [PubMed] [Google Scholar]
  159. Sonnery-Cottet B, Archbold P, Cucurulo T, Fayard J-M, Bortolletto J, Thaunat M, Prost T, and Chambat P (2011). The influence of the tibial slope and the size of the intercondylar notch on rupture of the anterior cruciate ligament. J. Bone Joint Surg. Br 93, 1475–1478. [DOI] [PubMed] [Google Scholar]
  160. Stamatakis A (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. [DOI] [PubMed] [Google Scholar]
  161. Styrkarsdottir U, Thorleifsson G, Helgadottir HT, Bomer N, Metrustry S, Bierma-Zeinstra S, Strijbosch AM, Evangelou E, Hart D, Beekman M, et al. ; TREAT-OA Consortium; arcOGEN Consortium (2014). Severe osteoarthritis of the hand associates with common variants within the ALDH1A2 gene and with rare variants at 1p31. Nat. Genet 46, 498–502. [DOI] [PubMed] [Google Scholar]
  162. Styrkarsdottir U, Helgason H, Sigurdsson A, Norddahl GL, Agustsdottir AB, Reynard LN, Villalvilla A, Halldorsson GH, Jonasdottir A, Magnusdottir A, et al. ; arcOGEN consortium (2017). Whole-genome sequencing identifies rare genotypes in COMP and CHADL associated with high risk of hip osteoarthritis. Nat. Genet 49, 801–805. [DOI] [PubMed] [Google Scholar]
  163. Styrkarsdottir U, Lund SH, Thorleifsson G, Zink F, Stefansson OA, Sigurdsson JK, Juliusson K, Bjarnadottir K, Sigurbjornsdottir S, Jonsson S, et al. (2018). Meta-analysis of Icelandic and UK data sets identifies missense variants in SMO, IL11, COL11A1 and 13 more new loci associated with osteoarthritis. Nat. Genet 50, 1681–1687. [DOI] [PubMed] [Google Scholar]
  164. Tachmazidou I, Hatzikotoulas K, Southam L, Esparza-Gordillo J, Haberland V, Zheng J, Johnson T, Koprulu M, Zengini E, Steinberg J, et al. ; arcOGEN Consortium (2019). Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data. Nat. Genet 51, 230–236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  165. Tajima F (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  166. Tardieu C (1999). Ontogeny and phylogeny of femoro-tibial characters in humans and hominid fossils: functional influence and genetic determinism. Am. J. Phys. Anthropol 110, 365–377. [DOI] [PubMed] [Google Scholar]
  167. Tardieu C (2010). Development of the human hind limb and its importance for the evolution of bipedalism. Evol. Anthropol 19, 174–186. [Google Scholar]
  168. Valdes AM, Van Oene M, Hart DJ, Surdulescu GL, Loughlin J, Doherty M, and Spector TD (2006). Reproducible genetic associations between candidate genes and clinical knee osteoarthritis in men and women. Arthritis Rheum 54, 533–539. [DOI] [PubMed] [Google Scholar]
  169. Valdes AM, Evangelou E, Kerkhof HJM, Tamm A, Doherty SA, Kisand K, Tamm A, Kerna I, Uitterlinden A, Hofman A, et al. (2011). The GDF5 rs143383 polymorphism is associated with osteoarthritis of the knee with genome-wide statistical significance. Ann. Rheum. Dis 70, 873–875. [DOI] [PMC free article] [PubMed] [Google Scholar]
  170. Varki A, and Altheide TK (2005). Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome Res 15, 1746–1758. [DOI] [PubMed] [Google Scholar]
  171. Visel A, Thaller C, and Eichele G (2004). GenePaint.org: an atlas of gene expression patterns in the mouse embryo. Nucleic Acids Res 32, D552–D556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  172. Vitti JJ, Grossman SR, and Sabeti PC (2013). Detecting natural selection in genomic data. Annu. Rev. Genet 47, 97–120. [DOI] [PubMed] [Google Scholar]
  173. Voight BF, Kudaravalli S, Wen X, and Pritchard JK (2006). A map of recent positive selection in the human genome. PLoS Biol 4, e72. [DOI] [PMC free article] [PubMed] [Google Scholar]
  174. Wang RN, Green J, Wang Z, Deng Y, Qiao M, Peabody M, Zhang Q, Ye J, Yan Z, Denduluri S, et al. (2014). Bone Morphogenetic Protein (BMP) signaling in development and human diseases. Genes Dis 1, 87–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  175. Wang JS, Infante CR, Park S, and Menke DB (2018). PITX1 promotes chondrogenesis and myogenesis in mouse hindlimbs through conserved regulatory targets. Dev. Biol 434, 186–195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  176. Ward LD, and Kellis M (2012). HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 40, D930–D934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  177. Weir BS, and Cockerham CC (1984). Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370. [DOI] [PubMed] [Google Scholar]
  178. Wittkopp PJ, and Kalay G (2011). Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet 13, 59–69. [DOI] [PubMed] [Google Scholar]
  179. Wu M, Chen G, and Li Y-P (2016). TGF-β and BMP signaling in osteoblast, skeletal development, and bone formation, homeostasis and disease. Bone Res 4, 16009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  180. Xu S, Liu P, Chen Y, Chen Y, Zhang W, Zhao H, Cao Y, Wang F, Jiang N, Lin S, et al. (2018). Foxp2 regulates anatomical features that may be relevant for vocal behaviors and bipedal locomotion. Proc. Natl. Acad. Sci. USA 115, 8799–8804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  181. Ye W, Song Y, Huang Z, Osterwalder M, Ljubojevic A, Xu J, Bobick B, Abassah-Oppong S, Ruan N, Shamby R, et al. (2016). A unique stylopod patterning mechanism by Shox2-controlled osteogenesis. Development 143, 2548–2560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  182. Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, Davis C, Pope BD, et al. ; Mouse ENCODE Consortium (2014). A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  183. Zehra R, and Abbasi AA (2018). Homo sapiens-Specific Binding Site Variants within Brain Exclusive Enhancers Are Subject to Accelerated Divergence across Human Population. Genome Biol. Evol 10, 956–966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  184. Zeileis A, Kleiber C, and Jackman S (2008). Regression Models for Count Data in R. J. Stat. Softw 27, 1–25. [Google Scholar]
  185. Zengini E, Hatzikotoulas K, Tachmazidou I, Steinberg J, Hartwig FP, Southam L, Hackinger S, Boer CG, Styrkarsdottir U, Gilly A, et al. (2018). Genome-wide analyses using UK Biobank data provide insights into the genetic architecture of osteoarthritis. Nat. Genet 50, 549–558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  186. Zhai G, van Meurs JB, Livshits G, Meulenbelt I, Valdes AM, Soranzo N, Hart D, Zhang F, Kato BS, Richards JB, et al. (2009). A genome-wide association study suggests that a locus within the ataxin 2 binding protein 1 gene is associated with hand osteoarthritis: the Treat-OA consortium. J. Med. Genet 46, 614–616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  187. Zhai G, Doré J, and Rahman P (2015). TGF-β signal transduction pathways and osteoarthritis. Rheumatol. Int 35, 1283–1292. [DOI] [PubMed] [Google Scholar]
  188. Zhang F, and Lupski JR (2015). Non-coding genetic variants in human disease. Hum. Mol. Genet 24 (R1), R102–R110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  189. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, and Liu XS (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  190. Zhao H, Sun Z, Wang J, Huang H, Kocher J-P, and Wang L (2014). CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  191. Zhao H, Zhou W, Yao Z, Wan Y, Cao J, Zhang L, Zhao J, Li H, Zhou R, Li B, et al. (2015). Foxp1/2/4 regulate endochondral ossification as a suppresser complex. Dev. Biol 398, 242–254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  192. Zheng J, Erzurumluoglu AM, Elsworth BL, Kemp JP, Howe L, Haycock PC, Hemani G, Tansey K, Laurin C, Pourcain BS, et al. ; Early Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium (2017). LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

1. Figure S1. Related to Figure 1.

(A-C): phyloP scores (phyloP20ways) were averaged per-bp across all knee-specific ATAC-seq regions in a peak set and plotted in respective colors as a function of distance along a 1kb window centered on the peak middle; a random background distribution of conservation scores is indicated in gray (see STAR Methods). (A) Specific Distal Femur (B) Specific Proximal Tibia (C) Specific Knee-common. (D-E): HAR enrichment gene loci screenshots. (D) A general distal-femur element overlaps a human-accelerated region (HAR) (Prabhakar et al., 2006) upstream of BMP2, a TGF-β family member with a well-established role in chondrogenic differentiation that is expressed in the knee (Rosen, 2009) and has been linked to osteoarthritis (Gamer et al., 2018; Valdes et al., 2006). ATAC-seq regions (“General DF”), ENCODE DNase hypersensitivity sites and phyloP100ways conservation tracks are shown on bottom. (E) A proximal-tibia-specific element overlaps an accelerated region (Bird et al., 2007) intronic to IQGAP1, a protein complex scaffold (Hedman et al., 2015) expressed in embryonic mouse cartilage (Cupit et al., 2004) which impacts skeletal development, particularly in the tibia (Dickinson et al., 2016). Format as in (D), with ATAC-seq track “PT” shown.
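
The per-bp averaging described for panels (A-C) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `mean_profile`, the in-memory dictionary standing in for a phyloP track, and the toy coordinates are all assumptions made for illustration, and the random background set is not modeled.

```python
from statistics import mean

def mean_profile(scores_by_pos, peaks, half_window=500):
    """Average a per-base score across peaks, offset by offset.

    scores_by_pos: dict mapping (chrom, pos) -> score; a hypothetical
                   in-memory stand-in for a conservation track.
    peaks:         iterable of (chrom, start, end) intervals.
    Returns a list of length 2 * half_window holding the mean score at
    each offset from the peak midpoint (None where no peak has data).
    """
    profile = []
    for offset in range(-half_window, half_window):
        values = []
        for chrom, start, end in peaks:
            midpoint = (start + end) // 2
            score = scores_by_pos.get((chrom, midpoint + offset))
            if score is not None:
                values.append(score)
        profile.append(mean(values) if values else None)
    return profile

# Toy data (invented): two peaks with constant scores of 1.0 and 3.0.
toy_scores = {("chr1", p): 1.0 for p in range(98, 103)}
toy_scores.update({("chr2", p): 3.0 for p in range(198, 203)})
toy_peaks = [("chr1", 90, 110), ("chr2", 190, 210)]

# A 4-bp window keeps the example readable; the legend uses 1 kb.
toy_profile = mean_profile(toy_scores, toy_peaks, half_window=2)
```

With `half_window=500` the same function would return the 1 kb profile described in the legend, given a score lookup covering those positions.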

2. Figure S2. Related to Figure 3.

Intra-species sequence diversity in humans and apes. The number of intra-species variants intersecting a given ATAC-seq peak was counted for all variable sequences in a set within a given species and expressed as ‘SNPs per sequence’. Mean and median values of distributions are indicated by dashed and bold lines, respectively. Table S5 lists numerical values for each comparison. (A) General Knee Distal Femur, (B) General Knee Proximal Tibia, (C) General Knee Common, (D) General Elbow Distal Humerus, (E) General Elbow Proximal Radius, (F) General Elbow Common, (G) Elbow Distal Humerus-Specific, (H) Elbow Proximal Radius-Specific, (I) Elbow-Common-Specific. (J) Knee Distal Femur-Specific, overlapping human E59 Distal Femur data. (K) Knee Proximal Tibia-Specific, overlapping human E59 Proximal Tibia data. (L) Knee-Common-Specific, overlapping pooled human E59 Distal Femur and Proximal Tibia data. (M) Elbow Distal Humerus-Specific, overlapping human E59 Distal Humerus data. (N) Elbow Proximal Radius-Specific, overlapping human E59 Proximal Radius data. (O) Elbow-Common-Specific, overlapping pooled human E59 Distal Humerus and Proximal Radius data. (P) Counts of common chimpanzee variants (via the GADP dataset) per bp of sequence for element sets were compared to random region sets along with other genomic features for enrichment/depletion analysis; labels correspond to results in Table S5. Significance codes: not significant (ns), < 0.05 (*), < 0.01 (**), < 1e-5 (***).
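The ‘SNPs per sequence’ measure can be illustrated with a short Python sketch. It is not the authors' code: the function name `snps_per_sequence`, the tuple layouts, and the coordinate convention (0-based, half-open intervals with 0-based variant positions) are assumptions for the example. Following the caption, only variable sequences (peaks with at least one variant) contribute a count.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Tuple


def snps_per_sequence(peaks: List[Tuple[str, int, int]],
                      variants: List[Tuple[str, int]]) -> List[int]:
    """Count variants inside each peak, keeping only peaks with >= 1 variant."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = []
    for chrom, start, end in peaks:
        positions = by_chrom.get(chrom, [])
        # Number of sorted positions p with start <= p < end.
        n = bisect_left(positions, end) - bisect_left(positions, start)
        if n > 0:  # assumption: invariant peaks are excluded, per the caption
            counts.append(n)
    return counts


# Toy input with made-up coordinates.
toy_counts = snps_per_sequence(
    [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 0, 5)],
    [("chr1", 1), ("chr1", 9), ("chr1", 10), ("chr1", 25)],
)
summary = (mean(toy_counts), median(toy_counts))
```

The mean and median of the returned counts correspond to the summary values marked on each distribution in the figure.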

3. Figure S3. Related to Figure 4.

Examining modern human sequence variation in knee sets. (A-B): Motif alteration in putative regulatory elements near osteoarthritis-related genes. Red shading indicates altered base relative to motif logo, blue shading below indicates genomic position of sequence. ATAC-seq regions and phyloP100ways conservation tracks are shown on bottom. (A) Common human variant (rs265053: C/T) within an intron of UNC5C improves a predicted CEBPB sequence motif downstream of BMPR1B (Baugé et al., 2013; Zhai et al., 2015). (B) Common human variant (rs2280153: G/A) improves a predicted SOX9 sequence motif within the WISP3 promoter (Kannu et al., 2009; Sen et al., 2004). (C) Common human variants within the indicated peak sets distributed across the first two PCs of an Fst-based PCA. Two sub-groups based on k-means clustering are indicated in red (Cluster 1) and blue (Cluster 2). Variance explained by each PC is indicated on axes. (D) Representative correlation heatmap between PCs and population comparisons across all variants in the knee-specific pooled set. Pearson’s correlations between PCs and population pairs are clustered, with populations colored by continent: red - Asia, green - Europe, blue - Africa. (E, F) Calculated Fst for common variants across populations. Fst values for a single variant across multiple pairwise population comparisons are shown, with each row representing a single variant; the particular pairwise comparison is indicated below, color scheme as for (D). Red/blue scale for Fst values is indicated on far right. Both variants and pairwise comparisons are grouped by hierarchical clustering. (E) Cluster 1 variants within the knee-specific pooled set. (F) Cluster 2 variants within the knee-specific pooled set. (G-J) Visual genotype plots for selected populations in a given locus. SNP sequences for all individuals are arranged according to a maximum-likelihood tree. Red and blue indicate ancestral and derived assignments, respectively. Solid orange lines indicate SNPs that are gapped in chimpanzee; solid green lines indicate individuals carrying alleles besides the ancestral/derived annotations. Vertical yellow dashed lines indicate proxy variants, vertical green dashed lines indicate lead variants, when present in indicated region. For plots with a distinguishable haplotype (G, H) the alternate haplotype is indicated with a thick green line. (G) Visual genotypes for selected population (Volga-Ural region of Russia) (Capellini et al., 2017) within a ~150kb haploblock in the GDF5-UQCC1 locus. Distribution of genotypes indicates the presence of a high-frequency haploblock carrying the lead GWAS variant “A” at rs143383 (Miyamoto et al., 2007; Valdes et al., 2011) and linked variants (rs4911178, rs6060369). (H) Visual genotypes for selected population (South-East Asia) within a ~64kb haploblock in the UNC5C-BMPR1B locus; variant rs2626053 (arcOGEN Consortium et al., 2012) appears on a low-frequency haploblock. (I) Visual genotypes for selected population (South-Western Europe) in the ENPP1/3 locus; given the lack of clearly-defined haploblock structure in the locus, a 200kb region centered on rs3850251 (arcOGEN Consortium et al., 2012; Klein et al., 2019) was used to generate this plot, with linked rs7744039, rs7773292, rs9493095 indicated. (J) Visual genotypes for selected population (Central Siberia) in the LSM5 locus; given the lack of clearly-defined haploblock structure in the locus, a 200kb region centered on rs4141788, linked to rs7785659 (arcOGEN Consortium et al., 2012), was used to generate this plot.
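For readers unfamiliar with pairwise Fst, the Python sketch below shows one textbook way to compute it for a single biallelic variant from two population allele frequencies. The estimator used for panels (C-F) is not stated in this caption, so the heterozygosity-based formula here (Fst = (HT - HS) / HT with equally weighted populations) is an assumption, and the function name `pairwise_fst` is hypothetical.

```python
def pairwise_fst(p1: float, p2: float) -> float:
    """Heterozygosity-based Fst for one biallelic variant in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumption: both populations are weighted equally.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                           # pooled expected heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population value
    if h_total == 0.0:
        return 0.0  # variant is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


# One row of a heatmap like panels (E, F) would hold this value for every
# population pair; the frequencies below are made up.
toy_row = [pairwise_fst(0.2, 0.8), pairwise_fst(0.3, 0.3), pairwise_fst(0.0, 1.0)]
```

Under this formula, identical frequencies give 0 and alternate fixation gives 1, which matches the range of the color scale described for the heatmaps.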

4. Figure S4. Related to Figures 5 and 7.

Additional morphometric analyses on R4 enhancer null mice. (A) Quantified anatomical indices of the femur and tibia used in this study. Femoral length (FL), bicondylar width of the femur (BCW), trochlear groove depth (TD), width of the lateral femoral condyle (LCW), width of the medial femoral condyle (MCW), intercondylar notch width (NW), trochlear groove sulcus angle (SA), width of the tibial plateau (TPW), height of the medial tibial spine (MTSH), height of the lateral tibial spine (LTSH), posterior slope of the tibial plateau (PSTP), and tibial length (TL). Curvature radius of the femoral condyle and the posterior slope of the tibial plateau were measured across both medial and lateral compartments. (B,C) Morphological defects in 1-year-old R4 enhancer null mice (HOM), compared to heterozygous mice (HET) and wild-type mice (WT). A number of measurements were carried out at 1 year of age (see (A)). This figure displays only those which revealed significant differences between control and R4 enhancer null mice in distal femur (B) and proximal tibia (C) structures. See Table S9 for complete results of statistical comparisons for all measurements as well as significance values for each comparison. (D) Correlation between OARSI scores and morphometric measures of the distal femur and proximal tibia in R4 enhancer mice. Correlation (R²) and p value results for Pearson’s correlation tests are indicated on each graph.
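The R² values reported in panel (D) are squared Pearson correlation coefficients. A minimal Python sketch of that calculation is given below; the function name `pearson_r` and the toy numbers are hypothetical and are not data from this study. The p value is omitted because it requires a t-distribution, which a statistics package would normally supply.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Made-up scores and measurements, only to show the call shape.
toy_scores = [1.0, 2.0, 3.0]
toy_measure = [1.0, 3.0, 2.0]
r = pearson_r(toy_scores, toy_measure)
r_squared = r ** 2
```

Each graph in panel (D) would correspond to one such call, pairing OARSI scores with a single morphometric measure.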

5. Figure S5. Related to Figure 5.

Additional histological sections on R4 enhancer null mice at P30 and 1 year. (Left) Comparison of WT and HOM knee joints at P30, showing microCT renditions (top row), high-magnification images of sections of the unaffected knee joint in both genotypes as reported in Figure 5 (second row), and low-magnification images of the same medial sections but at different planes (third and fourth rows). (Right) Comparison of WT and two HOM knee joints of differing phenotypic severity (HOM-1 mild osteoarthritis (green); HOM-2 severe osteoarthritis (red)) at 1 year, showing microCT renditions (top row), including an X-ray image of HOM-2 specimen with heterotopic ossification (top row, far right), high-magnification images showing affected medial and lateral sections (when affected) as reported in Figure 5 (second row), low-magnification images of the same medial and lateral sections of the joint but in two different planes (third and fourth rows), and total joint images for all three specimens (fifth row). Note the loss of articular cartilage matrix in the lateral compartment and clustering and loss of cells in the cartilage of the medial compartment in HOM knees. Scale bars, 50 μm (second row); 500 μm (third/fourth rows); 250 μm (fifth row).


Data Availability Statement

The data supporting the findings of this study, as presented in Figures 1, 2, 3, 4, 5, 6, 7, S1, S2, S3, S4, and S5, are available within Tables S1, S2, S3, S4, S5, S6, S7, S8, and S9 included in this publication as well as from the Lead Contact upon reasonable request. The accession number for the raw ATAC-seq data and processed peak bed files reported in this paper is GEO: GSE122877.

Relevant URLs

Jackson Laboratory, https://www.jax.org/; MIT CRISPR Tools, http://zlab.bio/guide-design-resources; HaploRegV.4.1, https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php/; UniPROBE Database, http://the_brain.bwh.harvard.edu/uniprobe/about.php/; Visigene, http://genome.ucsc.edu/cgi-bin/hgVisigene; Eurexpress, http://www.eurexpress.org/ee/; GenePaint, http://www.genepaint.org/Frameset.html; Mouse Genome Informatics, http://www.informatics.jax.org/; 1000 Genomes Project raw data, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr20.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz; GADP data, https://eichlerlab.gs.washington.edu/greatape/data/; EGDP data, https://evolbio.ut.ee/CGgenomes_VCF/; SAMtools, http://samtools.sourceforge.net/; ProxyFinder, https://github.com/immunogenomics/harmjan/tree/master/ProxyFinder; OAI, http://oai.epi-ucsf.org./datarelease/; FNIH, https://fnih.org/what-we-do/biomarkers-consortium/programs/osteoarthritis-project; LDSC, https://github.com/bulik/ldsc; LDHub Files, http://ldsc.broadinstitute.org/gwashare; LDHub results, http://ldsc.broadinstitute.org/lookup; Alkes Group LDSCORE files, https://data.broadinstitute.org/alkesgroup/LDSCORE

RESOURCES