Abstract
Many human proteins contain domains that vary in size or copy number due to variable numbers of tandem repeats (VNTRs). However, the relationships of VNTRs to most phenotypes are unknown due to difficulties measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into SNP-haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.
The human genome contains thousands of variable-number-of-tandem-repeat (VNTR) polymorphisms (1, 2), but the effects of these polymorphisms on human phenotypes are largely unknown. VNTRs are multi-allelic variants at which a nucleotide sequence – from seven to thousands of bp long – is repeated several to hundreds of times, with the number of repeats varying among individuals (Fig. S1). Extreme alleles of VNTRs have been implicated in diseases including progressive myoclonus epilepsy (3) and facioscapulohumeral muscular dystrophy (4). However, since most VNTRs are invisible to SNP arrays and difficult to measure with short-read sequencing, VNTRs have not been considered in genotype-phenotype association studies that have been central to recent work in human genetics.
We hypothesized that exome-sequence data might contain unappreciated information about VNTR lengths, and that VNTR alleles might segregate on specific SNP haplotypes, enabling statistical imputation (5) in SNP-phenotype data sets from hundreds of thousands of people, such as UK Biobank (UKB) (6).
Exploring the phenotypic effects of coding VNTRs
We identified candidate VNTRs by scanning the human reference genome for tandemly repeated sequences (7). For each repeat, we estimated “diploid VNTR content”—the sum of maternally- and paternally-derived allele lengths—in N=49,959 exome-sequenced UKB participants (8) by measuring numbers of reads that aligned to the repeated sequence (7). We then used surrounding SNPs to identify haplotypes likely to have been co-inherited from a recent common ancestor, enabling resolution of diploid measurements into allele-specific contributions and imputation of VNTR lengths into SNP-haplotypes of N=437,612 additional UKB participants. We developed statistical algorithms to perform such analysis on extended SNP haplotypes for hundreds of thousands of individuals, using sibling IBD information to benchmark accuracy and optimize analysis parameters (7). We focused subsequent analysis on autosomal exon-overlapping repeats in 118 genes for which these measurements exhibited cis-heritability in sibling pairs (Table S1).
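The read-depth idea behind "diploid VNTR content" can be illustrated with a minimal sketch. It assumes that read counts over the repeat and over flanking single-copy sequence have already been tallied and that depth scales linearly with copy number. The function and parameter names are hypothetical, and the sketch omits any capture-bias or batch corrections that the actual pipeline (7) may apply.

```python
def diploid_repeat_units(repeat_reads, repeat_unit_bp, flank_reads, flank_bp):
    """Estimate diploid repeat count (maternal + paternal) from read depth.

    Assumption (not from the paper): flanking single-copy sequence is present
    in exactly 2 copies, so its read density calibrates depth per haploid copy.
    """
    if flank_reads <= 0 or flank_bp <= 0 or repeat_unit_bp <= 0:
        raise ValueError("counts and lengths must be positive")
    # Reads per bp contributed by one haploid copy of single-copy sequence.
    density_per_copy = flank_reads / flank_bp / 2.0
    # Total repeat sequence (bp) summed over both alleles.
    total_repeat_bp = repeat_reads / density_per_copy
    return total_repeat_bp / repeat_unit_bp


# Hypothetical counts for a 57bp repeat unit: about 60 units in total,
# for example alleles of 27 and 33 repeats.
estimate = diploid_repeat_units(repeat_reads=342, repeat_unit_bp=57,
                                flank_reads=600, flank_bp=3000)
```

The estimate is a sum over both alleles; splitting it into allele-specific lengths requires the haplotype-sharing step described above.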
We applied this approach to identify relationships between coding VNTR alleles and 786 phenotypes (Table S2) in up to 415,280 unrelated UKB participants (depending on phenotype) of European ancestry. This analysis found 185 statistically significant associations (Table S3). To determine whether such associations were driven by VNTR length variation, rather than by other variants with which the VNTRs were in linkage disequilibrium (LD), we performed fine-mapping analyses (9) considering nearby genotyped and imputed variants (6, 10). Because variation at most VNTRs arises from three or more alleles, VNTR variation was only partially correlated with individual SNPs, enabling this analysis to distinguish VNTR from SNP effects.
Nineteen phenotype associations involving five distinct VNTRs (Tables 1 and S3; Fig. S1) exhibited evidence (FINEMAP (9) posterior probability >0.95) that VNTR length variation, rather than nearby SNPs, drove genotype-phenotype associations. For these five VNTRs, we improved genotyping accuracy by incorporating additional information from within-repeat variation or spanning reads to confirm the associations (Figs. S2-S3; (7)).
Table 1. VNTRs within protein-coding sequences affect diverse human phenotypes.
For each of five protein-altering VNTRs involved in phenotype associations that passed stringent fine-mapping criteria, P-values (in linear mixed model analyses of N=415,280 unrelated UKB participants of European ancestry) and estimated effect-size ranges (across the longest and shortest alleles sufficiently common to be amenable to our computational analysis) are listed for the most-strongly associated phenotype. aa, amino acids.
Gene | Cytoband | Repeat unit size | Repeat count (EUR) | Protein domain (effect) | Phenotype | Effect range (± s.e.) | P-value |
---|---|---|---|---|---|---|---|
LPA | 6q25.3-q26 | ~5.6kb (114aa, 2 exons) | 2-40 | Kringle-IV (number) | Lipoprotein(a) concentration | 5.1 (± 0.5) s.d. (= 233 ± 23 nmol/L) | 4.4 x 10−(25,121) |
ACAN | 15q26.1 | 57bp (19aa) | 13-44 | Chondroitin sulfate (size) | Height | 0.49 (± 0.04) s.d. (= 3.2 ± 0.3 cm) | 1.7 x 10−234 |
TENT5A | 6q14.1 | 15bp (5aa) | 2-7 | Unknown (size) | Height | 0.09 (± 0.01) s.d. (= 0.6 ± 0.1 cm) | 2.5 x 10−53 |
MUC1 | 1q22 | 60bp (20aa) | 20-125 | Extracellular (size) | Serum urea | 0.16 (± 0.01) s.d. (= 0.22 ± 0.01 mmol/L) | 2.7 x 10−163 |
TCHH | 1q21.3 | 18bp (6aa) | 5-15 | α-helix rod (size) | Male pattern baldness score | −0.063 (± 0.006) s.d. | 1.6 x 10−55 |
These associations appeared to explain some of the largest known GWAS signals for human phenotypes, including height, serum urea, and hair phenotypes, with some associations exhibiting strength comparable to or exceeding that of any single SNP in the genome.
Three VNTRs—within exons of TENT5A, MUC1, and TCHH—had not previously been implicated at these loci; a fourth (in ACAN) was recently reported in parallel work (11). Analysis also replicated an association between the length of the KIV-2 repeat in LPA and lipoprotein(a) concentration (12) (P = 4.4 x 10−(25,121), BOLT-LMM (13)). All five VNTRs were genotyped and imputed accurately (RMSE ~ 1 repeat unit and/or R2 ≥ 0.7) according to benchmarks using cross-validation (Fig. S4 and Table S1) and the HGSVC2 long-read sequencing data set (Figs. S5-9) (7, 14).
Fine-mapping of LPA variants influencing lipoprotein(a) concentration
Complex genetics involving VNTRs and SNPs at the same locus was revealed in analysis of lipoprotein(a) concentration (Lp(a)), for which elevated levels are a major risk factor for coronary artery disease (15). Lp(a) is almost completely heritable, with roughly half of its population variance explained by a VNTR-generated size polymorphism in the kringle-IV (KIV) domain of apo(a) (12). Each KIV-2 repeat unit (~5.6 kb) spans two exons of LPA, which together encode a 114-amino-acid copy of this domain. Longer alleles—with more copies of the encoded kringle repeat—are known to associate with lower Lp(a) levels (12, 16), reflecting retention of longer apo(a) isoforms in the endoplasmic reticulum (17). In UKB, inheritance at the LPA locus explained most of the variance in Lp(a) measurements (R=0.93 in sib-pairs sharing both LPA alleles, consistent with previous work (18)), with KIV-2 length explaining ~61% of this variance in a nonparametric model.
To identify additional LPA variants that might more completely explain Lp(a) variation, and to explore their interactions with KIV-2 length, we utilized individuals heterozygous for either of two coding variants (combined MAF=0.05) that create null alleles that produce undetectable serum Lp(a) (7). This approach created an effective haploid model for Lp(a) and made it possible to systematically identify and measure the effects of Lp(a)-altering alleles (Fig. S10). We performed stepwise conditional analysis to identify LPA sequence variants that associated with low Lp(a) despite occurring on short or medium-length KIV-2 alleles that typically associate with higher Lp(a) levels (7).
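The effective-haploid selection can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' implementation: the `Haplotype` and `Person` records and their field names are hypothetical, and null-allele status is assumed to be already phased onto haplotypes.

```python
from dataclasses import dataclass


@dataclass
class Haplotype:
    kiv2_repeats: float  # imputed KIV-2 length on this haplotype (hypothetical field)
    has_null: bool       # carries one of the null coding variants


@dataclass
class Person:
    hap_a: Haplotype
    hap_b: Haplotype
    lpa_nmol_l: float    # measured serum Lp(a)


def effective_haploid(people):
    """Return (kiv2_repeats, lpa) pairs for the non-null allele of null heterozygotes.

    With one allele producing no detectable Lp(a), the measured level is
    attributed entirely to the allele on the other haplotype.
    """
    pairs = []
    for p in people:
        if p.hap_a.has_null != p.hap_b.has_null:  # exactly one null allele
            active = p.hap_b if p.hap_a.has_null else p.hap_a
            pairs.append((active.kiv2_repeats, p.lpa_nmol_l))
    return pairs
```

Individuals with zero or two null alleles are excluded, because their measurements either mix two active alleles or carry no information about an active one.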
These analyses identified associations to 17 protein-altering variants, each of which appeared to greatly reduce Lp(a) (P<1 x 10−17 for each variant, Fisher’s exact test or linear regression, Table S4); 43% of European haplotypes were affected by at least one of these variants. Six variants predicted to partially or fully abolish constitutive splice sites and six missense variants achieved the strongest associations in 12 consecutive stages of stepwise analysis; five additional rare (MAF<1%) coding variants exhibited top or near-top associations in further conditional analyses (Figs. 1A, S11, and Table S4). The two variants with the largest impacts on Lp(a) variation in the European population (owing to their high allele frequencies; MAF=13% and 21%) were variants within the KIV-2 region that are computationally predicted to impair splicing (19) of KIV-2 exon 2. One of these splice variants has been experimentally validated (20). These variants reduced Lp(a) by 85% and 89%, respectively, when present within a single KIV-2 repeat unit; alleles carrying either variant on multiple repeat units within the VNTR produced nearly undetectable Lp(a) (Fig. S12). Fine-mapping analyses identified three other common variants (MAF=14-28%)—two in the 5’ untranslated region of LPA (observed to regulate translational activity (21, 22)) and one missense variant—associated with more modest effects on Lp(a) levels across a broad range of KIV-2 alleles (Fig. 1A and Table S4).
Figure 1. Kringle IV-2 repeat length variation and 23 LPA SNPs together explain ~90% of lipoprotein(a) heritability.
A, Serum lipoprotein(a) concentration vs. KIV-2 VNTR length in an effective-haploid model of Lp(a), involving N=24,969 LPA alleles (in exome-sequenced UKB participants of European ancestry) for which the allele on the homologous chromosome was predicted to produce negligible Lp(a) (<4 nmol/L). Colors indicate the 15 most common Lp(a)-modifying SNPs identified by fine-mapping analysis. Curves indicate parametric fits of Lp(a) to KIV-2 length (gray: alleles not carrying any Lp(a)-modifying SNP; red, blue, green: carriers of a single common Lp(a)-modifying SNP); large points, mean Lp(a) among such alleles in KIV-2 length bins (error bars, 95% CIs). Histograms (top/bottom), counts of Lp(a) measurements outside the reportable range (<3.8 nmol/L or >189 nmol/L), colored by Lp(a)-modifying SNPs (7). B, Observed and predicted median Lp(a) among individuals of African (AFR; N=893), European (EUR; N=42,162), South Asian (SA; N=954), and East Asian (EAS; N=156) ancestry. C, LPA allele frequencies by ancestry. VNTR alleles in cis with a large-effect Lp(a)-reducing variant (respectively, the Lp(a)-increasing 5’ UTR variant rs1800769) are indicated in gray (respectively, red). D,E, Myocardial infarction risk (respectively, type 2 diabetes prevalence) vs. measured or genetically predicted Lp(a). Error bars, 95% CIs.
The strong effects of the VNTR and SNPs at LPA, the large sample size of UK Biobank, and the ability to chromosomally phase all these variants accurately, made it possible to identify nonlinear and cis-epistatic effects at LPA. Accounting for the effects of the 17 implicated coding variants at LPA showed that the inverse relationship between KIV-2 length and Lp(a) (12, 17) breaks down for very short (high-protein-level) alleles (Fig. 1A). Throughout most of the KIV-2 length range (12-24 repeats), each one-repeat-unit decrease in KIV-2 length resulted in a 37% increase in Lp(a) (Fig. 1A). However, this effect was attenuated for alleles with fewer than 12 repeats and appeared to invert around 8 repeats (P=9.4 x 10−31, linear regression; Figs. 1A and S13). Accounting for the nonlinear effect of KIV-2 length and for phase-resolved LPA sequence variants explained 90% of the heritable variance (83% of total variance) in Lp(a) (vs. ~60% of total variance in earlier work (12, 23)).
Serum Lp(a) levels vary across populations (12), with median measurements 4-fold higher among Africans than among Europeans, but the reason for this cross-population variation has been unclear. We found that this variation was largely explained by population differences in the allele frequencies of LPA sequence variants (Fig. 1B). Elevated Lp(a) in UKB participants of African ancestry (median 80.1 nmol/L vs. 18.5 nmol/L in Europeans) was primarily explained by the paucity of alleles carrying variants that greatly reduced Lp(a) (~13% of African alleles vs. ~43% of European alleles, despite sufficient discovery power in both populations) and the higher frequency of the Lp(a)-increasing 5’ UTR variant among African alleles (MAF=46% vs. 17% in European alleles for rs1800769; Fig. 1C). These allele frequency differences also explained the apparent difference in shape of the Lp(a)-KIV-2 curve in different populations (Fig. S14).
The accuracy of genetically predicted Lp(a) (R2=0.83 in Europeans) enabled insights into epidemiological associations involving Lp(a). We observed that the myocardial infarction risk-increasing effect of higher Lp(a) (15, 24) extends to extreme Lp(a) levels (OR=3.1, 95% CI=1.9-5.2 for individuals with genetically predicted Lp(a)>400 nmol/L; Fig. 1D). In contrast, lower genetically predicted Lp(a) did not associate with increased type-2 diabetes (T2D) risk, suggesting that the 17% (s.e. 1%) lower levels of Lp(a) observed in T2D patients represents reverse causation resulting from T2D itself, T2D-related liver comorbidities, or T2D medication (Figs. 1E, S15, and Table S5).
Human height is strongly affected by VNTRs in ACAN and TENT5A
Human height associates with hundreds of common alleles (25), generally with small effect sizes (<0.05 standard deviations). In contrast, size variation of a 57bp (19 amino acid) repeat in the ACAN gene strongly associated with height (P=1.7 x 10−234, BOLT-LMM), with an effect size differential of 0.49 standard deviations (s.e. 0.04)—i.e., 3.2 centimeters—between the longest and shortest European alleles (Fig. 2). This association, which appears to underlie one of the first reported genetic associations with height (26), was also observed in a parallel study using long-read sequencing in the deCODE cohort (11). Here, analysis in the larger, more diverse UKB cohort—which contains double the range of allelic variation, including a very short 6-repeat African allele and European alleles with up to ~44 repeats (Fig. 2B,D)—uncovered several additional insights.
Figure 2. Lengths of protein-coding repeat polymorphisms in ACAN and TENT5A associate with human height.
A, Genetic associations with height in UKB participants of European (top; EUR N=415,280) and African (bottom; AFR N=7,543) ancestry. B, ACAN VNTR allele length distributions. C, Height association statistics at ACAN in three consecutive steps of stepwise conditional analysis (EUR N=415,280). Large diamond/squares, likely-causal coding mutations; colored dots, variants in partial LD (R2>0.1) with labeled variants. Height phenotypes were adjusted for genetic predictions computed using the rest of the genome (7). D, Mean height of carriers (lines, left axis) and EUR allele frequencies (histograms, right axis) of ACAN alleles defined by VNTR length and missense SNP haplotype; error bars, 95% CIs. Rare long alleles (40-42 repeats) were grouped into one bin. E, Height associations at TENT5A. F, Mean height and EUR allele frequencies for TENT5A VNTR alleles; error bars, 95% CIs.
Height exhibited an approximately linear relationship with length of the ACAN VNTR. Consistent increasing effects were observed across a series of at least nine distinct VNTR allele lengths, resulting in an association signal (P=1.7 x 10−234, BOLT-LMM) stronger than that of any nearby variant, explaining 0.19% of height variance among European-ancestry UKB participants (Fig. 2C,D). Moreover, among 7,543 UKB participants of African ancestry, the ACAN VNTR association was nearly 50% stronger than the association of any other variant in the genome (P=5.2 x 10−12 for the VNTR vs. P=1.4 x 10−8 for the strongest SNP association) and explained 0.60% of height variance, primarily owing to greater VNTR length variation (s.d.=3.7 repeats vs. 1.5 repeats in Europeans; Fig. 2B). Imputation of the VNTR association into height association statistics from the African-ancestry AAAGC cohort (27) replicated these results (with the VNTR explaining an estimated 0.42% of height variance; imputed P=5.8 x 10−40 vs. linear regression P=3.4 x 10−20 for the strongest SNP association genome-wide; Fig. S16; (7)).
Aggrecan, the protein encoded by ACAN, is a component of the extracellular matrix in growth plate cartilage and is required for normal growth plate cytoarchitecture (28). The VNTR generates 2.4-fold size variation in aggrecan’s first chondroitin sulfate domain (CS1), a domain whose amino-acid residues are modified by long, charged polysaccharide chains that endow this extracellular matrix with key properties including the ability to hold large amounts of water (29).
As at LPA, incorporation of the ACAN VNTR into genetic association analysis (by stepwise conditional analysis) made it possible to identify additional genetic effects, driven at ACAN by two common missense SNPs (Fig. 2C and Table S6). These two missense SNPs, which affect ACAN globular domains, had two of the top three predicted deleteriousness scores (30) (CADD = 23.1 for rs3817428 and 27.6 for rs34949187) among common missense SNPs in ACAN and were corroborated by Bayesian fine-mapping (9) analysis (FINEMAP posterior probability of causality >0.99). A combined model including the VNTR and these SNPs explained 0.33% of height variance in Europeans.
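Stepwise conditional analysis, as applied at LPA and ACAN, can be sketched as greedy forward selection: choose the variant most strongly associated with the phenotype, condition the phenotype and all remaining variants on it, and repeat. The sketch below uses ordinary least-squares projection on toy data. It is a simplification that stands in for the linear mixed models used in the paper, and the function names and stopping threshold are assumptions made for illustration.

```python
import math


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _project_out(v, u):
    """Remove from v its component along u (both mean-centered)."""
    uu = _dot(u, u)
    if uu < 1e-9:
        return v
    c = _dot(v, u) / uu
    return [x - c * y for x, y in zip(v, u)]


def stepwise_conditional(phenotype, variants, max_steps=3, min_abs_r=0.1):
    """Return variant names in the order selected by forward conditioning."""
    y = _center(phenotype)
    resid = {name: _center(g) for name, g in variants.items()}
    selected = []
    for _ in range(max_steps):
        best, best_r = None, 0.0
        yy = _dot(y, y)
        for name, g in resid.items():
            gg = _dot(g, g)
            # Treat numerically exhausted vectors as carrying no signal.
            r = 0.0 if gg < 1e-9 or yy < 1e-9 else _dot(g, y) / math.sqrt(gg * yy)
            if abs(r) > abs(best_r):
                best, best_r = name, r
        if best is None or abs(best_r) < min_abs_r:
            break
        selected.append(best)
        lead = resid.pop(best)
        y = _project_out(y, lead)
        resid = {n: _project_out(g, lead) for n, g in resid.items()}
    return selected


# Toy data: the phenotype depends on VNTR length and snp_b; snp_a only tags the VNTR.
toy_variants = {
    "vntr": [1, 2, 3, 4, 5, 6, 7, 8],
    "snp_a": [0, 0, 0, 1, 0, 1, 1, 1],
    "snp_b": [0, 1, 1, 0, 0, 1, 1, 0],
}
toy_phenotype = [v + 2 * b for v, b in zip(toy_variants["vntr"], toy_variants["snp_b"])]
order = stepwise_conditional(toy_phenotype, toy_variants)
```

In this toy example, `snp_a` is marginally associated with the phenotype only because it is correlated with VNTR length, so it drops out once the VNTR is conditioned on, while the independent effect of `snp_b` remains.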
Despite the strong effects of ACAN VNTR alleles on height, neither end of the allelic spectrum appeared to compromise ACAN function in any way detrimental to health. Whereas loss-of-function mutations in ACAN cause autosomal dominant skeletal disorders (31), VNTR length variation did not associate at Bonferroni significance with any disease in UK Biobank (P>3 x 10−4, logistic regression). A participant homozygous for the short 6-repeat allele (AF=1.2% among participants with African ancestry) had no reported musculoskeletal disease phenotypes.
A distinct coding VNTR in the TENT5A gene (previously named FAM46A) consisting of 2-7 repeats of 15bp also associated with height (P=2.5 x 10−53, BOLT-LMM), with six VNTR alleles exhibiting monotonically increasing effects (Fig. 2E,F). TENT5A—a poly(A) polymerase in which multiple coding variants have been linked to autosomal recessive osteogenesis imperfecta (OI) (32)—polyadenylates and increases expression in osteoblasts of the collagen genes COL1A1 and COL1A2 and other genes mutated in OI (33).
Kidney-function phenotypes shaped by a VNTR in MUC1
The MUC1 gene encodes a secreted (cell-surface-associated) protein (mucin 1) with cell-adhesive and anti-adhesive properties. MUC1 harbors a VNTR that contains 20-125 repeats (34) of a 60bp (20 amino acid) coding sequence that determines the length of a heavily glycosylated extracellular domain. Ultra-rare frameshift mutations within the MUC1 VNTR cause autosomal dominant tubulointerstitial kidney disease (35). In our analyses, length of the MUC1 VNTR associated with several renal phenotypes (Fig. 3), including serum urea (P=2.7 x 10−163, BOLT-LMM) and serum urate (P=4.7 x 10−99, BOLT-LMM). Longer VNTR alleles also associated with gout (P=3.6 x 10−17, logistic regression), a disease caused by excessive uric acid crystallization in the joints.
Figure 3. MUC1 VNTR length associates with multiple renal phenotypes.
A,C, Genetic associations with serum urea (A) and serum urate (C) at MUC1 (top; orange dots indicate variants in LD with MUC1 VNTR length (R2>0.1)) and genome-wide (bottom); N=415,280 UKB EUR participants. B,D, Mean phenotypes in carriers (B) or disease odds ratios (D) (lines, left axis) and allele frequencies (histograms, right axis) of MUC1 VNTR alleles. VNTR alleles were stratified into three groups for phenotype analyses: short (<55 repeat units), long (55-95 repeat units), and very long (>95 repeat units). Error bars, 95% CIs; eGFR, estimated glomerular filtration rate.
The MUC1 VNTR length polymorphism appeared to underlie some of the strongest, earliest reported SNP associations with serum urea and serum urate, two biomarkers of renal function that otherwise have somewhat independent heritability (genetic correlation = 0.25 (s.e. 0.01); Fig. 3A,C). For urea, the VNTR exhibited the strongest association genome-wide (matching that of a SNP on chromosome 5), explaining ~1% of heritable variance (~0.2% of total variance) in Europeans and accounting for nearly all of the association signal at the MUC1 locus (previously reported as MTX1-GBA (36); Fig. 3A). For urate, the VNTR also appeared to be the primary causal variant at a locus previously reported as TRIM46 (37) (Fig. 3C). Longer MUC1 alleles associated with increasing levels of both serum urea and urate across the VNTR length spectrum, with an incompletely dominant effect on urea (P=2.3 x 10−20 for interaction, linear regression; Fig. S17) but an additive effect on urate (P=0.56 for interaction).
Associations with additional renal phenotypes indicated a complex relationship between MUC1 VNTR length and kidney function (Fig. 3B,D). Long MUC1 alleles (>55 repeat units) increased the risk of gout (OR=1.10; 95% CI, [1.08-1.13], P=1.2 x 10−16, logistic regression) and chronic tubulointerstitial nephritis (OR=1.31 [1.09-1.57], P=3.4 x 10−3, logistic regression, which remained significant after correcting for 13 kidney diseases tested). However, MUC1 VNTR allele length did not associate with chronic kidney disease (OR=1.01 [0.99-1.04], P=0.33, logistic regression) reported in 14,573 cases and only weakly influenced glomerular filtration rate as estimated from serum creatinine (beta=−0.19% [0.11-0.28%] for long vs. short alleles). Long MUC1 alleles associated with modest reductions in red blood cell counts (beta=−0.029 s.d., s.e.=0.002, P=1.5 x 10−39, linear regression) and hemoglobin levels (beta=−0.031 s.d., s.e.=0.002, P=9.9 x 10−44, linear regression), possibly reflecting an impact of reduced kidney function on erythropoietin production.
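The reported odds ratios, confidence intervals, and P-values can be related to one another with a Wald approximation on the log-odds scale. The sketch below recovers the standard error and z-score from a 95% CI. Because the published values are rounded, the recovered P-value can only be expected to match the reported one approximately. The function name is hypothetical.

```python
import math


def z_from_odds_ratio_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log-odds standard error, Wald z-score, and two-sided P from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided P-value under a standard normal null distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided


# Gout association for long MUC1 alleles, using the rounded values quoted above.
se, z, p = z_from_odds_ratio_ci(1.10, 1.08, 1.13)
```

For the gout association, this gives a z-score of about 8 and a P-value on the order of 10^-16, in line with the reported value given the rounding of the inputs.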
TCHH VNTR strongly associates with hair phenotypes
Repeat length variation in a coding VNTR in TCHH associated strongly with male pattern baldness (P=1.6 x 10−55, BOLT-LMM). TCHH encodes trichohyalin, a protein that associates in regular arrays with keratin intermediate filaments and confers mechanical strength to the inner root sheath (38). The 18bp VNTR encodes part of a highly stabilized alpha-helix that forms an elongated rod structure (39). A rare nonsense mutation in TCHH has been implicated in uncombable hair syndrome (40), and a common haplotype containing the TCHH missense SNP rs11803731 (encoding a leucine to methionine substitution in TCHH) is by far the strongest genetic determinant of hair curl in individuals of European ancestry (41, 42). In UKB, the TCHH VNTR and rs11803731 exhibited independent associations with male pattern baldness (Fig. 4A,B).
Figure 4. TCHH VNTR length and missense SNP rs11803731 associate independently with hair phenotypes.
A, Genetic associations with male pattern baldness at TCHH (N=189,537 male UKB EUR participants). Colors indicate partial LD (R2 > 0.1) with missense SNP rs11803731 (blue), the TCHH VNTR (red), or both rs11803731 and VNTR length (purple). B, Mean baldness score in carriers (lines, left axis) and allele frequencies (histograms, right axis) of TCHH alleles. TCHH alleles were binned by VNTR length quintile and missense SNP rs11803731 status. C,D, Genetic associations with hair curl at TCHH in N=3,334 TwinsUK participants (conditioned on rs11803731 in D). E, Genome-wide associations with hair curl in TwinsUK. F, Relationship between TCHH allele length and hair curl (analogous to B).
Intriguingly, the TCHH VNTR appeared to be hypermutable and was poorly tagged by all nearby individual SNPs (R2<0.1), leading us to wonder whether it might also contribute to hair curl in a way invisible to genome-wide association studies of this phenotype. Imputing TCHH VNTR alleles into the TwinsUK cohort (43) (N=3,334 genotyped individuals with hair curl phenotypes) revealed that the TCHH VNTR appeared to be the human genome’s second-largest contributor to hair curl variation genome-wide (explaining ~1% of variance; P=3.6 x 10−8, BOLT-LMM) after the missense SNP rs11803731 in TCHH (which explained ~4% of variance; Fig. 4C-F). Linkage disequilibrium between the VNTR and rs11803731 further explained an association reported near LCE3E (450kb upstream of TCHH) previously thought to be independent of TCHH (42) (Fig. 4C,D).
Discussion
These results identify many strong effects of protein-coding VNTRs on human phenotypes. Most were among the strongest effects of all common variants identified for these phenotypes to date and resolved previously mysterious genetic associations for multiple traits. Incorporation of multi-allelic VNTRs into fine-mapping analyses also helped identify many more functional variants at the same loci, revealing the importance of incorporating allelic series of SNP and VNTR alleles into functional studies and epidemiological research.
These results are likely just the leading edge of a far-larger set of VNTR-phenotype associations that future studies will reveal. In this work with exome sequence data, we were unable to analyze VNTRs that exist in noncoding sequences, are too short for depth-of-coverage to accurately measure length variation, or are too mutable to segregate well with SNP haplotypes. We anticipate that newer sequencing technologies applied to large, diverse cohorts will yield further insights into the mutational and evolutionary processes of VNTRs and their contribution to the “missing heritability” of human phenotypes.
A frustration in human genetics has been that the majority of reported genetic associations involve haplotypes of noncoding and missense SNPs whose potential phenotypic contributions are challenging to disentangle from one another, and whose first-order molecular effects are opaque. VNTRs have several attributes that help overcome these challenges. First, multi-allelic VNTRs usually share only partial LD with nearby di-allelic SNP and indel variants. Second, associations to protein-coding VNTRs implicate the size and copy number of specific protein domains, leading to specific, testable hypotheses about the effects of protein domains in biological systems. Third, the directions of coding VNTR associations have clear meaning, revealing whether risk is generated by having more or less of a domain. Finally, VNTRs generate natural allelic series of functionally distinct alleles that can be used for dose-response studies in human tissues and cellular models. We anticipate that these attributes will lead to new insights about the mechanisms by which gene and protein variation affect human biology.
Acknowledgments
The authors are grateful to R. Gupta, J. Hirschhorn, M. Hujoel, S. Raychaudhuri, and M. Warman for helpful discussions. This research was conducted using the UK Biobank Resource under application #40709. Computational analyses were performed on the O2 High Performance Compute Cluster, supported by the Research Computing Group, at Harvard Medical School (http://rc.hms.harvard.edu).
Funding:
R.E.M. was supported by NSF grant DMS-1939015 and US National Institutes of Health (NIH) grant K25 HL150334. R.E.H. and S.A.M. were supported by NIH grant R01 HG006855. M.A.S. was supported by the MIT John W. Jarve (1978) Seed Fund for Science Innovation and NIH fellowship F31 MH124393. A.R.B. was supported by NIH fellowship F31 HL154537 and training grant T32 HG 2295-16. P.-R.L. was supported by NIH grant DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, the Next Generation Fund at the Broad Institute of MIT and Harvard, and a Sloan Research Fellowship. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, Chronic Disease Research Foundation (CDRF), Zoe Global Ltd and the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London.
Footnotes
Competing interests: The authors declare no competing interests.
Code availability: The following publicly available software resources were used to perform analyses in this work: Eagle2 (v2.3.5), https://data.broadinstitute.org/alkesgroup/Eagle/; Minimac4 (v1.0.1), https://genome.sph.umich.edu/wiki/Minimac4; BOLT-LMM (v2.3.5), https://data.broadinstitute.org/alkesgroup/BOLT-LMM/; FINEMAP (v1.3.1), http://www.christianbenner.com/; plink (v1.9 and v2.0), https://www.cog-genomics.org/plink2/; Tandem Repeats Finder (v4.09.1), tandem.bu.edu/trf/trf.html; the TOPMed Imputation Server, https://imputation.biodatacatalyst.nhlbi.nih.gov/; BLAT (v35), http://hgdownload.soe.ucsc.edu/admin/exe/; susieR (v0.10.1), https://stephenslab.github.io/susieR/; LDstore (v2.0), http://www.finemap.me; ImpG (v1.0.1), https://github.com/huwenboshi/ImpG. Code and scripts used to perform analyses are available at http://doi.org/10.5281/zenodo.4776804 (44).
Publisher's Disclaimer: This manuscript has been accepted for publication in Science. This version has not undergone final editing. Please refer to the complete version of record at http://www.sciencemag.org/. The manuscript may not be reproduced or used in any manner that does not fall within the fair use provisions of the Copyright Act without the prior, written permission of AAAS.
Data availability:
Access to the following data resources is available to all bona fide researchers by application: UK Biobank (http://www.ukbiobank.ac.uk/); Twins UK (https://twinsuk.ac.uk/); the Haplotype Reference Consortium imputation panel (http://www.haplotype-reference-consortium.org/); AAAGC height summary statistics (https://www.ebi.ac.uk/gwas/). Individual-level VNTR allele length estimates (resolved to phased SNP-haplotypes) and genetically predicted Lp(a) values are available from UK Biobank as a Return from application #40709.
References and Notes
- 1.Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH-Y, Konkel MK, Malhotra A, Stütz AM, Shi X, Casale FP, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJP, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HYK, Mu XJ, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer E-W, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO, An integrated map of structural variation in 2,504 human genomes. Nature (2015), doi: 10.1038/nature15394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Sulovari A, Li R, Audano PA, Porubsky D, Vollger MR, Logsdon GA, Human Genome Structural Variation Consortium, Warren WC, Pollen AA, Chaisson MJP, Eichler EE, Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl. Acad. Sci (2019), doi: 10.1073/pnas.1912175116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lalioti MD, Scott HS, Buresi C, Rossier C, Bottani A, Morris MA, Malafosse A, Antonarakis SE, Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature. 386, 847–851 (1997). [DOI] [PubMed] [Google Scholar]
- 4.Wijmenga C, Hewitt JE, Sandkuijl LA, Clark LN, Wright TJ, Dauwerse HG, Gruter A-M, Hofker MH, Moerer P, Williamson R, van Ommen G-JB, Padberg GW, Frants RR, Chromosome 4q DNA rearrangements associated with facioscapulohumeral muscular dystrophy. Nat. Genet 2, 26–30 (1992).
- 5.Marchini J, Howie B, Myers S, McVean G, Donnelly P, A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet 39, 906–913 (2007).
- 6.Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J, The UK Biobank resource with deep phenotyping and genomic data. Nature. 562, 203–209 (2018).
- 7.Supplementary Materials for “Protein-coding VNTRs strongly shape diverse human phenotypes.”
- 8.Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, Gonzaga-Jauregui C, Khalid S, Ye B, Banerjee N, Li AH, O’Dushlaine C, Marcketta A, Staples J, Schurmann C, Hawes A, Maxwell E, Barnard L, Lopez A, Penn J, Habegger L, Blumenfeld AL, Bai X, O’Keeffe S, Yadav A, Praveen K, Jones M, Salerno WJ, Chung WK, Surakka I, Willer CJ, Hveem K, Leader JB, Carey DJ, Ledbetter DH, Cardon L, Yancopoulos GD, Economides A, Coppola G, Shuldiner AR, Balasubramanian S, Cantor M, Nelson MR, Whittaker J, Reid JG, Marchini J, Overton JD, Scott RA, Abecasis GR, Yerges-Armstrong L, Baras A, Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 586, 749–756 (2020).
- 9.Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M, FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 32, 1493–1501 (2016).
- 10.Barton AR, Sherman MA, Mukamel RE, Loh P-R, Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat. Genet, 1–10 (2021).
- 11.Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA, Kristmundsdottir S, Mehringer S, Hardarson MT, Gudjonsson SA, Magnusdottir DN, Jonasdottir A, Jonasdottir A, Kristjansson RP, Sverrisson ST, Holley G, Palsson G, Stefansson OA, Eyjolfsson G, Olafsson I, Sigurdardottir O, Torfason B, Masson G, Helgason A, Thorsteinsdottir U, Holm H, Gudbjartsson DF, Sulem P, Magnusson OT, Halldorsson BV, Stefansson K, Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet, 1–8 (2021).
- 12.Schmidt K, Noureen A, Kronenberg F, Utermann G, Structure, Function, and Genetics of Lipoprotein(a). J. Lipid Res (2016), doi: 10.1194/jlr.R067314.
- 13.Loh P-R, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, Patterson N, Price AL, Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet 47, 284–290 (2015).
- 14.Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Mari RS, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu T-Y, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE, Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 372 (2021), doi: 10.1126/science.abf7117.
- 15.Clarke R, Peden JF, Hopewell JC, Kyriakou T, Goel A, Heath SC, Parish S, Barlera S, Franzosi MG, Rust S, Bennett D, Silveira A, Malarstig A, Green FR, Lathrop M, Gigante B, Leander K, de Faire U, Seedorf U, Hamsten A, Collins R, Watkins H, Farrall M, Genetic Variants Associated with Lp(a) Lipoprotein Level and Coronary Disease. N. Engl. J. Med 361, 2518–2528 (2009).
- 16.Utermann G, Menzel HJ, Kraft HG, Duba HC, Kemmler HG, Seitz C, Lp(a) glycoprotein phenotypes. Inheritance and relation to Lp(a)-lipoprotein concentrations in plasma. J. Clin. Invest 80, 458–465 (1987).
- 17.White AL, Hixson JE, Rainwater DL, Lanford RE, Molecular Basis for “Null” Lipoprotein(a) Phenotypes and the Influence of Apolipoprotein(a) Size on Plasma Lipoprotein(a) Level in the Baboon. J. Biol. Chem 269, 9060–9066 (1994).
- 18.Boerwinkle E, Leffert CC, Lin J, Lackner C, Chiesa G, Hobbs HH, Apolipoprotein(a) gene accounts for greater than 90% of the variation in plasma lipoprotein(a) concentrations. J. Clin. Invest 90, 52–60 (1992).
- 19.Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK-H, Predicting Splicing from Primary Sequence with Deep Learning. Cell. 176, 535–548 (2019).
- 20.Coassin S, Erhart G, Weissensteiner H, Eca Guimarães de Araújo M, Lamina C, Schönherr S, Forer L, Haun M, Losso JL, Köttgen A, Schmidt K, Utermann G, Peters A, Gieger C, Strauch K, Finkenstedt A, Bale R, Zoller H, Paulweber B, Eckardt K-U, Hüttenhofer A, Huber LA, Kronenberg F, A novel but frequent variant in LPA KIV-2 is associated with a pronounced Lp(a) and cardiovascular risk reduction. Eur. Heart J 38, 1823–1831 (2017).
- 21.Zysow BR, Lindahl GE, Wade DP, Knight BL, Lawn RM, C/T Polymorphism in the 5′ Untranslated Region of the Apolipoprotein(a) Gene Introduces an Upstream ATG and Reduces In Vitro Translation. Arterioscler. Thromb. Vasc. Biol 15, 58–64 (1995).
- 22.Suzuki K, Kuriyama M, Saito T, Ichinose A, Plasma lipoprotein(a) levels and expression of the apolipoprotein(a) gene are dependent on the nucleotide polymorphisms in its 5’-flanking region. J. Clin. Invest 99, 1361–1366 (1997).
- 23.Trinder M, Uddin MM, Finneran P, Aragam KG, Natarajan P, Clinical Utility of Lipoprotein(a) and LPA Genetic Risk Score in Risk Prediction of Incident Atherosclerotic Cardiovascular Disease. JAMA Cardiol. (2020), doi: 10.1001/jamacardio.2020.5398.
- 24.Gudbjartsson DF, Thorgeirsson G, Sulem P, Helgadottir A, Gylfason A, Saemundsdottir J, Bjornsson E, Norddahl GL, Jonasdottir A, Jonasdottir A, Eggertsson HP, Gretarsdottir S, Thorleifsson G, Indridason OS, Palsson R, Jonasson F, Jonsdottir I, Eyjolfsson GI, Sigurdardottir O, Olafsson I, Danielsen R, Matthiasson SE, Kristmundsdottir S, Halldorsson BV, Hreidarsson AB, Valdimarsson EM, Gudnason T, Benediktsson R, Steinthorsdottir V, Thorsteinsdottir U, Holm H, Stefansson K, Lipoprotein(a) Concentration and Risks of Cardiovascular Disease and Diabetes. J. Am. Coll. Cardiol 74, 2982–2994 (2019).
- 25.Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, the GIANT Consortium, Meta-analysis of genome-wide association studies for height and body mass index in ~700000 individuals of European ancestry. Hum. Mol. Genet 27, 3641–3649 (2018).
- 26.Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, Freathy RM, Perry JRB, Stevens S, Hall AS, Samani NJ, Shields B, Prokopenko I, Farrall M, Dominiczak A, Johnson T, Bergmann S, Beckmann JS, Vollenweider P, Waterworth DM, Mooser V, Palmer CNA, Morris AD, Ouwehand WH, Caulfield M, Munroe PB, Hattersley AT, McCarthy MI, Frayling TM, Genome-wide association analysis identifies 20 loci that influence adult height. Nat. Genet 40, 575–583 (2008).
- 27.Graff M, Justice AE, Young KL, Marouli E, Zhang X, Fine RS, Lim E, Buchanan V, Rand K, Feitosa MF, Wojczynski MK, Yanek LR, Shao Y, Rohde R, Adeyemo AA, Aldrich MC, Allison MA, Ambrosone CB, Ambs S, Amos C, Arnett DK, Atwood L, Bandera EV, Bartz T, Becker DM, Berndt SI, Bernstein L, Bielak LF, Blot WJ, Bottinger EP, Bowden DW, Bradfield JP, Brody JA, Broeckel U, Burke G, Cade BE, Cai Q, Caporaso N, Carlson C, Carpten J, Casey G, Chanock SJ, Chen G, Chen M, Chen Y-DI, Chen W-M, Chesi A, Chiang CWK, Chu L, Coetzee GA, Conti DV, Cooper RS, Cushman M, Demerath E, Deming SL, Dimitrov L, Ding J, Diver WR, Duan Q, Evans MK, Falusi AG, Faul JD, Fornage M, Fox C, Freedman BI, Garcia M, Gillanders EM, Goodman P, Gottesman O, Grant SFA, Guo X, Hakonarson H, Haritunians T, Harris TB, Harris CC, Henderson BE, Hennis A, Hernandez DG, Hirschhorn JN, McNeill LH, Howard TD, Howard B, Hsing AW, Hsu Y-HH, Hu JJ, Huff CD, Huo D, Ingles SA, Irvin MR, John EM, Johnson KC, Jordan JM, Kabagambe EK, Kang SJ, Kardia SL, Keating BJ, Kittles RA, Klein EA, Kolb S, Kolonel LN, Kooperberg C, Kuller L, Kutlar A, Lange L, Langefeld CD, Le Marchand L, Leonard H, Lettre G, Levin AM, Li Y, Li J, Liu Y, Liu Y, Liu S, Lohman K, Lotay V, Lu Y, Maixner W, Manson JE, McKnight B, Meng Y, Monda KL, Monroe K, Moore JH, Mosley TH, Mudgal P, Murphy AB, Nadukuru R, Nalls MA, Nathanson KL, Nayak U, N’Diaye A, Nemesure B, Neslund-Dudas C, Neuhouser ML, Nyante S, Ochs-Balcom H, Ogundiran TO, Ogunniyi A, Ojengbede O, Okut H, Olopade OI, Olshan A, Padhukasahasram B, Palmer J, Palmer CD, Palmer ND, Papanicolaou G, Patel SR, Pettaway CA, Peyser PA, Press MF, Rao DC, Rasmussen-Torvik LJ, Redline S, Reiner AP, Rhie SK, Rodriguez-Gil JL, Rotimi CN, Rotter JI, Ruiz-Narvaez EA, Rybicki BA, Salako B, Sale MM, Sanderson M, Schadt E, Schreiner PJ, Schurmann C, Schwartz AG, Shriner DA, Signorello LB, Singleton AB, Siscovick DS, Smith JA, Smith S, Speliotes E, Spitz M, Stanford JL, Stevens VL, Stram A, Strom SS, Sucheston 
L, Sun YV, Tajuddin SM, Taylor H, Taylor K, Tayo BO, Thun MJ, Tucker MA, Vaidya D, Van Den Berg DJ, Vedantam S, Vitolins M, Wang Z, Ware EB, Wassertheil-Smoller S, Weir DR, Wiencke JK, Williams SM, Williams LK, Wilson JG, Witte JS, Wrensch M, Wu X, Yao J, Zakai N, Zanetti K, Zemel BS, Zhao W, Zhao JH, Zheng W, Zhi D, Zhou J, Zhu X, Ziegler RG, Zmuda J, Zonderman AB, Psaty BM, Borecki IB, Cupples LA, Liu C-T, Haiman CA, Loos R, Ng MCY, North KE, Discovery and fine-mapping of height loci via high-density imputation of GWASs in individuals of African ancestry. Am. J. Hum. Genet 108, 564–582 (2021).
- 28.Lauing KL, Cortes M, Domowicz MS, Henry JG, Baria AT, Schwartz NB, Aggrecan is required for growth plate cytoarchitecture and differentiation. Dev. Biol 396, 224–236 (2014).
- 29.Doege KJ, Coulter SN, Meek LM, Maslen K, Wood JG, A Human-specific Polymorphism in the Coding Region of the Aggrecan Gene: Variable Number of Tandem Repeats Produce a Range of Core Protein Sizes in the General Population. J. Biol. Chem 272, 13974–13979 (1997).
- 30.Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M, CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
- 31.Gleghorn L, Ramesar R, Beighton P, Wallis G, A Mutation in the Variable Repeat Region of the Aggrecan Gene (AGC1) Causes a Form of Spondyloepiphyseal Dysplasia Associated with Severe, Premature Osteoarthritis. Am. J. Hum. Genet 77, 484–490 (2005).
- 32.Doyard M, Bacrot S, Huber C, Di Rocco M, Goldenberg A, Aglan MS, Brunelle P, Temtamy S, Michot C, Otaify GA, Haudry C, Castanet M, Leroux J, Bonnefont J-P, Munnich A, Baujat G, Lapunzina P, Monnot S, Ruiz-Perez VL, Cormier-Daire V, FAM46A mutations are responsible for autosomal recessive osteogenesis imperfecta. J. Med. Genet 55, 278–284 (2018).
- 33.Gewartowska O, Aranaz-Novaliches G, Krawczyk PS, Mroczek S, Kusio-Kobiałka M, Tarkowski B, Spoutil F, Benada O, Kofroňová O, Szwedziak P, Cysewski D, Gruchota J, Szpila M, Chlebowski A, Sedlacek R, Prochazka J, Dziembowski A, Cytoplasmic polyadenylation by TENT5A is required for proper bone formation. Cell Rep. 35, 109015 (2021).
- 34.Fowler JC, Teixeira AS, Vinall LE, Swallow DM, Hypervariability of the membrane-associated mucin and cancer marker MUC1. Hum. Genet 113, 473–479 (2003).
- 35.Kirby A, Gnirke A, Jaffe DB, Barešová V, Pochet N, Blumenstiel B, Ye C, Aird D, Stevens C, Robinson JT, Cabili MN, Gat-Viks I, Kelliher E, Daza R, DeFelice M, Hůlková H, Sovová J, Vylet’al P, Antignac C, Guttman M, Handsaker RE, Perrin D, Steelman S, Sigurdsson S, Scheinman SJ, Sougnez C, Cibulskis K, Parkin M, Green T, Rossin E, Zody MC, Xavier RJ, Pollak MR, Alper SL, Lindblad-Toh K, Gabriel S, Hart PS, Regev A, Nusbaum C, Kmoch S, Bleyer AJ, Lander ES, Daly MJ, Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat. Genet 45, 299–303 (2013).
- 36.Okada Y, Sim X, Go MJ, Wu J-Y, Gu D, Takeuchi F, Takahashi A, Maeda S, Tsunoda T, Chen P, Lim S-C, Wong T-Y, Liu J, Young TL, Aung T, Seielstad M, Teo Y-Y, Kim YJ, Lee J-Y, Han B-G, Kang D, Chen C-H, Tsai F-J, Chang L-C, Fann S-JC, Mei H, Rao DC, Hixson JE, Chen S, Katsuya T, Isono M, Ogihara T, Chambers JC, Zhang W, Kooner JS, Albrecht E, Yamamoto K, Kubo M, Nakamura Y, Kamatani N, Kato N, He J, Chen Y-T, Cho YS, Tai E-S, Tanaka T, Meta-analysis identifies multiple loci associated with kidney function–related traits in east Asian populations. Nat. Genet 44, 904–909 (2012).
- 37.Köttgen A, Albrecht E, Teumer A, Vitart V, Krumsiek J, Hundertmark C, Pistis G, Ruggiero D, O’Seaghdha CM, Haller T, Yang Q, Tanaka T, Johnson AD, Kutalik Z, Smith AV, Shi J, Struchalin M, Middelberg RPS, Brown MJ, Gaffo AL, Pirastu N, Li G, Hayward C, Zemunik T, Huffman J, Yengo L, Zhao JH, Demirkan A, Feitosa MF, Liu X, Malerba G, Lopez LM, van der Harst P, Li X, Kleber ME, Hicks AA, Nolte IM, Johansson A, Murgia F, Wild SH, Bakker SJL, Peden JF, Dehghan A, Steri M, Tenesa A, Lagou V, Salo P, Mangino M, Rose LM, Lehtimäki T, Woodward OM, Okada Y, Tin A, Müller C, Oldmeadow C, Putku M, Czamara D, Kraft P, Frogheri L, Thun GA, Grotevendt A, Gislason GK, Harris TB, Launer LJ, McArdle P, Shuldiner AR, Boerwinkle E, Coresh J, Schmidt H, Schallert M, Martin NG, Montgomery GW, Kubo M, Nakamura Y, Tanaka T, Munroe PB, Samani NJ, Jacobs DR, Liu K, D’Adamo P, Ulivi S, Rotter JI, Psaty BM, Vollenweider P, Waeber G, Campbell S, Devuyst O, Navarro P, Kolcic I, Hastie N, Balkau B, Froguel P, Esko T, Salumets A, Khaw KT, Langenberg C, Wareham NJ, Isaacs A, Kraja A, Zhang Q, Wild PS, Scott RJ, Holliday EG, Org E, Viigimaa M, Bandinelli S, Metter JE, Lupo A, Trabetti E, Sorice R, Döring A, Lattka E, Strauch K, Theis F, Waldenberger M, Wichmann H-E, Davies G, Gow AJ, Bruinenberg M, Stolk RP, Kooner JS, Zhang W, Winkelmann BR, Boehm BO, Lucae S, Penninx BW, Smit JH, Curhan G, Mudgal P, Plenge RM, Portas L, Persico I, Kirin M, Wilson JF, Leach IM, van Gilst WH, Goel A, Ongen H, Hofman A, Rivadeneira F, Uitterlinden AG, Imboden M, von Eckardstein A, Cucca F, Nagaraja R, Piras MG, Nauck M, Schurmann C, Budde K, Ernst F, Farrington SM, Theodoratou E, Prokopenko I, Stumvoll M, Jula A, Perola M, Salomaa V, Shin S-Y, Spector TD, Sala C, Ridker PM, Kähönen M, Viikari J, Hengstenberg C, Nelson CP, Meschia JF, Nalls MA, Sharma P, Singleton AB, Kamatani N, Zeller T, Burnier M, Attia J, Laan M, Klopp N, Hillege HL, Kloiber S, Choi H, Pirastu M, Tore S, Probst-Hensch NM, Völzke H, Gudnason 
V, Parsa A, Schmidt R, Whitfield JB, Fornage M, Gasparini P, Siscovick DS, Polašek O, Campbell H, Rudan I, Bouatia-Naji N, Metspalu A, Loos RJF, van Duijn CM, Borecki IB, Ferrucci L, Gambaro G, Deary IJ, Wolffenbuttel BHR, Chambers JC, März W, Pramstaller PP, Snieder H, Gyllensten U, Wright AF, Navis G, Watkins H, Witteman JCM, Sanna S, Schipf S, Dunlop MG, Tönjes A, Ripatti S, Soranzo N, Toniolo D, Chasman DI, Raitakari O, Kao WHL, Ciullo M, Fox CS, Caulfield M, Bochud M, Gieger C, Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat. Genet 45, 145–154 (2013).
- 38.Steinert PM, Parry DAD, Marekov LN, Trichohyalin Mechanically Strengthens the Hair Follicle: Multiple Cross-Bridging Roles in the Inner Root Sheath. J. Biol. Chem 278, 41409–41419 (2003).
- 39.Lee SC, Kim IG, Marekov LN, O’Keefe EJ, Parry DA, Steinert PM, The structure of human trichohyalin. Potential multiple roles as a functional EF-hand-like calcium-binding protein, a cornified cell envelope precursor, and an intermediate filament-associated (cross-linking) protein. J. Biol. Chem 268, 12164–12176 (1993).
- 40.Basmanav FBÜ, Cau L, Tafazzoli A, Méchin M-C, Wolf S, Romano MT, Valentin F, Wiegmann H, Huchenq A, Kandil R, Garcia Bartels N, Kilic A, George S, Ralser DJ, Bergner S, Ferguson DJP, Oprisoreanu A-M, Wehner M, Thiele H, Altmüller J, Nürnberg P, Swan D, Houniet D, Büchner A, Weibel L, Wagner N, Grimalt R, Bygum A, Serre G, Blume-Peytavi U, Sprecher E, Schoch S, Oji V, Hamm H, Farrant P, Simon M, Betz RC, Mutations in Three Genes Encoding Proteins Involved in Hair Shaft Formation Cause Uncombable Hair Syndrome. Am. J. Hum. Genet 99, 1292–1304 (2016).
- 41.Medland SE, Nyholt DR, Painter JN, McEvoy BP, McRae AF, Zhu G, Gordon SD, Ferreira MAR, Wright MJ, Henders AK, Campbell MJ, Duffy DL, Hansell NK, Macgregor S, Slutske WS, Heath AC, Montgomery GW, Martin NG, Common Variants in the Trichohyalin Gene Are Associated with Straight Hair in Europeans. Am. J. Hum. Genet 85, 750–755 (2009).
- 42.Liu F, Chen Y, Zhu G, Hysi PG, Wu S, Adhikari K, Breslin K, Pośpiech E, Hamer MA, Peng F, Muralidharan C, Acuna-Alonzo V, Canizales-Quinteros S, Bedoya G, Gallo C, Poletti G, Rothhammer F, Bortolini MC, Gonzalez-Jose R, Zeng C, Xu S, Jin L, Uitterlinden AG, Ikram MA, van Duijn CM, Nijsten T, Walsh S, Branicki W, Wang S, Ruiz-Linares A, Spector TD, Martin NG, Medland SE, Kayser M, Meta-analysis of genome-wide association studies identifies 8 novel loci involved in shape variation of human head hair. Hum. Mol. Genet 27, 559–575 (2018).
- 43.Moayyeri A, Hammond CJ, Hart DJ, Spector TD, The UK Adult Twin Registry (TwinsUK Resource). Twin Res. Hum. Genet 16, 144–149 (2013).
- 44.Mukamel RE, Handsaker RE, Sherman MA, Barton AR, Zheng Y, McCarroll SA, Loh P-R, Codes and scripts for “Protein-coding repeat polymorphisms strongly shape diverse human phenotypes.” (2021), doi: 10.5281/zenodo.4776804.
- 45.Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B, Matthews P, Ong G, Pell J, Silman A, Young A, Sprosen T, Peakman T, Collins R, UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Med. 12, e1001779 (2015).
- 46.Loh P-R, Kichaev G, Gazal S, Schoech AP, Price AL, Mixed-model association for biobank-scale datasets. Nat. Genet 50, 906–908 (2018).
- 47.Benson G, Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
- 48.Bakhtiari M, Shleizer-Burko S, Gymrek M, Bansal V, Bafna V, Targeted genotyping of variable number tandem repeats with adVNTR. Genome Res. 28, 1709–1719 (2018).
- 49.Dolzhenko E, van Vugt JJFA, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, Ajay SS, Rajan V, Lajoie BR, Johnson NH, Kingsbury Z, Humphray SJ, Schellevis RD, Brands WJ, Baker M, Rademakers R, Kooyman M, Tazelaar GHP, van Es MA, McLaughlin R, Sproviero W, Shatunov A, Jones A, Khleifat AA, Pittman A, Morgan S, Hardiman O, Al-Chalabi A, Shaw C, Smith B, Neo EJ, Morrison K, Shaw PJ, Reeves C, Winterkorn L, Wexler NS, The US-Venezuela Collaborative Research Group, Housman DE, Ng CW, Li AL, Taft RJ, van den Berg LH, Bentley DR, Veldink JH, Eberle MA, Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. (2017), doi: 10.1101/gr.225672.117.
- 50.Wang G, Sarkar A, Carbonetto P, Stephens M, A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol 82, 1273–1300 (2020).
- 51.Benner C, Havulinna AS, Järvelin M-R, Salomaa V, Ripatti S, Pirinen M, Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies. Am. J. Hum. Genet 101, 539–551 (2017).
- 52.Levey AS, Bosch JP, Lewis JB, Greene T, Rogers N, Roth D, A more accurate method to estimate glomerular filtration rate from serum creatinine: a new prediction equation. Modification of Diet in Renal Disease Study Group. Ann. Intern. Med 130, 461–470 (1999).
- 53.Yap CX, Sidorenko J, Wu Y, Kemper KE, Yang J, Wray NR, Robinson MR, Visscher PM, Dissection of genetic variation and evidence for pleiotropy in male pattern baldness. Nat. Commun 9, 5407 (2018).
- 54.Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D, Improved linear mixed models for genome-wide association studies. Nat. Methods 9, 525–526 (2012).
- 55.Mefford J, Park D, Zheng Z, Ko A, Ala-Korpela M, Laakso M, Pajukanta P, Yang J, Witte J, Zaitlen N, Efficient Estimation and Applications of Cross-Validated Genetic Predictions to Polygenic Risk Scores and Linear Mixed Models. J. Comput. Biol 27, 599–612 (2020).
- 56.Loh P-R, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, Schizophrenia Working Group of the Psychiatric Genomics Consortium, de Candia TR, Lee SH, Wray NR, Kendler KS, O’Donovan MC, Neale BM, Patterson N, Price AL, Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet 47, 1385–1392 (2015).
- 57.Mack S, Coassin S, Rueedi R, Yousri NA, Seppälä I, Gieger C, Schönherr S, Forer L, Erhart G, Marques-Vidal P, Ried JS, Waeber G, Bergmann S, Dähnhardt D, Stöckl A, Raitakari OT, Kähönen M, Peters A, Meitinger T, Strauch K, KORA-Study Group, Kedenko L, Paulweber B, Lehtimäki T, Hunt SC, Vollenweider P, Lamina C, Kronenberg F, A genome-wide association meta-analysis on lipoprotein (a) concentrations adjusted for apolipoprotein (a) isoforms. J. Lipid Res 58, 1834–1844 (2017).
- 58.Zekavat SM, Ruotsalainen S, Handsaker RE, Alver M, Bloom J, Poterba T, Seed C, Ernst J, Chaffin M, Engreitz J, Peloso GM, Manichaikul A, Yang C, Ryan KA, Fu M, Johnson WC, Tsai M, Budoff M, Vasan RS, Cupples LA, Rotter JI, Rich SS, Post W, Mitchell BD, Correa A, Metspalu A, Wilson JG, Salomaa V, Kellis M, Daly MJ, Neale BM, McCarroll S, Surakka I, Esko T, Ganna A, Ripatti S, Kathiresan S, Natarajan P, Deep coverage whole genome sequences and plasma lipoprotein(a) in individuals of European and African ancestries. Nat. Commun 9, 2606 (2018).
- 59.Di Maio S, Grüneis R, Streiter G, Lamina C, Maglione M, Schoenherr S, Öfner D, Thorand B, Peters A, Eckardt K-U, Köttgen A, Kronenberg F, Coassin S, Investigation of a nonsense mutation located in the complex KIV-2 copy number variation region of apolipoprotein(a) in 10,910 individuals. Genome Med. 12, 74 (2020).
- 60.Pasaniuc B, Zaitlen N, Shi H, Bhatia G, Gusev A, Pickrell J, Hirschhorn J, Strachan DP, Patterson N, Price AL, Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics. 30, 2906–2914 (2014).
- 61.The 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes. Nature. 491, 56–65 (2012).
- 62.The 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature. 526, 68–73 (2015).
- 63.Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh P-R, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C, Next-generation genotype imputation service and methods. Nat. Genet 48, 1284–1287 (2016).
- 64.Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee S, Tian X, Browning BL, Das S, Emde A-K, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen Y-DI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin K-H, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O’Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo J-S, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng L-C, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, 
Zöllner S, Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O’Connor TD, Abecasis GR, Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 590, 290–299 (2021).
- 65.Loh P-R, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, Schoenherr S, Forer L, McCarthy S, Abecasis GR, Durbin R, Price AL, Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet 48, 1443–1448 (2016).
- 66.Loh P-R, Genovese G, McCarroll SA, Monogenic and polygenic inheritance become instruments for clonal selection. Nature. 584, 136–141 (2020).
- 67.Li H, Durbin R, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
- 68.Edgar RC, MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
- 69.Kent WJ, BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002).
- 70.Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy-Gallego E, Human Genome Structural Variation Consortium, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC, High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. bioRxiv (2021), doi: 10.1101/2021.02.06.430068.
- 71.Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, Boettger LM, McCarroll SA, Large multiallelic copy number variations in humans. Nat. Genet 47, 296–303 (2015).
- 72.Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D, The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002).
- 73.Handsaker RE, Korn JM, Nemesh J, McCarroll SA, Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat. Genet 43, 269–276 (2011).
- 74.Abyzov A, Urban AE, Snyder M, Gerstein M, CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
- 75.Lu T-Y, Human Genome Structural Variation Consortium, Chaisson MJP, Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs. Nat. Commun 12, 4250 (2021).
- 76.Garg P, Martin-Trujillo A, Rodriguez OL, Gies SJ, Hadelia E, Jadhav B, Jain M, Paten B, Sharp AJ, Pervasive cis effects of variation in copy number of large tandem repeats on local DNA methylation and gene expression. Am. J. Hum. Genet 108, 809–824 (2021).
- 77.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
- 79.Pedersen BS, Quinlan AR, Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 34, 867–868 (2018).
- 80.Browning BL, Browning SR, Genotype Imputation with Millions of Reference Samples. Am. J. Hum. Genet 98, 116–126 (2016).
- 81.Stephens M, Scheet P, Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. Am. J. Hum. Genet 76, 449–462 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Browning BL, Browning SR, A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. Am. J. Hum. Genet 84, 210–223 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Li N, Stephens M, Modeling Linkage Disequilibrium and Identifying Recombination Hotspots Using Single-Nucleotide Polymorphism Data. Genetics. 165, 2213–2233 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ, Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 4 (2015), doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Kirkpatrick S, Gelatt CD, Vecchi MP, Optimization by Simulated Annealing. Science. 220, 671–680 (1983). [DOI] [PubMed] [Google Scholar]
- 86.Coassin S, Schönherr S, Weissensteiner H, Erhart G, Forer L, Losso JL, Lamina C, Haun M, Utermann G, Paulweber B, Specht G, Kronenberg F, A comprehensive map of single-base polymorphisms in the hypervariable LPA kringle IV type 2 copy number variation region. J. Lipid Res 60, 186–199 (2019).
- 87.Kraft HG, Lingenhel A, Pang RWC, Delport R, Trommsdorff M, Vermaak H, Janus ED, Utermann G, Frequency Distributions of Apolipoprotein(a) Kringle IV Repeat Alleles and Their Effects on Lipoprotein(a) Levels in Caucasian, Asian, and African Populations: The Distribution of Null Alleles Is Non-Random. Eur. J. Hum. Genet 4, 74–87 (1996).
- 88.Horton WE Jr, Lethbridge-Çejku M, Hochberg MC, Balakir R, Precht P, Plato CC, Tobin JD, Meek L, Doege K, An association between an aggrecan polymorphic allele and bilateral hand osteoarthritis in elderly white men: data from the Baltimore Longitudinal Study of Aging (BLSA). Osteoarthritis Cartilage. 6, 245–251 (1998).
- 89.Barragán I, Borrego S, El-Aziz MMA, El-Ashry MF, Abu-Safieh L, Bhattacharya SS, Antiñolo G, Genetic Analysis of FAM46A in Spanish Families with Autosomal Recessive Retinitis Pigmentosa: Characterisation of Novel VNTRs. Ann. Hum. Genet 72, 26–34 (2008).
- 90.Vinall LE, King M, Novelli M, Green CA, Daniels G, Hilkens J, Sarner M, Swallow DM, Altered expression and allelic association of the hypervariable membrane mucin MUC1 in Helicobacter pylori gastritis. Gastroenterology. 123, 41–49 (2002).
- 91.Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, Dougherty ML, Nelson BJ, Shah A, Dutcher SK, Warren WC, Magrini V, McGrath SD, Li YI, Wilson RK, Eichler EE, Characterizing the Major Structural Variant Alleles of the Human Genome. Cell. 176, 663–675.e19 (2019).
- 92.Noureen A, Fresser F, Utermann G, Schmidt K, Sequence Variation within the KIV-2 Copy Number Polymorphism of the Human LPA Gene in African, Asian, and European Populations. PLOS ONE. 10, e0121582 (2015).
- 93.Parson W, Kraft HG, Niederstätter H, Lingenhel AW, Köchl S, Fresser F, Utermann G, A common nonsense mutation in the repetitive Kringle IV-2 domain of human apolipoprotein(a) results in a truncated protein and low plasma Lp(a). Hum. Mutat 24, 474–480 (2004).
- 94.Ogorelkova M, Gruber A, Utermann G, Molecular Basis of Congenital Lp(A) Deficiency: A Frequent Apo(A) ‘Null’ Mutation in Caucasians. Hum. Mol. Genet 8, 2087–2096 (1999).
- 95.Lim ET, Würtz P, Havulinna AS, Palta P, Tukiainen T, Rehnström K, Esko T, Mägi R, Inouye M, Lappalainen T, Chan Y, Salem RM, Lek M, Flannick J, Sim X, Manning A, Ladenvall C, Bumpstead S, Hämäläinen E, Aalto K, Maksimow M, Salmi M, Blankenberg S, Ardissino D, Shah S, Horne B, McPherson R, Hovingh GK, Reilly MP, Watkins H, Goel A, Farrall M, Girelli D, Reiner AP, Stitziel NO, Kathiresan S, Gabriel S, Barrett JC, Lehtimäki T, Laakso M, Groop L, Kaprio J, Perola M, McCarthy MI, Boehnke M, Altshuler DM, Lindgren CM, Hirschhorn JN, Metspalu A, Freimer NB, Zeller T, Jalkanen S, Koskinen S, Raitakari O, Durbin R, MacArthur DG, Salomaa V, Ripatti S, Daly MJ, Palotie A, for the Sequencing Initiative Suomi (SISu) Project, Distribution and Medical Impact of Loss-of-Function Variants in the Finnish Founder Population. PLOS Genet. 10, e1004494 (2014).
- 96.Morgan BM, Brown AN, Deo N, Harrop TWR, Taiaroa G, Mace PD, Wilbanks SM, Merriman TR, Williams MJA, McCormick SPA, Nonsynonymous SNPs in LPA homologous to plasminogen deficiency mutants represent novel null apo(a) alleles. J. Lipid Res 61, 432–444 (2020).
- 97.Said MA, Yeung MW, van de Vegte YJ, Benjamins JW, Dullaart RPF, Ruotsalainen S, Ripatti S, Natarajan P, Juarez-Orozco LE, Verweij N, van der Harst P, Genome-Wide Association Study and Identification of a Protective Missense Variant on Lipoprotein(a) Concentration. Arterioscler. Thromb. Vasc. Biol 41, 1792–1800 (2021).
- 98.Aguet F, Barbeira AN, Bonazzola R, Brown A, Castel SE, The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 369, 1318–1330 (2020).
- 99.Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Neale BM, Daly MJ, MacArthur DG, The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 581, 434–443 (2020).
Data Availability Statement
Access to the following data resources is available to all bona fide researchers by application: UK Biobank (http://www.ukbiobank.ac.uk/); Twins UK (https://twinsuk.ac.uk/); the Haplotype Reference Consortium imputation panel (http://www.haplotype-reference-consortium.org/); AAAGC height summary statistics (https://www.ebi.ac.uk/gwas/). Individual-level VNTR allele length estimates (resolved to phased SNP-haplotypes) and genetically predicted Lp(a) values are available from UK Biobank as a Return from application #40709.