Abstract
Tandem repeats (TRs) - highly polymorphic, repetitive sequences dispersed across the human genome - are crucial regulators of gene expression and diverse biological processes, but have remained underexplored relative to other classes of genetic variation due to historical challenges in their accurate calling and analysis. Here, we leverage whole-genome and single-cell RNA sequencing from over 5.4 million blood-derived cells from 1,925 individuals to explore the impact of variation in over 1.7 million polymorphic TR loci on blood cell type-specific gene expression. We identify over 62,000 single-cell expression quantitative trait TR loci (sc-eTRs), 16.6% of which are specific to one of 28 distinct immune cell types. Further fine-mapping uncovers 4,283 sc-eTRs as candidate causal drivers of gene expression in 13.6% of genes tested genome-wide. We show through colocalization that TRs are likely mediators of genetic associations with immune-mediated and hematological traits in over 700 genes, and further identify novel TRs warranting investigation in rare disease cohorts. TRs are critical, yet long-overlooked, contributors to cell type-specific gene expression, with implications for understanding rare disease pathogenesis and the genetic architecture of complex traits.
Full Text Availability
The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.