
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Nov 26:2024.11.25.625248. [Version 1] doi: 10.1101/2024.11.25.625248

Insights into the causes and consequences of DNA repeat expansions from 700,000 biobank participants

Margaux LA Hujoel, Robert E Handsaker, Nolan Kamitaki, Ronen E Mukamel, Simone Rubinacci, Pier F Palamara, Steven A McCarroll, Po-Ru Loh
PMCID: PMC11623664  PMID: 39651202

Abstract

Expansions and contractions of tandem DNA repeats are a source of genetic variation in human populations and in human tissues: some expanded repeats cause inherited disorders, and some are also somatically unstable. We analyzed DNA sequence data derived from the blood cells of >700,000 participants in UK Biobank and the All of Us Research Program and developed new computational approaches to recognize, measure and learn from DNA-repeat instability at 15 highly polymorphic CAG-repeat loci. We found that expansion and contraction rates varied widely across these 15 loci, even for alleles of the same length; repeats at different loci also exhibited widely variable relative propensities to mutate in the germline versus the blood. The high somatic instability of TCF4 repeats enabled a genome-wide association analysis that identified seven loci at which inherited variants modulate TCF4 repeat instability in blood cells. Three of the implicated loci contained genes (MSH3, FAN1, and PMS2) that also modulate Huntington’s disease age-at-onset as well as somatic instability of the HTT repeat in blood; however, the specific genetic variants and their effects (instability-increasing or -decreasing) appeared to be tissue-specific and repeat-specific, suggesting that somatic mutation in different tissues (or of different repeats in the same tissue) proceeds independently and under the control of substantially different genetic variation. Additional modifier loci included DNA damage response genes ATAD5 and GADD45A. Analyzing DNA repeat expansions together with clinical data showed that inherited repeats in the 5’ UTR of the glutaminase (GLS) gene are associated with stage 5 chronic kidney disease (OR=14.0 [5.7–34.3]) and liver diseases (OR=3.0 [1.5–5.9]). These and other results point to the dynamics of DNA repeats in human populations and across the human lifespan.
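As a reading aid for the odds ratios quoted above, the following minimal sketch shows how a bracketed interval relates to its point estimate. It assumes the intervals are 95% confidence intervals that are symmetric on the log-odds scale (a Wald-type interval); the abstract does not state how they were computed, so this is an assumption, and the function name `log_scale_summary` is hypothetical. Only the numbers printed in the abstract are used.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Summarize an interval ASSUMED symmetric on the log scale.

    Returns (implied_center, implied_se), where implied_center is the
    geometric mean of the two bounds and implied_se is the standard
    error of log(OR) implied by a two-sided interval of half-width z.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    implied_center = math.exp((log_lo + log_hi) / 2)
    implied_se = (log_hi - log_lo) / (2 * z)
    return implied_center, implied_se


# Values copied from the abstract: (reported OR, lower bound, upper bound).
reported = {
    "stage 5 chronic kidney disease": (14.0, 5.7, 34.3),
    "liver diseases": (3.0, 1.5, 5.9),
}

for name, (odds_ratio, lo, hi) in reported.items():
    center, se = log_scale_summary(lo, hi)
    print(f"{name}: reported OR={odds_ratio}, "
          f"implied center={center:.2f}, implied SE(log OR)={se:.2f}")
```

Under this assumption, the geometric mean of each pair of bounds falls within rounding of the reported odds ratio, which is what a log-scale interval would produce. The wide intervals correspond to large standard errors on the log scale, as would be expected if the number of carriers with each outcome is small.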

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
