BMC Medical Genomics. 2025 Jul 25;18:120. doi: 10.1186/s12920-025-02191-8

Enrichment of tandem repeat element variants near CHD genes identified by short- and long-read genome sequencing

Abhilash Suresh 1,#, Sarah U Morton 2,3,✉,#, Daniel Quiat 2,4, Steven R DePalma 1, Joshua M Gorham 1, Martina Brueckner 5, Martin Tristani-Firouzi 6, Bruce D Gelb 7, Jonathan G Seidman 1, Christine E Seidman 1,8,9; the Pediatric Cardiac Genomics Consortium
PMCID: PMC12291376  PMID: 40713712

Abstract

Background

Congenital heart disease (CHD) is an important cause of childhood mortality as well as morbidity in children and adults. While genetic risk contributes to the majority of CHD, most individuals with CHD do not have an identified genetic diagnosis. Short tandem repeat (TR) elements are composed of repeated motifs of 2–6 basepairs and are highly polymorphic in length between individuals. These regions have been difficult to study with short-read sequencing, and they have not been studied at a large scale in the context of CHD. New software and sequencing platforms have allowed for more accurate TR element genotyping. Therefore, we aimed to identify TR element variants that could impact the expression of known CHD genes.

Results

We identified de novo and inherited TR element variants near known CHD genes in participants with CHD (n = 1,899) in the Pediatric Cardiac Genomics Consortium cohort as well as unaffected participants (n = 1,932) from the Simons Foundation Autism Research Initiative using short-read sequencing followed by variant calling with the GangSTR pipeline. Comparison with long-read sequencing confirmed proband genotypes for 76% (91/120) of the TR element variants identified using short-read sequencing. 114 TR element regions had 3 or more de novo TR element variants, compared to an expectation of 74 TR element regions (1.54-fold enrichment, p < 1.5E-5). CHD genes CACNA1C and EVC2 had the strongest enrichment of TR element variants in the CHD cohort, determined by a higher frequency of nearby de novo TR length variants in the CHD cohort compared to the non-CHD cohort. Within CHD trios, there was over-transmission of a TR element variant near TAB2.

Conclusions

In a targeted analysis of de novo and transmitted TR element variants in a large cohort of CHD probands, each individual had ~ 1 de novo TR element variant near a CHD gene, and participants with CHD demonstrated clustering of variants within TR element regions. Long-read sequencing confirmed the majority of TR element variants identified using the GangSTR pipeline. De novo variants in known CHD genes were enriched in participants with CHD, with specific enrichment in TR elements near CACNA1C, EVC2, and TAB2 in the CHD cohort. Many individual TR element variants were in known regulatory regions, but further work is needed to determine their functional impact.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12920-025-02191-8.

Keywords: Congenital heart disease, Long-read sequencing, Tandem repeat element

Background

Congenital heart disease (CHD) is a major cause of childhood mortality and lifelong morbidity in children and adults [1–3]. It is the most common birth defect, affecting around 1% of live births. CHD has a variety of causes including in-utero exposures, maternal health conditions, and severe genetic abnormalities. It also has a strong genetic basis, with exome and short-read whole genome sequencing (srWGS) identifying genetic contributions to CHD risk in around 45% of participants, including monogenic causes, chromosomal aneuploidies, and other types of genetic variation. The remaining 55% have not had a specific genetic risk identified [4]. Previous exome and genome studies that aim to identify genetic contributions to CHD have focused on single nucleotide variants (SNVs) and small insertions/deletions (indels). However, there are several other types of genetic variation that might hold the key to unsolved cases of CHD.

Tandem repeat (TR) elements are defined as repetitive sequence motifs that can be up to several kilobases (kb) in length and have 10–10,000-fold higher de novo variant rates than other types of genetic variation [5]. TR element variants, defined for this study as repeat length variation in TR elements, have been implicated in several diseases and can affect complex traits and gene expression [6, 7]. TR element polymorphism in coding or regulatory regions of CHD-related genes could potentially alter function and expression, thereby modifying risk of CHD. Several genetic diseases with cardiac phenotypes have been shown to be caused by, or associated with, TR length expansions. Examples include Friedreich ataxia, which is caused by a GAA expansion in the FXN gene, and a CGG expansion near the DIP2B gene that has been found in families with cardiomyopathy [8–10]. Identifying repeat size variation in TR elements has been a challenge due to multiple repeat length variants at each site, as well as difficulty resolving long expansions with short-read sequencing technologies. New bioinformatic tools have enabled identification of high-confidence short-TR element variants in srWGS [11–13]. In addition to new software packages, there have also been advances in sequencing techniques that allow for better resolution of TR elements. Long-read whole genome sequencing (lrWGS), with read lengths of 10–20 kb, can provide accurate genotyping of TR element variation [14]. As srWGS utilizes read lengths of 100–150 basepairs, it can be difficult to accurately resolve TR element length variants, resulting in inaccurate genotyping. By contrast, lrWGS allows a single read to contain the entirety of these long TR elements, allowing for more accurate alignment and genotyping [15]. Further, lrWGS methods have been shown to more accurately resolve longer and complex structural variants, which include TR elements, in CHD specifically [16]. Together, these technologies allow for exploration of previously understudied parts of the genome as well as the role of tandem repeats in CHD risk.

We hypothesize that de novo length variants in TR elements modify risk for CHD and that probands with CHD have a higher burden of de novo TR element variants in and near known CHD-associated genes. To identify TR element length variants of uncertain significance that may regulate known CHD genes, we utilize a genomic pipeline to accurately genotype TR elements with period sequence of 2–6 basepairs, call de novo length variants using trio information, and compare their burden between affected and unaffected cohorts. Additionally, we validate TR element calls using PacBio long-read sequencing in a smaller sub-cohort to assess the accuracy of our TR element variant calls. Our work highlights potential disease associated loci that have not been studied due to the difficulty in sequencing and genotyping repetitive regions of the genome.

Results

Comparison of GangSTR/MonSTR variant calls with GATK and PacBio

To determine the consistency of TR element variant calls, we assessed TR element variants in the same cohort and regions using an alternative short-read pipeline, GATK HaplotypeCaller [17]. Of the 1208 de novo TR element variants called by GangSTR/MonSTR in the 741 trios analyzed by both methods, GATK HaplotypeCaller also identified variants at 1066 (88%) of the loci with TR element variants from GangSTR/MonSTR (Table 1). However, the short-read pipelines differed with regard to the length change, consistent with difficulties estimating longer variation in repeat elements using short-read sequencing. Long-read sequences were generated using PacBio HiFi sequencing for a subset of 77 participants with CHD who had available high-quality DNA. By comparison, only 91 of the 120 de novo TR element variants (76%) identified by GangSTR/MonSTR had variants at the same position identified by the PacBio pipeline, despite adequate read depth in the relevant regions of the long-read genome sequencing data. Forty-seven trios had de novo TR element variants called by all three pipelines: GangSTR/MonSTR, GATK HaplotypeCaller, and PacBio. Of the de novo TR element variants identified in those 47 trios, 88% (76/86) of the GangSTR/MonSTR variants were also called by both of the other methods.

Table 1.

Cross validation of TR element variant calls. Comparison of TR elements identified in shared samples between different variant callers (MonSTR, GATK) and between different sequencing modalities (short-read whole genome sequencing vs. PacBio long-read whole genome sequencing)

| Software/Platform | Shared samples | DNVs in MonSTR | DNVs in both | Percentage in both |
| --- | --- | --- | --- | --- |
| MonSTR + GATK | 741 | 1208 | 1066 | 88% |
| MonSTR + PacBio | 77 | 120 | 91 | 76% |
| MonSTR + PacBio + GATK | 47 | 86 | 76 | 88% |

Abbreviations: de novo variant (DNV)
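The percentages in Table 1 follow directly from the two count columns. As a minimal sketch (the dictionary and function names are illustrative and not part of the study's code), the concordance can be recomputed as:

```python
# Counts copied from Table 1:
# (de novo variants called by MonSTR, of which also called by the comparator).
table1 = {
    "MonSTR + GATK": (1208, 1066),
    "MonSTR + PacBio": (120, 91),
    "MonSTR + PacBio + GATK": (86, 76),
}


def concordance_percent(monstr_calls, shared_calls):
    """Percentage of MonSTR de novo calls also made by the comparator."""
    return 100.0 * shared_calls / monstr_calls


concordance = {name: concordance_percent(*counts) for name, counts in table1.items()}
for name, pct in concordance.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to whole percentages, these values agree with the last column of Table 1.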

GangSTR/MonSTR identifies de novo TR element variants near CHD genes

GangSTR and MonSTR were used to identify TR element variants in trio srWGS from 1,899 CHD participants and 1,932 non-CHD participants (Fig. 1). We compared variants associated with 269 human CHD genes to those associated with 281 control genes that were matched for length. There was no difference in the per-gene number of TR elements in the CHD and control gene lists (median CHD 32 TR elements per gene vs. control 30 TR elements per gene, Wilcoxon rank sum p-value 0.22). In regions of CHD genes, there were 1,482,422 TR element variants in CHD participants (mean 781/proband, standard deviation [SD] 65) and 1,532,973 TR element variants in non-CHD participants (mean 793/proband, SD 42, p = 0.98). In the control gene set, there were 1,767,010 TR element variants in CHD participants (mean 930/proband, SD 71) and 1,839,704 in non-CHD participants (mean 952/proband, SD 51, p = 0.91). A total of 1,771 de novo TR element variants (0.93/proband) near CHD-associated genes were identified in CHD participants and a total of 1,864 de novo TR element variants (0.96/proband) were identified in non-CHD participants (p = 0.99). In the control gene set, a total of 2,517 de novo TR element variants (1.3 per proband) were identified in PCGC probands while 2,584 de novo TR element variants (1.3 per proband) were identified in non-CHD participants (p = 0.99).
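The per-proband de novo rates quoted above are simple ratios of the reported totals to the cohort sizes. A minimal sketch, using only numbers from this paragraph (variable names are illustrative):

```python
# Cohort sizes and de novo TR element variant totals as reported in the text.
cohort_sizes = {"CHD": 1899, "non-CHD": 1932}
de_novo_totals = {
    "CHD genes": {"CHD": 1771, "non-CHD": 1864},
    "control genes": {"CHD": 2517, "non-CHD": 2584},
}

# De novo TR element variants per proband, by gene set and cohort.
per_proband = {
    gene_set: {cohort: total / cohort_sizes[cohort] for cohort, total in totals.items()}
    for gene_set, totals in de_novo_totals.items()
}

for gene_set, rates in per_proband.items():
    for cohort, rate in rates.items():
        print(f"{gene_set}, {cohort}: {rate:.2f} per proband")
```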

Fig. 1.


TR element analysis pipeline. First, clinical data and published coding variants were queried to remove trios with confirmed molecular diagnosis or known genetic syndromes that explain CHD. Next, short-read sequencing from trios was subset to regions within 20 kb of 269 human CHD genes (Supplemental Table 4), or a length-matched control set of non-CHD associated genes (Supplemental Table 5). TR element variants were called using best practices for the GangSTR pipeline, followed by identification of de novo TR element variants using MonSTR. Downstream analysis explored trends and compared affected probands to unaffected controls, with validation using different variant calling algorithms as well as PacBio lrWGS sequencing technology. Variants refer to TR element repeat length variants

Clustering of de novo TR element variants

We next determined the location of each de novo TR element variant. The majority of de novo TR element variants were intronic, including the first intron, but there was no difference in the rate of intronic variants by cohort (mean 0.70/proband in the CHD cohort vs. 0.77 in the non-CHD cohort, p = 0.69; Table 2). There were no de novo TR element variants in promoter regions, and they were rarely present in untranslated regions (UTRs) or upstream open reading frames. Exonic de novo TR element variants, present in coding exons, were also rare, with only four in the CHD cohort and three in the non-CHD cohort. The four CHD genes with exonic de novo TR element variants in the CHD cohort were KMT2A, CACNA1A, ARID1B, and KAT6A, all of which are associated with genetic diagnoses that include neurodevelopmental features, while in the non-CHD cohort exonic de novo TR element variants were identified in FOXC1 and CDKN1C, which are associated with genetic diagnoses that have less frequent neurodevelopmental features. There was a trend towards fewer 5’ UTR de novo TR element variants in the CHD cohort (0.0006 vs. 0.004, p = 0.037).

Table 2.

TR elements by genomic annotation. de novo TR element variants per participant categorized by genomic annotation, with enrichment assessed using a binomial model

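| Location | CHD | Non-CHD | CHD per child | Non-CHD per child | p-value |
| --- | --- | --- | --- | --- | --- |
| 5’ UTR | 1 | 7 | 0.0005 | 0.004 | 0.071 |
| 3’ UTR | 10 | 11 | 0.005 | 0.006 | 1.000 |
| Intron | 1234 | 1433 | 0.65 | 0.74 | 0.012 |
| 1st intron | 308 | 400 | 0.16 | 0.21 | 0.006 |
| Heart Specific Enhancers | 196 | 249 | 0.10 | 0.13 | 0.052 |
| Promoters | 0 | 0 | 0 | 0 | 1.000 |
| uORF | 9 | 6 | 0.005 | 0.003 | 0.444 |
| Exon | 4 | 3 | 0.002 | 0.002 | 0.720 |
| Total | 1771 | 1864 | 0.93 | 0.96 | 0.353 |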

Heart specific enhancers were identified from the VISTA browser [18]. Abbreviations: congenital heart disease (CHD), untranslated region (UTR), Upstream open reading frame (uORF). Bonferroni significance threshold p-value < 0.006
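The Bonferroni threshold quoted in the table footnote can be illustrated with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: a family-wise alpha of 0.05, and nine tests (one per row of Table 2, including the Total row). With eight tests the threshold also rounds to 0.006.

```python
ALPHA = 0.05   # assumed family-wise error rate (not stated in the text)
N_TESTS = 9    # assumed: one test per row of Table 2, including "Total"

bonferroni_threshold = ALPHA / N_TESTS
print(f"Bonferroni threshold: {bonferroni_threshold:.4f}")

# p-values as reported (rounded) in Table 2.
reported_p = {
    "5' UTR": 0.071,
    "3' UTR": 1.000,
    "Intron": 0.012,
    "1st intron": 0.006,
    "Heart Specific Enhancers": 0.052,
    "Promoters": 1.000,
    "uORF": 0.444,
    "Exon": 0.720,
    "Total": 0.353,
}

# Rows that are nominally significant (p < 0.05) before multiple-testing correction.
nominal = [name for name, p in reported_p.items() if p < ALPHA]
print("Nominally significant rows:", nominal)
```

Because the table reports rounded p-values, a row sitting at the boundary (the first-intron row, reported as 0.006) cannot be classified against the corrected threshold from the rounded value alone.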

After assigning each TR element to the nearest gene, there were 19 genes with a nominal enrichment for de novo TR element variants in the CHD cohort and four with an enrichment in the non-CHD cohort (Supplemental Table 1). The gene CACNA1C was most enriched for de novo TR element variants in the CHD cohort (11 variants in 1899 probands vs. three variants in 1932 probands, p = 1.87E-4; Fig. 2A). Seven of the other genes enriched for CHD cohort variants were autosomal dominant CHD genes, while all of the genes enriched for non-CHD cohort variants were autosomal recessive. The number of CHD genes enriched for de novo TR element variants in the CHD cohort was not more than expected, as permuting the cohort assignment of de novo TR element variants yielded an enrichment of similar magnitude (19 observed vs. 15 permuted, p = 0.11, Supplemental Fig. 1A). In the control gene set, there were 15 genes enriched for CHD cohort variants and 12 genes enriched for non-CHD variants (Supplemental Fig. 1). For the control gene set, there was no difference in gene enrichment compared to the permuted set (15 observed vs. 16 permuted, p = 0.6, Supplemental Fig. 1B).
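A per-gene binomial comparison of this kind can be sketched as follows. The text states only that a binomial test was used; taking the non-CHD per-proband rate as the null probability and using a one-sided upper tail are assumptions made here for illustration, so the result need not equal the published p-value. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass P(X = k), computed in log space to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), valid for 0 < p < 1."""
    return math.fsum(binom_pmf(i, n, p) for i in range(k, n + 1))


# CACNA1C counts from the text: 11 de novo variants in 1899 CHD probands
# versus 3 in 1932 non-CHD probands.
chd_variants, chd_probands = 11, 1899
ctrl_variants, ctrl_probands = 3, 1932

# Assumption: the null per-proband rate is estimated from the non-CHD cohort.
null_rate = ctrl_variants / ctrl_probands
p_value = binom_upper_tail(chd_variants, chd_probands, null_rate)
print(f"one-sided binomial p-value: {p_value:.2e}")
```

Other parameterizations are equally plausible, for example conditioning on the 14 total variants and testing the CHD share against 1899/(1899 + 1932), and they give different p-values.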

Fig. 2.


Per-gene and per-locus assessment of de novo TR element variants. (A) Ratio of de novo TR elements per gene in CHD versus non-CHD cohorts plotted by unadjusted p-value. Red points indicate autosomal dominant genes. (B) TR element loci with de novo variants in more than one PCGC proband labeled by nearest CHD gene and number of probands with a variant. (C) Permutation of cohort assignment across de novo TR element variants in 100,000 trials quantifying the number of TR element loci with > 2 de novo variants in unique individuals

When grouped by TR element genomic feature instead of gene, there were several features that contained multiple de novo TR element variants (Fig. 2B), particularly those that were intronic or intergenic. There were 114 loci with TR element variants in at least three probands in the CHD cohort, with an expected 74 loci based on permutation analysis (1.54-fold, p < 1E-5, Fig. 2C). Of these, 16 loci were enriched in the CHD cohort in comparison to the non-CHD cohort (each with adjusted p < 0.05).
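The permutation analysis behind the expected number of recurrent loci can be sketched as below. This is a toy illustration, not the study's code: the locus labels and cohort assignments are invented, each list entry is assumed to be a de novo variant in a distinct proband, and far fewer permutations are run than the 100,000 described in the figure legend.

```python
import random
from collections import Counter


def count_recurrent_loci(loci, labels, cohort="CHD", min_probands=3):
    """Number of loci with de novo variants in at least `min_probands` probands of `cohort`."""
    per_locus = Counter(locus for locus, label in zip(loci, labels) if label == cohort)
    return sum(1 for n in per_locus.values() if n >= min_probands)


def permutation_test(loci, labels, n_permutations=1000, seed=0):
    """Shuffle cohort labels across variants and compare recurrent-locus counts."""
    rng = random.Random(seed)
    observed = count_recurrent_loci(loci, labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_counts.append(count_recurrent_loci(loci, shuffled))
    expected = sum(null_counts) / n_permutations
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_permutations)
    return observed, expected, p_value


# Hypothetical toy data: one entry per de novo variant (locus, cohort of the proband).
loci = ["A"] * 4 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3 + ["E"] * 3
labels = (["CHD"] * 4 + ["CHD"] * 3 + ["non-CHD"] * 3 + ["non-CHD"] * 3
          + ["CHD", "non-CHD", "non-CHD"])

observed, expected, p_value = permutation_test(loci, labels)
print(observed, round(expected, 2), round(p_value, 4))
```

The ratio of `observed` to `expected` corresponds to the fold enrichment reported in the text (114 observed vs. 74 expected loci, 1.54-fold).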

Transmission disequilibrium testing

The potential contribution of inherited TR element variants was assessed using transmission disequilibrium tests within the CHD trios. A total of 35,072 transmitted variants were included in the analysis. There was one gene, TAB2, with a nominal over-transmission of TR element variants in the CHD cohort (579 transmitted variants; OR 1.3, p_adjusted = 0.046, Fig. 3A), whereas no loci had over-transmitted variants in the non-CHD cohort. TAB2 is involved in several signal transduction pathways and, when mutated, has been implicated in congenital heart disease [19, 20]. This variant is a repeat contraction found near a cis-regulatory element (Fig. 3B). There were also 15 genes associated with 20 loci with under-transmission of TR element variants in the CHD cohort (Supplemental Tables 2–3). This included 18 loci with TR expansions and 8 loci with TR contractions.

Fig. 3.


Analysis of transmitted TR element variant alleles. (A) Odds ratio of transmission of each TR element variant allele from parents to child plotted versus inverse log of the unadjusted p-value. (B) Genome browser screenshot demonstrating the proximity of the over-transmitted TR element region near the TAB2 gene to cis-regulatory elements and H3K27ac enhancer peaks

Discussion

TR elements have been difficult to study in the past due to high rates of polymorphism between individuals and limitations in the accuracy with which these loci could be genotyped. However, due to their variability as well as their prevalence in the genome, it is important to understand how TR element variants might affect disease risk. TR elements can affect coding exons, regulatory elements, and binding regions, all of which can cause changes in gene expression or protein function [21]. This is the first study to explore de novo TR element variants in a large cohort of CHD trios. Limited read length with srWGS has long been an obstacle in studying TR variation since these variants can cover large regions of the genome. Our findings support the accuracy of the GangSTR and MonSTR software packages in calling TR element variants from srWGS data, based on a concordance rate of 76% with variants called using PacBio lrWGS. Our findings indicate that, overall, there is no increased burden of de novo TR element variants in probands with CHD versus probands without CHD. While we do not find a burden, we do find an average of 0.93 de novo TR element variants per child, several of which are in important features including coding sequences and enhancers.

In the per-gene analysis, we identify several CHD-associated genes that are enriched for de novo TR element variants in affected probands versus controls. Additionally, the majority of all enriched genes are over-represented in CHD-affected probands versus unaffected controls, a trend which does not hold with the control gene set. We identified more de novo TR element variants per gene in a subset of CHD genes in CHD probands versus non-CHD participants. Increased burden of variants in these genes could contribute to CHD risk. Of all the genes with a nominal enrichment, two genes, CACNA1C and EVC2, passed genome-wide significance. CACNA1C codes for a subunit of the L-type calcium channel found in heart tissue. Damaging variants in CACNA1C can cause Timothy syndrome, which can present with QT prolongation, hypertrophic cardiomyopathy, and congenital heart defects [22]. EVC2 is associated with Ellis-van Creveld syndrome and can present with structural CHD including common atrium [23]. Enrichment of de novo TR element variants in these genes supports the hypothesis that these variants modify risk of CHD. Furthermore, we present several de novo TR element variants that are present in multiple affected CHD probands, supporting the hypothesis that these loci modify risk for CHD in these probands.

A strength of our study is the size of the cohort used. To our knowledge, no prior work has been done studying TR element variants in CHD at this scale. We are able to utilize this large cohort of trios as well as unaffected control trios to identify patterns in TR element variants that would not be apparent in smaller studies. Additionally, we leverage PacBio lrWGS to validate the TR element variant calls using GangSTR/MonSTR with a subset of the cohort that has WGS data on both platforms. PacBio lrWGS is highly accurate, and able to better resolve larger, repetitive regions of the genome.

Limitations

We assessed short TR element variants that had a difference in length but not those that changed nucleotide sequence. To focus our scope on variants that could impact known CHD genes, we started with 269 CHD-associated genes and took 20,000 bases on either side of each gene to call variants. We limited our analysis to TR elements proximal to known CHD genes, so future genome-wide assessment of TR variants could identify novel CHD genes, particularly genes that have unique regulatory mechanisms related to TR elements. Trans effects were not evaluated. As the non-CHD participants were siblings of people with autism spectrum disorders, who are known to have TR length variants longer than their parents but shorter than their siblings with autism spectrum disorders [24], we could be underpowered to detect modest length changes that contribute a small amount to CHD risk. Though the CHD and control gene lists were matched for length and had no difference in the number of per-gene TR elements, they were not matched for sequence features. Additionally, CHD is a widely variable disease that manifests with a range of phenotypes and is caused by a wide variety of genes. As such, it is difficult to identify general trends among the entire cohort given how heterogeneous the probands are.

Future directions

Our results demonstrate the prevalence of TR element variants in the genome and support the hypothesis that they modify risk in CHD. Inclusion of additional sequencing technologies that represent a gold standard for repeat element variant detection would enable benchmarking of the accuracy and sensitivity of the various TR length variant identification approaches and help determine which are best suited for future studies. We propose a mechanism by which variants in gene regulatory elements could modify gene expression. Obtaining bulk RNA sequencing expression data would be vital to further test this hypothesis and examine the exact effect of these variants on expression. Secondly, expanding the analysis to include the entire genome, rather than regions near CHD genes, could identify variation in regulatory sites farther away from these CHD genes or that regulate novel CHD genes. A recently updated reference panel of TR element profiles that combines several TR element variant calling algorithms will also help more accurately identify variants of interest [25]. Finally, to precisely examine the effects of a certain variant, it would be necessary to model it in vitro in a cardiomyocyte cell line, or whichever cardiac cell type expresses the gene.

Conclusions

In this study we present a targeted analysis of de novo and transmitted TR element variants in a large cohort of CHD probands. We validate the accuracy of TR element variant calling software using PacBio lrWGS. We demonstrate an overall enrichment of de novo variants in known CHD genes in comparison to unaffected controls. Our analysis also provides evidence for potential impact of specific TR element variants in CACNA1C, EVC2, and TAB2 on the risk for CHD. To our knowledge, there is no prior large-scale analysis of TR variants in CHD, and especially not in a cohort of this size. TR variants are difficult to study due to their high rates of polymorphism and technical limitations in software and sequencing platforms. Thus, not much work has been done in studying these variants, even though there is strong evidence that they do affect gene expression and are associated with a variety of phenotypes and diseases. Our analysis is a first step towards characterizing the burden of TR element variants in CHD. While these results are promising, they are a preliminary exploration into these types of variants, and further work needs to be done to validate the effects of these variants on gene expression and any associations with phenotypes.

Methods

Study participants

We studied patterns of DNVs of TR element length in regions in and near 269 congenital heart disease (CHD) genes using a 20-kilobase (kb) buffer before and after the genomic coordinates of each gene. These genes were selected because they are known to have variants that are implicated in CHD. Two cohorts were utilized, the first being 1,899 trios, comprising parents and CHD-affected offspring, recruited and sequenced by the Pediatric Cardiac Genomics Consortium (PCGC) of the National Heart, Lung, and Blood Institute (NHLBI). The second cohort was 1,932 trios, comprising unaffected parents and an unaffected sibling of a child with autism, recruited and sequenced by the Simons Foundation Autism Research Initiative (SFARI).
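As an illustration of the 20-kb buffer, the sketch below pads a gene interval on both sides and tests whether a TR element position falls inside it. The coordinates are hypothetical, the interval convention (0-based, half-open, as in BED files) is an assumption, and the upper bound is not clamped to the chromosome length.

```python
FLANK_BP = 20_000  # 20-kb buffer on each side of the gene, as described in the text


def padded_region(chrom, gene_start, gene_end, flank=FLANK_BP):
    """Return (chrom, start, end) extended by `flank` bp on each side, clamped at 0."""
    return chrom, max(0, gene_start - flank), gene_end + flank


def in_region(region, chrom, position):
    """True if `position` on `chrom` lies inside the half-open interval of `region`."""
    region_chrom, start, end = region
    return chrom == region_chrom and start <= position < end


# Hypothetical gene coordinates, for illustration only.
region = padded_region("chr1", 100_000, 150_000)
print(region)
print(in_region(region, "chr1", 85_000), in_region(region, "chr1", 175_000))
```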

Sequencing, alignment, and variant calling

Short-read genome sequenced samples were aligned to the hg38 reference genome, and subset for the 269 CHD genes (Supplemental Table 4) with 20-kb buffers before and after the start and end of each gene, respectively. The 269 CHD genes are human CHD genes that were curated for association with dominant or recessive CHD based on Online Mendelian Inheritance in Man and literature curation or were described in studies from the Pediatric Cardiac Genomics Consortium (PCGC). After curation, the gene list was reviewed and approved by the Genomics Working Group of the PCGC in December 2021. Genes with evidence only in mouse or other animal models were not included. A length-matched, non-CHD related set of genes was also used for comparison, with similar 20-kb buffers used for this gene set. This gene set was created using the R package “MatchIt” [26] to find a list of non-CHD related genes that have an identical total length. We utilized the MatchIt algorithm to perform many-to-one matching of genes based on the total length of all genes in the gene set, without replacement. The program uses a propensity score estimated using a generalized linear model that takes into account the total length of all genes in the set, and matches based on nearest neighbor matching. Genes known to be involved in CHD as well as neurodevelopmental diseases were excluded.

The TR element variant calling pipeline comprised three steps: (1) calling TR element length variants of period sequence 2–6 basepairs in trios using GangSTR [11]; (2) merging and filtering/quality control using TRTools (MergeSTR and DumpSTR) [12]; and (3) de novo variant calling using MonSTR [13]. GangSTR version 2.5.0 was run with default options and the --include-ggl option to include genotype information, using the hg38 TR element reference catalog hg38.hipstr_reference.bed.gz, available from the HipSTR [27] website, selecting only for TR elements within the previously mentioned CHD gene regions. An allele was determined to be a non-reference variant if it was not the reference length defined by GangSTR. TRTools version 5.0.0 was used, and DumpSTR was run with --vcf-type gangstr along with the following options: --gangstr-filter-spanbound-only, --gangstr-filter-badCI, --gangstr_min-call-Q 0.9, --gangstr-min-call-DP 20, --gangstr-max-call-DP 1000, --filter-regions-names SEGDUP, --drop-filtered. MergeSTR was run using default settings. MonSTR was run with default settings with trio information in fam files. Probands with > 20 DNV TR elements called were removed prior to downstream analysis, as such counts were likely due to mislabeled parent/proband IDs (six probands removed from the PCGC cohort, and five probands removed from the unaffected SFARI cohort). Variant calling was restricted to autosomal chromosomes.
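To make the trio logic concrete, the sketch below shows a naive Mendelian-consistency check on repeat-length genotypes and the proband-level filter on de novo counts. It does not reimplement MonSTR, whose calling is model-based rather than a simple allele lookup; the genotype encoding (tuples of repeat counts), function names, and example values are all hypothetical.

```python
MAX_DNV_PER_PROBAND = 20  # probands with more than 20 de novo calls were removed


def is_candidate_de_novo(child, mother, father):
    """True when the child's two alleles cannot be explained by one maternal
    and one paternal allele (a naive check, ignoring genotyping error)."""
    c1, c2 = child
    consistent = (c1 in mother and c2 in father) or (c2 in mother and c1 in father)
    return not consistent


def keep_probands(dnv_counts, max_dnv=MAX_DNV_PER_PROBAND):
    """Drop probands whose de novo call count exceeds `max_dnv`."""
    return {proband: n for proband, n in dnv_counts.items() if n <= max_dnv}


# Genotypes given as (allele 1, allele 2) repeat counts; values are illustrative.
print(is_candidate_de_novo(child=(12, 15), mother=(12, 13), father=(14, 15)))
print(is_candidate_de_novo(child=(12, 16), mother=(12, 13), father=(14, 15)))
print(keep_probands({"proband_1": 1, "proband_2": 21, "proband_3": 20}))
```

A check this simple would flag genotyping errors as de novo events, which is one reason likelihood-based callers and depth and quality filters are used in practice.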

PacBio long-read sequencing data were used to confirm TR element genotypes in the subset of probands with data on both platforms. PacBio Minimap2 [28] was used for alignment to the hg38 reference, subset for the same 269 genes as before. PacBio TRGT [29] was then used with these aligned reads along with a repeat definition file created with the same reference catalog as that used with GangSTR.

TR element enrichment

De novo TR elements were classified by the genomic feature they are in, including 5’ untranslated region (UTR), 3’ UTR, introns, first introns, heart specific enhancers [18], promoters, upstream open reading frames, and coding sequences. Gene enrichment was quantified by counting the number of DNV TR elements per gene in each cohort and performing a binomial test to check for differences. A background distribution was created by randomly assigning cohort status and performing 100,000 permutations, and a p-value was then calculated to determine statistical significance of enrichment of CHD genes. DNV TR elements that were present in multiple probands were quantified by position and then annotated by genomic feature. Statistical testing was done using a spatial approach to determine the probability of observing multiple DNV TR elements in the same location based on the overall density of DNV TR elements across the genomic regions tested. TR element loci with multiple hits in different probands were explored further by studying clinical cardiac phenotypes as well as extracardiac manifestations. TR element transmission from parents to children was studied using transmission disequilibrium testing (TDT), a method of quantifying the odds ratio of a certain allele being transmitted in the cohort versus not. At each locus, the most common alleles were considered by recoding those alleles as A, C, T, and G for analysis using standard software. TDT was applied as implemented in Plink [30] version 1.9. As we focused on transmission disequilibrium testing, which is robust to population stratification, participant sex, age, and ethnicity were not included as covariates in statistical analysis.
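The core TDT statistic for a single allele can be sketched as follows. This is the standard McNemar-type form for one allele, given counts of transmissions and non-transmissions from heterozygous parents. It is a generic illustration rather than the Plink implementation, and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    `transmitted` and `untransmitted` count how often heterozygous parents did
    or did not pass the allele to the affected child.
    Returns (odds ratio, chi-square statistic, p-value with 1 degree of freedom).
    """
    b, c = transmitted, untransmitted
    odds_ratio = b / c
    chi_square = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts, for illustration only.
odds_ratio, chi_square, p_value = tdt(transmitted=60, untransmitted=40)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

An odds ratio above 1 indicates over-transmission of the allele to affected children and a value below 1 indicates under-transmission. Testing many alleles and loci requires a multiple-testing adjustment, as reflected in the adjusted p-values reported in the Results.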


Acknowledgements

The authors would like to thank all participants and their families, as well as all collaborators in the Pediatric Cardiac Genomics Consortium.

Author contributions

A.S., S.U.M., D.Q., J.G.S., and C.E.S. conceived and designed the work; A.S., S.U.M., D.Q., S.R.D., J.M.G., M.B., M.T-F., B.D.G., J.G.S., and C.E.S. participated in the acquisition, analysis or interpretation of the data; A.S., S.U.M., D.Q., J.G.S., and C.E.S. drafted the work and all authors contributed to the revision of the manuscript.

Funding

The PCGC program is funded by the NHLBI, NIH, US Department of Health and Human Services through grants UM1HL128711, UM1HL098162, UM1HL098147, UM1HL098123, UM1HL128761 and U01HL131003. The PCGC Kids First study includes data sequenced by the Broad Institute (U24 HD090743-01). Additional support was provided by the American Heart Association Career Development Award (S.U.M.), K08 HL157653-01 (S.U.M.) and Howard Hughes Medical Institute (C.E.S.).

Data availability

Whole-genome sequencing data are deposited in the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs001194.v4.p3 and phs001138.v4.p2.

Declarations

Ethics approval and consent to participate

Participants were consented to the CHD GENES study (Congenital Heart Disease Gene Network Study of the PCGC, clinicaltrials.gov identifier: NCT01196182). All participants or their parents provided written informed consent prior to participation, or legal guardians in the case of participants without capacity to consent. Regulatory oversight and approval were provided via a central Institutional Review Board protocol at the Cincinnati Children’s Hospital Medical Center, with local agreements at each recruitment site. Non-CHD participants comprised sibling–parent trios, unaffected by CHD or autism, derived from the Simons Foundation sporadic autism quartets that consisted of one offspring with autism, one unaffected sibling and their unaffected parents [31]. This study adhered to the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abhilash Suresh and Sarah U. Morton contributed equally to this work.

References

  • 1. Giang KW, Mandalenakis Z, Fedchenko M, et al. Congenital heart disease: changes in recorded birth prevalence and cardiac interventions over the past half-century in Sweden. Eur J Prev Cardiol. 2023;30(2):169–76. 10.1093/eurjpc/zwac227.
  • 2. Zimmerman MS, Smith AGC, Sable CA, et al. Global, regional, and national burden of congenital heart disease, 1990–2017: a systematic analysis for the global burden of disease study 2017. Lancet Child Adolesc Health. 2020;4(3):185–200. 10.1016/S2352-4642(19)30402-X.
  • 3. Marelli AJ, Ionescu-Ittu R, Mackie AS, Guo L, Dendukuri N, Kaouache M. Lifetime prevalence of congenital heart disease in the general population from 2000 to 2010. Circulation. 2014;130(9):749–56. 10.1161/CIRCULATIONAHA.113.008396.
  • 4. Morton SU, Quiat D, Seidman JG, Seidman CE. Genomic frontiers in congenital heart disease. Nat Rev Cardiol. 2022;19(1):26–42. 10.1038/s41569-021-00587-4.
  • 5. Duitama J, Zablotskaya A, Gemayel R, et al. Large-scale analysis of tandem repeat variability in the human genome. Nucleic Acids Res. 2014;42(9):5728–41. 10.1093/nar/gku212.
  • 6. Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet. 2018;19(5):286–98. 10.1038/nrg.2017.115.
  • 7. Mukamel RE, Handsaker RE, Sherman MA, et al. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell. 2023;186(17):3659–3673.e23. 10.1016/j.cell.2023.07.002.
  • 8. Weidemann F, Rummey C, Bijnens B, et al. The heart in Friedreich ataxia. Circulation. 2012;125(13):1626–34. 10.1161/CIRCULATIONAHA.111.059477.
  • 9. Al-Mahdawi S, Pinto RM, Ismail O, et al. The Friedreich ataxia GAA repeat expansion mutation induces comparable epigenetic changes in human and transgenic mouse brain and heart tissues. Hum Mol Genet. 2008;17(5):735–46. 10.1093/hmg/ddm346.
  • 10. Mitina A, Khan M, Lesurf R, et al. Genome-wide enhancer-associated tandem repeats are expanded in cardiomyopathy. eBioMedicine. 2024;101. 10.1016/j.ebiom.2024.105027.
  • 11. Mousavi N, Shleizer-Burko S, Yanicky R, Gymrek M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 2019;47(15):e90. 10.1093/nar/gkz501.
  • 12. Mousavi N, Margoliash J, Pusarla N, Saini S, Yanicky R, Gymrek M. TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics. 2021;37(5):731–3. 10.1093/bioinformatics/btaa736.
  • 13. Mitra I, Huang B, Mousavi N, et al. Patterns of de novo tandem repeat mutations and their role in autism. Nature. 2021;589(7841):246–50. 10.1038/s41586-020-03078-7.
  • 14. Hon T, Mars K, Young G, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020;7(1):399. 10.1038/s41597-020-00743-4.
  • 15. Mahmoud M, Huang Y, Garimella K, et al. Utility of long-read sequencing for All of Us. Nat Commun. 2024;15:837. 10.1038/s41467-024-44804-3.
  • 16. Lesurf R, Jain A, Hanafi N, et al. Long-read genome sequencing increases genomic yield in congenital heart disease. Published online May 16, 2025:2025.05.14.25327523. 10.1101/2025.05.14.25327523.
  • 17. McKenna A, Hanna M, Banks E, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. 10.1101/gr.107524.110.
  • 18. Dickel DE, Barozzi I, Zhu Y, et al. Genome-wide compendium and functional assessment of in vivo heart enhancers. Nat Commun. 2016;7(1):12923. 10.1038/ncomms12923.
  • 19. Hanson J, Brezavar D, Hughes S, et al. TAB2 variants cause cardiovascular heart disease, connective tissue disorder and developmental delay. Clin Genet. 2022;101(2):214–20. 10.1111/cge.14085.
  • 20. Thienpont B, Zhang L, Postma AV, et al. Haploinsufficiency of TAB2 causes congenital heart defects in humans. Am J Hum Genet. 2010;86(6):839–49. 10.1016/j.ajhg.2010.04.011.
  • 21. Fotsing SF, Margoliash J, Wang C, et al. The impact of short tandem repeat variation on gene expression. Nat Genet. 2019;51(11):1652–9. 10.1038/s41588-019-0521-9.
  • 22. Gakenheimer-Smith L, Meyers L, Lundahl D, et al. Expanding the phenotype of CACNA1C mutation disorders. Mol Genet Genomic Med. 2021;9(6):e1673. 10.1002/mgg3.1673.
  • 23. Pierpont ME, Brueckner M, Chung WK, et al. Genetic basis for congenital heart disease: revisited: a scientific statement from the American Heart Association. Circulation. 2018;138(21):e653–711. 10.1161/CIR.0000000000000606.
  • 24. Trost B, Engchuan W, Nguyen CM, et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature. 2020;586(7827):80–6. 10.1038/s41586-020-2579-z.
  • 25. Ziaei Jam H, Li Y, DeVito R, et al. A deep population reference panel of tandem repeat variation. Nat Commun. 2023;14:6711. 10.1038/s41467-023-42278-3.
  • 26. Ho D, Imai K, King G, Stuart E, Whitworth A, Greifer N. MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Published online April 13, 2023. Accessed May 30, 2023. https://cran.r-project.org/web/packages/MatchIt/
  • 27. Willems T, Zielinski D, Yuan J, Gordon A, Gymrek M, Erlich Y. Genome-wide profiling of heritable and de novo STR variations. Nat Methods. 2017;14(6):590–2. 10.1038/nmeth.4267.
  • 28. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. 10.1093/bioinformatics/bty191.
  • 29. Dolzhenko E, English A, Dashnow H, et al. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol. Published online January 2, 2024:1–9. 10.1038/s41587-023-02057-3.
  • 30. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
  • 31. Fischbach GD, Lord C. The Simons simplex collection: a resource for identification of autism genetic risk factors. Neuron. 2010;68(2):192–5. 10.1016/j.neuron.2010.10.006.

Supplementary Materials

Supplementary Material 1 (2.2MB, docx)

Articles from BMC Medical Genomics are provided here courtesy of BMC