Nucleic Acids Research. 2024 Oct 1;52(20):12456–12474. doi: 10.1093/nar/gkae842

Haplotype-resolved de novo assembly revealed unique characteristics of alternative lengthening of telomeres in mouse embryonic stem cells

Hyunji Lee 1,2, Hiroyuki Niida 3, Sanghyun Sung 4,5, Junho Lee 6,7,8
PMCID: PMC11551733  PMID: 39351882

Abstract

Telomeres protect chromosome ends from DNA damage responses, and their dysfunction triggers genomic alterations like chromosome fusion and rearrangement, which can lead to cellular death. Certain cells, including specific cancer cells, adopt alternative lengthening of telomeres (ALT) to counteract dysfunctional telomeres and proliferate indefinitely. While telomere instability and ALT activity are likely major sources of genomic alteration, the patterns and consequences of such changes at the nucleotide level in ALT cells remain unexplored. Here we generated haplotype-resolved genome assemblies for type I ALT mouse embryonic stem cells, facilitated by highly accurate or ultra-long reads and Hi-C reads. The high-quality genomes revealed ALT-specific complex chromosome end structures and various genomic alterations, including over 1000 structural variants (SVs). The unique sequence (mTALT) used as a template for type I ALT telomeres showed traces of being recruited into the genome, with mTALT being replicated with remarkably high accuracy. Subtelomeric regions exhibited distinct characteristics: resistance to the accumulation of SVs and small variants. We genotyped SVs at allele resolution, identifying genes (Rgs6, Dpf3 and Tacc2) crucial for maintaining ALT telomere stability. Our genome assembly-based approach elucidated the unique characteristics of the ALT genome, offering insights into the genome evolution of cells surviving telomere-derived crisis.

Graphical Abstract


Introduction

To stably preserve the genomic information, it is crucial to maintain the terminal regions well (1,2). Since the DNA replication machinery cannot replicate the lagging strand to the very end, the terminal sequences inevitably shorten slightly with each DNA replication, a phenomenon referred to as the end replication problem (3). Another significant issue is the end protection problem, which involves distinguishing the chromosome ends from double-strand breaks (DSBs) within the chromosome. While breaks within the chromosome are targets for repair through the DNA damage response (DDR), chromosome ends must be maintained intact to prevent unwanted activation of DDR (4). Telomeres refer to nucleoprotein complexes responsible for protecting chromosome ends and are primarily composed of simple repeat sequences and associated proteins. Telomeres can stop functioning as protective structures when they either shorten below a certain length or when they are unable to hide from the DDR machinery, even while there are long enough repeats. To maintain telomere length, a specialised mechanism is required, typically involving the reverse transcriptase enzyme known as telomerase, which replicates the simple repeat sequences from an RNA template (5). Mechanisms capable of maintaining telomeres without telomerase are referred to as alternative lengthening of telomeres (ALT) (6,7). ALT is found in approximately 15% of human cancer cells, can be spontaneously activated in cells lacking telomerase, and exists in species that originally do not rely on telomerase (8–13). ALT encompasses a complex phenomenon regulated by various genes rather than the action of a single enzyme complex like telomerase, suggesting diverse manifestations and mechanisms.

In mouse embryonic stem cells (mESCs), two types of ALT have been identified (14,15). Type I ALT reconstructs telomeres using non-telomeric unique sequences derived from a subtelomere, while type II ALT maintains telomeres using canonical telomeric repeats. In both types of ALT mESCs, the activation of ALT was not accompanied by an increase in C-circle, a well-known molecular marker of ALT activation, suggesting a potential difference in the specific telomere maintenance mechanism compared to human ALT cancers. Regarding another ALT marker, ALT-associated promyelocytic leukaemia bodies (APBs), type I ALT mESCs did not show an increase in APBs, while type II ALT mESCs did. Additionally, when analysing non-canonical variant repeats within the telomeres, only type I ALT mESCs exhibited a slight increase in the overall ratio of some variants, along with changes in the composition of these variants.

From a genomic perspective, the most significant difference between these two types is the higher telomere and genome instability observed in type I ALT mESCs. During telomere shortening, ALT activation, and the subsequent stabilisation process, single nucleotide variations (SNVs) and copy number variations (CNVs) occur frequently in type I ALT. However, due to the limitations of short-read whole-genome sequencing, it was impossible to accurately determine the structure and distribution of large-scale structural variants (SVs). These SVs can alter the gene regulatory network associated with ALT by affecting gene expression levels without directly changing the coding regions of genes.

Since SNVs and CNVs accumulate even during periods when telomeres are maintained stably in type I ALT, there is a high likelihood that telomere instability contributes to continuous genomic changes. Indeed, cases have been reported where cancer cells continue to exhibit a certain degree of telomere dysfunction even after acquiring telomere maintenance mechanisms (16,17). ALT, especially characterised by heterogeneous telomere lengths, can have a significant impact on the entire genome through non-reciprocal translocations originating from single chromosomal ends with short telomeres.

Despite the possible significance of genomic instability in ALT, there have been no cases of analysing genomic changes associated with ALT in mammalian models using long-read sequencing. Long-read sequencing can resolve many issues arising from fragmentation in short-read sequencing as it is capable of reading 100 times longer DNA sequences from a single molecule (18,19). Utilising long reads can fundamentally enhance the quality of genome assembly, facilitating the interpretation of complex genomes that include repetitive sequences and SVs. Moreover, when genome assembly extends to the phasing, which refers to the process of assembling haplotypes, it becomes possible to understand and detect various genetic variations, including the analysis of mutation status with allele-resolution and relatively detailed exploration of the structure of complex cancer genomes.

We integrated two long-read sequencing techniques with Hi-C to construct haplotype-resolved ALT genomes. We utilised a telomerase-positive original cell line and a cell line maintaining stable telomeres with ALT activation after telomerase knock-out. From high-quality genomes, we were able to identify SVs, small variants, and SNVs specific to ALT cells at allele resolution. We also detected genome-wide CNVs. Interestingly, subtelomeres showed resistance to SVs while exhibiting active SNV and CNV events. Moreover, we gained detailed insights into chromosome end structures, allowing us to analyse the arrangement and connectivity of mTALT, the template for telomere construction in type I ALT. Notably, copies of mTALT joined not only with terminal regions but also within the genome, without any variants, suggesting its potential involvement in repairing various DSBs. Finally, by identifying genes that are likely influenced by ALT-specific variants and confirming that their depletion induces a dysfunctional telomere phenotype, we demonstrated the significance of completing the ALT genome.

Materials and methods

Mouse embryonic stem cells and accessions

Type I ALT mESCs (DKO741) were produced and provided by H. Niida, and the cell line was described in a previous paper (20). The type I ALT mESC line used in this study stably maintains ALT and corresponds to approximately 800 population doublings (PDs) of telomerase-negative (Terc knock-out) mESCs. The other cell line used in this study was the telomerase-positive (Tel+) mESC line (embryo day 14), derived from the 129/Ola strain and passaged in parallel with type I ALT mESCs. This cell line was also provided by H. Niida. All mESCs used in the experiments were cultured under feeder-free conditions supplemented with LIF (WAKO #121–06663).

Additionally, the public genome assembly of C57BL/6J was used as a reference genome. The C57BL/6J genome, GRCm39, was downloaded from Ensembl (release 104). The public Hi-C data from mESCs (E14, 129/Ola strain) were used for de novo assembly in this study. Hi-C data of mESCs, SRR16961018, were downloaded from ENA.

gDNA extraction, PacBio sequencing and Oxford Nanopore Technologies sequencing

Genomic DNA was extracted from each mESC line using the Gentra Puregene Cell kit (Qiagen #158388) according to the manufacturer's manual. Briefly, 2 × 10⁷ cells were collected and lysed, RNA was degraded using RNase A, and DNA was isolated using isopropanol after protein precipitation. DNA from type I ALT mESCs (PD800) and Tel+ mESCs was extracted. HiFi SMRTBell library construction, size selection and sequencing were performed at Macrogen (https://dna.macrogen.com). The libraries were sequenced by PacBio Single Molecule, Real-Time (SMRT) DNA sequencing technology (platform: PacBio Sequel II; mode: circular consensus sequencing (CCS) mode). Oxford Nanopore Technologies (ONT) library preparation and sequencing were performed at the National Instrumentation Center for Environmental Management (NICEM) (https://nicem.snu.ac.kr/main/). ONT libraries were sequenced by Oxford Nanopore Technologies Single Molecule DNA sequencing technology (platform: PromethION Flow Cell).

Hi-C analysis

Hi-C experiments were conducted using the Dovetail Micro-C kit (Cantata Bio #21006) according to the manufacturer's manual. Briefly, 1 × 10⁶ cells were collected, cross-linked with formaldehyde, digested in situ with MNase, and lysed using SDS buffer. After reverse cross-linking the lysate, the size and distribution of the DNA fragments were confirmed using Caliper Labchip GX. The next step was carried out after confirming that the mononucleosome fraction was 40–70%. Chromatin was bound to chromatin capture beads and separated using a magnetic rack. The cut DNA ends were trimmed using end polishing enzymes, and bridges were ligated. After completing proximity ligation through intra-aggregate ligation, crosslink reversal was performed, and DNA was purified using SPRIselect beads (Beckman Coulter #B23317). After the end repair reaction, adapter ligation was performed and the DNA was purified. Streptavidin beads were used to capture the ligates attached to the adapter, the library was amplified through Unique Dual Index PCR, and the final library was created by selecting fragments of 350–1000 bp in size using SPRIselect beads. Libraries were generated in duplicate for type I ALT mESCs (PD800), and 800M reads for each library were sequenced by Illumina sequencing technology (platform: NovaSeq 6000). FASTQ files were processed to BAM files with BWA (BWA version 0.7.17; bwa mem -5SP -T0 -t16) (21) and samtools (Samtools version 1.13) (22,23). Contact matrices (.hic) were generated with JuicerTools (Juicer version 1.6) (24). Inter-chromosomal SVs were predicted (predictSV) and plotted (plot-interSVs) with the EagleC pipeline (25).

Genome assembly

Each haplotype-resolved genome was de novo assembled with Hifiasm (Hifiasm version 0.18.2; --hom-cov 10) (26,27) using HiFi reads at 13× depth and ONT reads at 30× depth for each of type I ALT and Tel+ mESCs, together with Hi-C data at 96× depth for type I ALT mESCs and 18× depth for Tel+ mESCs. De novo assembly using only the ONT reads (30× depth for each of type I ALT and Tel+ mESCs) was also performed using Shasta (Shasta version 0.8.0; --config Nanopore-Dec2019) (28). Each assembly was scaffolded using RagTag (29), YaHS (30), Pin_hic (31), and SALSA2 (32). Haplotype-resolved contig assemblies were mapped to the C57BL/6J genome using minimap2 (version 2.0.1) (33), and the output PAF file was used for scaffolding using RagTag scaffold (RagTag version 2.0.1; read --aligner minimap2 --minimap2-params='-x asm5' scaffold). Scaffolding of the type I ALT genome was also performed based on Hi-C data. Hi-C data were mapped to contig assemblies of type I ALT mESCs according to Arima Genomics' mapping pipeline (BWA version 0.7.17, Samtools version 1.13, perl version 5.26.1). The obtained output BAM file was sorted using SAMtools (Samtools version 1.13; sort -n -O BAM). Scaffolding was then performed based on the sorted BAM file using YaHS (YaHS version 1.2a.1; -q 30 -l 300000 --no-contig-e -e GATC). For scaffolding using SALSA (version 2.3), BAM files previously obtained following Arima Genomics' mapping pipeline were converted to BED files using BEDtools (BEDtools version 2.30.0; bedtools bamToBed) (34). Scaffolding was then performed based on the BED files using SALSA2 (SALSA2 version 2.3; -c 300000 -e DNASE -p yes -O # -f). Finally, the previously obtained BAM files were also used for scaffolding using Pin_hic (Pin_hic version 3.0.0; -i 3).

Genome quality assessment

The completeness of the assemblies, based on HiFi/ONT reads and Hi-C data, was assessed by the BUSCO value and the length of the repetitive elements. First, BUSCO analysis was performed using the vertebrate database (BUSCO version 5.2.2; -m genome -l vertebrata_odb10) (35,36). Repetitive elements were then detected and masked using RepeatMasker (RepeatMasker version 4.1.2-p1; -species Metazoa -s).

Identification of genome-wide variants

For variant calling using SVIM-asm (ver.1.0.2) (37), haplotype-resolved contigs from each assembly (type I ALT and Tel+ mESCs) were aligned to the C57BL/6J genome using minimap2 (minimap2 version 2.22; -a -x asm5 --cs -r2k) and the output BAM files were sorted and indexed using SAMtools (Samtools version 1.7; sort -m4G -O BAM and index). These indexed BAM files were used to detect variants using SVIM-asm (SVIM-asm version 1.0.2; haploid --min_sv_size 1). For variant calling using PAV (ver.2.3.4) (38), haplotype-resolved contigs from each assembly (type I ALT and Tel+ mESCs) were aligned to the C57BL/6J genome using minimap2 (minimap2 version 2.22; -x asm20 -m 10000 -z 10000,50 -r 50000 --end-bonus=100 -O 5,56 -E 4,1 -B 5 --secondary=no -a -t 12 --eqx -Y). The output BED files were used to detect variants using PAV (PAV version 2.3.4). To extract putative true-positive variants, we used BEDTools options to extract those that perfectly overlapped between the two call sets detected using SVIM-asm and PAV (BEDtools version 2.30.0; bedtools intersect -wa -r -f 1.0 -a -b for deletions and duplications and bedtools intersect -wa -wb -r -f 1.0 -a -b | awk '$4==$8' for insertions). Deletions, insertions and duplications were divided into 1–49 bp or ≥50 bp, depending on the size of the variants. For SNVs, the callset captured using PAV was used without any extraction process. Finally, to identify type I ALT-specific variants, we further analysed the commonality of variant sets based on overlap between type I ALT and Tel+ mESCs using BEDtools (BEDtools version 2.30.0; bedtools intersect -v -a -b).
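The two filtering steps described above (keeping only calls that overlap perfectly between the two callers, then discarding calls that overlap any Tel+ call) can be illustrated with a minimal Python sketch. This is not the BEDtools implementation. The `Call` record, the function names and the toy coordinates are hypothetical, intervals are assumed to be BED-style half-open, and matching the variant type in the concordance step is a simplifying assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical variant record using BED-style half-open coordinates."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    svtype: str  # e.g. "DEL", "INS", "DUP"


def concordant(calls_a, calls_b):
    """Keep calls from calls_a that have an identical interval and type in calls_b.

    A 100% reciprocal overlap between two intervals means the intervals are identical.
    """
    keys_b = set(calls_b)
    return [c for c in calls_a if c in keys_b]


def specific_to(calls_a, calls_b):
    """Keep calls from calls_a that overlap no call in calls_b by at least 1 bp."""
    by_chrom = defaultdict(list)
    for c in calls_b:
        by_chrom[c.chrom].append(c)
    return [
        a for a in calls_a
        if not any(a.start < b.end and b.start < a.end for b in by_chrom[a.chrom])
    ]


# Toy example (coordinates are made up for illustration only).
svim_calls = [Call("chr1", 100, 200, "DEL"), Call("chr1", 500, 650, "DEL"), Call("chr2", 10, 90, "DUP")]
pav_calls = [Call("chr1", 100, 200, "DEL"), Call("chr1", 500, 640, "DEL"), Call("chr2", 10, 90, "DUP")]
tel_calls = [Call("chr1", 150, 180, "DEL")]

alt_concordant = concordant(svim_calls, pav_calls)
alt_specific = specific_to(alt_concordant, tel_calls)
```

In this toy input the second deletion is dropped because its end coordinate differs between callers, and the first deletion is dropped as shared with Tel+, leaving only the chr2 duplication as ALT-specific.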

For read-based SV calling using NanomonSV (ver.0.7.2) (39), ONT reads from type I ALT and Tel+ mESCs were aligned to the C57BL/6J genome using minimap2 (minimap2 version 2.22; -a -x map-ont) and the output BAM file was sorted and indexed using SAMtools (Samtools version 1.7; sort -O BAM and index). These indexed BAM files were used to detect SVs using NanomonSV (NanomonSV version 0.7.2; parse and get --single_bnd --use_racon). To exclude SVs located within simple repeat sequences, we used annotation information for these regions in the C57BL/6J genome from the UCSC database. Only deletions and insertions were retained for downstream analysis.

Variant analysis

To comprehensively understand the allele-resolution status of variants, all variants—including SNVs, small variants, and SVs—were divided into three groups: 1|1, 1|0 or 0|1, based on the genotype, where 0 indicates the reference allele, 1 indicates the alternative allele, and the vertical pipe | denotes the phased genotype. The effect of each variant was predicted using VEP (version 109.3) (40). All genes affected by variants were categorised into three groups: (i) affected by monoallelic variants, where the genotype of all variants within the gene is either 0|1 or 1|0; (ii) affected by biallelic heterozygous variants, where the genotypes of variants within the gene are a mix of 0|1 and 1|0 and (iii) affected by biallelic homozygous variants, where at least one variant within the gene has the genotype 1|1.
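The three gene categories defined above can be expressed as a short decision rule. The following Python sketch is illustrative only. The function name and the category labels are hypothetical, and it assumes each gene is represented simply by the list of phased genotype strings of the variants it contains.

```python
def classify_gene(genotypes):
    """Classify a gene from the phased genotypes of its variants.

    genotypes: iterable of "0|1", "1|0" or "1|1" strings (one per variant).
    Returns one of "monoallelic", "biallelic_heterozygous", "biallelic_homozygous".
    """
    observed = set(genotypes)
    if not observed:
        raise ValueError("gene has no variants")
    unknown = observed - {"0|1", "1|0", "1|1"}
    if unknown:
        raise ValueError(f"unexpected genotype(s): {sorted(unknown)}")
    # (iii) at least one variant is homozygous for the alternative allele
    if "1|1" in observed:
        return "biallelic_homozygous"
    # (ii) heterozygous variants are present on both haplotypes
    if observed == {"0|1", "1|0"}:
        return "biallelic_heterozygous"
    # (i) all variants sit on the same single haplotype
    return "monoallelic"
```

For example, a gene carrying only `1|0` variants is reported as monoallelic, whereas a gene carrying one `1|0` and one `0|1` variant is reported as biallelic heterozygous.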

Identification of copy number variations (CNVs) and chromothripsis

HiFi reads of each sample (type I ALT and Tel+ mESCs) were mapped to the C57BL/6J genome using minimap2 (minimap2 version 2.22; -ax map-hifi). The obtained BAM file was then sorted and indexed using SAMtools (SAMtools version 1.13; sort -O BAM and index). We inferred the copy number ratio of the type I ALT to Tel+ mESCs using CNVkit (CNVkit version 0.9.10) (41). Chromothripsis was identified through visual scoring according to a previous paper (42).
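As a rough illustration of what a copy number ratio between two samples means, the sketch below computes per-bin log2 ratios from read depths after scaling each sample by its median depth. It is not CNVkit's algorithm, which applies additional bias corrections and segmentation. The function name, the pseudocount and the toy depths are assumptions made for this example.

```python
import math
from statistics import median


def log2_depth_ratios(test_depths, control_depths, pseudocount=0.01):
    """Per-bin log2 ratio of median-normalised depth (test vs control).

    Both inputs are lists of mean read depths over the same genomic bins.
    The pseudocount (an arbitrary choice here) avoids taking log2 of zero.
    """
    if len(test_depths) != len(control_depths):
        raise ValueError("depth lists must cover the same bins")
    test_median = median(test_depths)
    control_median = median(control_depths)
    if test_median <= 0 or control_median <= 0:
        raise ValueError("median depth must be positive")
    ratios = []
    for t, c in zip(test_depths, control_depths):
        scaled_t = t / test_median + pseudocount
        scaled_c = c / control_median + pseudocount
        ratios.append(math.log2(scaled_t / scaled_c))
    return ratios


# Toy depths: bin 3 looks like a gain and bin 5 like a loss in the test sample.
ratios = log2_depth_ratios([10, 10, 20, 10, 5], [30, 30, 30, 30, 30])
```

Median scaling makes the ratio insensitive to the difference in overall sequencing depth between the two samples, so unchanged bins sit near zero.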

Kataegis analysis

Type I ALT-specific SNVs were mapped to the GRCm39 genome used in this study. Kataegis analysis was performed using the R KataegisPortal package (ver.1.0.3) (43), with the default settings for the human genome partially adapted to be specific to the mouse genome. Modified R code used in the study is uploaded to the following git repository: http://github.com/hyyuunjii/NAR_revision (permanent DOI: http://zenodo.org/doi/10.5281/zenodo.13352886). At least six consecutive SNVs with a maximum intermutation distance of 1000 bp were defined as kataegis, and only kataegis events with a confidence score >0 were included in further analysis.
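The clustering rule stated above (at least six consecutive SNVs, each no more than 1000 bp from the previous one) can be sketched in a few lines of Python. This is a simplified illustration and not the KataegisPortal code. It treats one chromosome at a time, does not compute a confidence score, and the function name and toy positions are hypothetical.

```python
def kataegis_runs(positions, max_gap=1000, min_snvs=6):
    """Find runs of SNVs on one chromosome that satisfy the clustering rule.

    positions: SNV coordinates on a single chromosome (any order).
    Returns a list of (start, end, n_snvs) tuples, one per run.
    """
    runs = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_snvs:
                runs.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final run.
    if len(current) >= min_snvs:
        runs.append((current[0], current[-1], len(current)))
    return runs


# Toy positions: six closely spaced SNVs followed by scattered ones.
clusters = kataegis_runs([100, 300, 500, 700, 900, 1100, 5000, 9000, 9100])
```

With these toy positions a single run spanning 100 to 1100 is returned, because the remaining SNVs never accumulate six members within the distance limit.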

Genotype information was used to determine the allele-resolution status of the SNVs in each kataegis event, allowing classification into four groups: (i) all SNVs in the kataegis cluster were derived from allele 1; (ii) all SNVs in the kataegis cluster were derived from allele 2; (iii) all SNVs in the kataegis cluster were biallelic homozygous variants and (iv) SNVs in the kataegis cluster were derived from both alleles.

Association of kataegis and genomic rearrangements, including SVs and chromothripsis

The distance from the midpoint of each kataegis event to the SV breakpoint was manually calculated, with an association defined as the nearest distance being less than 10 kb. An association between kataegis and chromothripsis was confirmed only if the kataegis events were fully contained within the chromothripsis regions.
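The two association criteria above are simple interval computations. The following Python sketch shows one way they could be scripted, although the distances in this study were calculated manually. The function names are hypothetical, breakpoints are assumed to lie on the same chromosome as the kataegis event, and the midpoint is rounded down to an integer.

```python
from bisect import bisect_left


def nearest_breakpoint_distance(midpoint, sorted_breakpoints):
    """Distance from a midpoint to the closest breakpoint in a sorted list."""
    if not sorted_breakpoints:
        raise ValueError("no breakpoints on this chromosome")
    i = bisect_left(sorted_breakpoints, midpoint)
    # Only the neighbours on either side of the insertion point can be nearest.
    neighbours = sorted_breakpoints[max(i - 1, 0):i + 1]
    return min(abs(midpoint - b) for b in neighbours)


def associated_with_sv(start, end, breakpoints, max_distance=10_000):
    """True if the nearest SV breakpoint is less than max_distance from the midpoint."""
    if not breakpoints:
        return False
    midpoint = (start + end) // 2
    return nearest_breakpoint_distance(midpoint, sorted(breakpoints)) < max_distance


def within_chromothripsis(start, end, region_start, region_end):
    """True only if the kataegis event is fully contained in the region."""
    return region_start <= start and end <= region_end
```

Sorting the breakpoints once per chromosome and using a binary search keeps the lookup cheap even when thousands of breakpoints are compared against many kataegis events.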

Synteny analysis

To analyse the synteny of subtelomeres between Tel+ mESCs and the reference, scaffolds of Tel+ mESCs were mapped to the C57BL/6J genome using winnowmap (Winnowmap version 2.03) (44). The output SAM files were then used to investigate synteny using SyRi (SyRi version 1.4; -k -FS) (45). To analyse chromosome-level synteny between the reference and scaffolds of each sample (type I ALT and Tel+ mESCs), chromosome scaffolds were mapped to the C57BL/6J genome using minimap2 (version 2.0.1). Links with a mapping quality of 60 were extracted from the output PAF file, and approximately 25000 links were visualised using circos (circos version 0.69.8) (46).
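Extracting high-confidence links from a PAF file amounts to filtering on its twelfth column, which holds the mapping quality. The Python sketch below shows this step under the assumption that each retained alignment becomes one link made of query and target coordinates. The function name and the toy records are hypothetical, and how the links were reduced to roughly 25000 for plotting is not reproduced here.

```python
def paf_links(lines, min_mapq=60):
    """Return (query, q_start, q_end, target, t_start, t_end) for confident alignments.

    lines: iterable of PAF records (tab-separated, 12 mandatory columns).
    """
    links = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError("PAF records need at least 12 columns")
        mapq = int(fields[11])  # column 12: mapping quality (0-255)
        if mapq >= min_mapq:
            links.append((
                fields[0], int(fields[2]), int(fields[3]),  # query name, start, end
                fields[5], int(fields[7]), int(fields[8]),  # target name, start, end
            ))
    return links


# Toy PAF records (values are made up for illustration only).
example = [
    "scaffold_1\t5000\t0\t4000\t+\tchr1\t9000\t100\t4100\t3900\t4000\t60",
    "scaffold_2\t3000\t0\t1000\t-\tchr2\t8000\t500\t1500\t700\t1000\t12",
]
links = paf_links(example)
```

Each returned tuple already has the six fields that a Circos link line uses, so writing them out space-separated would be a natural next step.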

Subtelomere and telomere analysis

To identify chromosome end structures at the single nucleotide level, we generated a list of putative end-containing contigs using blastn from NCBI-BLAST (version 2.7.1+) (47). BLAST database for type I ALT genome was generated using makeblastdb from the NCBI-BLAST package (version 2.7.1+). The canonical Mus musculus telomeric repeat sequences (6 copies of TTAGGG), the full-length sequence of mTALT, and part of the subtelomeric region (∼200 kb away from the telomeric repeat at the end of q-arm) were used as query sequences. Contigs showing at least 50% sequence match with the query sequence were used, and the end-containing contigs were manually selected.
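As a lightweight complement to the BLAST search described above, candidate end-containing contigs can be pre-screened with a plain string search for tandem telomeric repeats near each contig end. The Python sketch below swaps in a regular-expression search in place of BLAST and is not the procedure used in this study. The function name, the 1 kb window and the toy sequence are assumptions made for this example.

```python
import re

# Six or more tandem copies of the canonical repeat, on either strand.
TEL_FORWARD = re.compile(r"(?:TTAGGG){6,}")
TEL_REVERSE = re.compile(r"(?:CCCTAA){6,}")


def telomeric_ends(sequence, window=1000):
    """Report whether each end of a contig contains a telomeric repeat array.

    Only the first and last `window` bases are inspected.
    """
    seq = sequence.upper()
    head, tail = seq[:window], seq[-window:]

    def has_repeat(segment):
        return bool(TEL_FORWARD.search(segment) or TEL_REVERSE.search(segment))

    return {"start": has_repeat(head), "end": has_repeat(tail)}


# Toy contig: a CCCTAA array at the start followed by non-telomeric sequence.
toy_contig = "CCCTAA" * 8 + "ACGT" * 500
result = telomeric_ends(toy_contig)
```

An exact-match search like this misses variant repeats and would not detect mTALT or subtelomeric queries, which is why an alignment-based search followed by manual selection remains necessary.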

Reverse transcription quantitative PCR (RT-qPCR)

RNA was purified using TRI reagent (Sigma #93289)/chloroform and ethanol precipitation. Reverse transcription was performed using the Omniscript RT kit (Qiagen #205113) according to the manufacturer's instructions. Gene expression levels were quantified by performing quantitative PCR from cDNA. The primers used are listed in Supplementary Table S4.

Western blot

Approximately 5 × 10⁶ cells were harvested and treated with ice-cold RIPA buffer (ELPIS-BIOTECH #EBA-1149) at 4°C for 30 min, followed by centrifugation at 16 000 × g for 20 min. After taking 20 μl of the supernatant for BCA quantification (Thermo Scientific #23227), the remaining supernatant was aliquoted (60 μg protein). 5× Laemmli sample buffer (ELPIS-BIOTECH #EBA-1052) was added, the samples were boiled at 95°C for 5 min, and spun down at 16 000 × g for 1 min. Samples were loaded onto a pre-cast gel (BioRad #4561084) and run at 50 V for 5 min, then 100 V for 90 min. After soaking the gel in transfer buffer (ELPIS-BIOTECH #EBA-1042) for 10 min, transfer was performed onto a nitrocellulose membrane (Whatman #10401396) at 100 V, 4°C for 2 h. The membrane was stained with Ponceau S (ELPIS-BIOTECH #EBP-1032) to verify transfer, blocked in 3% BSA (Sigma #A1470)/TBST (Welgene #ML023-03) for 1 h, and incubated overnight at 4°C with primary antibodies (RGS6, Novus #NB100-56659, 1/1000; DPF3, Novus #NBP2-14910, 1/1000; TACC2, Novus #NBP1-31221, 1/1000; β-actin, ABclonal #AC026, 1/10 000). After washing, the membrane was incubated with HRP-conjugated secondary antibody (ABclonal #AS014, 1/2000) for 1 h at room temperature, developed using ECL solution (BJ Trading #BJECL) and imaged using a CCD camera (GE Healthcare, ImageQuant LAS 4000 mini).

Immunofluorescence and telomere fluorescence in situ hybridization (FISH)

Immunofluorescence and telomere FISH were carried out largely as described in the previous paper, with slight modifications (48). Cells were cultured in a feeder-free condition and then trypsinised at 50–60% confluency. Cells were diluted in PBS to a concentration of 2–5 × 10⁵ cells/ml. The diluted cells were cytocentrifuged at 200 × g for 10 min using a cytospin funnel. Cells were fixed in 3.7% formaldehyde and permeabilised in KCM buffer (120 mM KCl, 20 mM NaCl, 10 mM Tris–Cl, pH 7.5, 0.1% Triton X-100). Blocking was performed using ABDIL buffer (20 mM Tris–Cl, pH 7.5, 2% bovine serum albumin, 0.2% fish gelatin, 150 mM NaCl, 0.1% Triton X-100, 0.1% sodium azide) supplemented with RNase A (100 μg/ml). After treatment with primary antibody diluted with ABDIL, the cells were incubated overnight at 4°C. Slides were rinsed with PBST and treated with secondary antibodies diluted in ABDIL for approximately 1 h at room temperature. The slides were then cleaned once more in PBST and cells were fixed with 3.7% formaldehyde. Slides were dehydrated in 70%, 90% and 100% graded ethanol. After the slide was completely dried, 0.3 μg/ml telomere PNA probe diluted in hybridisation solution (70% formamide, 0.5% blocking reagent, 10 mM Tris–Cl, pH 7.5) was applied, and denaturation was performed at 80°C for 5 min. Hybridisation was then performed overnight at room temperature. Next, slides were washed with PNA washes A (70% formamide, pH 7.5, 10 mM Tris–Cl, pH 7.5) and B (50 mM Tris–Cl, pH 7.5, 150 mM NaCl, 0.8% Tween 20). After a series of ethanol dehydration and air drying, the slides were mounted with Prolong Gold and stored overnight. Images were captured using Applied Precision's DeltaVision (Olympus IX71) and SoftWoRx software (version 6.11).

Primary antibodies: γ–H2A.X (Biolegend #613401, 1/1000 dilution), 53BP1 (NOVUS, NB100-304, 1/1000 dilution), RPA2 [p Ser33] (NOVUS, NB100-544, 1/1000 dilution). PNA probe: TelG-Cy3 ((TTAGGG)*4-Cy3), TelC-FITC ((CCCTAA)*4-FITC)

Chromosome orientation-FISH (CO-FISH)

The overall procedure was performed as previously described (48). Cells were plated at 40–50% confluency and treated with 10 μM 3:1 BrdU/BrdC for 8 h. During the final 1.5 h, 100 ng/ml Colcemid was added. The collected cells were resuspended in a 0.8% sodium citrate solution and incubated at 37°C for 10 min. After centrifugation, the supernatant was removed, and the cells were resuspended in 300 μl of the remaining solution. Gradually, 1 ml of pre-cooled fixative (composed of 75% methanol and 25% acetic acid) was added drop by drop, with gentle tapping after each drop. A total of 9 ml of fixative was added with continued tapping. The cells were fixed overnight at 4°C and then stored at −20°C. Slides were prepared in advance by immersing them in 100% ethanol for 1 h, then air-dried while preparing fresh fixative. The fixed cells were centrifuged again and resuspended in fresh fixative to achieve a concentration of approximately 1 × 10⁶ cells/ml. The cleaned slides were placed on wet paper towels, and the resuspended cells were applied to the slides from a height of 2 cm. The slides were then incubated at 65°C for 1 min and allowed to air-dry completely. Finally, the slides were inspected under a bright-field light microscope to identify well-preserved metaphase chromosomes and were stored at room temperature overnight to cure. The slides were rehydrated in PBS, fixed in 3.7% formaldehyde for 5 min, and treated with RNase A (250 μg/ml) for 15 min at 37°C. After washing, the slides were incubated in 2× SSC containing 0.5 μg/ml Hoechst 33258 for 15 min, exposed to UV light (365 nm) for 30 min, and treated with 80 μl of Exonuclease III (10 U/μl) at 37°C for 30 min. Following heat treatment with 70% formamide/30% 2× SSC, the slides were dehydrated with ethanol. TelG-Cy3 PNA probe (0.03 μg/ml) was denatured for 10 min at 70°C and hybridised for 2 h at room temperature, followed by washing and hybridisation with TelC-FITC PNA probe (0.3 μg/ml). After final washes, the slides were air-dried, mounted with Prolong Gold (Invitrogen #P36930), and imaged using SoftWoRx (version 6.11).

Results

Haplotype-resolved de novo assemblies of type I ALT and Tel+ mESCs

Type I ALT mESC (DKO741) used in this study is a cell line that stably maintains telomeres with ALT, corresponding to approximately 800 population doublings (PDs) of telomerase-negative mESCs derived from 129/Ola strain (Figure 1A). In previous studies, we identified the mouse template for ALT (mTALT), a specific subtelomeric element used by type I ALT mESCs to reconstruct telomeres.

Figure 1.


Haplotype-resolved de novo assemblies of type I ALT and telomerase-positive (Tel+) mESCs. (A) Schematic model showing type I ALT mouse embryonic stem cells (mESCs) established from telomerase knock-out, alongside Tel+ mESCs passaged in parallel with type I ALT mESCs. The lower panel shows the sequence of events from telomere shortening, telomere dysfunction, and chromosome fusion to ALT activation. (B) Genomic changes detected in type I ALT mESCs can be divided into two groups depending on their source: (i) 129/Ola background-specific changes in common with Tel+ mESCs; (ii) type I ALT-specific changes acquired due to DNA damage that occurred during the process of ALT establishment (orange). (C) Nx plot showing the continuous length distribution of each assembly. The intersection of each horizontal solid line and vertical dashed line represents the N50 for each assembly. (Hifiasm) assembled using only HiFi reads; (Hifiasm + UL) assembled by integrating HiFi reads with ONT ultra-long reads; (Hifiasm + UL + Hi-C) assembled by integrating HiFi reads, ONT UL reads and Hi-C data; (Shasta) assembled using only ONT reads. A pair of haplotype-resolved genomes is shown in the same colour. (D) BUSCO values of each haplotype-resolved assembly. The reference on the left corresponds to the C57BL/6J genome. (E) Proportion of the repetitive sequence length compared to the whole genome in each haplotype-resolved assembly and the C57BL/6J genome.

Numerous genomic changes resulting from events such as telomere dysfunction, chromosome fusion and ALT activation are assumed to accumulate in the type I ALT genome. Type I ALT cells are derived from a different strain than C57BL/6J, the strain used for the reference genome (GRCm39). Therefore, even if genomic changes in the type I ALT genome are detected by comparison to the reference genome, it is hard to distinguish whether they are 129/Ola-mediated or ALT-mediated changes. We attempted to extract ALT-specific genomic changes using the genome of telomerase-positive (Tel+) mESCs, derived from 129/Ola and passaged in parallel with type I ALT mESCs (Figure 1B). Since type I ALT and Tel+ mESCs originate from the same strain, 129/Ola-specific genomic changes in type I ALT mESCs will also be detected in Tel+ mESCs. On the other hand, ALT-mediated changes will be detected in type I ALT mESCs alone and not in Tel+ mESCs. We aimed to comprehend ALT-related genomic changes by isolating type I ALT-specific genomic changes not identified in Tel+ mESCs, using the C57BL/6J genome as a reference.

Type I ALT and Tel+ mESCs were sequenced using PacBio HiFi sequencing at an average coverage of 13× (Supplementary Table S1; Supplementary Figure S1), yielding highly accurate long reads with an average read length of 15 kb and per-base accuracy >99.9%. Furthermore, the two lines were sequenced using Oxford Nanopore Technologies sequencing at an average coverage of 30×, producing >17 000 ultra-long (UL) reads (>100 kb) from each sample. For de novo assembly integrating both HiFi and ONT reads, we used the hifiasm assembler, which can generate a pair of haplotype-resolved contig assemblies (26). Building on the integration of HiFi and ONT reads, which complement each other to improve assembly contiguity, Hi-C data were further integrated into the de novo assembly process (Hifiasm + UL + Hi-C), ultimately producing phased contig assemblies with N50 over 36 Mb (36.8–44.1 Mb) for both samples (Figure 1C; Table 1). Hi-C data contributed to increasing the N50 and reducing the number of contigs in the haplotype-resolved assemblies. The sum of each assembly was similar to that of the reference genome, 2.4 Gb (2.4–2.7 Gb), indicating that the haplotypes were well resolved.
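For readers unfamiliar with the contiguity metric quoted above, N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The Nx plot in Figure 1C generalises this to any percentage x. A minimal Python sketch of the calculation follows. The function name and the toy contig lengths are illustrative only.

```python
def nx(lengths, x=50):
    """Return the Nx value of an assembly (N50 when x=50).

    lengths: contig or scaffold lengths in any order.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length


# Toy assembly with a total length of 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths)        # 80 + 70 = 150 reaches half of 300
n90 = nx(toy_lengths, x=90)  # cumulative sum reaches 270 at the contig of length 30
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why it is reported here alongside BUSCO completeness and repeat content.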

Table 1.

Statistics of the de novo assemblies for type I ALT and telomerase-positive (Tel+) mESCs

Sample | Assembler | Assembly type | Assembly length (Gb) | Contig N50 (Mb) | Max contig length (Mb) | Number of contigs (n)
Tel+ mESCs | Hifiasm + UL + Hi-C | hap1/hap2 | 2.7/2.5 | 36.9/36.8 | 131.3/130.4 | 1266/816
Tel+ mESCs | Hifiasm + UL | hap1/hap2 | 2.5/2.6 | 35.1/36.8 | 131/121.5 | 1181/1134
Tel+ mESCs | Hifiasm | hap1/hap2 | 2.6/2.6 | 2.9/2.9 | 38.0/37.6 | 4529/3902
Tel+ mESCs | Shasta | primary | 2.5 | 7.3 | 31.8 | 4424
Type I ALT mESCs | Hifiasm + UL + Hi-C | hap1/hap2 | 2.4/2.7 | 42.3/44.1 | 150.5/152.4 | 519/756
Type I ALT mESCs | Hifiasm + UL | hap1/hap2 | 2.4/2.5 | 35.7/36.6 | 152/141.4 | 834/772
Type I ALT mESCs | Hifiasm | hap1/hap2 | 2.7/2.5 | 2.3/2.2 | 37.2/42.6 | 4919/4570
Type I ALT mESCs | Shasta | primary | 2.3 | 2.3 | 31.3 | 3960

BUSCO values, which indicate genome quality through the number of well-assembled single-copy orthologs, were analysed on the final contig assemblies (Hifiasm + UL + Hi-C). Most assemblies had >96% BUSCO completeness, similar to that of the reference genome (Figure 1D). As an exception, one assembly of type I ALT mESCs missed 205 ortholog genes, which resulted in a slight decrease in BUSCO completeness. It is assumed that ALT-related changes in type I ALT mESCs involve the loss of multiple genes. Next, we analysed the lengths of various types of repetitive regions (SINE, LINE, LTR, satellites, etc.) in each assembly, which are known to be difficult to assemble due to consecutive repeats. In all assemblies, the lengths of repetitive regions were similar to those of the reference genome (Figure 1E). Satellites, in particular, were assembled to greater lengths in our assemblies than in the reference genome. Taken together, these results showed that we produced haplotype-resolved assemblies with high contiguity.

Approximately 3/4 of q-arm subtelomeres were rearranged in two strains, 129/Ola and C57BL/6J

We then scaffolded the chromosomes of Tel+ mESCs using synteny with the reference genome, resulting in a pair of haplotype-resolved scaffold assemblies with N50 over 127 Mb (127.2–131.3 Mb) (Supplementary Figure S2A; Supplementary Table S2A). Both scaffold assemblies showed conservation of synteny with the reference at chromosome level (Supplementary Figure S3A). However, the subtelomeric regions were rarely aligned between the two strains (Supplementary Figure S4). We assembled a total of 16 scaffolds that ended with telomeres in Tel+ mESCs, 11 of which (approximately 3/4) had strain-specific sequences in their subtelomeric regions generated by rearrangements, indicating that rearrangements occurred frequently in subtelomeres during the bifurcation of two strains. Furthermore, subtelomeric regions of three chromosomes differed between two haplotypes of Tel+ mESCs, implying that rearrangements of homologous chromosomes can occur independently. The remaining six q-arm ends of Tel+ mESCs maintained synteny with the reference genome, without rearrangements in the subtelomeric region. Comparison of p-arm end sequences between the two strains was excluded from the analysis because the p-arm telomeres are misassembled regions in the reference genome.

For scaffolding the chromosomes of the type I ALT mESCs, we used YaHS (30), which assembles chromosomes using Hi-C data. We assembled scaffolds with longer N50 and smaller gaps than those assembled using synteny with the reference genome (Supplementary Figure S2B and C; Supplementary Table S2A). YaHS outperformed other Hi-C-based scaffolders, Pin_hic (31) and SALSA2 (32,49), resulting in scaffolds with the longest N50, as reported in previous studies (Supplementary Figure S2D and E; Supplementary Table S2B). Type I ALT mESCs maintained chromosome-level synteny with the reference genome (Supplementary Figure S3B). Interestingly, three pairs of end-to-end chromosome fusions were observed in the scaffolds of type I ALT mESCs (end-to-end fusion between chromosomes 6–12, 7–10 and 15–17). Two pairs of fusions (6–12 and 15–17) were confirmed in the contigs (Supplementary Figure S5D–E), and contact between the two fusion sites was also confirmed in the Hi-C contact matrix map (Supplementary Figure S5A–C). However, no fusion between chromosomes 7 and 10 was identified in the contigs, and no contact was captured in the Hi-C contact map. Since this fusion was not observed in chromosomes assembled using other scaffolders, it is likely a false positive arising from a limitation of the scaffolder.

Type I ALT telomeres reconstructed by mTALT exhibited various forms of structures

Our contig assemblies of type I ALT mESCs enabled us to understand end repair at the nucleotide level. We first marked putative end-containing contigs using end-specific sequences, such as mTALT, telomeric repeats, and subtelomeric sequences. We recovered only 11 q-arm end-containing contigs (11 out of 19 excluding sex chromosomes) because incomplete genome assembly and repetitive sequences in subtelomeres limited full reconstruction and marking chromosome ends. Seven ends had copies of mTALT, and the remaining four ends were fused (Figure 2).

Figure 2.

Schematic representation of chromosome end structures (not to scale). (A) The origin of mTALT in Tel+ mESCs. mTALT (dark blue bar) is a 7347-bp sequence duplicated from chromosome 13 to the right end of chromosome 11, flanked by telomeric repeats (grey bar) at both sides. (B) Typical telomere structures in Tel+ mESCs and reference genome (upper) and telomere structures of the type I ALT mESCs (lower). (C–G) Assembled and confirmed mTALT-mediated chromosome end structures with intact subtelomeres. (H, I) Assembled and confirmed mTALT-mediated chromosome end structures with deleted subtelomeres. (J, K) Assembled and confirmed end-to-end chromosome fusions in type I ALT mESCs.

The seven end-containing contigs of type I ALT mESCs had a new chromosome end structure in which the 7349-bp mTALT and ∼700 bp of short telomeric repeats were repeated in the same direction. Subtelomeric deletion was not observed in five of the seven contigs, with 400–500 bp of telomeric repeats remaining between the subtelomeric end and the first mTALT (Figure 2C–G). Additional subtelomere deletions were observed in the remaining two contigs, with 9643-bp and 158-kb deletions on chromosomes 1 and 9, respectively (Figure 2H and I). Interestingly, two contigs (the end structures of chromosomes 1 and 3) had mTALT and telomeric repeats copied in opposite directions (Figure 2D and H). On chromosome 3, the ALT telomeres (mTALT and telomeric repeat units) were elongated in the opposite direction at 229 bp of the first mTALT at the end, which we interpret as a scar from DDR events in the ALT telomeres. On chromosome 1, all ALT telomeres were oriented in the opposite direction after a 1226-bp insertion in the subtelomeric region. Given the homology to CCCTAA at the 3′ end of the inserted sequence, the ALT telomere may have been replicated in the opposite direction, or the inverted structure may have derived from a chromosomal fusion event. We also labelled putative end-containing UL reads (≥100 kb) using end-specific sequences, but were unable to identify additional chromosome end structures.

End-to-end fusion between two chromosomes in type I ALT

Next, we resolved the fusion sites in the four fused ends at the nucleotide level. Interestingly, at every chromosome fusion site, one chromosome end had deletions in telomeric repeats and even in subtelomeric regions, while the other chromosome end had deletions only in telomeric repeats, with 400–500 bp remaining (Figure 2J and K). Specifically, the end of chromosome 6 had a 4929-bp deletion in the subtelomeric region, concealed by fusion with the end of chromosome 12 using 3-bp microhomology. The end of chromosome 17 had a very large deletion (20 Mb), corresponding to 21% of the chromosome, which was concealed by fusion with the end of chromosome 15 (Supplementary Figure S5F). Unfortunately, we were unable to recover chromosome fusion events at p-arm ends because misassembly of p-arm ends in the reference genome restricted labelling of p-arm chromosome ends.

Copies of mTALT and telomeric repeats, with an average length of 50 kb, were junctioned with non-subtelomeric sequences

Junctions with mTALT copies were not limited to subtelomeric sequences. We extracted mTALT-containing UL reads (≥100 kb) and found 11 structures where mTALT and telomeric repeats joined non-subtelomeric sequences (Figure 3). We resolved junction sites in the UL reads at the nucleotide level. The junctioned mTALT and telomeric repeats had an average length of 50 kb and joined with sequences on various chromosomes. Most structures (9/11) exhibited a deletion in the junctioned mTALT, with mTALT deletion sizes varying from 1.5 to 6.4 kb. This finding contrasts with what we observed in chromosome end structures, where telomeric repeats exist between subtelomeric sequences and mTALT (Figure 2C–I), implying that the mechanisms by which mTALT copies join with subtelomeric or non-subtelomeric sequences may differ.

Figure 3.

Schematic representations of mTALT and telomeric repeats junction with non-subtelomeric regions (not to scale). The structure was confirmed in ONT ultra-long reads (>100 kb).

More than 1000 type I ALT-specific structural variants were identified

We comprehensively investigated the characteristics of the type I ALT mESCs using our assemblies. For variant calling in paired haplotype-resolved assemblies, we used two different variant callers: SVIM-asm (37) and PAV (38). The detected variants were classified into three groups: SNVs, small variants (<50 bp), and SVs (≥50 bp). Approximately 94% of all variants detected in type I ALT mESCs were shared with Tel+ mESCs and were probably unrelated to ALT (Figure 4A). We then extracted the variants detected specifically in type I ALT mESCs, that is, variants presumed to have arisen during ALT acquisition and maintenance, and identified approximately 300000 SNVs, 140000 small variants and 1000 SVs (Table 2). The type I ALT-specific SVs included deletions, insertions, and tandem duplications. The majority of deletions (66%) and insertions (70%) were no larger than 200 bp; duplications were the rarest, but more than half of them were larger than 50 kb (Figure 4B). We investigated the distribution of annotated SVs and found that most deletions (212/214) and insertions (813/816) were distributed in non-coding regions, whereas duplications were mainly distributed in regulatory or coding regions (Supplementary Figure S6). The estimated numbers of genes affected by insertions, deletions, and duplications were 508, 133, and 68, with 0, 9, and 62 genes predicted to have a significant change in gene expression, respectively (Figure 4C). Despite their rarity, duplications had a significant impact on gene expression in 62 genes by causing transcript amplification, whereas deletions and insertions had less severe effects.
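
The three-way size classification described above can be sketched as a small function. This is an illustrative sketch only: the function name `size_class`, the representation of a variant as reference and alternate allele strings, and the handling of equal-length multi-base substitutions are assumptions, not the behaviour of SVIM-asm or PAV.

```python
SV_MIN_BP = 50  # size threshold separating small variants from SVs, as in the text


def size_class(ref, alt):
    """Classify a variant as 'SNV', 'small variant' (<50 bp) or 'SV' (>=50 bp).

    ref and alt are allele strings. Equal-length multi-base substitutions
    have size 0 here and fall into 'small variant'; a real pipeline may
    treat them separately (assumption made for brevity).
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))  # inserted or deleted length in bp
    return "SV" if size >= SV_MIN_BP else "small variant"


print(size_class("A", "G"))             # SNV
print(size_class("A", "A" + "T" * 49))  # small variant (49-bp insertion)
print(size_class("A" + "C" * 60, "A"))  # SV (60-bp deletion)
```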

Figure 4.

Characteristics of single nucleotide variants (SNVs), small variants (<50 bp), structural variants (SVs), and copy number variations (CNVs) in type I ALT mESCs. (A) All variants detected in the haplotype-resolved assemblies of type I ALT mESCs are divided into three types based on the size and type; SNVs, small variants (<50 bp), and SVs (≥50 bp). Bar graph showing the ratio of type I ALT-specific variants and variants common with Tel+ mESCs. Numbers in the plot indicate the absolute number of variants. (B) Size distribution of type I ALT-specific SVs (total 1035) detected in haplotype-resolved assemblies. The numbers of variants range in size from 50 bp to >500 kb. (C) Pie chart showing the number of genes affected by deletions, insertions, and duplications belonging to 1035 SVs. Low, Modifier, and High indicate the severity of the impact of SV. (D) Bar graph showing the ratio of each genotype of type I ALT-specific variants. Numbers in the plot indicate the absolute number of variants. (E–G) Distribution of type I ALT-specific variants in subtelomeric regions and other genomic regions. Subtelomeric regions were defined as 500-kb regions from the q-arm end of each chromosome. (E) SV distributions. The numbers of SVs were merged in 500-kb intervals. (F) Small variants (<50 bp) distributions. The numbers of variants were merged in 500-kb intervals. (G) SNV distribution. The numbers of SNVs were merged in 100-kb intervals. The numbers in the plot represent the average number of variants in each interval. (H) Plot showing regression analysis results for the number of type I ALT-specific SVs and SNVs located in the identical 100-kb windows. The red line represents the regression line for all measurements. The number in the plot indicates the Pearson correlation coefficient. (I) Box plot showing the distribution of CNVs in subtelomeric regions and other genomic regions. 
The y-axis shows the copy number (log2 ratio) of type I ALT mESCs relative to Tel+ mESCs using HiFi reads. (J) Box plot showing the distribution of CNVs in subtelomeric regions and other genomic regions for each chromosome. The horizontal line in the box represents the mean. DEL; deletion, INS; insertion, DUP; duplication.

Table 2.

Summary of type I ALT-specific variants not detected in Tel+ mESCs

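| Type of variant | Subtype | Detection method | Number of variants (n) | Median size (bp) | Max size (bp) |
|---|---|---|---|---|---|
| SNV | - | PAV | 305393 | 1 | 1 |
| Small variant (<50 bp) | Deletion | PAV, SVIM-asm | 46980 | 1 | 49 |
| Small variant (<50 bp) | Insertion | PAV, SVIM-asm | 93246 | 1 | 49 |
| Structural variant (≥50 bp) | Deletion | PAV, SVIM-asm | 214 | 84.5 | 60256 |
| Structural variant (≥50 bp) | Insertion | PAV, SVIM-asm | 816 | 93 | 34016 |
| Structural variant (≥50 bp) | Duplication | PAV, SVIM-asm | 5 | 136336 | 1313343 |
| All SVs | - | - | 1035 | - | - |
| All variants | - | - | 446654 | - | - |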

Haplotype-resolved assemblies allowed elucidation of genotypes of phased variants

Variant calling based on haplotype-resolved assembly allows the use of phasing information to determine the genotypes of variants. We investigated the genotypes of type I ALT-specific variants, including SNVs, small variants and SVs, and found that 50% of all variants were homozygous; insertions in particular were mostly homozygous (58% of those <50 bp and 74% of those ≥50 bp) (Figure 4D). Furthermore, we investigated the genotype of the set of variants within each gene, distinguishing between two cases that are difficult to separate without haplotype-resolved assembly: monoallelic variants and biallelic but heterozygous variants. Approximately 300 genes were affected by biallelic and heterozygous variants (Supplementary Figure S7A). In particular, biallelic and heterozygous SNVs in the genes Gm21103 and Mup18 were predicted to have deleterious effects on protein function. Two SNVs in each allele of Gm21103, and one and two SNVs in the respective alleles of Mup18, were predicted to have deleterious effects on each allele (Supplementary Table S3). Allelic phasing of SNVs with deleterious effects is critical for accurately determining whether proteins with normal function are synthesized; being able to infer this in detail demonstrates the utility of haplotype-resolved assembly.

Overall, we were able to accurately determine the allele-resolution status of the variants using haplotype-resolved assembly, allowing us to infer the zygosity of the variants accumulated in the type I ALT mESCs.
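
The distinction between monoallelic and biallelic but heterozygous variant sets can be made concrete with a short sketch. It assumes each variant in a gene is summarised as a pair of booleans recording its presence on haplotype 1 and haplotype 2; the function name `gene_allele_status` and this data layout are hypothetical and do not reproduce the study's code.

```python
def gene_allele_status(variants):
    """Summarise which alleles of a gene carry variants.

    variants: iterable of (in_hap1, in_hap2) booleans, one per variant.
    Returns 'unaffected', 'monoallelic', 'biallelic, heterozygous'
    (both alleles hit, each variant on one haplotype only) or 'biallelic'.
    """
    present = [(h1, h2) for h1, h2 in variants if h1 or h2]
    hit1 = any(h1 for h1, _ in present)
    hit2 = any(h2 for _, h2 in present)
    if not (hit1 or hit2):
        return "unaffected"
    if not (hit1 and hit2):
        return "monoallelic"
    # Both alleles are hit; check whether every variant is heterozygous.
    all_het = all(h1 != h2 for h1, h2 in present)
    return "biallelic, heterozygous" if all_het else "biallelic"


# Two heterozygous variants on different haplotypes hit both alleles.
print(gene_allele_status([(True, False), (False, True)]))  # biallelic, heterozygous
# Two heterozygous variants on the same haplotype leave one allele intact.
print(gene_allele_status([(True, False), (True, False)]))  # monoallelic
```

Without phasing, both toy genes would appear simply as carrying two heterozygous variants, which is why the two cases cannot be separated from unphased calls.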

The subtelomeric regions of type I ALT mESCs showed resistance to small variants and SVs, but susceptibility to SNVs and CNVs

Next, we compared SVs located in subtelomeric regions, defined as the 500 kb adjacent to the q-arm telomeric repeats, with those in other genomic regions in type I ALT mESCs, and found that SVs were more enriched in other genomic regions than in subtelomeric regions (0.19 and 0.1 SVs per 500 kb for other genomic regions and subtelomeric regions, respectively) (Figure 4E). The approximately 130000 type I ALT-specific small variants (<50 bp; deletions and insertions) were likewise more enriched in other genomic regions than in subtelomeric regions (24.0 and 15.8 small variants per 500 kb in other genomic regions and subtelomeric regions, respectively) (Figure 4F), indicating that other genomic regions are more susceptible to variants than subtelomeric regions in type I ALT mESCs, regardless of the variant size. However, SNVs were more enriched in subtelomeric regions than in other genomic regions in type I ALT mESCs (Figure 4G).
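
The per-window densities quoted above can be illustrated for a single chromosome. This is a minimal sketch under stated assumptions: variant positions are 0-based coordinates, the q-arm end is the last base of the sequence, and the subtelomere is the final 500-kb window. The function name `mean_per_window` and the toy data are invented for illustration; the study averages over all chromosomes.

```python
WINDOW = 500_000  # 500-kb interval, as used for SVs and small variants


def mean_per_window(positions, chrom_len, window=WINDOW):
    """Return (other_mean, subtel_mean) variant counts per window.

    The subtelomere is taken as the last `window` bp of the chromosome
    (assumed q-arm end); everything before it is 'other genomic region'.
    """
    if chrom_len <= window:
        raise ValueError("chromosome must be longer than one window")
    subtel_start = chrom_len - window
    subtel = sum(1 for p in positions if p >= subtel_start)
    other = len(positions) - subtel
    other_windows = subtel_start / window  # may be fractional
    return other / other_windows, float(subtel)


# Toy 5-Mb chromosome with four variants, invented for illustration.
other_mean, subtel_mean = mean_per_window(
    [100_000, 1_200_000, 2_600_000, 4_700_000], chrom_len=5_000_000
)
print(other_mean, subtel_mean)
```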

We investigated the genome-wide distribution of variants, and confirmed that both small variants and SVs tend to be more enriched in other genomic regions than in subtelomeric regions (Supplementary Figure S8). In particular, hotspots for small variants overlapped with hotspots for SVs, typically found in internal genomic regions of chromosomes 4, 8, 10 and 17. This result suggests that the mechanisms producing small variants and SVs target genomic regions with similar characteristics, or that each type of variant was formed through related events. The distribution pattern of SNVs was quite different from those of other variants. In the other genomic regions, most regions had no SNVs at all, except for a few SNV hotspots (Supplementary Figure S9). We counted small variants, SVs, and SNVs within each 100-kb region and found that the numbers of both small variants and SVs were only weakly, although statistically significantly, correlated with the number of SNVs (Pearson correlation coefficients: 0.09 for small variants and SNVs, 0.14 for SVs and SNVs; P value < 2.2e-16) (Figure 4H; Supplementary Figure S7B). This suggests that small variants and SVs on the one hand, and SNVs on the other, may have been formed through different mechanisms or independent events during the process of ALT acquisition and maintenance.
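
The window-based correlation analysis can be sketched in two steps: bin variant positions into 100-kb windows, then compute the Pearson coefficient between the two count vectors. The helper names `window_counts` and `pearson` and the toy inputs are illustrative assumptions; the P value calculation is omitted.

```python
import math


def window_counts(positions, chrom_len, window=100_000):
    """Count 0-based variant positions in consecutive fixed-size windows."""
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = [0] * n_windows
    for p in positions:
        counts[p // window] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy positions on a 300-kb sequence, invented for illustration.
sv_counts = window_counts([0, 99_999, 100_000, 250_000], 300_000)
snv_counts = window_counts([5_000, 120_000, 130_000, 260_000], 300_000)
print(sv_counts, snv_counts, pearson(sv_counts, snv_counts))
```

A coefficient near 0, as reported in the text, means that windows rich in one variant class are not systematically rich in the other.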

Next, we investigated the copy number ratio of type I ALT mESCs compared to Tel+ mESCs using HiFi data, and compared the distribution of CNVs in subtelomeric regions and other genomic regions. While in other genomic regions the copy number ratio of type I ALT to Tel+ mESCs largely converged to 1 (log2 ratio 0), in subtelomeric regions not only did the average copy number ratio deviate from 1, but the variance of the distribution was also large (Figure 4I). A similar pattern was observed when we examined the copy number of the two regions for each chromosome. While other genomic regions of type I ALT mESCs had few CNVs across all chromosomes, subtelomeric regions exhibited CNVs on most chromosomes, although the direction and extent of variation varied among chromosomes (Figure 4J). In particular, large deletions in type I ALT mESCs resulted in severe CNVs in the subtelomeric region of chromosome 17. As an exception, the copy number of other genomic regions of the Y chromosome was very low, suggesting a natural loss of the Y chromosome during the establishment of the type I ALT mESCs.
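
As a point of reference for reading the log2 ratios, the following sketch computes a coverage-normalised log2 copy number ratio for one genomic bin. It is a simplified illustration, assuming per-bin read depths and a genome-wide median depth for each sample; dedicated tools apply further corrections, and the function name `log2_copy_ratio` is hypothetical.

```python
import math


def log2_copy_ratio(alt_depth, tel_depth, alt_median, tel_median):
    """log2 ratio of normalised depth in type I ALT versus Tel+ for one bin.

    Each depth is divided by its sample's genome-wide median depth so that
    differences in sequencing coverage between samples cancel out.
    """
    if min(alt_depth, tel_depth, alt_median, tel_median) <= 0:
        raise ValueError("depths must be positive")
    return math.log2((alt_depth / alt_median) / (tel_depth / tel_median))


print(log2_copy_ratio(10, 10, 10, 10))  # 0.0, equal copy number
print(log2_copy_ratio(20, 10, 10, 10))  # 1.0, two-fold gain
```

On this scale a copy number ratio of 1 corresponds to 0, and a 5-fold amplification corresponds to a log2 ratio of about 2.3.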

Overall, the subtelomeric regions of type I ALT mESCs are vulnerable to CNVs and SNVs, but are resistant to small variants and SVs. One hypothesis that could explain this is that subtelomeric regions may frequently be duplicated within the genome for some reason, but variants containing indels do not occur often due to the characteristics of the subtelomeric region. As we did not analyse translocation in this study, it was not possible to determine whether subtelomeres were frequently copied into the genome. However, for two large insertions identified in type I ALT mESCs, the inserted sequences were aligned to the subtelomeric regions, allowing us to indirectly confirm the replication of the subtelomeric regions into the genome (Supplementary Figure S7C). Furthermore, even if recombination between subtelomeric sequences would be active, such changes are difficult to detect unless scars of a certain size are present. In other words, the discovery of only a few small variants and SVs in subtelomeric regions does not mean that rearrangements seldom occurred in subtelomeres.

Approximately 95% of all variants detected in Tel+ mESCs were shared with type I ALT mESCs, with the remaining 5% unique to Tel+ mESCs (Supplementary Figure S10A). Although the number of Tel+-specific variants was comparable to ALT-specific variants across all types, their distributions differed. In contrast to type I ALT, where subtelomeric regions showed resistance to the accumulation of small variants and SVs, Tel+-specific variants were relatively enriched in subtelomeric regions (Supplementary Figure S10B–D). Despite expressing telomerase, Tel+ mESCs have accumulated a significant number of variants since bifurcation from the time point of telomerase knock-out, potentially due to several factors: (i) false positive variants arising from unassembled regions in the type I ALT genome, (ii) genomic instability caused by long-term passaging and suboptimal in vitro culture conditions and (iii) the presence of variants in the sub-populations of type I ALT mESCs due to cell population heterogeneity, which are not reflected in the assembled genome.

In this study, genomic variants from two sample genomes were detected and compared using GRCm39 as the reference. Some variants appeared unique to one sample due to the limitation in the mapping process. Notably, 37% of SNVs, 14.2% of small variants and 23.6% of SVs unique to Tel+ were found in unassembled regions of the type I ALT genome, suggesting a possible misclassification as Tel+-specific (Supplementary Figure S10F). Conversely, the proportion of variants unique to type I ALT mESCs in hard-to-map regions of the Tel+ genome was very small (Supplementary Figure S10E), indicating a minimal likelihood of false-positives in type I ALT-specific variants.

Tel+ mESCs exhibited an average of three end-to-end fusions per cell after more than a year of passaging, despite expressing telomerase (20). The in vitro cell culture conditions (DMEM + LIF) may have been insufficient to maintain the genomic stability of mESCs over an extended period.

In this study, we performed genome assembly and assembly-based variant calling, but some true-positive variants may have been missed during the process. To detect these, we conducted ONT raw read-based variant calling, revealing 233 type I ALT-specific SVs and 83 Tel+-specific SVs (Supplementary Figure S10G). Among them, 206 and 78 SVs, respectively, were detected only by the raw read-based methods (Supplementary Figure S10H). We then examined the distribution of the read support ratio (SV-containing or supporting reads/total mapped reads) and found that SVs with a support ratio around 0.1 were the most common in both type I ALT mESCs and Tel+ mESCs. SVs with a read support ratio below 0.5, representing relatively minor SVs, numbered 220 and 80 in type I ALT mESCs and Tel+ mESCs, respectively. Taken together, type I ALT mESCs harbour a significant number of minor variants undetected in the assembled genome, suggesting they may consist of more heterogeneous populations and likely have more read-based variants than Tel+ mESCs. Genomic instability persisting after ALT acquisition may lead to the accumulation of numerous variants across sub-populations in type I ALT mESCs.
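
The read support ratio defined above, and the 0.5 cut-off used to label an SV as minor, can be written down directly. The function names `support_ratio` and `is_minor` and the example read counts are illustrative assumptions.

```python
def support_ratio(supporting_reads, total_reads):
    """Fraction of mapped reads at a locus that contain or support the SV."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= supporting_reads <= total_reads:
        raise ValueError("supporting_reads must lie between 0 and total_reads")
    return supporting_reads / total_reads


def is_minor(ratio, threshold=0.5):
    """True if the SV is carried by a minority of reads (ratio below threshold)."""
    return ratio < threshold


# Toy locus: 3 of 30 mapped reads support the SV (invented numbers).
r = support_ratio(3, 30)
print(r, is_minor(r))
```

At a 30× depth a ratio around 0.1 corresponds to only a few supporting reads, which is consistent with such SVs being absent from the consensus assembly.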

On the other hand, we also examined the read support ratio distribution of SVs detected by assembled genome-based methods. Regardless of the type (HiFi or ONT) of raw long read, many of the detected SVs had a read support ratio >0.5, indicating that major SVs in the populations were well-captured from the assembled genome (Supplementary Figure S10I, J). Insertions showed slightly lower read support ratios compared to deletions, likely due to the relatively low accuracy of ONT reads and the limited depth of HiFi reads. This trend was observed in both type I ALT and Tel+ mESCs, confirming that assembly-based variant calling effectively captures major SVs across samples.

Chromothripsis regions and kataegis events across multiple chromosomes in type I ALT mESCs

We speculated that the type I ALT mESCs experienced extensive DNA damage under telomere dysfunction conditions, and that we can find traces of highly complex genomic rearrangements such as chromothripsis within the genome. Using CNVkit (ver.0.9.10), we investigated the copy number ratio of the type I ALT compared to Tel+ mESCs, and subsequently evaluated the pattern of CNVs to manually assess chromothripsis. As a result, we identified a total of 20 chromothripsis signatures in the type I ALT genome, distributed evenly across 14 chromosomes (Figure 5A; Supplementary Figure S11A). Notably, telomere-involved chromothripsis accounted for half of the total with a count of 10.

Figure 5.

Chromothripsis discovered in type I ALT mESCs. (A) Copy number (log2 ratio) of type I ALT mESCs relative to Tel+ mESCs using HiFi reads. (B) Distribution of CNVs in the chromothripsis region of chromosome 11 (upper) and 2 Mb region upstream of mTALT (lower). The y-axis shows the copy number (log2 ratio) of type I ALT mESCs relative to Tel+ mESCs. The average number of variants in each interval is represented in the plot. (C–E) Distribution of type I ALT-specific variants in 20 chromothripsis regions and the whole genome. (C) SV distributions. The numbers of SVs were merged into 500-kb intervals. (D) Small variants (<50 bp) distributions. The numbers of variants were merged into 500-kb intervals. (E) SNV distribution. The numbers of SNVs were merged into 100-kb intervals. The numbers in the plot represent the average number of variants in each interval.

We found that the frequency of the accumulation of ALT-related variants in chromothripsis regions was similar to that of ALT-related variants across the whole genome, regardless of the type of variant (Figure 5C–E). In other words, the presence of chromothripsis did not lead to a higher frequency of variant occurrence.

An approximately 2-Mb region at the end of chromosome 11 (11:119783073–121869890) was amplified >5-fold in the type I ALT mESCs compared to Tel+ mESCs, making it one of the regions most severely affected by CNVs in the type I ALT genome (Figure 5B). Interestingly, this region corresponded to the 2 Mb immediately upstream of the mTALT template (11:121868656∼). While this region showed significant CNVs, the frequency of SNV and SV occurrences was notably lower than the genome average, indicating resistance to variant occurrence and accumulation not only in the mTALT region but also in the upstream regions. Surprisingly, despite the mTALT sequence being amplified >500-fold in the type I ALT genome, we found no SNVs or SVs within it. Assuming that genomic variants are distributed uniformly across the genome, the mTALT sequence, which totals 635 kb and represents 0.02% of the type I ALT genome, would be expected to contain approximately 70.7 SNVs, 32.5 small variants, and 0.24 SVs. However, the absence of any genomic variants in this sequence suggests that the mTALT sequence is replicated and maintained with exceptionally high fidelity (Supplementary Figure S11B).
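
The expected counts under the uniform-distribution assumption follow from scaling each genome-wide variant total by the fraction of the genome occupied by mTALT. The sketch below uses the totals from Table 2 and takes the mTALT total as 635 kb, which is what 0.02% of a genome of roughly 2.7 Gb implies. The genome size of 2.74 Gb is an assumption chosen for illustration, since the exact denominator is not stated here, and the function name `expected_count` is hypothetical.

```python
def expected_count(total_variants, region_bp, genome_bp):
    """Expected number of variants in a region if variants are uniform."""
    return total_variants * region_bp / genome_bp


MTALT_TOTAL_BP = 635_000       # total mTALT sequence (about 0.02% of the genome)
GENOME_BP = 2_740_000_000      # ASSUMED type I ALT genome size, for illustration

# Genome-wide type I ALT-specific totals from Table 2.
totals = {"SNVs": 305_393, "small variants": 46_980 + 93_246, "SVs": 1_035}

for name, n in totals.items():
    print(name, round(expected_count(n, MTALT_TOTAL_BP, GENOME_BP), 2))
```

With this assumed genome size the results come out close to the values quoted in the text, so observing zero variants where about 70 SNVs would be expected is a marked depletion.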

We investigated whether genomic alterations during telomere crisis and ALT activation led to clusters of point mutations, known as kataegis. We observed 132 kataegis events using SNVs that are unique to type I ALT and absent in Tel+ mESCs (Supplementary Figure S11C). Most point mutations within a kataegis originated from the same allele, with monoallelic mutation clusters being the most common, comprising about 78% of the total (103/132) (Supplementary Figure S11F). Kataegis events were often located within 10 kb of SV breakpoints, with 15% associated with chromothripsis regions (Supplementary Figure S11D and E). Given that the 1035 SV regions identified in type I ALT cells account for only 0.1% of the entire genome (2.6 Mb), the association of 6% of kataegis events with SVs suggests that kataegis events may be enriched in SV regions. In contrast, it is difficult to interpret kataegis events as being enriched in chromothripsis regions, as the 20 chromothripsis regions total 549 Mb and represent nearly 20% of the type I ALT genome. Together, these results suggest that in type I ALT cells, kataegis may occur independently of chromothripsis but may be associated with SVs. On the other hand, 111 kataegis events were found using Tel+-specific SNVs. No significant differences were detected between these kataegis events and type I ALT kataegis when analysing the nearest distance to SV breakpoints and the allele status of each kataegis SNV (Supplementary Figure S11G–I). Of the kataegis events identified in Tel+ mESCs, 22 overlapped with those in type I ALT, suggesting that these overlapping regions may be SNV-prone areas.
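
The enrichment argument above compares the share of kataegis events falling in a class of regions with the share of the genome those regions occupy. A minimal sketch of that arithmetic, using the rounded percentages quoted in the text, is shown below; the function name `fold_enrichment` is illustrative, and no significance test is implied.

```python
def fold_enrichment(fraction_of_events, fraction_of_genome):
    """Ratio of observed event share to the share expected by region size."""
    if fraction_of_genome <= 0:
        raise ValueError("fraction_of_genome must be positive")
    return fraction_of_events / fraction_of_genome


# SV regions: 6% of kataegis events in about 0.1% of the genome.
print(round(fold_enrichment(0.06, 0.001)))    # 60
# Chromothripsis regions: 15% of events in about 20% of the genome.
print(round(fold_enrichment(0.15, 0.20), 2))  # 0.75
```

A value well above 1 indicates over-representation relative to region size, whereas a value near or below 1 gives no evidence of enrichment.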

Three genes, whose expression has increased under the influence of SVs, contributed to the ALT mechanism

Only three of the ALT-related genes reported in previous studies were affected by SVs: Pcna, Prim2 and Wrn, all of which had insertions either within an intron or within 5 kb downstream of the 3′ end. However, the transcriptome data from type I ALT mESCs showed that the expression of these genes was unchanged post-ALT compared to pre-ALT, indicating that the insertions had only a minor effect on gene expression (Supplementary Figure S12A). Among the 62 genes expected to have significant expression changes due to type I ALT-specific large duplications, the expression of 39 genes was observed in bulk RNA-seq data, with the 15 genes showing the most significant expression changes also validated by qPCR (Supplementary Figure S12E). Only five genes showed increased expression after type I ALT activation compared to before activation, while the remaining duplicated genes either did not show a clear increase or tended to decrease in expression after ALT activation, suggesting that epigenetic changes or other mechanisms may have contributed to the regulation of these duplicated genes.

We focused on two genes that showed significantly increased expression after type I ALT activation: Double PHD Finger 3 (Dpf3) and Regulator of G protein Signaling 6 (Rgs6) (Supplementary Figure S12B and F). Dpf3 and Rgs6 are genes located within the duplicated region of chromosome 12 (12:83039151–83393846), and transcription amplification resulting directly or indirectly from duplication may have influenced the increased expression of these two genes (Figure 6B; Supplementary Figure S12C and D). Dpf3 is known as a component of the SWI/SNF chromatin remodelling complex, regulating chromatin accessibility and playing roles in transcription regulation and DDR (50,51). Overexpression of Dpf3 has been associated with promoting renal cell carcinoma (RCC) cell growth and inhibiting apoptosis, thereby stimulating oncogenic pathways (52). Rgs6 is involved in G-protein signaling and is known as a crucial regulator of the DDR process (53,54).

Figure 6.

Contribution of genes affected by type I ALT-specific SVs to ALT telomere stability. (A) Heatmap representing RNA expression of genes affected by five large duplications in type I ALT mESCs. (B) Copy number (log2 ratio) of the duplicated region of chromosome 12 in type I ALT mESCs relative to Tel+ mESCs. A grey box indicates the location of the gene, and the duplicated region is shown between red dotted lines. (C) (left) Representative images of telomere dysfunction-induced foci (TIF) analysis of type I ALT mESCs under Dpf3, Rgs6, or Tacc2 knockdown conditions. White arrows indicate colocalisation of γ-H2A.X and telomeres. Scale bar, 5 µm. (right, upper) Quantification of TIFs per cell. The lines represent the mean of ≥302 cells from three biologically independent replicates. P values correspond to a two-tailed unpaired t-test. (right, lower) Quantification of the ratio of cells with >6 TIFs. The bars represent mean with SD from three biologically independent replicates. P values from Dunnett's one-way ANOVA. (D) Examples of normal and fragile telomeres, along with chromosome ends showing two classes of T-SCEs (single-exchanged and double-exchanged telomeres) detected by CO-FISH in metaphases of type I ALT mESCs. Leading strand telomeres were labelled in red, while lagging strand telomeres were labelled in green. *, fragile telomeres; <<, single-exchanged telomeres; X, double-exchanged telomeres. (E) Quantification of fragile telomeres detected by CO-FISH analysis in type I ALT mESCs and Tel+ mESCs under Dpf3, Rgs6 or Tacc2 knockdown conditions. The bars represent mean with SD of ≥75 metaphases, ≥666 telomeres from three biologically independent replicates. Dots represent individual replicates. P values from Dunnett's one-way ANOVA. (F) Quantification of chromosome ends with single exchange T-SCEs. The bars represent mean with SD from three biologically independent replicates, as described in (E). Dots represent individual replicates. P values from Dunnett's one-way ANOVA. (G) Quantification of chromosome ends with double exchange T-SCEs. The bars represent mean with SD from three biologically independent replicates, as described in (E). Dots represent individual replicates. P values from Dunnett's one-way ANOVA.

Among the 508 genes expected to exhibit changes in expression due to type I ALT-specific large insertions, our attention was drawn to a gene, transforming acidic coiled-coil protein 2 (Tacc2), which showed significantly increased expression after type I ALT activation (Supplementary Figure S12B and F). Within the intron of Tacc2, there were 6294-bp and 92-bp insertions, which were speculated to contribute to the increased gene expression. Tacc2 is known to play a significant role in regulating microtubule and spindle dynamics. Overexpression of Tacc2 has been observed in prostate cancer (55), with Tacc2 overexpression particularly associated with promoting cell growth and poor prognosis in breast cancer (56). Although we observed a significant increase in the mRNA expression of the three genes—Dpf3, Rgs6 and Tacc2—associated with duplications or insertions, this increase could not be detected at the protein level, except for Tacc2, where an increase in protein expression was observed (Supplementary Figure S12G–I).

We observed that knockdown of these three genes caused a significant increase in telomere damage, a result seen only in type I ALT mESCs and not in Tel+ mESCs, although Dpf3 knockdown did increase the telomeric γ-H2A.X signal in Tel+ mESCs (Figure 6C; Supplementary Figure S12G–I and Supplementary Figure S13A–E). This indicates that Dpf3, Rgs6 and Tacc2 contribute to maintaining telomere stability in type I ALT mESCs. However, knockdown of the three genes did not significantly alter replicative stress in type I ALT telomeres (Supplementary Figure S13B). We also examined whether depletion of these genes affected other ALT phenotypes, such as fragile telomeres and T-SCEs (57). While depletion of each of the three genes had no effect on type I ALT telomere fragility, depletion of Dpf3 or Rgs6 increased the frequency of both single and double exchange T-SCEs (Figure 6D–G). Notably, this increase in T-SCE frequency was absent in Tel+ mESCs under Rgs6 depletion, indicating that Rgs6 specifically contributes to telomere recombination in type I ALT, whereas Dpf3 depletion has the potential to induce a certain degree of telomere dysfunction even in Tel+ mESCs.
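
The summary statistic plotted for TIFs in Figure 6C (the ratio of cells with >6 TIFs, shown as mean with SD across replicates) can be illustrated with a minimal sketch. The foci counts below are made-up numbers and the function name is hypothetical; the statistical tests reported in the figure are not reproduced here.

```python
from statistics import mean, stdev


def fraction_above(tif_counts, threshold=6):
    """Fraction of cells whose TIF count exceeds the threshold (>6 by default)."""
    return sum(1 for n in tif_counts if n > threshold) / len(tif_counts)


# Made-up TIF counts per cell for three replicates of one condition.
replicates = [
    [0, 2, 7, 9, 3, 6, 8, 1],
    [1, 1, 10, 4, 7, 0, 2, 5],
    [8, 7, 0, 3, 2, 11, 6, 12],
]
fractions = [fraction_above(rep) for rep in replicates]
summary = (mean(fractions), stdev(fractions))  # mean and sample SD
```

Computing the fraction per replicate first, and only then averaging, keeps the replicate as the unit of analysis, which matches the "three biologically independent replicates" wording of the legend.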

In summary, although the precise timing of the acquisition of SVs by these genes remains uncertain, it is plausible that they underwent positive selection due to their involvement in ALT-mediated telomere maintenance.

Another candidate of mTALT is located at q-arm subtelomere of chromosome 1 in Tel+ mESCs

The mTALT located at the subtelomere of chromosome 11 in Tel+ mESCs exhibits a structure flanked by telomeric repeats on both sides: 55 bp on one side and 22 kb on the opposite side (Figure 2A). The TTAGGG repeat sequences present on both sides of mTALT are homologous to telomeres of various chromosome ends, suggesting their significant role in replicating mTALT into telomeres through homologous recombination. The sequence structure considered crucial for mTALT serving as a telomere template (unique sequences flanked by telomeric repeats) was not limited to the end of chromosome 11. We also identified this structure at the end of chromosome 1 in Tel+ mESCs, where a 6721-bp sequence was observed between 779 bp and 11 kb of telomeric repeats at the contig level. This region consisted of 129/Ola-specific subtelomeric sequences not observed in the reference genome (Supplementary Figure S4). It is speculated that, during the repair of telomere DNA damage at the end of chromosome 1 in Tel+ mESCs, the 6721-bp sequence was replicated from the same or another chromosome via homologous recombination, followed by extension of telomere sequences to ensure chromosome stability. However, due to deletions in subtelomeric sequences during telomere dysfunction, the sequence was completely absent from the end of chromosome 1 in the type I ALT mESCs (Figure 2H). Consequently, this 6.7-kb sequence was not selected as the telomeric template, despite having the potential to act as a template for ALT telomeres, given that it is similar in length to mTALT and has a telomere-flanked structure.

Additionally, at the q-arm subtelomere of chromosome 9 in Tel+ mESCs, a 148-kb sequence flanked by 11 kb and 10 kb of telomeric repeats was identified. However, this region was also completely deleted in the type I ALT mESCs (Figure 2I; Supplementary Figure S4). It may be too long to serve as the template for ALT, even if present during ALT activation.

The existence of telomeric repeat-flanked subtelomeric sequences was not specific to the 129/Ola strain. Similar structures were also observed in the C57BL/6J strain reference genome; for example, approximately 55-kb and 70-kb regions flanked by telomeric repeats were found at specific locations on chromosomes 8 and 10, respectively (Supplementary Figure S14). In conclusion, the presence of telomeric-flanked subtelomeric sequences is not unique to a specific strain and is expected to exist in various forms as potential mTALT candidates across multiple mouse genomes.
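
The structural criterion used for these candidates, a unique sequence flanked on both sides by telomeric repeats, can be illustrated with a minimal sketch. This is not the procedure used in this study: the function names, the minimum tract length of three repeat units, and the toy sequence are assumptions made for illustration. The code finds tandem tracts of TTAGGG (or its reverse complement CCCTAA) and reports the non-telomeric segments lying between two consecutive tracts.

```python
import re

# Tandem tracts of the canonical telomeric repeat on either strand.
# MIN_UNITS is an illustrative threshold, not a value from the study.
MIN_UNITS = 3
TRACT = re.compile(r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (MIN_UNITS, MIN_UNITS))


def telomeric_tracts(seq):
    """Return (start, end) spans of telomeric repeat tracts in seq."""
    return [m.span() for m in TRACT.finditer(seq.upper())]


def flanked_segments(seq, min_len=1):
    """Return (start, end) of segments lying between two consecutive tracts."""
    tracts = telomeric_tracts(seq)
    segments = []
    for (_, left_end), (right_start, _) in zip(tracts, tracts[1:]):
        if right_start - left_end >= min_len:
            segments.append((left_end, right_start))
    return segments


# Toy example: a 20-bp unique sequence between two telomeric tracts.
unique = "ACGTACGTACGATCGATCGA"
toy = "GATTACA" + "TTAGGG" * 4 + unique + "TTAGGG" * 5
segments = flanked_segments(toy)
```

An exact-match pattern is the simplest choice for a sketch. Real telomeric tracts contain variant repeats and sequencing errors, so a practical scan would need to tolerate mismatches and would report segment lengths alongside the sizes of the flanking tracts.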

Discussion

In this study, we used highly accurate or ultra-long reads, as well as Hi-C data, to generate haplotype-resolved genome assemblies for type I ALT mESCs and Tel+ mESCs (Figure 1). We were able to compare the 129/Ola genome to the C57BL/6J reference genome at the chromosomal level, with a particular focus on sequence diversity in subtelomeric regions between the two strains. Furthermore, various genomic alterations in type I ALT cells were identified at the nucleotide level, and several genes affected by SVs were discovered to contribute to ALT telomere stability.

Chromosome end structures stabilised by simple fusion or mTALT replication

Our de novo assembly of type I ALT mESCs revealed how they repair DSBs at chromosome ends and achieve end stability. While our assembly allowed us to identify chromosome end structures at the nucleotide level, only 11 out of 40 chromosome ends could be identified. We identified end-containing contigs through BLAST searches utilising mTALT sequence, subtelomeric sequences from each chromosome and canonical telomeric repeats. Specifically, p-arm end-containing contigs could not be identified due to misassembly of p-arm ends in the reference genome, which restricted labelling of p-arm chromosome ends. Additionally, nine of the 20 q-arm ends remained unresolved, possibly due to poorly assembled subtelomeres caused by repetitiveness or the lack of unique sequences in these nine subtelomeric regions, leading to incorrect identification. Improving assembly contiguity through increased read depth is expected to uncover additional chromosome end structures overlooked in this study.
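
As a rough illustration of this kind of search, the sketch below flags contigs whose BLAST hits to an end-marker query (for example mTALT, a chromosome-specific subtelomeric sequence, or telomeric repeats) fall close to a contig terminus. It assumes that the contigs were used as BLAST subjects and that the output is in the standard 12-column tabular format (`-outfmt 6`). The contig names, lengths, hit coordinates, 5-kb distance cutoff, and function name are hypothetical and are not taken from this study.

```python
import csv
import io


def end_hits(blast_tsv, contig_lengths, max_dist=5000):
    """Map each contig to the set of query names hitting near one of its ends.

    blast_tsv: text in BLAST tabular format (outfmt 6, default columns):
      qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    contig_lengths: dict of contig name -> length in bp.
    """
    result = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        query, contig = row[0], row[1]
        # Subject coordinates are 1-based; sstart > send for minus-strand hits.
        lo, hi = sorted((int(row[8]), int(row[9])))
        length = contig_lengths[contig]
        near_start = lo <= max_dist
        near_end = length - hi < max_dist
        if near_start or near_end:
            result.setdefault(contig, set()).add(query)
    return result


# Hypothetical hits: two near the end of contigA, one in the middle of contigB.
example = (
    "mTALT\tcontigA\t99.8\t7400\t10\t2\t1\t7400\t992000\t999400\t0.0\t13000\n"
    "telomere\tcontigA\t100.0\t600\t0\t0\t1\t600\t1000000\t999401\t0.0\t1100\n"
    "mTALT\tcontigB\t99.5\t7400\t20\t5\t1\t7400\t400000\t407400\t0.0\t12900\n"
)
lengths = {"contigA": 1_000_000, "contigB": 2_000_000}
flagged = end_hits(example, lengths)
```

In this toy input only contigA is flagged, because the contigB hit lies far from either terminus. A contig flagged by a chromosome-specific subtelomeric query could then be labelled with that chromosome end, which is the step that fails when subtelomeres lack unique sequence.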

The simplest structure among the 11 chromosome ends was a fusion of two chromosome ends (Figure 2J and K). The chromosomal end lacking telomeric and subtelomeric sequences was concealed by fusing with another chromosomal end with short telomeric repeats. While such dicentric chromosomes are anticipated to be genetically unstable during cell division, type I ALT cells withstand and survive the instability inherent in such structures. In contrast to the chromosome end structure of type I ALT C. elegans reported in the previous study (58), where fusions were found only between two subtelomeres following telomere erosion, type I ALT mESCs exhibited fusion events even at intact subtelomeres.

Aside from the simple fusions, the remaining seven chromosome ends were stabilised through mTALT replication (Figure 2C–I). In most chromosomes, except for chromosomes 1 and 9, mTALT sequences were replicated at the chromosome ends when ∼700 bp of telomeric repeats remained after telomere deletion. These residual telomeric repeats may have shielded the ends from fusion and provided homology for mTALT replication. The structure of chromosome ends stabilised by mTALT replication closely resembled that of type I ALT C. elegans (58); a unique subtelomeric element flanked by short telomeric repeats acted as the ALT template and was replicated unidirectionally at the ends of each chromosome. This indicates potential evolutionary conservation of the type I ALT mechanism.

Potential of mTALT as repair template for global DSBs

It was fascinating to discover that mTALT is not only located at chromosome ends but also exists within the genome. Furthermore, multiple copies of mTALT, arranged in long structures, were found within the genome. A structure in which unique sequences flank the mTALT array on both sides has not been detected, making it impossible to fully unravel the connection between mTALT and the genome. The wide distribution of mTALT raises intriguing possibilities: (i) mTALT possesses significant potential to be recruited for repair when DSBs occur within the genome, (ii) mTALT itself harbours favourable characteristics for replication into other regions and (iii) mTALT may be capable of forming neotelomeres through the healing of DSBs (59,60). These possibilities are not mutually exclusive and may be interrelated. The process of adding neotelomeres by telomerase, which initiates at sites of DSBs, may represent a common step in tumorigenesis. However, the interaction of mTALT with DSBs within the genome is expected to occur following the activation of ALT, suggesting that it may involve mechanisms distinct from neotelomere formation by telomerase. Despite originating from subtelomeric regions, mTALT did not accumulate SNVs during amplification, indicating a resilient nature against mechanisms that can induce point mutations. Moreover, cases of breakpoints within mTALT were very rare, suggesting that SVs disrupting the mTALT template occur infrequently. This property is shared not only by mTALT but also by the upstream 2 Mb region, yet the reasons for the stability of such large regions against various mutations and rearrangements remain unclear. Exploring the genomic architecture of these regions, including the local organisation of repeat and other sequences, replication timing, transcriptional activity, and epigenetic factors such as binding proteins and modifications, may provide insights into their stability. Nonetheless, it is remarkable that this stability persists even after spreading throughout the genome.

Candidate mechanisms for subtelomeric alterations

In type I ALT mESCs, subtelomeres generally exhibited fewer accumulated SVs or small variants compared to other genomic regions, but SNVs and CNVs occurred more actively. For SNVs, the mean or median values were higher in subtelomeres, although there were many regions with zero SNVs in other genomic regions. In fact, the distributions of SVs, small variants, and SNVs in other genomic regions showed larger deviations and included many regions with high values. While mapping variations genome-wide across chromosomes, hotspot regions with frequent variations within the genome could be identified, but their association with specific genomic architectures remained unclear.
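
The contrast between mean or median values and the number of zero-count regions can be made concrete with a small sketch. The code below bins variant positions into fixed windows and summarises the per-window counts. The 100-kb window size, the positions, and the function names are illustrative assumptions, not the parameters used in the analysis.

```python
from statistics import mean, median


def window_counts(positions, region_start, region_end, window=100_000):
    """Count variants per fixed-size window across [region_start, region_end)."""
    n_windows = -(-(region_end - region_start) // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // window] += 1
    return counts


def summarise(counts):
    """Return mean, median and the fraction of windows with zero variants."""
    return {
        "mean": mean(counts),
        "median": median(counts),
        "zero_fraction": counts.count(0) / len(counts),
    }


# Hypothetical SNV positions in a 500-kb region (five 100-kb windows).
snvs = [10_000, 20_000, 30_000, 250_000, 260_000, 499_999]
counts = window_counts(snvs, 0, 500_000)
summary = summarise(counts)
```

Reporting the zero fraction next to the mean and median matters here because a region class with many empty windows and a few dense hotspots can have a lower median but a wider spread than a class with uniformly moderate counts.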

It is challenging to precisely determine the mechanism underlying the formation of variations in subtelomeres. The repair process most likely to occur rapidly and efficiently in the presence of DSBs is classical non-homologous end joining (C-NHEJ) (61,62). However, C-NHEJ in telomeres and subtelomeres is known to be inherently suppressed. The active occurrence of CNVs in subtelomeric regions with fewer SVs suggests a predominant involvement of homologous recombination (HR). HR, which requires relatively long homologous sequences, tends to leave fewer errors after repair. Despite this low error rate, HR itself can lead to complex SVs, especially when segmental duplications or highly homologous sequences from distinct locations serve as templates, resulting in nonallelic homologous recombination and spurious rearrangements (63). Given the high frequency of CNVs observed, subtelomeric sequences are likely to exist in other regions of the genome, where relatively accurate replication via HR may have been utilised.

If DNA replication fails to proceed smoothly in subtelomeres, resulting in the generation of single-ended DSBs, or if only one end successfully locates a homologous template, break-induced replication (BIR) may occur (64). It is believed that BIR plays a role in repairing DSBs and lengthening telomeres in human ALT cells (65–67). The extent to which the BIR pathway is utilised for telomere maintenance in type I ALT mESCs has not been elucidated. Based on current knowledge, it is expected that the dependence on conventional BIR will not be high, primarily because the mTALT template for type I telomeres did not accumulate SNVs even after active duplication. However, independent of the replication of mTALT, there is a possibility that BIR was involved in the rearrangement of subtelomeres. If nonallelic homology is utilised, BIR can result in nonreciprocal translocations. As a result, while significant structural changes may not have occurred, accumulation of SNVs could have resulted from the action of error-prone DNA polymerases during BIR. It is unclear at what frequency and through which pathways BIR proceeds in telomeres and subtelomeres, but it is likely that both regions are maintained through relatively error-free homology-directed repair (HDR) mechanisms. When homology-based repair mechanisms fail to operate, an alternative mechanism to consider is alternative non-homologous end joining (A-NHEJ), with the most representative being microhomology-mediated end joining (MMEJ) (68). MMEJ, which relies on microhomology and the highly error-prone polymerase theta, has a high mutagenic potential. However, there is no evidence of HR deficiency in type I ALT mESCs.

In summary, subtelomeres exhibit active rearrangements but appear to maintain structural integrity well. A moderate level of instability may function plastically to help maintain overall genome stability. One particularly noteworthy case is mTALT, used as a template for telomeres.

ALT-related genomic alterations induced gene expression changes

With the achievement of high-quality genome assembly, we have obtained a foundation to elucidate the genomic structure of type I ALT in detail. Among the information not obtainable through short-read whole genome sequencing is the list of genes that may have been affected by genomic alterations other than variations occurring in coding sequences, potentially influencing gene expression levels and impacting ALT. Representative examples include Rgs6, Dpf3 and Tacc2. Telomere damage increased upon depletion of all three genes, indicating that the increased expression of each gene contributes dosage-dependently to the stability of ALT telomeres. Specific mechanisms and timing through which each gene becomes important will require further investigation, but the genome information obtained through long-read sequencing represents the first case of identifying genes associated with the ALT mechanism.

Structure and characteristics of additional mTALT candidates

Previous studies have demonstrated that when cells at the stage of senescence due to telomere shortening are continuously cultured, two types of ALT cells can emerge: type I ALT cells utilising mTALT and type II ALT cells not utilising mTALT. While it is clear that type II ALT cells do not utilise mTALT, it is uncertain whether they utilise another non-telomeric sequence as a template. In this study, we explored whether there were other candidate sequences sharing structural features with mTALT (specific unique sequences flanked by telomeric repeats) and identified two candidates (chr1 and chr9) in Tel+ mESCs. An interesting aspect is the unclear origin of the unique sequences between telomeric repeats; similar to mTALT, these sequences may have originated from the same or different chromosomes to repair DNA damage. Similar candidates were also found in the C57BL/6J reference genome. It is uncertain whether the length of sequences between telomeric repeats, except for those present on chromosome 1 of Tel+ mESCs, is suitable for serving as intact replication templates. Unique sequences flanked by repeats are susceptible to nonallelic homologous recombination or rearrangement by replication-based mechanisms. Therefore, such sequence structures may have advantages as templates for DNA repair in certain situations.

Perspective

The current genome assembly provides only a final snapshot of various variations, encompassing changes occurring during telomere crisis, alterations selected during the initiation of ALT, and further modifications undergoing positive selection during active ALT maintenance. If genomes from cells at various time points before and after Terc knock-out could be fully assembled, it would allow for the deconvolution of changes directly associated with ALT. Leveraging haplotype-resolved genome assembly, it will be possible to more accurately track the progressive accumulation of genomic changes over time.

Acknowledgements

The authors thank Jun Kim for giving us the opportunity to conduct Oxford Nanopore Technologies (ONT) sequencing at Genome Analysis Center at National Instrumentation Center for Environmental Management (NICEM), Seoul National University, Korea. The authors thank Genome Analysis Center at NICEM for generating ONT sequencing data, and members of Lee lab (laboratory of genes and development) for helpful and critical suggestions during the study.

Author contributions: H.L.: Conceptualization, Methodology, Formal analysis, Investigation, Writing—original draft, Writing—review & editing. S.S.: Methodology, Formal Analysis, Investigation, Writing—original draft, Writing—review & editing. H.N.: Methodology. J.L.: Writing—review & editing, Funding Acquisition, Supervision.

Contributor Information

Hyunji Lee, Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea.

Hiroyuki Niida, Hamamatsu University School of Medicine, 1-20-1 Handayama, Chuo-ku, Hamamatsu city, Shizuoka 431-3192, Japan.

Sanghyun Sung, Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea.

Junho Lee, Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea; Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea.

Data availability

The data supporting the conclusions of this article are publicly available in the Korean Nucleotide Archive (KoNa, http://www.kobic.re.kr/kona) with the accession ID KAP230737 (sequencing data only; https://kbds.re.kr/KRA/browse/view/EXPERIMENT/2423304). The modified R code used in the study is uploaded to the following git repository: http://github.com/hyyuunjii/NAR_revision (permanent DOI: http://zenodo.org/doi/10.5281/zenodo.13352886).

Supplementary data

Supplementary Data are available at NAR Online.

Funding

National Research Foundation of Korea [NRF-2020R1A2C3003352, NRF-2019R1A6A1A10073437]; Samsung Science and Technology Foundation [SSTF-BA1501-52]. Funding for open access charge: National Research Foundation of Korea.

Conflict of interest statement. None declared.

References

  • 1. Fajkus J., Sýkorová E., Leitch A.R.. Telomeres in evolution and evolution of telomeres. Chromosome Res. 2005; 13:469–479. [DOI] [PubMed] [Google Scholar]
  • 2. Baird D.M. Telomeres and genomic evolution. Philos. Trans. Roy. Soc. B: Biol. Sci. 2018; 373:20160437. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Levy M.Z., Allsopp R.C., Futcher A.B., Greider C.W., Harley C.B.. Telomere end-replication problem and cell aging. J. Mol. Biol. 1992; 225:951–960. [DOI] [PubMed] [Google Scholar]
  • 4. Lundblad V. DNA ends: maintenance of chromosome termini versus repair of double strand breaks. Mutat. Res. Fundam. Mol. Mech. Mutagen. 2000; 451:227–240. [DOI] [PubMed] [Google Scholar]
  • 5. Shay J.W., Wright W.E.. Senescence and immortalization: role of telomeres and telomerase. Carcinogenesis. 2005; 26:867–874. [DOI] [PubMed] [Google Scholar]
  • 6. Bryan T.M., Englezou A., Dalla-Pozza L., Dunham M.A., Reddel R.R.. Evidence for an alternative mechanism for maintaining telomere length in human tumors and tumor-derived cell lines. Nat. Med. 1997; 3:1271. [DOI] [PubMed] [Google Scholar]
  • 7. Cesare A.J., Reddel R.R.. Alternative lengthening of telomeres: models, mechanisms and implications. Nat. Rev. Genet. 2010; 11:319–330. [DOI] [PubMed] [Google Scholar]
  • 8. Nakamura T.M., Cooper J.P., Cech T.R.. Two modes of survival of fission yeast without telomerase. Science. 1998; 282:493–496. [DOI] [PubMed] [Google Scholar]
  • 9. Dunham M.A., Neumann A.A., Fasching C.L., Reddel R.R.. Telomere maintenance by recombination in human cells. Nat. Genet. 2000; 26:447–450. [DOI] [PubMed] [Google Scholar]
  • 10. Henson J.D., Neumann A.A., Yeager T.R., Reddel R.R.. Alternative lengthening of telomeres in mammalian cells. Oncogene. 2002; 21:598. [DOI] [PubMed] [Google Scholar]
  • 11. Jain D., Hebden A.K., Nakamura T.M., Miller K.M., Cooper J.P.. HAATI survivors replace canonical telomeres with blocks of generic heterochromatin. Nature. 2010; 467:223–227. [DOI] [PubMed] [Google Scholar]
  • 12. Heaphy C.M., Subhawong A.P., Hong S.-M., Goggins M.G., Montgomery E.A., Gabrielson E., Netto G.J., Epstein J.I., Lotan T.L., Westra W.H.. Prevalence of the alternative lengthening of telomeres telomere maintenance mechanism in human cancer subtypes. Am. J. Pathol. 2011; 179:1608–1615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Yu Q., Gates P.B., Rogers S., Mikicic I., Elewa A., Salomon F., Lachnit M., Caldarelli A., Flores-Rodriguez N., Cesare A.J.et al.. Telomerase-independent maintenance of telomere length in a vertebrate. 2022; bioRxiv doi:26 March 2022, preprint: not peer reviewed 10.1101/2022.03.25.485759. [DOI]
  • 14. Sung S., Kim E., Niida H., Kim C., Lee J.. Distinct characteristics of two types of alternative lengthening of telomeres in mouse embryonic stem cells. Nucleic Acids Res. 2023; 51:9122–9143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Kim C., Sung S., Kim J.-S., Lee H., Jung Y., Shin S., Kim E., Seo J.J., Kim J., Kim D.. Telomeres reforged with non-telomeric sequences in mouse embryonic stem cells. Nat. Commun. 2021; 12:1097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Gisselsson D., Jonson T., Petersén Å., Strömbeck B., Dal Cin P., Höglund M., Mitelman F., Mertens F., Mandahl N.. Telomere dysfunction triggers extensive DNA fragmentation and evolution of complex chromosome abnormalities in human malignant tumors. Proc. Natl. Acad. Sci. U.S.A. 2001; 98:12683–12688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Jacobs J. Loss of telomere protection: consequences and opportunities. Front. Oncol. 2013; 3:88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Amarasinghe S.L., Su S., Dong X., Zappia L., Ritchie M.E., Gouil Q.. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020; 21:30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Sakamoto Y., Sereewattanawoot S., Suzuki A.. A new era of long-read sequencing for cancer genomics. J. Hum. Genet. 2020; 65:3–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Niida H., Shinkai Y., Hande M.P., Matsumoto T., Takehara S., Tachibana M., Oshimura M., Lansdorp P.M., Furuichi Y.. Telomere maintenance in telomerase-deficient mouse embryonic stem cells: characterization of an amplified telomeric DNA. Mol. Cell. Biol. 2000; 20:4115–4127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013; arXiv doi:26 May 2013, preprint: not peer reviewedhttps://arxiv.org/abs/1303.3997.
  • 22. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.. The sequence alignment/map format and SAMtools. Bioinformatics. 2009; 25:2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M.. Twelve years of SAMtools and BCFtools. Gigascience. 2021; 10:giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Durand N.C., Shamim M.S., Machol I., Rao S.S.P., Huntley M.H., Lander E.S., Aiden E.L.. Juicer provides a one-click system for analyzing loop-resolution hi-C experiments. Cell Syst. 2016; 3:95–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Wang X., Luan Y., Yue F.. EagleC: a deep-learning framework for detecting a full range of structural variations from bulk and single-cell contact maps. Sci. Adv. 2022; 8:eabn9215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Cheng H., Concepcion G.T., Feng X., Zhang H., Li H.. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 2021; 18:170–175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Cheng H., Jarvis E.D., Fedrigo O., Koepfli K.-P., Urban L., Gemmell N.J., Li H.. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 2022; 40:1332–1335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Shafin K., Pesout T., Lorig-Roach R., Haukness M., Olsen H.E., Bosworth C., Armstrong J., Tigyi K., Maurer N., Koren S.. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 2020; 38:1044–1053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Alonge M., Lebeigle L., Kirsche M., Jenike K., Ou S., Aganezov S., Wang X., Lippman Z.B., Schatz M.C., Soyk S.. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022; 23:258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Zhou C., McCarthy S.A., Durbin R.. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023; 39:btac808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Guan D., McCarthy S.A., Ning Z., Wang G., Wang Y., Durbin R.. Efficient iterative Hi-C scaffolder based on N-best neighbors. BMC Bioinf. 2021; 22:569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Ghurye J., Pop M., Koren S., Bickhart D., Chin C.-S.. Scaffolding of long read assemblies using long range contact information. Bmc Genomics [Electronic Resource]. 2017; 18:527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 2021; 37:4572–4574. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Quinlan A.R., Hall I.M.. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M.. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015; 31:3210–3212. [DOI] [PubMed] [Google Scholar]
  • 36. Kriventseva E.V., Kuznetsov D., Tegenfeldt F., Manni M., Dias R., Simão F.A., Zdobnov E.M.. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2019; 47:D807–D811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Heller D., Vingron M.. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics. 2020; 36:5519–5521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Ebert P., Audano P.A., Zhu Q., Rodriguez-Martin B., Porubsky D., Bonder M.J., Sulovari A., Ebler J., Zhou W., Serra Mari R.. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021; 372:eabf7117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Shiraishi Y., Koya J., Chiba K., Okada A., Arai Y., Saito Y., Shibata T., Kataoka K.. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res. 2023; 51:e74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R., Thormann A., Flicek P., Cunningham F.. The ensembl variant effect predictor. Genome Biol. 2016; 17:122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Talevich E., Shain A.H., Botton T., Bastian B.C.. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 2016; 12:e1004873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Voronina N., Wong J.K., Hübschmann D., Hlevnjak M., Uhrig S., Heilig C.E., Horak P., Kreutzfeldt S., Mock A., Stenzinger A.. The landscape of chromothripsis across adult cancer types. Nat. Commun. 2020; 11:2320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Yin X., Bi R., Ma P., Zhang S., Zhang Y., Sun Y., Zhang Y., Jing Y., Yu M., Wang W.. Multiregion whole-genome sequencing depicts intratumour heterogeneity and punctuated evolution in ovarian clear cell carcinoma. J. Med. Genet. 2020; 57:605–609. [DOI] [PubMed] [Google Scholar]
  • 44. Jain C., Rhie A., Zhang H., Chu C., Walenz B.P., Koren S., Phillippy A.M.. Weighted minimizer sampling improves long read mapping. Bioinformatics. 2020; 36:i111–i118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Goel M., Sun H., Jiao W.-B., Schneeberger K.. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019; 20:277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S.J., Marra M.A.. Circos: an information aesthetic for comparative genomics. Genome Res. 2009; 19:1639–1645. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J.. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410. [DOI] [PubMed] [Google Scholar]
  • 48. Cesare A.J., Heaphy C.M., O'Sullivan R.J. Visualization of telomere integrity and function in vitro and in vivo using immunofluorescence techniques. Current Protoc. Cytom. 2015; 73:12.40.1–12.40.31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Ghurye J., Rhie A., Walenz B.P., Schmitt A., Selvaraj S., Pop M., Phillippy A.M., Koren S.. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput. Biol. 2019; 15:e1007273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Zeng L., Zhang Q., Li S., Plotnikov A.N., Walsh M.J., Zhou M.-M.. Mechanism and regulation of acetylated histone binding by the tandem PHD finger of DPF3b. Nature. 2010; 466:258–262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Hodges C., Kirkland J.G., Crabtree G.R.. The many roles of BAF (mSWI/SNF) and PBAF complexes in cancer. Cold Spring Harb. Perspect. Med. 2016; 6:a026930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Colli L.M., Jessop L., Myers T.A., Camp S.Y., Machiela M.J., Choi J., Cunha R., Onabajo O., Mills G.C., Schmid V.. Altered regulation of DPF3, a member of the SWI/SNF complexes, underlies the 14q24 renal cancer susceptibility locus. Am. Hum. Genet. 2021; 108:1590–1610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Dohlman H.G., Thorner J. RGS proteins and signaling by heterotrimeric G proteins. J. Biol. Chem. 1997; 272:3871–3874.
  • 54. Huang J., Yang J., Maity B., Mayuzumi D., Fisher R.A. Regulator of G protein signaling 6 mediates doxorubicin-induced ATM and p53 activation by a reactive oxygen species–dependent mechanism. Cancer Res. 2011; 71:6310–6319.
  • 55. Takayama K.-i., Horie-Inoue K., Suzuki T., Urano T., Ikeda K., Fujimura T., Takahashi S., Homma Y., Ouchi Y., Inoue S. TACC2 is an androgen-responsive cell cycle regulator promoting androgen-mediated and castration-resistant growth of prostate cancer. Mol. Endocrinol. 2012; 26:748–761.
  • 56. Cheng S., Douglas-Jones A., Yang X., Mansel R.E., Jiang W.G. Transforming acidic coiled-coil-containing protein 2 (TACC2) in human breast cancer, expression pattern and clinical/prognostic relevance. Cancer Genomics Proteomics. 2010; 7:67–73.
  • 57. Yin S., Zhang F., Lin S., Chen W., Weng K., Liu D., Wang C., He Z., Chen Y., Ma W. TIN2 deficiency leads to ALT-associated phenotypes and differentiation defects in embryonic stem cells. Stem Cell Rep. 2022; 17:1183–1197.
  • 58. Kim E., Kim J., Kim C., Lee J. Long-read sequencing and de novo genome assemblies reveal complex chromosome end structures caused by telomere dysfunction at the single nucleotide level. Nucleic Acids Res. 2021; 49:3338–3353.
  • 59. Tan K.-T., Slevin M.K., Leibowitz M.L., Garrity-Janger M., Shan J., Li H., Meyerson M. Neotelomeres and telomere-spanning chromosomal arm fusions in cancer genomes revealed by long-read sequencing. Cell Genomics. 2024; 4:100588.
  • 60. Kinzig C.G., Zakusilo G., Takai K.K., Myler L.R., de Lange T. ATR blocks telomerase from converting DNA breaks into telomeres. Science. 2024; 383:763–770.
  • 61. Muraki K., Murnane J.P. The DNA damage response at dysfunctional telomeres, and at interstitial and subtelomeric DNA double-strand breaks. Genes Genet. Syst. 2017; 92:135–152.
  • 62. Maciejowski J., de Lange T. Telomeres in cancer: tumour suppression and genome instability. Nat. Rev. Mol. Cell Biol. 2017; 18:175–186.
  • 63. Carvalho C.M., Lupski J.R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet. 2016; 17:224–238.
  • 64. Liu L., Malkova A. Break-induced replication: unraveling each step. Trends Genet. 2022; 38:752–765.
  • 65. Zhang J.M., Yadav T., Ouyang J., Lan L., Zou L. Alternative lengthening of telomeres through two distinct break-induced replication pathways. Cell Rep. 2019; 26:955–968.
  • 66. Yang Z., Takai K.K., Lovejoy C.A., de Lange T. Break-induced replication promotes fragile telomere formation. Genes Dev. 2020; 34:1392–1405.
  • 67. Dilley R.L., Verma P., Cho N.W., Winters H.D., Wondisford A.R., Greenberg R.A. Break-induced telomere synthesis underlies alternative telomere maintenance. Nature. 2016; 539:54.
  • 68. Sfeir A., Symington L.S. Microhomology-mediated end joining: a back-up survival mechanism or dedicated pathway? Trends Biochem. Sci. 2015; 40:701–714.

Associated Data

Supplementary Materials

gkae842_Supplemental_Files

Data Availability Statement

The sequencing data supporting the conclusions of this article are publicly available in the Korean Nucleotide Archive (KoNa, http://www.kobic.re.kr/kona) under accession ID KAP230737 (https://kbds.re.kr/KRA/browse/view/EXPERIMENT/2423304). The modified R code used in the study is available in the following git repository: http://github.com/hyyuunjii/NAR_revision (permanent DOI: http://zenodo.org/doi/10.5281/zenodo.13352886).


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
