Functional analysis of structural variants in single cells using Strand-seq

Hyobin Jeong; Karen Grimes; Kerstin K Rauwolf; Peter-Martin Bruch; Tobias Rausch; Patrick Hasenfeld; Eva Benito; Tobias Roider; Radhakrishnan Sabarinathan; David Porubsky; Sophie A Herbst; Büşra Erarslan-Uysal; Johann-Christoph Jann; Tobias Marschall; Daniel Nowak; Jean-Pierre Bourquin; Andreas E Kulozik; Sascha Dietrich; Beat Bornhauser; Ashley D Sanders; Jan O Korbel

doi:10.1038/s41587-022-01551-4

2022 Nov 24;41(6):832–844. doi: 10.1038/s41587-022-01551-4

Hyobin Jeong ^1,18,#, Karen Grimes ^1,2,#, Kerstin K Rauwolf ³, Peter-Martin Bruch ^4,5,6, Tobias Rausch ^1,5, Patrick Hasenfeld ¹, Eva Benito ¹, Tobias Roider ^1,4,5, Radhakrishnan Sabarinathan ⁷, David Porubsky ^8,9,19, Sophie A Herbst ^4,5, Büşra Erarslan-Uysal ^5,10, Johann-Christoph Jann ¹¹, Tobias Marschall ¹², Daniel Nowak ¹¹, Jean-Pierre Bourquin ³, Andreas E Kulozik ^5,10, Sascha Dietrich ^4,5,6,13, Beat Bornhauser ³, Ashley D Sanders ^1,14,15,16,✉, Jan O Korbel ^1,5,17,✉

¹Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany

²Faculty of Biosciences, EMBL and Heidelberg University, Heidelberg, Germany

³Division of Pediatric Oncology, University Children’s Hospital, Zürich, Switzerland

⁴Department of Hematology, Oncology and Rheumatology, Heidelberg University Hospital, Heidelberg, Germany

⁵Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany

⁶Department of Hematology and Oncology, University Hospital Düsseldorf, Düsseldorf, Germany

⁷National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India

⁸Center for Bioinformatics, Saarland University, Saarbrücken, Germany

⁹Max Planck Institute for Informatics, Saarbrücken, Germany

¹⁰Department of Pediatric Oncology, Hematology, and Immunology, University of Heidelberg and Hopp Children’s Cancer Center, Heidelberg, Germany

¹¹Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg University, Heidelberg, Germany

¹²Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany

¹³Department of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁴Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany

¹⁵Berlin Institute of Health (BIH), Berlin, Germany

¹⁶Charité-Universitätsmedizin, Berlin, Germany

¹⁷Bridging Research Division on Mechanisms of Genomic Variation and Data Science, German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁸Present Address: Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea

¹⁹Present Address: Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA

^✉ Corresponding author.

^# Contributed equally.

PMCID: PMC10264249 EMSID: EMS155453 PMID: 36424487

Abstract

Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation, enrichment of a primitive cell state and informed successful targeting of the subclone in cell culture, using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.

Subject terms: Cancer genomics, Data integration, Epigenomics

Linking structural variants to gene expression information identifies aberrant signaling pathways.

Main

The mutational landscapes of numerous cancers were recently cataloged^1,2, revealing that somatic SVs represent around 55% of driver mutations^2,3. Somatic mutational processes generate a broad spectrum of SVs from simple (for example, deletions and inversions) to complex classes (for example, chromothripsis)^4–8, and these SVs are important drivers of malignancy, metastasis and relapse^9–12. However, with the exception of focal deletions and amplifications, somatic SVs have proven difficult to characterize functionally in cancer genomic surveys^1–3,13. Studies integrating transcriptome and whole genome sequencing (WGS) data have inferred SV functional outcomes^13–16, but these typically require large cohorts and do not account for intratumor heterogeneity (ITH)³. Instead, SV effects can be measured directly by reading both genotype and molecular phenotype in the same cell, using single-cell multiomics^17–21. Several such methods have been developed^17–20, but these do not presently account for small (<10 Mb) somatic copy number alterations (SCNAs), balanced SVs and complex rearrangement events like chromothripsis^4,5,7,22, which has limited efforts to functionally characterize the most common class of driver mutations in cancer.

To address this, we developed scNOVA (single-cell nucleosome occupancy and genetic variation analysis)—a method enabling functional characterization of the full spectrum of somatic SV classes. scNOVA uses Strand-seq²³ in two ways: (1) it uses the DNA fragmentation pattern resulting from micrococcal nuclease (MNase) digestion²³ to directly measure nucleosome occupancy (NO) and indirectly infer patterns of gene activity, and (2) it couples this ‘molecular phenotype’ with SVs discovered by single-cell tri-channel processing (scTRIP, which jointly models read-orientation, read depth and haplotype-phase²⁴) in the same cell. MNase digests the linker DNA between nucleosomes, leaving nucleosome-protected DNA intact, to enable genome-wide inference of NO by measuring sequence read counts^25–28. Previous work has shown that active enhancers and transcribed genes exhibit reduced NO^25–30. However, the relationships between NO and SV landscapes in cancer remain unexplored. scNOVA addresses this by integrating SVs and NO along the genome of a cell, to functionally characterize SVs in heterogeneous samples.

Results

NO classifies cell types and predicts gene activity changes

Strand-seq data reveals NO

We hypothesized that NO patterns derived from MNase fragmentation during Strand-seq library preparation could represent a readout to allow functional characterization of SVs (Fig. 1a and Extended Data Fig. 1). To test this, we evaluated whether Strand-seq data revealed nucleosome positioning through comparison with bulk MNase-seq data. We used the NA12878 lymphoblastoid cell line (LCL), which has both datatypes available, and pooled 95 Strand-seq libraries (sequenced to a median of 540,379 mapped nonduplicate reads per single cell³¹; Supplementary Table 1), into a ‘pseudobulk’ track, allowing direct comparison with the corresponding MNase-seq dataset (sequenced to 19-fold genomic coverage³²). We measured NO along the genome (Methods) and found Strand-seq and MNase-seq were highly concordant in terms of uniformity of coverage and inferred nucleosome positions at DNase I hypersensitive sites (Spearman’s r = 0.68) (Fig. 1b,c). Nucleosome positioning near the binding site of CTCF^26,28 (a key chromatin organizer) closely matched between both assays (Fig. 1d and Supplementary Fig. 1), and estimated nucleosome repeat lengths²⁸ were highly concordant (Supplementary Fig. 1). In addition, both assays measured NO in all 15 chromatin states identified by the Roadmap Epigenome Consortium³³. Among these chromatin states, Strand-seq and MNase-seq revealed the highest NO signals on average for the polycomb-repressed state and the bivalent enhancer state, whereas the lowest average NO signals were consistently seen for the active transcription start site (TSS) state (Extended Data Fig. 2). This indicates that Strand-seq enables direct measurement of NO to reveal a ‘molecular readout’. We thus developed the scNOVA framework, which harnesses Strand-seq to measure NO genome-wide and couples this with SVs discovered in the same sequenced cell (Fig. 1a).
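The pseudobulk comparison described above can be illustrated with a minimal sketch: per-cell read counts are summed bin by bin, and the pooled track is compared with a bulk track by Spearman correlation. The function names and the toy counts below are assumptions made for illustration only; they are not taken from the scNOVA code base, and the actual NO measurement is described in the Methods.

```python
from typing import Sequence


def pseudobulk(cells: Sequence[Sequence[int]]) -> list[int]:
    """Sum per-bin read counts over all single-cell libraries."""
    return [sum(column) for column in zip(*cells)]


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): three cells, four genomic bins (e.g. DHS sites).
cells = [[2, 0, 1, 3], [1, 1, 0, 4], [3, 0, 2, 5]]
strand_seq_track = pseudobulk(cells)
mnase_track = [40, 12, 25, 90]  # hypothetical bulk MNase-seq counts
rho = spearman(strand_seq_track, mnase_track)
```

Because Spearman correlation depends only on ranks, the two tracks can be compared without matching their sequencing depths first.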

Fig. 1 — a, Leveraging Strand-seq, scNOVA performs SV discovery and then, using phased NO tracks, identifies functional effects of SVs locally (via evaluation of haplotype-specific NO) and globally (clone-specific NO). Orange, Strand-seq reads mapped to the Watson (W) strand; green, reads mapped to the Crick (C) strand. b, Strand-seq-based NO tracks in NA12878 reveal nucleosome positions well-concordant with bulk MNase-seq, depicted for a chromosome 12 locus with relatively regular nucleosome positioning⁹². Red, NO tracks mapping to haplotype 1 (H1); blue, H2; black, combining phased and unphased reads; gray, MNase-seq. The y axis depicts the mean read counts at each bp in 10 bp bins. c, Correlated NO at consensus DNase I hypersensitive sites³³ for NA12878. d, Averaged nucleosome patterns at CTCF binding sites³⁴ in NA12878, using pseudobulk Strand-seq and MNase-seq. e, FCs of haplotype-resolved NO in gene bodies plotted for chromosome X and chromosome 7 (a representative autosome) in NA12878. FCs of haplotype-resolved RNA expression measurements are shown to the right. f, Pseudobulk haplotype-phased NO track of exons of the representative chromosome X gene *SH3KBP1* based on Strand-seq. Boxplots comparing H1 and H2 use two-sided Wilcoxon rank sum tests followed by Benjamini–Hochberg multiple testing (FDR) correction (boxplots defined by minima = 25th percentile – 1.5 × interquartile range (IQR), maxima = 75th percentile + 1.5 × IQR, center = median and bounds of box = 25th and 75th percentile; n = 47 single cells). Bar charts show haplotype-specific RNA expression of *SH3KBP1* (two-sided likelihood ratio test followed by FDR correction; n = 4 biological replicates; data are presented as mean values ± s.e.m.). g, Inverse correlation of NO at gene bodies and gene expression. NO is based on pseudobulk Strand-seq libraries from RPE-1. Gene bodies were scaled to the same length. h, Cell-typing based on NO at gene bodies (AUC = 0.96). 
Cell line codes: Blue, RPE-1; Purple, BM510; Magenta, C7; LV, latent variable. i, Receiver operating characteristics for inferring altered gene activity by analyzing NO at gene bodies, using pseudobulk Strand-seq libraries from in silico cell mixing.

Extended Data Fig. 1 — scNOVA employs single cell tri-channel processing (scTRIP) as realized in the MosaiCatcher pipeline to perform haplotype-aware somatic SV discovery²⁴. Modules of scNOVA enable single-cell multiomics of these somatic SVs, including inference of haplotype-specific NO to investigate local (*cis*) effects of SVs, and inference of altered gene/pathway activity to investigate global (*trans*) effects of SVs detectable between genetically distinct subclones. To infer alterations in gene activity, scNOVA integrates deep convolutional neural network (CNN) based machine learning, and negative binomial generalized linear models. The framework dissects intra-sample genetic heterogeneity at single-cell resolution, measures the local haplotype-specific impact of somatic SVs, can be used to explore global gene dysregulation in SV-containing cells, can discriminate between genetically distinct subclones, and can uncover shared functional consequences of heterogeneous SVs affecting the same chromosomal interval.

Extended Data Fig. 2 — 15 chromatin states based on the NA12878 cell line were utilized in this genome-wide analysis. Plots generated represent Strand-seq data from NA12878 (n = 95 cells) **(a)**, and publicly available MNase-seq from NA12878, NA19193, and NA19238 (n = 1 sample each) **(b-d)**. The bulk MNase-seq experiment of NA12878 was pursued using single-end SOLiD sequencing reads, and that of NA19193 and NA19238 was done using paired-end Illumina reads. The X-axis in the box plot indicates reads per kilobase per million (RPKM) measured for each genomic segment annotated by one of the 15 chromatin states. Abbreviations for chromatin states³³ are: TssA - Active TSS, TssAFlnk - Flanking Active TSS, TxFlnk - Transcription at gene 5’ and 3’, Tx - Strong transcription, TxWk - Weak transcription, EnhG - Genic enhancers, Enh - Enhancers, ZNF/Rpts - ZNF genes & repeats, Het - Heterochromatin, TssBiv - Bivalent/Poised TSS, BivFlnk - Flanking Bivalent TSS/Enh, EnhBiv - Bivalent Enhancer, ReprPC - Repressed PolyComb, ReprPCWk - Weak Repressed PolyComb, Quies - Quiescent/Low. Boxplots were defined by minima = 25th percentile - 1.5X interquartile range (IQR), maxima = 75th percentile + 1.5X IQR, center = median, and bounds of box = 25th and 75th percentile. Both Strand-seq and MNase-seq assays measured NO in all fifteen chromatin states. Among these chromatin states, Strand-seq and MNase-seq revealed the highest NO signals on average for the polycomb repressed state and the bivalent enhancer state, whereas the lowest average NO signals were consistently seen for the active transcription start site (TSS) state.

As Strand-seq resolves its measurements by haplotype³¹, we considered that haplotype-specific differences in NO (haplotype-specific NO) resulting from random monoallelic expression, germline SNPs and local effects of SVs could be harnessed for scNOVA. To assess the utility of haplotype-resolved NO, we phased 24,652,658 of 49,205,197 (50.1%) of the NA12878 Strand-seq read fragments, and pooled these reads to generate pseudobulk NO tracks for each chromosomal haplotype (denoted ‘H1’ and ‘H2’, respectively; Fig. 1b). Using the female-derived NA12878 cell line, we compared haplotype-specific NO to haplotype-resolved gene expression measurements from bulk RNA-seq data³⁴ (Methods). We identified a significant increase of NO in gene bodies mapping to H1 compared with H2 across the X chromosome (adjusted P = 0.0012; Wilcoxon rank sum test), suggesting that H1 represents the inactive X chromosome. These data were consistent with haplotype-resolved gene expression measurements at loci subject to X-inactivation³⁵, whereas genes escaping X-inactivation did not exhibit haplotype-specific NO (Fig. 1e,f and Supplementary Fig. 3). We also investigated whether Strand-seq data is informative of haplotype-specific NO at cis-regulatory elements (CREs), and identified a 1.4-fold enrichment for allele-specific CRE binding on the X chromosome (P = 0.015; hypergeometric test; based on 718 CREs with haplotype-specific NO genome-wide; 10% false discovery rate (FDR)) (Supplementary Fig. 2). Moreover, CREs with haplotype-specific NO were significantly over-represented near genes showing allele-specific expression in the genome (P < 0.0018, hypergeometric test; Supplementary Fig. 2). These data suggest that haplotype-specific NO, a signal directly obtained from Strand-seq datasets, reflects biological gene regulation patterns in the genome.
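The haplotype-specific tests above are reported at a 10% FDR after multiple-testing correction. As a reminder of how Benjamini–Hochberg adjustment works, the sketch below converts a list of raw P values into adjusted P values. The function name and the four example P values are hypothetical and serve only to illustrate the procedure.

```python
from typing import Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> list[float]:
    """Return BH-adjusted P values in the same order as the input."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest P value so that the adjusted
    # values are monotone: adj(rank) = min over ranks r >= rank of p_r * n / r.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene P values from H1-versus-H2 comparisons.
raw_p = [0.001, 0.02, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
significant = [q <= 0.10 for q in adj_p]  # 10% FDR threshold
```

With these toy inputs, the three smallest P values remain below the 10% FDR threshold after adjustment, whereas the fourth does not.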

Cell-typing

Since NO within gene bodies reflects gene activity in MNase-seq data²⁸, we hypothesized that Strand-seq based NO patterns could be used to infer gene expression. To investigate this, we tested whether NO globally reflects cellular gene expression patterns in the retinal pigment epithelium-1 (RPE-1) cell line, for which we previously generated both Strand-seq and RNA-seq data²⁴. To profile NO globally, we pooled 33 million read fragments (including phased and nonphased reads) from 79 Strand-seq libraries into pseudobulk NO tracks. We identified an inverse correlation between NO at gene bodies and gene expression (P < 2.2 × 10^–16; Spearman’s r of up to −0.24; Fig. 1g and Supplementary Fig. 4), where highly expressed genes showed significantly lower NO within their gene bodies (and vice versa). We next explored the utility of NO for cell-type inference (‘cell-typing’), based on the activity of lineage-specific genes, by implementing a multivariate dimensionality reduction framework. We performed in silico mixing of Strand-seq libraries from different LCLs and RPE cell lines, and built a classifier that separates distinct cell types by partial least squares discriminant analysis (PLS-DA). We used a training set of 179 mixed libraries, and initially considered 19,629 features, which reflect ENSEMBL³⁶ genes with sufficient read coverage (Methods). After feature selection, 1,738 features were retained. We then used a nonoverlapping set of 123 cells to assess performance, all of which scNOVA classified accurately (area under the curve (AUC) = 1; Extended Data Fig. 3). Our framework also discriminated between cells from three related RPE cell lines derived from the same donor, which exhibit distinct SV landscapes^24,37 (AUC = 0.96; Fig. 1h) indicating that scNOVA enables accurate cell-typing.
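The AUC values quoted for cell-typing can be read as the probability that a randomly chosen cell of the target type receives a higher classifier score than a randomly chosen cell of another type. The sketch below computes this quantity directly from pairwise comparisons. The scores and labels are hypothetical stand-ins for classifier output, not values produced by the PLS-DA model.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six held-out cells (1 = target cell type).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of cells, which is acceptable for test sets of a few hundred libraries such as the ones described here.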

Extended Data Fig. 3 — **(a)** Cell-typing based on NO at gene bodies (AUC = 1). Epi1: RPE-1 replicate 1 (79 cells); Epi2: replicate 2 (77 cells); LCL1: HG01573 (46 cells); LCL2: HG02018 (50 cells), LCL3: NA19036 (50 cells); LV: latent variable. **(b)** UMAP visualization of Strand-seq libraries based on NO at gene-bodies (normalized by segmental ploidy status²⁴). **(c)** We also explored dimensionality reduction of Strand-seq libraries based on DNA motif accessibility. Using the chromVAR package¹¹⁰, single-cell NO profiles for 2 kb DNase I hypersensitive sites (DHSs) were transformed into a deviation Z-score, which measures how likely a certain motif accessibility would occur when randomly sampling sets of peaks with similar GC content and read depth. For each single-cell, the deviation Z-score was calculated for 870 human TF motifs from the cisBP database¹¹¹. These dimensionality reduction plots suggest that batch effect within the same cell type (three individuals in LCL, and two batches in RPE-1 sequenced separately) is minimal, and far less than the cell-type dependent variability. **(d)** UMAP using scMNase-seq²⁶, including 45 NIH3T3 cells and 272 murine naive T cells, based on NO at the gene-bodies. **(e)** UMAP of RPE-1 (the originally commercially available cell line) and its transformed derivative cell lines³⁷ (BM510 and C7). Two biological replicates were sequenced for each cell line. **(f)** Receiver operating characteristic (ROC) using the PLS-DA based classifier. AUC for classifying each cell line was 0.9614, 0.9694, and 0.9892 for RPE-1, BM510, and C7 respectively. **(g-h)** Cell-typing for LCL, RPE-1, skin fibroblast, AML, T-ALL, and umbilical cord blood cells (g), and ROC curve depicting classification performance (overall AUC = 0.998) (h). **(i-j)** Cell-typing in five RPE-1 derived cell lines³⁷ (RPE-1, BM510, C7, C29, and C11) (i), and ROC curve depicting classification performance (overall AUC = 0.9648) (j).

Gene activity changes between cell populations

Having established that scNOVA can use the expression of lineage-specific genes for cell-typing, we evaluated if it could predict gene expression differences between defined cell populations, such as subclones bearing distinct SVs. We devised a module that integrates deep convolutional neural networks and negative binomial generalized linear models (Supplementary Figs. 5 and 6), to measure differential gene activity between two defined cell populations. To benchmark this module, we mixed Strand-seq libraries from different cell lines in silico, creating ‘pseudoclones’, and evaluated the predicted changes in gene activity between defined pseudoclones (each composed of cells from one cell line) by analyzing NO at gene bodies (Supplementary Fig. 7 and Extended Data Fig. 4). We first compared RPE-1 to the HG01573 LCL line, and defined the ground truth of expression using RNA-seq. We found that the differential gene activity score of scNOVA (Methods) was highly predictive of the ten most differentially expressed genes, where analyses of pseudoclones comprising 156 RPE-1 and 46 HG01573 libraries revealed an AUC of 0.93 (we observed a similar performance when analyzing the 50 most differentially expressed genes; Fig. 1i). Gene activity changes inferred included well-known markers of epithelial (for example, EGFR, VCAN) and lymphoid (for example, CD74, CD100) cell types (Supplementary Table 2). The scNOVA predictions were informative also when we simulated minor subclones present with clonal frequency (CF) = 20%, CF = 5% and CF = 1.3%, resulting in AUCs of 0.92, 0.79 and 0.68, respectively (Extended Data Fig. 4). We obtained similar results when applying scNOVA to pseudoclones derived from different (genetically related) RPE cell lines (Supplementary Fig. 7). These benchmarking exercises indicate that scNOVA can accurately infer gene activity changes between defined cell populations, and hence that the framework can be used to functionally characterize subclonal SVs.

Extended Data Fig. 4 — We performed *in silico* cell mixing of RPE-1 and HG01573 cells to simulate application of scNOVA to different cell fractions (CFs). In this analysis six different CF ranges were considered (20, 10, 5, 3.3, 2, and 1.3). For each *in silico* cell mixing experiment, a total of 150 single cells were randomly subsampled for the major pseudo-clone (containing RPE-1 cells) and the minor pseudo-clone (HG01573 cells), by controlling the minor pseudo-clone CF at 20, 10, 5, 3.3, 2, and 1.3%, respectively. AUC, area under the curve. DEGs, differentially expressed genes. For each CF, we performed random subsampling of single-cell libraries 10 times, and depicted the respective mean AUC in the plot. Two different analysis modes - default (dashed lines, CNN with negative binomial generalized linear model), and alternative (solid lines, CNN with PLS-DA) are depicted. When the CF is larger than 10%, the default mode performs better, whereas for CFs smaller than 10%, the alternative mode outperforms the default mode.
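The in silico mixing design in this caption can be sketched as follows: a fixed number of libraries is drawn at random, with the minor pseudoclone contributing a chosen fraction of them. The function name, the library identifiers, the rounding rule and the seed handling are assumptions for illustration; the exact subsampling procedure used for the benchmark is described in the Methods.

```python
import random
from typing import Sequence


def mix_pseudoclones(
    major_ids: Sequence[str],
    minor_ids: Sequence[str],
    total: int = 150,
    minor_cf: float = 0.2,
    seed: int = 0,
) -> tuple[list[str], list[str]]:
    """Subsample libraries so the minor pseudoclone makes up ~minor_cf of total."""
    rng = random.Random(seed)  # fixed seed makes a given draw reproducible
    n_minor = max(1, round(total * minor_cf))  # keep at least one minor cell
    n_major = total - n_minor
    if n_major > len(major_ids) or n_minor > len(minor_ids):
        raise ValueError("not enough libraries to sample without replacement")
    return rng.sample(list(major_ids), n_major), rng.sample(list(minor_ids), n_minor)


# Hypothetical library identifiers; the counts match the 156 RPE-1 and
# 46 HG01573 libraries mentioned in the text.
rpe1 = [f"RPE1_{i}" for i in range(156)]
lcl = [f"HG01573_{i}" for i in range(46)]

major_20, minor_20 = mix_pseudoclones(rpe1, lcl, minor_cf=0.20)
major_1p3, minor_1p3 = mix_pseudoclones(rpe1, lcl, minor_cf=0.013)

# Repeating the draw with different seeds mimics the 10 random subsamplings.
replicates = [mix_pseudoclones(rpe1, lcl, minor_cf=0.20, seed=s) for s in range(10)]
```

Under these assumptions a CF of 1.3% in a 150-cell mixture corresponds to only two minor-clone libraries, which helps explain why predictions become less reliable at the lowest CFs.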

Functional outcomes of SVs in cell lines

To test this, we set out to investigate the functional outcomes of somatic SV landscapes in a panel of LCL samples³⁸ (N = 25) from the 1000 Genomes Project³⁹ (1KGP). Single-cell SV discovery in 1,372 Strand-seq libraries generated for this panel (Supplementary Table 1) identified 205 somatic SVs, with 24 of 25 (96%) LCLs showing at least one SV subclone, a sevenfold increase compared to a previous report⁴⁰ (Supplementary Table 3 and Supplementary Data). Of all the cell lines, 13 (52%) contained an SV subclone above 10% CF. This included the widely used NA12878 cell line^34,39, in which we discovered a subclonal 500 kb deletion at 19q13.12 (CF = 21%) that was mutually exclusive with two 22q11.2 deletions seen at CFs of 21% and 57%, respectively (Supplementary Figs. 9 and 10). The 22q11.2 SVs mapped to the well-known site of IGL recombination occurring during normal B cell development⁴¹. We hence focused on the 19q13.12 event, which resulted in the loss of a copy of ZNF382, a tumor suppressor and repressor of c-Myc⁴². Application of scNOVA measured significantly increased activity of ERCC6, a target gene of the c-Myc/Max transcription factor (TF) dimer⁴³, and decreased activity of PIEZO2 and TRAPPC9, in cells harboring this deletion (10% FDR; Supplementary Table 2).

To validate these findings, we reanalyzed Fluidigm and Smart-seq single-cell RNA-seq (scRNA-seq) datasets generated for NA12878 (refs. ^44,45). We employed several established tools for SCNA discovery from scRNA-seq data^46–48, all of which failed to discover any of the SV subclones seen in this cell line (Supplementary Table 4). Yet, upon directly inputting the respective SV breakpoint coordinates into the CONICSmat tool⁴⁶, we succeeded in identifying the 19q13.12 deletion (denoted ‘19q-Del’) through ‘targeted SCNA recalling’. We next pursued differential gene expression analyses by scRNA-seq, comparing 19q-Del cells to unaffected (‘19q-Ref’) cells, and verified overexpression of ERCC6 in 19q-Del cells (10% FDR; Supplementary Fig. 10). For PIEZO2 and TRAPPC9, the scRNA-seq-based expression trends were consistent with scNOVA (Supplementary Fig. 10), but did not reach the FDR threshold. A search for the over-represented TF targets amongst the differentially active genes identified c-Myc and Max as the most over-represented TFs in 19q-Del cells (10% FDR; Supplementary Fig. 10). These results indicate that scNOVA can functionally characterize SVs inaccessible to scRNA-seq-based SCNA discovery.

We next focused on NA20509, the LCL with the most abundant SV subclone (85% CF). Somatic SVs in NA20509 arose primarily through the breakage-fusion-bridge cycle (BFB) process^24,49 involving a 49 Mb terminal duplication on 5q, and a 2.5 Mb inverted duplication on 17p with an adjacent terminal deletion (terDel) (Fig. 2a). The 5q and 17p segments became fused into a derivative chromosome of around 115 Mb (Supplementary Fig. 13), which probably stabilized the BFB. We searched for global gene activity changes in this ‘17p-BFB’ subclone compared with the nonrearranged cells (‘17p-Ref’) and identified 18 dysregulated genes (Fig. 2b). Testing for gene set over-representation⁵⁰ (Methods) revealed an enrichment of the target genes of c-Myc/Max heterodimers (10% FDR; Fig. 2c), that is, the same TFs we observed in the 19q-Del subclone in NA12878. Consistent with this, we identified somatic copy-number gain of MAP2K3, which encodes an activator of c-Myc/Max⁵¹, resulting from the BFB (Fig. 2a).
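Gene set over-representation of the kind described above is commonly assessed with a hypergeometric test, which asks how surprising the overlap between a list of dysregulated genes and a TF target set is, given the size of the gene universe. The sketch below shows this calculation. It is an assumption that a test of this form applies here; the exact procedure is given in the Methods, and every number other than the 18 dysregulated genes is hypothetical.

```python
from math import comb


def overrepresentation_p(universe: int, set_size: int, n_hits: int, overlap: int) -> float:
    """Upper-tail hypergeometric P value, P(X >= overlap).

    universe: number of genes tested
    set_size: number of genes in the TF target set
    n_hits:   number of dysregulated genes
    overlap:  dysregulated genes that are also in the target set
    """
    upper = min(set_size, n_hits)
    favorable = sum(
        comb(set_size, k) * comb(universe - set_size, n_hits - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(universe, n_hits)


# 18 dysregulated genes (from the text); the universe size, target set
# size and overlap below are hypothetical illustration values.
p_enrich = overrepresentation_p(universe=20000, set_size=500, n_hits=18, overlap=6)
```

In practice such P values would be computed for many TF target sets and then corrected for multiple testing before applying the 10% FDR threshold.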

Fig. 2 — a, Complex SVs in NA20509, with BFB-mediated rearrangements (17p) and a terminal dispersed duplication (5q) present with CF = 85%, shown for representative single cells. Ref, cells lacking complex SVs; InvDup, inverted duplication; terDel, terminal deletion. Reads are mapped to the W (orange) or C (green) strand. Gray, single cell IDs. b, Heatmap of 18 genes with altered gene activity amongst subclones, based on scNOVA (‘17p-BFB’, SV subclone; ‘17p-Ref’, 17p not rearranged). Asterisks denote TF targets of c-Myc and Max. c, Gene set over-representation analysis for TF target genes showing significant enrichment of c-Myc and Max targets in the 17p-BFB subclone. P_adj., adjusted P value. Right, Model for c-Myc/Max target activation in NA20509 based on scNOVA, combined with previous knowledge. d, Mean RNA-seq expression Z scores of c-Myc/Max target genes across 33 LCLs. e, Fishplot showing CF changes over long-term culture from 23.3% (7 of 33 cells; p4) to 100% (30 of 30 cells; p8). f, qPCR verifies clonal expansion of the BFB clone in p8 compared with p4 (P value based on FDR-corrected two-sided unpaired t-tests; n = 3). HG1505, control cell line with a somatically stable *MAP2K3* locus. Note that for both NA20509 and HG1505 the germline copy number of the *MAP2K3* locus was consistently estimated to be three. Data are presented as mean values ± s.e.m. g, RNA-seq shows significant increase of *MAP2K3* at p8 versus p4 (FDR-corrected two-sided Wald test, based on DESeq2; n = 5 and three biological replicates for p4 and p8, respectively). h, Mean RNA expression Z scores of c-Myc/Max target genes in NA20509 (differences between p4 and p8 were evaluated using a two-sided Wilcoxon rank sum test; n = 5 and three biological replicates for p4 and p8, respectively). Boxplot was defined by minima = 25th percentile – 1.5 × IQR, maxima = 75th percentile + 1.5 × IQR, center = median and bounds of box = 25th and 75th percentile (g–h).

We performed several orthogonal analyses to validate these findings. First, we verified all somatic SVs using deep WGS data generated for the 1KGP sample panel⁵² (Supplementary Fig. 13). Second, we analyzed RNA-seq data³⁸ for this LCL panel, which revealed that NA20509 exhibits the highest MAP2K3 expression and the highest c-Myc/Max target expression (Supplementary Fig. 14 and Fig. 2d). Third, we followed the 17p-BFB subclone in culture, by subjecting early (p4) and late passage (p8) cells to Strand-seq, which revealed outgrowth of the 17p-BFB subclone (CF = 23% at p4, CF = 100% at p8; P < 0.00001, Fisher’s exact test; Fig. 2e), suggesting these cells have a proliferative advantage. Quantitative real-time PCR experiments verified this clonal outgrowth pattern (Fig. 2f).
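The clonal outgrowth comparison between passages can be reproduced in outline with a two-sided Fisher's exact test on a 2 × 2 table of cell counts. The sketch below implements the test with the standard library. The function name is an assumption, and the cell counts are those quoted in the Fig. 2e caption (7 of 33 cells at p4 and 30 of 30 cells at p8).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Rows: passage (p4, p8); columns: (17p-BFB cells, 17p-Ref cells).
p_outgrowth = fisher_exact_two_sided(7, 26, 30, 0)
```

With these counts the P value falls far below the reported bound of 0.00001, in line with the strong shift in clonal frequency between the two passages.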

Since the functional impact of SVs on clonal expansion is unexplored in LCLs, we more deeply characterized the molecular phenotypes of 17p-BFB cells by pursuing RNA-seq in p4 and p8 cultures. We observed increased MAP2K3 expression (1.39-fold, 10% FDR) at p8, consistent with MAP2K3 dysregulation as a result of copy-number gain in the 17p-BFB subclone (Fig. 2g and Supplementary Note). Pathway-level analysis showed deregulation of c-Myc/Max target genes following clonal expansion (P = 0.036; Wilcoxon rank sum test; Fig. 2h and Supplementary Fig. 14). Collectively, these data link the outgrowth of SV subclones to the deregulation of c-Myc/Max targets, which could represent a common driver of clonal expansion in LCLs.

Local effects of copy-balanced driver SVs in leukemia

To deconvolute the effects of driver SVs in patients, we applied scNOVA to analyze the local consequences of balanced SVs, which are widespread in leukemia^3,53. We analyzed primary cells from a patient with acute myeloid leukemia (AML) (32-year-old male; patient-ID = AML_1) bearing a balanced t(8;21) translocation that results in RUNX1-RUNX1T1 gene fusion⁵⁴. We sorted CD34⁺ cells from AML_1 (Supplementary Fig. 15), and sequenced 42 Strand-seq libraries. SV discovery revealed a 46,XY,t(8;21)(q22;q22) karyotype (Fig. 3a, Supplementary Fig. 16 and Supplementary Table 3) consistent with clinical diagnosis. We fine-mapped the translocation breakpoint to intron 1 of RUNX1T1 and intron 5 of RUNX1 (Supplementary Fig. 17), and subsequently identified haplotype-specific NO at 11 genes, genome-wide (10% FDR; Supplementary Table 2). This included RUNX1T1, which showed reduced NO on the derivative (H2) haplotype (Fig. 3b), consistent with increased gene activity mediated as a local effect of the translocation⁵⁵. The remaining genes did not reside near a detected somatic SV, suggesting other factors (such as germline SNPs; Supplementary Fig. 17) may have affected their NO.

Fig. 3 — a, Balanced t(8;21) translocation in AML_1, discovered based on strand cosegregation (P value = 0.00003 for translocation discovery using strand cosegregation²⁴, FDR-adjusted Fisher’s exact test; Supplementary Fig. 16). The SV breakpoint was fine-mapped to the region highlighted in light blue. Composite reads shown were taken from all informative cells in which reads could be phased (WC or CW configuration; Methods). b, A violin plot demonstrates haplotype-specific NO at the *RUNX1T1* gene body (10% FDR; two-sided Wilcoxon rank sum test followed by Benjamini–Hochberg multiple correction; n = 17 single cells; boxplot was defined by minima = 25th percentile – 1.5× IQR, maxima = 75th percentile + 1.5× IQR, center = median and bounds of box = 25th and 75th percentile), consistent with aberrant activity of the locus on der(8). c, Haplotype-specific NO around the SV breakpoint. FCs of haplotype-specific NO, measured between the *RUNX1-RUNX1T1* containing derivative chromosome (der(8)) and corresponding regions on the unaffected homolog (Ref), are shown in black, and –log₁₀(P values) in light blue. Enhancer-target gene physical interactions based on chromatin conformation capture^56,93 are depicted in orange (interactions involving *RUNX1* and *RUNX1T1*) and gray (involving other loci). d, Significant CREs located within the distal peak region, demonstrating haplotype-specific absence of NO on der(8) at 10% FDR, suggesting increased CRE accessibility on der(8). Within the segment around 0.8 to 1.1 Mb upstream of RUNX1, which showed pronounced haplotype-specific NO, we tested 69 CREs for haplotype-specific NO, which identified two significant CREs. e, Haplotype-specific NO measured between der(8) and corresponding regions of the unaffected homolog. Red, regions corresponding to the fused TAD. 
f, A beeswarm plot shows that the fused TAD (red) is an outlier in terms of haplotype-specific NO on der(8) (P values based on Kolmogorov–Smirnov tests; n = 83 TADs in der(8); boxplot was defined by minima = 25th percentile – 1.5× IQR, maxima = 75th percentile + 1.5× IQR, center = median and bounds of box = 25th and 75th percentile).

To systematically investigate potential local effects, we used a sliding window (Methods) to measure NO on both sides of the translocation breakpoint. We observed decreased NO, suggesting increased chromatin accessibility, from the breakpoint junction up to the respective nearest topologically associating domain (TAD) boundaries (Fig. 3c). This signal was most pronounced in an enhancer-rich region around 0.8 to 1.1 Mb upstream of RUNX1 originating from chromosome 21 (P < 0.003; likelihood ratio test, adjusted using permutations; Fig. 3c), found to physically interact with the RUNX1 promoter in CD34⁺ cells⁵⁶. Within this segment, we identified two CREs with significantly reduced NO (10% FDR; Exact test) (Fig. 3d and Supplementary Table 5), which may foster RUNX1-RUNX1T1 expression. Chromosome-wide analysis showed haplotype-specific NO patterns were restricted to the fused TAD (Fig. 3e,f), in line with these patterns resulting from the translocation.

We also revisited Strand-seq datasets with previously reported copy-neutral SVs, including the BM510 cell line in which copy-neutral interchromosomal SVs resulted in TP53–NTRK3 gene fusion²⁴. In agreement with the oncogenic role of TP53–NTRK3 (ref. ²⁴), scNOVA identified NTRK3 upregulation as the only significant local effect (10% FDR), consistent with allele-specific TP53–NTRK3 expression measured on the rearranged haplotype (Extended Data Fig. 5). Second, we revisited a 2.6 Mb inversion mapping to 14q32 in a T-cell acute lymphoblastic leukemia (T-ALL) patient-derived xenograft (T-ALL_P1)²⁴. scNOVA discovered downregulation of BCL11B, a known haploinsufficient T-ALL tumor suppressor⁵⁷, as a significant local effect of this balanced inversion, supporting allele-specific silencing of BCL11B on the rearranged haplotype as measured by RNA-seq²⁴ (Extended Data Fig. 6). These data collectively show that scNOVA allows linking balanced SVs to their local functional consequences—a functionality not provided by any previous single-cell multiomic method²⁰.

Extended Data Fig. 5 — (a, b) Haplotype-specific analysis of NO at gene bodies genome-wide in RPE-1 (a) and BM510 (b). For each chromosomal karyogram, the y-axis indicates the significance of haplotype-specific NO for each gene (-log10 p.adjust). Significant genes are indicated with red dots (FDR 10%; two-sided Wilcoxon rank sum test followed by Benjamini–Hochberg multiple correction; derived from n = 33 cells and n = 79 cells for RPE-1 and BM510, respectively). Boxplots were defined by minima = 25th percentile - 1.5X interquartile range (IQR), maxima = 75th percentile + 1.5X IQR, center = median, and bounds of box = 25th and 75th percentile. *NTRK3* (identified in BM510) is the only significant gene adjacent to an SV breakpoint. Haplotype-resolved RNA expression at the *NTRK3* locus is depicted using bar graphs in the right panel (two-sided likelihood ratio test followed by Benjamini–Hochberg multiple correction; n = 2 biological replicates; data are presented as mean values +/− SEM). (c, d) Haplotype-specific NO analysis at CREs. The browser track depicts the haplotype-resolved NO of the not rearranged (Ref) homolog in red, and the SV homolog in blue. scNOVA identified two CREs with significant haplotype-specific NO: an intergenic CRE spanning chr15:87527100-87528100 (p.adjust = 0.029, log2-fold change = −2.01) (c) and an intronic CRE at chr15:88246388-88247388 (p.adjust = 0.076, log2-fold change = −1.39) (d).

Extended Data Fig. 6 — (a) For each chromosomal karyogram, the y-axis indicates the significance of haplotype-specific NO at each gene (-log10 p.adjust). Genes with haplotype-specific NO are indicated using red dots (FDR 10%). An inset depicts haplotype-specific NO (two-sided Wilcoxon rank sum test and Benjamini–Hochberg multiple correction; n = 56 cells) and RNA expression at the *BCL11B* gene locus (two-sided likelihood ratio test and Benjamini–Hochberg multiple correction; n = 2 biological replicates), which has a nearby somatic SV (within 1 Megabase) and represents the (only) predicted local SV effect. (b) We did not detect haplotype-specific NO for *TCL1A* (two-sided Wilcoxon rank sum test and Benjamini–Hochberg multiple correction; n = 56 cells), a small gene of 4229 bp in size, in spite of its haplotype-specific gene expression²⁴ (two-sided likelihood ratio test and Benjamini–Hochberg multiple correction; n = 2 biological replicates). Boxplots were defined by minima = 25th percentile - 1.5X interquartile range (IQR), maxima = 75th percentile + 1.5X IQR, center = median, and bounds of box = 25th and 75th percentile. For bar graphs, data are presented as mean values +/− SEM (a, b). (c) Simulation analysis revealed a minimum gene length (7219 bp) needed to robustly detect haplotype-specific NO at gene bodies, a gene length met by 80% of genes in the genome (Supplementary Notes). (d) Inversion breakpoints and rearranged TADs. Known 3’ *BCL11B* enhancers¹¹² are depicted in orange. In the not rearranged haplotype, they are located proximal to *BCL11B*, but in the inverted haplotype these enhancers are located far away from *BCL11B*, and proximal to *TCL1A* in a different TAD. (e) Application of scNOVA identified an intergenic CRE near *BCL11B* with haplotype-specific NO. The browser track depicts the haplotype-resolved NO of the not rearranged (Ref) homolog in red and the SV homolog in blue. 
(f) The known 3’ *BCL11B* enhancer does not show significant haplotype-specific NO, but the inversion physically relocates these enhancers far away from *BCL11B*. A representative CRE is shown amongst four CREs overlapping with known 3’ *BCL11B* enhancers.

Dissecting functional effects of heterogeneous somatic SVs

We next set out to functionally dissect a leukemia sample with unknown genetic drivers, by characterizing B-cells from a 61-year-old patient with chronic lymphocytic leukemia (CLL) (CLL_24)⁵⁸. Analysis of 86 Strand-seq libraries revealed an unprecedented level of somatic SVs, with 11 different karyotypes represented by 13 SVs occurring in subclones with CFs of 1–5% (Supplementary Table 3). This vastly exceeds intrapatient diversity estimates for CLLs from the Pan-Cancer Analysis of Whole Genomes (PCAWG), where maximally three subclones were reported⁵⁹, highlighting how Strand-seq provides access to SVs escaping discovery by WGS^3,24. Chromosome 10q showed especially pronounced subclonal heterogeneity; we identified seven partially overlapping deletions ranging from 2 to 31 Mb in size, and residing proximal to the fragile site FRA10B⁶⁰ (Fig. 4a and Supplementary Fig. 18). These SVs clustered into a 1.4 Mb ‘minimal segment’ at 10q24.32, arising independently from both haplotypes (Fig. 4b). While previous studies reported somatic 10q24.32 deletions in 1–4% of CLLs^61–63, molecular analysis of this recurrent somatic SV has so far been lacking.

Fig. 4 — a, Single-cell SV discovery in CLL_24. All cells exhibiting deletions (10q-Del) shown in Supplementary Fig. 18. 10q-Ref, cells bearing a not rearranged 10q. b, Minimal deleted region (chr10:101615000-103028000; hg38), displaying recurrent deletions in a separate cohort of CLLs⁶². c, Heatmap of genes with altered activity in 10q-Del based on scNOVA (alternative mode; 10% FDR). Genes from all significant pathways reported in d are highlighted. d, Pathway modules with differential activity, in cells exhibiting 10q-Del (10% FDR). e, Minimal deleted region-bearing CLL samples from the International Cancer Genome Consortium (ICGC) demonstrate overexpression of Wnt signaling genes compared with 10q-Ref (P = 0.0098; two-sided likelihood ratio test; n = 174 and n = 4 independent CLL samples for 10q-Ref and 10q-Del, respectively). f, Pathway activities ((–1) × Z score of NO) derived from jointly modeled NO at the gene bodies of Wnt signaling pathway genes for each SV-bearing CLL_24 cell. SIa-SId correspond to single cells exhibiting a deletion at 10q24 not shared by any other cell. n = 2, 3, 2 and 1 cells are depicted in the plot for SCa, SCb, SCc and SIa-SId, respectively. g, Single-cell gene set enrichment scores for five leukemia-related pathways from CITE-seq. Enrichment scores for 10q-terDel (n = 82) and 10q-Ref (n = 2,381) cells were compared using two-sided t-tests. h, Chart depicting 43 differentially active TFs between 10q-terDel and 10q-Ref cells based on DoRothEA⁶⁸. Genes involved in the pathways over-represented by these TFs are annotated using colored dots. i, Differentially expressed surface protein CD279 (PD-1) in 10q-terDel (n = 82) compared with 10q-Ref (n = 2,381) cells based on a two-sided Wilcoxon rank sum test. j, Wnt pathway diagram showing the altered genes or TFs in SCb (10q-terDel) identified by scNOVA (blue nodes) and CITE-seq (red borders). 
Gray, known (see PubmedIDs) and computationally predicted regulators (based on Gene Ontology Biological Process (GOBP)) of Wnt signaling that are deleted in SCb. Throughout the figure, boxplots were defined by minima = 25th percentile – 1.5× IQR, maxima = 75th percentile + 1.5× IQR, center = median and bounds of box = 25th and 75th percentile.

We first compared all cells bearing a 10q24.32 deletion (‘10q-Del’, N = 11) to cells lacking such SV (‘10q-Ref’, N = 75), hence disregarding the fine-scale subclonal structure of CLL_24, and predicted 115 dysregulated genes (Fig. 4c and Supplementary Table 2). Next, we performed molecular phenotype analysis using MsigDB⁶⁴ (Methods), which revealed that 10q-Del cells exhibit increased activity in several leukemia-relevant signaling pathways, including Wnt, c-Met (a pathway promoted by Wnt signaling⁶⁵), B cell receptor (BCR) signaling, phosphatidylinositol (3,4,5)-trisphosphate (PIP3) signaling and the CREB pathway (10% FDR; Fig. 4d). RNA-seq data available for 178 CLLs⁶² and stratified by 10q24.32 status, revealed upregulation of Wnt and c-Met signaling—but not of BCR, PIP3 and CREB signaling—in CLLs exhibiting 10q24.32 deletions (10% FDR; CLLs with 10q-Del: N = 4; 10q-Ref: N = 174; Fig. 4e and Supplementary Fig. 24). These data therefore suggest a link between 10q24.32 deletion and the promotion of Wnt signaling.

We further tested whether the different 10q-Del events seen in CLL_24 subclones have led to distinct functional outcomes, focusing on three subclones represented by at least two cells: ‘SCa,’ showing one interstitial deletion directly at the minimal segment; ‘SCb,’ harboring a terDel, with the breakpoint located at the minimal segment boundary and ‘SCc,’ containing two interstitial deletions, at the minimal segment and at 10q23.31 (Fig. 4b and Supplementary Table 3). Molecular phenotype analysis of each subclone identified 109, 206 and 266 differentially active genes, respectively (Supplementary Table 2), with the most pronounced levels of Wnt upregulation in SCb and SCc (Fig. 4f). SCb showed the highest activation of c-Met, BCR and PIP3 signaling, whereas CREB signaling was highest in SCc (Supplementary Fig. 21). This suggests that deletion location and length at 10q24.32 affect their molecular consequences, and furthermore illustrates the ability of scNOVA to predict molecular differences in subclones represented by as few as two cells.

To more deeply characterize the CLL_24 subclones, we generated CITE-seq (cellular indexing of transcriptomes and epitopes by single-cell sequencing) data, which couples scRNA-seq with protein surface marker measurements⁶⁶. Again, we attempted SCNA discovery in the scRNA-seq data, which failed to detect any SCNAs, or subclones, in CLL_24 (Supplementary Table 4). However, targeted SCNA recalling⁴⁶ identified 82 CITE-seq cells harboring the greater than 31 Mb 10q-terDel of SCb (‘10q-terDel’), whereas the deletions in SCa (2.2 Mb) and SCc (2.1 Mb and 1.9 Mb, respectively) escaped detection (Extended Data Fig. 7 and Supplementary Notes). Having recovered the SCb subclone in the CITE-seq data, we performed single-cell gene set enrichment analysis⁶⁷ (Methods), which verified that all pathways inferred by scNOVA (Wnt, c-Met, BCR, PIP3 and CREB) are upregulated in 10q-terDel cells (Fig. 4d,g). A gene regulatory network analysis⁶⁸ comparing 10q-terDel with 10q-Ref cells identified 43 differentially active TFs (FDR 10%; Fig. 4h) and a functional enrichment analysis⁶⁹ showed over-representation of Wnt signaling, BCR signaling and the PD-1 checkpoint pathway (Supplementary Table 16 and Fig. 4h); the PD-1 checkpoint pathway has been linked to immune resistance and transformation of CLL to aggressive lymphoma^70,71. Since somatic lesions mediating PD-1 expression in CLL have remained elusive, we used the CITE-seq data to analyze PD-1 protein expression, which demonstrated upregulation of PD-1 in 10q-terDel-containing cells as the only significant hit at the protein level (Fig. 4i). Notably, NFATC1, a TF predicted to be differentially active by both scNOVA and CITE-seq, regulates Wnt⁷², PIP3 (refs. ^73,74), CREB⁷⁵ and BCR signaling⁷⁶ as well as PD-1 expression⁷⁷, and thus may contribute to global pathway dysregulation in CLL_24. Our analysis reveals subtle pathway activities of somatic deletions present at low CF (Fig. 4f,j), and collectively implicates 10q24.32 deletions in dysregulated Wnt signaling—a crucial pathway for CLL pathogenesis⁷⁸.

Extended Data Fig. 7 — (a) InferCNV⁴⁸ analysis of 3,919 high quality CLL cells, and 540 control cells (cells sequenced by CITE-seq not originating from the B-cell lineage; see Supplementary Fig. 25), profiled by CITE-seq. This analysis did not discover any subclones in CLL_24. (Note that the high variability observed on the 6p-arm, not only seen in CLL cells but also in control cells, likely arose from the presence of *MHC* genes in this locus, whose expression is cell cycle dependent¹¹³.) (b) CONICSmat based targeted SCNA recalling of the 10q-terDel (previously discovered in SCb; see Fig. 4b) using the high-resolution breakpoints derived from Strand-seq. Use of these SV breakpoints allowed CONICSmat to confidently call the 10q-terDel in 82 single cells from the CITE-seq data.

Functional characterization of subclonal chromothripsis

While chromothripsis is a widespread mutational process in cancer^3,4,22, this process is not ascertained by previous single-cell multiomic methods, and its molecular outcomes remain largely elusive^3,79. We previously discovered a subclonal chromothripsis event²⁴ in T-ALL_P1 that affects most of 6q (denoted ‘6q-CT’; CF = 30%) (Fig. 5a and Supplementary Table 3); however, the consequences of this complex rearrangement were uncharacterized. Using scNOVA, we identified 12 genes with differential NO between 6q-CT and 6q-Ref cells (denoted the ‘CT gene signature’; 10% FDR; Fig. 5a,b and Supplementary Table 2). A closer analysis showed 27 TF genes overlapping the chromothriptic region (Fig. 5a). Gene set over-representation testing using the target genes of these TFs revealed that c-Myb, product of the MYB oncogene, was significantly enriched among the genes included in the CT gene signature (10% FDR; adjusted P = 0.00015; Fig. 5b,c and Supplementary Table 6). The MYB gene is located within a region that was duplicated (and inverted) as a result of 6q-CT, suggesting a potential dosage effect (Fig. 5a). Corroborating these predictions, we performed RNA-seq in a panel of 13 T-ALLs, amongst which T-ALL_P1 showed the highest expression of c-Myb targets (Fig. 5d and Supplementary Table 7). We also verified that MYB is allele-specifically expressed from the SV-affected haplotype (P = 0.0317; likelihood ratio test; Supplementary Fig. 30), which together nominates MYB as a candidate driver gene dysregulated as a consequence of 6q-CT.

Fig. 5 — a, The 27 TF genes located in a segment that underwent chromothripsis²⁴ on 6q in T-ALL_P1. Haplotype-specific NO measurements, which scNOVA generated for CREs assigned to the nearest genes, are depicted below. FC of normalized haplotype-specific NO is shown for each subclone. 6q-CT, subclone bearing chromothripsis on 6q; 6q-Ref, subclone bearing a not rearranged chromosome 6. b, Heatmap of 12 genes with differential activity between subclones in T-ALL_P1, based on scNOVA (denoted CT gene signature). Asterisks denote TF targets highlighted in c. c, TF target over-representation analyses for CT gene signature, revealing c-Myb as the only significant hit. Venn diagram depicting enrichment of c-Myb targets (P value based on an FDR-adjusted hypergeometric test). Upper right, network with c-Myb and its target genes based on scNOVA, combined with previous knowledge. d, Mean Z scores of c-Myb target gene expression measured by bulk RNA-seq in a panel of 13 T-ALL-derived samples. T-ALL_P1 (P1) exhibited the overall highest expression of c-Myb targets.

To more deeply characterize this sample, we generated scRNA-seq data for T-ALL_P1 (5,504 cells; Fig. 6a). Since scRNA-seq-based SCNA discovery^46–48 missed the 6q-CT event (Supplementary Table 4), we again performed targeted SCNA recalling (Supplementary Notes), generating confident calls for 838 (around 15%) cells in the scRNA-seq dataset (the remaining 4,666 cells lacked a confident assignment; ‘NA’). Out of these 838 cells, 729 were predicted to harbor the 6q-CT event, and 109 were called 6q-Ref. Unsupervised clustering⁸⁰ of the scRNA-seq data stratified by 6q status (Methods) revealed that 6q-CT cells (as predicted through targeted recalling) were enriched in two expression clusters (clusters 3 and 7; P = 3.43 × 10^–5 and 1.15 × 10^–3; FDR-adjusted Fisher’s exact test; Fig. 6d and Supplementary Fig. 34), in line with a distinctive expression profile. To corroborate this, we applied UCell⁸¹ to assign cells into ‘6q-CT’ or ‘6q-Ref’ based on the CT gene signature, which confirmed enrichment of 6q-CT in clusters 3 and 7 (Fig. 6c,d; P = 3.39 × 10^–38 and P = 2.15 × 10^–4; FDR-adjusted Fisher’s exact test). Trajectory analysis⁸² showed the 6q-CT cells (as defined by UCell) were enriched for DNearly (double-negative early; P = 2.78 × 10^–13), DNQ (double-negative quiescent; P = 1.27 × 10^–5) and DPP (double-positive proliferating; P = 1.88 × 10^–7) T cells (FDR-corrected Fisher’s exact tests; Fig. 6b and Supplementary Fig. 35), and depleted of mature CD4⁺ T cells (P = 1.45 × 10^–11, Supplementary Fig. 35). This suggests a potential differentiation block at the progenitor stage as a result of 6q-CT and, more generally, that 6q-CT cells bear a distinctive molecular phenotype as a result of the chromothriptic rearrangements.

Having identified c-Myb pathway activation as a consequence of 6q-CT in T-ALL_P1, we hypothesized this molecular phenotype could guide drug targeting in cell culture. We selected NOTCH1 as a suitable candidate for targeting this subclone because this c-Myb target (1) was inferred by scNOVA to be highly upregulated in 6q-CT cells (Fig. 5b) and (2) is targetable by different compounds and strategies⁸³. We treated T-ALL_P1 cell cultures with the CB-103 pan-NOTCH small-molecule inhibitor (targeting the Notch1 intracellular domain (N1-ICD)^84,85) or a vehicle control for 8 h and 24 h (Methods). Using scRNA-seq (3,663 single cells) to analyze drug response patterns, we inferred 6q-CT and 6q-Ref cells at each timepoint by transferring the cell annotation labels from the untreated (reference) sample with Seurat⁸⁰ (Fig. 6c and Supplementary Fig. 37). After 24 h in culture, vehicle-treated T-ALL_P1 cells showed a 45% relative increase in the 6q-CT subclone compared to 8 h (CF of 17.1% to 24.6%; P = 0.0180; FDR-adjusted Fisher’s exact test), indicating that 6q-CT cells expanded clonally. By contrast, upon CB-103 treatment, the CF of the 6q-CT subclone was reduced at 24 h (to CF = 15.5%; P = 0.0064; Fig. 6e and Supplementary Fig. 38), indicating that 6q-CT cells were preferentially lost with N1-ICD inhibition. Additionally, we observed specific depletion of the REACTOME N1-ICD gene set only in 6q-CT cells after 24 h of CB-103 treatment, consistent with specific subclone targeting (P = 0.0096; FDR-adjusted Wilcoxon rank sum test; Fig. 6f and Supplementary Fig. 39). These results highlight the potential of scNOVA to functionally characterize highly complex classes of DNA rearrangement (that is, chromothripsis events), and to clinically target subclones bearing complex cancer driver SVs.

Discussion

The functional characterization of SVs is of critical importance for precision oncology^1–3. Our method characterizes a wide spectrum of SV classes²⁴, and couples these with NO analysis to link somatic SVs to local or global gene activity changes. Accounting for balanced SVs, scNOVA allows the investigation of copy-number stable (that is, euploid) malignancies previously inaccessible to single-cell multiomics^3,20 (Supplementary Table 12). Strand-seq derived SCNA calls were far better resolved compared to scRNA-seq based calls (Supplementary Table 4), suggesting a more limited utility of scRNA-seq data for discovering SCNA drivers in cancer, with the exception of malignancies displaying extremely high levels of chromosomal instability with particularly large-scale SCNAs^3,86.

We uncovered unprecedented karyotypic diversity in a CLL sample, comprising distinct deletions at 10q24.32, which we link to leukemia-related signaling pathways, particularly Wnt signaling. Read depth based profiling of SCNAs is prone to underreport such subclonal structural diversity³. Enrichment of cases bearing 10q24.32 deletions amongst relapsed/refractory and high-risk CLL⁸⁷ suggests a potential role of Wnt pathway dysregulation mediated through 10q24.32 in disease progression. Whether the FRA10B fragile site is involved in the formation of these deletions remains to be seen and requires larger cohorts. Interestingly, CLL_24 exhibits a SNP (rs118137427; 3.7% allele frequency in Europeans) within FRA10B associated with the acquisition of 10q-terDel in normal blood⁸⁸. Based on the PCAWG resource comprising 94 CLLs², rs118137427 is seen in 2 out of 4 (50%) CLLs with 10q24.32 deletions, but in only 6 of 90 (6.7%) CLLs with 10q-Ref (P = 0.035; Fisher’s exact test), suggesting a possible link between SNPs at FRA10B and ITH in leukemia that warrants future investigation.
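The Fisher’s exact test quoted above for rs118137427 can be reproduced from the stated counts alone (2 of 4 CLLs with 10q24.32 deletions versus 6 of 90 CLLs with 10q-Ref). The following is a minimal sketch, assuming a two-sided test that sums the probabilities of all tables no more likely than the observed one; the function name is illustrative and is not part of scNOVA.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables (with fixed margins)
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: 10q24.32-deleted CLLs, 10q-Ref CLLs; columns: SNP carriers, non-carriers
p_value = fisher_exact_two_sided(2, 2, 6, 84)
print(round(p_value, 3))
```

With these counts the one-sided (enrichment) and two-sided versions of the test coincide, because only tables with two or more carriers among the four deleted cases are as extreme as the observed one.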

Our framework enables the functional characterization of complex rearrangements previously inaccessible to single-cell multiomics³. Complex somatic SVs are prevalent in cancer and linked with aggressive tumor phenotypes^2,3,22, underlining the significant potential of scNOVA for the comprehensive functional characterization of cancer cells. Since scNOVA does not require coupling distinct experimental modalities in each individual cell, it overcomes important methodological challenges²⁰, including data sparseness and higher costs from generating data for more than one modality^20,89. Additionally, the coverage achieved by Strand-seq enables the analysis of haplotype-specific NO along the entire genome (Supplementary Fig. 41), providing advantages over classical allele-specific analyses that are restricted to regionally phased SNPs¹⁵.

Nonetheless, important challenges remain, and the full spectrum of mutations arising in an individual cell is likely to remain inaccessible to a single method in the foreseeable future. Strand-seq does not capture SVs smaller than 200 kb, which more rarely act as cancer drivers². Additionally, while scNOVA infers differentially active genes, it does not span the same dynamic expression range as scRNA-seq (Supplementary Table 12). This suggests that pairing scNOVA with targeted SCNA recalling by scRNA-seq can provide added value by allowing variants outside the detection range of other methods to be characterized. Finally, Strand-seq requires dividing cells for BrdU labeling²³ (Fig. 1a), and is therefore not applicable to nondividing cells or fixed samples. However, it can be used for dividing cells in organoids, primary fresh frozen progenitor cells, cells in regenerating tissues and cancer samples amenable to culture. Our study used cell lines for benchmarking followed by proof-of-principle application in patient samples. Generalization of these analyses to larger cohorts will allow systematic investigation of the roles subclonal SVs play in leukaemogenesis.

We foresee a wide variety of potential future applications. Our framework offers potential for studies on the determinants and consequences of chromosomal instability in cancer, and may promote research into the interplay of genetic and nongenetic cancer determinants²⁰. It likewise could be used to advance surveys of precancerous lesions^3,90. Additionally, scNOVA may offer value in precision oncology by exposing subclonal driver alterations along with their targetable functional outcomes, to target cancer subclones in patients. Furthermore, SVs can accidentally arise in key model cell lines, as we demonstrate for widely used LCLs, and the features of scNOVA are ideally suited to functionally characterize unwanted heterogeneity in such samples. Unwanted somatic SVs also arise as a by-product of CRISPR-Cas9 genome editing, which generates micronuclei and chromosome bridges in human primary cells, structures that initiate the formation of chromothripsis⁹¹. scNOVA could promote the safety of therapeutically relevant genome editing in the future, by enabling the simultaneous detection and functional characterization of such potentially pathogenic editing outcomes.

In summary, scNOVA moves directly from SV landscapes to their functional consequences in heterogeneous cell populations. By making a broad spectrum of somatic SVs accessible for functional characterization genome-wide, this single-cell multiomic framework serves as a foundation for deciphering the impact of somatic rearrangement processes in cancer.

Methods

Strand-seq library preparation

NA20509 Strand-seq libraries were prepared as previously described⁹⁴. Strand-seq libraries of primary leukemia samples were generated as follows: peripheral blood mononuclear cells of a previously untreated female CLL patient (routine diagnostics: IGHV unmutated, no TP53 mutation, no detected alteration in 6q21, 8q24, 11q22.3, 12q13, 13q14 and 17p13) were isolated after obtaining informed consent. Cells were isolated and cultured using previously established protocols⁹⁵. CLL cells were cultured at 1 × 10⁶ cells ml^–1 in Roswell Park Memorial Institute (RPMI) medium (Gibco by Life technologies), supplemented with 10 % human serum (PAN BIOTECH), 1% Pen/Strep (GIBCO by Life Technologies) and 1% Glutamine (GIBCO by Life Technologies). Cells were stimulated with 1 µg ml^–1 Resiquimod (Enzo) and 50 ng ml^–1 IL-2 (Sigma). BrdU (40 µM; Sigma) was incorporated for 90 h and 120 h, respectively, to perform nontemplate strand labeling. Single nuclei from each timepoint were sorted into 96-well plates using a BD FACSMelody cell sorter, followed by Strand-seq library preparation (described below). In the case of the AML sample, frozen primary mononuclear cells from a bone marrow aspirate were thawed and stained with CD34-APC (clone 581; Biolegend), CD38-PeCy7 (clone HB7; eBioscience), CD45Ra-FITC (clone HI100; eBioscience), CD90-PE (clone 5E10; eBioscience) and LIVE/DEAD Fixable Near-IR Dead Cell Stain (Thermofisher). Single, viable, CD34⁺ cells (Supplementary Fig. 15) were sorted using a BD FACSAria Fusion Cell Sorter into ice-cold serum-free expansion medium (SFEM) supplemented with 100 ng ml^–1 SCF and Flt3 (Stem Cell Technologies), 20 ng ml^–1 IL-3, IL-6, G-CSF and TPO (Stem Cell Technologies). Cells were plated in Corning Costar Ultra-Low Attachment 96-well flat-bottom plates (Sigma) at 1 × 10⁵ cells ml^–1 in warm medium as above. At 24 h after culture, 40 µM BrdU was added. 
Nuclei were isolated after 43 h total culture time, and BrdU-incorporating nuclei sorted into 96-well plates followed by Strand-seq library preparation. All Strand-seq libraries were automatically prepared using a Biomek FXP liquid handling robotic system, as described previously^23,96. Libraries were sequenced on an Illumina NextSeq 500 sequencing platform (MID-mode, 75 base pair (bp) paired-end sequencing protocol).

Strand-seq data preprocessing

Reads from Strand-seq libraries (FASTQ format) were aligned to the hg38 assembly using BWA⁹⁷, as previously described²⁴. Low-quality reads (MAPQ < 10), supplementary reads and duplicate reads were removed. Single-cell library selection was performed as described previously²⁴. The single-cell footprints of different SV classes were discovered by applying the scTRIP principle to the Strand-seq data, using the MosaiCatcher computational pipeline with default settings²⁴.
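The three read-level filters described above can be written as a single predicate over the FLAG and MAPQ fields of an alignment record. The sketch below is illustrative only and is not taken from the MosaiCatcher pipeline; it assumes SAM-formatted text input, uses the standard SAM flag bits for duplicates (0x400) and supplementary alignments (0x800), and the function names are hypothetical.

```python
# Illustrative sketch; not the authors' pipeline code.
DUPLICATE = 0x400      # SAM flag bit: PCR or optical duplicate
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
MIN_MAPQ = 10          # reads with MAPQ < 10 are removed (threshold from the text)


def keep_read(flag: int, mapq: int) -> bool:
    """Return True if a read passes the MAPQ, duplicate and supplementary filters."""
    if mapq < MIN_MAPQ:
        return False
    if flag & (DUPLICATE | SUPPLEMENTARY):
        return False
    return True


def filter_sam_lines(lines):
    """Yield SAM header lines unchanged, plus alignment lines that pass keep_read."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # SAM columns 2 (FLAG) and 5 (MAPQ)
        if keep_read(flag, mapq):
            yield line
```

In practice the same criteria are usually applied to BAM files with a dedicated tool; the predicate form is shown here only to make the filtering logic explicit.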

Coupling NO measurements and SV discovery in the same cell with scNOVA

We developed scNOVA as a computational framework for coupling discovered somatic SVs with analyses of NO profiles in the same cell. The scNOVA workflow covers a set of different operations from single-cell SV discovery (using the previously described scTRIP method²⁴) to NO profiling at CREs, and gene as well as pathway dysregulation inference based on NO at gene bodies, and can be used in a haplotype-aware or -unaware manner (Extended Data Fig. 1). To maximize reusability, interoperability and reproducibility, we combined all scNOVA modules into a coherent workflow using Snakemake. Alternatively, these modules can be executed individually.

Data analysis and operational definition utilized for NO

We operationally defined NO closely following definitions from a previous study²⁸: NO maps were calculated by counting how many reads from the Strand-seq libraries (which typically comprise mono-nucleosomal fragments around 140–180 bp in size; see Supplementary Table 1 and Supplementary Fig. 1) covered a given bp based on aligning reads to the GRCh38 (hg38) genome assembly with BWA⁹⁷. Genomic regions with unusual (such as artificially high) coverage were considered artifacts, and were automatically excluded (‘blacklisted’) by our Strand-seq analysis workflow as previously described²⁴. No further peak calling or smoothing was conducted, and no assumptions on the length of the nucleosomal DNA were made to derive NO maps, as nucleosome boundaries were determined on both sides of the nucleosome by paired-end sequencing²⁸. For the calculation of NO around bound CTCF binding sites (downloaded from ENCODE³⁴), the averaged profile was scaled²⁸ to yield an NO equal to 1 at position –2,000 bp from the center of the bound CTCF site.
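The operational definition above amounts to a per-base fragment coverage count followed, for the CTCF analysis, by a rescaling to a reference position. The following minimal sketch illustrates both steps under stated assumptions: fragments are given as half-open (start, end) intervals spanning both mates of a read pair, the averaged profile is a plain list, and the function names are hypothetical rather than part of scNOVA.

```python
def occupancy_map(fragments, region_start, region_end):
    """Count, for each base in [region_start, region_end), how many fragments cover it.

    fragments: iterable of (start, end) half-open intervals, one per paired-end fragment.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)  # difference array: +1 at fragment start, -1 at its end
    for start, end in fragments:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    coverage, running = [], 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


def scale_to_reference(profile, ref_index):
    """Scale a profile so that its value at ref_index equals 1."""
    ref = profile[ref_index]
    if ref == 0:
        raise ValueError("reference position has zero occupancy")
    return [value / ref for value in profile]
```

For an averaged profile that spans –2,000 bp to +2,000 bp around the CTCF site center, the reference position described in the text corresponds to `ref_index = 0`.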

Cell type classification

We generated feature sets from the NO at the body of genes (defined as the region from the TSS to the transcription termination site, which includes exons and introns) at the single-cell level. When several sequencing batches from the same samples were available, we applied batch correction to the NO count matrix using ComBat-seq⁹⁸. NO in gene body regions was normalized by segmental copy number status, and by library size to obtain reads per million, which we transformed into log₂ scale. This feature set was used for the unsupervised dimension reduction plot (Extended Data Fig. 3) and for training of a supervised classification model based on PLS-DA⁹⁹.
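The normalization steps named above (copy number, library size, log₂ transformation) can be sketched as follows. The text does not give the exact formula, so this example makes two explicit assumptions: counts are rescaled to a diploid baseline by multiplying by 2 / copy number, and a pseudocount of 1 is added before the log₂ transformation. Both choices, and the function name, are illustrative.

```python
import math


def normalize_gene_body_no(counts, copy_number, library_size, pseudocount=1.0):
    """Return log2 reads-per-million NO per gene, adjusted for segmental copy number.

    counts:       dict gene -> read count in the gene body (one cell)
    copy_number:  dict gene -> integer copy number of the segment containing the gene
    library_size: total number of reads in the library
    """
    normalized = {}
    for gene, n_reads in counts.items():
        cn = copy_number.get(gene, 2)  # assumption: diploid unless stated otherwise
        if cn <= 0:
            continue  # segment absent in this cell; NO cannot be measured
        cn_adjusted = n_reads * 2.0 / cn           # assumption: rescale to diploid baseline
        rpm = cn_adjusted * 1e6 / library_size     # reads per million
        normalized[gene] = math.log2(rpm + pseudocount)  # assumption: pseudocount of 1
    return normalized
```

With this scaling, a gene in a duplicated segment (four copies) with the same raw count as a diploid gene receives half the adjusted count, so that dosage alone does not appear as a difference in NO.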

Haplotype-phasing of single-cell NO tracks

As previously described, Strand-seq directly resolves its underlying sequence reads onto haplotypes ranging from telomere to telomere³¹ (chromosome-length haplotyping). scNOVA phases NO profiles onto a chromosomal homolog using the StrandPhaseR algorithm³¹, which is employed wherever the template strand segregation pattern of a chromosome enables unambiguous haplotype-phasing, that is, for Watson/Crick (WC) or Crick/Watson (CW) template state configurations in Strand-seq libraries^31,96. Haplotype-specific analyses pursued by scNOVA employ phased reads (normalized by locus copy number), whereas the inference of gene activity changes uses both phased reads (from chromosomes with a WC or CW configuration) and unphased reads (from chromosomes with a CC or WW configuration^31,96).
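The rule for which reads enter which analysis can be made explicit with a small helper that partitions the chromosomes of one cell by template strand state. This is a sketch only; the state labels follow the text, while the function name and input layout are hypothetical.

```python
PHASABLE_STATES = {"WC", "CW"}    # one Watson and one Crick template: reads can be phased
UNPHASABLE_STATES = {"WW", "CC"}  # both templates on the same strand: reads stay unphased


def split_by_phasability(template_states):
    """Partition the chromosomes of one cell by whether their reads can be phased.

    template_states: dict chromosome -> template state ('WC', 'CW', 'WW' or 'CC')
    Returns (phased_chromosomes, unphased_chromosomes).
    """
    phased, unphased = [], []
    for chrom, state in template_states.items():
        if state in PHASABLE_STATES:
            phased.append(chrom)
        elif state in UNPHASABLE_STATES:
            unphased.append(chrom)
        else:
            raise ValueError(f"unknown template state {state!r} for {chrom}")
    return phased, unphased
```

Haplotype-specific analyses would then draw only on the first list, whereas the inference of gene activity changes would use reads from both lists.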

Inference of haplotype-specific NO and identification of local effects of SVs

To dissect local effects of SVs, the scNOVA framework performs a genome-wide haplotype-specific NO analysis at gene bodies in pseudobulk, which yields a haplotype-specific NO matrix. Using this matrix, scNOVA then scans up to ±1 Mb around each somatic SV breakpoint to infer local effects of these breakpoints on haplotype-specific gene activity, using FDR-adjusted Wilcoxon rank sum tests. Once a local effect on gene activity is identified, scNOVA additionally provides the option to locally scan for CREs exhibiting haplotype-specific NO. To do so, user-provided CRE positions from the cell type of interest are used by scNOVA to calculate haplotype-specific NO at CREs, and the Exact test (10% FDR) is used for significance testing.
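The ±1 Mb scan around a breakpoint amounts to selecting the genes that overlap a window on the same chromosome. The sketch below shows only this selection step, under assumed names and a simple overlap rule; the haplotype-specific testing itself is not reproduced here.

```python
from typing import List, NamedTuple, Sequence


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_near_breakpoint(genes: Sequence[Gene], chrom: str, position: int,
                          flank: int = 1_000_000) -> List[str]:
    """Names of genes overlapping [position - flank, position + flank]."""
    lo, hi = position - flank, position + flank
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]
```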

Inference of genome-wide changes in gene activity

This haplotype-unaware module of scNOVA considers all reads, whether phased or not, to infer gene activity alterations via analysis of differential patterns of NO along gene bodies. scNOVA obtains gene loci from ENSEMBL (GRCh38.81), converted into bed format (Genebody_hg38.81.bed). Strand-seq reads falling within the start and end position of genes (Genebody_hg38.81.bed) were identified with the deepTools multiBamSummary function¹⁰⁰, using the following parameters: [multiBamSummary BED-file --BED Genebody_hg38.81.bed --bamfiles Input.bam --extendReads --outRawCounts output.tab -out output.npz]. The scNOVA gene dysregulation inference module contains two steps: Step 1 filters out genes unlikely to be expressed (‘not expressed’, NEs), whereas Step 2 infers dysregulated (that is, differentially expressed) genes between subclones using a generalized linear model.

In Step 1, scNOVA first aims to infer gene expression ‘On’ and ‘Off’ states¹⁰¹ from NO, by analyzing NO as well as gene context-specific sequence features along gene bodies using deep convolutional neural networks¹⁰² (CNNs).

By default, scNOVA operates with the model trained with a pseudobulk of 80 cells to estimate, for each clone, the probability that each gene represents an NE. Genes likely to be unexpressed (NE status probability ≥0.9) across clones are filtered out in Step 1, and all remaining genes are used in Step 2.
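The Step 1 filter can be illustrated with a short sketch. It assumes that ‘across clones’ means the NE probability reaches the threshold in every clone; this reading, the data layout and the function name are assumptions for illustration only.

```python
from typing import Dict, List


def genes_passing_step1(ne_probability: Dict[str, Dict[str, float]],
                        threshold: float = 0.9) -> List[str]:
    """Keep genes that are not confidently 'not expressed' in all clones.

    ne_probability maps gene -> {clone: probability of NE status}.
    A gene is dropped only if its probability is >= threshold in every clone.
    """
    return [gene for gene, by_clone in ne_probability.items()
            if not all(p >= threshold for p in by_clone.values())]
```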

In Step 2, scNOVA by default employs negative binomial generalized linear models, available in the DESeq2 algorithm¹⁰³, to infer genes with differential activity between individual cells or clones. As an input, scNOVA computes single-cell count tables of gene body NO. When running this step with subclones, all individual cells of the subclone are considered ‘replicates’ in DESeq2 terminology¹⁰³. Subclones (or cells) are compared in a pairwise manner using a two-sided Wald test to infer genome-wide alterations in gene activity. Based on this, we defined the differential gene activity score as the sign of the fold change (FC) in NO at gene bodies, multiplied by –log₁₀ P values. Genes with significantly altered activity were identified using a 10% FDR threshold. Additionally, to facilitate the analysis of small CF subclones, scNOVA provides an alternative mode which employs PLS-DA⁹⁹ to identify discriminatory feature sets as gene sets showing altered activity. To do this, scNOVA builds a PLS-DA⁹⁹ discriminant model to classify cells in a given subclone 1 and subclone 2 based on single-cell count tables of gene body NO as feature sets. This model provides a variable importance of projection (VIP) and significance compared with a null distribution in the form of a P value for each gene analyzed. Similar to the default setting, genes with altered activity were identified using a 10% FDR cutoff when using PLS-DA for inferring changes in gene activity between subclones. Benchmarking both modes (see Extended Data Fig. 4) suggested that, whereas both DESeq2 and PLS-DA offer acceptable performance, the alternative mode (PLS-DA) outperforms the default setting when the subclonal CF is below 10%, whereas the default mode (DESeq2) generated superior results for CF values of 10% or greater.
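The differential gene activity score defined above is the sign of the fold change multiplied by –log₁₀ P. A minimal sketch follows; the helper that picks a mode from the clonal fraction merely encodes the benchmarking observation as a rule of thumb and is a hypothetical convenience, not a documented scNOVA function.

```python
import math


def differential_activity_score(log2_fold_change: float, p_value: float) -> float:
    """Sign of the fold change in gene body NO times -log10(P)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    sign = (log2_fold_change > 0) - (log2_fold_change < 0)
    return sign * -math.log10(p_value)


def suggested_mode(clonal_fraction: float) -> str:
    """Rule of thumb from the benchmark: PLS-DA below 10% CF, else DESeq2."""
    return "PLS-DA" if clonal_fraction < 0.10 else "DESeq2"
```

Note that the score refers to the change in NO; as stated elsewhere in the text, increased NO at a gene body implies decreased expression.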

Genes with altered somatic copy number were masked (removed) when investigating gene activity changes based on NO at gene bodies, since differences in copy number status could confound differential NO measurements.

Molecular phenotype analysis in gene sets

This module of scNOVA uses defined gene sets, obtained from public resources, to identify over-represented sets of functionally related genes changing in activity between subclones (or individual cells). Two types of analyses are enabled by this module: (1) gene set over-representation analysis, which can be used to investigate, for example, the enrichment of targets of an important TF among genes showing a change in activity according to gene body analysis of NO; (2) joint modeling of NO across predefined gene sets, using pathway definitions from MSigDB⁶⁴. Throughout the manuscript, we applied an FDR of 10% (P_adj. < 0.1) as a significance threshold.
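The text does not name the procedure used to control the FDR. As one widely used option, the Benjamini–Hochberg adjustment is sketched below under that assumption; genes or gene sets with an adjusted P value below 0.1 would then pass the 10% threshold.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```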

In the case of gene set over-representation analysis, we collected TF target genes from database entries (EnrichR⁵⁰) as well as by reviewing the literature. When reviewing the literature, we created curated lists of target genes for TFs based on published genome-wide studies using the following strict criteria: (1) target genes show evidence of binding of the TF of interest by chromatin immunoprecipitation followed by sequencing (ChIP–seq); (2) the same genes must additionally show differential expression when the TF of interest is experimentally silenced (our curated target gene lists are available in Supplementary Table 7). For each TF, the significance of overlap between its target gene set and genes exhibiting differential NO was computed using hypergeometric tests, followed by controlling the FDR at 10%.
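The hypergeometric overlap test can be written out directly. The sketch below computes the upper-tail probability of observing at least the given overlap; the argument names are illustrative, and the choice of gene universe is left to the caller because the text does not specify it.

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, targets: int,
                             altered: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, targets, altered).

    universe : number of genes tested
    targets  : number of TF target genes within the universe
    altered  : number of genes with differential NO
    overlap  : number of altered genes that are also targets
    """
    if not (0 <= targets <= universe and 0 <= altered <= universe):
        raise ValueError("targets and altered must lie within the universe")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    upper = min(targets, altered)
    tail = sum(comb(targets, k) * comb(universe - targets, altered - k)
               for k in range(overlap, upper + 1))
    return tail / comb(universe, altered)
```

For example, with 10 genes in total, 4 targets and 3 altered genes of which 2 are targets, the P value is (36 + 4)/120 = 1/3.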

To jointly model differential NO across all genes of predefined pathways, scNOVA first generates a single-cell gene body NO table using Strand-seq read count data, with these read counts then being normalized using the median-of-ratios method from DESeq2 (ref. ¹⁰³). For each member in the biological pathway gene sets from MSigDB⁶⁴, scNOVA then computes mean normalized NO values, in each single cell, as a proxy for pathway-level NO. Lowly variable genes (s.d. <80%) are removed. Pathway-level NO is compared between cells with and without SVs using linear mixed model fitting followed by likelihood ratio testing, and controlling the FDR at 10%. For linear mixed model fitting, scNOVA defines SV status as a fixed effect and the different Strand-seq library batches as random effects.
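The median-of-ratios normalization and the pathway-level averaging can be sketched as below. This is a simplified re-implementation for illustration, not a call into DESeq2; restricting the size factor estimate to genes with nonzero counts in all cells follows the standard description of the method, and the data layout and function names are assumptions.

```python
import math
from statistics import median
from typing import Dict, Iterable, List


def median_of_ratios_size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Per-cell size factors; counts maps gene -> counts across cells."""
    usable = [row for row in counts.values() if all(v > 0 for v in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every cell")
    n_cells = len(usable[0])
    # Log geometric mean of each usable gene across cells.
    log_means = [sum(math.log(v) for v in row) / n_cells for row in usable]
    return [math.exp(median(math.log(row[c]) - lm
                            for row, lm in zip(usable, log_means)))
            for c in range(n_cells)]


def pathway_level_no(counts: Dict[str, List[float]],
                     pathway_genes: Iterable[str]) -> List[float]:
    """Mean normalized NO over the pathway's genes, for each cell."""
    factors = median_of_ratios_size_factors(counts)
    members = [g for g in pathway_genes if g in counts]
    if not members:
        raise ValueError("no pathway gene is present in the count table")
    return [sum(counts[g][c] / factors[c] for g in members) / len(members)
            for c in range(len(factors))]
```

With sparse single-cell tables, few genes may have nonzero counts in every cell, so a practical implementation may need a different estimator; that choice is outside the scope of this sketch.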

Quantitative real-time PCR

NA20509 was ordered from Coriell and taken into culture at passage four. The late passage was grown until passage eight in a time span of 8 weeks. HG01505 was taken into culture at passage five and was grown until passage nine within a total time span of 6 weeks. DNA, RNA and protein were isolated with the NucleoSpin TriPrep Mini kit (740966.50) according to the manufacturer’s protocol. qPCR was performed on genomic DNA. PCR primers for MAP2K3 and TP53 were obtained from Sigma. qPCR was performed using BD SYBR Green PCR Master Mix (4309155) with a final primer concentration of 300 nM each and 10 ng input gDNA. A GAPDH control region was used as a normalizer. The primer sequences for DNA qPCR are provided in Supplementary Table 17.

Drug treatment with CB-103

Primary human T-ALL cells were recovered from cryopreserved bone marrow aspirates of patients enrolled in the ALL-BFM 2009 study. Patient-derived xenografts were generated as previously described by intrafemoral injection of 1 million viable primary ALL cells in NSG mice¹⁰⁴. Patient-derived xenograft (T-ALL_P1)²⁴ cells were frozen until processing. Human hTERT immortalized primary bone marrow mesenchymal stroma cells (MSC; provided by D. Campana) were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium supplemented with 10% heat-inactivated fetal bovine serum, l-glutamine (2 mM), penicillin/streptomycin (100 IU ml⁻¹) and hydrocortisone (1 μM). MSCs were seeded in 24-well plates at a concentration of 500,000 cells per well in 1 ml Aim V medium. After 24 h, T-ALL cells were added at a concentration of 1.5 million cells per well in 1 ml Aim V. CB-103 (MedChemExpress, HY-135145) or DMSO (vehicle) as control was added after an additional 24 h at a concentration of 10 μM. After 8 h and 24 h, cells were trypsinized, collected and frozen in 90% fetal bovine serum/10% DMSO.

Single-cell RNA sequencing and data processing

For scRNA-seq library preparation, cryopreserved cells were thawed rapidly at 37 °C and resuspended in 10 ml warm RPMI medium with 100 μg ml⁻¹ DNase I. Cells were centrifuged for 5 min at 300g, and resuspended in ice-cold PBS with 2% fetal bovine serum and 5 mM EDTA. Cells were stained on ice with anti-murine-CD45-PE (mCD45) (clone 30-F11; BioLegend; 1:20) in the dark for 30 min. 4′,6-Diamidino-2-phenylindole (DAPI; 1:100) was added and cells were incubated in the dark for 5 min before sorting. Triple-negative cells (DAPI⁻ mCD45⁻ GFP⁻) were sorted (Supplementary Fig. 32) using a BD FACSAria fusion cell sorter into ice-cold 0.03% bovine serum albumin (BSA) in PBS. All isolated cells were used immediately for scRNA-seq libraries, which were generated as per the standard 10x Genomics Chromium 3′ (v.3.1 Chemistry) protocol. Completed libraries were sequenced on a NextSeq5000 sequencer (HIGH-mode, 75 bp paired-end).

Sequenced transcripts were aligned to both human and mouse genomes (GRCh38 and mm10) and quantified into count matrices using the Cell Ranger mkfastq and count workflows (10x Genomics, v.3.1.0, default parameters). The R package Seurat⁸⁰ (v.4.0.3) was used for quality control of single cells and unsupervised clustering of the data. Briefly, human cells were separated from multiplets/mouse contamination based on greater than 97% of their reads aligning to GRCh38. Further filtering for high-quality cells accepted only those with more than 200 but fewer than 20,000 total RNA counts, and a percentage of mitochondrial reads less than 10% for the untreated data, and less than 40% for the drug-treated samples. Finally, remaining mouse transcripts were removed before further analysis.
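The cell-level quality control thresholds stated above can be summarized in a small predicate. The sketch is illustrative only; the analysis itself was performed with Seurat, and the class and field names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    barcode: str
    human_fraction: float   # fraction of reads aligning to GRCh38
    total_counts: int       # total RNA counts in the cell
    mito_percent: float     # percentage of mitochondrial reads


def passes_qc(cell: CellStats, drug_treated: bool = False) -> bool:
    """Apply the thresholds from the text to a single cell."""
    mito_cutoff = 40.0 if drug_treated else 10.0
    return (cell.human_fraction > 0.97
            and 200 < cell.total_counts < 20_000
            and cell.mito_percent < mito_cutoff)
```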

In the untreated data, normalization, scaling and regression of mitochondrial read percentage were carried out using the scTransform package¹⁰⁵. Dimensionality reduction and differential expression analysis of identified clusters were performed as standard using Seurat. Trajectory analysis was performed using Monocle3 (ref. ¹⁰⁶). In the drug treatment data, individual Seurat objects that had been quality controlled as above were normalized by scTransform^105,107 and then integrated to correct for batch effects and allow for comparative analysis. To re-annotate clusters from the untreated data in the drug treatment data, the TransferData() function from Seurat⁸⁰ was used to project labels from our reference (that is, untreated data) onto the integrated drug treatment data. Single-cell gene set enrichment analysis was performed using the R package ‘escape’⁶⁷.

Cellular indexing of transcriptomes and epitopes by single-cell sequencing

A peripheral blood-derived sample (CLL_24) was recovered from cryopreservation as previously described¹⁰⁸ to reach viability above 90%. Then, 5 × 10⁵ viable cells were stained with a premixed cocktail of oligonucleotide-conjugated antibodies (Supplementary Table 14) and incubated at 4 °C for 30 min. The dilution used for each antibody is provided in Supplementary Table 14. Cells were washed three times with ice-cold washing buffer. Subsequently, preparation of bead-cell suspensions, synthesis of complementary DNA, and construction of single-cell gene expression and antibody-derived tag (ADT) libraries were performed using a Chromium single cell v.3.1 3ʹ kit (10x Genomics) according to the manufacturer’s instructions. Then, 3′ gene expression and ADT libraries were pooled in a ratio of 3:1, aiming for 40,000 (gene expression) and 15,000 (ADT) reads per cell, respectively. Sequencing was performed on a NextSeq 500 (Illumina). After sequencing, the Cell Ranger wrapper function (10x Genomics, v.6.1.1) cellranger mkfastq was used to demultiplex and to align raw base-call files to the human reference genome (hg38). The obtained FASTQ files were counted by the cellranger count command. Unless otherwise indicated, default settings were used. Single-cell gene set enrichment analysis was performed using the R package ‘escape’⁶⁷.

Single-cell gene signature scoring using UCell

The activity of the scNOVA-identified gene set from T-ALL_P1 in scRNA-seq data was profiled using the UCell package⁸¹. Briefly, signature genes considered were those with either increased (implying decreased expression) or decreased (implying increased expression) nucleosome occupancy (see Fig. 5b), or genes encoding TFs whose targets showed differential nucleosome occupancy (see Fig. 5c). The following gene set was used for T-ALL_P1: ‘PRKCB–’, ‘RPS6KA2–’, ‘FAM120B–’, ‘FAM86C1+’, ‘FBXO22+’, ‘RHOH+’, ‘SLC9A7+’, ‘NASP+’, ‘NOTCH1+’, ‘MRPL48+’, ‘MFSD9+’, ‘MVB12B+’, ‘MYB+’ (with ‘+’ for upregulated, and ‘–’ for downregulated). The score per single cell for the entire directional gene set was calculated using the AddModuleScore_UCell() function. Cells were considered to be ‘active’ for the signature genes if their respective UCell score was greater than or equal to the median UCell score of the entire dataset, plus the s.d. Similarly, for T-cell cell-type labeling, marker gene sets for T-cell subsets were obtained from Park et al.¹⁰⁹ and single cells were scored for their activity in each gene set. Cells were labeled by their best-fit cell type, that is the cell-type whose gene set gave the highest UCell score.
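Given per-cell UCell scores, the ‘active’ threshold (median plus s.d.) and the best-fit cell-type labeling can be sketched as follows. The scores themselves would come from UCell; this example covers only the downstream thresholding, and the use of the sample standard deviation is an assumption because the text does not state which estimator was used.

```python
from statistics import median, stdev
from typing import Dict, Set


def active_cells(scores: Dict[str, float]) -> Set[str]:
    """Cells whose score is >= median + s.d. of all scores in the dataset."""
    values = list(scores.values())
    threshold = median(values) + stdev(values)  # sample s.d. (assumed)
    return {cell for cell, score in scores.items() if score >= threshold}


def best_fit_label(scores_by_cell_type: Dict[str, float]) -> str:
    """Cell type whose marker gene set gives the highest score."""
    return max(scores_by_cell_type, key=scores_by_cell_type.get)
```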

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41587-022-01551-4.

Supplementary information

Supplementary Information (22 MB, pdf)

Supplementary Figs. 1–41, descriptions for Tables 1–17, notes for methodological details and Discussion.

Reporting Summary (6.3 MB, pdf)

Supplementary Table (1.1 MB, xlsx)

Supplementary Tables 1–17.

Supplementary Data (25.1 MB, pdf)

Snapshots of somatic SV events in LCLs.

Acknowledgements

We thank A. Krebs, J. Zaugg, K. Rippe and I. Cortés-Ciriano for providing thoughtful feedback on the development of scNOVA. We also thank M. Paulsen (Flow Cytometry Core Facility) for assistance in cell sorting, B. Raeder for assisting in Strand-seq library preparation, and the EMBL Genomics Core Facility for assisting in single-cell automation (J. Zimmermann and V. Benes) and scRNA-seq library preparation (L. Villacorta). Finally, we thank W. Höps for assistance with single-cell analysis, as well as M. Happich and P. Richter-Pechanska for assistance with RNA-seq analysis. Principal funding came from the European Research Council (ERC Consolidator grant no. 773026, to J.O.K.). Funding also came from an ERC Starting Grant (grant number 336045) to J.O.K., the National Institutes of Health (grant no. 2U24HG007497-05) to J.O.K. and T.M., the Baden-Württemberg Stiftung (for supporting the projects ‘Epigenetics in T-ALL’ and ‘SV_Surveillance’) to J.O.K. and A.E.K., a Volkswagen Foundation grant (VW - 95826) to J.O.K. and the German Federal Ministry of Education and Research (grant no. 031A537B; de.NBI project) to J.O.K. H.J. and A.D.S. acknowledge fellowships through the Alexander von Humboldt Foundation. We thank the Human Genome Structural Variation Consortium for providing early access to deep bulk RNA-seq data from several LCLs (generated using funds provided by NHGRI Grant 2U24HG007497-05). D.N. is an endowed Professor of the German José-Carreras-Foundation (DJCLSH03/01). J.C.J. was funded by a Gerok position of the ‘Deutsche Forschungsgemeinschaft’ (DFG) (NO 817/5-2, FOR2033, NICHEM). K.K.R. received postdoctoral funding from the Deutsche Krebshilfe (Mildred-Scheel-Fellowship).

Author contributions

H.J., K.G., A.D.S. and J.O.K. designed the study (including conceptualization of haplotype-specific NO analysis, cell-type classification and altered gene activity using Strand-seq data). H.J., K.G., A.D.S. and J.O.K. developed the scNOVA computational method. H.J., K.G., A.D.S. and J.O.K. performed single-cell SV discovery. A.D.S. and P.H. performed LCL Strand-seq experiments; K.G., P.-M.B. and S.D. performed CLL Strand-seq experiments; K.G., J.-C.J. and D.N. performed AML Strand-seq experiments and K.G., K.K.R. and P.H. performed T-ALL Strand-seq experiments. T.R. carried out WGS-based SV discovery and verification. H.J., D.P. and T.M. performed haplotype-phasing. LCL scRNA-seq analysis was carried out by H.J.; CLL scRNA-seq analysis by K.G., H.J. and T.R. and T-ALL scRNA-seq analysis by K.G., H.J. and K.K.R. K.K.R., K.G. and B.B. performed drug treatment experiments. K.G., P.H. and E.B. analyzed LCL clonal expansion. LCL RNA-seq analysis was carried out by H.J.; CLL RNA-seq analysis by H.J., S.H., P.-M.B. and S.D. and T-ALL RNA-seq analysis by H.J., B.E.-U. and A.E.K. R.S. and J.O.K. performed PCAWG SV driver spectrum analysis. The manuscript was written by H.J., K.G., A.D.S. and J.O.K., with additional contributions from all authors.

Peer review

Peer review information

Nature Biotechnology thanks Jonas Demeulemeester, Elisa Oricchio, Peter Park and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Data availability

Sequencing data from this study can be retrieved from the European Genome-phenome Archive (EGA) and the European Nucleotide Archive (ENA). LCL data are available under the following accessions: Strand-seq (PRJEB39750, PRJEB55038); RNA-seq (ERP123231); WGS (PRJEB37677). C11 cell line data are available under the accession PRJEB55012. Data derived from leukemia patients and from human primary cells were deposited in the EGA under the following accession numbers: skin fibroblast (EGAS00001006498); cord blood (EGAS00001006567); T-ALL Strand-seq and scRNA-seq (EGAS00001003365); CLL Strand-seq (EGAS00001004925); AML Strand-seq (EGAS00001004903); T-ALL bulk RNA-seq (EGAS00001003248); CLL bulk RNA-seq (EGAS00001005746); CLL CITE-seq (EGAS00001004925). Access to human patient data is governed by the EGA Data Access Committee.

Code availability

The computational code of our analytical framework scNOVA is available open source at https://github.com/jeongdo801/scNOVA, with no restrictions on reuse. Other software used: Mosaicatcher (https://github.com/friendsofstrandseq/mosaicatcher-pipeline), StrandPhaseR (https://github.com/daewoooo/StrandPhaseR), InferCNV (https://github.com/broadinstitute/inferCNV/), HoneyBADGER (https://jef.works/HoneyBADGER/), CONICSmat (https://github.com/diazlab/CONICS), NucTools (https://homeveg.github.io/nuctools), Delly2 (https://github.com/dellytools/delly), BWA (v.0.7.15), STAR (v.2.7.9a), SAMtools (v.1.3.1), biobambam2 (v.2.0.76), deeptools (v.2.5.1), perl (v.5.16.3), Python (v.3.7.4), cuDNN (v.7.6.4.38), CUDA (v.10.1.243), TensorFlow (v.1.15.0), scikit-learn (v.0.21.3), matplotlib (v.3.1.1), R v.4.0.0, DESeq2, FlowJo and BD FACSDiva.

Competing interests

The following authors have previously disclosed a patent application (no. EP19169090) that is relevant to this manuscript: A.D.S., J.O.K., T.M. and D.P. The remaining authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Hyobin Jeong, Karen Grimes.

These authors jointly supervised this work: Ashley D. Sanders, Jan O. Korbel.

Contributor Information

Ashley D. Sanders, Email: ashley.sanders@mdc-berlin.de

Jan O. Korbel, Email: jan.korbel@embl.org

Extended data is available for this paper at 10.1038/s41587-022-01551-4.

Supplementary information

The online version contains supplementary material available at 10.1038/s41587-022-01551-4.

References

1.Priestley P, et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature. 2019;575:210–216. doi: 10.1038/s41586-019-1689-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Cosenza MR, Rodriguez-Martin B, Korbel JO. Structural variation in cancer: role, prevalence, and mechanisms. Annu. Rev. Genomics Hum. Genet. 2022;23:123–152. doi: 10.1146/annurev-genom-120121-101149. [DOI] [PubMed] [Google Scholar]
4.Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Rausch T, et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell. 2012;148:59–71. doi: 10.1016/j.cell.2011.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Umbreit NT, et al. Mechanisms generating cancer genome complexity from a single cell division error. Science. 2020;368:eaba0712. doi: 10.1126/science.aba0712. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Hadi K, et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell. 2020;183:e32. doi: 10.1016/j.cell.2020.08.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Minussi DC, et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature. 2021;592:302–308. doi: 10.1038/s41586-021-03357-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Watkins TBK, et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature. 2020;587:126–132. doi: 10.1038/s41586-020-2698-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Viswanathan SR, et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell. 2018;174:e19. doi: 10.1016/j.cell.2018.05.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.McPherson A, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 2016;48:758–767. doi: 10.1038/ng.3573. [DOI] [PubMed] [Google Scholar]
13.Weischenfeldt J, et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 2016;49:65–74. doi: 10.1038/ng.3722. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Liu Y, et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat. Genet. 2020;52:811–818. doi: 10.1038/s41588-020-0659-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.PCAWG Transcriptome Core Group. et al. Genomic basis for RNA alterations in cancer. Nature. 2020;578:129–136. doi: 10.1038/s41586-020-1970-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Northcott PA, et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature. 2014;511:428–434. doi: 10.1038/nature13379. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 2015;33:285–289. doi: 10.1038/nbt.3129. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Macaulay IC, et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods. 2015;12:519–522. doi: 10.1038/nmeth.3370. [DOI] [PubMed] [Google Scholar]
19.Yin Y, et al. High-throughput single-cell sequencing with linear amplification. Mol. Cell. 2019;76:e10. doi: 10.1016/j.molcel.2019.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Nam AS, Chaligne R, Landau DA. Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat. Rev. Genet. 2020;22:3–18. doi: 10.1038/s41576-020-0265-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Nam AS, et al. Somatic mutations and cell identity linked by genotyping of transcriptomes. Nature. 2019;571:355–360. doi: 10.1038/s41586-019-1367-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Cortés-Ciriano I, et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 2020;52:331–341. doi: 10.1038/s41588-019-0576-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Sanders AD, Falconer E, Hills M, Spierings DCJ, Lansdorp PM. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs. Nat. Protoc. 2017;12:1151–1176. doi: 10.1038/nprot.2017.029. [DOI] [PubMed] [Google Scholar]
24.Sanders AD, et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol. 2020;38:343–354. doi: 10.1038/s41587-019-0366-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Schones DE, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132:887–898. doi: 10.1016/j.cell.2008.02.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Lai B, et al. Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature. 2018;562:281–285. doi: 10.1038/s41586-018-0567-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Struhl K, Segal E. Determinants of nucleosome positioning. Nat. Struct. Mol. Biol. 2013;20:267–273. doi: 10.1038/nsmb.2506. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Teif VB, et al. Genome-wide nucleosome positioning during embryonic stem cell development. Nat. Struct. Mol. Biol. 2012;19:1185–1192. doi: 10.1038/nsmb.2419. [DOI] [PubMed] [Google Scholar]
29.Lam FH, Steger DJ, O’Shea EK. Chromatin decouples promoter threshold from dynamic range. Nature. 2008;453:246–250. doi: 10.1038/nature06867. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Shivaswamy S, et al. Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biol. 2008;6:e65. doi: 10.1371/journal.pbio.0060065. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Porubský D, et al. Direct chromosome-length haplotyping by single-cell sequencing. Genome Res. 2016;26:1565–1574. doi: 10.1101/gr.209841.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Kundaje A, et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 2012;22:1735–1747. doi: 10.1101/gr.136366.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Loda A, Collombet S, Heard E. Gene regulation in time and space during X-chromosome inactivation. Nat. Rev. Mol. Cell Biol. 2022;23:231–249. doi: 10.1038/s41580-021-00438-7. [DOI] [PubMed] [Google Scholar]
36.Yates AD, et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682–D688. doi: 10.1093/nar/gkz966. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Mardin BR, et al. A cell-based model system links chromothripsis with hyperploidy. Mol. Syst. Biol. 2015;11:828. doi: 10.15252/msb.20156505. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Ebert P, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021;372:eabf7117. doi: 10.1126/science.abf7117. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Shirley MD, et al. Chromosomal variation in lymphoblastoid cell lines. Hum. Mutat. 2012;33:1075–1086. doi: 10.1002/humu.22062. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Mraz M, et al. The origin of deletion 22q11 in chronic lymphocytic leukemia is related to the rearrangement of immunoglobulin lambda light chain locus. Leuk. Res. 2013;37:802–808. doi: 10.1016/j.leukres.2013.03.018. [DOI] [PubMed] [Google Scholar]
42.Dang S, et al. Dynamic expression of ZNF382 and its tumor-suppressor role in hepatitis B virus-related hepatocellular carcinogenesis. Oncogene. 2019;38:4804–4819. doi: 10.1038/s41388-019-0759-9. [DOI] [PubMed] [Google Scholar]
43.Li Z, et al. A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc. Natl Acad. Sci. USA. 2003;100:8164–8169. doi: 10.1073/pnas.1332764100. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Marinov GK, et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24:496–510. doi: 10.1101/gr.161034.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Li H, et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 2017;49:708–718. doi: 10.1038/ng.3818. [DOI] [PubMed] [Google Scholar]
46. Müller S, Cho A, Liu SJ, Lim DA, Diaz A. CONICS integrates scRNA-seq with DNA sequencing to map gene expression to tumor sub-clones. Bioinformatics. 2018;34:3217–3219. doi: 10.1093/bioinformatics/bty316.
47. Fan J, et al. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 2018;28:1217–1227. doi: 10.1101/gr.228080.117.
48. Patel AP, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–1401. doi: 10.1126/science.1254257.
49. McClintock B. The stability of broken ends of chromosomes in Zea mays. Genetics. 1941;26:234–282. doi: 10.1093/genetics/26.2.234.
50. Kuleshov MV, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–W97. doi: 10.1093/nar/gkw377.
51. Yang X, et al. Discovery of the first chemical tools to regulate MKK3-mediated MYC activation in cancer. Bioorg. Med. Chem. 2021;45:116324. doi: 10.1016/j.bmc.2021.116324.
52. Byrska-Bishop M, et al. High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022;185:3426–3440.e19. doi: 10.1016/j.cell.2022.08.004.
53. Zhang Y, Rowley JD. Chromatin structural elements and chromosomal translocations in leukemia. DNA Repair. 2006;5:1282–1297. doi: 10.1016/j.dnarep.2006.05.020.
54. Erickson P, et al. Identification of breakpoints in t(8;21) acute myelogenous leukemia and isolation of a fusion transcript, AML1/ETO, with similarity to Drosophila segmentation gene, runt. Blood. 1992;80:1825–1831. doi: 10.1182/blood.V80.7.1825.1825.
55. Xiao Z, et al. Molecular characterization of genomic AML1-ETO fusions in childhood leukemia. Leukemia. 2001;15:1906–1913. doi: 10.1038/sj.leu.2402318.
56. Mifsud B, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 2015;47:598–606. doi: 10.1038/ng.3286.
57. Gutierrez A, et al. The BCL11B tumor suppressor is mutated across the major molecular subtypes of T-cell acute lymphoblastic leukemia. Blood. 2011;118:4169–4173. doi: 10.1182/blood-2010-11-318873.
58. Döhner H, et al. Genomic aberrations and survival in chronic lymphocytic leukemia. N. Engl. J. Med. 2000;343:1910–1916. doi: 10.1056/NEJM200012283432602.
59. Dentro SC, et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell. 2021;184:2239–2254.e39. doi: 10.1016/j.cell.2021.03.009.
60. Hewett DR, et al. FRA10B structure reveals common elements in repeat expansion and chromosomal fragile site genesis. Mol. Cell. 1998;1:773–781. doi: 10.1016/S1097-2765(00)80077-5.
61. Edelmann J, et al. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations. Blood. 2012;120:4783–4794. doi: 10.1182/blood-2012-04-423517.
62. Puente XS, et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015;526:519–524. doi: 10.1038/nature14666.
63. Malek SN. The biology and clinical significance of acquired genomic copy number aberrations and recurrent gene mutations in chronic lymphocytic leukemia. Oncogene. 2013;32:2805–2817. doi: 10.1038/onc.2012.411.
64. Liberzon A, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. doi: 10.1093/bioinformatics/btr260.
65. Boon EMJ, van der Neut R, van de Wetering M, Clevers H, Pals ST. Wnt signaling regulates expression of the receptor tyrosine kinase met in colorectal cancer. Cancer Res. 2002;62:5126–5128.
66. Stoeckius M, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380.
67. Borcherding N, et al. Mapping the immune environment in clear cell renal carcinoma by single-cell genomics. Commun. Biol. 2021;4:122. doi: 10.1038/s42003-020-01625-6.
68. Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 2019;29:1363–1375. doi: 10.1101/gr.240663.118.
69. Kamburov A, Herwig R. ConsensusPathDB 2022: molecular interactions update as a resource for network biology. Nucleic Acids Res. 2022;50:D587–D595. doi: 10.1093/nar/gkab1128.
70. Böttcher M, et al. Control of PD-L1 expression in CLL-cells by stromal triggering of the Notch-c-Myc-EZH2 oncogenic signaling axis. J. Immunother. Cancer. 2021;9:e001889. doi: 10.1136/jitc-2020-001889.
71. Wang Y, et al. Distinct immune signatures in chronic lymphocytic leukemia and Richter syndrome. Blood Cancer J. 2021;11:86. doi: 10.1038/s41408-021-00477-5.
72. Fromigué O, Haÿ E, Barbara A, Marie PJ. Essential role of nuclear factor of activated T cells (NFAT)-mediated Wnt signaling in osteoblast differentiation induced by strontium ranelate. J. Biol. Chem. 2010;285:25251–25258. doi: 10.1074/jbc.M110.110502.
73. Moon JB, et al. Akt induces osteoclast differentiation through regulating the GSK3β/NFATc1 signaling cascade. J. Immunol. 2012;188:163–169. doi: 10.4049/jimmunol.1101254.
74. Nurieva RI, et al. A costimulation-initiated signaling pathway regulates NFATc1 transcription in T lymphocytes. J. Immunol. 2007;179:1096–1103. doi: 10.4049/jimmunol.179.2.1096.
75. Park H-J, Baek K, Baek J-H, Kim H-R. The cooperation of CREB and NFAT is required for PTHrP-induced RANKL expression in mouse osteoblastic cells. J. Cell. Physiol. 2015;230:667–679. doi: 10.1002/jcp.24790.
76. Li L, et al. B-cell receptor-mediated NFATc1 activation induces IL-10/STAT3/PD-L1 signaling in diffuse large B-cell lymphoma. Blood. 2018;132:1805–1817. doi: 10.1182/blood-2018-03-841015.
77. Oestreich KJ, Yoon H, Ahmed R, Boss JM. NFATc1 regulates PD-1 expression upon T cell activation. J. Immunol. 2008;181:4832–4839. doi: 10.4049/jimmunol.181.7.4832.
78. Staal FJT, Famili F, Garcia Perez L, Pike-Overzet K. Aberrant Wnt signaling in leukemia. Cancers (Basel) 2016;8:78. doi: 10.3390/cancers8090078.
79. Korbel JO, Campbell PJ. Criteria for inference of chromothripsis in cancer genomes. Cell. 2013;152:1226–1236. doi: 10.1016/j.cell.2013.02.023.
80. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096.
81. Andreatta M, Carmona SJ. UCell: robust and scalable single-cell gene signature scoring. Comput. Struct. Biotechnol. J. 2021;19:3796–3798. doi: 10.1016/j.csbj.2021.06.043.
82. Street K, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19:477. doi: 10.1186/s12864-018-4772-0.
83. Majumder S, et al. Targeting notch in oncology: the path forward. Nat. Rev. Drug Discov. 2021;20:125–144. doi: 10.1038/s41573-020-00091-3.
84. Study of CB-103 in adult patients with advanced or metastatic solid tumours and haematological malignancies. https://clinicaltrials.gov/ct2/show/NCT03422679 (2017).
85. Lehal R, et al. Pharmacological disruption of the Notch transcription factor complex. Proc. Natl Acad. Sci. USA. 2020;117:16292–16301. doi: 10.1073/pnas.1922606117.
86. Drews RM, et al. A pan-cancer compendium of chromosomal instability. Nature. 2022;606:976–983. doi: 10.1038/s41586-022-04789-9.
87. Edelmann J, et al. Genomic alterations in high-risk chronic lymphocytic leukemia frequently affect cell cycle key regulators and NOTCH1-regulated transcription. Haematologica. 2020;105:1379–1390. doi: 10.3324/haematol.2019.217307.
88. Loh P-R, et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature. 2018;559:350–355. doi: 10.1038/s41586-018-0321-x.
89. Zhu C, Preissl S, Ren B. Single-cell multimodal omics: the power of many. Nat. Methods. 2020;17:11–14. doi: 10.1038/s41592-019-0691-5.
90. Forsberg LA, Gisselsson D, Dumanski JP. Mosaicism in health and disease - clones picking up speed. Nat. Rev. Genet. 2017;18:128–142. doi: 10.1038/nrg.2016.145.
91. Leibowitz ML, et al. Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing. Nat. Genet. 2021;53:895–905. doi: 10.1038/s41588-021-00838-7.
92. Gaffney DJ, et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 2012;8:e1003036. doi: 10.1371/journal.pgen.1003036.
93. Fishilevich S, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017;2017:bax028. doi: 10.1093/database/bax028.
94. Porubsky D, et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell. 2022;185:1986–2005.e26. doi: 10.1016/j.cell.2022.04.017.
95. Dietrich S, et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 2018;128:427–445. doi: 10.1172/JCI93801.
96. Falconer E, et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods. 2012;9:1107–1112. doi: 10.1038/nmeth.2206.
97. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
98. Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2020;2:lqaa078. doi: 10.1093/nargab/lqaa078.
99. Boulesteix A-L, Strimmer K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief. Bioinform. 2007;8:32–44. doi: 10.1093/bib/bbl016.
100. Ramírez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
101. Ulz P, et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat. Genet. 2016;48:1273–1278. doi: 10.1038/ng.3648.
102. Eraslan G, Avsec Ž, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 2019;20:389–403. doi: 10.1038/s41576-019-0122-6.
103. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
104. Schmitz M, et al. Xenografts of highly resistant leukemia recapitulate the clonal composition of the leukemogenic compartment. Blood. 2011;118:1854–1864. doi: 10.1182/blood-2010-11-320309.
105. Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20:296. doi: 10.1186/s13059-019-1874-1.
106. Cao J, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502. doi: 10.1038/s41586-019-0969-x.
107. Stuart T, et al. Comprehensive integration of single-cell data. Cell. 2019;177:e21. doi: 10.1016/j.cell.2019.05.031.
108. Roider T, et al. Dissecting intratumour heterogeneity of nodal B-cell lymphomas at the transcriptional, genetic and drug-response levels. Nat. Cell Biol. 2020;22:896–906. doi: 10.1038/s41556-020-0532-x.
109. Park J-E, et al. A cell atlas of human thymic development defines T cell repertoire formation. Science. 2020;367:eaay3224. doi: 10.1126/science.aay3224.
110. Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods. 2017;14:975–978. doi: 10.1038/nmeth.4401.
111. Weirauch MT, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158:1431–1443. doi: 10.1016/j.cell.2014.08.009.
112. Nagel S, et al. Activation of TLX3 and NKX2-5 in t(5;14)(q35;q32) T-cell acute lymphoblastic leukemia by remote 3’-BCL11B enhancers and coregulation by PU.1 and HMGA1. Cancer Res. 2007;67:1461–1471. doi: 10.1158/0008-5472.CAN-06-2615.
113. Xaus J, et al. The expression of MHC class II genes in macrophages is cell cycle dependent. J. Immunol. 2000;165:6364–6371. doi: 10.4049/jimmunol.165.11.6364.

Associated Data


Supplementary Materials

Supplementary Information (22MB, pdf)

Supplementary Figs. 1–41, descriptions for Tables 1–17, notes for methodological details and Discussion.

Reporting Summary (6.3MB, pdf)

Supplementary Table (1.1MB, xlsx)

Supplementary Tables 1–17.

Supplementary Data (25.1MB, pdf)

Snapshots of somatic SV events in LCLs.

Data Availability Statement

[CR1] 1.Priestley P, et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature. 2019;575:210–216. doi: 10.1038/s41586-019-1689-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Cosenza MR, Rodriguez-Martin B, Korbel JO. Structural variation in cancer: role, prevalence, and mechanisms. Annu. Rev. Genomics Hum. Genet. 2022;23:123–152. doi: 10.1146/annurev-genom-120121-101149. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Rausch T, et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell. 2012;148:59–71. doi: 10.1016/j.cell.2011.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Umbreit NT, et al. Mechanisms generating cancer genome complexity from a single cell division error. Science. 2020;368:eaba0712. doi: 10.1126/science.aba0712. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Hadi K, et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell. 2020;183:e32. doi: 10.1016/j.cell.2020.08.006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Minussi DC, et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature. 2021;592:302–308. doi: 10.1038/s41586-021-03357-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Watkins TBK, et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature. 2020;587:126–132. doi: 10.1038/s41586-020-2698-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Viswanathan SR, et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell. 2018;174:e19. doi: 10.1016/j.cell.2018.05.036. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.McPherson A, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 2016;48:758–767. doi: 10.1038/ng.3573. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Weischenfeldt J, et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 2016;49:65–74. doi: 10.1038/ng.3722. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Liu Y, et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat. Genet. 2020;52:811–818. doi: 10.1038/s41588-020-0659-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.PCAWG Transcriptome Core Group. et al. Genomic basis for RNA alterations in cancer. Nature. 2020;578:129–136. doi: 10.1038/s41586-020-1970-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Northcott PA, et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature. 2014;511:428–434. doi: 10.1038/nature13379. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 2015;33:285–289. doi: 10.1038/nbt.3129. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Macaulay IC, et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods. 2015;12:519–522. doi: 10.1038/nmeth.3370. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Yin Y, et al. High-throughput single-cell sequencing with linear amplification. Mol. Cell. 2019;76:e10. doi: 10.1016/j.molcel.2019.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Nam AS, Chaligne R, Landau DA. Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat. Rev. Genet. 2020;22:3–18. doi: 10.1038/s41576-020-0265-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Nam AS, et al. Somatic mutations and cell identity linked by genotyping of transcriptomes. Nature. 2019;571:355–360. doi: 10.1038/s41586-019-1367-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Cortés-Ciriano I, et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 2020;52:331–341. doi: 10.1038/s41588-019-0576-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Sanders AD, Falconer E, Hills M, Spierings DCJ, Lansdorp PM. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs. Nat. Protoc. 2017;12:1151–1176. doi: 10.1038/nprot.2017.029. [DOI] [PubMed] [Google Scholar]

[CR24] 24.Sanders AD, et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol. 2020;38:343–354. doi: 10.1038/s41587-019-0366-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Schones DE, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132:887–898. doi: 10.1016/j.cell.2008.02.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Lai B, et al. Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature. 2018;562:281–285. doi: 10.1038/s41586-018-0567-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Struhl K, Segal E. Determinants of nucleosome positioning. Nat. Struct. Mol. Biol. 2013;20:267–273. doi: 10.1038/nsmb.2506. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Teif VB, et al. Genome-wide nucleosome positioning during embryonic stem cell development. Nat. Struct. Mol. Biol. 2012;19:1185–1192. doi: 10.1038/nsmb.2419. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Lam FH, Steger DJ, O’Shea EK. Chromatin decouples promoter threshold from dynamic range. Nature. 2008;453:246–250. doi: 10.1038/nature06867. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Shivaswamy S, et al. Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biol. 2008;6:e65. doi: 10.1371/journal.pbio.0060065. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Porubský D, et al. Direct chromosome-length haplotyping by single-cell sequencing. Genome Res. 2016;26:1565–1574. doi: 10.1101/gr.209841.116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Kundaje A, et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 2012;22:1735–1747. doi: 10.1101/gr.136366.111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Loda A, Collombet S, Heard E. Gene regulation in time and space during X-chromosome inactivation. Nat. Rev. Mol. Cell Biol. 2022;23:231–249. doi: 10.1038/s41580-021-00438-7. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Yates AD, et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682–D688. doi: 10.1093/nar/gkz966. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Mardin BR, et al. A cell-based model system links chromothripsis with hyperploidy. Mol. Syst. Biol. 2015;11:828. doi: 10.15252/msb.20156505. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Ebert P, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021;372:eabf7117. doi: 10.1126/science.abf7117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Shirley MD, et al. Chromosomal variation in lymphoblastoid cell lines. Hum. Mutat. 2012;33:1075–1086. doi: 10.1002/humu.22062. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Mraz M, et al. The origin of deletion 22q11 in chronic lymphocytic leukemia is related to the rearrangement of immunoglobulin lambda light chain locus. Leuk. Res. 2013;37:802–808. doi: 10.1016/j.leukres.2013.03.018. [DOI] [PubMed] [Google Scholar]

[CR42] 42.Dang S, et al. Dynamic expression of ZNF382 and its tumor-suppressor role in hepatitis B virus-related hepatocellular carcinogenesis. Oncogene. 2019;38:4804–4819. doi: 10.1038/s41388-019-0759-9. [DOI] [PubMed] [Google Scholar]

[CR43] 43.Li Z, et al. A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc. Natl Acad. Sci. USA. 2003;100:8164–8169. doi: 10.1073/pnas.1332764100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Marinov GK, et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24:496–510. doi: 10.1101/gr.161034.113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.Li H, et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 2017;49:708–718. doi: 10.1038/ng.3818. [DOI] [PubMed] [Google Scholar]

[CR46] 46.Müller S, Cho A, Liu SJ, Lim DA, Diaz A. CONICS integrates scRNA-seq with DNA sequencing to map gene expression to tumor sub-clones. Bioinformatics. 2018;34:3217–3219. doi: 10.1093/bioinformatics/bty316. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR47] 47.Fan J, et al. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 2018;28:1217–1227. doi: 10.1101/gr.228080.117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR48] 48.Patel AP, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–1401. doi: 10.1126/science.1254257. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.McClintock B. The stability of broken ends of chromosomes in Zea mays. Genetics. 1941;26:234–282. doi: 10.1093/genetics/26.2.234. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Kuleshov MV, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–W97. doi: 10.1093/nar/gkw377. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Yang X, et al. Discovery of the first chemical tools to regulate MKK3-mediated MYC activation in cancer. Bioorg. Med. Chem. 2021;45:116324. doi: 10.1016/j.bmc.2021.116324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR52] 52.Byrska-Bishop M, et al. High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022;185:3426–3440.e19. doi: 10.1016/j.cell.2022.08.004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR53] 53.Zhang Y, Rowley JD. Chromatin structural elements and chromosomal translocations in leukemia. DNA Repair. 2006;5:1282–1297. doi: 10.1016/j.dnarep.2006.05.020. [DOI] [PubMed] [Google Scholar]

[CR54] 54.Erickson P, et al. Identification of breakpoints in t(8;21) acute myelogenous leukemia and isolation of a fusion transcript, AML1/ETO, with similarity to Drosophila segmentation gene, runt. Blood. 1992;80:1825–1831. doi: 10.1182/blood.V80.7.1825.1825. [DOI] [PubMed] [Google Scholar]

[CR55] 55.Xiao Z, et al. Molecular characterization of genomic AML1-ETO fusions in childhood leukemia. Leukemia. 2001;15:1906–1913. doi: 10.1038/sj.leu.2402318. [DOI] [PubMed] [Google Scholar]

[CR56] 56.Mifsud B, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 2015;47:598–606. doi: 10.1038/ng.3286. [DOI] [PubMed] [Google Scholar]

[CR57] 57.Gutierrez A, et al. The BCL11B tumor suppressor is mutated across the major molecular subtypes of T-cell acute lymphoblastic leukemia. Blood. 2011;118:4169–4173. doi: 10.1182/blood-2010-11-318873. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR58] 58.Döhner H, et al. Genomic aberrations and survival in chronic lymphocytic leukemia. N. Engl. J. Med. 2000;343:1910–1916. doi: 10.1056/NEJM200012283432602. [DOI] [PubMed] [Google Scholar]

[CR59] 59.Dentro SC, et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell. 2021;184:2239–2254.e39. doi: 10.1016/j.cell.2021.03.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR60] 60.Hewett DR, et al. FRA10B structure reveals common elements in repeat expansion and chromosomal fragile site genesis. Mol. Cell. 1998;1:773–781. doi: 10.1016/S1097-2765(00)80077-5. [DOI] [PubMed] [Google Scholar]

[CR61] 61.Edelmann J, et al. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations. Blood. 2012;120:4783–4794. doi: 10.1182/blood-2012-04-423517. [DOI] [PubMed] [Google Scholar]

[CR62] 62.Puente XS, et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015;526:519–524. doi: 10.1038/nature14666. [DOI] [PubMed] [Google Scholar]

[CR63] 63.Malek SN. The biology and clinical significance of acquired genomic copy number aberrations and recurrent gene mutations in chronic lymphocytic leukemia. Oncogene. 2013;32:2805–2817. doi: 10.1038/onc.2012.411. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Liberzon A, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. doi: 10.1093/bioinformatics/btr260. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Boon EMJ, van der Neut R, van de Wetering M, Clevers H, Pals ST. Wnt signaling regulates expression of the receptor tyrosine kinase met in colorectal cancer. Cancer Res. 2002;62:5126–5128. [PubMed] [Google Scholar]

[CR66] 66.Stoeckius M, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR67] 67.Borcherding N, et al. Mapping the immune environment in clear cell renal carcinoma by single-cell genomics. Commun. Biol. 2021;4:122. doi: 10.1038/s42003-020-01625-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR68] 68.Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 2019;29:1363–1375. doi: 10.1101/gr.240663.118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR69] 69.Kamburov A, Herwig R. ConsensusPathDB 2022: molecular interactions update as a resource for network biology. Nucleic Acids Res. 2022;50:D587–D595. doi: 10.1093/nar/gkab1128. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR70] 70.Böttcher M, et al. Control of PD-L1 expression in CLL-cells by stromal triggering of the Notch-c-Myc-EZH2 oncogenic signaling axis. J. Immunother. Cancer. 2021;9:e001889. doi: 10.1136/jitc-2020-001889. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR71] 71.Wang Y, et al. Distinct immune signatures in chronic lymphocytic leukemia and Richter syndrome. Blood Cancer J. 2021;11:86. doi: 10.1038/s41408-021-00477-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR72] 72.Fromigué O, Haÿ E, Barbara A, Marie PJ. Essential role of nuclear factor of activated T cells (NFAT)-mediated Wnt signaling in osteoblast differentiation induced by strontium ranelate. J. Biol. Chem. 2010;285:25251–25258. doi: 10.1074/jbc.M110.110502. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Moon JB, et al. Akt induces osteoclast differentiation through regulating the GSK3β/NFATc1 signaling cascade. J. Immunol. 2012;188:163–169. doi: 10.4049/jimmunol.1101254. [DOI] [PubMed] [Google Scholar]

[CR74] 74.Nurieva RI, et al. A costimulation-initiated signaling pathway regulates NFATc1 transcription in T lymphocytes. J. Immunol. 2007;179:1096–1103. doi: 10.4049/jimmunol.179.2.1096. [DOI] [PubMed] [Google Scholar]

[CR75] 75.Park H-J, Baek K, Baek J-H, Kim H-R. The cooperation of CREB and NFAT is required for PTHrP-induced RANKL expression in mouse osteoblastic cells. J. Cell. Physiol. 2015;230:667–679. doi: 10.1002/jcp.24790. [DOI] [PubMed] [Google Scholar]

[CR76] 76.Li L, et al. B-cell receptor-mediated NFATc1 activation induces IL-10/STAT3/PD-L1 signaling in diffuse large B-cell lymphoma. Blood. 2018;132:1805–1817. doi: 10.1182/blood-2018-03-841015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR77] 77.Oestreich KJ, Yoon H, Ahmed R, Boss JM. NFATc1 regulates PD-1 expression upon T cell activation. J. Immunol. 2008;181:4832–4839. doi: 10.4049/jimmunol.181.7.4832. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR78] 78.Staal FJT, Famili F, Garcia Perez L, Pike-Overzet K. Aberrant Wnt signaling in leukemia. Cancers (Basel) 2016;8:78. doi: 10.3390/cancers8090078. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR79] 79.Korbel JO, Campbell PJ. Criteria for inference of chromothripsis in cancer genomes. Cell. 2013;152:1226–1236. doi: 10.1016/j.cell.2013.02.023. [DOI] [PubMed] [Google Scholar]

[CR80] 80.Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR81] 81.Andreatta M, Carmona SJ. UCell: robust and scalable single-cell gene signature scoring. Comput. Struct. Biotechnol. J. 2021;19:3796–3798. doi: 10.1016/j.csbj.2021.06.043. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR82] 82.Street K, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19:477. doi: 10.1186/s12864-018-4772-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR83] 83.Majumder S, et al. Targeting notch in oncology: the path forward. Nat. Rev. Drug Discov. 2021;20:125–144. doi: 10.1038/s41573-020-00091-3. [DOI] [PubMed] [Google Scholar]

[CR84] 84.Study of CB-103 in adult patients with advanced or metastatic solid tumours and haematological malignancies. https://clinicaltrials.gov/ct2/show/NCT03422679 (2017).

85. Lehal R, et al. Pharmacological disruption of the Notch transcription factor complex. Proc. Natl Acad. Sci. USA. 2020;117:16292–16301. doi: 10.1073/pnas.1922606117.

86. Drews RM, et al. A pan-cancer compendium of chromosomal instability. Nature. 2022;606:976–983. doi: 10.1038/s41586-022-04789-9.

87. Edelmann J, et al. Genomic alterations in high-risk chronic lymphocytic leukemia frequently affect cell cycle key regulators and NOTCH1-regulated transcription. Haematologica. 2020;105:1379–1390. doi: 10.3324/haematol.2019.217307.

88. Loh P-R, et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature. 2018;559:350–355. doi: 10.1038/s41586-018-0321-x.

89. Zhu C, Preissl S, Ren B. Single-cell multimodal omics: the power of many. Nat. Methods. 2020;17:11–14. doi: 10.1038/s41592-019-0691-5.

90. Forsberg LA, Gisselsson D, Dumanski JP. Mosaicism in health and disease - clones picking up speed. Nat. Rev. Genet. 2017;18:128–142. doi: 10.1038/nrg.2016.145.

91. Leibowitz ML, et al. Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing. Nat. Genet. 2021;53:895–905. doi: 10.1038/s41588-021-00838-7.

92. Gaffney DJ, et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 2012;8:e1003036. doi: 10.1371/journal.pgen.1003036.

93. Fishilevich S, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017;2017:bax028. doi: 10.1093/database/bax028.

94. Porubsky D, et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell. 2022;185:1986–2005.e26. doi: 10.1016/j.cell.2022.04.017.

95. Dietrich S, et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 2018;128:427–445. doi: 10.1172/JCI93801.

96. Falconer E, et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods. 2012;9:1107–1112. doi: 10.1038/nmeth.2206.

97. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.

98. Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2020;2:lqaa078. doi: 10.1093/nargab/lqaa078.

99. Boulesteix A-L, Strimmer K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief. Bioinform. 2007;8:32–44. doi: 10.1093/bib/bbl016.

100. Ramírez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.

101. Ulz P, et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat. Genet. 2016;48:1273–1278. doi: 10.1038/ng.3648.

102. Eraslan G, Avsec Ž, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 2019;20:389–403. doi: 10.1038/s41576-019-0122-6.

103. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.

104. Schmitz M, et al. Xenografts of highly resistant leukemia recapitulate the clonal composition of the leukemogenic compartment. Blood. 2011;118:1854–1864. doi: 10.1182/blood-2010-11-320309.

105. Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20:296. doi: 10.1186/s13059-019-1874-1.

106. Cao J, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502. doi: 10.1038/s41586-019-0969-x.

107. Stuart T, et al. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.

108. Roider T, et al. Dissecting intratumour heterogeneity of nodal B-cell lymphomas at the transcriptional, genetic and drug-response levels. Nat. Cell Biol. 2020;22:896–906. doi: 10.1038/s41556-020-0532-x.

109. Park J-E, et al. A cell atlas of human thymic development defines T cell repertoire formation. Science. 2020;367:eaay3224. doi: 10.1126/science.aay3224.

110. Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods. 2017;14:975–978. doi: 10.1038/nmeth.4401.

111. Weirauch MT, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158:1431–1443. doi: 10.1016/j.cell.2014.08.009.

112. Nagel S, et al. Activation of TLX3 and NKX2-5 in t(5;14)(q35;q32) T-cell acute lymphoblastic leukemia by remote 3’-BCL11B enhancers and coregulation by PU.1 and HMGA1. Cancer Res. 2007;67:1461–1471. doi: 10.1158/0008-5472.CAN-06-2615.

113. Xaus J, et al. The expression of MHC class II genes in macrophages is cell cycle dependent. J. Immunol. 2000;165:6364–6371. doi: 10.4049/jimmunol.165.11.6364.
