Author manuscript; available in PMC: 2025 Feb 17.
Published in final edited form as: Cell. 2023 Aug 15;186(18):3968–3982.e15. doi: 10.1016/j.cell.2023.07.024

Archival single cell genomics reveals persistent subclones during DCIS progression

Kaile Wang 1,2,16, Tapsi Kumar 1,2,3,4,16, Junke Wang 1,2,3, Darlan Conterno Minussi 1,2,3, Emi Sei 1,2, Jianzhuo Li 1,2, Tuan M Tran 1,2, Aatish Thennavan 1,2, Min Hu 1,2, Anna K Casasent 1,2, Zhenna Xiao 1,2, Shanshan Bai 1,2, Lei Yang 1,2,3, Lorraine M King 5, Vandna Shah 6, Petra Kristel 7, Carolien L van der Borden 7, Jeffrey R Marks 5, Yuehui Zhao 1,2, Amado J Zurita 8, Ana Aparicio 8, Brian Chapin 9, Jie Ye 1,2,3,10, Jianjun Zhang 4,10, Don L Gibbons 10; Grand Challenge PRECISION Consortium11, Ellinor Sawyer 6, Alastair M Thompson 12, Andrew Futreal 4, E Shelley Hwang 5, Jelle Wesseling 13,14, Esther H Lips 13,14, Nicholas E Navin 1,2,3,15,17,*
PMCID: PMC11831769  NIHMSID: NIHMS1919894  PMID: 37586362

Summary

Ductal carcinoma in situ (DCIS) is a common precursor of invasive breast cancer. Our understanding of its genomic progression to recurrent disease remains poor, partly due to challenges associated with the genomic profiling of formalin-fixed paraffin-embedded (FFPE) materials. Here, we developed Arc-well, a high-throughput single cell DNA sequencing method that is compatible with FFPE materials. We validated our method by profiling 40,330 single cells from cell lines, a frozen tissue and 27 FFPE samples from breast, lung and prostate tumors stored for 3–31 years. Analysis of 10 patients with matched DCIS and cancers that recurred 2–16 years later shows that many primary DCIS had already undergone whole-genome-doubling and clonal diversification, and that they shared genomic lineages with persistent subclones in the recurrences. Evolutionary analysis suggests that most DCIS cases in our cohort underwent an evolutionary bottleneck, and further identified chromosome aberrations in the persistent subclones that were associated with recurrence.

eTOC

Arc-well enables reliable high-throughput single cell genomic sequencing from formalin-fixed paraffin-embedded clinical oncology samples archived for years to decades, and the application of this method reveals genomic features and evolutionary models in ductal carcinoma in situ associated with cancer progression and recurrence.


Introduction

Ductal carcinoma in situ (DCIS) is a common precursor of invasive breast cancer and is often detected during screening mammography1. In about 20% of patients, the DCIS recurs or progresses to invasive disease within 15 years, even after local treatment with surgery and radiation2,3. However, the clonal diversity of DCIS and genomic evolution of DCIS to recurrent disease remain unclear. Research in this area has been challenging due to both logistical and technical issues. The collection of matched longitudinal samples of the initial DCIS and a recurrence that may present years to decades later has been difficult, since many patients are treated at different hospitals or geographical locations. Additionally, the genomic profiling of DCIS tissues, which are mainly collected as formalin-fixed paraffin-embedded (FFPE) blocks, is challenging, particularly at single cell genomic resolution. Consequently, most previous genomic studies of DCIS have been limited to single time point samples using either synchronous DCIS-invasive tissues, in which invasion has already occurred4–7, or unmatched DCIS and invasive cancer pairs8,9.

Important questions regarding the biology of DCIS include understanding the evolutionary models of progression and identifying persistent subclones between the DCIS and recurrent disease, which harbor genetic events associated with invasion and recurrence. The main evolutionary question is whether the initial DCIS lesions share a direct genomic lineage with the recurrent DCIS or invasive ductal carcinoma (IDC)10,11 or, alternatively, do not share any genomic events (i.e. from independent lineages)4,12. Some previous studies using bulk genomic methods have suggested that many recurrences have a clonal genetic relationship with their matched primary DCIS pairs, favoring the direct genomic lineage model13–15. Another large bulk genomic study using a cohort of matched DCIS and recurrent cancers reported that around 20% of cases have independent lineages, suggesting that some recurrences are genetically unrelated to the initial disease16. However, a major limitation of the aforementioned bulk genomic studies is that they could not accurately resolve the clonal substructure of the DCIS and recurrence that is essential for inferring models of evolution (e.g. an evolutionary bottleneck10–12,17,18 or multiclonal evolution4,12).

Over the last decade, the field of single cell genomics has undergone rapid progress since the development of the first single cell DNA sequencing (scDNA-seq) method to profile copy number in human tissues19. While initial methods were based on whole-genome-amplification (WGA) chemistries and were limited to profiling a few cells at a time20–24, the development of microdroplet, nanowell and combinatorial indexing methods has vastly increased cell throughput and reduced costs25–29. However, these methods all require fresh or snap-frozen tissue samples, which prevents their application for analyzing archival FFPE tissue samples. This represents a major technical barrier, since most clinical tissues with long-term patient outcome data have been stored as archival FFPE blocks. The main problems of FFPE preservation are formalin-induced DNA-protein crosslinking and double-stranded breaks leading to small fragmented DNA molecules30,31 that are difficult to amplify with existing WGA-based methods. While one previous study has demonstrated initial feasibility in performing scDNA-seq from a few FFPE tissues, this method had limited cell throughput (N=96), high costs and required lengthy experimental procedures32. To address these issues, we developed Arc-well (Archival nanowell sequencing), a method to perform high-throughput single cell DNA sequencing to profile genomic copy number from thousands of cells in parallel from FFPE tissues.

Results

Arc-well method

To perform Arc-well, an FFPE block is sectioned at 50 μm thickness to obtain 1–3 tissue scrolls that are deparaffinized and dissociated into single nucleus suspensions (Figure 1A). The single nucleus suspensions are stained with DAPI and flow-sorted by fluorescence-activated cell sorting (FACS) to remove degraded cells and enrich cancer cells. After FACS, the nuclei are deposited into a nanowell chip with 5,184 wells and single nuclei (1,200–2,600 per chip) are selected by imaging to avoid doublets, degraded nuclei and empty wells (Figure S1A). Downstream reagents are deposited into the selected nanowells using a five-step equal volume (35 nl each) protocol. First, a lysis buffer is dispensed to release the DNA from the nucleus, followed by dispensing of reagents for tagmentation reactions (Tn5 transposome) and Tn5 inactivation. Additionally, a unique barcode combination is assigned to each nanowell by depositing dual indices (72 × 72 combinations) followed by PCR amplification. The barcoded libraries are then pooled together for next-generation sequencing (STAR Methods).
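The dual-index barcoding step can be illustrated with a short sketch. It shows only the arithmetic stated above: 72 × 72 index pairs yield 5,184 unique combinations, one for each well of the chip. The grid layout (one index per row, one per column) and the index names are assumptions made for illustration and are not taken from the protocol.

```python
from itertools import product

N_ROW_INDICES = 72  # assumption: one index is shared by all wells in a row
N_COL_INDICES = 72  # assumption: one index is shared by all wells in a column


def build_barcode_map(n_rows=N_ROW_INDICES, n_cols=N_COL_INDICES):
    """Map each (row, col) well position to a unique dual-index pair."""
    # Hypothetical index names; real index sequences are not given in the text.
    row_idx = [f"R{i:02d}" for i in range(1, n_rows + 1)]
    col_idx = [f"C{j:02d}" for j in range(1, n_cols + 1)]
    return {
        (r, c): (row_idx[r], col_idx[c])
        for r, c in product(range(n_rows), range(n_cols))
    }


barcode_map = build_barcode_map()
```

Because every well receives a distinct pair, reads from the pooled library can be assigned back to a single nanowell by looking up the pair in the inverse of this map.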

Figure 1. Arc-well workflow and study overview.


(A) Overview of the Arc-well method, in which FFPE blocks are sectioned and deparaffinized to generate single nucleus suspensions that are used for FACS sorting (steps 1–4). Next, sorted nuclei are dispensed into nanowell chips followed by imaging of the nanowells for single cell selection. A five-step equal volume dispensing protocol is then used to dispense Arc-well chemistry into nanowell chips to perform cell barcoding (step 5 and inset panel). Finally, the amplified single cell libraries are pooled together for sequencing and data analysis (steps 6–7).

(B) Schematic overview of the study design, in which 20 human breast FFPE samples from 10 patients with primary DCIS and matched recurrent DCIS/IDC were used to profile single cell copy number using Arc-well. The resulting Arc-well data was used to resolve clonal substructure and infer subclone lineages.

(C) Clinical metadata of the 10 patients with matched DCIS and recurrent samples.

See also Figure S1 and Table S1–S2.

Compared to previous methods such as Acoustic Cell Tagmentation (ACT)26, Arc-well has introduced a number of technical improvements: 1) increased maximum cell throughput to 1,900–2,600 cells, 2) decreased costs of reagents by downscaling reactions (1:30) to nanoliter volumes, 3) optimized tagmentation chemistry, and 4) decreased technical variability by imaging and automating the microdepositing steps (Table S1). More importantly, Arc-well has been developed for compatibility with FFPE tissue, by enabling the amplification of degraded DNA fragments that frequently occur in archival FFPE tissues. During the development of Arc-well, we evaluated the addition of DNA repair protocols prior to performing tagmentation reactions; however, this did not improve the downstream data quality and significantly increased the experimental timeframe (Figure S1B and S1C). To investigate the doublet error rate, we performed species-mixing experiments by mixing fixed (10% formalin neutral buffered) mouse and human B lymphocyte cell lines (A20 and GM12878) and estimated that Arc-well has a doublet rate of 3.3%, which is consistent with previous species-mixing experiments for scRNA-seq using similar nanowell chips (2.4%)33 (Figure S1D). Overall, the Arc-well workflow for processing thousands of cells in parallel is only a two-day workflow, which saves weeks of experimental time compared to a previous method using DOP-PCR34 (Figure S1E).
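The logic of a species-mixing experiment can be sketched as follows. This is a minimal illustration, not the analysis used in the study: the 90% purity threshold, the function names and the input format (per-barcode counts of reads aligned to each species) are all assumptions. It counts only barcodes with reads from both species; same-species doublets are invisible to this kind of calculation.

```python
def classify_barcode(human_reads, mouse_reads, purity=0.9):
    """Label a barcode by the species contributing most of its reads.

    purity is an assumed cutoff, not a value from the study.
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    human_fraction = human_reads / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    return "mixed"


def observed_doublet_rate(counts, purity=0.9):
    """Fraction of non-empty barcodes that contain both species."""
    labels = [classify_barcode(h, m, purity) for h, m in counts]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return called.count("mixed") / len(called)
```

For example, with hypothetical counts `[(1000, 5), (3, 900), (500, 450), (0, 0)]`, one of the three non-empty barcodes is mixed.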

Study overview

We validated Arc-well in two cell lines, a frozen tissue and 27 archival FFPE samples to evaluate technical performance. Next, we applied Arc-well to 20 breast FFPE samples from 10 patients (P3-P12) with long term clinical outcome data (Figure 1B, table S2). The 10 patients with matched DCIS and recurrences (pure DCIS, synchronous DCIS-IDC, or IDC) that recurred after 2–16 years were selected to study the clonal substructure and genomic evolution between the primary DCIS and recurrence time points. The archival FFPE blocks ranged from 4–31 years in age and had intermediate or high-grade DCIS (grade 2–3) with varying estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status (Figure 1C, table S2).

Technical performance in cell lines and human tumors

To compare the technical performance of Arc-well to other scDNA-seq copy number methods with no available FFPE data, we first generated Arc-well data on 1,932 non-fixed single nuclei from a diploid human male lymphoblast cell line (315A), as well as 1,138 and 969 single nuclei from two different passages (p37, p38) of the aneuploid MDA-MB-231 breast cancer cell line (MDA231). The resulting data was compared to QC metrics from previously generated data using 5 other scDNA-seq platforms, including ACT26, 10X Genomics’ Chromium Single Cell copy number variation microdroplet system (10X CNV)26, Direct Library Preparation (DLP)35, DLP+27, and DOP-PCR36 (Figure 2A and 2B). To assess performance, we first randomly sampled 80 single cells per group to calculate the overdispersion, a metric that reflects the variance of read count distributions across each 220kb genomic bin (STAR Methods). The results show that all three Arc-well datasets had a significantly reduced overdispersion compared to other platforms (P < 0.05, Wilcoxon tests), and did not differ significantly from one another (P ≥ 0.29, Wilcoxon tests) (Figure 2A). When comparing breadth of coverage, all three Arc-well datasets also showed significantly increased genome coverage performance (P < 0.05, Wilcoxon tests) compared to all other platforms when using randomly down-sampled data (500K reads per cell) for the analysis (Figure 2B). Additionally, the copy number profiles of the two Arc-well MDA231 datasets (p37, p38) were highly consistent with the MDA231 dataset (p30-p32) generated by ACT (r = 0.98, Pearson) when these datasets were co-clustered (Figure S2A).
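As a rough illustration of the breadth-of-coverage comparison, the sketch below downsamples a cell's reads to a fixed number and computes the fraction of the genome covered by at least one read. The representation of reads as half-open (start, end) intervals on a single coordinate axis, the function names and the seed are assumptions for illustration; the exact procedure is described in the STAR Methods.

```python
import random


def downsample_reads(reads, n_reads, seed=0):
    """Randomly keep at most n_reads reads (e.g. 500,000 per cell)."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n_reads, len(reads)))


def breadth_of_coverage(reads, genome_length):
    """Fraction of the genome covered by at least one read.

    reads: list of (start, end) half-open intervals.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(reads):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length
```

Merging overlapping intervals before summing avoids counting a base twice when several reads cover it, which matters more as PCR duplicate rates rise.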

Figure 2. Technical performance of the Arc-well method.


(A) Comparison of overdispersion metrics for the genomic bin counts and (B) breadth of coverage metrics for six different scDNA-seq methods using cell line or frozen tissue data. Coverage was calculated from 80 randomly sampled cells per method and using 500K reads per cell as input. The methods using cell lines included Arc-well (315A, diploid; MDA231, aneuploid), ACT (MDA231, aneuploid) and DLP+ (GM18507, diploid), while the methods using frozen tissues included 10X CNV, DLP and DOP-PCR.

(C) Overdispersion and breadth of coverage (500K reads per cell) computed from non-fixed and formalin fixed diploid 315A cell line, aneuploid MDA231 cell line and from frozen and FFPE tissue from the same human IDC sample.

(D) Overdispersion of bin counts computed using Arc-well data from 22 breast cancer, 2 lung cancer and 2 prostate cancer samples.

(E) Correlations between FFPE block age and QC metrics for the mean overdispersion, the mean PCR duplicate rates, and correlations between DNA integrity number (DIN) and overdispersion metrics.

(F) Two examples of copy number profiles with ratio values (dots) and segmentation values (lines) of two different single cells from patient P6 (cell 1: ArcN759-ArcS519) and patient P10 (cell 2: ArcN741-ArcS541, table S6). BA: block age of FFPE samples.

(G) UMAP plot of single cell copy number profiles from FFPE tissue of P1, where each color represents a subclone.

(H) Clustered heatmap of single cell copy number profiles for P1 (top panel); the bottom panel shows the consensus integer CNA profiles of each subclone with selected breast cancer genes annotated below.

See also Figure S2–S3 and Table S3–S4, S6.

Next, we performed formalin fixation (10% neutral buffered) of the 315A and MDA231 (p37) cell lines, and then ran Arc-well in parallel with non-fixed 315A and MDA231 cells to understand the impact of fixation on the data. The 315A data shows that even though the overdispersion and breadth of coverage metrics of the fixed cell line were slightly worse than the non-fixed data (P < 0.05, Wilcoxon tests, Figure 2C, table S3), both datasets had significantly better metrics than the 10X CNV, DLP, DLP+ and DOP-PCR datasets (P < 0.05, Wilcoxon tests), which were all generated from non-fixed samples. Similarly, the fixed MDA231 dataset showed lower breadth of coverage compared with non-fixed MDA231, but was significantly better than the 10X CNV, DLP, DLP+ and DOP-PCR datasets (P < 0.05, Wilcoxon tests). We next checked the single cell copy number profiles of both non-fixed and formalin fixed 315A and MDA231 cell lines to determine if any technical artifacts were introduced during fixation. Both non-fixed and fixed 315A datasets showed the same flat diploid copy number profiles with no new CNA events introduced (Figure S2B). In the MDA231 non-fixed and fixed datasets, clustering identified 10 subclones, where each of the subclones was composed of mixtures of cells from both conditions, with no missing CNA events or subclones due to fixation (Figure S2A).

We further investigated the performance of Arc-well in freshly prepared human FFPE tissue samples, by splitting a fresh human IDC sample into two pieces and placing one piece in an optimal cutting temperature (OCT) block for cryofreezing, while the other piece was embedded into an FFPE block. We isolated single nucleus suspensions from the OCT and FFPE blocks and performed Arc-well (Figure 2C, Figure S2C). Consistent with the comparison between non-fixed and formalin fixed cell lines, we found decreased QC metrics including higher overdispersion and lower coverage breadth in the FFPE sample (P < 0.05, Wilcoxon tests, Figure 2C). However, both metrics were better than all other scDNA-seq methods that were compared (P < 0.05, Wilcoxon tests). We next merged the OCT and FFPE datasets and performed co-clustering analysis, which showed that the FFPE IDC samples had similar high-quality copy number aberration (CNA) profiles as the OCT sample. Arc-well identified 4 different subclones (c1-c4), including 3 subclonal genotypes that were shared by both samples and 1 subclone that was specific to the FFPE sample, which may be due to spatial sampling differences (Figure S2C).

Validation of Arc-well in different FFPE tissues

We next applied Arc-well to FFPE blocks that ranged from 3–31 years in block age (BA) from 22 DCIS and IDC tissues (Figure 2D), 2 lung adenocarcinoma samples (Lung-P1, Lung-P2) and 2 castration-sensitive prostate cancer samples (Prostate-P1, Prostate-P2) from human patients to evaluate QC performance in a variety of different tissues and cancer types. Nuclei enriched from aneuploid (A) peaks and diploid (D) peaks by FACS were loaded into nanowell chips to perform Arc-well. In total 27,851 single nuclei from the 26 samples were profiled, of which 15,125 (54.31%) aneuploid nuclei were used for genomic copy number analysis after removing diploid nuclei and low-quality nuclei (STAR Methods, table S3–S4). The aneuploid nuclei had 1.60M reads per nucleus on average (range 747K - 3.65M) with a mean of 44.05% PCR duplicate reads (range 15.64%−89.54%) (Tables S3–S4). After removing duplicated reads, an average of 64 reads per bin were obtained at 220kb genomic resolution. The Arc-well data showed a median overdispersion of 0.021 (MAD 0.014) that varied across the patient samples (Figure 2D). To determine if the QC metrics and data quality were associated with the age of the FFPE blocks, we correlated the FFPE block age of 22 breast samples with the sample overdispersion (r = 0.04, Pearson, P = 0.86), PCR duplicate rates (r = 0.14, Pearson, P = 0.52) and DNA integrity numbers (DIN, r = −0.11, Pearson, P = 0.68) (Figure 2E). These data showed that older blocks and lower DIN did not correlate with poorer QC metrics, suggesting that even very old FFPE blocks (e.g., 20–30 years old) could be used for analysis.
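The block-age comparison rests on a Pearson correlation between one value per sample (block age) and a QC metric. A minimal sketch of that calculation in plain Python is shown below; the function name is an assumption, and no data from the study are reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near zero, as reported for block age versus overdispersion, indicates no linear relationship between the two variables across samples.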

To better understand the accuracy of copy number detection at 220kb resolution at different sequencing depths, we downsampled the single cell data from 3 different samples (MDA231-p38, P1R, P4P) at decreasing read counts (1M – 50K reads per cell) that varied in their PCR duplicate rates (16.43% - 74.36%). We correlated the resulting segmented copy number profiles from the downsampled data to the original profiles with high coverage depths (STAR Methods). This analysis showed that the correlations of genomic profiles remained high (rho = 0.96, Spearman, SD: ± 0.025) using 250K or more reads per cell, even in samples with higher PCR duplicate rates (Figure S2D).

As a visual assessment of the data quality, we plotted the copy number ratio and segmentation plots for 2 single cells from two different patients (P6P, P10R) with older block ages (24–25 years) (Figure 2F). These data show that the copy number ratio values of bins within segments were well distributed along the integer copy number states. Overall, the data generated using Arc-well showed good QC metrics for measuring genomic copy number profiles from FFPE blocks, even in samples that were decades old.

Next, to determine if Arc-well can resolve copy number substructure in a variety of FFPE samples, we analyzed three different cancer types: 2 breast cancers, 2 lung cancers and 2 prostate cancers. Within the breast samples, P1R was a high-grade recurrent DCIS tumor with an FFPE block age of 4 years from which 1,082 single nuclei were profiled, and 884 were identified as aneuploid nuclei (Table S4). Unbiased clustering detected 9 subclones (c1-c9) in the UMAP and clustered heatmap (Figure 2G and 2H). The second breast sample (P2P) was a high-grade pure DCIS tissue with an FFPE block age of 22 years (table S2). A total of 1,057 single nuclei were profiled by Arc-well, and 762 aneuploid nuclei were used to cluster and identify 13 subclones with distinct genotypes (Table S4, Figure S2E). In the other cancer types, the Arc-well data from 2 lung cancers and 2 prostate cancers included a total of 3,867 single nuclei that were profiled, of which 2,657 aneuploid nuclei were used for analysis after removing low quality and diploid nuclei (Figure S3A and S3B, table S3). Clustering identified 10–12 subclones in the two lung cancer samples and 12–14 subclones in the two prostate samples (Figure S3B).

Clonal diversity across patients with DCIS and recurrences

To investigate the copy number substructure of primary DCIS with matched recurrent DCIS or IDC, we analyzed Arc-well data from 10 patients (P3-P12). Most patients (8/10) had recurrent invasive or synchronous disease (P5-P12). However, two patients had recurrent DCIS lesions (P3, P4) (Table S2). While many of the recurrent invasive tumors had synchronous DCIS-IDC (P5, P7, P10, P11, P12), histopathology showed that they were mostly IDC with a very low proportion of DCIS cells (<10%). Furthermore, the tumor margins of all primary DCIS cases with available clinical data had no detectable tumor cells after the initial surgery by histopathology.

A total of 10,822 aneuploid nuclei in the 10 patients were used for downstream analysis after removing low quality cells and diploid cells (Figure S4A and S4B, table S4). On average, 527 (± 180 SEM) primary DCIS nuclei and 555 (± 306 SEM) recurrent nuclei were profiled from each FFPE sample (Figure 3A). Clustering analysis identified 1–13 (median 6) subclones in the primary disease samples and 1–15 (median 9) subclones in the recurrent samples (Figure 3A and 3B, STAR Methods). These data showed that 8 patients did not share any subclones with the same copy number profile between the primary and recurrent tumors (P3-P10). In contrast, two patients (P11 and P12) had 1 and 5 shared subclones between the matched DCIS and recurrences, respectively (Figure 3A and 3B). The total number of CNA events varied across the tumors, but was not significantly different between the primary and recurrence (P = 0.28, paired samples Wilcoxon test: PSWT) (STAR Methods, Figure 3C). Next, we estimated clonal diversity by computing the single cell mean pairwise phylogenetic distance (MPD) of the primary and recurrence tumor cells (Figure 3D). Based on our cell downsampling analysis, we found that the clonal diversity index (MPD) was stable using 200 or more cells to perform these calculations (Figure 3E). The MPD showed a range of values across patients, but did not show a significant difference between the primary DCIS and recurrent cancer time points (p = 0.56, PSWT) (Figure 3D, STAR Methods).
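The idea behind a mean pairwise distance can be sketched as follows. The study computes phylogenetic distances between single cells (STAR Methods); as a simplifying assumption, this sketch substitutes the Manhattan distance between integer copy number profiles, so it illustrates the averaging step rather than the authors' metric. The function names are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute copy number differences across bins (stand-in distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_pairwise_distance(profiles, distance=manhattan):
    """Average distance over all unordered pairs of cells."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

Because the number of pairs grows quadratically with the number of cells, the estimate tends to stabilize once enough cells are included, in line with the downsampling result reported above.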

Figure 3. Overview of genomic diversity in DCIS with matched recurrent disease.


(A) Bar plots of the shared and unique subclones (top panel) and number of cells (bottom panel) across all 10 patients.

(B) UMAP plots of single cell copy number profiles from each patient colored by subclones/clusters and timepoints.

(C) Line plots showing the change in the number of copy number events between the matched primary DCIS and recurrences (p = 0.28, PSWT).

(D) Diversity index (MPD) of subclonal frequencies in the 10 patients between the matched primary DCIS and recurrences (p = 0.56, PSWT).

(E) The correlation between MPD diversity index and the number of cells included in the calculation for all 10 patients with paired samples.

(F) Correlations between the diversity index (MPD), number of CNA events and patient clinical features (ER, PR, HER2, histology and disease grade). All tests were performed by Wilcoxon test. * Indicates the tests show a significant difference (p < 0.05) between the groups.

(G) FACS DNA ploidies from each primary DCIS and recurrent sample for each patient.

See also Figure S4 and Table S4.

We further investigated the associations between the clonal diversity index (MPD) and the total CNA event counts with several pathological parameters: ER, PR and HER2 receptor positivity, histology (DCIS or IDC) and DCIS grade (intermediate or high) (Figure 3F). This analysis did not show any significant associations between these features and the MPD (P > 0.05, Wilcoxon tests). However, the data did show that ER-negative status, PR-negative status, HER2-positive status and high-grade were significantly associated with higher CNA event numbers (P < 0.05, Wilcoxon tests). This result is consistent with previous bulk genome sequencing studies of invasive breast cancer37–39.

To determine the DNA ploidy of each DCIS and recurrence, we calculated the mean values based on the DAPI signal from FACS (Figure 3G, Figure S4A). These data showed that 9 of the matched DCIS and recurrent samples had a DNA ploidy larger than 2.7N, suggesting that whole genome doubling (WGD) had occurred in most samples. In many patients (8/10), the DNA ploidy showed highly similar values between the DCIS and recurrent samples. However, in P4 and P5, the ploidy increased more than 1N in the recurrent disease, suggesting that WGD occurred a second time during disease progression. In contrast to most patients, the DNA ploidy from both timepoints of patient P3 were close to 2N, suggesting that no WGD occurred in this patient.
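A simple way to picture the ploidy estimate is as a ratio of DAPI intensities. The sketch below assumes that a diploid reference peak from the same FACS run is available and scales the mean aneuploid signal against it; that normalization and the function names are assumptions, not the documented procedure. The 2.7N cutoff is the value quoted in the text.

```python
def estimate_ploidy(aneuploid_dapi, diploid_dapi):
    """Estimate DNA ploidy (in N) from mean DAPI intensities.

    Assumes the diploid peak corresponds to exactly 2N.
    """
    mean_aneuploid = sum(aneuploid_dapi) / len(aneuploid_dapi)
    mean_diploid = sum(diploid_dapi) / len(diploid_dapi)
    return 2.0 * mean_aneuploid / mean_diploid


def likely_wgd(ploidy, threshold=2.7):
    """Flag samples whose ploidy exceeds the 2.7N cutoff used in the text."""
    return ploidy > threshold
```

Under these assumptions, an aneuploid peak at 1.8 times the diploid intensity corresponds to a ploidy of 3.6N and would be flagged as consistent with WGD, while a sample near 2N would not.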

Clonal substructure of DCIS and recurrences in individual patients

We further analyzed the detailed copy number substructure of individual patients with DCIS and matched recurrent disease. Patient P3 had an initial DCIS that was treated with surgery and recurred as a DCIS lesion after 2 years. Clustering of 734 aneuploid nuclei identified 4 major subclones, including 3 in the DCIS and 1 in the recurrence (Figure 4A). The 4 subclones shared clonal losses of chr11q, chr17 (TP53, BRCA1) and chrX, suggesting a common genetic lineage between the primary DCIS and recurrence. Additionally, several CNA events were acquired in the recurrent subclone (c4). In patient P5, a total of 1,872 aneuploid nuclei were profiled from the primary DCIS and its paired recurrent IDC tumor, which emerged 2 years after the initial treatment (Figure 4B). A total of 16 subclones were detected in P5 and several shared clonal CNA events were identified, including a gain of chr1q and losses of chr2p, 12q and 16q in both the DCIS and recurrent IDC. These data showed 5 subclones that were specific to the primary DCIS, and 11 that were specific to the recurrent IDC. Patient P6 had a primary DCIS that recurred as IDC after 7 years. A total of 11 subclones were detected from 1,377 aneuploid nuclei by clustering (Figure 4C). Most of the CNA events were specific to each of the two matched time points; however, all 11 subclones shared focal gains of chr8q (MYC) and chr17 (ERBB2), and a loss of chr17p (TP53).

Figure 4. Clonal substructure of matched primary DCIS and recurrences.

Clustered heatmaps of single cell copy number profiles showing subclones in matched DCIS and recurrent DCIS/IDC in 4 patients (P3, P5, P6, P7).


(A) to (D). Upper panels show the histopathological H&E images (scale bar: 50 μm) and clinical timelines of primary diagnosis and recurrences for each patient. Bottom panels show single cell copy number clustered heatmaps, with left header columns indicating time points and subclone groups. Bottom annotation panels indicate the clonal and subclonal classification of CNAs and selected cancer gene annotations.

See also Figure S5–S6.

In another DCIS patient (P7), the individual was diagnosed with DCIS in 1990 and treated with surgery, after which they presented with recurrent invasive breast cancer 16 years later. Despite this long timeframe, the two subclones of primary DCIS (c1, c12) and the 10 subclones in the recurrence shared many clonal CNA events, such as gains of chr1q (SHC1, AKT3), chr5 (FGFR4), chr6p, chr15q and chr17 (ERBB2), and losses of chr2q, chr3p (FHIT), chr4p, chr8p (PPP2R2A), chr11p, chr16q and chr17p (TP53). Similarly, in the 4 other DCIS patients (P4, P8 - P10) the genotypes of most subclones were specific to the primary or recurrent disease, but shared multiple clonal CNAs (Figure S5, Figure S6A).

In contrast, in one patient (P11), clustering of Arc-well data of 947 single nuclei identified 7 subclones, including 6 subclones that were specific to the primary DCIS and 1 subclone shared by the primary and recurrence (Figure S5D). Similarly, in patient P12, clustering of 855 single nuclei identified 20 subclones, including 5 shared subclones between the DCIS and IDC tumors (Figure S6B). The clonal CNAs included losses of chr9p (CDKN2A), chr10q (PTEN), chr11q, chr13 (BRCA2, RB1), chr17p (TP53) and gains of chr8 (MYC), chr17q (ERBB2), as well as a focal gain of chr11q13 (CCND1) that were shared among all subclones, except for c2. Overall, Arc-well delineated the detailed clonal substructure in the primary DCIS and matched recurrences in all 10 patients and showed that all samples shared a common genetic lineage between the initial DCIS and the recurrent cancers.

Genomic evolution of DCIS to recurrent disease

To reconstruct genomic evolution, we first computed consensus clonal genotypes of integer copy number profiles from the subclone clusters (STAR Methods). The consensus subclones were used to reconstruct clonal lineages using MEDICC240 (Figure 5, Figure S6C and S6D, Figure S7). From these data we inferred the most recent common ancestor (MRCA), the DCIS primary common ancestor (PCA) and the recurrence common ancestor (RCA). We then compared the PCA, or persistent subclones, of the DCIS to the RCA to identify CNA events associated with recurrence. We also annotated the timing of the WGD events in the lineages based on the FACS ploidy data.
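The consensus step can be pictured with a short sketch. The exact procedure is given in the STAR Methods; here, as an assumption, the consensus for a subclone is taken as the per-bin median copy number across its cells, rounded to an integer state. The function name and input layout (one list of per-bin copy numbers per cell) are hypothetical.

```python
from statistics import median


def consensus_profile(cell_profiles):
    """Per-bin median across the cells of one subclone, as integer states.

    Note: Python's round() rounds exact .5 values to the nearest even integer.
    """
    if not cell_profiles:
        raise ValueError("at least one cell profile is required")
    n_bins = len(cell_profiles[0])
    if any(len(p) != n_bins for p in cell_profiles):
        raise ValueError("all profiles must have the same number of bins")
    # zip(*...) iterates over bins, giving the values of all cells per bin.
    return [int(round(median(column))) for column in zip(*cell_profiles)]
```

Using a median rather than a mean keeps a small number of noisy cells from shifting the consensus state of a bin, and the resulting integer profiles are the kind of input a lineage reconstruction tool would take.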

Figure 5. Evolutionary lineages of matched DCIS and recurrences.


(A) to (D). Left panels show the event-based evolutionary trees of subclonal consensus integer copy number profiles that are rooted by a diploid profile and annotated for WGD events based on the FACS DNA ploidy data for four patients (P3, P5, P6 and P7). The blue dots on the tree represent the most recent common ancestor (MRCA), while the purple dots represent the primary common ancestor (PCA), and green dots represent the recurrence common ancestor (RCA). The top-right panels show the heatmap of subclonal consensus integer copy number profiles with the right annotation bar representing the cell fractions at each time point. The bottom-right panels show the heatmap of the inferred PCA and RCA (or subclone) profiles with selected gene annotations, in which the orange bars represent the regions that have different copy number states between the PCA and RCA. Selected cancer genes are annotated below the heatmaps.

See also Figure S6–S7.

In patient P3 with a primary DCIS and a DCIS recurrence, the three primary subclones (c1-c3) were traced back to a common PCA, from which the ancestral genotype was inferred (Figure 5A). The recurrence had only a single subclone (c4) which was compared to the PCA to identify CNA events associated with recurrence in the persistent subclone, including losses of chr3p (FHIT) and 8p, and gains of chr3q (PIK3CA) and chr8q (MYC). In P5 with a DCIS to IDC recurrence, the early DCIS cancer cells underwent a WGD event between the initial diploid cells and the MRCA, which increased the DNA ploidy from 2N to 3.6N (Figure 5B). Additionally, this cancer underwent a second WGD event between the PCA and the RCA increasing the DNA ploidy from 3.6N to 5.2N, after which many additional recurrent subclones (c2-c12) diverged. All the recurrent subclones (c2-c12) formed a separate clade in the lineage that was traced back to a single RCA, suggesting that they went through an evolutionary bottleneck. Compared to the PCA (ancestor of c1, c13-c16), the RCA acquired many chromosomal gains after the WGD event, including chr1q (AKT3), chr5 (FGFR4), chr7 (EGFR), chr8 (MYC), chr19 (CCNE1) and chr22 (AURKA), which were present in the persistent subclone and associated with invasive progression. In another patient (P6), the clonal lineages indicated that WGD occurred between the initial diploid cells and the MRCA, increasing the DNA ploidy from 2N to 3.3N (Figure 5C). Only one subclone was detected in the primary DCIS, while all the subclones from the invasive recurrence were present in one clade, indicating a monoclonal expansion from a single subclone. Arc-well identified CNA events associated with recurrence, including gains of chr3 (PIK3CA), chr8 (CCNE2), chr12 (MDM2), chr19 (CCNE1) and chr20 (AURKA) and losses of chr9p (CDKN2A), chr13 (RB1, BRCA2). In P7, the MRCA underwent WGD and acquired 67 CNA events, after which the lineage diverged into PCA and RCA (Figure 5D). All 10 recurrent subclones were clustered together, suggesting an expansion from a single ancestral subclone. Consistent with these cases, evolutionary analysis of the four other patients (P4, P8, P9, P11) also inferred that a single common ancestor from the DCIS formed the recurrent subclones (Figure S6C–S6D, Figure S7A–S7B).

In contrast, two patients (P10 and P12) had two or more subclones that expanded from the primary DCIS to the recurrent lesions (Figure S7C and S7D). In P10, with a DCIS that recurred as a DCIS-IDC cancer, the WGD event occurred from the diploid cell to the MRCA, increasing the DNA ploidy from 2N to 3.9N and acquiring 9 CNA events in the MRCA (Figure S7C). The MRCA diverged into one subclone (c15) that lost one CNA event, and another lineage that formed 9 primary subclones and 6 recurrent subclones. The lineage structure suggests that the recurrent subclones derived from two different common ancestors, which persisted in the tumor mass and re-expanded to form the invasive tumor. Patient P12 had a primary DCIS that recurred as IDC and had an early WGD event that increased the DNA ploidy from 2N to 3.6N (Figure S7D). The MRCA then diverged into one clade that formed the recurrence subclone (c2), and the other clade that formed most of the subclones (c1, c3-c20) in the primary DCIS and recurrence. The primary and recurrence subclones shared 5 identical branches that intermixed together, suggesting a multiclonal progression model.

A consistent finding across most patients (7/8 cases) was that the RCA had more CNA events than the PCA, indicating that the RCA emerged later than the PCA and represented a more advanced evolutionary time point (P = 0.04, PSWT, Figure 6A). These data suggest that the classification of monoclonal expansions through evolutionary bottlenecks versus multiclonal expansions is unlikely to be due to undersampling of spatial areas of the tissues.

Figure 6. CNA events associated with recurrence across different patients.

(A) Number of accumulated CNA events in the inferred PCA and RCA after the divergence of the MRCA in patients with an evolutionary bottleneck model of progression.

(B) The top panel shows a heatmap of copy number profiles calculated from the difference between the PCA and RCA across the 7 patients with evolutionary bottleneck lineages (excluding P11 in which the recurrence shared its genotype with the primary DCIS). Profiles represent CNA events that were acquired or lost in the recurrences after the primary DCIS. The bottom panel shows the frequency of recurrence-associated CNA events (losses, gains or neutral events) as a histogram of variable bins from the 7 patients with selected cancer genes annotated below. The y-axis represents the number of patients that had the loss, gain or neutral CNA events.

Finally, we used the 7 patients with evolutionary bottlenecks (with the exception of P11) to identify common CNA events in the persistent subclones that recurred across patients (Figure 6B, STAR Methods). This analysis involved comparing CNA events in the RCA that differed from the PCA in the primary DCIS lesions and computing the CNA event frequencies of chromosome gains and losses. The resulting analysis showed an increased frequency of CNA gains in the recurrences in chr3q (PIK3CA) in 4/7 patients, chr5p in 5/7 patients, chr8q (MYC, CCNE2) in 4/7 patients, chr9q in 4/7 patients and chr20q (ZNF217, AURKA) in 4/7 patients.

Discussion

Here, we report the development of Arc-well, a method that can achieve high-throughput scDNA-seq of thousands of cells in parallel from archival FFPE materials. Arc-well opens new avenues of investigation into large collections of clinical FFPE blocks that have been collected and archived around the world, often with long-term clinical outcome data that can be leveraged to study cancer and other human diseases33. Additionally, we showed that Arc-well can perform high-throughput scDNA-seq on unfixed tissues (fresh or frozen) and that it resulted in improved data quality when benchmarked against many existing scDNA-seq methods.

By studying DCIS samples with matched recurrent disease that occurred years to decades later, our data show that genome doubling, extensive aneuploidy and genomic rearrangements were already present in most DCIS lesions, prior to the progression to recurrent disease. These data are consistent with early microarray CGH studies of DCIS lesions3,41–43, but provide far more granular data on the clonal substructure of the DCIS lesions. From our data we inferred the genotype of the persistent subclones in many patients (7/10 patients) and directly observed the persistent subclones in two patients (P11, P12) (Figure S7B and S7D). By comparing the RCA to the PCA, we identified many patient-specific CNA events that may be associated with progression and recurrence in the persistent subclones. By comparing data across patients, we identified recurrent CNAs that harbored cancer genes (e.g., PIK3CA, MYC, CCNE2, ZNF217, AURKA) that may be associated with DCIS recurrence and progression.

Our inferred evolutionary lineages suggest that most DCIS lesions (8/10) underwent an evolutionary bottleneck, in which a common ancestor of persistent subclones was selected and expanded to form the recurrent cancers (Figure 7A). Understanding the evolutionary bottleneck model in DCIS is important, as it indicates a role for tumor-intrinsic genetic events in DCIS progression and invasion. We also identified two patients (P10, P12) with a multiclonal progression model (Figure 7B), which may also indicate a potential role of tumor-extrinsic events, such as the microenvironment and breakdown of the myoepithelial layer of the ducts, in DCIS progression and invasion. Both models suggest that the treatment of the initial DCIS with surgery failed to eliminate all of the residual cancer cells and allowed one or more subclones to persist and expand to form the recurrent lesions many years later. Notably, our study did not identify any cases with independent genomic lineages (Figure 7C), which may be due to the small cohort of patients in this study.

Figure 7. Evolutionary models of primary DCIS to recurrent disease progression.

(A) In the Evolutionary Bottleneck model, subclones diverge and expand in the ducts after which a single clone is selected and persists until it expands to form the recurrent disease many years to decades later.

(B) In the multiclonal evolution model, subclones diverge and expand in the DCIS and then multiple subclones persist in the tissue until they co-invade and expand to form the recurrent tumor many years later.

(C) In the independent evolution model, subclones in primary and recurrence originate from different normal epithelial cells and have no shared copy number events.

(A-C) The lower panels show the expected phylogenies that are consistent with each model: the evolutionary bottleneck model has distinct lineages for subclones in the primary and recurrent disease, the multiclonal evolution model has intermixing of many subclonal genotypes across the two time points in the clonal lineages, and the independent evolution model has no common MRCA that is shared across the two different time points.

From a clinical perspective, while most cases had clear surgical margins, our data show that rare residual subclones that were not detected by initial histopathology persisted in the breast tissues and recurred many years later (Table S2). This may be due to the complex 3-dimensional structure of the ductal-lobular network and the clinical challenge of detecting residual DCIS cells by histopathology in surgical margins. Overall, our data suggest that in these specific DCIS patients, adding more treatment (e.g., radiation, chemotherapy, and/or targeted therapy) or implementing better imaging technologies to detect the residual cells may be needed to eliminate all the residual cancer cells after the initial surgery. However, in most DCIS patients, particularly in low-grade disease, ongoing research suggests that treatments should ideally be scaled back.

In closing, Arc-well provides a powerful, high-throughput technology for analyzing large collections of clinical FFPE blocks that exist around the world45. As newer spatial DNA technologies are beginning to emerge4,44, an important future direction will be to extend Arc-well for spatial mapping in FFPE tissues. Overall, we expect that Arc-well will have utility for studying many cancer types and research areas such as premalignant progression, metastatic dissemination and therapy response46,47. Beyond cancer, Arc-well may also have broad applications for studying copy number diversity in neurological diseases, normal tissues, embryological samples and developmental stages48–51 to improve our understanding of basic biology and human diseases.

Limitations of the study

This study has a few limitations. First, the FFPE dissociation protocol used in this manuscript only retains single nuclei after lysis with NST (Nonidet P40 with salts and Tris) buffer, which could potentially miss DNA outside of the nucleus, such as micronuclei. Additionally, cells in specific phases of the cell cycle may not be captured when the nuclear membrane disassembles. Second, detecting mutations accurately from single cells using Arc-well may present a significant challenge, due to the sparsity of sequencing reads available to call mutations and the high number of false-positive deamination events in FFPE samples. Third, the patient cohort used to study DCIS progression was limited to N=10 patients; therefore, it will be important to evaluate the generalizability of the biological results and evolutionary models in larger cohorts of patients.

STAR Methods

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Nicholas Navin (nnavin@mdanderson.org).

Materials availability

All unique/stable reagents generated in this study are available from the Lead Contact, Nicholas Navin (nnavin@mdanderson.org).

Data and Code Availability

  • All sequencing data generated in this study have been deposited to the Sequence Read Archive (SRA): PRJNA799605.

  • The codes used in this paper are available at https://github.com/navinlabcode/Arc-well.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Experimental model and study participant details

Cell Lines

The human breast cancer cell line (MDA-MB-231, female) was cultured in Dulbecco’s Modified Eagle’s Medium-high glucose (DMEM, Sigma, D5976) medium supplemented with 10% FBS (Sigma, F0926), 1 × penicillin-streptomycin and 1 × L-Glutamine and maintained at 37°C with 5% CO2. The cell line was obtained from the MD Anderson Characterized Cell Line Core Facility, in which the short tandem repeat (STR) profiling and mycoplasma tests were performed. To obtain single nucleus suspensions, cultured cells were trypsinized with 0.25% Trypsin-EDTA (Corning, 25053CI), washed one time with 1 × DPBS (Millipore Sigma, D8537), and lysed by resuspending in NST-DAPI buffer as previously described59. The human lymphoblastoid cell line (315A) was established from a normal disease-free male. We thawed 315A and resuspended the cells with RPMI-1640 medium supplemented with 10% FBS, 1× penicillin-streptomycin and 1× L-Glutamine, then split the cells into two tubes. One tube was resuspended and lysed with NST-DAPI buffer to prepare single nucleus suspensions after centrifugation. The other tube was fixed with 10 ml 10% neutral buffered formalin at room temperature for 10 min, after which the cells were pelleted down and quenched with 1 ml 10 mM Tris-HCl (pH 7.5) and resuspended with 1 ml NST-DAPI buffer to lyse the cytoplasmic membranes. The mouse A20 (male) and human GM12878 (female) cell lines were purchased from the American Type Culture Collection (ATCC TIB-208) and the Coriell Institute for Medical Research (Coriell), respectively, and the culture protocols were based on ATCC and Coriell guidelines. The cell fixation protocol for MDA-MB-231, A20 and GM12878 followed the above 315A fixation protocol.

Human participants

The human primary DCIS samples and recurrent samples were obtained from cases of pure primary DCIS that, after treatment, had subsequently developed recurrent disease, which were identified from: (1) the Dutch DCIS study, including all women diagnosed with primary DCIS with a median follow up time of 12 years60, which was approved by the review boards of the NCR (ref. no. 12.281) and PALGA (ref. no. LZV990) and the institutional review board (IRB) of the Netherlands Cancer Institute under numbers CFMPB166, CFMPB393 and CFMPB688; and (2) the Duke Hospital cohort, a hospital-based study of women diagnosed with DCIS with a median follow up of 7.9 years, which was approved by the IRB under numbers Pro00054877 and Pro00068646. FFPE breast tissue blocks were stored at room temperature from the time of block generation. An H&E section was produced from all blocks within a case for review by a specialist breast pathologist. Details of the patients and their associated clinical data are provided in Table S2. The IDC samples used for comparing QC metrics of frozen and FFPE data were collected from an Asian female (51 years old with grade 2 disease) and were approved by the IRB at Baylor College of Medicine. Nuclear suspension from the frozen piece was prepared using an NST-DAPI lysis buffer as previously described26. The FFPE blocks were prepared by the Research Histology Core Lab (RHCL) at MD Anderson Cancer Center (MDACC). The two lung cancer samples were from two female patients (Lung-P1: 73 years old with T2aN0 and Lung-P2: 76 years old with T2aN1) with non-small cell lung adenocarcinomas, while the two prostate cancer samples were from two male patients (Prostate-P1: 54 years old with M0T3bN1 and Prostate-P2: 68 years old with M1T1c) with castration-sensitive prostate cancers; all were collected under IRB approved protocols at MDACC.

Method details

FFPE tissue dissociation and FACS

FFPE blocks were sectioned to generate 1–3 scrolls of 50 μm using a microtome (Finesse ME+, Thermo Scientific, UK). Single cell suspensions were generated from the scrolls using the gentleMACS Dissociator with Heaters and the FFPE tissue dissociation kit (MACS #130-118-052) according to the manufacturer’s recommendations with the following minor changes: after the last wash with ice-cold Buffer W, single cell suspensions were transferred to 1.5 ml LoBind tubes (Fisher #22431021) and spun at 500 g for 5 min at 4 °C. The cell pellet was resuspended in an appropriate volume of NST-DAPI buffer (200 μl-5 ml) to generate nucleus suspensions as previously described59. Nucleus suspensions were stored in 10% DMSO for long term storage. Nucleus suspensions were flow-sorted with the BD FACSMelody, where DAPI intensity was used to gate the desired diploid/aneuploid populations.

Arc-well single cell DNA library preparation

Single cell DNA library preparation consists of nuclei dispensing, imaging and 5 reagent depositing steps that are dispensed using the ICELL8 cx system (Takara Bio) into nanowells (Table S5), as described below:

Nuclei dispensing:
Option 1: One round of nuclei dispensing and imaging:

Nuclear suspensions are washed with 0.5 × DPBS and diluted to 28,571 nuclei per milliliter with 0.5 × DPBS and dispensed into ICELL8 350v chips using ICELL8 cx system (35 nl/well), followed by centrifugation at 1000 g for 5 min at 4 °C. After the nanowell chip was imaged on the ICELL8 cx system, the images were analyzed with the CellSelect Software (Takara Bio) to generate a filter file for identifying the wells containing single nuclei. Each subsequent dispensing step uses this filter file to dispense reagents into the selected nanowells.

Option 2: Two rounds of nuclei dispensing and imaging:

Nuclear suspensions are washed with 0.5 × DPBS and diluted to 40,000 nuclei per milliliter with 0.5 × DPBS and dispensed into the ICELL8 350v nanowell chips using ICELL8 cx system with the following modified settings: change ‘Dispense VolumeNL’ from 35 to 25 under Volume 35nL and VolumeSample35 under menu of service/configure/Biodot configuration/volumes, then press “Done” under Utilities/Application manager. Scan the chip under the blue channel after dispensing. Then select all nanowells without nuclei and dispense the nuclear suspension again, while selecting the leftover wells to dispense 0.5 × DPBS to maintain the total volume at 50 nl in all wells. Next, image the chip again to generate a new filter file that only contains the wells with single nuclei.
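Both dilutions above target roughly one nucleus per dispensed droplet (28,571 nuclei/ml × 35 nl ≈ 1 and 40,000 nuclei/ml × 25 nl = 1). The sketch below illustrates why a second dispensing round can raise the fraction of wells with single nuclei. It assumes that well occupancy follows a Poisson distribution and that imaging classifies wells without error; neither assumption is stated in the protocol, and the function names are hypothetical.

```python
import math


def mean_nuclei_per_well(nuclei_per_ml: float, dispense_nl: float) -> float:
    """Expected nuclei per well: concentration x dispensed volume (1 nl = 1e-6 ml)."""
    return nuclei_per_ml * dispense_nl * 1e-6


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k nuclei in a well under the assumed Poisson loading."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def single_fraction_one_round(lam: float) -> float:
    """Expected fraction of wells holding exactly one nucleus after one dispense."""
    return poisson_pmf(1, lam)


def single_fraction_two_rounds(lam: float) -> float:
    """Singles from round 1 are kept; wells that were empty after round 1
    receive a second dispense and may become singles."""
    return poisson_pmf(1, lam) + poisson_pmf(0, lam) * poisson_pmf(1, lam)


if __name__ == "__main__":
    lam1 = mean_nuclei_per_well(28_571, 35)  # Option 1
    lam2 = mean_nuclei_per_well(40_000, 25)  # Option 2, per round
    print(f"Option 1 singles: {single_fraction_one_round(lam1):.3f}")
    print(f"Option 2 singles: {single_fraction_two_rounds(lam2):.3f}")
```

Under these assumptions, about 37% of wells are expected to hold a single nucleus after one round and about 50% after two rounds.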

5 steps of reagents microdispensing
Step 1, Lysis of selected nuclei.

To lyse the nuclei, lysis reagents (200 μl recipe: 180 μl lysis buffer (30 mM Tris-HCl, pH 8.0, 5% Tween, 0.5% TritonX-100), 20 μl protease (1.36 AU/ml)) were placed into 4 wells of source plates and dispensed into the chip wells with nuclei (35 nl/well). The chip was then centrifuged at 1000 g for 5 min at 4 °C, and incubated at 59.7 °C for 5 s, 54.5 °C for 30 min, 79 °C for 11 s, 75.3 °C for 15 min and held at 4 °C, followed by centrifugation at 3,220 g for 3 min at 4 °C.

Step 2, Tagmentation.

Tagmentation mixture (160 μl recipe: 144 μl 2 × TD buffer, 16 μl TDE1, Illumina) was placed into 4 wells of source plates and dispensed into the chip wells with single nuclei (35 nl/well). After centrifugation, the chip was incubated at 59.7 °C for 5 s and then at 54.5 °C for 8 min (FFPE samples) or 12 min (fresh samples).

Step 3, Neutralization and first index dispensing.

Neutralization mix for each index (20 μl recipe: 10 μl 5 × Kapa Hifi Fidelity buffer, 1.6 μl 25 mM dNTP, 1 μl 0.5 M EDTA, 6.4 μl H2O, 1 μl 100 μM PCR forward primers with index S5XX: AATGATACGGCGACCACCGAGATCTACACXXXXXXXXTCGTCGGCAGCGTC, where XXXXXXXX represents the 8 bp index sequence) (Table S6, 72 in total) was placed into 72 different wells of the 384-well source plates, then dispensed into the filtered wells (35 nl/well). The chip was centrifuged and incubated at 54.9 °C for 5 s, 49.4 °C for 30 min, 4 °C hold.

Step 4, Second index dispensing.

Second indexing mix for each index (20 μl recipe: 10 μl 5× Kapa Hifi Fidelity buffer, 0.348 μl 1 M MgCl2, 8.652 μl H2O, 1 μl 100 μM PCR reverse primers with index N7XX: CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTCTCGTGGGCTCGG, where XXXXXXXX represents the 8 bp index sequence) (Table S6, 72 in total) was placed in 72 different wells of source plates, then dispensed into the filtered wells (35 nl/well).

Step 5, PCR master mix.

PCR master mix (200 μl recipe: 40 μl 5 × Kapa Hifi Fidelity buffer, 40 μl KAPA HiFi HotStart DNA Polymerase (1 U/μl), 120 μl H2O) was placed into 4 wells of source plates and dispensed into the chip wells with nuclei (35 nl/well). After centrifugation, the chip was incubated following the PCR cycles of 72.1 °C 8 min, 99.6 °C 30 s, 10~16 (10–12 for frozen samples, 14–16 for FFPE samples) × (99.6 °C 20 s, 57.5 °C 5 s, 62.7 °C 30 s, 72.1 °C 1 min), 72.1 °C 2 min, 4 °C hold.

The PCR product was then collected using Collection Module (Takara) by centrifuging the chip facing downwards at 4,000 g for 3 min at 4 °C and purified with 1.8 × Ampure XP beads (Beckman), followed by DNA trace QC checking and qPCR. The library was then sequenced using the Illumina sequencing platforms (NextSeq2000, NovaSeq6000) and 8bp dual indexing sequencing at a target of 1M reads per cell.

Quantification and statistical analysis

Data preprocessing

We demultiplexed the sequencing reads using the ‘bcl2fastq’ tool from Illumina without allowing any mismatches in the index sequences. FASTQ reads of each sample were mapped to the human reference genome (hg19, NCBI build 37) by bowtie252. PCR duplicates were marked by sambamba with “-markdup” option. Cells with excessive noise (initial QC) were excluded according to the following criteria: (1) cells with low mapping quality libraries (Q<1), (2) cells with fewer than 100K reads when the average read number per cell was around 1M, (3) cells with >10% of bins with no mapped reads. For species mixture experiments, the demultiplexed FASTQ reads were mapped to a concatenated reference of hg19 and mm10, and PCR duplicates were marked as described above. Cells were then filtered using the first two criteria described above. A cell was considered a singlet if >95% of its reads mapped to one reference genome (human or mouse); otherwise the cell was classified as a doublet.
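The read-count and empty-bin filters and the singlet/doublet rule described above can be sketched as follows. This is only an illustration, not the pipeline from the Arc-well repository: the function and parameter names are hypothetical, the mapping quality criterion is omitted because its exact computation is not specified here, and the per-cell counts are assumed to have been computed beforehand.

```python
def passes_initial_qc(total_reads: int, frac_empty_bins: float,
                      min_reads: int = 100_000,
                      max_empty_frac: float = 0.10) -> bool:
    """Keep a cell only if it has at least 100K reads and no more than
    10% of its bins are empty (criteria 2 and 3 in the text)."""
    return total_reads >= min_reads and frac_empty_bins <= max_empty_frac


def classify_species_mix(human_reads: int, mouse_reads: int,
                         threshold: float = 0.95) -> str:
    """Label a cell from a human/mouse mixture experiment.
    A singlet needs >95% of reads on one reference; otherwise it is a doublet."""
    total = human_reads + mouse_reads
    if total == 0:
        return "unassigned"  # hypothetical label for cells without mapped reads
    if max(human_reads, mouse_reads) / total > threshold:
        return "human" if human_reads > mouse_reads else "mouse"
    return "doublet"


if __name__ == "__main__":
    print(passes_initial_qc(850_000, 0.02))
    print(classify_species_mix(human_reads=9_700, mouse_reads=300))
    print(classify_species_mix(human_reads=6_000, mouse_reads=4_000))
```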

Inference of DNA copy number

We used the variable binning pipeline as previously reported26,36 to infer the single cell copy number profiles. Briefly, we first counted the aligned reads in variable bins averaging 220 kb and normalized them for GC content using lowess regression, followed by calculating the bin-wise ratios as the ratio of each bin count to the mean bin count of the sample. The circular binary segmentation (CBS) function (alpha = 0.0001 and undo.prune = 0.05) of the R package DNAcopy61 was used to perform segmentation.

Filtering of low-quality copy number profiles

We used k-nearest neighbor (KNN) filtering as previously described and implemented in CopyKit54. Briefly, we calculated a correlation matrix using the CBS segment ratio values from each single cell. For every cell, we averaged the cell correlation with its 5 nearest-neighbors and those with an average correlation value lower than 0.8 were excluded.
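The filtering rule above can be illustrated with a small sketch. This is not the CopyKit implementation: it assumes a Pearson correlation (the correlation type is not stated in the text), takes the 5 nearest neighbors to be the 5 most highly correlated other cells, and uses hypothetical function names.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length segment ratio profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def knn_filter(profiles, k=5, min_mean_cor=0.8):
    """Return one boolean per cell: True if the mean correlation with its
    k most correlated neighbors is at least min_mean_cor."""
    keep = []
    for i, p in enumerate(profiles):
        cors = sorted((pearson(p, q) for j, q in enumerate(profiles) if j != i),
                      reverse=True)
        top = cors[:k]
        keep.append(bool(top) and sum(top) / len(top) >= min_mean_cor)
    return keep


if __name__ == "__main__":
    base = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.5, 0.5]
    cells = [base[:] for _ in range(6)]
    cells.append([2.0, 0.5, 1.0, 1.0, 2.0, 0.5, 1.0, 2.0])  # noisy profile
    print(knn_filter(cells))
```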

Technical metrics

The overdispersion QC metric of genomic bin counts is defined as the read count variance in bins over the mean read count, which is an important parameter for evaluating the uniformity of coverage during the whole genome amplification process. The overdispersion metric was calculated as previously described26 according to the formula Φ = (D − 1)/μ, where Φ is the overdispersion parameter, μ is the mean read count of genomic bins, and D is the index of dispersion (see more details in https://github.com/navinlabcode/Arc-well). To evaluate the breadth of coverage, we first downsampled the bam files with over 500K reads to 500K to match the sequencing depth across all samples, then applied the Bedtools (v2.26.0)55 genomeCoverageBed function to BAM files from 80 randomly sampled single cells (to match the DLP datasets, which only had 82 cells with over 500K reads after QC) of each scDNA-seq method, i.e., Arc-well, ACT26, 10X genomics CNV26, DLP35, DLP+27, and DOP-PCR36.
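As a worked illustration of the overdispersion formula, the sketch below computes Φ from a list of bin counts. It assumes that the index of dispersion D is the variance-to-mean ratio of the bin counts and uses the sample variance; the authors' repository may differ in these details, and the function names are hypothetical.

```python
from statistics import mean, variance


def index_of_dispersion(bin_counts):
    """D: variance-to-mean ratio of the read counts across genomic bins."""
    return variance(bin_counts) / mean(bin_counts)


def overdispersion(bin_counts):
    """Phi = (D - 1) / mu, with mu the mean read count per bin.
    Requires at least two bins and a nonzero mean."""
    mu = mean(bin_counts)
    return (index_of_dispersion(bin_counts) - 1.0) / mu


if __name__ == "__main__":
    even = [100, 102, 98, 101, 99, 100]      # near-uniform coverage
    uneven = [10, 250, 40, 180, 5, 115]      # highly variable coverage
    print(overdispersion(even), overdispersion(uneven))
```

Lower values indicate more uniform coverage across bins.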

Downsampling of reads to evaluate the accuracy of CNA detection

We first selected cells in MDA-MB-231, P1R and P4P with over 1M reads in bam files, then randomly downsampled the total sequencing read counts to 1M, 750K, 500K, 250K, 125K, 75K and 50K reads for each single cell of each sample. From the subsampled data we calculated the copy number segmentation of each sampled single cell as described in the section “Inference of DNA copy number” above and correlated it (Spearman correlation) with the original copy number segmentation profiles of the cells using all of the sequencing reads from the original experiment (>1M).

Detection of diploid cells

To detect cells with diploid profiles from the datasets we calculated the coefficient of variation from the CBS segment ratio values of every single cell. We first simulated the expected coefficient of variation for 1000 diploid cells (for all tumor samples except the samples of IDC-FFPE (QC section) and P10, which used the number of the total cells due to some aneuploid cells with only a few CNA events) with N (0, 0.01) and applied an expectation-maximization algorithm to fit a mixture of normal distributions with the function normalmixEM from the R package mixtools (v1.2.0)56 to the dataset comprising the single cell segment ratio and simulated diploid coefficients of variation. Cells that presented a coefficient of variation smaller than 5 standard deviations from the mean of the distribution containing the simulated diploid dataset were marked as diploid and excluded from analysis.

Arc-well data clustering and visualization

Segment ratios of each single cell were first log2-transformed, then used to perform dimension reduction with UMAP62 using the R package ‘uwot’ (v0.1.11, seed = 31, min_dist = 0.1, n_neighbors = 20, distance = “manhattan”, spread = 3). The n_neighbors = 25 parameter was used for the merged MDA-MB-231 datasets due to the large number of cells (n=3,950). Subclones were identified by passing the resulting embedding on to a density-based clustering algorithm (hdbscan) using the R package ‘dbscan’ (v1.1.10)63 (minPts = 0.015 × # of cells, except for the merged IDC-Frozen and IDC-FFPE dataset, which used minPts = 0.02 × # of cells). Cells classified as noise by hdbscan were filtered out, and cells in subclones that did not meet the minimum cell number (N=6) at each time point were also filtered out to avoid clustering errors. A subclone (c10) in P9 with low data quality that caused clustering errors was removed. Heatmaps were plotted with the R package ComplexHeatmap (v2.2.0)57.

Gene annotation for clonal and subclonal events

Subclonal and clonal segments were defined by using all the cells in the single cell heatmaps. For each sample we converted the log2 segmented ratios into a trinary event-based matrix defined by cutoffs of ± 0.15, except for P3 (± 0.25), which has a ploidy of 2N. For example, an amplification event was defined when the log2 ratio was >= 0.15, a deletion if it was < −0.15 and a neutral event for all others. A segment was marked as a clonal CNA event (cCNA) if an amplification or deletion event was shared by at least 95% of all the cells. Clonal neutral events were marked if a neutral event was shared by at least 90% of all cells (a lower cut-off because neutral events are more sensitive to noise). All other remaining events were marked as subclonal CNA events (sCNA).
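The trinary conversion and the clonal/subclonal classification described above can be sketched as below. The function names and labels are hypothetical, and the sketch assumes that each segment has already been aligned across all cells of a sample.

```python
def to_events(log2_ratios, cutoff=0.15):
    """Map log2 segment ratios to trinary events:
    1 = amplification (>= cutoff), -1 = deletion (< -cutoff), 0 = neutral."""
    return [1 if r >= cutoff else -1 if r < -cutoff else 0 for r in log2_ratios]


def classify_segment(events_across_cells, cna_frac=0.95, neutral_frac=0.90):
    """Classify one segment from its event calls in all cells of a sample."""
    n = len(events_across_cells)
    for state in (1, -1):
        if events_across_cells.count(state) / n >= cna_frac:
            return "cCNA"            # clonal amplification or deletion
    if events_across_cells.count(0) / n >= neutral_frac:
        return "clonal_neutral"
    return "sCNA"                    # everything else is subclonal


if __name__ == "__main__":
    # One segment observed in 20 cells: gained in 12 cells, neutral in 8.
    calls = to_events([0.4] * 12 + [0.02] * 8)
    print(classify_segment(calls))
    # For P3 the text uses a wider cutoff.
    print(to_events([0.2, -0.3, 0.1], cutoff=0.25))
```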

Estimating the number of CNA events

We calculated the total number of CNA events by counting the total number of CNA breakpoints, which represent the points of copy number level changes in the consensus CNA profiles in each time point of every sample. The length of the CNA events was estimated by the number of bins between every pair of 2 breakpoints, and CNA events with length <= 1 bin were excluded from the calculation. We used the mean number of CNA events from primary and recurrence consensus profiles respectively to estimate the difference of genomic alterations from the two time points for each sample.
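One possible reading of this counting procedure is sketched below: the consensus profile is split into runs of constant copy number, runs of 1 bin or less are dropped, and the remaining copy number changes are counted as breakpoints. This is an interpretation rather than the authors' code. It ignores chromosome boundaries, merges neighboring runs that become identical after short runs are removed, and uses a hypothetical function name.

```python
from itertools import groupby


def count_cna_events(consensus_profile, min_bins=2):
    """Count copy number breakpoints in a bin-level consensus profile,
    ignoring segments shorter than min_bins bins."""
    # Run-length encode the profile into (copy_number, length_in_bins) pairs.
    runs = [(value, len(list(group))) for value, group in groupby(consensus_profile)]
    # Drop runs that are too short to be counted as events.
    kept = [value for value, length in runs if length >= min_bins]
    # Merge neighbors that became identical once short runs were removed.
    merged = [value for value, _ in groupby(kept)]
    return max(len(merged) - 1, 0)


if __name__ == "__main__":
    profile = [2, 2, 2, 3, 3, 3, 2, 2, 4, 2, 2, 1, 1, 1]
    print(count_cna_events(profile))
```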

Single cell phylogenetic mean pairwise distance (MPD)

To estimate clonal copy number diversity, we calculated the MPD from the single cell phylogenetic trees. First phylogenetic trees were constructed by neighbor-joining algorithm from segmented ratio profiles of the cells from each sample of each time point, respectively. To make the metric more robust to technical noise, we removed the tips of the trees so that the distance from the tree tips to their most recent common ancestors were not counted. Pairwise phylogenetic distance matrix was then calculated by the R function cophenetic() for each tree. Finally, the MPD of each sample was calculated as the mean of its resulting distance matrix. Paired samples Wilcoxon test was used to compare the MPD of matched samples between different time points.

Downsampling of MPD

The cells from each sample at each timepoint were randomly downsampled, starting from 50 cells and increasing in intervals of 25 cells up to the maximum number of cells. The MPD was calculated as described above for each subset. Every downsampling procedure was repeated 10 times and the mean was reported.

Integer copy number and consensus profiles

The ploidy for each timepoint of a sample was estimated from the DAPI signals. Briefly, we used the value of 2 × (median DAPI intensity of A peak / median DAPI intensity of D peak) if most of the sequenced cells were from the aneuploid peak. In P3, we used a ploidy of 2 for both primary and recurrence since most of the aneuploid cells were from the diploid peak. We then constructed single cell integer copy number profiles by multiplying the segment ratios by the ploidy and rounding the values to the nearest integer. Subclonal consensus integer profiles were calculated by taking the median of every integer copy number across all the single cells that were assigned to the same subclone and rounding to the nearest integer.
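The ploidy scaling and consensus steps above can be sketched as follows. The function names are hypothetical, and the rounding of exact halves upward is an assumption made for this illustration because the tie-breaking rule is not stated in the text.

```python
import math
from statistics import median


def estimate_ploidy(a_peak_median_dapi: float, d_peak_median_dapi: float) -> float:
    """Ploidy = 2 x (median DAPI of aneuploid peak / median DAPI of diploid peak)."""
    return 2.0 * a_peak_median_dapi / d_peak_median_dapi


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with exact halves rounded up (assumed)."""
    return math.floor(x + 0.5)


def integer_profile(segment_ratios, ploidy):
    """Scale segment ratios by ploidy and round to integer copy numbers."""
    return [round_half_up(r * ploidy) for r in segment_ratios]


def consensus_profile(cell_profiles):
    """Per-segment median across the cells of one subclone, rounded to an integer."""
    return [round_half_up(median(column)) for column in zip(*cell_profiles)]


if __name__ == "__main__":
    ploidy = estimate_ploidy(180.0, 100.0)   # hypothetical DAPI medians
    cells = [integer_profile(r, ploidy) for r in
             ([1.0, 0.55, 1.4], [1.0, 0.6, 1.4], [1.1, 0.55, 1.1])]
    print(ploidy, cells, consensus_profile(cells))
```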

Phylogenetic reconstruction of clonal lineages

Phylogenetic inference for subclonal consensus trees was performed using MEDICC240 based on the minimum event distances (medicc2 -a CN --total-copy-numbers -j 40 -vv input_path output_path). A diploid cell with copy number = 2 was added to the tree and designated as the root node, to root the tree. Common ancestors were estimated during tree construction as described in MEDICC2. The trees were plotted using the R package ‘ggtree’ (v3.2.1)58.

Inference of RCA associated CNA events

The primary and recurrence common ancestors (PCA and RCA) were inferred from the consensus copy number profiles (mean integer values of segments) of all the primary or recurrence tumor subclones from each sample during MEDICC2 tree construction. The RCA associated CNA events were calculated by subtracting the CNA events in the PCA from the RCA of each tumor.
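The subtraction step, together with the cross-patient frequency calculation used for Figure 6B, can be sketched as below. This assumes that the PCA and RCA profiles are integer copy number vectors defined over the same bins in every patient; the function names are hypothetical, and the ancestral profiles themselves would come from the MEDICC2 output rather than from this code.

```python
def recurrence_events(pca_profile, rca_profile):
    """Per-bin copy number difference, RCA minus PCA.
    Positive values are gains and negative values are losses in the recurrence."""
    return [r - p for p, r in zip(pca_profile, rca_profile)]


def event_frequency(diffs_by_patient):
    """Count, for every bin, how many patients show a gain or a loss."""
    gains = [sum(1 for d in column if d > 0) for column in zip(*diffs_by_patient)]
    losses = [sum(1 for d in column if d < 0) for column in zip(*diffs_by_patient)]
    return gains, losses


if __name__ == "__main__":
    # Three hypothetical patients, four bins each.
    diffs = [
        recurrence_events([2, 3, 2, 1], [3, 3, 2, 0]),
        recurrence_events([4, 4, 3, 2], [6, 4, 3, 2]),
        recurrence_events([2, 2, 2, 2], [2, 1, 2, 1]),
    ]
    print(event_frequency(diffs))
```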

Supplementary Material

1. Figure S1. Testing and optimization of Arc-well chemistry and experimental time frame, related to Figure 1.

(A) Nanowell images of DAPI stained nuclei showing examples of empty wells, single nuclei in wells and multiplets of nuclei in wells.

(B) Tagmented DNA fragment size trace distribution using different amounts of DNA repair reagents (PreCR Repair mix, NEB). The Arc-well reaction is a one-pot reaction (without any purification steps during reactions), in which the DNA repair reagents remain in the reaction after repairing. The 0.1X and 0.2X PreCR conditions represent the PreCR reagent concentrations at the tagmentation step. The No PreCR mix reaction (control) showed the expected uniform distributions of ~200–300 bp tagmented DNA fragments, while in the reactions that were repaired by PreCR, most of the products were primer dimers with sizes of less than 150 bp.

(C) Tagmented DNA fragment trace distributions after treating DNA from FFPE tissue samples with different repair conditions, including control, 75°C 1h, 75°C 1h and repair with different concentration of PreCR repair mixture.

(D) Barnyard species mixture experiments using human (GM12878) and mouse (A20) B lymphocyte cell lines profiled by Arc-well.

(E) Comparison of the time frames between DOP-PCR and Arc-well methods for performing one experiment (DOP-PCR: 96 cells, Arc-well: 1200–2600 cells).

2. Figure S2. Clonal substructure of two cell lines and an IDC tumor, related to Figure 2.

(A) UMAP and clustered heatmap of single cell copy number profiles from different passages of MDA-MB-231 cell line profiled by Arc-well (p37: non-fixed and fixed; p38: non-fixed) and the ACT method (p30-p32, non-fixed).

(B) Single cell heatmap of non-fixed and formalin fixed 315A cells profiled by Arc-well.

(C) UMAP and clustered single cell CNA heatmap of Arc-well data generated from frozen and FFPE tissues from the same patient.

(D) Correlation (Spearman’s rank correlation test) of DNA copy number segments between downsampled reads (from 1M to 50K) and all reads (> 1M) for each single cell across three samples, including 1 cell line (MDA-MB-231) and 2 FFPE samples from breast cancers with different PCR duplicate rates.

(E) UMAP plot and heatmap of single cell copy number profiles from a single-timepoint DCIS sample (P2P). The upper part of the single cell heatmap shows the single cell ratio profiles, while the lower part shows the subclonal consensus integer copy number profiles.

3. Figure S3. Clonal substructure of lung and prostate tumors, related to Figure 2.

(A) UMAPs of single cell copy number profiles for two lung cancer and two prostate cancer samples, where each dot represents a single cell and is colored by subclones. BA: FFPE block age.

(B) Heatmaps of single cell copy number profiles for two lung cancer and two prostate cancer samples (chrY is not shown), with left column annotations of the FACS gates used to flow-sort the nuclei and of the subclones. For the FACS gates, ‘A’ indicates that the cell was sorted from the aneuploid peak, while ‘D’ indicates that the cell was sorted from the diploid peak.

4. Figure S4. Gating strategy for FACS sorting and heatmaps of unfiltered FFPE Arc-well data from 10 patients with paired samples, related to Figure 3.

(A) Strategy for FACS gating of DAPI-stained nuclei from diploid and aneuploid cell distributions from 10 breast cancer patients with paired primary DCIS and recurrences. The x-axis represents the DAPI signal intensity, while the y-axis represents sorted single nuclei counts. The ‘D’ label indicates the diploid peak, while the ‘A’ label indicates the aneuploid peak that was FACS sorted for the Arc-well experiments.

(B) Heatmaps of single cell copy number profiles for all 10 paired primary DCIS and recurrent (DCIS/IDC) samples (P3 - P12), where left annotation bars indicate the filtered status, the time point of the sample and FACS peaks of each single cell.

5. Figure S5. Copy number substructure of P4, P8, P9 and P11, related to Figure 4.

Upper panels show the H&E images (scale bar: 50 μm) and clinical timelines of DCIS and recurrences. Lower panels show single cell copy number heatmaps. Bottom annotation panels indicate the classification of clonal and subclonal CNAs. Data are shown for patients (A) P4, (B) P8, (C) P9 and (D) P11.

6. Figure S6. Copy number substructure of P10, P12 and phylogenetic lineages of P4 and P8, related to Figures 4 and 5.

(A-B) H&E images (scale bar: 50 μm), clinical timelines and single cell copy number heatmaps for patients (A) P10, (B) P12.

(C-D) Evolutionary trees and the heatmap of the consensus of subclone copy number profiles for (C) P4 and (D) P8.

7. Figure S7. Phylogenetic lineages of P9 - P12, related to Figure 5.

(A-D) Evolutionary trees and the heatmap of the consensus of subclone copy number profiles for (A) P9, (B) P11, (C) P10 and (D) P12.
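
As a rough illustration of the kind of comparison summarized in Figure S2D, the following minimal Python sketch thins a simulated single cell read-count profile to lower depths and computes Spearman's rank correlation against the full-depth profile. It is not the authors' code (their analysis code is in the repository listed under Deposited data). Several things here are assumptions made for illustration only: the function names (`average_ranks`, `spearman`, `downsample`, `simulate_depth_sweep`), the 200-bin toy copy number profile, and the scaled-down read depths. The sketch also correlates raw bin counts, whereas the figure compares segmented copy number values.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either vector is constant


def downsample(counts, target_reads, rng):
    """Keep each read independently with probability target_reads / total."""
    keep = target_reads / sum(counts)
    return [sum(1 for _ in range(c) if rng.random() < keep) for c in counts]


def simulate_depth_sweep(seed=0):
    """Toy sweep: correlate thinned bin counts with the full-depth counts."""
    rng = random.Random(seed)
    # Hypothetical copy number states for 200 genomic bins (not real data).
    copy_number = [2] * 60 + [3] * 40 + [1] * 50 + [4] * 50
    full_counts = [0] * len(copy_number)
    for b in rng.choices(range(len(copy_number)), weights=copy_number, k=20000):
        full_counts[b] += 1
    return {
        depth: spearman(downsample(full_counts, depth, rng), full_counts)
        for depth in (10000, 2000, 500)
    }


if __name__ == "__main__":
    for depth, rho in simulate_depth_sweep().items():
        print(f"{depth:>6} reads: rho = {rho:.3f}")
```

Because ranks are unchanged by rescaling, correlating raw counts gives the same rho as correlating depth-normalized ratios. In a toy setup like this one, the correlation would generally be expected to fall as fewer reads are retained, which is the trend a depth sweep of this kind is meant to expose.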

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples
Human breast tissue samples | Netherlands Cancer Institute, Duke University Hospital and Baylor College of Medicine | N/A
Human lung cancer samples | MD Anderson Cancer Center | N/A
Human prostate cancer samples | MD Anderson Cancer Center | N/A

Chemicals, peptides, and recombinant proteins
Formalin solution, neutral buffered, 10% | Millipore Sigma | HT5012-1CS
DMEM | Millipore Sigma | D5796
FBS | Millipore Sigma | F0926
Penicillin-streptomycin | Corning | 30-002-CI
L-glutamine Solution | Corning | 25005CI
Trypsin-EDTA | Corning | 25053CI
DPBS | Millipore Sigma | D8537
Tris-HCl (pH 7.4) | Millipore Sigma | T2194
Tween-20 | Andwin Scientific | NC9022994
Nonidet P40 Substitute | Millipore Sigma | 74385
NaCl | Invitrogen | AM9760G
MgCl2 | Invitrogen | AM9530G
Dimethyl sulfoxide (DMSO) | Millipore Sigma | D2650
Triton™ X-100 | Fisher Scientific | ICN19485483
QIAGEN Protease (30 AU) | Qiagen | 19157
0.5 M EDTA | Thermo Scientific | R1021

Critical commercial assays
Illumina Tagment DNA Enzyme and Buffer Large Kit | Illumina | 20034198
KAPA HotStart PCR Kit, with dNTPs | Roche | KK2502
FFPE Tissue Dissociation Kit | Miltenyi Biotec | 130-118-052

Deposited data
Raw data files and processed data files | This manuscript | SRA: PRJNA799605 and https://github.com/navinlabcode/Arc-well

Experimental models: Cell lines
Human: MDA-MB-231 | MD Anderson Characterized Cell Line Core Facility | N/A
Human: 315A | Cold Spring Harbor Laboratory | N/A
Human: GM12878 | Coriell Institute for Medical Research | GM12878
Mouse: A20 | ATCC | ATCC: TIB-208

Oligonucleotides
See Table S6 | IDT | N/A

Software and algorithms
bcl2fastq (v2.20.0.422) | Illumina | https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html
bowtie2 | Langmead et al.⁵² | https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Sambamba (v0.8.0) | Tarasov et al.⁵³ | https://lomereiter.github.io/sambamba/
DNAcopy | Seshan et al. | 10.18129/B9.bioc.DNAcopy
CNV pipeline | Minussi et al.²⁶ | https://github.com/navinlabcode/CNV_pipeline
CopyKit (v0.1.0) | Minussi et al.⁵⁴ | https://github.com/navinlabcode/copykit
bedtools (v2.26.0) | Quinlan et al.⁵⁵ | https://bedtools.readthedocs.io/en/latest/
mixtools (v1.2.0) | Benaglia et al.⁵⁶ | https://cran.r-project.org/web/packages/mixtools/index.html
uwot (v0.1.11) | Melville et al. | https://github.com/jlmelville/uwot
dbscan (v1.1.10) | Hahsler et al. | https://github.com/mhahsler/dbscan
ComplexHeatmap (v2.2.0) | Gu et al.⁵⁷ | 10.1093/bioinformatics/btw313
MEDICC2 | Kaufmann et al.⁴⁰ | https://bitbucket.org/schwarzlab/medicc2/src/master/
ggtree | Yu et al.⁵⁸ | https://github.com/YuLab-SMU/ggtree

Highlights

  • Arc-well is a high-throughput single cell DNA-seq method for archival FFPE tissues

  • Arc-well reliably profiled thousands of cells from 27 FFPE tissues archived for years

  • Persistent subclones from DCIS and matched recurrences were identified

  • Most DCIS cases in this cohort underwent an evolutionary bottleneck

Acknowledgements

This work was supported by grants to N.N. from the NIH National Cancer Institute (R01CA240526, R01CA236864), the CPRIT Single Cell Genomics Center (RP180684), the PRECISION Cancer Grand Challenge Grant and the MD Anderson Moon Shot program. This study was supported by grants to E.S.H. (R01 CA185138-01, U2C CA-17-035 Pre-Cancer Atlas (PCA) Research Centers, DOD BC132057). N.N. is an AAAS Fellow, AAAS Wachtel Scholar, Damon-Runyon Rachleff Innovator and Jack & Beverly Randall Innovator. This study was supported by the MD Anderson Sequencing Core Facility Grant (CA016672) and Histopathology Core (P30CA016672). T.K. is funded by the NCI T32 Translational Genomics Fellowship and the Rosalie B. Hite fellowship. Y.Z. is funded by the Prostate Cancer Foundation Young Investigator Award, MD Anderson Odyssey Fellowship, and CPRIT Research Training Program (RP170067). Z.X. is funded by the AACR-Pfizer Breast Cancer Research Fellowship. We also thank the Roland Schwarz lab and Takara Bio. This work was supported by Cancer Research UK, the Dutch Cancer Society (KWF) (ref. C38317/A24043) and the Dutch Ministry of Health, Welfare and Sport. We thank all collaborating hospitals and pathology departments, the Netherlands Comprehensive Cancer Organization (IKNL) and PALGA. We acknowledge the NKI Core Facility Molecular Pathology & Biobanking (CFMPB). Most importantly, we thank all patients in the US, The Netherlands and the UK who have donated tissue for this work.

Footnotes

Declaration of interests

The authors do not have conflicts of interest to declare related to this study.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Bleyer A, and Welch HG (2012). Effect of three decades of screening mammography on breast-cancer incidence. N Engl J Med 367, 1998–2005. 10.1056/NEJMoa1206809.
  • 2. Boxer MM, Delaney GP, and Chua BH (2013). A review of the management of ductal carcinoma in situ following breast conserving surgery. Breast 22, 1019–1025. 10.1016/j.breast.2013.08.012.
  • 3. Gorringe KL, Hunter SM, Pang J-M, Opeskin K, Hill P, Rowley SM, Choong DYH, Thompson ER, Dobrovic A, Fox SB, et al. (2015). Copy number analysis of ductal carcinoma in situ with and without recurrence. Mod Pathol 28, 1174–1184. 10.1038/modpathol.2015.75.
  • 4. Casasent AK, Schalck A, Gao R, Sei E, Long A, Pangburn W, Casasent T, Meric-Bernstam F, Edgerton ME, and Navin NE (2018). Multiclonal Invasion in Breast Tumors Identified by Topographic Single Cell Sequencing. Cell 172, 205–217.e12. 10.1016/j.cell.2017.12.007.
  • 5. Pareja F, Brown DN, Lee JY, Paula ADC, Selenica P, Bi R, Geyer FC, Gazzo A, Silva E.M. da, Vahdatinia M, et al. (2020). Whole-Exome Sequencing Analysis of the Progression from Non–Low-Grade Ductal Carcinoma In Situ to Invasive Ductal Carcinoma. Clin Cancer Res 26, 3682–3693. 10.1158/1078-0432.CCR-19-2563.
  • 6. Kim SY, Jung S-H, Kim MS, Baek I-P, Lee SH, Kim T-M, Chung Y-J, and Lee SH (2015). Genomic differences between pure ductal carcinoma in situ and synchronous ductal carcinoma in situ with invasive breast cancer. Oncotarget 6, 7597–7607. 10.18632/oncotarget.3162.
  • 7. Hernandez L, Wilkerson PM, Lambros MB, Campion-Flora A, Rodrigues DN, Gauthier A, Cabral C, Pawar V, Mackay A, A’Hern R, et al. (2012). Genomic and mutational profiling of ductal carcinomas in situ and matched adjacent invasive breast cancers reveals intra-tumour genetic heterogeneity and clonal selection. J Pathol 227, 42–52. 10.1002/path.3990.
  • 8. Bergholtz H, Lien TG, Swanson DM, Frigessi A, Daidone MG, Tost J, Wärnberg F, and Sørlie T (2020). Contrasting DCIS and invasive breast cancer by subtype suggests basal-like DCIS as distinct lesions. npj Breast Cancer 6, 1–9. 10.1038/s41523-020-0167-x.
  • 9. Lin C-Y, Vennam S, Purington N, Lin E, Varma S, Han S, Desa M, Seto T, Wang NJ, Stehr H, et al. (2019). Genomic landscape of ductal carcinoma in situ and association with progression. Breast Cancer Res Treat 178, 307–316. 10.1007/s10549-019-05401-x.
  • 10. Yap TA, Gerlinger M, Futreal PA, Pusztai L, and Swanton C. (2012). Intratumor heterogeneity: seeing the wood for the trees. Sci Transl Med 4, 127ps10. 10.1126/scitranslmed.3003854.
  • 11. Turner NC, and Reis-Filho JS (2012). Genetic heterogeneity and cancer drug resistance. Lancet Oncol 13, e178–185. 10.1016/S1470-2045(11)70335-7.
  • 12. Casasent AK, Edgerton M, and Navin NE (2017). Genome evolution in ductal carcinoma in situ: invasion of the clones. J Pathol 241, 208–218. 10.1002/path.4840.
  • 13. Trinh A, Alcazar CRGD, Shukla SA, Chin K, Chang YH, Thibault G, Eng J, Jovanović B, Aldaz CM, Park SY, et al. (2021). Genomic Alterations during the In Situ to Invasive Ductal Breast Carcinoma Transition Shaped by the Immune System. Mol Cancer Res 19, 623–635. 10.1158/1541-7786.MCR-20-0949.
  • 14. Wu C-H, Yeh H-T, Hsieh C-S, Huang C-C, Chattopadhyay A, Chung Y-C, Tu S-H, Li Y-H, Lu T-P, Lai L-C, et al. (2021). Evolutionary Trajectories and Genomic Divergence in Localized Breast Cancers after Ipsilateral Breast Tumor Recurrence. Cancers (Basel) 13, 1821. 10.3390/cancers13081821.
  • 15. Waldman FM, DeVries S, Chew KL, Moore DH II, Kerlikowske K, and Ljung B-M (2000). Chromosomal Alterations in Ductal Carcinomas In Situ and Their In Situ Recurrences. JNCI: Journal of the National Cancer Institute 92, 313–320. 10.1093/jnci/92.4.313.
  • 16. Lips EH, Kumar T, Megalios A, Visser LL, Sheinman M, Fortunato A, Shah V, Hoogstraat M, Sei E, Mallo D, et al. (2022). Genomic analysis defines clonal relationships of ductal carcinoma in situ and recurrent invasive breast cancer. Nat Genet 54, 850–860. 10.1038/s41588-022-01082-3.
  • 17. Cowell CF, Weigelt B, Sakr RA, Ng CKY, Hicks J, King TA, and Reis-Filho JS (2013). Progression from ductal carcinoma in situ to invasive breast cancer: Revisited. Molecular Oncology 7, 859–869. 10.1016/j.molonc.2013.07.005.
  • 18. Turner NC, and Reis-Filho JS (2012). Genetic heterogeneity and cancer drug resistance. Lancet Oncol 13, e178–185. 10.1016/S1470-2045(11)70335-7.
  • 19. Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, Cook K, Stepansky A, Levy D, Esposito D, et al. (2011). Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94. 10.1038/nature09807.
  • 20. Telenius H, Carter NP, Bebb CE, Nordenskjöld M, Ponder BA, and Tunnacliffe A. (1992). Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13, 718–725. 10.1016/0888-7543(92)90147-k.
  • 21. Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, Bray-Ward P, Sun Z, Zong Q, Du Y, Du J, et al. (2002). Comprehensive human genome amplification using multiple displacement amplification. Proc Natl Acad Sci U S A 99, 5261–5266. 10.1073/pnas.082089499.
  • 22. Zong C, Lu S, Chapman AR, and Xie XS (2012). Genome-Wide Detection of Single-Nucleotide and Copy-Number Variations of a Single Human Cell. Science 338, 1622–1626. 10.1126/science.1229164.
  • 23. Ruan J, Wang K, Shen X, Lu X, and Wu CI (2016). DNA amplification method, CN201410448896.XA.
  • 24. Chen C, Xing D, Tan L, Li H, Zhou G, Huang L, and Xie XS (2017). Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science 356, 189–194. 10.1126/science.aak9787.
  • 25. Andor N, Lau BT, Catalanotti C, Kumar V, Sathe A, Belhocine K, Wheeler TD, Price AD, Song M, Džakula Ž, et al. (2020). Joint single cell DNA-Seq and RNA-Seq of cancer reveals subclonal signatures of genomic instability and gene expression. bioRxiv 445932. 10.1101/445932.
  • 26. Minussi DC, Nicholson MD, Ye H, Davis A, Wang K, Baker T, Tarabichi M, Sei E, Du H, Rabbani M, et al. (2021). Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308. 10.1038/s41586-021-03357-x.
  • 27. Laks E, McPherson A, Zahn H, Lai D, Steif A, Brimhall J, Biele J, Wang B, Masud T, Ting J, et al. (2019). Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing. Cell 179, 1207–1221.e22. 10.1016/j.cell.2019.10.026.
  • 28. Vitak SA, Torkenczy KA, Rosenkrantz JL, Fields AJ, Christiansen L, Wong MH, Carbone L, Steemers FJ, and Adey A. (2017). Sequencing thousands of single-cell genomes with combinatorial indexing. Nat Methods 14, 302–308. 10.1038/nmeth.4154.
  • 29. Yin Y, Jiang Y, Lam K-WG, Berletch JB, Disteche CM, Noble WS, Steemers FJ, Camerini-Otero RD, Adey AC, and Shendure J. (2019). High-Throughput Single-Cell Sequencing with Linear Amplification. Mol Cell 76, 676–690.e10. 10.1016/j.molcel.2019.08.002.
  • 30. Feldman M.Ya. (1973). Reactions of Nucleic Acids and Nucleoproteins with Formaldehyde. Translated by A. L. Pumpiansky, Moscow. In Progress in Nucleic Acid Research and Molecular Biology, Davidson JN and Cohn WE, eds. (Academic Press), pp. 1–49. 10.1016/S0079-6603(08)60099-9.
  • 31. Hoffman EA, Frey BL, Smith LM, and Auble DT (2015). Formaldehyde Crosslinking: A Tool for the Study of Chromatin Complexes. Journal of Biological Chemistry 290, 26404–26411. 10.1074/jbc.R115.651679.
  • 32. Martelotto LG, Baslan T, Kendall J, Geyer FC, Burke KA, Spraggon L, Piscuoglio S, Chadalavada K, Nanjangud G, Ng CKY, et al. (2017). Whole-genome single-cell copy number profiling from formalin-fixed paraffin-embedded samples. Nature Medicine 23, 376–385. 10.1038/nm.4279.
  • 33. Goldstein LD, Chen Y-JJ, Dunne J, Mir A, Hubschle H, Guillory J, Yuan W, Zhang J, Stinson J, Jaiswal B, et al. (2017). Massively parallel nanowell-based single-cell gene expression profiling. BMC Genomics 18, 519. 10.1186/s12864-017-3893-1.
  • 34. Martelotto LG, Baslan T, Kendall J, Geyer FC, Burke KA, Spraggon L, Piscuoglio S, Chadalavada K, Nanjangud G, Ng CKY, et al. (2017). Whole-genome single-cell copy number profiling from formalin-fixed paraffin-embedded samples. Nat Med 23, 376–385. 10.1038/nm.4279.
  • 35. Zahn H, Steif A, Laks E, Eirew P, VanInsberghe M, Shah SP, Aparicio S, and Hansen CL (2017). Scalable whole-genome single-cell library preparation without preamplification. Nat Methods 14, 167–173. 10.1038/nmeth.4140.
  • 36. Gao R, Davis A, McDonald TO, Sei E, Shi X, Wang Y, Tsai P-C, Casasent A, Waters J, Zhang H, et al. (2016). Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nature Genetics 48, 1119–1130. 10.1038/ng.3641.
  • 37. Creighton CJ, Kent Osborne C, van de Vijver MJ, Foekens JA, Klijn JG, Horlings HM, Nuyten D, Wang Y, Zhang Y, Chamness GC, et al. (2009). Molecular profiles of progesterone receptor loss in human breast tumors. Breast Cancer Res Treat 114, 287–299. 10.1007/s10549-008-0017-2.
  • 38. Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70. 10.1038/nature11412.
  • 39. Curtis C, Shah SP, Chin S-F, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352. 10.1038/nature10983.
  • 40. Kaufmann TL, Petkovic M, Watkins TBK, Colliver EC, Laskina S, Thapa N, Minussi DC, Navin N, Swanton C, Van Loo P, et al. (2022). MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biology 23, 241. 10.1186/s13059-022-02794-9.
  • 41. Liao S, Desouki MM, Gaile DP, Shepherd L, Nowak NJ, Conroy J, Barry WT, and Geradts J. (2012). Differential copy number aberrations in novel candidate genes associated with progression from in situ to invasive ductal carcinoma of the breast. Genes Chromosomes Cancer 51, 1067–1078. 10.1002/gcc.21991.
  • 42. Johnson CE, Gorringe KL, Thompson ER, Opeskin K, Boyle SE, Wang Y, Hill P, Mann GB, and Campbell IG (2012). Identification of copy number alterations associated with the progression of DCIS to invasive ductal carcinoma. Breast Cancer Res Treat 133, 889–898. 10.1007/s10549-011-1835-1.
  • 43. Hwang ES, DeVries S, Chew KL, Moore DH, Kerlikowske K, Thor A, Ljung B-M, and Waldman FM (2004). Patterns of chromosomal alterations in breast ductal carcinoma in situ. Clin Cancer Res 10, 5160–5167. 10.1158/1078-0432.CCR-04-0165.
  • 44. Zhao T, Chiang ZD, Morriss JW, LaFave LM, Murray EM, Del Priore I, Meli K, Lareau CA, Nadaf NM, Li J, et al. (2022). Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91. 10.1038/s41586-021-04217-4.
  • 45. Kokkat TJ, Patel MS, McGarvey D, LiVolsi VA, and Baloch ZW (2013). Archived formalin-fixed paraffin-embedded (FFPE) blocks: A valuable underexploited resource for extraction of DNA, RNA, and protein. Biopreserv Biobank 11, 101–106. 10.1089/bio.2012.0052.
  • 46. Navin NE (2014). Cancer genomics: one cell at a time. Genome Biol 15, 452. 10.1186/s13059-014-0452-9.
  • 47. Ren X, Kang B, and Zhang Z. (2018). Understanding tumor ecosystems by single-cell sequencing: promises and limitations. Genome Biol 19, 211. 10.1186/s13059-018-1593-z.
  • 48. Wang Y, and Navin NE (2015). Advances and applications of single-cell sequencing technologies. Mol Cell 58, 598–609. 10.1016/j.molcel.2015.05.005.
  • 49. Cai X, Evrony GD, Lehmann HS, Elhosary PC, Mehta BK, Poduri A, and Walsh CA (2015). Single-cell, genome-wide sequencing identifies clonal somatic copy-number variation in the human brain. Cell Rep 10, 645. 10.1016/j.celrep.2015.01.028.
  • 50. Wang J, Fan HC, Behr B, and Quake SR (2012). Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150, 402–412. 10.1016/j.cell.2012.06.030.
  • 51. Gawad C, Koh W, and Quake SR (2016). Single-cell genome sequencing: current state of the science. Nat Rev Genet 17, 175–188. 10.1038/nrg.2015.16.
  • 52. Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359. 10.1038/nmeth.1923.
  • 53. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, and Prins P. (2015). Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034. 10.1093/bioinformatics/btv098.
  • 54. Minussi DC, Sei E, Wang J, Schalck A, Yan Y, Davis A, Wu H-J, Bai S, Peng C, Hu M, et al. (2022). Resolving clonal substructure from single cell genomic data using CopyKit. bioRxiv 2022.03.09.483497. 10.1101/2022.03.09.483497.
  • 55. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. 10.1093/bioinformatics/btq033.
  • 56. Benaglia T, Chauveau D, Hunter DR, and Young DS (2010). mixtools: An R Package for Analyzing Mixture Models. Journal of Statistical Software 32, 1–29. 10.18637/jss.v032.i06.
  • 57. Gu Z, Eils R, and Schlesner M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849. 10.1093/bioinformatics/btw313.
  • 58. Yu G. (2020). Using ggtree to Visualize Data on Tree-Like Structures. Curr Protoc Bioinformatics 69, e96. 10.1002/cpbi.96.
  • 59. Leung ML, Wang Y, Kim C, Gao R, Jiang J, Sei E, and Navin NE (2016). Highly multiplexed targeted DNA sequencing from single nuclei. Nature Protocols 11, 214–235. 10.1038/nprot.2016.005.
  • 60. Elshof LE, Schaapveld M, Schmidt MK, Rutgers EJ, van Leeuwen FE, and Wesseling J. (2016). Subsequent risk of ipsilateral and contralateral invasive breast cancer after treatment for ductal carcinoma in situ: incidence and the effect of radiotherapy in a population-based cohort of 10,090 women. Breast Cancer Res Treat 159, 553–563. 10.1007/s10549-016-3973-y.
  • 61. Olshen AB, Venkatraman ES, Lucito R, and Wigler M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572. 10.1093/biostatistics/kxh008.
  • 62. McInnes L, Healy J, and Melville J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat].
  • 63. Hahsler M, Piekenbrock M, and Doran D. (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software 91, 1–30. 10.18637/jss.v091.i01.

Data Availability Statement

  • All sequencing data generated in this study have been deposited in the Sequence Read Archive (SRA): PRJNA799605.

  • The code used in this paper is available at https://github.com/navinlabcode/Arc-well.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
