NPJ Precision Oncology. 2024 Oct 3;8:222. doi: 10.1038/s41698-024-00723-6

Serial single-cell RNA sequencing unveils drug resistance and metastatic traits in stage IV breast cancer

Kazutaka Otsuji 1, Yoko Takahashi 1,2, Tomo Osako 3, Takayuki Kobayashi 4, Toshimi Takano 4, Sumito Saeki 5, Liying Yang 5, Satoko Baba 3,6, Kohei Kumegawa 1, Hiromu Suzuki 7, Tetsuo Noda 8, Kengo Takeuchi 3,6, Shinji Ohno 1, Takayuki Ueno 1,2, Reo Maruyama 1,5
PMCID: PMC11450160  PMID: 39363009

Abstract

Metastasis is a complex process that remains poorly understood at the molecular level. We profiled single-cell transcriptomic, genomic, and epigenomic changes associated with cancer cell progression, chemotherapy resistance, and metastasis in a patient with Stage IV breast cancer. Pretreatment and posttreatment specimens from the primary tumor and distant metastases were collected for single-cell RNA sequencing and subsequent cell clustering, copy number variation (CNV) estimation, transcription factor estimation, and pseudotime analyses. CNV analysis revealed that a small population of pretreatment cancer cells resisted chemotherapy and expanded. New clones, including metastatic precursor cells (MPCs), emerged in the posttreatment primary tumors with CNV profiles similar to those of metastatic cells. MPCs exhibited expression profiles indicative of epithelial–mesenchymal transition. Comparison of MPCs with metastatic cancer cells also revealed dynamic changes in transcription factors and calcitonin pathway gene expression. These findings demonstrate the utility of single-patient clinical sample analysis for understanding tumor drug resistance, regrowth, and metastasis.

Subject terms: Breast cancer, Metastasis

Introduction

Early-stage breast cancer is generally associated with a better prognosis than other cancers, whereas recurrence and metastasis significantly worsen outcomes regardless of subtype1–3. Intratumoral heterogeneity (ITH) is a key factor in drug resistance, cancer recurrence, and metastasis. Deciphering ITH is not only essential for improving cancer diagnostics and treatments but also holds the potential to improve patients’ prognoses4,5. Traditional bulk analysis has struggled to capture the heterogeneous characteristics of cancer; however, the advent of single-cell analysis has dramatically deepened our understanding of ITH6,7. Some single-cell studies of breast cancer, for instance, revealed subtype-specific subpopulations that may play a crucial role in treatment resistance and cancer progression8,9. These advances could pave the way for more effective and targeted cancer treatments in the future.

One of the crucial causes of ITH is the accumulation of different genomic abnormalities in individual cells. This genetic diversity arises from genetic and epigenetic alterations during tumor progression, creating multiple subclonal populations. These subclones can differ in their responses to therapy, invasive potential, and metastasizing ability. This evolutionary process is known as clonal evolution10,11. Related to, but distinct from, clonal evolution, cancer cell plasticity also contributes significantly to ITH. It represents a transient change within cancer cells, allowing them to adapt to their environment and to adopt different phenotypic states, such as epithelial-to-mesenchymal transition (EMT), or acquire stem-like properties12–14. Studies on clonal evolution in cancer have primarily focused on genomic data. Genetic research in breast cancer has explored transitions from ductal carcinoma in situ to invasive cancer or progression from the primary site to lymphatic metastasis in the axilla15–19. However, genomic analysis evaluates clonal patterns and does not fully assess the functional characteristics of those clones or the causes of genetic alterations. Single-cell RNA sequencing (scRNA-seq), in contrast, enables us to interpret the significance of expression changes and to estimate copy numbers at single-cell resolution with efficient tools such as inferCNV20.

Whether analyzing the clonal evolution of cancer or examining its plasticity, it is challenging to clarify these processes with a sample at a single time and location. Longitudinal observations of cancer progression are crucial; however, especially in human clinical samples, collecting samples from the same patient’s multiple lesions over time presents significant challenges. While it is relatively easier to access biopsy samples from primary tumors, approaching metastatic sites in other organs remains difficult. In close collaboration with clinicians, we have managed to undertake temporal and spatial multi-sampling from a patient with Stage IV breast cancer. The primary tumor at diagnosis exhibited a heterogeneous population of clones, one of which was estimated to have resisted drug treatment and subsequently proliferated. Furthermore, a specific clone that emerged in the primary tumor following treatment resembled clones at distant metastatic sites. These unique clones also expressed cancer stem cell markers and genes related to epithelial–mesenchymal transition (EMT), as well as altered activation of specific transcription factors (TFs). Collectively, these findings reveal the molecular evolution underpinning breast cancer progression, drug resistance, and metastasis, and may offer clues for the development of more effective therapeutic strategies.

Results

scRNA-seq analysis of the four merged samples revealed significant changes in cancer characteristics after initial chemotherapy

The clinical specimens analyzed in this study were collected during the lactation period from a 40-year-old patient with de novo stage IV breast cancer. The first sample, referred to as “Pre,” was obtained from biopsy specimens of the breast tumor collected for diagnostic purposes at the initial consultation. The second and third samples, named “Post1” and “Post2,” were specimens collected during surgical resection for the local control of rapidly growing breast cancer despite ongoing drug therapy. The fourth sample, “Meta,” was obtained from surgical specimens of a newly developed metastatic lesion in the peritoneum, 6 months after the surgery of the primary tumor. We conducted scRNA-seq on these four samples using the 10x Chromium platform, performing analyses individually and in an integrated manner (Fig. 1A). Detailed clinical outcomes are presented in Supplementary Fig. 1 and Methods section, and the pathological diagnostic results for each sample are listed in Supplementary Table 1.

Fig. 1. Experimental scheme, and changes in breast cancer cell gene expression induced by chemotherapy.

Fig. 1

A Schema of sample naming, processing, and analysis. (Created by BioRender) B, C UMAP plots for merged cells after integration, color-coded by cluster cell type (B) and sample (C). D Volcano plot comparing differential gene expression in cancer cells before and after chemotherapy. E Pathway enrichment analysis of these differentially expressed genes (D). F Clustergram of positively enriched terms in posttreatment cancer cells (left) and heatmap displaying the average expression of genes related to enriched terms in cancer cells of each sample.

Gene expression profiles of the four clinical samples were obtained from scRNA-seq using Seurat21 and Uniform Manifold Approximation and Projection (UMAP) plots were generated for visualization (Supplementary Fig. 2A). Subsequently, a total of 16,368 cells from the four samples with sufficient quality reads were first simply merged and plotted on UMAP (Supplementary Fig. 3A). We then performed data integration using an R package STACAS22,23 based on the library production batches for each sample (Fig. 1B, C, Supplementary Fig. 3B). To clarify the distinction between clusters before and after data integration, we designated clusters before integration as “All-Unintegrated-C” followed by a number and those after integration as “All-Integrated-C” with a subsequent number. Cell-type annotation confirmed that UMAP clustering effectively depicted cell type variation within the tumors (Fig. 1B, Supplementary Figs. 2A, B, 3A, C). Focusing on the cancer cells, the integrated clusters formed two distinct groups, one comprising mainly clusters All-Integrated-C3 and -C14 from Pre and another including clusters All-Integrated-C1, -C2, -C5, -C6, -C9, and -C10 from the posttreatment specimens (Post1, Post2, and Meta) (Fig. 1B, C, Supplementary Fig. 3B). The separation of pre- and posttreatment samples in the plot indicates that the initial drug treatments markedly altered cancer cell expression characteristics. For instance, when extracting only cancer cells, posttreatment samples exhibited significantly decreased expression levels of the marker genes EPCAM and TACSTD2 as well as significant upregulation of VIM (an EMT marker) and CD44 (a cancer stemness marker). Reduced expression of TACSTD2 in the posttreatment samples (Fig. 1D, Supplementary Figs. 3D, E) might be clinically significant, as this gene encodes TROP2, the target protein of the therapeutic agent datopotamab-deruxtecan (Dato-DXd)24, used as the third-line drug treatment in this case (Supplementary Fig. 1). 
However, no report has directly demonstrated an association between TACSTD2 gene expression and Dato-DXd therapeutic efficacy. Expression levels of ESR1 and FOXA1 were markedly decreased after treatment, although no statistically significant difference from pretreatment was observed, likely because of the initially low expression levels (Supplementary Fig. 3D, E). This observation appears to be closely related to immunohistopathology showing loss of estrogen and progesterone receptor expression in resected samples after drug treatment compared with pretreatment biopsy samples (Supplementary Table 1).

Differential gene expression analysis also revealed that CALCA, SNAI2, and BMP5 were upregulated, whereas KLK6, KLK7, LTF, and TACSTD2 were downregulated in cancer cells from posttreatment samples (Fig. 1D, Supplementary Data 1). Pathway enrichment analysis of these genes revealed increased expression levels of genes related to HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION and decreased expression levels of genes related to HALLMARK_ESTROGEN_RESPONSE_EARLY in posttreatment samples compared with those in the pretreatment specimen (Fig. 1E), in accordance with a previous study reporting an association between these hallmark processes and treatment efficacy25. Close inspection revealed some overlap between the terms enriched positively and negatively in the posttreatment cancer cells (Fig. 1E). Despite this overlap, the genes contributing to each term differed before and after treatment (Fig. 1F and Supplementary Fig. 3F).
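As a rough illustration of how enrichment of a hallmark gene set among DEGs can be quantified, the sketch below computes a one-sided hypergeometric (over-representation) P value. This is an assumption made for illustration: this section does not state which enrichment statistic was used, and the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_set, n_deg, n_overlap):
    """One-sided P(X >= n_overlap) when n_deg genes are drawn without
    replacement from n_background genes, n_in_set of which belong to the set."""
    max_overlap = min(n_in_set, n_deg)
    total = comb(n_background, n_deg)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_deg - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, a 200-gene hallmark set,
# 300 DEGs, 15 of which fall in the set.
p_value = hypergeom_enrichment_p(20000, 200, 300, 15)
```

A small P value here only says that the overlap is larger than expected by chance; it does not indicate the direction of regulation, which is why up- and downregulated DEGs are usually tested separately.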

Copy number variation analysis revealed a small population of cancer cells within the primary tumor that exhibited drug treatment resistance

To investigate clonal heterogeneity in our samples, we estimated copy number variations (CNVs) in each sample using the R package inferCNV20 (Supplementary Figs. 4A–D). The CNV patterns inferred from the scRNA-seq data were closely aligned with the results from CNV profiling analysis performed using genomic DNA sequencing data from the same samples (Supplementary Figs. 4E, F). The cancer clusters C1, C4, C6, C9, and C14 of Pre (labeled Pre-C1, -C4, -C6, -C9, and -C14, respectively, in Fig. 2A and Supplementary Fig. 2A) demonstrated some shared CNVs, such as the overall loss of chromosome 6 and gain of the chromosome 8 long arm, but there were differences in chromosomes 3, 5 and 9 (Fig. 2B). The tumor subclustering method of inferCNV segregated the pretreatment cancer cell populations into four distinct clonal subgroups (Pre-CNV1–4). Notably, the majority of cells in clone Pre-CNV1 were from UMAP clusters Pre-C0, -C9, and -C14, those in Pre-CNV3 and 4 mainly from Pre-C4, and Pre-CNV2 primarily from Pre-C6 cells (Fig. 2C and Supplementary Fig. 5A).
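The idea behind expression-based CNV inference can be sketched in a few lines: expression in a tumor cell is centered on the mean of reference (non-malignant) cells gene by gene, then smoothed along genomic order so that sustained shifts across neighboring genes stand out. The code below is a deliberately simplified illustration and not inferCNV's implementation; the function names, the window size, and the toy values are hypothetical.

```python
def relative_expression(tumor_cell, reference_mean):
    """Subtract the per-gene mean of reference cells from one tumor cell.
    Both inputs are log-scale expression lists ordered by genomic position."""
    return [t - r for t, r in zip(tumor_cell, reference_mean)]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def cnv_signal(tumor_cell, reference_mean, window=3):
    """Smoothed relative expression: sustained positive runs suggest a gain,
    sustained negative runs suggest a loss."""
    return moving_average(relative_expression(tumor_cell, reference_mean), window)

# Toy chromosome of six genes: the first half is elevated, the second reduced.
signal = cnv_signal([2, 2, 2, 0, 0, 0], [1, 1, 1, 1, 1, 1])
```

Smoothing over many adjacent genes is what makes the estimate usable: single-gene expression is dominated by regulation and noise, whereas a copy number change shifts a whole genomic segment in the same direction.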

Fig. 2. Genomic, transcriptomic, and pathological characteristics in cancer cells of Pre sample.

Fig. 2

A UMAP plot of cancer cell clustering of Pre (numbering of Seurat-identified Pre clusters is based on the UMAP clustering in Supplementary Fig. 2A). B Heatmap of cancer cell CNV signals by inferCNV in Pre. Horizontal lines divide subclones defined by inferCNV. A subset of cells, mostly from Pre-C9, is highlighted with a red square. C UMAP plot of cancer cells in Pre, color-coded according to CNV clone. D Comparison of CNV patterns in cancer cells from the primary site before and after drug therapy. UMAP plot (left) and heatmap of CNV signals (middle) after extracting and reclustering only cells from Pre-C9 in A. Cells with loss on the long arm of chromosome 11 correspond mainly to cells in newly formed cluster C2 and show CNV patterns (blue bar) similar to most surgical (Post1 and Post2) specimens (right). E UMAP plot of cancer cells of Pre, with the Pre-C9S and -C9R information overlaid on the plot in A. F UMAP of integrated cancer cells displaying Pre clusters, including -C9S and -C9R. G Violin plot and box plot of gene signature scores in each Pre cluster. Box plots represent the following: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. EMT, epithelial–mesenchymal transition; ER-Early, estrogen response early. H Hematoxylin and eosin staining and immunohistochemical analysis of clinical biopsy specimens of Pre. Scale bar: 250 µm. I Fluorescence in situ hybridization analysis of clinical biopsy specimens from Pre. Image shows the loss of 11q22.1 (spectrum red signals) in some cancer cells.

While most cancer cells in Pre exhibited copy number gain in the long arm of chromosome 11 (Fig. 2B and Supplementary Fig. 4A), almost all posttreatment specimens demonstrated copy number loss within this genomic region (Supplementary Fig. 4B–D). However, upon closer examination, a subset of cells within Pre-C9 also exhibited copy number loss in the long arm of chromosome 11 (Fig. 2B). To further analyze the clinical significance of this pretreatment change in copy number profile, we extracted Pre-C9 cells and repeated the clustering and inferCNV. Again, only a subset of the cancer cells in Pre-C9 exhibited loss on the long arm of chromosome 11, and the entire CNV pattern in this small subset was very similar to that of Post2 and a subpopulation of Post1 (Fig. 2D), indicating that a subset of cells in Pre-C9 survived drug treatment and expanded further. We named this treatment-resistant cluster Pre-C9R and categorized the remaining treatment-sensitive cells as Pre-C9S; this information was overlaid on the UMAP plot in Fig. 2A (Fig. 2E). Using CNV scores with single-cell resolution obtained through inferCNV, we calculated the differences in gene-specific scores between Pre-C9R and other clusters. We then assessed how genes with significant CNV score differences corresponded to genes specifically expressed in Pre-C9R compared to other clusters. The results revealed 1,153 genes with a mean difference (MeanDiff) > 0 and significant score differences (−log10 False Discovery Rate (FDR) > 2.0, i.e., FDR < 0.01); of these, 47 genes met the criteria for differentially expressed genes (DEGs) with an adjusted P value < 0.05 and log2 Fold Change (FC) > 1.0. Conversely, among the 814 genes with a MeanDiff < 0 and significant score differences (−log10FDR > 2.0), 29 genes were identified as DEGs that met the criteria of an adjusted P value < 0.05 and log2FC < −1.0 (Supplementary Data 2). 
After merging Pre and Post1, we reperformed inferCNV analysis with annotations based on each sample’s original clusters (Seurat-identified Pre and Post1 clusters), identifying four new clones: Pre/Post1-CNV1–4. Unlike other Pre clusters, most cells in Pre-C9R were classified into the same clone Pre/Post1-CNV3 as cells in some Post1 clusters (Supplementary Fig. 5B and C). Notably, Post1-C6 cancer cells did not align with the cancer cell clones of Pre. When displaying Pre clusters on the UMAP of integrated cancer cells in Fig. 1B and C using the R package SCpubr26, only cells in Pre-C9R were distributed to All-Integrated-C1 and -C2, which mainly comprised cancer cells from Post1 and Post2 (Fig. 2F and Supplementary Fig. 5D). These findings indicate that Pre-C9R cells, genomically similar to posttreatment cancer cells, also closely resemble Post1 and Post2 in terms of gene expression. Overall, cells in the Pre-C9R cluster can be classified as drug-tolerant persister cells (DTPs)27–30 in this breast cancer case.
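The cross-referencing of gene-level CNV score differences with DEGs described for Pre-C9R can be sketched as a simple per-gene filter. The thresholds below are the ones stated in the text (−log10 FDR > 2.0, adjusted P value < 0.05, |log2FC| > 1.0); the function name, the output labels, and the example gene records are hypothetical and serve only to illustrate the logic.

```python
from math import log10

def classify_gene(mean_diff, cnv_fdr, adj_p, log2fc):
    """Return 'gain_up', 'loss_down', or None for one gene.

    mean_diff : difference in CNV score between the cluster of interest and the rest
    cnv_fdr   : FDR of that CNV score difference
    adj_p     : adjusted P value from differential expression
    log2fc    : log2 fold change from differential expression
    """
    # -log10(FDR) > 2.0 is equivalent to FDR < 0.01; an FDR of 0 counts as significant.
    cnv_significant = cnv_fdr == 0 or -log10(cnv_fdr) > 2.0
    deg_significant = adj_p < 0.05
    if not (cnv_significant and deg_significant):
        return None
    if mean_diff > 0 and log2fc > 1.0:
        return "gain_up"      # higher CNV score and upregulated
    if mean_diff < 0 and log2fc < -1.0:
        return "loss_down"    # lower CNV score and downregulated
    return None

# Hypothetical records: (MeanDiff, CNV FDR, adjusted P, log2FC)
genes = {
    "GENE_A": (0.12, 1e-4, 0.001, 1.8),
    "GENE_B": (-0.09, 1e-3, 0.02, -1.4),
    "GENE_C": (0.10, 0.05, 0.001, 2.0),
}
labels = {name: classify_gene(*values) for name, values in genes.items()}
```

Genes that pass both filters in the same direction are the ones whose expression change is plausibly driven by the underlying copy number change rather than by regulation alone.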

Comparison of marker gene expression among Pre clusters showed that the epithelial marker genes EPCAM and TACSTD2 were expressed at significantly lower levels in Pre-C9R than in the other clusters (Supplementary Fig. 5E, Supplementary Data 3). These findings strongly indicate that the expression of the Dato-DXd therapeutic target gene TACSTD2 was already lost in DTPs before initiation of treatment rather than downregulated during the treatment course. In other words, the primary tumor included a cell population that was potentially less sensitive or resistant to Dato-DXd before treatment. When scoring Pre clusters based on three epithelial marker gene types in normal mammary glands31 (Supplementary Data 4), the expression of genes associated with luminal hormone-sensing (L-Hor) cells and luminal alveolar (L-Alv) cells was significantly lower in Pre-C9R than in the other clusters (Supplementary Fig. 5F), consistent with reduced expression of the luminal markers KRT8 and KRT18 in Pre-C9R (Supplementary Fig. 5E, Supplementary Data 3). However, the basal markers KRT5 and KRT14 were also downregulated in Pre-C9R compared with Pre-C9S (Supplementary Fig. 5E), despite a high basal score in Pre-C9R (Supplementary Fig. 5F), indicating a substantial loss of the epithelial nature of cancer cells in Pre-C9R. Moreover, the EMT markers VIM, COL6A1, and SNAI2, as well as the stem cell marker CD44, were upregulated in the Pre-C9R cluster, strongly indicating that these potentially drug-resistant cells had undergone EMT and acquired stem cell characteristics before treatment (Supplementary Fig. 5E, Supplementary Data 3). Gene signature scoring showed that, in Pre-C9R cancer cells, the HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION score was higher and the HALLMARK_ESTROGEN_RESPONSE_EARLY score was lower than in the other cells in the pretreatment sample (Fig. 2G), patterns very similar to those of cancer cells in posttreatment samples (Fig. 1E, F). 
These findings provide supporting evidence that cells in the Pre-C9R cluster served as the precursors for the subsequently re-emerging primary tumor.
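A per-cell gene signature score of the kind compared across clusters above can be illustrated with a minimal sketch: the mean expression of the signature genes minus the mean expression of all genes in the cell. This is a simplified stand-in and an assumption made for illustration; the scoring method actually used is not specified in this section, the function name is hypothetical, and the expression values are invented (VIM and SNAI2 are taken from the EMT markers named in the text).

```python
def signature_score(cell_expression, signature_genes):
    """Mean expression of the signature genes minus the mean over all genes.

    cell_expression : dict mapping gene name to normalized expression in one cell
    signature_genes : iterable of gene names that define the signature
    """
    present = [g for g in signature_genes if g in cell_expression]
    if not present:
        raise ValueError("no signature genes found in this cell")
    set_mean = sum(cell_expression[g] for g in present) / len(present)
    background_mean = sum(cell_expression.values()) / len(cell_expression)
    return set_mean - background_mean

# Invented expression values for a single cell.
cell = {"VIM": 3.0, "SNAI2": 1.0, "EPCAM": 0.0, "KRT8": 0.0}
emt_score = signature_score(cell, ["VIM", "SNAI2"])
```

Subtracting a background term keeps cells with globally high or low counts from dominating the comparison; real hallmark gene sets contain far more genes than the two used here.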

To validate the findings from scRNA-seq analysis, we performed immunohistochemistry (IHC) staining and fluorescence in situ hybridization (FISH) on clinical biopsy specimens of Pre. IHC staining revealed expression of the EMT markers N-cadherin and vimentin in a small subset of cancer cells from biopsy samples, indicating EMT-like changes in these cells (Fig. 2H). FISH analysis of the biopsy specimens showed two signals of the chromosome 11 centromere (green) and only one signal of 11q22.1 (red) in a few cells, indicating loss of the long arm of chromosome 11 in these cancer cells (Fig. 2I).

Exposure to hypoxia within the primary tumor during drug treatment conferred metastasis capacity to some cancer cells

To examine the molecular mechanisms conferring cancer cell metastatic capacity, we conducted similar CNV and clustering analyses of posttreatment samples. As shown in Fig. 2D, the Post2 sample consisted primarily of uniform clones, whereas Post1 exhibited varying copy number patterns. Notably, some CNV patterns in the Post1 sample closely matched that in the distant metastasis sample Meta (Supplementary Figs. 4B, D).

To elucidate cancer heterogeneity and metastatic recurrence mechanisms, we will now focus primarily on Post1. UMAP clustering classified Post1 cancer cells into four clusters, namely, Post1-C2, -C3, -C4, and -C6 (Fig. 3A). Meanwhile, the inferCNV tumor subclustering method divided the Post1 sample into four major clones, Post1-CNV1–4 (Fig. 3B). Of these, Post1-CNV3 and -CNV4 were almost identical to Post1-C6, whereas Post1-CNV1 and 2 did not overlap with any UMAP clusters (Fig. 3C and Supplementary Fig. 6A). We compared inferCNV-derived gene-specific CNV score differences and transcriptomic DEGs between Post1-C6 and other clusters in Post1. Among 3644 genes with a MeanDiff > 0 and significant CNV score differences (−log10FDR > 2.0), 244 genes met the DEG criteria with an adjusted P value < 0.05 and log2FC > 1.0. Conversely, of the 1347 genes with a MeanDiff < 0 and significant CNV score differences, 296 satisfied the DEG criteria with an adjusted P value < 0.05 and log2FC < −1.0 (Supplementary Data 5). We combined Post1 and Meta samples and performed inferCNV analysis, annotating based on each sample’s original clusters (Seurat-identified Post1 and Meta clusters), resulting in three new clones: Post1/Meta-CNV1–3. In contrast to other Post1 clusters, most cells in Post1-C6 were categorized into Post1/Meta-CNV2, identical to cells in Meta clusters (Supplementary Fig. 6B, C). When Post1 clusters and Post1-CNV clones were displayed alongside Meta cancer cells on the UMAP of integrated cancer cells in Fig. 1B and C using SCpubr, cells from Post1-C6 and Post1-CNV3/4 almost overlapped those from Meta (Fig. 3E and Supplementary Fig. 6D). These genomic and transcriptomic similarities strongly indicate that cancer cells in Post1-C6 and Post1-CNV3/4 acquired metastatic capacity in the primary tumor and ultimately spread to subsequent metastatic lesions in the mesentery. We termed these cells metastatic precursor cells (MPCs).

Fig. 3. Genomic, transcriptomic, and pathological characteristics in cancer cells of Post1 sample.

Fig. 3

A UMAP plot of Post1 cancer cells. The numbering of Seurat-identified Post1 clusters is based on the clustering of Post1 in Supplementary Fig. 2A. B Heatmap of Post1 cancer cell CNV signals by inferCNV. Horizontal lines divide subclones defined by inferCNV. C UMAP plot of Post1 cancer cells color-coded according to CNV clone. D Heatmap of CNV signals of Post1 (left) and Meta (right). Cells with loss on the long arm of chromosomes 5 and 17 correspond mainly to Post1-CNV3 and -CNV4, and show CNV patterns (red bar) similar to those of Meta. E UMAP of integrated cancer cells displaying Seurat-identified Post1 clusters plus Meta (upper) and Post1-CNV clones plus Meta (lower). F Violin plot and box plot of gene signature scores in each Post1-CNV clone. Box plots represent the following: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. EMT, epithelial-mesenchymal transition. G Circos plots visualizing the cell–cell connectivity among Post1-CNV clones. Ligands occupy the lower semicircle, and corresponding receptors the upper semicircle. Subclones are color-coded by edge, ligand, and receptor. H Hematoxylin and eosin staining and immunohistochemical analysis of clinical surgically resected specimens from Post1 at two locations on the same slide: Spot A (upper) and Spot B (lower). Scale bar: 250 µm. I Fluorescence in situ hybridization analysis of clinical surgically resected specimens from Post1 at two locations on the same slide: Spot A (upper) and Spot B (lower). At Spot A, cancer cells showed normal copy numbers with two spectrum red signals (5q22.1) and two spectrum green signals (CEP5). At Spot B, cancer cells exhibited the loss of 5q22.1 (spectrum red signals).

Comparing the expression of several marker genes among Post1 clusters and Post1-CNV clones, we observed that the expression levels of EPCAM and certain keratin-related genes were higher in MPCs than in other cancer cells in Post1, although the differences were not statistically significant. Conversely, some keratin-related genes, such as KRT14 and KRT19, were significantly downregulated in MPCs (Supplementary Fig. 6E, Supplementary Data 6 and 7). When scoring Post1 clusters and Post1-CNV clones based on the three epithelial marker gene types in normal mammary glands, MPCs tended to exhibit higher scores for all three breast epithelial markers compared with other cancer cells in Post1. Within MPCs, Post1-CNV4 showed significantly higher L-Hor and L-Alv scores than Post1-CNV3 (Supplementary Fig. 6F).

DEGs were subsequently identified for each Post1-CNV clone (according to thresholds log2FC > 1.0 and P value < 0.05) (Supplementary Fig. 7A, Supplementary Data 7) and subjected to enrichment analysis. The most prominent enriched pathways in Post1-CNV1 were associated with HALLMARK_OXIDATIVE_PHOSPHORYLATION, whereas the predominant pathways in Post1-CNV2 were related to HALLMARK_HYPOXIA, and those in Post1-CNV3 and -CNV4 to HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION (Supplementary Fig. 7B). Gene signatures related to EMT were expressed at much higher levels in MPCs, and hypoxia-related genes were upregulated in MPCs to a degree equal to or greater than that in Post1-CNV2 (Fig. 3F), indicating that Post1-CNV2, 3, and 4 may have evolved from Post1-CNV1 under the strong influence of hypoxia. The second most enriched term in Post1-CNV3 and the third most enriched term in Post1-CNV4 were related to angiogenesis, and MPCs scored higher in angiogenesis than other cancer cells in Post1 (Supplementary Fig. 7B, C). Meanwhile, the angiogenesis-related genes VEGFA and ADM were among the upregulated DEGs of Post1-CNV2, and their expression was higher in Post1-CNV2 than in MPCs, indicating that Post1-CNV2 may contribute to angiogenesis within the primary tumor, facilitating metastasis indirectly by aiding MPC infiltration into blood vessels.

According to cell-to-cell communication analysis conducted using the R package Connectome32, several cell–cell interactions were identified among Post1-CNV2, 3, and 4 (Fig. 3G; Supplementary Fig. 8A–C), whereas minimal interactions were observed between Post1-CNV1 and each of the other three clones. These findings indicate that Post1-CNV2 and MPCs form an independent cell population within Post1 tumors.

To validate the scRNA-seq analysis results, we performed IHC staining and FISH on surgically resected clinical specimens of Post1 in two distinct areas (Spots A and B) on the same slide. The hypoxia markers CA IX and Glut-1 showed diffusely high expression in Spots A and B. However, the levels of EMT-associated proteins and the FISH signals on the long arm of chromosome 5 differed markedly between the two areas. In Spot A, cancer cells showed high E-cadherin expression but minimal N-cadherin and vimentin expression. FISH analysis revealed two pairs of signals at 5q22.1 (colored red) in most cancer cells in this area. Conversely, the cancer cells in Spot B showed low E-cadherin expression levels and higher N-cadherin and vimentin levels. Additionally, FISH analysis indicated the loss of one of the 5q22.1 signals (red) in many cells. Combining the results from single-cell analysis with the pathology validation indicated that Spot A primarily comprised cancer cells of Post1-CNV1 and -CNV2, whereas MPCs were found in Spot B (Fig. 3H, I).

The activities of transcription factors, including ATF3 and SNAI2, appeared altered in MPCs

To examine the molecular mechanisms contributing to metastatic capacity, we integrated the single-cell data of Post1 and Meta and selectively isolated cancer cells from them (Fig. 4A, B, Supplementary Fig. 9A). The new cluster Post1/Meta-Integrated-C3 consisted mainly of cells in the G2M and S phases of the cell cycle (Supplementary Fig. 9B). Post1 MPCs and Meta-derived cells were plotted in nearly identical positions on the UMAP (Fig. 4C), although cancer cells in Post1/Meta-Integrated-C9 were mostly derived from Post1-C6, while approximately two-thirds of cells in Post1/Meta-Integrated-C4 were Meta-derived (Fig. 4A, C and Supplementary Fig. 9C).

Fig. 4. Transcription factors contributing to MPC transition and metastasis.

Fig. 4

A, B UMAP plots of integrated Post1 and Meta with only cancer cells. Cells are color-coded according to cluster number (A), and sample (B). C UMAP of integrated Post1- and Meta-cancer cells displaying Seurat-identified Post1 clusters plus Meta (upper) and Post1-CNV clones plus Meta (lower). D Heatmap of the TF activity alteration scores inferred from gene expression. TFs were vertically aligned by hierarchical clustering. Horizontal bars at the top of the heatmap depict cluster numbers in A and Post1-CNV clones plus Meta. E TFs whose activity was highly altered in Post1/Meta-Integrated-C4 and -C9 are featured on the UMAP plot.

To explore the gene expression regulation mechanisms in MPCs, we calculated TF activity alteration scores for integrated cancer cell data from Post1 and Meta using the R package BITFAM33. Based on a heatmap of TF target scores for each cancer cell grouped by Post1/Meta-Integrated cancer clusters (Fig. 4D), several TFs showed altered activation in the cell subgroups suspected to be involved in metastasis (Post1/Meta-Integrated-C4 and -C9 in Fig. 4A). Among the cluster-specific TF activities inferred through the BITFAM algorithm, the activities of ATF3, SNAI2, KLF4, and EGR1 were highly altered in Post1/Meta-Integrated-C9, while those of SPI1 and SPIB were altered mainly in Post1/Meta-Integrated-C4 (Fig. 4D, E). These variations in TF activity indicate dynamic changes in the gene expression regulation landscape during cancer progression, from the formation of MPCs in the primary tumor to circulation in the bloodstream and eventual infiltration and growth at distant sites.

Cancer cells with metastatic potential exhibited upregulated expression of calcitonin gene-related peptide receptors

Finally, we investigated the impact of calcitonin-related gene upregulation on metastasis, as the expression levels of these genes, including CALCA, were elevated in posttreatment cancer cells (Fig. 1D) but have not been examined during breast cancer progression. These genes and associated signaling pathways are shown in Fig. 5A. Among the cancer clusters obtained from Post1 and Meta integration (Fig. 4A), the expression of CALCA and CALCB was markedly elevated in Post1/Meta-Integrated-C2 and -C3, ADM was highly expressed in -C3 and -C4, and CALCRL expression was high specifically in -C3 (Fig. 5B). The expression levels of RAMP1/2/3 and ACKR3 were generally low (Fig. 5B), and CALCR and MRGPRX2 expression was not detected in any sample (data not shown).

Fig. 5. Potential contributions of calcitonin signaling to progression from MPC to metastatic cell.

Fig. 5

A Genes, products, and their receptors for the human calcitonin family. CALCA, located on chromosome 11p15, encodes two hormones, calcitonin (CT) and calcitonin gene-related peptide (CGRP), produced through alternative mRNA processing54. CT binds to the calcitonin receptor (encoded by CALCR), while CGRP binds to the calcitonin receptor-like receptor (CLR, encoded by CALCRL). ADM encodes adrenomedullin (AM), which shares structural similarities with CGRP and also binds to the CLR. The specific ligand to which CLR binds is regulated by complex formation with receptor activity-modifying proteins. If RAMP1 forms a complex with CLR, CGRP is the ligand; if RAMP2 or RAMP3 forms the complex, AM is the ligand64. In addition to AM, ADM encodes a protein called PAMP-12, which does not interact with the CLR but binds to MRGPRX2 and ACKR3 (ref. 65). (Created by BioRender) B Violin plots visualizing the expression levels of calcitonin-related genes in integrated cancer clusters of Post1 and Meta (Fig. 4A). C, D Pseudotime analysis of cells in integrated cancer clusters of Post1 and Meta (Fig. 4A). Only cancer cells in G1 cell-cycle phase are included in this analysis. C Developmental trajectories (color-coded by Post1-CNV clones plus Meta). D Pseudotime values. E Expression level changes of ADM, CALCA, CALCB, RAMP1, and RAMP2 in cancer cells of Post1 and Meta (only in G1 cell-cycle phase) during pseudotime analysis, plotted on the UMAP. F Pseudo-temporal kinetics of ADM, CALCA, CALCB, RAMP1, and RAMP2 expression.

To investigate the coexpression of calcitonin-related genes, specifically CALCA, ADM, and CALCRL, within individual cells, we performed a correlation analysis of their expression at the single-cell level across samples, Seurat-identified Post1 clusters, and Post1-CNV clones. Because scRNA-seq is susceptible to false-negative results (dropouts) in gene expression owing to limited sequencing depth34,35, which can lead to underestimated correlations, we smoothed the data in advance using the Rmagic package36. The per-cell correlation analysis showed a positive correlation between CALCA and CALCRL expression levels in Post1-CNV3/4 and Meta. In contrast, expression levels of ADM and CALCA, and those of ADM and CALCRL, were negatively correlated in Post1-CNV3/4 and Meta (Supplementary Fig. 10A, B). These findings indicate that while many MPCs and Meta cancer cells coexpressed CALCA and CALCRL, simultaneous expression of ADM with CALCA or ADM with CALCRL was rare.

To examine the trajectory of these gene expression changes from MPCs to Meta cells in greater detail, we performed pseudotime analysis of cancer clusters obtained from Post1 and Meta integration (Fig. 4A) using Monocle 3, an R package37–39. For this analysis, only G1-phase cancer cells were included to remove cell cycle effects (Fig. 5C, D). Pseudo-temporal kinetic analysis revealed that, while ADM expression levels decreased at the end of the trajectory, CALCA, CALCB, and CALCRL expression levels increased almost simultaneously from MPCs to Meta. Furthermore, RAMP1 expression was practically absent in MPCs but elevated in metastases at the end of pseudotime (Fig. 5B, E, and F).

Discussion

Since the widespread adoption of next-generation sequencing, there has been considerable progress in studies on the clonal evolution of breast cancer, mainly through genomic analysis, reaching a resolution down to the single-cell level15,16. Although genomic mutations provide critical insights into cancer’s evolution, it is challenging to fully understand the specific functional changes and variations in cancer phenotypes caused by these mutations. scRNA-seq allows for detailed analysis of expression differences at the cellular level, deepening our understanding of functional abnormalities in cancer cells, including breast cancer40–53. Moreover, tools like inferCNV enable the estimation of genomic changes from expression data, adding depth to our understanding of clonal dynamics. However, integrated analysis using multiple samples necessitates cautious interpretation due to considerable variability in gene expression among patients. It is therefore crucial to compare different lesions within the same patient when studying cancer clonal evolution with scRNA-seq.

In this study, we analyzed multiple samples from a single patient both temporally and spatially, avoiding issues of interpatient variability and providing a detailed overview of the cancer’s functional evolution by comparing conditions before and after drug treatment and between primary and metastatic sites. This approach allowed us to detect particular cell types like DTPs and MPCs, which are often difficult to identify with bulk analysis or when analyzing data integrated across many patients. scRNA-seq and clustering analyses revealed that the primary breast tumor of our patient was highly heterogeneous at the genomic level even before treatment. Among cells with mostly luminal breast epithelial characteristics, it contained a small number of cancer cells with basal-like features that appeared to survive drug treatment and expand, reforming the tumor. This small subpopulation may correspond to previously reported DTP cells27–30 and exhibited gene expression profiles associated with EMT and stemness in the present case. Clinically, the patient initially responded to drug therapy; however, only the primary tumor gradually enlarged. We speculate that the cancer cells responsible for this regrowth were DTPs existing before treatment (designated as Pre-C9R in Fig. 2). We demonstrated that the CNV pattern and gene expression of Pre-C9R were highly similar to those of the posttreatment primary tumor. scRNA-seq analysis revealed that the re-enlarged primary tumor was heterogeneous at both the expression and genomic levels. In the Post1 sample, the inferred CNV pattern was divided into four subclones, two of which (named Post1-CNV3/4 in Fig. 3, nearly synonymous with Post1-C6) almost precisely matched the CNV pattern of the Meta sample. Conversely, the CNV pattern of Post1-C6 did not match any cluster of pretreatment cancer cells.
We corroborated these findings by conducting inferCNV analyses on merged scRNA-seq data from Pre + Post1 and Post1 + Meta, alongside individual sample-specific inferCNV assessments. Additionally, identical results were obtained when inferCNV was applied to a merged dataset of all four samples (data not shown). Consequently, we estimated that in the tumor at Post1, new clones originated from DTP-derived cells, and those that acquired metastatic capability (Post1-CNV3/4 or Post1-C6) migrated to the mesentery around the time of primary lesion surgery (Fig. 6A). However, we cannot definitively prove that the enlarged primary site after treatment metastasized to Meta, considering that multiple mesenteric metastases were already identified in imaging at initial diagnosis and the sampled metastatic sites may have existed at that time. Nevertheless, the clinical course supports our hypothesis. Drug treatment rendered distant metastases nearly undetectable in imaging. The only confirmed recurrence after primary tumor surgery was at the sampled mesenteric metastases, located in different areas from those at initial screenings. Since their surgical removal, no further distant metastases have been found in this patient. Additionally, axillary lymph nodes excised for sampling during primary tumor surgery were pathologically negative for metastasis. It is plausible that all de novo metastases, including those in the axillary lymph nodes, were susceptible to drug treatment and were eliminated.

Fig. 6. Cancer progression estimation in the studied case.


A Putative cancer cell evolutionary pathways in the studied case. B Schematic representation illustrating the generation of subclones within Post1 and their respective roles in cancer progression. Post1-CNV2–4 may have originated from Post1-CNV1 owing to hypoxic conditions within the tumor core. Cancer cells within the Post1-CNV2 clone may have contributed to angiogenesis within the primary tumor by expressing ADM and VEGFA, thereby prompting MPC survival and migration within Post1-CNV3 and -CNV4. C Putative contributions of calcitonin gene-related peptide (CGRP) secretion and expression of calcitonin receptor-like receptor (CLR) and RAMP1 in breast cancer cell evolution. (Created by BioRender).

Following drug treatment, multiple subclones were identified at the primary tumor site, and hypoxia emerged as a factor in our analysis. Notably, the surgical specimen of the primary lesion was large and exhibited extensive internal necrosis, indicating a hypoxic environment within the tumor. Immunostaining of pathological specimens from the primary lesion revealed diffuse expression of the hypoxia markers CA IX and Glut-1. Two out of three subclones, Post1-CNV3 and -CNV4, exhibited CNV profiles similar or identical to a clone isolated from distant metastases, indicating the presence of specific cancer cells in the primary tumor with latent capacity to metastasize, which we named MPCs. The properties of MPCs closely resemble those of cancer cells in metastatic sites in terms of copy number profiles and gene expression levels, including EMT characteristics. In pathological specimens from primary lesion surgical samples, EMT marker expression was highly heterogeneous. The other subclone, Post1-CNV2, although apparently lacking metastatic capacity, strongly expressed genes related to angiogenesis, such as VEGFA and ADM, and thus may have supported metastasis by allowing migration of MPCs into the bloodstream (Fig. 6B). It remains uncertain whether hypoxia promoted the branching of clones. However, our findings indicate that the subclones Post1-CNV2–4 exhibited notably higher hypoxia marker expression levels than the main clone Post1-CNV1. Additionally, observable cell–cell communication occurred among these subclones, whereas minimal communication with Post1-CNV1 was observed. These findings indicate that the newly branched subclones were closely associated with hypoxia and strongly interacted with each other. Furthermore, there were dynamic changes in transcriptional modulator expression during cancer cell progression to metastasis, indicating epigenomic plasticity during this metastatic transition.

In the course of breast cancer evolution, cells began to exhibit high expression levels of calcitonin-related genes, including CALCA, although it is unclear whether this change was induced or emerged by chance mutation. Changes in the expression of calcitonin gene-related peptide (CGRP), an alternative splicing product of the calcitonin gene54, have been reported in various cancers, including breast cancer55. Initially, in our case, posttreatment cancer cells did not express receptors for CGRP; thus, CGRP could not influence cancer cell behavior. However, the MPCs that emerged in the posttreatment primary tumor began to express CALCRL (encoding CGRP’s receptor, calcitonin receptor-like receptor (CLR)44), potentially allowing CGRP signaling and associated phenotypic changes. Nevertheless, expression of RAMP1, which forms a complex with CLR54, was still largely absent in posttreatment primary tumor cells, including MPCs. Surprisingly, as MPCs metastasized to distant sites, RAMP1 was expressed (in addition to further CALCRL upregulation), indicating that these metastatic cancer cells had developed the capacity for autocrine CGRP responses. This sequence of gene expression changes indicates that cancer cells can dynamically adapt to promote survival (Fig. 6C).

In this study, we obtained tumor samples for gene expression and clonal analyses from the same patient because the phenotypic heterogeneity of tumors from different patients can obscure the unique molecular characteristics of small subpopulations relevant to tumor progression56,57. Conversely, when analyzing various samples from the same patient, the background remains consistent, facilitating the detection of important molecular similarities and differences among clones, even when based on single-cell analysis. This study confirms the dramatic changes in tumor clonal structure induced by drug treatment due to differential sensitivity. Furthermore, we found a subpopulation of DTP-like cells with high basal drug resistance, including cells with high metastatic capacity (termed MPCs). Such findings could only be obtained by single-cell analysis at multiple stages of cancer development within the same patient. A distinctive feature of this study is the successful description of cancer progression in a patient through integration of omics analysis with clinical course and pathological data. The findings presented here indicate that effective cancer treatment strategies must account for the changes in cancer cell biology induced by selection pressure and genomic instability.

The major limitation of this study is that we analyzed only a single case; thus, these findings cannot be extrapolated to the broader breast cancer population. Additionally, constraints related to sample collection include obtaining biopsy specimens solely from primary lesions at the patient’s first consultation. Distant metastases were not biopsied by clinicians before treatment, rendering these sites inaccessible for analysis. Consequently, the outlined cancer progression pattern in this patient includes speculative elements. Moreover, single-cell analyses typically offer a limited number of cells for examination, indicating that our findings may not fully represent all cancer cells within this patient. Additionally, there were no analyses of local microenvironmental influences on cancer cell gene expression profiles. Nonetheless, we identified a subpopulation of cells (MPCs) that may be critical for cancer metastasis but difficult to detect by tissue-level analysis of multiple patients. Understanding the characteristics of DTPs and MPCs could lead to the development of treatments for targeted eradication, greatly reducing the risk of metastasis. The reproducibility of our findings must be confirmed by conducting the same single-cell analyses in multiple breast cancer cases with diverse clinical outcomes.

In summary, we compared scRNA-seq results from pretreatment primary breast tumor, posttreatment primary breast tumor, and distant tumors to identify gene expression and cell phenotype changes underlying the development of treatment resistance and metastasis. We identified specialized cell populations with drug resistance at baseline and another with metastatic potential after drug treatment. Identifying cells contributing to poor therapeutic response and the associated mechanism of metastasis may aid in the development of targeted ablation treatments.

Methods

Patient characteristics and sample collection

Pretreatment and posttreatment specimens were obtained from a lactating patient with breast cancer aged 40 years at initial diagnosis. A midwife noticed a mass in the patient's left breast while instructing on lactation. The patient visited the Cancer Institute Hospital to undergo detailed examinations. A subsequent needle biopsy confirmed the tumor to be invasive ductal carcinoma. Imaging tests revealed metastases in the axillary lymph nodes, liver, lung, and peritoneum. The patient initially received paclitaxel and bevacizumab treatment but discontinued after 1.5 months because of tumor progression at the primary site, lymph nodes, and liver, as revealed by CT scan. Subsequently, carboplatin, gemcitabine, and pembrolizumab therapy was initiated, which resulted in significant tumor and metastasis reduction. However, after a few months, the primary tumor gradually regrew, prompting treatment cessation. The patient then participated in a clinical trial for Dato-DXd, an antibody-drug conjugate targeting TROP2. However, the tumor continued to grow after the first dose, reaching 10 cm in diameter, which required surgical resection of the left breast and axillary lymph nodes. After the mastectomy, the patient did not wish to undergo further treatment and opted for follow-up observation only. However, new lesions were subsequently found in the peritoneum, necessitating surgical intervention. In total, four specimens were collected from the patient for scRNA-seq: the pretreatment needle biopsy sample (Pre), a posttreatment surgical specimen near the central necrotic tissue (Post1), a sample containing viable cancer cells collected near the tumor margins (Post2), and a surgical specimen of peritoneal metastasis (Meta). The patient gave written informed consent before specimen collection. The protocol was approved by the institutional ethics committee of Cancer Institute Hospital, Japanese Foundation for Cancer Research (No. 2018–1168) in accordance with the ethical guidelines of the institutional ethics committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Single-cell preparation from clinical specimens

Biopsy specimens for clinical diagnosis and surgically resected tumor tissues were dissociated into single cells using a MACS Tumor Dissociation Kit and a gentleMACS Dissociator (Miltenyi Biotec, North Rhine-Westphalia, Germany) following the manufacturer’s instructions. Aliquoted cells were resuspended in the freeze preservation solution Bambanker (CS-02-001, GCLTEC), and frozen at –80 °C.

10x Genomics Chromium library construction and sequencing

We performed single-plex analysis of four clinical samples acquired, respectively, by needle biopsy of the primary breast tumor prior to treatment (Pre), surgical resection of the primary tumor after third-line drug treatments (Post1 and Post2), and surgical resection of distant metastases (Meta). In the initial batch, library preparations were conducted for samples labeled as Pre (approximately six months post-cryopreservation) and Post2 (approximately one week post-cryopreservation). Subsequently, the library for the Post1 sample (approximately two months post-cryopreservation) was prepared in the second batch. Finally, in the third batch, we prepared the library for the Meta sample (two days post-cryopreservation). Approximately 20,000 to 30,000 indexed cells per sample were loaded in a single microfluidic channel of the 10x Genomics Chromium system for single-cell capture and cDNA preparation according to the single-cell 3’ v3 protocol recommended by the manufacturer (10x Genomics, Pleasanton, CA, USA). The libraries were then sequenced on the Illumina NextSeq 550 platform (Illumina, California, USA) with paired-end reads (read1, 28 bp; index1, 10 bp; index2, 10 bp; read2, 90 bp). Raw data from the 10x Genomics platform were processed by Cell Ranger (v6.1.2) and mapped to the human reference genome GRCh38 (accession GCA_000001405.15).

Quality check and preprocessing of the single-cell RNA-seq data

Quality control, normalization, and unbiased clustering of single-cell transcriptomes were performed using R and the Seurat package (v5.0.2)21, unless otherwise specified. As the first step, low-quality barcodes with a gene count less than 400 (nFeature_RNA < 400), a unique molecular identifier count above 30,000 or 40,000 (nCount_RNA > 30,000 or > 40,000), or a mitochondrial gene fraction greater than 20–25% (percent.mt > 20% for Pre, Post1, and Meta, and > 25% for Post2) were removed. Transcript count matrices from high-quality scRNA-seq data were normalized to the total number of counts per cell and multiplied by a scale factor of 10,000. The normalized values were subsequently natural-log transformed using the Seurat “NormalizeData()” function, and a linear transformation was applied using the Seurat “ScaleData()” function. The impact of cell cycle-related gene expression heterogeneity on subsequent analyses was eliminated by cell-cycle scoring according to canonical marker expression and regressing out these scores using Seurat. The top 2000 variable features were identified using the Seurat “FindVariableFeatures()” function with the “vst” selection method, and principal component analysis was performed using “RunPCA()”. Seurat standard clustering procedures were performed using “FindNeighbors()” and “FindClusters()” with dimensions 1–50 and a resolution of 0.5. scRNA-seq datasets were then projected onto UMAP embedding space using “RunUMAP()” with dimensions 1–50. DEGs in each cluster were identified using the FindAllMarkers() or FindMarkers() function within the Seurat package, with the corresponding P value determined using the Wilcoxon Rank Sum test followed by a Bonferroni correction.
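The filtering and normalization arithmetic described above can be sketched as follows. This is a minimal illustration written in Python, not the authors' R/Seurat code. The function names and the default limits are assumptions chosen for the example; the text reports upper limits that differ between samples. The normalization mirrors the log-normalization described in the text: counts are divided by the cell total, multiplied by 10,000, and transformed as ln(1 + x).

```python
import math

# Hypothetical limits for one sample; the text gives 30,000 or 40,000 UMIs and
# 20% or 25% mitochondrial reads depending on the sample, so these exact
# defaults are assumptions for illustration only.
QC_LIMITS = {"min_genes": 400, "max_umis": 30_000, "max_pct_mt": 20.0}


def passes_qc(n_genes, n_umis, pct_mt, limits=QC_LIMITS):
    """Return True if a barcode survives the three filters described above."""
    return (
        n_genes >= limits["min_genes"]
        and n_umis <= limits["max_umis"]
        and pct_mt <= limits["max_pct_mt"]
    )


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's counts to `scale_factor` total, then apply ln(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }
```

For example, a cell with counts `{"A": 1, "B": 3}` has gene A scaled to 2,500 before the log transform, so its normalized value is ln(2501).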

Cell-type annotation

Following UMAP clustering, we annotated cell types for each cluster. First, we identified marker genes for each cluster, setting |log2(fold change)| > 1 and adjusted P value < 0.05. Additionally, we examined the expression levels of established marker genes for each cluster as follows: breast epithelial markers EPCAM, KRT8, and KRT14; B lymphocyte marker CD79A; T lymphocyte marker CD3D; natural killer lymphocyte markers KLRB1 and KLRD1; myeloid cell markers FCER1G, CD1C, and CLEC10A; fibroblast markers FAP, ACTA2, and MYL9; and endothelial cell markers PECAM1 and THBD. Finally, the SingleR58 (v2.4.1) package was used for cell cluster annotation with the reference dataset HumanPrimaryCellAtlasData49.
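The marker-gene criterion above combines an effect-size cutoff with a multiplicity-corrected P value. A minimal Python sketch of that decision rule follows, assuming a Bonferroni correction as stated in the preprocessing section. The function names and the number of tests used in the example are hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def is_marker(log2_fc, p_value, n_tests, min_abs_log2_fc=1.0, alpha=0.05):
    """Apply the |log2(fold change)| > 1 and adjusted P < 0.05 criteria."""
    return (
        abs(log2_fc) > min_abs_log2_fc
        and bonferroni(p_value, n_tests) < alpha
    )
```

With 20,000 tested genes (an assumed figure), a gene with log2 fold change 1.5 and raw P = 1e-6 has an adjusted P of 0.02 and qualifies, whereas a raw P of 1e-5 gives 0.2 and does not.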

Data integration

Library preparation and sequencing of samples were performed in three batches: Pre and Post2 in the first batch, Post1 in the second batch, and Meta in the third batch. To avoid batch effects after merging multiple sample data, we performed data integration using the R package STACAS (v2.2.2)22,23. Briefly, after merging normalized data using the merge() function in Seurat, we segregated the object by batch information. Anchors were identified using FindAnchors.STACAS (…, dims = 1:15, anchor.features = 1000). The integration order was calculated using SampleTree.STACAS(). Finally, integration was conducted with the anchors using the IntegrateData.STACAS() function. Downstream analysis, graph-based clustering, visualization, and differential gene expression analyses were performed using Seurat. To visualize cluster or CNV-clone information on the UMAP plot of integrated data, we employed the do_DimPlot() function in the SCpubr package (v2.0.0)26.

Breast epithelial cell signature score comparisons

Normal breast epithelial cells differentiate into three cell types: “luminal hormone-sensing (L-Hor) cells,” “luminal-alveolar (L-Alv) cells,” and “basal cells.” We scored the cancer clusters of our samples with breast epithelial-cell signatures using published gene sets generated from human scRNA-seq data31 via the AddModuleScore() function in Seurat. Differences in scores within each cluster were detected using the Wilcoxon Rank Sum test.

Enrichment analysis and gene set scoring

Molecular Signature DataBase Hallmark 2020 (MSigDB Hallmark 2020) pathway enrichment analysis was performed using the enrichR package (v3.2) interface of the Enrichr database59. For the top-ranked hallmarks, gene sets from MSigDB60 were downloaded, and gene signature expression scores were calculated using the Seurat “AddModuleScore()” function to compare expression intensities among clusters and subclones. Differences in scores within each cluster were detected using the Wilcoxon Rank Sum test.
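The idea behind a gene signature score can be illustrated with a simplified Python sketch: the mean expression of the signature genes in a cell minus the mean expression of a set of control genes. This is only an approximation of what Seurat's AddModuleScore() does, since that function selects control genes at random from expression-matched bins; here the control genes are passed in explicitly, and the function and gene names are assumptions for the example.

```python
from statistics import mean


def module_score(cell_expr, signature_genes, control_genes):
    """Mean signature expression minus mean control expression for one cell.

    `cell_expr` maps gene symbols to normalized expression values. Genes
    missing from the cell are skipped.
    """
    signature = [cell_expr[g] for g in signature_genes if g in cell_expr]
    control = [cell_expr[g] for g in control_genes if g in cell_expr]
    if not signature or not control:
        raise ValueError("need at least one signature and one control gene")
    return mean(signature) - mean(control)
```

A positive score indicates that the signature genes are expressed above the control background in that cell, which is what allows scores to be compared among clusters and subclones.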

Copy number variation prediction with inferCNV

Somatic large-scale chromosomal CNVs of each sample and some merged samples were estimated using the R package “inferCNV” (v1.15.3)20. InferCNV is a tool used to deduce CNV from tumor single-cell RNA sequencing data, identifying signs of large-scale chromosomal CNV in somatic cells, such as gains or deletions of whole chromosomes or substantial chromosomal segments. By comparing these data to a reference set of “normal” cells, the variation in gene expression across different regions of the tumor genome can be analyzed to identify areas of chromosomal amplification or deletion. Subsequently, the relative expression intensity across each chromosome is visualized as a heatmap. A raw counts matrix, annotation file, and gene/chromosome position file were prepared according to data requirements (https://github.com/broadinstitute/inferCNV). The detailed settings were as follows: cutoff = 0.1, cluster_by_groups = F, and analysis_mode = “subcluster”. The inferCNV package “random_trees” option was used for partitioning the hierarchical clustering tree into subclusters. Fibroblasts and myeloid cells were selected as reference normal cells.
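The principle of expression-based CNV inference described above can be illustrated with a deliberately simplified Python sketch: expression is centered on the reference ("normal") cells gene by gene, then smoothed along genomic order so that sustained regional shifts stand out from gene-level noise. This is not inferCNV's implementation, which includes further normalization, denoising, and clustering steps; the function names and the window size are assumptions.

```python
def reference_centered(cell_expr, reference_means):
    """Subtract the per-gene mean of reference cells from one cell's values.

    Both lists must be ordered by genomic position along one chromosome.
    """
    if len(cell_expr) != len(reference_means):
        raise ValueError("inputs must cover the same genes")
    return [x - r for x, r in zip(cell_expr, reference_means)]


def moving_average(values, window=3):
    """Smooth values with a centered window, truncated at chromosome ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A run of consistently positive smoothed values along a chromosome arm would suggest a gain relative to the reference cells, and a run of negative values a loss.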

Cell-to-cell communication analysis

To analyze and visualize ligand-receptor interactions among CNV clones in Post1, we used the R package Connectome (v1.0.0)32. Briefly, we applied the CreateConnectome() function with a min.cells.per.ident cutoff of 75, followed by data filtering using the FilterConnectome() function to include only edges with a ligand and receptor z-score exceeding 0.25. This filtration narrowed the 38,752 edges down to 260. Centrality analysis was conducted across all signaling families by setting the parameters as weight.attribute = “weight_sc” and group.by = “mode”. The CircosPlot() and NetworkPlot() functions were used to visualize the interactions. In making circos plots, the top 5 signaling vectors for each cell–cell vector were selected.
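The edge filtering and top-5 selection described above amount to two simple operations on a list of ligand-receptor edges. The Python sketch below illustrates them under assumed field names (`source`, `target`, `ligand_z`, `receptor_z`, `weight`); these names are hypothetical and do not reproduce Connectome's column names or its API.

```python
from collections import defaultdict


def filter_edges(edges, min_z=0.25):
    """Keep edges whose ligand AND receptor z-scores both exceed `min_z`."""
    return [
        e for e in edges
        if e["ligand_z"] > min_z and e["receptor_z"] > min_z
    ]


def top_edges_per_vector(edges, n=5):
    """Return the `n` highest-weight edges for each (source, target) pair."""
    by_vector = defaultdict(list)
    for e in edges:
        by_vector[(e["source"], e["target"])].append(e)
    return {
        vector: sorted(group, key=lambda e: e["weight"], reverse=True)[:n]
        for vector, group in by_vector.items()
    }
```

Requiring both z-scores to pass the cutoff keeps only interactions in which the sending clone expresses the ligand and the receiving clone expresses the receptor above background.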

Transcription factor activity estimation

TF activities of posttreatment surgical specimens were estimated from normalized scRNA-seq data using the R package BITFAM (v1.2.0) 33. A heatmap of inferred TF activities was then generated using the ComplexHeatmap package (v2.14.0). Some single-cell TF activity results were merged into the Seurat object using the “AddMetaData()” function, and plotted in UMAP space using the “FeaturePlot()” function.

Correlation analysis for gene expression levels per cell

Before performing correlation analysis of gene expression levels per cell, we used the MAGIC() function of the Rmagic package (v2.0.3)36 with its default parameters to reduce the sparsity of single-cell data. Pearson correlation coefficients were calculated between the expression levels of two selected genes using the cor() function.
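For reference, the Pearson coefficient computed between two genes across cells can be written out explicitly. The Python sketch below is illustrative only (the study used R's cor() on MAGIC-smoothed values); the function name is an assumption, and the inputs are taken to be per-cell expression values of two genes in the same cell order.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The explicit check for constant input matters for sparse single-cell data: a gene with zero counts in every cell of a group has no variance, so its correlation with any other gene is undefined rather than zero.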

Pseudotime analysis

To examine gene expression changes during the transition from MPCs to metastatic cancer cells, we performed trajectory/pseudotime analysis with the Monocle 3 (v1.2.9) algorithm36–39 using only Post1/Meta-Integrated cancer cells in the G1 cell-cycle phase. An inferred branched pseudotime trajectory was constructed using Monocle 3’s “learn_graph()” function. For ordering the cells according to pseudotime, the point considered furthest from Post1/Meta-Integrated-C4 or -C9 was defined as the root node. The “plot_genes_in_pseudotime()” function was used to display ADM, CALCA, CALCRL, RAMP1, and RAMP2 genes in the pseudotime trajectory.

DNA extraction and copy number variation analysis

Total genomic DNA was isolated from dissociated cells using the QIAamp UCP DNA Micro Kit (QIAGEN) following the manufacturer’s instructions. The sequencing library was prepared using Nextera DNA Flex Library Prep Kit (Illumina) and sequenced on a NovaSeq 6000 system (Illumina). To ensure quality control and remove adaptor sequences, we employed TrimGalore (version 0.6.10; https://github.com/FelixKrueger/TrimGalore), a Perl wrapper of Cutadapt and FastQC. Subsequently, reads were mapped to the hg38 genome using BWA MEM61. Copy number analysis was performed using CNVkit62.

Correlation analysis of copy number variation levels

We compared CNV scores obtained from inferCNV with those from genomic DNA for each sample. Since the bulk DNA sequencing results provided by CNVkit were based on regions, genes overlapping these regions were annotated using the hg38 reference genome63 (TxDb.Hsapiens.UCSC.hg38.knownGene) and formatted as log2-transformed coverage at the gene level. Correlation plots were generated by mapping the genes identified in the inferCNV results to those in the CNVkit output.

We also compared CNV scores derived from inferCNV data between two groups: Pre-C9R and other cancer clusters, and Post1-C6 and other cancer clusters. We assessed the differences in scores per cell using the Wilcoxon rank-sum test. The false discovery rate (FDR) was calculated for each gene.
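The text does not state which procedure was used to compute the per-gene FDR. As an illustration only, the Python sketch below implements the Benjamini-Hochberg adjustment, a common choice for this purpose; treating it as the method used here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: the source only says an FDR was calculated per gene; the
    Benjamini-Hochberg procedure is used here as a plausible stand-in.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the raw P values.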

Immunohistochemistry

For IHC, 4-μm-thick formalin-fixed paraffin-embedded (FFPE) sections of clinical samples were stained with antibodies against E-cadherin (NCH-38; 1:200 dilution; DAKO), N-cadherin (6G11; 1:100 dilution; DAKO), Vimentin (V9; 1:4 dilution; DAKO), CA IX (Polyclonal; 1:500 dilution; abcam), and Glut-1 (Polyclonal; 1:100 dilution; IBL).

FISH analysis

To detect the characteristic genetic features of 11q loss in the pretreatment biopsy sample and 5q loss in the posttreatment surgical sample from the primary site, we performed FISH assays using bacterial artificial chromosome (BAC) probes. Probes were designed as follows: targeting ARHGAP42 at 11q22.1 (Texas Red) and centromere 11 (CEP11, FITC) for Pre biopsy specimens; targeting APC at 5q22.1 (Texas Red) and centromere 5 (CEP5, FITC) for Post1 surgical specimens. BAC clone DNA was extracted using PI-80X (Kurabo, Osaka, Japan) and fluorescently labeled using a nick translation kit (Abbott Molecular, Des Plaines, IL, USA). Unstained 4-μm-thick FFPE sections were hybridized with fluorescent DNA probes, and the hybridized slides were stained with 4′,6-diamidino-2-phenylindole (DAPI) before being examined under a BZ-X800 fluorescence microscope (KEYENCE). The names of the BAC clones used are available upon request.

Supplementary information

Supplementary Data 1 (763.5KB, xlsx)
Supplementary Data 2 (1.1MB, xlsx)
Supplementary Data 3 (122.3KB, xlsx)
Supplementary Data 4 (30.7KB, xlsx)
Supplementary Data 5 (1.2MB, xlsx)
Supplementary Data 6 (111KB, xlsx)
Supplementary Data 7 (141.4KB, xlsx)

Acknowledgements

The authors would like to thank Enago for the English language review.

Author contributions

K.O., K.K., H.S., and R.M. performed data analysis. K.O., Y.T., S.S., S.O., and T.U. recruited patients and obtained clinical specimens. K.O. processed tumor samples and L.Y. performed single-cell experiments. T.O., S.B., and T.K. performed pathological analyses. T.K. and T.T. provided clinical information. T.N., S.O., T.U., and R.M. supervised the research. K.O. and R.M. wrote the manuscript. All authors discussed the results and commented on the manuscript.

Data availability

Processed scRNA-seq data have been deposited at GEO (Accession ID: GSE264205).

Code availability

R code for reproducing the results can be found at https://github.com/KazutakaOtsuji/scRNA-StageIV_BC. Data analysis was performed using R version 4.2.2 and Seurat version 5.0.2. Default variables and parameters were used unless otherwise specified in the Methods section.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41698-024-00723-6.

References

  • 1.Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol.41, 4192–4199 (2023). [DOI] [PubMed] [Google Scholar]
  • 2.Kast, K. et al. Impact of breast cancer subtypes and patterns of metastasis on outcome. Breast Cancer Res Treat.150, 621–629 (2015). [DOI] [PubMed] [Google Scholar]
  • 3.Tarighati, E., Keivan, H. & Mahani, H. A review of prognostic and predictive biomarkers in breast cancer. Clin. Exp. Med. 23, 1–16 (2023).
  • 4.Marusyk, A., Janiszewska, M. & Polyak, K. Intratumor heterogeneity: The Rosetta stone of therapy resistance. Cancer Cell 37, 471–484 (2020).
  • 5.Ramon, Y. C. S. et al. Clinical implications of intratumor heterogeneity: challenges and opportunities. J. Mol. Med. 98, 161–177 (2020).
  • 6.Ortega, M. A. et al. Using single-cell multiple omics approaches to resolve tumor heterogeneity. Clin. Transl. Med. 6, 46 (2017).
  • 7.Nath, A. & Bild, A. H. Leveraging single-cell approaches in cancer precision medicine. Trends Cancer 7, 359–372 (2021).
  • 8.Pang, L. et al. Single-cell integrative analysis reveals consensus cancer cell states and clinical relevance in breast cancer. Sci. Data 11, 289 (2024).
  • 9.Inayatullah, M. et al. Basal-epithelial subpopulations underlie and predict chemotherapy resistance in triple-negative breast cancer. EMBO Mol. Med. 16, 823–853 (2024).
  • 10.Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
  • 11.Dang, H. X. et al. The clonal evolution of metastatic colorectal cancer. Sci. Adv. 6, eaay9691 (2020).
  • 12.Biswas, A. & De, S. Drivers of dynamic intratumor heterogeneity and phenotypic plasticity. Am. J. Physiol. Cell Physiol. 320, C750–C760 (2021).
  • 13.Burkhardt, D. B., San Juan, B. P., Lock, J. G., Krishnaswamy, S. & Chaffer, C. L. Mapping phenotypic plasticity upon the cancer cell state landscape using manifold learning. Cancer Discov. 12, 1847–1859 (2022).
  • 14.Whiting, F. J. H., Househam, J., Baker, A. M., Sottoriva, A. & Graham, T. A. Phenotypic noise and plasticity in cancer evolution. Trends Cell Biol. 34, 451–464 (2023).
  • 15.Wang, K. et al. Archival single-cell genomics reveals persistent subclones during DCIS progression. Cell 186, 3968–3982 e3915 (2023).
  • 16.Nishimura, T. et al. Evolutionary histories of breast cancer and related clones. Nature 620, 607–614 (2023).
  • 17.Kumar, T. et al. A spatially resolved single-cell genomic atlas of the adult human breast. Nature 620, 181–191 (2023).
  • 18.Lomakin, A. et al. Spatial genomics maps the structure, nature and evolution of cancer clones. Nature 611, 594–602 (2022).
  • 19.Harvey, J. M., Clark, G. M., Osborne, C. K. & Allred, D. C. Estrogen receptor status by immunohistochemistry is superior to the ligand-binding assay for predicting response to adjuvant endocrine therapy in breast cancer. J. Clin. Oncol. 17, 1474–1481 (1999).
  • 20.Tickle, T. I., Georgescu, C., Brown, M. & Haas, B. inferCNV of the Trinity CTAT Project. https://github.com/broadinstitute/inferCNV.
  • 21.Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 e3529 (2021).
  • 22.Andreatta, M. et al. Semi-supervised integration of single-cell transcriptomics data. Nat. Commun. 15, 872 (2024).
  • 23.Andreatta, M. & Carmona, S. J. STACAS: sub-type anchor correction for alignment in Seurat to integrate single-cell RNA-seq data. Bioinformatics 37, 882–884 (2021).
  • 24.Okajima, D. et al. Datopotamab Deruxtecan, a novel TROP2-directed antibody-drug conjugate, demonstrates potent antitumor activity by efficient drug delivery to tumor cells. Mol. Cancer Ther. 20, 2329–2340 (2021).
  • 25.He, B. et al. The prognostic landscape of interactive biological processes presents treatment responses in cancer. EBioMedicine 41, 120–133 (2019).
  • 26.Blanco-Carmona, E. Generating publication ready visualizations for Single Cell transcriptomics using SCpubr. bioRxiv, 2022.02.28.482303 (2022).
  • 27.Pu, Y. et al. Drug-tolerant persister cells in cancer: the cutting edges and future directions. Nat. Rev. Clin. Oncol. 20, 799–813 (2023).
  • 28.Dhanyamraju, P. K., Schell, T. D., Amin, S. & Robertson, G. P. Drug-tolerant persister cells in cancer therapy resistance. Cancer Res. 82, 2503–2514 (2022).
  • 29.Mikubo, M., Inoue, Y., Liu, G. & Tsao, M. S. Mechanism of drug tolerant persister cancer cells: the landscape and clinical implication for therapy. J. Thorac. Oncol. 16, 1798–1809 (2021).
  • 30.Ramirez, M. et al. Diverse drug-resistance mechanisms can emerge from drug-tolerant cancer persister cells. Nat. Commun. 7, 10690 (2016).
  • 31.Saeki, K. et al. Mammary cell gene expression atlas links epithelial cell remodeling events to breast carcinogenesis. Commun. Biol. 4, 660 (2021).
  • 32.Raredon, M. S. B. et al. Computation and visualization of cell-cell signaling topologies in single-cell systems data using Connectome. Sci. Rep. 12, 4187 (2022).
  • 33.Gao, S., Dai, Y. & Rehman, J. A Bayesian inference transcription factor activity model for the analysis of single-cell transcriptomes. Genome Res. 31, 1296–1311 (2021).
  • 34.Bouland, G. A., Mahfouz, A. & Reinders, M. J. T. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol. 24, 86 (2023).
  • 35.Zhang, M. J., Ntranos, V. & Tse, D. Determining sequencing depth in a single-cell RNA-seq experiment. Nat. Commun. 11, 774 (2020).
  • 36.van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729.e727 (2018).
  • 37.Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
  • 38.Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nat. Methods 14, 309–315 (2017).
  • 39.Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
  • 40.Dave, A. et al. The Breast Cancer Single-Cell Atlas: Defining cellular heterogeneity within model cell lines and primary tumors to inform disease subtype, stemness, and treatment options. Cell Oncol. (Dordr.) 46, 603–628 (2023).
  • 41.Ruan, H. et al. Single-cell RNA sequencing reveals the characteristics of cerebrospinal fluid tumour environment in breast cancer and lung cancer leptomeningeal metastases. Clin. Transl. Med. 12, e885 (2022).
  • 42.Liu, Y. et al. Intercellular communication reveals therapeutic potential of epithelial-mesenchymal transition in triple-negative breast cancer. Biomolecules 12, 1478 (2022).
  • 43.Gambardella, G. et al. A single-cell analysis of breast cancer cell lines to study tumour heterogeneity and drug response. Nat. Commun. 13, 1714 (2022).
  • 44.Carpen, L. et al. A single-cell transcriptomic landscape of innate and adaptive intratumoral immunity in triple negative breast cancer during chemo- and immunotherapies. Cell Death Discov. 8, 106 (2022).
  • 45.Zhou, S. et al. Single-cell RNA-seq dissects the intratumoral heterogeneity of triple-negative breast cancer based on gene regulatory networks. Mol. Ther. Nucleic Acids 23, 682–690 (2021).
  • 46.Xu, K. et al. Single-cell RNA sequencing reveals cell heterogeneity and transcriptome profile of breast cancer lymph node metastasis. Oncogenesis 10, 66 (2021).
  • 47.Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
  • 48.Vishnubalaji, R. & Alajez, N. M. Transcriptional landscape associated with TNBC resistance to neoadjuvant chemotherapy revealed by single-cell RNA-seq. Mol. Ther. Oncolytics 23, 151–162 (2021).
  • 49.Ren, L. et al. Single cell RNA sequencing for breast cancer: present and future. Cell Death Discov. 7, 104 (2021).
  • 50.Pal, B. et al. A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J. 40, e107333 (2021).
  • 51.Hu, L. et al. Single-cell RNA sequencing reveals the cellular origin and evolution of breast cancer in BRCA1 mutation carriers. Cancer Res. 81, 2600–2611 (2021).
  • 52.Bassez, A. et al. A single-cell map of intratumoral changes during anti-PD1 treatment of patients with breast cancer. Nat. Med. 27, 820–832 (2021).
  • 53.Ding, S., Chen, X. & Shen, K. Single-cell RNA sequencing in breast cancer: Understanding tumor heterogeneity and paving roads to individualized therapy. Cancer Commun. (Lond.) 40, 329–344 (2020).
  • 54.Fila, M., Sobczuk, A., Pawlowska, E. & Blasiak, J. Epigenetic connection of the calcitonin gene-related peptide and its potential in migraine. Int. J. Mol. Sci. 23, 6151 (2022).
  • 55.Sanchez, M. L., Rodriguez, F. D. & Covenas, R. Peptidergic Systems and Cancer: Focus on Tachykinin and calcitonin/calcitonin gene-related peptide families. Cancers 15, 1694 (2023).
  • 56.Mahalanabis, A. et al. Evaluation of single-cell RNA-seq clustering algorithms on cancer tumor datasets. Comput. Struct. Biotechnol. J. 20, 6375–6387 (2022).
  • 57.Fan, J., Slowikowski, K. & Zhang, F. Single-cell transcriptomics in cancer: computational challenges and opportunities. Exp. Mol. Med. 52, 1452–1465 (2020).
  • 58.Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019).
  • 59.Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–97 (2016).
  • 60.Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
  • 61.Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv:1303.3997; 10.48550/arXiv.1303.3997 (2013).
  • 62.Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).
  • 63.Casper, J. et al. The UCSC Genome Browser database: 2018 update. Nucleic Acids Res. 46, D762–D769 (2018).
  • 64.Hay, D. L., Garelja, M. L., Poyner, D. R. & Walker, C. S. Update on the pharmacology of calcitonin/CGRP family of peptides: IUPHAR Review 25. Br. J. Pharm. 175, 3–17 (2018).
  • 65.Meyrath, M. et al. Proadrenomedullin N-Terminal 20 Peptides (PAMPs) are agonists of the chemokine scavenger receptor ACKR3/CXCR7. ACS Pharm. Transl. Sci. 4, 813–823 (2021).

Supplementary Materials

Supplementary Data 1 (763.5KB, xlsx)
Supplementary Data 2 (1.1MB, xlsx)
Supplementary Data 3 (122.3KB, xlsx)
Supplementary Data 4 (30.7KB, xlsx)
Supplementary Data 5 (1.2MB, xlsx)
Supplementary Data 6 (111KB, xlsx)
Supplementary Data 7 (141.4KB, xlsx)

Data Availability Statement

Processed scRNA-seq data have been deposited at GEO (Accession ID: GSE264205).

R code for reproducing the results is available at https://github.com/KazutakaOtsuji/scRNA-StageIV_BC. Data analysis was performed using R version 4.2.2 and Seurat version 5.0.2. Default variables and parameters were used unless otherwise specified in the Methods section.


Articles from NPJ Precision Oncology are provided here courtesy of Nature Publishing Group