Nat Cancer. 2024 Oct 16;5(11):1660–1680. doi: 10.1038/s43018-024-00839-5

Two distinct epithelial-to-mesenchymal transition programs control invasion and inflammation in segregated tumor cell populations

Khalil Kass Youssef 1, Nitin Narwade 1,#, Aida Arcas 1,6,#, Angel Marquez-Galera 1,#, Raúl Jiménez-Castaño 1,#, Cristina Lopez-Blau 1, Hassan Fazilaty 1,7, David García-Gutierrez 1, Amparo Cano 2,3, Joan Galcerán 1,4, Gema Moreno-Bueno 2,3,5, Jose P Lopez-Atalaya 1, M Angela Nieto 1,4
PMCID: PMC11584407  PMID: 39414946

Abstract

Epithelial-to-mesenchymal transition (EMT) triggers cell plasticity in embryonic development, adult injured tissues and cancer. Combining the analysis of EMT in cell lines, embryonic neural crest and mouse models of renal fibrosis and breast cancer, we find that there is not a cancer-specific EMT program. Instead, cancer cells dedifferentiate and bifurcate into two distinct and segregated cellular trajectories after activating either embryonic-like or adult-like EMTs to drive dissemination or inflammation, respectively. We show that SNAIL1 acts as a pioneer factor in both EMT trajectories, and PRRX1 drives the progression of the embryonic-like invasive trajectory. We also find that the two trajectories are plastic and interdependent, as the abrogation of the EMT invasive trajectory by deleting Prrx1 not only prevents metastasis but also enhances inflammation, increasing the recruitment of antitumor macrophages. Our data unveil an additional role for EMT in orchestrating intratumor heterogeneity, driving the distribution of functions associated with either inflammation or metastatic dissemination.

Subject terms: Breast cancer, Cell biology, Cancer


Youssef et al. study epithelial-to-mesenchymal transition (EMT) programs in breast cancer and identify two interdependent trajectories coexisting within tumors, with EMT transcription factors such as PRRX1 at their nexus.

Main

Epithelial plasticity is at the core of crucial processes, including embryonic cell migration, cancer progression, organ fibrosis and tissue repair1–4. Epithelial-to-mesenchymal transition (EMT) triggers cell plasticity in all these contexts, highlighting its pleiotropy and intrinsic complexity. EMT is not a binary process or a single program, as it implies the generation of intermediate hybrid epithelial–mesenchymal (E/M) states that often never reach the full mesenchymal state4,5. EMT frequently endows cells with invasive and migratory properties used by both embryonic and cancer cells to disseminate and then colonize to form tissues or metastases, respectively1. In other contexts, such as during the progression of organ fibrosis, cells do not migrate, as they are unable to activate the invasion process6,7. The latter has been defined as a partial EMT, with cells showing a hybrid noninvasive E/M phenotype. However, this is not the only type of partial EMT, as during cancer progression, cells with a hybrid E/M phenotype are associated with invasion and increased metastatic potential8–10. The intermediate states, together with the intrinsic cell plasticity and transient nature of EMT, have also complicated the analysis of the process5,11. Seminal studies have classified EMT states in cancer cell lines and animal models8–10,12. Our horizontal approach, which combines physiological and pathological models, provides information not only on gene signatures and cell types or identities but also on biological activities, some of which extend beyond migration, invasion and metastasis. We describe two types of EMT that reflect the embryonic and adult cell responses occurring during embryonic development and organ fibrosis and that are simultaneously implemented in primary tumors to control invasion and inflammation, respectively. We also find that two EMT transcription factors (EMT-TFs) behave differently with respect to the two trajectories.
Although SNAIL1 is activated as a pioneer factor at the base of both trajectories, PRRX1 is specific for the invasive trajectory. Consistent with this, Snail1-mutant cancer cells can barely develop tumors, and Prrx1-mutant tumors are poorly invasive, with their progression toward metastasis highly impaired. Importantly, the truncation of the embryonic-like invasive trajectory after Prrx1 deletion leads to an enhancement of the adult-like inflammatory trajectory, unveiling the plasticity and interdependence of the two identified EMT pathways and opening new avenues for the design of pathway-specific anti-EMT therapies.

Results

SNAIL1 is a pioneer factor in EMT induction

To understand the diverse EMT phenotypes along the E/M spectrum, we first used the available whole-genome transcriptome analysis of human breast cancer cell lines13 and found three groups representing the epithelial, hybrid and mesenchymal phenotypic states (Fig. 1a, Extended Data Fig. 1 and Supplementary Table 1), consistent with previous studies8,12. We found that the position in the EMT spectrum from the epithelial to the mesenchymal phenotype correlates with the level of activation of EMT-TF families (Fig. 1b), except for SNAIL factors, which do not show significant changes in expression between the hybrid E/M and mesenchymal cell lines.

Fig. 1. E/M states associate with conserved EMT-TF expression codes.


a, Top: three-group clustering of 71 breast cancer cell lines13 based on epithelial and mesenchymal component enrichments (genes listed in Supplementary Table 1; Methods). Middle: heat map showing epithelial (E), mesenchymal (M) and Hallmark EMT (Molecular Signatures Database (MSigDB); University of California San Diego and Broad Institute) enrichment scores in the 71 breast cancer cell lines. Bottom: relative expression of representative epithelial and mesenchymal genes. Color scale, relative transcript levels (log2) from lowest (dark blue) to highest (bright red); x axis, breast cancer cell lines; y axis, genes; PCA, principal component analysis; ssGSEA, single-sample gene set enrichment analysis. b, Relative expression of EMT-TFs in the epithelial (n = 42), E/M (n = 14) and mesenchymal (n = 14) cell lines. Boxes indicate medians and interquartile ranges (IQRs), and whiskers indicate the minimum and maximum values. P values were determined using an unpaired two-sided t-test. c, Brightfield images of untreated or TGFβ-treated MDCK-II (II) and MDCK-NBL2 (NBL2) cells. d, Immunofluorescence images for epithelial tight junction protein 1 (TJP1) and mesenchymal fibronectin 1 (FN1) markers during TGFβ treatment. Images in c and d are representative of at least three biological replicates; scale bars, 20 μm; pEMT, partial EMT; fEMT, full EMT. e, EMT-TF transcript levels (quantitative PCR with reverse transcription (RT–qPCR)) during TGFβ treatment. Fold change (FC) to untreated control cells (CTR) is shown. FC is represented as mean ± s.e.m. (n = number of independent experiments). In MDCK-II control- and TGFβ-treated cells, n = 7 for 1 and 4 h, n = 6 for 8 h, n = 4 for 12 h, n = 12 for 1 day, n = 16 for 3 days, n = 8 for 4 days and n = 4 for 8 and 10 days. In MDCK-NBL2 control- and TGFβ-treated cells, n = 3 for 1, 4, 8 and 12 h, n = 6 for 1 and 2 days, n = 3 for 4 days and n = 4 for 8 and 10 days. 
A correction factor (cf) is applied to adjust the FC representation in MDCK-II cells. The cf is the ratio of gene expression in the two MDCK cell lines before TGFβ treatment (time = 0). Time points on the x axis are plotted on a nonproportional scale to better follow the early time points; d, days. f, RT–qPCR showing relative transcript levels (FC) in TGFβ-treated versus untreated (CTR) MDCK-NBL2 cells for epithelial (left) and mesenchymal (right) markers. FC is represented as mean ± s.e.m. (n indicates the number of independent experiments; in control- and TGFβ-treated cells, n = 3 for 1, 2, 4, 8 and 12 h, n = 7 for 1 day and n = 4 for 2, 4 and 10 days). τep and τm stand for the onset of repression or activation of epithelial or mesenchymal genes, respectively; m, minutes. g, Proposed model for the progression from the epithelial to the mesenchymal phenotype (breast cell lines and TGFβ-treated MDCK cells).

Source data
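The fold changes (FC) in Fig. 1e,f are RT–qPCR measurements relative to untreated controls, and the MDCK-II curves are rescaled by a correction factor (cf). The legend does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: FC is computed with the common 2^−ΔΔCt method against a reference gene, cf is taken as the baseline (time = 0) expression ratio MDCK-II/MDCK-NBL2, and cf is applied multiplicatively. All function names are hypothetical.

```python
import math
import statistics


def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in treated vs untreated cells (2^-ddCt method).

    Assumes ~100% amplification efficiency for the target and reference genes.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_treated - d_ct_ctrl
    return 2.0 ** (-dd_ct)


def correction_factor(baseline_mdck_ii, baseline_mdck_nbl2):
    """Hypothetical cf: ratio of time-0 expression in MDCK-II vs MDCK-NBL2 cells."""
    return baseline_mdck_ii / baseline_mdck_nbl2


def corrected_fc(fc_mdck_ii, cf):
    """Assumption: rescale an MDCK-II fold change onto the MDCK-NBL2 baseline."""
    return fc_mdck_ii * cf


def mean_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))
```

For example, a target whose ΔCt falls from 7 to 5 cycles after TGFβ gives fold_change_ddct(20, 15, 22, 15) == 4.0, that is, a fourfold induction.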

Extended Data Fig. 1. Epithelial (E) and mesenchymal (M) component analysis in breast cancer cell lines and Differential response to TGFβ in MDCK cell lines.


a, E (n = 286) and M (n = 130) genes were extracted from EMT signatures described previously12,61. E and M genes were further used to perform k-means clustering (see Methods) in the 71 breast cancer (BC) cell lines 23 (Fig. 1a). E and M genes are listed in Supplementary Table 1. b, Western blots showing early (1 h) and sustained (4 days) SMAD2/3 activation in response to TGFβ treatment3 in MDCK-II and MDCK-NBL2 cells. SMAD2/3 activation is measured by assessing phospho-SMAD2/3. Blots are representative of 3 independent experiments/condition. c, Representative images and quantification of SMAD2/3 nuclear accumulation in MDCK-II and MDCK-NBL2 cells after four days of TGFβ treatment. SMAD2/3 signal is represented as mean ± SEM, n = 3 independent experiments with similar results/condition. Not significant, P > 0.05. Scale bars, 20μm. d, Gene set enrichment analysis (GSEA) for TGFβ signalling (MSigDB; UC San Diego and Broad Institute) in TGFβ-treated cells (MDCK-NBL2 versus MDCK-II). NES, Normalised Enrichment Score; RES, Running Enrichment Score; RLM, Ranked List Metric. n = 3 independent experiments with similar results/condition. e, Downregulation of the epithelial marker CDH1 (E-cadherin) during TGFβ treatment. DAPI staining labels the nuclei (blue). Scale bars, 20μm. Representative images (n = 3 independent experiments) are shown. f, Western blot showing fibronectin (FN1), epithelial tight junction protein 1 (TJP1) and E-cadherin (CDH1) protein levels after four days of TGFβ treatment. g, CDH1 levels after 10 days of treatment. β-actin is used as a housekeeping control in (f) and (g). Also in (f) and (g), blots are representative of 3 independent experiments. h, Gene set enrichment analysis (GSEA) for Hallmark_EMT (MSigDB; UC San Diego and Broad Institute) in TGFβ-treated cells (MDCK-NBL2 versus MDCK-II) described and analysed as in (d).
i, Heatmap of hierarchical clustering representing the relative expression of epithelial and mesenchymal genes obtained after bulk transcriptome RNA-seq. x-axis: cell lines and treatment; y-axis: genes. The color scale represents relative transcript levels (log2) from lowest (dark blue) to highest (bright red). In (c), P values were determined using unpaired two-sided t-test. In (d) and (h), P values for GSEA enrichment are estimated using an empirical phenotype-based permutation test.

Source data
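Extended Data Fig. 1a states that k-means clustering on the E and M genes split the 71 cell lines into three groups, and Fig. 1a describes the clustering as based on epithelial and mesenchymal component enrichments. Below is a minimal, dependency-free sketch of three-group k-means on per-line (E score, M score) pairs. The authors' exact features, distance and initialization are in their Methods and are not reproduced here; the deterministic farthest-first seeding is our own choice, made so that well-separated groups are recovered reproducibly, and the toy scores are invented.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on tuples of features; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy (E score, M score) pairs for nine hypothetical cell lines
em_scores = [(0.9, 0.1), (0.85, 0.15), (0.95, 0.05),
             (0.5, 0.5), (0.55, 0.45), (0.45, 0.55),
             (0.1, 0.9), (0.15, 0.85), (0.05, 0.95)]
labels, centroids = kmeans(em_scores, k=3)
```

With real data the scores would first be computed per cell line (for example by ssGSEA, as in Fig. 1a), and k-means is sensitive to feature scaling, so both scores should be on comparable scales.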

To assess the dynamics of EMT-TF activation in the progression toward the mesenchymal phenotype, we used two MDCK epithelial cell sublines and transforming growth factor-β (TGFβ) treatment, the classical and most potent EMT inducer14. TGFβ signaling was robustly activated in both cell lines (Extended Data Fig. 1b–d), and both activated EMT in response to TGFβ. MDCK-II cells maintained residual cell–cell adhesion and displayed weak mesenchymal activation, therefore showing a hybrid E/M phenotype reminiscent of a stable partial EMT. By contrast, MDCK-NBL2 cells underwent a first transition to partial EMT, followed by a fast and robust mesenchymal transition typical of a full EMT (Fig. 1c,d and Extended Data Fig. 1e–i). The kinetics of EMT-TF expression relative to the onset of epithelial repression (τep) and mesenchymal activation (τm) during TGFβ treatment showed that SNAIL1 was rapidly activated in both cell lines with two-wave dynamics (Fig. 1e), as previously observed15. A first fast burst was followed by the onset of epithelial repression (τep 30 min–8 h) and the activation of early mesenchymal genes, consistent with the role of SNAIL1 as a pioneer upstream regulator of the EMT program15. Accordingly, SNAIL1 interference prevented TGFβ-induced EMT in both cell lines (Extended Data Fig. 2a), attenuating the repression of epithelial markers and the activation of mesenchymal markers in MDCK-NBL2 cells (Extended Data Fig. 2b). The second wave of SNAIL1 coincided with the recruitment of other EMT-TFs and the enhanced regulation of different epithelial and mesenchymal markers (Fig. 1e,f). Interestingly, TWIST1 and PRRX1 expression was found only in the progression of MDCK-NBL2 cells from partial to full EMT. Together, these results are consistent with our data in cancer cell lines, revealing a conserved EMT-TF recruitment associated with EMT phenotypic states along the epithelial-to-mesenchymal spectrum (Fig. 1g).
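The onsets τep and τm are defined in the Fig. 1 legend as the times at which epithelial genes start to be repressed or mesenchymal genes start to be activated. The exact criterion is not stated in this section, so the sketch below simply assumes a fixed fold-change threshold (hypothetical values of 0.67 for repression and 1.5 for activation) and returns the first sampled time point that crosses it. The series are toy values, not data from the study.

```python
def onset_time(times_h, fold_changes, threshold, direction):
    """First time point (hours) at which a fold-change series crosses a threshold.

    direction="down" detects repression onset (e.g. tau_ep); "up" detects
    activation onset (e.g. tau_m). Returns None if the threshold is never crossed.
    """
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    for t, fc in zip(times_h, fold_changes):
        if (direction == "down" and fc <= threshold) or (direction == "up" and fc >= threshold):
            return t
    return None


# Toy series sampled on a nonuniform time grid, as in Fig. 1e,f
times = [0.5, 1, 4, 8, 12, 24]
epithelial_fc = [1.0, 0.9, 0.6, 0.4, 0.3, 0.2]
mesenchymal_fc = [1.0, 1.1, 1.3, 2.0, 3.0, 5.0]
tau_ep = onset_time(times, epithelial_fc, 0.67, "down")  # 4 h
tau_m = onset_time(times, mesenchymal_fc, 1.5, "up")  # 8 h
```

Because the grid is coarse, such an onset is only resolved to the nearest sampled time point; comparing it with the time of the first SNAIL1 burst then places SNAIL1 upstream of epithelial repression.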

Extended Data Fig. 2. SNAIL1 is required for the TGFβ-induced EMT response.


a, Brightfield images of MDCK-II and MDCK-NBL2 cells treated with small interfering RNA for SNAIL1 (siSNAIL1) or siCTR 24 hours before treatment with TGFβ. Scale bars, 20μm. Images are representative of 3 independent experiments/condition. b, RT-qPCR showing the relative transcript levels for epithelial, mesenchymal and EMT-TF genes after 1, 2 and 4 days of TGFβ administration in MDCK-NBL2 cells in the presence of small interfering RNA for SNAIL1 (siSNAIL1; n = 4 independent experiments for 1 and 2 days, and n = 6 for 4 days of administration) or control reagent (siCTR; n = 4 independent experiments for all time points) compared to untreated cells. Data represent mean ± SEM. Consistent with the proposed pioneer role of SNAIL1, its knockdown considerably attenuates TGFβ-induced EMT, including the early repression of epithelial markers and the activation of mesenchymal markers and EMT-TFs (Fig. 1e,f). Interestingly, SNAIL1 knockdown does not prevent the activation of ZEB2, which is already transcribed in the neural tube19 before SNAIL1 induction in neural crest precursors. P values were determined using an unpaired two-sided t-test.

Source data
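Several legends report P values from an unpaired two-sided t-test. The sketch below computes the Student's t statistic and its degrees of freedom with a pooled variance; whether the authors pooled variances or applied Welch's correction is not stated, so pooling is an assumption. The two-sided P value would then come from the t distribution, for example 2 * scipy.stats.t.sf(abs(t), df), or directly from scipy.stats.ttest_ind(a, b), whose default also pools variances. The replicate values are invented.

```python
import math
import statistics


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two replicates")
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Toy fold changes for one epithelial marker after TGFbeta (not data from the study)
si_ctr = [0.20, 0.25, 0.22, 0.18]
si_snail1 = [0.60, 0.55, 0.70, 0.65]
t_stat, dof = student_t(si_ctr, si_snail1)  # negative t: stronger repression under siCTR
```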

PRRX1 is required for the invasive EMT phenotype

When both MDCK sublines were cultured in three-dimensional (3D) collagen matrices, they formed polarized hollow spheres (Fig. 2a). When treated with TGFβ, only MDCK-NBL2 cells displayed frequent protrusive events (Fig. 2a), concomitant with the loss of epithelial and the gain of mesenchymal markers (Extended Data Fig. 3a), together hallmarks of invasive behavior. To identify an EMT invasive signature, we performed bulk RNA sequencing (RNA-seq) and selected genes specifically upregulated in TGFβ-treated MDCK-NBL2 cells compared to TGFβ-treated MDCK-II cells. Among all breast cancer cell lines, the genes enriched in MDCK-NBL2 cells segregated the basal-like lines16, an aggressive subtype known to undergo EMT17, into two clusters according to their invasive capacity (Fig. 2b). This allowed the identification of 259 genes that we refer to as breast cancer proinvasion genes (BC-PINGs). In addition to EMT, Gene Ontology (GO) analysis identified developmental and invasion programs selectively enriched in TGFβ-treated MDCK-NBL2 cells (Fig. 2c) and in the BC-PING signature (Extended Data Fig. 3b and Supplementary Table 2), prompting us to analyze our EMT-associated BC-PINGs in a prototypical embryonic invasive EMT program, the delamination and migration of the neural crest14.
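According to the Fig. 2b legend, BC-PINGs are the genes, among those upregulated in TGFβ-treated MDCK-NBL2 cells, that are enriched at least 1.75-fold in the averaged invasive versus noninvasive basal-like lines. A minimal sketch of that filter follows. It assumes linear (not log2) expression values and a simple dictionary layout; gene and cell-line identifiers are placeholders.

```python
def select_proinvasion_genes(expr, candidates, invasive, noninvasive, min_ratio=1.75):
    """Return candidate genes whose mean expression in invasive lines is at least
    min_ratio times their mean expression in noninvasive lines.

    expr: {gene: {cell_line: linear expression}}; candidates: genes upregulated
    in TGFbeta-treated MDCK-NBL2 vs MDCK-II cells (the starting set).
    """
    selected = []
    for gene in candidates:
        values = expr[gene]
        inv_mean = sum(values[c] for c in invasive) / len(invasive)
        non_mean = sum(values[c] for c in noninvasive) / len(noninvasive)
        if non_mean == 0:
            enriched = inv_mean > 0  # expressed only in invasive lines
        else:
            enriched = inv_mean / non_mean >= min_ratio
        if enriched:
            selected.append(gene)
    return sorted(selected)


# Placeholder genes and lines (not data from the study)
expr = {
    "GENE_A": {"INV1": 4.0, "INV2": 4.0, "NON1": 1.0, "NON2": 1.0},
    "GENE_B": {"INV1": 2.0, "INV2": 2.0, "NON1": 2.0, "NON2": 2.0},
    "GENE_C": {"INV1": 3.5, "INV2": 3.5, "NON1": 2.0, "NON2": 2.0},
    "GENE_D": {"INV1": 1.0, "INV2": 0.0, "NON1": 0.0, "NON2": 0.0},
}
pings = select_proinvasion_genes(expr, list(expr), ["INV1", "INV2"], ["NON1", "NON2"])
```

If the matrix is log2-transformed, the equivalent criterion is a difference of group means of at least log2(1.75), roughly 0.81.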

Fig. 2. EMT in developing embryos, epithelial cells and invasive cancer cells.


a, Analysis of TGFβ-induced (1 ng ml−1) invasiveness in MDCK cells cultured for 10 days in 3D collagen matrices. Phalloidin staining reveals actin filaments (F-actin). TGFβ was administered after sphere formation. Nuclei are shown in blue; scale bars, 20 μm. Images are representative of three independent experiments with similar results. b, Differentially expressed genes upregulated in TGFβ-treated MDCK-NBL2 cells were used to cluster basal-like breast cancer cell lines. MCF10A cells were used as a control for nontumorigenic and noninvasive cells. The heat map shows that two main clusters segregate invasive (left) from noninvasive (right) cells; n = 4 cell lines per group. The y axis shows genes enriched ≥1.75× in the averaged invasive versus noninvasive basal-like breast cancer cell lines, referred to as BC-PINGs (genes listed in Supplementary Table 3). The color scale indicates the relative transcript levels (log2) from lowest (dark blue) to highest (bright red). c, Dot plot showing selected GO terms enriched in MDCK-NBL2 versus MDCK-II cells in response to TGFβ in two-dimensional (2D) cultures (n = 3 independent experiments per cell line). d, Top: trunk neural crest (NC) populations in embryonic day 9.5 mouse embryos. Bottom: corresponding single-cell t-distributed stochastic neighbor embedding (t-SNE) and connectivity map as predicted by PAGA28 obtained using data from Soldatov et al.18; NT, neural tube; pre-M-NC, premigratory neural crest; Del-NC, delaminating neural crest; MP-NC, migratory progenitor neural crest; NC-Mes, neural crest-derived mesenchymal cells; Sen-N, sensory neurons; Auto-NS, autonomic nervous system. Clusters 5 and 6 correspond to bifurcations toward neural differentiation and, therefore, involve the known repression of the EMT program and are not used in the subsequent analysis.
e, Heat map with transcriptional program changes in the single-cell trajectory from NT to NC-Mes (clusters 0 to 4; genes predicted by Moran’s I-test with q < 0.001 and ordered over pseudotime using scVelo77). See Supplementary Table 3 for full gene names. Right: dot plot showing enrichment in Hallmark EMT and BC-PINGs in each of the indicated populations. Enrichment score indicates the number of overlapping genes in the populations and Hallmark EMT or BC-PINGs per number of overlaps expected by chance. The significance of enrichments was assessed by hypergeometric P value; n = number of genes; group 1, n = 529; group 2, n = 654; group 3, n = 193. f, Enrichments represented over the t-SNE embedding shown in d. g, Relative expression of EMT-TFs in trunk neural crest migratory trajectory. The color scale is as in b. h, MDCK-NBL2 and MDCK-II cells treated with small interfering RNA (siRNA) for PRRX1 (siPRRX1) or control (siCTR) 8 h before treatment with TGFβ or once cells have undergone EMT (6 days after TGFβ administration); scale bars, 20 μm. Images are representative of three independent experiments with similar results per condition. i, Dot plot showing selected GO terms for downregulated genes after 4 days of TGFβ treatment of MDCK-NBL2 cells under siPRRX1 versus siCTR conditions (n = 3 independent experiments). j, GSEA showing the relative enrichment for BC-PINGs in TGFβ-treated MDCK-NBL2 cells (invasive EMT) compared to TGFβ-treated MDCK-II cells (noninvasive EMT response; top) and loss of the positive enrichment after siPRRX1 treatment (bottom; same experiments as in i); NES, normalized enrichment score; RES, running enrichment score; RLM, ranked list metric. k, Transwell invasion assays showing the nuclei (DAPI) of MDCK-NBL2 invasive cells after 2 days of TGFβ treatment in the presence of siCTR or siPRRX1. Invading cells are represented as mean ± s.e.m. (n = 5 independent experiments with similar results per condition). 
l, Top: representative brightfield images of TGFβ-treated MDCK-II cells transfected with empty (CTR) or PRRX1-expressing plasmids. Bottom: images and quantification of transwell invasion assays. PRRX1 expression is sufficient to induce cell scattering and invasive properties. The numbers of invading cells are represented as mean ± s.e.m. (n = 6 independent experiments with similar results per condition). P values were determined using Fisher’s exact test for the dot plots (c, e and i), estimated using an empirical phenotype-based permutation test (j) or determined using an unpaired two-sided t-test (k and l).

Source data
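In Fig. 2e, the enrichment score is the number of genes shared by a population's program and Hallmark EMT or BC-PINGs, divided by the overlap expected by chance, with significance from a hypergeometric P value. The sketch below implements that ratio and the upper-tail hypergeometric probability with the standard library only. It assumes that the expected overlap is |query| × |gene set| / N for a gene universe of size N containing both sets; the universe actually used is not specified in this section.

```python
from math import comb


def overlap_enrichment(query, gene_set, universe_size):
    """Observed/expected overlap and upper-tail hypergeometric P value.

    query: genes of one population program (e.g. a pseudotime gene group);
    gene_set: reference signature (e.g. Hallmark EMT or BC-PINGs);
    universe_size: number of genes that could have been drawn (both sets
    are assumed to be subsets of this universe).
    """
    query, gene_set = set(query), set(gene_set)
    n, k, big_n = len(query), len(gene_set), universe_size
    overlap = len(query & gene_set)
    expected = n * k / big_n
    score = overlap / expected if expected > 0 else float("nan")
    # P(X >= overlap) for X ~ Hypergeometric(population N, successes K, draws n)
    p_value = sum(
        comb(k, x) * comb(big_n - k, n - x) for x in range(overlap, min(k, n) + 1)
    ) / comb(big_n, n)
    return overlap, score, p_value
```

For a toy universe of 10 genes, a 5-gene signature and a 3-gene query fully contained in it, the expected overlap is 1.5, so the score is 2.0, and the P value is C(5,3)/C(10,3) = 10/120.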

Extended Data Fig. 3. PRRX1 knockdown prevents TGFβ-induced invasive EMT.


a, Analysis of epithelial (TJP1) and mesenchymal (VIM, vimentin) markers in TGFβ (1 ng/ml)-treated MDCK cells cultured in collagen matrices (as in Fig. 2a). Nuclei in blue. Scale bars, 50μm. b, Dot plot showing selected GO terms enriched in BC-PINGs. n = 4 cell lines/group (invasive and non-invasive). See Fig. 2b. c, Upper panel, relative transcript levels (RT-qPCR) for PRRX1 long (PRRX1-L), short (PRRX1-S) or both isoforms (PRRX1) in MDCK-NBL2 cells, either 1 or 4 days after TGFβ treatment. Note that both isoforms are activated. qPCR fold-change expression is represented as mean ± SEM, n = 3 independent experiments/time point and condition. Lower panel, PRRX1 relative transcript levels in MDCK-NBL2 cells pre-treated with small interfering RNA for PRRX1 (siPRRX1) or control reagent (siCTR) followed by 4 days of TGFβ. Data represent mean ± SEM (n = 6 independent experiments per condition). d, GSEA showing the reduction of Hallmark_EMT in TGFβ-treated NBL2 cells pre-treated with siPRRX1, and their corresponding brightfield images. NES, Normalised Enrichment Score; RES, Running Enrichment Score; RLM, Ranked List Metric. n = 3 independent experiments/condition. e, IF images showing the expression of E-cadherin (CDH1) and α-SMA after 4 days in culture with or without TGFβ in the presence of siPRRX1 or siCTR. DAPI staining labels the nuclei in blue. Scale bars, 20μm. f, RT-qPCR showing the relative transcript levels for epithelial and mesenchymal genes after 4 days of TGFβ administration in MDCK-NBL2 and MDCK-II cells in the presence of small interfering RNA for PRRX1 (siPRRX1), compared with cells treated with a control reagent (siCTR). qPCR fold-change expression is represented as mean ± SEM, n = 4 independent experiments/cell line and condition for all genes except TAGLN, ACTA2 and CDH2, for which n = 6.
g, Hierarchical clustering analysis after bulk RNA sequencing showing the differential regulation of epithelial, mesenchymal and EMT-TF genes after siPRRX1 treatment during TGFβ-induced EMT. The colour scale represents relative transcript levels (log2) from lowest (dark blue) to highest (bright red). h, Signalling pathways enriched in TGFβ-induced invasive vs non-invasive EMT in MDCK cells. n = 3 independent experiments/condition. i, GSEA for the two cell lines in (g) showing the relative enrichment for the indicated EMT-associated signalling pathways (upper panel) and the corresponding loss after siPRRX1 treatment (lower panel). j, Brightfield images showing the impact of pre-treatment with FAK inhibitor (FAKi) or vehicle (DMSO) in MDCK-NBL2 cells subsequently treated with TGFβ. k, Brightfield images showing the impact of FAKi applied after the cells had undergone EMT. l, No obvious effect of FAKi can be observed in TGFβ-treated MDCK-II cells. Scale bars, 20μm in (j), (k) and (l). m, RT-qPCR showing the relative transcript levels for epithelial and mesenchymal genes in the MDCK-II cultures shown in (l). Data represent mean ± SEM, n = 4 independent experiments/condition. P values were determined using Fisher's exact test for the dot plots in (b) and (h), and using an unpaired two-sided t-test in (c), (f) and (m). Images in (a), (e), (j), (k) and (l) are representative of 3 independent experiments with similar results.

Source data

We used single-cell transcriptomic analysis of embryonic trunk neural crest18 to build a connectivity map (Fig. 2d) and reconstruct the transcriptional program during delamination and migration, excluding bifurcations toward differentiation (clusters 5 and 6; Fig. 2d). The resulting trajectory is associated with an increase in the EMT program (Hallmark EMT). Although invasive properties are already present in the delaminating cells, the invasive signature (BC-PINGs) is maintained as migration proceeds toward the mesenchymal phenotype (Fig. 2e,f). Except for Zeb2, which is already expressed before neural crest induction19 and is not associated with invasion in some contexts20, this trajectory involves a sequential activation of EMT-TFs (Fig. 2g), as we described in TGFβ-treated MDCK-NBL2 cells progressing toward the mesenchymal phenotype (Fig. 1). Prrx1, specifically activated at advanced EMT in all models analyzed (Figs. 1b,e and 2g), can stabilize the mesenchymal phenotype in the migratory crest population as it generates ectomesenchyme derivatives, for example, cartilage21 or connective tissue. As high levels of PRRX1 accompany the progression to the full mesenchymal phenotype in different contexts, we examined whether PRRX1 is required for this transition. Knockdown of PRRX1 (siPRRX1; over 90% reduction of long and short isoforms) prevented the full EMT induced in MDCK-NBL2 cells by TGFβ and reverted cells to a partial EMT when administered after the mesenchymal transition was complete (Fig. 2h and Extended Data Fig. 3c,d). PRRX1 knockdown strongly attenuated the mesenchymalization of MDCK-NBL2 cells after TGFβ treatment but, as expected, did not affect MDCK-II cells (Extended Data Fig. 3f). Bulk RNA-seq confirmed that PRRX1 knockdown (siPRRX1) in MDCK-NBL2 cells attenuates the repression of the epithelial program, prevents full activation of mesenchymal genes, including other EMT-TFs (Extended Data Fig. 3g), and represses developmental programs associated with cell migration (Fig. 2i), and that the cells lose the invasive signature (Fig. 2j,k). Thus, in the absence of PRRX1 activation, MDCK-NBL2 cells reach an end state in their response to TGFβ similar to the noninvasive partial EMT observed in MDCK-II cells. Interestingly, ectopic expression of PRRX1 was sufficient to promote invasiveness in MDCK-II cells treated with TGFβ (Fig. 2l), supporting a role for PRRX1 in inducing the invasive EMT phenotype.

Several signaling pathways are enriched in TGFβ-induced invasive versus noninvasive EMT, in particular focal adhesion kinase (FAK) signaling, which is strongly dependent on PRRX1 (Extended Data Fig. 3h,i). Sublethal doses of FAK inhibitors (FAKi) prevented the induction of full EMT by TGFβ in MDCK-NBL2 cells and induced a partial reversion (mesenchymal-to-epithelial transition, MET) when administered after cells had undergone full EMT (Extended Data Fig. 3j,k). By contrast, MDCK-II cells exposed to TGFβ showed a similar morphology and a conserved EMT response irrespective of the presence of FAKi (Extended Data Fig. 3l,m). Thus, the impact of FAKi is similar to that of PRRX1 knockdown, and high PRRX1 levels and FAK signaling are associated with invasive EMT and promote the transition to a full mesenchymal phenotype in embryos and cancer cells.

Partial and inflammatory noninvasive EMT in renal fibrosis

In contrast to invasive (embryonic or tumoral) EMT, a noninvasive partial EMT is activated in renal fibrosis6,7, as observed after unilateral ureteral obstruction (UUO), which induces tubular injury that progressively evolves into renal interstitial fibrosis and renal failure22. In this model, SNAIL1 activates EMT in tubular cells, which dedifferentiate but do not become invasive; instead, they remain in the damaged tubules and secrete chemokines and cytokines that promote fibrogenesis and inflammation (Fig. 3a)6,7. This was confirmed by the absence of red (tdTomato) cells in the stroma after genetic labeling of renal epithelial cells and UUO (Fig. 3b). E-cadherin reduction and vimentin activation in the same cells confirmed the existence of a hybrid E/M phenotype (Fig. 3c). Residual E-cadherin and tight junction protein 1 helped to maintain some cell–cell adhesion (Fig. 3d). These findings confirm a partial and noninvasive EMT, in which epithelial cells do not become fibroblasts6,7, in agreement with the recent demonstration that myofibroblasts derive from fibroblasts (and pericytes) in human fibrosis23.

Fig. 3. Single-cell transcriptomic analysis reveals EMT activation in renal fibrosis.


a, Genetic strategy to trace renal epithelial cells (see Methods), which appear labeled in red (tdTomato). b, Expression of the mesenchymal marker vimentin (Vim) in combination with tdTomato-labeled renal epithelial cells in control (sham) and obstructed kidneys (UUO). Dashed lines surround renal tubules. Arrows indicate de novo vimentin activation in renal tubules. Nuclei are in blue; scale bar, 5 μm. c, E-cadherin and vimentin expression. The arrow in sham indicates renal epithelial cells, and the arrowhead indicates a glomerulus. In the UUO images, the higher-magnification images (box) show E-cadherin and vimentin expression in single channels. Arrows show renal epithelial cells positive for both markers; scale bar, 5 μm. d, Tight junction epithelial protein (TJP1) expression. Arrowheads show puncta adherens junctions. Nuclei are in blue; scale bar, 10 μm. Images in b–d are representative of kidneys extracted from n = 4 sham and n = 6 UUO mice. e, UMAP showing the diversity of cell types in sham-operated and obstructed kidneys 10 days after ligation. f, UMAP and bar plots showing the contribution of different cell populations to control (sham) and obstructed (UUO) kidneys. Abbreviations are as in e; Inj, injured tubules. UMAPs in e and f represent 25,424 cells in total obtained from kidneys extracted from one sham and two UUO mice; EPI, epithelial; INT, interstitial; IM-M, immune myeloids; IM-L, immune lymphoids; PROL, proliferative; ED, endothelial; GLO, glomerulus. g, Top: origin of injured epithelial cells determined using supervised machine learning (see Methods). The dashed box contains the epithelial clusters with major contribution to injury (PT clusters 8, 16 and 5; 89.3%). Bottom: expression of the injury marker Kim-1 in combination with the PT cell marker LTA. Arrows indicate Kim-1+ cells also positive for LTA (PT).
The box plot shows the percentage of Kim-1+ injured cortical epithelial cells also positive for LTA (n = 3 mice analyzed 1 day after UUO, with six randomly selected cortex images quantified per kidney). The box shows the median and IQRs, and whiskers show minimum and maximum values. Nuclei are in blue; scale bar, 50 μm. h, Top: UMAP of injured epithelial cells and clusters contributing to injury (dashed box in g). Bottom: respective contribution in control and obstructed kidneys. i, UMAP as in h. Cells are colored by the enrichment score for Hallmark EMT, PT differentiation and injury-associated inflammation (see Methods). UMAPs in h and i represent n = 3,929 healthy and n = 3,509 injured epithelial cells.

Source data

Following droplet-based single-cell RNA-seq of sham-operated and obstructed kidneys, unsupervised graph-based clustering (see Methods) organized cells into 26 major cell clusters representing the different cell types shown on uniform manifold approximation and projection (UMAP; Fig. 3e and Extended Data Fig. 4a–e), all identified by the expression of bona fide lineage markers23–25. We found a dramatic remodeling of the nonepithelial component in obstructed kidneys, with a massive increase in interstitial and immune cells, as previously observed26. In the epithelial component, the appearance of a cluster of injured cells is concomitant with the disappearance of bona fide proximal tubule (PT) cells (Fig. 3f and Extended Data Fig. 4f), which were identified as major contributors to the injured population (approximately 90%; Fig. 3g and Extended Data Fig. 5a), consistent with the coexpression of PT and injury markers 1 day after UUO (Fig. 3g and Extended Data Fig. 5b,c) and with previous data22. The reclustering of PT (dashed box in Fig. 3g) and PT injured cells showed that damaged cells had activated an EMT program concomitant with the loss of renal epithelial differentiation and the acquisition of an injury inflammatory program27 (Fig. 3h,i). Trajectory reconstruction using partition-based graph abstraction (PAGA)28, Velocity29 and transcriptional regulatory network computation by SCENIC30 confirmed the progression of EMT states in parallel to the increase in injury markers (Fig. 4a,b and Extended Data Fig. 6a–c). Importantly, the EMT program, including epithelial dedifferentiation, mesenchymalization and injury markers, considerably decreased when UUO was performed in mice bearing SNAIL1-deficient (Snail1-conditional knockout (cKO)) renal epithelial cells (Fig. 4c,d), confirming the role of SNAIL1 in triggering tubulointerstitial inflammation, fibrosis6,7 and EMT activation and validating the regulatory networks predicted by SCENIC (Fig. 4b,d) during the injury response in the adult kidney.

Extended Data Fig. 4. Cell populations in control and obstructed kidneys revealed by single-cell RNA-seq.


a, Experimental design. Three single-cell RNAseq libraries were generated from one control kidney (n = 1 SHAM-operated mouse) and two obstructed kidneys (n = 2 UUO mice), obtained from 3 male mice. b, Violin plots showing gene number (detected genes), unique transcript counts and percentage of mitochondrial counts for the different 10x Genomics-based libraries. We removed putative cell doublets and retained only cells with 400–4000 detected genes. The 25424 cells that passed this filter, all with a mitochondrial proportion below 10% (a threshold permissive enough to include metabolically highly active renal tubular cells), were subjected to subsequent analysis. c, Heatmap showing the top 20 discriminative DEGs for the 26 clusters. d, UMAP plot showing the distribution of the 26 clusters and their assigned identities. e, Dot plot showing the proportion and expression levels of marker genes that identify the different cell types as in (d). Markers (y-axis), cell types (x-axis). See Supplementary Table 2 for full gene names. The UMAP and dot plot in (d) and (e) were generated using n = 25424 cells. f, Changes in cell populations after unilateral ureteral obstruction. Compositional data analysis (CoDA)72 (see Methods) was used to assess the statistical relevance of changes in cell populations, taking the glomerulus as a reference. Positive and negative CoDA loadings (x-axis) correspond, respectively, to increases and decreases of a cell population in UUO compared with SHAM. Cell populations as in Fig. 3e. Box plots depict the uncertainty of the loading coefficients obtained by resampling with 1000 bootstrap iterations; boxes are IQRs split by the median (middle line), and whiskers represent the minimum and maximum loading coefficients. Corrected P values were determined using the Benjamini–Hochberg procedure (for more details, see ‘Compositional analysis for kidney cell populations’ in Methods). The red horizontal line separates cell types passing the significance threshold.

Source data
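The Benjamini–Hochberg correction cited in panel f can be illustrated with a short sketch. This is a minimal, standard-library Python version of the step-up procedure applied to hypothetical P values; the function name `benjamini_hochberg` and the input values are our own and are not taken from the authors' analysis code. Each adjusted value is the smallest (m/rank)·p over its own rank and all larger ranks, capped at 1.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(pvals)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Step down from the largest P value so each adjusted value is the minimum
    # of (m / rank) * p over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for five cell populations.
raw = [0.002, 0.04, 0.012, 0.30, 0.02]
q = benjamini_hochberg(raw)
```

Populations whose adjusted value falls below the chosen false discovery rate (for example 0.05) would be called significant. For real analyses, a tested implementation such as `multipletests(..., method="fdr_bh")` in statsmodels is the safer choice; the sketch only makes the ordering and monotonicity steps explicit.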

Extended Data Fig. 5. Origin of the injured epithelial cells in the kidney UUO model.


a, 10-fold cross-validation of the supervised machine learning model. Bar plots showing the accuracy score and normalized Matthews correlation coefficient (normMCC = (MCC + 1) / 2)78 obtained by 10-fold cross-validation over the training dataset (see Methods for the classification of injured epithelial cells using a supervised machine learning model). Performance measures for each class were calculated using the one-vs-rest (OvR) method. Plots represent 10 validations, and error bars represent mean ± SD. b, Expression of the injury marker Kim-1 in combination with the PT cell marker LTA in SHAM-operated kidneys and in UUO kidneys 1 day after obstruction. Images are representative of kidneys obtained from n = 3 mice per condition. c, Higher-magnification images showing the expression of Kim-1 and LTA in the renal cortex and medulla. Nuclei in blue. Scale bars, 50 μm, except in (b), where the bar indicates 500 μm.

Source data
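The normMCC in panel a rescales the Matthews correlation coefficient from the range [−1, 1] to [0, 1]. A minimal sketch of the one-vs-rest computation follows, assuming string class labels; the labels, predictions and function names are hypothetical, and the classifier itself is not reproduced. Defining MCC as 0 when a row or column total of the confusion matrix is empty is a common convention assumed here, not something stated in the legend.

```python
import math
from typing import Sequence


def mcc_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from the four cells of a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC = 0 when any row or column total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def norm_mcc_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """normMCC = (MCC + 1) / 2, treating `label` as positive and all other classes as negative."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t != label and p != label:
            tn += 1
        elif p == label:  # predicted positive, truly another class
            fp += 1
        else:  # truly positive, predicted another class
            fn += 1
    return (mcc_binary(tp, tn, fp, fn) + 1) / 2


# Hypothetical true and predicted classes for six cells.
y_true = ["PT", "PT", "injured", "injured", "other", "PT"]
y_pred = ["PT", "injured", "injured", "injured", "other", "PT"]
score = norm_mcc_one_vs_rest(y_true, y_pred, "PT")
```

For the "PT" class in this toy example the score is about 0.85; a perfect classifier gives 1, a random one about 0.5 and a perfectly inverted one 0.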

Fig. 4. Noninvasive and inflammatory partial EMT program in renal fibrosis.


a, Left: PAGA-predicted connectivity map of PT and injured clusters (see Methods). Right: RNA velocity analysis (see Methods) showing the trajectory from differentiated to injured PT cells; n = 3,929 healthy and n = 3,509 injured epithelial cells obtained from kidneys extracted from one sham and two UUO mice. b, Hierarchical clustering of SCENIC-computed TF activities (regulons; see Extended Data Fig. 6). The regulon activity represents the mean value of AUCell score per cluster for cells shown in a (see also Extended Data Fig. 6). diff, differentiated; dediff, dedifferentiated; inj, injured. c, Expression of cadherin-16 (CDH16; top) and vimentin (bottom) in nonobstructed kidneys (CTR) and in kidneys after 2 weeks of obstruction in SNAIL1-proficient (UUO CTR) and SNAIL1-deficient mice (renal epithelial cell-specific Snail1-KO mice4; UUO Snail1 cKO), plus the quantification of cadherin-16 expression in renal epithelial cells and the percentage of renal epithelial cells positive for vimentin. Data are represented as mean ± s.e.m.; n = 3 kidneys per group for cadherin-16 and vimentin quantifications. d, RT–qPCR showing the FC for the depicted epithelial differentiation genes and deregulated TFs predicted by SCENIC in kidneys similar to those described in c. FC is represented as mean ± s.e.m. (per condition: n = 7 kidneys for epithelial genes, n = 6 kidneys for Fos and n = 5 kidneys for the remaining TFs). e, Changes in the transcriptional program along the trajectory from PT to injured cells (genes predicted by Moran’s I-test and ordered over pseudotime as in Fig. 1h); n indicates the number of genes per group (group 1, n = 209; group 2, n = 210; group 3, n = 1,105). f, Dot plot showing GO terms related to differentiation or inflammation associated with the indicated groups in e; IFNγ, interferon-γ; IFNα, interferon-α. g, Top: expression of the injury response and TGFβ target (KRT20) and dedifferentiation (KLF4) markers in combination with a PT differentiation marker (LTA).
Bottom: Kim-1 and Jun as markers of injury plus LTA. h, PRRX1 (green) and renal epithelial cells genetically traced with tdTomato (red); arrowheads indicate PRRX1 exclusive expression in interstitial cells. All markers are shown in control and obstructed kidneys. Nuclei are in blue. Images in c, g and h are representative of n = 3 independent experiments with similar results per condition; scale bar, 20 μm. P values in c and d were determined using an unpaired two-sided t-test and Fisher’s exact test for the dot plots in f.

Source data
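Panel e selects genes with Moran's I, a statistic that measures whether a gene's expression is similar among neighboring cells. The sketch below computes I = (N / W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is the deviation from the mean and W the sum of weights, on a toy neighbor graph with unit, symmetric edge weights. That weighting is an assumption; real pipelines usually build a k-nearest-neighbor graph and add a significance test, which is not shown. Names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def morans_i(values: Sequence[float], edges: List[Tuple[int, int]]) -> float:
    """Moran's I of one gene's expression over an undirected cell-cell graph.

    Every edge is given weight 1 in both directions (w_ij = w_ji = 1), an
    assumption made for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    denom = sum(d * d for d in z)
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant expression")
    num = 0.0
    w_total = 0.0
    for i, j in edges:
        num += 2 * z[i] * z[j]  # contributions of w_ij and w_ji
        w_total += 2
    return (n / w_total) * (num / denom)


# Six cells connected in a chain, as if ordered along a trajectory.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
i_smooth = morans_i([0, 0, 0, 1, 1, 1], chain)       # neighbors agree: positive I
i_alternating = morans_i([0, 1, 0, 1, 0, 1], chain)  # neighbors disagree: negative I
```

Here the smoothly changing gene gives I ≈ 0.6 and the alternating gene I ≈ −1, compared with an expectation of −1/(N − 1) = −0.2 when there is no structure. Genes with large positive I are those that vary coherently along the trajectory.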

Extended Data Fig. 6. The partial EMT programme associated with injured proximal tubules.


a, Prediction of the regulatory transcriptional programme in the EMT trajectory of injured renal epithelial cells. The heat map shows the hierarchical clustering of SCENIC-computed transcription factor activities (regulons) for the injury trajectory. The regulon activity represents the mean value of AUCell score per single-cell cluster. See Supplementary Table 2 for full gene names. Data were generated using n = 3929 healthy and 3509 injured epithelial cells from kidneys obtained from 1 SHAM-operated and 2 UUO mice. b, Binding motifs for the corresponding transcription factors and their activity (y-axis) plotted over pseudotime (x-axis) for selected examples of regulons shown in (a). The complete list of predicted regulons and their binding motifs is available in Supplementary Table 6. c, UMAP plots derived from cells in (a) showing the expression of markers of renal-specific epithelial differentiation (myo-inositol oxygenase, Miox, and hepatocyte nuclear factor 4, Hnf4), inflammation, repair/degeneration and mesenchymalisation. d, Top row, expression of the injury response marker Krt20 and the TGFβ target and dedifferentiation marker Klf4 in combination with a PT differentiation marker (LTA). Bottom row, expression of the injury response markers Kim-1 and Jun in combination with LTA. Nuclei in blue. Scale bar, 50 μm. Note the progressive loss of LTA one and two weeks after UUO and the acquisition of adult EMT in damaged proximal tubules. e, Time course analysis of the adult EMT markers Jun and Klf4 in combination with LTA. Scale bar, 50 μm. Images in (d) and (e) are representative of those obtained from 3 mice per condition. f, RT-qPCR analysis of bulk kidney tissue showing relative transcript levels for EMT-TFs two weeks after UUO. Data indicate mean ± SEM (n = 3 mice per condition).
g, Dot plot showing the enrichment for injury response (inflammation and pro-fibrotic GO terms) in TGFβ-treated MDCK-II (non-invasive EMT) versus MDCK-NBL2 (invasive EMT) cells, n = 3 independent experiments/condition. P values were determined using an unpaired two-sided t-test (f) or Fisher's exact test for the dot plot (g).

Source data
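The GO dot plots in this figure rely on Fisher's exact test. The sketch below shows a two-sided version on a 2×2 table of gene counts, using a common convention (also described in the SciPy documentation) of summing the probabilities of all tables with the same margins that are no more probable than the observed one. The function names and counts are hypothetical, and the legend does not state whether one- or two-sided tests were used.

```python
from math import comb


def _table_prob(x: int, row1: int, col1: int, total: int) -> float:
    """Hypergeometric probability of a 2x2 table with top-left cell x and fixed margins."""
    return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: genes in / not in the group; columns: genes in / not in the GO term.
    Sums the probabilities of all tables with the same margins that are no more
    likely than the observed one (a small relative tolerance absorbs float ties).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, total)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    probs = (_table_prob(x, row1, col1, total) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: 12 of 200 group genes vs 150 of 14,800 other genes carry a GO term.
p_go = fisher_exact_two_sided(12, 188, 150, 14650)
```

Conditioning on both margins is what makes the test exact: every table consistent with the observed row and column totals is weighted by its hypergeometric probability.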

Unlike the invasive EMT found in TGFβ-treated MDCK-NBL2 cells, neural crest and breast cancer cells, the EMT program activated in damaged PT cells was enriched in genes associated with the injury response (for example, Vcam1, Jun/Fos and Egr1), inflammation (for example, Ccl2, Ccl5, Nfkbia, Notch1 and Notch3) and fibrogenesis (for example, Tgfb1 and Tgfb2 and the metalloproteinase inhibitors Timp1 and Timp2; Fig. 4e and Extended Data Fig. 6c). This inflammatory EMT program was confirmed by the analysis of enriched pathways (Fig. 4f) and the localization of hallmark injury markers (Fig. 4g and Extended Data Fig. 6d,e). Genes encoding inflammatory cytokines and chemokines are expressed by damaged epithelial cells that dedifferentiate and remain in the injured tubules (Fig. 4e,g and Extended Data Fig. 6c–e). Several EMT-TFs are activated in the obstructed kidneys (Extended Data Fig. 6f), but the absence of PRRX1 from the damaged epithelial cells (Fig. 4h; expression is detected only in the stroma) could explain their failure to invade (see Figs. 1 and 2). Hence, PT cells acquired a stable partial EMT phenotype with residual cell–cell junctions. Interestingly, MDCK-II cells, which respond to TGFβ by undergoing a noninvasive partial EMT (Fig. 2), are also enriched in pathways associated with inflammation (Extended Data Fig. 6g). Thus, we characterize the partial EMT program of epithelial cells during fibrosis as a trigger of dedifferentiation, compatible with renal insufficiency and accompanied by a repair/inflammatory phenotype in which cells secrete fibrogenic and inflammatory cytokines that act on the stroma in a paracrine manner to promote disease progression. Together, these findings define the inflammatory EMT program as the response of adult nontransformed cells to injury.

Progenitor-like EMT phenotypes in tumors

As EMT is pathologically activated in primary tumors to favor cancer cell dissemination, we extended our studies to a widely used breast cancer model, MMTV-PyMT31, in which carcinomas progress to an invasive and metastatic state resembling human invasive breast cancer32. We tagged mammary gland progenitor cells from early embryonic stages33 to detect all cancer cells and discriminate them from cells in the tumor microenvironment (Fig. 5a,b). For transcriptomic analysis at single-cell resolution, we followed a strategy similar to that used in renal fibrosis and profiled advanced metastatic primary tumors (Extended Data Fig. 7a,b). Cells were organized into five major clusters (Fig. 5c, left) representing cancer cells and associated populations, all identified by the expression of specific markers (Fig. 5c, left, and Extended Data Fig. 7c–e). As in the kidney, we focused on the epithelial component: the cancer cells, identified by the expression of tdTomato (Extended Data Fig. 8a). Cancer cells were subdivided into 17 clusters represented across the four tumor samples (Fig. 5c, right, and Extended Data Fig. 8b), validating our experimental approach and showing that the progression of PyMT mammary gland carcinomas is highly stereotyped. Using luminal and basal gene signatures for mammary epithelial cells34,35, we found that, as expected from the luminal origin of PyMT tumors, the majority of clusters (around 70% of the cancer cells) had a luminal alveolar phenotype (Fig. 5d and Extended Data Fig. 8c).
In addition, we observed clusters with transcriptomes compatible with different progenitor states: a luminal alveolar stem/progenitor state35,36 (clusters 1, 13 and 15), a pan-luminal stem/progenitor state34,37 (cluster 10) and a hybrid state combining pan-luminal stem/progenitor, luminal hormone-sensing and baso-myoepithelial features (cluster 14), compatible with a luminobasal bipotent progenitor state induced by the oncogene38,39 and with reprogramming toward a developmental progenitor-like state40,41. Clusters 12 and 16 acquire a basal program compatible with progression toward the invasive phenotype42. Thus, we observed a series of phenotypes associated with dedifferentiation and reprogramming toward progenitor, multipotent-like states (Fig. 5d and Extended Data Fig. 8c). This reprogramming occurs concomitantly with EMT and progression along the E/M spectrum (Fig. 5e and Extended Data Fig. 8d–f). Luminal alveolar clusters show the highest levels of epithelial markers, whereas partial EMT states (clusters 1, 13 and 15) are associated with mammary gland progenitor-like lineages and stemness markers, and cluster 16 expresses high Prrx1 levels, reminiscent of cells in a full EMT state (Fig. 5e and Extended Data Fig. 8d–f). Thus, the progression toward more dedifferentiated phenotypes coincides with the sequential recruitment of EMT-TFs, as observed in cancer cell lines and during TGFβ-induced EMT (Fig. 1), revealing the parallel progression of mammary cell dedifferentiation and EMT states.

Fig. 5. Concomitant dedifferentiation and EMT activation in PyMT breast cancer.


a, Experimental design to generate genetically trackable cancer cells. PyMT activation in luminal cells leads to carcinoma development, with all cancer cells labeled by tdTomato but not the stroma (see Methods). b, Expression of the mammary epithelial cell reporter (tdTomato) and the oncogene (PyMT) in a tumor from a 15-week-old K14Cre;Rosa-tdTomato;MMTV-PyMT mouse. Note that 99.9% of the tdTomato+ cells are cancer cells (PyMT+) and that all the PyMT+ cancer cells are tagged (tdTomato+). Images are representative of ten tumors analyzed from five mice; scale bar, 25 μm. c, UMAPs showing the diversity of cell types in PyMT tumors (total cell number, n = 36,091) and clustering of the cancer cell subset (total cell number, n = 19,001; see Methods). d, Hierarchical clustering of PyMT cancer cell clusters based on the expression of cell differentiation markers (luminal or basal/myoepithelial). The x axis represents cancer cell clusters, and the y axis represents the average log2 (FC) in gene expression per cluster. The color scale is as in Fig. 1. Right: cluster representation with associated colors according to differentiation states; BM, baso-myoepithelial genes; HS, luminal hormone-sensing genes; LS, luminal stem/progenitor genes; LAS, lumino-alveolar stem/progenitor genes; LA, lumino-alveolar genes; LAP, lumino-alveolar stem/progenitor state; PLP, pan-luminal stem/progenitor state; LBP, luminobasal stem/progenitor state. For gene names, see Supplementary Table 3. e, Hierarchical clustering of cancer cells based on the expression of epithelial and mesenchymal markers. Bottom: integration of differentiation (color coded as in d) and EMT states (gray scale); n = 19,001 cancer cells in d and e.

Source data
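Panel d summarizes marker expression as an average log2 fold change per cluster. A minimal sketch of one way to compute such a score follows. It assumes, for illustration only, that the fold change compares mean expression in a cluster with mean expression in all other cells, uses a pseudocount of 1, and is then averaged over the genes of a signature; the authors' exact normalization is described in Methods and may differ. The gene names come from the text, but the cluster labels and expression values are invented.

```python
import math
from typing import Dict, List, Sequence


def signature_log2fc(
    expr_by_gene: Dict[str, Sequence[float]],
    clusters: Sequence[str],
    signature: List[str],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Average, over signature genes, of log2(mean in cluster / mean in all other cells)."""
    labels = sorted(set(clusters))
    if len(labels) < 2:
        raise ValueError("need at least two clusters to compare against the rest")
    scores = {}
    for label in labels:
        fold_changes = []
        for gene in signature:
            expr = expr_by_gene[gene]
            inside = [x for x, c in zip(expr, clusters) if c == label]
            rest = [x for x, c in zip(expr, clusters) if c != label]
            mean_in = sum(inside) / len(inside)
            mean_rest = sum(rest) / len(rest)
            fold_changes.append(math.log2((mean_in + pseudocount) / (mean_rest + pseudocount)))
        scores[label] = sum(fold_changes) / len(fold_changes)
    return scores


# Hypothetical normalized expression for four cells in two clusters.
clusters = ["luminal", "luminal", "basal", "basal"]
expr = {
    "Csn3": [7.0, 7.0, 1.0, 1.0],   # lumino-alveolar marker
    "Wap": [6.0, 8.0, 0.0, 2.0],    # lumino-alveolar marker
    "Krt14": [1.0, 1.0, 7.0, 7.0],  # basal/myoepithelial marker
}
la_scores = signature_log2fc(expr, clusters, ["Csn3", "Wap"])
bm_scores = signature_log2fc(expr, clusters, ["Krt14"])
```

Stacking such per-cluster scores for several signatures gives a clusters × signatures matrix that can then be hierarchically clustered, as in the heat map.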

Extended Data Fig. 7. Analysis of cell populations in PyMT metastatic breast cancer.


a, Experimental design used to prepare single-cell barcoded cDNA libraries. Four single-cell RNAseq libraries (T1–T4) were generated from n = 4 independent samples obtained from three 12–14-week-old female mice. Right panel shows the 3D reconstitution of one representative whole-mounted left lung lobe showing tdTomato-positive metastatic foci. Scale bars, 2 mm. b, Violin plots showing gene number (detected genes), unique transcript counts and percentage of mitochondrial counts for the different 10x Genomics-based libraries of the four PyMT primary tumor samples. We removed putative cell doublets and applied stringent filtering to include only cells with 400–4000 detected genes. The majority of cells (n = 36091/36162) passed this filter and were subjected to subsequent analysis; these cells showed a mitochondrial proportion below 2%, indicative of high quality104. c, Heatmap showing discriminative genes of the five main PyMT tumour populations (see Fig. 5c). d, Dot plot showing the expression levels of genes that identify the major cell types in the tumours. Symbols of cell types (y-axis) as shown in (c). e, UMAP visualization of cells expressing different markers for tumour cells (CC, tdTomato), myeloid cells (MC, Cd74), cancer-associated fibroblasts (CAF, Col3a1), endothelial cells (EC, Cdh5) and lymphoid cells (LC, Cd3g). See Supplementary Table 2 for all full gene names. Data in (c), (d) and (e) were generated using n = 36162 tumour cells.

Source data
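The cell filtering described in panel b can be written as a simple predicate. The sketch below keeps cells with 400–4000 detected genes (bounds treated as inclusive, an assumption), drops cells flagged as putative doublets by a separate tool (not shown) and optionally applies a mitochondrial-fraction cutoff. The legend does not say whether the mitochondrial proportion was an explicit filter here or an observed property of the retained cells, so it is left as an optional, hypothetical parameter. Class names, barcodes and counts are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellQC:
    barcode: str
    n_genes: int        # detected genes
    n_counts: int       # unique transcript (UMI) counts
    mito_counts: int    # UMI counts from mitochondrial genes
    is_doublet: bool    # flag from a separate doublet detector (not shown)


def passes_qc(
    cell: CellQC,
    min_genes: int = 400,
    max_genes: int = 4000,
    max_mito_frac: Optional[float] = None,
) -> bool:
    """True if the cell is not a doublet, lies in the gene-count window and, optionally, has a low mito fraction."""
    if cell.is_doublet:
        return False
    if not (min_genes <= cell.n_genes <= max_genes):
        return False
    if max_mito_frac is not None:
        frac = cell.mito_counts / cell.n_counts if cell.n_counts else 1.0
        if frac >= max_mito_frac:
            return False
    return True


cells = [
    CellQC("AAAC", 1200, 5000, 50, False),    # passes
    CellQC("AAAG", 350, 800, 5, False),       # too few genes
    CellQC("AAAT", 4500, 20000, 100, False),  # too many genes (possible multiplet)
    CellQC("AACA", 2000, 6000, 900, False),   # 15% mitochondrial counts
    CellQC("AACC", 1500, 4000, 40, True),     # flagged doublet
]
kept_gene_window = [c.barcode for c in cells if passes_qc(c)]
kept_with_mito = [c.barcode for c in cells if passes_qc(c, max_mito_frac=0.10)]
```

The upper gene-count bound removes cells with unusually many detected genes, which often indicate multiplets missed by the doublet detector.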

Extended Data Fig. 8. PyMT cancer cell cluster analysis.


a, UMAP visualization of cancer cells showing expression of the tdTomato reporter. b, UMAP plots and table depicting the distribution of cancer cell subclusters in each single-cell RNAseq data set derived from the 4 independent tumour samples. c, Expression of luminal (blue) and basal/myoepithelial (red) cancer cell lineage markers on the UMAP gene expression plot. d–f, Distribution of the expression of epithelial markers (d), mesenchymal markers (e) and EMT-TFs (f) in cancer cells on the UMAP plot. Markov affinity-based graph imputation of cells (MAGIC)73 was applied to improve the representation of EMT-TFs over the UMAP. See Supplementary Table 2 for full gene names. All UMAPs were generated using n = 19001 cancer cells.

Source data

Two distinct EMT programs in cancer

Following a strategy similar to that used for the neural crest and the kidney, we found that cancer cell EMT clusters, rather than being ordered along a linear trajectory, were organized in a branched structure with two discrete paths bifurcating from the bulk of luminal cancer cell clusters at the level of cluster 11 (Fig. 6a). RNA velocity29 analysis of individual cells was compatible with this organization and inferred the directionality of the two trajectories (Fig. 6b). Next, we performed SCENIC analysis30, which revealed cell state-specific transcriptional regulators across the two EMT trajectories (Fig. 6c,d). Applying pseudotime inference, we reconstructed their corresponding molecular programs (Fig. 6e). In the EMT trajectory 1 (EMT-T1) branch, in addition to losing lumino-alveolar differentiation markers and progressing toward a mesenchymal phenotype, cancer cells acquired a proinvasive gene profile from cluster 14 onward (Fig. 6e and Extended Data Fig. 9a). The recruitment of EMT-TFs resembles that in the full EMT response of MDCK-NBL2 cells to TGFβ, with Snail genes followed by Zeb1 and Prrx1 (Extended Data Fig. 9a), compatible with progression to cancer cell dissemination. By contrast, in EMT trajectory 2 (EMT-T2), the lumino-alveolar epithelial phenotype of cluster 11 progresses to the partial EMT phenotype of cell clusters 1, 13 and 15, which still express epithelial genes and activate only a limited mesenchymal program (Fig. 6e and Extended Data Fig. 9a). Snail1 is the only EMT-TF significantly detected in the EMT-T2 trajectory in cancer cells (Extended Data Fig. 9a), resembling the partial EMT program observed during renal fibrosis. Consistent with this, the most relevant trait of EMT-T2 is its remarkable enrichment in inflammatory and profibrotic genes.
In sharp contrast to the highly proteolytic and invasive EMT-T1 gene signature, enriched in matrix metalloproteinases (Mmp2, Mmp3, Mmp13 and Mmp14), EMT-T2 is enriched in metalloproteinase inhibitors (Timp1, Timp2 and Timp3), consistent with its noninvasive and profibrotic profile (Fig. 6e and Extended Data Fig. 9a). Together, our single-cell transcriptomic analyses reveal that, within the same tumor, cancer cells progress along two different EMT trajectories, both associated with dedifferentiation, that lead to either a proinvasive or an inflammatory phenotype. Analysis of enriched biological processes confirms the existence of trajectory-specific functions, namely invasion (EMT-T1) and inflammation (EMT-T2), characteristic of embryonic and adult EMTs, respectively (Fig. 6e–g and Extended Data Fig. 9b,c).
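Heat maps of gene programs along a trajectory, such as those in Fig. 6e, order genes by where they peak in pseudotime. One simple way to obtain such an ordering, assumed here purely for illustration, is to bin cells by pseudotime, average each gene per bin and sort genes by their peak bin; the published pipeline may use smoothing or gene clustering instead. The pseudotime values and expression levels are invented, and the gene names are borrowed from the EMT-T1 description only as labels.

```python
from typing import Dict, List, Sequence


def binned_means(pseudotime: Sequence[float], expr: Sequence[float], n_bins: int) -> List[float]:
    """Mean expression in equal-width pseudotime bins (empty bins are reported as 0.0)."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins or 1.0  # guard against all cells sharing one pseudotime value
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, x in zip(pseudotime, expr):
        b = min(int((t - lo) / width), n_bins - 1)  # the maximum value falls in the last bin
        sums[b] += x
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def order_genes_by_peak(
    pseudotime: Sequence[float], expr_by_gene: Dict[str, Sequence[float]], n_bins: int = 4
) -> List[str]:
    """Sort genes by the pseudotime bin in which their mean expression is highest."""
    peak = {}
    for gene, expr in expr_by_gene.items():
        means = binned_means(pseudotime, expr, n_bins)
        peak[gene] = max(range(n_bins), key=means.__getitem__)
    return sorted(expr_by_gene, key=lambda g: (peak[g], g))


# Eight hypothetical cells along one trajectory (pseudotime scaled to 0-1).
pseudotime = [0.0, 0.1, 0.3, 0.4, 0.6, 0.7, 0.9, 1.0]
expr = {
    "Prrx1": [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 5.0],
    "Csn3": [5.0, 5.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    "Vim": [0.0, 0.0, 1.0, 2.0, 5.0, 5.0, 3.0, 2.0],
    "Mmp14": [0.0, 1.0, 4.0, 5.0, 3.0, 2.0, 1.0, 0.0],
}
order = order_genes_by_peak(pseudotime, expr)
```

Rows of a heat map drawn in this order form a diagonal band, with genes lost early (here the differentiation marker) at the top and late-activated genes at the bottom.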

Fig. 6. Two distinct EMT programs in PyMT breast cancer.


a, Left: cancer cell (PyMT+tdTomato+) connectivity map predicted by PAGA and represented over the UMAP embedding shown in Fig. 5c. Right: EMT states bifurcate into two distinct trajectories, EMT-T1 and EMT-T2; n = 19,001 cancer cells. b, Left: RNA velocity analysis (see Methods). The velocities are visualized on a recalculated UMAP for EMT trajectories in a. The solid line represents a smooth principal curve (see Methods) fitted over a UMAP. Bottom left: velocities of cells shown at the bifurcation point. Bottom right: Hallmark EMT enrichment represented over a UMAP embedding; n = 7,404 cancer cells. c, Hierarchical clustering of SCENIC-computed TF activities (regulons) on EMT-T1 and EMT-T2. The regulon matrix represents the mean value of AUCell score per single-cell cluster. The x axis shows cancer cell clusters, and the y axis shows regulons. See Supplementary Table 3 for full gene names. d, Binding motifs for the corresponding TFs and their activity (y axis) plotted over pseudotime (x axis) for selected examples of regulons shown in c. The complete list of predicted regulons and their binding motifs is available in Supplementary Table 7. Data in c and d refer to cells described in b. e, Expression heat maps showing changes in the transcriptional programs in EMT-T1 and EMT-T2 (genes were predicted by Moran’s I-test and ordered over pseudotime as in Fig. 2e). n indicates the number of genes (EMT-T1: group 1, n = 173; group 2, n = 158; group 3, n = 199; group 4, n = 398. EMT-T2: group 1, n = 220; group 2, n = 467). f, Top: dot plot showing GO terms associated with the two EMT trajectories, embryonic and adult-like, related to development/invasion and inflammation, respectively. Bottom: dot plot showing BC-PINGs enrichment. P values were determined using Fisher’s exact test for the dot plots, and for BC-PINGs, P values were determined based on the cumulative distribution function of the hypergeometric distribution. 
g, Enrichments of BC-PINGs and inflammatory score represented over a UMAP embedding as in b. Data in f and g refer to genes in e.

Source data
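For the BC-PINGs enrichment in panel f, P values are based on the cumulative distribution function of the hypergeometric distribution. The sketch below assumes the usual over-representation setup: a background of N genes, K of which belong to the signature, and a gene group of size n sharing k genes with it, giving P = P(X ≥ k) = 1 − CDF(k − 1). The function name and all sizes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Equivalent to 1 - CDF(k - 1); summing exact integer terms before a single
    division avoids cancellation error for very small P values.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid hypergeometric parameters")
    upper = min(K, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Toy example: 20 background genes, 5 in the signature, a group of 4 genes, 3 overlapping.
p_toy = hypergeom_enrichment_p(k=3, N=20, K=5, n=4)
```

Computing the upper tail directly, rather than subtracting the CDF from 1, keeps precision when the enrichment is strong and P is tiny. Terms that are impossible (more non-signature genes drawn than exist) evaluate to zero because `math.comb` returns 0 when the subset is larger than the set.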

Extended Data Fig. 9. Molecular characterisation of the two EMT trajectories in PyMT breast cancer.


a, Expression heatmap extending the analysis shown in Fig. 6e to discriminate between EMT-T1 and EMT-T2 transcriptional programmes. Cells in the two EMT trajectories follow completely different paths associated with phenotypic transitions and EMT-TF expression codes. In EMT-T1, cancer cells progressively lose lumino-alveolar differentiation genes such as Csn3, Lalba and Wap and evolve towards a stem/progenitor-like state (cluster 10), including expression of Aldh1a3 and pro-stemness genes such as Ndrg1 (ref. 79). Cluster 14, compatible with a partial EMT status and losing epithelial genes such as Epcam, Cldn3 and Cldn7, is followed in the pseudotime analysis by clusters 12 and 16. Cluster 14 expresses pluripotency markers such as Wnt9a, Bmp1, Id1, Id3 and Igf1, plus mammary gland embryonic and basal-like signatures, while progressing towards a full EMT state exemplified by high expression of Vim and Cdh2 (cluster 16). An invasion program is already evident in cluster 14, with cells expressing genes that regulate cell migration and cytoskeleton remodeling (Tnc, Gsn, Palld, Cnn2, Tpm1, Tpm2 and Mmp14). The invasion signature is amplified in clusters 12 and 16, with prominent expression of additional invasion genes, including cytoskeleton regulators (Mylk, Tagln and Pdpn), guidance receptors (Sema5a and Nrp2) and, in the latter, microenvironmental modulators such as metalloproteinases and lysyl oxidases (Mmp2, Mmp3 and Loxl1). Initiation of the Hallmark_EMT signature in cluster 14 coincides with the detection of Snail2 in addition to Snail1 and Twist1 in cluster 12. The progression towards a more advanced EMT state is coupled to an increase in Zeb1 and Prrx1.
In EMT-T2, the lumino-alveolar epithelial phenotype of cluster 11 progresses to the partial EMT phenotype of cells in clusters 1, 13 and 15, still maintaining expression of epithelial genes while activating some mesenchymal genes shared with the EMT-T1 trajectory (for example Sparc, Postn and S100a4), but without progressing to full EMT. EMT-T2 is remarkably enriched in injury response genes (for example Egr1, Jun, Junb, Fos, Fosb and Lcn2) and inflammatory regulators, including secreted factors (Spp1), components of the TNF-α/interferon and NF-κB pathways (for example Nfkbia, Ccrl2 and Notch2) and inflammatory biomarkers such as serum amyloid A proteins (Saa2, Saa1) or lymphocyte antigen-6 family genes (Ly6k and Ly6d). Additional enrichment for pro-inflammatory genes is seen in cluster 2, the most prominent cluster in this branch, including additional interferon regulators and downstream target genes (for example Irf7, Ifitm3, Ifitm2 and Cxcl16). In addition, EMT-T2 is enriched for pro-fibrotic genes such as the tissue inhibitors of metalloproteinases (Timp2, Timp3 and Timp1). All of this indicates that, in EMT-T2, the transition to a partial EMT is concomitant with the acquisition of an inflammatory and pro-fibrotic phenotype. In contrast to EMT-T1, among EMT-TFs, only Snail1 is detected in clusters 1, 13 and 15. Abbreviations as in Figs. 4 and 5. See Supplementary Table 2 for full gene names. n = 928 and n = 687 genes (EMT-T1 and EMT-T2, respectively). b, Dot plot showing an extended version of the GO terms enriched in different cancer cell clusters and across EMT trajectories shown in Fig. 6f. Interestingly, common pathways are associated with the activation of an EMT programme, including regulation of cell cycle and resistance to cell death80. Developmental pathways are associated with EMT-T1, and those related to inflammation are enriched in EMT-T2.
c, Dot plot showing the enrichment of cancer cell clusters in the BC-PINGs signature and in genes upregulated in TGFβ-treated invasive MDCK-NBL2 and non-invasive MDCK-II cells. Clusters represented as in (b). In (b) and (c), n = number of genes. EMT-T1: group 1, n = 173; group 2, n = 158; group 3, n = 199; group 4, n = 398. EMT-T2: group 1, n = 220; group 2, n = 467. P values were determined using Fisher's exact test for the dot plot (b) or based on the cumulative distribution function of the hypergeometric distribution (c).

Source data

Embryonic and adult EMTs for cancer progression

EMT-T1 and EMT-T2 markers show a nonoverlapping localization, with invasive EMT-T1 cells located at the tumor margins and EMT-T2 cells distributed within the tumor (Fig. 7a,b). This structure was reproduced in tumoroids derived from disaggregated PyMT primary tumors cultured in 3D collagen matrices (Fig. 7c–e). Tumoroids also confirmed the invasive nature of cancer cells expressing EMT-T1 markers (Fig. 7d). We next examined the distribution of EMT-T1 and EMT-T2 markers in human triple-negative breast cancer (TNBC) samples and observed that they are also expressed in nonoverlapping populations (Fig. 7f). Furthermore, the clusters characterized in the EMT-T1 and EMT-T2 trajectories were identified in luminal breast cancer and were enriched in TNBC (Fig. 7g). Thus, individual mouse and human tumors that progress to the aggressive basal-like phenotype, even if they are of luminal origin17,32,43, can harbor segregated cell populations that have undergone EMT, acquiring either an embryonic-like proinvasive phenotype or an adult progenitor phenotype with proinflammatory and profibrotic properties.

Fig. 7. The two EMT programs are activated in segregated cancer cell populations.


a, Expression of EMT-T1 markers (KRT14 and p63, top) and EMT-T2 markers (Jun and KLF4, bottom) in PyMT primary tumors. Cancer cells are identified with an antibody to PyMT or by the expression of the reporter (K14cre;Rosa-tdTomato). p63 is expressed in adult basal cells and can reprogram adult luminal cells into basal cells33. In PyMT tumors, luminal cells acquire a progenitor basal-like phenotype. b, Nonoverlapping expression of the EMT-T1 (T1) marker KRT14 at the tumor/stroma interface and the Jun EMT-T2 (T2) marker, enriched in more internal positions. The arrowhead (bottom) indicates EMT-T1 cancer cells with invasive protrusions; Str, stroma. Nuclei are in blue; scale bar, 50 μm in a and b. Images are representative of at least six primary tumors from three mice. c, PyMT tumoroid invasion assay. Primary tumors collected from PyMT;K14cre;Rosa-tdTomato mice were disaggregated into small fragments, embedded into 3D collagen matrices (see Methods) and cultured for 3 days. Some tumoroids spontaneously invaded the surrounding environment. d, Cells expressing KRT14, an EMT-T1-specific marker, are enriched at the invasive edges (arrows). e, Cells expressing high levels of the EMT-T2-specific marker Jun are enriched in central areas. Tumoroid images are representative of three independent cultures; scale bars, 25 μm. Dashed lines in d and e delineate tumor edges. f, Expression of N-cadherin (CDH2) and Jun (EMT-T1 and EMT-T2 markers, respectively) in human TNBC. Arrowheads indicate cancer cells expressing either the EMT-T1 (green) or EMT-T2 marker (magenta). Pan-keratin (CKs) identifies cancer cells. Nuclei are in blue; scale bar, 50 μm. Images are representative of six breast cancer sections. g, Enrichment in EMT-T1 and EMT-T2 clusters in human breast cancer. Results of a gene set variation analysis (GSVA) of cancer clusters from EMT-T1 and EMT-T2 (see Methods) in different breast cancer subtypes are shown. 
The TNBC subtype shows the enrichment score for both EMT-T1 and EMT-T2 clusters. Boxes indicate the median and IQR (25th to 75th percentiles), and whiskers indicate the highest and lowest values within 1.5 times the IQR. Outliers are marked as dots (luminal A + B group, n = 3 individuals; HER2, n = 6 individuals; TNBC, n = 11 individuals). P values were determined using an unpaired two-sided t-test.

Source data
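The box plots in Fig. 7g use Tukey-style whiskers: each whisker reaches the most extreme value lying within 1.5 × IQR of the nearest quartile, and points beyond those fences are drawn as outliers. A minimal sketch of that summary follows. Quartile definitions differ between plotting libraries; the linear-interpolation "inclusive" method of Python's `statistics.quantiles` is an assumption here, and the scores are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def tukey_box(values: Sequence[float]) -> Dict[str, object]:
    """Box-plot summary: quartiles, 1.5 x IQR whiskers and outliers."""
    # "inclusive" = linear interpolation between order statistics (assumed method).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside: List[float] = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme values still within the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical enrichment scores for 11 samples.
scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 1.50]
box = tukey_box(scores)
```

With these values the IQR is 0.25, so the upper fence sits near 0.9: the whisker stops at 0.60 and 1.50 is drawn as a separate dot.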

Plasticity of invasive and inflammatory EMT trajectories in tumors

To test the proposed functions of the two trajectories, we challenged them by deleting EMT-TFs. We first generated mice bearing PyMT tumors deficient for SNAIL1, which is activated in both the EMT-T1 and EMT-T2 trajectories (Fig. 5e and Extended Data Fig. 9a). SNAIL1 deficiency (Snail1 cKO; Fig. 8a) dramatically reduced both the number and size of tumors (Fig. 8b), compatible with the described early activation of SNAIL1 in luminal cells and its ability to confer tumor-initiating capacities in the PyMT model44. In agreement with a pioneer role in the activation of EMT, most of the few, small SNAIL1-deficient tumors were highly differentiated, in clear contrast to control tumors, which were undifferentiated and compatible with grade 3 (Fig. 8c). The strong impact of SNAIL1 loss on breast tumor development precluded trajectory analysis, but we revealed its regulatory role in inflammatory EMT in a human inflammatory breast cancer cell line (see below). We next challenged the EMT-T1 trajectory by generating mice bearing PRRX1-deficient tumors (Fig. 5e and Extended Data Figs. 8f and 9a). In contrast to SNAIL1 loss, PRRX1 deficiency (Prrx1 cKO; Fig. 8d and Extended Data Fig. 10a–d) did not modify the number or size of tumors (Fig. 8e), although these tumors were less advanced than the controls, containing areas typical of carcinoma in situ (Fig. 8f).

Fig. 8. Plasticity between invasive and inflammatory EMT trajectories in PyMT breast cancer.


a, Design to combine the conditional loss of Snail1 and genetic tracing of PyMT cancer cells (see Methods); UTR, untranslated region. b, Analysis of primary tumor burden per mouse (CTR, n = 18; Snail1 cKO, n = 15 mice). c, Hematoxylin and eosin images of control and Snail1-cKO tumors. Tumor differentiation grade was determined by mitosis rate, cellular pleomorphism and atypia (grade 1, well differentiated; grade 2, moderately differentiated; grade 3, poorly differentiated). Quantification is expressed as the percentage of tumors (CTR, n = 14 tumors and Snail1 cKO, n = 13 tumors from seven mice per condition); scale bar, 200 μm. d, Design to combine the conditional loss of Prrx1 and genetic tracing of PyMT cancer cells. e, Analysis of primary tumor burden per mouse (CTR, n = 18; Prrx1 cKO, n = 11 mice). f, Hematoxylin and eosin images of control and Prrx1 cKO tumors. Tumor differentiation grade was determined and represented as in c (CTR, n = 14 tumors and Prrx1 cKO, n = 18 tumors from seven mice per condition); scale bar, 200 μm. g, PRRX1 and vimentin coexpression in cancer and stromal cell subpopulations. Arrows indicate PRRX1-expressing cancer cells (red and blue) that have activated EMT (green). h, PRRX1 expression in hybrid E/M cancer cells identified by Epcam (blue) and vimentin (green) coexpression (arrows). Nuclei are in blue; scale bar, 50 μm (g and h). Images in g and h are representative of n = 3 control tumors from independent mice. i, Invasive versus total tumor areas (percentage mean ± s.e.m.; n = 5 mice per group). j, Three-dimensional reconstitution of whole-mounted lung lobes showing metastatic foci (tdTomato+) and metastatic burden quantification (n = 7 mice per group); scale bar, 1 mm. k, RT–qPCR FC expression for markers of clusters in the two EMT trajectories in PRRX1-proficient (CTR) and PRRX1-deficient (cKO) cancer cells sorted by FACS (n = 6 tumors from three mice per condition). 
l, Similar RT–qPCR for EMT-T2 markers in the inflammatory breast cancer SUM149PT cell line after SNAIL1 downregulation (n = 5 independent experiments with similar results per condition). m, Expression of cytokines in Prrx1-cKO versus control tumors. n, Quantification of infiltrating (Inf) and noninfiltrating (Noninf) F4/80+ cells (n = 5 tumors from three mice per condition). See Extended Data Fig. 10h for immunofluorescence. o, Top: pan-macrophage marker (F4/80) and CD163+ subpopulation in control and Prrx1-cKO primary tumors. In the latter, the infiltrating F4/80+ macrophages are negative for CD163. Bottom: EMT-T2 marker KLF4 and MHC class II expression in control and Prrx1-cKO primary tumors. The increase in EMT-T2 is associated with an increase in MHC class II+ cells (tumor and stroma). Images are representative of five tumors from three mice per condition. Nuclei are in blue; scale bar, 100 μm. Dashed lines delineate tumor compartments, cancer cells and stroma (Str). p, EMT programs in development, organ fibrosis and cancer. During embryonic development, invasive EMT allows cells to disseminate and give rise to different cell types during organogenesis. In the adult, cells activate a noninvasive EMT as an inflammatory repair response to injury. This regenerative program can evolve toward a prodegenerative process by promoting fibrogenesis. In cancer, both invasive and inflammatory EMTs are activated within the same tumor in distinct cell populations, with antagonistic pro- and antitumor roles. In the absence of PRRX1 in cancer cells, embryonic-like EMT is truncated, and adult-like inflammatory EMT is enhanced, preventing dissemination and converting cold into hot tumors with infiltrating antitumor inflammatory macrophages; red, invasive EMT; blue, inflammatory EMT; gray, tumor bulk; white, infiltrating macrophages; e-EMT, embryonic EMT; a-EMT, adult EMT. The tumor microenvironment is not shown. 
Image created with BioRender.com under Academic License Terms with agreement number VT24MOOYXZ. Boxes (b, e, j–l and n) show medians and IQRs. Whiskers indicate the minimum and maximum values. P values were determined using an unpaired two-sided Mann-Whitney U-test.
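The box-plot P values come from an unpaired two-sided Mann-Whitney U-test. As an illustration only, the sketch below computes that test in pure Python with the large-sample normal approximation (average ranks for ties, tie-corrected variance, 0.5 continuity correction). It is not necessarily how the authors' statistics software computed their P values, and with groups as small as n = 5 an exact P value is usually preferable. The function name and toy inputs are hypothetical.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for sample x, two-sided P value). Ties receive average ranks,
    the variance is tie-corrected and a 0.5 continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_sum = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        t = j - i + 1
        tie_sum += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a shift
        return u_x, 1.0
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u_x, min(p, 1.0)


# Toy values, not data from the study.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

For these fully separated toy groups U = 0 and the approximate P value is about 0.08; swapping the groups gives U = 9 with the same P value.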

Source data

Extended Data Fig. 10. Generation and characterisation of Prrx1 conditional mutant mice.

Extended Data Fig. 10

a, Summary of the strategy used to generate an exon 2 double-floxed Prrx1 allele, Prrx1em1An. Mouse embryonic stem cells (mESCs) were edited using the CRISPR/Cas9 system to replace endogenous Prrx1 exon 2 with an exon 2 double-floxed Prrx1 cassette flanked by left and right homology arms. Green arrows show the position of the primers used to screen for the recombined alleles. Black arrows show the position of primers used for genotyping (see also Methods). b, Validation of the Prrx1 cKO model. Prrx1em1An/em1An zygotes were treated with TAT-CRE and implanted at the 2-cell stage into pseudopregnant females. c, Implanted embryos were collected at E13.5 and used to generate mouse embryonic fibroblasts (MEFs) from untreated or TAT-CRE-treated zygotes. tdTomato expression in over 90% of MEFs indicated the high efficiency of TAT-CRE-mediated recombination (not shown). Western blot (WB) showing the loss of Prrx1 protein in MEFs derived from the indicated genotypes. Note that TAT-CRE-treated Prrx1flox/flox embryos (well number 3) have the same profile as Prrx1 homozygous mutant embryos (well number 5), in both of which the Prrx2 protein is detected. This confirms the efficacy of our strategy. The WB represents MEFs from one of three independent validation experiments. d, Images of the palate in newborn (P0) mice derived from untreated and TAT-CRE-treated Prrx1flox/flox zygotes. Animals derived from TAT-CRE-treated zygotes show high tdTomato recombination and a fully penetrant cleft palate phenotype, as reported in Prrx1 null mutant mice35. TAT-CRE-treated zygotes, n = 7; untreated, n = 3. LPS, lateral palatal shelf; NS, nasal septum. e, Images of invasive areas (surrounded by dashed lines) of primary PyMT tumours, defined by Pan-Laminin-low/K14-positive cancer cells; n = 3 mice/genotype. Nuclei in blue. Scale bar, 100 μm. f, Expression of the EMT-T2-specific marker Klf4 in PyMT primary tumours; n = 3 mice/genotype. Nuclei in blue. Scale bar, 100 μm. 
g, Upper panel, Venn diagram showing the genes enriched in TGFβ-treated MDCK-II (noninvasive EMT) versus NBL2 (invasive EMT) cells compared with those negatively regulated by PRRX1 in the latter, and the overlap (180 genes). Lower panel, dot plot showing that, within the group of 180 genes, there is enrichment for those associated with inflammation and immune regulation in KEGG and GO gene datasets. Our data are compatible with Prrx1 preventing their activation in invasive cells. h, Expression of the pan-macrophage marker F4/80 in CTR and Prrx1 cKO primary tumours. Images are representative of n = 5 tumours from 3 mice/condition. i, Left panels, single channels corresponding to pictures shown in Fig. 8o showing the expression of the pan-macrophage marker (F4/80) and of the protumour anti-inflammatory Cd163-positive subpopulation in CTR and Prrx1 cKO primary tumours. In the Prrx1 cKO tumours, the infiltrating F4/80+ macrophages are negative for Cd163. Right panels, single channels corresponding to Fig. 8o showing the expression of the EMT-T2 marker Klf4 and of MHC-II in CTR and Prrx1 cKO primary tumours. Note that in the latter, the increase in EMT-T2 is associated with an increase in antitumour inflammatory MHC-II-positive cells in the stroma and, importantly, in the core of the tumour. Images are representative of n = 5 tumours from 3 mice/condition. Nuclei in blue. Scale bar, 100 μm (h and i). P values were determined using Fisher's exact test for the dot plot (g).

Source data

We found cancer cells coexpressing PRRX1 and the mesenchymal marker vimentin close to the tumor border (Fig. 8g), as well as cells coexpressing PRRX1, the epithelial marker Epcam and vimentin (Fig. 8h). This indicates that PRRX1 is already activated in partial EMT states. As its expression induces invasive properties (Fig. 2), this state is consistent with a hybrid E/M phenotype with invasive properties, like those shown to bear increased metastatic potential8–10. We also found that invasive areas were markedly reduced (approximately sevenfold) in Prrx1-mutant tumors compared with control tumors (Fig. 8i and Extended Data Fig. 10e), compatible with (1) a more differentiated status in PRRX1-deficient tumors, (2) PRRX1 localization in cancer cells at the tumor periphery (Fig. 8g,h) and (3) the association of PRRX1 with invasiveness. As expected from their poor invasive activity, mice bearing PRRX1-deficient tumors showed a dramatic reduction in lung metastatic burden (Fig. 8j). Together, this corresponds to a truncated EMT-T1 trajectory, confirmed by changes in the expression of genes specific for different clusters along the trajectory (Fig. 8k). Interestingly, the expression of EMT-T2-specific markers increased in PRRX1-deficient tumors (Fig. 8k and Extended Data Fig. 10f), including the transcriptional regulators Klf4, Junb and Mafb, markers of acute inflammation (Saa1 and Saa2) and markers of inflammatory breast cancer (Egr1 and Junb)45. In addition, PRRX1 knockdown (siPRRX1) during TGFβ-induced invasive EMT in MDCK-NBL2 cells was sufficient to increase the expression of inflammation-associated genes (Extended Data Fig. 10g). Having confirmed the role of PRRX1 in EMT-T1 and invasion, we next assessed the role in EMT-T2 of SNAIL1, the only EMT-TF expressed in this trajectory (Extended Data Fig. 9a). 
In the absence of well-developed SNAIL1-deficient tumors, we downregulated SNAIL1 expression in the inflammatory breast cancer cell line SUM149PT and found a decrease in the expression of the EMT-T2-specific transcriptional regulators and inflammation markers (Fig. 8l), confirming its role in regulating inflammation, as occurs in fibrosis (Fig. 4). Finally, Prrx1-mutant tumors, with enhanced EMT-T2, showed not only an increase in proinflammatory cytokines but also a decrease in the anti-inflammatory cytokine interleukin-13 (IL-13; Fig. 8m) and high infiltration by tumor-associated macrophages (Fig. 8n and Extended Data Fig. 10h) of the proinflammatory antitumoral type (MHC class II+; Fig. 8o and Extended Data Fig. 10i). Together, these results confirm cell dissemination and inflammation as the key functions associated with the EMT-T1 and EMT-T2 trajectories, respectively, and reveal the interdependence of the two trajectories in breast cancer cells and during PyMT cancer cell evolution.

Discussion

Parallel analysis of the EMT programs activated in cells treated with TGFβ during embryonic development and adult organ fibrosis has allowed us not only to define the EMT trajectories and functions associated with each of these processes but also to better interpret the two alternative EMT trajectories that we have found in cancer. Cancer cells respond to oncogenic activation either as embryonic-like or as adult-like cells, leading to different outcomes. The former corresponds to the well-known function of EMT in invasion and dissemination, and the latter corresponds to an antitumor inflammatory injury response.

We find that EMT activation is concomitant with dedifferentiation in adult cells, both in fibrosis and in breast cancer. This dedifferentiation step is reminiscent of the lineage infidelity and plasticity described in adult skin wound healing and cancer46, also observed in other carcinomas47. We propose that this plasticity is triggered by the activation of EMT, which also occurs concomitantly with cell dedifferentiation in neuroblastoma and melanoma, where adrenergic cells or melanocytes, respectively, reactivate embryonic neural crest markers41,48. Activation of EMT has also been associated with cell dedifferentiation and the emergence of repair cell states in limb, fin and heart regeneration in axolotl and zebrafish4. Interestingly, for repair, EMT needs to be transient, and a forced transient activation is also consistent with reinstating heart regeneration in mice49. EMT is also transiently activated in cancer, as successful metastatic colonization involves downregulation of the EMT program50,51. However, during renal fibrosis, EMT activation is not transient and, although it triggers dedifferentiation, it progresses to degeneration and organ failure6,7. EMT is also known to be transiently required at the early stages of reprogramming of adult fibroblasts to induced pluripotent stem cells52. Thus, EMT lies at the core of somatic cell dedifferentiation as a driver of epimorphosis to achieve phenotypic plasticity in the adult.

We have defined two different EMT programs and their trajectories in neural crest and renal fibrosis representing the responses of embryonic and adult cells, respectively (Fig. 8p). We also showed that during tumor progression, the EMT-induced initial dedifferentiation step provides the required plasticity that is then followed by two alternative pathways that recapitulate either an embryonic-like or adult-like response. Cancer cells hijack both developmental and adult EMT plasticity programs normally used for cell invasion and migration or as a response to injury, respectively, to implement cell dissemination and antitumoral inflammation. Thus, the embryonic-like trajectory promotes tumor progression toward metastasis, whereas the adult-like trajectory represents a defense mechanism in response to damage, in this case induced by the oncogene.

Genetic challenge of EMT-T1 by deleting Prrx1 specifically in cancer cells confirms that invasion is the functional property associated with EMT-T1 and that PRRX1 is essential for progression toward tumor invasion and dissemination. In its absence, the invasive trajectory is truncated, and metastatic burden is dramatically reduced (Fig. 8p), explaining recent findings in melanoma, where PRRX1-expressing cells were traced as those forming metastases53. The downregulation of SNAIL1, the only EMT-TF detected in EMT-T2, in a human inflammatory breast cancer cell line confirms the predicted regulatory structure of the adult-like EMT trajectory in tumors and reinforces inflammation as its functional property. As such, a subset of cancer cells becomes inflammatory-like and expresses genes encoding inflammatory cytokines, including TNF, IL-6, CCL2 and CCL5, like renal epithelial cells during fibrosis6,7. Our data are compatible with these inflammatory cytokines attracting macrophages to the tumor, in particular the major histocompatibility complex class II-positive (MHC class II+) antitumor inflammatory population found in the proximity of the secreting EMT-T2 cancer cells. This response is exacerbated in Prrx1-mutant tumors, where the invasive EMT is truncated and more cells engage in the inflammatory EMT trajectory, which then becomes sufficient to convert cold tumors into hot tumors, opening avenues for the design of therapeutic approaches. The relative contribution of EMT-T1 and EMT-T2 in PRRX1-proficient and PRRX1-deficient tumors points to the interdependence of the two trajectories, which share a common origin in the breast luminal cell and can be plastic in response to tumor traits and, likely, also to microenvironmental changes, including cancer cell–stromal interactions. As the two EMT programs operate in different cells (Fig. 8p), individual tumors bear dedicated EMT populations to fulfill specific and very distinct functions. This adds another layer of intratumor heterogeneity, related not only to the expected different EMT phenotypes (epithelial cancer cells moving along the epithelial-to-mesenchymal spectrum) but also to the aforementioned distribution of antagonistic pro- and antitumor functions, namely dissemination and inflammation. In the latter case, EMT induces antitumor responses, but the response to injury can also lead to degeneration in chronic settings, as in fibrosis. Thus, further studies are warranted to examine whether the antitumor inflammatory trajectory can also evolve to favor tumor progression.

Methods

EMT analysis in individuals with breast cancer

EMT gene expression signatures

Enrichment of the gene expression signatures found in each cluster of EMT-T1 and EMT-T2 was computed with the GSVA (v.1.34.0) R/Bioconductor package54, performing gene set variation analysis (GSVA) on breast cancer expression data obtained from Chung et al. (Gene Expression Omnibus (GEO) GSE75688)55. The R packages dplyr (v.1.0.3), magrittr (v.2.0.1) and tibble (v.3.1.2) were used to transform gene expression data into the required GSVA input format, and ggpubr (v.0.4.0) was used to generate the GSVA enrichment score box plots.

Human breast cancer tumor multiplex immunofluorescence

Triple immunofluorescence was performed on 2-µm tumor sections from human TNBC samples using a BOND RX Fully Automated Research Stainer and an Opal 7-Color Automation IHC kit (Akoya Biosciences). Opal-650, Opal-520 and Opal-570 were used to detect antibodies to c-Jun, cytokeratins AE1/AE3 (pan-cytokeratin) and N-cadherin, respectively. Slides were mounted with Prolong Diamond (Molecular Probes) and imaged using the Thunder imaging system (Leica). Samples were acquired from the Biobank of the Anatomy Pathology Department (record number B.0000745, Instituto de Salud Carlos III National Biobank Network) of the MD Anderson Cancer Center, Madrid, Spain. This study was performed following standard ethical procedures of the Spanish regulation (Ley de Investigación Orgánica Biomédica, 14 July 2007) and was approved by the ethics committees of the MD Anderson Cancer Center, Madrid, Spain.

Animal experiments

Mice were fed ad libitum. Housing and experimental procedures were conducted in strict compliance with the European Community Council Directive (86/609/EEC) and the Spanish legislation. Ethical protocols were approved by the Consejo Superior de Investigaciones Científicas (CSIC) Ethical Committee and the Animal Welfare Committee of the Institute of Neurosciences. Animals for experiments were selected by genotype, and no randomization or blinding was performed. Animals were housed under specific pathogen-free conditions at the Ratones Modificados Genéticamente animal house (ES-119-001001 SEARMG).

Kidney fibrosis model

To genetically label renal tubular epithelial cells, we generated a mouse line carrying the Rosa-LSL-tdTomato reporter Ai9/RCL-tdT56 (kindly provided by O. Marin, King’s College London), activated in renal tubular cells by a Cre recombinase under the control of the kidney-specific promoter Ksp1.3 (ref. 57; kindly provided by P. Igarashi, University of Minnesota). To inactivate SNAIL1, we crossed Snail1fl/fl mice6 with the strain bearing the Ksp1.3-cre transgene mentioned above. Mice were maintained on the C57BL/6 background. Male and female mice (8–12 weeks old) were subjected to unilateral ureteral obstruction (UUO) following the surgical protocol described in Grande et al.6. UUO was maintained for 1, 2 or 3 weeks.

Breast cancer model

Mouse experiments were performed in the MMTV-PyMT model31 crossed with a Rosa-LSL-tdTomato reporter line56, purchased from JAX Mice (The Jackson Laboratory), which expresses tdTomato upon Cre-mediated recombination. Cre recombinase is expressed under the control of the KRT14 promoter (Tg(KRT14-cre)1Amc/J; 004782)58. Mice were backcrossed onto the FVB background for at least ten generations (99.9% FVB). As the study focuses on breast cancer, only female PyMT mice were used. Health state and tumor size and burden were monitored weekly. Maximal tumor size/burden was defined as equal to or greater than 1,500 mm³ per tumor or equal to or greater than 3,000 mm³ total tumor burden, following the ethical protocols approved by the CSIC Ethical Committee and the Animal Welfare Committee of the Institute of Neurosciences (protocols 2015/VSC/PEA/00211 and 2019/VSC/PEA/0218). Maximum size was never exceeded. Raw data for animal tumor experiments are available in the Source Data.
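The humane endpoint above combines a per-tumor threshold with a per-mouse total. A minimal sketch of that check follows. The source does not say how tumor volume was estimated, so the modified ellipsoid formula V = L × W²/2 from caliper diameters is an assumption here, as are the class and function names.

```python
from dataclasses import dataclass

PER_TUMOR_LIMIT_MM3 = 1_500.0  # endpoint per tumor, from the protocol
TOTAL_LIMIT_MM3 = 3_000.0      # endpoint for total tumor burden per mouse


@dataclass
class CaliperReading:
    length_mm: float  # longest diameter
    width_mm: float   # diameter perpendicular to the length


def tumor_volume_mm3(reading: CaliperReading) -> float:
    # ASSUMPTION: modified ellipsoid formula V = L * W^2 / 2; the source does
    # not state how tumor volume was estimated.
    return reading.length_mm * reading.width_mm**2 / 2


def endpoint_reached(readings: list[CaliperReading]) -> bool:
    """True if any single tumor or the summed burden meets its threshold."""
    volumes = [tumor_volume_mm3(r) for r in readings]
    return (any(v >= PER_TUMOR_LIMIT_MM3 for v in volumes)
            or sum(volumes) >= TOTAL_LIMIT_MM3)


# Hypothetical readings for one mouse with three 10 x 10 mm tumors.
print(endpoint_reached([CaliperReading(10.0, 10.0)] * 3))
```

Three tumors of 500 mm³ each stay below both limits, whereas a single 20 × 13 mm tumor (1,690 mm³) or six 500 mm³ tumors (3,000 mm³ total) would trigger the endpoint, since both limits are inclusive ("equal to or greater than").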

Generation of Snail1 and Prrx1 conditional mutant tumors

To specifically inactivate SNAIL1 in breast cancer cells, we generated a mouse line by crossing the above-described line with a Snail1fl/fl line6 (cKO). To specifically inactivate PRRX1, we used a similar strategy, crossing the mice with a newly generated Prrx1 conditional mutant mouse line described in the next section.

Generation of Prrx1 conditional mutant mice (Prrx1em1An)

Mouse embryonic stem (mES) cells were edited using the CRISPR–Cas9 system to replace the endogenous Prrx1 exon 2 with an exon 2 double-floxed Prrx1 cassette flanked by homology arms. We electroporated mES cells with a mix of (1) PX458 plasmid59 (Addgene, 48138) to drive the expression of the SpCas9 protein together with green fluorescent protein and a guide RNA targeting the CTGTGCTTCTTTGGGTAGAA(TGG) sequence downstream of Prrx1 exon 2 and (2) a linearized double-stranded donor cassette containing double-floxed Prrx1 exon 2 flanked by homology arms engineered to replace the endogenous Prrx1 exon 2 after homologous recombination. Successfully electroporated mES cells were selected by green fluorescent protein expression and expanded in culture until recombination screening by conventional PCR and sequencing. mES cells with correct recombination of the double-floxed Prrx1 allele were used to generate chimeric mice following conventional protocols. Chimeras with high ES cell contribution were backcrossed onto FVB/N and C57 backgrounds to generate stable mouse colonies carrying the Prrx1em1An allele, which were fully viable and fertile in both genetic backgrounds.
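The guide above follows the usual SpCas9 layout: a 20-nt protospacer followed by an NGG protospacer-adjacent motif (PAM), here TGG. SpCas9 typically makes a blunt double-strand break 3 bp upstream of the PAM. The sketch below checks that layout and locates the expected cut site on either strand of a target sequence. The helper names and the short toy target are hypothetical, since the genomic Prrx1 sequence is not given in this section.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_valid_spcas9_site(protospacer: str, pam: str) -> bool:
    """20-nt protospacer followed by an NGG PAM (SpCas9 convention)."""
    return (len(protospacer) == 20 and set(protospacer) <= set("ACGT")
            and len(pam) == 3 and pam[0] in "ACGT" and pam[1:] == "GG")


def find_cut_sites(target: str, protospacer: str) -> list[tuple[str, int]]:
    """Return (strand, cut_index) pairs in plus-strand coordinates.

    cut_index is the 0-based position of the first base 3' of the blunt cut,
    placed 3 bp upstream of the PAM on the strand carrying the protospacer.
    """
    target = target.upper()
    sites = []
    # Plus strand: protospacer followed by NGG; cut between protospacer nt 17 and 18.
    start = target.find(protospacer)
    while start != -1:
        pam = target[start + 20:start + 23]
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(("+", start + 17))
        start = target.find(protospacer, start + 1)
    # Minus strand: the plus strand reads CCN followed by the reverse complement.
    rc = reverse_complement(protospacer)
    start = target.find(rc)
    while start != -1:
        pam_rc = target[start - 3:start] if start >= 3 else ""
        if pam_rc[:2] == "CC":
            sites.append(("-", start + 3))
        start = target.find(rc, start + 1)
    return sites


GUIDE = "CTGTGCTTCTTTGGGTAGAA"  # protospacer from the Methods
toy_target = "AAAA" + GUIDE + "TGG" + "CCCC"  # hypothetical flanking sequence
print(is_valid_spcas9_site(GUIDE, "TGG"), find_cut_sites(toy_target, GUIDE))
```

On this toy target the guide is valid and the predicted cut falls at index 21 on the plus strand; on the reverse complement of the same toy sequence the site is reported on the minus strand at the mirrored position (index 10).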

Kidney, mammary tumor and lung samples

Kidney, tumor and whole-lung samples were fixed in 4% paraformaldehyde (PFA) overnight at room temperature. Prefixed kidney and tumor samples were embedded in paraffin or OCT (Sakura) for further sectioning and collection on SuperFrost plus microscope slides.

Cell culture

Two-dimensional cell culture

MDCK-NBL2 and MDCK-II cell lines were purchased from ATCC and Sigma (European Collection of Authenticated Cell Cultures), respectively. SUM149PT cells were purchased from Asterand. MDCK-NBL2 and MDCK-II cells were cultured in DMEM (Sigma) supplemented with 10% heat-inactivated fetal bovine serum (Sigma), 1% gentamicin (Sigma) and 1% amphotericin (Sigma). SUM149PT cells were cultured in Ham's F12 nutrient mixture supplemented with 5% inactivated fetal bovine serum, HEPES (10 mM), insulin (5 μg ml⁻¹), hydrocortisone (1 μg ml⁻¹) and antibiotics. Cells were grown at 37 °C and 5% CO2, and the medium was replaced every 2 or 3 days. Cells were passaged up to a maximum of eight times.

Three-dimensional cell culture

Collagen gel containing Bovine Collagen Solution type I (Gibco; 2.5%), GlutaMAX (1×), MEM (1×), NaHCO3 (0.23%) and HEPES (0.1 M) was prepared on ice at pH 7.0–7.5. Glass coverslips were deposited at the bottom of 24-well culture plates, covered with 100 μl of collagen gel and cultured at 37 °C without CO2 for 30 min. After solidification, 100 μl of collagen containing 5 × 10³–10 × 10³ MDCK cells was added on top and incubated for 30 min. These 3D cultures were incubated at 37 °C and 5% CO2. MDCK medium (200 μl) was added gently after 2 h and replaced every 48 or 72 h.

TGFβ administration, RNA interference experiments and treatment with inhibitors

A stock solution of human recombinant TGFβ (MERQ; SHENANDOAH) was prepared at 2 μg ml⁻¹. All treatments in 2D cultures (5 ng ml⁻¹) started 24 h after seeding cells (10⁴ cells in six-well plates or 75 × 10⁴ cells in 10-cm culture dishes), and the medium containing TGFβ was replaced every 48 h. Cells were never seeded from high-confluency cultures, to avoid a reduced response to TGFβ. In 3D collagen cultures, TGFβ administration (high dose, 5 ng ml⁻¹; low dose, 0.3 ng ml⁻¹) started after the formation of polarized MDCK spherical cysts.
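The TGFβ working concentrations follow from the 2 μg ml⁻¹ (2,000 ng ml⁻¹) stock by C1V1 = C2V2. A minimal sketch, assuming 2 ml of medium per six-well well (a volume not stated in the source); the function and constant names are illustrative.

```python
def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ul: float) -> float:
    """Volume of stock (µl) that gives final_conc in final_volume_ul (C1*V1 = C2*V2).

    stock_conc and final_conc must use the same units (here ng/ml).
    """
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_conc * final_volume_ul / stock_conc


TGFB_STOCK_NG_PER_ML = 2_000.0  # 2 µg/ml stock, as in the protocol
WELL_VOLUME_UL = 2_000.0        # ASSUMPTION: 2 ml of medium per six-well well

high_dose_ul = stock_volume_ul(TGFB_STOCK_NG_PER_ML, 5.0, WELL_VOLUME_UL)
low_dose_ul = stock_volume_ul(TGFB_STOCK_NG_PER_ML, 0.3, WELL_VOLUME_UL)
print(high_dose_ul, low_dose_ul)
```

Under this assumption the 5 ng ml⁻¹ dose is a 400-fold dilution (5 µl of stock per 2 ml), whereas the 0.3 ng ml⁻¹ low dose would need only 0.3 µl, which is hard to pipette reliably, so an intermediate dilution of the stock is the practical route.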

For RNA interference experiments, siRNA was transfected using Lipofectamine RNAiMax (Thermo Fisher Scientific) following the manufacturer’s protocol. TGFβ was administered 8 h after transfection and refreshed every 48 h.

siPRRX1 duplex oligonucleotides (Sigma) were prepared at 20 μM. PRRX1 siRNA (cfa-si-PRRX1-I sense: GAGCGCGUCUUUGAGAGAACACACU(dT)(dT)) was used at a final concentration of 10 nM. BLOCK-iT fluorescent oligonucleotide (20 μM) was used as RNA interference control.

Hs-si-SNAIl1 oligonucleotides were purchased as Dicer-substrate siRNA duplex oligonucleotides (2 nmol) directed to SNAIL1, which were resuspended in nuclease-free water to a final concentration of 100 μM. Working stocks were prepared using the buffer provided. The best SNAIL1 downregulation was obtained with a combination of DsiSNAIl1.13.1 and DsiSNAIl1.13.2 (final concentration of 2.5 M each) 72 h after transfection. DS NC1 oligonucleotide was used as a negative control.

For focal adhesion signaling inhibition, stock solutions of FAK Inhibitor 14 (Sigma) were prepared in DMSO at 10 mM and further diluted in culture medium to a final concentration of 0.2 μM.
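Reaching 0.2 μM from a 10 mM stock is a 50,000-fold dilution, and the 10 nM siPRRX1 working concentration from its 20 μM stock is 2,000-fold. The hypothetical helper below adds 1:100 predilutions until the final transfer volume is pipettable. The 2 ml final volume, the 1 µl minimum transfer and the 1:100 step are assumptions for illustration, not values from the protocol.

```python
def plan_serial_dilution(stock_conc: float, final_conc: float, final_volume_ul: float,
                         min_transfer_ul: float = 1.0,
                         intermediate_fold: float = 100.0) -> tuple[list[float], float]:
    """Add 1:intermediate_fold predilutions of the stock until the volume
    transferred into final_volume_ul is at least min_transfer_ul.

    Concentrations must share units. Returns the intermediate concentrations
    (in order) and the final transfer volume in µl.
    """
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be positive and not exceed the stock")
    if intermediate_fold <= 1:
        raise ValueError("intermediate_fold must be greater than 1")
    conc = stock_conc
    intermediates: list[float] = []
    transfer_ul = final_conc * final_volume_ul / conc  # C1*V1 = C2*V2
    while transfer_ul < min_transfer_ul:
        conc /= intermediate_fold
        if conc < final_conc:
            raise ValueError("predilution would fall below the final concentration")
        intermediates.append(conc)
        transfer_ul = final_conc * final_volume_ul / conc
    return intermediates, transfer_ul


# FAK Inhibitor 14: 10 mM = 10,000 µM stock -> 0.2 µM (all in µM)
fak_steps, fak_transfer_ul = plan_serial_dilution(10_000.0, 0.2, 2_000.0)
# siPRRX1: 20 µM = 20,000 nM stock -> 10 nM (all in nM)
si_steps, si_transfer_ul = plan_serial_dilution(20_000.0, 10.0, 2_000.0)
```

For FAK Inhibitor 14 this gives one 100 μM intermediate and a 4 µl transfer into a 2 ml final volume; the overall 1:50,000 dilution also keeps the DMSO carried over from the stock at 0.002% (v/v). For the siRNA, 1 µl of the 20 μM stock in a 2 ml final volume already gives 10 nM, so no predilution is added.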

Primary tumor-derived tumoroids and invasion assay

Primary tumor tumoroids were prepared and embedded in 3D collagen gels following a protocol modified from Cheung et al.42. In summary, mammary gland carcinomas were collected from 14-week-old female mice, minced into tumor fragments and embedded in collagen gel containing Rat Collagen Solution type I (Corning; 2.5%) in DMEM (1×), NaHCO3 (0.23%) and HEPES (0.1 M) prepared on ice at pH 7.0–7.5. A volume of 100 μl of tumor fragments and collagen mixture was added on top of previously solidified cell-free collagen gel plated in 48-well plates and incubated at 37 °C and 5% CO2.

The tumoroids were washed twice with PBS and fixed with PFA for 60 min at room temperature. Fixed tumoroids were washed three times for 30 min each in PBS and blocked/permeabilized for 4 h at room temperature with immunofluorescence blocking buffer (IFBB+: 5% normal goat serum, 1% bovine serum albumin, 1% Triton X-100 and 0.1% sodium azide in sterile PBS). Blocking solution was substituted with the primary antibody diluted in IFBB+ and incubated overnight at room temperature on a rocker plate. Tumoroids were washed three times for 30 min each in PBS with 1% Triton X-100 and incubated for 24 h with secondary antibodies and DAPI. Finally, tumoroids were washed three times for 60 min each in PBS with 1% Triton X-100 and mounted on glass-bottom microwell dishes (MatTek) using antifade mounting medium (Dako). Primary and secondary antibodies used are shown in Supplementary Table 4. Tumoroids were photographed using a Leica SPEII confocal microscope, and acquired images were analyzed with ImageJ and Adobe Photoshop CS6 software programs.

Immunofluorescence

Cells in culture

MDCK cells were grown on coverslips in six-well plates under the culture and treatment conditions described above. Cells were rapidly washed twice with PBS, fixed with PFA for 15 min at room temperature, rinsed with PBS at least three times for 10 min each and directly used for immunofluorescence or stored at 4 °C in PBS + 0.02% azide for less than 1 week. For immunofluorescence staining, coverslips were deposited in a humidified chamber and blocked/permeabilized for 1 h with immunofluorescence blocking buffer (IFBB: 5% normal goat serum, 1% bovine serum albumin and 0.2% Triton X-100 in sterile PBS). Blocking solution was substituted with the primary antibody diluted in cold IFBB, and the staining chamber was incubated overnight at 4 °C. Coverslips were washed three times for 10 min each in PBS and incubated for 1 h with secondary antibodies and DAPI. Finally, coverslips were washed three times for 10 min each in PBS and mounted on glass slides using antifade mounting medium (Dako). Primary and secondary antibodies used are shown in Supplementary Table 4. Cells were photographed using a Leica SPEII confocal, Leica DMR or Zeiss Axio microscope. Acquired images were analyzed with ImageJ and Adobe Photoshop CS6 software programs.

Kidney and tumor samples

Paraffin-embedded sections were dewaxed, and protein epitopes were unmasked by immersion in 95 °C preheated citrate (pH 6.0) or Tris-EDTA (pH 9.0) buffer for 20 min. OCT or unmasked paraffin sections were washed three times in PBS for 5 min and subjected to the immunofluorescence protocol described above for cell lines. For information on primary and secondary antibodies, see Supplementary Table 4. Images were acquired and analyzed as described for cells in culture.

Lungs

tdTomato+ metastases were visualized by immunofluorescence in whole lungs, which were cleared following the iDISCO+ protocol60. Images were acquired using an UltraMicroscope II (LaVision BioTec). The acquired images were analyzed, and 3D reconstruction was performed using Vision4D (Arivis) Image Analysis Software. For the analysis of metastatic burden, 3D reconstruction was performed using Imaris software (version 9.3.1; BitPlane). The ‘Surface’ function was used for segmentation (tdTomato signal), and volumetric data were extracted for the identified metastatic objects. To avoid false positives due to occasional secondary antibody trapping or nonspecific autofluorescence, a tdTomato-negative lung lobule was analyzed, and a detection cutoff of 90,000 μm³ was identified as the minimum volume for object identification with high confidence.
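The detection cutoff amounts to discarding segmented objects smaller than 90,000 μm³. Imaris's Surface segmentation is not reproduced here; as a simplified stand-in, the sketch groups thresholded tdTomato+ voxels into 6-connected objects, converts voxel counts to volumes and keeps only objects at or above the cutoff. The voxel volume and the function names are assumptions.

```python
from collections import deque

Voxel = tuple[int, int, int]
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_components(voxels: set[Voxel]) -> list[set[Voxel]]:
    """Group foreground voxels into 6-connected objects by breadth-first search."""
    remaining = set(voxels)
    objects = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        objects.append(component)
    return objects


def metastatic_burden(voxels: set[Voxel], voxel_volume_um3: float,
                      min_object_um3: float = 90_000.0) -> tuple[int, float]:
    """Count objects at or above the volume cutoff and sum their volumes (µm³)."""
    volumes = [len(obj) * voxel_volume_um3 for obj in connected_components(voxels)]
    kept = [v for v in volumes if v >= min_object_um3]
    return len(kept), sum(kept)


# Toy mask: a 5x5x5 focus and a separate 2x2x2 speck (hypothetical).
focus = {(z, y, x) for z in range(5) for y in range(5) for x in range(5)}
speck = {(z, y, x) for z in range(10, 12) for y in range(10, 12) for x in range(10, 12)}
print(metastatic_burden(focus | speck, voxel_volume_um3=1_000.0))
```

With an assumed 10 × 10 × 10 μm voxel (1,000 μm³), an object needs at least 90 voxels to be counted, so the toy focus (125,000 μm³) is kept and the speck (8,000 μm³) is discarded as a likely artifact.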

Western blotting

Cells were washed twice with ice-cold PBS and lysed in freshly prepared cold RIPA buffer supplemented with a protease inhibitor cocktail (Complete Mini, Roche). When necessary, lysates were passed through a 25-G needle to aid homogenization. Total protein extracts were quantified using a bicinchoninic acid assay (Thermo Fisher Scientific) and quality checked by Coomassie assay. Before electrophoresis, protein lysates were denatured by boiling with 6× Laemmli loading buffer at 99 °C for 10 min. After electrophoresis, proteins were transferred to PVDF membranes, which were blocked for 1 h at room temperature in 5% nonfat milk in Tris-buffered saline with Tween® 20 Detergent and incubated overnight at 4 °C with blocking solution containing the primary antibody. Membranes were washed five to six times in Tris-buffered saline with Tween® 20 Detergent and incubated for 45 min with secondary antibody. After washing, staining was revealed with chemiluminescent reagents (Millipore) and captured using Amersham Imager 680 equipment (GE Healthcare). For further information on primary and secondary antibodies, see Supplementary Table 4.

Transwell cell migration assay

MDCK-II cells (noninvasive) were transfected with a plasmid carrying the coding region of the human PRRX1-L isoform. MDCK-II and MDCK-NBL2 cells transfected with an empty vector plasmid were used as negative and positive controls, respectively. Two days after transfection, cells were treated with TGFβ (5 ng ml⁻¹), collected after 24 h and assessed for migratory capacity using a Boyden Chamber assay. The top chamber insert (Corning Costar Transwell) was covered with 50 μl of mouse collagen IV (Corning, 50 μg ml⁻¹) and left to dry overnight. The resulting matrix was hydrated with 25 μl of water before seeding the cells. MDCK cells (25 × 10⁴) were seeded and allowed to migrate in the presence of TGFβ (5 ng ml⁻¹). Cell nuclei at the bottom of the insert were imaged 24 h after seeding and automatically counted using ImageJ.

Cytokine analysis

Cytokine analysis was performed on whole-tumor lysates using the Proteome Profiler Mouse XL Cytokine Array (R&D Systems, ARY028), following the manufacturer’s instructions. A total of 200 μg of protein lysate was used per assay. Array membranes were imaged in an Amersham Imager 680 (GE Healthcare), and relative protein levels were calculated for cytokine spots using Matlab Protein Array Tool version 2.0.0.1, MATLAB Central File Exchange, Danny Allen (2022). See https://www.mathworks.com/matlabcentral/fileexchange/35128-protein-array-tool.
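Relative cytokine levels from membrane arrays are commonly obtained by averaging duplicate analyte spots, subtracting background and scaling to the array's reference spots, so that membranes exposed differently can be compared. The sketch below shows that generic scheme under those assumptions. It does not reproduce the Protein Array Tool's own algorithm, and the function names and toy densities are hypothetical.

```python
from statistics import mean


def normalized_spots(analytes: dict[str, list[float]], background: list[float],
                     reference: list[float]) -> dict[str, float]:
    """Background-subtract mean duplicate-spot densities and scale them to the
    background-subtracted reference spots (ASSUMED normalization scheme)."""
    bg = mean(background)
    ref = mean(reference) - bg
    if ref <= 0:
        raise ValueError("reference spots must be brighter than background")
    return {name: max(mean(values) - bg, 0.0) / ref for name, values in analytes.items()}


def fold_change(test: dict[str, float], control: dict[str, float]) -> dict[str, float]:
    """Ratio of normalized signals; analytes undetected in the control are skipped."""
    return {name: test[name] / control[name]
            for name in test if control.get(name, 0.0) > 0}


# Toy pixel densities (not data from the study).
ctr = normalized_spots({"IL-13": [60.0, 60.0]}, [10.0, 10.0], [110.0, 110.0])
cko = normalized_spots({"IL-13": [35.0, 35.0]}, [10.0, 10.0], [110.0, 110.0])
print(fold_change(cko, ctr))
```

In this toy example the normalized IL-13 signal falls from 0.5 to 0.25, a fold change of 0.5 in the mutant relative to the control.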

Total RNA extraction, cDNA synthesis and RT–qPCR

For gene expression analysis, RNA was extracted using an illustra RNAspin Mini (GE Healthcare) or mirVana miRNA (Ambion) Isolation kit. Retrotranscription was performed using a Maxima First Strand cDNA Synthesis kit (Thermo Fisher Scientific). RT–qPCR was performed using Fast SYBR Green Mastermix in a Step One Plus machine (Applied Biosystems) according to the manufacturers’ instructions. Relative RNA expression levels (relative FC) were calculated using the 2^–ΔΔCt method. RT–qPCR primers are listed in Supplementary Table 5.
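In the 2^–ΔΔCt (Livak) method, the target Ct is first normalized to a reference gene (ΔCt), then to a calibrator sample such as the control condition (ΔΔCt), and the fold change is 2^–ΔΔCt. The method assumes near-100% amplification efficiency for both target and reference. A minimal sketch; the reference gene is not specified in this section, and the Ct values below are illustrative.

```python
from statistics import mean


def relative_expression(ct_target: list[float], ct_reference: list[float],
                        ct_target_calibrator: list[float],
                        ct_reference_calibrator: list[float]) -> float:
    """Fold change by the 2^-ddCt method from technical-replicate Ct values."""
    delta_sample = mean(ct_target) - mean(ct_reference)  # dCt of the sample
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_reference_calibrator)
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


# Illustrative Ct values: sample dCt = 6, calibrator dCt = 8.
fc = relative_expression([24.0, 24.0], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0])
print(fc)
```

Here ΔΔCt = 6 − 8 = −2, so the target is 2² = 4-fold higher in the sample than in the calibrator.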

In silico analysis of human cancer cell lines

Breast cancer cell line gene expression data13 were analyzed for epithelial and mesenchymal component enrichment. Epithelial and mesenchymal components were obtained by merging the epithelial and mesenchymal signatures in refs. 12,61. Note that the EMT-TFs were removed from the mesenchymal signature to avoid biased correlations in subsequent analyses. Singscore62 was used to compute enrichment scores for the epithelial and mesenchymal components (https://github.com/DavisLaboratory/singscore). Epithelial and mesenchymal enrichment values were plotted (x axis: mesenchymal score; y axis: epithelial score), and k-means clustering was used to partition the breast cancer cell lines into the optimal number of clusters, calculated as k = 3.
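The two steps above, a rank-based single-sample score followed by k-means on the (mesenchymal, epithelial) score pairs, can be sketched in pure Python. The score mimics singscore's unidirectional idea (mean within-sample rank of the gene set, rescaled by its theoretical minimum and maximum to [0, 1]) but is not the Bioconductor package, whose defaults may differ. The k-means is plain Lloyd's algorithm with a deterministic farthest-first start, an assumption of this sketch; k is taken as given rather than estimated.

```python
Point = tuple[float, float]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ascending ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_score(expression: dict[str, float], gene_set: set[str]) -> float:
    """Singscore-style score: mean rank of the gene set scaled to [0, 1]."""
    genes = list(expression)
    ranks = dict(zip(genes, average_ranks([expression[g] for g in genes])))
    present = [g for g in gene_set if g in ranks]
    n, total = len(present), len(genes)
    if not 0 < n < total:
        raise ValueError("gene set must overlap the data and be smaller than it")
    mean_rank = sum(ranks[g] for g in present) / n
    lowest = (n + 1) / 2               # set occupies the lowest ranks
    highest = (2 * total - n + 1) / 2  # set occupies the highest ranks
    return (mean_rank - lowest) / (highest - lowest)


def _dist2(a: Point, b: Point) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: list[Point], k: int, n_iter: int = 100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers
```

In the setting described above, each cell line would contribute one point (mesenchymal score, epithelial score) computed with `rank_score`, and `kmeans(points, 3)` would return a cluster label per cell line.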

Bulk RNA-seq and data analysis

Sequencing

RNA was extracted using an illustra RNAspin Mini isolation kit from three biological replicates per condition. RNA quality check, mRNA library preparation (stranded) and paired-end read (75-bp length) sequencing using an Illumina HiSeq4000 platform were performed at the Centro Nacional de Análisis Genómico-Centro de Regulación Genómica facility in Barcelona, Spain.

Data analysis

Reads were aligned to the CanFam3.1 genome (Ensembl v97 annotation) using STAR (2.5.3a)63. Quality control of sequenced reads was performed using FastQC (Babraham Institute), and gene expression was quantified using RSEM (1.3.0)64.

Functional enrichment analysis

We used the enrichR R package (v.2.1) to access the Enrichr database65 and performed general functional enrichment analysis, while the gseGO and gseKEGG functions in the clusterProfiler R package (v.3.10.0)66 were used for GSEA of GO terms and KEGG pathways. The R package msigdbr (7.0.1; https://CRAN.R-project.org/package=msigdbr) was used to obtain gene sets from MSigDB v7.0 (ref. 67) from Broad Institute. The R package GOSemSim (v.2.8.0)68 was used to filter GO terms by semantic similarity, and ggplot2 (v.3.3.0; https://cran.r-project.org/web/packages/ggplot2/index.html) and enrichplot (v.1.6.1; https://github.com/GuangchuangYu/enrichplot) were used to visualize functional enrichment results.
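Over-representation statistics such as those reported by Enrichr, and the Fisher's exact test cited for the dot plot in Extended Data Fig. 10g, rest on the hypergeometric distribution. Given a universe of N genes, a gene set of K genes and a query list of n genes sharing k genes with the set, the one-sided P value is P(X ≥ k). A minimal sketch follows; the function name is illustrative, and the rank-based GSEA statistic used by gseGO and gseKEGG is a different method that this sketch does not cover.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, set_size: int, query_size: int, universe: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_size, query_size).

    Equivalent to a one-sided (over-representation) Fisher's exact test on the
    2 x 2 table of query membership versus gene-set membership.
    """
    if not (0 <= set_size <= universe and 0 <= query_size <= universe):
        raise ValueError("set and query sizes must lie within the universe")
    denominator = comb(universe, query_size)
    upper = min(set_size, query_size)
    numerator = sum(comb(set_size, k) * comb(universe - set_size, query_size - k)
                    for k in range(max(overlap, 0), upper + 1))
    return numerator / denominator


# Toy case: 4 genes in total, a 2-gene set, a 2-gene query sharing both genes.
print(hypergeom_upper_tail(2, 2, 2, 4))
```

In the toy case, only one of the six possible 2-gene queries contains both set genes, so the P value is 1/6; an overlap threshold of zero always gives P = 1.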

Single-cell preparation

Twelve-week-old male mice were subjected to UUO or sham surgery, and whole kidneys were collected 10 days later. Mammary gland carcinomas were collected from 12- to 14-week-old female mice. Collected tissue was minced manually with sterile scalpels and then finely cut with a McIlwain Tissue Chopper (Ted Pella). Protocols for dissociation and for preparation of single-cell Gel Bead-In Emulsions with 10x Genomics kits and platforms are available at Protocol.io (ref. 69).

Single-cell data analyses

The detailed version of this section is deposited in Protocol.io (ref. 69).

Quality control, sample integration, dimensionality reduction and clustering

Reads were aligned to the mouse genome (mm10), and gene counting was performed using the CellRanger pipeline (ref. 70; 10x Genomics). Low-quality cells were removed based on the percentage of mitochondrial gene counts (retaining cells below 10% for kidney and below 5% for cancer), the number of detected genes (400–4,000) and putative doublets identified with Scrublet (https://github.com/swolock/scrublet). For integration, we used the SCTransform workflow from Seurat (ref. 71) with the top 3,000 highly variable genes (HVGs). A shared nearest neighbor (SNN) graph and a UMAP embedding were built over the top principal components (PCs; 25 for kidney and 30 for cancer), and clusters were detected at resolutions of 0.65 and 0.03 for kidney and cancer, respectively. Differentially expressed genes were detected with FindAllMarkers using the logistic regression method.
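
The cell-level filters can be sketched with NumPy as below, assuming a dense cells × genes count matrix and mouse mitochondrial genes prefixed with `mt-`. Doublet calls are taken as an input (for example, Scrublet's predicted doublets) rather than computed here; `qc_mask` is a hypothetical helper, not a Seurat or Scrublet function.

```python
import numpy as np


def qc_mask(counts, gene_names, max_mito_pct, min_genes=400, max_genes=4000,
            doublet_flags=None):
    """Boolean mask of cells passing mitochondrial, gene-count and doublet filters.

    counts: cells x genes raw counts; gene_names: matching gene symbols.
    Mouse mitochondrial genes are assumed to carry the "mt-" prefix.
    """
    counts = np.asarray(counts)
    is_mito = np.char.startswith(np.asarray(gene_names, dtype=str), "mt-")
    total = counts.sum(axis=1)
    mito_pct = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total, 1)
    n_genes = (counts > 0).sum(axis=1)
    keep = (mito_pct < max_mito_pct) & (n_genes >= min_genes) & (n_genes <= max_genes)
    if doublet_flags is not None:
        keep &= ~np.asarray(doublet_flags, dtype=bool)
    return keep


# Thresholds from the text: max_mito_pct=10 for kidney and 5 for cancer samples.
```

The upper bound on detected genes acts as a second, cruder doublet guard alongside the explicit doublet calls.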

Compositional analysis for kidney cell populations

To investigate changes in cell composition between sham and UUO mice, we used the runCoda function from Cacoa (ref. 72). The compositional analysis was performed with 1,000 bootstraps, with the glomerulus cluster set as the reference.
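
As an illustration of the reference-based idea only, the sketch below computes additive log-ratios of each cluster's cell count against a reference cluster (standing in for the glomerulus) and bootstraps cells to gauge variability. It is not Cacoa's runCoda algorithm; the pseudocount, the per-sample bootstrap over cells and all names and counts are assumptions.

```python
import numpy as np


def alr_bootstrap(labels, reference, n_boot=1000, pseudocount=0.5, seed=0):
    """Log-ratio of each cluster's cell count to a reference cluster, with a
    bootstrap over cells (resampling with replacement) for one sample.

    Returns (non-reference clusters, observed log-ratios, n_boot x clusters array).
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    others = clusters[clusters != reference]
    rng = np.random.default_rng(seed)

    def log_ratios(lab):
        ref_count = np.sum(lab == reference) + pseudocount
        return np.array([np.log((np.sum(lab == c) + pseudocount) / ref_count)
                         for c in others])

    observed = log_ratios(labels)
    boot = np.array([log_ratios(rng.choice(labels, size=labels.size, replace=True))
                     for _ in range(n_boot)])
    return others, observed, boot


# Hypothetical cluster labels for one sham and one UUO kidney
# (both samples are assumed to contain the same clusters, so outputs align).
sham = np.array(["Glom"] * 100 + ["PT"] * 400 + ["Immune"] * 20)
uuo = np.array(["Glom"] * 100 + ["PT"] * 200 + ["Immune"] * 200)
clusters, obs_sham, boot_sham = alr_bootstrap(sham, "Glom", n_boot=200)
_, obs_uuo, boot_uuo = alr_bootstrap(uuo, "Glom", n_boot=200)
# Change in log-ratio (UUO minus sham) with a 95% bootstrap interval per cluster.
delta = obs_uuo - obs_sham
ci = np.percentile(boot_uuo - boot_sham, [2.5, 97.5], axis=0)
```

Working with ratios to a reference population avoids the closure problem of raw proportions, where an expansion of one cell type forces all other proportions down.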

Classification of injured epithelial cells in kidney single-cell RNA-seq

We used a deep learning approach, multiclass classification with a multilayer perceptron (MLP), to predict the origin of injured epithelial cells. Cells from the epithelial clusters (see Fig. 3e and Extended Data Fig. 4d) were subset, with the SCT-normalized expression of the HVGs as features. A training model was built with MLPClassifier from scikit-learn v1.1.1 and evaluated by tenfold cross-validation. Performance was measured by accuracy and by the Matthews correlation coefficient, calculated from the confusion matrix using a one-versus-rest strategy. After evaluation, we rebuilt the MLP model using all cells from the epithelial component (excluding the injured cluster) and predicted the origin of the injured cells.
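
A minimal scikit-learn sketch of the evaluation scheme, with synthetic blobs standing in for SCT-normalized HVG expression. The hidden-layer size, iteration limit and stratified shuffled folds are assumptions (the text specifies only MLPClassifier, tenfold cross-validation, accuracy and one-versus-rest MCC); `make_model` and `evaluate_mlp` are hypothetical helpers.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier


def make_model(seed=0):
    # Hidden-layer size and iteration limit are illustrative assumptions.
    return MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)


def evaluate_mlp(X, y, seed=0):
    """Tenfold cross-validated accuracy, one-vs-rest MCC per class and confusion matrix."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(make_model(seed), X, y, cv=cv)
    # One-vs-rest MCC: binarize true and predicted labels for each class in turn.
    mcc = {c: matthews_corrcoef(y == c, y_pred == c) for c in np.unique(y)}
    return accuracy_score(y, y_pred), mcc, confusion_matrix(y, y_pred)


# Synthetic stand-in for HVG expression of labeled (non-injured) epithelial cells.
X, y = make_blobs(n_samples=300, centers=3, n_features=20, random_state=0)
acc, mcc, cm = evaluate_mlp(X, y)

# Refit on all labeled cells, then predict the origin of (hypothetical) injured cells.
final_model = make_model().fit(X, y)
injured_cells = X[:5] + 0.5
predicted_origin = final_model.predict(injured_cells)
```

Per-class MCC is more informative than accuracy alone when epithelial populations differ greatly in size, because a classifier that ignores a small class can still score high accuracy.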

PT and injured cell subsets in kidney single-cell RNA-seq

PT clusters (see Fig. 3g) that each contributed more than 10% of the injured cell population according to the prediction were subset together with the injured epithelial cells predicted to originate from them, and the PCs were recomputed using the same 3,000 HVGs. An SNN graph and UMAP were then built over the top five PCs, followed by clustering at a resolution of 0.1.
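
The SNN graph used for clustering can be sketched as follows: each pair of cells is weighted by the Jaccard index of their k-nearest-neighbor sets, and weak edges are pruned, which is how Seurat's FindNeighbors describes its SNN construction. k = 20 and the 1/15 pruning cutoff mirror Seurat defaults as we understand them and are assumptions here; the dense pairwise product suits small examples only, and the subsequent modularity-based clustering is not shown.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors


def snn_graph(pcs, k=20, prune=1 / 15):
    """Shared-nearest-neighbor graph weighted by the Jaccard index of kNN sets.

    pcs: cells x PCs matrix (e.g., the top five PCs of the PT subset).
    Querying the fitted data returns each cell as its own first neighbor,
    so every neighborhood includes the cell itself.
    """
    n = pcs.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k).fit(pcs).kneighbors(pcs)
    rows = np.repeat(np.arange(n), k)
    knn = sparse.csr_matrix((np.ones(n * k), (rows, idx.ravel())), shape=(n, n))
    shared = (knn @ knn.T).toarray()       # number of shared neighbors per pair
    jaccard = shared / (2 * k - shared)    # |A and B| / |A or B| with |A| = |B| = k
    jaccard[jaccard < prune] = 0.0
    return sparse.csr_matrix(jaccard)
```

Weighting by shared neighbors rather than raw distances makes the graph less sensitive to local differences in cell density.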

EMT, differentiation and inflammation score for PT and injured trajectories in kidney single-cell RNA-seq

The Hallmark EMT gene set from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb) was used to calculate the EMT score with the AddModuleScore function. The differentiation score was calculated using genes commonly upregulated across PT segments (ref. 25). Genes belonging to the kidney inflammatory pathways reported by Wu et al. (ref. 27) were used to calculate the injury/inflammation score.
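
A simplified sketch of how a module score of this kind is computed: the mean expression of the signature genes minus the mean of control genes drawn from the same average-expression bins. It follows the general scheme documented for Seurat's AddModuleScore (24 bins and 100 control genes per signature gene by default) but is not its code; bin edges, tie handling and sampling details may differ, and `module_score` is a hypothetical name.

```python
import numpy as np


def module_score(expr, gene_set_idx, n_bins=24, n_ctrl=100, seed=0):
    """Simplified module score: signature mean minus expression-matched control mean.

    expr: genes x cells matrix of normalized expression.
    gene_set_idx: row indices of the signature genes (e.g., Hallmark EMT).
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    avg = expr.mean(axis=1)
    # Assign genes to equal-sized bins by the rank of their average expression.
    order = np.argsort(avg, kind="stable")
    bins = np.empty(n_genes, dtype=int)
    bins[order] = np.arange(n_genes) * n_bins // n_genes
    ctrl = set()
    for g in gene_set_idx:
        # Draw control genes from the same expression bin as this signature gene.
        pool = np.flatnonzero(bins == bins[g])
        ctrl.update(rng.choice(pool, size=min(n_ctrl, pool.size), replace=False).tolist())
    ctrl = np.array(sorted(ctrl))
    return expr[gene_set_idx].mean(axis=0) - expr[ctrl].mean(axis=0)
```

Matching controls by expression level keeps highly expressed signatures from scoring high simply because their genes are abundant.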

Cancer cell subset for downstream analysis

A further round of clustering at a resolution of 0.8 yielded 28 clusters, 19 of which were cancer cell clusters (identified by tdTomato expression). Six of these showed a high proportion of ribosomal genes and were excluded from downstream analysis. Cells from the remaining 13 clusters were used to recompute the PCs, and an SNN graph and UMAP were constructed over the top 20 PCs. Cancer cell clusters were then detected by clustering at a resolution of 0.6. Markov affinity-based graph imputation of cells (MAGIC v.3.0.0; ref. 73) was applied to impute the expression of genes encoding EMT-TFs.
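
The idea behind MAGIC can be illustrated with a small diffusion-smoothing sketch: build an adaptive Gaussian affinity between cells, row-normalize it into a Markov matrix and apply it t times to the expression matrix. This is a simplification, not the magic-impute package: MAGIC computes affinities in PCA space with an alpha-decaying kernel and its own defaults for k and t, so the parameter values and the name `diffusion_impute` are assumptions.

```python
import numpy as np


def diffusion_impute(X, k=5, t=3):
    """MAGIC-like smoothing of a cells x genes matrix (simplified).

    Distances are computed on X directly; MAGIC itself works in PCA space.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Adaptive bandwidth: distance to each cell's k-th neighbor (column 0 is the cell itself).
    sigma = np.sort(dist, axis=1)[:, k]
    affinity = np.exp(-(dist / sigma[:, None]) ** 2)
    affinity = affinity + affinity.T                          # symmetrize
    markov = affinity / affinity.sum(axis=1, keepdims=True)   # row-stochastic operator
    # Diffuse t steps and apply the operator to the expression values.
    return np.linalg.matrix_power(markov, t) @ X
```

Because the operator is row-stochastic, each imputed value is a weighted average over transcriptionally similar cells, which is why lowly detected transcripts such as those of EMT-TFs become visible after imputation.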

Trajectory inference using PAGA and RNA velocity

Integrated Seurat objects (PT and injured cells; cancer cell subset) were exported to loom files compatible with Scanpy (ref. 74) v1.6.0. Using the precomputed PCs and cell embeddings, we constructed a neighborhood graph with 15 neighbors over the top PCs (kidney: 5; cancer subset: 20). A connectivity map was built using PAGA (ref. 28).
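
PAGA summarizes a kNN graph at the level of clusters by comparing the observed number of edges between two groups with the number expected if edges were placed at random. The sketch below implements a simplified version of that ratio on a dense binary adjacency matrix; it is modeled on PAGA's connectivity measure as we understand it rather than copied from Scanpy, and in practice `sc.tl.paga` would be run on the neighbors graph. `paga_like_connectivity` is a hypothetical name.

```python
import numpy as np


def paga_like_connectivity(adj, groups):
    """Simplified PAGA-style connectivity between cell groups.

    adj: cells x cells binary kNN adjacency (directed, no self-loops).
    groups: integer group label per cell.
    Observed inter-group edges are divided by the number expected under random
    edge placement given group sizes and edge counts; values are capped at 1.
    """
    adj = np.asarray(adj, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n = len(groups)
    onehot = (groups[:, None] == labels[None, :]).astype(float)  # cells x groups
    edge_counts = onehot.T @ adj @ onehot      # directed edges from group a to group b
    sizes = onehot.sum(axis=0)
    es = edge_counts.sum(axis=1)               # all edges leaving each group
    inter = edge_counts + edge_counts.T        # edges in either direction
    conn = np.zeros((len(labels), len(labels)))
    for a in range(len(labels)):
        for b in range(len(labels)):
            if a == b:
                continue
            expected = (es[a] * sizes[b] + es[b] * sizes[a]) / (n - 1)
            conn[a, b] = min(1.0, inter[a, b] / expected) if expected > 0 else 0.0
    return labels, conn
```

Normalizing by the random expectation keeps large clusters from appearing strongly connected merely because they have many edges.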

To infer the directionality of transcriptional changes along the predefined EMT trajectories in the cancer subset, we subset clusters 5, 11, 10, 14, 12, 16, 13, 15 and 1 and recomputed the UMAP embedding over the top 17 PCs with min.dist = 0.2. The run10x command of velocyto (ref. 29) v0.17.17 was used to calculate sample-specific spliced and unspliced counts. Genes were filtered by detection level, and 3,000 genes were then selected. Normalized spliced and unspliced counts were used for PC analysis, and the top PCs (147 for kidney data and 105 for cancer data) were selected by automatic elbow-point detection (cumulative variance ratio > 0.002). Data were imputed using 500 neighbors, and RNA velocity was estimated under the steady-state assumption. Velocity was visualized on the UMAP embedding using a regular grid (step size of 40) with Gaussian kernel smoothing. Additionally, for EMT trajectories in cancer, we ran Slingshot v2.2.0 (https://bioconductor.org/packages/release/bioc/html/slingshot.html) over the top 15 PCs obtained in the RNA velocity analysis and fitted principal curves to predict lineages and infer the bifurcation point.
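
Under the steady-state model, each gene's degradation ratio γ is fitted from cells assumed to be near equilibrium, and velocity is the excess of unspliced RNA over its expected steady-state level, v = u − γs. The sketch below fits γ by least squares through the origin on cells in the lowest and highest spliced-expression quantiles; the 5% quantile, the zero-offset fit and the function name are assumptions, and velocyto's own fitting options may differ.

```python
import numpy as np


def steady_state_velocity(spliced, unspliced, quantile=0.05):
    """Per-gene steady-state RNA velocity.

    spliced, unspliced: cells x genes (normalized, kNN-imputed) matrices.
    Returns (gamma per gene, velocity matrix of the same shape as the inputs).
    """
    n_genes = spliced.shape[1]
    gamma = np.zeros(n_genes)
    for g in range(n_genes):
        s, u = spliced[:, g], unspliced[:, g]
        lo, hi = np.quantile(s, [quantile, 1 - quantile])
        extreme = (s <= lo) | (s >= hi)        # cells assumed near steady state
        denom = np.sum(s[extreme] ** 2)
        # Least-squares slope of u on s through the origin.
        gamma[g] = np.sum(s[extreme] * u[extreme]) / denom if denom > 0 else 0.0
    velocity = unspliced - gamma * spliced     # positive: gene being induced
    return gamma, velocity
```

Cells with positive velocity for a gene are predicted to increase its spliced expression, which is what gives the arrows on the UMAP their direction.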

Pseudotime analysis for the inferred trajectories

The root cell was set as ATTCTTGAGTGCAAAT-1_2 for the PT injured population (the cell with maximum expression of the PT marker Slc22a12) and CTGATAGGTAAGAGGA_1 for the cancer subset (the cell with maximum expression of the epithelial gene Lalba). The diffusion map was built using the top PCs (kidney: 5; cancer: 15), followed by pseudotime calculation.
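
The text does not name the pseudotime algorithm; one common choice on a diffusion map is diffusion pseudotime (DPT), in which the distance from the root cell is measured in diffusion-component space with each component weighted by λ/(1 − λ). A simplified NumPy sketch, assuming a Gaussian kernel with a fixed bandwidth `sigma` (an assumption; Scanpy uses an adaptive kernel on a kNN graph) and a hypothetical function name:

```python
import numpy as np


def diffusion_pseudotime(X, root, sigma=1.0, n_comps=10):
    """Simplified diffusion pseudotime from a root cell.

    X: cells x PCs. Returns pseudotime scaled to [0, 1] (root = 0).
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * sigma ** 2))            # fixed-bandwidth Gaussian kernel
    deg = K.sum(axis=1)
    # Symmetric conjugate of the transition matrix D^-1 K, so eigh can be used.
    S = K / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(evals)[::-1][:n_comps]
    evals, evecs = evals[top], evecs[:, top]
    psi = evecs / np.sqrt(deg)[:, None]           # right eigenvectors of D^-1 K
    lam = evals[1:]                               # drop the trivial eigenvalue 1
    coords = psi[:, 1:] * (lam / (1 - lam))       # DPT-style component weighting
    pt = np.linalg.norm(coords - coords[root], axis=1)
    return pt / pt.max()
```

The λ/(1 − λ) weighting emphasizes slowly decaying components, which capture the large-scale ordering of cells rather than local noise.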

SCENIC analysis for regulon prediction

pySCENIC (ref. 30) v0.11.0 was used to predict regulons from expression data. A coexpression matrix between TFs and their target genes was constructed using the GRNBoost2 method. Motif enrichment was assessed for the target genes using the SCENIC mm10 motif database (covering the region from 500 bp upstream to 100 bp downstream of the transcription start site). For EMT trajectories in cancer, target genes were filtered using a normalized enrichment score threshold of ≥1.75. Single-cell regulatory activity was calculated using the AUCell algorithm, and the average area under the curve (AUC) score was used for representation. Additionally, AUC scores for selected TFs were plotted over pseudotime, and a smoothed regression curve was fitted using a generalized additive model with splines of degree 5.
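
AUCell scores a regulon in each cell by ranking all genes by expression and measuring how many regulon genes are recovered near the top of that ranking. A simplified sketch, assuming a cutoff at the top 5% of ranked genes; tie handling and the exact AUC normalization in pySCENIC may differ from this version, and `aucell` is a hypothetical name.

```python
import numpy as np


def aucell(expr, gene_set_idx, top_frac=0.05):
    """AUCell-style regulon activity per cell (simplified).

    expr: cells x genes. For each cell, genes are ranked by decreasing expression
    and the area under the recovery curve of the regulon genes is computed over
    the top `top_frac` of the ranking, normalized by the maximum possible area.
    """
    n_cells, n_genes = expr.shape
    max_rank = max(1, int(np.ceil(top_frac * n_genes)))
    in_set = np.zeros(n_genes, dtype=bool)
    in_set[gene_set_idx] = True
    # Ties are broken by argsort order rather than at random.
    ranking = np.argsort(-expr, axis=1, kind="stable")[:, :max_rank]
    hits = in_set[ranking]                        # cells x max_rank booleans
    auc = np.cumsum(hits, axis=1).sum(axis=1)     # area under the recovery curve
    n_set = min(len(gene_set_idx), max_rank)
    # Maximum area: all regulon genes at the very top of the ranking.
    max_auc = np.cumsum(np.arange(max_rank) < n_set).sum()
    return auc / max_auc
```

Because the score depends only on within-cell ranks, it is robust to differences in sequencing depth between cells.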

Trajectory-based differential expression analysis

Integrated Seurat objects for each trajectory were converted to Monocle3 (ref. 75) v.0.2.2 objects, retaining the precomputed 3,000 HVGs, PCs and UMAP embeddings. The reduced dimensional space was used to learn the principal graph by reversed graph embedding. Moran's I-test was used to identify genes differentially expressed along the trajectory, and genes with significant variation along the trajectory were retained. Their smoothed, minimum–maximum normalized expression was represented as a heat map. Pathway enrichment analysis of the differentially expressed genes was performed using the R-based Enrichr (ref. 76) API v.3.0.
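
Moran's I measures whether a gene's expression is similar among neighboring cells; on a trajectory graph, values near 1 indicate expression that changes smoothly along the trajectory. The sketch below computes I on a binary kNN weight matrix and, as a swapped-in alternative to the analytical test used by Monocle3's graph_test, estimates significance by permutation. `knn_weights`, `morans_i` and `permutation_pvalue` are hypothetical helpers.

```python
import numpy as np


def knn_weights(coords, k=5):
    """Binary kNN weight matrix (row i marks the k nearest cells of cell i)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a cell is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros_like(d)
    W[np.repeat(np.arange(len(coords)), k), nearest.ravel()] = 1.0
    return W


def morans_i(values, W):
    """Moran's I of one gene's expression across cells for weight matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return z.size * (z @ W @ z) / (W.sum() * (z @ z))


def permutation_pvalue(values, W, n_perm=999, seed=0):
    """One-sided permutation p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    null = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)
```

A gene whose expression is scattered randomly across the trajectory has I close to its null expectation of about −1/(n − 1), so only spatially coherent changes pass the test.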

Trunk neural crest single-cell RNA-seq data analysis

The raw gene expression matrix of trunk neural crest single-cell RNA-seq data (ref. 18) was downloaded from the NCBI GEO database (accession GSE129114). The connectivity map was built using PAGA (resolution = 0.5), as described above. For pseudotime analysis, SS2_15_0085_F22, the cell with maximum expression of the neuronal differentiation marker Hes5, was used as the root cell. Finally, Moran's I-test was performed to identify differentially expressed genes as described above, and GSEA was performed for the Hallmark EMT and BC-PING signatures.

Cancer cell sorting with FACS

Tumor cell suspensions were prepared from mammary gland carcinomas of 14- to 15-week-old female mice following the protocol used for cancer single-cell preparation, except that the digestion buffer contained 1.0 Wünsch units of Liberase TH per ml and the incubation time was extended to 75 min. After red blood cell removal, cancer cells were neutralized, resuspended in 0.5 ml of FACS buffer with DAPI per 10⁷ cells and sorted directly on a BD FACSAria III flow cytometer. For each sample, 300,000 cancer cells were sorted at high purity using a singlet/DAPI-low/tdTomato-high gating strategy (see Supplementary Fig. 1). Postsorting analysis was performed to verify the purity of the tdTomato-high sorted cells. Sorted cancer cells were centrifuged at 5,000 rpm for 3 min and then lysed in the lysis buffer of the illustra RNAspin Mini kit (GE Healthcare). RNA extraction, cDNA synthesis and RT–qPCR were performed as described above.

Statistics and reproducibility

All experiments were repeated at least three times, and the number of independent experimental replicates is indicated for each experiment in the figure legends and Source Data. Statistical analyses were performed using Prism 6 (GraphPad) and are indicated in the figure legends and in the statistical table in the Source Data. P values > 0.05 were considered not statistically significant. No data were excluded except in single-cell RNA-seq analyses, in which cells were excluded based on the number of detected transcripts and the percentage of mitochondrial gene content, as described in the Methods. No statistical methods were used to predetermine sample sizes. Sample sizes were determined empirically, based on those reported in previous publications from our laboratory using similar in vitro and in vivo experimental models (refs. 6,50) and as described in the Methods. A minimum of n = 3 independent experiments was performed for all analyses, with the exception of single-cell RNA-seq in the kidney, which includes one sham-operated and two UUO samples. In the latter, the number of single cells and the average number of genes detected per cell were sufficient to identify all renal cell populations unambiguously and to define their molecular profiles. Individual data points were plotted in all analyses. Data distribution was assumed to be normal, but this was not formally tested. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Supplementary Information (790.9KB, pdf)

Supplementary Fig. 1. Gating strategy for tdTomato+ PyMT cancer cells in control and Prrx1-cKO tumors.

Reporting Summary (85.1KB, pdf)
Supplementary Tables 1–7 (394.4KB, xlsx)

Supplementary Table 1. Epithelial and mesenchymal genes. List of epithelial and mesenchymal genes extracted from refs. 12,55. Supplementary Table 2. BC-PINGs signature. Supplementary Table 3. Gene symbols and names. Supplementary Table 4. List of antibodies used for immunofluorescence and western blot analyses. Supplementary Table 5. List of oligonucleotides used for RT–qPCR and genotyping. Supplementary Table 6. Workflow in kidney and cancer sample single-cell analyses. Supplementary Table 7. List of TF binding motifs identified by SCENIC analysis.

Source data

Source Data Figs. 1–8 and Extended Data Figs. 1–10 (224.6KB, xlsx)

Numerical and statistical source data.

Source Data Extended Data Figs. 1 and 10 (7.6MB, pdf)

Unprocessed western blots.

Acknowledgements

We thank B. Sanchez-Laorden for helpful discussions and suggestions throughout the project, S. Vega for help and support in managing cell lines, D. Abad and T. Maria Gomez for technical support and G. Expósito and V. Villar Cerviño for support at the imaging facility. We thank A. Guzman De la Fuente for helpful suggestions for macrophage heterogeneity analysis. We also thank A. Caler Escribano for technical help in the FACS/Omics facility. We thank the MD Anderson Foundation Biobank for providing samples (record number B.0000745, Instituto de Salud Carlos III National Biobank). This work was supported by grants MICIU RTI2018-096501-B-I00 and MCI PID2021-125682NB-I00 to M.A.N., RTI2018-102260-B-I00 to J.P.L.-A. and PID2022-136854OB-I00 to G.M.-B., all funded by MICIU/AEI/10.13039/501100011033 and by FEDER, UE. Funds were also provided by the AECC Scientific Foundation (FC_AECC PROYE19073NIE to M.A.N. and PROYE19036MOR to G.M.-B.), Instituto de Salud Carlos III (CIBERONC, CB16/12/00295 to G.M.-B. and A.C.; CIBERER, CB19/07/00038 to M.A.N.), Generalitat Valenciana (Prometeo 2021/45) and the European Research Council (ERC AdG 322694) to M.A.N., who also acknowledges financial support from Centro de Excelencia Severo Ochoa, grant CEX2021-001165-S, funded by MCIN/AEI/10.13039/501100011033, and support from the Scientific Network Conexión Cáncer funded by CSIC, Spain. K.K.Y. was a holder of an EMBO Long-Term fellowship and a ‘Severo Ochoa Excellence Program’ Postdoctoral contract and currently holds an investigator contract from the AECC Scientific Foundation (Ayudas AECC investigador 2022). N.N. held a contract associated with NEUcrest European Union’s Horizon 2020 Research and Innovation Program under Marie Skłodowska-Curie (grant agreement 860635, ITN NEUcrest to M.A.N.). R.J.-C. holds a ‘Severo Ochoa Excellence Programme’ PhD contract (PRE2020-091888).


Author contributions

K.K.Y. and M.A.N. conceived the project, interpreted the data and wrote the manuscript. K.K.Y. performed most of the experiments and analyzed the data, and M.A.N. supervised the whole project. A.A. performed the bulk RNA-seq analyses and helped with the in silico analyses. A.M.-G. and N.N. performed the single-cell RNA-seq analyses. R.J.-C. performed immunofluorescence stainings and analysis in metastatic lung, kidney and mouse tumors. C.L.-B. performed experimental animal procedures and mouse line management. H.F. contributed to the in vitro experiments and RT–qPCR in cell lines and obstructed kidneys. D.G.-G. performed the cytokine analyses in tumors. G.M.-B. provided the human breast cancer tissue microarrays, performed immunofluorescence analysis in human tumors and, together with A.C., analyzed and interpreted human data. K.K.Y. and J.G. generated the Prrx1 conditional mutant mouse model. J.P.L.-A. helped design the single-cell RNA-seq experiments and supervised its analysis. M.A.N. also ensured funding.

Peer review

Peer review information

Nature Cancer thanks Heide Ford, Andras Kapus and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Data availability

Bulk RNA-seq data that support the findings of this study have been deposited and are publicly accessible at the GEO repository (GSE164488). scRNA-seq data that support the findings of this study have been deposited and are publicly accessible at the GEO repository (GSE175412 and GSE159478) for kidney and tumor data, respectively.

Single-cell RNA-seq data for the neural crest (Fig. 2) were downloaded from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/) under accession GSE129114. The t-SNE embedding and associated metadata (Fig. 2d) were obtained from http://pklab.med.harvard.edu/ruslan/neural_crest/tSNE_main_Fig1.txt.

Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author at the time of publication on reasonable request.

Code availability

We have not created any custom code or algorithms in this study. Open-source software was used to analyze the data. Details of software versions are specified in the Methods and Reporting Summary.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Nitin Narwade, Aida Arcas, Angel Marquez-Galera, Raúl Jiménez-Castaño.

Extended data is available for this paper at 10.1038/s43018-024-00839-5.

Supplementary information

The online version contains supplementary material available at 10.1038/s43018-024-00839-5.

References

1. Nieto, M. A., Huang, R. Y.-J., Jackson, R. A. & Thiery, J. P. EMT: 2016. Cell 166, 21–45 (2016).
2. Dongre, A. & Weinberg, R. A. New insights into the mechanisms of epithelial–mesenchymal transition and implications for cancer. Nat. Rev. Mol. Cell Biol. 20, 69–84 (2019).
3. Massagué, J. & Sheppard, D. TGF-β signaling in health and disease. Cell 186, 4007–4037 (2023).
4. Youssef, K. K. & Nieto, M. A. Epithelial–mesenchymal transition in tissue repair and degeneration. Nat. Rev. Mol. Cell Biol. 25, 720–739 (2024).
5. Yang, J. et al. Guidelines and definitions for research on epithelial–mesenchymal transition. Nat. Rev. Mol. Cell Biol. 21, 341–352 (2020).
6. Grande, M. T. et al. Snail1-induced partial epithelial-to-mesenchymal transition drives renal fibrosis in mice and can be targeted to reverse established disease. Nat. Med. 21, 989–997 (2015).
7. Lovisa, S. et al. Epithelial-to-mesenchymal transition induces cell cycle arrest and parenchymal damage in renal fibrosis. Nat. Med. 21, 998–1009 (2015).
8. Pastushenko, I. et al. Identification of the tumour transition states occurring during EMT. Nature 556, 463–468 (2018).
9. Kröger, C. et al. Acquisition of a hybrid E/M state is essential for tumorigenicity of basal breast cancer cells. Proc. Natl Acad. Sci. USA 116, 7353–7362 (2019).
10. Simeonov, K. P. et al. Single-cell lineage tracing of metastatic cancer reveals selection of hybrid EMT states. Cancer Cell 39, 1150–1162 (2021).
11. Nieto, M. A. Are you interested or afraid of working on EMT? Methods Mol. Biol. 2179, 19–28 (2021).
12. Tan, T. Z. et al. Epithelial–mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol. Med. 6, 1279–1293 (2014).
13. Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 33, 306–312 (2015).
14. Thiery, J. P., Acloque, H., Huang, R. Y. J. & Nieto, M. A. Epithelial–mesenchymal transitions in development and disease. Cell 139, 871–890 (2009).
15. Zhang, J. et al. Pathway crosstalk enables cells to interpret TGF-β duration. npj Syst. Biol. Appl. 4, 18 (2018).
16. Neve, R. M. et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006).
17. Sarrió, D. et al. Epithelial–mesenchymal transition in breast cancer relates to the basal-like phenotype. Cancer Res. 68, 989–997 (2008).
18. Soldatov, R. et al. Spatiotemporal structure of cell fate decisions in murine neural crest. Science 364, eaas9536 (2019).
19. Hegarty, S. V., Sullivan, A. M. & O’Keeffe, G. W. Zeb2: a multifunctional regulator of nervous system development. Prog. Neurobiol. 132, 81–95 (2015).
20. Vandamme, N. et al. The EMT transcription factor ZEB2 promotes proliferation of primary and metastatic melanoma while suppressing an invasive, mesenchymal-like phenotype. Cancer Res. 80, 2983–2995 (2020).
21. Martin, J. F., Bradley, A. & Olson, E. N. The paired-like homeo box gene Mhox is required for early events of skeletogenesis in multiple lineages. Genes Dev. 9, 1237–1249 (1995).
22. Chevalier, R. L. The proximal tubule is the primary target of injury and progression of kidney disease: role of the glomerulotubular junction. Am. J. Physiol. Renal Physiol. 311, F145–F161 (2016).
23. Kuppe, C. et al. Decoding myofibroblast origins in human kidney fibrosis. Nature 589, 281–286 (2021).
24. Dumas, S. J. et al. Single-cell RNA sequencing reveals renal endothelium heterogeneity and metabolic adaptation to water deprivation. J. Am. Soc. Nephrol. 31, 118–138 (2020).
25. Ransick, A. et al. Single-cell profiling reveals sex, lineage, and regional diversity in the mouse kidney. Dev. Cell 51, 399–413 (2019).
26. Conway, B. R. et al. Kidney single-cell atlas reveals myeloid heterogeneity in progression and regression of kidney disease. J. Am. Soc. Nephrol. 31, 2833–2854 (2020).
27. Wu, H., Lai, C.-F., Chang-Panesso, M. & Humphreys, B. D. Proximal tubule translational profiling during kidney fibrosis reveals proinflammatory and long noncoding RNA expression patterns with sexual dimorphism. J. Am. Soc. Nephrol. 31, 23–38 (2020).
28. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).
29. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
30. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
31. Guy, C. T., Cardiff, R. D. & Muller, W. J. Induction of mammary tumors by expression of polyomavirus middle T oncogene: a transgenic mouse model for metastatic disease. Mol. Cell. Biol. 12, 954–961 (1992).
32. Attalla, S., Taifour, T., Bui, T. & Muller, W. Insights from transgenic mouse models of PyMT-induced breast cancer: recapitulating human breast cancer progression in vivo. Oncogene 40, 475–491 (2021).
33. Wuidart, A. et al. Early lineage segregation of multipotent embryonic mammary gland progenitors. Nat. Cell Biol. 20, 666–676 (2018).
34. Bach, K. et al. Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing. Nat. Commun. 8, 2128 (2017).
35. Pal, B. et al. Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling. Nat. Commun. 8, 1627 (2017).
36. Shehata, M. et al. Phenotypic and functional characterisation of the luminal cell hierarchy of the mammary gland. Breast Cancer Res. 14, R134 (2012).
37. Ginestier, C. et al. ALDH1 is a marker of normal and malignant human mammary stem cells and a predictor of poor clinical outcome. Cell Stem Cell 1, 555–567 (2007).
38. Koren, S. et al. PIK3CA H1047R induces multipotency and multi-lineage mammary tumours. Nature 525, 114–118 (2015).
39. Van Keymeulen, A. et al. Reactivation of multipotency by oncogenic PIK3CA induces breast tumour heterogeneity. Nature 525, 119–123 (2015).
40. Youssef, K. K. et al. Adult interfollicular tumour-initiating cells are reprogrammed into an embryonic hair follicle progenitor-like fate during basal cell carcinoma initiation. Nat. Cell Biol. 14, 1282–1294 (2012).
41. Kaufman, C. K. et al. A zebrafish melanoma model reveals emergence of neural crest identity during melanoma initiation. Science 351, aad2197 (2016).
42. Cheung, K. J., Gabrielson, E., Werb, Z. & Ewald, A. J. Collective invasion in breast cancer requires a conserved basal epithelial program. Cell 155, 1639–1651 (2013).
43. Rädler, P. D. et al. Highly metastatic claudin-low mammary cancers can originate from luminal epithelial cells. Nat. Commun. 12, 3742 (2021).
44. Ye, X. et al. Distinct EMT programs control normal mammary stem cells and tumour-initiating cells. Nature 525, 256–260 (2015).
45. Bièche, I. et al. Molecular profiling of inflammatory breast cancer: identification of a poor-prognosis gene expression signature. Clin. Cancer Res. 10, 6789–6795 (2004).
46. Ge, Y. et al. Stem cell lineage infidelity drives wound repair and cancer. Cell 169, 636–650 (2017).
47. Marjanovic, N. D. et al. Emergence of a high-plasticity cell state during lung cancer evolution. Cancer Cell 38, 229–246 (2020).
48. van Groningen, T. et al. Neuroblastoma is composed of two super-enhancer-associated differentiation states. Nat. Genet. 49, 1261–1266 (2017).
49. González-Iglesias, A. & Nieto, M. A. Proliferation and EMT trigger heart repair. Nat. Cell Biol. 22, 1291–1292 (2020).
50. Ocaña, O. H. et al. Metastatic colonization requires the repression of the epithelial–mesenchymal transition inducer PRRX1. Cancer Cell 22, 709–724 (2012).
51. Tsai, J. H., Donaher, J. L., Murphy, D. A., Chau, S. & Yang, J. Spatiotemporal regulation of epithelial–mesenchymal transition is essential for squamous cell carcinoma metastasis. Cancer Cell 22, 725–736 (2012).
52. Liu, X. et al. Sequential introduction of reprogramming factors reveals a time-sensitive requirement for individual factors and a sequential EMT–MET mechanism for optimal reprogramming. Nat. Cell Biol. 15, 829–838 (2013).
53. Karras, P. et al. A cellular hierarchy in melanoma uncouples growth and metastasis. Nature 610, 190–198 (2022).
54. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
55. Chung, W. et al. Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8, 15081 (2017).
56. Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140 (2010).
57. Shao, X., Somlo, S. & Igarashi, P. Epithelial-specific Cre/lox recombination in the developing kidney and genitourinary tract. J. Am. Soc. Nephrol. 13, 1837–1846 (2002).
58. Dassule, H. R., Lewis, P., Bei, M., Maas, R. & McMahon, A. P. Sonic hedgehog regulates growth and morphogenesis of the tooth. Development 127, 4775–4785 (2000).
59. Ran, F. A. et al. Genome engineering using the CRISPR–Cas9 system. Nat. Protoc. 8, 2281–2308 (2013).
60. Renier, N. et al. iDISCO: a simple, rapid method to immunolabel large tissue samples for volume imaging. Cell 159, 896–910 (2014).
61. Taube, J. H. et al. Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes. Proc. Natl Acad. Sci. USA 107, 15449–15454 (2010).
62. Foroutan, M. et al. Single sample scoring of molecular phenotypes. BMC Bioinformatics 19, 404 (2018).
63. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
64. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
65. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
66. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics J. Integr. Biol. 16, 284–287 (2012).
67. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
68. Yu, G. et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26, 976–978 (2010).
69. Youssef, K. K., Narwade, N. & Nieto, A. Single-cell preparation and scRNA-Seq data analysis. Protocol.io 10.17504/protocols.io.eq2lyw9qwvx9/v1 (2024).
70. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
71. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
72. Petukhov, V. et al. Case–control analysis of single-cell RNA-seq studies. Preprint at bioRxiv 10.1101/2022.03.15.484475 (2022).
73. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).
74. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
75. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
76. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
77. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
78. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020).
79. Wang, Y. et al. N-Myc downstream regulated gene 1 (NDRG1) promotes the stem-like properties of lung cancer cells through stabilized c-Myc. Cancer Lett. 401, 53–62 (2017).
80. Vega, S. et al. Snail blocks the cell cycle and confers resistance to cell death. Genes Dev. 18, 1131–1143 (2004).

Associated Data

Supplementary Materials

Supplementary Information (790.9KB, pdf)

Supplementary Fig. 1. Gating strategy for tdTomato+ PyMT cancer cells in control and Prrx1-cKO tumors.

Reporting Summary (85.1KB, pdf)
Supplementary Tables 1–7 (394.4KB, xlsx)

Supplementary Table 1. Epithelial and mesenchymal genes: list of epithelial and mesenchymal genes extracted from refs. 12 and 55.
Supplementary Table 2. BC-PINGs signature.
Supplementary Table 3. Gene symbols and names.
Supplementary Table 4. List of antibodies used for immunofluorescence and western blot analyses.
Supplementary Table 5. List of oligonucleotides used for RT–qPCR and genotyping.
Supplementary Table 6. Workflow in kidney and cancer sample single-cell analyses.
Supplementary Table 7. List of TF binding motifs identified by SCENIC analysis.

Source Data Figs. 1–8 and Extended Data Figs. 1–10 (224.6KB, xlsx)

Numerical and statistical source data.

Source Data Extended Data Figs. 1 and 10 (7.6MB, pdf)

Unprocessed western blots.

Data Availability Statement

Bulk RNA-seq data supporting the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository and are publicly accessible under accession GSE164488. scRNA-seq data for the kidney and tumor samples have been deposited in GEO and are publicly accessible under accessions GSE175412 and GSE159478, respectively.

Single-cell RNA-seq data for the neural crest (Fig. 2) were downloaded from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/), where they are available under accession GSE129114. The t-SNE embedding and associated metadata (Fig. 2d) were obtained from http://pklab.med.harvard.edu/ruslan/neural_crest/tSNE_main_Fig1.txt.

Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author at the time of publication on reasonable request.

We have not created any custom code or algorithms in this study. Open-source software was used to analyze the data. Details of software versions are specified in the Methods and Reporting Summary.


Articles from Nature Cancer are provided here courtesy of Nature Publishing Group