Abstract
Even among genetically identical cancer cells, resistance to therapy frequently emerges from a small subset of those cells1-7. Molecular differences in rare individual cells in the initial population enable certain cells to become resistant to therapy7-9; however, comparatively little is known about the variability in the resistance outcomes. Here we develop and apply FateMap, a framework that combines DNA barcoding with single-cell RNA sequencing, to reveal the fates of hundreds of thousands of clones exposed to anti-cancer therapies. We show that resistant clones emerging from single-cell-derived cancer cells adopt molecularly, morphologically and functionally distinct resistant types. These resistant types are largely predetermined by molecular differences between cells before drug addition and not by extrinsic factors. Changes in the dose and type of drug can switch the resistant type of an initial cell, resulting in the generation and elimination of certain resistant types. Samples from patients show evidence for the existence of these resistant types in a clinical context. We observed diversity in resistant types across several single-cell-derived cancer cell lines and cell types treated with a variety of drugs. The diversity of resistant types as a result of the variability in intrinsic cell states may be a generic feature of responses to external cues.
Individual cells respond to signals and stresses differently, often owing to intrinsic, non-genetic differences5,6,10-15. Advances in single-cell barcoding have enabled tracking of molecular state changes following exposure to signals and stresses over time8,16-27, but less attention has been paid to characterizing variability in the outcomes themselves. Typically, the implicit assumption is that outcomes are binary: induced or not induced, proliferative or non-proliferative, alive or dead. It is possible, however, that there is a far richer set of outcomes.
Therapeutic resistance in cancer illustrates the variable responses to stress. Anti-cancer drugs kill a majority of cells, but a small, resistant subpopulation often remains, preventing cures. Recent studies have identified these subpopulations (marked by slow fluctuations in gene expression1-4,7,9,28,29) even within single-cell-derived (clonal) cancer populations. Upon drug exposure, clones from these populations survive and proliferate to form resistant colonies. Resistant cells are assumed to be relatively uniform in molecular profile and behaviour, but it is unclear whether the population’s clonal structure results in variability. Although a variety of resistance mechanisms have been documented2,9,30-32 and differences in proliferative capacity suggest heterogeneity between resistant clones8,20,22, it is unclear whether diverse resistant cell types can arise from a homogeneous initial population. We developed FateMap, a framework combining single-cell RNA sequencing (scRNA-seq), DNA barcoding and computational analysis, to follow the fates of thousands of individual cancer cell clones as they acquire resistance. Even homogeneous cells grown in identical conditions gave rise to molecularly and functionally diverse resistant types. These resistant types were predetermined by intrinsic differences between the cells before drug exposure. Transcriptional and functional diversification of resistant types were consistent across different cancers and therapies.
Diverse fates emerge upon drug treatment
We questioned whether the resistant cells that emerged from the treatment of single-cell-derived cancer cells adopted distinct fates. We focused on BRAFV600E-mutated melanoma, where the treatment of single-cell-derived cells with the targeted therapy vemurafenib leads to survival of rare (1 in 1,000 or less) cells, which proliferate to form resistant colonies (Fig. 1a and Supplementary Video 1). We performed scRNA-seq on a mixture of all the resistant colonies from a single tissue culture dish, finding that resistant types exhibited extensive diversity in their gene expression profiles (Fig. 1b and Supplementary Fig. 1). Some resistant cells expressed canonical resistance markers1,3,8 (such as AXL and SERPINE1), but several other subpopulations expressed their own distinct sets of marker genes. These subpopulations expressed multiple markers reminiscent of particular cell types, including smooth muscle (for example, ACTA2, ACTG2 and MYOCD), neural crest (NGFR, S100B and GAS7), adhesive (VCAM1, PKDCC and ITGA8), melanocytic (MLANA, SOX10 and MITF) or type-1 interferon signalling-enriched (IFIT2, DDX58 and OASL) (Fig. 1b,c and Supplementary Fig. 1). Thus, diverse resistant cell types can emerge from single-cell-derived cancer cells upon treatment with targeted therapy. Resistant cells were more transcriptionally diverse than drug-naive, non-resistant cells33 (Extended Data Fig. 1a-g and Supplementary Discussion).
Fig. 1: FateMap reveals that between-clone fate type diversity arises from a single cell upon therapy treatment.
a, Schematic of single-cell-derived WM989 A6-G3 melanoma cells exposed to the targeted therapy drug vemurafenib, which formed resistant colonies in 3–4 weeks. Colonies were mixed together and single-cell sequenced. b, Uniform manifold approximation and projection (UMAP) applied to the first 50 principal components to visualize gene expression differences. The 8,212 cells are coloured by clusters determined using Seurat’s FindClusters command (‘Seurat clusters, resolution = 0.6’). n = 1 of 2 biological replicates. c, Cells on the UMAP recoloured by the expression of a subset of differentially expressed genes. Genes with similar UMAP expression profiles are listed below each panel. ACTA2 is found largely in Seurat cluster 8; IFIT2 is found largely in cluster 12; VCAM1 is found largely in cluster 15; NGFR is found largely in cluster 7; and MLANA is found largely in clusters 0 and 3. d, Schematic of FateMap labelling of cells with unique DNA barcodes before vemurafenib exposure. WM989 A6-G3 cells were transduced with the FateMap barcode library at a multiplicity of infection (MOI) of approximately 0.15. WPRE, woodchuck hepatitis virus post-transcriptional regulatory element; EFS, elongation factor 1α short. e, Barcoded cells from d were exposed to vemurafenib for 3–4 weeks and resultant colonies were analysed by scRNA-seq and barcode sequencing. f, Testing whether resistant cells sharing a barcode (a resistant clone) are more transcriptionally similar to each other than other clones. g, Clones, irrespective of their size, are largely constrained in specific clusters. h, Quantification of preference for specific clusters across all clones (clone size > 4; representative clones in yellow). Wilcoxon test (unpaired, two sided), P < 2.2 × 10−16. SNN, shared nearest neighbour. i, RNA FISH of genes marking resistant types. Consistent with FateMap, we found resistant colonies that were selectively positive for each of the three markers tested, and others that were negative for all of these markers.
Resistant cells grow out as separate clones from individual cells amidst the initial drug-naive population (Supplementary Video 1). We wanted to know whether all cells of a clone share a single resistant type, or they comprise many types (Fig. 1d-f). Within-clone diversity implies that resistant cells can switch between types. Between-clone diversity implies that resistant clones are transcriptionally stable. To determine both the transcriptional profile and clonal origin of each cell simultaneously, we developed FateMap, a method that uses transcribed DNA barcodes (encoded in the 3′ untranslated region of the gene encoding GFP) to identify clones from thousands of resistant cells at once (Fig. 1d). First, lentiviral barcodes integrate into the DNA of therapy-naive cells. With a large barcode library complexity (around 59 million unique barcodes; Methods) and low MOI, many thousands of cells could be uniquely barcoded and enriched by sorting. We exposed barcoded cells to vemurafenib, collected resistant populations, performed scRNA-seq and extracted the FateMap clone barcode for each cell by selectively amplifying and sequencing the cDNA library in a way that linked the clone barcode and cell identifier (Fig. 1e, Supplementary Fig. 2, Supplementary Discussion and Methods).
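The reasoning behind using a low MOI can be illustrated with a short calculation. The sketch below assumes that the number of barcode integrations per cell follows a Poisson distribution with mean equal to the MOI; this is a common modelling assumption and is not stated in the text. The function names are hypothetical.

```python
import math


def fraction_transduced(moi: float) -> float:
    """Fraction of cells receiving at least one barcode (Poisson assumption)."""
    return 1.0 - math.exp(-moi)


def fraction_single_barcode(moi: float) -> float:
    """Among transduced cells, the fraction carrying exactly one barcode.

    Assumes integrations per cell are Poisson distributed with mean `moi`.
    """
    p_zero = math.exp(-moi)          # no barcode
    p_one = moi * math.exp(-moi)     # exactly one barcode
    return p_one / (1.0 - p_zero)


if __name__ == "__main__":
    moi = 0.15  # MOI reported for the FateMap library transduction
    print(f"transduced cells: {fraction_transduced(moi):.3f}")
    print(f"single-barcode cells among transduced: {fraction_single_barcode(moi):.3f}")
```

Under this assumption, an MOI of approximately 0.15 would leave most cells unlabelled, which is consistent with the need to enrich barcoded cells by sorting, while the large majority of labelled cells would carry a single barcode.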
Cells from individual resistant clones fell predominantly within constrained regions of transcriptional space, showing that variability was primarily between clones (Fig. 1f,g). One large resistant clone was enriched for genes expressed in smooth muscle (ACTA2 and MYOCD), whereas another, smaller clone was enriched for genes expressed in neural crest cells (NGFR and S100B) (Fig. 1g and Supplementary Fig. 3). Other clones were enriched for canonical resistance markers (AXL and SERPINE1) (Supplementary Fig. 1). Another subpopulation expressed the melanocyte genes SOX10 and MLANA, predominantly consisting of single barcoded cells (‘singletons’) that were largely non-proliferative (Fig. 1g and Supplementary Fig. 3; 98.6% of all clones within clusters 0 and 3 were singletons). Barcode silencing was minimal and consistent between different resistant types (Supplementary Figs. 1 and 2). Occasionally, cells in a clone belonged to two non-neighbouring clusters—for example, clusters 15 and 6 for three clones marked by VCAM1 and APOE, respectively (Supplementary Fig. 1).
Drug-naive cells also showed transcriptional constraint after 9 days in culture (approximately 4 divisions) but over time such cells span the entire transcriptomic space7, whereas therapy-resistant clones remained constrained after months in drug (Extended Data Fig. 1a-c). To quantify transcriptional homogeneity within a clone, we performed a dominant cluster analysis, showing that clones largely consisted of transcriptionally similar cells regardless of clustering resolution (Fig. 1h and Supplementary Fig. 3). Other statistical metrics27,33,34 supported this conclusion (Supplementary Figs. 3 and 4 and Supplementary Discussion).
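One plausible reading of a dominant cluster analysis is sketched below: for each clone above a size threshold, compute the fraction of its cells that fall in the clone's most common cluster. This is an illustrative assumption rather than the exact procedure, which is described in the Methods; the function name and the toy inputs are hypothetical, and the threshold mirrors the clone size > 4 cut-off in Fig. 1h.

```python
from collections import Counter, defaultdict


def dominant_cluster_fractions(barcodes, clusters, min_clone_size=5):
    """Return {barcode: fraction of the clone's cells in its most common cluster}.

    `barcodes` and `clusters` are parallel sequences with one entry per cell.
    Clones with fewer than `min_clone_size` cells are skipped.
    """
    clusters_by_clone = defaultdict(list)
    for barcode, cluster in zip(barcodes, clusters):
        clusters_by_clone[barcode].append(cluster)

    fractions = {}
    for barcode, cell_clusters in clusters_by_clone.items():
        if len(cell_clusters) < min_clone_size:
            continue
        # Count of cells in the single most frequent cluster for this clone.
        _, top_count = Counter(cell_clusters).most_common(1)[0]
        fractions[barcode] = top_count / len(cell_clusters)
    return fractions


if __name__ == "__main__":
    # Toy data: clone A (6 cells), clone B (5 cells), clone C (2 cells, skipped).
    barcodes = ["A"] * 6 + ["B"] * 5 + ["C"] * 2
    clusters = [8, 8, 8, 8, 8, 7] + [7, 7, 15, 7, 7] + [0, 3]
    for barcode, fraction in dominant_cluster_fractions(barcodes, clusters).items():
        print(barcode, round(fraction, 3))
```

A fraction near 1 indicates that a clone is confined to one cluster; repeating the calculation at several clustering resolutions would test whether the result depends on how clusters are drawn.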
We corroborated these results by performing multiplex single-molecule RNA fluorescence in situ hybridization (RNA FISH) for a subset of genes (ACTA2, NGFR and BGN) that belonged to distinct clusters (Fig. 1c and Supplementary Fig. 2) on a large plate containing several resistant clones. We verified that the selected markers were expressed only in distinct resistant clones (Fig. 1i and Supplementary Fig. 5).
To show that there was no genetic basis for the observed diversity of resistant types, we performed whole-genome sequencing (WGS) on both naive and resistant clones, finding no evidence of recurrent driver mutations (Extended Data Fig. 2). We also grew out resistant colonies from the original WM989 A6-G3 cell line as well as two subclones, WM989 A6-G3 A10 and WM989 A6-G3 A11, and performed large-scale single-molecule RNA FISH to demonstrate that resistant types emerged with similar frequencies despite having different background mutations (Supplementary Fig. 6).
Resistant types are functionally diverse
We next tested whether transcriptionally distinct resistant clones had different phenotypic properties. To measure differences in proliferation, we counted the number of cells per clone for different resistant types (Fig. 2a,b and Supplementary Fig. 6). Distinct resistant types had different proliferative capacities. Some types, such as those marked by ACTA2, AXL and VCAM1, formed large colonies (Fig. 2b), whereas those marked by NGFR (cluster 7) formed small colonies and singletons (Fig. 2b).
Fig. 2: Differences in gene expression between clones correspond to differences in morphology, proliferation and invasiveness.
a, Classification of colonies as singletons, small colonies or large colonies. Clusters exhibited different proliferative capacities. b, Colony size distributions for each fate type. Unpaired, two-sided Mann–Whitney U test; intervals based on P value thresholds. ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05. n = 1 of 2 biological replicates. c, Left, schematics based on visual inspection of morphology, orientation and density. Right, bright-field images of resistant colonies exhibiting different morphologies. d, Schematic of isolation and expansion of vemurafenib-resistant or trametinib-resistant colonies, a subset of which were then analysed by bulk RNA sequencing (RNA-seq), categorized for morphology and measured for invasiveness. e, Resistant cells were seeded at 3,000 cells per well, allowed to form spheroids over 96–120 h and then embedded in a collagen matrix. Red and cyan mark the invading and core boundary, respectively. RC, resistant colony. f, Invasiveness of resistant colonies emerging from treatment with trametinib was quantified by computing the ratio of the area enclosing the red and cyan boundaries as shown in e. Each dot represents one spheroid. g, Mapping of morphology onto the FateMap data by comparing genes differentially expressed from morphology to resistant colonies. Similarity score on UMAP represents the degree of overlap of differentially expressed genes between bulk-sequencing data and each cluster. Resistant colony fate type 2 maps predominantly to cluster 7, whereas fate type 3 maps to cluster 15. Singletons, as identified by imaging for the SOX10 gene, map to clusters 0 and 3.
Next, we tested whether different resistant clones exhibited distinct morphologies. We performed bright-field imaging of resistant colonies on the plate and identified several distinct morphologies (Fig. 2c), including colonies with cells that appeared epithelial (type 1), cells that grew slower and were more transparent (type 2), cells that grew on top of each other (type 3) and elongated cells (type 4). For a subset of the types, we were able to isolate the colony and perform multiple cycles of growth and replating in the presence of vemurafenib; these colonies retained their morphology (Fig. 2d) (although some colonies did not survive the replating process). Furthermore, a systematic longitudinal analysis of several dozen isolated and expanded resistant colonies in vemurafenib revealed that colonies retain their phenotypes, such as morphology and transcriptional makeup, over 1–2 months (Extended Data Fig. 3a-g) (18 for initial time points, 27 for late time points (4-6 weeks), and 13 paired initial–late time point colonies).
We then tested whether resistant colonies differed in invasive potential using a spheroid assay (see Methods). We manually isolated 64 therapy-resistant colonies from multiple parallel experiments and expanded them for months. We formed 3D aggregates (spheroids) from a subset of resistant colonies with sufficient cell numbers, embedded them in a collagen matrix, and quantified their invasiveness as the area enclosed by the invading boundary (red) relative to that enclosed by the embedded spheroid core boundary (cyan) (Fig. 2e,f). We found that different resistant clones had markedly different invasion areas in the collagen matrix (Fig. 2e,f), although some colonies were unable to aggregate into spheroids.
We then connected this variation in morphology and invasiveness to specific transcriptional profiles (Fig. 2d). We performed bulk RNA-seq of manually isolated colonies with known morphology and invasive potential. We identified genes that were differentially expressed between morphology types from bulk RNA-seq and used these gene sets to map morphologies to single-cell clusters from FateMap (Fig. 2g and Methods). For example, we found that the fate type 2 and fate type 3 morphology from Fig. 2c corresponded to gene expression signatures most similar to the NGFR-high cluster (cluster 7) and VCAM1-high cluster (cluster 15) (Fig. 2g and Supplementary Table 4). Similarly, several differentially expressed genes between slow and fast invading resistant colonies were connected to specific transcriptional clusters from FateMap (Extended Data Fig. 3h). The fastest invading resistant colonies were enriched for expression of genes that marked cluster 8, including ACTA2, TAGLN and EDN1. Therefore, gene expression differences between clones corresponded to functional differences in proliferation, morphology and invasiveness.
Diverse resistant types occur across cancers
We looked for diversity of resistant types in other cancer cell lines (Supplementary Discussion). Another single-cell-derived patient-derived melanoma cell line (BRAFV600E WM983B E9-C6) also showed morphological and proliferative differences between resistant clones upon vemurafenib treatment (Extended Data Fig. 4a). FateMap revealed many of the same transcriptional signatures and proliferative differences seen in WM989 A6-G3 cells (Extended Data Fig. 4b-f). We performed FateMap analysis on two NRAS-mutant melanoma lines (NRASQ61K WM3451 P2G7 and NRASQ61K WM3623 P4E7) and again found diverse types among resistant clones, indicating that this diversity is a property of melanoma cell lines regardless of driver mutation (Extended Data Figs. 4h-l and 5a-e). Furthermore, FateMap revealed extensive transcriptional diversity in clones of expanded primary human melanocytes (Extended Data Fig. 6) (also observed in ref. 1), suggesting that transcriptional diversity may be a general feature of the melanocyte lineage. FateMap applied to the triple negative breast cancer line MDA-MB-231-D4 treated with the chemotherapy drug paclitaxel7 also showed resistant clones occupying constrained regions of transcriptional space, albeit with less overall heterogeneity than in melanoma (Extended Data Fig. 7).
Resistant types emerge in patients
We questioned whether these resistant types also arose in patients, in which the microenvironment (including immune system) and spatial context are factors. We obtained tissue samples of tumours from four patients who had relapsed subsequent to treatment with targeted therapy. For two of these individuals, we also had matching tumour samples from before they underwent therapy. Multiple punch biopsies were taken from the tumour (Fig. 3a). We used GeoMx spatial transcriptomic profiling, in which multiple regions of interest (ROIs) consisting of between 73 and 1,390 cells were selected from within each punch biopsy and profiled by RNA-seq (93 regions across all samples; Fig. 3 and Supplementary Fig. 7). We found extensive variability in the expression of key resistant type markers across resistant tumour patches, suggesting that different areas of the same tumours within a patient may harbour different proportions of the resistant types identified by FateMap (Fig. 3b). There were multiple examples of adjacent regions from the same resistant tumour showing differential expression of markers from different resistant types. For example, within a single punch biopsy from patient 163, two regions showed high expression of MLANA and SOX10, whereas a third region showed high ACTA2 expression (Fig. 3b). Reassuringly, we found many patterns of co-expression (for example, ACTA2 and ACTG2, both from the smooth muscle resistant type identified by FateMap), further corroborating the existence of coherent resistant types in resistant tumours (Supplementary Fig. 11). We found similar pre-existing variability in adjacent regions in patient-matched tumour samples before treatment with targeted therapy, suggesting some degree of pre-existing heterogeneity (Fig. 3b, right and Supplementary Fig. 8). To obtain evidence for clonal structure of expression heterogeneity in the patient data, we used spatial proximity as an approximation of relatedness (Methods and Supplementary Fig. 9), estimating that the degree of imbalance required to match that observed was on the order of 20%. Our findings suggested that the regions that we captured were somewhat imbalanced but not completely clonal, which was expected given the relatively large sizes of the regions (Supplementary Fig. 9). An analysis across other datasets showed further concordance with our FateMap results (Supplementary Fig. 10). Furthermore, we found heterogeneity in immune infiltration after targeted therapy. For example, in a single punch biopsy, one region had high expression of the macrophage marker CD68 whereas nearby regions had low expression of CD68 but high levels of CD8A (Supplementary Fig. 8). Together, these results provided strong evidence for the existence of diverse resistant types in patient samples.
Fig. 3: Resistant fate types emerge following targeted therapy in patients as evidenced by spatial transcriptomic profiling.
a, Overview of all 29 punch biopsies from four patients that were sequenced using the GeoMx Digital Spatial Profiler (DSP) system for spatial transcriptomics. A total of 93 ROIs were selected for sequencing based on visual inspection of DNA (SYTO 13, blue), CD45 (red) and S100B (green) staining. Two patient samples were coupled with matched pre-treatment biopsies (marked untreated). b, Dot plots of counts per million (cpm)-transformed data for selected markers of resistant fate types as identified from in vitro FateMap. Each dot is a single ROI coloured by that region’s qualitative S100B staining level, faceted by the punch biopsy of each region. Specific punch biopsies highlighting nearby regions with different expression profiles are highlighted for two patients, both before and after treatment.
We also searched for resistant types in xenograft models by injecting WM989 A6-G3 5a3 cells35 subcutaneously into mice, applying targeted therapy treatment, collecting the resistant tumours and measuring marker expression (Supplementary Fig. 11). Large-scale scans of tumour tissue sections showed that markers for the various resistant types were present in distinct regions of the tumour sections (Supplementary Figs. 11-13).
Resistant fates predetermined by initial conditions
We questioned whether the transcriptional and phenotypic variability in therapy-resistant clones was the result of intrinsic differences in the molecular expression states of cells preceding drug exposure. Alternatively, the resistant types may be determined extrinsically—for instance, by the location and immediate neighbours of cells33. An ‘identical twin’ analysis combined with FateMap enabled us to distinguish between these possibilities.
In brief, upon uniquely barcoding cells, we allowed them to divide several times and then separated the population into two equal split populations A and B, such that most barcoded clones (over 90%) were present in each group as ‘twins’ (Fig. 4a). We then applied vemurafenib and performed FateMap on both split populations. If the resistant type of a cell were intrinsically determined, then its twin would share the same type (assuming that the intrinsic potential has enough memory to be maintained over at least a few cell divisions1,3,7,8). Pure barcode sequencing of genomic DNA (gDNA) confirmed a strong overlap (significantly larger than random) in barcodes between the populations, demonstrating that resistance potential in general was intrinsically determined (Fig. 4b,c) (we added specific amounts of known barcoded cells as standards (Methods) to enable conversion of sequencing reads to the cell numbers). Mouse xenograft studies showed a lower but still statistically significant overlap (Supplementary Fig. 14).
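The comparison of observed barcode overlap with random survival can be sketched as a simple permutation-style simulation. The code below is a minimal illustration, assuming that under the null model each split's surviving barcodes are drawn independently and uniformly from a common pool of twin barcodes. The pool size used in the usage example is a hypothetical placeholder, not a value from the text, and the function names are illustrative.

```python
import random


def simulate_random_overlap(pool_size, n_survive_a, n_survive_b, n_sim=1000, seed=0):
    """Simulate barcode overlap between two splits if survival were random.

    Each simulation draws `n_survive_a` and `n_survive_b` barcodes without
    replacement from a shared pool of `pool_size` barcodes and records how
    many barcodes appear in both draws.
    """
    rng = random.Random(seed)
    pool = range(pool_size)
    overlaps = []
    for _ in range(n_sim):
        survivors_a = set(rng.sample(pool, n_survive_a))
        survivors_b = set(rng.sample(pool, n_survive_b))
        overlaps.append(len(survivors_a & survivors_b))
    return overlaps


def empirical_p_value(simulated_overlaps, observed_overlap):
    """One-sided empirical P value with the standard +1 correction."""
    n_extreme = sum(1 for x in simulated_overlaps if x >= observed_overlap)
    return (1 + n_extreme) / (1 + len(simulated_overlaps))


if __name__ == "__main__":
    # 87 shared + 14 unique barcodes per split gives 101 survivors in each split.
    # pool_size=10_000 is a hypothetical number of barcoded twin clones.
    sims = simulate_random_overlap(pool_size=10_000, n_survive_a=101, n_survive_b=101)
    print("mean simulated overlap:", sum(sims) / len(sims))
    print("empirical P value:", empirical_p_value(sims, observed_overlap=87))
```

If survival were random, the simulated overlap would be a small fraction of the surviving barcodes, so an observed overlap far above the simulated distribution supports intrinsic determination of resistance potential.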
Fig. 4: Cells are predestined for distinct resistant fates upon exposure to therapy.
a, Schematic of FateMap twin experimental designs. We transduced WM989 A6-G3 cells (MOI ≈ 0.15) with the FateMap barcode library. After 3–4 cell divisions, we sorted the barcoded population, split the cells (into A and B), treated each with vemurafenib and performed scRNA-seq and barcode sequencing on the colonies (Supplementary Table 11). b, Unique barcode abundance is identified by gDNA sequencing in splits A and B. Those present in both splits are dark blue (87), and those present in only one (14 each) are cyan. n = 1 of 2 biological replicates. c, Top, Venn diagram of the overlap between barcode clones present in both splits (dark blue) compared with those present only in either A or B (cyan). Bottom, comparison of the observed overlap between the shared barcodes (twins) surviving across splits with random survival chance (simulated 1,000 times). d, UMAPs of representative twin clones (sharing the same barcode) across the two splits A (8,212 cells) and B (7,262 cells). The resistant twins largely end up with the same transcriptional fate type, invariant of the clone size. e, Large clones superimposed on the UMAP, with each colour representing a unique resistant clone. f, Mixing coefficient is used to calculate the pairwise transcriptional relatedness of clones (see Methods). Higher mixing coefficient corresponds to higher transcriptional relatedness of clones (perfect mixing, 1; no mixing, 0). Representative example UMAPs are provided. g, Mixing coefficient for twin clones across splits A and B is presented with representative examples on the UMAP. h, Box plots showing cumulative mixing coefficients between clones within splits A (133) and B (102) (grey), non-twin clones across A and B (66) (grey), and twin clones (12) (blue). Unpaired, two-sided Wilcoxon test; P value for non-twin compared with twin clones is 4.513 × 10−8.
We next tested whether the specific resistant type that a clone adopted was similarly predetermined by the initial state of the pre-resistant cell as opposed to external factors. That is, whether twins separated into the two split populations (thus randomizing the position in the plate and neighbouring cells) adopt similar or distinct transcriptional profiles after drug treatment. An initial inspection of the clones superimposed on the UMAP projections suggested that twins surviving therapy largely end up in the same regions of the UMAP (Fig. 4d and Supplementary Fig. 6g-j) and were more similar than non-twin clones belonging to similar type clusters (Fig. 4e).
To compare single-cell transcriptional profiles of clones across split populations, we formulated a metric that we called the ‘mixing coefficient’ (Methods), which provides a pairwise comparison of transcriptional similarity between any two clones in principal component space (notably, it is independent of any particular cluster designation). For each pair of clones (twin or non-twin), we measured, for each cell, the number of nearby neighbour cells (from the pair of clones) that were either from the same clone (self-neighbour) or from the pair clone (non-self neighbour). The mixing coefficient was the averaged fraction of non-self neighbours across all cells from the clone pair divided by the averaged fraction of self-neighbours. A mixing coefficient of 1 signifies a high degree of shared transcriptional similarity between the pair, whereas a mixing coefficient of 0 implies that the two clones are transcriptionally separated in the principal component space (see Methods). Non-twin clone pairs had low mixing coefficients both within and across the two split populations, but twin clone pairs exhibited high mixing coefficients (Fig. 4f-h and Supplementary Fig. 14). These results show that the adoption of distinct transcriptional and phenotypic types was determined by the intrinsic molecular state of the cells preceding drug exposure, and environmental factors had little to no effect on the type outcomes. Similar results in primary melanocytes and untreated WM989 A6-G3 melanoma cells showed that the intrinsic states of these cells persisted despite a change in a cell’s environment (Extended Data Figs. 1h and 6). Intrinsic predetermination of resistant type also occurred in three other melanoma cell lines (WM983B E9-C6, WM3451 P2G7 and WM3623 P4E7) and the breast cancer cell line tested earlier (MDA-MB-231-D4) (Extended Data Figs. 4, 5 and 7). The effects were less pronounced in the MDA-MB-231-D4 line (Extended Data Fig. 7).
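A mixing coefficient of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the implementation described in the Methods: it uses Euclidean distance in principal component space, a fixed number of nearest neighbours `k` (a hypothetical parameter), and takes the ratio of non-self to self neighbours so that 0 corresponds to no mixing and a value near 1 to perfect mixing, matching the interpretation given in the text. It also assumes the two clones have similar numbers of cells; unequal clone sizes would need subsampling or another correction.

```python
import numpy as np


def mixing_coefficient(pcs_a, pcs_b, k=5):
    """Ratio of non-self to self nearest neighbours for a pair of clones.

    `pcs_a` and `pcs_b` are (cells x principal components) arrays for the two
    clones. Neighbours are searched only among cells of the clone pair.
    """
    pcs = np.vstack([pcs_a, pcs_b])
    labels = np.array([0] * len(pcs_a) + [1] * len(pcs_b))

    # Pairwise Euclidean distances; a cell is never its own neighbour.
    diff = pcs[:, None, :] - pcs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbours of every cell.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    is_self = labels[neighbours] == labels[:, None]

    self_fraction = is_self.mean()        # averaged over all cells of the pair
    non_self_fraction = 1.0 - self_fraction
    if self_fraction == 0:
        return float("inf")
    return non_self_fraction / self_fraction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    separated = mixing_coefficient(rng.normal(0, 0.1, (20, 3)), rng.normal(10, 0.1, (20, 3)))
    mixed = mixing_coefficient(rng.normal(0, 1, (200, 3)), rng.normal(0, 1, (200, 3)))
    print("separated clones:", separated)
    print("well-mixed clones:", round(mixed, 2))
```

Because the calculation uses only distances between cells, it does not depend on any cluster assignment, which is the property highlighted above.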
Changing drug dose causes fate switching
Drug resistance depends heavily on the concentration of drug used. We explored how the ensemble of resistant types would change if we used a different drug concentration using FateMap across two split populations, each treated with a different concentration of drug.
We compared a lower dose of 100 nM vemurafenib with our standard dose of 1 μM (Supplementary Fig. 15); the low dose led to around 2.5-fold more resistant colonies than the high dose (Fig. 5a and Supplementary Videos 1 and 2). Resistant tumours also grew faster in mice treated with the low dose (Supplementary Figs. 13a and 15). This increase could have resulted from (1) new resistant clones in addition to those that survive high dose, (2) a completely distinct set of clones becoming resistant, or (3) the same set of clones from high dose becoming resistant, but with additional divisions of those clones (Fig. 5b). FateMap could distinguish between these possibilities by splitting the population after barcoding and placing one twin of each pair in a low-dose environment and the other in a high-dose environment. gDNA barcode sequencing and imaging analysis showed many barcodes arising only at the low dose, indicating that additional new clones became resistant in the low-dose environment (fold change 2.15 and 2.55, respectively) (Fig. 5a and Supplementary Fig. 15). Barcode overlap between two low-dose arms also showed intrinsic predetermination (Supplementary Fig. 15).
Fig. 5: Changing the therapeutic dose results in stereotypic resistant fate type switching and altered transcriptional profiles.
a, Left, resistant colonies emerging from treatment of 25,000 WM989 A6-G3 cells with two vemurafenib doses (1 μM and 100 nM). Right, total colonies from each dose across n = 3 biological replicates; error bars represent s.e.m. b, Schematic of FateMap twin experimental designs for different vemurafenib doses. We transduced WM989 A6-G3 cells with the barcode library. After 3–4 cell divisions, we sorted the barcoded population, divided it into splits A and B, treated each with vemurafenib and performed FateMap. We list three possible outcomes (cell counts are available in Supplementary Table 11). c, Combined low-dose (13,400 cells) and high-dose (9,457 cells) resistant cells obtained from UMAP applied to the first 50 principal components. Cells are coloured by clusters determined using Seurat’s FindClusters command. d, UMAP with resistant cells coloured by dose. Arrows represent regions present at only one of the doses. e, The UMAP in d is split according to dose, with colours representing clusters determined in c. Arrows represent clusters that are present at only one dose. f, UMAPs recoloured according to expression of NGFR and MLANA, markers for clusters that are enriched in one of the two doses. g, Left, UMAP cluster coloured for cluster 9 (high for NGFR). Right, pie chart showing that 25.3% of the NGFR-high clones present in high dose were also detected in the low dose. h, Left and centre, representative examples of where twins from the NGFR-high cluster following high dose are located following low dose. Fate type switch 1 twins (38 out of 46) had similar fate types, as did fate type switch 2 twins (8 out of 46). Right, cumulative density contour plot of fate type switches from high dose to low dose.
We next explored how the diversity of resistant clones changed between low and high doses of drug. Despite the extensive change in the frequency of resistance between low and high dose, many of the transcriptional types were the same between the two doses (Fig. 5c-f and Supplementary Fig. 15). However, there were many differences. Particularly, the NGFR-high cells (cluster 9) were largely missing from the low dose-resistant population of cells. Additionally, although MLANA-high cells (clusters 5,6) were present at both doses, they were in non-overlapping clusters (Fig. 5c-f).
We next addressed the fate of the NGFR-high resistant cells at low dose. We collected the barcoded clones corresponding to the NGFR-high cells in the high dose (cluster 9) and looked for their corresponding twins in the low dose (Fig. 5g), finding 46 such barcodes (25.3% of all barcodes in cluster 9) (Fig. 5g). Most of these clones (38 out of 46) adopted types within the MLANA-high cluster 6 (fate switch 1) (Fig. 5h). The remaining twins appeared to adopt a different, albeit less transcriptionally constrained, type (fate switch 2) (Fig. 5h). Several genes were differentially expressed in pairwise comparisons across the type switches (149 and 216 genes for switch 1 and 2, respectively) (Supplementary Fig. 15 and Supplementary Table 8).
Comparing the 38 clones with the remaining 8 of these 46 clones maintained with the high dose, we identified 70 genes that were differentially expressed (Supplementary Fig. 15), suggesting that subtle differences between naive cells that appeared to adopt the same NGFR-high type at high dose could lead to more obvious fate differences at low dose.
For the MLANA-high resistant type clusters, the percentage of non-singletons within the type increased strongly at low dose (4.66% in high dose to 21.6% in low dose) (Fig. 5f and Supplementary Fig. 15), indicating phenotypic differences in the type of resistant cells between high and low dose.
Metronomic therapy, in which therapy is given in discontinuous intervals, has been proposed as a means by which to decrease therapy-resistant tumour burden with mixed results36,37. We measured the number and type of resistant clones in continuous versus discontinuous dosing regimens with FateMap (Extended Data Fig. 8). Discontinuous dosing resulted in overall higher numbers of resistant cells, both from new clones and increased growth of existing clones. Clones that formed the singleton MLANA-high resistant cells in continuous dosing grew to larger colony sizes in discontinuous dosing, confirmed by time-lapse imaging (Supplementary Fig. 16).
Drug conditions affect ensemble of fates
We explored whether different MAPK inhibitors would have differential effects on resistant types (Supplementary Discussion). We performed FateMap on a population split between exposure to the BRAFV600E inhibitor vemurafenib (1 μM) and the MEK inhibitor trametinib (5 nM). Although many resistant cells from each drug treatment had similar transcriptional profiles (Extended Data Fig. 9a,b), we saw a depletion of MLANA-high cells (cluster 3) with trametinib compared to vemurafenib (Extended Data Fig. 9a,f). The number of singletons was substantially higher with vemurafenib compared to trametinib (Extended Data Fig. 9c), confirmed by imaging (Extended Data Fig. 9d,e and Supplementary Videos 1 and 3). Most of the vemurafenib cluster 3 clones did not have corresponding twins in trametinib (Extended Data Fig. 9g), suggesting that those cells were killed by trametinib as opposed to being converted to a different resistant type (Extended Data Fig. 9h,i,o). The NGFR-high cluster 4 was more populated in cells treated with trametinib compared to vemurafenib (Extended Data Fig. 9b,j,k,p,q). Twins of trametinib-treated NGFR-high cells adopted either the same type (NGFR-high) or in some cases the MLANA-high type in vemurafenib (Extended Data Fig. 9l-n). Additionally, resistant cells from dual treatment were transcriptionally indistinguishable from those obtained with trametinib alone (Extended Data Fig. 9r,s), suggesting that for the doses tested, trametinib dominated type outcomes.
We also tested the inhibition of the histone methyltransferase DOT1L for its effects on resistant types. We previously showed that pre-treatment with pinometostat, a DOT1L inhibitor, increased resistance35; FateMap applied to pinometostat-pretreated cells showed that this increase arose from new clones becoming resistant but adopting largely the same types as they did normally (Extended Data Fig. 10 and Supplementary Discussion).
Discussion
FateMap revealed extensive variability in the outcome for cells after an external cue, in this case between resistant cancer cells after treatment with targeted therapies. These outcomes are largely predetermined by molecular differences in the initial state of cells, some of which have been elucidated8. The rich mapping between the initial molecular states of cells and their outcome is strongly dependent on the external cue—different doses and drugs dictate which cells adopt what types, and hence must be specified as part of the mapping.
A central challenge for the field is to define biologically meaningful ‘clusters’ on the basis of which molecular differences are important versus those that are inconsequential for relevant biological behaviours. Methods such as ClonoCluster38 that combine clonal information with transcriptomics may help resolve such issues.
The two factors underlying cell-type determination in response to a cue are the memory of the initial state and the influence of extrinsic factors. Here, memory of the state means that twins largely adopted the same types, indicating that type was largely intrinsically determined. By contrast, a similar analysis on cardiac differentiation33 revealed that cell type was largely determined by extrinsic factors. It is also possible to have short memory but intrinsic type determination. Such cases would be difficult to discriminate because twin experiments would show little correspondence in the types of twins, even though the state of the twins before the cue still largely determines the outcome.
It is unclear how a cell’s resistant type is determined. One view is that cells have fixed regulatory programmes that lead to particular outcomes. Another view is that cells adapt to stress, leading to a wider range of outcomes, each of which may be determined by both the specific stress and the particular internal state of the cell at that time. Future work may reveal the molecular basis of this regulatory rewiring. Here we focused on characterizing resistant types of single-cell-derived cancer cells in response to anti-cancer therapies. Our work joins a growing literature on genetic and non-genetic sources of cellular heterogeneity in cancer. Cell line profiling has shown surprising levels of variability even in clonal lines, perhaps reflecting clonal memory7,13,39-41. Notably, this variability can drive a number of cancer phenotypes, including therapy resistance1,10, growth39, tumorigenicity42 and metastasis43. FateMap could reveal a diversity of emergent types in several biological processes, including stem cell reprogramming and directed differentiation33,44,45, and identify their potential origins.
Methods
Cell lines and culture
WM989 A6-G3 and WM983b E9-C6 melanoma cell lines, first described in ref. 8 and provided by the laboratory of M. Herlyn, were derived by twice single-cell bottlenecking the WM989 and WM983b melanoma cell lines, respectively. The identities of WM989 A6-G3 and WM983b E9-C6 were verified8 by DNA STR microsatellite fingerprinting at the Wistar Institute. WM989 A6-G3 5a3, first described in ref. 35, was derived by single-cell bottlenecking of WM989 A6-G3. MDA-MB-231-D4, first described in ref. 7, was derived by single-cell bottlenecking of MDA-MB-231 (ATCC HTB-26). The identity of MDA-MB-231-D4 was verified7 by ATCC human STR profiling cell line authentication services. WM3451 P2G7 and WM3623 P4E7 were derived by single-cell bottlenecking WM3451 and WM3623, respectively, both of which were provided by the laboratory of M. Herlyn and verified by ATCC human STR profiling cell line authentication services.
FOM 230-1 primary melanocytes were provided by the laboratory of M. Herlyn. In brief, they obtained foreskin tissue from the Cooperative Human Tissue Network. The foreskin was cut into pieces (approximately 5 mm × 5 mm), transferred into a tube containing dispase II, and incubated at 4 °C for 15–18 h. The next day, the epidermis was separated from the dermis and the epidermal sheets were minced as small as possible. 0.05% trypsin was added, and the minced sheets were incubated at 37 °C for 3–5 min depending on cell disaggregation. This mixture was then pipetted up and down vigorously to release single cells from the epidermal sheets. The trypsin was neutralized with soybean trypsin inhibitor and centrifuged for 5 min at 1,200 rpm at room temperature. The supernatant was aspirated to remove any remaining stratum corneum. The cell pellet was then resuspended with melanocyte growth medium.
WM989 A6-G3, WM989 A6-G3 5a3, WM983b E9-C6, WM3451 P2G7, and WM3623 P4E7 melanoma cell lines were cultured in TU2% medium (80% MCDB 153, 10% Leibovitz’s L-15, 2% FBS, 2.4 mM CaCl2, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin). MDA-MB-231 cell lines were cultured in DMEM10% (DMEM with Glutamax, 10% FBS, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin). All six cell lines were passaged using 0.05% trypsin-EDTA. FOM 230-1 melanocyte cells were cultured in Melanocyte Growth Medium (PromoCell, C-24010). Melanocytes were passaged using 0.05% trypsin-EDTA and neutralized using soybean trypsin inhibitor (Gibco, 17075-029). We periodically performed mycoplasma testing to confirm no contamination.
Flow sorting of barcoded cells
We used 0.05% trypsin-EDTA (Gibco, 25300120) to detach the barcoded cells from the plate and subsequently neutralized the trypsin with the corresponding medium depending on the cell type (TU2% for WM989, WM983B, WM3451, and WM3623; DMEM + 10% FBS for MDA-MB-231; Melanocyte Growth Medium for FOM 230-1). We then pelleted the cells, performed a wash with 1× DPBS (Invitrogen, 14190-136), and resuspended them again in 1× DPBS. Cells were sorted on a BD FACSJazz machine (BD Biosciences) or MoFlo Astrios (Beckman Coulter), gated for positive GFP signal and singlets. Sorted cells were then centrifuged to remove the supernatant medium containing PBS, and replated with the appropriate cell culture medium. Gating strategies are described in Supplementary Fig. 17.
Drug treatment experiments
We prepared stock solutions in DMSO of 4 mM vemurafenib (PLX4032, Selleck Chemicals, S1267), 10 mM pinometostat (Selleck Chemicals, S7062), 100 μM trametinib (Selleck Chemicals, S2673), and 4 mM paclitaxel (Life Technologies, P3456). We prepared small aliquots (10–15 μl) for each drug and stored them at −20 °C to minimize freeze–thaw cycles. For drug treatment experiments, we diluted the stock solutions in culture medium to a final concentration of 1 μM and 100 nM for vemurafenib; 4 μM for pinometostat; 5 nM, 10 nM and 25 nM for trametinib; and 1 nM for paclitaxel unless otherwise specified.
The dose of vemurafenib (1 μM) was chosen per ref. 1, which was optimized for growth arrest without overt cytotoxicity. The doses of trametinib used were 5 nM for WM989 A6-G3, 10 nM for WM3623 P4E7, and 25 nM for WM3451 P2G7. These doses were also chosen based on a dose curve to obtain virtually complete growth arrest.
WM989 A6-G3 and WM983b E9-C6 cells were treated with either vemurafenib or 5 nM trametinib for 3–4 weeks, and the medium was replaced every 3–4 days. Similarly, MDA-MB-231-D4 cells were treated with paclitaxel for 3–4 weeks, and the medium was replaced every 3–4 days. WM3623 P4E7 cells were treated with 10 nM trametinib for 3–4 weeks, and the medium was replaced every 3–4 days. WM3451 P2G7 cells were treated with 25 nM trametinib for 3–4 weeks, and the medium was replaced every 3–4 days. At the end of the treatment, surviving cells were trypsinized, neutralized, washed with 1× DPBS, and then either (1) pelleted and stored at −20 °C for gDNA extraction, or (2) resuspended in PBS for scRNA-seq experiments. In some cases, cells were also fixed for imaging at the end of the treatment. For pinometostat (DOT1L inhibitor) pre-treatment (before addition of vemurafenib), WM989 A6-G3 cells were treated for five days, replacing medium once at day 3. For continuous–discontinuous dosing experiments, WM989 A6-G3 cells were barcoded and plated as described above. Both arms of the experiment were treated with 1 μM vemurafenib. The continuous dose arm was maintained in 1 μM vemurafenib for the entirety of the experiment, while the discontinuous arm was maintained in 1 μM vemurafenib for 9 days before being switched to culture medium without vemurafenib. The discontinuous arm was then maintained in this drug-free culture medium for 25 days before being switched back to culture medium with 1 μM vemurafenib for the final 5 days of the experiment.
Quantifying homogeneity within a clone
To estimate the homogeneity in gene expression within a clone, we calculated the Spearman’s correlation coefficient, based on the top 500 most variable genes, for each pair of samples present in a clone. A group of random cells having the same size as the clone were selected as a control. The average correlation coefficient was compared between each clone and its paired control, using a Wilcoxon signed rank test (two-sided, paired).
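The within-clone comparison can be illustrated with a minimal sketch in plain Python. It assumes each cell is already reduced to a vector over the selected variable genes; the function names and the `expression` dictionary layout are hypothetical and are not taken from the authors' code. It returns the mean pairwise Spearman coefficient for a clone and for a size-matched random control; the paired statistical test itself is not reproduced here.

```python
import random
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant vector."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_spearman(profiles):
    """Average Spearman coefficient over all pairs of expression profiles."""
    return mean(spearman(a, b) for a, b in combinations(profiles, 2))


def clone_vs_control(expression, clone_cells, rng):
    """Return (clone score, score of a random group of the same size).

    expression: hypothetical dict mapping cell id -> expression vector.
    """
    clone = [expression[c] for c in clone_cells]
    control_ids = rng.sample(sorted(expression), len(clone_cells))
    control = [expression[c] for c in control_ids]
    return mean_pairwise_spearman(clone), mean_pairwise_spearman(control)
```

Collecting one such pair of scores per clone gives the paired values that a signed rank test would then compare.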
Cell cycle and apoptosis analysis
To test the effect of cell cycle phase on clone size distribution in each cluster, we regressed out cell cycle genes. First, the cell cycle phase scores were estimated for each cell using the CellCycleScoring function in the Seurat package for the genes involved in G2/M phase and S phase31. The cell cycle scores were then regressed out from the gene expression using the RegressOut function in the Seurat package. This function models the expression levels of each gene based on the cell cycle score. The regressed-out gene expression matrix is calculated as the residuals for this model (for each gene), and these values are used for downstream analysis such as dimensional reduction. We also calculated scores for apoptosis using AUCell46. The HALLMARK_APOPTOSIS and KEGG_APOPTOSIS gene sets were obtained from the Molecular Signatures Database (MSigDB, Broad Institute47).
Bulk RNA-seq
To quantify the phenotypic drift between early and later stages of treatment, we calculated the pairwise Euclidean distance for each pair of early and late samples for a colony. The Euclidean distance was calculated for the top 500 most variable genes in the dataset. As a paired control for each resistant colony, an equal number of random early and late samples were selected, and the Euclidean distance between them was measured. The average Euclidean distance was compared for true pairs and random pairs to estimate the extent of phenotypic drift in the colony. The comparison was statistically tested using a Wilcoxon signed rank test.
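As a rough illustration of the drift comparison, the sketch below computes the mean early-to-late Euclidean distance for one colony and for an equally sized random draw of early and late samples. It assumes each sample is a vector over the selected variable genes; all names are hypothetical, and the signed rank test applied across colonies is not included.

```python
import math
import random
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_early_late_distance(early, late):
    """Mean distance over every (early sample, late sample) pair."""
    return mean(euclidean(e, l) for e in early for l in late)


def drift_vs_random(colony_early, colony_late, all_early, all_late, rng):
    """Return (true-pair distance, random-pair distance) for one colony.

    The random control draws as many early and late samples as the colony
    has, from the pooled samples of all colonies.
    """
    true_distance = mean_early_late_distance(colony_early, colony_late)
    random_early = rng.sample(all_early, len(colony_early))
    random_late = rng.sample(all_late, len(colony_late))
    return true_distance, mean_early_late_distance(random_early, random_late)
```

A colony whose true-pair distance is smaller than its random-pair distance has drifted less than would be expected between unrelated samples.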
Clonal genes and clone identification
Differentially expressed genes for each clone (with clone size > 1) were identified using the FindAllMarkers function. Cut-offs of (foldchange) > 1 and Bonferroni-corrected P value < 0.05 were used to identify clonal genes. For identification of clones, a support vector machine model was trained and tested on clones having more than 100 cells. Clones were sectioned into training and testing groups by randomly sampling 80% of samples for the training group and classifying the rest as the testing group. The model was trained on expression levels of clonal genes. As a control, clone labels were shuffled for the training samples, and a support vector machine was trained on the randomized data.
Patient data analysis
Patient scRNA-seq data from GSE72056 (ref. 31) and a patient-derived xenograft model of WM4007 cells, derived from an American Joint Committee on Cancer stage IV melanoma of a 62-year-old male patient (never treated with any drug or immune therapy prior to surgery or biopsy) and treated with dabrafenib and trametinib (this study), were analysed to test for differential expression of genes identified in cell lines using FateMap. Samples were clustered using the FindClusters function at a resolution of 0.6 and differentially expressed genes were identified for the clusters using the FindAllMarkers function. Cut-offs of (foldchange) > 1 and Bonferroni-corrected P value < 0.05 were used to identify cluster genes. The extent of overlap in genes between each patient dataset and cell line dataset (FM01) was measured and statistically evaluated using a hypergeometric test (Fisher’s Exact Test). The extent of overlap was also compared to the extent of overlap when an equal number of genes were randomly subsampled from a list of all sequenced genes. Subsampling was repeated 100 times.
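The overlap test and the subsampling control can be sketched as follows. This is an illustrative implementation of a one-sided hypergeometric tail probability and of repeated random gene draws, not the authors' code; the function and argument names are hypothetical.

```python
import random
from math import comb


def hypergeom_overlap_pvalue(universe, size_a, size_b, overlap):
    """P(X >= overlap) when size_b genes are drawn at random from a universe
    of `universe` genes, of which size_a belong to the reference list."""
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(comb(size_a, k) * comb(universe - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return tail / total


def random_overlaps(all_genes, reference, n_genes, repeats, rng):
    """Overlap with `reference` for `repeats` random draws of n_genes genes."""
    reference = set(reference)
    return [len(reference & set(rng.sample(all_genes, n_genes)))
            for _ in range(repeats)]
```

With `repeats=100`, the second function mirrors the repeated subsampling described above; the observed overlap can then be compared against the resulting list of random overlaps.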
Size distribution of clusters
A chi-square test of independence was used to test the null hypothesis that the marker-based clusters are independent of colony size. The test was run on a contingency matrix consisting of the number of singleton (colony size = 1), small (1 < colony size < 4) and large (colony size ≥ 4) colonies for each marker-based cluster. Pearson residuals were estimated to quantify the deviation of colony size distribution from the null hypothesis.
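For readers unfamiliar with Pearson residuals, a minimal sketch of the calculation is given below. Rows stand for marker-based clusters and columns for the three colony size classes; the function names are hypothetical, and the P value (obtained from the chi-square distribution with the returned degrees of freedom) is left out.

```python
def size_category(colony_size):
    """Map a colony size to the three classes used in the contingency matrix."""
    if colony_size == 1:
        return "singleton"
    if colony_size < 4:
        return "small"
    return "large"


def chi_square_with_residuals(table):
    """Return (chi-square statistic, degrees of freedom, Pearson residuals).

    table: list of rows of observed counts. Every row and column total must
    be non-zero, otherwise an expected count of zero causes a division error.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / expected ** 0.5  # Pearson residual
            residual_row.append(r)
            chi2 += r * r
        residuals.append(residual_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, residuals
```

A positive residual marks a cluster with more colonies of that size class than independence would predict, and a negative residual marks a depletion.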
Principal component analysis of drug-resistant and naive WM989 A6-G3 melanoma cells
We used SCTransform to normalize and variance-stabilize the dataset and then performed principal component analysis (PCA) using Seurat’s RunPCA command. To get total variance for each dataset (that is, both the naive cell dataset and the drug-resistant cell dataset), we took the sum of the variance estimates per row of the SCT@scale.data matrix (where each row represented a gene). We calculated the eigenvalues by squaring the standard deviations per principal component stored by Seurat following PCA generation. To calculate the fraction of variance explained per principal component, we divided each of our eigenvalues by the total variance, using ggplot2 to plot the fraction of variance explained for each of the first 50 principal components. To estimate how much variance could be explained by pure chance, we also ran PCA on randomized data.
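The arithmetic in this paragraph is short enough to write out. The sketch below assumes the per-component standard deviations and the per-gene variances have already been exported from the analysis as plain lists; the function name is hypothetical.

```python
def fraction_variance_explained(pc_sdev, gene_variances):
    """Fraction of total variance explained by each principal component.

    pc_sdev: standard deviation of each principal component.
    gene_variances: variance of each gene (row) of the scaled data matrix.
    """
    total_variance = sum(gene_variances)
    # Eigenvalues are the squared standard deviations of the components.
    return [sdev ** 2 / total_variance for sdev in pc_sdev]
```

Because only the first 50 components are considered, the returned fractions need not sum to one.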
Cluster and Euclidean distance analysis of drug-resistant and naive WM989 A6-G3 melanoma cells
To quantify whether the resistant population exhibited greater transcriptional heterogeneity as compared to the untreated populations, we measured the Euclidean distance48-50 between clusters within each condition. We used scRNA-seq datasets from two untreated samples and two resistant samples. To control for cell numbers across datasets, we extracted the number of cells in each sample and calculated the minimum cell count of all four datasets. Since the number of cells does not vary much between the samples (minimum = 7,262, maximum = 8,420), we decided to randomly sample to the minimum of the number of cells, 7,262, and perform 10 sampling rounds. After subsampling, we applied the Seurat function SCTransform to normalize and stabilize the variance of molecular count data, and calculated the principal components. Next, we calculated the neighbourhood overlap (Jaccard index) between every pair of cells based on the first 50 principal components using the Seurat function FindNeighbors. We then applied the Seurat function FindClusters to identify cell clusters based on their shared nearest neighbour (SNN) graph. To demonstrate that our results do not depend on the chosen resolution, we clustered the cells from a resolution of 0.2 to 1 in steps of 0.1. For each resolution, we calculated the Euclidean distance between the identified clusters using the R library scperturbR48. The Euclidean distance compares the mean pairwise distance of cells across two different identified clusters to the mean pairwise distance of cells within each cluster. We used the first 50 principal components to calculate the Euclidean distance between cells. After obtaining the number of clusters and the Euclidean distance for each combination of sample and resolution, we compared the number of clusters for a given resolution and the Euclidean distances for a given number of clusters.
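One way to compare between-cluster and within-cluster distances is the energy-distance form, twice the mean between-cluster distance minus the two mean within-cluster distances. The sketch below uses that form as an assumption; whether the library cited above uses exactly this expression, or squared distances, is not stated in this section. Cells are assumed to be vectors of principal component scores, and all names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two cells in principal component space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_between(x, y):
    """Mean pairwise distance between cells of cluster x and cluster y."""
    return mean(euclidean(a, b) for a in x for b in y)


def mean_within(x):
    """Mean pairwise distance among distinct cells of a single cluster."""
    return mean(euclidean(a, b) for a, b in combinations(x, 2))


def cluster_separation(x, y):
    """Assumed form: 2 * between - within(x) - within(y)."""
    return 2 * mean_between(x, y) - mean_within(x) - mean_within(y)
```

Larger values indicate clusters that are far apart relative to their internal spread. Each cluster needs at least two cells for the within-cluster term to be defined.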
WGS and processing of naive and drug-resistant clones
Eight naive clones, one original clone, and sixteen resistant clones were sequenced at 30× depth with paired-end Illumina sequencing. FASTQs were pre-processed and aligned to hg38 based on GATK4 best practices using an open-source WGS pipeline, Sarek v.3.051. Variant calling was performed using the GATK HaplotypeCaller52.
Variant annotation of naive and drug-resistant clones
Variant files were merged and annotated using the OpenCravat tool53. With this tool, the functional consequences of variants were predicted using CADD v.1.6.154 where coding variants with scaled c-scores >15 were considered deleterious. This c-score cut-off is typical for analysis and is recommended by the authors for filtering (https://cadd.gs.washington.edu/info). Variants analysed were all in protein-coding regions of the genome. Insertion and deletion variants for which CADD scores were unavailable were included in the analysis, except for those annotated as in-frame. Additionally, variants present in less than 20% of reads in a sample were removed from analysis in an effort to filter out variants that arose in clonal expansion.
To assess the potential for acquired genetic resistance to therapy, Fisher’s exact test was performed on variants that were present in resistant clones, but not untreated clones (P < 0.05). Variants in genes implicated in resistance to vemurafenib from COSMIC (https://cosmic-blog.sanger.ac.uk/drug-resistance-data-cosmic/) were independently analysed for acquired genetic resistance. Variants in known epigenetic modifier genes55 were also separately analysed to evaluate heterogeneity. Finally, each clone’s list of CADD > 15 variants was compared to those of all other clones for non-random overlap with the hypergeometric test (P < 0.05). All gene sets and associated details are provided in Supplementary Table 9.
Clinical cohort
Patient samples in this study were from a clinical trial (NCT02231775) that was previously described56. In brief, patients aged ≥18 years with histologically proven clinical stage III or oligometastatic stage IV BRAFV600E/K melanoma deemed to be resectable by multidisciplinary consensus and with measurable disease by RECIST 1.1 criteria were enrolled. Samples were from those randomized to the experimental arm, which received 8 weeks of neoadjuvant dabrafenib (150 mg orally twice daily) plus trametinib (2 mg orally daily) before surgical resection, and who failed to achieve a major pathologic response. Pathologic responses were determined by histopathologic examination of the complete surgical specimen by a melanoma pathologist, including SOX10 immunostaining when necessary to confirm the presence or absence of viable melanoma cells. These patients were treated at The University of Texas MD Anderson Cancer Center and had tumour samples collected and analysed under Institutional Review Board (IRB)-approved protocols. Notably, these studies were conducted in accordance with the Declaration of Helsinki and approved by The UT MD Anderson Cancer Center IRB.
Tumour microarray preparation and digital spatial profiling
Formalin-fixed paraffin-embedded (FFPE) tumour tissue blocks from four BRAF/MEK inhibitor-treated melanoma patients from the above cohort were used to build a tissue microarray (TMA) block using the ATA-100 Advanced Tissue Arrayer (Chemicon International) at The University of Texas MD Anderson Cancer Center. Tissue samples included in the TMA were from pre-treatment, on-treatment or surgical resection time points. The TMA block included a total of 36 cores each measuring 1 mm in diameter. Multi-sampling of the tissue block was performed to account for intratumoural heterogeneity. TMA slides were then assayed using the Nanostring GeoMx DSP and probed with the human melanoma morphology kit (Syto13, S100B and CD45) and the human whole transcriptome atlas (WTA) on a fee-for-service basis by Nanostring Technologies, performed at the University of Texas Southwestern Medical Center. Three ROIs were selected per tumour core to capture inter- and intra-tumour heterogeneity.
NGS library preparation and sequencing for GeoMx spatial transcriptomics
GeoMx NGS libraries were prepared per manufacturer’s guidelines. In brief, after collection was completed, aspirates in the collection plate were dried down at 65 °C for 1 h in a thermal cycler with an open lid and resuspended in 10 μl of nuclease-free water. Four microlitres of rehydrated aspirates were mixed with 2 μl of 5× PCR Master Mix and 4 μl of SeqCode primers, and PCR amplification was then performed with 18 cycles. The indexed libraries were pooled equally and purified twice with 1.2× AMPure XP beads (Beckman Coulter). The final libraries were evaluated and quantified using Agilent’s High Sensitivity DNA Kit and Invitrogen’s Qubit dsDNA HS assay, respectively. Total sequencing reads per DSP collection plate were calculated based on the NanoString DSP Worksheet. The libraries were subjected to 38 bp paired-end sequencing (PE38) on an Illumina NovaSeq 6000 system with a 100-cycle S1 kit (v1.5).
RNA FISH on cells in plates
We performed single-molecule RNA FISH as previously described57. For the genes used in this study, we designed complementary oligonucleotide probe sets using custom probe design software (MATLAB) and ordered them with a primary amine group on the 3′ end from Biosearch Technologies (see Supplementary Table 3 for probe sequences). We then pooled each gene’s complementary oligonucleotides and coupled the set to Cy3 (GE Healthcare), Alexa Fluor 594 (Life Technologies) or Atto 647N (ATTO-TEC) N-hydroxysuccinimide ester dyes.
The cells were fixed as follows: we aspirated medium from the plates containing cells, washed the cells once with 1× DPBS, and then incubated the cells in the fixation buffer (3.7% formaldehyde in 1× DPBS) for 10 min at room temperature. We then aspirated the fixation buffer, washed samples twice with 1× DPBS, and added 70% ethanol before storing samples at 4 °C. For hybridization of RNA FISH probes, we rinsed samples with wash buffer (10% formamide in 2× SSC) before adding hybridization buffer (10% formamide and 10% dextran sulfate in 2× SSC) with standard concentrations of RNA FISH probes and incubating samples overnight with coverslips, in humidified containers at 37 °C. The next morning, we performed two 30-min washes at 37 °C with the wash buffer, after which we added 2× SSC with 50 ng ml−1 DAPI. We mounted the sample(s) for imaging in 2× SSC.
Immunofluorescence and imaging
For NGFR staining of fixed cells, after fixation and permeabilization, we washed the cells for 10 min with 0.1% BSA/PBS, and then stained the cells for 30 min with 1:500 anti-NGFR APC-labelled clone ME20.4 (BioLegend, 345107). We washed the cells 5 times with 0.1% BSA/PBS and followed with a final wash with PBS for 2 min at room temperature. Fresh PBS was added prior to imaging. Wells were imaged either immediately or after storage in 4 °C overnight. All conditions (wells) were fixed, permeabilized, and stained at the same time with identical settings. Wells from the same plate were all imaged consecutively in the same imaging session.
For dpERK staining of fixed cells, after fixation and permeabilization, we used primary antibodies targeting dpERK (p44/p42 ERK D12.14.4E Cell Signaling, 4370). First, we rinsed cells twice for 5 min each time with 5% BSA in PBS (5% BSA-PBS) and then incubated in the dark at room temperature for 2 h in 5% BSA-PBS 1:200 dpERK antibodies. Next, we washed the cells 5 × 5 min with 5% BSA-PBS and then incubated the cells at room temperature for 1 h in 5% BSA-PBS containing 1:500 goat anti-rabbit secondary antibody conjugated to Alexa Fluor 594 (Cell Signaling, 8889). After the secondary incubation, we washed the cells 5 × 5 min with 5% BSA-PBS containing 50 ng ml−1 DAPI and then replaced the wash buffer with fresh PBS and proceeded with imaging consecutively. All conditions (wells) were fixed, permeabilized, stained, and imaged at the same time with identical settings.
For colony counting via nuclei imaging, the cells were fixed by aspirating medium from the plates containing cells, washing the cells once with 1× DPBS, and then incubating the cells in the fixation buffer (3.7% formaldehyde in 1× DPBS) for 10 min at room temperature. We aspirated the fixation buffer, washed samples twice with 1× DPBS, and added 70% ethanol before storing samples at 4 °C. Fixed cells were stained for nuclei by incubation in 2× SSC containing 50 ng ml−1 DAPI, and each well was then imaged via a tiling scan at 10× magnification.
Barcode lentivirus library generation and diversity estimation
Barcode libraries were constructed as previously described8, and the protocol is available at https://www.protocols.io/view/barcode-plasmid-library-cloning-4hggt3w. In brief, we modified the LRG2.1T plasmid (gift from J. Shi) by removing the U6 promoter and single guide RNA scaffold. We then inserted a spacer sequence flanked by EcoRV restriction sites after the stop codon of GFP, subsequently digesting this vector backbone with EcoRV (NEB) and gel purifying the linearized vector. We ordered PAGE-purified ultramer oligonucleotides (IDT) containing 100 nucleotides with a repeating WSN pattern (W = A or T, S = G or C, N = any) surrounded by 30 nucleotides homologous to the vector insertion site (Supplementary Table 1). We subsequently used Gibson assembly followed by column purification to combine the linearized vector and barcode oligonucleotide insert. We performed nine electroporations of the column-purified plasmid into Endura electrocompetent Escherichia coli cells (Lucigen) using a Gene Pulser Xcell (Bio-Rad). We then allowed for their recovery before plating serial dilutions and seeding cultures for maxi-preparation. We incubated these cultures on a shaker at 225 rpm and 32 °C for 12–14 h, pelleted the resulting cultures by centrifugation, and used the EndoFree Plasmid Maxi Kit (Qiagen) to isolate plasmid according to the manufacturer’s protocol. Barcode insertion was verified by polymerase chain reaction (PCR) on colonies from plated serial dilutions. We pooled the plasmids from the 9 separate cultures in equal amounts by weight before packaging into lentivirus.
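The WSN degeneracy rule can be made concrete with a short sketch that generates a random barcode and checks whether a sequence obeys the pattern. It assumes the pattern starts with W at the first position and is simply truncated at the requested length (100 is not a multiple of 3); that phase, and the function names, are illustrative assumptions.

```python
import random

# Bases allowed at each position class of the repeating WSN pattern.
ALLOWED = {"W": "AT", "S": "GC", "N": "ACGT"}
PATTERN = "WSN"


def random_wsn_barcode(length, rng):
    """Draw a random sequence following the repeating WSN pattern."""
    return "".join(rng.choice(ALLOWED[PATTERN[i % 3]]) for i in range(length))


def matches_wsn(sequence):
    """True if every base is allowed at its position in the WSN pattern."""
    return all(base in ALLOWED[PATTERN[i % 3]]
               for i, base in enumerate(sequence))
```

For example, `random_wsn_barcode(100, random.Random(1))` yields a 100-nucleotide sequence for which `matches_wsn` returns `True`.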
To estimate the barcode library complexity, we performed three independent transductions (see below for details) on WM989 A6-G3 melanoma cell lines, extracted gDNA, sequenced the barcodes, and noted the total and overlapping barcodes between pairs of three independent transductions. We estimated the barcode library complexity with the equation used in mark and capture analysis: $N = \frac{M \times C}{R}$, where $R$ is the number of recaptured barcodes that were marked, $C$ is the number of barcodes captured in the second pool, $M$ is the number of barcodes marked in the first pool and $N$ is the estimated barcode library complexity. Using this formula, we found the barcode diversity from three transductions to be 48.9, 54.4 and 63.3 million barcodes (Supplementary Fig. 1).
Lentivirus packaging and transduction
We adapted previously described protocols to package lentivirus8,35. We first grew HEK293FT to near confluency (80-95%) in 10-cm plates in DMEM containing 10% FBS and 50 U ml−1 penicillin, and 50 μg ml−1 streptomycin, and one day before plasmid transfection, we changed the medium in HEK293FT cells to DMEM containing 10% FBS without antibiotics. For each 10-cm plate, we added 80 μl of polyethylenimine (Polysciences, 23966) to 500 μl of Opti-MEM (Thermo Fisher Scientific, 31985062), separately combining 5 μg of VSVG and 7.5 μg of pPAX2 and 7.35 μg of the barcode plasmid library in 500 μl of Opti-MEM. We then incubated both solutions separately at room temperature for 5 min. We then mixed both solutions together by vortexing and incubated the combined plasmid–polyethylenimine solution at room temperature for 15 min. We added 1.09 ml of the combined plasmid–polyethylenimine solution dropwise to each 10-cm dish. After 6–7 h, we aspirated the medium from the cells, washed the cells with 1× DPBS, and added fresh TU2% medium. The next morning, we aspirated the medium, and added fresh TU2% medium. Approximately 9–11 h later, we transferred the virus-laden medium to an empty, sterile 50-ml tube and stored it at 4 °C, and added fresh TU2% medium to each plate. We continued to collect the virus-laden medium every 9–11 h for the next ~30 h in the same 50-ml tube, and stored the collected medium at 4 °C. Upon final collection, we filtered the virus-laden medium through a 0.45-μm PES filter (MilliporeSigma SE1M003M00) and stored 1.5-ml aliquots in cryovials at −80 °C.
To transduce WM989 A6-G3, WM983b E9-C6, WM3451 P2G7, WM3623 P4E7, FOM 230-1 and MDA-MB-231-D4 cells, we freshly thawed virus-laden medium on ice, added it to dissociated cells, and plated ~100,000 cells per well in a 6-well plate with ~3 ml of the medium. We then centrifuged the 6-well plate at 1,750 rpm (517g) for 25 min. We then incubated the 6-well plate at 37 °C and replaced the medium at ~8 h, washed with 1× DPBS, and added fresh medium (TU2% for WM989, WM983B, WM3451 and WM3623; Melanocyte Growth Medium for FOM 230-1; and DMEM with 10% FBS for MDA-MB-231) to each well. After ~24 h, we passaged the cells to 10-cm dishes, at which point we typically combined 2 wells by plating them together in a 10-cm dish. For the FateMap experiments with WM989 A6-G3 melanoma cells exposed to vemurafenib, we planned to start each split with 600,000 barcoded (GFP-positive) cells. The barcoded (GFP-positive) cells were then sorted and plated for a total of 4–5 population doublings until treatment with appropriate drugs. The time to 4–5 population doublings was 11–12 days for FOM 230-1, 10–11 days for WM989, WM3451 and WM3623, 6–7 days for WM983B and 5–6 days for MDA-MB-231. The volume of the virus-laden medium was decided by the titre performed on each cell line and target MOI. For scRNA-seq experiments in particular, we targeted an MOI of ~10–25% to minimize the fraction of cells with multiple unique barcodes. We found it to be relatively computationally challenging to differentiate multiple-barcoded cells from doublets introduced by gel beads-in-emulsions.
scRNA-seq
We used the 10X Genomics scRNA-seq kit v3 to sequence barcoded cells. We resuspended the cells (targeting ~10,000 cells for recovery per sample) in PBS and followed the protocol for the Chromium Next GEM Single Cell 3′ Reagent Kits v3.1 as per manufacturer directions (10X Genomics). In brief, we generated gel beads-in-emulsion (GEMs) using the 10X Chromium system, and subsequently extracted and amplified (11 cycles) barcoded cDNA as per post-GEM RT-cleanup instructions. We then used a fraction of this amplified cDNA (25%) and proceeded with fragmentation, end-repair, poly A-tailing, adapter ligation, and 10X sample indexing per the manufacturer’s protocol. We quantified libraries using the High Sensitivity dsDNA kit (Thermo Fisher Q32854) on a Qubit 2.0 Fluorometer (Thermo Fisher Q32866) and by Bioanalyzer 2100 (Agilent G2939BA) analysis prior to sequencing on a NextSeq 500 machine (Illumina) using 28 cycles for read 1, 55 cycles for read 2, and 8 cycles for i7 index. For a subset of FateMap sequencing runs (NRAS melanoma samples and metronomic therapy experiments), we used a NextSeq 2000 (Illumina) with 26 cycles for read 1, 124 cycles for read 2, and 8 cycles for i7 index.
Computational analyses of scRNA-seq expression data
We adapted the cellranger v3.0.2 by 10X Genomics into our custom pipeline to map and align the reads from NextSeq sequencing run(s). In brief, we downloaded the bcl counts and used cellranger mkfastq to demultiplex raw base call files into library-specific FASTQ files. We aligned the FASTQ files to the hg19 human reference genome and extracted gene expression count matrices using cellranger count, while also filtering and correcting cell identifiers and unique molecular identifiers (UMI) with default settings.
We then performed the downstream single-cell expression analysis in Seurat v3. Within each experimental sample, we removed genes that were present in fewer than three cells, as well as cells with 200 or fewer detected genes. We also filtered on mitochondrial gene fraction, with a threshold that depended on the cell type. For non-identically treated samples, we integrated them using scanorama58, which may work better to integrate non-similar datasets and avoid over-clustering. For samples that were exposed to identical treatment, we normalized using SCTransform59 and integrated the samples according to the Satija laboratory’s integration workflow (https://satijalab.org/seurat/articles/integration_introduction.html). Using scanorama on identically treated samples produced qualitatively similar results (Supplementary Fig. 6).
For each experiment, we used these integrated datasets to generate data dimensionality reductions by PCA and UMAP, using 50 principal components for UMAP generation. For a majority of analyses, we worked with the principal component space and normalized expression counts. For rare cases where we used Seurat UMAP clusters, we tested a range of resolutions with Seurat’s FindClusters command. Our conclusions did not change qualitatively when we tested resolutions between 0.4 and 1.2 (Supplementary Fig. 3). Details for all FateMap experiments, including total cell numbers and total barcoded cells per sample, are provided in Supplementary Table 11.
Bulk sequencing and analysis
We conducted standard bulk paired-end (37:8:8:38) RNA-seq using RNeasy Micro (Qiagen 74004) for RNA extraction, NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB E7490L), NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB E7770L), NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) oligonucleotides (NEB E7600S), and an Illumina NextSeq 550 75 cycle high-output kit (Illumina 20024906), as previously described1,60. Prior to extraction and library preparation, the samples were randomized to avoid any experimental and human biases. As previously described, we aligned RNA-seq reads to the human genome (hg19) with STAR v2.5.2a and counted uniquely mapping reads with HTSeq v0.6.11,60,61, which outputs the count matrix. The count matrix was used to obtain transcripts per million and other normalized values for each gene using custom scripts.
To compare bulk-sequencing data with scRNA-seq datasets, we first extracted differentially expressed genes for each scRNA-seq cluster (SNN = 0.6) using the Seurat command FindAllMarkers, filtering for adjusted P value < 0.05 and avg_logFC > 1. Similarly, we extracted differentially expressed genes for each condition of interest (morphology or invasiveness), filtering for avg_logFC < −1.5 or avg_logFC > 1.5. We then calculated the similarity score, which represents the normalized fraction of overlap of differentially expressed genes for the condition of interest between bulk-sequencing data and each scRNA-seq cluster.
Expanded resistant colonies morphology categorization
The resistant colonies were manually binned into one of three categories based on the morphology images taken with the Nikon TS2-FL microscope: ‘small’, ‘on top’ and ‘not on top’. Those that were difficult to bin into any category were labelled as ‘uncategorized’. Of the three categories, the ‘small’ category was the easiest to identify and label manually owing to its characteristic optical and proliferation (slow-growing) properties. The other two categories (‘on top’ and ‘not on top’) had further sets of morphological and proliferative differences, but these were difficult to parse into specific categories. Of the 64 resistant colonies isolated and expanded across therapy treatments of vemurafenib and trametinib, five were uncategorized. The differentially upregulated genes for ‘small’ and ‘on top’ are provided in Supplementary Table 4. For the category ‘not on top’, only four genes were differentially upregulated, which precluded further analysis. Some resistant colonies did not survive the expansion process.
Nearest neighbour analysis
We developed a quantifiable approach to measure the gene expression relatedness of different barcoded clones. For each pair of barcoded clones, we calculated the nearest neighbours for each cell in the 50-dimensional principal component space. We then classified the neighbours as ‘self’ if the neighbours are from the same barcode clone or ‘non-self’ if they belong to the other barcode clone. We defined a quantifiable metric, the mixing coefficient, as follows:
A mixing coefficient of 1 would indicate perfect mixing such that each cell has the same number of self and non-self neighbours. A mixing coefficient of 0 would indicate that there is no mixing and that each cell within a barcoded clone lies far away from the other barcoded clone in the principal component space. The higher the mixing coefficient, the higher the transcriptional relatedness of the barcoded clones analysed. As the number of nearest neighbours depends on the size (number of cells) of a clone, we performed this analysis between cells of similar clone size (Supplementary Fig. 14). Within a specified size range, we normalized the number of neighbours per barcode clone to account for small size differences. The number of neighbours to extract was chosen to be a minimum of 10 or the size of the smaller of the two barcode clones.
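The neighbour-counting step can be illustrated with a minimal sketch. The expression for the mixing coefficient is not reproduced in the text above, so the sketch assumes the ratio of non-self to self neighbours pooled over all cells of the two clones, which matches the two limiting cases described (1 when self and non-self neighbours are equal in number, 0 when there are no non-self neighbours). The function name and input format are hypothetical, and the normalization for small differences in clone size is omitted.

```python
import math


def mixing_coefficient(clone_a, clone_b, k=10):
    """Assumed form: (non-self neighbours) / (self neighbours), pooled over cells.

    clone_a, clone_b: lists of coordinate tuples, e.g. principal-component scores.
    """
    # Number of neighbours: the minimum of k and the size of the smaller clone.
    k = min(k, len(clone_a), len(clone_b))
    cells = [(p, 0) for p in clone_a] + [(p, 1) for p in clone_b]
    self_n = 0
    non_self_n = 0
    for i, (point, label) in enumerate(cells):
        # Distances from this cell to every other cell of either clone.
        others = [(math.dist(point, q), lab)
                  for j, (q, lab) in enumerate(cells) if j != i]
        others.sort(key=lambda t: t[0])
        for _, lab in others[:k]:
            if lab == label:
                self_n += 1
            else:
                non_self_n += 1
    # If no cell has a self neighbour, the assumed ratio is undefined; report infinity.
    return non_self_n / self_n if self_n else float("inf")
```

Under this assumed ratio, two clones that occupy distant regions of the principal component space give a value of 0, and values grow as the clones interleave.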
Barcode recovery from scRNA-seq data
As the barcodes are transcribed, we extracted the barcode information from the amplified cDNA from the 10X Genomics V3 chemistry protocol (step 2). We ran a PCR side reaction with one primer that targets the 3′ UTR of GFP and another that targets a region introduced by the amplification step within the V3 chemistry of 10X Genomics (read 1). The two primers amplify both the 10X cell-identifying sequence and the 100-bp barcode that we introduced lentivirally. The number of cycles, typically 12–15, was decided by the Ct value from a quantitative PCR reaction (New England Biolabs M0543) for the specified cDNA concentration. The thermal cycler (Veriti 4375305) was set to the following settings: 98 °C for 30 s, followed by the chosen number of cycles of 98 °C for 10 s and then 65 °C for 2 min and, finally, 65 °C for 5 min. Upon completion of the PCR reaction, we immediately performed a 0.7× bead purification (Beckman Coulter B23318) followed by final elution in nuclease-free water. Purified libraries were quantified with the High Sensitivity dsDNA kit (Thermo Fisher) on a Qubit Fluorometer (Thermo Fisher), pooled, and sequenced on a NextSeq 500. We sequenced 26 cycles for read 1, which gives the 10X cell-identifying sequence and UMI; 124 cycles for read 2, which gives the barcode sequence; and 8 cycles for index i7 to demultiplex pooled samples. The primers used are provided in Supplementary Table 2.
Experiments in mice
For each experiment, WM989 cells were uniquely barcoded with protocols as described above and allowed to divide 4–5 times before splitting the barcoded pool into 5 groups, each containing an equal number of cells. We aimed for ~1–1.5 million WM989 cells to be injected per mouse. All animal experiments were performed in accordance with institutional and national guidelines and regulations. The protocols have been approved by the Wistar IACUC. WM989 cells in serum-free RPMI 1640 medium (Corning 10-40-CM) were mixed in a 1:1 ratio with Growth Factor Reduced Matrigel (Corning 354230), then were subcutaneously implanted into the flanks of NSG mice. Once tumours reached about 100 mm³ per calliper measurement, animals were randomized into treatment groups. Treatment consisted of Low Dose 41.7 mg PLX4720 per kg diet (Research Diets D21051202i), or High Dose 417 mg PLX4720 per kg diet (Research Diets D21051201i), to which they had constant access. PLX4720 is closely related both structurally and biologically to PLX4032, which was used for in vitro experiments, and also targets the same molecule and BRAF-V600E structural configuration. As PLX4720 continues to be the drug used in mouse xenograft models and has a similar half-life to PLX403235,62, we used PLX4720 instead of PLX4032 for in vivo experiments. Tumour size was measured with callipers every 2–4 days, and tumour volumes were calculated according to the equation $0.5 \times L \times W^{2}$, where $L$ is the longest side and $W$ is a line perpendicular to $L$. Mice were euthanized once tumours reached 1,500 mm³, and once one mouse reached the endpoint, all mice from the same barcode pool were euthanized regardless of tumour size. The tumour tissue was snap frozen in liquid N2 for gDNA extraction. We performed five biological replicate experiments. We could not extract sufficient gDNA from experiments 3 and 4, and these were excluded from barcode split population analysis.
Computational analyses of barcoded single-cell datasets
The barcodes from the side reaction of single-cell cDNA libraries were recovered by developing custom shell, R and Python scripts (see Code availability). In brief, we scan through each read searching for sequences complementary to the side reaction library preparation primers, filtering out reads that lack the GFP barcode sequence, have too many repeated nucleotides, or do not meet a phred score cut-off. Since small differences in otherwise identical barcodes can be introduced due to sequencing and/or PCR errors, we merged highly similar barcode sequences using STARCODE software63, available at https://github.com/gui11aume/starcode. For varying lengths of barcodes (30, 40 or 50, see the pipeline guide provided) depending on the initial distribution of Levenshtein distance of non-merged barcodes, we merged sequences with Levenshtein distance ≤8, summed the counts, and kept only the most abundant barcode sequence. The decision to use a Levenshtein distance ≤8 was reached by systematically analysing the difference between experimentally observed mean Levenshtein distance with the theoretically provided mean Levenshtein distance for a pair of barcodes. We then compared various Levenshtein distances and found that a Levenshtein distance ≤8 resulted in the least difference between observed and expected mean distances between barcodes. Results from this analysis are provided in Supplementary Fig. 2.
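The merging step can be illustrated with a small sketch. This is not the STARCODE algorithm, which uses its own clustering strategy; it is a simplified greedy stand-in that only shows the idea of collapsing sequences within a Levenshtein distance cut-off onto the most abundant sequence and summing their counts. The function names and the input dictionary are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def merge_barcodes(counts, max_dist=8):
    """Greedy merge: visit barcodes from most to least abundant and fold each
    into the first already-kept barcode within max_dist, summing the counts."""
    merged = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for parent in merged:
            if levenshtein(seq, parent) <= max_dist:
                merged[parent] += n
                break
        else:
            merged[seq] = n
    return merged
```

Because the greedy pass compares every barcode against every kept barcode, it scales poorly and is only suitable for illustrating the logic on small inputs.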
For the next processing steps and downstream analysis, we first filtered out all barcodes supported by fewer unique molecular identifiers (UMIs) than a minimum cut-off (dependent on sequencing depth). We next removed all barcodes where one 10X cell-identifying sequence was associated with more than one unique barcode. This could result either from multiplets introduced within gel beads-in-emulsions or from the same cell receiving multiple barcodes during lentiviral transduction. After these two filtering steps, we were able to recover barcodes associated with 50–60% of single cells, which were then used for the downstream clone-resolved analysis.
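The two filtering steps can be sketched as follows. The input structure (a dictionary mapping each pair of cell identifier and barcode to its UMI count) and the function name are assumptions made for illustration, and the cut-off is a free parameter because it depends on sequencing depth.

```python
from collections import defaultdict


def filter_barcodes(umi_counts, min_umi):
    """umi_counts maps (cell_id, barcode) -> number of supporting UMIs.

    Returns a dict of cell_id -> barcode for cells left with exactly one barcode.
    """
    # Step 1: drop cell-barcode pairs supported by too few UMIs.
    kept = {pair: n for pair, n in umi_counts.items() if n >= min_umi}

    # Step 2: drop cells still associated with more than one unique barcode.
    per_cell = defaultdict(set)
    for cell_id, barcode in kept:
        per_cell[cell_id].add(barcode)
    return {cell: next(iter(bcs)) for cell, bcs in per_cell.items() if len(bcs) == 1}
```

Applying the UMI cut-off first means that a cell with one well-supported barcode and one weakly supported barcode is retained rather than discarded as a multiplet.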
Barcode library preparation and sequencing from gDNA
We prepared barcode libraries from gDNA as previously described8. In brief, we isolated gDNA from barcoded cells using the QIAamp DNA Mini Kit (Qiagen, 51304) per the manufacturer’s protocol. Extracted gDNA was stored as a pellet at −20 °C for days to weeks before the next step. We then performed targeted amplification of the barcode using custom primers containing Illumina adapter sequences, unique sample indices, variable-length staggered bases, and a ‘UMI’ consisting of 6 random nucleotides (NHNNNN). As reported in ref. 8, this UMI does not uniquely tag barcode DNA molecules, but nevertheless appeared to increase reproducibility and normalize raw read counts. We determined the number of amplification cycles (N) by initially performing a separate quantitative PCR and selecting the number of cycles needed to achieve one-third of the maximum fluorescence intensity for serial dilutions of gDNA. The thermal cycler (Veriti 4375786) was set to the following settings: 98 °C for 30 s, followed by N cycles of 98 °C for 10 s and then 65 °C for 40 s and, finally, 65 °C for 5 min. Upon completion of the PCR reaction, we immediately performed a 0.7× bead purification (Beckman Coulter B23318), followed by final elution in nuclease-free water. Purified libraries were quantified with a High Sensitivity dsDNA kit (Thermo Fisher) on a Qubit Fluorometer (Thermo Fisher), pooled, and sequenced on a NextSeq 500 using 150 cycles for read 1 and 8 cycles for each index (i5 and i7). The primers used are provided in Supplementary Table 6.
Analyses of sequenced barcodes from gDNA
The barcode libraries from gDNA sequencing data were analysed as previously described8, with the custom barcode analysis pipeline (see Code availability). In brief, this pipeline searches for barcode sequences that satisfy a minimum phred score and a minimum length. Note that we count the total number of UMIs as described in ‘Barcode library preparation and sequencing from gDNA’. These UMIs do not necessarily tag unique barcode DNA molecules, but empirically they slightly improve correlation in barcode abundance among replicate libraries8. We also use STARCODE63, available at https://github.com/gui11aume/starcode, to merge sequences with Levenshtein distance ≤8 and add the counts across collapsed (merged) barcode sequences.
In this current work, we also created two subclones (D8 and F8) of WM989 A6-G3, with each clone carrying a unique barcode sequence (Supplementary Table 7). We used these two clones as standards to convert sequencing counts into actual cell numbers which substantially reduces the PCR and cell number bias across samples. We spiked in a known number of cells from each of the two barcoded clones to each cell pellet before gDNA extraction and sequencing. We then used linear regression (on (0, 0), (count_F8, cells_F8), (count_D8, cells_D8)) to get the conversion factor from read counts of all barcodes to their actual cell numbers. We used a minimum cell count and fold change between pairs of conditions to annotate clones as condition-dependent or condition-independent. We found that changing the cut-off for minimum cell count did not affect our conclusions (Supplementary Fig. 15).
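The spike-in conversion can be sketched as an ordinary least-squares fit through the three points named above. Whether the original analysis retained the fitted intercept or used only the slope as the conversion factor is not stated, so this sketch keeps both; the function names and the example numbers in the usage note are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def counts_to_cells(read_counts, count_f8, cells_f8, count_d8, cells_d8):
    """Convert barcode read counts to estimated cell numbers using the
    origin and the two spike-in standards (F8 and D8) as calibration points."""
    slope, intercept = fit_line([(0, 0), (count_f8, cells_f8), (count_d8, cells_d8)])
    return {bc: slope * c + intercept for bc, c in read_counts.items()}
```

For instance, if the F8 standard gave 100 reads for 1,000 spiked-in cells and the D8 standard gave 200 reads for 2,000 cells, a barcode with 50 reads would be converted to roughly 500 cells.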
Simulation for barcode overlap
We adapted a previously described computational model that simulates all steps of our experiments designed to compare barcode overlap in resistant colonies64. The model simulates cell seeding and infection. Each cell is represented as an independent object. The number of barcoded cells was calculated as
where the MOI was estimated for our barcode lentivirus. Barcodes were represented by integer numbers from among 50 million variants of unique barcodes estimated from our lentiviral library diversity (see Methods and Supplementary Fig. 1). The subset of barcoded cells was assigned barcodes randomly with replacement from this library. The model simulates expanding cells prior to addition of the drug. Each cell, regardless of barcode status, undergoes a cell division procedure with 4–5 rounds depending on the experimental condition. In each round, a given cell will give rise to a number of progeny sharing the same barcode based on an estimated distribution of cell division. The model plates cells onto separate dishes or splits (total dishes or splits dependent on the experiment) by randomly assigning each cell an integer. The model simulates the formation of resistant colonies assuming a purely stochastic model of resistance. A defined fraction of cells on each plate form resistant colonies based on a resistance efficiency that was calculated as
based on experimental observations. Additionally, each cell forming a resistant colony is subject to a probabilistic material loss at different stages of the in silico experiment, including cell culture (5% both in vivo and in vitro), gDNA extraction (0% in vitro and 10% in vivo mouse), DNA sequencing library preparation (5% both in vivo and in vitro), and (as needed) mouse injection (15%) and tumour extraction (15%). The output of the model was the number of barcodes shared between different plates or barcode overlap. This was not corrected for cells having more than one lentiviral barcode due to multiple integrations for a given MOI. We performed 1,000 and 200 independent simulations for in vitro and in vivo experiments, respectively to obtain a distribution of barcode overlap values to determine the probability of obtaining our observed barcode overlap from our experiments by random chance. This model was written and executed in R.
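The overall logic of the simulation can be sketched in a few lines. The original model was written in R; the sketch below is a simplified Python illustration, not a port of that model. All parameter names are hypothetical, the barcoded fraction and resistance probability are passed in directly rather than computed, every cell is assumed to double in each round (instead of drawing from an estimated division distribution), and all material-loss stages are collapsed into a single loss probability.

```python
import random


def simulate_overlap(n_cells, frac_barcoded, library_size, n_divisions,
                     n_plates, p_resist, p_loss, rng):
    """Number of barcodes shared by plates 0 and 1 under purely stochastic resistance."""
    # Infection: barcodes are drawn with replacement from the library.
    # Unbarcoded cells cannot contribute to overlap and are not tracked.
    n_barcoded = int(n_cells * frac_barcoded)
    cells = [rng.randrange(library_size) for _ in range(n_barcoded)]

    # Expansion before drug: every cell doubles in each round (simplification).
    for _ in range(n_divisions):
        cells = cells + cells

    # Plating, resistance and material loss, applied independently per cell.
    plates = [set() for _ in range(n_plates)]
    for barcode in cells:
        plate = rng.randrange(n_plates)
        if rng.random() < p_resist and rng.random() >= p_loss:
            plates[plate].add(barcode)
    return len(plates[0] & plates[1])
```

Repeating the call many times with independent random draws gives a distribution of overlap values expected by chance, against which an observed overlap can be compared.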
Tissue sectioning and RNA FISH
We adapted the protocol described previously65. Tumour tissue extracted from mice subcutaneously injected with WM989 A6-G3 5a335 was mounted in Tissue-Plus OCT compound (Fisher Healthcare), flash-frozen in liquid nitrogen, wrapped in aluminium foil, and then stored at −80 °C. Tissues were cryosectioned at 6 or 8 μm using a Leica CM1950 cryostat within the Center for Musculoskeletal Disorders (PCMD) Histology Core. We adhered tissue samples to positively charged Superfrost Plus slides (Fisher Scientific). We then washed slides in PBS, fixed them in 4% formaldehyde for 10 min at room temperature, then washed them two times in PBS. Fixed slides were stored in 70% ethanol in LockMailer microscope slide jars at 4 °C.
For RNA FISH on tissue sections, we placed the slide in the wash buffer (2× SSC, 10% formamide) and allowed it to equilibrate for 2–3 min. We then removed slides from the wash buffer and dried off the slides with kimwipes. Immediately after drying, we added 500–1,000 μl of 8% SDS to the tissue section on the slide for 1 min. After 1 min, we turned the slide on its side to remove the SDS, transferred the slide to the wash buffer, and kept it in the wash buffer for ~2 min. We then tapped off the wash buffer on a kimwipe or paper towel, added 50 μl of probe-containing hybridization buffer (10% dextran sulfate, 2× SSC, 10% formamide) as a drop in the centre of the tissue sample, and placed a coverslip on top of the tissue section. We then placed the slide into a humidifying chamber to prevent the slide from drying, and placed the chamber containing slides in a 37 °C incubator overnight. The next day, we brought the chamber to room temperature, placed the slide with coverslip into a wash buffer container and let the coverslip come off. We then transferred the slides to a container or LockMailer jar containing wash buffer and incubated for 30 min at 37 °C. We removed it from the 37 °C incubator and performed a second incubation with wash buffer and DAPI, and put it back at 37 °C for another 30 min. We performed 1 final wash in wash buffer, rinsed 2 times in 2× SSC, and added 50–100 μl of 2× SSC to the tissue section. We then placed a coverslip on the tissue, sealed it with nail varnish, and let it dry before imaging. We used clampFISH data on tissue sections (Supplementary Figs. 12 and 13) from another study, and detailed methods on clampFISH protocols are provided in ref. 29.
Spheroid assay
We adapted the protocol described previously66. Tissue culture–treated 96-well plates were coated with 50 μl 1.5% Difco Agar Noble (Becton Dickinson). Melanoma cells were seeded at 3,000 cells per well and allowed to form spheroids over 96 to 120 h. Spheroids were collected and embedded using collagen type I (GIBCO, A1048301). The collagen plug was prepared as 300 μl mix per layer, and two layers were added into each well (1× Eagle Minimum Essential Medium (EMEM; 12-684, Lonza); 10% FCS; 1× l-glutamine; 1.0 mg ml−1 collagen I; NaHCO3 (17-613E, Lonza), diluted in PBS as required). The first layer was added to each well and allowed to solidify. After 5 to 10 min, spheroids were mixed with the remaining 300 μl mix and added to the well to solidify. Once the plug was solidified, medium was added to the well and incubated at 37 °C at 5% CO2 and imaged after 24 h and 48 h. Spheroid images were acquired on a Nikon Ti2E inverted microscope. Quantitation of invasive surface area was performed using NIS Elements Advanced Research software. Of the 64 resistant colonies we expanded, only 24 colonies had enough cells to form multiple spheroids per colony. Of these 24, 6 did not form spheroids and 2 only had one spheroid each. Of the remaining 16 colonies, 8 colonies belonged to resistant colonies emerging from trametinib, 3 belonged to 1 μM vemurafenib and 5 belonged to 250 nM vemurafenib. The differentially upregulated genes for ‘fast invading’ and ‘slow invading’ are provided in Supplementary Table 5.
Imaging
To image RNA FISH and nuclei signal, we used a Nikon TI-E inverted fluorescence microscope equipped with a SOLA SE U-nIR light engine (Lumencor), a Hamamatsu ORCA-Flash 4.0 V3 sCMOS camera, and 4× Plan-Fluor DL 4XF (Nikon MRH20041/MRH20045), 10× Plan-Fluor 10×/0.30 (Nikon MRH10101) and 60× Plan-Apo λ (MRD01605) objectives. We used the following filter sets to acquire different fluorescence channels: 31000v2 (Chroma) for DAPI, 41028 (Chroma) for Atto 488, SP102v1 (Chroma) for Cy3, 17 SP104v2 (Chroma) for Atto 647N, and a custom filter set for Alexa 594. We tuned the exposure times depending on the dyes used (Cy3, Atto 647N, Alexa 594, and DAPI). For large tiled scans, we used a Nikon Perfect Focus system to maintain focus across the imaging area. For imaging RNA FISH signals in tissue sections, we acquired z-stacks (three positions) at 60× magnification, and used maximum intensity projection to visualize the signal. For bright-field imaging of resistant colonies, we used a Nikon Eclipse TS2-FL with an Imagingsource DFK 33UX252 camera and 4× Plan-Fluor 4×/0.13 (Nikon MRH20041) objective. For time-lapse imaging of the emergence of drug-resistant colonies, we used an IncuCyte S3 Live Cell Imaging Analysis System (Sartorius) with a 4× objective on WM989 A6-G3 tagged with an mCherry nuclear reporter (H2B–mCherry).
Image processing
For colony counting, all image processing was done blind to the condition (either drug type or dose, or with or without DOT1L inhibition). The wells (within the 6-well plates) were pseudo-named in a format independent from drug or dose. Nikon-generated nd2 files were first parsed using custom MATLAB scripts (rajlabformattools) to convert them from nd2 format to tiff format (see Code availability). Images for each well were then stitched using custom MATLAB code and the number of cells in each well was counted using custom MATLAB code with a Gaussian filter consistent across samples being compared (colonycounting_v2). Colonies within each well were manually segmented and MATLAB was used to calculate the total number of colonies, cells per colony, and cells outside of colonies (see Code availability). The summary average counts for each colony are provided in Supplementary Table 10. RNA FISH and clampFISH on tissue sections were quantified using a custom-built computational pipeline that was also used in previous studies1,29.
For immunofluorescence, all image processing was done blind to drug conditions; the wells were pseudo-named in a format independent from drug or dose. Nikon images were first stitched using Nikon Elements software. The channels were split in Fiji and scaled to a smaller size compared to their original pixel size prior to placing them in Illustrator. For the conditions being compared (for example, each well with a different drug dose or type), the individual channels were equally adjusted for brightness and contrast across each pair of wells for the signal of interest. Raw nd2 files, which contain additional metadata for image settings, are provided for each imaging experiment. For the images taken on the bright-field microscope Nikon TS2-FL, the scale bar lengths were calculated by using the pixel size given by the manufacturer of the camera. For well images taken on the Nikon TI-E inverted fluorescence microscope, the scale bar lengths were calculated using Nikon Elements software to add a line of a specific length to the images.
Estimation of survival fraction in MDA-MB-231-D4
We estimated the frequency of drug resistance in MDA-MB-231-D4 by computing the fraction of surviving barcoded colonies upon treatment with paclitaxel as compared to the total number of uniquely barcoded cells in the initial population. From two separate split population experiments, we obtained the frequency to be 1:956 and 1:1,303 (see Data availability and Code availability for the script).
Vemurafenib-resistant colony isolation and expansion
WM989 A6-G3 cells were treated with 1 μM vemurafenib (PLX4032, Selleck Chemicals, S1267) for four weeks to allow resistant colonies to form and expand. Plates with resistant colonies were scanned under a tissue culture microscope to identify colonies that were physically distant from other colonies or singletons. The distant colonies were imaged with a 4× objective, physically isolated, and dissociated via treatment with 0.05% trypsin for 5–10 min as some colonies took longer to detach than others. Colony suspensions were plated in 12-well plates containing 1 ml of TU2% medium containing 1 μM vemurafenib, and the medium was changed the following day to remove residual trypsin. Isolated resistant colonies were closely monitored for growth daily, and the 1 μM vemurafenib containing medium was changed every 3–5 days. Isolated resistant colonies were expanded into 10-cm and then 15-cm plates when cells reached 70–80% confluence. When expanding from 12-well plates to 10-cm plates, 75,000 cells were collected for RNA-seq. When cells reached 70–80% confluence in 15-cm plates, 75,000 cells were collected for a later time point of RNA-seq.
Silencing plate cell culture, imaging and analysis
Drug-naive WM989 A6-G3s were transduced with unique barcodes as described above. Cells were plated in 6-well plates at a density of 100,000 cells per well. One plate was fixed in formaldehyde after 24 h using a protocol described above and the second plate was treated with medium containing 1 μM PLX after 24 h. This drug treatment was continued for three weeks which was enough time for resistant colonies to form. This second plate containing resistant populations was fixed in formaldehyde. Wells were imaged on a Nikon TI-E inverted fluorescence microscope equipped with a SOLA SE U-nIR light engine (Lumencor), a Hamamatsu ORCA-Flash 4.0 V3 sCMOS camera, and 10× Plan-Fluor 10×/0.30 (Nikon MRH10101). Images were analysed using the custom Raj Lab image processing software NimbusImage via the CellPose tool67. Average intensity for each of the cells was calculated also using NimbusImage.
WGS cell culture, isolation and expansion
WM989 A6-G3 cells that were frozen as a backup from the original experiments were thawed and passaged three times before bottle-necking. Cells were trypsinized, centrifuged and resuspended to a concentration of either 0.5 cells per 200 μl, 1 cell per 200 μl, or 2 cells per 200 μl. One full 96-well plate was used for each dilution. Plates were imaged on an Incucyte S3 24 h after the cells were plated. The plates were scanned again two days after the first scan and seven days after the second scan. Any wells that did not definitively have a single cell in the first scan were excluded from the experiment. Once clones had reached ~90% confluency in the 96-well plate, they were scaled up to a 24 well plate. This process was repeated to 12-well, 6-well and 10-cm plates. Ultimately, 8 unique clones were generated for WGS.
NRAS cell line drug treatment and imaging
WM3451 P2G7 and WM3623 P4E7 cell lines were imaged during drug treatment to track morphological changes and colony formation. Cells were plated in 10-cm dishes at a density of 300,000 cells per dish. Medium was changed 15 h after plating for medium containing 10 nM trametinib (WM3623 P4E7) or 25 nM trametinib (WM3451 P2G7). Medium was changed and bright-field images were taken of the cells every 3–4 days, and this was continued for 5 weeks.
Patient sample GeoMx analysis
Segments for spatial sequencing were drawn based on manual inspection of S100B and CD45 staining using the NanoString GeoMx DSP software and sequenced as per the GeoMx protocol. Sequencing data were processed and subjected to quality control using the manufacturer’s proprietary software. Quality control analysis was performed using the manufacturer’s suggested default values except where otherwise noted. In brief, segments with fewer than 1,000 reads, less than 80% aligned reads and less than 50% sequencing saturation were excluded from further analysis. Biological probes were excluded from the target count calculation if the ratio of the geometric mean of the probe in all segments to the geometric mean of the probe in the target was less than or equal to 0.1 or if the probe failed the Grubbs outlier test in 20% or more of samples. Finally, we kept targets that exceeded a threshold (higher of limit of quantification or count of two) in at least 5% of samples. The filtered target counts were exported to R and counts were normalized using the trimmed mean of M values method. Count per million values for selected genes based on in vitro resistant fate types were then plotted for each segment.
GeoMx variable gene overlap with FateMap resistant fate type markers
The unsupervised cell clusters in the FM01 dataset were annotated with the resistant cell fate types shown in Fig. 1. The data were subset to include only cells that had a resistant fate type label, and the Seurat command FindAllMarkers with the option ‘only.pos = TRUE’ was run on the resulting subset dataset. The filter p_val_adj < 0.05 was applied to the resulting marker list, and the top 100 marker genes for each of the five resistant fate types were chosen as the top 500 markers. The coefficient of variation across patient and sample type (pre-treatment or resistant) was calculated for each gene in the GeoMx data. The GeoMx data were then subset to include only genes expressed in the WM989 cells used in the FM01 experiment. Box plots were generated in R comparing the coefficients of variation of the top 500 FateMap markers to all other genes. P values were computed using the two-sided, unpaired Wilcoxon test.
Resistant fate type GeoMx uneven partitioning analysis
The unsupervised cell clusters in the FM01 dataset were annotated with the resistant cell fate types shown in Fig. 1 and the object was subset to include only those cells. The Seurat command AverageExpression with options “return.seurat = F, assays = ‘SCT’, slot = ‘counts’” was used to generate a gene signature matrix for each resistant fate type for downstream deconvolution. The SCT-normalized counts were transformed into log-transformed counts per million values via the trimmed mean of M values method to match the GeoMx data. The following process was then repeated once per patient. Two ROIs on the same plug were chosen. We calculated a weighted mean of their transcriptomes using the number of nuclei from the GeoMx metadata. The gene signature matrix and the original and average ROI transcriptomes were then subset to only include genes common to all datasets and in common with FM01 differentially expressed genes. This was done to improve deconvolution by removing noisy, uninformative genes. The original and average ROI transcriptomes were deconvoluted using non-negative least squares deconvolution to yield resistant fate type cell proportions. These proportions were multiplied by the number of nuclei to give an estimated number of cells in each ROI and the ‘combined’ ROI with the mean transcriptome. An observed Euclidean distance between the two original ROIs was then calculated by reversing the deconvolution on each of the two original ROIs.
From the estimated cell numbers, two analyses are performed. The first analysis generates a null distribution of Euclidean distances in which cells from the combined ROI are binomially distributed to two ROIs and the deconvolution is reversed to re-generate transcriptomes and calculate the Euclidean distance between them. This sampling is done 10,000 times, the null distribution is compared to the observed distance, and the probability of sampling a distance greater than the observed distance is calculated. Second, the estimated cell numbers in the combined ROI are partitioned in groups of 10 into either ROI 1 or ROI 2 in all possible combinations that add up to the true number of cells in each ROI ±2%. For each partition, the deconvolution is reversed and the Euclidean distance is calculated between the transcriptomes. For each partition, we also calculate the mean absolute deviation away from 50:50 (equal partitioning) for all cell fate types. We compare whether the calculated Euclidean distance is at least as large as the observed distance between the original ROIs, and find the mean absolute deviation away from 50:50 for the 25th percentile of distances at least as large as the observed distance.
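The first (null distribution) analysis could look roughly like the Python sketch below. Several choices here are assumptions rather than details given in the text: each fate type is split independently with a single binomial probability `p_roi1` (set to the fraction of nuclei in ROI 1), transcriptomes are regenerated as the signature matrix times the resulting proportions, and all names and toy numbers are hypothetical.

```python
import numpy as np


def null_distances(signature, combined_counts, p_roi1=0.5, n_iter=10_000, seed=0):
    """Euclidean distances between two ROIs formed by binomially splitting the combined ROI."""
    rng = np.random.default_rng(seed)
    signature = np.asarray(signature, dtype=float)
    counts = np.asarray(combined_counts, dtype=np.int64)
    # One row per iteration, one column per fate type.
    roi1 = rng.binomial(counts, p_roi1, size=(n_iter, counts.size))
    roi2 = counts - roi1
    prop1 = roi1 / np.maximum(roi1.sum(axis=1, keepdims=True), 1)
    prop2 = roi2 / np.maximum(roi2.sum(axis=1, keepdims=True), 1)
    # Regenerate transcriptomes from proportions and measure their distance.
    return np.linalg.norm((prop1 - prop2) @ signature.T, axis=1)


def empirical_p(null, observed):
    """Fraction of null distances strictly greater than the observed distance."""
    return float(np.mean(null > observed))


# Toy data (hypothetical): 50 genes, 5 fate types, 800 cells in the combined ROI.
rng = np.random.default_rng(2)
signature = rng.uniform(0.1, 10.0, size=(50, 5))
combined_counts = [200, 150, 100, 250, 100]
null = null_distances(signature, combined_counts, p_roi1=300 / 800, n_iter=10_000)
# Placeholder observed distance, chosen only to exercise the function.
p_value = empirical_p(null, observed=np.percentile(null, 95))
```

In a real analysis the observed distance would come from the two original ROIs rather than from the null distribution itself.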
RNA FISH of resistant colonies derived from WM989 A6-G3 and two clonal lines
WM989 A6-G3 and two subclones thereof, WM989 A6-G3 A10 and WM989 A6-G3 A11, were used to generate vemurafenib-resistant colonies. Cells were plated in glass-bottom 6-well plates and medium was changed 18 h after plating for medium with 1 μM vemurafenib (PLX4032, Selleck Chemicals, S1267). Medium was changed every 3–4 days. WM989 A6-G3 A10 and WM989 A6-G3 A11 plates were fixed after 4 weeks of drug treatment and the WM989 A6-G3 plate was fixed after 5 weeks of drug treatment. Cells were fixed, hybridized with FISH probes, and imaged as described above.
Estimation of number of cell divisions
Cell culture was initiated at 25% confluency and cells were allowed to divide until they reached 75 to 80% confluency. The WM989 A6-G3 cells were passaged a total of 10 times, and each passage followed similar confluency at initial (immediately after passage) and final (immediately before passage) time points. To estimate the number of divisions, we assume all cells at a given time point can divide to form two daughter cells:
$$0.25\,N \times 2^{n} = 0.75\,N, \qquad n = \log_2 3 \approx 1.58,$$
where $N$ is the number of cells at 100% confluency and $n$ is the number of divisions cells undergo when cultured from 25% confluency up to 75% confluency. Therefore, cells divide approximately one to two times during a passage. For ten passages, we can estimate the dynamics to be equivalent to the population expanding threefold ten times:
$$2^{n_{10}} = 3^{10}, \qquad n_{10} = 10 \log_2 3 \approx 15.8,$$
where $n_{10}$ is the number of divisions cells undergo when cultured from 25% confluency up to 75% confluency through 10 passages. Therefore, the cells underwent approximately 16 divisions over the 10 passages for WM989 A6-G3 between initial clonal isolation and our analysis. A similar calculation for the A10 and A11 subclones, which underwent 12 passages, yielded an estimate of 19 divisions.
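The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch of the stated assumption (a threefold expansion from 25% to 75% confluency in every passage); the function name `estimated_divisions` is hypothetical.

```python
import math


def estimated_divisions(n_passages, start_confluency=0.25, end_confluency=0.75):
    """Total divisions if every passage expands the population by end/start via doublings."""
    per_passage = math.log2(end_confluency / start_confluency)  # log2(3) for 25% -> 75%
    return n_passages * per_passage


parental = estimated_divisions(10)   # WM989 A6-G3, 10 passages
subclones = estimated_divisions(12)  # A10 and A11 subclones, 12 passages
```

Rounding `parental` and `subclones` to the nearest integer gives the 16 and 19 divisions quoted in the text.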
Phenotypic volume and Shannon’s equitability Index
To estimate the transcriptional variability of cells within a clone or within a cluster, phenotypic volume was estimated as described27. The log of phenotypic volume was quantified as the sum of all non-zero eigenvalues for the singular value decomposition of the covariance matrix for gene expression. The covariance matrix was calculated for the scaled gene expression matrix for differentially expressed genes identified by FateMap. As a randomized control, barcodes or cluster numbers for cells were shuffled and the phenotypic volume was re-calculated. Normalized phenotypic volume was estimated as the difference between the phenotypic volume for a clone or a cluster and its paired random control divided by the phenotypic volume for the paired random control.
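A minimal Python sketch of this calculation is given below, following the description in the text literally (sum of the non-zero singular values of the gene-gene covariance matrix). It is an illustration under stated assumptions, not the authors' code: the function names, the tolerance used to call a value non-zero and the toy data are hypothetical. If the intended quantity is instead the log pseudo-determinant of the covariance matrix, the singular values would be log-transformed before summing; that variant is not implemented here.

```python
import numpy as np


def log_phenotypic_volume(expr, tol=1e-10):
    """Sum of non-zero singular values of the gene-gene covariance (cells x genes input)."""
    cov = np.cov(np.asarray(expr, dtype=float), rowvar=False)
    singular_values = np.linalg.svd(np.atleast_2d(cov), compute_uv=False)
    return float(singular_values[singular_values > tol].sum())


def normalized_volume(expr, labels, target, rng):
    """(observed - shuffled control) / shuffled control for one clone or cluster label."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    observed = log_phenotypic_volume(expr[labels == target])
    shuffled = rng.permutation(labels)  # randomized control with the same group size
    control = log_phenotypic_volume(expr[shuffled == target])
    return (observed - control) / control


# Toy data (hypothetical): 200 cells x 20 genes; clone 0 is transcriptionally tight.
rng = np.random.default_rng(3)
labels = np.repeat([0, 1, 2, 3], 50)
expr = rng.normal(0.0, 1.0, size=(200, 20))
expr[labels == 0] = rng.normal(0.0, 0.1, size=(50, 20))
tight_clone_score = normalized_volume(expr, labels, target=0, rng=rng)
```

A negative score means the clone occupies less phenotypic volume than its shuffled control.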
To quantify the distribution of each clone across the UMAP clusters, Shannon’s equitability index was quantified for each clone and a paired random control (created by shuffling barcodes). Shannon’s equitability index was calculated as
$$E_H = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S},$$
where $E_H$ is Shannon’s equitability index, $p_i$ is the probability of finding cells in the $i$th cluster and $S$ is the number of clusters. Shannon’s equitability index ranges from 0 to 1. A value of 1 indicates that the cells in a clone are evenly spread across all clusters. A value of 0 indicates that all cells in a clone are present in a single cluster.
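For concreteness, the index can be computed as in the short Python sketch below, which assumes the standard definition (Shannon entropy of the cluster distribution divided by the logarithm of the number of clusters). The function name and the toy cluster labels are hypothetical.

```python
import math
from collections import Counter


def shannon_equitability(cluster_labels, n_clusters):
    """Shannon entropy of a clone's cluster distribution divided by ln(number of clusters)."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0 or n_clusters < 2:
        return 0.0
    # Clusters with no cells contribute nothing to the entropy.
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_clusters)


# Toy clones (hypothetical) in a dataset with four clusters.
single_cluster_clone = shannon_equitability([2, 2, 2, 2, 2], n_clusters=4)
evenly_spread_clone = shannon_equitability([0, 1, 2, 3, 0, 1, 2, 3], n_clusters=4)
```

The first clone sits entirely in one cluster and the second is spread evenly, so they mark the two ends of the 0 to 1 range.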
Extended Data
Extended Data Fig. 1 ∣. FateMap reveals between-clone fate type diversity in treatment-naive cells, albeit to a lesser degree compared to resistant cells.
a. (left) UMAP of all barcoded treatment-naive cells. Total 16,432 cells (8,420 split A and 8,012 cells in split B) are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.6 (i.e. “Seurat clusters, resolution = 0.6”). (right) On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline (Cell counts available in Supplementary Table 11). b. Five representative examples demonstrate that a clone (cells sharing the same barcode) is constrained largely in a specific transcriptional cluster such that cells within a clone are more transcriptionally similar to each other than cells in other clones. c. Average pairwise correlation between cells within a clone was estimated based on the expression levels of the top 500 most variable genes. Each point represents the average value for Spearman’s correlation coefficient for all possible pairs of cells within a clone. For each clone, a paired control was created by randomly sampling an equivalent number of cells from the entire population. Higher average correlation coefficient in clones indicates higher transcriptional similarity among cells within a clone, as compared to cells that are not clones. Wilcoxon signed rank exact test (paired, two-sided) was used to compare the difference in average correlation coefficient. d. Fraction of variance explained by the experimental data and randomized data for the top 50 principal components (PCs). The number of PCs needed to explain the actual variance in data (indicated by the dotted line) is a measure of the degrees of freedom of variability of a given dataset. There was an increase (see Extended Data Fig. 
1e for statistical testing) in the number of PCs needed to explain the variance in data from resistant cells (43 PCs) as compared to naive (30 PCs) and primed cells (23 PCs), suggesting that there is an increase in overall variability in samples when cells transition to becoming drug resistant. Primed cells were identified as cells where at least 40% of pre-resistant markers identified in (Emert et al. 2021) are higher than their average expression level. e. Average number of PCs needed to explain the variance in resistant, naive and pre-resistant cells. Error bars represent standard deviation over 100 simulations of randomized data. Mann-Whitney U-Test was used to estimate a p-value for pairwise difference in means. f. Comparison of Euclidean distances between clusters across resistant and naive populations of melanoma cells for varying numbers of clusters. We used the first 50 principal components to calculate the Euclidean distance between cells across clusters. We used Wilcoxon signed rank exact test (paired, two-sided) for statistical comparisons. g. Comparison between resistant and naive populations for total number of clusters, given fixed number of cells and shared nearest neighbor (snn) resolution. We used Wilcoxon signed rank exact test (paired, two-sided) for statistical comparisons of average number of clusters across resolutions. h. UMAPs of representative twin clones (sharing the same barcode) across the two splits A and B. The twins largely end up with the same transcriptional fate. This observation suggests that cells have similar transcriptional states prior to drug treatment.
Extended Data Fig. 2 ∣. Whole genome sequencing of treatment-naive and drug resistant fate type clones.
a. We performed a pairwise hypergeometric test for variants in all clones to determine statistical significance of variant overlap between clones. This was calculated with the following parameters (M = all CADD > 15 variants, n = # Variants in Sample 1, N = # Variants in Sample 2, X = # Variants in intersection of Samples 1 & 2). P-values are plotted on the heatmap where the p-value represents the probability of observing at least as large an overlap as observed if the two clones in fact had independently randomly selected variants from the full list of CADD > 15 variants. P-values below 0.05 represent two clones that are not genetically independent. b. Heatmap of genes with deleterious variants (CADD > 15) that were present with allele frequencies between 25% and 75% in both naive and resistant clones, colored by their CADD deleteriousness score. For genes that include multiple, unique variants, the variants were collapsed into one row, where the variant with the highest CADD score was plotted for that sample. The curated gene set represents the lack of variation in (Shaffer et al. 2017; Garman et al. 2017). Differentially expressed genes from the FateMap dataset show eight genes with variation. c. The expression patterns of the eight genes from the DEG list from FateMap with heterogeneously present genetic variants, visualized on UMAP (Cell counts available in Supplementary Table 11). d. To evaluate for acquired genetic resistance to therapy in the resistant clones, we next plotted variants on a heatmap (colored by their CADD deleteriousness score) if there was a significant difference in the allele frequencies of variants between naive and resistant clones by Fisher’s exact test (P < 0.05). Variants were only included if they were not present in any naive clones. The curated gene set represents the lack of acquired variants in genes from [2017 paper genes], [FateMap DEG], [Clone Genes], [Top 500 most Variable Genes], [Known Epigenetic Modifiers]. 
The [“all variants with CADD > 15”] set includes all variants with CADD c-scores over 15. e. We analyzed 143 genes classified as epigenetic modifiers for deleterious variants (CADD > 15) within naive clones. The chart shows the number of genes with variants in a subset of naive clones (2 genes) and in all naive clones (10 genes). f. Heatmap of deleterious variants in epigenetic modifier genes, colored by their CADD deleteriousness score.
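The pairwise hypergeometric test described in panel a can be illustrated with the Python sketch below. It uses the parameter names from the legend (M, n, N, X) and SciPy's hypergeometric survival function; the wrapper name `overlap_p_value` and the example numbers are hypothetical.

```python
from scipy.stats import hypergeom


def overlap_p_value(M, n, N, X):
    """P(overlap >= X) when N variants are drawn at random from M, of which n are in sample 1."""
    # sf(k) gives P(overlap > k), so k = X - 1 yields P(overlap >= X).
    return float(hypergeom.sf(X - 1, M, n, N))


# Hypothetical example: 10 candidate variants, two clones with 3 variants each, all 3 shared.
p_all_shared = overlap_p_value(M=10, n=3, N=3, X=3)
p_no_constraint = overlap_p_value(M=10, n=3, N=3, X=0)
```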
Extended Data Fig. 3 ∣. Isolation, longitudinal profiling and functional mapping of drug resistant clones.
a. Schematic for longitudinal tracking and profiling of drug resistant colonies. Colonies were isolated, expanded and maintained over 4 to 6 weeks. Paired initial and late samples were sequenced at a bulk-level. b. Paired initial and late samples display minimal phenotypic drift in principal component (PC) space for top 500 most variable genes. Insets show brightfield images of representative samples. c. Euclidean distance (in PC1 and PC2) measured between paired initial and late samples and equivalent number of random initial-late pairs of samples. Lower Euclidean distance in true pairs as compared to random pairs implies that paired initial and late samples are transcriptionally more similar (closer in PC space) than any pair of initial and late samples. d. Scree plot depicting cumulative variance explained by PCs. Dotted line represents that most of the variance can be explained by the first 25 PCs alone. e. Euclidean distance (in first 25 PCs) measured between paired initial and late samples and equivalent number of random initial-late pairs of samples. Lower Euclidean distance in true pairs as compared to random pairs implies that paired initial and late samples are transcriptionally more similar (closer in PC space) than any pair of initial and late samples. f. Euclidean distance measured between paired early and late samples and equivalent number of random initial-late pairs of samples. Euclidean distance was measured in PC1 and PC2 space for top 200, 500 and 1000 variable genes. g. Euclidean distance measured between paired initial and late samples and equivalent number of random initial-late pairs of samples. Euclidean distance was measured in the PC space created by the first 25 PCs for top 200, 500 and 1000 variable genes. h. 
Mapping of invasiveness onto the single-cell RNA sequencing dataset from FateMap by comparing genes differentially expressed between the two slowest and the two fastest invading resistant colonies (UMAP colored for similarity score). The slowest invading colonies have a high similarity score for cluster 15 (and to some extent 4 and 6), while the fastest invading colonies have a high similarity score for cluster 8 (and to some extent 1).
Extended Data Fig. 4 ∣. FateMap on BRAF and NRAS mutant melanoma cell lines reveals between-clone fate type diversity.
a. (left) For another single-cell derived melanoma cell line WM983B E9-C6, we traced representative resistant cells in Adobe Illustrator and created cartoon schematics based on visual inspection of orientation and density. (right) Brightfield images of resistant colonies exhibiting different types of morphologies. b. We applied the Uniform Manifold Approximation and Projection (UMAP) algorithm within Seurat to the first 50 principal components to visualize differences in gene expression. Cells are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.5 (i.e. “Seurat clusters, resolution = 0.5”) (13,869 and 11,249 total cells respectively for split A and B). c. On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed via the Seurat pipeline and marked different clusters. MLANA, which marks melanocytes, is found largely in clusters 1,3, and 5; IFIT2, which marks type-1 interferon signaling, is found largely in cluster 4; NGFR, which marks neural crest cells, is found largely in cluster 2 and 4; AXL, which is a canonical resistance marker, is found largely in cluster 5 and 7. d. Six examples to demonstrate that a clone (cells sharing the same barcode) is constrained largely in a specific transcriptional cluster such that cells within a clone are more transcriptionally similar to each other than cells in other clones. Some clones are larger in size than others, and some exist as singletons, meaning they survive vemurafenib treatment but do not necessarily divide while exposed to the drug. e. We quantified the preference for a specific cluster across all barcode clones (clone size>4). Specifically, we calculated the fraction of dominant clusters for each clone and found it to be significantly higher (Wilcoxon test, two-sided, unpaired, p-value = 1.49e-15) than that for randomly selected cells. The analysis plotted here is for a cluster resolution of 0.5. f. 
Painting of singletons and colonies onto the UMAP demonstrated that singletons and colonies belonged to distinct regions and clusters. g. UMAPs of representative twin clones (sharing the same barcode) across the two splits A and B. The twins largely end up with the same transcriptional fate type, invariant of the clone size. This observation suggests that cells are predestined for distinct resistant fate types upon exposure to vemurafenib. h. NRAS mutant cell line WM3623 treated with three different doses of trametinib (10 nM, 20 nM, and 40 nM). Representative brightfield images after 2.5 and 5 weeks of drug treatment are shown for each dose. i. (left) UMAP of all barcoded 3623 cell line cells treated with Trametinib. 6,397 cells are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.6 (i.e. “Seurat clusters, resolution = 0.6”). (right) On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline. j. On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline. k. Five representative examples demonstrate that a clone (cells sharing the same barcode) is constrained largely in a specific transcriptional cluster such that cells within a clone are more transcriptionally similar to each other than cells in other clones. l. UMAPs of representative twin clones (sharing the same barcode) across the two splits A (6,397 cells) and B (7,538 cells). The twins largely end up with the same transcriptional fate type. This observation suggests that drug resistant cells are derived from the same clones having similar transcriptional states and are constrained in the gene expression space. One of the clones appears to be a dominant clone and gives rise to a large fraction of sequenced cells.
Extended Data Fig. 5 ∣. FateMap on an NRAS mutant melanoma cell line reveals between-clone fate type diversity.
a. NRAS mutant cell line WM3451 treated with three different doses of trametinib (20 nM, 40 nM, and 50 nM). Representative brightfield images after 2.5 and 5 weeks of drug treatment are shown for each dose. b. (left) UMAP of all barcoded 3451 cells treated with Trametinib. 5,789 cells are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.6 (i.e. “Seurat clusters, resolution = 0.6”). (right) On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline. c. Five representative examples demonstrate that a clone (cells sharing the same barcode) is constrained largely in a specific transcriptional cluster such that cells within a clone are more transcriptionally similar to each other than cells in other clones. d. Average pairwise correlation between cells within a clone was estimated based on the expression levels of the top 500 most variable genes. Each point represents the average value for Spearman’s correlation coefficient for all possible pairs of cells within a clone. For each clone, a paired control was created by randomly sampling an equivalent number of cells from the whole population. Higher average correlation coefficient in clones indicates higher transcriptional similarity among cells within a clone, as compared to cells that are not clones. Wilcoxon signed rank test (paired, two-sided) was used to compare the difference in average correlation coefficient. e. UMAP of all barcoded WM3451 P2G7 cells treated with Trametinib. Cells are colored by whether they are a singleton (i.e. clone size = 1). f. UMAPs of representative twin clones (sharing the same barcode) across the two splits A (5,789 cells) and B (7,473 cells). The twins largely end up with the same transcriptional fate type. 
This observation suggests that drug resistant cells are derived from the same clones having similar transcriptional states and are constrained in the gene expression space.
Extended Data Fig. 6 ∣. FateMap on treatment-naive primary human melanocytes reveals between-clone diversity.
a. (left) UMAP of all barcoded naive primary melanocyte cells. Cells are colored by clusters determined using Seurat’s FindClusters command (“Seurat clusters, resolution = 0.6”). (right) On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline. b. Six representative examples demonstrate that a clone is constrained largely in a specific transcriptional cluster such that cells within a clone are more transcriptionally similar to each other than cells in other clones. c. Average pairwise correlation between cells within a clone was estimated based on the expression levels of the top 500 most variable genes. Each point represents the average value for Spearman’s correlation coefficient for all possible pairs of cells within a clone. For each clone, a paired control was created by randomly sampling an equivalent number of cells from the whole population. Higher average correlation coefficient in clones indicates higher transcriptional similarity among cells within a clone, as compared to cells that are not clones. Wilcoxon signed rank test (paired, two-sided) was used to compare the difference in average correlation coefficient. d. UMAP of all barcoded naive primary melanocyte cells. 2,868 cells are colored by whether they are a singleton (i.e. clone size = 1). Cluster 10 is enriched for singletons and displays high expression of S100B, a marker identified to be associated with single cell colonies by FateMap. e. UMAPs of representative twin clones (sharing the same barcode) across the two splits A (2,868 cells) and B (3,333 cells). The twins largely end up with the same transcriptional fate type. This observation suggests that primary melanocyte cells derived from the same clone have similar transcriptional states and are constrained in the gene expression space.
Extended Data Fig. 7 ∣. FateMap on a triple negative breast cancer cell line reveals between-clone fate type diversity.
a. Nuclei scans (DAPI-stained) of resistant colonies emerging from treatment of the single-cell derived triple negative breast cancer cell line MDA-MB-231-D4 with 1nM paclitaxel. b. For the MDA-MB-231-D4 cell line, we traced representative resistant cells in Adobe Illustrator and created cartoon schematics based on visual inspection of orientation and density. c. Brightfield images of resistant colonies exhibiting different types of morphologies. d. We applied the Uniform Manifold Approximation and Projection (UMAP) algorithm within Seurat to the first 50 principal components to visualize differences in gene expression. 6,535 cells are colored by clusters determined using Seurat’s FindClusters command (“Seurat clusters, resolution = 0.5”). e. We observed silencing of the transcribed barcodes in a subset of colonies, as revealed by epifluorescence imaging of the GFP signal. The colony on the left is strongly expressing a GFP signal while the colony on the right has a very dim GFP signal. f. Cells with assigned barcodes were evenly distributed throughout the UMAP with no clear bias for any specific resistant fate types. g. Four examples from split A (6,535 cells) demonstrate that a clone is constrained largely in specific UMAP regions such that cells within a clone are more transcriptionally similar to each other than cells in other clones. h. Four examples from split B (8,745 cells) demonstrate that a clone is constrained largely in specific UMAP regions such that cells within a clone are more transcriptionally similar to each other than cells in other clones. i. We quantified the preference for a specific cluster across all barcode clones (clone size>4). Specifically, we calculated the fraction of dominant clusters for each clone and found it to be significantly higher (Wilcoxon, unpaired, two-sided) than that for randomly selected cells. The analysis plotted here is for a cluster resolution of 0.5. j. 
We found that our UMAP had superclusters defined by cell cycle (S, G1, G2M). Of the 3,720 clonal DEGs, 63 are cell cycle genes. We therefore regressed out cell cycle genes and visualized the cell-cycle-regressed data with UMAP. k. We quantified the preference for a specific cluster across all barcode clones after cell cycle regression (clone size>4). Specifically, we calculated the fraction of dominant clusters for each clone and found it to be significantly higher (Wilcoxon, unpaired, two-sided) than that for randomly selected cells. The analysis plotted here is for a cluster resolution of 0.5. l. UMAPs of representative twin clones across the two splits A and B. The twins largely end up with the same transcriptional fate type, invariant of the clone size. This observation suggests that cells are predestined for distinct resistant fate types upon exposure to the chemotherapy drug paclitaxel.
Extended Data Fig. 8 ∣. FateMap reveals differences in clonal fate type outcomes between continuous and discontinuous therapy.
a. Schematic of the experimental design where we exposed single-cell-derived WM989 A6-G3 melanoma cells to continuous and discontinuous doses of targeted therapy drug vemurafenib. b. UMAP of all barcoded cells. 17,634 cells are colored by clusters determined using Seurat’s FindClusters command (“Seurat clusters, resolution = 0.6”). c. Pellet morphology for continuous (7,238 cells) and discontinuous (10,396 cells) treatment cells. Cells derived from discontinuous dosage have a larger and darker (more pigmented) pellet. This suggests that during discontinuous dosage, melanocytic cells (which are pigmented in nature) proliferate. d. On the UMAP, we recolored each cell by its expression for a select subset of genes that were identified as differentially expressed in drug resistant cells via the Seurat pipeline. e. UMAP of all barcoded cells. Cells are colored by type of dosage. f. UMAPs of representative twin clones (sharing the same barcode) that arise during discontinuous drug treatment. The twins largely end up with the same transcriptional fate type and have varying proliferative capacities. g. In discontinuous dosage, 68% of clones having high MLANA expression (log2 Expression > 2, in at least 50% of cells in a given clone) are proliferative (i.e. have clone size > 1). In continuous dosage, only 20% of clones having high MLANA expression are proliferative. h. (left) Total number of cells analyzed consisted of 60.5% discontinuous dosage samples and 39.5% continuous dosage samples. (right) The number of unique barcodes (i.e. resistant clones) displays a 3.6 fold increase in discontinuous dosage sample as compared to the continuous dosage sample. i. UMAPs of representative twin clones across the two splits of continuous and discontinuous dosing. Some twins end up in the similar transcriptional fate type while others tend to switch fate type.
Extended Data Fig. 9 ∣. Changing the therapy type to trametinib eliminates an additional resistant fate type present in the vemurafenib treatment.
a. UMAP where the resistant cells are colored by the associated therapy drug type, with dark blue representing vemurafenib (9,457 cells) and light blue representing trametinib (8,569 cells). Arrows represent UMAP regions that are present only in vemurafenib or trametinib. b. UMAP is split by each drug type, with colors representing clusters determined using Seurat’s FindClusters command (“Seurat clusters, resolution = 0.5”). Arrows represent UMAP regions that are present only in vemurafenib or trametinib. c. Painting of singletons and colonies onto the UMAP, colored by the condition, demonstrated that singletons largely belong to vemurafenib and are present predominantly in the MLANA-high cluster. Colonies are dispersed more across the UMAP with no particular region enriched for either condition except for the NGFR-high cluster. d. Imaging of nuclei (DAPI-stained) of resistant colonies emerging from treatment of WM989 A6-G3 cells with either vemurafenib or trametinib. The number of singletons in trametinib-treated cells appears to be much lower than in those treated with vemurafenib, consistent with the sequencing data from FateMap. e. Quantification of the total number of colonies and singletons for each drug type from imaging data across biological replicates. This analysis demonstrated that while the total number of colonies is similar across the two drug types, there is a relative increase (~2.45-fold; n = 3 biological replicates) in the number of singletons in the case of vemurafenib. Error bars represent standard error of the mean. f. UMAPs are recolored for each cell by its expression for gene MLANA, which is a marker for cluster 3 relatively enriched in vemurafenib (as shown with arrows in A and B). g. A pie chart to demonstrate that of all the clones (barcodes) present in the vemurafenib-treated split in cluster 3, only 4.8% were also present in the trametinib-treated split. h. 
A cumulative density contour plot capturing the types of fate switches that the MLANA-high cluster 3 clones from the vemurafenib-treated split adopt in the trametinib-treated split. i. Three representative examples of UMAP regions where twins from the MLANA-high cluster 3 in the vemurafenib-treated split adopt in the trametinib-treated split. j. UMAPs are recolored for each cell by its expression for gene NGFR, which is a marker for cluster 4 relatively enriched in trametinib (as shown with arrows in A and B). k. Composition of clones of different sizes within NGFR-high cluster 4 for both trametinib- and vemurafenib-treated splits. l. A pie chart to demonstrate that of all the clones (barcodes) present in the trametinib-treated split in cluster 4, 20.7% were also present in the vemurafenib-treated split. m. A cumulative density contour plot capturing the types of fate switches that the NGFR-high cluster 4 clones from the vemurafenib-treated split adopt in the trametinib-treated split. n. Two representative examples of UMAP regions where twins from the NGFR-high cluster 4 in trametinib-treated split adopt in the vemurafenib-treated split. o. UMAP for combined vemurafenib and trametinib treatment conditions recolored for each cell by its expression of the gene VCAM1, which is enriched in cluster 6. p. Painting of singletons and colonies onto the UMAP for the NGFR-high cluster 4, colored by the condition, showing a relative enrichment of cells from trametinib as compared to vemurafenib. This panel also demonstrates that both singletons and colonies occupy cluster 4 from each of the two conditions. q. We performed antibody stainings for NGFR on colonies emerging from treatment of the same number of starting melanoma cells with either vemurafenib or trametinib. Consistent with FateMap, we found an increased number of NGFR-positive resistant cells in trametinib treated cells as compared to the vemurafenib treatment. r. 
UMAP split by each drug condition (trametinib (8,569 cells) or vemurafenib and trametinib (7,023 cells)), with colors representing clusters determined using Seurat’s FindClusters command at a resolution of 0.5 (i.e. “Seurat clusters, resolution = 0.5”). s. UMAP recolored for combined resistant cells from trametinib (light blue) and vemurafenib and trametinib (dark blue). The cells from two conditions are interspersed into each other on the UMAP.
Extended Data Fig. 10 ∣. Inhibition of histone methyltransferase DOT1L results in the emergence of additional resistant proliferative clones and a reduction in singletons.
a. For each barcode identified by sequencing, we plotted its abundance in corresponding splits A (DMSO control) and B (DOT1L inhibition). Those present in both control and DOT1L splits are colored in dark blue, and those present only in either A (control) or B (DOT1L) are colored in cyan. Those present in both (dark blue; 171) exhibited a strong correlation, suggesting that their ability to survive and become resistant is invariant of drug dose. Those present in only one split (cyan) were much more abundant in DOT1L (B, 43 barcodes) than in DMSO control (A, 7 barcodes), suggesting that new barcodes, otherwise unable to survive in the control condition, become drug-resistant in the DOT1L-inhibited condition. Data are from one biological replicate. b. (left) Combined resistant cells in the control (9,343 cells) and DOT1L (7,044 cells) conditions obtained from UMAP applied to the first 50 principal components. Cells are colored by clusters determined using Seurat’s FindClusters command (“Seurat clusters, resolution = 0.8”). (right) UMAP is split by each condition. c. UMAP where the resistant cells are colored by the associated condition (control vs DOT1L). The arrow represents the UMAP region present predominantly in the control region and missing from the DOT1L-associated UMAP region. d. Quantification of singletons and colonies showed that while the number of resistant colonies is higher in DOT1L, it is accompanied by a reduced number of singleton cells compared to control. e. Painting of singletons and colonies onto the UMAP, colored by the condition, demonstrated that singletons largely belong to the control condition and are present predominantly in cluster 2 (MLANA-high). Colonies are dispersed more across the UMAP with no particular region enriched for either condition. f. Imaging of the nuclei (DAPI-stained) of resistant colonies emerging from vemurafenib treatment of WM989 A6-G3 cells, either for control or cells lacking DOT1L. g. 
Quantification of the total number of colonies and singletons from each fate type across n = 3 biological replicates demonstrated a relative increase (3.65-fold; n = 3 biological replicates) in total colonies and reduction in total singletons in the DOT1L and control conditions, respectively. Error bars represent standard error of the mean. h. UMAP is recolored for each cell by its expression for the gene MLANA, a marker for cluster 2, which is relatively enriched in control (as shown with an arrow). i. A pie chart to demonstrate that of all the clones (barcodes) present in the control condition split, only 3.1% were also present in the DOT1L inhibitor pretreatment split. j. Two representative examples of UMAP regions where twins from the MLANA-high cluster in the control condition go in the DOT1L condition. A cumulative density contour plot capturing the types of fate switches that MLANA-high cluster clones from control adopt in the DOT1L inhibitor-pretreated condition. k. A cumulative density contour plot capturing the types of fate switches that the MLANA-high cluster 2 clones from the control condition split adopt in the DOT1L inhibitor pretreatment split. l. Distribution of cells across clusters for control (top) and DOT1L inhibitor-pretreated (bottom) conditions for clone size>2.
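The barcode comparison in panel a and the error bars in panel g can be illustrated with a minimal Python sketch. This is not the authors' analysis code (which is deposited at Zenodo). The function names, the toy read counts and the choice of Pearson correlation on log10 counts are all assumptions made for illustration, since the legend does not specify the correlation metric or a presence threshold.

```python
from math import log10, sqrt


def classify_barcodes(counts_a, counts_b):
    """Return (shared, only_a, only_b) barcode sets for two splits.

    Assumption: a barcode is "present" in a split if its read count is > 0.
    """
    present_a = {bc for bc, n in counts_a.items() if n > 0}
    present_b = {bc for bc, n in counts_b.items() if n > 0}
    return present_a & present_b, present_a - present_b, present_b - present_a


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(sample_var / n)


# Toy read counts (hypothetical, not data from the study).
split_a = {"BC1": 100, "BC2": 10, "BC3": 1000, "BC4": 5}            # control
split_b = {"BC1": 120, "BC2": 8, "BC3": 900, "BC5": 40, "BC6": 7}   # DOT1L

shared, only_a, only_b = classify_barcodes(split_a, split_b)

# Correlate log10 abundances of the shared barcodes across the two splits.
order = sorted(shared)
r = pearson([log10(split_a[bc]) for bc in order],
            [log10(split_b[bc]) for bc in order])
print(len(shared), len(only_a), len(only_b), round(r, 3))
```

In this sketch, the shared set corresponds to the dark blue points and the two split-specific sets to the cyan points. `mean_sem` applied to per-replicate colony counts would give error bars of the kind described for panel g.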
Acknowledgements
The authors thank S. Ramdas, C. Jiang, L. Beck, P. Burnham, L. Richman, M. Melzer, S. Reffsin, E. Torre, C. Cote, A. Cote, V. Rebecca and G. Allard for insightful discussions related to this work; the Genomics Facility at the Wistar Institute, especially S. Majumdar and S. Widura, for assistance with sequencing and single-cell partitioning and addition of 10X cell identifiers; the Flow Cytometry Core Laboratory at the Children’s Hospital of Philadelphia Research Institute for assistance with flow cytometry and fluorescence-activated cell sorting; the Histology Core at the Penn Center for Musculoskeletal Disorders for their assistance with tissue sectioning (P30 AR069619); the Center for Genetic Medicine, NUSeq Core, Feinberg Information Technology, Northwestern University Information Technology, and Quest High Performance Computing Cluster at Northwestern University Feinberg School of Medicine for their assistance; B. Singh for histotechnology expertise; I. Raman, G. Chen and C. Zhu from the Microarray Core at UT Southwestern Medical Center in Dallas, TX for their expert help with ROI selection on the GeoMx DSP and data generation; and J. Villanueva, R. Inga and L. Li from the Herlyn laboratory at the Wistar Institute for their assistance with identifying and obtaining the NRAS-mutant melanoma cell lines and the melanocyte cell line. Y.G. thanks R. J. Valadka and M. Temkin for their prompt support in setting up the wet and dry lab space at Northwestern University. M.E.F. and A.T.W. thank the Core Facilities of the Johns Hopkins Kimmel Cancer Center, P30CA00697356. Y.G. acknowledges support from Northwestern University’s startup funds, Research Catalyst Program from the McCormick School of Engineering at Northwestern University, Cancer Research Foundation Young Investigator Award, the Burroughs Wellcome Fund Career Awards at the Scientific Interface, the Jane Coffin Childs Memorial Fund, and the Schmidt Science Fellowship. G.T.B. acknowledges support from NSF GRFP DGE-1845298. M.P. was supported by grants to Y.G. including the Burroughs Wellcome Fund Career Awards at the Scientific Interface and Northwestern University’s startup funds. R.H.B. acknowledges support from NIH T32 HG000046 and T32 GM007170. E.I.G. acknowledges support from the NSF NRT 2021900-Synthesizing Biology Across Scales. B.E. acknowledges support from NIH F30 CA236129, NIH T32 GM007170 and NIH T32 HG000046. A.K. acknowledges support from NIH K00-CA-212437-02. N.J. acknowledges support from NIH F30 HD103378. N.B. acknowledges support from NIH T32 GM144295. I.A.M. acknowledges support from NIH F30 NS100595. K.K. acknowledges support from NIH T32 GM008216. P.T.R. acknowledges support from NIH Medical Scientist Training Program T32 GM007170. D.F. and M.H. acknowledge support from NIH grants R01 CA238237, U54 CA224070, P01 CA114046, P50 CA174523 and the Dr. Miriam and Sheldon G. Adelson Medical Research Foundation. S.A. was supported by grants to Y.G. including the Burroughs Wellcome Fund Career Awards at the Scientific Interface, Northwestern University’s startup funds, and Research Catalyst Grant from McCormick School of Engineering. C.C. was supported by grants to Y.G. including the Burroughs Wellcome Fund Career Awards at the Scientific Interface and Northwestern University’s startup funds. M.E.F. and A.T.W. acknowledge support from R01CA174746 and R01CA207935. A.T.W. is also supported by a Team Science Award from the Melanoma Research Alliance and P01 CA114046. A.J.L. and J.A.W. are supported by the NCI Melanoma SPORE (P50CA221703), MD Anderson Melanoma Moonshot, TRANSCEND Cancer Initiative and Platform for Innovative Microbiome and Translational Research (PRIME-TR). A.R. acknowledges support from NIH Director’s Transformative Research Award R01 GM137425, NIH R01 CA238237, NIH R01 CA232256, NIH P30 CA016520, NIH SPORE P50 CA174523 and NIH U01 CA227550.
Footnotes
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-023-06342-8.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Competing interests
A.R. receives royalties related to Stellaris RNA FISH probes. Y.G. received consultancy fees from the Schmidt Science Fellows and the Rhodes Trust. A.J.L. reports financial relationships with AbbVie, Adaptimmune, AstraZeneca, Bain Capital, Bayer, Bio-AI Health, BMS, Caris, Deciphera, Foghorn Therapeutics, Gothams, GSK, Illumina, Invitae/Archer DX, Iterion Therapeutics, Merck, Novartis, Nucleai, OncoKB (MSKCC), Pfizer, Regeneron, Roche/Genentech, SpringWorks, Tempus and Thermo Fisher. J.A.W. is an inventor on US patent application no. PCT/US17/53.717 submitted by the University of Texas MD Anderson Cancer Center, which covers methods to enhance immune checkpoint blockade responses by modulating the microbiome. J.A.W. reports compensation for the speaker’s bureau and honoraria from Imedex, Dava Oncology, Omniprex, Illumina, Gilead, PeerView, Physician Education Resource, MedImmune, Exelixis and Bristol Myers Squibb; and has served as a consultant and/or advisory board member for Roche/Genentech, Novartis, AstraZeneca, GlaxoSmithKline, Bristol Myers Squibb, Micronoma, OSE therapeutics, Merck and Everimmune. J.A.W. receives stock options from Micronoma and OSE therapeutics. All other authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41586-023-06342-8.
Data availability
All raw and processed imaging data generated in this study are on BioStudies (accession S-BIAD696). All raw and processed FateMap single-cell barcoding data generated in this study are at Figshare (https://doi.org/10.6084/m9.figshare.22798952 and https://doi.org/10.6084/m9.figshare.22802888). All raw and processed gDNA barcoding data generated in this study are at Figshare (https://doi.org/10.6084/m9.figshare.22806494). All raw and processed WGS data for individual clones used in this manuscript are at BioProject accession PRJNA972638 and Figshare (https://doi.org/10.6084/m9.figshare.23255273), respectively. All raw and processed bulk RNA-seq data used in this manuscript are at the Gene Expression Omnibus accession GSE233622. All raw and processed scRNA-seq data used in this manuscript are at Gene Expression Omnibus accession GSE233766. All raw and processed data from the GeoMx spatial transcriptomics used in this manuscript can be found at BioProject Accession PRJNA976929 and Figshare (https://doi.org/10.6084/m9.figshare.23248199), respectively.
Code availability
All code used in this study is available at Zenodo (https://doi.org/10.5281/zenodo.8000328).