(A) t-SNE of the SW480 cell line indicating an additional subpopulation within the EpCAMhi population (left panel). Using a signature derived from the bulk RNAseq, this subpopulation was identified as the ‘sphere’ population (middle panel) and annotated for exclusion from further analysis (right panel). (B) Left panel: after exclusion of the ‘sphere’ population, the SW480 cell line shows slightly higher variability than the HCT116 cell line, as evidenced by the variance of the top 50 principal components. Right panel: in HCT116, most of the highly variable genes are differentially expressed between the EpCAMhi and EpCAMlo populations, whereas in SW480 most of the highly variable genes do not differ between the two populations. (C) Top panels: expression values of VIM, ZEB1, and CD44 on the UMAP embedding of the HCT116 cell line. Lower panels: projection of the RNA velocity direction of the same genes. (D) HCT116 UMAP embedding annotated with the eight unsupervised clusters. (E) Heatmap of HCT116 showing expression values of the epithelial-to-mesenchymal transition (EMT) signature, averaged within each of the eight clusters. Clusters were ranked according to their EMT score, and genes were grouped into four distinct gene sets using k-means clustering. (F) Schematic diagram showing a transcriptional trajectory with distinct gene arrays through which pEMT cells arise.