SUMMARY
We propose that the teratoma, a recognized standard for validating pluripotency in stem cells, could be a promising platform for studying human developmental processes. Performing single cell RNA-seq on 179,632 cells across 23 teratomas from 4 cell lines, we found that teratomas reproducibly contain approximately 20 cell types across all 3 germ layers, that inter-teratoma cell type heterogeneity is comparable to that of organoid systems, and that teratoma gut and brain cell types correspond well to similar fetal cell types. Cellular barcoding confirmed that injected stem cells robustly engraft and contribute to all lineages. Using pooled CRISPR-Cas9 knockout screens, we showed that teratomas can simultaneously assay the effects of genetic perturbations across all germ layers. Additionally, we demonstrated that teratomas can be molecularly sculpted via miRNA-regulated suicide gene expression to enrich for specific tissues. Taken together, the teratoma is a promising platform for modeling multi-lineage development, pan-tissue functional genetic screening, and tissue engineering.
In Brief
The teratoma is characterized as a model for multi-lineage human development, with cell types represented across all 3 germ layers; it is used to assay the effects of genetic perturbations simultaneously across multiple cell types; and a molecular sculpting strategy is presented to enrich for specific tissues.
INTRODUCTION
Current understanding of early human development relies heavily on inference from animal models. Model systems such as frogs (Vastag et al., 2011), fish (Farrell et al., 2018), and mice (Cao et al., 2019; Pijuan-Sala et al., 2019) have demonstrated that many features of early embryogenesis are evolutionarily conserved across species (Lin et al., 2009; Peter and Davidson, 2011; Royo et al., 2011). However, several aspects of development are highly species-specific, especially in neural development (Raff, 1996; Richardson et al., 1997; Richard et al., 2000; Hodge et al., 2019). While there have been studies of human embryonic development (Miller et al., 2014; Zhu et al., 2018), such studies are limited by a scarcity of relevant biological material and by key ethical constraints. There has thus been a push to establish models specific to human development.
Human pluripotent stem cells (PSCs), such as embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs), have been used as developmental models by directing their differentiation into various cell types. These studies have shed light on processes such as lineage bifurcation (Yao et al., 2017) and heterogeneity (Wang et al., 2017) during human neuronal development, as well as the presence of discrete cell states during early ESC differentiation (Jang et al., 2017). Additionally, perturbation screens in these cell culture models have examined key regulators of differentiation (Parekh et al., 2018) and reprogramming (Tsunemoto et al., 2018). However, human development takes place in three dimensions, which is difficult to capture in a two-dimensional monolayer (Brown, Quadrato and Arlotta, 2018; Liu et al., 2018).
Newer methods for modeling human development use organoid systems. Organoids are 3D “mini-organs” derived from PSCs in which the cells spontaneously self-assemble into differentiated cell types that mimic their in vivo counterparts structurally and functionally (Huch and Koo, 2015; Clevers, 2016; Yin et al., 2016; Dutta, Heo and Clevers, 2017; Fligor et al., 2018; Capowski et al., 2019; Collin et al., 2019). The use of organoids has enabled researchers to model human-specific development in a 3D context, which is especially beneficial for modeling rare genetic diseases or cancers (Dekkers et al., 2013; Bigorgne et al., 2014; Gao et al., 2014; Bartfeld et al., 2015; Boj et al., 2015; Huch and Koo, 2015; van de Wetering et al., 2015; Yin et al., 2016). However, tissue types derived from organoids may be immature (Chambers, Tchieu and Studer, 2013; Aurora and Spence, 2016) and are limited in thickness and scale due to the absence of abundant vasculature. Additionally, most organoid models can generate only a single or a few developmental lineages (Yin et al., 2016; Jabaudon and Lancaster, 2018; Sato et al., 2009, 2011; Jung et al., 2011). Gastruloids, which model early anteroposterior organization, can recapitulate all germ layers, but they are unable to model later stages of development (Moris et al., 2020).
We propose here the use of teratomas as a model for studying human development (Lensch et al., 2007). The teratoma displays multi-lineage differentiation to all germ layers, has a vascularized 3D structure, bears regions of complex tissue-like organization, and is relatively straightforward to implement. Early teratoma research revealed that teratomas derive from pluripotent germ cells that resemble embryonic cells (Stevens, 1962, 1967; Thurlbeck, 1973; Stevens and Pierce, 1975). PSC-derived teratomas are generated by directly injecting PSCs into immunodeficient mice, where the cells attach and differentiate in a semi-random fashion into all three germ layers (Willis, 1934, 1935; Thurlbeck, 1973; Bocker, 2002). In this regard, teratoma formation is the gold standard for validating the pluripotency and developmental potential of hPSC lines (Smith, Luong and Stein, 2009; Avior, Biancotti and Benvenisty, 2015).
There has also been some progress in utilizing the inherent differentiation potential of teratomas to derive highly sought-after cell types. For instance, teratomas were recently utilized to derive skeletal myogenic progenitors by injecting PSCs into the tibialis anterior muscle of mice to enrich for muscle cell types in the teratomas that formed in those muscles (Chan et al., 2018). Additionally, some groups have successfully enriched for hematopoietic stem cells (HSCs) from teratomas utilizing strategies such as human umbilical vein endothelial cell (HUVEC) pooling (Suzuki et al., 2013; Tsukada et al., 2017; Philipp et al., 2018; Amabile et al., 2019). However, the semi-random nature of teratoma development has previously made characterization of teratomas difficult, as the different lineages can often be found in close spatial proximity.
We hypothesized that high-throughput single cell gene expression profiling via droplet-based methods (Klein et al., 2015; Macosko et al., 2015; Cao et al., 2017; Rosenberg et al., 2017, 2018; Zheng et al., 2017; Ding et al., 2019) and simple genetic perturbation toolsets such as CRISPR-Cas9 could address this challenge by enabling systematic analysis and perturbation of teratomas at the single cell level (Qi et al., 2013; Adamson et al., 2016; Black et al., 2016; Dixit, Parnas, Li, Weissman, et al., 2016; Chen and Qi, 2017; Datlinger et al., 2017; Akcakaya et al., 2018; Dijk et al., 2018). Coupling these approaches with histology and RNA in situ hybridization, we established a comprehensive experimental and computational framework to systematically analyze, perturb, and modulate human PSC-derived teratomas and to evaluate their potential for modeling human development and lineage engineering.
RESULTS
Teratoma Characterization
We first characterized the teratoma to better understand its growth kinetics, constituent cell types, and spatial organization. Towards this, we generated 7 teratomas using H1 ESCs, identified cell types using single cell RNA-seq, and validated these cell types and assessed their spatial organization with histology and RNA FISH. To generate a teratoma, we subcutaneously injected 5–10 million hESCs into Rag2−/−;γc−/− immunodeficient mice (Figure 1A, Methods). On average, tumors became externally visible and measurable approximately 37 days after injection. We grew the teratomas for up to 70 days, until the tumors were large enough for extraction and downstream analyses (~820 mm², Figure 1B). After extraction, tumors were weighed, inspected, and sectioned (Figure 1C, Methods). We used histology to confirm the presence of all 3 germ layers (ectoderm, mesoderm, endoderm), and thus pluripotency (Figure 1D, Methods). An independent histology analysis also revealed structures such as developing airways, retinal pigment epithelium and neurons, fetal cartilage and bone, muscle, vasculature, GI tract, connective tissue, adipocytes, and neuroectoderm (Figure S1A). The remaining tissue was dissociated for single cell RNA sequencing with the droplet-based 10X Genomics Chromium platform (Zheng et al., 2017).
Figure 1. Comprehensive teratoma characterization.
(A) Schematic of the general workflow. H1 PSCs in a slurry of Matrigel® and embryonic stem cell medium were injected subcutaneously into the right flank of Rag2−/−;γc−/− immunodeficient mice. Teratoma growth was monitored weekly and quantified by approximating the elliptical area (mm²). Tumors were extracted after 8–10 weeks of growth and inspected for external heterogeneity before small sections were frozen for H&E staining. The remaining tumor was dissociated into a single cell suspension via standard GentleMACS protocols, and this suspension was used for scRNA-seq (10x Genomics). (B) Growth kinetics of four H1 teratomas. (C) Images of four teratomas generated from H1 cells. (D) H&E stains of the four teratoma histology sections. The presence of ectoderm, mesoderm, and endoderm confirms pluripotency and developmental potential. (E) UMAP visualization of cell types identified from single cell RNA-sequencing of the seven H1 teratomas. Dotted lines separate the cell types originating from each of the 3 germ layers.
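The Figure 1A caption states that teratoma growth was quantified by approximating elliptical area. The sketch below shows that arithmetic, assuming (this is not stated in the text) that two perpendicular diameters, a length and a width, are measured externally, so that area = π · (length/2) · (width/2). The function name and the weekly values are hypothetical and are not data from this study.

```python
import math
from typing import List, Tuple


def elliptical_area_mm2(length_mm: float, width_mm: float) -> float:
    """Approximate a tumor's area as an ellipse.

    Assumption: length_mm and width_mm are two perpendicular diameters,
    so the semi-axes are half of each: area = pi * (L / 2) * (W / 2).
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("diameters must be non-negative")
    return math.pi * (length_mm / 2.0) * (width_mm / 2.0)


# Hypothetical weekly measurements for one tumor: (day, length_mm, width_mm).
measurements: List[Tuple[int, float, float]] = [(37, 4.0, 3.0), (44, 9.0, 7.0), (51, 15.0, 12.0)]
growth_curve = [(day, round(elliptical_area_mm2(l, w), 1)) for day, l, w in measurements]
print(growth_curve)
```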
To analyze the resulting sequencing data, we generated single cell gene expression matrices across the 7 teratomas for both human and mouse cells using the CellRanger pipeline from 10X Genomics (Zheng et al., 2017) (Methods, Figure 1A, Table S1A). We removed teratoma-specific batch effects using the Seurat data integration pipeline (Stuart et al., 2018) and then clustered the cells using Louvain clustering (Houle et al., 2010). We generated a rough biological annotation of the clusters using a k-nearest neighbors classifier trained on the Mouse Cell Atlas, and refined the cluster annotations manually using canonical cell type markers (Han et al., 2018; Stuart et al., 2018) (Table S2A–E). We sub-clustered a cell type that expressed ciliated epithelial markers but showed divergent expression of Airway and Retinal markers, and identified Airway Epithelium, Retinal Epithelium, and erythrocytes (Table S2F). We then visualized both the human and mouse cells with a Uniform Manifold Approximation and Projection (UMAP) (Becht et al., 2018) scatterplot (Figure 1E). In the human cells, we identified 23 putative cell types across all three germ layers, including endodermal cell types (gut epithelium), ectodermal cell types (early neurons), and an abundance of mesoderm-like cell types that expressed Mesenchymal Stem Cell (MSC)/Fibroblast markers, most notably the canonical MSC marker THY1 (An et al., 2018) (Figure 1E, Figure S1B, Table S3A, S3B). We annotated these putative MSC/Fib cell types as Adipogenic (ITM2A, SHOX2), Chondrogenic (COL2A1, SOX9), MyoFibroblasts (COL15A1), or Cycling (HMGB2) (Table 1, Table S3C). We visualized the expression of canonical marker genes for each cell type to assess the robustness of our preliminary cell type annotations (Table 1, Figure S1C, Table S3C, Methods).
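The annotation step above assigns labels with a k-nearest neighbors classifier trained on a reference atlas. A minimal sketch of kNN majority-vote label transfer in plain Python follows. It assumes cells are already vectors in a shared feature space (for example, the same genes or shared reduced dimensions); the Euclidean metric, the value of k, the tie-breaking rule, and the toy data are illustrative assumptions, not the study's settings.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_annotate(
    reference: List[Tuple[Sequence[float], str]],
    queries: List[Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Give each query cell the majority label of its k nearest reference cells.

    reference: (expression_vector, cell_type_label) pairs from an annotated atlas.
    queries: vectors in the same feature space (same features, same order).
    Ties are broken in favor of the tied label whose member is closest.
    """
    labels = []
    for q in queries:
        nearest = sorted(reference, key=lambda item: euclidean(q, item[0]))[:k]
        counts = Counter(label for _, label in nearest)
        top = max(counts.values())
        tied = {label for label, c in counts.items() if c == top}
        # `nearest` is sorted by distance, so the first tied label is the closest one.
        labels.append(next(label for _, label in nearest if label in tied))
    return labels


# Hypothetical two-feature toy reference.
reference = [
    ([0.0, 0.1], "Neuron"), ([0.2, 0.0], "Neuron"),
    ([5.0, 5.1], "Fibroblast"), ([5.2, 4.9], "Fibroblast"),
]
print(knn_annotate(reference, [[0.1, 0.1], [5.0, 5.0]], k=3))
```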
Table 1: Summary of Cell Type Validations
| Germ Layer | Broad Cell Type (used for CRISPR screen & miRNA analysis) | Cell Type | Cells (H1 & cell line teratomas) | Minimal Marker Set | RNA FISH marker validation | Identified in histology analysis | Mapped to fetal human data |
|---|---|---|---|---|---|---|---|
| Ecto | Neural Prog | Radial Glia | 2579 | SOX2, HES5 | HES5 | | Yes |
| Ecto | Neural Prog | CycProg (Cycling Neural Prog) | 1619 | SOX2, HMGB2 | | | Yes |
| Ecto | Neurons | Early Neurons | 6010 | DCX, MAP2 | DCX | | Yes |
| Ecto | Neurons | Retinal Neurons | 493 | OTX2, NRL | | Yes | |
| Ecto | Retinal Epi | Retinal Epi | 7238 | OTX2, MITF, FOXJ1 | | Yes | |
| Ecto | Schwann Cell Prog (SCP) | Schwann Cells | 174 | MPZ | | | |
| Ecto | Schwann Cell Prog (SCP) | Melanoblasts | 200 | MITF, SOX10, MLANA | | | |
| Endo | Foregut Epi | Foregut Epi | 584 | ELF3, PAX9, KRT4 | | | Yes |
| Endo | Foregut Epi | Airway Epi | 76 | FOXJ1, CDHR3 | FOXJ1 | Yes | |
| Endo | Mid/Hindgut Epi | Mid/Hindgut Epi | 1742 | ELF3, CDX2 | CDX2 | | Yes |
| Meso | Hematopoietic | Immune | 1490 | CD74 | | | |
| Meso | Hematopoietic | HSC | 140 | CD34, HHEX | | | |
| Meso | Hematopoietic | Erythrocyte | 834 | GYPA | | | |
| Meso | MSC/Fib | Adipogenic MSC/Fib | 6487 | THY1, ITM2A, SHOX2 | | Yes | |
| Meso | MSC/Fib | Chondrogenic MSC/Fib | 587 | THY1, COL2A1, SOX9 | | | |
| Meso | MSC/Fib | MSC/Fib | 8046 | THY1, COL14A1 | THY1 | | |
| Meso | MSC/Fib | Cycling MSC/Fib | 4010 | THY1, HMGB2 | | Yes | |
| Meso | MSC/Fib | MyoFib | 3329 | THY1, COL15A1 | | | |
| Meso | Muscle | Muscle Prog | 1276 | MYOD1, PAX7 | | Yes | |
| Meso | Muscle | Cardiac/Skeletal Muscle | 528 | MYOD1, TNNI1, TNNT2 | TNNT2 | Yes | |
| Meso | Pericytes | Pericytes | 1053 | FOXC1, CYP1B1 | | | |
| Meso | Smooth Muscle | Smooth Muscle | 550 | ACTA2, RGS5 | | Yes | |
| Meso | | Kidney Prog | 153 | WT1 | | | |
We further validated the cell type annotations by correlating the expression of each teratoma cell type with that of cell types from the Mouse Organogenesis Cell Atlas (Cao et al., 2019), showing that each teratoma cell type generally correlates with at least one fetal mouse cell type (Figure S1D). While most teratoma cell types correlate with the expected mouse cell type, there are some discrepancies, which may stem from differences in developmental stage, mouse- or human-specific expression, or the inability of a broad correlation analysis to distinguish closely related cell types (Figure S1D). For example, Hematopoietic Stem Cells (HSCs) from the teratoma correlate with fetal mouse endothelial cells, reflecting the endothelial origin of HSCs (Zovein et al., 2008). The MSC/Fib subtypes and Pericytes all broadly correlate with the same block of mesenchymal fetal mouse cell types, reflecting their similar developmental origins (Cathery et al., 2018). Retinal Pigment Epithelia are a type of Ependymal Cell and correlate accordingly (Wolburg et al., 2009). Melanoblasts and Retinal Neurons are also both derived from the neural crest and may share some marker genes such as MITF, although they are not as closely related as the other correlated cell types discussed above (Goding, 2000; Mort et al., 2015). Finally, Kidney Progenitors do not correlate well with any fetal mouse cell type, likely because the fetal mouse data contained no kidney cell types at the level of annotation we used (Figure S1D).
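The validation above correlates each teratoma cell type's average expression with each atlas cell type and checks which atlas type matches best. The sketch below implements that correlate-and-take-the-maximum step with Pearson correlation, assuming both sets of profiles are aligned to the same ordered gene list. The cell type names and values are placeholders, and the study's gene selection and normalization are not reproduced.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def best_reference_match(
    query_profiles: Dict[str, List[float]],
    reference_profiles: Dict[str, List[float]],
) -> Dict[str, Tuple[str, float]]:
    """For each query cell type, return (best-correlated reference type, R)."""
    matches = {}
    for q_name, q_vec in query_profiles.items():
        scores = {r: pearson(q_vec, r_vec) for r, r_vec in reference_profiles.items()}
        # Undefined correlations (constant profiles) are ranked last.
        best = max(scores, key=lambda r: -math.inf if math.isnan(scores[r]) else scores[r])
        matches[q_name] = (best, scores[best])
    return matches


# Placeholder average-expression profiles over the same four genes.
query = {"TypeX": [1.0, 5.0, 0.5, 3.0]}
reference = {"RefA": [1.2, 4.8, 0.4, 2.9], "RefB": [5.0, 0.2, 3.0, 0.1]}
print(best_reference_match(query, reference))
```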
Overall, we used canonical marker genes and mouse cell atlases to generate a preliminary annotation of the cell types found in the teratoma scRNA-seq datasets. We provide a summary table of the key marker genes, and the experimental and computational validations performed on each cell type in Table 1. In the mouse cells, we primarily observed invading immune cells, endothelial cells, and stromal cells (Figure S1E).
Assaying Teratoma Heterogeneity
Assessing heterogeneity between teratomas, especially between teratomas generated from different stem cell lines, is critical for assessing the reproducibility and utility of this model. Towards this, we generated additional teratomas (per Figure 1A) from H9 ESCs, HUES62 ESCs, and PGP1 iPSCs and assessed their cell type composition (Figure 2A, Table S1B). We performed 10X sequencing on each teratoma, integrated the expression profiles, classified cell types using the H1 teratomas as reference, and visualized the cell types with a UMAP scatterplot (Figure 2B), while also showing the relative contribution of each cell line teratoma to the UMAP embedding (Figure S2A). We also assessed the distribution of cell types represented in each individual H1 teratoma alongside the H9, HUES62, and PGP1 teratomas (Figure 2C, Figure S2B). We then compared germ layer representation across all teratomas, using zebrafish and Mouse Organogenesis Cell Atlas single-cell datasets as references (Wagner et al., 2018; Pijuan-Sala et al., 2019) (Figure 2D). Teratomas are composed mostly of mesoderm and neuroectoderm, with less endoderm (Figure 2D). In H1 teratomas the mesoderm consists primarily of MSC/Fibroblasts, while teratomas from different cell lines show more variability in the MSC/Fibroblast fraction (Figure 2D, Figure S1B). The relatively low fraction of endoderm in both the teratomas and the zebrafish and mouse embryo datasets suggests that endoderm is less prevalent than the other germ layers during development (Figure 2D). Qualitatively, while cell type representation varies among teratomas, every teratoma contains most of the major cell types (Figure 2C). We quantified this heterogeneity by computing the scaled mutual information between cell type assignments and teratoma assignments (Figure 2E) (Kim et al., 2016).
We find that the cell type heterogeneity across the H1 teratomas is similar to that of patterned brain organoids (Velasco et al., 2019), while teratomas generated from different cell lines show a much higher level of heterogeneity (Figure 2E). Teratoma growth kinetics were also line-specific, with PGP1 teratomas growing the fastest and HUES62 teratomas the slowest (Figure S2C). Some of this accelerated growth may be due to chromosomal abnormalities, as karyotyping showed that the PGP1 line carries material translocated to 7q34, the locus of BRAF (Figure S2D).
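The heterogeneity metric is only summarized above (scaled mutual information in the text, "Normalized Entropy" in the Figure 2E caption), and its exact formula is in the Methods, which are not reproduced here. As an assumption-labeled illustration, the sketch below computes one plausible entropy-based score: for each cell type, the normalized Shannon entropy of its within-teratoma fractions across teratomas, averaged over cell types. A score of 1 means every cell type makes up the same fraction of every teratoma; lower values indicate more teratoma-to-teratoma variation, matching the direction described in the caption. The function name and toy counts are hypothetical.

```python
import math
from typing import Dict


def normalized_entropy(counts: Dict[str, Dict[str, int]]) -> float:
    """Hypothetical heterogeneity score in [0, 1].

    counts[sample][cell_type] = number of cells (samples = teratomas).
    1.0: every cell type is the same fraction of every sample.
    0.0: every cell type is confined to a single sample.
    """
    samples = list(counts)
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    cell_types = sorted({c for s in samples for c in counts[s]})
    # Use within-sample fractions so that larger samples do not dominate.
    fractions = {}
    for s in samples:
        total = sum(counts[s].values())
        fractions[s] = {c: counts[s].get(c, 0) / total for c in cell_types}
    log_n = math.log(len(samples))
    entropies = []
    for c in cell_types:
        col = [fractions[s][c] for s in samples]
        z = sum(col)
        if z == 0:
            continue
        # Shannon entropy of how this cell type is spread over samples, scaled to [0, 1].
        h = -sum((p / z) * math.log(p / z) for p in col if p > 0)
        entropies.append(h / log_n)
    return sum(entropies) / len(entropies)


# Hypothetical counts: identical composition in two teratomas of different size.
same = {"T1": {"A": 10, "B": 30}, "T2": {"A": 20, "B": 60}}
print(normalized_entropy(same))
```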
Figure 2. Assaying teratoma heterogeneity.
(A) Schematic of teratoma generation from multiple cell lines and of the process for identifying how each line contributes to cell types. (B) UMAP scatterplot of all cell types present across 3 PSC lines (H9, HUES62, and PGP1). (C) Distribution of cell types represented in each individual teratoma. (D) Distribution of germ layer representation in each individual teratoma, with zebrafish and mouse comparisons. (E) The Normalized Entropy represents how well cell type assignments are mixed with teratoma/organoid/cell line identities. A higher Normalized Entropy implies less cell type variation between teratomas/organoids/cell lines. The Cell Line Teratomas include one teratoma generated from each of the HUES62, H9, and PGP1 lines. (F) H1 cells were uniquely barcoded at low MOI with lentiviral vectors before teratoma formation. The barcodes were counted and assessed for lineage/cell type priming of cells. (G) Number of unique barcodes detected in each cell type plotted alongside the cell type bias for specific barcodes (computed using the KL divergence of cell type identities with barcode identities, scaled by the number of cells in each cell type). (H) Teratoma bias for each cell type plotted against barcode bias.
Another key question in teratoma formation is how many cells engraft after stem cell injection. To determine this, for 3 of the 7 H1 ESC teratomas, we transduced the PSCs before injection with an integrating lentiviral ORF barcode that can be detected by scRNA-seq (Guo et al., 2018) (Figure 2F, Figure S2E). With this barcoding scheme, cells are individually labeled before teratoma formation, and their descendants can be captured after formation via scRNA-seq. Transduced PSCs were split evenly: half were used for teratoma formation and half were frozen for DNA sequencing. By comparing the unique barcodes extracted from the genomic DNA of these two cell populations, we calculated the proportion of cells that engraft. Across the three teratomas, over 25% of the 10 million injected cells engrafted, suggesting that no major bottleneck occurs during teratoma formation (Figure S2F). This is especially important when using teratomas in high-throughput genetic screens, because enough cells must contribute to the final tumor that no elements of the screen are lost.
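The engraftment estimate above compares the unique barcodes in the injected population with those recovered after teratoma formation. A minimal sketch follows, assuming low-MOI labeling so that each unique barcode approximates one injected cell. Both helper names are hypothetical, and the second function only illustrates why engraftment matters for screens by computing the expected number of engrafted cells per library element under perfectly even representation.

```python
from typing import Iterable


def engraftment_fraction(input_barcodes: Iterable[str], teratoma_barcodes: Iterable[str]) -> float:
    """Fraction of unique input barcodes recovered in the teratoma.

    Assumes low-MOI labeling, so one unique barcode ~ one injected cell.
    Barcodes seen only in the teratoma (e.g. sequencing errors) are ignored.
    """
    inp = set(input_barcodes)
    if not inp:
        raise ValueError("no input barcodes")
    recovered = inp & set(teratoma_barcodes)
    return len(recovered) / len(inp)


def engrafted_cells_per_element(n_injected: int, engraft_frac: float, n_elements: int) -> float:
    """Hypothetical helper: expected engrafted cells per library element
    (e.g. per sgRNA), assuming elements are evenly represented."""
    return n_injected * engraft_frac / n_elements


print(engraftment_fraction(["a", "b", "c", "d"], ["a", "x"]))
print(engrafted_cells_per_element(10_000_000, 0.25, 48))
```

As a purely illustrative combination of numbers mentioned in the text, `engrafted_cells_per_element(10_000_000, 0.25, 48)` gives roughly 52,000 engrafted cells per element under these idealized assumptions.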
We next tracked barcodes in individual cells by amplifying the expressed barcode from the scRNA-seq library. Because teratoma cells with the same barcode originate from the same PSC, we could assess whether certain PSCs were primed to develop into certain lineages. For each cell type, we computed a barcode bias score, which reflects the extent to which barcodes tend to be enriched or depleted in that cell type, and plotted this score alongside the total number of barcodes detected in each cell type (Figure 2G, Methods). We also computed a teratoma bias score for each cell type, which reflects how much the proportion of that cell type varies across teratomas, and plotted it against the barcode bias score (Figure 2H, Methods). Retinal epithelium is an outlier, with both a high teratoma bias and a high barcode bias (Figure 2H). Myofibroblasts also have relatively high barcode and teratoma bias scores, while Early Neurons, Radial Glia, and Mid/Hindgut Epithelium have high teratoma bias scores (Figure 2H). Both the barcode bias and teratoma bias scores are scaled by the number of cells in each cell type (Methods).
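The Figure 2G caption describes the barcode bias score as a KL divergence between cell type identities and barcode identities, scaled by the number of cells in each cell type. The sketch below shows one way to compute an unscaled version: the KL divergence of each cell type's barcode distribution from the barcode distribution across all barcoded cells. The pseudocount, the omission of the cell-number scaling, and the function name are assumptions; the study's exact definition is in its Methods.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def barcode_bias(cells: List[Tuple[str, str]], pseudocount: float = 0.5) -> Dict[str, float]:
    """Hypothetical, unscaled barcode bias per cell type.

    cells: (barcode, cell_type) pairs for cells with a detected barcode.
    Returns KL(P_celltype || Q_all): P is the barcode distribution within the
    cell type (with a pseudocount so absent barcodes stay finite), Q the
    barcode distribution over all cells. Larger values mean the cell type
    draws on a more skewed subset of clones.
    """
    barcodes = sorted({b for b, _ in cells})
    overall = Counter(b for b, _ in cells)
    total = sum(overall.values())
    q = {b: overall[b] / total for b in barcodes}  # every barcode occurs at least once
    bias = {}
    for ct in sorted({c for _, c in cells}):
        counts = Counter(b for b, c in cells if c == ct)
        denom = sum(counts.values()) + pseudocount * len(barcodes)
        kl = 0.0
        for b in barcodes:
            p = (counts[b] + pseudocount) / denom
            kl += p * math.log(p / q[b])
        bias[ct] = kl
    return bias


# Hypothetical data: each cell type derives from a different clone.
skewed = [("bc1", "A")] * 4 + [("bc2", "B")] * 4
print(barcode_bias(skewed))
```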
Taken together, we found teratomas to generally contain the same major cell types at 10 weeks of growth: a large fraction of MSC/Fibroblast and neuronal cell types, and a small fraction of endoderm. RPE shows both a high degree of variability across teratomas and a high level of lineage priming. Notably, the level of heterogeneity between teratomas generated from H1 stem cells is comparable to that observed in organoids (de Souza, 2017; Quadrato et al., 2017; Velasco et al., 2019), but there is a much higher level of heterogeneity among teratomas derived from different PSC lines. This reflects known epigenetic variability across those lines (Ortmann and Vallier, 2017).
Assaying Teratoma Maturity
We next assessed the transcriptional similarity of the teratoma cell types to human fetal cell types, using published single-cell RNA-seq datasets from the human neuroectoderm and gut, to determine their utility as a tool for modeling human development. We looked at which human embryonic stage the 10-week teratoma cell types most resemble, projected the teratoma data onto the fetal data to assess global transcriptional similarity, and compared the expression of key cell type marker genes (Figure 3A).
Figure 3. Assaying teratoma maturity.
(A) Teratoma neuro-ectoderm cell types were mapped to fetal cortical cell types, and the corresponding teratoma cell types were projected onto SWNE embeddings of fetal cells. Key marker genes were correlated across matching teratoma/fetal cell types, and the average expression of teratoma cell types was correlated with fetal cell types from different stages of development. (B) Cosine similarity of teratoma brain cells with fetal brain cells of different ages. (C) UMAP embedding of teratoma neuro-ectoderm sub-clusters (Table S2G). (D) Projection of teratoma neuro-ectoderm cell types onto the SWNE embedding of fetal cortical cells. (E) Correlation of the scaled expression of key marker genes across Radial Glia, Cycling Progenitors, Early Neurons, and Interneurons. (F) Fraction of brain-related cell types in the teratoma and fetal cortex. (G) H&E stain (left) and RNAScope image (right) of HES5 (radial glia marker, top) and DCX (early neuron marker, bottom) expression. DAPI is a nuclear stain. 4–10 punctate dots/cell is a positive result. Dots were dilated using ImageJ. Scale bar = 50 μm. (H) Positive (top) and negative (bottom) RNAScope® control staining. DAPI is a nuclear stain. 4–10 punctate dots/cell is a positive result. Scale bar = 50 μm.
Due to the semi-random nature of teratoma differentiation, different cell types may resemble different stages of embryonic development. We therefore analyzed individual tissue types separately, focusing in depth on the teratoma neuro-ectoderm and gut cell types. We first sub-clustered the neuro-ectoderm cells and identified additional subtypes, including a cluster of early interneurons (Figure 3C, Table S2G). We then compared the average expression of all cells belonging to neural subtypes with the average expression of the same subtypes in a 2,300-cell fetal brain dataset spanning different stages of development (Zhong et al., 2018) (Figure 3A, Figure 3B). The teratoma neuronal cells had high similarity scores to the human prefrontal cortex at gestational weeks 13–17, with the highest scores for weeks 16–17 (Figure 3B). Given this high similarity to the week 16–17 data, we selected for further analysis the teratoma subtypes (Radial Glia, Cycling Progenitors, Early Neurons, Early Interneurons) that matched cell types in a larger (40,000+ cell) week 17–18 dataset, also from the human prefrontal cortex (Polioudakis et al., 2019) (Figure 3A, Figure 3C).
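The stage comparison above (Figure 3B) scores teratoma cells against fetal cells from different gestational ages by cosine similarity. A minimal sketch of that ranking step follows, assuming each population is summarized by its mean expression over the same genes in the same order; the stage labels and values are placeholders, not data from the study.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def stage_similarity(query: List[float], stages: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank reference stages by cosine similarity to a query mean-expression profile."""
    scores = [(stage, cosine(query, profile)) for stage, profile in stages.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Placeholder profiles over three genes.
teratoma_profile = [1.0, 0.0, 1.0]
fetal_by_stage = {"stage_1": [0.0, 1.0, 0.0], "stage_2": [1.0, 0.0, 1.1]}
print(stage_similarity(teratoma_profile, fetal_by_stage))
```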
We then generated a Similarity Weighted Nonnegative Embedding (SWNE) of the week 17–18 human prefrontal cortex cells and projected the teratoma cells from the matching subtypes onto the fetal human SWNE (Figure 3A, Figure 3D) (Wu, Tamayo and Zhang, 2018b). Matching cell types mapped to similar positions in the SWNE embedding, suggesting broadly similar expression patterns, although the projected teratoma cells show some overlap between cycling progenitors and radial glia, as well as between early interneurons and excitatory neurons (Figure 3D). Additionally, the teratoma radial glia project onto the fetal intermediate progenitors (Figure 3D).
To further assess the similarity of the teratoma neuro-ectoderm cell types to the fetal prefrontal cortex cell types, we defined a panel of neuronal cell type marker genes (DCX, NEUROD1, HES5, SOX2, HMGB2, VIM, DLX1) and correlated the expression of these marker genes between teratoma and fetal brain cells for every matched cell type (Figure 3A, Figure 3E). Correlations were high overall: R = 0.82 for Radial Glia, R = 0.93 for Cycling Progenitors, R = 0.84 for Interneurons, and R = 0.77 for Early Neurons (Figure 3E). We also compared cell type proportions in the fetal prefrontal cortex and the teratoma, finding that the teratoma has far more progenitor cells such as Radial Glia, fewer early neurons, and no detectable mature neurons (Figure 3F). To characterize the differences between teratoma and fetal cells, we also performed differential expression and gene set enrichment analyses between the matched teratoma and fetal prefrontal cortex cell types (Figure S3A, S3B). All four cell types showed similar top differentially expressed genes and gene sets, suggesting that the main differences between teratoma and fetal cells are global rather than cell type specific (Figure S3A, S3B). Teratoma cells show higher expression of genes related to organ morphogenesis, whereas fetal cells express genes related to methylation, suggesting that teratoma cells may not share the epigenetic signatures of fetal cells (Figure S3A, S3B).
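Figure 3E correlates the scaled expression of marker genes between matched teratoma and fetal cell types. One way to do this, assuming "scaled" means z-scoring each marker across cell types within each dataset (an assumption; the study's scaling may differ), is sketched below: after scaling, each cell type gets one value per marker in each dataset, and the two vectors are correlated. The marker names come from the panel above, but the cell type keys and expression values are invented for illustration.

```python
import math
from typing import Dict, List

# gene -> cell type -> mean expression of that gene in that cell type
Matrix = Dict[str, Dict[str, float]]


def zscore_rows(m: Matrix) -> Matrix:
    """Scale each gene to mean 0, SD 1 across cell types (population SD)."""
    out: Matrix = {}
    for gene, row in m.items():
        vals = list(row.values())
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        out[gene] = {ct: ((v - mu) / sd if sd > 0 else 0.0) for ct, v in row.items()}
    return out


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def marker_correlation(teratoma: Matrix, fetal: Matrix, genes: List[str], cell_type: str) -> float:
    """Correlate scaled marker expression for one matched cell type.
    Both matrices must contain the same cell types for each gene."""
    t = zscore_rows({g: teratoma[g] for g in genes})
    f = zscore_rows({g: fetal[g] for g in genes})
    return pearson([t[g][cell_type] for g in genes], [f[g][cell_type] for g in genes])


# Invented values for three markers across three matched cell types.
teratoma_means = {
    "DCX": {"RG": 1.0, "CP": 0.5, "EN": 4.0},
    "HES5": {"RG": 3.0, "CP": 1.0, "EN": 0.2},
    "SOX2": {"RG": 2.5, "CP": 2.0, "EN": 0.5},
}
print(marker_correlation(teratoma_means, teratoma_means, ["DCX", "HES5", "SOX2"], "RG"))
```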
This analysis was repeated for the teratoma gut subtypes using a published fetal gut dataset as reference (Gao et al., 2018). The teratoma gut cells were most similar to fetal gut at gestational weeks 8–11 (Figure S3C). Comparing marker genes for gut cell types (CDX1, CDX2, HHEX, FOXJ1, PAX9, SOX2) between teratoma and fetal cells revealed a high overall correlation, with R = 0.98 for both foregut and mid/hindgut (Figure S3D). Projecting fetal gut data onto the teratoma SWNE again resulted in relatively similar spatial positioning (Figure S3E). The teratoma produces less foregut and more mid/hindgut than the fetal gut (Figure S3F). Comparing teratoma and fetal gut cells, we again found that the fetal cells express more methylation-related genes, whereas in this case the teratoma cells express more genes related to RNA/DNA metabolism (Figure S3G, S3H).
To further validate these results, we used RNAScope in situ hybridization (ISH) to probe for the radial glia marker HES5 and the early excitatory neuron marker DCX, both of which showed high abundance in regions of neuro-ectoderm in fixed teratoma tissue sections (Figure 3G). POLR2A, PPIB, and UBC served as positive controls and the bacterial gene DapB as a negative control (Figure 3H). Additionally, we probed for FOXJ1 (cilia), CDX2 (intestinal epithelium), TNNT2 (cardiac), and THY1 (mesenchyme/fibroblast) in ciliated airway epithelium, intestinal villi, developing cardiac muscle, and mesenchyme, respectively (Figure S3I). We visualized a high abundance of each RNA transcript and confirmed the identity of the corresponding tissue using H&E staining and histology (Figure S3I). Overall, we showed that the teratoma neuro-ectoderm and gut cell types are transcriptionally similar to their fetal counterparts and identified the developmental stage that they most resemble. Using RNAScope ISH and histology, we validated the presence of six cell types (2 per germ layer) and showed that these cell types exhibit some degree of spatial organization (Figure 3G, Figure S3I, Table 1). The spatial expression of the canonical marker genes DCX, HES5, and CDX2 further supports the neuro-ectoderm and gut annotations, and that of FOXJ1, TNNT2, and THY1 adds evidence for the Ciliated Epithelium, Cardiac Muscle, and MSC/Fibroblast annotations (Table 1, Table S3C).
Engineering Teratomas via Genetic Perturbations
To establish the utility of the teratoma system as a model for human development, we next performed a single-cell genetic knockout screen using CRISPR-Cas9. To identify key developmental genes to include in our screen, we compiled a list of 24 major organ/lineage specification genes whose knockout is embryonic lethal in mice (Table S4A). Studying the effects of these genes in cell lines or organoids would typically require separate experiments and models for each cell lineage, since a single gene can function across multiple cell types and even multiple germ layers. With the teratoma model, we can screen the effects of these genetic perturbations across all major cell lineages and germ layers in a single experiment. Using the CROPseq-Guide-Puro vector backbone, we cloned 48 single guide RNAs (sgRNAs) targeting the developmental genes (2 sgRNAs per gene) (Datlinger et al., 2017) (Figure 4A, Table S4B). We also engineered a PGP1 iPSC line that stably expresses Cas9, to prevent Cas9 silencing (Figure S4A, S4B, Methods). After creating a pooled lentiviral library with our sgRNAs, we transduced the engineered PGP1-Cas9 line at an MOI of 0.1 so that most transduced cells received a single perturbation (Figure 4A). After selection, these cells were injected subcutaneously into 3 Rag2−/−;γc−/− immunodeficient mice for teratoma formation, extraction, and downstream scRNA-seq processing with 10X Genomics (Figure 4A).
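The choice of MOI 0.1 can be checked with the standard Poisson model of lentiviral transduction (a modeling assumption, not a statement from the text): if integrations per cell follow a Poisson distribution with mean equal to the MOI, most cells remain untransduced, and the transduced cells that survive selection overwhelmingly carry a single integration. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """P(exactly k integrations) under a Poisson model with mean = MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def transduction_summary(moi: float) -> dict:
    """Expected integration-count distribution, assuming Poisson transduction."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    transduced = 1.0 - p0
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": transduced - p1,
        # Cells that survive selection are the transduced ones.
        "single_among_transduced": p1 / transduced,
    }


summary = transduction_summary(0.1)
print(summary)
```

At MOI 0.1, this gives about 90.5% untransduced cells, 9.0% with one integration, and under 0.5% with two or more, so roughly 95% of transduced cells carry a single sgRNA.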
Figure 4. Engineering teratomas via genetic perturbations and miRNA based molecular sculpting.
(A) PGP1-Cas9 iPSCs were transduced with a CRISPR library targeting a panel of 16 key developmental genes with 1 gRNA per gene. After generating 3 teratomas with the PGP1 iPSCs, scRNA-seq was used to identify shifts in cell type formation resulting from gene knockouts. We repeated this process with 3 additional teratomas to serve as a replicate screen. (B) Average effect of gene knockout on cell type enrichment/depletion versus the correlation of cell type enrichment between the original and replicate screens. Genes with a reproducibility greater than 0.4 (Methods) were selected for further analysis. (C) Heatmap of the effect size (regression coefficient) of gene knockout enrichment for cell types and germ layers. (D) Scatterplot of individual guide RNA effects on cell type abundance for the selected genes TWIST1, RUNX1, CDX2, KLF6, and ASCL1. (E) Schematic of the miRNA-HSV-tk-GFP construct. 2A encodes a self-cleaving peptide. After transcription, expression is diminished if the corresponding miRNA is endogenously expressed in the cell. (F) Schematic of how a developing teratoma would form in the presence of Ganciclovir (GCV, 80 mg/kg/d, Methods) if cells were transduced with a neural-specific miRNA-HSV-tk construct. (G) Flow cytometry quantification, gating on the presence or absence of GFP, of single cells from 35-day self-patterned whole brain organoids transduced with either the HSV-tk-GFP control or miR-124-HSV-tk-GFP. (H) In vivo studies of miR-124-HSV-tk-GFP teratomas with GCV administration (80 mg/kg/d, Methods), delivered either intratumorally (IT) or both intratumorally and intraperitoneally. Heatmap of the cell type fraction log fold-change for each teratoma replicate relative to a control miR-124-HSV-tk-GFP teratoma without GCV. Z-scores for each cell type fraction change are also plotted, with standard deviations calculated using a pooled variance (Methods).
We validated the editing efficiencies of all our guide RNAs by PCR-amplifying the expected cut site and quantifying mutations and indels with CRISPResso (Table S4C, S4D, Methods). We then selected the top guide targeting each gene with at least 60% overall editing efficiency and 40% indel efficiency, which yielded a total of 16 guides (Table S1C, Table S4C, S4D, Methods). Only these validated guides were used for further computational analysis. To assess the reproducibility of our results, we also reran the CRISPR-KO screen by repooling these validated guides and generated 3 additional teratomas (Figure 4A, Table S1D, Methods). We captured a median of 118 cells per gene/guide in the original screen and 1,280 cells per gene/guide in the replicate screen (Figure S4C). The replicate screen captured more cells per guide because it pooled only the top 16 guides, whereas the original screen contained a total of 48 guides (Methods).
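The guide filter described above can be sketched as follows. This is a minimal illustration rather than our analysis code: the `GuideQC` record, its field names, the tie-breaking rule (rank by overall editing, then by indel rate), and the efficiency values in the usage example are all hypothetical; only the 60% and 40% thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class GuideQC:
    """Hypothetical per-guide editing summary (e.g. parsed from CRISPResso output)."""
    gene: str
    guide: str
    overall_editing: float  # fraction of reads carrying any modification
    indel_rate: float       # fraction of reads carrying an indel


def select_top_guides(qc_records, min_overall=0.60, min_indel=0.40):
    """Return {gene: GuideQC}, keeping the best passing guide for each gene."""
    best = {}
    for rec in qc_records:
        # Drop guides below either editing threshold.
        if rec.overall_editing < min_overall or rec.indel_rate < min_indel:
            continue
        key = (rec.overall_editing, rec.indel_rate)
        current = best.get(rec.gene)
        if current is None or key > (current.overall_editing, current.indel_rate):
            best[rec.gene] = rec
    return best


# Usage with made-up efficiencies:
records = [
    GuideQC("TWIST1", "sgRNA-1", 0.85, 0.70),
    GuideQC("TWIST1", "sgRNA-2", 0.65, 0.45),
    GuideQC("SOX17", "sgRNA-1", 0.55, 0.50),  # fails the 60% overall threshold
    GuideQC("CDX2", "sgRNA-1", 0.90, 0.30),   # fails the 40% indel threshold
    GuideQC("CDX2", "sgRNA-2", 0.70, 0.41),
]
selected = select_top_guides(records)
```

Genes with no passing guide simply drop out, which mirrors how the 48 candidate guides were reduced to 16.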
To ensure consistent cell types across teratomas, we integrated all six teratomas from the original and replicate screens using Seurat v3 (Stuart et al., 2019). We then called cell types in the PGP1 teratoma cells using Seurat label transfer with the 7 H1 teratomas as reference, and collapsed developmentally similar cell types (Figure S4D, Methods). To quantify the total effect of each knockout, we used Earth Mover’s Distance (EMD) (Chen et al., 2020) to measure the difference in cell type composition between cells carrying each gene knockout and all cells carrying non-targeting controls (NTC), separately for each screen (Figure 4A, Methods). For both the original and replicate screens, we fit a ridge regression model to assess the effect of each gene knockout on cell type enrichment/depletion (Dixit, Parnas, Li, Weissman, et al., 2016) (Figure 4A, Methods). For each gene, we plotted its EMD against the Pearson correlation of its regression coefficients between the original and replicate screens, capturing both the effect size and the reproducibility of each knockout (Figure 4B, Methods). Gene knockouts with strong effect sizes also tended to be more reproducible (R = 0.59) (Figure 4B, Methods). We highlighted genes with a Pearson correlation greater than 0.4 between the original and replicate screens for further analysis (Figure 4B).
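A compact numpy sketch of the three quantities in this paragraph follows. It rests on assumptions the text does not state: cell types are treated as unordered categories with a ground distance of 1 between any two distinct types (under which EMD reduces to the total variation distance), the ridge model has an unpenalized intercept and a fixed penalty `lam`, and all function names and the toy data are hypothetical. The settings we actually used are in the Methods and the linked repository.

```python
import numpy as np


def composition(labels, n_types):
    """Fraction of cells assigned to each cell type (integer codes 0..n_types-1)."""
    counts = np.bincount(labels, minlength=n_types)
    return counts / counts.sum()


def emd_uniform(p, q):
    """EMD with distance 1 between any two distinct cell types (= total variation)."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


def ridge_effects(ko, cell_types, n_genes, n_types, lam=1.0):
    """Ridge regression of one-hot cell type on knockout indicators.

    ko holds a gene index per cell, or -1 for non-targeting control (NTC) cells.
    Returns an (n_genes, n_types) coefficient matrix; the intercept (the NTC
    baseline composition) is not penalized.
    """
    X = np.zeros((len(ko), n_genes + 1))
    X[:, 0] = 1.0  # intercept column
    rows = np.flatnonzero(ko >= 0)
    X[rows, ko[rows] + 1] = 1.0  # knockout indicator columns
    Y = np.eye(n_types)[cell_types]  # one-hot cell type per cell
    penalty = lam * np.eye(n_genes + 1)
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ Y)
    return beta[1:]


def reproducibility(beta_a, beta_b):
    """Per-gene Pearson correlation of coefficients between two screens."""
    return np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(beta_a, beta_b)])


# Toy data: 100 NTC cells (50/50), gene 0 knockouts all type 0, gene 1 like NTC.
ko = np.array([-1] * 100 + [0] * 100 + [1] * 100)
cell_types = np.array([0] * 50 + [1] * 50 + [0] * 100 + [0] * 50 + [1] * 50)
emd_gene0 = emd_uniform(composition(cell_types[ko == 0], 2),
                        composition(cell_types[ko == -1], 2))
beta = ridge_effects(ko, cell_types, n_genes=2, n_types=2)
```

In practice `reproducibility` would be called with the coefficient matrices from the original and replicate screens; the ridge penalty shrinks coefficients of sparsely sampled knockouts towards zero, which keeps low-cell-count guides from dominating the ranking.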
For the highlighted genes, TWIST1, RUNX1, CDX2, KLF6, and ASCL1, we next identified the gene knockout effects on cell types that were statistically significant. To do so, we merged the cells from both screens and ran a combined ridge regression analysis, computing P-values with a permutation test and False Discovery Rates with the Benjamini-Hochberg correction (Methods). We then visualized all gene knockout effects with an FDR < 0.1 (Figure 4C, Methods).
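The significance step can be sketched as below. The Benjamini-Hochberg adjustment is the standard procedure; the permutation scheme (shuffling knockout labels across cells, comparing absolute coefficients, with add-one smoothing), the number of permutations, and the simple mean-difference model used here in place of the combined ridge fit are assumptions for illustration, and `mean_shift` is a hypothetical helper.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def mean_shift(ko, cell_types, n_genes=2, n_types=2):
    """Stand-in effect model: each knockout's composition minus the NTC composition."""
    Y = np.eye(n_types)[cell_types]
    base = Y[ko == -1].mean(axis=0)
    return np.stack([Y[ko == g].mean(axis=0) - base for g in range(n_genes)])


def permutation_pvalues(fit, ko, cell_types, n_perm=200, seed=0):
    """Two-sided permutation p-values: shuffle knockout labels across cells."""
    rng = np.random.default_rng(seed)
    observed = np.abs(fit(ko, cell_types))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        exceed += np.abs(fit(rng.permutation(ko), cell_types)) >= observed
    return (exceed + 1) / (n_perm + 1)


# Toy data: gene 0 shifts all cells to type 0, gene 1 has no effect.
ko = np.array([-1] * 100 + [0] * 100 + [1] * 100)
cell_types = np.array([0] * 50 + [1] * 50 + [0] * 100 + [0] * 50 + [1] * 50)
pvals = permutation_pvalues(mean_shift, ko, cell_types)
fdr = benjamini_hochberg(pvals.ravel()).reshape(pvals.shape)
significant = fdr < 0.1
```

Shuffling labels preserves the number of cells per knockout, so the null distribution reflects the same sampling depth as the observed data.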
CDX2 is known to be important for the development of the midgut and hindgut (Silberg et al., 2002; Gao, White and Kaestner, 2009). In our data, cells with a CDX2 knockout are enriched in the Foregut and depleted in the Mid/Hindgut, in line with previous reports that CDX2 knockout shifts gut differentiation away from intestine and towards activation of a gastric program (Simmini et al., 2014; Kim and Shivdasani, 2016) (Figure 4C, 4D). TWIST1 showed the largest effect size (Figure 4B) and is a known transcription factor driving the epithelial-to-mesenchymal transition (EMT), which is important in development as well as in metastatic cancers (Yang et al., 2004; Kalluri and Weinberg, 2009). Our screen found that cells with a TWIST1 knockout are depleted in mesodermal cell types (muscle, smooth muscle, pericytes, and mesenchymal stem cells/fibroblasts) and enriched in neuro-epithelium (retinal epithelium, neurons), consistent with prior studies identifying TWIST1 as key to mesodermal specification (Qin et al., 2011) (Figure 4C, 4D). RUNX1 knockout results in a depletion of neurons and muscle cell types and an enrichment in mid/hindgut, consistent with previous mouse and stem cell studies showing RUNX1 to be critical for neural crest formation, signaling in gut epithelial stem cells, and myoblast proliferation (Marmigère et al., 2006; Fijneman et al., 2012; Scheitz and Tumbar, 2013; Umansky et al., 2015; Sarper et al., 2018) (Figure 4C, 4D). KLF6 knockout resulted in a depletion of pericytes, consistent with its role in promoting endothelial activation during vascular repair (Garrido-Martín et al., 2012) (Figure 4C, 4D). Interestingly, ASCL1 knockout increased the proportion of retinal epithelium and neural progenitors (Figure 4C, 4D). Since ASCL1 is key to cell cycle exit and neuronal differentiation, knocking it out may slow neurogenesis and result in a buildup of neural progenitors (Castro et al., 2011).
With this CRISPR knockout screen of key developmental regulators, we were able to assay the multi-lineage functions of these genes in a human-specific model, something that, to our knowledge, no other human developmental model can currently accomplish.
Modeling Neural Disorders using Teratomas
Having demonstrated the teratoma’s unique ability to assess the multi-lineage function of embryonic lethal genes, we next asked whether the teratoma could model human neural disorders. Specifically, we examined Pitt-Hopkins (Dean, 2012), Rett (Ehinger et al., 2018), and L1 (Stumpel and Vos, 1993) syndromes. Pitt-Hopkins syndrome is a rare neurodevelopmental disorder most often caused by a de novo loss of function of one allele of the transcription factor 4 (TCF4) gene (Forrest et al., 2014). Rett syndrome is a severe X-linked neurological disorder caused by de novo mutations in the methyl-CpG-binding protein 2 (MECP2) gene. Finally, L1 syndrome is an X-linked syndrome caused by mutations in the L1 cell adhesion molecule (L1CAM) gene, which is important for neuronal migration, adhesion, and differentiation (Samatov, Wicklein and Tonevitsky, 2016). To assess the downstream effects of perturbing these genes, we generated a CRISPR-KO library targeting TCF4, MECP2, and L1CAM, with 3 guides for each gene (Table S5A). We transduced PGP1-Cas9 cells with this neural disorder library, generated 2 teratomas, and sequenced 2 scRNA-seq libraries for each teratoma using the 10X Genomics platform (Table S1E) (Zheng et al., 2017).
We integrated and clustered the teratomas using Seurat data integration and called cell types with Seurat’s label transfer method, using the H1 teratomas as the reference. We then looked for shifts in both cell type proportion and cell type specific gene expression resulting from the gene knockouts (Figure S4E). As expected, the shift in cell type proportion (normalized EMD) was much smaller than for the embryonic lethal knockouts (Figure S4F), so we focused instead on cell type specific shifts in gene expression from the neurological disorder knockouts. We merged our cell types into 7 broad cell types (Neurons, Neural Progenitors, Gut, Retinal Epithelium, Muscle, Immune, MSC/Fibroblast) and computed differential expression between each gene knockout and the NTCs (Methods). The AAVS-targeting control showed no significant gene expression shift, indicating that the presence of a double-stranded break alone did not alter expression (Table S5B).
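As a minimal sketch of the effect-size half of this comparison, the function below computes a per-gene log2 fold-change of mean normalized expression between knockout and NTC cells within one broad cell type. The pseudocount, the use of non-log normalized values, the function name, and the toy matrix are assumptions; the statistical test and FDR control we actually used are described in the Methods and are not reproduced here.

```python
import numpy as np


def cell_type_log2fc(expr, ko_mask, ntc_mask, type_mask, pseudocount=1e-3):
    """Per-gene log2 fold-change, knockout vs NTC, restricted to one cell type.

    expr: cells x genes array of normalized (not log-transformed) expression.
    The three masks are boolean arrays over cells.
    """
    ko_mean = expr[ko_mask & type_mask].mean(axis=0)
    ntc_mean = expr[ntc_mask & type_mask].mean(axis=0)
    return np.log2((ko_mean + pseudocount) / (ntc_mean + pseudocount))


# Toy data: 6 cells x 2 genes; the last two cells belong to another cell type.
expr = np.array([[4.0, 1.0], [4.0, 1.0],   # knockout, neural progenitor
                 [1.0, 1.0], [1.0, 1.0],   # NTC, neural progenitor
                 [9.0, 9.0], [0.0, 0.0]])  # other cell type (excluded by the mask)
ko_mask = np.array([True, True, False, False, True, False])
ntc_mask = np.array([False, False, True, True, False, True])
is_npc = np.array([True, True, True, True, False, False])
lfc = cell_type_log2fc(expr, ko_mask, ntc_mask, is_npc)
```

Restricting both groups to the same cell type keeps composition shifts (such as those measured by EMD) from masquerading as expression changes.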
We then analyzed the effect of L1CAM knockout in Neurons and of TCF4 and MECP2 knockout in Neural Progenitors, plotting the cell type specific log fold-changes for all DEGs with an FDR below 0.1 across both teratomas; these hits were fairly reproducible between teratomas (Figures S4G–I). Knocking out L1CAM in Neurons decreased the expression of clusterin (CLU), an effect previously shown in colorectal cancer cells (Shapiro et al., 2015), while increasing the expression of MAPT, which encodes the tau protein. Tau efflux via L1CAM exosomes has been observed in certain neurological diseases (Shi et al., 2016) (Figure S4G, Table S5C). Knocking out MECP2 in neural progenitors decreased the expression of transient receptor potential cation channel subfamily M member 3 (TRPM3); previous work has shown a similar decrease in the expression and function of TRP channels in the hippocampus and several other brain regions of MECP2 mutant mice, contributing to Rett syndrome etiology (Chapleau et al., 2013; Li and Pozzo-Miller, 2014; Suzuki et al., 2016) (Figure S4H, Table S5D). Finally, knocking out TCF4 in neural progenitors decreased the expression of FOXO3, consistent with TCF4 knockdown studies in the human neuroblastoma line SH-SY5Y showing a decrease in FOXO3, which has been suggested to contribute to the molecular pathology of Pitt-Hopkins syndrome and other autism spectrum disorders (Forrest et al., 2013) (Figure S4I, Table S5E). Overall, we reproducibly identified cell type specific gene expression shifts caused by knocking out the genes underlying Rett, Pitt-Hopkins, and L1 syndromes, potentially providing a resource for future in-depth study.
Engineering Teratomas via miRNA based Molecular Sculpting
Since the teratoma is vascularized and has the potential to yield mature tissue, we sought to sculpt it towards specific lineages, which could allow for focused developmental modeling and tissue engineering. We used endogenously expressed microRNAs (miRNAs) (Ambros, 2004; Bartel, Lee and Feinbaum, 2004; Bartel, 2018), which are often unique to specific cell types, lineages, or disease states (Lu et al., 2005; Shivdasani, 2006). Specifically, we appended tissue specific miRNA target sequences to the 5’ and 3’ UTRs of a GFP-tagged suicide gene (HSV-tk-GFP), so that its expression is suppressed in the lineage of interest that expresses the miRNA (Figure 4E, Table S5G) (Miki et al., 2015; Nissim et al., 2017; Hirosawa et al., 2017). With this design, cell types that do not express the miRNA are killed by the suicide gene in the presence of ganciclovir (GCV), thus selecting for the desired lineage (Figure 4F).
We first tested the functionality of our miRNA-HSV-tk-GFP constructs in H1 ESCs, showing that cells transduced with a miRNA-HSV-tk-GFP construct die in the presence of 10μM GCV after 5 days of culture, while cells transduced with a GFP control continue growing (Figure S5A). We then assessed the cell type specificity of the miRNA construct using HeLa cells, which express miR-21 (Lu et al., 2008; Medina and Slack, 2008; Yao et al., 2009; Bartel, 2018). HEK293T cells show little to no expression of miR-21 and served as a control (Zhu et al., 2008; Li et al., 2009; Chak et al., 2016). After transducing both cell lines with our miR-21-HSV-tk-GFP construct, we cultured the cells for 5 days and performed flow cytometry, observing a decrease in GFP expression in the HeLa cells but not in the HEK293T cells (Figure S5B). This indicates that GFP expression was silenced by the miR-21 expressed in HeLa cells. We used an HSV-tk-GFP construct without any miRNA binding sites as a control (Figure S5B). We repeated this experiment with a miR-126-HSV-tk-GFP construct (endothelial cell-specific) (Wang et al., 2008) and observed a decrease in GFP signal in HUVECs compared with the HEK293T control (Figure S5C). Together, these experiments validated both HSV-tk-mediated killing with GCV and the ability of our miRNA constructs to specifically repress GFP in target cell lines.
We further validated our construct in whole brain organoids. Following a standard self-patterned whole brain organoid protocol (Figure S5D, Methods) (Quadrato et al., 2017), we created organoids from H1 ESCs transduced with either the miR-124-HSV-tk-GFP construct or the HSV-tk-GFP construct (lacking any miRNA binding sites). We chose miR-124 because it is a pan-neural miRNA (Lagos-Quintana et al., 2002; Seiler et al., 2005; Sun et al., 2015). Day 35 organoids from both groups (miR-124-HSV-tk-GFP and HSV-tk-GFP) were dissociated into single cells and analyzed by flow cytometry for GFP fluorescence (Methods). As expected, cells from HSV-tk-GFP organoids maintained their GFP fluorescence, while cells from miR-124-HSV-tk-GFP organoids showed GFP repression (Figure 4G).
We then tested the miRNA-HSV-tk-GFP constructs in vivo, using the miR-124-HSV-tk-GFP construct to generate teratomas enriched for the neural lineage. After transducing the H1 ESC line with the miR-124-HSV-tk-GFP construct, we formed teratomas as described in our previous studies (Methods). Once teratomas reached at least 1 cm in diameter, we began either intratumoral (IT) GCV injections (80 mg/kg/d, Methods) or two-site intraperitoneal and intratumoral (IPIT) injections (50 mg/kg/d for each site, Methods), with a miR-124-HSV-tk-GFP teratoma receiving no GCV as the control (Methods). There were 2 teratomas for each injection condition, for a total of 4 teratomas plus 1 control teratoma, and all teratomas were grown for up to 70 days. After extraction, teratomas were inspected for external heterogeneity. The teratomas that received GCV injections were smaller (approx. 2 cm compared to 4 cm) and lighter (approx. 1–2 g compared to 5+ g) than the control teratoma without GCV injections (Figure S5E).
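The per-animal GCV amounts implied by these dose rates follow from dose (mg) = dose rate (mg/kg/d) × body weight (kg). The sketch below assumes a 25 g mouse and a 10 mg/mL GCV stock purely for illustration; neither value comes from this study, the helper names are hypothetical, and the actual formulation is given in the Methods.

```python
def daily_dose_mg(body_weight_g, dose_mg_per_kg):
    """Daily drug mass for one animal: mg/kg/d times body weight in kg."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def injection_volume_ul(dose_mg, stock_mg_per_ml):
    """Volume of stock solution that delivers dose_mg, in microliters."""
    return dose_mg / stock_mg_per_ml * 1000.0


MOUSE_G = 25.0  # hypothetical body weight (g)
STOCK = 10.0    # hypothetical GCV stock concentration (mg/mL)

it_mg = daily_dose_mg(MOUSE_G, 80.0)    # IT arm: one intratumoral site
ipit_mg = daily_dose_mg(MOUSE_G, 50.0)  # IPIT arm: amount at each of the two sites
ipit_ul = injection_volume_ul(ipit_mg, STOCK)
```

For this hypothetical animal, the IT arm receives 2.0 mg per day and each IPIT site receives 1.25 mg (125 μL of the assumed stock).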
We ran the 10X scRNA-seq protocol on each teratoma and classified cells using Seurat label transfer (Table S1F) (Stuart et al., 2019). Comparing the cell type composition of the GCV+ teratomas with that of the GCV− teratoma revealed enrichment of Early Neurons, Neuronal Progenitors, and Schwann cells (Figure 4H), along with depletion of muscle, retinal pigmented epithelium (which lacks miR-124 expression), and other cell types (Figure 4H). The teratomas with the IPIT injection strategy showed a stronger enrichment for the neuro-ectoderm cell types, suggesting that adding an intraperitoneal injection site aids GCV selection (Figure 4H). We also visualized the neuro-ectoderm enrichment in GCV+ teratomas with H&E staining of a GCV+ and a GCV− teratoma (Figure S5F). The IPIT teratomas had a stronger enrichment for Early Neurons (Z-score > 3) than for Neuronal Progenitors or Schwann cells, possibly because miR-124 expression increases as the neuro-ectoderm cell types mature (Figure 4H, Figure S5F).
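The cell type fraction log fold-changes in Figure 4H compare each GCV+ teratoma with the GCV− control. A minimal sketch of that quantity is below; the pseudocount added to raw counts is our assumption, the counts are hypothetical, and the pooled-variance Z-scores are described in the Methods and not reproduced here.

```python
import numpy as np


def fraction_log2fc(treated_counts, control_counts, pseudocount=0.5):
    """log2 fold-change of each cell type's fraction, treated vs control."""
    t = np.asarray(treated_counts, dtype=float) + pseudocount
    c = np.asarray(control_counts, dtype=float) + pseudocount
    return np.log2((t / t.sum()) / (c / c.sum()))


cell_types = ["Early Neurons", "Muscle"]
gcv_pos = [300, 100]  # hypothetical cell counts in a GCV+ teratoma
gcv_neg = [100, 300]  # hypothetical cell counts in the GCV- control
lfc = fraction_log2fc(gcv_pos, gcv_neg, pseudocount=0.0)
by_type = dict(zip(cell_types, lfc))
```

Working with fractions rather than raw counts makes teratomas with different numbers of sequenced cells comparable; the pseudocount keeps cell types absent from one sample finite.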
We further validated the enrichment of neuro-ectoderm in IPIT teratomas by immunostaining for PAX6, a key marker of neuronal fate determination (Figure S5G). The three GCV+ teratoma sections with IPIT injections showed higher levels of PAX6 protein expression than the three GCV− teratoma sections, supporting that our miR-124 circuit enriches for neuro-ectoderm (Figure S5G). Staining with the secondary antibody (DyLight 550) alone confirmed that there was no non-specific secondary antibody binding (Figure S5H). Additionally, RNA FISH showed that the GCV+ teratoma has higher expression of HES5, a key radial glia marker (Figure S5I).
In summary, we developed a miRNA circuit that enables us to engineer the teratoma towards a desired lineage. We demonstrated this circuit in vitro using miR-126 (endothelial lineage) and miR-21 (cancer), and in vivo using miR-124 (neuro-ectoderm lineage). Our in vivo results showed that administering GCV through multiple sites resulted in improved neuro-ectoderm enrichment. Our miRNA circuit can be extended to any cell-type specific miRNA, and could have applications in studying developmental biology and human disease, as well as in tissue engineering.
DISCUSSION
The teratoma has the potential to be a fully vascularized, multi-lineage model for human development. Its major advantages are that it can grow to a large size due to its vascularization, and it can produce a wide array of relatively mature cell types from all major developmental lineages. Additionally, as we demonstrated with our CRISPR-Cas9 knockout screens, the teratoma’s ability to generate cells from all lineages enables a comprehensive assessment of the effect of genetic perturbations on human development within a single integrated experiment. Furthermore, we show the teratoma can be engineered using miRNA circuits to grow/enrich specific tissues of interest in vivo.
Future studies with this model could explore increasing tissue maturity through extended growth or larger animal hosts. Benchmarking against human patient-derived teratomas would also be valuable, especially as these can often become quite mature. Another critical future study is assessing the impact of different dissociation methods on teratoma cell type proportions. The greater cell numbers achievable with the most current single cell RNA sequencing protocols, such as SPLiT-seq (Rosenberg et al., 2018) and sci-RNA-seq (Cao et al., 2017), will be vital for identifying additional cell types. A time series analysis of teratomas at multiple stages of maturity could help uncover the developmental pathways that the cell types follow. Additionally, pooling different cell types together with PSCs prior to injection may aid cellular enrichment/maturity in the teratoma (e.g., HUVECs to enrich for HSC populations) (Philipp et al., 2018), as may choosing the injection site to enrich for desired cell types (Chan et al., 2018). Growing patient-specific teratomas, together with isogenic iPSC lines, could benefit disease research by helping to characterize the disease state in tissues that may otherwise be inaccessible with current technologies. Finally, the miRNA molecular sculpting technology requires further optimization, specifically generating cell lines that stably carry the miRNA constructs by insertion at loci such as AAVS1, and optimizing the timing, dosing, and route of GCV administration. Taken together, we believe the teratoma is a promising platform for modeling multi-lineage human development, pan-tissue functional genetic screening, and cellular engineering.
LIMITATIONS
Every model system has its intrinsic strengths and weaknesses, and below we discuss some of the limitations of the teratoma system and also considerations towards improving it for enabling basic science and engineering studies. One issue with the teratoma system (and organoids) is the intrinsic degree of heterogeneity (de Souza, 2017; Quadrato et al., 2017; Capowski et al., 2019; Phipson et al., 2019). In this regard, we found the use of internal controls when conducting perturbation experiments was important. For example, in our CRISPR-Cas9 screen, each teratoma contained both gene targeting guides and non-targeting controls, enabling us to compare cell type proportion shifts within each teratoma without having to worry about heterogeneity between teratomas.
While the teratoma has regions of organization and maturity, these may develop in an asynchronous manner. This lack of synchronization may prove to be a barrier in accessing certain mature cell types that need a highly ordered cellular context to develop.
Also, since the teratoma contains cell types from all lineages, finding a single dissociation protocol that captures as many cell types as possible is a challenge. The choice of dissociation method can drastically change the cell types profiled in single cell RNA-seq, and it is likely that the set of cell types we see in our data is biased by our dissociation protocol (Denisenko et al., 2019). It may be the case that no single dissociation method can capture all cell types, and it will be necessary to design specific dissociation protocols to capture specific tissues.
Additionally, our cell type annotations are still preliminary. While we validated key cell types by comparison to fetal human/mouse reference datasets and RNA FISH, we were not able to validate all cell types due to limited developmental human reference scRNA-seq datasets, as well as cost constraints. Thus, some cell types, such as the neuro-ectoderm cell types, have more validation than others, giving us greater confidence in their identity (Table 1). We may also still be underpowered to detect less abundant cell types, and additional single cell RNA-seq could help resolve missing cell types, since undersampling can cause rare cell types to be collapsed into a larger cell type during analysis.
In regard to lineage engineering, we anticipate a considerable degree of silencing of the miRNA-suicide gene constructs due to the use of lentiviral vectors. Future studies could explore integrating these constructs into genomic regions such as the AAVS1 locus, which would enable constitutive expression across all cell types. Safety switches based on suicide genes will also be critical for eliminating potential residual undifferentiated cells and mouse cells within the teratoma, improving safety and utility in tissue engineering applications.
STAR METHODS
RESOURCE AVAILABILITY
Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Prashant Mali (pmali@ucsd.edu).
Materials Availability
Until the Addgene submission process completes, all unique/stable reagents generated in this study are available from the Lead Contact with a completed Materials Transfer Agreement.
Data and Code Availability
The raw and processed data generated in this study are available at Gene Expression Omnibus with accession code GSE156170.
All code used for analysis is available in the GitHub repository yanwu2014/teratoma-analysis-code, along with step-by-step instructions for reproducing our analysis.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cell Culture
The H1 (P30), H9 (P36), and HUES62 (P20) hESC lines and the PGP1 (P39) hiPSC line were maintained under feeder-free conditions in mTeSR medium (Stem Cell Technologies). Prior to passaging, tissue-culture plates were coated with growth factor-reduced Matrigel (Corning) diluted in DMEM/F-12/Glutamax medium (Thermo Fisher Scientific) and incubated for 30 minutes at 37°C, 5% CO2. Cells were dissociated and passaged using the dissociation reagent Versene (Thermo Fisher Scientific). Cells were passaged a maximum of 4 times for expansion prior to injection. HEK 293T and HeLa cells were maintained in high glucose DMEM supplemented with 10% fetal bovine serum (FBS) and passaged every few days upon reaching confluency with 0.05% Trypsin-EDTA (Gibco). HUVECs were maintained in EGM-2 (Lonza).
Organoid Generation and Dissociation
Self-patterned whole brain organoids were generated following the Quadrato et al. 2017 protocol (Quadrato et al., 2017). Briefly, H1 ESCs transduced with either miR-124-HSV-tk-GFP or HSV-tk-GFP were cultured as embryoid bodies for 5 days, transferred into Neural Induction (NI) media for 5 days, and finally embedded in Matrigel and cultured in Cortical Differentiation (CD) media for 25 days. Day 35 organoids were dissociated into single cells using a modified GentleMACS Human Tumor Dissociation Kit protocol: instead of using the GentleMACS dissociator, cells were triturated with a 1000 μL Pipetman after a 1-hr incubation at 37°C and then passed through a 70 μm filter. The resulting single cell suspension was analyzed for GFP fluorescence by flow cytometry. Cells, embryoid bodies, and organoids were maintained under puromycin selection (0.75 μg/mL) for the entirety of the experiment.
Animals
Animals used in this study were male NOD-scid IL2Rgammanull mice 8–10 weeks of age. Housing, husbandry and all procedures involving animals used in this study were performed in compliance with protocols (#S16003) approved by the University of California San Diego Institutional Animal Care and Use Committee (UCSD IACUC). Mice were group housed (up to 4 animals per cage) on a 12:12 hr light-dark cycle, with free access to food and water in individually ventilated specific pathogen free (SPF) autoclaved cages. All mice used were healthy and were not involved in any previous procedures or drug treatment unless indicated otherwise.
METHOD DETAILS
PGP1-Cas9 Clone Generation
The PGP1 human induced pluripotent stem cell line was a kind gift of Dr. George Church at Harvard Medical School. The sgRNA targeting the AAVS1 locus of the human genome (spacer sequence GGGCCACTAGGGACAGGAT) was cloned into the Lenti-guide-puro plasmid (Addgene #52963). To generate the knock-in donor plasmid, we cloned the CAG promoter followed by a cassette co-expressing spCas9 and EGFP, separated by a P2A sequence, into the pCR4-Blunt-TOPO vector (Thermo Fisher Scientific). Two homology arms were amplified from upstream (804 bp) and downstream (837 bp) of the sgRNA targeting site in the AAVS1 genomic locus and cloned into the donor plasmid flanking the CAG-spCas9-P2A-EGFP cassette. Between the upstream homology arm and the CAG promoter, we inserted a splice acceptor sequence followed by a T2A-linked blasticidin resistance gene.
Human iPSC PGP1 cells were electroporated using the 4D-Nucleofector system and P3 Primary Cell X kit (Lonza) according to the manufacturer’s instructions. Briefly, the PGP1 cells were dissociated into single cells. 1×10⁶ cells were mixed with 100 μl nucleofection reagent and 10 μg DNA (5 μg Cas9 donor + 5 μg sgRNA) and electroporated. The cells were recovered with pre-warmed medium and then cultured on inactivated MEF feeders in 10 cm dishes with mTeSR medium supplemented with 0.5 μM ROCK inhibitor. Afterward, the mTeSR medium without ROCK inhibitor was refreshed daily. Blasticidin (2 μg/ml) was added to the culture medium 7 days after electroporation. The cells were cultured without passaging until clones emerged on the plate. The clones were checked under the microscope, and those with EGFP expression were picked and expanded individually.
To detect genomic integration, the genomic DNA from cultured cells was extracted using DNeasy Blood & Tissue Kits (Qiagen). Approximately 500 ng of genomic DNA was used for each PCR reaction using KAPA HiFi HotStart Ready Mix (Kapa Biosystems). The PCR amplification of the left and right arm utilized primers that amplified regions spanning both the PGP1 AAVS1 endogenous locus and the engineered cassette (Figure S4B).
The primer sequences are listed below.
| Primer | Sequence |
|---|---|
| Left_arm_forward | ACTTCCCCTCTTCCGATGTTG |
| Left_arm_reverse | ATTGTAGCCGTTGCTCTTTCA |
| Right_arm_forward | GAGCAAAGACCCCAACGAGAAGC |
| Right_arm_reverse | CTGCCTGGAGAAGGATGCAGGA |
This was further validated by direct Sanger sequencing of the arms (Figure S4A). The activity of Cas9 in the PGP1-Cas9 cells was validated by the generation of indels at the expected position when guide RNAs were introduced.
sgRNA Design
The CRISPR-KO sgRNA sequences targeting transcription factor genes were obtained from the GPP sgRNA Designer web tool (https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design, accessed February 2018) as follows. The 24 gene symbols in the table below were converted to Entrez gene IDs using Bioconductor package org.Hs.eg.db_3.5.0, and the resulting IDs were submitted together with the following parameters: enzyme Sp, taxon human, quota 50, include unpicked. From the resulting output, the two guide sequences with the highest “pick order” were selected for each target gene. To check the validity of each guide sequence, the corresponding context sequence was compared to the human reference genome at the predicted cut location using Bioconductor package BSgenome.Hsapiens.UCSC.hg38_1.4.1, and the cut location was confirmed to be fully within the target gene coding sequence determined using Bioconductor package TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0.
| Gene symbol | Entrez ID | sgRNA-1 | sgRNA-2 |
|---|---|---|---|
| SOX17 | 64321 | GGCAACGGGTAGCCGTCGAG | AGGGCGAGTCCCGTATCCGG |
| CDX2 | 1045 | CCGCAGTACCCGGACTACGG | CAAATATCGAGTGGTGTACA |
| HNF4A | 3172 | GGGACCGGATCAGCACTCGA | GCAATGACTACATTGTCCCT |
| GATA4 | 2626 | TGTGGGCACGTAGACTGGCG | CCGGCTTACATGGCCGACGT |
| GATA6 | 2627 | CGGGACGCCTCAGCTCGACA | GCCGACAGCGAGCTGTACTG |
| RUNX1 | 861 | CTGATCGTAGGACCACGGTG | TGCTCCCCACAATAGGACAT |
| FOXA2 | 3170 | ATGAACATGTCGTCGTACGT | TCCGTGAGCAACATGAACGC |
| PDX1 | 3651 | GGAGAACAAGCGGACGCGCA | TATTCAACAAGTACATCTCA |
| NKX2–1 | 7080 | GCGAGCGGCATGAACATGAG | GGTTGGCGCCGTACCATCCG |
| NKX2–5 | 1482 | GTAGGCACGTGGATAGAAGG | GAAGACAGAGGCGGACAACG |
| SOX9 | 6662 | ACGTCGCGGAAGTCGATAGG | TTCACCGACTTCCTCCGCCG |
| PROX1 | 5629 | AGTGTCCACAACTTGCGACA | CGGGTTGAGAATATAATTCG |
| SNAI1 | 6615 | GGGACTCTCCTGGAGCCGAA | TGTAGTTAGGCTTCCGATTG |
| TWIST1 | 7291 | CGGGAGTCCGCAGTCTTACG | AGCGGGTCATGGCCAACGTG |
| ASCL1 | 429 | CCAGGTTGACCAACTTGACG | AAACGCCGGCTCAACTTCAG |
| NEUROG1 | 4762 | CCGCATGCACAACTTGAACG | TTGGTGTCGTCGGGGAACGA |
| KLF6 | 1316 | TCTGAGGCTGAAACATAGCA | GCTGACCAAAACTTCGCCAA |
| KLF2 | 10365 | GGTTCGGGGTAATAGAACGC | CTTCGGTCTCTTCGACGACG |
| HES1 | 3280 | GTGCGAGGGCGTTAATACCG | AGCCAGTGTCAACACGACAC |
| FOXG1 | 2290 | AGCGCGTTGTAGCTGAACGG | CCGCGCCACTACGACGACCC |
| TULP3 | 7289 | GGAGTATGACAGTTCACCAA | TGAAAGTGTGAACTTCGATG |
| MYOG | 4656 | TTACACACCTTACACGCCCA | TCGAACCACCAGGCTACGAG |
| GATA3 | 2625 | TCCAAGACGTCCATCCACCA | CAGGGAGTGTGTGAACTGTG |
| FGFR2 | 2263 | CTTAGTCCAACTGATCACGG | TGACCAAACGTATCCCCCTG |

The neural disorder library guides (3 per gene; Table S5A) are listed below.

| Gene symbol | Entrez ID | sgRNA-1 | sgRNA-2 | sgRNA-3 |
|---|---|---|---|---|
| TCF4 | 6925 | GTGGACATCGGAGGAAGAC | TGTCCACTTTCCATCGTAG | CAAACGTTCATGTGGATGC |
| MECP2 | 4204 | GCTCCATCATCCGTGACCG | AAAGCCTTTCGCTCTAAAG | TTGCGTACTTCGAAAAGGT |
| L1CAM | 3897 | GCGTCCGGTGTCATTGGCC | GCGTACTATGTCACCGTGG | GCCAGTACCGAACTGGATG |
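The cut-location check described above was performed in R with the Bioconductor packages named there. The Python sketch below illustrates the same logic under stated assumptions: SpCas9 cleaves 3 bp upstream of an NGG PAM, coordinates are 0-based with half-open CDS intervals, and the reference sequences and interval in the usage example are synthetic rather than drawn from hg38. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cas9_cut_sites(protospacer, ref):
    """0-based SpCas9 breakpoints (index of the first + strand base right of
    the cut) for every perfect match of protospacer + NGG on either strand."""
    n = len(protospacer)
    sites = []
    # + strand: protospacer is followed by NGG; cut 3 bp upstream of the PAM.
    start = ref.find(protospacer)
    while start != -1:
        pam = ref[start + n:start + n + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(start + n - 3)
        start = ref.find(protospacer, start + 1)
    # - strand: the + strand reads CCN followed by the reverse complement.
    rc = revcomp(protospacer)
    start = ref.find(rc)
    while start != -1:
        if start >= 3 and ref[start - 3:start - 1] == "CC":
            sites.append(start + 3)
        start = ref.find(rc, start + 1)
    return sorted(sites)


def cut_in_cds(cut, cds_intervals):
    """True if the bases on both sides of the cut lie in one half-open CDS interval."""
    return any(begin < cut < end for begin, end in cds_intervals)


guide = "GGGACCGGATCAGCACTCGA"  # HNF4A sgRNA-1 from the table above
plus_ref = "TTTTT" + guide + "AGG" + "TTTTT"            # synthetic, + strand target
minus_ref = "TTTTT" + "CCT" + revcomp(guide) + "TTTTT"  # synthetic, - strand target
plus_cuts = cas9_cut_sites(guide, plus_ref)
minus_cuts = cas9_cut_sites(guide, minus_ref)
```

Computing the cut from the PAM rather than the guide start keeps the same code valid for the 19-nt guides in the neural disorder library.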
Library Preparation
The lentiviral backbone plasmid for the barcode vector was constructed containing the EF1α promoter, an mCherry transgene flanked by BamHI restriction sites, and, immediately downstream, a P2A peptide and hygromycin resistance gene (ECIH). The backbone was digested with HpaI, and a pool of 20 bp barcodes with flanking sequences compatible with the HpaI site was inserted immediately downstream of the hygromycin resistance gene by Gibson assembly. The vector was constructed such that the barcodes were located only 200 bp upstream of the 3’-LTR region. This design enabled the barcodes to be transcribed near the poly-adenylation tail of the transcripts, so that a high fraction of barcodes could be captured during sample processing for scRNA-seq.
The lentiviral backbone plasmid for the sgRNAs was the CROPseq-Guide-Puro vector (Addgene #86708). To create the sgRNA library, individual sgRNAs were PCR amplified using overlapping forward and reverse primers custom designed with flanking sequences compatible with the BsmBI restriction sites (Table S4B). The lentiviral backbone was digested with BsmBI (New England Biolabs) at 55°C for 3 hours in a reaction consisting of: CROPseq-Guide-Puro backbone, 5 μg; Buffer NEB 3.1, 5 μl; BsmBI, 5 μl; H2O up to 50 μl. After digestion, the vector was purified using a QIAquick PCR Purification Kit (Qiagen). Each sgRNA was then individually assembled via Gibson assembly.
The lentiviral backbone plasmid for the miRNA-HSV-tk-GFP constructs was a backbone containing an EF1-alpha promoter, GFP, an IRES domain, and a puromycin-resistance gene (EGIP). The lentiviral backbone was digested with EcoRV-HF (New England Biolabs) at 37°C for 1 hour to excise the GFP in a reaction consisting of: EGIP backbone, 5 μg; 1X CutSmart Buffer (New England Biolabs), 5 μl; EcoRV-HF, 5 μl; H2O up to 50 μl. After digestion, the vector was purified using a QIAquick PCR Purification Kit (Qiagen). We amplified a gBlock containing the Herpes Simplex Virus thymidine kinase (HSV-tk), a 2A self-cleaving peptide, and GFP.
The primers used to amplify the gBlock contain unique miRNA binding sites (see below).
| Primer | Sequence |
|---|---|
| miR_Empty_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCAGATCACACCGGTCGCCA |
| miR_Empty_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTAACGCCGCTTTACTTGTACAG |
| miR_21_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCTCAACATCAGTCTGATAAGCTA AGATCACACCGGTCGCCA |
| miR_21_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTTAGCTTATCAGACTGATGTTGA AACGCCGCTTTACTTGTACAG |
| miR_122_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCCAAACACCATTGTCACACTCCA AGATCACACCGGTCGCCA |
| miR_122_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTTGGAGTGTGACAATGGTGTTTG AACGCCGCTTTACTTGTACAG |
| miR_124_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCGGCATTCACCGCGTGCCTTA AGATCACACCGGTCGCCA |
| miR_124_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTTAAGGCACGCGGTGAATGCC AACGCCGCTTTACTTGTACAG |
| miR_126_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCCGCATTATTACTCACGGTACGA AGATCACACCGGTCGCCA |
| miR_126_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTTCGTACCGTGAGTAATAATGCG AACGCCGCTTTACTTGTACAG |
| miR_302A_F | TGGCTAGTTAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCAGCAAGTACATCCACGTTTAAGT AGATCACACCGGTCGCCA |
| miR_302A_R | GGGAGAGGGGGGGGGGGCGGAATTCCGCGGGCCCGTCGACGCGGTTACTTAAACGTGGATGTACTTGCT AACGCCGCTTTACTTGTACAG |
We cloned this amplicon into our digested EGIP backbone using standard Gibson assembly.
The Gibson assembly reactions were set up as follows: 1:10 molar ratio of digested backbone to sgRNA insert, 2X Gibson assembly master mix (New England Biolabs), and H2O to 20 μl. After incubation at 50°C for 1 h, the product was transformed into One Shot Stbl3 chemically competent Escherichia coli (Invitrogen). A fraction (150 μl) of each culture was spread on carbenicillin (50 μg/ml) LB plates and incubated at 37°C for 15–18 hrs (miRNA constructs required longer incubation times). Individual colonies were picked, inoculated into 5 ml of carbenicillin (50 μg/ml) LB medium, and grown overnight in a shaker at 37°C. The plasmid DNA was then extracted with a QIAprep Spin Miniprep Kit (Qiagen) and Sanger sequenced to verify correct assembly of the vector and to extract barcode sequences.
To assemble the library, individual sgRNA vectors were pooled at equal mass along with 5 non-targeting control (NTC) sgRNAs, which together constituted 50% of the final pool.
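The pooling rule can be expressed as simple arithmetic. A minimal sketch, assuming the 50% NTC share is by mass; the total plasmid mass and the number of targeting sgRNAs (48) are illustrative inputs, not values stated for this step:

```python
def pool_masses(n_targeting: int, n_ntc: int, total_ug: float) -> tuple[float, float]:
    """Return (ug per targeting plasmid, ug per NTC plasmid).

    Assumes NTC plasmids together make up half of the pool by mass and that
    plasmids within each class are mixed at equal mass.
    """
    if n_targeting < 1 or n_ntc < 1:
        raise ValueError("need at least one targeting and one NTC plasmid")
    half = total_ug / 2.0
    return half / n_targeting, half / n_ntc


# Example: 48 targeting sgRNA plasmids, 5 NTCs, 10 ug total (illustrative values).
per_targeting, per_ntc = pool_masses(48, 5, 10.0)
print(f"{per_targeting:.3f} ug per targeting plasmid, {per_ntc:.3f} ug per NTC")
```

Because the five NTCs share half the mass, each NTC plasmid in this example is present at roughly ten times the mass of a single targeting plasmid.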
Viral Production
HEK 293T cells were maintained in high-glucose DMEM supplemented with 10% fetal bovine serum (FBS). Cells were seeded in 15 cm dishes 1 day before transfection so that they were 60–70% confluent at the time of transfection. For each 15 cm dish, 36 μl of Lipofectamine 2000 (Life Technologies) was added to 1.5 ml of Opti-MEM (Life Technologies). Separately, 3 μg of pMD2.G (Addgene #12259), 12 μg of pCMV delta R8.2 (Addgene #12263), and 9 μg of an individual vector or pooled vector library were added to 1.5 ml of Opti-MEM. After 5 minutes of incubation at room temperature, the Lipofectamine 2000 and DNA solutions were mixed and incubated at room temperature for 30 minutes. The medium in each 15 cm dish was replenished with 25 ml of fresh medium. After the incubation period, the mixture was added dropwise to each dish of HEK 293T cells. Supernatant containing the viral particles was harvested after 48 and 72 hours, filtered through 0.45 μm filters (Steriflip, Millipore), concentrated using Amicon Ultra-15 centrifugal ultrafilters with a 100,000 NMWL cutoff (Millipore) to a final volume of 600–800 μl, divided into aliquots, and frozen at −80°C.
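Scaling the transfection to several dishes is arithmetic on the per-dish amounts above: 24 μg of total DNA at a 1:4:3 mass ratio of pMD2.G, pCMV delta R8.2, and transfer vector, with 1.5 μl of Lipofectamine 2000 per μg of DNA. A small helper in which the number of dishes is an illustrative input:

```python
# Per-dish quantities for one 15 cm dish, taken from the protocol text.
PER_DISH = {
    "Lipofectamine 2000 (ul)": 36.0,
    "Opti-MEM, lipid tube (ml)": 1.5,
    "pMD2.G (ug)": 3.0,
    "pCMV delta R8.2 (ug)": 12.0,
    "transfer vector or library (ug)": 9.0,
    "Opti-MEM, DNA tube (ml)": 1.5,
    "fresh medium per dish (ml)": 25.0,
}
DNA_KEYS = ("pMD2.G (ug)", "pCMV delta R8.2 (ug)", "transfer vector or library (ug)")


def scale_transfection(n_dishes: int) -> dict[str, float]:
    """Multiply every per-dish quantity by the number of dishes."""
    if n_dishes < 1:
        raise ValueError("n_dishes must be positive")
    return {name: amount * n_dishes for name, amount in PER_DISH.items()}


def lipid_per_ug_dna() -> float:
    """Microliters of Lipofectamine 2000 per microgram of total plasmid DNA."""
    total_dna = sum(PER_DISH[k] for k in DNA_KEYS)
    return PER_DISH["Lipofectamine 2000 (ul)"] / total_dna


for name, amount in scale_transfection(4).items():
    print(f"{name}: {amount:g}")
```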
Viral Transduction
For viral transduction, virus was added at a low MOI (to favor a single barcode or a single sgRNA per cell) to stem cells at 20% confluency, together with polybrene (5 μg/ml, Millipore) in fresh mTeSR medium. The following day, the medium was replaced with fresh mTeSR. The appropriate selection reagent (Thermo Fisher Scientific) was added 48 hrs after transduction and replaced daily: hygromycin (50 μg/ml) for barcode vectors, and puromycin (0.75 μg/ml) for the CRISPR KO screen and miRNA-HSV-tk-GFP vectors. For miRNA-HSV-tk-GFP transduced cells, puromycin selection did not begin until 5–7 days after transduction to allow enough GFP-positive cells to accumulate. To allow editing in the CRISPR KO screen, selection was continued for 5 days before the cells were used for teratoma formation in mice.
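The exact MOI is not stated above. To illustrate why a low MOI favors one integration per cell, here is a sketch assuming the number of integrations per cell follows a Poisson distribution with mean equal to the MOI (a standard approximation, not a measurement from this study); the MOI values are illustrative:

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def transduction_summary(moi: float) -> dict[str, float]:
    """Fraction of cells transduced, and fraction of transduced cells with one integration."""
    p0 = poisson_pmf(0, moi)
    transduced = 1.0 - p0
    return {
        "fraction_transduced": transduced,
        "single_among_transduced": poisson_pmf(1, moi) / transduced,
    }


for moi in (0.1, 0.3, 1.0):
    s = transduction_summary(moi)
    print(f"MOI {moi}: {s['fraction_transduced']:.2f} transduced, "
          f"{s['single_among_transduced']:.2f} of those single-integration")
```

Lowering the MOI trades transduction efficiency for a higher single-integration fraction, and the antibiotic selection that follows removes the untransduced cells.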
sgRNA Editing Rate Validation
We individually transduced each sgRNA into our PGP-Cas9 cell line in an arrayed format, began puromycin selection after 48 hrs, and allowed editing to proceed for an additional 5 days (7 days total). We then collected the cell pellet for each sgRNA and extracted gDNA. We designed primers (Table S4C) flanking the expected cut site of each sgRNA and amplified that region by standard PCR from the gDNA of the corresponding cell pellet. Each amplicon was then deep sequenced. We used CRISPResso with default parameters to compute the fraction of reads containing mutations, which we split into an indel rate and an overall mutation rate.
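The split into two rates can be sketched from per-read alignment outcomes. The `ReadCall` record is hypothetical (it is not CRISPResso's output format), and the overall mutation rate is assumed to count any insertion, deletion, or substitution:

```python
from dataclasses import dataclass


@dataclass
class ReadCall:
    """Hypothetical per-read summary of an amplicon alignment (not CRISPResso's format)."""
    has_insertion: bool
    has_deletion: bool
    has_substitution: bool


def editing_rates(reads: list[ReadCall]) -> tuple[float, float]:
    """Return (indel rate, overall mutation rate) as fractions of all reads."""
    if not reads:
        raise ValueError("no reads")
    n = len(reads)
    indel = sum(r.has_insertion or r.has_deletion for r in reads)
    mutated = sum(r.has_insertion or r.has_deletion or r.has_substitution for r in reads)
    return indel / n, mutated / n


reads = [ReadCall(True, False, False), ReadCall(False, False, True),
         ReadCall(False, False, False), ReadCall(False, True, True)]
print(editing_rates(reads))
```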
GCV-HSV-tk Killing in vitro
Cells transduced with the miRNA-HSV-tk-GFP construct and EGIP-transduced controls were grown for up to 5 days in standard medium in the presence of ganciclovir (GCV, Sigma-Aldrich; 1 μM, 10 μM, or 100 μM), with daily phase-contrast and fluorescence microscopy imaging. GCV was resuspended in PBS (Gibco) at 3 mg/mL and stored in 1 mL aliquots at −20°C. Cells were seeded at similar densities on Day 0 of the experiment.
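Reaching the working concentrations from the 3 mg/mL stock requires converting mass concentration to molarity. A sketch assuming the free-base molecular weight of ganciclovir (255.23 g/mol; a salt form would differ) and an illustrative 2 ml culture volume:

```python
GCV_MW_G_PER_MOL = 255.23  # ganciclovir free base; assumption (a salt form differs)


def stock_molarity_um(stock_mg_per_ml: float, mw: float = GCV_MW_G_PER_MOL) -> float:
    """Convert a mg/ml stock to micromolar (mg/ml equals g/l)."""
    return stock_mg_per_ml / mw * 1e6


def stock_volume_ul(target_um: float, medium_ml: float, stock_mg_per_ml: float = 3.0) -> float:
    """Volume of stock (ul) to reach target_um in medium_ml, from C1*V1 = C2*V2."""
    return target_um * medium_ml * 1000.0 / stock_molarity_um(stock_mg_per_ml)


print(f"stock: {stock_molarity_um(3.0):.0f} uM")
for target in (1.0, 10.0, 100.0):
    print(f"{target:g} uM in 2 ml medium: {stock_volume_ul(target, 2.0):.2f} ul of stock")
```

The stock is roughly 11.75 mM, so the 1 μM condition needs well under 1 μl of stock per 2 ml; an intermediate dilution makes that volume practical to pipette.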
miRNA-HSV-tk-GFP Knockdown in vitro
Cells were transduced with miRNA-HSV-tk-GFP constructs and grown for up to 5 days in standard medium. After 5 days, cells were pelleted, resuspended in PBS (Gibco) at 1×10⁶ cells/mL, and run on a Becton Dickinson FACScan flow cytometer, gating on fluorescence (FL1-H, GFP positivity) and forward scatter (FSC-H, shape and size).
Teratoma Formation
A subcutaneous injection of 5–10 million PSCs in a 1:1 slurry of Matrigel® and mTeSR medium was made in the right flank of anesthetized Rag2−/−;γc−/− immunodeficient mice. Teratoma growth was monitored weekly by measuring the outward width and height with calipers and computing the approximate elliptical area (mm²).
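The elliptical area estimate treats the two caliper measurements as the full axes of an ellipse, so area = π/4 × width × height. A minimal sketch, assuming both measurements are in mm:

```python
import math


def ellipse_area_mm2(width_mm: float, height_mm: float) -> float:
    """Approximate teratoma area from caliper width and height (full axes, in mm).

    The semi-axes are width/2 and height/2, so area = pi * (w/2) * (h/2).
    """
    return math.pi * width_mm * height_mm / 4.0


print(f"{ellipse_area_mm2(12.0, 9.0):.1f} mm^2")
```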
Molecular Sculpting of Teratomas
The standard teratoma formation protocol was followed using miRNA-HSV-tk-GFP transduced H1 cells. Once teratomas reached a size of at least 10 mm along one axis, GCV administration began by standard needle and syringe injection, either intratumorally (IT) at 80 mg/kg/d or by combined intraperitoneal and intratumoral injection (IPIT) at 100 mg/kg/d (50 mg/kg/d at each site). Teratomas were allowed to grow for a total of 10 weeks before extraction.
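The per-mouse daily GCV mass follows from the mg/kg/d doses above. A sketch using a hypothetical 25 g mouse; body weight and route labels are illustrative inputs:

```python
def daily_gcv_mg(body_weight_g: float, route: str) -> dict[str, float]:
    """Daily GCV mass (mg) per injection site for the IT or IPIT regimen."""
    kg = body_weight_g / 1000.0
    if route == "IT":
        return {"intratumoral": 80.0 * kg}
    if route == "IPIT":
        # 100 mg/kg/d total, split evenly between the two sites
        return {"intratumoral": 50.0 * kg, "intraperitoneal": 50.0 * kg}
    raise ValueError("route must be 'IT' or 'IPIT'")


print(daily_gcv_mg(25.0, "IT"))
print(daily_gcv_mg(25.0, "IPIT"))
```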
Teratoma Processing
After an average of 70 days of growth, mice were euthanized by slow release of CO2 followed by cervical dislocation as a secondary method. The tumor area was shaved and sprayed with 70% ethanol, and the tumor was removed by surgical excision using scissors and forceps. Tumors were rinsed with PBS, weighed, photographed, and inspected for external heterogeneity to ensure proper tumor representation. Representative tumor pieces ≤ 22 mm in diameter were cut in a semi-random fashion and frozen in OCT for sectioning and H&E staining by the Moores Cancer Center Histology Core. The remaining tumor was cut into small pieces 1–2 mm in diameter and processed with standard gentleMACS™ protocols: Human Tumor Dissociation Kit (medium tumor settings), Red Blood Cell Lysis Kit, and Dead Cell Removal Kit. Single cells were then resuspended in 0.04% BSA for the 10X Genomics Chromium platform (Zheng et al., 2017).
Histology and RNAScope®
Sectioning and H&E staining were performed by the Moores Cancer Center Histology Core. In brief, Optimal Cutting Temperature (O.C.T.) blocks were cut on a cryostat into 10 micron sections mounted on positively charged glass slides. Slides were stained with Harris hematoxylin, rinsed in tap water, and treated with an alkaline solution. They were then de-stained with a weak acid alcohol to remove non-specific background staining, stained with an aqueous eosin solution, passed through several changes of alcohol, and rinsed in several baths of xylene. A thin layer of polystyrene mountant was applied, followed by a glass cover slip. Pathologist Dr. Ann Tipps confirmed by microscopy that teratoma sections contained all 3 germ layers (ectoderm, mesoderm, and endoderm) and also performed further detailed tissue identification.
Fresh frozen sections were processed with the RNAScope® Fluorescent Multiplex Reagent Kit following the standard protocol for fresh frozen tissue. In brief, sections were fixed in 200 mL of chilled 4% PFA in 1X PBS at 4°C for 15 min. Slides were then dehydrated in 50% EtOH for 5 min at RT, in 70% EtOH for 5 min at RT, and twice in 100% EtOH for 5 min at RT. After the slides had dried, we drew a hydrophobic barrier around the tissue. We then placed the slides on a HybEZ™ Slide Rack, added Pretreat 4 to entirely cover each section, and incubated for 30 min at RT. Slides were then washed with 1X PBS. We then added the appropriate probe to cover each section and incubated the slides on the slide rack in a HybEZ™ Oven for 2 hrs at 40°C, followed by two 2-min washes with 1X Wash Buffer at RT. The signal was then amplified by sequentially applying AMP 1-FL (30 min), AMP 2-FL (15 min), AMP 3-FL (30 min), and AMP 4-FL (Alt A, B, or C; 15 min). For each amplification step, the reagent was added to entirely cover each section, the slides were incubated on the slide rack in the oven at 40°C, and the slides were then washed twice with 1X Wash Buffer for 2 min at RT.
The slides were then counterstained with DAPI (30 sec at RT) and mounted with ProLong™ Gold Antifade Mountant (Cat# P10144). We then placed a 24 mm × 50 mm coverslip over each tissue section and stored the slides in the dark at 4°C.
Immunostaining
For SARS-CoV-2 spike protein immunostaining, fresh frozen sections were rinsed once with PBS before addition of 10 μg/mL anti-rabbit IgG Alexa 488 (Invitrogen) and 10 μg/mL SARS-CoV-2 spike RBD protein (Sino Biological) diluted in PBS + 0.5% BSA for 30 min, protected from light. Sections were then washed twice with PBS + 0.5% BSA for 10 min each with gentle agitation before imaging.
For neuro-ectoderm staining, fresh frozen sections were rinsed once with PBS and fixed at room temperature for 15 min with 4% paraformaldehyde. Sections were washed three times with PBS for 5 min each before addition of blocking buffer (5% normal donkey serum, 0.2% Triton X-100 in PBS) for 1 hr. Primary antibody (anti-PAX6 rabbit [Millipore Sigma] diluted 1:50 in blocking buffer) was added overnight (12 hrs) at 4°C. Sections were washed three times with PBS for 10 min each with gentle agitation before addition of secondary antibody (anti-rabbit DyLight 550 [Abcam] diluted 1:200 in blocking buffer) for 1 hr at 37°C, protected from light. Sections were then washed three times with PBS for 5 min each with gentle agitation before addition of DAPI (1:10,000 dilution in PBS) for 10 min, followed by three final washes with PBS for 10 min each with gentle agitation before imaging.
Microscopy
Slides were imaged 24 hrs after RNAScope® staining (stored at 4°C) on a Zeiss 880 Airyscan confocal microscope at the UC San Diego Microscopy Core; we thank Michael Hu for image processing. Raw images on the Leica DMi8 and the Zeiss 880 were acquired at 16-bit depth per color, and highlights and shadows were adjusted in the LAS X and ZEN software, respectively. For RNAScope images, each image was split into its component channels, and the dots in the appropriate channel were dilated as 3-pixel disks using the MorphoLibJ plugin for ImageJ.
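The dot dilation was performed in ImageJ. An analogous operation in Python applies grayscale dilation with a disk-shaped footprint to one channel (for a multichannel array, a slice such as `image[..., c]`). This sketches the same morphological operation, not the MorphoLibJ implementation, and it assumes that "3 pixels" refers to the disk radius:

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped footprint of the given radius in pixels."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2


def dilate_dots(channel: np.ndarray, radius: int = 3) -> np.ndarray:
    """Grayscale dilation: each pixel takes the maximum within the disk around it."""
    return ndimage.grey_dilation(channel, footprint=disk(radius))


# A single bright pixel becomes a filled disk of radius 3.
img = np.zeros((21, 21))
img[10, 10] = 1.0
print(int((dilate_dots(img) > 0).sum()))
```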
Cost Analysis
Overall, profiling a single teratoma with the 10X single cell RNA-seq system costs about $1,300, including sequencing costs for ~8,000 cells (the output of a single 10X run) at a sequencing depth of 50,000 reads per cell. Mouse husbandry and reagents for teratoma formation (cells, Matrigel, media) are relatively inexpensive in comparison. During teratoma growth, the researcher only needs to monitor the mice for health concerns, body weight, and, if desired, tumor size. The teratoma can be extracted at any time after 3 weeks of growth. For the miRNA molecular sculpting experiments, the mice require a daily dose of GCV until tumor extraction. It is also theoretically possible to inject both flanks of a mouse to generate 2 teratomas per animal. With easy-to-use analysis tools such as Seurat and PAGODA2, as well as dataset integration methods such as CONOS, basic clustering and cell type annotation of scRNA-seq data is fairly straightforward.
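The sequencing load and per-cell cost behind this estimate follow directly from the stated numbers; the defaults below are the values given above:

```python
def reads_per_teratoma(cells: int = 8_000, reads_per_cell: int = 50_000) -> int:
    """Total sequencing reads for one 10X run at the stated depth."""
    return cells * reads_per_cell


def cost_per_cell(total_usd: float = 1_300.0, cells: int = 8_000) -> float:
    """Approximate profiling cost per recovered cell."""
    return total_usd / cells


print(f"{reads_per_teratoma():,} reads per teratoma")
print(f"${cost_per_cell():.4f} per cell")
```

That is 400 million reads per teratoma and roughly $0.16 per profiled cell.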
QUANTIFICATION AND STATISTICAL ANALYSIS
Overview
For all figures, we used the CellRanger pipeline as described in the Single Cell RNA-Seq Processing section to generate counts matrices (Zheng et al., 2017). We also used the Seurat R package for clustering, data integration, and classification for all figures as described in the Seurat Data Integration and H1 Teratoma Clustering and Validation methods sections (Stuart et al., 2019). For assigning lentiviral barcodes and CRISPR guide RNAs to cells (relevant to Figure 2/S2 and Figure 4/S4 respectively), we used the genotyping-matrices method as described in the Lentiviral Barcode and CRISPR Guide Assignment section (Parekh et al., 2018). For Figure 3/S3, we used Similarity Weighted Nonnegative Embedding (SWNE) as described in the Developmental Staging Analysis section (Wu, Tamayo and Zhang, 2018b). For Figure 4, we quantified guide RNA editing using CRISPResso (Pinello et al., 2016). And for Figure S4, we used DESeq2 as described in the PGP1 Neural Disorder Screen Analysis section (Love, Huber and Anders, 2014). The remaining analysis was done using custom R scripts.
For the heterogeneity analysis in Figure 2/S2, we treated each teratoma as an individual replicate. For Figure S4, we collapsed the expression of all cells with the same cluster and guide RNA identity into a single replicate in order to run pseudobulk differential expression analysis. For Figure S5, each teratoma was treated as a replicate to compute the cell type proportion z-scores. In all other analyses, each cell was treated as a replicate.
A brief summary of the analysis details for each figure can be found in the results and figure legends. Below we also provide a mapping between each figure and the relevant methods sections:
Figure 1/S1: Seurat Data Integration and H1 Teratoma Clustering and Validation
Figure 2/S2: Quantitative Assessment of Teratoma Heterogeneity and Cell Type Bias and Lentiviral Barcode and CRISPR Guide Assignment
Figure 3/S3: Developmental Staging Analysis
Figure 4/S4/S5: PGP1 Embryonic Lethal Screen Analysis, PGP1 Neural Disorder Screen Analysis and Molecular Sculpting Analysis
All analysis code as well as instructions on how to reproduce our analyses can be found at the Github repository: yanwu2014/teratoma-analysis-code.
Single Cell RNA-seq Processing
Using the 10X Genomics CellRanger (v2.01) pipeline (Zheng et al., 2017), we aligned Fastq files to a combined hg19 and mm10 reference using STAR aligner (Dobin et al., 2013), counted UMIs to generate human and mouse gene-expression counts matrices, and aggregated samples across 10X runs with the cellranger aggr command. All cellranger commands were run using default settings.
Seurat Data Integration
Data integration was performed on the aggregated counts matrices for each of the following datasets: the 7 H1 teratomas, the 6 PGP1 CRISPR-KO screen teratomas, and the 3 cell line teratomas. We used the Seurat v3 data integration pipeline (Butler et al., 2018; Stuart et al., 2019). Briefly, we first filtered the counts matrix to keep genes expressed in at least 0.1% of cells and cells expressing at least 200 genes. We then normalized the counts matrix using total-counts normalization and log-transformed the result. Log-transformation brings RNA-seq counts closer to a normal distribution, which is the assumption Seurat makes for the remainder of the analysis (Law et al., 2014). For each teratoma, we identified highly variable genes and selected the 4000 genes that were overdispersed in the most teratomas. We then identified anchor cells and integrated the teratomas to create a batch-corrected gene expression matrix. After batch correction, we used a linear model to regress out library depth and mitochondrial gene fraction, and ran Principal Components Analysis (PCA) (Abdi and Williams, 2010), keeping the first 30 principal components. We then used the PCs to build a k-nearest neighbors (kNN) graph with k = 10, and used the kNN graph to calculate a shared nearest neighbors (SNN) graph (Houle et al., 2010). Finally, we ran a modularity optimization algorithm with a resolution of 0.4 on the SNN graph to find clusters (Butler et al., 2018).
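The filtering, normalization, PCA, and graph steps can be approximated in NumPy to show what each produces. This is a sketch, not Seurat: the 10,000 scale factor mirrors Seurat's default and is an assumption here, the Jaccard-based SNN weighting is a simplification, and anchor-based integration, covariate regression, and modularity clustering are omitted:

```python
import numpy as np


def preprocess(counts, min_cell_frac=0.001, min_genes=200, scale_factor=1e4):
    """Filter and log-normalize a cells x genes raw UMI matrix (filters use the unfiltered matrix)."""
    expressed = counts > 0
    keep_genes = expressed.mean(axis=0) >= min_cell_frac
    keep_cells = expressed.sum(axis=1) >= min_genes
    x = counts[np.ix_(keep_cells, keep_genes)].astype(float)
    lib_size = np.maximum(x.sum(axis=1, keepdims=True), 1.0)
    return np.log1p(x / lib_size * scale_factor)


def pca_embed(x, n_pcs=30):
    """Project cells onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n = min(n_pcs, s.size)
    return u[:, :n] * s[:n]


def knn_indices(emb, k=10):
    """Indices of each cell's k nearest neighbors (Euclidean, self excluded)."""
    sq = (emb**2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def snn_graph(neighbors):
    """Jaccard overlap of neighbor sets (each set includes the cell itself)."""
    n = neighbors.shape[0]
    sets = [set(row.tolist()) | {i} for i, row in enumerate(neighbors)]
    w = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            w[i, j] = w[j, i] = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
    return w


rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(60, 500))   # synthetic toy counts
x = preprocess(counts)
graph = snn_graph(knn_indices(pca_embed(x), k=10))
```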
H1 Teratoma Clustering and Validation
H1 clusters were assigned to cell types using a two-stage strategy. First, we trained a kNN classifier on the Mouse Cell Atlas (MCA) dataset using k = 40 (Tarlow et al., 2013), mapping mouse genes to their human orthologs. We projected each cell in the H1 teratoma dataset onto the first 40 principal components (PCs) of the MCA and classified it with this kNN classifier to generate a rough set of cell type assignments for each cluster. We then manually inspected the marker genes for each cluster and adjusted the cell type based on the expression of canonical markers (Table S2A – E). We also specifically looked at transcription factor markers using the TRRUST database (Table S2A, S2D) (insert reference). We computed differential gene expression in Seurat using the default Wilcoxon rank-sum test, a non-parametric test that makes no assumptions about the distribution of the data being tested (Wilcoxon, 1946). Clusters that mapped to the same MCA cell type and expressed similar marker genes were merged. Finally, we ran UMAP with the first 30 PCs as input to visualize the results (Becht et al., 2018; McInnes and Healy, 2018). We validated each annotated cell type by computing the Pearson correlation between the average expression of each cell type and the average expression of each broad cell type in the Mouse Organogenesis Cell Atlas (Cao et al., 2019). We used the union of all marker genes for the teratoma cell types and the Mouse Organogenesis Cell Atlas cell types to perform the correlation analysis.
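The first-stage classifier can be sketched as projection onto reference PCs followed by a majority vote among the k nearest reference cells. Assumed here: both matrices are log-normalized cells x genes with the same (ortholog-mapped) gene order, distances are Euclidean in PC space, and the toy data are synthetic:

```python
from collections import Counter

import numpy as np


def fit_reference_pcs(ref, n_pcs=40):
    """Center the reference (cells x genes) and return its mean and PC loadings."""
    mean = ref.mean(axis=0)
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_pcs].T  # genes x n_pcs


def knn_classify(ref, ref_labels, query, k=40, n_pcs=40):
    """Label each query cell by majority vote of its k nearest reference cells."""
    mean, loadings = fit_reference_pcs(ref, n_pcs)
    ref_emb = (ref - mean) @ loadings
    query_emb = (query - mean) @ loadings  # query projected with the reference centering
    labels = []
    for q in query_emb:
        nearest = np.argsort(np.linalg.norm(ref_emb - q, axis=1))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels


rng = np.random.default_rng(1)
ref = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(5, 1, (50, 20))])
ref_labels = ["A"] * 50 + ["B"] * 50
query = np.vstack([rng.normal(0, 1, (5, 20)), rng.normal(5, 1, (5, 20))])
print(knn_classify(ref, ref_labels, query, k=10, n_pcs=5))
```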
In some cases, we sub-clustered cells to achieve greater cell type resolution. Specifically, the ciliated epithelium cluster had both retinal and airway markers, so we sub-clustered all cells mapping to ciliated epithelium to separate retinal epithelium from airway epithelium. We also sub-clustered the neuro-ectoderm to identify interneurons, peripheral neurons, retinal progenitors, and early neuro-ectoderm. In both cases, we subsetted the gene expression matrix to the cells of interest, reran the Seurat analysis pipeline, and identified sub-clusters using known marker genes (Table S2G).
Quantitative Assessment of Teratoma Heterogeneity and Cell Type Bias
In order to quantify the level of heterogeneity between teratomas, we used the Normalized Relative Entropy metric from CONOS (Barkas et al., 2019).
Here, fk is a vector of the number of cells from cluster k in each teratoma, F is the vector of the total number of cells in each teratoma, and KL(fk, F) is the empirical KL divergence between fk and F. A higher Normalized Relative Entropy means the cell types are more evenly mixed across the teratomas, and thus the teratomas are less heterogeneous.
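The KL term can be computed from cell counts after converting both count vectors to proportions. The normalization that turns it into CONOS's Normalized Relative Entropy is not reproduced here, and natural logarithms are assumed. The same function applies to the per-barcode bias, with bk and B in place of fk and F:

```python
import numpy as np


def kl_divergence(counts_k, totals):
    """KL(f_k || F): a cluster's distribution of cells across teratomas (or barcodes)
    versus the distribution of all cells across the same teratomas (or barcodes)."""
    p = np.asarray(counts_k, dtype=float)
    q = np.asarray(totals, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    nz = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


# A cluster spread like the totals has KL near 0; one confined to a single teratoma scores high.
print(kl_divergence([10, 20, 30], [100, 200, 300]))
print(kl_divergence([60, 0, 0], [100, 200, 300]))
```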
We generated only one teratoma for each non-H1 cell line because our main goal was to compare the heterogeneity across cell lines with the heterogeneity within the H1 cell line, while also demonstrating that teratomas could be generated from multiple cell lines.
To quantify the heterogeneity (bias) of individual cell types across teratomas, we computed the KL divergence between the number of cells from that cell type/cluster in each teratoma and the total number of cells in each teratoma, and then scaled it by the number of cells in that cell type. For each cell type k:
Lentiviral Barcode and CRISPR Guide Assignment
To assign one or more lentiviral/gRNA barcodes to each cell, we extracted each barcode by identifying its flanking sequences, yielding reads tagged with cell, UMI, and barcode sequences. To remove potential chimeric reads, we used a two-step filtering process. First, we kept only barcodes that made up at least 0.5% of the total reads for each cell. We then counted the number of UMIs and reads for each plasmid barcode within each cell, and assigned to that cell any barcode that accounted for at least 10% of the cell's read and UMI counts. The code for assigning barcodes to each cell can be found on GitHub at: https://github.com/yanwu2014/genotyping-matrices (Parekh et al., 2018).
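The two-step filter can be sketched for a single cell as follows. This illustrates the rule, not the genotyping-matrices code, and it assumes the 10% thresholds are applied to the read and UMI totals of the barcodes that survive the first filter:

```python
def assign_barcodes(cell_counts, min_read_frac=0.005, min_assign_frac=0.10):
    """Assign barcodes to one cell.

    cell_counts maps barcode -> (reads, UMIs) for that cell.
    Step 1 drops barcodes below min_read_frac of the cell's reads (likely chimeras);
    step 2 keeps barcodes holding at least min_assign_frac of both the remaining
    reads and the remaining UMIs.
    """
    total_reads = sum(reads for reads, _ in cell_counts.values())
    if total_reads == 0:
        return []
    kept = {bc: rc for bc, rc in cell_counts.items() if rc[0] / total_reads >= min_read_frac}
    kept_reads = sum(r for r, _ in kept.values())
    kept_umis = sum(u for _, u in kept.values())
    if kept_reads == 0 or kept_umis == 0:
        return []
    return sorted(
        bc for bc, (r, u) in kept.items()
        if r / kept_reads >= min_assign_frac and u / kept_umis >= min_assign_frac
    )


print(assign_barcodes({"AAA": (900, 45), "CCC": (95, 5), "GGG": (2, 1)}))
print(assign_barcodes({"AAA": (500, 20), "CCC": (400, 20)}))
```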
H1 Cell Barcoding Analysis
We extracted lentiviral barcodes from the genomic DNA fastq files before and after teratoma formation for the 3 barcoded H1 teratomas. We counted the number of unique barcodes that were supported by at least 10 reads (the reads requirement is to mitigate overcounting unique barcodes due to minor sequencing errors) and then computed the fraction of unique barcodes that remain after teratoma formation to assess the approximate number of cells that are involved in the teratoma formation process.
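A sketch of the barcode retention calculation, assuming the input is a list of barcode sequences with one entry per gDNA read, and that a barcode "remains" if it passes the 10-read threshold both before and after teratoma formation:

```python
from collections import Counter


def unique_barcodes(reads, min_reads=10):
    """Barcodes supported by at least min_reads reads (guards against sequencing errors)."""
    return {bc for bc, n in Counter(reads).items() if n >= min_reads}


def fraction_retained(before, after, min_reads=10):
    """Fraction of pre-injection barcodes that are still detected in the teratoma."""
    pre = unique_barcodes(before, min_reads)
    if not pre:
        raise ValueError("no barcodes pass the read threshold before injection")
    return len(pre & unique_barcodes(after, min_reads)) / len(pre)


before = ["A"] * 12 + ["B"] * 15 + ["C"] * 3 + ["D"] * 10
after = ["A"] * 20 + ["C"] * 30 + ["D"] * 5
print(fraction_retained(before, after))
```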
We also identified lentiviral barcodes at the single cell level, using the barcode assignment strategy described in the Lentiviral Barcode and CRISPR Guide Assignment section. For each cell type, we computed its bias for specific barcodes using the same relative entropy metric we used to compute teratoma bias.
Here, bk is a vector of the number of cells from cluster k carrying each barcode, B is the vector of the total number of cells carrying each barcode, and KL(bk, B) is the empirical KL divergence between bk and B.
Developmental Staging Analysis
In order to assess the developmental maturity of the teratoma cell types, we computed the average expression of all cells related to neuro-ectoderm (Radial Glia, Intermediate Neuronal Progenitors, Early Neurons) and gut (Oral/Esophageal, Stomach, Intestine) cell types and calculated the cosine similarity of the teratoma average expression to the average expression of fetal human cells across different time points. We used all genes that were detected in both the fetal and teratoma data.
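Cosine similarity compares the shape of two average expression profiles independently of their overall magnitude. A sketch assuming the profiles are already restricted to the shared genes in a common order; the time-point labels and values are hypothetical:

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_timepoint(teratoma_avg, fetal_avgs: dict) -> tuple:
    """Return the fetal time point most similar to the teratoma profile, plus all scores."""
    scores = {tp: cosine_similarity(teratoma_avg, prof) for tp, prof in fetal_avgs.items()}
    return max(scores, key=scores.get), scores


best, scores = best_timepoint([1.0, 2.0, 3.0], {"week_a": [1.0, 2.0, 3.1], "week_b": [3.0, 2.0, 1.0]})
print(best, scores)
```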
For the neuro-ectoderm cells, we then sub-clustered those cells and identified additional cell types using canonical marker genes (Table S2G). We then matched those neuro-ectoderm sub-clustered cell types to cell types in a larger fetal week 17–18 single cell prefrontal cortex dataset.
We next generated Similarity Weighted Nonnegative Embeddings (SWNE) (Wu, Tamayo and Zhang, 2018a) for the neuronal and gut cell types using the top 3000 overdispersed genes in each tissue type. Briefly, SWNE uses nonnegative matrix factorization (NMF) (Lee and Seung, 1999) to decompose a gene expression matrix into component factors, embeds the factors in 2D using Sammon mapping (Sammon, 1969), and embeds the cells and key genes in the 2D space relative to the factors. The cell positions are smoothed using a shared nearest neighbors (SNN) network. For the neuronal SWNE embedding, we used 30 NMF factors and 20 nearest neighbors when computing the SNN. For the gut SWNE embedding, we used 20 NMF factors and 30 nearest neighbors. We projected the teratoma data onto the fetal SWNE by first projecting the teratoma data onto the fetal NMF factors to generate embedding coordinates, and then smoothing the projected coordinates by projecting the teratoma data onto the fetal SNN.
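Projecting new cells onto fixed NMF factors amounts to a nonnegative least-squares problem per cell. SWNE has its own projection routine; this sketch only illustrates the idea with SciPy's `nnls`, uses random stand-in matrices, and omits the SNN smoothing and the Sammon-mapped coordinates:

```python
import numpy as np
from scipy.optimize import nnls


def project_onto_factors(W: np.ndarray, X_new: np.ndarray) -> np.ndarray:
    """Nonnegative factor scores H (factors x cells) minimizing ||X_new[:, j] - W @ H[:, j]||
    for each cell j, with the reference loadings W (genes x factors) held fixed."""
    H = np.zeros((W.shape[1], X_new.shape[1]))
    for j in range(X_new.shape[1]):
        H[:, j], _ = nnls(W, X_new[:, j])
    return H


rng = np.random.default_rng(0)
W = rng.random((50, 4))          # stand-in for fetal NMF gene loadings
H_true = rng.random((4, 6))      # stand-in factor scores for 6 teratoma cells
H = project_onto_factors(W, W @ H_true)
print(np.abs(H - H_true).max())
```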
We then compared the expression of key neuronal/gut marker genes in each neuronal and gut cell type by correlating the expression of those markers between the teratoma data and the fetal human data. We used the scaled gene expression for both the teratoma and fetal data, which involves subtracting the average expression and dividing by the standard deviation. We selected the cell type markers for the neuro-ectoderm and gut comparisons using published studies of the developing human cortex and developing gut. Specifically, we selected VIM/SOX2 as markers for Radial Glia, DLX1 as a marker for Interneurons, and HMGB2 as a marker for Cycling Progenitors using the markers from the single-cell RNA-seq study of week 17 – 18 developing human cortex (Polioudakis et al., 2019). HES5 is known to be a key regulator of the neural progenitor state while DCX and NEUROD1 are essential for early neuronal differentiation (Gao et al., 2009; Bansod, Kageyama and Ohtsuka, 2017; Khalaf-Nazzal et al., 2017). For the developing gut, we selected CDX1/CDX2 as Mid/Hindgut markers and PAX9 as a foregut marker from the single-cell RNA-seq study of the developing human digestive tract (Gao et al., 2018). HHEX regulates midgut development, specifically the formation of the pancreas from the gut tube (Bort et al., 2004). SOX2 is a known foregut marker that regulates gut patterning while FOXJ1 marks foregut cells primed for the lung epithelial lineage (Que et al., 2007; Green et al., 2011).
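Scaling each marker across cell types before correlating puts genes with very different expression levels on a common footing. A sketch assuming matched marker-by-cell-type average expression matrices for the teratoma and fetal data; the values are illustrative:

```python
import numpy as np


def scale_rows(x):
    """Z-score each marker (row) across cell types (columns)."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant markers at zero instead of dividing by zero
    return (x - x.mean(axis=1, keepdims=True)) / sd


def cross_correlation(teratoma, fetal):
    """Pearson correlation over markers between every teratoma and fetal cell type.

    Both inputs are markers x cell types with the same marker order.
    """
    zt, zf = scale_rows(teratoma), scale_rows(fetal)
    out = np.empty((zt.shape[1], zf.shape[1]))
    for i in range(zt.shape[1]):
        for j in range(zf.shape[1]):
            out[i, j] = np.corrcoef(zt[:, i], zf[:, j])[0, 1]
    return out


# Illustrative values: 3 markers x 2 cell types in each dataset.
ter = [[5.0, 1.0], [1.0, 5.0], [4.0, 0.0]]
fet = [[6.0, 2.0], [0.5, 4.0], [3.0, 1.0]]
print(cross_correlation(ter, fet).round(2))
```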
PGP1 Embryonic Lethal Screen Analysis
For each of the six teratomas across the original and replicate screens, we used two technical replicate 10X runs. In order to ensure consistent cell types across teratomas, we merged the 10X runs corresponding to the same teratoma, and then integrated all six teratomas across both the original and replicate screen using Seurat v3 data integration. We used 3000 anchor features and 20 CCA dimensions for the integration. Using the annotated H1 teratoma dataset as the reference, we used Seurat label transfer to identify the cell type for all cells in the screen datasets. Due to the relatively low number of cells per guide RNA in the original screen, we collapsed closely related cell types into broader cell groupings in order to boost the power of our analysis. Specifically, Airway Epithelium was merged into Foregut (Airway epithelium is derived from the foregut epithelium during development), Schwann Cells and Melanoblasts were grouped as Schwann Cell Progenitors (SCP), Immune Cells, Erythrocytes, and Hematopoietic Stem Cells (HSCs) were grouped as Hematopoietic cells, Muscle Progenitors and Cardiac/Skeletal Muscle were grouped as Muscle, all MSC/Fibroblast populations were merged, Intermediate Neuronal Progenitors (INP) and Radial Glia were grouped as Neuronal Progenitors, and Retinal Neurons and Early Neurons were simply grouped as Neurons. In order to visualize the PGP1 data, we projected the integrated screen dataset onto the first 20 PCs from the H1 dataset and ran UMAP on the projected PCs.
We validated the editing efficiencies of all our guide RNAs using PCR amplification of the expected cut site and looking for mutations and indels with CRISPResso. We then selected the top guide targeting each gene with at least a 60% overall editing efficiency and a 40% indel efficiency which resulted in a total of 16 out of 48 guides selected. We then only used these 16 validated guides for further computational analysis. Unfortunately, the TULP3–2 guide was not detected in the replicate screen so we ended up using 15 guides (plus 5 NTC guides) for analysis.
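The guide selection rule is a threshold filter followed by a per-gene maximum. A sketch in which guide names and rates are hypothetical, and "top guide" is assumed to mean the highest overall editing rate, with the indel rate as a tiebreaker:

```python
def select_guides(rates, min_overall=0.60, min_indel=0.40):
    """rates maps guide -> (gene, overall_rate, indel_rate).

    Returns gene -> best guide passing both thresholds (genes with none are dropped).
    """
    best = {}
    for guide, (gene, overall, indel) in rates.items():
        if overall < min_overall or indel < min_indel:
            continue
        current = best.get(gene)
        if current is None or (overall, indel) > current[1]:
            best[gene] = (guide, (overall, indel))
    return {gene: guide for gene, (guide, _) in best.items()}


rates = {
    "G1-1": ("G1", 0.80, 0.50),
    "G1-2": ("G1", 0.90, 0.30),   # fails the indel threshold
    "G2-1": ("G2", 0.50, 0.45),   # fails the overall threshold
    "G3-1": ("G3", 0.70, 0.60),
    "G3-2": ("G3", 0.65, 0.55),
}
print(select_guides(rates))
```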
We assigned CRISPR-KO gene perturbations using the barcode assignment strategy described in the Lentiviral Barcode and CRISPR Guide Assignment section. To determine the total effect of each knockout, we computed a normalized Earth Mover’s Distance (EMD) between all cells in each gene knockout with all cells belonging to the NTC separately for each screen (Chen et al., 2020). EMD computes the difference in cell type composition between two groups of cells, weighted by how transcriptionally distinct the cell types are (Chen et al., 2020). Thus, differences in cell type composition between cells belonging to the gene knockouts and NTC that arise from the fact that the label transfer has a hard time distinguishing similar cell types will not be as highly weighted as differences between distinct cell types. We ran the EMD analysis separately for the original and replicate screens, and normalized the EMD metric so that the average EMD for all NTC guides would equal 1.
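EMD with a cost matrix can be solved as a small transportation linear program. The sketch below is not the implementation from Chen et al. (2020): it assumes the ground distance between cell types is the Euclidean distance between cell-type centroids (for example in PC space), uses SciPy's `linprog` with the HiGHS solver, pools all NTC cells as the reference, and rescales so that the NTC guides average 1. Counts and centroids are toy values:

```python
import numpy as np
from scipy.optimize import linprog


def emd(p, q, cost):
    """Earth Mover's Distance between two cell type compositions."""
    cost = np.asarray(cost, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    n, m = cost.shape
    rows = [np.kron(np.eye(n)[i], np.ones(m)) for i in range(n)]  # sum_j T[i, j] = p[i]
    cols = [np.kron(np.ones(n), np.eye(m)[j]) for j in range(m)]  # sum_i T[i, j] = q[j]
    res = linprog(cost.ravel(), A_eq=np.vstack(rows + cols),
                  b_eq=np.concatenate([p, q]), bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


def centroid_cost(centroids):
    """Pairwise Euclidean distances between cell type centroids."""
    c = np.asarray(centroids, dtype=float)
    return np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)


def normalized_emd(guide_counts, ntc_guides, cost):
    """EMD of each guide versus pooled NTC cells, scaled so NTC guides average 1."""
    ntc_total = np.sum([guide_counts[g] for g in ntc_guides], axis=0)
    raw = {g: emd(c, ntc_total, cost) for g, c in guide_counts.items()}
    scale = np.mean([raw[g] for g in ntc_guides])
    return {g: v / scale for g, v in raw.items()}


cost = centroid_cost([[0.0], [1.0], [5.0]])
counts = {"NTC1": [10, 10, 0], "NTC2": [12, 8, 0], "KO": [0, 5, 15]}
print(normalized_emd(counts, ["NTC1", "NTC2"], cost))
```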
To assess the effect of gene knockouts on individual cell types, we used a ridge regression model with the R glmnet package, as initially described in the Perturb-seq method (Friedman et al., 2015; Dixit, Parnas, Li, Chen, et al., 2016). Briefly, for each CRISPR gRNA, this yielded a regression coefficient for each cell type describing the enrichment or depletion of that gRNA in that cell type. This method assumes that the data are normally distributed, which is approximately true for RNA-seq and scRNA-seq data after log-transformation (insert ref). We permuted the gRNA assignments to assign each coefficient a p-value, representing the probability of observing a coefficient at least as large in magnitude by chance. Because we used a non-parametric permutation test, we did not make any assumptions about the distribution of the regression coefficients. We then used the Benjamini-Hochberg multiple testing correction (Thissen, Steinberg and Kuang, 2002) to generate False Discovery Rates (FDRs) and visualized coefficients with an FDR < 0.05. For each gRNA, we computed the cell type shift effect size as the average EMD across the screens. The reproducibility of each gRNA knockout was assessed by correlating the gRNA knockout effects (regression coefficients) between the original and replicate screens.
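Ridge regression with a permutation null and Benjamini-Hochberg correction can be sketched with NumPy alone. This is not glmnet: it uses a closed-form ridge solution on centered data with an illustrative penalty `lam`, a binary cells x guides design matrix, and a one-hot cells x cell types response; the toy screen is synthetic:

```python
import numpy as np


def ridge_coefs(X, Y, lam=1.0):
    """Closed-form ridge on centered data: guides x cell types coefficients."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)


def permutation_pvalues(X, Y, lam=1.0, n_perm=200, seed=0):
    """Shuffle guide assignments across cells to build a null for each coefficient."""
    rng = np.random.default_rng(seed)
    observed = ridge_coefs(X, Y, lam)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        null = ridge_coefs(X[rng.permutation(X.shape[0])], Y, lam)
        exceed += np.abs(null) >= np.abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p):
    """BH-adjusted p-values (FDR), same shape as the input."""
    p = np.asarray(p, dtype=float)
    flat = p.ravel()
    n = flat.size
    order = np.argsort(flat)
    ranked = flat[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out.reshape(p.shape)


# Toy screen: 200 cells, 2 guides, 2 cell types; guide 0 cells mostly in type 0.
X = np.zeros((200, 2)); X[:100, 0] = 1; X[100:, 1] = 1
Y = np.zeros((200, 2)); Y[:90, 0] = 1; Y[90:100, 1] = 1; Y[100:110, 0] = 1; Y[110:, 1] = 1
coefs, pvals = permutation_pvalues(X, Y)
fdr = benjamini_hochberg(pvals)
print(coefs.round(3))
print(fdr.round(4))
```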
PGP1 Neural Disorder Screen Analysis
For each of the 2 teratomas across the original and replicate screens, we used two technical replicate 10X runs. In order to ensure consistent cell types across teratomas, we merged the 10X runs corresponding to the same teratoma, and then integrated the teratomas using Seurat v3 data integration. We used the same data integration and label transfer parameters as the embryonic lethal screen. We again collapsed closely related cell types into the broader cell groupings described in the PGP1 Embryonic Lethal Screen section, and additionally filtered out any remaining cell types with fewer than 200 cells.
We assigned CRISPR-KO gene perturbations using the barcode assignment strategy described in the Lentiviral Barcode and CRISPR Guide Assignment section. To determine the total effect of each knockout, we again computed the normalized Earth Mover’s Distance (EMD) between all cells in each gene knockout with all cells belonging to the NTC separately for each screen (Chen et al., 2020).
We analyzed differential expression for each broad cell type separately so that cell type specific effects would be captured. For each cell type, we summed the counts for all cells assigned to a specific guide RNA and a specific teratoma to create a pseudobulk expression matrix. This essentially treats each guide in each teratoma as a biological replicate for a given gene knockout, and enables us to use DESeq2, a well-validated differential expression method (Love, Huber and Anders, 2014). For each gene knockout, we ended up with 6 pseudobulk replicates (3 guides x 2 teratomas). We ran DESeq2 with default parameters, comparing the pseudobulk replicates for each gene with the NTC replicates, and used apeglm to shrink effect sizes. We set a False Discovery Rate cutoff of 0.1 to call a gene differentially expressed. We also ran DESeq2 on each teratoma separately to compute log fold-changes and assess reproducibility.
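Building the pseudobulk matrix is a group-and-sum over (guide, teratoma) within one cell type. A pandas sketch with hypothetical metadata column names; the resulting genes x samples count matrix is the kind of input DESeq2 expects, though DESeq2 itself runs in R:

```python
import numpy as np
import pandas as pd


def pseudobulk(counts, genes, meta, cell_type):
    """Sum raw counts per (guide, teratoma) for one cell type.

    counts: cells x genes array; meta: DataFrame row-aligned with counts that has
    hypothetical columns 'cell_type', 'guide', and 'teratoma'.
    Returns a genes x (guide, teratoma) DataFrame of summed counts.
    """
    mask = (meta["cell_type"] == cell_type).to_numpy()
    expr = pd.DataFrame(counts[mask], columns=genes)
    keys = meta.loc[mask, ["guide", "teratoma"]].reset_index(drop=True)
    summed = expr.groupby([keys["guide"], keys["teratoma"]]).sum()
    return summed.T


counts = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
meta = pd.DataFrame({
    "cell_type": ["Neuron", "Neuron", "Neuron", "Muscle"],
    "guide": ["g1", "g1", "g2", "g1"],
    "teratoma": ["T1", "T1", "T1", "T1"],
})
print(pseudobulk(counts, ["GENE_A", "GENE_B"], meta, "Neuron"))
```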
Molecular Sculpting Analysis
To assess the enrichment or depletion of cell types in the miRNA-HSV-tk-transduced H1 teratomas, we compared teratomas that received ganciclovir (GCV) by intratumoral (IT) injection or by combined intratumoral and intraperitoneal (IPIT) injection against a control teratoma carrying the miRNA-HSV-tk construct but receiving no GCV. All teratomas were injected on the same date and extracted after 10 weeks of growth. To assign cell types, we again used Seurat’s label transfer. We then collapsed cell types using the same merging strategy described in the PGP1 Teratoma Screen Analysis section and computed the fraction of each cell type present in each teratoma. Finally, we computed log2 fold-changes of cell type fractions by taking the log2 of the ratio of the cell type fractions in the GCV+ IT/IPIT teratomas to those in the GCV− teratoma. To compute an estimated z-score, we subtracted the GCV− teratoma fractions from the GCV+ IT/IPIT teratoma fractions and divided by the pooled standard deviation of the cell type fractions. The z-scores for the IPIT and IT teratomas were computed separately, and the pooled standard deviation was obtained by combining the variance of the miRNA-HSV-tk teratomas with the variance of the plain H1 teratomas using Cohen’s pooled standard deviation (Cohen, 1988).
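The fraction statistics can be sketched as follows. This is a hypothetical Python illustration: the function names and toy fractions are assumptions, not data from the study, and Cohen's pooled standard deviation is implemented with its standard formula, sqrt(((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)), rather than copied from the authors' code.

```python
import math
import statistics


def cohen_pooled_sd(group1, group2):
    """Cohen's pooled standard deviation of two samples (each with >= 2 values)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))


def sculpting_effect(frac_gcv_pos, frac_gcv_neg, tk_fracs, h1_fracs):
    """log2 fold-change and estimated z-score of one cell type's fraction.

    tk_fracs / h1_fracs: that cell type's fractions across the miRNA-HSV-tk
    teratomas and across plain H1 teratomas, used for the pooled SD.
    """
    log2fc = math.log2(frac_gcv_pos / frac_gcv_neg)
    z = (frac_gcv_pos - frac_gcv_neg) / cohen_pooled_sd(tk_fracs, h1_fracs)
    return log2fc, z


# Toy fractions for a single cell type (illustrative values only).
log2fc, z = sculpting_effect(0.30, 0.15, tk_fracs=[0.30, 0.15, 0.20], h1_fracs=[0.10, 0.20])
```

A cell type absent from the GCV− teratoma would need a pseudocount before taking the log; that choice is not specified in the text.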
Figure Generation
All figures were generated from original artwork or open-source material using Inkscape, Adobe Illustrator®, and ImageJ.
Supplementary Material
Figure S1. Comprehensive teratoma characterization. Related to Figure 1 and Table 1. (A) H&E stains (left to right, top to bottom): Choroid Plexus, Fetal Neuro-ectoderm, Retinal Pigment Epithelium (RPE), Developing Airway, Ciliated Respiratory Epithelium, Fetal Cartilage, Mesenchyme, Bone, Developing Cardiac/Skeletal Muscle, Squamous epithelium, Retinal Neurons (around RPE), Smooth Muscle, Adipocytes. (B) The fraction of cells that are classified as MSC/Fibroblast across each teratoma. (C) Heatmap of key marker genes for each cell type (guidelines separate cell types from different germ layers) (Table S3C). (D) Correlation of the average expression of each human teratoma cell type with the average expression of each fetal mouse cell type. (E) UMAP plot of mouse cell types in the H1 teratomas.
Figure S2. Assaying teratoma heterogeneity. Related to Figure 2. (A) UMAP scatterplot showing how each line (HUES62, PGP1, and H9) contributes to the various cell type clusters. (B) Left: the normalized proportion of each teratoma in every cell type. Right: the bias each cell type shows towards specific teratomas. A low bias score means the cell type is well mixed across all 7 teratomas. (C) Growth kinetics of 6 teratomas based on cell line (HUES62, PGP1, and H9). (D) Karyotyping of all 4 PSC lines. (E) Lentiviral barcode construct map. (F) Barcoding summary statistics for both bulk and single cell assays across the three barcoded teratomas.
Figure S3. Assaying teratoma maturity. Related to Figure 3 and Table 1. (A) A heatmap of log fold-changes for the top differentially expressed genes between matched teratoma neuro-ectoderm and fetal cortical cell types. (B) A heatmap of the enrichment scores for top differential gene sets (via Gene Set Enrichment Analysis) between matched teratoma neuro-ectoderm and fetal cortical cell types. (C) Cosine similarity of teratoma gut cells with fetal gut cells of different ages. (D) Projection of fetal gut epithelium cell types onto a teratoma gut epithelium SWNE embedding. (E) Correlation of the scaled expression of key marker genes across mid/hindgut epithelium and foregut epithelium between teratoma and fetal cell types. (F) Proportion of foregut and mid/hindgut cells in the teratoma and fetal gut. (G) A heatmap of log fold-changes for the top differentially expressed genes between matched teratoma gut epithelium and fetal gut epithelium cell types. (H) A heatmap of the enrichment scores for top differential gene sets (via Gene Set Enrichment Analysis) between matched teratoma gut epithelium and fetal gut epithelium cell types. (I) H&E stains (left) as well as RNA FISH staining (right) of FOXJ1 (Airway epithelium), CDX2 (Intestinal epithelium), TNNT2 (Cardiac muscle), and THY1 (mesenchymal stem cell/fibroblast). Scale bar = 50 μm (20x). Dots were dilated using ImageJ.
Figure S4. Engineering teratomas via genetic perturbations. Related to Figure 4. (A) Schematic showing knock-in of the CAG-spCas9-P2A-EGFP cassette with an upstream T2A-linked blasticidin resistance gene into the AAVS1 locus, thus creating the Cas9-expressing PGP1 line (above), with accompanying validated trace sequences of the left and right arms (below). (B) 2% agarose gel confirming integration of the CAG-spCas9-P2A-EGFP cassette into the AAVS1 locus of the PGP1 line via PCR amplification of the left and right arms spanning the endogenous locus and the engineered cassette, compared to a PGP1 negative control. (C) Observed cells per gRNA and cells per gene for the screen. (D) UMAP projection of PGP1 cell types classified using the H1 cell types as a reference. (E) PGP1-Cas9 iPSCs were transduced with a CRISPR-Cas9 library targeting TCF4 (Pitt-Hopkins Syndrome), MECP2 (Rett Syndrome), and L1CAM (L1 Syndrome) with 3 guides each. After generating 2 teratomas with the PGP1-iPSCs, scRNA-seq was used to identify shifts in cell-type-specific gene expression resulting from the gene knockouts. (F) Shift in cell types, as measured by normalized Earth Mover’s Distance (EMD), due to knockouts from the embryonic lethal screen and the disease screen (TCF4, MECP2, L1CAM). (G–I) The shift in gene expression as measured by log2 fold-change against NTC guides across both teratoma replicates for (G) L1CAM knockout in Neurons, (H) MECP2 knockout in Neural Progenitors, and (I) TCF4 knockout in Neural Progenitors. The color of the data points represents the −log(False Discovery Rate) as computed by DESeq2.
Figure S5. Engineering teratomas via molecular sculpting. Related to Figure 4. (A) Phase images from light microscopy showing H1 cell survival after 3 and 5 days in the presence of GCV (10 μM). The H1 ESC line was transduced with either a GFP control (EGIP backbone) or miR-124-HSV-tk-GFP. (B)-(C) Quantification using flow cytometry and gating based on the presence or absence of GFP in HEK293T and HeLa/HUVEC cells (B)/(C) transduced with either a no-GFP control, HSV-tk-GFP, or miR-21-HSV-tk-GFP/miR-126-HSV-tk-GFP for 5 days (Methods). (D) Schematic of generating self-patterned whole brain organoids (Methods). (E) Images of teratomas grown in the absence and presence of GCV administration (80 mg/kg/d, Methods) for 10 weeks. (F) H&E stains of teratomas grown in the absence (left) and presence (right) of GCV administration. Arrowheads highlight regions of neuro-ectoderm. Scale bars are directly labeled. (G) Anti-PAX6 (red) and DAPI (blue) immunostaining in GCV+ and GCV− control sections across 3 different regions of the corresponding teratoma. Scale bar = 2 mm. (H) Secondary antibody staining only (DyLight 550, red) and DAPI (blue) for a GCV+ and a GCV− teratoma. Scale bar = 2 mm. (I) RNA FISH analysis of HES5 (red) and DAPI (blue) in a GCV+ and a GCV− teratoma. Scale bar = 2 mm, 200 μm (magnified inset).
Table S1. Teratoma Metrics. Related to Figure 1. (A) H1 teratoma metrics. (B) Cell line teratoma metrics. (C) Embryonic lethal screen teratoma metrics. (D) Embryonic lethal screen repool teratoma metrics. (E) Neural disease screen teratoma metrics. (F) miR-124 teratoma metrics.
Table S2. Cell Type Identification. Related to Figure 1 and Table 1. (A) Top TF markers for original Seurat clusters. (B) Top overall markers for original Seurat clusters. (C) Mapping Seurat clusters to cell types. (D) Top TF markers for each mapped cell type. (E) Top overall markers for each mapped cell type. (F) Sub-clustering for the ciliated epithelium. (G) Sub-clustering for the neuro-ectoderm.
Table S3. Cell Type Summary. Related to Figure 1 and Figure 2. (A) Cell type counts. (B) Cell type proportions. (C) Final set of cell type marker genes with references.
Table S4. Developmental Screen Targets and sgRNAs. Related to Figure 4. (A) Target genes known to be embryonic lethal in mice. (B) sgRNA primers used for cloning. (C) Indel amplification primers used to assess sgRNA editing rates. (D) sgRNA editing rates.
Table S5. Neural Disease Screen sgRNAs and Differentially Expressed Genes. Related to Figure S4. (A) sgRNA primers for cloning. (B) Summary of differentially expressed genes (DEGs) for each gene knockout in each broad cell type. (C) Differentially expressed genes in neurons. (D) Differentially expressed genes in Neural Progenitors. (E) Differentially expressed genes in Muscle. (F) Differentially expressed genes in Immune cells. (G) miRNA Sequences, Target Sites, and Lineage Specificities.
Highlights:
Identified 20 preliminary teratoma cell types via scRNA-seq, histology and RNA FISH.
Benchmarked teratoma brain and gut cell types against human fetal scRNA-seq datasets.
Demonstrated teratomas enable CRISPR screens across multiple cell types simultaneously.
Engineered teratomas with miRNA gene circuits to enrich for a specific lineage.
ACKNOWLEDGEMENTS
We thank members of the Mali lab for advice and help with experiments, Marianna Yusupova for help with initial studies, Alexander Militar for assistance in schematic generation, and the Moores Cancer Center Histology Core, UC San Diego Microscopy Core, Sanford Consortium Flow Cytometry Core, and IGM Genomics Center for help with sample processing. In loving memory of Nakon Aroonsakool. This work was generously supported by UCSD Institutional Funds and NIH grants (R01HG009285, R01CA222826, R01GM123313).
Footnotes
DECLARATION OF INTERESTS
D.M., Y.W., K.Z. and P.M. have filed patents based on this work. K.Z. is a co-founder, equity holder, and paid consultant of Singlera Genomics, which has no commercial interests related to this study. P.M. is a scientific co-founder of Shape Therapeutics, Boundless Biosciences, Seven Therapeutics, Navega Therapeutics, and Engine Biosciences, which have no commercial interests related to this study. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
REFERENCES
- Abdi H and Williams LJ (2010) ‘Principal component analysis’, Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), pp. 433–459. doi: 10.1002/wics.101.
- Adamson B et al. (2016) ‘A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response’, Cell. Elsevier, 167(7), pp. 1867–1882.e21. doi: 10.1016/j.cell.2016.11.048.
- Akcakaya P et al. (2018) ‘In vivo CRISPR-Cas gene editing with no detectable genome-wide off-target mutations’, bioRxiv, p. 272724. doi: 10.1101/272724.
- Amabile G et al. (2019) ‘In vivo generation of transplantable human hematopoietic cells from induced pluripotent stem cells’, Blood, 121(8), pp. 1–3. doi: 10.1182/blood-2012-06-434407.
- Ambros V (2004) ‘The functions of animal microRNAs’, 431(September).
- An Z et al. (2018) ‘A quiescent cell population replenishes mesenchymal stem cells to drive accelerated growth in mouse incisors’, Nature Communications. Springer US, 9(1). doi: 10.1038/s41467-017-02785-6.
- Aurora M and Spence JR (2016) ‘hPSC-derived lung and intestinal organoids as models of human fetal tissue’, Developmental Biology. Elsevier, 420(2), pp. 230–238. doi: 10.1016/j.ydbio.2016.06.006.
- Avior Y, Biancotti JC and Benvenisty N (2015) ‘TeratoScore: Assessing the Differentiation Potential of Human Pluripotent Stem Cells by Quantitative Expression Analysis of Teratomas’, Stem Cell Reports. The Authors, 4(6), pp. 967–974. doi: 10.1016/j.stemcr.2015.05.006.
- Bansod S, Kageyama R and Ohtsuka T (2017) ‘Hes5 regulates the transition timing of neurogenesis and gliogenesis in mammalian neocortical development’, Development (Cambridge), 144(17), pp. 3156–3167. doi: 10.1242/dev.147256.
- Barkas N et al. (2019) ‘Joint analysis of heterogeneous single-cell RNA-seq dataset collections’, Nature Methods. Springer US, 16(8), pp. 695–698. doi: 10.1038/s41592-019-0466-z.
- Bartel DP (2018) ‘Metazoan MicroRNAs’, Cell. Elsevier Inc., 173(1), pp. 20–51. doi: 10.1016/j.cell.2018.03.006.
- Bartel DP, Lee R and Feinbaum R (2004) ‘MicroRNAs: Genomics, Biogenesis, Mechanism, and Function Genomics : The miRNA Genes’, 116, pp. 281–297.
- Bartfeld S et al. (2015) ‘In Vitro Expansion of Human Gastric Epithelial Stem Cells and Their Responses to Bacterial Infection’, Gastroenterology, 148(1), pp. 126–136.e6. doi: 10.1053/j.gastro.2014.09.042.
- Becht E et al. (2018) ‘Dimensionality reduction for visualizing single-cell data using UMAP’, Nature Biotechnology, 37(1). doi: 10.1038/nbt.4314.
- Bigorgne AE et al. (2014) ‘TTC7A mutations disrupt intestinal epithelial apicobasal polarity’, Journal of Clinical Investigation, 124(1), pp. 328–337. doi: 10.1172/JCI71471.
- Black JB et al. (2016) ‘Targeted Epigenetic Remodeling of Endogenous Loci by CRISPR/Cas9-Based Transcriptional Activators Directly Converts Fibroblasts to Neuronal Cells’, Cell Stem Cell. Elsevier Inc., 19(3), pp. 406–414. doi: 10.1016/j.stem.2016.07.001.
- Bocker W (2002) ‘WHO classification of breast tumors and tumors of the female genital organs: pathology and genetics’, Verhandlungen der Deutschen Gesellschaft fur Pathologie. Germany, 86, pp. 116–119.
- Boj SF et al. (2015) ‘Organoid Models of Human and Mouse Ductal Pancreatic Cancer’, Cell, 160(1), pp. 324–338. doi: 10.1016/j.cell.2014.12.021.
- Bort R et al. (2004) ‘Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas’, Development, 131(4), pp. 797–806. doi: 10.1242/dev.00965.
- Brown J, Quadrato G and Arlotta P (2018) ‘Studying the Brain in a Dish: 3D Cell Culture Models of Human Brain Development and Disease’, in Human Embryonic Stem Cells in Development. 1st edn. Elsevier Inc. doi: 10.1016/bs.ctdb.2018.03.002.
- Butler A et al. (2018) ‘Integrating single-cell transcriptomic data across different conditions, technologies, and species’, Nature Biotechnology, 36(5). doi: 10.1038/nbt.4096.
- Cao J et al. (2017) ‘Comprehensive single-cell transcriptional profiling of a multicellular organism.’, Science (New York, N.Y.). American Association for the Advancement of Science, 357(6352), pp. 661–667. doi: 10.1126/science.aam8940.
- Cao J et al. (2019) ‘The single-cell transcriptional landscape of mammalian organogenesis’, Nature. doi: 10.1038/s41586-019-0969-x.
- Capowski EE et al. (2019) ‘Reproducibility and staging of 3D human retinal organoids across multiple pluripotent stem cell lines’, Development, 146(1), p. dev171686. doi: 10.1242/dev.171686.
- Castro DS et al. (2011) ‘A novel function of the proneural factor Ascl1 in progenitor proliferation identified by genome-wide characterization of its targets’, Genes and Development, 25(9), pp. 930–945. doi: 10.1101/gad.627811.
- Cathery W et al. (2018) ‘Concise Review: The Regenerative Journey of Pericytes Toward Clinical Translation’, Stem Cells, 36(9), pp. 1295–1310. doi: 10.1002/stem.2846.
- Chak K et al. (2016) ‘Increased precursor microRNA-21 following status epilepticus can compete with mature microRNA-21 to alter translation.’, Experimental neurology, 286, pp. 137–146. doi: 10.1016/j.expneurol.2016.10.003.
- Chambers SM, Tchieu J and Studer L (2013) ‘Build-a-brain’, Cell Stem Cell. Elsevier, 13(4), pp. 377–378. doi: 10.1016/j.stem.2013.09.010.
- Chan SSK et al. (2018) ‘Skeletal Muscle Stem Cells from PSC-Derived Teratomas Have Functional Regenerative Capacity’, Cell Stem Cell. Elsevier Inc., 23(1), pp. 74–85.e6. doi: 10.1016/j.stem.2018.06.010.
- Chapleau CA et al. (2013) ‘Evaluation of current pharmacological treatment options in the management of Rett syndrome: from the present to future therapeutic alternatives.’, Current clinical pharmacology, 8(4), pp. 358–369. doi: 10.2174/15748847113086660069.
- Chen M and Qi LS (2017) ‘Repurposing CRISPR System for Transcriptional Activation’, in Li L-C (ed.) RNA Activation. Singapore: Springer Singapore, pp. 147–157. doi: 10.1007/978-981-10-4310-9_10.
- Chen WS et al. (2020) ‘Uncovering axes of variation among single-cell cancer specimens’, Nature Methods. Springer US. doi: 10.1038/s41592-019-0689-z.
- Clevers H (2016) ‘Modeling Development and Disease with Organoids’, Cell. Elsevier Inc., 165(7), pp. 1586–1597. doi: 10.1016/j.cell.2016.05.082.
- Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences. New York, NY, US: Routledge. doi: 10.4324/9780203771587.
- Collin J et al. (2019) ‘Deconstructing Retinal Organoids: Single Cell RNA-Seq Reveals the Cellular Components of Human Pluripotent Stem Cell-Derived Retina’, Stem Cells, 37(5), pp. 593–598. doi: 10.1002/stem.2963.
- Datlinger P et al. (2017) ‘Pooled CRISPR screening with single-cell transcriptome readout’, Nature Methods. Nature Research, 14(3), pp. 297–301. doi: 10.1038/nmeth.4177.
- Dean L (2012) ‘Pitt-Hopkins Syndrome’, in Pratt VM et al. (eds). Bethesda (MD).
- Dekkers JF et al. (2013) ‘A functional CFTR assay using primary cystic fibrosis intestinal organoids’, Nature Medicine. Nature Publishing Group, 19(7), pp. 939–945. doi: 10.1038/nm.3201.
- Denisenko E et al. (2019) ‘Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows’, bioRxiv, p. 832444. doi: 10.1101/832444.
- Dijk D Van et al. (2018) ‘Recovering Gene Interactions from Single-Cell Data Using Data Diffusion’, Cell. Elsevier Inc., 174(3), pp. 716–729.e27. doi: 10.1016/j.cell.2018.05.061.
- Ding J et al. (2019) ‘Systematic comparative analysis of single cell RNA-sequencing methods’, bioRxiv, p. 632216. doi: 10.1101/632216.
- Dixit A, Parnas O, Li B, Weissman JS, et al. (2016) ‘Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens’, Cell. Elsevier Inc., 167(7), pp. 1853–1866.e17. doi: 10.1016/j.cell.2016.11.038.
- Dixit A, Parnas O, Li B, Chen J, et al. (2016) ‘Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens’, Cell. Elsevier Inc., 167(7), pp. 1853–1866.e17. doi: 10.1016/j.cell.2016.11.038.
- Dobin A et al. (2013) ‘STAR: Ultrafast universal RNA-seq aligner’, Bioinformatics, 29(1), pp. 15–21. doi: 10.1093/bioinformatics/bts635.
- Dutta D, Heo I and Clevers H (2017) ‘Disease Modeling in Stem Cell-Derived 3D Organoid Systems’, Trends in Molecular Medicine. Elsevier Ltd, 23(5), pp. 393–410. doi: 10.1016/j.molmed.2017.02.007.
- Ehinger Y et al. (2018) ‘Rett syndrome from bench to bedside: recent advances.’, F1000Research, 7, p. 398. doi: 10.12688/f1000research.14056.1.
- Farrell JA et al. (2018) ‘Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis’, Science, 360(6392), p. eaar3131. doi: 10.1126/science.aar3131.
- Fijneman RJA et al. (2012) ‘Runx1 is a tumor suppressor gene in the mouse gastrointestinal tract’, Cancer Science, 103(3), pp. 593–599. doi: 10.1111/j.1349-7006.2011.02189.x.
- Fligor CM et al. (2018) ‘Three-Dimensional Retinal Organoids Facilitate the Investigation of Retinal Ganglion Cell Development, Organization and Neurite Outgrowth from Human Pluripotent Stem Cells’, Scientific Reports, 8(1), p. 14520. doi: 10.1038/s41598-018-32871-8.
- Forrest MP et al. (2013) ‘Knockdown of Human TCF4 Affects Multiple Signaling Pathways Involved in Cell Survival, Epithelial to Mesenchymal Transition and Neuronal Differentiation’, PLOS ONE. Public Library of Science, 8(8). doi: 10.1371/journal.pone.0073169.
- Forrest MP et al. (2014) ‘The emerging roles of TCF4 in disease and development’, Trends in Molecular Medicine. Elsevier Ltd, 20(6), pp. 322–331. doi: 10.1016/j.molmed.2014.01.010.
- Friedman AJ et al. (2015) ‘Lasso and Elastic-Net Regularized Generalized Linear Models’. Available online at https://cran.r-project.org/web/packages/glmnet/glmnet.pdf (verified 29 July 2015). Available at: http://www.jstatsoft.org/v33/i01/.
- Gao D et al. (2014) ‘Organoid Cultures Derived from Patients with Advanced Prostate Cancer’, Cell, 159(1), pp. 176–187. doi: 10.1016/j.cell.2014.08.016.
- Gao N, White P and Kaestner KH (2009) ‘Establishment of Intestinal Identity and Epithelial-Mesenchymal Signaling by Cdx2’, Developmental Cell. Elsevier Ltd, 16(4), pp. 588–599. doi: 10.1016/j.devcel.2009.02.010.
- Gao S et al. (2018) ‘Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing’, Nature Cell Biology. Springer US, 20(6), pp. 721–734. doi: 10.1038/s41556-018-0105-4.
- Gao Z et al. (2009) ‘Neurod1 is essential for the survival and maturation of adult-born neurons.’, Nature neuroscience. NIH Public Access, 12(9), pp. 1090–1092. doi: 10.1038/nn.2385.
- Garrido-Martín EM et al. (2012) ‘Vascular injury triggers Krüppel-like factor 6 mobilization and cooperation with specificity protein 1 to promote endothelial activation through upregulation of the activin receptor-like kinase 1 gene’, Circulation Research, 112(1), pp. 113–127. doi: 10.1161/circresaha.112.275586.
- Goding CR (2000) ‘Mitf from neural crest to melanoma: signal transduction and transcription in the melanocyte lineage.’, Genes & development. United States, 14(14), pp. 1712–1728.
- Green MD et al. (2011) ‘Generation of anterior foregut endoderm from human embryonic and induced pluripotent stem cells’, Nature Biotechnology. Nature Publishing Group, 29(3), pp. 267–273. doi: 10.1038/nbt.1788.
- Guo C et al. (2018) ‘CellTag Indexing: a genetic barcode-based multiplexing tool for single-cell technologies’, bioRxiv, p. 335547. doi: 10.1101/335547.
- Han H et al. (2018) ‘TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions’, Nucleic Acids Research, 46(D1), pp. D380–D386. doi: 10.1093/nar/gkx1013.
- Han X et al. (2018) ‘Mapping the Mouse Cell Atlas by Microwell-Seq’, Cell. Elsevier Inc., 172(5), pp. 1091–1107.e17. doi: 10.1016/j.cell.2018.02.001.
- Hirosawa M et al. (2017) ‘Cell-type-specific genome editing with a microRNA-responsive CRISPR-Cas9 switch’, Nucleic Acids Research, 45(13), p. e118. doi: 10.1093/nar/gkx309.
- Hodge RD et al. (2019) ‘Conserved cell types with divergent features between human and mouse cortex’, Nature, p. 384826. doi: 10.1101/384826.
- Houle ME et al. (2010) ‘Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?’, in Gertz M and Ludäscher B (eds) Scientific and Statistical Database Management. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 482–500.
- Huch M et al. (2017) ‘The hope and the hype of organoid research’, pp. 938–941. doi: 10.1242/dev.150201.
- Huch M and Koo B-K (2015) ‘Modeling mouse and human development using organoid cultures’, Development, 142(18), pp. 3113–3125. doi: 10.1242/dev.118570.
- Jabaudon D and Lancaster M (2018) ‘Exploring landscapes of brain morphogenesis with organoids’, pp. 2016–2019. doi: 10.1242/dev.172049.
- Jang S et al. (2017) ‘Dynamics of embryonic stem cell differentiation inferred from single-cell transcriptomics show a series of transitions through discrete cell states’, eLife, 6, pp. 1–28. doi: 10.7554/eLife.20487.
- Jung P et al. (2011) ‘Isolation and in vitro expansion of human colonic stem cells’, Nature Medicine, 17(10), pp. 1225–1227. doi: 10.1038/nm.2470.
- Kalluri R and Weinberg RA (2009) ‘The basics of epithelial-mesenchymal transition’, Journal of Clinical Investigation, 119(6), pp. 1420–1428. doi: 10.1172/JCI39104.
- Khalaf-Nazzal R et al. (2017) ‘Early born neurons are abnormally positioned in the doublecortin knockout hippocampus’, Human molecular genetics, 26(1), pp. 90–108. doi: 10.1093/hmg/ddw370.
- Kim JW et al. (2016) ‘Characterizing genomic alterations in cancer by complementary functional associations.’, Nature biotechnology, 34(5), pp. 3–5. doi: 10.1038/nbt.3527.
- Kim T and Shivdasani RA (2016) ‘Stomach development, stem cells and disease’, pp. 554–565. doi: 10.1242/dev.124891.
- Klein AM et al. (2015) ‘Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells’, Cell. Elsevier Inc., 161(5), pp. 1187–1201. doi: 10.1016/j.cell.2015.04.044.
- Lagos-Quintana M et al. (2002) ‘Identification of Tissue-Specific MicroRNAs from Mouse’, 12(02), pp. 735–739.
- Law CW et al. (2014) ‘voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.’, Genome biology, 15(2), p. R29. doi: 10.1186/gb-2014-15-2-r29.
- Lee DD and Seung HS (1999) ‘Learning the parts of objects by non-negative matrix factorization’, Nature, 401(6755), pp. 788–791. doi: 10.1038/44565.
- Lensch MW et al. (2007) ‘Teratoma Formation Assays with Human Embryonic Stem Cells: A Rationale for One Type of Human-Animal Chimera’, Cell Stem Cell, 1(3), pp. 253–258. doi: 10.1016/j.stem.2007.07.019.
- Li J et al. (2009) ‘MiR-21 indicates poor prognosis in tongue squamous cell carcinomas as an apoptosis inhibitor.’, Clinical cancer research: an official journal of the American Association for Cancer Research. United States, 15(12), pp. 3998–4008. doi: 10.1158/1078-0432.CCR-08-3053.
- Li W and Pozzo-Miller L (2014) ‘BDNF deregulation in Rett syndrome.’, Neuropharmacology, 76 Pt C, pp. 737–746. doi: 10.1016/j.neuropharm.2013.03.024.
- Lin Y et al. (2009) ‘Characterization of SoxB2 and SoxC genes in amphioxus (Branchiostoma belcheri): implications for their evolutionary conservation.’, Science in China. Series C, Life sciences. China, 52(9), pp. 813–822. doi: 10.1007/s11427-009-0111-7.
- Liu C et al. (2018) ‘Modeling human diseases with induced pluripotent stem cells: from 2D to 3D and beyond’, pp. 1–6. doi: 10.1242/dev.156166.
- Love MI, Huber W and Anders S (2014) ‘Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.’, Genome biology, 15(12), p. 550. doi: 10.1186/s13059-014-0550-8.
- Lu J et al. (2005) ‘MicroRNA expression profiles classify human cancers’, 435(June). doi: 10.1038/nature03702.
- Lu Z et al. (2008) ‘MicroRNA-21 promotes cell transformation by targeting the programmed cell death 4 gene’, Oncogene. Nature Publishing Group, 27, p. 4373.
- Macosko EZ et al. (2015) ‘Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets’, Cell. Elsevier Inc., 161(5), pp. 1202–1214. doi: 10.1016/j.cell.2015.05.002.
- Marmigère F et al. (2006) ‘The Runx1/AML1 transcription factor selectively regulates development and survival of TrkA nociceptive sensory neurons’, Nature Neuroscience, 9(2), pp. 180–187. doi: 10.1038/nn1631.
- McInnes L and Healy J (2018) ‘UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction’, arXiv, pp. 1–18. Available at: http://arxiv.org/abs/1802.03426.
- Medina PP and Slack FJ (2008) ‘MicroRNAs and cancer: an overview’, Cell Cycle, 7(16). doi: 10.4161/cc.7.16.6453.
- Miki K et al. (2015) ‘Efficient detection and purification of cell populations using synthetic micro-RNA switches’, Cell Stem Cell, 16, pp. 699–711. doi: 10.1016/j.stem.2015.04.005. [DOI] [PubMed] [Google Scholar]
- Miller JA et al. (2014) ‘Transcriptional landscape of the prenatal human brain’, Nature, 508(7495), pp. 199–206. doi: 10.1038/nature13185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moris N et al. (2020) ‘An in vitro model for anteroposterior organisation during human development’, Nature. doi: 10.1038/s41586-020-2383-9. [DOI] [PubMed] [Google Scholar]
- Mort RL et al. (2015) ‘The melanocyte lineage in development and disease’, Development, 142(7), pp. 620–632. doi: 10.1242/dev.123729. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nissim L et al. (2017) ‘Synthetic RNA-based immunomodulatory gene circuits for cancer immunotherapy’, Cell, 171, pp. 1138–1150. doi: 10.1016/j.cell.2017.09.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ortmann D and Vallier L (2017) ‘Variability of human pluripotent stem cell lines’, Current Opinion in Genetics & Development. Elsevier Ltd, 46, pp. 179–185. doi: 10.1016/j.gde.2017.07.004. [DOI] [PubMed] [Google Scholar]
- Parekh U et al. (2018) ‘Mapping Cellular Reprogramming via Pooled Overexpression Screens with Paired Fitness and Single-Cell RNA-Sequencing Readout’, Cell Systems. Elsevier Inc, pp. 1–8. doi: 10.1016/j.cels.2018.10.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peter IS and Davidson EH (2011) ‘Evolution of gene regulatory networks controlling body plan development’, Cell, 144(6), pp. 970–985. doi: 10.1016/j.cell.2011.02.017.
- Philipp F et al. (2018) ‘Human Teratoma-Derived Hematopoiesis Is a Highly Polyclonal Process Supported by Human Umbilical Vein Endothelial Cells’, Stem Cell Reports, 11(5), pp. 1051–1060. doi: 10.1016/j.stemcr.2018.09.010.
- Phipson B et al. (2019) ‘Evaluation of variability in human kidney organoids’, Nature Methods, 16(1), pp. 79–87. doi: 10.1038/s41592-018-0253-2.
- Pijuan-Sala B et al. (2019) ‘A single-cell molecular map of mouse gastrulation and early organogenesis’, Nature. doi: 10.1038/s41586-019-0933-9.
- Pinello L et al. (2016) ‘Analyzing CRISPR genome-editing experiments with CRISPResso’, Nature Biotechnology, 34(7), pp. 695–697. doi: 10.1038/nbt.3583.
- Polioudakis D et al. (2019) ‘A Single-Cell Transcriptomic Atlas of Human Neocortical Development during Mid-gestation’, Neuron, pp. 1–17. doi: 10.1016/j.neuron.2019.06.011.
- Qi LS et al. (2013) ‘Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression’, Cell, 152(5), pp. 1173–1183. doi: 10.1016/j.cell.2013.02.022.
- Qin Q et al. (2011) ‘Normal and disease-related biological functions of Twist1 and underlying molecular mechanisms’, Cell Research, 22(1), pp. 90–106. doi: 10.1038/cr.2011.144.
- Quadrato G et al. (2017) ‘Cell diversity and network dynamics in photosensitive human brain organoids’, Nature, 545(7652), pp. 48–53. doi: 10.1038/nature22047.
- Que J et al. (2007) ‘Multiple dose-dependent roles for Sox2 in the patterning and differentiation of anterior foregut endoderm’, Development, 134(13), pp. 2521–2531. doi: 10.1242/dev.003855.
- Raff RA (1996) The Shape of Life: Genes, Development and the Evolution of Animal Form. Chicago, IL: University of Chicago Press.
- Richard I et al. (2000) ‘Human–mouse differences in the embryonic expression patterns of developmental control genes and disease genes’, Human Molecular Genetics, 9(2), pp. 165–173. doi: 10.1093/hmg/9.2.165.
- Richardson MK et al. (1997) ‘There is no highly conserved embryonic stage in the vertebrates: implications for current theories of evolution and development’, Anatomy and Embryology, 196(2), pp. 91–106.
- Rosenberg AB et al. (2017) ‘Scaling single cell transcriptomics through split pool barcoding’, bioRxiv. doi: 10.1101/105163.
- Rosenberg AB et al. (2018) ‘Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding’, Science, 360(6385), pp. 176–182. doi: 10.1126/science.aam8999.
- Royo JL et al. (2011) ‘Transphyletic conservation of developmental regulatory state in animal evolution’, Proceedings of the National Academy of Sciences of the United States of America, 108(34), pp. 14186–14191. doi: 10.1073/pnas.1109037108.
- Samatov TR, Wicklein D and Tonevitsky AG (2016) ‘L1CAM: Cell adhesion and more’, Progress in Histochemistry and Cytochemistry, 51(2), pp. 25–32. doi: 10.1016/j.proghi.2016.05.001.
- Sammon JW (1969) ‘A Nonlinear Mapping for Data Structure Analysis’, IEEE Transactions on Computers, C–18(5), pp. 401–409. doi: 10.1109/T-C.1969.222678.
- Sanjana NE, Shalem O and Zhang F (2014) ‘Improved vectors and genome-wide libraries for CRISPR screening’, Nature Methods, 11, pp. 783–784. doi: 10.1038/nmeth.3047.
- Sarper SE et al. (2018) ‘Runx1-Stat3 signaling regulates the epithelial stem cells in continuously growing incisors’, Scientific Reports, 8(1), pp. 1–12. doi: 10.1038/s41598-018-29317-6.
- Sato T et al. (2009) ‘Single Lgr5 stem cells build crypt-villus structures in vitro without a mesenchymal niche’, Nature, 459(7244), pp. 262–265. doi: 10.1038/nature07935.
- Sato T et al. (2011) ‘Long-term expansion of epithelial organoids from human colon, adenoma, adenocarcinoma, and Barrett’s epithelium’, Gastroenterology, 141(5), pp. 1762–1772. doi: 10.1053/j.gastro.2011.07.050.
- Scheitz CJF and Tumbar T (2013) ‘New insights into the role of Runx1 in epithelial stem cell biology and pathology’, Journal of Cellular Biochemistry, 114(5), pp. 985–993. doi: 10.1002/jcb.24453.
- Seiler A et al. (2005) ‘Regulation of miRNA expression during neural cell specification’, European Journal of Neuroscience, 21, pp. 1469–1477. doi: 10.1111/j.1460-9568.2005.03978.x.
- Shapiro B et al. (2015) ‘Clusterin, a gene enriched in intestinal stem cells, is required for L1-mediated colon cancer metastasis’, Oncotarget, 6(33), pp. 34389–34401. doi: 10.18632/oncotarget.5360.
- Shi M et al. (2016) ‘CNS tau efflux via exosomes is likely increased in Parkinson’s disease but not in Alzheimer’s disease’, Alzheimer’s and Dementia, 12(11), pp. 1125–1131. doi: 10.1016/j.jalz.2016.04.003.
- Shivdasani RA (2006) ‘MicroRNAs: regulators of gene expression and cell differentiation’, Blood, 108(12), pp. 3646–3654. doi: 10.1182/blood-2006-01-030015.
- Silberg DG et al. (2002) ‘Cdx2 Ectopic Expression Induces Gastric Intestinal Metaplasia’, Gastroenterology, 122, pp. 689–696. doi: 10.1053/gast.2002.31902.
- Simmini S et al. (2014) ‘Transformation of intestinal stem cells into gastric stem cells on loss of transcription factor Cdx2’, Nature Communications, 5, pp. 1–10. doi: 10.1038/ncomms6728.
- Smith KP, Luong MX and Stein GS (2009) ‘Pluripotency: Toward a gold standard for human ES and iPS cells’, Journal of Cellular Physiology, 220(1), pp. 21–29. doi: 10.1002/jcp.21681.
- de Souza N (2017) ‘Organoid variability examined’, Nature Methods, 14, p. 655.
- Stevens L (1962) ‘The biology of teratomas including evidence indicating their origin from primordial germ cells’, Annee Biol., 1, pp. 585–610.
- Stevens L (1967) ‘The biology of teratomas’, Adv Morphog, 6, pp. 1–31.
- Stevens LC and Pierce GB (1975) ‘Teratomas: Definitions and Terminology’, Teratomas and Differentiation, pp. 13–14.
- Stuart T et al. (2018) ‘Comprehensive integration of single cell data’, bioRxiv, pp. 1–34. doi: 10.1101/460147.
- Stuart T et al. (2019) ‘Comprehensive Integration of Single-Cell Data’, Cell, 177(7), pp. 1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.
- Stumpel C and Vos YJ (1993) ‘L1 Syndrome’, in Adam MP et al. (eds) GeneReviews. Seattle (WA): University of Washington, Seattle.
- Sun Y et al. (2015) ‘An updated role of microRNA-124 in central nervous system disorders: a review’, Frontiers in Cellular Neuroscience, 9, p. 193. doi: 10.3389/fncel.2015.00193.
- Suzuki A et al. (2016) ‘Regulation of transient receptor potential vanilloid 1 expression in trigeminal ganglion neurons via methyl-CpG binding protein 2 signaling contributes tongue heat sensitivity and inflammatory hyperalgesia in mice’, Molecular Pain, 12, pp. 1–11. doi: 10.1177/1744806916633206.
- Suzuki N et al. (2013) ‘Generation of Engraftable Hematopoietic Stem Cells From Induced Pluripotent Stem Cells by Way of Teratoma Formation’, Molecular Therapy, 21(7), pp. 1424–1431. doi: 10.1038/mt.2013.71.
- Tarlow D et al. (2013) ‘Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning’, Proceedings of the 30th International Conference on Machine Learning, 28(3), pp. 199–207. Available at: http://proceedings.mlr.press/v28/tarlow13.html.
- Thissen D, Steinberg L and Kuang D (2002) ‘Quick and Easy Implementation of the Benjamini-Hochberg Procedure for Controlling the False Positive Rate in Multiple Comparisons’, Journal of Educational and Behavioral Statistics, 27(1), pp. 77–83. doi: 10.3102/10769986027001077.
- Thurlbeck WM and Scully RE (1973) ‘Solid Teratoma of the Ovary: A Clinicopathological Analysis of 9 Cases’, (January 1960), pp. 2563–2571.
- Tsukada M et al. (2017) ‘In Vivo Generation of Engraftable Murine Hematopoietic Stem Cells by Gfi1b, c-Fos, and Gata2 Overexpression within Teratoma’, Stem Cell Reports, 9(4), pp. 1024–1033. doi: 10.1016/j.stemcr.2017.08.010.
- Tsunemoto R et al. (2018) ‘Diverse reprogramming codes for neuronal identity’, Nature, 557(7705), pp. 375–380. doi: 10.1038/s41586-018-0103-5.
- Umansky KB et al. (2015) ‘Runx1 Transcription Factor Is Required for Myoblasts Proliferation during Muscle Regeneration’, PLoS Genetics, 11(8), pp. 1–31. doi: 10.1371/journal.pgen.1005457.
- van de Wetering M et al. (2015) ‘Prospective Derivation of a Living Organoid Biobank of Colorectal Cancer Patients’, Cell, 161(4), pp. 933–945. doi: 10.1016/j.cell.2015.03.053.
- Vastag L et al. (2011) ‘Remodeling of the metabolome during early frog development’, PLoS ONE, 6(2). doi: 10.1371/journal.pone.0016881.
- Velasco S et al. (2019) ‘Individual brain organoids reproducibly form cell diversity of the human cerebral cortex’, Nature. doi: 10.1038/s41586-019-1289-x.
- Wagner DE et al. (2018) ‘Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo’, Science, 360(6392), pp. 981–987. doi: 10.1126/science.aar4362.
- Wang J et al. (2017) ‘Single-cell gene expression analysis reveals regulators of distinct cell subpopulations among developing human neurons’, Genome Research, 27(11), pp. 1783–1794. doi: 10.1101/gr.223313.117.
- Wang S et al. (2008) ‘The Endothelial-Specific MicroRNA miR-126 Governs Vascular Integrity and Angiogenesis’, Developmental Cell, 15(2), pp. 261–271. doi: 10.1016/j.devcel.2008.07.002.
- Wilcoxon F (1946) ‘Individual Comparisons of Grouped Data by Ranking Methods’, Journal of Economic Entomology, 39(2), pp. 269–270. doi: 10.1093/jee/39.2.269.
- Willis RA (1934) ‘The Structure of Teratoma’, The Journal of Pathology and Bacteriology, 40(1).
- Willis RA (1935) ‘The histogenesis of neural tissue in teratomas’, The Journal of Pathology and Bacteriology.
- Wolburg H et al. (2009) ‘Ependymal cells’, Encyclopedia of Neuroscience, pp. 1133–1140. doi: 10.1016/B978-008045046-9.01001-9.
- Wu Y, Tamayo P and Zhang K (2018a) ‘Visualizing and Interpreting Single-Cell Gene Expression Datasets with Similarity Weighted Nonnegative Embedding’, Cell Systems, 7(6), pp. 656–666.e4. doi: 10.1016/j.cels.2018.10.015.
- Wu Y, Tamayo P and Zhang K (2018b) ‘Visualizing and interpreting single-cell gene expression datasets with Similarity Weighted Nonnegative Embedding’, Cell Systems, 7(6), pp. 656–666.e4. doi: 10.1101/276261.
- Yang J et al. (2004) ‘Twist, a Master Regulator of Morphogenesis, Plays an Essential Role in Tumor Metastasis’, Cell, 117, pp. 927–939.
- Yao Q et al. (2009) ‘MicroRNA-21 promotes cell proliferation and down-regulates the expression of programmed cell death 4 (PDCD4) in HeLa cervical carcinoma cells’, Biochemical and Biophysical Research Communications, 388(3), pp. 539–542. doi: 10.1016/j.bbrc.2009.08.044.
- Yao Z et al. (2017) ‘A Single-Cell Roadmap of Lineage Bifurcation in Human ESC Models of Embryonic Brain Development’, Cell Stem Cell, 20(1), pp. 120–134. doi: 10.1016/j.stem.2016.09.011.
- Yin X et al. (2016) ‘Engineering Stem Cell Organoids’, Cell Stem Cell, 18(1), pp. 25–38. doi: 10.1016/j.stem.2015.12.005.
- Zheng GXY et al. (2017) ‘Massively parallel digital transcriptional profiling of single cells’, Nature Communications, 8, pp. 1–12. doi: 10.1038/ncomms14049.
- Zhong S et al. (2018) ‘A single-cell RNA-seq survey of the developmental landscape of the human prefrontal cortex’, Nature. doi: 10.1038/nature25980.
- Zhu S et al. (2008) ‘MicroRNA-21 targets tumor suppressor genes in invasion and metastasis’, Cell Research, 18, p. 350.
- Zhu Y et al. (2018) ‘Spatiotemporal transcriptomic divergence across human and macaque brain development’, Science, 362(6420), p. eaat8077. doi: 10.1126/science.aat8077.
- Zovein AC et al. (2008) ‘Fate Tracing Reveals the Endothelial Origin of Hematopoietic Stem Cells’, Cell Stem Cell, 3(6), pp. 625–636. doi: 10.1016/j.stem.2008.09.018.
Supplementary Materials
Figure S1. Comprehensive teratoma characterization. Related to Figure 1 and Table 1. (A) H&E stains (left to right, top to bottom): Choroid Plexus, Fetal Neuro-ectoderm, Retinal Pigment Epithelium (RPE), Developing Airway, Ciliated Respiratory Epithelium, Fetal Cartilage, Mesenchyme, Bone, Developing Cardiac/Skeletal Muscle, Squamous epithelium, Retinal Neurons (around RPE), Smooth Muscle, Adipocytes. (B) The fraction of cells that are classified as MSC/Fibroblast across each teratoma. (C) Heatmap of key marker genes for each cell type (guidelines separate cell types from different germ layers) (Table S3C). (D) Correlation of the average expression of each human teratoma cell type with the average expression of each fetal mouse cell type. (E) UMAP plot of mouse cell types in the H1 teratomas.
Figure S2. Assaying teratoma heterogeneity. Related to Figure 2. (A) UMAP scatterplot showing how each line (HUES62, PGP1, and H9) contributes to the various cell type clusters. (B) Left: the normalized proportion of each teratoma in every cell type. Right: the bias each cell type shows towards specific teratomas. A low bias score means the cell type is well mixed across all 7 teratomas. (C) Growth kinetics of 6 teratomas based on cell line (HUES62, PGP1, and H9). (D) Karyotyping of all 4 PSC lines. (E) Lentiviral barcode construct map. (F) Barcoding summary statistics for both bulk and single cell assays across the three barcoded teratomas.
Figure S3. Assaying teratoma maturity. Related to Figure 3 and Table 1. (A) Heatmap of log fold-changes for the top differentially expressed genes between matched teratoma neuro-ectoderm and fetal cortical cell types. (B) Heatmap of enrichment scores for the top differential gene sets (via Gene Set Enrichment Analysis) between matched teratoma neuro-ectoderm and fetal cortical cell types. (C) Cosine similarity of teratoma gut cells with fetal gut cells of different ages. (D) Projection of fetal gut epithelium cell types onto a teratoma gut epithelium SWNE embedding. (E) Correlation of the scaled expression of key marker genes across mid/hindgut epithelium and foregut epithelium between teratoma and fetal cell types. (F) Proportion of foregut and mid/hindgut cells in the teratoma and fetal gut. (G) Heatmap of log fold-changes for the top differentially expressed genes between matched teratoma gut epithelium and fetal gut epithelium cell types. (H) Heatmap of enrichment scores for the top differential gene sets (via Gene Set Enrichment Analysis) between matched teratoma gut epithelium and fetal gut epithelium cell types. (I) H&E stains (left) and RNA FISH staining (right) of FOXJ1 (airway epithelium), CDX2 (intestinal epithelium), TNNT2 (cardiac muscle), and THY1 (mesenchymal stem cell/fibroblast). Scale bar = 50 μm (20×). Dots were dilated using ImageJ.
Figure S4. Engineering teratomas via genetic perturbations. Related to Figure 4. (A) Schematic showing knock-in of the CAG-spCas9-P2A-EGFP cassette, with an upstream T2A-linked blasticidin resistance gene, into the AAVS1 locus, thus creating the Cas9-expressing PGP1 line (above), with validated trace sequences of the left and right arms (below). (B) 2% agarose gel confirming integration of the CAG-spCas9-P2A-EGFP cassette into the AAVS1 locus of the PGP1 line, via PCR amplification of the left and right arms spanning the endogenous locus and the engineered cassette, compared with a PGP1 negative control. (C) Observed cells per gRNA and cells per gene for the screen. (D) UMAP projection of PGP1 cell types classified using the H1 cell types as a reference. (E) PGP1-Cas9 iPSCs were transduced with a CRISPR-Cas9 library targeting TCF4 (Pitt-Hopkins Syndrome), MECP2 (Rett Syndrome), and L1CAM (L1 Syndrome) with 3 guides each. After 2 teratomas were generated from the PGP1 iPSCs, scRNA-seq was used to identify shifts in cell type-specific gene expression resulting from the gene knockouts. (F) Shift in cell types, as measured by normalized Earth Mover’s Distance (EMD), caused by the embryonic lethal knockouts and the disease screen knockouts (TCF4, MECP2, L1CAM). (G–I) Shift in gene expression, measured as log2 fold-change against NTC guides across both teratoma replicates, for (G) L1CAM knockout in Neurons, (H) MECP2 knockout in Neural Progenitors, and (I) TCF4 knockout in Neural Progenitors. Point color represents −log(False Discovery Rate) as computed by DESeq2.
Figure S5. Engineering teratomas via molecular sculpting. Related to Figure 4. (A) Phase-contrast light microscopy images showing H1 cell survival after 3 and 5 days in the presence of GCV (10 μM). The H1 ESC line was transduced with either a GFP control (EGIP backbone) or miR-124-HSV-tk-GFP. (B, C) Flow cytometry quantification, gated on the presence or absence of GFP, of HEK293T and HeLa cells (B) or HEK293T and HUVEC cells (C) transduced for 5 days with a no-GFP control, HSV-tk-GFP, or miR-21-HSV-tk-GFP (B) / miR-126-HSV-tk-GFP (C) (Methods). (D) Schematic of generating self-patterned whole brain organoids (Methods). (E) Images of teratomas grown in the absence and presence of GCV administration (80 mg/kg/d, Methods) for 10 weeks. (F) H&E stains of teratomas grown in the absence (left) and presence (right) of GCV administration. Arrowheads highlight regions of neuro-ectoderm. Scale bars are labeled directly. (G) Anti-PAX6 (red) and DAPI (blue) immunostaining of GCV+ and GCV− control sections across 3 different regions of the corresponding teratoma. Scale bar = 2 mm. (H) Negative control: secondary antibody-only staining (DyLight 550, red) and DAPI (blue) for a GCV+ and a GCV− teratoma. Scale bar = 2 mm. (I) RNA FISH analysis of HES5 (red) and DAPI (blue) in a GCV+ and a GCV− teratoma. Scale bars = 2 mm and 200 μm (magnified insert).
Table S1. Teratoma Metrics. Related to Figure 1. (A) H1 teratoma metrics. (B) Cell line teratoma metrics. (C) Embryonic lethal screen teratoma metrics. (D) Embryonic lethal screen repool teratoma metrics. (E) Neural disease screen teratoma metrics. (F) miR-124 teratoma metrics.
Table S2. Cell Type Identification. Related to Figure 1 and Table 1. (A) Top TF markers for original Seurat clusters. (B) Top overall markers for original Seurat clusters. (C) Mapping Seurat clusters to cell types. (D) Top TF markers for each mapped cell type. (E) Top overall markers for each mapped cell type. (F) Sub-clustering for the ciliated epithelium. (G) Sub-clustering for the neuro-ectoderm.
Table S3. Cell Type Summary. Related to Figure 1 and Figure 2. (A) Cell type counts. (B) Cell type proportions. (C) Final set of cell type marker genes with references.
Table S4. Developmental Screen Targets and sgRNAs. Related to Figure 4. (A) Target genes known to be embryonic lethal in mice. (B) sgRNA primers used for cloning. (C) Indel amplification primers used to assess sgRNA editing rates. (D) sgRNA editing rates.
Table S5. Neural Disease Screen sgRNAs and Differentially Expressed Genes. Related to Figure S4. (A) sgRNA primers for cloning. (B) Summary of differentially expressed genes (DEGs) for each gene knockout in each broad cell type. (C) Differentially expressed genes in neurons. (D) Differentially expressed genes in Neural Progenitors. (E) Differentially expressed genes in Muscle. (F) Differentially expressed genes in Immune cells. (G) miRNA Sequences, Target Sites, and Lineage Specificities.
Data Availability Statement
The raw and processed data generated in this study are available from the Gene Expression Omnibus under accession code GSE156170.
All analysis code is available in the GitHub repository yanwu2014/teratoma-analysis-code, which also contains step-by-step instructions for reproducing the analysis.




