Abstract
Identification of novel functional domains and characterization of detailed regulatory mechanisms in cancer-driving genes is critical for advanced cancer therapy. To date, CRISPR gene editing has primarily been applied to defining the role of individual genes. Recently, high-density mutagenesis via CRISPR tiling of gene-coding exons has been demonstrated to identify functional regions in genes. Furthermore, breakthroughs in combining CRISPR library screens with single-cell droplet RNA sequencing (sc-RNAseq) platforms have revealed the capacity to monitor gene expression changes upon genetic perturbations at single-cell resolution. Here, we present “sc-Tiling,” which integrates a CRISPR gene-tiling screen with single-cell transcriptomic and protein structural analyses. Distinct from other reported single-cell CRISPR screens focused on observing gene function and gene-to-gene/enhancer-to-gene regulation, sc-Tiling enables the capacity to identify regulatory mechanisms within a gene-coding region that dictate gene activity and therapeutic response.
Subject terms: High-throughput screening, Cancer genetics, Epigenetics, CRISPR-Cas systems
Identifying functional domains and genetic regulatory mechanisms is essential for developing new therapies. Here the authors present sc-Tiling, single-cell high-density CRISPR tiling screening for functional domain characterization.
Introduction
The integration of CRISPR (clustered, regularly interspaced, short palindromic repeats) with next-generation sequencing technology for high-throughput genetic screens is a powerful tool for discovering functional genes in various pathways and cellular contexts1,2. Furthermore, high-density CRISPR targeting of coding exons has been demonstrated to identify functional domains in genes3–7. However, traditional CRISPR dropout/enrichment screens are restricted to investigating functional elements associated with cell survival phenotypes. Recent breakthroughs in combining CRISPR library screens with droplet RNA-sequencing (RNA-seq) platforms have demonstrated the capacity to monitor gene expression changes upon genetic perturbations in single cells (e.g., Perturb-seq, CRISP-seq, CROP-seq)8–11. Current single-cell CRISPR screens have focused on observing single-gene function, gene-to-gene interaction, and enhancer-to-gene regulation10,12–16. Nevertheless, the potential of single-cell CRISPR screen technology to examine gene function at a sub-gene resolution has not been fully explored.
In this study, we develop a single-cell CRISPR gene tiling pipeline “sc-Tiling” to provide high-resolution transcriptomic profiling of the coding regions of histone H3 lysine 79 (H3K79) methyltransferase DOT1L, an epigenetic therapeutic candidate selectively essential to mixed-lineage leukemia gene-rearranged (MLL-r) leukemia17–19. Furthermore, we couple sc-Tiling with three-dimensional structural modeling and discover a previously unrecognized self-regulatory domain in DOT1L that modulates chromatin interaction, enzymatic activation, and therapeutic sensitivity in MLL-r leukemia.
Results
Development of the sc-Tiling screen
Recent achievements in cancer epigenetics include discovery of a central role for the H3K79 methyltransferase DOT1L in maintaining MLL-r leukemia, an aggressive malignancy recognized in 5–10% of human acute leukemia cases19,20. A selective DOT1L inhibitor, EPZ5676 (Pinometostat)21, has demonstrated proof-of-principle clinical benefits via induction of differentiation of MLL-r leukemic cells in a phase I clinical trial22. However, the variable responses of patients with MLL-r in this trial underscore the need for additional mechanistic insights into functional regions of DOT1L to improve therapeutic efficacy and trial designs for DOT1L-targeted therapy.
To achieve high-resolution characterization of DOT1L’s function, we developed a single-cell CRISPR gene-tiling approach named sc-Tiling, which utilizes a capture sequence (CS1: 5′-GCTTTAAGGCCGGTCCTAGCA-3′) at the end of each single guide RNA (sgRNA) for direct capture by the Chromium Next GEM Single Cell 3ʹ Kit v3.1 (Fig. 1a and Supplementary Fig. 1a–c)11. We cloned a pool of 602 sgRNAs that target most of the “NGG” protospacer adjacent motifs within the mouse Dot1l coding exons (average targeting density 7.7 bp per sgRNA; Supplementary Fig. 2a, b and Supplementary Data 1). We then delivered this CRISPR library into Cas9-expressing mouse MLL-AF9 transduced leukemic cells (MLL-AF9-Cas9+; Supplementary Fig. 2c), a well-established murine leukemia model that mimics human MLL-r conditions18,23. Three days after transduction (Supplementary Fig. 2d, e), the cells carrying library constructs were subjected to droplet single-cell barcoding and messenger RNA (mRNA)/sgRNA library preparation using the 10X Chromium workflow (Fig. 1a). Subsequent single-cell transcriptomic analysis revealed an average of 26,350 reads per cell and a median of 2935 genes detected per cell (Supplementary Fig. 3). To avoid contamination by doublets and multi-sgRNA-infected cells, we filtered out any single cells carrying more than one sgRNA sequence. Finally, 88.2% of single cells (4362 out of 4943) passed the quality control (QC) filter (Fig. 1b), giving an average library coverage of 7.1 cells per sgRNA.
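The single-guide filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the barcode and guide names are hypothetical, and the function `keep_single_guide_cells` is not part of the published pipeline. The pass rate is recomputed from the counts reported in the text.

```python
def keep_single_guide_cells(cell_to_guides):
    """Return {cell: guide} for cells in which exactly one sgRNA was detected.

    Cells with no guide, or with more than one guide (doublets or
    multi-sgRNA infections), are dropped.
    """
    return {
        cell: next(iter(guides))
        for cell, guides in cell_to_guides.items()
        if len(guides) == 1
    }


# Hypothetical toy input: cell barcode -> set of sgRNA names detected.
detected = {
    "cell_A": {"sgDot1l_001"},
    "cell_B": {"sgDot1l_002", "sgDot1l_417"},  # doublet or multi-infection
    "cell_C": set(),                            # no guide captured
    "cell_D": {"sgLuc_1"},
}
kept = keep_single_guide_cells(detected)

# Pass rate from the counts reported in the text (4362 of 4943 cells).
pass_rate_pct = round(100 * 4362 / 4943, 1)
print(kept, pass_rate_pct)
```

Dropping ambiguous cells rather than trying to deconvolve them trades cell number for a clean one-guide-per-cell assignment, which is what the downstream per-sgRNA summaries rely on.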
Single-cell projections using Uniform Manifold Approximation and Projection (UMAP)24 of DOT1L-dependent genes18 identified seven cell clusters (Fig. 1c). Gene expression annotation revealed distinct distributions of cells expressing leukemia-associated genes (Meis1, Hoxa9, and Myc; clustered toward the right) vs. myeloid-differentiation markers (Cd11b, Gr1, and Ltf; clustered toward the left) (Fig. 1d). Cells expressing sgRNAs targeting the functionally essential lysine methyltransferase (KMT) core (residues M127–P332; total 56 sgRNAs) of DOT1L17,25 clustered to regions that overlap with the differentiated myeloid population (Fig. 1e). In contrast, the sgRNAs targeting a non-essential region of DOT1L (the C-terminal 100 amino acids of DOT1L; total 54 sgRNAs) behaved similarly to spiked-in negative control sgRNAs (targeting Firefly luciferase [Luc], Renilla luciferase [Ren], green fluorescent protein [GFP], red fluorescent protein [RFP], and Rosa26 coding sequences; Supplementary Data 1), with both groups clustering in the region representing undifferentiated leukemia (Supplementary Fig. 4a). Trajectory analysis (pseudo-time)26 correlated closely with the expression of these marker genes, with leukemia-associated genes being gradually reduced, while myeloid-differentiation markers increased along the pseudo-time trajectory (Fig. 1f, right to left and Supplementary Fig. 4b). These results indicate efficient CRISPR editing of DOT1L in cells expressing the CS1 direct-capturable sgRNA library.
Structural and transcriptomic profiling of sc-Tiling
To evaluate the resolution of sc-Tiling for detecting functional elements within a protein domain, we summarized the overall behavior of neighboring sgRNAs using a local-smoothing strategy5 (Fig. 1g), and mapped the smoothed pseudo-time score to a cryo-electron microscopy structure of the DOT1L KMT core in an “active state” interacting with a histone H2B-ubiquitinylated nucleosome (Fig. 1h)27,28. Our results revealed that within the KMT core domain, the resolution of sc-Tiling allowed recognition of all the amino acid residues that directly contacted the enzymatic substrate S-adenosyl methionine (SAM pocket) and the D1 loop (residues P133–T139)25 (Supplementary Fig. 5a). This method also detected the critical regions within the KMT core domain that mediate its chromatin interaction. These include the W22–D32 loop (Supplementary Fig. 5b; interacts with histone H4 tail), R282 loop (Supplementary Fig. 5c; interacts with the histone H2A/H2B acidic patch), and T320–K330 helix (Supplementary Fig. 5d; interacts with the ubiquitin conjugated to histone H2BK120)27,28. Taken together, sc-Tiling clearly distinguished the functional regions of KMT from the non-essential region (residues A33–T100) that is not involved in substrate/ligand interaction, revealing the capacity of single-cell CRISPR gene-tiling to pinpoint functional elements at a sub-domain resolution.
To identify novel functional elements that modulate DOT1L activity, we utilized the top 100 genes affected by DOT1L inhibitor18 to develop a high-resolution transcriptomic correlation heatmap across DOT1L protein (Fig. 2a). This method revealed two functionally distinct segments of DOT1L, i.e., the N-module (residues M1–T900) and the C-module (residues P901–N1537). The strong correlation of the sgRNAs targeting the C-module with the negative control sgRNAs (Supplementary Fig. 4a) indicates a lack of essential components in the C-terminal portion of DOT1L. On the other hand, we observed several functional regions of DOT1L within the N-module, including the KMT core (black dashed triangle)25,27,28 and the AF9-binding motif (green dashed box; residues T863–T900)29. Whereas the AF9-binding motif showed a moderate correlation (Pearson score ~0.75) with the KMT core, we identified a region (cyan dashed box; designated as the “R domain”) located in the center of the N-module that exhibited a higher correlation (Pearson score >0.8) with the KMT core in the transcriptional signature. Based on this observation, we presumed that disrupting the function of the R domain would impair the survival of MLL-AF9 leukemia cells, similar to inhibition of the KMT core. To test this, we utilized the DOT1L-tiling CRISPR library to perform pooled survival screens4,6 in MLL-AF9-Cas9+ cells and examined the cell survival by comparing the frequencies of each integrated sgRNA sequence before vs. after 3-, 6-, 9-, or 12-day cultures using high-throughput sequencing (Fig. 2b). Our results revealed a progressive depletion of clusters of sgRNAs (and smoothed CRISPR scan scores) targeting the KMT core, AF9-binding motif, and the first half of the R domain (designated as the “R1 element;” residues F460–G555). Furthermore, we sought to combine the principal component analysis of sc-Tiling (PC1 score) with the survival CRISPR scan score for individual amino acids in DOT1L (Fig. 2c). 
This approach revealed that the distribution of KMT core (black dots) overlaps with a segment located in the center of the R1 element (R1 center; E489–L515; red dots) in both transcriptomic and survival profiling, suggesting the functional association of this region with the KMT core.
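The position-ordered correlation heatmap can be illustrated with a short Python sketch. It assumes each sgRNA has already been summarized as a vector of expression values over the selected DOT1L-responsive genes; the toy positions and vectors below are hypothetical, and the helper names are not taken from the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / (sd_x * sd_y)


def position_ordered_correlation(profiles):
    """profiles: list of (target_position, expression_vector) per sgRNA.

    Returns the sorted positions and the matching correlation matrix.
    """
    ordered = sorted(profiles, key=lambda p: p[0])
    positions = [pos for pos, _ in ordered]
    matrix = [[pearson(a, b) for _, b in ordered] for _, a in ordered]
    return positions, matrix


# Hypothetical toy profiles (position along the coding sequence, expression).
toy_profiles = [
    (900, [1.0, 2.0, 3.0, 4.0]),
    (150, [2.0, 4.0, 6.0, 8.5]),
    (1500, [4.0, 3.0, 2.0, 1.0]),
]
positions, corr = position_ordered_correlation(toy_profiles)
print(positions)
```

Sorting by target position before computing the matrix is what makes contiguous functional segments appear as blocks along the diagonal of the heatmap.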
R domain modulates the efficacy of DOT1L inhibitory therapy
To confirm the results from sc-Tiling analyses, we chose three sgRNAs each targeting the KMT core, AF9-binding motif, and the R1 center for functional validation (guide sequence and editing efficiency shown in Supplementary Fig. 6). Using an RFP flow cytometric growth competition assay (Supplementary Fig. 7)3 and immunoblotting, we observed that compared to the sgRNAs targeting the AF9-binding motif, expression of sgRNAs targeting the R1 center resulted in a more drastic suppression of cell proliferation (Fig. 2d) and impaired histone H3K79 methyltransferase activity (Fig. 2e), resembling the effects of sgRNAs targeting the KMT core. In addition to a similar UMAP distribution between cells expressing sgRNAs targeting the KMT core and the R1 element (Fig. 2f), single-cell droplet RNA-seq (sc-RNA-seq) revealed significantly overlapping gene regulation between these two sgRNA-targeted populations (Fig. 2g, h). These results indicate functional coordination between the DOT1L KMT core and R1 element for histone modification.
To investigate whether the R domain mediates the response of MLL-AF9 leukemia cells to DOT1L-inhibitory treatment, we compared a pair of pooled survival tiling screens conducted under control (dimethyl sulfoxide (DMSO)) vs. DOT1L-inhibited (1 μM EPZ5676) conditions (Fig. 3a). Consistent with the results of the sc-Tiling, we observed that a cluster of 27 sgRNAs targeting the R1 region (residues F460–G555) sensitized the MLL-AF9-Cas9+ cells to DOT1L inhibition (Fig. 3b and Supplementary Fig. 8c). By contrast, a cluster of 36 sgRNAs targeting the residues A558–C662 (designated as the “R2 element”) exhibited a significantly increased CRISPR score only in the DOT1L-inhibited condition (Fig. 3a, b). The expression of individual sgRNAs targeting the R2 element exhibited minimal impact on the proliferation of MLL-AF9-Cas9+ cells (Fig. 3c and Supplementary Fig. 8a, b), but increased the resistance index to the DOT1L inhibitor (Fig. 3d and Supplementary Fig. 8c), confirming the EPZ5676-resistant phenotype we observed in the CRISPR gene body scans. Computational modeling of the R domain (residues F460–C662) revealed a consensus “coiled-coil” structure consisting of four alpha-helices (Fig. 3e), which is capable of interacting with the KMT core domain of DOT1L (Fig. 3f). Within the R domain, the R1 element (consisting of CC0 and CC1) overlaps with an area previously reported to interact with AF1030,31, a coactivator of DOT1L required for methyltransferase activation. On the other hand, the R2 element (consisting of CC2 and CC3) is predicted to interact with the DOT1L KMT core and masks the R282 loop (Fig. 3f), thereby interrupting the DOT1L–nucleosome interaction and methyltransferase activity of the KMT core. This model suggests that the R domain mediates the transition from a “closed” to an “open” state of DOT1L (Fig. 3g; left to right), which is required before the engagement of the KMT core with nucleosomes for H3K79 methylation (Fig. 3g; blue area summarized in Fig. 1h).
To evaluate the impact of this self-regulatory mechanism on DOT1L-targeted therapy, we queried the cBioPortal database32 and focused on the R2 element (residues A558–C662) that exerted a robust EPZ5676-resistant phenotype in the CRISPR scan. Out of a total of 54,510 patient samples, we found 19 DOT1L variant alleles to exist in this 105-amino acid region (Supplementary Fig. 9a and Supplementary Table 1). Compared to the expression of wild-type-DOT1L constructs, the expression of several mutant-DOT1L constructs (each harbors a single amino acid missense mutation) in MLL-AF9 cells resulted in an increased resistance to EPZ5676 treatment (Fig. 3h and Supplementary Fig. 9b, c). We then focused on the top three drug-resistant variants (Q584P, L626P, and C637G) and found that these mutant-DOT1L led to an elevated H3K79me2 (Supplementary Fig. 10) and required a higher dosage of EPZ5676 to suppress their activity compared to wild-type-DOT1L (Fig. 3i, j). Computational modeling of these drug-resistant variants indicates that mutations at these residues may destabilize alpha-helix bundles and lead to dissociation of the R domain from the KMT core, resulting in increased kinetic activity and tolerance to DOT1L-inhibitory therapy (Supplementary Fig. 11).
Discussion
High-throughput CRISPR genetic screens have been widely used for discovering functional genes in mammalian systems. In contrast, the potential of CRISPR technology to investigate gene function at a sub-gene (i.e., protein domain or sub-domain) resolution has not been fully explored. Furthermore, traditional pooled CRISPR screens can identify only functional elements associated with cell killing/proliferation phenotypes (i.e., by observing the depletion or enrichment of specific sgRNAs). The requirement for significant changes in cell number in survival CRISPR screens (which typically take 2–4 weeks of culture) prohibits the determination of causal mechanisms induced by CRISPR perturbation.
To overcome this obstacle, our study integrated a CRISPR gene-tiling screen with a recently available direct-capture Perturb-seq workflow11 to develop the single-cell CRISPR gene body-scan pipeline sc-Tiling. Using this approach, we provide a high-resolution transcriptomic correlation map across DOT1L, an epigenetic therapeutic candidate essential to MLL-r leukemia17–19. We noted that the traditional survival CRISPR gene scan (x-axis; Fig. 2c) was unable to distinguish the AF9-binding motif (mediates recruitment of AF9-containing super elongation complex to support gene transcription)33,34 from the KMT core (mediates H3K79 methylation and open chromatin)18. In contrast, sc-Tiling (y-axis; Fig. 2c) efficiently differentiated these two functionally distinct domains through transcriptional profiling. The fact that cell killing through targeting the AF9-binding motif does not impair the H3K79me2 level (Fig. 2e) attests to the catalysis-independent role of the AF9-binding motif in DOT1L. We envision that sc-Tiling will significantly advance the recognition of the mechanisms underlying functional domains. Furthermore, we foresee that the transcriptomic profiling in sc-Tiling will enable dissection of functional elements that participate in diverse cellular processes (e.g., metabolism, cell fate decision, tissue homeostasis) whose end phenotypes might not be cellular survival or proliferation.
Although the limitations of CRISPR genome editing (e.g., variable cutting efficiency, potential for off-targeting, and the mosaic effect [i.e., generation of random mutations]) remain concerns in the CRISPR sc-Tiling approach, by considering multiple sgRNAs clustered in a peptide region via a local-smoothing strategy, we significantly increased the statistical confidence and minimized the impact of noise associated with individual sgRNAs. Importantly, the use of single-cell transcriptional profiling in sc-Tiling could predict functional elements and corresponding gene regulations that led to a cellular survival phenotype after prolonged culture, and provided higher resolution in detecting sub-domain functional elements than survival CRISPR gene-tiling screens using pooled sequencing (i.e., Fig. 1g vs. 2b; KMT core). Furthermore, when we coupled sc-Tiling with three-dimensional structural modeling, we discovered a self-regulatory R domain in DOT1L that modulates chromatin interaction, enzymatic activation, and therapeutic sensitivity in MLL-r leukemia. To our knowledge, this is the first characterization of an intragenic regulatory module that mediates switching between a “closed” and an “open” state of an epigenetic enzyme.
Finally, our study demonstrates the utility of combining sc-Tiling with consortium genomic databases (e.g., cBioPortal, CCLE, dbSNP; Supplementary Table 1 and Supplementary Fig. 12) for de novo identification of therapeutically relevant alleles in the human population (Fig. 3h). We propose that sc-Tiling may complement the rapidly growing multi-omics databases to provide additional insights that bridge functional genomics, structural biology, and clinical investigation. We envision that this approach will accelerate the recognition of clinically impactful variants within the human genome and has the potential to direct more precise clinical trials and therapeutic decisions.
Methods
Cas9-expressing MLL-AF9 leukemic cell culture
Mouse MLL-AF9 leukemic cells were generated by transformation of mouse bone marrow Lin−Sca1+cKit+ cells with a MIG (MSCV-IRES-GFP) retrovirus expressing the MLL-AF9 fusion protein and transplanted into sublethally irradiated recipient mice23. Leukemic blasts were subsequently harvested from the diseased mice and cultured in vitro in Iscove’s modified Dulbecco’s medium (Gibco) plus 15% fetal bovine serum (Gibco) supplemented with 20 ng/ml mouse stem cell factor (PeproTech), 10 ng/ml mouse interleukin-3 (IL-3) (PeproTech), 10 ng/ml mouse IL-6 (PeproTech), penicillin (100 U/ml; Gibco), streptomycin (100 μg/ml; Gibco), and plasmocin (5 μg/ml; InvivoGen). Cas9-expressing MLL-AF9 cells were established through lentiviral transduction of LentiCas9-Blast (Addgene)35, followed by blasticidin S (10 mg/ml; Gibco) selection, single-cell cloning, and CRISPR editing efficiency test (Supplementary Fig. 2c, d).
CRISPR gene-tiling screens
sgRNA sequences targeting the coding regions of mouse Dot1l (Supplementary Data 1) were designed using the Genetic Perturbation Platform (Broad Institute)36. Briefly, sgRNA oligonucleotides were synthesized via microarray (CustomArray) and cloned into the pUCS1EPR lentiviral sgRNA vector (Supplementary Fig. 1a) using BsmBI (NEB)36. A step-by-step protocol describing the cell culture protocol can be found at Protocol Exchange37. The sgRNA library was packaged by HEK293 cells (ATCC) cotransfected with psPAX2 (Addgene) and pMD2.G (Addgene) to produce lentiviral particles, and pre-titrated to obtain 10–20% infection (monitored by flow cytometry for RFP [tagRFP] expression) in the MLL-AF9-Cas9+ cells. Each screen culture was calculated to maintain at least 1000× the number of constructs in each library. For sc-Tiling, library-transduced cultures were selected using puromycin (2.5 µg/ml; Gibco) for 3 days and subjected to single-cell separation and barcoding using a Chromium Controller (10X Genomics). For survival CRISPR gene tiling, the sgRNA library-transduced cells were subcultured every 3 days for a total of 12 days. At each designated time point, the number of cells from cultures that covered at least 1000× the number of constructs in the library was collected for analysis.
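As a worked example of the coverage requirement, the following Python sketch computes how many transduced cells must be maintained at 1000× coverage and roughly how many cells would need to be exposed to virus at a given infection rate. It assumes a library size of 602 constructs (the Dot1l-targeting sgRNAs only; control guides are ignored here), and the function names are hypothetical.

```python
def cells_to_maintain(n_constructs, fold=1000):
    """Transduced cells needed to keep `fold` coverage of the library."""
    return n_constructs * fold


def cells_to_transduce(n_constructs, infection_pct, fold=1000):
    """Cells to expose to virus so that enough are infected (rounded up).

    infection_pct is an integer percentage, e.g. 10 for 10% infection.
    """
    needed = cells_to_maintain(n_constructs, fold)
    return -(-needed * 100 // infection_pct)  # integer ceiling division


n_constructs = 602  # assumption: Dot1l-targeting guides only
print(cells_to_maintain(n_constructs))       # 602000
print(cells_to_transduce(n_constructs, 10))  # 6020000
print(cells_to_transduce(n_constructs, 20))  # 3010000
```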
sc-Tiling data analysis
Using the Next GEM Single Cell 3′ Kit v3.1 and a Chromium Controller (10X Genomics), CS1-captured sgRNA and the poly(dT)-captured mRNA from each single cell were converted to next-generation sequencing libraries (Supplementary Fig. 1c), and sequenced (paired-end 150 base pair) using Illumina HiSeqX (Novogene Inc.). Sequencing QC and data preprocessing were performed using Seurat v3.024. Low-quality single cells with abnormal gene numbers (<200 or >4500) or significant mitochondrial RNA contamination (>10% reads) were removed (Supplementary Fig. 3a). The normalized expression data from selected single cells then underwent dimensionality reduction by principal component analysis and UMAP embeddings for visualization and clustering. Cells were clustered based on the poly(dT)-captured transcriptome information and simultaneously annotated by CS1-captured sgRNA. Single cells with more than one detected sgRNA sequence (due to multiple sgRNA transductions or multiple cells in a single-cell droplet) were excluded. Pseudo-time trajectory analysis of the DOT1L inhibitor-affected genes was performed on single-cell transcriptomic data using Monocle26. Position-ordered Pearson correlation matrix across the Dot1l gene body was calculated based on the top 100 genes affected by DOT1L inhibition.
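The cell-level QC thresholds stated above can be written as a simple predicate. The sketch below is plain Python rather than Seurat, uses hypothetical per-cell summaries, and treats the thresholds as inclusive bounds, which is an assumption about how the boundary values were handled.

```python
def passes_qc(n_genes, mito_fraction, min_genes=200, max_genes=4500, max_mito=0.10):
    """True if a cell has a plausible gene count and low mitochondrial content.

    Cells with <200 or >4500 detected genes, or with >10% mitochondrial
    reads, are rejected.
    """
    return min_genes <= n_genes <= max_genes and mito_fraction <= max_mito


# Hypothetical per-cell summaries: (barcode, genes detected, mito read fraction).
cells = [
    ("cell_A", 2935, 0.04),  # typical cell
    ("cell_B", 150, 0.02),   # too few genes (empty droplet or debris)
    ("cell_C", 5200, 0.03),  # too many genes (possible doublet)
    ("cell_D", 3100, 0.25),  # high mitochondrial fraction (stressed cell)
]
qc_pass = [barcode for barcode, n_genes, mito in cells if passes_qc(n_genes, mito)]
print(qc_pass)
```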
Three-dimensional protein structural annotation of sc-Tiling
First, the median value pseudo-time projection generated from sc-Tiling was summarized for each sgRNA. To depict the pseudo-time score over regions with no sgRNA coverage, we interpolated the signal via Gaussian kernel smoothing in R38. The bandwidth was defined by the maximum gap length of the non-covered regions for local smoothing due to regional uneven sgRNA densities. To map the smoothed pseudo-time score to peptide positions, the average pseudo-time score over the trinucleotide codons was calculated for each peptide position. Pairwise alignments of primary amino acid sequences were performed using CLC Main Workbench version 8.1 (Qiagen) to ensure functional annotations of the smoothed pseudo-time scores of mouse Dot1l sc-Tiling data onto human DOT1L protein structures. Atomic data of macromolecular structures were retrieved from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB Protein Data Bank (PDB) at https://www.rcsb.org)39 in PDB file format. The PDB files were visualized and analyzed using UCSF Chimera (version 1.14 build 42000)40. Subsequently, the smoothed pseudo-time scores were mapped onto three-dimensional protein structures using the “Defined Attribute” and “Render by Attribute” functionalities in UCSF Chimera40.
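The smoothing and codon-averaging steps can be sketched as follows. This is a minimal Python illustration, not the R routine used in the study: it assumes a Nadaraya–Watson estimator with a Gaussian kernel whose standard deviation equals the bandwidth, and that the bandwidth is the largest gap between neighboring sgRNA positions. The positions and scores are hypothetical.

```python
import math


def max_gap(positions):
    """Largest distance between neighboring sgRNA positions."""
    ordered = sorted(positions)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def gaussian_smooth(positions, scores, grid, bandwidth):
    """Kernel-weighted average of sgRNA scores at every grid position."""
    smoothed = []
    for g in grid:
        weights = [math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions]
        smoothed.append(sum(w * s for w, s in zip(weights, scores)) / sum(weights))
    return smoothed


def per_residue(nucleotide_scores):
    """Average the score over each complete trinucleotide codon."""
    return [
        sum(nucleotide_scores[i:i + 3]) / 3
        for i in range(0, len(nucleotide_scores) - 2, 3)
    ]


# Hypothetical sgRNA cut positions (nt) and median pseudo-time scores.
cut_positions = [2, 5, 11]
pseudo_time = [1.0, 1.0, 4.0]
bandwidth = max_gap(cut_positions)                       # 6 nt in this toy case
smoothed = gaussian_smooth(cut_positions, pseudo_time, range(12), bandwidth)
residue_scores = per_residue(smoothed)                   # 12 nt -> 4 residues
print(bandwidth, [round(v, 2) for v in residue_scores])
```

Tying the bandwidth to the widest uncovered gap guarantees that every position receives signal from at least the flanking guides, at the cost of blurring regions where guides are dense.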
Survival CRISPR gene tiling data analysis
Genomic DNA from survival screen cell pellets was harvested, PCR-amplified (NEBNext Ultra II Q5; NEB) using primers DCF01 5′-CTTGTGGAAAGGACGAAACACCG-3′ and CS1_R01 5′-TGCTAGGACCGGCCTTAAAGC-3′ (Supplementary Fig. 1a and Supplementary Table 2), and subjected to high-throughput sequencing (NextSeq550, Illumina). To quantify sgRNA reads in the library, we first extracted 20-nucleotide sequences that matched the sgRNA backbone structure (5′-CACCG and 3′-GTTT) from raw fastq reads. Extracted reads were then mapped to a reference database built from corresponding sgRNA library sequences using Bowtie241. Only reads that perfectly matched the reference database were counted. The frequency for individual sgRNAs was calculated as the read counts of each sgRNA divided by the total read counts matched to the library. Individual sgRNAs with read counts <5% of the expected frequency were excluded from downstream analysis. A CRISPR score was defined as the log10 fold change in the frequency of individual sgRNAs between the early (day 0) and late (defined time points) screened samples, calculated using the edgeR R package42 based on the negative binomial distribution of sgRNA read count data. To obtain a CRISPR scan score over regions with no sgRNA coverage, we interpolated the signal via Gaussian kernel smoothing in R38. Bandwidth was defined as the maximum gap length of the non-covered regions for local smoothing due to regional uneven sgRNA densities. To map CRISPR scan scores to peptide positions, the average CRISPR scan score over the trinucleotide codons was calculated for each peptide position. To compare survival screens performed in different culture conditions (e.g., control vs. EPZ5676-treated), the smoothed CRISPR scan score was further normalized by the median CRISPR score of the negative control sgRNA (defined as 0.00; sgRNA targeting Luc, Ren, GFP, RFP, and Rosa26) and the median CRISPR score of the positive control sgRNA (defined as −1.00; sgRNA targeting mRpa3)3 within the screen data.
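The counting and scoring logic can be illustrated with a simplified Python sketch. Several substitutions are made and should be read as assumptions: exact dictionary lookup stands in for Bowtie2 perfect-match alignment, a plain log10 frequency ratio with a small pseudocount stands in for the edgeR estimate, and normalization is applied to per-sgRNA scores rather than to the smoothed scan score. The guide sequences and reads are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import median

# 20-nt guide flanked by the backbone motifs named in the text.
GUIDE_RE = re.compile(r"CACCG([ACGT]{20})GTTT")


def count_guides(reads, library):
    """Count reads whose extracted 20-mer exactly matches a library guide."""
    counts = Counter()
    for read in reads:
        match = GUIDE_RE.search(read)
        if match and match.group(1) in library:
            counts[match.group(1)] += 1
    return counts


def frequencies(counts):
    """Read count of each guide divided by all library-matched reads."""
    total = sum(counts.values())
    return {guide: c / total for guide, c in counts.items()}


def crispr_scores(freq_early, freq_late, pseudo=1e-6):
    """log10 fold change in guide frequency (late vs. early)."""
    return {
        g: math.log10((freq_late.get(g, 0.0) + pseudo) / (f + pseudo))
        for g, f in freq_early.items()
    }


def normalize(scores, negative_controls, positive_controls):
    """Rescale so the negative-control median is 0 and the positive is -1."""
    neg = median(scores[g] for g in negative_controls)
    pos = median(scores[g] for g in positive_controls)
    return {g: (s - neg) / (neg - pos) for g, s in scores.items()}


def make_read(guide):
    """Build a toy read: backbone prefix + guide + backbone suffix."""
    return "GAAACACCG" + guide + "GTTTTAGAGC"


guide_a = "ACGTACGTACGTACGTACGT"  # hypothetical guides
guide_b = "TTGCAAGGCCTTAACCGGTA"
library = {guide_a, guide_b}
day0 = [make_read(g) for g in (guide_a, guide_a, guide_b, guide_b)]
day12 = [make_read(g) for g in (guide_a, guide_a, guide_a, guide_b)]
day12.append(make_read("ACGTACGTACGTACGTACGA"))  # mismatch, not counted

counts_day12 = count_guides(day12, library)
scores = crispr_scores(frequencies(count_guides(day0, library)),
                       frequencies(counts_day12))
rescaled = normalize({"a": 0.2, "b": -1.8, "c": -0.8}, ["a"], ["b"])
print(scores, rescaled)
```

The low-count filter (<5% of the expected frequency) is omitted here for brevity.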
Computational structural modeling
Four helices (CC0–CC3) of the R domain were predicted using the PSIPRED v3.3 server43. Sequence alignment of the helical regions (Fig. 3e) was produced using the MultAlin v5.4.1 server44. The model of the coiled-coil domain was predicted using the I-TASSER server45. The complex model of the R domain and KMT core domain (PDB ID: 3UWP)46 was picked from 5000 complex models generated using the ZDOCK v3.0.2 software47. The best model (Fig. 3f) was selected based on the largest number of hydrophobic contact residue pairs between the KMT core and R domain. The structures were visualized using the PyMOL v1.8.6 software (Schrödinger, LLC) and UCSF Chimera40.
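The model-selection criterion (most hydrophobic contact residue pairs) can be sketched in Python as follows. The text does not state the distance cutoff, the atoms used to measure distance, or the residue set counted as hydrophobic, so the 8 Å cutoff, the single representative coordinate per residue, and the `HYDROPHOBIC` set below are all assumptions, as are the toy coordinates.

```python
import math

# Assumed hydrophobic residue set (three-letter codes).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}


def hydrophobic_contacts(chain_a, chain_b, cutoff=8.0):
    """Count hydrophobic residue pairs across two chains within `cutoff` angstroms.

    Each chain is a list of (residue_name, (x, y, z)) with one
    representative coordinate per residue.
    """
    return sum(
        1
        for name_a, xyz_a in chain_a
        if name_a in HYDROPHOBIC
        for name_b, xyz_b in chain_b
        if name_b in HYDROPHOBIC and math.dist(xyz_a, xyz_b) <= cutoff
    )


def best_model(models, cutoff=8.0):
    """Name of the docking pose with the most hydrophobic contact pairs."""
    return max(models, key=lambda name: hydrophobic_contacts(*models[name], cutoff))


# Hypothetical toy poses: the same KMT-core residues with two R-domain placements.
kmt = [("LEU", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)), ("PHE", (6.0, 0.0, 0.0))]
r_close = [("ILE", (0.0, 5.0, 0.0)), ("VAL", (6.0, 5.0, 0.0))]
r_far = [("ILE", (0.0, 30.0, 0.0)), ("VAL", (6.0, 7.0, 0.0))]
models = {"pose_1": (kmt, r_close), "pose_2": (kmt, r_far)}
print(best_model(models))
```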
Generation of human DOT1L variant cDNA expression constructs
A MIY (MSCV-IRES-YFP) retroviral construct expressing wild-type human DOT1L and yellow fluorescent protein (YFP) was obtained from Dr. Yi Zhang25. The initial wild-type human DOT1L complementary DNA (cDNA) (MIY-DOT1L-WT) was then point-mutated to obtain 19 clinically observed DOT1L variants (Supplementary Fig. 9b) using the Q5 Site-Directed Mutagenesis Kit (NEB). The mutated DOT1L cDNA fragments were confirmed using Sanger sequencing (Eton Bioscience).
Western blotting
Cells were harvested and lysed in LDS sample buffer (Invitrogen) at 5 × 10⁶ cells/mL, separated electrophoretically using Bolt 4–12% Bis-Tris plus gels (Invitrogen), and transferred onto polyvinylidene difluoride (PVDF) membranes (0.2 µm pore size, low fluorescence) using PVDF Mini Stacks and iBlot 2 (Invitrogen). Membranes were probed with rabbit anti-H3K79me2 antibody (D15E8, Cell Signaling Technology; 1:1000), rabbit anti-histone H3 (ab1791, Abcam; 1:10,000), and mouse anti-β-actin antibody (ab8226, Abcam; 1:1000) at 4 °C overnight. After washing, the membranes were incubated with horseradish peroxidase-linked goat anti-rabbit IgG antibody (CST7074, Cell Signaling Technology; 1:10,000), donkey anti-rabbit IgG antibody conjugated with Alexa Fluor 488 (ab150061, Abcam; 1:10,000), or donkey anti-mouse IgG antibody conjugated with Cy3 (AP192C, Sigma-Aldrich; 1:10,000) at room temperature for 1 h. Chemiluminescent signals were developed using the SuperSignal West Femto Substrate (Cat# 34095, Thermo Fisher). The chemiluminescent and fluorescent signals on Western blot membranes were detected using a ChemiDoc imaging system (Bio-Rad). Signal intensity from image files was analyzed using the ImageJ software (National Institutes of Health). Representative Western blot images were selected from at least two independently performed experiments.
Growth competition assay
Cas9-expressing MLL-AF9 cells were virally transduced with the designated constructs (RFP+ ipUSEPR lentiviral sgRNA constructs listed in Supplementary Fig. 6; YFP+ MIY retroviral DOT1L variant cDNA constructs listed in Supplementary Fig. 9) in 96-well plates at ~50% infection and monitored using flow cytometry for RFP or YFP (FP). At each time point, live cell counts and the percentage of FP+ cells (FP%) were obtained by high-throughput flow cytometry and 4′,6-diamidino-2-phenylindole (Invitrogen) dye exclusion using an Attune NxT flow cytometer with an autosampler (Thermo Fisher).
The relative proliferation (RP) of FP+ (sgRNA- or DOT1L cDNA-expressing) vs. FP− (non-transduced) cells was defined as:
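$$\mathrm{RP} = \frac{\dfrac{N(t)\times \mathrm{FP\%}(t)}{N(d3)\times \mathrm{FP\%}(d3)}}{\dfrac{N(t)\times\left(1-\mathrm{FP\%}(t)\right)}{N(d3)\times\left(1-\mathrm{FP\%}(d3)\right)}} \tag{1}$$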
where N(t) and FP%(t) are the observed live cell number and FP+% at time point t; d3 denotes the day 3 time point.
The resistance index was defined as:
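$$\text{Resistance index} = \frac{\mathrm{RP}(x,m)/\mathrm{RP}(x,0)}{\mathrm{RP}(\mathrm{con},m)/\mathrm{RP}(\mathrm{con},0)} \tag{2}$$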
where RP(x,m) is the RP of cells expressing sgRNA or DOT1L cDNA variant x under m µM of EPZ5676 (Selleck Chemicals) on day 9; con denotes the sg-Luc or wild-type DOT1L cDNA.
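A minimal Python sketch of these two quantities follows. It assumes that RP is the day-3-normalized expansion of FP+ cells divided by that of FP− cells, and that the resistance index is the drug-induced change in RP for construct x relative to the same change for the control; the exact forms should be checked against the published Eqs. 1 and 2. All input numbers are hypothetical.

```python
def relative_proliferation(n_t, fp_t, n_d3, fp_d3):
    """Expansion of FP+ cells since day 3 divided by that of FP- cells.

    n_t, n_d3: live cell numbers at time t and day 3.
    fp_t, fp_d3: FP+ fractions (0-1) at time t and day 3.
    """
    fp_pos_expansion = (n_t * fp_t) / (n_d3 * fp_d3)
    fp_neg_expansion = (n_t * (1 - fp_t)) / (n_d3 * (1 - fp_d3))
    return fp_pos_expansion / fp_neg_expansion


def resistance_index(rp_x_drug, rp_x_vehicle, rp_con_drug, rp_con_vehicle):
    """Drug effect on construct x relative to the drug effect on the control."""
    return (rp_x_drug / rp_x_vehicle) / (rp_con_drug / rp_con_vehicle)


# Hypothetical example: FP+ cells fall from 50% to 25% of the culture.
rp = relative_proliferation(n_t=1000, fp_t=0.25, n_d3=100, fp_d3=0.50)
# Hypothetical example: the variant loses half its RP under drug,
# whereas the control loses three quarters.
ri = resistance_index(0.5, 1.0, 0.25, 1.0)
print(round(rp, 3), ri)
```

Under these assumptions an RP below 1 means the transduced cells are outcompeted, and a resistance index above 1 means the construct blunts the effect of the inhibitor relative to the control.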
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by the American Society of Hematology (to C.-W.C.), Alex’s Lemonade Stand Foundation (to C.-W.C.), Leukemia & Lymphoma Society (to J.C. and S.T.R.), and National Institutes of Health Grants CA176745, CA206963 (to S.A.A.), CA197498, CA233691, CA236626 (to C.-W.C.). The sequencing and structural computational studies were supported by the National Institutes of Health P30 award CA033572 (City of Hope). We thank Dr. Sarah Wilkinson for editing the manuscript.
Author contributions
L.Y., A.K.N.C., K.M., C.D.D., X.W., S.P.P., M.L., X.X., Q.L., N.M., K.Y.C., J.W., Y.S.-F., Z.F., and G.X. performed the experiments; L.Y., A.K.N.C., H.L., S.L., W.L., Y.-C.Y., D.H., and C.-W.C. analyzed the data; D.H., S.T.R., T.H., M.M., J.C., S.A.A., and C.-W.C. provided conceptual input; L.Y., A.K.N.C., K.M., H.L., S.A.A., and C.-W.C. wrote the paper; S.A.A. and C.-W.C. conceived and supervised the study.
Data availability
The 10X Genomics single-cell CRISPR and RNA-seq data generated in this study have been deposited in the Gene Expression Omnibus database under accession code GSE174307. Three-dimensional protein structures (PDB ID 3UWP and 6NQA) were obtained from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB; https://www.rcsb.org)39. Consortium genomic information was obtained from cBioPortal (https://www.cbioportal.org)32, Cancer Cell Line Encyclopedia (CCLE; https://portals.broadinstitute.org/ccle)48, and dbSNP (https://www.ncbi.nlm.nih.gov/snp/)49 databases. Additional data that support the findings of this study are provided in the Supplementary information and Source data files. Source data are provided with this paper.
Code availability
The computational codes/tool packages used in this study are available at https://github.com/l0yang05/singleCell_CRISPR_10x (GitHub) and through other developers and vendors, including Genetic Perturbation Platform (Broad Institute)36, Seurat v3.2.324, Monocle v2.14.026, Gaussian kernel smoothing in R38, CLC Main Workbench version 8.1 (Qiagen), Bowtie2 v2.3.5.141, edgeR package 3.28.142, PSIPRED v3.3 server43, MultAlin v5.4.1 server44, I-TASSER server v5.145, ZDOCK v3.0.2 software47, PyMOL v1.8.6 software (Schrödinger, LLC), UCSF Chimera 1.1540, ImageJ 1.8.0_17250, PRALINE multiple sequence alignment51, FlowJo v9, and Attune NxT v3.1.2 (Thermo Fisher).
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Han Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Lu Yang, Anthony K. N. Chan, Kazuya Miyashita.
These authors jointly supervised this work: Scott A. Armstrong, Chun-Wei Chen.
Contributor Information
Scott A. Armstrong, Email: scott_armstrong@dfci.harvard.edu
Chun-Wei Chen, Email: cweichen@coh.org.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-24324-0.
References
- 1. Shalem O, Sanjana NE, Zhang F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 2015;16:299–311. doi: 10.1038/nrg3899.
- 2. Tsherniak A, et al. Defining a Cancer Dependency Map. Cell. 2017;170:564–576.e16. doi: 10.1016/j.cell.2017.06.010.
- 3. Shi J, et al. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat. Biotechnol. 2015;33:661–667. doi: 10.1038/nbt.3235.
- 4. Munoz DM, et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. 2016;6:900–913. doi: 10.1158/2159-8290.CD-16-0178.
- 5. Schoonenberg VAC, et al. CRISPRO: identification of functional protein coding sequences based on genome editing dense mutagenesis. Genome Biol. 2018;19:169. doi: 10.1186/s13059-018-1563-5.
- 6. He W, et al. De novo identification of essential protein domains from CRISPR-Cas9 tiling-sgRNA knockout screens. Nat. Commun. 2019;10:4541. doi: 10.1038/s41467-019-12489-8.
- 7. Ipsaro JJ, et al. Rapid generation of drug-resistance alleles at endogenous loci using CRISPR-Cas9 indel mutagenesis. PLoS ONE. 2017;12:e0172177. doi: 10.1371/journal.pone.0172177.
- 8. Adamson B, et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016;167:1867–1882.e21. doi: 10.1016/j.cell.2016.11.048.
- 9. Jaitin DA, et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell. 2016;167:1883–1896.e15. doi: 10.1016/j.cell.2016.11.039.
- 10. Datlinger P, et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods. 2017;14:297–301. doi: 10.1038/nmeth.4177.
- 11. Replogle JM, et al. Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat. Biotechnol. 2020;38:954–961. doi: 10.1038/s41587-020-0470-y.
- 12. Giladi A, et al. Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis. Nat. Cell Biol. 2018;20:836–846. doi: 10.1038/s41556-018-0121-4.
- 13. Gasperini M, et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390.e19. doi: 10.1016/j.cell.2018.11.029.
- 14. McFaline-Figueroa JL, et al. A pooled single-cell genetic screen identifies regulatory checkpoints in the continuum of the epithelial-to-mesenchymal transition. Nat. Genet. 2019;51:1389–1398. doi: 10.1038/s41588-019-0489-5.
- 15. Tian R, et al. CRISPR interference-based platform for multimodal genetic screens in human iPSC-derived neurons. Neuron. 2019;104:239–255.e12. doi: 10.1016/j.neuron.2019.07.014.
- 16. Norman TM, et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science. 2019;365:786–793. doi: 10.1126/science.aax4438.
- 17. Bernt KM, et al. MLL-rearranged leukemia is dependent on aberrant H3K79 methylation by DOT1L. Cancer Cell. 2011;20:66–78. doi: 10.1016/j.ccr.2011.06.010.
- 18. Chen CW, et al. DOT1L inhibits SIRT1-mediated epigenetic silencing to maintain leukemic gene expression in MLL-rearranged leukemia. Nat. Med. 2015;21:335–343. doi: 10.1038/nm.3832.
- 19. Chen CW, Armstrong SA. Targeting DOT1L and HOX gene expression in MLL-rearranged leukemia and beyond. Exp. Hematol. 2015;43:673–684. doi: 10.1016/j.exphem.2015.05.012.
- 20. Chan AKN, Chen CW. Rewiring the epigenetic networks in MLL-rearranged leukemias: epigenetic dysregulation and pharmacological interventions. Front. Cell Dev. Biol. 2019;7:81. doi: 10.3389/fcell.2019.00081.
- 21. Daigle SR, et al. Potent inhibition of DOT1L as treatment of MLL-fusion leukemia. Blood. 2013;122:1017–1025. doi: 10.1182/blood-2013-04-497644.
- 22. Stein EM, et al. The DOT1L inhibitor pinometostat reduces H3K79 methylation and has modest clinical activity in adult acute leukemia. Blood. 2018;131:2661–2669. doi: 10.1182/blood-2017-12-818948.
- 23. Krivtsov AV, et al. Transformation from committed progenitor to leukaemia stem cell initiated by MLL-AF9. Nature. 2006;442:818–822. doi: 10.1038/nature04980.
- 24. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096.
- 25. Min J, Feng Q, Li Z, Zhang Y, Xu RM. Structure of the catalytic domain of human DOT1L, a non-SET domain nucleosomal histone methyltransferase. Cell. 2003;112:711–723. doi: 10.1016/S0092-8674(03)00114-4.
- 26. Trapnell C, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.
- 27. Worden EJ, Hoffmann NA, Hicks CW, Wolberger C. Mechanism of cross-talk between H2B ubiquitination and H3 methylation by Dot1L. Cell. 2019;176:1490–1501.e12. doi: 10.1016/j.cell.2019.02.002.
- 28. Valencia-Sanchez MI, et al. Structural basis of Dot1L stimulation by histone H2B lysine 120 ubiquitination. Mol. Cell. 2019;74:1010–1019.e16. doi: 10.1016/j.molcel.2019.03.029.
- 29. Kuntimaddi A, et al. Degree of recruitment of DOT1L to MLL-AF9 defines level of H3K79 di- and tri-methylation on target genes and transformation potential. Cell Rep. 2015;11:808–820. doi: 10.1016/j.celrep.2015.04.004.
- 30. Deshpande AJ, et al. AF10 regulates progressive H3K79 methylation and HOX gene expression in diverse AML subtypes. Cancer Cell. 2014;26:896–908. doi: 10.1016/j.ccell.2014.10.009.
- 31. Zhang H, et al. Structural and functional analysis of the DOT1L-AF10 complex reveals mechanistic insights into MLL-AF10-associated leukemogenesis. Genes Dev. 2018;32:341–346. doi: 10.1101/gad.311639.118.
- 32. Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 2013;6:pl1. doi: 10.1126/scisignal.2004088.
- 33. Erb MA, et al. Transcription control by the ENL YEATS domain in acute leukaemia. Nature. 2017;543:270–274. doi: 10.1038/nature21688.
- 34. Park G, Gong Z, Chen J, Kim JE. Characterization of the DOT1L network: implications of diverse roles for DOT1L. Protein J. 2010;29:213–223. doi: 10.1007/s10930-010-9242-8.
- 35. Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods. 2014;11:783–784. doi: 10.1038/nmeth.3047.
- 36. Doench JG, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 2016;34:184–191. doi: 10.1038/nbt.3437.
- 37. Chen CW. Cell culture protocol for sc-Tiling: high-resolution characterization of gene function using single-cell CRISPR tiling. Protoc. Exch. 2021. doi: 10.21203/rs.3.pex-1544/v1.
- 38. Canver MC, et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature. 2015;527:192–197. doi: 10.1038/nature15521.
- 39. Burley SK, et al. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47:D464–D474. doi: 10.1093/nar/gky1004.
- 40. Pettersen EF, et al. UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.
- 41. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 42. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 43. Buchan DW, Minneci F, Nugent TC, Bryson K, Jones DT. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res. 2013;41:W349–W357. doi: 10.1093/nar/gkt381.
- 44. Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988;16:10881–10890. doi: 10.1093/nar/16.22.10881.
- 45. Yang J, et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- 46. Yu W, et al. Catalytic site remodelling of the DOT1L methyltransferase by selective inhibitors. Nat. Commun. 2012;3:1288. doi: 10.1038/ncomms2304.
- 47. Pierce BG, Hourai Y, Weng Z. Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS ONE. 2011;6:e24657. doi: 10.1371/journal.pone.0024657.
- 48. Ghandi M, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–508. doi: 10.1038/s41586-019-1186-3.
- 49. Sherry ST, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308.
- 50. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods. 2012;9:671–675. doi: 10.1038/nmeth.2089.
- 51. Simossis VA, Heringa J. PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res. 2005;33:W289–W294. doi: 10.1093/nar/gki390.