A scalable strategy for high-throughput GFP tagging of endogenous human proteins

Manuel D Leonetti; Sayaka Sekine; Daichi Kamiyama; Jonathan S Weissman; Bo Huang

doi:10.1073/pnas.1606731113

Proc Natl Acad Sci U S A. 2016 Jun 6;113(25):E3501–E3508. doi: 10.1073/pnas.1606731113


PMCID: PMC4922190 PMID: 27274053

Significance

The function of a large fraction of the human proteome still remains poorly characterized. Tagging proteins with a functional sequence is a powerful way to access function, and inserting tags at endogenous genomic loci allows the preservation of a near-native cellular background. To characterize the cellular role of human proteins in a systematic manner and in a native context, we developed a method for tagging endogenous human proteins with GFP that is both rapid and readily applicable at a genome-wide scale. Our approach allows studying both localization and interaction partners of the protein target. Our results pave the way for the large-scale generation of endogenously tagged human cell lines for a systematic functional interrogation of the human proteome.

Keywords: CRISPR/Cas9, GFP library, genome engineering

Abstract

A central challenge of the postgenomic era is to comprehensively characterize the cellular role of the ∼20,000 proteins encoded in the human genome. To systematically study protein function in a native cellular background, libraries of human cell lines expressing proteins tagged with a functional sequence at their endogenous loci would be very valuable. Here, using electroporation of Cas9 nuclease/single-guide RNA ribonucleoproteins and taking advantage of a split-GFP system, we describe a scalable method for the robust, scarless, and specific tagging of endogenous human genes with GFP. Our approach requires no molecular cloning and allows a large number of cell lines to be processed in parallel. We demonstrate the scalability of our method by targeting 48 human genes and show that the resulting GFP fluorescence correlates with protein expression levels. We next present how our protocols can be easily adapted for the tagging of a given target with GFP repeats, critically enabling the study of low-abundance proteins. Finally, we show that our GFP tagging approach allows the biochemical isolation of native protein complexes for proteomic studies. Taken together, our results pave the way for the large-scale generation of endogenously tagged human cell lines for the proteome-wide analysis of protein localization and interaction networks in a native cellular context.

More than a decade after the completion of the Human Genome Project (1), over 30% of human genes still lack clear functional annotation (2, 3). Functional tagging is a powerful strategy to characterize the cellular role of proteins. In particular, tags allow access to two key features of protein function: localization (using fluorescent tags) and interaction partners (using epitope tags and immunoprecipitation). Hence, by tagging proteins in a systematic manner, a comprehensive functional description of an organism’s proteome can be achieved. The power of systematic tagging approaches is best illustrated by studies conducted in the budding yeast Saccharomyces cerevisiae (4). In particular, a genome-wide collection of GFP-tagged yeast strains enabled the systematic study of protein localization in live cells (5), whereas libraries of strains expressing TAP epitope-fusion proteins paved the way for the large-scale isolation and proteomic analysis of protein complexes (6, 7). One of the great advantages of yeast genetics (especially in S. cerevisiae) is the efficiency and relative simplicity of PCR-based homologous recombination (8). As a result, functional tags can be easily inserted in a gene locus of interest, preserving endogenous expression levels and minimizing genomic disruption. Together, these genome-wide tagged libraries helped provide a comprehensive snapshot of the yeast protein landscape under near-native conditions (4, 5, 9–11).

The development of clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9)-based methods has profoundly transformed our ability to directly tag human genes at their endogenous loci by facilitating homology-directed repair (HDR) (12, 13). These methods pave the way for the construction of genome-wide, endogenously tagged libraries of human cells. Any large-scale effort should ideally meet four criteria: (i) scalability, to allow large numbers of genes to be tagged in a time- and cost-effective manner; (ii) specificity, limiting tag insertion to the genomic target (ideally in a “scarless” manner that avoids insertion of irrelevant DNA such as selection marker genes); (iii) versatility of the tag, preferably allowing both localization and proteomic analyses; and (iv) selectability of knockin cells. Recently, a strategy based on electroporation of Cas9/single-guide RNA (sgRNA) ribonucleoprotein complexes (RNPs) has been reported that enables both scalability and specificity (14, 15). In this approach, RNPs are assembled in vitro from purified sgRNA and Cas9, both of which can be obtained commercially or rapidly generated in house. The HDR template containing tag sequence and homology arms to the target locus is supplied as a long single-stranded DNA (ssDNA), commercially available up to 200 nt in length. Electroporation of RNP and ssDNA donor into cells results in very high (>30%) knockin efficiencies, whereas the limited RNP half-life in vivo minimizes off-target integration (14). We reasoned that this strategy would be well suited for large-scale knockin efforts in human cells and envisioned that GFP would be a functional tag of choice: on top of being a fluorescent marker, GFP is also a highly efficient purification handle for protein capture and subsequent proteomic analysis (16–18). GFP-tagged cells are also readily selectable by flow cytometry.

Here we present an experimental approach for the functional tagging of endogenous human loci that meets all four of the above criteria. We recently described how a split-GFP system allows functional GFP endogenous knockin using a minimal tagging sequence (GFP11, corresponding to the 11th β-strand of the superfolder GFP β-barrel structure) (19). When expressed in the same cell, GFP11 and its complementary GFP fragment (GFP1–10) enable functional GFP tagging upon complementation (20). A key advantage of the GFP11 sequence is its small size (16 aa): this allows commercial ssDNA oligomers to be used as HDR donors, circumventing any requirement for molecular cloning. Here we show that electroporation of Cas9 RNPs and GFP11 ssDNA donors in cells constitutively expressing GFP1–10 enables the fast (<1 d) and robust generation of GFP-tagged human cell lines. Tagged proteins are expressed from their endogenous genomic loci with minimal genomic disruption. Applying this strategy to a set of 48 human proteins, we demonstrate the scalability of our method and define the expression threshold for detection of knockin cells by flow cytometry. We next present how our protocols can be easily adapted to allow the knockin of GFP11 repeats at a given locus, which critically allows the functional characterization of low-abundance proteins in a native context. Finally, we describe how GFP11 tagging also enables the isolation of endogenous protein complexes for proteomic analysis, highlighting the versatility of our approach to examine complementary aspects of protein function.

Results

GFP11 and RNP Electroporation Enable Cloning-Free, High-Efficiency GFP Tagging in Human Cells.

Our approach combines two existing methodologies. First, we took advantage of a split-GFP system that separates the superfolder GFP protein into two fragments: GFP1–10 and GFP11 (20). GFP1–10 (i.e., GFP without the 11th β-strand) contains an immature GFP chromophore and is nonfluorescent by itself. Upon coexpression in the same cell, GFP1–10 and GFP11 assemble noncovalently and spontaneously reconstitute a functional GFP molecule (20, 21). Fused to a protein of interest, GFP11 recruits its GFP1–10 partner and enables fluorescent tagging by GFP complementation (Fig. 1A). The fluorescence intensity of the complemented GFP11/GFP1–10 complex is essentially identical to that of full-length GFP (19, 21). Second, we used electroporation of preassembled Cas9 RNPs to achieve high-efficiency genome editing in human cells (14, 15). In particular, very high rates of knockin have been reported using timed delivery of Cas9 RNPs and ssDNA HDR templates in human cell lines (14). A critical advantage of this strategy is that all of the components required for editing (Cas9, sgRNA, and HDR template) are commercially available or rapidly synthesized in house. Cas9 protein can be readily purified from Escherichia coli overexpression cultures (22). Similarly, sgRNAs can be easily transcribed in vitro (14, 23). Purified Cas9 and synthetic sgRNAs can also be obtained commercially. Finally, synthetic ssDNA oligomers are readily available, with a typical size limit of 200 nt. Here, the small size of GFP11 (16 aa) is key: 200 nt is enough to include the GFP11 sequence (57 nt, including a 3-aa linker) flanked by two ∼70-nt homology arms for HDR. Together, the GFP11 methodology and Cas9 RNP electroporation enable the high-efficiency fluorescent tagging of human proteins at their endogenous loci with minimal preparation. Importantly, no molecular cloning is required.
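
The length budget described above (a 16-aa tag plus a 3-aa linker, two ∼70-nt homology arms, and a 200-nt oligo limit) can be checked with a few lines of arithmetic. The following Python sketch is only an illustration; the function names are hypothetical and not part of the published protocol.

```python
CODON_NT = 3  # nucleotides per encoded amino acid


def donor_length(tag_aa, linker_aa, arm_nt):
    """Total ssDNA donor length: (tag + linker) coding sequence plus two homology arms."""
    insert_nt = (tag_aa + linker_aa) * CODON_NT
    return insert_nt + 2 * arm_nt


def max_arm_length(tag_aa, linker_aa, oligo_limit_nt=200):
    """Longest symmetric homology arm that still fits within the oligo size limit."""
    insert_nt = (tag_aa + linker_aa) * CODON_NT
    return (oligo_limit_nt - insert_nt) // 2


# GFP11 (16 aa) with a 3-aa linker and 70-nt arms, as stated in the text.
print(donor_length(16, 3, 70))   # 57 nt insert + 140 nt arms = 197
print(max_arm_length(16, 3))     # (200 - 57) // 2 = 71
```

The same calculation shows how quickly arm length shrinks as extra sequence is added to the cassette, which is why longer cassettes call for a different source of ssDNA template.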

Our experimental design is outlined in Fig. 1B. sgRNAs are transcribed in vitro following PCR assembly of a template including a T7 promoter. RNPs are obtained by mixing of sgRNAs with purified Cas9 protein and supplemented with HDR ssDNA donor. Finally, the RNP/donor mix (100 pmol each) is electroporated into cells that constitutively express the GFP1–10 fragment. For all experiments, we used a human 293T cell line in which the GFP1–10 fragment is stably expressed under the control of a strong spleen focus forming virus (SFFV) promoter by lentiviral integration (hereafter, 293T^GFP1–10). To test our strategy, we targeted the inner nuclear membrane protein lamin A/C in 293T^GFP1–10 cells using an N-terminal GFP11 tag. Flow cytometry analysis demonstrated very high efficiency of functional GFP tagging (>35%) (Fig. 1C). To verify that the GFP signal corresponds to GFP-tagged lamin A/C, we sorted the GFP⁺ cells (as a polyclonal population) and analyzed them by microscopy. All cells exhibited a clear GFP localization limited to the immediate perinuclear region (Fig. 1C). Low-magnification images are shown in Fig. S1, demonstrating a specific perinuclear localization of GFP-tagged lamin A/C across the entire cell population. These results demonstrate that functional tagging with GFP11 occurs essentially only on target, eliminating the need to obtain clonal cell lines.

Fig. S1. — Low-magnification microscopy analysis of GFP11-lamin A/C cells. FACS-sorted GFP11-lamin A/C knockin cells were stained with DAPI and analyzed by confocal microscopy. (Scale bars, 50 μm.)

Our protocol can be performed in less than a day (Fig. 1B). We use in-house in vitro transcription as a cost-effective alternative to synthetic sgRNAs, whereas using commercial synthetic sgRNAs could further shorten the time needed to conduct the experiments. We routinely use column-based methods for sgRNA purification, but solid-phase reversible immobilization (SPRI) magnetic beads can be used to the same effect and are best suited for large-scale preparation in a multiwell format (24). The final electroporation step is done in a 96-well format so that a large number of cell lines can be processed in parallel. Therefore, our method is well suited for the rapid and robust generation of libraries of GFP-tagged human cell lines in a multiwell format. Detailed protocols are available in Materials and Methods.

Library-Scale Generation of Knockin Cell Lines.

To test whether our experimental design was applicable to the library-scale generation of endogenously tagged human cell lines, we applied it to a set of 48 human genes in 293T^GFP1–10 cells. This experiment addresses two complementary questions. First, we wanted to evaluate whether most loci would be amenable to GFP11 knockin. Second, we sought to determine the threshold of endogenous protein expression that yields a sufficient level of GFP fluorescence for the detection of knockin cells by flow cytometry or microscopy.

We chose to tag proteins with distinctive subcellular localizations so that microscopy analysis of GFP⁺ cells would be a good predictor of on-target knockin. GFP11 was introduced at either N or C termini. For each protein target we tested a single sgRNA, selected to induce genomic cleavage within 30 nt of the chosen terminus. HDR donor templates were designed to disrupt the sgRNA recognition site to prevent further cleavage of knocked-in sequences by Cas9. Finally, we characterized the efficiency of GFP11 knockin by flow cytometry (Fig. 2A). Of the 48 genes we targeted, 30 (i.e., 63%) gave rise to a clear population of GFP⁺ cells. For each of these 30 successful targets, we analyzed the resulting cells by confocal microscopy and confirmed that GFP fluorescence matched exclusively the expected subcellular localization of the corresponding protein (Fig. 2A; complete data for all 30 cell lines are shown in Fig. S2). We further characterized four of these cell lines by FACS followed by immunofluorescence using antibodies specific to the target proteins. In all cases, GFP and immunofluorescence signals coincided entirely, validating the specificity of GFP11 knockin (Fig. S3). Altogether, this initial library-scale analysis demonstrates that our method is scalable for the specific endogenous GFP tagging of a large number of human genes.
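
The sgRNA selection rule above (cleavage within 30 nt of the tagged terminus) can be illustrated with a short Python sketch. It rests on assumptions the text does not state: a Streptococcus pyogenes Cas9 with an NGG PAM, a 20-nt protospacer, and a blunt cut 3 bp upstream of the PAM. It scans the forward strand only, and the function name and toy sequence are hypothetical.

```python
import re


def forward_cut_sites(seq, insert_pos, max_dist=30):
    """List forward-strand candidate sites as (protospacer, cut_index, distance).

    cut_index is the 0-based index of the first base 3' of the cut, taken here
    to lie 3 bp upstream of an NGG PAM (SpCas9 assumption, not from the text).
    insert_pos is the 0-based index of the first base after the tag insertion point.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        pam_start = m.start() + 20
        cut_index = pam_start - 3
        distance = abs(cut_index - insert_pos)
        if distance <= max_dist:
            hits.append((m.group(1), cut_index, distance))
    # Closest cut sites first.
    return sorted(hits, key=lambda h: h[2])


# Toy sequence for illustration only (not a real locus).
toy_locus = "A" * 30 + "TGG" + "A" * 30
for protospacer, cut_index, distance in forward_cut_sites(toy_locus, insert_pos=40):
    print(protospacer, cut_index, distance)
```

A practical design tool would also scan the reverse strand, score guide activity, and check that the donor disrupts the recognition site; none of that is attempted here.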

Fig. 2. — Library-scale GFP11 tagging of 48 different gene targets. (A) Examples of successful targets showing knockin efficiency (flow cytometry histograms, *Upper*) and confocal microscopy analysis (GFP fluorescence, *Lower*). (Scale bars, 10 μm.) As GFP intensity varies widely across different targets, the different images shown here use different levels of brightness and contrast. (B) Correlation between target expression level (defined as ribosome profiling RPKM) and GFP signal (as measured by flow cytometry, arbitrary units scaled to background fluorescence = 1). The 30 successful targets and 18 unsuccessful targets are shown as blue and brown dots, respectively. For successful targets, a linear regression is shown (solid line, Pearson’s R = 0.69). (*Inset*) Box plots showing RPKM distribution for unsuccessful vs. successful targets. Boxes represent 25th, 50th, and 75th percentiles. Whiskers represent minimum and maximum values. (C) Analysis of NUP35 GFP11 knockin by flow cytometry (*Upper*) and confocal microscopy (*Lower*). (Scale bar, 10 μm.) NUP35 knockin cells are not detected by flow cytometry but can be identified by microscopy.

Fig. S2. — Analysis of GFP⁺ targets by flow cytometry and fluorescence microscopy. All 30 gene targets successfully tagged by GFP11 are shown. For each target, both flow cytometry analysis (*Upper*) and representative confocal microscopy pictures (*Lower*) are shown. Expected subcellular localization of each target is indicated. ER, endoplasmic reticulum; PM, plasma membrane; cyto, cytosol; OMM, outer mitochondrial membrane; mito, mitochondria. (Scale bars, 10 μm.)

Fig. S3. — Comparison of GFP signal and protein localization by immunofluorescence. For the four knockin targets shown, GFP⁺ cells were FACS sorted, fixed, and stained with antibodies to the corresponding protein target. In each case, GFP fluorescence (*Top*) and immunostaining (*Middle*) are compared. (Scale bars, 5 μm.)

To test the robustness of our approach, we deliberately targeted proteins spanning a wide range of native expression levels. To correlate GFP fluorescence to protein abundance, we used a published ribosome profiling dataset from 293T cells as a reference for protein expression levels (25). Ribosome profiling is a high-throughput sequencing-based method that measures the density of ribosomes present on cellular mRNAs, thus providing a measure of protein synthesis rate (26). For each gene, ribosome density as measured by ribosome profiling is represented by a reads per kilobase of transcript per million mapped reads (RPKM) value. Because the abundance of a given protein is closely associated with the rate of its synthesis, RPKM data are a reasonable proxy for absolute protein expression levels (27). The relationship between flow cytometry GFP signal of knockin cells and RPKM level for all 48 proteins we tested is shown in Fig. 2B. GFP fluorescence intensity and predicted protein abundance for the 30 positive knockin lines are well correlated (Fig. 2B, blue dots), indicating that GFP11 expression reports on the native expression level of the target protein. To estimate a minimal expression level compatible with GFP detection by flow cytometry, we used a regression from our data (Fig. 2B, solid line) and found that an expression level of 27 RPKM would yield a GFP signal 2 SDs above background fluorescence (Fig. 2B, light blue line). In the ribosome profiling dataset, about 30% of proteins expressed in 293T cells are found above this 27 RPKM threshold (defining here a protein as expressed if its RPKM is nonzero). In other words, this qualitative analysis suggests that ∼30% of proteins in a given cell line have an expression level compatible with the detection of GFP11 knockin cells by flow cytometry.
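
The threshold estimate described above can be sketched as a two-step calculation: define a signal cutoff 2 SDs above background, then invert a regression of GFP signal on RPKM. The Python sketch below assumes the fit is done on log10-transformed values, which the text does not specify. All function names are hypothetical and all numbers are made-up toy values, not data from this study.

```python
import math
import statistics


def detection_cutoff(background_signal, n_sd=2.0):
    """Signal cutoff defined as background mean plus n_sd sample standard deviations."""
    return statistics.mean(background_signal) + n_sd * statistics.stdev(background_signal)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rpkm_threshold(rpkm, gfp, cutoff_signal):
    """RPKM at which the log-log regression line reaches cutoff_signal."""
    slope, intercept = fit_line([math.log10(r) for r in rpkm],
                                [math.log10(g) for g in gfp])
    return 10 ** ((math.log10(cutoff_signal) - intercept) / slope)


# Made-up toy values for illustration only.
toy_rpkm = [10.0, 100.0, 1000.0]
toy_gfp = [1.0, 10.0, 100.0]
toy_background = [0.9, 1.0, 1.1]

cutoff = detection_cutoff(toy_background)
print(rpkm_threshold(toy_rpkm, toy_gfp, cutoff))
```

With real measurements, the scatter around the regression line means that such a threshold is a rough guide rather than a sharp boundary.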

Low protein expression is likely the main determinant for the lack of GFP⁺ cells detected by flow cytometry in 37% of the genes we targeted. Comparing expression levels of the successful vs. unsuccessful sets of targets revealed that unsuccessful targets have significantly lower expression levels (median expression: 180 vs. 40 RPKM, respectively) (Fig. 2B, box plots). Therefore, the fluorescent signal for some of these failed targets might simply be below the detection limit of our flow cytometry assay. This is exemplified by NUP35 (Fig. 2C), a nuclear-pore complex protein of low expression level (43 RPKM). NUP35 GFP11-tagged cells scored negative by flow cytometry, but confocal microscopy analysis revealed cells exhibiting dim GFP fluorescence clearly restricted to foci on the nuclear membrane (Fig. 2C), indicative of specific NUP35 tagging. Fluorescent detection of NUP35 is facilitated by the fact that NUP35 concentrates in specific foci so that proteins of similar abundance but with a more diffuse localization pattern might be very hard to detect, even by microscopy. Altogether, our data show that relying on endogenous expression levels poses a particular challenge for the study of low-abundance proteins, which in fact make up the bulk of proteins in human cells.

A Scalable Strategy for the Knockin of GFP11 Repeats Enables Fluorescent Detection of Low-Expression Proteins.

Our results highlight the difficulty in studying proteins of low abundance while maintaining native expression levels. How can these two elements be reconciled? As we have previously shown (19), the GFP11 system offers an elegant solution: by tagging a protein with repeats of the GFP11 sequence, multiple GFP1–10 fragments can be recruited to the same polypeptide, thereby increasing the fluorescent signal of the target (Fig. 3A). Importantly, tagging with GFP11 repeats preserves native protein function. For example, the tandem arrangement of seven GFP11 sequences enabled us to readily track a single transport particle in primary cilia without affecting its motility (19).

Fig. 3. — Knockin of GFP11 repeats increases GFP fluorescence. (A) Principle of fluorescent tagging with GFP11 repeats. (B) Experimental workflow for ssDNA synthesis of HDR templates. See text for details. IVT, in vitro transcription; RT, reverse transcription. (C) Comparison of 1× GFP11 vs. 4× GFP11 knockin at the lamin A/C N terminus as analyzed by flow cytometry (*Left*) and confocal microscopy; GFP fluorescence (*Right*). (Scale bars, 10 μm.) Microscopy images were taken under identical exposure conditions and are shown using identical brightness and contrast settings, and can therefore be directly compared with one another.

We sought to develop an experimental strategy that would allow knockin of GFP11 repeats while preserving the scalability, specificity, and efficiency of our protocols. In particular, we reasoned that using a ssDNA form of HDR template would be advantageous because ssDNA donors have been shown to be more efficient and less prone to nonspecific integration than their double-stranded counterparts (14, 28). Because GFP11 repeats exceed the current size limitation for ssDNA synthesis, we exploited the availability of large synthetic double-stranded DNA fragments for the production of ssDNA templates by adapting a method originally described for the synthesis of imaging probes (29). Our strategy starts with a synthetic (commercial) dsDNA fragment containing a T7 promoter followed by a cassette of GFP11 repeats flanked by homology arms (Fig. 3B). T7 in vitro transcription followed by reverse transcription yields a DNA:RNA hybrid product. The RNA strand can be readily hydrolyzed at high pH to produce a corresponding ssDNA molecule (Fig. 3B). By using SPRI magnetic beads for all purification steps, these protocols can be carried out in multiwell format and in less than 8 h. Together with the wide availability of commercial resources for dsDNA synthesis, our method enables the fast and scalable production of ssDNA HDR templates irrespective of sequence length.

To evaluate this approach, we prepared a ssDNA template for the tagging of the lamin A/C N terminus with four repeats of GFP11 (including ∼300-nt homology arms flanking a 4× GFP11 tagging cassette of ∼250 nt). Flow cytometry analysis (Fig. 3C) revealed that the 4× GFP11 cassette was integrated with similar efficiency to the 1× GFP11 counterpart. In addition, 4× GFP11 tagging led to a corresponding fourfold increase in fluorescence intensity (Fig. 3C). This increase is also apparent in microscopy images taken using identical exposure levels (Fig. 3C, Right). This microscopy analysis also confirmed that GFP signal is limited to the inner nuclear membrane, confirming knockin specificity. Altogether, these results validate our experimental strategy for the scalable and high-efficiency tagging of endogenous loci with GFP11 repeats. By lowering the expression level required for detection, GFP11 repeats enable the study of low-abundance proteins in their native cellular context. These methods pave the way for the construction of GFP-tagged cell libraries covering a majority of the human proteome. For example, whereas the analysis above indicated that only about 30% of the proteome is accessible with a single GFP11 (RPKM > 27), about 60% of all expressed proteins could be detected with 4× GFP11 repeats (assuming a fourfold lower detection limit, i.e., RPKM > 6.8).
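
The coverage estimate above follows from scaling the single-tag threshold by the number of repeats and counting expressed proteins above the result. The Python sketch below makes that arithmetic explicit. It assumes, as the text does, that the detection limit drops in proportion to the repeat number; the function names are hypothetical and the RPKM list is a made-up toy example, not the ribosome profiling dataset.

```python
SINGLE_TAG_THRESHOLD_RPKM = 27.0  # 1x GFP11 detection threshold reported in the text


def detection_threshold(repeats, single_threshold=SINGLE_TAG_THRESHOLD_RPKM):
    """Threshold for n GFP11 repeats, assuming signal scales linearly with n."""
    return single_threshold / repeats


def fraction_detectable(rpkm_values, repeats):
    """Fraction of expressed proteins (RPKM > 0) whose RPKM exceeds the threshold."""
    expressed = [r for r in rpkm_values if r > 0]
    threshold = detection_threshold(repeats)
    return sum(r > threshold for r in expressed) / len(expressed)


print(detection_threshold(4))  # 27 / 4 = 6.75

# Made-up toy RPKM values for illustration only.
toy_rpkm = [0.0, 5.0, 10.0, 30.0, 100.0]
print(fraction_detectable(toy_rpkm, repeats=1))
print(fraction_detectable(toy_rpkm, repeats=4))
```

The 6.75 RPKM value matches the rounded 6.8 RPKM quoted in the text.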

Isolation of Native Protein Complexes from GFP11 Knockin Cells.

One of the great advantages of GFP is its versatility as both a fluorescent marker and a very effective handle for the immunopurification of native complexes (16). The use of anti-GFP pull-downs for the high-resolution mapping of protein interactions by mass spectrometry is illustrated by recent studies using human lines containing GFP-tagged genes expressed on bacterial artificial chromosomes (17, 18). Therefore, we envisioned that GFP11 endogenous knockin cell lines might be a valuable resource for the study of native protein–protein interactions in human cells.

We first confirmed that the noncovalent GFP11/GFP1–10 assembly can be efficiently captured by conventional anti-GFP reagents. We focused on four well-established multiprotein complexes: cohesin (30), the SEC61 translocon (31), clathrin (32) and the SPOTS sphingolipid synthesis complex (33). For each, we tagged a single subunit in 293T^GFP1–10 cells, FACS-sorted knockin cells, and prepared lysates that were incubated with a commercial anti-GFP nanobody resin. After extensive washing of the resin, we eluted bound proteins by denaturation in SDS buffer and analyzed protein complexes by Western blot. For all four complexes, we were able to recover the GFP11-tagged bait as well as its expected interaction partners (Fig. 4A). Because bound proteins can be directly digested on-beads and affinity capture is sufficient for quantitative mass spectrometry experiments (17, 18), our results demonstrate the utility of endogenous GFP11 knockin for the proteomic analysis of native protein complexes.

Fig. 4. — Isolation of native protein complexes in GFP11 knockin cells by GFP immunoprecipitation. (A) Western blot analysis following GFP immunoprecipitation. Four distinct protein complexes (cohesin, translocon, clathrin, and SPOTS) were studied. For each complex, a single subunit was tagged with GFP11 (“GFP11 bait,” marked by an asterisk in corresponding drawings). Proteins were captured on anti-GFP resin, washed extensively, and eluted in SDS buffer. Protein content was analyzed by SDS/PAGE and Western blot using protein-specific antibodies. Both GFP11 bait and expected interaction partners can be recovered. Numbers represent the migration of molecular weight markers (in kilodaltons). (B) Comparison of knockin efficiency of GFP11 vs. GFP11-TEV tag sequences at the SEC61B N terminus, as analyzed by flow cytometry. Corresponding ssDNA HDR templates are shown. (C) Recovery of purified SEC61 complex following on-resin TEV cleavage. Proteins were captured on anti-GFP resin, washed extensively, and eluted by incubation with TEV protease. Eluates were analyzed by SDS/PAGE and silver staining. SEC61 proteins are marked, as well as unidentified interaction partners (asterisk).

For applications in which the recovery of purified proteins is advantageous (e.g., activity assays or structural studies), we modified our tagging cassette to include a tobacco etch virus protease (TEV) site to allow the specific release of captured proteins by protease treatment. To pilot this approach, we tagged the SEC61B N terminus with GFP11 followed by a TEV recognition sequence. Because the TEV recognition sequence is short (7 aa), the GFP11-TEV cassette can be included on a 200-nt synthetic ssDNA oligo template (Fig. 4B). Knockin efficiencies of GFP11-TEV vs. GFP11 alone were comparable (18% and 28%, respectively) (Fig. 4B). We FACS sorted knockin cells, captured tagged proteins on anti-GFP beads, and eluted by treatment with TEV protease. Analysis of the eluate by SDS/PAGE and silver staining (Fig. 4C) showed the specific elution of the entire SEC61 complex (SEC61A, SEC61B, and SEC61G) (Fig. S4), together with unidentified interaction partners (Fig. 4C, asterisk). The comprehensive analysis of the SEC61 interactome is beyond the scope of the present study, but this pilot experiment demonstrates that our tagging method can be easily adapted to include protease recognition sites for the release of captured proteins. In particular, this purification strategy yields very pure material despite the low abundance of endogenous proteins: no background staining was detected in control samples using lysates from either the GFP1–10 parent cell line or a GFP11-SEC61B construct that does not include a TEV recognition sequence (Fig. 4C). These controls also demonstrate the high specificity of anti-GFP nanobody reagents for the capture of tagged proteins.

Fig. S4. — Identification of SEC61 subunits by Western blot (related to Fig. 4). To confirm the identity of proteins observed by silver staining in the GFP11-TEV-SEC61B eluate (lane 3), the eluate was analyzed by Western blot using subunit-specific antibodies. All images were aligned according to the position of molecular weight (MW) standards. Subunit identity was assigned based on matching position between silver stain and Western blot images.

Discussion

Altogether, our results establish GFP11 RNP knockin as a powerful strategy for the fast and efficient generation of endogenously tagged human cell lines. Our approach has several key advantages. First, contrary to designs that require the multistep preparation of HDR targeting vectors, all of the protocols we describe require no molecular cloning and can be carried out very rapidly and in large-scale format. Second, Cas9 RNP electroporation and ssDNA templates enable very high knockin efficiency while minimizing off-target cleavage or nonspecific tag integration (14). Third, the GFP11 system provides a simple solution for the study of low-abundance proteins because knockin of GFP11 repeats increases fluorescence signal. Fourth, GFP is a particularly versatile tool that enables the study of both protein localization and protein–protein interactions. Finally, the utility of endogenously tagged cell lines is evident, allowing the function of a protein to be characterized under the control of native regulators of gene expression and without disturbing endogenous interaction stoichiometry. In this respect, the small size of the GFP11 cassette is advantageous because its introduction into a locus of interest is relatively seamless, minimizing perturbation of the surrounding genomic structure. Together, the methods presented here provide scalability, specificity, versatility, and selectability and pave the way for the genome-scale construction of human cell lines tagged with GFP at endogenous loci. Interestingly, we recently described a split-sfCherry construct using a design similar to the GFP11 system (19). All our protocols can be directly adapted to other split fluorescent proteins, enabling the construction of multicolor tagged cell lines. Furthermore, other functional sequences can be coupled to GFP11 to tag proteins for various applications (e.g., protease sites for elution, or degron sequences for the specific control of protein expression) (34), and our results with 4× GFP11 knockin show that long tagging cassettes can be integrated with high efficiency. Lastly, GFP11-tagged cell lines could be a valuable resource for structural genomics efforts. Indeed, GFP tagging is a powerful tool to identify biochemically stable protein complexes by fluorescent size-exclusion chromatography (35) and also enables the recovery of high-purity material suitable for structural characterization (especially by cryoelectron microscopy, which does not require large amounts of material).

Our approach also has a few limitations that should be addressed. The main restriction is the requirement for GFP1–10 expression in the cell line of interest. Here we used lentiviral methods for the integration of a GFP1–10 expression cassette for practicability. A more controlled strategy would be to insert the GFP1–10 cassette in an established safe harbor locus, where insertion of exogenous sequences is known to preserve genomic integrity (36). Safe harbor integration can be easily achieved, for example at the human AAVS1 locus (36). The cytoplasmic form of GFP1–10 can only complement with GFP11 accessible from the cytoplasm or the nucleus. To address this restriction, we have previously demonstrated that adding localization signals to GFP1–10 enables the labeling of GFP11-tagged proteins in other cellular compartments, such as using endoplasmic reticulum–localized GFP1–10 to label endoplasmic reticulum lumen proteins and extracellular domains of transmembrane proteins (19). A last limitation of our approach is inherent to any effort of protein tagging. It is possible that, in a subset of proteins, introduction of GFP11 would disturb protein function (for example by changing protein structure or shielding an important interaction interface). We believe that the small size of GFP11 is beneficial in this respect, as it should not greatly affect the native folding of the target protein. Importantly, GFP11 can be introduced interchangeably at either the N or C terminus (or in any loop region) of a protein target, and it is likely that in cases where introducing the tag at one site is problematic, introducing it at another position would be well tolerated.

Finally, our strategy is also limited by any shortcomings of the CRISPR/Cas9 system. In particular, knockin efficiency depends critically on the activity of the sgRNA used for genomic cleavage. Different sgRNA sequences can vary widely in terms of potency, and prediction algorithms have been developed to overcome this issue (37, 38). However, because HDR knockin requires genomic cleavage close to the site of tag integration, for some genes the choice of usable sgRNAs might be scarce. Nevertheless, our results are very encouraging in this respect. In our 48-gene library-scale experiment, we only tested a single sgRNA for each gene and saw a high rate (63%) of successful tagging. Alternatively, tagging a given protein at another site in the protein sequence might allow more optimal genomic cleavage. A further limitation is that, because 100% knockin efficiency is not currently attainable, most targeted cells have only a single allele tagged. Moreover, because nonhomologous end joining is usually more prevalent than homology-directed repair following Cas9 cleavage (14), it is likely that in some cells the nontagged allele (or alleles, in polyploid cells) will contain indel mutations. We believe that, in most cases, this should not compromise the proper functional characterization of the target protein. In particular, working with polyclonal populations and using population averages helps mitigate the possible defects present in a small number of individual cells. Alternatively, single clones can be isolated to identify homozygous knockin cells. The very high knockin efficiencies that we report will significantly facilitate the successful isolation of homozygous clones.

Altogether, the results of our library-scale experiment highlight the applicability of GFP11 knockin for the tagging of a large fraction of the human proteome. We anticipate that low expression level of a target protein will be an obstacle to the detection and selection of a subset of GFP11-tagged cells. The tagging of genes with GFP11 repeats provides a direct solution to this drawback. Notably, tagging with GFP11 repeats is not substantially more challenging than tagging with a single GFP11 sequence. Our protocols for the production of long ssDNA templates are simple, fast (<1 d), and cloning free. Furthermore, the example of lamin A/C tagging (Fig. 3C) demonstrates that 1× GFP11 and 4× GFP11 cassettes are integrated with comparable efficiency. Therefore, tagging with GFP11 repeats should be preferred for proteins expected to be expressed at low levels. On the other hand, for a small subset of targets we could not detect GFP⁺ cells despite their high predicted expression (Fig. 2B), suggesting that expression level is not the sole determinant for successful tagging. In some cases, this lack of detectable tagging might indicate that the Cas9/sgRNA complex failed to access and cut the target genomic sequence (for example, we have recently shown that high nucleosome occupancy can impede Cas9 access to DNA) (39). As a solution, tagging could be achieved by using sgRNAs targeting alternative sites within the desired locus. In some other cases, the lack of GFP detection could originate from the lack of physical accessibility to the GFP11 tag for complementation with GFP1–10 (for example, if GFP11 is buried inside a structural pocket within the target protein). Then, introducing a longer linker between the target protein and the GFP11 tag would be beneficial.

Overall, we believe that the many advantages of GFP11 RNP knockin far outweigh its potential limitations, especially for studies requiring the tagging of many different genes in parallel given the speed and scalability of our protocols. In addition, our protocols will directly benefit from the continued and rapid optimization of CRISPR/Cas9-based methods. Altogether, the experimental approach described here directly paves the way for the generation of genome-wide libraries of human cells harboring GFP-tagged proteins at their endogenous loci. This opens tremendous opportunities for the comprehensive characterization of the human proteome in a native cellular context.

Materials and Methods

Nucleic Acid Reagents.

All synthetic nucleic acid reagents were purchased from Integrated DNA Technologies (IDT DNA). For knockin of a single GFP11 sequence, 200-mer HDR templates were ordered in ssDNA form (Ultramer oligos). For knockin of GFP11 repeats, the HDR template was ordered in dsDNA form (gBlock fragments) and processed to ssDNA as described below. The complete set of DNA sequences used for the experiments described here can be found in Dataset S1.

293T^GFP1–10 Generation and Cell Culture.

HEK 293T cells were cultured in high-glucose DMEM supplemented with 10% (vol/vol) FBS, 1 mM glutamine and 100 μg/mL penicillin/streptomycin (Gibco). 293T^GFP1–10 cells were generated by lentiviral integration from the vector pHR-SFFV-GFP1-10 described in ref. 19 and a clonal cell line was isolated and used for knockin experiments. Cells were maintained below 80% confluency.

sgRNA in Vitro Transcription.

sgRNAs were prepared following methods by Lin et al. (14) with some modifications. sgRNAs were obtained by in vitro transcription of a DNA template of the following sequence: 5′-TAA TAC GAC TCA CTA TAG GNN NNN NNN NNN NNN NNN NNG TTT AAG AGC TAT GCT GGA AAC AGC ATA GCA AGT TTA AAT AAG GCT AGT CCG TTA TCA ACT TGA AAA AGT GGC ACC GAG TCG GTG CTT TTT TT-3′ containing a T7 promoter (TAATACGACTCACTATAG), a gene-specific ∼20-nt sgRNA sequence starting with a G for optimal T7 transcription (GNNNNNNNNNNNNNNNNNNN), and a common sgRNA constant region. The DNA template was generated by overlapping PCR using a set of four primers: three primers common to all reactions (forward primer T25: 5′-TAA TAC GAC TCA CTA TAG-3′; reverse primer BS7: 5′-AAA AAA AGC ACC GAC TCG GTG C-3′ and reverse primer ML611: 5′-AAA AAA AGC ACC GAC TCG GTG CCA CTT TTT CAA GTT GAT AAC GGA CTA GCC TTA TTT AAA CTT GCT ATG CTG TTT CCA GCA TAG CTC TTA AAC-3′) and one gene-specific primer (forward primer 5′-TAA TAC GAC TCA CTA TAG GNN NNN NNN NNN NNN NNN NNG TTT AAG AGC TAT GCT GGA A-3′). For each template a 100-μL PCR was set up using iProof High-Fidelity Master Mix (Bio-Rad) reagents supplemented with 1 μM T25, 1 μM BS7, 20 nM ML611, and 20 nM gene-specific primer. The thermocycler setting consisted of: 95 °C for 30 s; 30 cycles of 95 °C for 15 s, 57 °C for 15 s, and 72 °C for 15 s; and 72 °C for 30 s. The PCR product was purified on DNA Clean and Concentrator-5 columns (Zymo Research) following the manufacturer’s instructions and eluted in 12 μL of RNase-free DNA buffer (2 mM Tris pH 8.0 in DEPC-treated H₂O). Next, a 100-μL in vitro transcription reaction was set up using 300 ng DNA template and 1,000 units of T7 RNA polymerase in buffer containing (in millimolar): 40 Tris pH 7.9, 20 MgCl₂, 5 DTT, 2 spermidine, and 2 each NTP (New England Biolabs). Following a 4-h incubation at 37 °C, the sgRNA product was purified on RNA Clean and Concentrator-5 columns (Zymo Research) and eluted in 15 μL of RNase-free RNA buffer (10 mM Tris pH 7.0 in DEPC-treated H₂O). sgRNA quality was routinely checked by running 3 pg of the purified sgRNA on a 10% polyacrylamide gel containing 7 M urea (Novex TBE-urea gels, ThermoFisher Scientific).
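The template layout described above (T7 promoter, gene-specific guide, constant region) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' tooling: the function names, the strict 20-nt length check, and the example guide are assumptions introduced here, whereas the T7 promoter, constant region, and 20-nt primer overlap are copied from the sequences given in the text.

```python
# Sequences copied from the Methods text (spaces removed).
T7_PROMOTER = "TAATACGACTCACTATAG"
SGRNA_CONSTANT = (
    "GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTG"
    "AAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
)
# The gene-specific primer ends with the first 20 nt of the constant region.
OVERLAP_NT = 20


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def design_sgrna_oligos(guide):
    """Build the IVT template and gene-specific forward primer for one guide.

    Assumption: the guide is exactly 20 nt (the text says ~20 nt) and
    already starts with G, as required for efficient T7 transcription.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A/C/G/T")
    if not guide.startswith("G"):
        raise ValueError("guide must start with G for T7 transcription")
    return {
        "template": T7_PROMOTER + guide + SGRNA_CONSTANT,
        "gene_specific_primer": T7_PROMOTER + guide + SGRNA_CONSTANT[:OVERLAP_NT],
    }


if __name__ == "__main__":
    # Hypothetical guide sequence, for illustration only.
    oligos = design_sgrna_oligos("GACGTTAGCCTAGGATCCAT")
    for name, seq in oligos.items():
        print(name, len(seq), seq)
```

As a consistency check on the listed primers, the reverse complement of the constant region begins with the BS7 sequence, which is why BS7 and ML611 can both serve as reverse primers in the overlapping PCR.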

RNP Assembly and Electroporation.

Cas9/sgRNA RNP complexes were prepared following methods by Lin et al. (14) with some modifications. Cas9 protein (pMJ915 construct, containing two nuclear localization sequences) was expressed in E. coli and purified by the University of California Berkeley Macrolab following protocols described by Jinek et al. (22). The 293T^GFP1–10 cells were treated with 200 ng/mL nocodazole (Sigma) for 15 h before electroporation to increase HDR efficiency as shown by Lin et al. (14). RNP complexes were assembled with 100 pmol Cas9 protein and 130 pmol sgRNA just before electroporation and combined with HDR template in a final volume of 10 μL. First, 130 pmol purified sgRNA was diluted to 6.5 μL in Cas9 buffer (final concentrations: 150 mM KCl, 20 mM Tris pH 7.5, 1 mM TCEP-HCl, 1 mM MgCl₂, 10% vol/vol glycerol) and incubated at 70 °C for 5 min. A total of 2.5 μL of Cas9 protein (40 μM stock in Cas9 buffer, i.e., 100 pmol) was then added and RNP assembly was carried out at 37 °C for 10 min. Finally, 1 μL of HDR template (100 μM stock in Cas9 buffer, i.e., 100 pmol) was added to this RNP solution. Electroporation was carried out in an Amaxa 96-well Shuttle Nucleofector device (Lonza) using SF-cell line reagents (Lonza) following the manufacturer’s instructions. Nocodazole-treated 293T^GFP1–10 cells were washed with PBS and resuspended to 10⁴ cells per microliter in SF solution immediately before electroporation. For each sample, 20 μL of cells (i.e., 2 × 10⁵ cells) was added to the 10 μL RNP/template mixture. Cells were immediately electroporated using the CM130 program and transferred to 1 mL supplemented DMEM in a 24-well plate. Electroporated cells were cultured for 5 d before analysis.
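The amounts and volumes in this assembly follow from simple unit arithmetic (1 μL of a 1 μM stock contains 1 pmol). The sketch below recomputes them for one sample; the function name and the sgRNA stock concentration argument are hypothetical, because the text does not state the concentration of the purified sgRNA.

```python
def rnp_mix(sgrna_stock_um):
    """Per-sample pipetting scheme for the RNP/template mixture.

    sgrna_stock_um is the sgRNA stock concentration in uM (an assumed
    input; all other numbers are taken from the Methods text).
    """
    sgrna_pmol = 130.0
    sgrna_ul = sgrna_pmol / sgrna_stock_um    # pmol / (pmol per uL)
    if sgrna_ul > 6.5:
        raise ValueError("sgRNA stock too dilute to fit in 6.5 uL")
    cas9_ul, cas9_stock_um = 2.5, 40.0        # 2.5 uL of 40 uM Cas9
    hdr_ul, hdr_stock_um = 1.0, 100.0         # 1 uL of 100 uM HDR template
    cells_ul, cells_per_ul = 20.0, 1e4        # 20 uL at 10^4 cells per uL
    return {
        "sgrna_ul": sgrna_ul,
        "cas9_buffer_ul": 6.5 - sgrna_ul,     # top up the sgRNA to 6.5 uL
        "cas9_pmol": cas9_ul * cas9_stock_um,
        "hdr_pmol": hdr_ul * hdr_stock_um,
        "sgrna_to_cas9": sgrna_pmol / (cas9_ul * cas9_stock_um),
        "total_ul": 6.5 + cas9_ul + hdr_ul,
        "cells": int(cells_ul * cells_per_ul),
    }


if __name__ == "__main__":
    # 50 uM is a made-up stock concentration used only as an example.
    for key, value in rnp_mix(50.0).items():
        print(key, value)
```

The sgRNA is in 1.3-fold molar excess over Cas9, and the three components add up to the stated 10 μL before the 20 μL of cells are added.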

Preparation of 4× GFP11-LMNA ssDNA Template.

The 4× GFP11-LMNA ssDNA template was prepared from a commercial dsDNA fragment (gBlock, IDT DNA) containing the template sequence preceded by a T7 promoter, adapting a strategy first described by Chen et al. (29). The dsDNA fragment was first amplified by PCR (forward primer ML888: 5′-AGC TGA TAA TAC GAC TCA CTA TAG GG-3′, reverse primer ML904: 5′-CGA CTT TCG CGC CAC TCA AGC-3′) using Kapa HiFi reagents (Kapa Biosystems) in a 100-μL reaction containing 0.25 μM each primer, 10 ng DNA template, and 0.3 mM dNTPs. Amplified dsDNA was purified using SPRI beads (AMPure XP resin, Beckman Coulter) at a 1:1 DNA:resin volume ratio (following manufacturer’s instructions) and eluted in 25 μL RNase-free H₂O. Next, RNA was formed by T7 in vitro transcription using T7 HiScribe reagents (New England Biolabs) in a 50-μL reaction containing: 5 pmol dsDNA template, 10 mM each NTP, and 5 μL HiScribe T7 polymerase. Following a 4-h incubation at 37 °C, the reaction was treated with 4 units TURBO DNase (ThermoFisher Scientific) and incubated for another 15 min at 37 °C. The RNA product was then purified using SPRI beads at a 1:1 RNA:resin volume ratio and eluted in 60 μL RNase-free H₂O. DNA:RNA hybrid was then synthesized by reverse transcription using Maxima H RT reagents (ThermoFisher Scientific). First, a 42-μL solution (in nuclease-free water) containing 500 pmol RNA template, 1 nmol ML904 primer, and 2.4 mM each dNTP was incubated for 5 min at 65 °C and transferred to ice for 5 min to allow for primer annealing. Then, 12 μL 5× Maxima buffer, 3 μL Maxima RT enzyme, and 3 μL SUPERase In RNase inhibitor were added and the RT reaction was carried out for 45 min at 50 °C. Finally, the RNA strand was hydrolyzed by the addition of 24 μL of NaOH solution (0.5 M NaOH + 0.25 M EDTA, in H₂O) followed by incubation at 95 °C for 10 min. The final ssDNA product was purified using SPRI beads at a 1:1.2 DNA:resin volume ratio and eluted in 15 μL H₂O.
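For readers adapting the reverse transcription and hydrolysis steps to other volumes, the final concentrations implied by the stated additions can be worked out as below. This is a sketch under one assumption that the text leaves ambiguous: the 2.4 mM dNTP figure is taken to refer to the 42-μL annealing mix. The function and key names are illustrative.

```python
def final_conc(stock_conc, stock_ul, total_ul):
    """Concentration after diluting stock_ul of a stock into total_ul."""
    return stock_conc * stock_ul / total_ul


ANNEAL_UL = 42.0                       # RNA + primer + dNTP mix
BUFFER_UL, RT_UL, INHIBITOR_UL = 12.0, 3.0, 3.0
NAOH_UL = 24.0                         # 0.5 M NaOH + 0.25 M EDTA


def rt_reaction_summary():
    """Final concentrations in the RT reaction and after NaOH addition."""
    rt_total = ANNEAL_UL + BUFFER_UL + RT_UL + INHIBITOR_UL
    hydrolysis_total = rt_total + NAOH_UL
    return {
        "rt_total_ul": rt_total,
        "buffer_x": final_conc(5.0, BUFFER_UL, rt_total),
        # Assumes 2.4 mM each dNTP in the 42-uL annealing mix.
        "dntp_mm_each": final_conc(2.4, ANNEAL_UL, rt_total),
        "rna_um": 500.0 / rt_total,          # 500 pmol RNA template
        "primer_um": 1000.0 / rt_total,      # 1 nmol ML904 primer
        "primer_to_rna": 1000.0 / 500.0,
        "naoh_m": final_conc(0.5, NAOH_UL, hydrolysis_total),
        "edta_m": final_conc(0.25, NAOH_UL, hydrolysis_total),
    }


if __name__ == "__main__":
    for key, value in rt_reaction_summary().items():
        print(key, round(value, 3))
```

The 12 μL of 5× buffer in a 60-μL reaction gives 1× buffer, and the primer is present at a twofold molar excess over the RNA template.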

Flow Cytometry and Analysis.

Analytical flow cytometry was carried out on an LSR II instrument (BD Biosciences), and cell sorting on a FACSAria II (BD Biosciences). Flow cytometry data analysis and figure preparation were done using FlowJo software. For the measurement of GFP signals in Fig. 3B, flow cytometry traces were fitted with two Gaussian functions (the first Gaussian corresponding to background fluorescence, the second Gaussian to specific GFP fluorescence). GFP signal was calculated as the difference: (average specific GFP fluorescence) − (average background fluorescence). The double Gaussian fit was particularly important to measure the GFP signal of low-expression proteins, for which background and specific GFP signals overlap substantially (e.g., the SPTLC1 target in Fig. 2A).
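The background-subtraction idea can be illustrated with a small standard-library Python sketch. Note the substitution: the text does not say how the two Gaussians were fitted, so the code below uses expectation-maximization on per-cell values (a two-component Gaussian mixture) rather than the authors' procedure. It also assumes the fluorescence values have been log-transformed so that each population is roughly Gaussian. All function names are illustrative.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """EM fit of a two-component 1D Gaussian mixture.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean, so the
    first entry is the background and the second the specific GFP signal.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mu = [xs[n // 4], xs[(3 * n) // 4]]      # start at the quartiles
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sd = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: update weight, mean, and sd of each component.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            rk = sum(r)
            if rk < 1e-9:
                continue
            w[k] = rk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / rk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / rk
            sd[k] = max(math.sqrt(var), 1e-6)
    order = sorted((0, 1), key=lambda k: mu[k])
    return [(w[k], mu[k], sd[k]) for k in order]


def gfp_signal(values):
    """(mean of specific GFP component) - (mean of background component)."""
    (_, bg_mean, _), (_, gfp_mean, _) = fit_two_gaussians(values)
    return gfp_mean - bg_mean


if __name__ == "__main__":
    # Synthetic log-fluorescence values, for illustration only.
    random.seed(1)
    cells = [random.gauss(2.0, 0.2) for _ in range(2000)]
    cells += [random.gauss(3.5, 0.3) for _ in range(1000)]
    print(gfp_signal(cells))
```

Fitting both populations at once, instead of gating, is what lets the estimate remain usable when the GFP-positive peak sits on the shoulder of the background peak.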

Protein Pull-Down.

For each sample, the cell pellet from a 15-cm plate culture was resuspended in 1.5 mL GFP buffer [150 mM K-acetate, 50 mM Hepes pH 6.8, 2 mM MgCl₂, 1 mM CaCl₂, 15% (vol/vol) glycerol] supplemented with 1.5% (wt/vol) digitonin (high purity, Merck Millipore) and protease inhibitors (cOmplete EDTA-free mixture, Roche), and incubated for 2 h at 4 °C, rotating. The lysate was then clarified by centrifugation (20,000 × g, 30 min, 4 °C) and the supernatant incubated with 8 μL anti-GFP resin slurry (GFP-Trap_A resin, ChromoTek) for 2 h at 4 °C, rotating. The resin was then washed three times with wash buffer (GFP buffer + 0.1% digitonin). For Western blot analysis, proteins were eluted by boiling the washed resin in SDS buffer [50 mM Tris pH 6.8, 2% (wt/vol) SDS, 1% β-ME, 6% glycerol; final concentrations]. For TEV elution, the washed resin was incubated with 0.5 μg of His₆-TEV protease (Sigma) overnight at 4 °C.

Primary Antibodies Used for Western Blot.

The primary antibodies used for Western blot were as follows: anti-SMC1 (ProMab 20426); anti-SMC3 (Abcam ab9263); anti-RAD21 (Abcam ab992); anti-SEC61B (Cell Signaling Technology D5Q1W); anti-SEC61A (Cell Signaling Technology D7Q6V); anti-SEC61G (Proteintech 11147–2-AP); anti-CLTA (X16, gift from Yvette Schollmeier, F. Brodsky Laboratory, University of California, San Francisco); anti-CLTC (Santa Cruz Biotechnology sc-12734); anti-SPTLC1 (BD Biosciences 611305); anti-SPTLC2 (ProSci 6305); and anti-ORMDL (Abcam ab128660). All antibodies were used at 1:1,000 dilution.

Imaging.

Cells were grown in 96-well glass bottom plates with no. 1.5 high performance cover glass (In Vitro Scientific) coated with fibronectin (Roche) for 48 h and then fixed with 4% paraformaldehyde (Electron Microscopy Sciences, cat. no. 15710-S) for 15 min at room temperature. The fixed cells were imaged on an inverted Nikon Ti-E microscope equipped with a Yokogawa CSU-22 confocal scanner unit, a Plan Fluor 10×/0.3 numerical aperture (N.A.) objective or a Plan Apo VC 60×/1.4 N.A. oil objective, an Andor EM-CCD camera (iXon DU897), and Micro-Manager software. All imaging experiments were performed at the University of California San Francisco Nikon Imaging Center. For the comparison of 1× GFP11-LMNA and 4× GFP11-LMNA in Fig. 3C, exactly the same excitation power, exposure time, and brightness and contrast were used. The brightness and contrast for other images were automatically set by ImageJ. For immunocytochemistry, mouse monoclonal anti-histone H2B (1:50; Abcam, ab52484) antibody, rabbit polyclonal antibodies anti-lamin A/C (1:20; Santa Cruz Biotechnology, H110), anti-cAMP protein kinase catalytic subunit (1:1,000; Abcam, ab26322), and anti-CBX/HP1β (1:100; Abcam, ab10478) were used. Anti-mouse or anti-rabbit donkey secondary antibodies (Jackson Immuno Research Laboratories) were conjugated with Alexa Fluor 647 or Cy5, respectively. The fixed cells were permeabilized with 0.1% Triton X-100 (Sigma), blocked with 5% BSA (Jackson Immuno Research Laboratories) in PBS, and stained with primary antibodies and secondary antibodies at 4 °C overnight.

Supplementary Material

Supplementary File

pnas.1606731113.sd01.xlsx (42.4 KB, xlsx)

Acknowledgments

We thank B. Staahl and S. Lin in the J. Doudna laboratory (University of California, Berkeley) for advice with ribonucleoprotein complex preparation; E. Crawford in the J. DeRisi laboratory (University of California, San Francisco) for advice with single-guide RNA purification; and A. Banfal in the J.S.W. laboratory for help with 4× GFP11-LMNA template preparation. M.D.L. is a fellow of the Jane Coffin Childs Memorial Fund for Medical Research. This work was supported by NIH R21MH101688 (to B.H. and D.K.); NIH Director’s New Innovator Award DP2OD008479 (to B.H.); the Howard Hughes Medical Institute (J.S.W.); and a Japan Society for the Promotion of Science Postdoctoral Fellowship for Overseas Researchers (to S.S.).

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1606731113/-/DCSupplemental.

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431(7011):931–945. doi: 10.1038/nature03001.
2. Hanson AD, Pribat A, Waller JC, de Crécy-Lagard V. ‘Unknown’ proteins and ‘orphan’ enzymes: The missing half of the engineering parts list--and how to find it. Biochem J. 2009;425(1):1–11. doi: 10.1042/BJ20091328.
3. Dey G, Jaimovich A, Collins SR, Seki A, Meyer T. Systematic discovery of human gene function and principles of modular organization through phylogenetic profiling. Cell Rep. 2015;10(6):993–1006. doi: 10.1016/j.celrep.2015.01.025.
4. Schuldiner M, Weissman JS. The contribution of systematic approaches to characterizing the proteins and functions of the endoplasmic reticulum. Cold Spring Harb Perspect Biol. 2013;5(3):a013284. doi: 10.1101/cshperspect.a013284.
5. Huh W-K, et al. Global analysis of protein localization in budding yeast. Nature. 2003;425(6959):686–691. doi: 10.1038/nature02026.
6. Gavin A-C, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–147. doi: 10.1038/415141a.
7. Krogan NJ, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440(7084):637–643. doi: 10.1038/nature04670.
8. Baudin A, Ozier-Kalogeropoulos O, Denouel A, Lacroute F, Cullin C. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res. 1993;21(14):3329–3330. doi: 10.1093/nar/21.14.3329.
9. Ghaemmaghami S, et al. Global analysis of protein expression in yeast. Nature. 2003;425(6959):737–741. doi: 10.1038/nature02046.
10. Chong YT, et al. Yeast proteome dynamics from single cell imaging and automated analysis. Cell. 2015;161(6):1413–1424. doi: 10.1016/j.cell.2015.04.051.
11. Breker M, Schuldiner M. The emergence of proteome-wide technologies: Systematic analysis of proteins comes of age. Nat Rev Mol Cell Biol. 2014;15(7):453–464. doi: 10.1038/nrm3821.
12. Cong L, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339(6121):819–823. doi: 10.1126/science.1231143.
13. Jinek M, et al. RNA-programmed genome editing in human cells. eLife. 2013;2:e00471. doi: 10.7554/eLife.00471.
14. Lin S, Staahl BT, Alla RK, Doudna JA. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife. 2014;3:e04766. doi: 10.7554/eLife.04766.
15. Kim S, Kim D, Cho SW, Kim J, Kim JS. Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. 2014;24(6):1012–1019. doi: 10.1101/gr.171322.113.
16. Cristea IM, Williams R, Chait BT, Rout MP. Fluorescent proteins as proteomic probes. Mol Cell Proteomics. 2005;4(12):1933–1941. doi: 10.1074/mcp.M500227-MCP200.
17. Hein MY, et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell. 2015;163(3):712–723. doi: 10.1016/j.cell.2015.09.053.
18. Hubner NC, et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J Cell Biol. 2010;189(4):739–754. doi: 10.1083/jcb.200911091.
19. Kamiyama D, et al. Versatile protein tagging in cells with split fluorescent protein. Nat Commun. 2016;7:11046. doi: 10.1038/ncomms11046.
20. Cabantous S, Terwilliger TC, Waldo GS. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat Biotechnol. 2005;23(1):102–107. doi: 10.1038/nbt1044.
21. Kent KP, Childs W, Boxer SG. Deconstructing green fluorescent protein. J Am Chem Soc. 2008;130(30):9664–9665. doi: 10.1021/ja803782x.
22. Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337(6096):816–821. doi: 10.1126/science.1225829.
23. Yang H, et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell. 2013;154(6):1370–1379. doi: 10.1016/j.cell.2013.08.022.
24. DeAngelis MM, Wang DG, Hawkins TL. Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Res. 1995;23(22):4742–4743. doi: 10.1093/nar/23.22.4742.
25. Jan CH, Williams CC, Weissman JS. Principles of ER cotranslational translocation revealed by proximity-specific ribosome profiling. Science. 2014;346(6210):1257521. doi: 10.1126/science.1257521.
26. Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc. 2012;7(8):1534–1550. doi: 10.1038/nprot.2012.086.
27. Li G-W, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157(3):624–635. doi: 10.1016/j.cell.2014.02.033.
28. Chen F, et al. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat Methods. 2011;8(9):753–755. doi: 10.1038/nmeth.1653.
29. Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348(6233):aaa6090. doi: 10.1126/science.aaa6090.
30. Peters J-M, Tedeschi A, Schmitz J. The cohesin complex and its roles in chromosome biology. Genes Dev. 2008;22(22):3089–3114. doi: 10.1101/gad.1724308.
31. Park E, Rapoport TA. Mechanisms of Sec61/SecY-mediated protein translocation across membranes. Annu Rev Biophys. 2012;41:21–40. doi: 10.1146/annurev-biophys-050511-102312.
32. Brodsky FM. Diversity of clathrin function: New tricks for an old protein. Annu Rev Cell Dev Biol. 2012;28:309–336. doi: 10.1146/annurev-cellbio-101011-155716.
33. Breslow DK, Weissman JS. Membranes in balance: Mechanisms of sphingolipid homeostasis. Mol Cell. 2010;40(2):267–279. doi: 10.1016/j.molcel.2010.10.005.
34. Nishimura K, Fukagawa T, Takisawa H, Kakimoto T, Kanemaki M. An auxin-based degron system for the rapid depletion of proteins in nonplant cells. Nat Methods. 2009;6(12):917–922. doi: 10.1038/nmeth.1401.
35. Kawate T, Gouaux E. Fluorescence-detection size-exclusion chromatography for precrystallization screening of integral membrane proteins. Structure. 2006;14(4):673–681. doi: 10.1016/j.str.2006.01.013.
36. Sadelain M, Papapetrou EP, Bushman FD. Safe harbours for the integration of new DNA in the human genome. Nat Rev Cancer. 2011;12(1):51–58. doi: 10.1038/nrc3179.
37. Doench JG, et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol. 2014;32(12):1262–1267. doi: 10.1038/nbt.3026.
38. Moreno-Mateos MA, et al. CRISPRscan: Designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods. 2015;12(10):982–988. doi: 10.1038/nmeth.3543.
39. Horlbeck MA, et al. Nucleosomes impede Cas9 access to DNA in vivo and in vitro. eLife. 2016;5:e12677. doi: 10.7554/eLife.12677.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File

pnas.1606731113.sd01.xlsx^{(42.4KB, xlsx)}

[r1] 1.International Human Genome Sequencing Consortium Finishing the euchromatic sequence of the human genome. Nature. 2004;431(7011):931–945. doi: 10.1038/nature03001. [DOI] [PubMed] [Google Scholar]

[r2] 2.Hanson AD, Pribat A, Waller JC, de Crécy-Lagard V. ‘Unknown’ proteins and ‘orphan’ enzymes: The missing half of the engineering parts list--and how to find it. Biochem J. 2009;425(1):1–11. doi: 10.1042/BJ20091328. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r3] 3.Dey G, Jaimovich A, Collins SR, Seki A, Meyer T. Systematic discovery of human gene function and principles of modular organization through phylogenetic profiling. Cell Rep. 2015;10(6):993–1006. doi: 10.1016/j.celrep.2015.01.025. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r4] 4.Schuldiner M, Weissman JS. The contribution of systematic approaches to characterizing the proteins and functions of the endoplasmic reticulum. Cold Spring Harb Perspect Biol. 2013;5(3):a013284. doi: 10.1101/cshperspect.a013284. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r5] 5.Huh W-K, et al. Global analysis of protein localization in budding yeast. Nature. 2003;425(6959):686–691. doi: 10.1038/nature02026. [DOI] [PubMed] [Google Scholar]

[r6] 6.Gavin A-C, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–147. doi: 10.1038/415141a. [DOI] [PubMed] [Google Scholar]

[r7] 7.Krogan NJ, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440(7084):637–643. doi: 10.1038/nature04670. [DOI] [PubMed] [Google Scholar]

[r8] 8.Baudin A, Ozier-Kalogeropoulos O, Denouel A, Lacroute F, Cullin C. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res. 1993;21(14):3329–3330. doi: 10.1093/nar/21.14.3329. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r9] 9.Ghaemmaghami S, et al. Global analysis of protein expression in yeast. Nature. 2003;425(6959):737–741. doi: 10.1038/nature02046. [DOI] [PubMed] [Google Scholar]

[r10] 10.Chong YT, et al. Yeast proteome dynamics from single cell imaging and automated analysis. Cell. 2015;161(6):1413–1424. doi: 10.1016/j.cell.2015.04.051. [DOI] [PubMed] [Google Scholar]

[r11] 11.Breker M, Schuldiner M. The emergence of proteome-wide technologies: Systematic analysis of proteins comes of age. Nat Rev Mol Cell Biol. 2014;15(7):453–464. doi: 10.1038/nrm3821. [DOI] [PubMed] [Google Scholar]

[r12] 12.Cong L, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339(6121):819–823. doi: 10.1126/science.1231143. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r13] 13.Jinek M, et al. RNA-programmed genome editing in human cells. eLife. 2013;2:e00471. doi: 10.7554/eLife.00471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r14] 14.Lin S, Staahl BT, Alla RK, Doudna JA. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife. 2014;3:e04766. doi: 10.7554/eLife.04766. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r15] 15.Kim S, Kim D, Cho SW, Kim J, Kim JS. Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. 2014;24(6):1012–1019. doi: 10.1101/gr.171322.113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r16] 16.Cristea IM, Williams R, Chait BT, Rout MP. Fluorescent proteins as proteomic probes. Mol Cell Proteomics. 2005;4(12):1933–1941. doi: 10.1074/mcp.M500227-MCP200. [DOI] [PubMed] [Google Scholar]

[r17] 17.Hein MY, et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell. 2015;163(3):712–723. doi: 10.1016/j.cell.2015.09.053. [DOI] [PubMed] [Google Scholar]

[r18] 18.Hubner NC, et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J Cell Biol. 2010;189(4):739–754. doi: 10.1083/jcb.200911091. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r19] 19.Kamiyama D, et al. Versatile protein tagging in cells with split fluorescent protein. Nat Commun. 2016;7:11046. doi: 10.1038/ncomms11046. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r20] 20.Cabantous S, Terwilliger TC, Waldo GS. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat Biotechnol. 2005;23(1):102–107. doi: 10.1038/nbt1044. [DOI] [PubMed] [Google Scholar]

[r21] 21.Kent KP, Childs W, Boxer SG. Deconstructing green fluorescent protein. J Am Chem Soc. 2008;130(30):9664–9665. doi: 10.1021/ja803782x. [DOI] [PubMed] [Google Scholar]

[r22] 22.Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337(6096):816–821. doi: 10.1126/science.1225829. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r23] 23.Yang H, et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell. 2013;154(6):1370–1379. doi: 10.1016/j.cell.2013.08.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r24] 24. DeAngelis MM, Wang DG, Hawkins TL. Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Res. 1995;23(22):4742–4743. doi: 10.1093/nar/23.22.4742.

[r25] 25. Jan CH, Williams CC, Weissman JS. Principles of ER cotranslational translocation revealed by proximity-specific ribosome profiling. Science. 2014;346(6210):1257521. doi: 10.1126/science.1257521.

[r26] 26. Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc. 2012;7(8):1534–1550. doi: 10.1038/nprot.2012.086.

[r27] 27. Li G-W, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157(3):624–635. doi: 10.1016/j.cell.2014.02.033.

[r28] 28. Chen F, et al. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat Methods. 2011;8(9):753–755. doi: 10.1038/nmeth.1653.

[r29] 29. Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348(6233):aaa6090. doi: 10.1126/science.aaa6090.

[r30] 30. Peters J-M, Tedeschi A, Schmitz J. The cohesin complex and its roles in chromosome biology. Genes Dev. 2008;22(22):3089–3114. doi: 10.1101/gad.1724308.

[r31] 31. Park E, Rapoport TA. Mechanisms of Sec61/SecY-mediated protein translocation across membranes. Annu Rev Biophys. 2012;41:21–40. doi: 10.1146/annurev-biophys-050511-102312.

[r32] 32. Brodsky FM. Diversity of clathrin function: New tricks for an old protein. Annu Rev Cell Dev Biol. 2012;28:309–336. doi: 10.1146/annurev-cellbio-101011-155716.

[r33] 33. Breslow DK, Weissman JS. Membranes in balance: Mechanisms of sphingolipid homeostasis. Mol Cell. 2010;40(2):267–279. doi: 10.1016/j.molcel.2010.10.005.

[r34] 34. Nishimura K, Fukagawa T, Takisawa H, Kakimoto T, Kanemaki M. An auxin-based degron system for the rapid depletion of proteins in nonplant cells. Nat Methods. 2009;6(12):917–922. doi: 10.1038/nmeth.1401.

[r35] 35. Kawate T, Gouaux E. Fluorescence-detection size-exclusion chromatography for precrystallization screening of integral membrane proteins. Structure. 2006;14(4):673–681. doi: 10.1016/j.str.2006.01.013.

[r36] 36. Sadelain M, Papapetrou EP, Bushman FD. Safe harbours for the integration of new DNA in the human genome. Nat Rev Cancer. 2011;12(1):51–58. doi: 10.1038/nrc3179.

[r37] 37. Doench JG, et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol. 2014;32(12):1262–1267. doi: 10.1038/nbt.3026.

[r38] 38. Moreno-Mateos MA, et al. CRISPRscan: Designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods. 2015;12(10):982–988. doi: 10.1038/nmeth.3543.

[r39] 39. Horlbeck MA, et al. Nucleosomes impede Cas9 access to DNA in vivo and in vitro. eLife. 2016;5:e12677. doi: 10.7554/eLife.12677.
