Human Gene Therapy. 2019 Jul 16;30(7):814–828. doi: 10.1089/hum.2018.169

New Human Chromosomal Sites with “Safe Harbor” Potential for Targeted Transgene Insertion

Stefan Pellenz 1,†, Michael Phelps 1, Weiliang Tang 1, Blake T Hovde 2,§, Ryan B Sinit 1, Wenqing Fu 2,**, Hui Li 1,††, Eleanor Chen 1, Raymond J Monnat Jr 1,2,*
PMCID: PMC6648220  PMID: 30793977

Abstract

This study identified 35 new sites for targeted transgene insertion that have the potential to serve as new human genomic “safe harbor” sites (SHS). SHS potential for these 35 sites, located on 16 chromosomes, including both arms of the human X chromosome, and for the existing human SHS AAVS1, hROSA26, and CCR5 was assessed using eight different desirable, widely accepted criteria for SHS verifiable with human genomic data. Three representative newly identified sites on human chromosomes 2 and 4 were then experimentally validated by in vitro and in vivo cleavage-sensitivity tests, and analyzed for population-level and cell line–specific sequence variants that might confound site targeting. The highly ranked site on chromosome 4 (SHS231) was further characterized by targeted homology-dependent and -independent transgene insertion and expression in different human cell lines. The structure and fidelity of transgene insertions at this site were confirmed, together with analyses that demonstrated stable expression and function of transgene-encoded proteins, including fluorescent protein markers, selectable marker cassettes, and Cas9 protein variants. SHS-integrated transgene-encoded Cas9 proteins were shown to be capable of introducing a large (17 kb) gRNA-specified deletion in the PAX3/FOXO1 fusion oncogene in human rhabdomyosarcoma cells and as a Cas9–VPR fusion protein to upregulate expression of the muscle-specific transcription factor MYF5 in human rhabdomyosarcoma cells. An engineering “toolkit” was developed to enable easy use of the most extensively characterized of these new human sites, SHS231, located on the proximal long arm of chromosome 4. The target sites identified here have the potential to serve as additional human SHS to enable basic and clinical gene editing and genome-engineering applications.

Keywords: human genome, genome editing, transgene insertion site, “safe harbor” site, Cas9 nucleases, gene therapy

Introduction

Many human genome engineering applications require the introduction and stable integration of transgenes into host cells. For applications that do not require precise targeting of an existing gene or locus (e.g., to introduce or modify an endogenous gene, allele, or regulatory element), a common strategy is to target transgene integration to one of a small number of chromosomal “safe harbor” sites (SHS) for expression, presumably without disrupting the expression of adjacent or more distant genes. These putative SHS play an increasingly important role in developing effective gene therapies; in the investigation of gene structure, function, and regulation; and in cell-based biotechnology.

The most widely used of the putative human SHS, the AAVS1 site on chromosome 19q, was initially identified as a site for recurrent adeno-associated virus insertion.1 Other potential SHS have been identified on the basis of homology, with sites first identified in other species (e.g., the human homolog of the permissive murine Rosa26 locus2) or among the growing number of human genes that appear non-essential under some circumstances.3,4 One putative SHS of this type is the CCR5 chemokine receptor gene, which, when disrupted, confers resistance to human immunodeficiency virus infection.5 Additional potential genomic SHS have been identified in human and other cell types on the basis of viral integration site mapping6–8 or gene-trap analyses, as was the original murine Rosa26 locus.9

The nature of human SHS identified to date, together with a set of desirable general properties for any SHS, have defined the criteria used to assess the SHS potential of additional sites in the human genome. The first systematic list of SHS criteria grew from early gene therapy trials using viral vectors, most notably for the hemoglobinopathies.8,10 These included plausible criteria from first principles, for example location outside of transcriptional units and ultra-conserved regions, and 50–300 kb away from the 5′ ends of genes, cancer-related genes, and micro RNAs.8,10 This list was subsequently expanded to include additional, less well-defined criteria such as the exclusion of cell type or lineage-specific essential genes and regulatory RNAs (e.g., long non-coding RNAs) and of cell type–specific, topologically defined nuclear domains (TADs) that have been associated with cancer genes. Chromatin epigenetic profiles (e.g., of a combination of H3K27 methylation and acetylation marks) have also been used to signal the potential for both high efficiency targeting and persistent transgene expression.11 All of these criteria depend heavily upon context: cell type and lineage, tissue specificity of gene expression,12,13 and intended application. These considerations identify additional criteria by which to assess potential SHS for use as part of gene editing or engineering applications.11

In order to expand the number of potentially useful human SHS, the human genome was searched for regions containing target sites for three classes of genome-editing nuclease in close proximity. The 35 sites identified in this way were then assessed for SHS potential using eight different genomic criteria in parallel with the existing human AAVS1, hROSA26, and CCR5 sites. Several potential new SHS were experimentally characterized to demonstrate functional competence for efficient, targeted transgene insertion and expression in different human cell types. These 35 potential new human SHS, located on 16 different human chromosomes and 23 chromosome arms, including both arms of the human X chromosome, provide an expanded list of potential human SHS for targeted transgene insertion to enable basic science as well as clinical applications.

Methods

Cell lines/cell culture

Human 293T cells or derivatives and four human rhabdomyosarcoma (RMS) cell lines derived from unrelated patients were used for experiments. All five lines were cultured in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal bovine serum (Hyclone; GE Healthcare/ Biosciences, Pittsburgh, PA), 2 mM L-glutamine, and antibiotics (1% pen-strep, Gibco; Thermo Fisher Scientific, Waltham, MA) in a 5% CO2 humidified 37°C incubator. Human 293T-REX cells, a derivative of the parent 293T cell line (ATCC cell line CRL-3216), were grown in accordance with the supplier's instructions (Invitrogen/Thermo Fisher Scientific). The human RMS cancer cell lines RD, Rh5, Rh30, and SMS-CTR have been described previously14 and were obtained from the laboratories of Dr. Corinne Linardic (Duke University School of Medicine, Durham, NC) and ATCC (Manassas, VA). Cells were tested periodically for Mycoplasma infection, and cell-line identification and authentication was performed by DNA fingerprinting or, in the case of RMS lines, by short tandem repeat profiling in the Dana Farber Cancer Institute Molecular Diagnostic Laboratory.

Potential new SHS identification

In order to structure the search for potential new human SHS, the study looked first for high-quality matches in the human genome to the target site of the rare-cutting LAGLIDADG family homing endonuclease mCreI. Homing endonucleases such as mCreI, the monomerized form of I-CreI,15 have long (≥20 bp), highly sequence-specific target sites, and, by extension, a correspondingly small number of cleavage sites in the human genome. The mCreI homing endonuclease was chosen for this analysis in part on the basis of extensive prior analyses of the effect of all target-site base-pair variants upon cleavage efficiency.16–18 These data provided a well-defined starting point to search the human genome for cleavage sites in regions that could be further assessed for “safe harbor” potential.19 It was reasoned that any high-quality human target-site match for mCreI would, if it met other criteria as a potential SHS, also contain one or more adjacent cleavage sites for editing by the Cas9- and TALEN-based editing nucleases that have less stringent targeting requirements than does mCreI.20,21 Once identified, the chromosomal region adjacent to high-quality mCreI target sites could then be assessed using predefined criteria for site safety, likely functional competence, and the presence of potentially confounding sequence variation.

This site search was initiated by using detailed information on the cleavage specificity of mCreI to construct a position weight matrix that consisted of 128 mCreI target-site variants that were all predicted to be cleaved with ≥90% of the efficiency of the native mCreI site16–18,22 (Fig. 1A and B). A set of these 128 target sites in FASTA-format was then used to drive a BLAST search to identify corresponding high-quality matches in the reference human genome sequence (GRCh37/hg19) using the following BLAST parameters: optimize for “highly similar sequences (megablast)”; max target seqs = 50; short queries: “adjust for short sequences”; expect threshold = 1; word size = 7; match/mismatch: 4, −5; and gap cost: existence = 12/extension = 8. All of the resulting matches of ≥95% identity (19/20 or 20/20 bp matches vs. the mCreI variant target-site query library) were then evaluated for SHS potential, as described below.
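As a rough illustration of what the ≥95% identity filter accepts, the sketch below scans a sequence on both strands for 20 bp windows within one mismatch of any query variant. It is a brute-force stand-in for the BLAST search described above, not a reimplementation of it. The function names and the toy sequence are illustrative assumptions; the single query shown is the SHS229 sequence from Table 2, which is reported to be a perfect match to one of the query variants.

```python
# Brute-force sketch (assumption: NOT the BLAST search used in the study).
# Reports 20 bp windows within `max_mismatches` of any query, on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(sequence, queries, max_mismatches=1):
    """Return (start, strand, mismatch_count) for each qualifying window."""
    sequence = sequence.upper()
    size = len(queries[0])
    hits = []
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        for strand, probe in (("+", window), ("-", revcomp(window))):
            best = min(mismatches(probe, q) for q in queries)
            if best <= max_mismatches:
                hits.append((start, strand, best))
    return hits


# Hypothetical one-entry query library and toy "genome" for illustration.
queries = ["AAAATGTCATGCGACATTTT"]
toy_sequence = "GGC" + "AAAATGTCATGCGACATTTT" + "TTA"
print(find_sites(toy_sequence, queries))
```

A 19/20 match corresponds to a mismatch count of 1 and a 20/20 match to a count of 0, which is the identity cutoff stated in the text.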

Figure 1.


Identification and mapping of new human “safe harbor” sites (SHS). (A) The canonical mCreI homing endonuclease cleavage site is shown at the top, with twofold symmetric base-pair positions shaded. Shown below the target site is a position weight matrix (PWM; also often referred to as a position-specific scoring matrix [PSSM]) that summarizes biochemical data on the functional consequences of each possible base substitution on cleavage sensitivity at each mCreI target site position, scaled so that a value of 1 = native site cleavage sensitivity and values <0.3 indicate cleavage resistance. Base pairs highlighted in yellow indicate either the canonical base pair at that position or a highly cleavable base-pair substitution. (B) Work flow that uses these PWM/PSSM data to initiate the search for and evaluation of predicted highly cleavage-sensitive mCreI target sites in the human genome. (C) Physical confirmation and functional verification of two new unique SHS located on chromosomes 2p (SHS229) and 4q (SHS231). A third highly ranked SHS (SHS253) was identified at six locations on the short arms of chromosomes 2, 5, and X and on the long arms of chromosomes 7, 14, and 17. Asterisks indicate sites where base-pair variants have been identified in the mCreI target site in human population genetic data (see text for additional details).

Potential new SHS identified by the BLAST search above were then evaluated by applying eight SHS criteria (Table 1), as were the existing human SHS AAVS1, hROSA26, and CCR5. The Table 1 site-scoring criteria represent a combination of previously suggested desirable general properties for human SHS,8,23,24 together with genomic data for the 600 kb region centered on each potential SHS.25,26 These eight criteria were grouped into three categories, reflecting the presence of adjacent, potentially confounding genetic or regulatory elements or sequence/copy number variation, and the potential for targeted editing and transgene expression. All criteria were given equal weighting in assessing SHS potential, though the list of criteria can be easily extended, modified, or given different weighting to better reflect planned target-site use.11

Table 1.

Safe harbor site criteria and human genomic data sources used to assess each criterion

Category | Potential SHS criterion | UCSC browser track source
Safety | 1. >300 kb from any cancer-related gene on allOncogenes list | Genes and gene predictions: UCSC genes
Safety | 2. >300 kb from any miRNA/other functional small RNAs | Genes and gene predictions: sno/miRNA
Safety | 3. >50 kb from any 5′ gene end | Genes and gene predictions: RefSeq genes
Functional silence | 4. >50 kb away from any replication origin | Regulation: UW Repli-seq: Peaks
Functional silence | 5. >50 kb away from any ultra-conserved element | Regulation: VISTA enhancers
Functional silence | 6. Low transcriptional activity (no mRNA ±25 kb) | mRNA and EST: human mRNAs
Structure accessibility | 7. Not in copy number variable region | Repeats: segmental dups
Structure accessibility | 8. In open chromatin (DHS signal ±1 kb) | Regulation: ENC DNase/FAIRE: uniform DNaseI HS
Structure accessibility | 9. Unique (one copy in human genome) | BLAST search output (see Table 2)

Individual criteria were assessed by opening each site in the UCSC Genome Browser using site coordinates listed in Table 2 and then examining the 600 kb region surrounding the site-anchoring mCreI target site to determine the presence or absence of the indicated criterion. For example, the 300 kb flanking each side of the SHS323 target site was examined in the UCSC Genome Browser for the presence of any gene or gene prediction, as indicated by the UCSC Genes Browser track. The same procedure was performed for additional criteria using the UCSC track sources listed in the last column.
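The equal-weight scoring described for Table 1 can be sketched as a simple tally. The example below is a minimal illustration covering criteria 1–8 only (uniqueness, which comes from the BLAST output, is left out); the class, field names, optional weights, and example values are hypothetical and do not correspond to any site in Table 2.

```python
from dataclasses import dataclass


@dataclass
class SiteAnnotation:
    """Hypothetical per-site measurements read off genome browser tracks."""
    kb_to_cancer_gene: float
    kb_to_small_rna: float
    kb_to_gene_5prime_end: float
    kb_to_replication_origin: float
    kb_to_ultraconserved: float
    mrna_within_25kb: bool
    in_copy_number_variable_region: bool
    dnase_signal_within_1kb: bool


def criteria_met(site):
    """Evaluate Table 1 criteria 1-8 in order; True means the criterion is met."""
    return [
        site.kb_to_cancer_gene > 300,
        site.kb_to_small_rna > 300,
        site.kb_to_gene_5prime_end > 50,
        site.kb_to_replication_origin > 50,
        site.kb_to_ultraconserved > 50,
        not site.mrna_within_25kb,
        not site.in_copy_number_variable_region,
        site.dnase_signal_within_1kb,
    ]


def site_score(site, weights=None):
    """Sum the weights of all criteria met (equal weights by default)."""
    met = criteria_met(site)
    if weights is None:
        weights = [1] * len(met)
    return sum(w for w, ok in zip(weights, met) if ok)


# Made-up example values, for illustration only.
example = SiteAnnotation(450, 320, 80, 60, 120, False, False, False)
print(site_score(example))
```

Passing a different `weights` list is one way to express the application-specific weighting that the text mentions as a possible extension.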

All SHS candidates, including the three previously defined human SHS, were evaluated using the above criteria and genomic data aggregated from the UCSC Genome Browser.25,26 In brief, sites were first searched 300 kb up- and downstream to identify genes or RNAs of any type, especially any already related to cancer; adjacent transcriptional activity, regardless of annotation; the presence of replication origins or ultra-conserved elements; a location in open chromatin, as assessed by nuclease sensitivity; and whether the potential SHS was located in a region of copy number variation. Next, 1000 Genomes Project (1KGP) data (www.ncbi.nlm.nih.gov/variation/tools/1000genomes/) were used to identify base pair–level population genetic variation in all of the mCreI-anchored potential SHS within single individuals27,28 (Supplementary Table S1). This provided an estimate of the fraction of potential new sites that would be directly accessible within an individual by mCreI and, by extension, an estimate of accessibility by other genome engineering nucleases that have less stringent targeting criteria. The effect of individual mCreI target-site base-pair variants on site cleavage efficiency was assessed using the mCreI position weight matrix (PWM) developed from single base-pair profiling experiments16,17 (Fig. 1B and Supplementary Table S1).

All of the 35 newly identified potential SHS were amplified from human genomic DNA and cleaved in vitro to verify predicted mCreI cleavage sensitivity. Site-specific primer pairs for each site were designed using the CLC Workbench Primer Design Tool (CLC Bio, Boston, MA) to generate ∼300–400 bp polymerase chain reaction (PCR) products containing the mCreI target site (Supplementary Table S2). Genomic DNA purified from human 293T cells using a Wizard Genomic DNA Purification Kit (Promega, Madison, WI) was used as a PCR template (Supplementary Table S2) in reactions performed in 25 μL of 1 × Thermo polymerase buffer containing all four dNTPs at 200 μM, 150 ng genomic DNA, and 400 nM each primer, with 1.25 units of Taq polymerase (New England Biolabs, Ipswich, MA). Amplifications were performed using a 1 min 95°C denaturation step, followed by 30 cycles of 30 s at 95°C, 30 s at 50°C, and 30 s at 68°C, and then 5 min at 68°C. Alternatively, some SHS were amplified in 25 μL reactions that contained 12.5 μL PrimeStar Max DNA polymerase premix (Takara, Mountain View, CA), 50 ng purified genomic DNA, and 240 nM final concentration for each amplification primer. Amplifications were performed using 35 cycles of 10 s at 98°C, 15 s at 50°C, and 3 min at 72°C. SHS-specific PCR products were gel-purified using a QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany), quantified by spectrophotometry, and then digested with purified mCreI protein in 15 μL reactions containing 15 fmol DNA substrate and 0, 15, or 150 fmol of mCreI15,16 in 170 mM KCl, 10 mM MgCl2, and 20 mM Tris, pH 9.0. Digestions were performed at 37°C for 1 h and then stopped by adding 3 μL (1:6) of 6 × stop buffer (60 mM Tris.HCl, pH 7.4, 3% sodium dodecyl sulfate, 30% glycerol, and 150 mM EDTA) prior to electrophoresis through a 1% agarose gel run in TAE buffer (40 mM Tris, 20 mM acetic acid, 1 mM EDTA). 
Substrate and cleavage product bands were visualized following gel electrophoresis by ethidium bromide staining and digital image capture, and band intensities were quantified using ImageJ. A native mCreI target site was included in all experiments as a positive control for both PCR amplification and mCreI digestion. A subset of new SHS was also sequence verified from PCR products using SHS-specific primers by capillary sequencing (Supplementary Table S2; Genewiz, South Plainfield, NJ). Sequence reads were aligned to the human reference genome sequence using the CLC Workbench Alignment tool (CLC Bio).
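One common way to turn quantified band intensities into a cleavage estimate is to take the product signal as a fraction of total lane signal. The sketch below assumes background-subtracted intensities and that stain signal scales with DNA mass, so that the summed product bands are directly comparable with the substrate band; the function name and example numbers are illustrative and not taken from the study.

```python
def fraction_cleaved(substrate_intensity, product_intensities):
    """Fraction of lane signal found in cleavage-product bands.

    Assumes background-subtracted intensities and signal proportional
    to DNA mass (an assumption, not a detail stated in the text).
    """
    products = sum(product_intensities)
    total = substrate_intensity + products
    if total <= 0:
        raise ValueError("lane has no signal")
    return products / total


# Made-up intensities: one substrate band and two product bands.
print(fraction_cleaved(50.0, [30.0, 20.0]))
```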

The in vivo cleavage sensitivity of several potential SHS was subsequently assessed in 293T cells by co-expressing the mCreI homing endonuclease together with the TREX2 3′ to 5′ exonuclease, followed by site amplification and mCreI cleavage versus mock-transfected control cells. Three representative new sites were extensively analyzed in this way: SHS231, a unique chromosome 4 site that was the most highly scored for SHS potential; SHS229, a chromosome 2 site that was the sole newly identified site with perfect nucleotide sequence identity to a highly cleavage-sensitive mCreI site variant; and SHS253, the chromosome 2–specific member of the small family of six identical target-site sequences represented once each on six different chromosomes (chromosomes 2, 5, 7, 14, 17, and X; Fig. 1C and Table 2). TREX2 co-expression in this assay provides a more accurate way to assess the fraction of sites cleaved in vivo: it promotes error-prone, non-homologous end joining (NHEJ)-mediated misrepair of cleaved mCreI site ends,29 which generates a mutant, mCreI-resistant target site (Supplementary Fig. S1). The expression vector used in these experiments consisted of a pRRL-based lentiviral vector backbone that encoded the open reading frames for mCreI and TREX2, together with mCherry fluorescent protein in a single translational unit separated by self-cleaving T2A peptides30 (Supplementary Fig. S1). Target-site cleavage efficiency was then estimated by determining the fraction of site-specific PCR product that was mCreI cleavage-resistant, and thus likely mutant, versus a mock-transfected control culture.
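The comparison against a mock-transfected control can be expressed as a background correction. The text does not give a formula, so the sketch below is one plausible approach under a stated assumption: the cleavage-resistant fraction seen in mock cells reflects incomplete in vitro digestion, and the estimate is the excess resistant fraction in treated cells, rescaled to the digestible fraction. The function name and the numbers are hypothetical.

```python
def in_vivo_cleavage_estimate(resistant_treated, resistant_mock):
    """Background-corrected fraction of target sites mutated in vivo.

    Both inputs are fractions (0-1) of site-specific PCR product that
    resist in vitro mCreI digestion. Assumed formula, not from the text:
    (treated - mock) / (1 - mock).
    """
    if not 0.0 <= resistant_mock < 1.0:
        raise ValueError("mock resistant fraction must be in [0, 1)")
    if not 0.0 <= resistant_treated <= 1.0:
        raise ValueError("treated resistant fraction must be in [0, 1]")
    return (resistant_treated - resistant_mock) / (1.0 - resistant_mock)


# Made-up fractions, for illustration only.
print(round(in_vivo_cleavage_estimate(0.40, 0.10), 3))
```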

Table 2.

Location and scoring of canonical and newly identified potential human safe harbor sites

Genomic location Sequence Cre site match Table 1 site criterion Site score Site ID
1 2 3 4 5 6 7 8
Current human SHSs                        
chr19: 55,625,241–55,629,351     + + + + + 5 AAVS1
chr3: 46,414,443–46,414,942     + + + + + 5 CCR5
chr3: 9,415,082–9,414,043     + + + 3 hROSA26
Canonical I-CreI/mCreI site AAAACGTCGTGAGACAGTTT                      
New human SHSs                        
chr1: 152,360,840–152,360,859 AAAATGTCAgGAGACATTTT 19 + + + + 4 323
chr8: 68,720,172–68,720,191   19 + + + + + + + 7 325
chr1: 175,942,362–175,942,381 AAACTGTCATGAGACATTTg 19 + + 2 289
chr1: 231,999,396–231,999,415 AAACTGTCATGgGACAGATT 19 + + + + + 5 227
*chr2: 45,708,354–45,708,373 AAAATGTCATGCGACATTTT 20 + + + + + 5 229
*chr2: 48,830,185–48,830,204 AAACTGaCATAAGACAGATT 19 + + + + 4 253
chr5: 19,069,307–19,069,326   19 + + + + + 5 255
chr7: 138,809,594–138,809,613   19 + + + + 4 257
chr14: 92,099,558–92,099,577   19 + + + + + 5 259
chr17: 48,573,577–48,573,596   19 + + + + 4 261
chrX: 12,590,812–12,590,831   19 + + + + + 5 263
chr2: 77,263,930–77,263,949 AAAATGTgGTGAGACATTTT 19 + + + + + + 6 317
chr2: 150,500,675–150,500,694 AAACTGTCATAAGACAGATc 19 + + + + + + + 7 303
chr3: 31,670,871–31,670,890 AAAATGTCATACtACAGATT 19 + + + + + 5 331
chr4: 37,769,238–37,769,257 AAACCGTCGTGAtACATTTT 19 + + + + + + 6 283
*chr4: 58,976,613–58,976,632 AAACTGTCATAtGACAGATT 19 + + + + + + + 7 231
chr5: 7,577,728–7,577,747 AAAATGTCATGAGACAGTcT 19 + + + + + 5 315
chr5: 93,159,222–93,159,241 AAAATGTCAaGAGACATTTT 19 + + + 3 327
chr5: 159,922,029–159,922,048 AAACTGTCAaAAGACAGATT 19 + + + 3 305
chr16: 19,323,777–19,323,796   19 + + + + + 5 307
chr20: 5,055,245–5,055,264   19 + + + + 4 309
chr6: 89,574,320–89,574,339 AAACTGTCcTAAGACAGTTT 19 + + + + + 5 285
chr6: 114,713,905–114,713,924 AAAATtTCATGAGACATTTT 19 + + + + + + + 7 233
chr6: 134,385,946–134,385,965 AAAATGTCATGAGgCAGTTT 19 + + + + + + 6 311
chr6: 138,972,461–138,972,480 AAACTGTCATACcACAGTTT 19 + + + + 4 299
chr7: 113,327,685–113,327,704 AAACTGTCATACaACAGTTT 19 + + + + + + 6 301
chr8: 40,727,927–40,727,946 AAACTGaCGTAAGACAGATT 19 + + + + + 6 293
chr11: 32,680,546–32,680,565 AAAATGTCcTGAGACAGATT 19 + + + + + 5 319
chr12: 27,543,737–27,543,756 AAAAaGTCATGAGACATTTT 19 + + + + 4 333
chr12: 66,516,386–66,516,405 AAACTGTaGTAAGACAGATT 19 + + + + 4 295
chr12: 126,152,581–126,152,600 AAAATGTCATGAGAtATTTT 19 + + + + + 5 329
chr17: 14,810,285–14,810,304 AAACaGTCATAAGACAGATT 19 + + + + 4 297
chr22: 35,770,121–35,770,140 AAACTGaCATGAGACAGATT 19 + + + + 4 291
chrX: 16,059,732–16,059,751 AAAATGTCATGAGAaAGTTT 19 + + + + + + 6 313
chrX: 79,674,328–79,674,347 AAAATGTCATAAGgCAGTTT 19 + + + 3 321

Groups of shaded sites share the same mCreI target site sequence, but are found at different sites in the human genome; * identifies three newly identified SHS chosen for additional genomic and/or functional characterization.

A modified calcium phosphate (CaPO4) transfection protocol31 was used to introduce a pRRL-based lentiviral expression vector encoding mCreI, TREX2, and mCherry proteins into human 293T cells32 (Supplementary Fig. S1). Cells (2–4 × 10^5/well) were plated on a six-well plate 24 h prior to transfection and grown to ∼70% confluence. They were then transfected by adding expression vector plasmid DNA (1.5 μg in 10 μL H2O) mixed with 40 μL freshly prepared 0.25 M CaCl2 and 40 μL 2 × BBS buffer (50 mM BES, pH 6.95 [NaOH], 280 mM NaCl, 1.5 mM Na2HPO4; Boston BioProducts, Ashland, MA). Cells were incubated at room temperature for 15 min prior to adding DNA and then incubated overnight in 3% CO2 at 37°C. The medium was changed the following day, and cells were grown for an additional 24 h in a 5% CO2, 37°C humidified incubator prior to checking transfection efficiency by quantifying the frequency of mCherry-positive cells by flow cytometry. In brief, cells were trypsinized, counted, and fixed with formaldehyde (1% v/v final concentration, 10 min at room temperature followed by the addition of 1/20 volume of 2.5 M glycine) prior to flow cytometric analysis of ∼2 × 10^4 cells/transfection on a BD FACS Canto II flow cytometer (BD Biosciences, San Jose, CA). Genomic DNA prepared from co-transfected and control cells was used for PCR amplification and in vitro mCreI cleavage analysis of specific SHS, as described above.

Homology-dependent target-site editing by three genome engineering nucleases

In order to determine whether SHS cleavage could promote homology-directed repair, the study focused on the highly ranked chromosome 4 SHS231 site. Human 293T cells were co-transfected with a SHS231-specific repair template and an expression vector for mCreI, for a site-specific TALEN pair, or for the equivalent Cas9 cleavase and nickase enzymes (Fig. 2 and Supplementary Fig. S1). The mCreI expression vector described above was used, whereas the SHS231-specific TALEN protein pair was designed using the TALEN Targeter 2.0 Web design engine.33,34 Each TALEN open reading frame was generated by assembling the following repeat variable di-residues (RVDs): left TALEN: NG NG NN NN HD NG NI NH NN NH HD NG NI NI NN NN NI NG NG NI, corresponding to the nucleotide sequence TTGGCTAGGGCTAAGGATTA (chr 4: 58,976,594–58,976,613); and right TALEN: NG NN NG NI NG NH HD NG NG NG HD HD NG HD NG NG NN NG NG NI, corresponding to the nucleotide sequence TGTATGCTTTCCTCTTGTTA (chr 4: 58,976,613–58,976,632).33,34 Forward- and reverse-strand, 20 bp-specific TALEN sequences were then inserted into the TALEN expression vector pRKSXX-pCVL-UCOE.7-SFFV-BFP-2A-HA-NLS2.0-TruncTAL (kindly provided by Dr. Andrew Scharenberg, Seattle Children's Research Institute, Seattle, WA).
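The correspondence between the RVD arrays and the stated nucleotide sequences can be checked with the commonly used RVD code (NI→A, HD→C, NG→T, NN→G, NH→G). The sketch below is a minimal decoder; treating NN as G only is a simplifying assumption (NN repeats can also tolerate A), and the function and variable names are illustrative.

```python
# Commonly used RVD-to-base code; NN is mapped to G only (simplifying assumption).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NH": "G"}


def decode_rvds(rvd_string):
    """Translate a space-separated RVD array into its target DNA sequence."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvd_string.split())


left_rvds = "NG NG NN NN HD NG NI NH NN NH HD NG NI NI NN NN NI NG NG NI"
right_rvds = "NG NN NG NI NG NH HD NG NG NG HD HD NG HD NG NG NN NG NG NI"

print(decode_rvds(left_rvds))
print(decode_rvds(right_rvds))
```

Under this code, both arrays decode to the 20-nucleotide sequences given in the text for the left and right TALENs.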

Figure 2.


Structure of three representative new target sites indicating location of mCreI, Cas9, and TALEN target sites. The top two sequence diagrams detail features of the chr2 SHS229 and SHS253, whereas the bottom diagram provides additional detail and results on the chr 4q SHS231. Locations of cleavage sites for mCreI, TALEN, and CRISPR/Cas9 nucleases are shown centered on the mCreI target/cleavage sites (key shown top right). The SHS231 repair template is shown below the chr 4 target site region, which also indicates the location of the variable 108 bp SINE-derived insertion sequence identified in one or both alleles of a subset of the cell lines examined. The bottom panel shows independently cloned and sequenced inserts from targeted SHS231 insertions by all three nucleases. The mCreI targeting experiments used an expression vector that encoded both mCreI and the TREX2 nuclease (see text), and Cas9 targeting was performed using a common guide RNA and either a Cas9 cleavase or nickase. Numbers to the right of each row indicate the number of independent targeting events that were cloned and sequenced (see text for additional details).

A SHS231-specific CRISPR/Cas9 expression vector was constructed in pX260 (refs. 35,36) that contained expression cassettes for the Streptococcus pyogenes Cas9 nuclease, the CRISPR RNA array, and the tracrRNA. The SHS231 Cas9 target site, 5′-AAAACATTTATATACTGCGTGG-3′, located 110 bp downstream of the mCreI/TALEN cleavage site, was identified using the CRISPR Design Tools Resource developed by Zhang et al.35,36 A corresponding SHS231-specific Cas9 nickase expression vector was also constructed in pX334, which encoded a Cas9 D10A substitution to confer nickase activity.

The template for site-specific, homology-dependent repair was a common 1 kb DNA fragment consisting of 611 and 528 bp homology arms that flanked the mCreI target-site region of SHS231, and contained a 48 bp insert at its center consisting of the canonical loxP recombinase site and adjacent, diagnostic restriction endonuclease cleavage sites for PvuI and SacII (Fig. 2). The repair template was made by overlap extension PCR using oligonucleotide primers to generate PCR products that, upon re-amplification, incorporated the 48 bp loxP insert at the center of the repair template (Supplementary Table S2).

Calcium phosphate transfection (as described above) was used to introduce nuclease expression vectors and repair templates into human 293T cells. Transfection efficiency was checked by determining the fraction of mCherry-positive cells by flow cytometry, prior to molecular characterization of SHS231 target-site editing by PCR amplification and PvuI or SacII restriction digest (Fig. 2 and Supplementary Fig. S2). PCR products were also cloned into a pGEM-T Easy plasmid vector (Promega, Madison, WI) and transformed into α-Select Chemically Competent Gold Efficiency Cells (Bioline, Taunton, MA), followed by plasmid preparation from white (insert-containing) colonies that were capillary sequenced using a T7 promoter sequencing primer (Fig. 2). Sequencing results were aligned with the repair template sequence using the CLC Main Workbench software (CLCBio).
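The diagnostic digest works because only edited alleles carry the PvuI (CGAT^CG) and SacII (CCGC^GG) sites introduced with the loxP insert. The sketch below predicts fragment sizes for a linear amplicon; the flanking sequences and the arrangement of the sites around loxP are hypothetical placeholders (the toy insert is 46 bp, not the 48 bp insert used in the study), and only top-strand cut positions are considered.

```python
# Recognition site and top-strand cut offset within the site.
ENZYMES = {"PvuI": ("CGATCG", 4), "SacII": ("CCGCGG", 4)}

LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34 bp loxP site


def fragment_sizes(amplicon, enzyme):
    """Sizes of fragments after complete digestion of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 100 bp flanks and an assumed PvuI-loxP-SacII insert layout.
flank = "ACGT" * 25
unedited = flank + flank
edited = flank + "CGATCG" + LOXP + "CCGCGG" + flank

print(fragment_sizes(unedited, "PvuI"))
print(fragment_sizes(edited, "PvuI"))
print(fragment_sizes(edited, "SacII"))
```

An uncut amplicon after digestion indicates an unedited allele, whereas two fragments indicate that the insert is present.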

Homology-independent target-site editing using Cas9 variants

To increase transgene integration efficiency at the SHS231 locus, the use of a homology-independent genome-editing strategy, similar to methods that have been used for transgene integration in zebrafish36,37 and murine models,38 was evaluated. Dual human U6-driven guide RNAs (gRNA) targeting SHS231 were simultaneously inserted into a custom S. pyogenes Cas9-T2A-green fluorescent protein (GFP) expression plasmid (pUS2-SH231) using Gibson assembly, as previously described.39 SHS231-specific gRNAs SHS231-gRNA1: 5′-GCCTCCCCCATAGTACCAT-3′ and SHS231-gRNA2: 5′-GATGTGCTCACTGAGTCTGA-3′ were designed to target and cleave both the SHS231 genomic locus and each arm of the SHS231 repair template (Supplementary Fig. S1; Supplementary Table S3) to promote efficient transgene integration by NHEJ-mediated DNA repair.37,38 Transgene cassettes were flanked by Bxb1 recombinase and ΦC31 attP integrase target sites that, once integrated, could be used for high-efficiency SHS-specific editing by these recombinase/integrase proteins.40,41

Homology-independent SHS231 editing was performed by electroporating the pUS2-SH231 dual guide-targeting Cas9 expression vector (3 μg) and repair templates (3 μg) into each of three different human RMS cell lines (Rh5, Rh30, and SMS-CTR [Hinson 2013]; 1 × 10^6 cells per transfection) in 100 μL volumes using a Neon electroporation system (Life Technologies, Carlsbad, CA) according to the manufacturer's protocol. Two 1,150 V pulses of 30 ms each were delivered, followed by growth for 2 weeks in the presence of selection (puromycin, hygromycin, or blasticidin, depending on the repair template; see Supplementary Fig. S1 and Supplementary Table S4) to select pools of transgene-containing cells. SHS231-specific targeted integration was confirmed by PCR amplification of the SHS231 target site (Q5 polymerase; New England Biolabs) using a transgene-specific and adjacent, genome-anchored primer pair (SHS231 gFwd: GAACCAGAGCCACCCAGTTG, and Bxb1 rev: GTTTGTACCGTACACCACTGAGAC).

The efficiency of SHS231 editing by different endonucleases was determined by co-transfecting two independent RMS cell lines (SMS-CTR and RD) with a puromycin resistance cassette-containing SHS231 repair template along with an expression vector for mCreI, Cas9 nickase (with a single gRNA), or Cas9 cleavase (with single and dual gRNAs). RMS cells were also co-transfected with the piggyBac transposon-enabled SHS231 repair template and a piggyBac transposase plasmid (PB210PA-1, Palo Alto, CA) to compare the SHS231 knock-in efficiencies of Cas9, mCreI, and transposase-mediated random transgene integration. Two days following transfection, cells were plated onto 24-well plates at 3 × 10^4 cells/well to account for any transfection-induced cell death, followed by growth in the presence of puromycin (2.5 μg/mL) for 10 days. Cells were then fixed with 2% paraformaldehyde, stained with 0.5% crystal violet, and imaged on a Nikon SMZ-745 stereo microscope to quantify crystal violet-stained pixels using ImageJ software (NIH, Bethesda, MD), as previously described.42

Stable gene expression from SHS231 transgene insertions

Persistent gene expression from SHS231-integrated GFP- or Cas9-encoding transgenes was assessed by time-course imaging of site-targeted cells for the continued expression of GFP or by quantitative reverse transcription PCR (qRT-PCR) for Cas9 expression versus the endogenous housekeeping gene GAPDH (Fig. 4A and B). Time-course imaging of GFP fluorescence was performed using an EVOS imaging system (Life Technologies) of transfected cells during continuous passaging in culture. Cas9 transgene expression from SHS231 was quantified by qRT-PCR SYBR green fluorescence on a CFX96 qPCR machine (Cas9 qFwd: 5′-CCCAAGAGGAACAGCGATAAG-3′; Cas9 qRev: 5′-CCACCACCAGCACAGAATAG-3′; BioRad, Hercules, CA).

Figure 4.


Stable expression, functional gene editing, and gene activation by SHS231 integrated transgenes. (A) Long-term stable green fluorescent protein (GFP) expression from a SHS231 integrated transgene in two independent RMS cell lines Rh5 and SMS-CTR14 cultured in the absence of antibiotic selection. (B) Relative Cas9 expression level (cycle threshold [Ct]) from a SHS231 integrated Cas9 cassette compared to cells transduced with high-titer Cas9 expressing lentivirus or the endogenous expression level of GAPDH. Both SHS231 and lentiviral Cas9 variants were expressed from the human EF1α promoter. (C) Targeted deletion of a 17,188 bp gDNA segment of the PAX3/FOXO1 fusion oncogene in a third RMS cell line, Rh30, by Cas9 protein expressed from the SHS231 locus. Dual gRNA target sites (blue and green triangles) and deletion PCR primer sites (purple arrows) are identified. (D) Demonstration of endogenous MYF5 gene activation with SHS231 expressed dCas9-VPR and Cas9-VPR transgenes relative to wild-type (unmodified) RMS cells. Gene activation was achieved by targeting full-length (20 bp) or truncated (14 bp) gRNAs (blue, green, and red triangles) to the promoter region of the MYF5 gene.

The functional activity of SHS transgene-encoded Cas9 protein was also assessed. This was done by lentiviral transduction of SHS231 Cas9-expressing cells to express dual gRNAs specific for the PAX3/FOXO1 fusion oncogene contained in RMS cell line Rh30 (Fig. 4C; P/F gRNA1: 5′-GATCAATAGATGCTCCTGA-3′; P/F gRNA2: 5′-GACCTTGTTTTATGTGTACA-3′). Successful editing resulting in a 17.2 kb gRNA-directed deletion was detected using PCR amplification of the region spanning the target gDNA deletion site (Fig. 4C; P/F Fwd: 5′-AGGTTGTCCTGAACGTACCTATCAC-3′; P/F Rev: 5′-TGCTTCTCCGACACCCCTAATCT-3′; 885 bp).

A second assessment of SHS231 transgene-encoded protein function was performed by expressing the Cas9-based transcription activator proteins dCas9-VPR or Cas9-VPR in each of two different RMS cell lines, followed by lentiviral expression of dual or triple Cas9 gRNAs designed to target these transactivators to the endogenous muscle gene transcriptional activator MYF5 that is expressed at low levels in Rh5 and SMS-CTR RMS cells. The MYF5 promoter activating gRNAs for dCas9-VPR were gRNA1A, 5′-GATTCCTCACGCCCAGGAT-3′; gRNA2A, 5′-GTTTGTCCAGACAGCCCCCG-3′; and gRNA3A, 5′-GTTTCACACAAAAGTGACCA-3′. The corresponding truncated activating Cas9-VPR gRNAs targeting the MYF5 promoter region were tgRNA1A: 5′-GATAGGCTAAAACAA-3′ and tgRNA2A: 5′-GTGCCTGGCCACTG-3′. Changes in MYF5 gene expression were quantified by SYBR green qRT-PCR using the MYF5-specific primers MYF5 qFwd, 5′-CTGCCCAAGGTGGAGATCCTCA-3′ and MYF5 qRev, 5′-CAGACAGGACTGTTACATTCGGGC-3′, and plotted as fold changes in MYF5 expression in cells expressing Cas9 alone or Cas9-VPR targeted to the MYF5 promoter regions using 14 or 20 bp gRNAs (Fig. 4D).
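The qRT-PCR readouts described in this section (Cas9 relative to GAPDH, and fold changes in MYF5 expression) can be illustrated with a short sketch. The text does not state which formula was used, so the widely used 2^-ΔCt and 2^-ΔΔCt calculations are assumed here, along with roughly 100% primer efficiency. The function names and Ct values are illustrative only and are not data from the study.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene, as 2^-(dCt).

    Assumes each PCR cycle doubles the product (about 100% efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target in test vs. control cells, as 2^-(ddCt)."""
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_test - d_ct_ctrl)


# Made-up Ct values: a transgene detected 3 cycles later than the reference.
print(relative_expression(21.0, 18.0))  # 0.125

# Made-up Ct values: target detected 5 cycles earlier after activation,
# with an unchanged reference gene.
print(fold_change(24.0, 18.0, 29.0, 18.0))  # 32.0
```

A lower Ct means more starting template, which is why the exponent is negated in both functions.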

Results

The BLAST search of 128 predicted highly cleavable mCreI target-site variants identified 35 different matches in the human genome (Fig. 1A and B). These represented 27 distinct mCreI target-site sequences, of which the majority (24/27; 89%) were found only once, with the remainder represented two, three, or six times in the human genome (Fig. 1C and Table 2). Only one target site was a perfect match to a mCreI target-site variant (SHS229; a 20/20 bp match), whereas the other hits differed by 1 bp (19/20 bp matches, or 95% identical) from a query sequence. The target sites mapped to 16 of the 23 human chromosome pairs, including the X chromosome, or nearly half of all chromosome arms (Fig. 1C and Table 2). All 35 new target sites, together with AAVS1, CCR5, and hROSA26, were next evaluated using predefined criteria for likely safety, function, and accessibility (Tables 1 and 2). Twenty-five of the newly identified target sites (71%) met more than half (≥5/8) of these criteria, as did the AAVS1 and CCR5 sites (Table 2). When likely site safety was examined, defined by lack of adjacent coding, regulatory, or other functionally defined regions (Table 1, criteria 1–6), 21/35 (60%) of the new target sites met ≥4/6 criteria, with three sites (SHS231, 233, and 303) matching all six criteria. In contrast, AAVS1, CCR5, and hROSA26 each matched only 3/6 criteria (Table 2).
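The distinction between 20/20 and 19/20 bp matches can be made concrete with a small sketch. The search reported here used BLAST; the brute-force scan below is not that method and would be far too slow for a genome, but it shows what "a match allowing one mismatch on either strand" means. The query, the toy "genome", and all function names are hypothetical, and the query is not a real mCreI target site.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(genome, query, max_mismatches=1):
    """Return (position, strand, mismatches) for every window of `genome`
    within `max_mismatches` of `query` or of its reverse complement."""
    hits = []
    k = len(query)
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        for i in range(len(genome) - k + 1):
            d = hamming(genome[i:i + k], probe)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return hits


# Hypothetical 20 bp query and a toy sequence containing a perfect copy,
# a copy with one substitution, and a reverse-complement copy.
query = "ACGTTGCAAGCTTAGGCATC"
variant = query[:9] + "T" + query[10:]  # position 9 changed from G to T
genome = "CC" + query + "TTTT" + variant + "GG" + reverse_complement(query) + "AA"

for hit in find_near_matches(genome, query):
    print(hit)
```

A palindromic query would be reported once per strand at the same position, so real pipelines usually deduplicate such hits.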

Transgene target-site genetic variation between individuals has the potential to complicate or disrupt the editing of any region of the human genome. Thus, all 35 of the new sites were assessed for copy number and base pair–level genetic variation. None of the target sites was located in a copy number–variable region of the human genome, though base pair–level genetic variation was identified in 11/35 mCreI target sites using whole-genome sequencing data generated as part of the 1KGP.27,28 Four target sites contained single nucleotide variations that had previously been shown to strongly suppress mCreI cleavage efficiency in vitro (by ≥70%; Fig. 1A and Supplementary Table S1), with variant frequencies in 1KGP data ranging from 0.5041 (SHS255) to 0.0037 (SHS293). Of note, fully 80% of the 1,092 individuals for whom 1KGP data were available lacked any SNP variants in any of the 35 target sites, and 94% had all 35 target sites predicted to be fully mCreI-cleavage sensitive (Supplementary Table S1). No indel variants were detected in the mCreI target sites in 1KGP data.

The same genomic insertion adjacent to the SHS231 site was identified in RD and SMS-CTR RMS cells, and subsequently in the widely used human osteosarcoma cell line U-2 OS.43 This consisted of a 108 bp insertion adjacent to SHS231 with a 35-base poly-T sequence and adjacent short sequence blocks reminiscent of transposable element short tandem duplications. This polymorphic insertion was found to be an exact match for a segment of an AluYa5 subfamily, SINE-derived repeat of 311 bp that is present in ∼4,000 non-redundant copies in the human genome (see www.dfam.org/entry/DF0000053), and was present in one (SMS-CTR) or both (RD) alleles in RMS cell lines and in one of two SHS231 alleles in U-2 OS cells (Fig. 2, and additional data not shown). Though located near SHS231, this insertion did not affect SHS231 access or editing in the experiments conducted in these lines. This combination of SNV variation and less frequent, larger sequence variants illustrates the need to characterize any putative SHS region for structure and sequence variation prior to beginning editing projects.

Efficient in vivo cleavage and editing of new target sites by multiple genome-editing nucleases

In vivo accessibility of several new target sites was assessed by determining their cleavage sensitivity and ability to be edited by different nuclease/repair template combinations. These experiments focused on three representative new sites that included the highly ranked single-copy chromosome 4q site SHS231; a second single-copy chromosome 2p site, SHS229; and a target-site sequence, SHS253, that had 19/20 bp identical matches with single-copy sequences located on chromosome arms 2p, 5p, 7q, 14q, 17q, and Xp. Half of these potential SHS defined by the SHS253 target-site sequence had SHS scores equivalent to the AAVS1 and CCR5 sites (Fig. 1 and Table 2). The in vivo cleavage sensitivity of these and three additional SHS was analyzed by co-expressing mCreI with the TREX2 3′ to 5′ exonuclease in human 293T cells, followed by PCR amplification and mCreI digestion of target sites. This experiment was designed to identify a cleavage-resistant target-site fraction in nuclease-expressing cells, from which a minimum estimate of in vivo cleavage efficiency can be derived.29 Supplementary Figure S2 provides a representative example of this type of analysis, with quantitation of the cleaved fraction normalized to available template for SHS229, 231, and 253. mCreI + TREX2 expression reduced the cleaved fraction of SHS PCR product by 1.18- to 1.5-fold, depending upon the individual site and experiment (Supplementary Fig. S2).
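One way to read the fold-reduction figures is sketched below. The calculation is our assumption and is not stated in the text: if the cleaved fraction of PCR product falls by a factor F after nuclease expression, then at least 1 − 1/F of the amplified target sites have become cleavage resistant. This treats all loss of cleavability as resulting from in vivo cleavage followed by mutagenic repair. The function names are hypothetical.

```python
def fold_reduction(cleaved_control, cleaved_treated):
    """Fold reduction in the in vitro cleaved fraction of PCR product."""
    if not (0 < cleaved_treated <= cleaved_control <= 1):
        raise ValueError("expected 0 < treated <= control <= 1")
    return cleaved_control / cleaved_treated


def min_resistant_fraction(fold):
    """Minimum fraction of sites rendered cleavage resistant for a given
    fold reduction, under the assumption described in the lead-in."""
    if fold < 1:
        raise ValueError("fold reduction must be >= 1")
    return 1.0 - 1.0 / fold


# The two ends of the reported range of fold reductions.
for fold in (1.18, 1.5):
    print(f"{fold}-fold -> at least {100 * min_resistant_fraction(fold):.1f}% resistant")
```

Under this reading, the reported range corresponds to roughly 15% to 33% of target-site copies becoming cleavage resistant. It is a lower bound, because sites that were cleaved and then repaired precisely remain cleavable and go uncounted.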

In order to determine whether SHS cleavage in vivo could catalyze high-fidelity homology-dependent repair, human 293T cells were co-transfected with an expression vector(s) for an editing nuclease (mCreI, a CRISPR/Cas9 cleavase/nickase, or a TALEN pair), together with a SHS-specific repair template containing a loxP site flanked by two different diagnostic restriction sites (Fig. 2). Three target sites (SHS229, 231, and 253) were analyzed following mCreI expression, two (SHS229 and 231) after CRISPR/Cas9 cleavase/nickase expression, and SHS231 after TALEN expression. Target site–specific PCR amplicons from transfected cells were digested with PvuI and SacII to confirm targeted capture and site-specific integration of the loxP repair template. Target site–specific PCR amplicons were also cloned and sequenced to confirm the structure and fidelity of cleavage-dependent, targeted SHS integration events (Fig. 2). This assay readily identified SHS-specific loxP site knock-in events above background (Supplementary Fig. S2), and provided estimates of SHS-specific knock-in frequencies that ranged from 3.5% (template + TALEN at SHS231) to 23.5% (template + mCreI + TREX2 at SHS253) across five separate biological replicate experiments (Supplementary Fig. S2, and additional results not shown). SHS231 editing products were also cloned and sequenced to further characterize the frequency and fidelity of site-specific editing in 293T cells. The fraction of successfully edited SHS231 clones recovered from 293T cells was 4.8% for mCreI/TREX2 (3/63 clones); 6.1% (2/33) for CRISPR/Cas9 nuclease; 16.1% (5/31) for CRISPR/Cas9 nickase; and 1.23% (1/81) for a SHS231-specific TALEN pair, as determined by the sequencing of individual cloned PCR products (Fig. 2). These results, despite variability, clearly document high-fidelity SHS-specific editing by four different nucleases at three representative new target sites in human cells.
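The clone-level editing fractions above follow directly from the reported counts, as the sketch below reproduces. The 95% Wilson score intervals are our addition and do not appear in the source; they are included only to show how much uncertainty attaches to estimates based on a few dozen clones.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total <= 0 or not (0 <= successes <= total):
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# (edited clones, clones sequenced) as reported for SHS231 in 293T cells.
clone_counts = {
    "mCreI/TREX2": (3, 63),
    "CRISPR/Cas9 nuclease": (2, 33),
    "CRISPR/Cas9 nickase": (5, 31),
    "TALEN pair": (1, 81),
}

for nuclease, (edited, total) in clone_counts.items():
    low, high = wilson_interval(edited, total)
    print(f"{nuclease}: {100 * edited / total:.2f}% "
          f"(95% interval {100 * low:.1f}-{100 * high:.1f}%)")
```

Because the intervals are wide at these sample sizes, the ranking of nucleases by these counts alone should be treated as tentative.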

In order to increase SHS transgene integration efficiency and facilitate genome editing in post-mitotic cells, the use of a homology-independent transgene integration strategy for SHS231 editing was evaluated. These experiments used Cas9-mediated cleavage of the transgene repair template and genomic SHS target locus (i.e., using dual gRNAs) to promote transgene integration by NHEJ DNA repair (Fig. 3A).37,38 While indel mutations can be introduced during NHEJ-mediated repair of the target locus, this is not a serious concern, since the potential SHS were specifically identified to contain no immediately adjacent genes or functional genomic elements. Molecular analysis of SHS231 integration events by amplification, cloning, and sequencing of the 5′ SHS231 integration site identified both direct fusion events (no indels), as well as the expected short indel mutations at the gRNA cleavage site (Fig. 3A), evidence compatible with a NHEJ-mediated DNA repair mechanism.

Figure 3.

Integration of transgenes into chromosome 4q SHS231 locus using homology-independent non-homologous end joining (NHEJ) DNA repair mechanisms. (A) Homology-independent transgene integration is mediated by targeting both the repair template and genomic target site using dual CRISPR guide RNAs (gRNAs; blue and green triangles represent gRNA target sites). The structure of a SHS231 repair template expressing the puromycin resistance gene is indicated, including the size of a puromycin transgene cassette and locations of CRISPR gRNAs. Representative sequences from the 5′ transgene integration site after knock-in-specific polymerase chain reaction (PCR) amplification are shown (PCR primers; purple arrows). (B) Relative knock-in efficiency of a puromycin cassette at the SHS231 locus using homology-independent repair (US2-Cas9; NHEJ) and homology-directed repair (nCas9, Cas9, mCreI; HDR) compared to random piggybac transposition (PBase) in the rhabdomyosarcoma (RMS) cell lines RD and SMS-CTR.14 Crystal violet staining was used to visualize the percentage of stable puromycin-resistant colonies resulting from 3 × 10⁴ transfected cells after 10 days in culture. No colonies were identified from CRISPR only or repair template only controls. (C) Quantification of crystal violet staining from SHS231 knock-in stable cells generated from the same RMS cell lines used in (B) above. Asterisk indicates significant difference between NHEJ- and HDR-mediated SHS231 knock-in approaches, p < 0.05.

The efficiency of dual gRNA Cas9 cleavage-mediated editing of the chromosome 4 SHS231 locus was compared to the Cas9 nickase, cleavase, and mCreI-mediated HDR approaches by co-transfection of each endonuclease with a repair template expressing puromycin resistance (Fig. 3B and C and Supplementary Fig. S1). The relative editing efficiencies of these endonucleases were also compared to random integration of the repair template using a piggybac transposon where the repair template contained piggybac terminal repeat sequences flanking the transgene cassette. This experiment was performed in two independent RMS cell lines (RD and SMS-CTR) by measuring the number of stable puromycin-resistant clones resulting from 3 × 10⁴ transfected cells. Homology-independent transgene integration of the puromycin repair template was twofold higher than HDR-mediated insertion of the transgene directed by Cas9, nCas9, or mCreI nucleases. Neither of these approaches, however, was as efficient as random integration mediated by piggybac transposition (Fig. 3B and C).

Characterization of stability, expression, and functionality of SHS231 integrated genes

The functional utility of any potential human SHS depends critically upon persistent transgene expression following targeted transgene insertion. Thus, the persistent expression and functional competence of several different transgene cassettes were assessed at the chromosome 4 SHS231 site, for which the best editing efficiency and fidelity data were available. SHS231 transgene expression and stability were assessed by following SHS231 transgene-encoded GFP expression in two independent RMS cell lines (SMS-CTR and Rh5) where transgene insertion had been mediated by homology-independent site editing. SHS231 GFP transgene expression was persistent, with no apparent drop in the fraction or intensity of GFP+ cells over 45 days in culture in the absence of antibiotic selection. These times correspond to an estimated 15 population doublings in Rh5 cells and 25 population doublings in SMS-CTR cells (Fig. 4A), and provide evidence for persistent transgene integration and expression from SHS231 over usefully long periods of time in mitotically dividing cells.

The study also determined whether other useful, persistently expressed SHS231-encoded transgene proteins retained function. Stable SHS231 Cas9-expressing cell lines were generated and used for this purpose in light of the growing range of Cas9-enabled methods to study gene structure and function or for CRISPR-based genetic screens. Readily detectable Cas9 expression was observed from a SHS231 knock-in transgene, though at a lower level than Cas9 expression in cells super-infected with high-titer lentivirus encoding Cas9, and lower than the expression level of endogenous GAPDH (Fig. 4B). The functional competence of SHS231-expressed Cas9 protein was further demonstrated in Rh30 RMS cells by transducing Cas9-expressing cells with a lentivirus expressing two gRNAs targeting a PAX3/FOXO1 fusion oncogene contained in Rh30 (Fig. 4C). The gRNA-predicted 17,188 bp, Cas9-mediated deletion in PAX3/FOXO1 was readily detected by PCR amplification of gRNA-transduced cell pools using primers that flanked the PAX3/FOXO1 gRNA target sites (Fig. 4C).
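The logic of detecting a large deletion with flanking primers can be sketched as simple coordinate arithmetic. The example assumes that the 885 bp product listed with the P/F primers in the Methods is the amplicon from the deleted allele, which is our reading rather than an explicit statement in the text. The coordinates are hypothetical, chosen only so that the deletion is 17,188 bp and the deletion-spanning product is 885 bp.

```python
def deletion_amplicons(fwd_start, rev_end, cut1, cut2):
    """Return (deleted_bp, intact_amplicon_bp, deleted_amplicon_bp).

    Coordinates are 0-based positions on a reference sequence:
      fwd_start : first base of the forward primer site
      rev_end   : one past the last base of the reverse primer site
      cut1      : first base removed by the deletion
      cut2      : first base retained after the deletion
    Assumes the two cut ends are joined with no inserted or lost bases.
    """
    if not (fwd_start < cut1 < cut2 < rev_end):
        raise ValueError("expected fwd_start < cut1 < cut2 < rev_end")
    deleted = cut2 - cut1
    intact = rev_end - fwd_start
    return deleted, intact, intact - deleted


# Hypothetical layout: forward primer 400 bp upstream of the first cut.
deleted, intact, edited = deletion_amplicons(
    fwd_start=0, rev_end=18_073, cut1=400, cut2=17_588
)
print(deleted, intact, edited)  # 17188 18073 885
```

Under these assumptions the intact allele would give a product of about 18 kb, which standard PCR conditions with short extension times amplify poorly. The short product is therefore diagnostic of the deletion.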

In a third series of new target-site functional validation experiments, transgene cassettes were integrated at SHS231 that expressed the chimeric Cas9-derived transcriptional activators dCas9-VPR or Cas9-VPR by Cas9-mediated knock-in. VPR is a tripartite transcription factor consisting of VP64, P65, and Rta transactivation domains. Fusion of this transcription factor to the C-terminus of the Cas9 protein generates a potent transcriptional activator protein (dCas9-VPR or Cas9-VPR) that can be directed by expressing target gene-specific gRNAs.44 In order to test this system, RMS cell lines expressing dCas9-VPR or Cas9-VPR from the SHS231 site were transduced with a lentivirus expressing two or three gRNAs that target the promoter region of the MYF5 gene (Fig. 4D). MYF5 encodes Myogenic Factor 5, a transcriptional activator of muscle-specific genes that play a role in muscle differentiation.45 It is typically expressed at very low levels in many RMS cells, and therefore is a good candidate for measuring gRNA-targeted, Cas9-VPR-mediated gene activation. It was found that the expression of both full-length (20 bp) and truncated (14 bp) gRNAs to the promoter region of MYF5 robustly and reproducibly upregulated expression in both RMS cell lines in a Cas9-VPR-dependent manner (Fig. 4D).

These results collectively demonstrate efficient editing of representative, newly defined target sites for transgene insertion, with persistent expression of both GFP and Cas9 protein variants in mitotically dividing cells. Moreover, these targeted transgene-encoded proteins were shown to drive additional useful gene-editing outcomes, including the promotion of large deletions in a PAX3/FOXO1 fusion oncogene and induced expression of the MYF5 gene in RMS cells. In order to enable the wider immediate use of the chromosome 4q SHS231 site for targeted transgene insertion, a SHS231-specific “toolkit” has been assembled to enable easy editing of this site in additional human cell types (Supplementary Fig. S1 and Supplementary Table S4; available from Addgene, Cambridge, MA). All of the expression vector transgenes included in this set are driven by the constitutive human EF-1α promoter, and contain additional attP sites to serve as “landing pads” for ΦC31 and Bxb1 integrase-mediated, high-efficiency SHS transgene insertion.

Discussion

Only a small number of SHS for targeted transgene insertion are in wide use in human cells. These were originally identified by several routes, including viral insertion site or loss-of-function analyses (e.g., AAVS1 and CCR5), or by their similarity to putative SHS in other organisms (e.g., the hROSA26 locus).11 The present results identify, and in part experimentally validate, 35 new human genomic targeted transgene insertion sites that have potential to serve as new human SHS. A representative subset of these new sites has been experimentally validated, and for the best-validated, experimental evidence is provided for successful targeting, transgene insertion, and persistent expression of selectable, scorable, or functionally active proteins.

The 35 potential new SHS were identified by focusing first on editing nuclease target-site analyses. Then, they were assessed for “safe harbor” potential using previously proposed, widely accepted, and desirable SHS properties. Each property (Table 1) was assessed for each new site using site-specific human structural, genetic, and regulatory data (e.g., ENCODE data46). More than half of the potential new SHS (20/35; 57%) met 4/6 core safety criteria (Table 2), in contrast to the widely used human AAVS1, CCR5, and hROSA26 SHS that each met three or fewer of the core safety criteria defined by adjacent coding and regulatory elements (Table 2).

These newly identified sites were then experimentally validated to demonstrate their presence in the human genome, and that each contained a site-anchoring 20 bp cleavage site for the rare-cutting mCreI homing endonuclease. In vivo analyses using four different editing nucleases were performed to demonstrate both homology-dependent and -independent site editing with persistent transgene expression and activity. In order to assess whether most or all of the newly identified sites could be successfully edited in different individuals, a combination of target-site amplification and sequencing, site-specific human population genetic variation data, and a previously determined mCreI position weight matrix was used (Fig. 1A). This analysis identified low-frequency base-pair variation in the mCreI target site in 11/35 new target sites that had been shown to inhibit (n = 4), impair (n = 3), or not affect (n = 4) mCreI target-site cleavage (Fig. 1 and Supplementary Table S1). One insertion adjacent to the best-validated of the sites, the chromosome 4 SHS231, was also identified that consisted of a truncated AluYa5 subfamily, SINE-derived repeat sequence that did not substantially affect editability, though might reduce the efficiency of homology-mediated repair in cell lines or individuals carrying this sequence polymorphism. These analyses emphasize the need to analyze potential new SHS for potentially confounding sequence variants prior to beginning editing experiments, especially where the efficiency of targeting may be a key criterion for success.

As part of the experimental validation of a subset of the newly identified sites, both Cas9 nickase- and cleavase-dependent editing was demonstrated, along with efficient editing of the chromosome 4q SHS231 site by both homology-dependent and -independent, likely NHEJ-mediated, transgene insertion pathways. Cas9 nickase-driven, homology-dependent repair provides a potentially high-fidelity editing pathway that minimizes the risk of DNA double-strand break-driven chromosomal rearrangements. In contrast, the lower-fidelity dual-cleavage knock-in approach that depends on NHEJ may provide an efficient way to generate cell populations with virtually identical, site-specific transgene insertions. This approach could in many instances eliminate the time and expense of isolating multiple cell clones while retaining the natural heterogeneity found in the human cells and cell lines most often used to study biology and to model disease states. Dual-cleavage knock-in strategies may also allow the editing of many nondividing cell types, in contrast to homology-dependent pathways that can only be efficiently used in dividing cells.

Efficient transgene insertion and expression is a key requirement for any putative SHS, and additional work will be needed to identify site-specific variables such as sequence variation that might affect site editing or transgene expression levels in different cell types (see, e.g., Daboussi et al.47). The identification of many potential transgene insertion sites with “safe harbor” potential provides a ready list of alternatives should site-specific problems arise. The efficiency of SHS-targeted editing can likely also be further optimized. Important variables include cell type-specific gene-transfer efficiencies, repair template type (single vs. double stranded), and the length and the degree of nucleotide sequence identity between the repair template and target-site flanking sequences, alluded to above. The highest efficiency of homology-directed repair can in most instances be promoted by incorporating >200 bp of perfect DNA sequence identity between a SHS and donor repair template arms.48–51 Two types of variation were identified here: base pair–level variation in 11 target sites, found using population genetic variation data, and a SINE-derived insertion near the SHS231 site, found in the subset of cell lines analyzed. This type of variation can often be compensated for once identified.
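The homology-arm guideline above suggests a simple pre-flight check before ordering or cloning a repair template. The sketch below is our illustration, not a tool from the study. It assumes the donor arm and the genomic flank have already been aligned to equal length and differ only by substitutions, and it reports the longest stretch of uninterrupted identity. Function names and sequences are hypothetical.

```python
def longest_identical_run(arm, genome_flank):
    """Length of the longest run of consecutive identical positions
    between two pre-aligned, equal-length sequences."""
    if len(arm) != len(genome_flank):
        raise ValueError("sequences must be pre-aligned to equal length")
    best = run = 0
    for a, g in zip(arm.upper(), genome_flank.upper()):
        run = run + 1 if a == g else 0
        best = max(best, run)
    return best


def arm_meets_guideline(arm, genome_flank, min_identity=200):
    """True if the arm has more than `min_identity` bp of perfect identity."""
    return longest_identical_run(arm, genome_flank) > min_identity


# Toy 300 bp flank; one arm is identical, the other carries a single
# substitution at position 150 (as a cell line-specific SNV might).
flank = "ACGT" * 75
perfect_arm = flank
snv_arm = flank[:150] + "A" + flank[151:]  # flank[150] is "G"

print(longest_identical_run(perfect_arm, flank), arm_meets_guideline(perfect_arm, flank))
print(longest_identical_run(snv_arm, flank), arm_meets_guideline(snv_arm, flank))
```

A single mid-arm variant halves the longest perfect stretch in this toy case. That is the reason for sequencing the target region in the cell line of interest and matching the donor arms to it. Insertions such as the SINE-derived sequence near SHS231 shift the alignment and would need a real aligner rather than this position-by-position comparison.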

The new targeted transgene insertion sites identified provide alternatives to the small number of sites currently used as SHS in human cells. The SHS assessment and scoring strategy used was more comprehensive than previous efforts, and can be further modified to incorporate new or application-specific SHS scoring criteria. For example, the growing list of apparently dispensable human genes3,4 offers a rich source of potential new human SHS, though many of these genes may meet only a few of the commonly accepted, desirable criteria for SHS (Table 1). More complete characterization of the 35 new target sites, and of other potential new human SHS, should enable a wide range of basic as well as translational human genome-engineering applications.

Supplementary Material

Supplementary Table S1 (Supp_Table1.pdf)
Supplementary Table S2 (Supp_Table2.pdf)
Supplementary Fig. S1 (Supp_Fig1.pdf)
Supplementary Fig. S2 (Supp_Fig2.pdf)
Supplementary Table S3 (Supp_Table3.pdf)
Supplementary Table S4 (Supp_Table4.pdf)

Acknowledgments

We thank Drs. Kenny Matreyek and Douglas Fowler for introducing us to the BxbI recombinase system and for providing materials, and Dr. Andrew Scharenberg for TREX2 vectors; Alden Hackmann for help with Figures; Marilyn Moelhman and Texia Loh for work on the design and analysis of CRISPR activation and SHS231 homology-independent knock-in experiments; and Mara Blair for manuscript preparation help. S.P., H.L., R.B.S., and R.J.M. Jr. were supported by NIH Award 1RL1CA133831, while B.T.H. was supported by Interdisciplinary Training in Genomic Sciences NHGRI T32 Award HG00035. M.P. and E.C. were supported by NIH R01 CA227432-01.

Author Disclosure

R.J.M. Jr. holds equity in bluebird bio (Cambridge, MA), though performs no work and receives no compensation from bluebird bio. No competing financial interests exist for the remaining authors.

References

  • 1. Kotin RM, Linden RM, Berns KI. Characterization of a preferred site on human chromosome 19q for integration of adeno-associated virus DNA by non-homologous recombination. EMBO J 1992;11:5071–5078
  • 2. Irion S, Luche H, Gadue P, et al. Identification and targeting of the ROSA26 locus in human embryonic stem cells. Nat Biotechnol 2007;25:1477–1482
  • 3. MacArthur DG, Balasubramanian S, Frankish A, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 2012;335:823–828
  • 4. Saleheen D, Natarajan P, Armean IM, et al. Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature 2017;544:235
  • 5. Liu R, Paxton WA, Choe S, et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 1996;86:367–377
  • 6. Schröder ARW, Shinn P, Chen H, et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 2002;110:521–529
  • 7. Mitchell RS, Beitzel BF, Schroder ARW, et al. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol 2004;2:e234
  • 8. Papapetrou EP, Lee G, Malani N, et al. Genomic safe harbors permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat Biotechnol 2011;29:73–78
  • 9. Zambrowicz BP, Imamoto A, Fiering S, et al. Disruption of overlapping transcripts in the ROSA βgeo 26 gene trap strain leads to widespread expression of β-galactosidase in mouse embryos and hematopoietic cells. Proc Natl Acad Sci U S A 1997;94:3789–3794
  • 10. Sadelain M, Papapetrou EP, Bushman FD. Safe harbours for the integration of new DNA in the human genome. Nat Rev Cancer 2011;12:51
  • 11. Papapetrou EP, Schambach A. Gene insertion into genomic safe harbors for human gene therapy. Mol Ther 2016;24:678–684
  • 12. Lonsdale J, Thomas J, Salvatore M, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;45:580
  • 13. Wong AK, Troyanskaya OG, Krishnan A. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res 2018;46:W65–W70
  • 14. Hinson A, Jones R, Crose L, et al. Human rhabdomyosarcoma cell lines for rhabdomyosarcoma research: utility and pitfalls. Front Oncol 2013;3
  • 15. Li H, Pellenz S, Ulge U, et al. Generation of single-chain LAGLIDADG homing endonucleases from native homodimeric precursor proteins. Nucleic Acids Res 2009;37:1650–1662
  • 16. Ulge UY, Baker DA, Monnat RJ. Comprehensive computational design of mCreI homing endonuclease cleavage specificity for genome engineering. Nucleic Acids Res 2011;39:4330–4339
  • 17. Li H, Ulge UY, Hovde BT, et al. Comprehensive homing endonuclease target site specificity profiling reveals evolutionary constraints and enables genome engineering applications. Nucleic Acids Res 2012;40:2587–2598
  • 18. Li H, Monnat RJ Jr. Homing endonuclease target site specificity defined by sequential enrichment and next-generation sequencing of highly complex target site libraries. Methods Mol Biol 2014;1123:151–163
  • 19. Pellenz S, Monnat RJ Jr. Identification and analysis of genomic homing endonuclease target sites. Methods Mol Biol 2014;1123:1–22
  • 20. Merkert S, Martin U. Targeted genome engineering using designer nucleases: state of the art and practical guidance for application in human pluripotent stem cells. Stem Cell Res 2016;16:377–386
  • 21. Gupta RM, Musunuru K. Expanding the genetic editing tool kit: ZFNs, TALENs, and CRISPR-Cas9. J Clin Invest 2014;124:4154–4161
  • 22. Argast GM, Stephens KM, Emond MJ, et al. I-PpoI and I-CreI homing site sequence degeneracy determined by random mutagenesis and sequential in vitro enrichment. J Mol Biol 1998;280:345–353
  • 23. DeKelver RC, Choi VM, Moehle EA, et al. Functional genomics, proteomics, and regulatory DNA analysis in isogenic settings using zinc finger nuclease-driven transgenesis into a safe harbor locus in the human genome. Genome Res 2010;20:1133–1142
  • 24. van Rensburg R, Beyer I, Yao XY, et al. Chromatin structure of two genomic sites for targeted transgene integration in induced pluripotent stem cells and hematopoietic stem cells. Gene Ther 2012;20:201
  • 25. Hinrichs AS, Zweig AS, Raney BJ, et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 2018;47:D853–D858
  • 26. Kuhn RM, Haussler D, Kent WJ. The UCSC genome browser and associated tools. Brief Bioinform 2013;14:144–161
  • 27. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015;526:68–74
  • 28. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56–65
  • 29. Certo MT, Gwiazda KS, Kuhar R, et al. Coupling endonucleases with DNA end-processing enzymes to drive gene disruption. Nat Methods 2012;9:973–975
  • 30. Szymczak-Workman AL, Vignali KM, Vignali DAA. Design and construction of 2A peptide-linked multicistronic vectors. Cold Spring Harb Protoc 2012;2012:199–204
  • 31. Chen C, Okayama H. High-efficiency transformation of mammalian cells by plasmid DNA. Mol Cell Biol 1987;7:2745–2752
  • 32. Dull T, Zufferey R, Kelly M, et al. A third-generation lentivirus vector with a conditional packaging system. J Virol 1998;72:8463–8471
  • 33. Cermak T, Doyle EL, Christian M, et al. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res 2011;39:7879
  • 34. Doyle EL, Booher NJ, Standage DS, et al. TAL Effector-Nucleotide Targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction. Nucleic Acids Res 2012;40:W117–W122
  • 35. Cong L, Ran FA, Cox D, et al. Multiplex genome engineering using CRISPR/Cas systems. Science 2013;339:819–823
  • 36. Hsu PD, Scott DA, Weinstein JA, et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 2013;31:827–832
  • 37. Auer TO, Duroure K, De Cian A, et al. Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome Res 2014;24:142–153
  • 38. Suzuki K, Tsunekawa Y, Hernandez-Benitez R, et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 2016;540:144
  • 39. Phelps MP, Bailey JN, Vleeshouwer-Neumann T, et al. CRISPR screen identifies the NCOR/HDAC3 complex as a major suppressor of differentiation in rhabdomyosarcoma. Proc Natl Acad Sci U S A 2016;113:15090
  • 40. Matreyek KA, Fowler DM, Stephany JJ. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res 2017;45:e102
  • 41. Calos MP. The phiC31 integrase system for gene therapy. Curr Gene Ther 2006;6:633–645
  • 42. Guzmán C, Bagga M, Kaur A, et al. ColonyArea: an ImageJ plugin to automatically quantify colony formation in clonogenic assays. PLoS One 2014;9:e92444
  • 43. Ponten J, Saksela E. Two established in vitro cell lines from human mesenchymal tumours. Int J Cancer 1967;2:434–447
  • 44. Chavez A, Scheiman J, Vora S, et al. Highly efficient Cas9-mediated transcriptional programming. Nat Methods 2015;12:326
  • 45. Zammit PS. Function of the myogenic regulatory factors Myf5, MyoD, Myogenin and MRF4 in skeletal muscle, satellite cells and regenerative myogenesis. Semin Cell Dev Biol 2017;72:19–32
  • 46. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57–74
  • 47. Daboussi F, Zaslavskiy M, Poirot L, et al. Chromosomal context and epigenetic mechanisms control the efficacy of genome editing by rare-cutting designer endonucleases. Nucleic Acids Res 2012;40:6367–6379
  • 48. Donoho G, Jasin M, Berg P. Analysis of gene targeting and intrachromosomal homologous recombination stimulated by genomic double-strand breaks in mouse embryonic stem cells. Mol Cell Biol 1998;18:4070–4078 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Jasin M, Rothstein R. Repair of strand breaks by homologous recombination. Cold Spring Harb Perspect Biol 2013;5:a012740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. LaRocque JR, Jasin M. Mechanisms of recombination between diverged sequences in wild-type and BLM-deficient mouse and human cells. Mol Cell Biol 2010;30:1887–1897 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Renkawitz J, Lademann CA, Jentsch S. Mechanisms and principles of homology search during recombination. Nat Rev Mol Cell Biol 2014;15:369–383 [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental data:
  • Supp_Table1.pdf (30.3 KB)
  • Supp_Table2.pdf (27.6 KB)
  • Supp_Table3.pdf (19.7 KB)
  • Supp_Table4.pdf (20.9 KB)
  • Supp_Fig1.pdf (210.2 KB)
  • Supp_Fig2.pdf (73.5 KB)

Articles from Human Gene Therapy are provided here courtesy of Mary Ann Liebert, Inc.
