Author manuscript; available in PMC: 2017 Jul 28.
Published in final edited form as: Cell. 2016 Jun 30;166(3):679–690. doi: 10.1016/j.cell.2016.05.070

Figure 1. Identification of 6 Base Sequences Linked to Low GFP Expression.

(A) Schematic of the method to examine effects of three randomized codons on superfolder GFP expression, using the RNA-ID reporter. Shown is the FACS sort of (NNN)3 Library 1.

(B) Distribution of syn-GFPSEQ scores. Variants were assigned to low (magenta; n = 1119), intermediate (gray; n = 5127), and high (gold; n = 24417, excluding high expression synonymous references) expression categories.

(C) Significance of 6-mer enrichment in low expression variants by 6-mer position (1–4) in the 9-base variable region (library insertion). 6-mers with at least one p-value ≤ 0.001 are plotted based on hierarchical clustering of positional permutation p-values. Fifty-seven 6-mers are not plotted due to missing values; this includes 6-mers that form an in-frame stop codon. 6-mers with a p-value ≤ 0.001 at both in-frame start positions (1 and 4) are labeled (although CUG-AGG, CUG-AUA*, and CUU-AGG are not plotted because they form a stop codon at another position). Candidate inhibitory pairs that remain enriched in a reduced structure dataset are indicated with a star.
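The positional bookkeeping in panel (C) can be illustrated with a minimal sketch. It assumes 1-based start positions 1 to 4 within the 9-base variable region and the standard stop codons UAA, UAG, and UGA, and it checks only codons that lie entirely inside the 6-mer. The function names (`windows`, `full_codons`, `stop_positions`) are hypothetical and are not taken from the authors' analysis.

```python
# Illustrative sketch only; not the authors' pipeline.
STOP_CODONS = {"UAA", "UAG", "UGA"}   # standard stop codons (assumption)
IN_FRAME_POSITIONS = (1, 4)           # 6-mer starts that align with codon boundaries


def windows(variable_region):
    """Map each 1-based start position (1-4) to its 6-mer in a 9-base region."""
    if len(variable_region) != 9:
        raise ValueError("expected a 9-base variable region")
    return {p: variable_region[p - 1:p + 5] for p in range(1, 5)}


def full_codons(sixmer, position):
    """Return the in-frame codons fully contained in a 6-mer at this position."""
    start = position - 1            # 0-based offset of the 6-mer in the region
    offset = (-start) % 3           # index in the 6-mer of the first codon boundary
    return [sixmer[i:i + 3] for i in range(offset, 4, 3)]


def stop_positions(sixmer):
    """List the start positions at which the 6-mer contains an in-frame stop codon."""
    return [
        p for p in range(1, 5)
        if any(codon in STOP_CODONS for codon in full_codons(sixmer, p))
    ]


if __name__ == "__main__":
    for pair in ("CUGAGG", "CUGAUA", "CUUAGG"):
        print(pair, stop_positions(pair))
```

Under these assumptions, each of the three labeled but unplotted pairs contains a stop codon at one of the out-of-frame start positions (2 or 3), which matches the legend's statement that they form a stop codon at another position.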

(D) Flow cytometry scatter plots of six individual variants; each plot is labeled with its GFP*100/RFP value.

See also Tables S1–S4.