eLife. 2018 Mar 28;7:e34420. doi: 10.7554/eLife.34420

Figure 3. The deep mutational scanning selects for functional Envs and yields measurements that are well correlated among replicates.

(A) The average per-codon mutation frequency when sequencing plasmids encoding wild-type Env (DNA), plasmid mutant libraries (mutDNA), mutant viruses after the final infection (mutvirus), and virus generated from wild-type plasmids (virus). Mutations are categorized as nonsynonymous, synonymous, or stop codon. The DNA samples show that sequencing errors are rare, and the virus samples show that viral-replication errors are well below the frequency of mutations in the mutDNA samples. Comparing the mutvirus to the mutDNA samples shows clear purifying selection against stop codons and some nonsynonymous mutations, particularly after subtracting the background error rates given by the virus and DNA samples (Figure 3—source data 1). More extensive plots from the analysis of the deep sequencing data are in Supplementary files 1 and 2. (B) Correlations between replicates in the measured preferences of each site in Env for all 20 amino acids. Blue indicates replicate measurements on BF520, red indicates replicate measurements on BG505, and gray indicates across-Env measurements of BF520 versus BG505. R is the Pearson correlation coefficient. The numerical values for the preferences are in Figure 3—source data 2. Figure 3—figure supplement 1 shows the same correlations as contour rather than scatter plots.
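To make the panel A summary more concrete, here is a minimal Python sketch of how observed codons could be sorted into the three mutation categories and turned into an average per-codon mutation frequency. It assumes that per-site counts of each observed codon are available and defines the frequency as the fraction of reads at a site carrying a mutant codon of a given category, averaged across sites. The function names, input layout, and read counts are hypothetical and are not taken from the authors' pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(wt_codon, mut_codon):
    """Return 'stop', 'synonymous', or 'nonsynonymous' for a codon change,
    or None if the observed codon is the wild-type codon."""
    if mut_codon == wt_codon:
        return None
    mut_aa = CODON_TABLE[mut_codon]
    if mut_aa == "*":
        return "stop"
    if mut_aa == CODON_TABLE[wt_codon]:
        return "synonymous"
    return "nonsynonymous"


def average_mutation_frequencies(sites):
    """Average over sites of the fraction of reads carrying each mutation type.

    `sites` is a list of (wild-type codon, {observed codon: read count}) pairs,
    a hypothetical input layout chosen for this sketch.
    """
    totals = {"nonsynonymous": 0.0, "synonymous": 0.0, "stop": 0.0}
    for wt_codon, counts in sites:
        depth = sum(counts.values())
        for codon, n in counts.items():
            category = classify(wt_codon, codon)
            if category is not None:
                totals[category] += n / depth
    return {category: total / len(sites) for category, total in totals.items()}


# Two made-up sites: ATG->ATA is Met->Ile; CTG->CTA keeps Leu; CTG->TAG is a stop.
sites = [
    ("ATG", {"ATG": 990, "ATA": 10}),
    ("CTG", {"CTG": 980, "CTA": 10, "TAG": 10}),
]
freqs = average_mutation_frequencies(sites)
print(freqs)
```

Running the same function on counts from each sample type (DNA, mutDNA, mutvirus, virus) would give the four groups of bars compared in panel A.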

Figure 3—source data 1. Average frequencies of nonsynonymous, synonymous, and stop-codon mutations as plotted in mutfreqs are in avgmutfreqs.csv.
There is only one DNA sample for BF520, so it is listed three times, once with each BF520 replicate. We calculate the error-corrected pre-selection mutation frequency as the mutDNA frequency minus the DNA frequency, and the error-corrected post-selection mutation frequency as the mutvirus frequency minus the virus frequency. We use these error-corrected frequencies to calculate the percent of mutations remaining after selection.
DOI: 10.7554/eLife.34420.010
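The error correction described above amounts to two subtractions and a ratio. The following is a minimal Python sketch, assuming the per-category average frequencies for each sample are already in hand; the dictionary layout and every number below are made up for illustration and are not values from avgmutfreqs.csv.

```python
CATEGORIES = ("nonsynonymous", "synonymous", "stop")


def percent_remaining(mutdna, dna, mutvirus, virus):
    """Percent of mutations remaining after selection, per category.

    pre  = mutDNA - DNA      (error-corrected pre-selection frequency)
    post = mutvirus - virus  (error-corrected post-selection frequency)
    percent remaining = 100 * post / pre
    """
    result = {}
    for category in CATEGORIES:
        pre = mutdna[category] - dna[category]
        post = mutvirus[category] - virus[category]
        if pre <= 0:
            raise ValueError(f"no {category} mutations above the DNA background")
        result[category] = 100.0 * post / pre
    return result


# Made-up average per-codon frequencies for one replicate (not real data).
dna = {"nonsynonymous": 0.0005, "synonymous": 0.0001, "stop": 0.0001}
mutdna = {"nonsynonymous": 0.0105, "synonymous": 0.0021, "stop": 0.0011}
virus = {"nonsynonymous": 0.0005, "synonymous": 0.0001, "stop": 0.0001}
mutvirus = {"nonsynonymous": 0.0055, "synonymous": 0.0019, "stop": 0.0002}

remaining = percent_remaining(mutdna, dna, mutvirus, virus)
for category, pct in remaining.items():
    print(f"{category}: {pct:.1f}% remaining")
```

For BF520, the same `dna` dictionary would be passed together with each of the three replicates' other samples, matching the single DNA sample noted above. Because `post` is a difference of two small, noisy frequencies, it can come out slightly negative for a category that is almost entirely purged by selection.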
Figure 3—source data 2. Preferences for each replicate and averages are in all_prefs_unscaled.zip.
DOI: 10.7554/eLife.34420.011


Figure 3—figure supplement 1. Correlations plotted on a contour rather than a scatter plot.


This figure shows the same data as Figure 3B, but as KDE contour plots rather than points. As this representation makes clear, the vast majority of the points fall near the origin (very low preference in both replicates), so most of the correlation signal comes from the relatively modest number of amino acids that have high preference at any given site. This is expected, since most amino acid mutations at most sites are strongly deleterious and therefore have low preference.
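The point about near-origin points can be checked with a small simulation. The Python sketch below is hypothetical: it generates two replicate preference sets in which one amino acid per site is strongly favored and the other 19 receive small, independent noise (each site normalized so its 20 preferences sum to 1), then compares the Pearson R over all points with R over the near-origin points alone. The site count, noise ranges, and 0.1 cutoff are arbitrary choices, not parameters from this study.

```python
import random

AA_COUNT = 20


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_replicate(rng, favored, n_sites):
    """One simulated replicate: at each site, the favored amino acid gets a
    high preference and the other 19 get small noisy values; each site is
    then normalized to sum to 1. Returns a flat list, site by site."""
    prefs = []
    for site in range(n_sites):
        raw = [rng.uniform(0.0, 0.02) for _ in range(AA_COUNT)]
        raw[favored[site]] = 0.8 + rng.uniform(-0.05, 0.05)
        total = sum(raw)
        prefs.extend(p / total for p in raw)
    return prefs


rng = random.Random(0)
n_sites = 200
# The same amino acid is favored at a given site in both replicates.
favored = [rng.randrange(AA_COUNT) for _ in range(n_sites)]
rep1 = simulate_replicate(rng, favored, n_sites)
rep2 = simulate_replicate(rng, favored, n_sites)

r_all = pearson_r(rep1, rep2)
low = [(x, y) for x, y in zip(rep1, rep2) if x < 0.1 and y < 0.1]
r_low = pearson_r([x for x, _ in low], [y for _, y in low])
print(f"near-origin points: {len(low)} of {len(rep1)}")
print(f"R over all points: {r_all:.3f}")
print(f"R over near-origin points only: {r_low:.3f}")
```

In this simulation the overall R is close to 1 while R among the near-origin points is close to 0: nearly all of the agreement comes from the few high-preference amino acids, which is the pattern the contour plots suggest.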