2018 Mar 9;7:e32110. doi: 10.7554/eLife.32110

Figure 2. Confirmation of amino acid biosynthetic genes with high-throughput fitness experiments.

(A) Fitness scores for 6,558 genes in media with and without amino acid supplementation (drop-out complete mix). Gene fitness scores are log ratios of final versus starting abundance, averaged over multiple barcoded insertions per gene across three biological replicates. Genes that had consistently different enrichment scores between treatments (∆F > 1, |T| > 3) are highlighted; they represent genes for which mutant strains are auxotrophic for one or more amino acids, nucleotides, or vitamins present in the drop-out complete mixture. (B) Fitness scores in media supplemented with arginine or methionine. Highlighted genes are the same as those highlighted in (A). Deletion strains for circled or boxed genes are auxotrophic for methionine or arginine, respectively, in S. cerevisiae or A. nidulans. See Supplementary file 2 for full fitness data. (C) Hierarchical clusters of fitness scores in supplemented and non-supplemented media. Fitness scores are shown for each biological replicate versus its Time 0 replicate, for genes with a consistent fitness defect (F < −1, T < −3) in one or more of the following conditions: Yeast extract/Peptone/Dextrose media (YPD) or defined media (DM, composed of yeast nitrogen base plus glucose) with or without the following supplements: drop-out complete mix (+DOC), arginine (+ARG), or methionine (+MET). (D) Sulfur amino acid biosynthesis in R. toruloides as inferred from fitness experiments. CysA/CysB are named according to their A. nidulans orthologs, all others by orthologs in S. cerevisiae. Auxotrophic mutants had F < −1 in non-supplemented media (DM) and T < −3 in DM versus supplemented media (DOC). Multiple insertions were mapped in STR3, suggesting non-essentiality, but strain abundance was too low to estimate fitness in BarSeq data. *MET16 had fitness scores that clustered with the other auxotrophic mutants, but T for DM versus DOC was −2.7.
**Fitness scores for insertions in MET8 were not inconsistent with auxotrophy, but only two insertions were abundant enough to be tracked. 5MTHTG: 5-methyltetrahydropteroyltri-L-glutamate, THTG: tetrahydropteroyltri-L-glutamate, SAM: S-adenosyl-L-methionine, SAH: S-adenosyl-homocysteine, APS: adenylyl-sulfate, PAPS: 3'-phosphoadenylyl-sulfate. (E) Arginine biosynthesis in R. toruloides as inferred from fitness experiments. *ARG8 had fitness scores that clustered with the other auxotrophic mutants, but T for DM versus DOC was −2.9. NAG: N-acetylglutamate, NAGSA: N-acetylglutamate semialdehyde, NAAO: N-alpha-acetylornithine. The following figure supplements are available for Figure 2.
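
The auxotrophy criterion stated for panels (D) and (E) (F < −1 in DM, and T < −3 for DM versus DOC) can be illustrated with a minimal sketch. The function name, the gene labels, and the F values below are hypothetical placeholders, not data from the study; only the two cutoffs and the borderline T value of −2.7 come from the caption.

```python
def is_auxotroph_candidate(f_dm, t_dm_vs_doc, f_cutoff=-1.0, t_cutoff=-3.0):
    """Return True when both caption thresholds are met.

    f_dm        -- gene fitness score in non-supplemented media (DM)
    t_dm_vs_doc -- T-statistic for DM versus supplemented media (DOC)
    The cutoffs default to the values quoted in the caption.
    """
    return f_dm < f_cutoff and t_dm_vs_doc < t_cutoff


# Hypothetical example values (F values are placeholders, not measured data).
example_genes = {
    "clear_auxotroph": (-2.5, -5.1),
    "borderline_T": (-2.0, -2.7),   # T equal to the value reported for MET16
    "no_defect": (0.1, 0.4),
}

calls = {
    name: is_auxotroph_candidate(f, t) for name, (f, t) in example_genes.items()
}
print(calls)
```

A gene such as MET16, whose T for DM versus DOC is −2.7, narrowly misses the T cutoff under this rule, which is why the caption flags it separately even though its fitness profile clusters with the auxotrophs.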


Figure 2—figure supplement 1. Barcode abundance in BarSeq experiments.


(A) Histogram of barcode abundance in a typical BarSeq experiment with 20 million reads per sample. (B) Histogram of tracked barcodes per gene in a typical BarSeq experiment. The median was seven barcodes per gene, with 68,021 barcodes in total across 6,558 genes. See Supplementary file 1 for a full list of insert density by gene and orthologs reported as essential in model fungi. (C) Histogram of total reads per sample per gene in a typical experiment. For each gene, this is the sum of counts for all the barcodes that were used in gene fitness estimation, averaged across all samples.
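
The per-gene summaries in panels (B) and (C) can be sketched as follows. This is an illustrative sketch only: the data structures (a barcode-to-gene dictionary and one barcode-count dictionary per sample), the function names, and the toy values are assumptions, not the authors' pipeline.

```python
from statistics import median


def barcodes_per_gene(barcode_to_gene):
    """Count how many tracked barcodes map to each gene (panel B)."""
    counts = {}
    for gene in barcode_to_gene.values():
        counts[gene] = counts.get(gene, 0) + 1
    return counts


def mean_reads_per_gene(sample_counts, barcode_to_gene):
    """Sum barcode reads per gene within each sample, then average
    those per-sample sums across all samples (panel C)."""
    totals = {}
    for counts in sample_counts:
        for barcode, reads in counts.items():
            gene = barcode_to_gene.get(barcode)
            if gene is None:
                continue  # barcode not used for any gene
            totals[gene] = totals.get(gene, 0) + reads
    n_samples = len(sample_counts)
    return {gene: total / n_samples for gene, total in totals.items()}


# Toy data with hypothetical barcode and gene names.
barcode_to_gene = {"bc1": "geneA", "bc2": "geneA", "bc3": "geneB"}
sample_counts = [
    {"bc1": 10, "bc2": 20, "bc3": 5},
    {"bc1": 30, "bc2": 0, "bc3": 15},
]

per_gene = barcodes_per_gene(barcode_to_gene)
print(median(per_gene.values()))
print(mean_reads_per_gene(sample_counts, barcode_to_gene))
```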
Figure 2—figure supplement 2. Contributions of individual strains to gene-level fitness scores.


(A) Raw counts for barcoded insertions inside and flanking the coding region for RTO4_9377 ARG5 in Time 0 samples and after growth in non-supplemented defined media (DM). The gene structure is shown below the plot, with coding exons as large blue boxes and 5′ and 3′ untranslated regions as smaller blue boxes. The location of each insertion is marked with a black line between the gene model and the corresponding data on the bar chart. Counts from each biological replicate for each insertion are clustered together in the order they were mapped to the gene. The dark grey area of the plot indicates insertions in the central 80% of the coding region, and the flanking area in light grey indicates insertions in the first or last 10% of the coding region. (B) Fitness scores for individual insertions in ARG5, plotted in the same order as (A). The height of the bar indicates the fitness score derived from the log2 ratio of counts shown in (A). Shading of the bar indicates the weight assigned to each insertion in calculating F for the gene. F and T for each individual replicate and the average F and combined T for this condition are displayed above the plot. (C) Raw counts for insertions in ARG5 in Time 0 samples and after growth in arginine-supplemented media (+DOC: DM plus drop-out complete mix). (D) Fitness scores for ARG5 on arginine-supplemented media (+DOC). (E) Raw counts for insertions in RTO4_11741 MET16 on non-supplemented media. (F) Fitness scores for insertions in MET16 on non-supplemented media. (G) Raw counts for insertions in MET16 on methionine-supplemented media (+DOC). (H) Fitness scores for insertions in MET16 on methionine-supplemented media (+DOC).
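
The relationship between the raw counts in (A) and the scores in (B) can be sketched as a log2 ratio per insertion followed by a weighted average per gene. This is a simplified illustration, not the authors' exact procedure: the pseudocount of 1, the caller-supplied weights, and the omission of any normalization step are assumptions, and the counts below are made up.

```python
import math


def insertion_fitness(count_t0, count_final, pseudocount=1.0):
    """log2 ratio of final to Time 0 counts for one insertion.
    The pseudocount (an assumption) avoids taking the log of zero."""
    return math.log2((count_final + pseudocount) / (count_t0 + pseudocount))


def gene_fitness(counts_t0, counts_final, weights, pseudocount=1.0):
    """Weighted average of per-insertion fitness scores for one gene.
    How the weights are chosen is not specified here; they are inputs."""
    if not (len(counts_t0) == len(counts_final) == len(weights)):
        raise ValueError("all inputs must have one entry per insertion")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = [
        insertion_fitness(t0, final, pseudocount)
        for t0, final in zip(counts_t0, counts_final)
    ]
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Made-up counts for three insertions in one hypothetical gene.
t0_counts = [7, 7, 15]
final_counts = [0, 7, 1]
weights = [1.0, 1.0, 2.0]  # hypothetical weights

print(gene_fitness(t0_counts, final_counts, weights))
```

Passing the weights in explicitly keeps the sketch neutral about the weighting scheme, which the caption describes only through the bar shading.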
Figure 2—figure supplement 3. Properties of T-statistics.


(A) Distributions of T-statistics for mock comparisons between individual replicate Time 0 samples from our auxotrophy experiments, shown as a quantile-quantile (QQ) plot. A perfect fit to the standard normal distribution would lie on the x = y line (dashed grey line). (B) Distributions of T-statistics for mock comparisons between shuffled sets of Time 0 samples from three experiments, shown as a quantile-quantile plot. (C) Histogram of gene lengths for all genes with sufficient data to compute fitness scores, and for observations with |T| > 3 versus Time 0 for combined T-statistics across biological triplicates. (D) Histogram of the number of inserts per gene with sufficient depth in BarSeq experiments to contribute to fitness estimation, for all genes and for observations with |T| > 3 versus Time 0 for combined T-statistics across biological triplicates. (E) Histogram of total counts per gene per sample for all genes and for observations with |T| > 3 versus Time 0 for combined T-statistics across biological triplicates. (F) Histogram of GC content for all genes and for observations with |T| > 3 versus Time 0 for combined T-statistics across biological triplicates. (G) Average fitness score and T-statistics for observations with |T| > 3 versus gene length. Fitness scores and T-statistics were binned by gene length in intervals of 500 bp and averaged across each bin.
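
The binning described for panel (G) can be sketched as below: observations are grouped by gene length into 500 bp intervals and the values in each bin are averaged. The function name, the half-open bin convention (a bin starting at 500 covers lengths 500 to 999), and the example numbers are assumptions for illustration.

```python
from collections import defaultdict


def bin_average_by_length(gene_lengths, values, bin_width=500):
    """Average `values` within gene-length bins of `bin_width` bp.

    Returns a dict mapping each bin's start (in bp) to the mean value
    of the observations whose gene length falls in that bin.
    """
    if len(gene_lengths) != len(values):
        raise ValueError("gene_lengths and values must have equal length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, value in zip(gene_lengths, values):
        bin_start = (length // bin_width) * bin_width
        sums[bin_start] += value
        counts[bin_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Made-up gene lengths (bp) and fitness scores.
lengths = [300, 450, 900, 1200]
fitness = [-1.0, -3.0, -2.0, -4.0]

print(bin_average_by_length(lengths, fitness))
```

The same function can be applied to T-statistics in place of fitness scores, since the caption states that both were binned and averaged in the same way.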