Supporting information for Akashi and Gojobori (2002) Proc. Natl. Acad. Sci. USA 99 (6), 3695–3700. (10.1073/pnas.062526999)

Supporting Text

Calculations of Metabolic Costs of Amino Acids.

Table 5 shows energetic costs for the precursors employed in biosynthesis of amino acids. For growth on glucose, we assume production of oxaloacetate (for biosynthesis) through the carboxylation of phosphoenolpyruvate. The cost of α-ketoglutarate includes the cost of producing an oxaloacetate molecule (through the anaplerotic pathway) to replenish the tricarboxylic acid (TCA) cycle. Relative costs of precursors are similar for other substrates that enter fueling pathways as C5 or C6 units (fructose, lactose, and pentoses). For growth on acetate, we assume a maximum energy gain through the oxidation of acetyl-CoA via the TCA cycle. The glyoxylate shunt and gluconeogenesis are assumed to operate to produce precursors. The cost of α-ketoglutarate includes the cost of replenishing the TCA cycle with an oxaloacetate molecule (through the glyoxylate cycle). For growth on malate, we assume a maximum energy gain through the conversion of malate to acetyl-CoA (via pyruvate) and oxidation through the TCA cycle. Precursors are generated via malate → oxaloacetate and malate → pyruvate and subsequent gluconeogenesis. The cost of α-ketoglutarate includes the cost of producing an oxaloacetate molecule (directly from malate) to replenish the TCA cycle.

Table 6 shows the energetic costs of biosynthesis of amino acids given these costs of precursor metabolites.

Metabolic Costs and Codon Usage Bias for Leucine.

The use of synonymous codon bias as a measure of gene expression levels could bias our analyses. Although major codons increase in frequency within each synonymous family with increasing gene expression, the magnitude of the increase differs among amino acids (1). Thus, a shift in amino acid composition toward those that show higher synonymous bias can increase major codon usage (MCU) in the absence of expression differences. To control for such effects, analyses were conducted employing MCU within a single synonymous family. Leucine is the most common amino acid encoded in the genomes of both B. subtilis and E. coli, and its MCU, MCU_L, was used as a measure of expression levels. Only genes with 25 or more leucine codons were included in MCU_L analyses. Although the number of genes in the data set was reduced and the noise in estimating expression levels through synonymous codon bias was expected to increase substantially, the correlation between energetic costs and MCU_L remained highly statistically significant (B. subtilis: n = 1,770, rS = –0.237, Z = 10.27, P < 10⁻⁵; E. coli: n = 2,246, rS = –0.154, Z = 7.40, P < 10⁻⁵).
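The leucine-only measure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the set of major leucine codons is passed in as a parameter because the text does not list which codons were classified as major in each species. Only the threshold of 25 leucine codons comes from the text.

```python
from typing import List, Optional, Set

# The six leucine codons of the standard genetic code (DNA alphabet).
LEUCINE_CODONS = frozenset({"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"})


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into complete codons.

    A trailing partial codon (1 or 2 bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def mcu_leucine(cds: str, major_codons: Set[str], min_leu: int = 25) -> Optional[float]:
    """Fraction of leucine codons in `cds` that are major codons.

    `major_codons` is supplied by the caller (an assumption of this sketch).
    Returns None when the gene has fewer than `min_leu` leucine codons,
    mirroring the inclusion threshold stated in the text.
    """
    leu = [c for c in split_codons(cds) if c in LEUCINE_CODONS]
    if len(leu) < min_leu:
        return None
    n_major = sum(1 for c in leu if c in major_codons)
    return n_major / len(leu)
```

Returning None for genes below the threshold keeps excluded genes distinguishable from genes that pass the threshold but use no major leucine codons, which would score 0.0.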

Metabolic Costs and Codon Usage Bias Excluding the Beginnings and Ends of Genes.

Codons located near the beginnings and ends of genes may experience selective pressures in addition to those experienced by codons located more centrally within genes (2, 3). Constraints relating to translation initiation may require a lower melting temperature and skewed base composition. To eliminate the contribution of such compositional constraints, analyses were performed with the first and last 50 codons excluded for each gene (for the calculations of both MCU and cost). Although the data set was substantially reduced, negative correlations between energetic costs and synonymous codon usage remained highly statistically significant (B. subtilis: n = 2,256, rS = –0.375, Z = 19.21, P < 10⁻⁵; E. coli: n = 2,587, rS = –0.227, Z = 11.85, P < 10⁻⁵).
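The trimming step can be illustrated with a short sketch. The function name is hypothetical, and the text does not say whether the start and stop codons count toward the 50 excluded codons or how genes too short to leave any central region were handled; here every codon in the input list counts, and such genes yield an empty list.

```python
from typing import List


def trim_terminal_codons(codons: List[str], n_trim: int = 50) -> List[str]:
    """Return the central codons of a gene, dropping `n_trim` from each end.

    Genes with 2 * n_trim codons or fewer have no central region and
    return an empty list (an assumption of this sketch).
    """
    if len(codons) <= 2 * n_trim:
        return []
    return codons[n_trim:len(codons) - n_trim]
```

Both MCU and cost would then be computed from the returned list, so the two measures refer to the same codon positions.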

Metabolic Costs and Codon Usage Bias on Leading and Lagging DNA Strands.

Mutational processes appear to differ for the leading and lagging strands of DNA (4). The GC skew, (G – C)/(G + C), differs on the plus and minus strands in both E. coli (5) and B. subtilis (6). To control for differences in mutational patterns experienced by different genes, analyses were conducted separately for leading and lagging strand genes and the results remained essentially unchanged (leading strand: B. subtilis: n = 1,435, rS = –0.385, Z = 15.77, P < 10⁻⁵; E. coli: n = 1,660, rS = –0.250, Z = 10.51, P < 10⁻⁵; lagging strand: B. subtilis: n = 1,620, rS = –0.382, Z = 16.62, P < 10⁻⁵; E. coli: n = 1,737, rS = –0.230, Z = 9.86, P < 10⁻⁵).
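The GC skew formula above is simple enough to state in code. The sketch below uses a hypothetical function name and returns None for a sequence with no G or C, a convention chosen here to avoid division by zero rather than one taken from the text.

```python
from typing import Optional


def gc_skew(seq: str) -> Optional[float]:
    """GC skew, (G - C) / (G + C), of a single-stranded DNA sequence.

    Returns None when the sequence contains neither G nor C.
    """
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return None
    return (g - c) / (g + c)
```

The value ranges from –1 (all C) to +1 (all G), and reading the complementary strand flips its sign.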

Metabolic Costs and Codon Usage Bias Excluding Costs of GNN Codons.

Eigen and Schuster (7) and Trifonov (8) have proposed that GNN codons may enhance translational processivity (i.e., reduce frameshift errors and ribosomal falloff). GNN codons increase in frequency as a function of measures of codon usage bias in E. coli (9). Our findings show similar increases of GNN as a function of synonymous codon usage bias in whole proteome analyses, as well as within functional categories, in both E. coli and B. subtilis. Selection for metabolic efficiency and translational processivity make overlapping predictions (i.e., increases in abundance of low-cost amino acids such as Val, Gly, and Ala). However, processivity selection does not fully account for the patterns described above; the association between metabolic costs and MCU remains significant when analyses are restricted to costs of non-GNN codons (B. subtilis: rS = –0.094, Z = 5.20, P < 10⁻⁵; E. coli: rS = 0.132, Z = 7.76, P < 10⁻⁵).
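One way to restrict a cost calculation to non-GNN codons is sketched below. The text does not spell out how costs were averaged, so this is an assumption: the cost of a gene is taken as the mean cost of the amino acids encoded by its codons that do not begin with G. The function name and the `codon_cost` mapping (codon to amino acid cost, which the caller would build from Table 6) are hypothetical.

```python
from typing import Dict, List, Optional


def mean_cost_non_gnn(codons: List[str], codon_cost: Dict[str, float]) -> Optional[float]:
    """Mean amino acid cost over codons that do not start with G.

    Codons absent from `codon_cost` (e.g., stop codons) are skipped.
    Returns None if no codon qualifies.
    """
    costs = [
        codon_cost[c]
        for c in codons
        if not c.startswith("G") and c in codon_cost
    ]
    if not costs:
        return None
    return sum(costs) / len(costs)
```

Because all Val, Gly, and Ala codons in the standard genetic code have the form GNN, this filter removes those amino acids from the cost average entirely.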

Amino Acid Compositional Changes as a Function of Codon Usage Bias.

Statistical analyses of associations between amino acid abundance and MCU are shown in Table 4. Examples of amino acids that increase and decrease in frequency as a function of synonymous codon usage bias are shown for B. subtilis and E. coli in Fig. 5 and Fig. 6, respectively.
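The rank correlations (rS) reported throughout can be illustrated with a small, dependency-free sketch of Spearman's coefficient. It assigns average ranks to ties and then computes a Pearson correlation on the ranks. The function names are hypothetical, and the text does not state how ties were handled or how the Z statistics were derived, so neither is reproduced here.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman rank correlation; None if either variable is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)
```

In this setting, x would hold one value per gene (for example, MCU) and y the matching per-gene quantity (average amino acid cost, or the frequency of a given amino acid).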

1. Kanaya, S., Yamada, Y., Kudo, Y. & Ikemura, T. (1999) Gene 238, 143–155.

2. Bulmer, M. (1988) J. Theor. Biol. 133, 67–71.

3. Eyre-Walker, A. C. & Bulmer, M. (1993) Genetics 140, 1407–1412.

4. Lobry, J. R. (1996) Mol. Biol. Evol. 13, 660–665.

5. Blattner, F. R., Plunkett, G., 3rd, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., et al. (1997) Science 277, 1453–1462.

6. Kunst, F., Ogasawara, N., Moszer, I., Albertini, A. M., Alloni, G., Azevedo, V., Bertero, M. G., Bessieres, P., Bolotin, A., Borchert, S., et al. (1997) Nature (London) 390, 249–256.

7. Eigen, M. & Schuster, P. (1979) The Hypercycle (Springer, Berlin), p. 64.

8. Trifonov, E. N. (1987) J. Mol. Biol. 194, 643–652.

9. Gutiérrez, G., Márquez, L. & Marín, A. (1996) Nucleic Acids Res. 24, 2525–2527.