Evidence of Selection for Low Cognate Amino Acid Bias in Amino Acid Biosynthetic Enzymes

Rui Alves; Michael A Savageau

doi:10.1111/j.1365-2958.2005.04566.x

Author manuscript; available in PMC: 2007 Mar 29.

Published in final edited form as: Mol Microbiol. 2005 May;56(4):1017–1034. doi: 10.1111/j.1365-2958.2005.04566.x


PMCID: PMC1839009 NIHMSID: NIHMS2915 PMID: 15853887

Summary

If the enzymes responsible for biosynthesis of a given amino acid are repressed and the cognate amino acid pool suddenly depleted, then derepression of these enzymes and replenishment of the pool would be problematic, if the enzymes were largely composed of the cognate amino acid. In the proverbial ‘Catch 22’, cells would lack the necessary enzymes to make the amino acid, and they would lack the necessary amino acid to make the needed enzymes. Based on this scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid. We call this the ‘cognate bias hypothesis’. Here we test several implications of this hypothesis directly using data from the proteome of Escherichia coli. Several lines of evidence show that low cognate bias is evident in 15 of the 20 amino acid biosynthetic pathways. Comparison with closely related Salmonella typhimurium shows similar results. Comparison with more distantly related Bacillus subtilis shows general similarities as well as significant differences in the detailed profiles of cognate bias. Thus, selection for low cognate bias plays a significant role in shaping the amino acid composition for a large class of cellular proteins.

Introduction

Proteins are versatile effectors and mediators of cellular response. They catalyse reactions, serve as structural components of the cell and mediate cellular adaptation through sensing and signal transduction. Structure and function of proteins are subject to natural selection (King and Jukes, 1969; Richmond, 1970; Li, 1997) and, because structure and function are ultimately determined by the sequence of amino acids in proteins, it stands to reason that the amino acid composition of proteins is also subject to selection.

Previous studies have identified different types of selective pressure that are important in determining the relative amino acid composition for the proteins of an organism. Differences in the mutational bias of the different codons for each amino acid can partly account for the differences in relative amino acid composition (see Lobry, 1997; Singer and Hickey, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). The metabolic cost of synthesizing an amino acid, in terms of ATP and reducing equivalents, is also important in determining which amino acids are more prevalent in a proteome. The cheaper an amino acid is to synthesize, the more it is used (see Karlin and Bucher, 1992; Lobry and Gautier, 1994; Dufton, 1997; Jansen and Gerstein, 2000; Akashi and Gojobori, 2002; Seligmann, 2003 and references therein). Functional reasons that justify differential usage of amino acids in a given group of proteins have also been identified (Trifonov, 1987; Mazel and Marliere, 1989; Karlin and Bucher, 1992). For example, membrane proteins are biased towards high relative composition of hydrophobic amino acids (Karlin and Bucher, 1992).

Genes coding for amino acid biosynthetic enzymes are repressed in a medium where the cognate amino acid is present. Escherichia coli and many other bacteria typically derepress the expression of a small set of enzymes whenever there is a need to synthesize any given amino acid or set of amino acids. These are usually encoded in an operon or regulon, and their expression tends to be co-ordinated (see Herrmann and Somerville, 1983; Neidhardt, 1999 for reviews). When growing in a medium with low amino acid content, a significant fraction of cellular protein consists of enzymes involved in amino acid biosynthesis (Maaløe and Kjeldgaard, 1966; Neidhardt et al., 1990).

Although cells have general mechanisms, such as induction of proteolysis and activation of the stringent response, for remodelling the amino acid content of proteins when the organism is stressed by amino acid limitation (Reeve et al., 1984; Matin, 1991; Foster and Spector, 1995; Magnusson et al., 2003; Weichart et al., 2003; Nystrom, 2004), they are also likely to have more specific mechanisms. For example, if cells were growing in a rich medium and suddenly one of the exogenously supplied amino acids became depleted, then derepression of the corresponding biosynthetic enzymes and replenishment of the intracellular pool of that amino acid would be delayed and limited if the enzymes were largely composed of the cognate amino acid. Based on such a scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid, thus avoiding the ‘Catch 22’ situation in which the biosynthetic enzymes cannot be synthesized for lack of the amino acid and the amino acid cannot be synthesized for lack of the biosynthetic enzymes.

To explore the dynamics of this situation, we first adapt an existing computer model for an amino acid biosynthetic pathway (Xiu et al., 2002) and show that low cognate bias correlates with a greater extent of derepression of the pathway and with faster response times for this derepression. This suggests that our cognate-bias hypothesis is reasonable and that response time may well be the selective pressure for the low bias. We test several implications of this hypothesis directly using data from the well-characterized organism Escherichia coli, and we compare these results with the results from similar tests for a closely related Gram-negative organism Salmonella typhimurium and for a more distantly related Gram-positive organism Bacillus subtilis.

For each organism, we calculate the amino acid composition of proteins that are involved in the amino acid biosynthetic pathways and compare their composition with that of larger groups of proteins from the same organism, including the entire proteome. We find that most amino acid biosynthetic pathways in each of the organisms do have a low cognate bias. There are a few exceptions and in some cases there are functional reasons that can account for this. The closely related organisms have very similar profiles of cognate bias, whereas the more distantly related organisms have profiles with significant differences that may reflect their different evolutionary history and ecological niche.

Results

Amino acid composition

To determine whether a protein or a group of proteins has a relative amino acid composition that is significantly different from that of a larger group to which it belongs, one must first determine the composition of the larger group. Choosing an appropriate group of proteins to serve as a control for the calculation of average amino acid composition requires careful consideration. The relative composition of the control group, which is then used to estimate the probability of amino acid occurrence in a protein, should be a weighted average of the proteins being synthesized by the cell. This is so because the synthesis of an amino acid biosynthetic enzyme during derepression and the synthesis of all other proteins being expressed at the same time compete for the limiting amino acid. The relative amino acid composition of the protein complement in growing cells provides an estimate of the average composition of these proteins. Proteins expressed at low levels have a small contribution to the overall amino acid composition of cellular protein, whereas proteins with a high level of expression have a large contribution.
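The weighted average described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `relative_composition` and the `weights` argument (standing in for per-protein expression levels) are assumptions introduced here, not part of the original analysis. With no weights, every protein counts once, which corresponds to a composition calculated directly from translated coding sequences.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def relative_composition(sequences, weights=None):
    """Return {amino acid: fraction} over a group of protein sequences.

    weights: optional per-protein expression levels (hypothetical input).
    If omitted, every protein contributes equally per residue.
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    totals = Counter()
    for seq, weight in zip(sequences, weights):
        for aa, n in Counter(seq).items():
            if aa in AMINO_ACIDS:  # skip ambiguous or non-standard symbols
                totals[aa] += weight * n
    grand_total = sum(totals.values())
    return {aa: totals[aa] / grand_total for aa in AMINO_ACIDS}


# Toy example: the second protein is expressed four times more than the first.
unweighted = relative_composition(["AAAA", "GG"])
weighted = relative_composition(["AAAA", "GG"], weights=[1.0, 4.0])
```

In the toy example, the highly expressed glycine-only protein dominates the weighted composition, which is the effect the paragraph above describes for proteins with a high level of expression.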

Thus, we have searched the literature for experimental determinations of the amino acid composition in the protein complement of the bacteria studied in this work. Having found such studies we then compared the experimentally determined composition with the composition of other groups of cellular proteins. We found that the relative amino acid composition of the protein complement in growing cells is almost identical to that of the cellular proteome determined from the DNA sequence (Table 1). For each genome, we also have calculated the relative composition for the entire group of non-enzymatic proteins, for the entire group of enzymatic proteins, and for each more specialized group of enzymes within an EC classification (i.e. classes 1 through 6). Class 1 includes all oxidoreductases, class 2 all transferases, class 3 all hydrolases, class 4 all lyases, class 5 all isomerases and class 6 all ligases.

Table 1.

Average relative amino acid composition of bacterial proteins and of two different environments.

Amino acid	E. coli^a	S. typhimurium^a	B. subtilis^a	E. coli^b	S. typhimurium^b	B. subtilis^b	Soil^c	Intestine^c
Ala	0.093	0.098	0.077	0.112	–	0.045	Intermediate	Low
Arg	0.054	0.012	0.008	0.050	–	0.064	Low	High
Asn	0.040	0.052	0.052	0.050	–	0.037	Low	Low
Asp	0.051	0.056	0.072	0.050	–	0.037	Intermediate	Low
Cys	0.012	0.039	0.045	0.017	–	0.013	Low	Low
Gln	0.045	0.074	0.069	0.056	–	0.072	Low	Low
Glu	0.057	0.023	0.023	0.056	–	0.072	Low	High
Gly	0.072	0.059	0.074	0.086	–	0.058	High	Low
His	0.021	0.043	0.071	0.017	–	0.024	Low	Low
Ile	0.061	0.110	0.096	0.046	–	0.067	Low	Low
Leu	0.114	0.028	0.028	0.091	–	0.086	Low	Low
Lys	0.045	0.038	0.039	0.056	–	0.090	Low	High
Met	0.026	0.045	0.037	0.024	–	0.032	Low	Low
Phe	0.039	0.044	0.038	0.034	–	0.055	Low	Low
Pro	0.045	0.056	0.041	0.042	–	0.035	Low	Low
Ser	0.056	0.058	0.063	0.049	–	0.043	High	Low
Thr	0.055	0.055	0.054	0.053	–	0.042	Intermediate	Low
Trp	0.015	0.070	0.068	0.011	–	0.021	Low	High
Tyr	0.030	0.015	0.010	0.028	–	0.038	Low	High
Val	0.069	0.029	0.035	0.072	–	0.068	Low	Low


^a Calculated from the translated version of coding sequences in the genomes.

^b Experimental determination of the amino acid fraction in the cell after total protein purification and hydrolysis. The values for B. subtilis have been calculated from Sauer et al. (1996). The values for E. coli have been calculated from Pramanik and Keasling (1998).

^c Qualitative amino acid make-up of two different environments (Savageau, 1983) that are relevant for these organisms.

The results of the analysis presented in this article do not differ significantly when the different groups of proteins are used to calculate the probability of amino acid occurrence. Therefore, we present only the results based on the relative amino acid composition of the proteome to calculate the probability of amino acid occurrence.

Verifying two basic assumptions

There are basic assumptions involving each of the two types of Monte Carlo (MC) simulations that we have used to determine the statistical significance of our comparisons. The first MC approach assumes that, with respect to the relative amino acid composition, there is no strong correlation between any two different types of amino acids. To test the validity of this assumption for the E. coli proteome, we have calculated the Spearman correlation coefficient between the relative amounts of any two amino acids in the proteins. These correlation coefficients are small, which supports, to a first approximation, our first assumption (Table S1). (This has no implications regarding finer detail correlations between neighbouring amino acids or other factors that have been shown to influence the selection of amino acids at any given location in a protein; Cootes et al., 1998.) The second MC approach assumes that the relative composition of a protein is independent of the protein length. To test the validity of this assumption, we have calculated the Spearman correlation coefficient between the relative amounts of each amino acid and the length of the E. coli proteins. These correlations are also small, which supports the second assumption (Table S1).
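As an illustration of the kind of check described above, the following Python sketch computes a Spearman rank correlation as the Pearson correlation of average ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy sequences are assumptions made for this example; the sketch does not reproduce the values in Table S1.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy check of the second assumption: Trp fraction versus protein length.
proteins = ["MKWAL", "MAAGGWLLK", "MSTNPKPQRKW", "MGLSDGEWQLVLNVWGK"]
trp_fraction = [p.count("W") / len(p) for p in proteins]
lengths = [len(p) for p in proteins]
r_length = spearman(trp_fraction, lengths)
```

A coefficient close to zero over the whole proteome would support the independence assumption; four toy sequences are far too few to say anything, and serve only to show the calculation.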

Because the two assumptions above hold to a first approximation, we can calculate, in closed form, the significance of the amino acid bias for any given protein (see Experimental procedures). We have used all three approaches (the two MC simulations and the analytical calculation) and 10 different control groups to calculate how significantly biased our proteins are; for each protein of interest, the results differ by at most a few per cent across methods of calculation and control groups (data not shown). Therefore, we shall only present and discuss the data for the analytical approach using the entire proteome as the control group.

By using the analytical approach, we estimate the cognate amino acid bias of a protein by the probability (P-value) that the relative cognate amino acid composition of a protein is below (low bias) or above (high bias) that of the control group (see Experimental procedures for details).
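The exact closed-form expression is given in Experimental procedures. As a rough illustration only, one simple model consistent with the two assumptions above treats each residue of a protein as an independent draw, so that the number of cognate residues follows a binomial distribution. The binomial model, the function names and the example enzyme (300 residues, one Trp) are assumptions made for this sketch; the Trp frequency of 0.015 is the E. coli proteome value from Table 1.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for long proteins."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def low_bias_p_value(count, length, p):
    """Probability of seeing `count` or fewer cognate residues by chance."""
    return sum(binom_pmf(k, length, p) for k in range(count + 1))


def high_bias_p_value(count, length, p):
    """Probability of seeing `count` or more cognate residues by chance."""
    return sum(binom_pmf(k, length, p) for k in range(count, length + 1))


# Hypothetical Trp-biosynthetic enzyme: 300 residues, 1 Trp, proteome Trp = 0.015.
p_low = low_bias_p_value(count=1, length=300, p=0.015)
```

Under this model a small `p_low` means the enzyme contains fewer cognate residues than expected from the proteome composition, which is the sense of "low bias" used in the text.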

Effect of cognate bias on time for amino acid recovery

Amino acid biosynthetic pathways in bacteria are repressible (Herrmann and Somerville, 1983; Neidhardt, 1999). For example, when growing in a medium in which an amino acid is available, E. coli cells typically repress expression of the genes that code for enzymes of the cognate pathway. This situation is represented in Fig. 1. When the proteins of a pathway that synthesize a given amino acid have a composition that is enriched for that amino acid (high cognate bias), it is likely that this high cognate bias will tend to prevent or delay the recovery of amino acid levels when cells are shifted from a medium that is rich in the amino acid to one that is poor.

Fig. 1 — Schematic model of a specific amino acid biosynthetic pathway in its cellular context. X₁: mRNA coding for the enzymes of the pathway that synthesizes the kth amino acid; X₂: enzymes of the pathway that synthesizes the kth amino acid; X₃: cognate amino acid (kth) of the biosynthetic pathway. See text for further discussion.

To analyse this hypothesis in a specific case we use a previously developed model (Xiu et al., 2002) of Trp biosynthesis in E. coli. The original normalized equations are the following

$$\frac{d X_{1}}{d t} = \frac{(1 + X_{3})}{[1 + (1 + \frac{k_{1} X_{3}}{k_{2} + X_{3}}) X_{3}]} \frac{k_{3}}{(k_{3} + X_{3})} - k_{4} X_{1} \tag{1}$$

$$\frac{d X_{2}}{d t} = k_{10} X_{1} - k_{1} X_{2} \tag{2}$$

$$\frac{d X_{3}}{d t} = k_{11} + \frac{k_{5} X_{2}}{(k_{5} + X_{3}^{2})} - k_{6} X_{3} - \frac{X_{3}}{(1 + X_{3})} \frac{k_{6} X_{3}}{(k_{7} + X_{3})} - \frac{k_{8} X_{3}}{(k_{9} + X_{3})} \tag{3}$$

X₁ represents the concentration of the mRNA that codes for the enzymes of the biosynthetic pathway. The synthesis of mRNA is repressed by an increase in the concentration of the amino acid (X₃) that is synthesized by the pathway. The term k₃/(k₃ + X₃) in Eq. 1 represents the attenuation by the leader peptide of the operon in the presence of Trp. The term $(1 + X_{3}) / \{1 + [1 + k_{1} X_{3} / (k_{2} + X_{3})] X_{3}\}$ is a normalized function of the effect that Trp has on the repressor protein and on the repressor binding to the operator, assuming rapid equilibrium of both reactions. The decay of the mRNA molecules is a first-order process. The mRNA molecules are templates for the synthesis of the enzymes (X₂) in the biosynthetic pathway. Trp (X₃) synthesis is an enzymatic process whose rate is described by $k_{5} X_{2} / (k_{5} + X_{3}^{2})$. The free Trp pool can be depleted by dilution (k₆X₃), binding to the repressor $[X_{3} / (1 + X_{3})] [k_{6} X_{3} / (k_{7} + X_{3})]$ or usage in protein synthesis $k_{8} X_{3} / (k_{9} + X_{3})$ (for further details on the form of Eqs 1 to 3, see Xiu et al., 2002).

We have modified this model to include the possibility of an exogenous supply for Trp (k₁₁). The equation that accounts for the time-dependent behaviour of the enzyme concentration (X₂) is not explicitly dependent on the Trp concentration (X₃) in the original model. This is a justified approximation because the number of Trp residues is very low in the enzymes that catalyse Trp biosynthesis. Now imagine an organism that is identical to the first, except that the number of Trp residues in the Trp-biosynthetic enzymes is large. In this situation, the influence of Trp concentration on the rate of synthesis of the Trp-biosynthetic enzymes needs to be made explicit. To do this, we modify the rate of enzyme production in Eq. 2 to include a Henri-Michaelis-Menten dependence on the concentration of the cognate amino acid. The new equation that describes the time-dependent behaviour of the enzyme level (X₂) is now

$$\frac{d X_{2}}{d t} = \frac{k_{10} X_{1} X_{3}}{(k_{M} + X_{3})} - k_{1} X_{2} \tag{4}$$

When k_M = 0, Eq. 4 is the same as Eq. 2. The larger the k_M, the larger the relative amount of cognate amino acid in the biosynthetic enzymes.
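The modified model (Eqs 1, 3 and 4) can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the parameter values in `PLACEHOLDER_PARAMS` are arbitrary placeholders chosen only so that the code runs, not the values of Xiu et al. (2002), and the function names and the fourth-order Runge-Kutta integrator are choices made for this illustration.

```python
# Placeholder parameters (all set to 1.0); NOT the values from Xiu et al. (2002).
PLACEHOLDER_PARAMS = {name: 1.0 for name in
                      ("k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9", "k10")}


def derivatives(state, k, k11, kM):
    """Right-hand sides of Eqs 1, 4 and 3 for state = (X1, X2, X3)."""
    x1, x2, x3 = state
    repression = (1 + x3) / (1 + (1 + k["k1"] * x3 / (k["k2"] + x3)) * x3)
    attenuation = k["k3"] / (k["k3"] + x3)
    # With kM = 0 the Trp dependence drops out and Eq. 4 reduces to Eq. 2.
    trp_factor = 1.0 if kM == 0 else x3 / (kM + x3)
    dx1 = repression * attenuation - k["k4"] * x1                    # Eq. 1
    dx2 = k["k10"] * x1 * trp_factor - k["k1"] * x2                  # Eq. 4
    dx3 = (k11 + k["k5"] * x2 / (k["k5"] + x3 ** 2) - k["k6"] * x3
           - (x3 / (1 + x3)) * (k["k6"] * x3 / (k["k7"] + x3))
           - k["k8"] * x3 / (k["k9"] + x3))                          # Eq. 3
    return (dx1, dx2, dx3)


def rk4_step(state, dt, k, k11, kM):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(s, d, h):
        return tuple(a + h * b for a, b in zip(s, d))
    d1 = derivatives(state, k, k11, kM)
    d2 = derivatives(shifted(state, d1, dt / 2), k, k11, kM)
    d3 = derivatives(shifted(state, d2, dt / 2), k, k11, kM)
    d4 = derivatives(shifted(state, d3, dt), k, k11, kM)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, d1, d2, d3, d4))


def simulate(state, k, k11, kM, dt=0.01, steps=1000):
    """Integrate from `state` for `steps` steps and return the final state."""
    for _ in range(steps):
        state = rk4_step(state, dt, k, k11, kM)
    return state


# Example shift to a Trp-free medium (k11 = 0) from an arbitrary starting state.
final_state = simulate((0.5, 0.5, 0.5), PLACEHOLDER_PARAMS, k11=0.0, kM=10.0)
```

At any fixed state with positive X₁ and X₃, increasing `kM` lowers the rate of enzyme synthesis in Eq. 4 while leaving Eqs 1 and 3 unchanged. Reproducing published time-courses would require the actual parameter values and initial steady state.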

We use Eqs 1, 3 and 4 and parameter values from Xiu et al. (2002) to simulate the following experiment involving biosynthetic enzymes with increasing levels of cognate amino acid in their composition. Let bacteria grow exponentially in a Trp-rich medium until a steady state has been achieved and then, at time zero, switch them to various media, with different lower amounts of Trp. This will lead to the derepression of the Trp-biosynthetic enzymes. The results of such an experiment are shown for three different shifts (Fig. 2). The lower the Trp levels in the poor medium, the higher the derepressed enzyme levels. For increasing relative composition of Trp in the biosynthetic enzymes, the organism will take longer to produce a similar amount of enzyme and thus to set up an appropriate response to the challenge of amino acid depletion. Furthermore, as the relative amount of amino acid in the biosynthetic enzymes increases, the amino acid levels in the new steady state decrease. For a large depletion of amino acid in the medium, biosynthetic enzymes with high relative amino acid composition exhibit an initial bout of synthesis, but then fail to be synthesized at a steady state rate (compare among panels A, C and E of Fig. 2) sufficient to produce acceptable amino acid levels (compare among panels B, D and F of Fig. 2).

Fig. 2 — Time-course of derepression and amino acid recovery for the computer model of the tryptophan biosynthetic pathway from *E. coli*. Cells growing in a medium with excess Trp (k₁₁ = 1) are switched to media containing various lower amounts of Trp (k₁₁ < 1): A and B (k₁₁ = 0.5), C and D (k₁₁ = 0.41), E and F (k₁₁ = 0, which corresponds to no Trp). Expression of the *trp* operon undergoes derepression and is allowed to reach a new steady state. The upper curve in each panel corresponds to the case in which the rate of enzyme synthesis is independent of Trp concentration (k_M = 0), and the curves then decrease in the order of increasing dependence on Trp concentration (k_M = 0, 0.1, 1, 10, 20, 50, 100, 500). The steady-state concentrations of Trp and of the Trp-biosynthetic enzymes in Trp-reduced media decrease with increasing k_M.

A, C and E. Dimensionless time-course for protein levels, which are normalized with respect to the maximum derepressed steady-state value in (E).

B, D and F. Dimensionless time-course for intracellular amino acid levels, which are normalized with respect to the same initial value.

Note that the y-axis changes scale progressively from panel to panel in order to show differences while accommodating the increasing degrees of derepression. See text for further discussion.

Thus, in nature, with highly variable environmentally supplied Trp levels, the cells with a higher Trp content in their Trp-biosynthetic enzymes would be out-competed by those with a lower Trp content. The regulatory loops in the biosynthesis of other amino acids are similar to the one we have analysed, which suggests that composition effects on temporal responses are common phenomena.

Correlation between cognate bias and molecular activity

The specific activity of an enzyme is determined by the product of its molecular activity and the number of enzyme molecules. For a given specific activity, those enzymes with the lowest molecular activity require the largest number of molecules and their synthesis consumes the largest amount of the cognate amino acid. During the critical phase of derepression, the most rate-determining enzymes in amino acid biosynthetic pathways will be under strong selection to minimize the content of their cognate amino acid. Hence, we expect the cognate bias of these enzymes to be directly correlated with their molecular activity.

Estimates of the specific activity for most enzymes involved in amino acid biosynthetic pathways can be found either in the primary literature or in the BRENDA database. Specific activity is determined by the amount of reaction catalysed during each time unit by a fixed weight of enzyme (usually having units of μM min^-1 mg^-1). We estimate the molecular weight of the enzymes by adding the weight of their amino acid residues and subtracting the weight of one water molecule per peptide bond. Using this information, together with the specific activity, we can calculate a molecular activity for each of the enzymes. However, one needs to keep in mind that the purification methods and conditions under which the specific activities have been determined for the different enzymes are not the same. Thus, it is likely that some errors are introduced in the calculations. The numbers for both the measured specific activity and the calculated molecular activity of the enzymes graphically represented in Fig. 3 are shown in Table 2.
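The two calculations described above can be sketched as follows. The sketch assumes that specific activity is expressed in micromoles of reactant per minute per milligram of enzyme and that the masses are standard average molecular weights of the free amino acids; the function names and the 49 200 g/mol example enzyme are hypothetical and serve only to illustrate the unit conversion.

```python
# Average molecular weights (g/mol) of the free amino acids, one-letter codes.
FREE_AA_MASS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER_MASS = 18.015  # g/mol, lost once per peptide bond


def molecular_weight(sequence):
    """Sum the amino acid masses and subtract one water per peptide bond."""
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    return total - WATER_MASS * (len(sequence) - 1)


def molecular_activity(specific_activity, mol_weight):
    """Convert umol min^-1 mg^-1 into mol reactant s^-1 per mol enzyme.

    1 mg of enzyme contains 1000 / mol_weight umol of enzyme, so the
    turnover per minute is specific_activity * mol_weight / 1000.
    """
    per_minute = specific_activity * mol_weight / 1000.0
    return per_minute / 60.0


# Hypothetical enzyme of 49 200 g/mol with a specific activity of 133.
turnover = molecular_activity(133.0, 49200.0)
```

For this assumed molecular weight the result is roughly 109 s^-1, which shows how a specific activity in the hundreds maps onto a molecular activity of the same order for a protein of about 50 kDa.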

Fig. 3 — Schematic representation of amino acid biosynthetic pathways. Names for each enzyme, represented here by their EC number, and the corresponding gene are given in Table 2.

Table 2.

Enzymes of Escherichia coli, Salmonella typhimurium and Bacillus subtilis that are involved in the biosynthetic pathway for each amino acid.

		Gene			Specific activity (μM min^-1 mg^-1)			Molecular activity (Mol reactant s^-1 Mol enzyme^-1)
Amino Acid	EC number	E. coli	S. typhimurium	B. subtilis	E. coli	S. typhimurium	B. subtilis	E. coli	S. typhimurium	B. subtilis
Alanine	2.6.1.66	avtA	avtA	–	0.0196	0.0196	–	0.0151	0.0152	–
	2.6.1.42	ilvE	ilvE	ywaA	15.9	15.9	15.9	9.05	9.02	10.7
	5.1.1.1	alr	alr	alr, yncD	133	910	143	–	0.130	103
Arginine	2.3.1.1	argA	argA	argA	133	133	133	109	109	96.1
	2.7.2.8	argB	argB	argB	0.540	0.540	0.540	0.244	0.243	0.249
	1.2.1.38	argC	argC	argC	0.950	0.950	0.950	0.569	0.569	0.603
	2.6.1.11	argD	argD	argD	11.2	11.2	11.2	8.17	8.15	7.63
	3.5.1.16	argE	argE	amhX	800	800	800	564	562	567
	2.1.3.3	argF	–	argF	2900	–	265	1780	–	153
	2.1.3.3	argI	argI	–	2900	1900	–	1780	1160	–
	6.3.4.5	argG	argG	argG	12.8	12.8	4.54	10.6	11.1	3.39
	4.3.2.1	argH	argH	argH	0.380	0.380	0.380	0.319	0.320	0.329
Asparagine	6.3.5.4	asnB	asnB	asnB, asnH, asnO	0.300	0.300	0.300	0.313	0.313	0.353
	6.3.1.1	asnA	asnA	–	57.3	57.3	–	35.0	35.1	–
	3.5.1.1	iaaA, ybik	iaaA, ybik	ansA	–	–	32.0	–	–	19.4
Aspartate	2.6.1.1	aspC	aspC	aspB	232	232	220	168	168	158
	4.3.1.1	aspA	aspA	ansB	167	167	320	150	145	19.4
	6.3.5.4	–	–	asnO, asnH, asnB	–	–	0.300	–	–	0.353
	3.5.1.1	–	–	ansA	–	–	32.0	–	–	19.4
Cysteine	2.5.1.47	cysK; cysM	cysK; cysM	cysK, yrhA, ytkP	6.30	1100	25.5	3.62	398	13.9
	2.3.1.30	cysE	cysE	cysE	71.6	397	71.6	35.0	194	28.8
Glutamine	6.3.1.2	glnA	glnA	glnA	153	231	153	132	187	128
	6.3.5.5	carAB	carAB	–	4.16	4.16	–	1.20	0.890	–
	3.5.1.12	–	–	ylaM, ybgJ	–	–	716	–	–	406
Glutamate	1.4.1.13	gltBD	gltBD	gltAB	18.6	18.6	23.0	72.8	8.32	12.4
	1.4.1.4	gdhA	gdhA	–	250	231	–	202	187	–
	1.4.1.2	–	–	rocG	–	–	80.0	–	–	62.2
Glycine	2.1.2.1	glyA	glyA	glyA	13.6	13.6	13.6	10.3	10.3	10.3
Histidine	2.4.2.17	hisG	hisG	hisG	544	544	544	302	301	214
	3.6.1.31-3.5.4.19	hisI	hisI	hisI	0.00300	0.00300	332	0.00114	0.00113	132
	5.3.1.16	hisA	hisA	hisA	7.80	7.80	7.80	3.40	3.39	3.45
	2.4.2.-	hisHF	hisHF	hisFH	0.900	0.900	0.900	0.325	0.0523	0.0506
	4.2.1.19-3.1.3.15	hisB	hisB	hisB	5.70	202000	0.310	3.84	135000	0.111
	2.6.1.9	hisC	hisC	hisC	1890	1890	325	1240	1250	217
	3.1.3.15	–	–	hisJ	–	–	427	–	–	217
	1.1.1.23	hisD	hisD	hisD	15.3	14.3	3.60	11.8	10.9	2.77
Isoleucine	1.2.4.1	–	–	acoAB, pdhAB	–	–	0.120	–	–	0.00567
	4.3.1.19	ilvA	ilvA	–	230	683	–	215	640	–
	2.2.1.6	ilvGM; ilvBN; ilvIH	ilvGM; ilvBN; ilvIH	ilvBN	4000	4000	4000	500	456	812
	1.1.1.86	ilvC	ilvC	ilvC	1.90	1.91	1.91	1.71	1.72	1.19
	4.2.1.9	ilvD	ilvD	ilvD	63.0	63.0	63.0	67.6	35.7	62.5
	2.6.1.42	ilvE	ilvE	ilvE	27.3	27.3	27.3	15.5	4.34	18.3
Leucine	2.3.3.13	leuA	leuA	leuA	14.5	14.7	7.10	13.8	14.1	6.73
	4.2.1.33	leuCD	leuCD	leuCD	0.072	0.0720	6.18	0.00800	0.00527	0.985
	1.1.1.85	leuB	leuB	leuB	0.085	35.5	10.5	0.0561	35.5	6.99
	2.6.1.42	ilvE	ilvE	ybgE	57.0	53.4	27.3	32.5	53.4	18.0
	1.4.1.9	–	–	bcd	–	–	110	–	–	73.3
Lysine	2.7.2.4	lysC	lysC	lysC	5.69	5.69	30.0	4.60	4.61	21.8
	1.2.1.11	asd	asd	asd	145	145	73.0	96.7	97.0	46.0
	4.2.1.52	dapA	dapA	dapA	100	100	458	52.1	52.1	237
	1.3.1.26	dapB	dapB	dapB	398	398	56.4	191	191	27.7
	2.3.1.117	dapD	dapD	dapD	36.0	36.0	36.0	17.9	17.9	15.0
	2.6.1.17	dapC; yfdZ	dapC; yfdZ	yugH	5.33	5.33	5.33	4.10	4.10	3.76
	3.5.1.18	dapE	dapE	ytjP	3.33	3.33	3.33	2.31	2.31	1.99
	5.1.1.7	dapF	dapF	dapF	18.7	18.7	18.7	9.45	9.49	9.62
	4.1.1.20	lysA	lysA	lysA	7.50	7.50	28.3	5.77	5.75	22.9
Methionine	2.3.1.46	metA	metA	metB	–	–	–	–	–	–
	2.5.1.48	metB	metB	yjcI	10.0	18.2	10.0	6.92	12.6	6.95
	4.4.1.8	metC	metC	yjcJ	248	7.90	7.90	179	5.64	5.59
	2.1.1.13	metH	metH	–	9.30	9.30	–	21.1	21.6	–
	2.1.1.14	metE	metE	metC	2.50	2.50	0.240	3.53	3.53	0.347
	2.1.1.10	–	–	ybgG	–	–	1.37	–	–	0.793
Phenylalanine	5.4.99.5-4.2.1.51	pheA	pheA	pheA, aroA, aroH	52.0	32.0	17.5	37.3	22.8	4.23
	2.6.1.57	tyrB	tyrB	–	170	170	–	123	123	–
	2.6.1.9	–	–	aroJ	–	–	323	–	–	216
Proline	2.7.2.11	proB	proB	proJ, proB	12.7	12.7	12.7	8.25	8.27	8.52
	1.2.1.41	proA	proA	proA	28.2	28.2	28.2	21.0	21.0	21.3
	1.5.1.2	proC	proC	proGH	2550	2550	280	1200	1190	141
Serine	1.1.1.95	serA	serA	serA	9.70	9.70	15.6	7.14	7.11	14.8
	2.6.1.52	serC	serC	serC	15.0	15.0	15.0	9.94	9.95	10.0
	3.1.3.3	serB	serB	rsbP	3.00	3.00	3.00	1.75	1.75	2.300
Threonine	2.7.2.4-1.1.1.3	thrA	thrA	lysC, dapG, yclM	5.69	5.69	30.0	8.45	8.41	21.5
	1.2.1.11	asd	asd	asd	145	145	0.0400	97.0	97.0	0.0252
	1.1.1.3	–	–	hom	–	–	51.0	–	–	40.3
	2.7.1.39	thrB	thrB	thrB	3.10	3.10	3.10	1.74	1.72	1.72
	4.2.3.1	thrC	thrC	thrC	7.70	7.70	8.80	6.04	6.03	5.49
Tryptophan	4.1.3.27	trpDE	trpDE	trpE	2.60	0.380	0.380	1.30	0.0457	0.368
	2.4.2.18	trpD	trpD	trpD	177	–	1.58	2.46	–	0.948
	5.3.1.24-4.1.1.48	trpC	trpC	trpF	8.78	2.30	8.78	7.24	1.89	3.52
	4.1.1.48	–	–	trpC	–	–	3.70	–	–	1.72
	4.2.1.20	trpAB	trpAB	trpAB	125	125	2.80	20.0	39.7	1.07
Tyrosine	5.4.99.5-1.3.1.12	tyrA	tyrA	aroA, aroH, pheA	52.0	52.0	17.5	36.4	36.3	4.23
	2.6.1.57	tyrB	tyrB	aroJ	70.2	70.2	323	50.9	50.9	216
Valine	1.2.4.1	–	–	acoAB, pdhAB	–	–	0.120	–	–	0.00567
	4.3.1.19	ilvA	ilvA	–	230	683	–	215	640	–
	2.2.1.6	ilvGM; ilvBN; ilvIH	ilvGM; ilvBN; ilvIH	ilvBN	4000	4000	4000	500	456	812
	1.1.1.86	ilvC	ilvC	ilvC	1.90	1.91	1.91	1.71	1.72	1.19
	4.2.1.9	ilvD	ilvD	ilvD	63.0	63.0	63.0	67.6	35.7	62.5
	2.6.1.42	ilvE	ilvE	ilvE	27.3	27.3	27.3	15.5	4.34	18.3
	2.6.1.66	avtA	avtA	–	0.0196	0.0196	–	0.0151	0.0152	–


For each amino acid, italicized type indicates the gene for the enzyme in its classical biosynthetic pathway. Bold italicized type indicates an auxiliary enzyme that has been observed to replace an enzyme in the classical pathway, at least in vitro, but that is not usually considered part of the biosynthetic pathway. We were unable to find appropriate estimates for the activity of enzyme 2.3.1.46 in methionine biosynthesis.

As indicated above, we expect the most rate-determining enzymes in the amino acid biosynthetic pathways to have a cognate bias, given by a P-value, that is positively correlated with molecular activity. That is, the lower the molecular activity, the lower the cognate bias (P-value). The data in Table 3 show a statistically significant positive correlation between cognate bias and molecular activity for the amino acid biosynthetic enzymes from each of the three organisms. This supports the cognate bias hypothesis and suggests that fast recovery of amino acid pools is a significant pressure in determining the cognate amino acid composition of biosynthetic enzymes. The strength and significance of this pressure are greater for the enteric bacteria than for B. subtilis.

Table 3.

Spearman rank correlation coefficient (r) between cognate amino acid bias of the amino acid biosynthetic enzymes and their molecular activity.

	E. coli	S. typhimurium	B. subtilis	Overall
	r (n = 79)^d,^e	r (n = 79)^d,^e	r (n = 89)^d,^e	r (n = 247)^d,^e
Molecular activity^a	0.284 (P < 0.0033)	0.301 (P < 0.0030)	0.186 (P < 0.010)	0.230 (P < 5 × 10^-7)
Protein cost^b	-0.173 (P < 0.05)	-0.104 (P < 0.15)	0.147 (P < 0.04)	-0.0819 (P < 0.05)
GC content^c	0.216 (P < 0.02)	0.154 (P < 0.07)	0.157 (P < 0.04)	0.164 (P < 0.01)
Codon bias^c	-0.201 (P < 0.03)	0.0975 (P < 0.18)	0.0580 (P < 0.27)	-0.0322 (P < 0.24)


^a Calculated as described in Experimental procedures.

^b Calculated by adding the costs for synthesizing each of the amino acid residues that constitute the enzymes of a given amino acid biosynthetic pathway.

^c Calculated for the genes that encode the enzymes of a given amino acid biosynthetic pathway.

^d n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.

^e P represents the probability that the correlation is non-significant (see Experimental procedures).

Also shown for comparison are correlations between cognate amino acid bias of the amino acid biosynthetic enzymes and three other factors that could potentially influence the amino acid composition of the same biosynthetic enzymes. These three factors have been previously identified as influencing amino acid composition of the total proteome.

As controls, we have determined correlations between cognate amino acid bias and three other factors that might influence amino acid composition of amino acid biosynthetic enzymes. As shown in Table 3, these correlations are much less significant. We also have determined the correlation between bias of each amino acid (non-cognate as well as cognate) and molecular activity for each of the proteins involved in amino acid biosynthesis (Table 4). The correlations are in general low and non-significant, which shows that the significant correlations are specific for the cognate amino acid.

Table 4.

Significance of the Spearman rank correlation coefficient (r) between amino acid bias and molecular activity of enzymes involved in amino acid biosynthesis.^a

	E. coli	S. typhimurium	B. subtilis	Overall
	r (n = 79)^b,^c	r (n = 79)^b,^c	r (n = 89)^b,^c	r (n = 247)^b,^c
Ala	-0.0299 (P < 0.38)	-0.0390 (P < 0.35)	0.118 (P < 0.084)	-0.0489 (P < 0.16)
Arg	-0.00708 (P < 0.47)	-0.0599 (P < 0.28)	0.00803 (P < 0.47)	-0.0883 (P < 0.038)
Asn	-0.0869 (P < 0.19)	-0.0708 (P < 0.24)	0.160 (P < 0.032)	-0.0121 (P < 0.40)
Asp	-0.127 (P < 0.10)	-0.157 (P < 0.058)	-0.332 (P < 4.8 × 10^-5)	-0.155 (P < 0.00045)
Cys	0.0288 (P < 0.41)	-0.0334 (P < 0.37)	0.0420 (P < 0.36)	-0.00400 (P < 0.46)
Gln	0.171 (P < 0.054)	0.0300 (P < 0.39)	-0.0316 (P < 0.36)	0.103 (P < 0.020)
Glu	-0.175 (P < 0.043)	0.0159 (P < 0.43)	-0.0641 (P < 0.22)	-0.0392 (P < 0.21)
Gly	-0.0731 (P < 0.23)	0.156 (P < 0.061)	0.0145 (P < 0.44)	0.0372 (P < 0.23)
His	-0.0178 (P < 0.41)	-0.176 (P < 0.041)	-0.0218 (P < 0.39)	-0.129 (P < 0.0046)
Ile	0.103 (P < 0.17)	0.108 (P < 0.14)	0.241 (P < 0.0024)	0.0827 (P < 0.049)
Leu	-0.0469 (P < 0.31)	-0.107 (P < 0.14)	-0.106 (P < 0.11)	-0.0806 (P < 0.052)
Lys	-0.0802 (P < 0.21)	0.0549 (P < 0.30)	-0.0912 (P < 0.14)	0.0337 (P < 0.25)
Met	0.259 (P < 0.0067)	0.289 (P < 0.0022)	-0.0801 (P < 0.17)	0.234 (P < 1.3 × 10^-6)
Phe	-0.00113 (P < 0.13)	-0.168 (P < 0.048)	-0.231 (P < 0.0033)	0.102 (P < 0.020)
Pro	-0.134 (P < 0.091)	-0.0370 (P < 0.35)	0.154 (P < 0.036)	-0.0730 (P < 0.069)
Ser	0.130 (P < 0.1)	0.197 (P < 0.025)	0.0592 (P < 0.25)	0.128 (P < 0.0050)
Thr	0.183 (P < 0.043)	0.131 (P < 0.10)	-0.0357 (P < 0.34)	0.0667 (P < 0.093)
Trp	-0.107 (P < 0.11)	-0.176 (P < 0.028)	-0.174 (P < 0.0033)	-0.111 (P < 0.0066)
Tyr	-0.176 (P < 0.042)	-0.131 (P < 0.094)	0.130 (P < 0.064)	-0.0598 (P < 0.10)
Val	0.179 (P < 0.045)	0.0552 (P < 0.29)	-0.0177 (P < 0.42)	0.0755 (P < 0.065)
CB^d	0.284 (P < 0.0033)	0.301 (P < 0.0030)	0.186 (P < 0.01)	0.230 (P < 5 × 10^-7)

^a Values in bold are those that are significant to a level higher than 99% and r > 0.20.

^b n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.

^c P represents the probability that the correlation is non-significant (see Experimental procedures).

^d CB indicates cognate amino acid bias.

Methionine provides a notable exception to this pattern by exhibiting a statistically significant positive correlation between amino acid bias and molecular activity for all the amino acid biosynthetic enzymes. As the first amino acid of a protein is always Met, if enzymes contain fewer internal Met residues, then the additional free Met residues can be used to synthesize additional peptide chains and thus boost the velocity of the process in which the enzymes are involved. The selection for this effect should be stronger when the molecular activity of an enzyme is lower and the required rate of enzyme synthesis is correspondingly higher. This correlation should extend to all cellular enzymes, not just amino acid biosynthetic enzymes. However, testing this hypothesis is currently not feasible because there is not enough information regarding the specific activity for all the enzymes in a given organism.

Correlation between cognate bias and pathway length

The longer an amino acid biosynthetic pathway (that is, the larger its number of protein chains), the stronger the selection for a low cognate bias should be, because a larger number of proteins needs to be synthesized to obtain one full pathway set. According to this expectation, there should be an inverse correlation between cognate bias and the number of peptide chains that make up the pathway complement. This correlation indeed exists, as shown in Table 5.

Table 5.

Correlation between number of proteins that make up the enzymes of each amino acid biosynthetic pathway and their minimum amino acid bias.

	E. coli	S. typhimurium	B. subtilis
Minimum bias	r (n = 20)^a,^b	r (n = 20)^a,^b	r (n = 20)^a,^b
Cognate	-0.49 (P < 0.003)	-0.72 (P < 0.004)	-0.23 (P < 0.21)
Non-cognate	-0.12 (P < 0.18)	-0.22 (P < 0.14)	-0.06 (P < 0.39)
Overall	-0.13 (P < 0.16)	-0.22 (P < 0.14)	-0.04 (P < 0.44)

^a n represents the number of enzyme activities used to calculate the correlations. It can be determined from Table 2.

^b P represents the probability that the correlation is non-significant (see Experimental procedures).

As a control, we have also examined the correlation between the minimum non-cognate bias and the number of protein chains of the pathway. Table 5 shows that the correlation between minimum non-cognate bias and pathway length is negative, but that it is weaker and less significant than in the case of the cognate bias.

Profiles of cognate bias in amino acid biosynthetic pathways

Figure 4B shows the profile of cognate bias for the group of E. coli proteins involved in the synthesis of a given amino acid. With the exception of Glu, Gly and Tyr, the group of biosynthetic proteins responsible for the synthesis of a given amino acid has a cognate bias that is below the 50th percentile of the control group.

Fig. 4 — The compositional bias of biosynthetic pathways with respect to their cognate amino acid.

As expected both from the earlier analysis and from the high degree of homology between many of the E. coli and S. typhimurium proteins, the profile of cognate bias for S. typhimurium is similar (Fig. 4A). However, in S. typhimurium the cognate bias of the Asp, Asn and Val biosynthetic pathways is also above the 50th percentile. Of the biosynthetic pathways that have a cognate bias below the 50th percentile, the average bias is somewhat smaller for S. typhimurium than for E. coli, again as one would have predicted from the fact that there is a stronger correlation between cognate bias and molecular activity for the enzymes of S. typhimurium (Table 3).

Although the general conclusions are similar, the detailed profile of cognate bias for B. subtilis differs considerably from that of the two enteric bacteria. More than half of the amino acid biosynthetic pathways have a cognate bias above the 50th percentile (Fig. 4C). These include the pathways for Ala, His, Phe, Ser and Thr, which are in addition to the pathways with cognate bias above the 50th percentile in E. coli and S. typhimurium.

Profiles of cognate bias in individual amino acid biosynthetic enzymes

Although the cognate bias of individual biosynthetic pathways is above the 50th percentile for some of the amino acids, this does not necessarily mean that selection for a low bias is too weak to be observed for these amino acids. It may be an indication that a subset of the enzymes in the pathway has a dominant effect on the rate of synthesis for the cognate amino acid and is critical for recovery of amino acid levels during derepression. Only this subset of enzymes would then be sufficiently sensitive to selection for a low cognate bias to be observed. To further analyse this implication of the cognate bias hypothesis, we examine the cognate bias of individual biosynthetic enzymes within each pathway for each of the three organisms.

Figure 5 shows the cognate bias for the individual enzyme that has the lowest value within each amino acid biosynthetic pathway. One sees immediately for each of the organisms that approximately 75% of the amino acid pathways exhibit at least one enzyme that has a low cognate bias. In these pathways, the enzyme with the lowest cognate bias has either the lowest or the second lowest molecular activity in the pathway. Whenever genetic and biochemical data are available, one finds this enzyme to be regulated both by end-product inhibition at the level of enzyme activity and by repression at the level of gene expression (shown in Table 6 for the E. coli pathways). This type of regulation is an additional indication that the enzyme has a dominant influence over the flux through the pathway, particularly during the critical early phase of recovery.

Fig. 5 — Enzyme with the lowest value for cognate bias in the biosynthetic pathway of each amino acid. The order of the pathways is the same as that in Fig. 4. The x-axis shows the name of the gene coding for the enzyme. The numbers indicate the probability that this lowest bias occurs in a set of proteins, containing the same number of proteins as the pathway, and drawn randomly from the proteome of each organism.
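The probability quoted in the Fig. 5 caption can be approximated with a short calculation. The sketch below is an illustration only and not necessarily the procedure used to produce the figure: it assumes that the k proteins are drawn independently and with replacement from the proteome, and the function name `prob_min_bias_at_most` and the example numbers are hypothetical.

```python
def prob_min_bias_at_most(quantile, k):
    """Probability that the lowest bias among k randomly drawn proteins
    falls at or below a given proteome quantile.

    Assumption: draws are independent (sampling with replacement), which
    is a reasonable approximation when the proteome is much larger than k.
    """
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # All k draws must exceed the quantile for the minimum to exceed it.
    return 1.0 - (1.0 - quantile) ** k


# Hypothetical example: a five-enzyme pathway whose lowest-bias enzyme
# sits at the 10th percentile of the proteome.
print(round(prob_min_bias_at_most(0.10, 5), 3))
```

Under this assumption, longer pathways are more likely to contain a low-bias enzyme by chance alone, which is why the pathway size enters the calculation.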

Table 6.

Correlation between low cognate amino acid bias and regulation of gene expression and enzyme activity.^a

Biosynthetic pathway	Enzymes with low cognate amino acid bias (below 10% quantile)	Enzymes repressed by cognate amino acid addition	Enzymes derepressed upon cognate amino acid depletion	End-product-inhibited enzymes
Ala	AvtA	AvtA	–	–
Arg	ArgBCFI	ArgABCDEFGHI	ArgECBH	ArgA, ArgB
Asn	AsnA, AsnB	AsnA	–	AsnA, AsnB
Asp	–	–	–	–
Cys	CysK-Z, CysM	CysK-Z, CysM, CysE	CysK-Z, CysM, CysE	CysK-Z, CysM, CysE
Gln	GltB, GltD	GltB, GltD	GltD, GltB	GltD, GltB
Glu	–	GdhA	GdhA	GdhA
Gly	–	GlyA	GlyA	–
His	HisC, HisD	All	All	HisC, HisG
Ile	IlvGA	IlvGEDA	IlvGEDA	IlvA
Leu	IlvE, LeuABCD	LeuABCD	LeuABCD	LeuA
Lys	LysC, DapE, DapF, LysA	LysC, DapE, LysA	LysC, DapA, LysA	LysC, DapA
Met	MetA, MetE, MetB	MetA, MetH, MetE, MetB, MetL	MetA, MetH, MetE	MetA
Phe	PheA slightly above 10% quantile	PheA	PheA	PheA
Pro	ProA, ProB	ProB	ProC	ProA
Ser	SerB	–	–	SerA
Thr	ThrAB	ThrAB	ThrABC	ThrAB
Trp	TrpABCDE	TrpABCDE	TrpABD	TrpED
Tyr	–	TyrA	TyrA	TyrA
Val	AvtA	AvtA, IlvG	–	IlvIHGM

^a This information has been compiled from Herrmann and Somerville (1983), Neidhardt (1999), Khodursky et al. (2000) and references therein.

This more detailed analysis further supports the notion that selection for low cognate bias in enzymes within a given amino acid biosynthesis pathway is strong enough to influence the relative composition of biosynthetic enzymes. Nevertheless, for each organism there are some pathways in which the lowest cognate bias is greater than 0.1. These give rise to profile features that are fairly similar for the enteric bacteria E. coli and S. typhimurium, but different from that for the more distantly related bacterium B. subtilis.

For E. coli and S. typhimurium there are three amino acid biosynthetic pathways (Asp, Phe and Tyr) in which selection for low cognate bias appears to be masked by functional requirements.

Asp biosynthesis

In E. coli, AspC is the homodimeric protein that is traditionally thought to catalyse Asp biosynthesis from Glu and oxaloacetate, while AspA is the homotetrameric enzyme thought by many to produce fumarate and NH₃ from the catabolism of Asp. However, genetic and biochemical evidence suggests that AspA is likely to catalyse the reverse reaction (i.e. Fumarate + NH₃ → Asp) under normal conditions in vivo. In industrial processes, purified AspA, or E. coli cells overexpressing the enzyme, are used to produce Asp (for a discussion, see Herrmann and Somerville, 1983; Neidhardt, 1999 and references therein). The cognate bias of both enzymes is higher than the 10th percentile.

The protein AspC has a cognate bias that is around the 50th percentile. The analysis of the dimeric structure of AspC shows that three Asp residues are involved in the interface contact surface of the monomers. These three residues are also conserved (data not shown) in all AspC bacterial homologues from the SWISSPROT database (Boeckmann et al., 2003), suggesting an important functional role. If these three residues are discounted, the cognate bias of Asp drops to the 27th percentile. This is still above the 0.1 significance level, but is nevertheless a lower cognate bias.¹

Surprisingly, the protein AspA has a lower cognate bias (approximately 12th percentile) than AspC, although it is still above the 10-percentile threshold. This protein is active as a tetramer in which several of the Asp residues are involved in forming potential salt bridges between the monomers (Fig. S1). If one discounts these residues, which are selected for functional reasons, then the cognate bias of the enzyme is well below the first percentile. Thus, in this case it appears that functional considerations are responsible for masking the selection for low cognate bias.

Phe and Tyr biosynthesis

Although there is some variation regarding the lowest cognate bias in the Tyr and Phe biosynthetic pathways of E. coli and S. typhimurium, they have in common two qualitative features of interest. First, it is the initial enzyme in each pathway that has the lowest cognate bias. These enzymes are repressed at the level of gene expression and end-product inhibited at the level of enzyme activity (Neidhardt, 1999). As noted above, this is an indication that they catalyse a rate-determining step in the corresponding biosynthetic pathway. The correlation between the properties of the regulatory enzyme in the relevant pathway and its low cognate bias suggests that selection for low cognate bias might still be present but masked for functional reasons.

The second feature shared by both pathways is that a single gene, tyrB, encodes both the second and last enzymes in each pathway. Although the relevance of this second feature for our argument is less obvious, as shown below there is reason to believe that selection for low cognate bias is operating on these enzymes.

Table 1 shows that Phe and Tyr are among the least abundant amino acids in proteins, at approximately 4% and 3% respectively. However, their particular chemical properties, due to their side-chain aromatic ring, make them especially important in active centres and in substrate interaction sites (Pedersen and Finazzi-Agro, 1993; Frey, 2001; Rogers and Dooley, 2003). The number of residues that comprise an active centre is usually small compared with the total number of residues in the protein. A conservative estimate would suggest that fewer than 10 residues would account for most active sites. As the smaller active enzymes have between 100 and 150 residues, this would predict that up to approximately 10% of the residues in a protein are involved in the formation of an active centre. If proteins include a low percentage of Phe or Tyr residues, and if Phe or Tyr residues are necessary in the active centre, then these features will provide a strong selection for a high compositional bias for these amino acids in any protein.

A more detailed analysis of the enzymes in the Phe and Tyr biosynthetic pathways indicates selection for low cognate bias when the functional role of the amino acid in active centres is factored out. The gene that encodes the enzyme catalysing the second step in each pathway is tyrB. The protein product has 397 amino acids, of which 17 are Phe residues (4.3%) and 15 are Tyr residues (3.8%). A crystal structure for this protein has been deposited by Ko et al. in the protein databank (PDB code 3TAT), although a paper analysing the structure has not been published yet. This protein is a dimer and our own analysis shows that at least five Phe residues and six Tyr residues are involved in the active centre and in the interaction between monomers respectively (Fig. S1). Furthermore, these residues are conserved in homologous proteins from other organisms (data not shown). If we discount these residues, then the cognate bias of tyrB is below the 10th percentile for both the Phe pathway and the Tyr pathway.²
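The effect of discounting functionally required residues can be illustrated with the counts quoted above for the tyrB product. This is a rough sketch, not the authors' procedure: it assumes that discounted residues are removed from both the cognate count and the chain length (the text does not state which convention was used), and the helper name `cognate_fraction` is hypothetical. Converting the resulting fractions into percentiles would additionally require the proteome-wide distribution, which is not reproduced here.

```python
def cognate_fraction(n_cognate, length, n_discounted=0):
    """Fraction of residues in a chain that are the cognate amino acid.

    Assumption (not stated in the text): discounted residues are removed
    from both the cognate count and the total chain length.
    """
    if not (0 <= n_discounted <= n_cognate <= length) or n_discounted >= length:
        raise ValueError("need 0 <= n_discounted <= n_cognate <= length")
    return (n_cognate - n_discounted) / (length - n_discounted)


# Counts quoted in the text for the tyrB product (397 residues).
phe_raw = cognate_fraction(17, 397)      # 17 Phe residues
tyr_raw = cognate_fraction(15, 397)      # 15 Tyr residues
phe_adj = cognate_fraction(17, 397, 5)   # discounting five Phe residues
tyr_adj = cognate_fraction(15, 397, 6)   # discounting six Tyr residues

print(f"Phe: {phe_raw:.1%} -> {phe_adj:.1%}")
print(f"Tyr: {tyr_raw:.1%} -> {tyr_adj:.1%}")
```

The raw fractions reproduce the 4.3% and 3.8% quoted in the text; the adjusted fractions are what would then be compared against the proteome distribution.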

The functional analysis of the first enzyme in each pathway cannot be accomplished as easily. PheA, the first enzyme in the Phe biosynthetic pathway, is a 386-amino-acid-residue, bifunctional protein. Eleven of the 386 residues are Phe (2.9%). This is below the average cognate bias, but nevertheless above our 10-percentile threshold. The Protein Databank entry for this protein, file 1ECM, shows that there are no Phe residues involved in the active centre of the chorismate mutase activity of PheA. This is unlike the case of other chorismate mutases, such as that of B. subtilis. Bacterial homologues of PheA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for two of the 11 Phe residues (Fig. S2). This suggests an important functional role for these residues. If these residues are discounted, then the cognate amino acid bias falls below the sixth percentile. Furthermore, three of the other Phe residues are perfectly conserved in all but two of the proteins. This may be taken as an indication that these residues are important for the function or structure of the protein. Again, we find that the higher-than-expected cognate bias may result from the functional requirements of the protein for this specific type of amino acid residue.

We now consider the TyrA protein. This protein is composed of 373 amino acid residues, 10 of which are Tyr residues (2.7%). There is no known structure for this enzyme or for any of its homologues. Bacterial homologues of TyrA from the SWISSPROT database (Boeckmann et al., 2003) show perfect conservation for five out of the 10 Tyr residues. An additional Tyr residue is conserved in all but one of the homologues, where it is replaced by a Phe residue (Fig. S2). If these residues are discounted, the cognate amino acid bias of TyrA drops well below the 6th percentile. Furthermore, two of the additional Tyr residues are conserved in all but one of the homologous proteins.

As a control for these cases in which cognate amino acid residues are discounted, one can discount other conserved (non-cognate amino acid) residues and recalculate the compositional bias. When this is done, the compositional bias with respect to these amino acids does not drop as low as that for the cognate amino acid (data not shown).

For B. subtilis there are four amino acid biosynthetic pathways (Ala, Ser, Thr and Val) in which selection for low cognate bias appears either to be weaker or to be masked by other requirements.

Ala and Val biosynthesis

Of the 20 amino acids, the biosynthesis of Ala is probably the least well studied. It is likely that more enzymes yet to be identified could contribute to the synthesis of this amino acid. There are no available data that hint at possible explanations for the high cognate bias of Ala biosynthesis in B. subtilis. Regarding the biosynthesis of Val, although the cognate bias is not low, one finds that, for B. subtilis, the enzyme with the lowest cognate bias catalyses the first step committed to Val biosynthesis. Additionally, the second enzyme with the lowest bias is the one that catalyses the first step in the pathway common to the biosynthesis of Leu and Val. This suggests that other factors may be masking the selection for low cognate bias in these biosynthetic pathways.

Ser and Thr biosynthesis

In B. subtilis, the enzyme of the Ser biosynthetic pathway with the lowest cognate bias is encoded by the gene serA, and it is the first enzyme in the pathway. After careful comparative sequence and structure analysis with the homologue from E. coli, we could find no functional justification for the excess of Ser residues in the B. subtilis enzyme. Additional comparative sequence analysis with homologues from other Gram-positive bacteria shows that only two Ser residues are perfectly conserved. If we discount these residues and recalculate the cognate bias, this bias is still around the 40th percentile.

The enzyme of the Thr biosynthetic pathway with the lowest cognate bias is also the first enzyme in the pathway. A sequence alignment of the relevant Thr enzymes from different Gram-positive bacteria shows that there are four fully conserved Thr residues (data not shown), which implies an important functional role for these residues. When they are discounted, the cognate bias for Thr falls below the fifth percentile.

Finally, for all three organisms there are two biosynthetic pathways in which selection for low cognate bias appears to be completely overridden by some unknown mechanism that actually yields a higher-than-average cognate bias.

Gly and Glu biosynthesis

The pathways for Gly and Glu biosynthesis each involve a small number of enzymes (as few as one per pathway, depending on the organism). In the E. coli case, a careful analysis of the three-dimensional crystal structure and of the fully conserved residues between homologous proteins involved in Gly or Glu biosynthesis does not reveal any special function for the relevant amino acid. Therefore, other reasons must account for the higher cognate bias of Gly and Glu biosynthetic pathways. The biosynthetic pathways for these amino acids, which are composed of only one enzyme each, fall into the category of short pathways for which there is less intense selection for low cognate bias. However, this factor alone would not account for their higher-than-average cognate bias.

Correlation between bias in biosynthetic enzymes and environmental abundance of the cognate amino acid

Selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism’s environment. Although the environments of E. coli, S. typhimurium and B. subtilis are complex, heterogeneous and difficult to characterize, there are data (Table 1) that suggest at least a relative ranking for the abundance of the amino acids in the human colon (a principal habitat of E. coli and S. typhimurium) and in soil (a principal habitat of B. subtilis). In attempting to compare the abundance of a given amino acid in these two environments, it is problematic if its relative abundance is the same in the two environments, either high or low, because these qualitative assessments do not deal with the absolute concentrations. Either environment could have an abundance that is either higher or lower than the other. This is less of a problem if qualitative comparisons are made in cases where the abundance is qualitatively different between environments, for example, high in one environment and low in the other. The qualitative result of such a comparison is likely to be valid, even if there is uncertainty in the absolute concentrations.

To perform such a qualitative comparison of environmental abundance with cognate bias, we apply a qualitative rank correlation test. The amino acids are given a score of one if the abundance is low, two if the abundance is intermediate and three if the abundance is high. Then, for each of the 10 amino acids that have a qualitatively different abundance in the two environments, and for each organism, we identify the enzyme in their biosynthetic pathway that has the lowest cognate bias. For each amino acid we ranked the cognate bias in the three organisms in the following way: when comparing the cognate bias of B. subtilis and E. coli, the lowest ranked organism was given the number 1, the other the number 2. A similar comparison was made between B. subtilis and S. typhimurium (Table 7). We then built a table of pairs of values, where the first element of the pair is the rank of the amino acid in the environment for the relevant organism(s) and the second element is the rank of the cognate bias. Calculating the Spearman rank correlation between the two sets of values for B. subtilis and E. coli suggests a positive correlation between high cognate bias and the amount of amino acid in the environment (r = 0.61, P < 0.004). The rank correlation calculated for B. subtilis and S. typhimurium is even stronger (r = 0.83, P < 0.00025).

Table 7.

Correlation between high bias of the cognate amino acid in the biosynthetic enzymes and high relative concentration of the cognate amino acid in the environment.

	Rank of amino acid concentration^a		Rank of minimum cognate bias^b		Rank of minimum cognate bias^c
Amino Acid	Soil	Colon	B. subtilis	E. coli	B. subtilis	S. typhimurium
Ala	2	1	2	1	2	1
Arg	1	3	1	2	1	2
Asp	2	1	1	2	1	2
Glu	1	3	1	2	1	2
Gly	3	1	1	2	2	1
Lys	1	3	1	2	1	2
Ser	3	1	2	1	2	1
Thr	2	1	2	1	2	1
Trp	1	3	1	2	1	2
Tyr	1	3	1	2	1	2
			Rank correlation (n = 20)^d r = 0.61 (P < 0.004)		Rank correlation (n = 20)^e r = 0.83 (P < 0.00025)

^a If the concentration of amino acid in the environment is low, then the rank is 1; if it is intermediate, then the rank is 2; if it is high, then the rank is 3.

^b If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for E. coli; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for E. coli.

^c If cognate bias is lower in B. subtilis, then the rank is 1 for B. subtilis and 2 for S. typhimurium; if cognate bias is higher in B. subtilis, then the rank is 2 for B. subtilis and 1 for S. typhimurium.

^d Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and E. coli.

^e Rank correlation calculated using the pair of values (rank of amino acid concentration, rank of minimum cognate bias) for B. subtilis and S. typhimurium.

Table 7 shows that four of the amino acids with cognate bias that is higher in E. coli than in B. subtilis also have a relative abundance that is higher in the colon than in soil. Similarly, three of the amino acids with cognate bias that is higher in B. subtilis than in E. coli also have a relative abundance that is higher in soil than in the colon. Asp is a clear outlier; its cognate bias is higher in E. coli but its relative abundance is higher in soil. The two remaining cases, Gly and Trp, have essentially the same cognate bias in E. coli and B. subtilis, even though the relative abundance of Trp is greater in the colon and that of Gly is greater in soil.
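The rank correlations reported in Table 7 can be recomputed from the tabulated ranks. The sketch below is a minimal pure-Python illustration, assuming that the Spearman coefficient is calculated as the Pearson correlation of tie-averaged ranks over the 20 (environment rank, bias rank) pairs; the function and variable names are hypothetical, and the significance calculation is not reproduced.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Rows of Table 7, in order: Ala, Arg, Asp, Glu, Gly, Lys, Ser, Thr, Trp, Tyr.
soil = [2, 1, 2, 1, 3, 1, 3, 2, 1, 1]
colon = [1, 3, 1, 3, 1, 3, 1, 1, 3, 3]
bs_vs_ec = [2, 1, 1, 1, 1, 1, 2, 2, 1, 1]  # B. subtilis rank against E. coli
ec = [1, 2, 2, 2, 2, 2, 1, 1, 2, 2]
bs_vs_st = [2, 1, 1, 1, 2, 1, 2, 2, 1, 1]  # B. subtilis rank against S. typhimurium
st = [1, 2, 2, 2, 1, 2, 1, 1, 2, 2]

# Pool the organism in soil with the organism in the colon (n = 20 pairs).
r_ec = spearman(soil + colon, bs_vs_ec + ec)
r_st = spearman(soil + colon, bs_vs_st + st)
print(round(r_ec, 2), round(r_st, 2))
```

With these inputs the coefficients round to 0.61 and 0.83, matching the values given in Table 7.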

Discussion

Prototrophic microorganisms like E. coli, S. typhimurium and B. subtilis are capable of synthesizing all of the amino acids. A considerable fraction of the bacterial genome is devoted to the encoding of enzymes involved in the biosynthesis of the amino acids (Neidhardt, 1999). However, these organisms exist in changing environments and when they encounter an exogenous source of a particular amino acid they typically repress the enzymes for its endogenous biosynthesis. This creates a particular dilemma when attempting to derepress a pathway for which the cognate amino acid has been depleted.

Cells have evolved various strategies for dealing with a sudden amino acid depletion. Proteases are able to reconfigure the complement of proteins (Reeve et al., 1984; Matin, 1991; Weichart et al., 2003; Nystrom, 2004) and liberate a supply of the limiting amino acid. The stringent response (Foster and Spector, 1995; Magnusson et al., 2003), by shutting down the synthesis of other proteins, and stimulating the synthesis of amino acid biosynthetic enzymes, can contribute to the replenishment of the limiting amino acid. These are likely to be rather general solutions to the problem of protein synthesis and not specific to a particular subset of amino acid biosynthetic enzymes.

Another strategy that addresses the problem of specificity is the following. With lowered amino acid concentrations, there is a shift in charging from isoacceptor tRNAs with lower affinity for the amino acid to ones with higher affinity, thereby allowing those proteins whose mRNA is enriched for the high-affinity isoaccepting species to be synthesized at a faster rate than would be possible without the enrichment. At least 10 of the amino acid biosynthetic pathways in E. coli show this enrichment specifically for the cognate amino acid (Elf et al., 2003). Although differences in the charging of isoacceptor tRNAs can account for the relative usage of the different synonymous codons, these differences cannot fully account for the total relative amount of a given amino acid in the proteins that synthesize that amino acid. Recovery from the repressed state when the exogenous supply of a given amino acid is no longer available would be difficult if the enzymes of the biosynthetic pathway had a composition that was high in the cognate amino acid, independently of the relative codon usage for that amino acid.

In addressing this issue we have hypothesized that the enzymes of specific amino acid biosynthetic pathways, or at least those with the greatest influence on the rate of the pathway, should be biased towards low values of the cognate amino acid when compared with the entire proteome of the organism. In this article, we have presented several lines of evidence that support this cognate bias hypothesis.

First, a computer model of the tryptophan biosynthetic pathway in E. coli showed that derepression of the tryptophan biosynthetic enzymes would be more compromised if Trp residues were more abundant in these enzymes. The results of this simulation (Fig. 2) suggest that the extent and rapidity of response may well be selective pressures responsible for low cognate bias.

Second, the prediction of a direct correlation between molecular activity of amino acid biosynthetic enzymes and their cognate bias was tested by direct calculation using information from databases for enzyme activities and genome sequences. A statistically significant direct correlation between molecular activity and bias is found for cognate (Table 3) but not for non-cognate amino acids (Table 4).

Third, the prediction of an inverse correlation between number of enzymes in the amino acid biosynthetic pathways and their cognate bias was tested by direct calculation using information from databases for metabolic pathways and genome sequences. As expected, there was a statistically significant inverse correlation between pathway length and bias for cognate but not for noncognate amino acids (Table 5).

Fourth, a more detailed enzyme-by-enzyme, pathway-by-pathway and organism-by-organism analysis found strong evidence for low cognate bias in approximately 75% of the amino acid biosynthetic pathways (Fig. 5). For four of the remaining pathways the selection for low bias appears to be masked by other factors, and becomes evident when the influence of these factors is removed. For example, certain biosynthetic enzymes have their cognate amino acid located at highly conserved positions that are key in determining protein structure and function. When these residues are discounted in the calculation of the cognate bias, the residual composition of the enzyme is distinctly biased towards low values of the cognate amino acid. This is the case for Asp, Phe and Tyr biosynthetic enzymes in E. coli (Fig. S2) and for the first enzyme of the Thr biosynthetic pathway in B. subtilis.

For three cases in B. subtilis the evidence for cognate bias is less clear. As expected, the first enzyme has the lowest cognate bias in the biosynthetic pathway for Ser, Thr and Val. However, only in the case of Thr are there highly conserved cognate residues, which when discounted result in a significantly low cognate bias for the enzyme. In the case of Ala, the enzymes are still poorly characterized and there is no evidence for low cognate bias. In two pathways, Glu and Gly, additional factors appear to completely override the selection for low cognate bias and yield higher-than-average cognate bias.

Clearly, this type of bias is a general principle that applies with varying degrees to any system that exhibits this form of positive feedback. For example, an earlier study has shown that the atomic composition of some biosynthetic enzymes is biased against atoms that are fixed in metabolism by those enzymes (Baudouin-Cornu et al., 2001). Also, preliminary results from the analysis of the E. coli and S. cerevisiae proteomes (R. Alves and A. Salvador, preliminary unpublished results) suggest that proteins involved in detoxification of reactive oxygen species are biased towards low relative content of highly oxidizable amino acid residues, thus allowing these proteins to remain active for longer periods in an oxidizing environment. The same preliminary results clearly indicate that the relative amount of highly oxidizable amino acid residues in proteins expressed under anaerobic conditions is significantly greater than that in proteins expressed exclusively under aerobic conditions.

Finally, in comparisons of E. coli with S. typhimurium, another closely related enteric Gram-negative organism, and B. subtilis, a more distantly related Gram-positive organism, we have observed differences in the detailed profile of cognate bias (Fig. 5) that might reflect differences in the intensity of selection for low cognate bias. We have argued that selection for low cognate bias is expected to be more intense for those amino acid biosynthetic pathways that must undergo the greatest range and frequency of derepression. This is likely to be associated with low and infrequent abundance of the cognate amino acid in the organism’s environment.

Although there is great heterogeneity in the amino acid measurements, the profile of their relative abundance in the colon appears to exhibit a number of differences from that in soil (Table 1). If one accepts the 10 cases in which there appears to be a qualitative difference, and the argument that a higher relative concentration implies weaker selection for low cognate bias, then one can examine whether these data are consistent with those for cognate bias in Fig. 5. The results of our comparisons show a positive qualitative correlation (Table 7) that further supports the selectionist explanation for low cognate amino acid bias in amino acid biosynthetic enzymes.

In summary, we have presented several lines of evidence showing that cognate bias plays a highly significant role in shaping the amino acid composition for a large class of cellular proteins. The profiles of cognate amino acid bias are similar for two closely related organisms, E. coli and S. typhimurium; they differ between more distantly related organisms (E. coli or S. typhimurium versus B. subtilis) in ways that show a qualitative relationship to the environments of these organisms. Such differences, if substantiated with a broader group of organisms, may serve as a ‘fingerprint’ that reflects their different evolutionary histories and ecological niches.

Experimental procedures

Model organisms

We use E. coli K12, S. typhimurium and B. subtilis as our model organisms. The proteome and genome information for these organisms was downloaded from the KEGG database release 28.0 (Kanehisa et al., 2002).

Proteins involved in amino acid biosynthesis

We use pathway information available on Ecocyc (Karp et al., 2002), KEGG (Kanehisa et al., 2002), WIT (Overbeek et al., 2000), Herrmann and Somerville (1983) and Neidhardt (1999), and cross-correlate this information to determine the biosynthetic pathway for each amino acid in each of the organisms. Table 2 summarizes this information in terms of gene name, enzyme activity and EC number. Figure 3 shows how the network of amino acid biosynthetic reactions is connected.

Calculation of molecular activity

We have used the database BRENDA and the references therein to obtain estimates for the specific activity of the enzymes involved in amino acid biosynthesis. If this activity was not available for the specific organism of interest, we used the available value for the organism whose protein had the strongest homology to the target enzyme. The molar weight of each enzyme has been estimated by adding the individual weights of all residues of the protein and subtracting the weight of one water molecule per peptide bond in the protein. Using this molar weight we converted specific activity into molecular activity (Table 2). Whenever an enzyme is known to be formed by multiple subunits, the molecular weight of the enzyme was calculated by adding together the weights of the constituent subunits.
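As an illustration of the conversion described above, the following Python sketch sums amino acid masses, subtracts one water per peptide bond, and converts a specific activity into a molecular activity. It is not the code used in this study. The mass table holds approximate textbook values for the free amino acids (an assumption, chosen because water is subtracted per bond), the function names are hypothetical, and the specific activity is assumed to be expressed in µmol min^-1 mg^-1, so that the result is a turnover number in min^-1.

```python
# Approximate average molar masses (g/mol) of the free amino acids.
# These are rounded textbook values, not the values used in the paper.
FREE_AA_MASS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER_MASS = 18.015  # g/mol, released once per peptide bond


def molar_mass(sequence):
    """Molar mass (g/mol) of one polypeptide chain given as a one-letter string."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    total = sum(FREE_AA_MASS[aa] for aa in seq)
    # A chain of n residues contains n - 1 peptide bonds.
    return total - WATER_MASS * (len(seq) - 1)


def enzyme_molar_mass(subunit_sequences):
    """Molar mass of a multisubunit enzyme: the sum over its subunits."""
    return sum(molar_mass(seq) for seq in subunit_sequences)


def molecular_activity(specific_activity, enzyme_mass):
    """Convert specific activity (assumed µmol/min/mg) to turnover (1/min).

    1 mg of enzyme contains 1000 / enzyme_mass µmol of enzyme, so the
    turnover number is specific_activity * enzyme_mass / 1000.
    """
    return specific_activity * enzyme_mass / 1000.0
```

For a multimeric enzyme, `enzyme_molar_mass` takes one sequence per constituent subunit, mirroring the summation over subunits described above.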

Analysis of proteome and genome data

The analysis of relative amino acid composition is performed from cDNAs and peptide strings using locally developed PERL scripts.
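The original Perl scripts are not reproduced here. A minimal Python sketch of the relative composition calculation for a peptide string might look as follows. The function names are hypothetical, and pooling the residues of a group of proteins is only one possible way, assumed here, of defining the average composition of a reference group.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def relative_composition(sequence):
    """Return the fraction of each of the 20 amino acids in a peptide string.

    Characters outside the 20 standard codes (e.g. 'X' or '*') are ignored.
    """
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pooled_composition(sequences):
    """Composition of a group of proteins, pooling all residues (assumption)."""
    return relative_composition("".join(sequences))
```

Pooling weights each protein by its length; averaging the per-protein fractions instead would weight every protein equally, and the two choices can give different reference compositions.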

Statistical analysis of the data

Monte Carlo simulations and statistical analysis of the data are performed using locally developed PERL scripts and Mathematica (Wolfram, 1999) notebooks.

The Spearman rank correlation coefficient detects monotonic, and therefore possibly non-linear, correlations between sets of data (Cohen and Holliday, 1998). This correlation coefficient is given by

r = \frac{\sum_{i = 1}^{n} R (x_{i}) R (y_{i}) - \frac{1}{n} [\sum_{i = 1}^{n} R (x_{i})] [\sum_{i = 1}^{n} R (y_{i})]}{\sqrt{\sum_{i = 1}^{n} R^{2} (x_{i}) - \frac{1}{n} {[\sum_{i = 1}^{n} R (x_{i})]}^{2}} \sqrt{\sum_{i = 1}^{n} R^{2} (y_{i}) - \frac{1}{n} {[\sum_{i = 1}^{n} R (y_{i})]}^{2}}}

(5)

where R(x_i) and R(y_i) represent the rank of x_i and y_i in the sample, respectively, and n is the number of pairs in the sample.
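Equation 5 can be evaluated directly from the ranks. The sketch below is an illustration in Python rather than the Perl or Mathematica code used for the analysis; the function names are hypothetical, and tied values are given their average rank, which is an assumption because the treatment of ties is not specified here.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation coefficient computed as in eq. 5."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = ranks(x), ranks(y)
    sum_x, sum_y = sum(rx), sum(ry)
    numerator = sum(a * b for a, b in zip(rx, ry)) - sum_x * sum_y / n
    var_x = sum(a * a for a in rx) - sum_x ** 2 / n
    var_y = sum(b * b for b in ry) - sum_y ** 2 / n
    if var_x == 0 or var_y == 0:
        raise ValueError("all values in one sample are tied")
    return numerator / (math.sqrt(var_x) * math.sqrt(var_y))
```

Because only ranks enter the calculation, any strictly increasing relationship between x and y, linear or not, gives r = 1.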

To test the significance of r we use the Fisher z-statistic with the null hypothesis that the correlation coefficient is zero (and thus that there is no correlation in the population for the tested variables). The z-test makes no assumptions about the specific distribution of the data being analysed. It is well known that the variable z, defined as

z = \frac{\sum_{i = 1}^{n} {[R (x_{i}) - R (y_{i})]}^{2} - \frac{n (n - 1) (n + 1)}{6}}{\sqrt{\frac{n^{2} (n - 1) {(n + 1)}^{2}}{36}}} \sim N (0, 1)

(6)

has a normal distribution with mean 0 and standard deviation 1. The P-value is calculated by determining the quantile for the absolute value of z in the normal distribution. If P < a (0 < a < 1) there is a likelihood a that the correlation coefficient is in fact 0, i.e. that there is no correlation between the y-values and the x-values in the sample. We have also calculated the t-statistics for the different coefficients. However, here we present only P-values as determined from the z-statistics, because the significance is lower for our samples, thus providing a more conservative estimate of significance.
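The z-statistic of eq. 6 and its P-value can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the inputs are taken to be ranks without ties, and a two-sided P-value is assumed because the text does not say whether one or two tails were used.

```python
import math


def spearman_z(rank_x, rank_y):
    """z-statistic of eq. 6 from two equally long lists of ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("rank lists must have the same length (at least 2)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    expected = n * (n - 1) * (n + 1) / 6.0  # mean of sum_d2 when there is no correlation
    spread = math.sqrt(n ** 2 * (n - 1) * (n + 1) ** 2 / 36.0)
    return (sum_d2 - expected) / spread


def two_sided_p(z):
    """Two-sided tail probability of |z| under N(0, 1) (two-sidedness assumed)."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With this definition a positive rank correlation yields a negative z, because concordant ranks make the sum of squared rank differences smaller than its expected value; the P-value depends only on the absolute value of z.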

Kinetic modelling

The kinetic modelling is performed using the program PLAS (Voit and Ferreira, 2000).

Calculating statistical significance of amino acid bias

Determining whether a protein is significantly biased towards low relative composition for any given amino acid is a two-step process. First, we compare the composition of the protein with that of a reference group. Second, we calculate the significance of the difference between the protein and the reference group. We use three different approaches to calculate this significance. Two involve MC simulations to determine the significance of the bias in the relative composition of a protein with respect to a given amino acid in the context of the E. coli proteome, and the third involves an analytical calculation.

In the first MC approach, we randomly generate 1000 protein sequences having the same length as our protein of interest, assuming a relative amino acid composition that is, on average, the same as that of the reference group. The relative composition of the individual protein sequences in this set of 1000 random sequences is then ordered with respect to the amino acid of interest. Finally, our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random protein sequences in our set, i.e. if it is below the 10th percentile of bias.
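The first MC approach can be sketched as follows. This Python fragment is an illustration only: the function and parameter names are hypothetical, residues are drawn independently at each position from the reference composition, and the random seed is fixed purely so that repeated runs agree.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fraction_of_random_above(protein, target, reference_comp,
                             n_random=1000, seed=0):
    """Fraction of random sequences richer in `target` than `protein`.

    reference_comp maps one-letter codes to their relative frequency in the
    reference group; codes missing from the mapping get frequency 0.
    """
    if not protein:
        raise ValueError("empty protein sequence")
    rng = random.Random(seed)
    length = len(protein)
    weights = [reference_comp.get(aa, 0.0) for aa in AMINO_ACIDS]
    observed = protein.count(target) / length
    above = 0
    for _ in range(n_random):
        # One random protein of the same length as the protein of interest.
        random_seq = rng.choices(AMINO_ACIDS, weights=weights, k=length)
        if random_seq.count(target) / length > observed:
            above += 1
    return above / n_random


def is_low_biased_random(protein, target, reference_comp, cutoff=0.90, **kwargs):
    """True if the protein is poorer in `target` than at least 90% of the set."""
    return fraction_of_random_above(protein, target, reference_comp,
                                    **kwargs) >= cutoff
```

Counting the random sequences that lie strictly above the protein is equivalent to ordering the set and locating the protein below the 10th percentile.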

In the second MC approach, we draw from the reference group of proteins, randomly, a set of 1000 proteins, allowing for repetition. We then order the relative composition of this set of random proteins with respect to the amino acid of interest. Our protein is considered to be significantly biased towards low levels of a given amino acid if its composition with respect to that amino acid is lower than that of 90% of the random proteins in our set, i.e. if it is below the 10th percentile of bias.
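The second MC approach differs from the first only in where the comparison set comes from: real proteins are resampled from the reference group instead of being generated at random. A minimal Python sketch, with hypothetical function names and a fixed seed used only for repeatability, is:

```python
import random


def fraction_of_reference_above(protein, target, reference_proteins,
                                n_draws=1000, seed=0):
    """Fraction of resampled reference proteins richer in `target` than `protein`."""
    if not protein or not reference_proteins:
        raise ValueError("protein and reference group must be non-empty")
    rng = random.Random(seed)
    observed = protein.count(target) / len(protein)
    # Relative content of the target amino acid in every reference protein.
    reference_content = [seq.count(target) / len(seq) for seq in reference_proteins]
    # Draw with replacement, so the same protein may be selected repeatedly.
    draws = rng.choices(reference_content, k=n_draws)
    above = sum(1 for content in draws if content > observed)
    return above / n_draws


def is_low_biased_resampled(protein, target, reference_proteins,
                            cutoff=0.90, **kwargs):
    """True if the protein is poorer in `target` than at least 90% of the draws."""
    return fraction_of_reference_above(protein, target, reference_proteins,
                                       **kwargs) >= cutoff
```

Unlike the first approach, this comparison does not match the length of the protein of interest, so it reflects the actual spread of compositions in the reference group, including any dependence of composition on length.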

The third approach to estimating significance involves an analytical calculation. Consider a protein P of length L with a relative composition of c₁, ..., c₂₀ for each of the 20 amino acid types. The average composition for the set of the control proteins is given by p₁, ..., p₂₀, with $\sum_{i = 1}^{20} p_{i} = 1$. If the relative composition of a protein with respect to a given amino acid is independent of all other amino acids (which must be verified), then the probability that a protein of length L (belonging to a set of proteins that, on average, has a relative amount p_i of amino acid i) has N residues of amino acid i is given by

p_{i} (N) = \frac{L!}{N! (L - N)!} p_{i}^{N} {(1 - p_{i})}^{L - N}

(7)

The cumulative probability that a protein of L residues has no more than N residues of amino acid i is then given by

P_{i} (N) = \sum_{r = 0}^{N} \frac{L!}{r! (L - r)!} p_{i}^{r} {(1 - p_{i})}^{L - r}

(8)

Thus if P_i(N) < 0.1 for our protein of interest, there is a 90% chance that the protein is significantly biased towards low values of amino acid i with respect to the control group of proteins.
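The analytical calculation amounts to a binomial tail probability. The Python sketch below is illustrative and not taken from the original scripts: the function names are hypothetical, the binomial term of eq. 7 is evaluated in log space to avoid overflow for long proteins, and the cumulative sum is started at r = 0 (an assumption of this sketch) so that proteins containing none of the amino acid are included in "no more than N".

```python
import math


def binomial_pmf(n_res, length, p):
    """Probability of exactly n_res residues of one type in `length` positions (eq. 7)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= n_res <= length:
        return 0.0
    log_coeff = (math.lgamma(length + 1) - math.lgamma(n_res + 1)
                 - math.lgamma(length - n_res + 1))
    return math.exp(log_coeff + n_res * math.log(p)
                    + (length - n_res) * math.log(1.0 - p))


def cumulative_probability(n_res, length, p):
    """Probability of no more than n_res residues of the given type."""
    return sum(binomial_pmf(r, length, p) for r in range(0, n_res + 1))


def is_low_biased_binomial(n_res, length, p, cutoff=0.1):
    """True if the cumulative probability falls below the cutoff (0.1 in the text)."""
    return cumulative_probability(n_res, length, p) < cutoff
```

For example, a protein of 100 residues with no residues of an amino acid whose reference frequency is 0.1 has a cumulative probability of 0.9^100, far below the 0.1 cutoff, whereas 10 such residues (the expected number) would not be flagged.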

Homology comparisons of bacterial enzymes

When structural information was unavailable, we have performed sequence homology studies to investigate the possibility that a sufficient number of cognate residues might be involved in important functional roles. To evaluate this possibility, we used PSI-BLAST (Altschul et al., 1997) to search for all the bacterial homologues of the relevant protein in the SWISSPROT database (Boeckmann et al., 2003) that are both classified as having the same function and have an E-value smaller than 10^-4. These sequences were then aligned using CLUSTALW (Chenna et al., 2003) and conservation of cognate residues was studied.

Supplementary Material

Supplementary Table 1

NIHMS2915-supplement-TableS1.pdf (52.5KB, pdf)

Supplementary Figure 1A

NIHMS2915-supplement-FigS1A.pdf (87.7KB, pdf)

Supplementary Figure 1B

NIHMS2915-supplement-FigS1B.pdf (36.3KB, pdf)

Supplementary Figure 1C

NIHMS2915-supplement-FigS1C.pdf (48.6KB, pdf)

Supplementary Figure 2A

NIHMS2915-supplement-FigS2A.pdf (466.6KB, pdf)

Supplementary Figure 2B

NIHMS2915-supplement-FigS2B.pdf (329.1KB, pdf)

Acknowledgements

We thank Dr Armindo Salvador for a critical review of an earlier version of this manuscript and for fruitful discussions. We thank three anonymous reviewers for suggestions that improved the clarity of this article. This work was supported in part by a grant to M.A.S. from the US Public Health Service (RO1-GM30054) and fellowships to R.A. from the Spanish Ministerio de Educacion, Cultura y Deporte (SB2000-031) and the Portuguese FCT (BPD 11533/2002).

Footnotes

The cognate bias of a protein after removing a given residue needs to be recalculated. Using AspC as an example, the recalculation is performed in the following way. (i) Calculate the number of Asp residues in an average protein with the same length as AspC. (ii) Discount the number of residues with established functional roles (three residues in this case) from both AspC and the average proteins and recalculate the average frequency of the different amino acids. (iii) Use these new probabilities to recalculate the cognate bias of the protein. In other words, we remove the same number of Asp residues from the control proteins and recalculate the probabilities.

A procedure similar to that described in the previous footnote for Asp is used to recalculate the bias of proteins containing Tyr and Phe residues that are functionally important. In the case of these two amino acids, the functional residues are, in many cases, involved in catalysis. Therefore, one could also discount all conserved Tyr or Phe residues from the active centre of enzymes to recalculate the probabilities of amino acid occurrence. However, the structure of most enzymes is still unknown, as is the actual composition of their active centres, which precludes such an approximation at this time.

References

Akashi H, Gojobori T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA. 2002;99:3695–3700. doi: 10.1073/pnas.062526999.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
Baudouin-Cornu P, Surdin-Kerjan Y, Marliere P, Thomas D. Molecular evolution of protein atomic composition. Science. 2001;293:297–300. doi: 10.1126/science.1061052.
Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E. The SWISSPROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003;31:3497–3500. doi: 10.1093/nar/gkg500.
Cohen L, Holliday M. Statistics for the Social Scientists. Addison-Wesley; New York: 1998.
Cootes AP, Curmi PM, Cunningham R, Donnelly C, Torda AE. The dependence of amino acid pair correlation on structural environment. Proteins: Struct Function Genet. 1998;32:175–189.
Dufton MJ. Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins? J Theor Biol. 1997;187:165–173. doi: 10.1006/jtbi.1997.0443.
Elf J, Nilsson D, Tenson T, Ehrenberg M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science. 2003;300:1718–1722. doi: 10.1126/science.1083811.
Foster JW, Spector MP. How Salmonella survive against the odds. Ann Rev Microbiol. 1995;49:145–174. doi: 10.1146/annurev.mi.49.100195.001045.
Frey PA. Radical mechanisms of enzymatic catalysis. Ann Rev Biochem. 2001;70:121–148. doi: 10.1146/annurev.biochem.70.1.121.
Herrmann KM, Somerville RL. Amino Acids: Biosynthesis and Genetic Regulation. Addison-Wesley Publishing; Reading, MA: 1983.
Jansen R, Gerstein M. Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 2000;28:1481–1488. doi: 10.1093/nar/28.6.1481.
Kanehisa M, Goto S, Sato K, Fujibuchi W, Nakaya A. The KEGG database at Genomenet. Nucleic Acids Res. 2002;30:42–46. doi: 10.1093/nar/30.1.42.
Karlin S, Bucher P. Correlation analysis of amino acid usage in protein classes. Proc Natl Acad Sci USA. 1992;89:12165–12169. doi: 10.1073/pnas.89.24.12165.
Karp PD, Riley M, Saier M, Paulsen IT, Paley S, Pellegrini-Toole A. The Ecocyc Database. Nucleic Acids Res. 2002;30:56–58. doi: 10.1093/nar/30.1.56.
Khodursky AB, Peter BJ, Cozzarelli NR, Botstein D, Brown PO, Yanofsky C. DNA microarray analysis of gene expression in response to physiological and genetic changes that affect tryptophan metabolism in Escherichia coli. Proc Natl Acad Sci USA. 2000;97:12170–12175. doi: 10.1073/pnas.220414297.
King JL, Jukes TH. Non darwinian evolution. Science. 1969;164:788–798. doi: 10.1126/science.164.3881.788.
Li W-H. Molecular Evolution. Sinauer Associates; New York: 1997.
Lobry JR. Influence of genomic G+C content on average amino acid composition of proteins from 59 bacterial species. Gene. 1997;205:309–316. doi: 10.1016/s0378-1119(97)00403-4.
Lobry JR, Gautier C. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res. 1994;22:3174–3180. doi: 10.1093/nar/22.15.3174.
Maaløe O, Kjeldgaard NO. Control of Macromolecular Synthesis; a Study of DNA, RNA, and Protein Synthesis in Bacteria. W.A. Benjamin; New York: 1966.
Magnusson LU, Nystrom T, Farewell A. Underproduction of sigma 70 mimics a stringent response. A proteome approach. J Biol Chem. 2003;278:968–973. doi: 10.1074/jbc.M209881200.
Matin A. The molecular basis of carbon-starvation-induced general resistance in Escherichia coli. Mol Microbiol. 1991;5:3–10. doi: 10.1111/j.1365-2958.1991.tb01819.x.
Mazel D, Marliere P. Adaptive eradication of methionine and cysteine from cyanobacterial light-harvesting proteins. Nature. 1989;341:245–248. doi: 10.1038/341245a0.
Neidhardt FC. Escherichia coli and Salmonella: Cellular and Molecular Biology. American Society for Microbiology; Washington, DC: 1999.
Neidhardt FC, Ingraham JL, Schaechter M. Physiology of the Bacterial Cell: A Molecular Approach. Sinauer Ass; Sunderland, MA: 1990.
Nystrom T. Stationary-phase physiology. Ann Rev Microbiol. 2004;58:161–181. doi: 10.1146/annurev.micro.58.030603.123818.
Overbeek R, Larsen N, Pusch G, D’Souza M, Selkov E, Jr, Kyrpides N, et al. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res. 2000;28:123–125. doi: 10.1093/nar/28.1.123.
Pedersen JZ, Finazzi-Agro A. Protein-radical enzymes. FEBS Lett. 1993;325:53–58. doi: 10.1016/0014-5793(93)81412-s.
Pramanik J, Keasling JD. Effect of Escherichia coli biomass composition on central metabolic fluxes predicted by a stoichiometric model. Biotech Bioeng. 1998;60:230–238. doi: 10.1002/(sici)1097-0290(19981020)60:2<230::aid-bit10>3.0.co;2-q.
Reeve CA, Bockman AT, Matin A. Role of protein degradation in the survival of carbon-starved Escherichia coli and Salmonella typhimurium. J Bacteriol. 1984;157:758–763. doi: 10.1128/jb.157.3.758-763.1984.
Richmond RC. Non darwinian evolution: a critique. Nature. 1970;225:1025–1028. doi: 10.1038/2251025a0.
Rogers MS, Dooley DM. Copper-tyrosyl radical enzymes. Curr Opin Chem Biol. 2003;7:189–196. doi: 10.1016/s1367-5931(03)00024-3.
Sauer U, Hatzimanikatis V, Hohmann HP, Manneberg M, van Loon AP, Bailey JE. Physiology and metabolic fluxes of wild-type and riboflavin-producing Bacillus subtilis. Appl Environ Microbiol. 1996;62:3687–3696. doi: 10.1128/aem.62.10.3687-3696.1996.
Savageau MA. Escherichia coli habitats, cell types, and molecular mechanisms of gene control. Am Nat. 1983;122:732–744.
Seligmann H. Cost-minimization of amino acid usage. J Mol Evol. 2003;56:151–161. doi: 10.1007/s00239-002-2388-z.
Singer GA, Hickey DA. Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol. 2000;17:1581–1588. doi: 10.1093/oxfordjournals.molbev.a026257.
Trifonov EN. Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. J Mol Biol. 1987;194:643–652. doi: 10.1016/0022-2836(87)90241-5.
Voit EO, Ferreira AEN. Computational Analysis of Biochemical Systems, a Practical Guide for Biochemists and Molecular Biologists. Cambridge University Press; Cambridge, UK: 2000.
Weichart D, Querfurth N, Dreger M, Hengge-Aronis R. Global role for ClpP-containing proteases in stationary phase adaptation of Escherichia coli. J Bacteriol. 2003;185:115–125. doi: 10.1128/JB.185.1.115-125.2003.
Wolfram S. The Mathematica Book. Cambridge University Press; New York: 1999.
Xiu Z-L, Chang Z-Y, Zeng A-P. Nonlinear dynamics of regulation of bacterial trp operon: model analysis of integrated effects of repression, feedback inhibition, and attenuation. Biotechnol Prog. 2002;18:686–693. doi: 10.1021/bp020052n.
