Stoichiogenomics: the evolutionary ecology of macromolecular elemental composition

James J Elser; Claudia Acquisti; Sudhir Kumar

doi:10.1016/j.tree.2010.10.006

Author manuscript; available in PMC: 2012 Jan 1.

Published in final edited form as: Trends Ecol Evol. 2010 Nov 18;26(1):38–44. doi: 10.1016/j.tree.2010.10.006

PMCID: PMC3010507 NIHMSID: NIHMS249318 PMID: 21093095

Abstract

The new field of “stoichiogenomics” integrates evolution, ecology, and bioinformatics to reveal surprising patterns of differential usage of key elements (e.g., nitrogen, N) in proteins and nucleic acids. Because the canonical amino acids as well as nucleotides differ in element counts, natural selection due to limited element supplies might bias monomer usage to reduce element costs. For example, proteins that respond to N limitation in microbes use a lower proportion of N-rich amino acids, whereas proteome- and transcriptome-wide element contents differ significantly for plants as compared to animals, likely because of differential severity of element limitations. In this review, we show that with these findings, new directions for future investigations are emerging, particularly via the increasing availability of diverse meta-genomic and meta-transcriptomic data sets.

Natural selection, primary sequence evolution, and the role of key resources

In the last 60 years of DNA-centered research, a multitude of forces have been identified that shape the evolutionary landscape of molecules of inheritance (nucleic acids) and their downstream products (e.g., proteins and ribosomal RNA). These forces act on individual mutations in the blueprint of life (primarily DNA, except for a few viruses), which are subjected to the sieve of natural selection and to the whims of random genetic drift within populations. A vast majority of mutations at functionally important DNA positions experience negative selective pressures either directly or via the product they encode [1], [2], [3]. Classically, this natural selection acting directly on DNA and protein sequences has been concerned with the requirements of biochemical function, structure, and/or the cellular milieu of DNA and protein sequences, including tRNA abundance and energy synthesis costs.

Historically, there has been a lack of appreciation and understanding of the natural selection exerted on an organism’s genome sequence by its ecological context, i.e., the environment in which the organism lives. In some interpretations of the classical framework, particularly in the context of eukaryotic and multicellular biota, it is as if the primary structure of the genome were insulated from ecological and environmental factors. However, eco-environmental natural selection can act directly on genomes because the availability and synthesis of the genome and protein building blocks (nucleotides and amino acids) are dictated by the presence of key macroelements (e.g., nitrogen, N) in the immediate environment. Mutations that place a greater demand on limited elemental resources will experience purifying selection, whether they occur at “junk” or at functionally important positions. This selective pressure is expected to be weak, but its cumulative effects are potentially detectable if the limitation persists over time. These effects should be discernible by comparing the genomes of species from resource-limited environments with those of species from environments without such limitation, because the fixation of natural variation will be governed by purifying selection (albeit mild) in the former and by random genetic drift in the latter.

Some effects of environmental limitations on the primary structure of the genome (i.e. its linear DNA sequence) have been previously documented in prokaryotes and single-celled eukaryotes [4], [5], perhaps because in those species each cell is directly exposed to the environment. However, these studies tend to emphasize the direct effects of the organism’s environment per se (e.g., direct physico-chemical impacts [6]) rather than the ecological context of its resource supplies. Indeed, how nutrient and energy supplies shape protein evolution in microbes and multicellular organisms had remained largely unexplored until recently.

In the following, we consider resource-driven DNA and protein evolution within the broader framework of biological stoichiometry, which is the study of balance of energy and multiple chemical elements in living systems [7]. We evaluate the direct impact of organismal resource constraints on molecular evolution by examining evidence for the existence of element-based monomer usage biases (MUB) in fundamental biological molecules: proteins and nucleic acids. Such biases can arise because different monomers (amino acids and nucleotides) differ substantially in the number of key macroelements (e.g., N, S, and C) they contain. Often the environmental supplies of such elements limit the growth and reproduction of organisms. If such limitations are sufficiently strong or prolonged in the evolutionary history of a species, we would predict that selection would work to reduce the use of these limiting elements as they will hamper production of major biopolymers. Thus, the over-riding theme of this article is to explore the stoichiometric signature of environmental resource limitation on the primary structure of life’s most important molecules.

Evidence for Element-driven Monomer Usage Bias (MUB) in Proteins

The evolutionary costs and benefits of producing a protein involve the interplay of the physiological and ecological domains of biology. Many biota (especially plants, algae, and microbes) face direct limitation of their growth and reproduction by insufficient supplies not only of energy, but also of key nutrient elements (such as C, N, and S) needed for construction of proteins and nucleic acids. Importantly, the bioenergetic costs of synthesis differ for the various amino acids [8] as do the number of C, N, and S atoms needed to build them. This suggests that selection leading to usage bias of different monomers in proteins might occur based not only on energetic costs of construction of different amino acids, but also on their element costs. Furthermore, the ecological environment experienced by different biota will also play a role in determining the strength of selection for increased efficiency of resource use (energy or element) in different contexts. Finally, we would expect this selection to act on some molecules more than others, such as those that are abundant in the cell and those that are used in different metabolic pathways or structures that are closely associated with conditions of nutrient limitation. Following are some of the emerging lines of evidence relevant to these possibilities.

Amino acid usage is element-biased in microbes

Many initial analyses on biased amino acid usage involved microorganisms. For example, an early study [9] showed element-based MUB affecting use of S-containing cysteine as a means to avoid uptake of toxic molecules (selenate and chromate). Subsequently, evidence for element-based MUB associated with direct element limitations in the environment was obtained in proteome-wide studies of microorganisms [10] in which the element costs of enzymes involved in acquisition and processing of a given chemical element (C, N, or S) were compared to the element costs of the overall proteome. The analysis revealed that the enzymes involved in processing of a given chemical element have MUB that results in lower use of that element (Figure 1A). For example, the C assimilatory enzymes in E. coli and S. cerevisiae are unusually low in C compared to the overall proteomes (and to the enzymes involved in processing some other element). Similarly, N assimilatory enzymes are low in N compared to the overall proteome in S. cerevisiae. These observations are interpreted to be adaptive responses to facilitate construction of the cellular components most needed under conditions of resource limitation of a given element.

Figure 1. Examples of element-based monomer usage bias (MUB) in prokaryotes and eukaryotes. (a) Nitrogen-processing enzymes (blue line) in yeast are unusually low in N compared to the average protein in the proteome (black) and to the average protein involved in sulfur-processing (red) [10], suggesting a form of “element-sparing” in molecules that are most important during limitation by a given element. (b) Protein N content declines with gene expression intensity in plants (green) but not in animals (red) [13], an outcome consistent with the fact that plants likely experience more frequent and sustained direct N limitation than do animals. (c) Proteome N-content in domesticated plants (black bars) and plants associated with N-fixing symbionts (green and black) is similar to that in animals (red) and higher than proteome N-content in undomesticated plants (green) [15], suggesting that release from selection pressures for N conservation has reduced MUB for N conservation in crop plants. Error bars (standard errors) are very small (max: ~0.001) and thus were not included, as they are not discernible on the plot. (d) The proteins involved in anabolic machinery (spliceosome and ribosomes) are higher in nitrogen content than are the proteins associated with catabolic machinery (proteasome, lysosome, and vacuole) for *A. thaliana* (green) and *H. sapiens* (red) [16], a result consistent with an interpretation of N-sparing during nutrient limitation when catabolic processes dominate. (e) In microorganisms, the N:C ratio of a taxon’s proteome is positively correlated with the N:C ratio of its genome [17], suggesting that selection for N conservation may even have been operating during early evolution of the canonical genetic code. Filled circles denote Archaea. Open symbols denote Eubacteria: free-living only, circle; capable of living as animal symbionts, diamond; capable of living as plant symbionts, square; capable of living as animal and plant symbionts, triangle.

This conclusion is supported by more recent work by J. Bragg and A. Wagner [11] who evaluated the visibility of element costs to natural selection for conservation of C, N, S, and P in Saccharomyces. They did so by accounting for the element costs of changes in mRNA production and cellular protein levels associated with all the genes in the genome. They concluded that, under the respective forms of elemental limitation, mutations leading to doubling in expression are visible to natural selection for >90% of the genes in the yeast genome. That is, such mutations resulted in selection coefficients that were 10-times larger than the selection coefficient for which natural selection and drift have a similar effect on the fate of a mutation.

Further work corroborating such findings comes from R.P. Carlson [12], who performed whole-metabolism pathway analysis in E. coli under nutrient sufficiency and nutrient (N) deficiency. In whole-metabolism pathway analysis, both actual and potential alternative flux pathways among metabolic reactants are quantified using available metabolic maps and compared across different growth conditions. He notes that E. coli’s biochemical networks are indeed close to optimized for thermodynamic (energetic) efficiency under nutrient-sufficient conditions but that, under nutrient limitation, E. coli expresses pathways (such as the Entner-Doudoroff pathway) that have low proteome synthesis costs (reflecting accumulated effects of differences in protein copy numbers, protein lengths, and monomer N contents) and that operate quite inefficiently in energetic terms. More analyses of this kind seem promising, given emerging data from functional genomics.

Amino acid usage is also element-biased in multicellular organisms

This work on protein elemental composition was extended to higher organisms by J.J. Elser and colleagues [13], focusing especially on nitrogen and comparing photoautotrophic organisms (“plants”) and metazoan species. This analysis was motivated by an overall expectation that plants have likely experienced longer and more severe N limitation as compared to animals. Thus, plants should show signs of N-targeted MUB. Consistent with this idea, across nine “plant” species (two with complete genome sequences) relative to nine animal species (all with whole genome sequences), plant proteins used an average of 7.1% less N in their side chains compared to animal proteins. While plant and animal proteins potentially differ in other important ways (such as average sequence length) and N-rich amino acids differ in many ways other than in their N-content, many potential confounding factors were systematically evaluated and rejected [13]. Consequently, the observed difference has been interpreted to be a deep imprint of N-limitation at the level of protein primary sequences.
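The side-chain comparison described above can be illustrated with a short calculation. The following is a minimal sketch, not the pipeline used in [13]: it assumes protein sequences are given as strings of one-letter amino acid codes, counts only side-chain N atoms (the backbone amide N, shared by all residues, is excluded), and the function name `proteome_side_chain_n` and the example sequences are hypothetical.

```python
# Side-chain nitrogen atoms for the 20 canonical amino acids (one-letter codes).
# The backbone N present in every residue is deliberately excluded.
SIDE_CHAIN_N = {aa: 0 for aa in "ACDEFGILMPSTVY"}
SIDE_CHAIN_N.update({"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1})


def proteome_side_chain_n(proteins):
    """Mean side-chain N atoms per residue, pooled over all given proteins.

    Characters that are not canonical amino acid codes are skipped.
    """
    total_n = 0
    total_residues = 0
    for protein in proteins:
        for aa in protein.upper():
            if aa in SIDE_CHAIN_N:
                total_n += SIDE_CHAIN_N[aa]
                total_residues += 1
    if total_residues == 0:
        raise ValueError("no canonical amino acids found")
    return total_n / total_residues


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    n_rich = ["MKRHLA"]   # K=1, R=3, H=2 -> 6 N over 6 residues
    n_poor = ["MALSVA"]   # no side-chain N
    print(proteome_side_chain_n(n_rich))  # 1.0
    print(proteome_side_chain_n(n_poor))  # 0.0
```

Pooling residues across proteins weights long proteins more heavily than short ones; averaging per-protein values instead is an equally defensible choice and could give slightly different numbers.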

To test these stoichiometric interpretations, Elser and colleagues were the first to examine the element content of proteins as a function of the level of expression of the genes encoding them, taking advantage of existing EST (Expressed Sequence Tags) databases. Consistent with expectations, proteins associated with highly expressed genes in plants are especially low in N compared to those encoded by low-expression genes (Figure 1B). Recent work with some microorganisms (E. coli, S. cerevisiae, and S. pombe) has confirmed this association of element investment with expression level [14], as the C and N contents of proteins were negatively associated with their cellular abundance in each of these species as well (for C, this effect remains even after correcting for energetic costs).

MUB is weakened in crop plants

Domestication of crops, along with fertilization of cultivated soils, represents an ideal natural experiment to investigate the role of environmental nitrogen availability in shaping the element composition of proteins. The nitrogen conservation hypothesis predicts that the massive N enrichment by fertilization of cultivated soils is tantamount to removing the selection pressure exerted by N-limitation in crop species. A similar effect is also expected for plants in symbiosis with N-fixing bacteria (e.g., legumes). Indeed, there are significant signs of decreased N conservation (thus, higher N content) in the proteomes of crops and legumes compared with undomesticated plants [15]. An analysis of the complete proteomes of four crops, two legumes, and three undomesticated plants has shown that the N content of amino acid residues is more than 7% higher in crops and legumes than in undomesticated plants (Figure 1C). Consistent with the hypothesis of N limitation, these observations suggest that the global N content of plant proteomes is mainly driven by the importance of N limitation in a species’ evolutionary history, rather than merely reflecting phylogenetic grouping. For example, the dicot legumes M. truncatula and L. japonicus show an N content that is more similar to that of the monocot crop species Z. mays than to that of the undomesticated dicot A. thaliana (Figure 1C). This observation constitutes an example of convergent evolution in proteome composition brought on by the relaxation of resource limitations in independent lineages.

Catabolic proteins are especially low in nitrogen

The role of nitrogen limitation in shaping molecular evolution was further investigated by studying the elemental composition of the metabolic apparatus that is involved in the cellular response to nutritional stress [16]. By examining the elemental composition of major functional classes of proteins involved in anabolic and catabolic processes in multicellular eukaryotic model organisms, it was found that eco-physiological selection for nutrient conservation specifically targets the cellular components of the catabolic machinery, which are highly expressed in response to nutrient limitation (Figure 1D). Thus, it appears that the RNA component of the anabolic machineries (ribosome and spliceosome) underpins the stoichiometric differences between the two apparatus. It is useful to note that, because N-rich amino acids tend to have positive charge, the charge distribution of nucleotides requires a high N content in proteins that have a close physical interaction with nucleic acids. This purely functional point of view indicates high nutritional costs of the processes (such as transcription, translation and genome duplication) that operate during fast growth and nutrient sufficiency. It also reinforces the idea that resource availability and the optimization of nutrient allocation are important factors shaping the molecular architecture of cellular structures.

Element biases are deeply rooted in the genetic code

At least some of the causal basis of element-driven MUB seems to lie deep in the history of life, as evidenced by a positive association between the N:C ratios of amino acids and the N:C ratios of their corresponding nucleic acid codons in the canonical genetic code [17]. This result is suggested to reflect the intrinsic correlations of hydrophobicity of particular amino acids and the hydrophobicity of their corresponding codons. The same study also documented a strong correlation of whole-proteome N:C ratio with whole-genome N:C ratio (correlation = 0.90) across 94 species of fully sequenced microorganisms (Figure 1E). These relationships reflect the underlying stoichiometric correlation between amino acids and codons along with the cross-species variation in genomic GC-content and its known association with amino acid frequencies. As a result, average proteomic and genomic N use was positively correlated with GC-content while genomic and proteomic C use was negatively correlated with GC-content. These findings illustrate that, among microbes at least, there is a deep stoichiometric association of C and N use in the cellular machinery of life. However, it is important to note that the N-based MUB observed in plant proteomes [13] exists above and beyond any such effect of GC-content. It was noted [13] that the GC content of A. thaliana introns is 32% while that of D. melanogaster introns is 37%. Both of these differ from mouse (GC content of 45%), but the estimated N contents of the fruit fly and mouse proteomes are nearly identical (0.387 and 0.381, respectively). Thus, it is improbable that GC content differences are responsible for the observed differences in proteomic N content between plants and animals [13].
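The genomic side of the GC-content relationship follows directly from the atomic composition of the bases. The sketch below is an illustration only, not the analysis of [17]: it counts C and N atoms in the nucleobases alone (the sugar-phosphate backbone, which adds carbon but no nitrogen, is ignored as a simplifying assumption), and the function name `genome_base_nc_ratio` is hypothetical.

```python
# Carbon and nitrogen atoms per DNA nucleobase (backbone excluded):
# adenine C5H5N5, guanine C5H5N5O, cytosine C4H5N3O, thymine C5H6N2O2.
BASE_ATOMS = {
    "A": {"C": 5, "N": 5},
    "G": {"C": 5, "N": 5},
    "C": {"C": 4, "N": 3},
    "T": {"C": 5, "N": 2},
}


def genome_base_nc_ratio(gc_fraction):
    """N:C ratio of the bases of double-stranded DNA at a given GC fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    # Per base pair: A:T carries 7 N and 10 C; G:C carries 8 N and 9 C.
    at_n = BASE_ATOMS["A"]["N"] + BASE_ATOMS["T"]["N"]
    at_c = BASE_ATOMS["A"]["C"] + BASE_ATOMS["T"]["C"]
    gc_n = BASE_ATOMS["G"]["N"] + BASE_ATOMS["C"]["N"]
    gc_c = BASE_ATOMS["G"]["C"] + BASE_ATOMS["C"]["C"]
    n_atoms = gc_fraction * gc_n + (1.0 - gc_fraction) * at_n
    c_atoms = gc_fraction * gc_c + (1.0 - gc_fraction) * at_c
    return n_atoms / c_atoms


if __name__ == "__main__":
    for gc in (0.0, 0.5, 1.0):
        print(gc, round(genome_base_nc_ratio(gc), 3))
    # 0.0 0.7
    # 0.5 0.789
    # 1.0 0.889
```

Because each G:C pair replaces one carbon with one nitrogen relative to an A:T pair, N use rises and C use falls as GC-content increases, which matches the direction of the genomic correlations reported above. Including the backbone carbons would lower the absolute ratios but not change that direction.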

From Amino Acid to Nucleotide Usage Biases

Nucleotide biosynthesis originates with amino acids, so element limitations that affect processing of amino acids should also affect the construction of intronic and other non-coding DNA that rely on these precursors [18]. Indeed, genome-scale investigations of animals and plants have established a link between ecological resource limitations and the element composition of functionally important parts of the genome.

Element-based nucleotide biases also occur, especially in plants

The number of nitrogen atoms per monomer ranges between one and four in amino acids and between two and five in nucleotides, while phosphorus is a fundamental component of nucleotides but not of amino acids. Therefore, an efficient logic of nutrient allocation predicts that nutrient limitations should influence not only the molecular composition of proteins, but also that of genes and transcripts, favoring monomer usage biases that conserve limiting elements in a coordinated manner from genomes to transcriptomes to proteomes. However, biases affecting nitrogen use are expected to be smallest in DNA because among-nucleotide variation is buffered by the complementarity rules of the double helix (an A:T base pair entails 7 N atoms while a G:C base pair entails 8 N atoms). Single-stranded RNA shows a larger palette of possible N content per nucleotide, with purines (adenine and guanine, 5 N atoms each) containing more nitrogen than pyrimidines (cytosine, 3 N atoms; uracil, 2 N atoms). Furthermore, RNA generally contributes 5–10 times more than DNA to cellular biomass [19]. Therefore, an efficient strategy of N and P conservation should primarily affect the transcribed portions of the genome. Recent work [15] has shown that this is indeed the case, reporting that the local composition of introns in plants promotes overall nitrogen conservation via nucleotide biases in the transcribed strand (Figure 2). Thus, chronic nutrient limitation in plants has influenced the elemental composition of the transcriptome. Moreover, consistent with the predictions of natural selection for N-conservation and consistent with previous observations of MUB in proteins, crop transcriptomes show higher nitrogen content than those of “wild” plants, reflecting relaxation of natural selection for N conservation in response to the extensive use of nitrogen-rich fertilizers and other effects of domestication [15].
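The buffering effect of base pairing can be made concrete with a small example. This is a minimal sketch under stated assumptions: only base nitrogen is counted, the example sequence is hypothetical, and the helper names (`strand_n`, `reverse_complement`, `duplex_n`) are introduced here for illustration. Because uracil and thymine both contain 2 N atoms, the N count of a DNA strand equals that of an RNA copy of the same sequence.

```python
# Nitrogen atoms per DNA base; uracil in RNA has the same count as thymine (2).
BASE_N = {"A": 5, "G": 5, "C": 3, "T": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def strand_n(seq):
    """Total base N atoms in one strand."""
    return sum(BASE_N[base] for base in seq.upper())


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def duplex_n(seq):
    """Total base N atoms in the double-stranded region containing seq."""
    return strand_n(seq) + strand_n(reverse_complement(seq))


if __name__ == "__main__":
    pyrimidine_rich = "TTTTCC"                        # hypothetical sequence
    opposite = reverse_complement(pyrimidine_rich)    # "GGAAAA"
    print(strand_n(pyrimidine_rich))  # 14
    print(strand_n(opposite))         # 30
    print(duplex_n(pyrimidine_rich))  # 44 (4 A:T pairs x 7 + 2 G:C pairs x 8)
```

The duplex total depends only on how many A:T and G:C pairs it contains, whereas the single-stranded copy can cost 14 or 30 N atoms depending on which strand it matches. This is why a nucleotide bias on one strand can conserve N in transcripts even when the DNA itself offers little room for savings.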

Figure 2. N conservation effect size (as a percentage) in the vascular plant *Arabidopsis thaliana* compared to *Homo sapiens* and *Drosophila melanogaster* in DNA, RNA, and proteins [15]. In this analysis, for each molecule the overall monomer N-content in *Arabidopsis* was compared to the monomer N-content for that molecule averaged for both human and fruit fly. As expected, the apparent magnitude of N conservation in plants vs. animals is weak for the genome (due to base-pairing rules) but is strong for both the transcriptome and for the proteome.

Nutrients can also be conserved by shortening polymer length

Because the production of every nucleotide entails nitrogen and phosphorus costs, a regime of nutrient conservation should affect not only the nucleotide composition but also the total number of nucleotides used in transcripts. This predicts that, in autotrophs, purifying selection will prevent lengthening of transcribed regions, while it might allow a substantial expansion of non-transcribed regions because these do not incur the material costs associated with transcript production. We evaluated this using existing genomic data and showed that Arabidopsis thaliana indeed has a consistent signature of nitrogen and phosphorus conservation in functionally diverse transcribed non-coding sequences (introns, 5' UTRs, and 3' UTRs) due to biased nucleotide composition and reduced length of the transcripts. Although the length of introns is known to correlate with genome size, the average intron length in plants is significantly shorter than in animals when organisms with similar genome size are compared (Figure 3). For example, A. thaliana and D. melanogaster have comparable genome sizes, but the average intron length is over one order of magnitude lower in A. thaliana. Despite their different functional requirements, the same signal is detected in untranslated regions (UTRs): the exons of both 5' and 3' UTRs have a lower N content per nucleotide and are shorter in plants than in animals (Figure 3; t-test p<0.0001). These results further reinforce the view that N and P conservation represents a ubiquitous force shaping non-coding regions of the transcriptome.
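The joint effect of composition and length on transcript cost can be sketched as follows. This is an illustration under simplifying assumptions, not the method used in the analyses above: N is counted in the bases only, P is approximated as one phosphate per nucleotide (ignoring the extra phosphates at the 5' end of a primary transcript), and the function name `transcript_element_cost` and both example sequences are hypothetical.

```python
# Nitrogen atoms per RNA base.
RNA_BASE_N = {"A": 5, "G": 5, "C": 3, "U": 2}


def transcript_element_cost(rna):
    """Approximate N and P atoms needed to build one copy of an RNA.

    N: sum of base nitrogen atoms. P: one phosphate per nucleotide.
    """
    seq = rna.upper()
    unknown = set(seq) - set(RNA_BASE_N)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return {"N": sum(RNA_BASE_N[base] for base in seq), "P": len(seq)}


if __name__ == "__main__":
    long_purine_rich = "AG" * 50      # 100 nt, hypothetical
    short_pyrimidine_rich = "UC" * 10  # 20 nt, hypothetical
    print(transcript_element_cost(long_purine_rich))       # {'N': 500, 'P': 100}
    print(transcript_element_cost(short_pyrimidine_rich))  # {'N': 50, 'P': 20}
```

In this toy comparison, shortening accounts for the entire P saving, while the N saving reflects both the shorter length and the shift from purines to pyrimidines. Multiplying these per-copy costs by transcript abundance would give a rough cell-level cost, which is where expression data would enter.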

Figure 3. Nitrogen conservation in plants illustrated by lower average N content per nucleotide and shorter sequence length (exons) of UTRs in plants vs. animals. The means and standard errors are plotted per organism for a set of invertebrate, mammal, and plant species. Sequence data were obtained from the UTResource, and species with information from at least 50 genes were selected; the numbers of species for 3’ and 5’ UTRs were as follows: mammals (13 and 16), invertebrates (61 and 43), and plants (81 and 68).

Future Directions

The field of stoichiogenomics is in its infancy. Work to date has focused on a relatively narrow range of questions within the still-limited number of species for which comprehensive genomic data with excellent gene annotation are available. This situation is changing rapidly due to revolutionary advances in DNA sequencing technology that have been accelerating the rate of genome sequencing for different species and for individuals within species. Furthermore, these sequence-based techniques can now quantify gene and protein expression more precisely and generate longer sequences from diverse environmental samples (e.g., metagenomics). These advances will facilitate population, ecological, and taxonomic investigations of reciprocal interactions between environmental resource supplies and macromolecular evolution. We note that, since selection for element conservation is most likely to be seen in molecules that are at high abundance in organisms, future studies might prove especially fruitful by combining sequence information with functional data that link macromolecular element composition to the likely abundance of those molecules in the cell. We also caution that achieving a comprehensive understanding of the role of nutrient limitation in affecting macromolecular evolution is likely to be challenging, given the difficulty in assessing the role of nutrient limitation in the evolutionary history of many biota. Nevertheless, we highlight some possibilities below, although the full list is limited only by biologists’ imaginations.

Metagenomics is providing a major arena for new stoichiogenomic studies

Due to the revolution in high throughput sequencing, scientists increasingly have access to large volumes of genomic data (both DNA and RNA) for naturally occurring biota [20], especially microorganisms [21]. Often these data are obtained in conjunction with monitoring of key environmental variables (see [22]), such as temperature and chemical conditions (e.g., pH, salinity, and nutrient [N, P] supplies). These data offer an unprecedented opportunity to connect patterns of genomic evolution to key ecological drivers and consequences. For example, do the proteomes and transcriptomes of the microbes that dominate under extremely low productivity conditions in the ocean bear signs of MUB that conserve energy and key nutrient resources? The biochemical versatility of the microbes that occupy low-nutrient environments is already demonstrated in remarkable features, as in the ability of various open-ocean cyanobacteria (Synechococcus, Prochlorococcus, Crocosphaera, and Trichodesmium) to completely substitute sulfolipids for phospholipids under P-limited growth conditions [23]. Is such biochemical fine-tuning reflected also in the proteomes and transcriptomes of such species? Under resource-limiting conditions, we would expect features such as preferential use of amino acids with low bioenergetic construction costs and with low N contents, perhaps together with shorter transcripts and shorter polymers used in various structures (e.g., rRNAs and ribosomal proteins). 
Consistent with this, recent analyses have indeed shown that differences in nutrient availability in sea surface versus deep sea environments substantially affect the average proteome elemental composition in strains of Prochlorococcus adapted to different depths, as strains isolated from deep water layers with low-light (low photosynthesis), high-N had proteomic MUB in favor of low-C and high-N amino acids while the opposite was observed for strains from surface waters (where light is abundant and N is low) [24]. Also informative would be similar studies of soil microbes performed over chrono-sequences during which the overall and relative availabilities of nitrogen and phosphorus change. One can also imagine similar stoichiogenomic-inspired studies of the gut microbiome in animals differing in dietary habits (e.g., carnivores vs. herbivores).

Key comparisons will shed light on major evolutionary selection events

Some major evolutionary events seem very attractive as targets for future stoichiogenomic studies. For example, it is well known that domestication produced a suite of correlated genetic changes in the plant taxa adopted by humans for food production [25]. Given that part of the domestication process involved modification of soil fertility (e.g., rotation to fresh soils and manuring in the distant past; provision of chemical fertilizers in the recent past), it is likely that crop plants have experienced relaxed selection for nutrient conservation in proteomes and transcriptomes. Indeed, as discussed above, there is some support for this prediction [15], but we are missing data for key comparisons. It would be best to compare domesticated species with their ancestral taxa or closest living relatives; for example, “red rice” (Oryza rufipogon) could be compared with “white rice” (Oryza sativa), or teosinte with maize (corn). These comparisons would enable us to evaluate the effects on MUB of such relaxed selection from nutrient limitation combined with selection for high production.

New bioinformatics resources are emerging

As is clear from above, understanding how an organism’s autecology and environment shape the evolution of genes, proteins, and genomes is of fundamental importance in modern biology. Rapid development of DNA and protein sequence databases, knowledge of the basic biological function of individual proteins, and the availability of information characterizing relative gene expression intensities provide unique and exciting opportunities to construct eco-genomic hypotheses and test their predictions. Biologists at large are also becoming interested in understanding how environmental constraints (such as nutrient limitation) shape protein, DNA, and RNA sequences. Such investigations require the analysis of element composition of proteins and genomic segments across a large number of species, which necessitates the availability of bioinformatics knowledge bases at the interface of ecology, evolution, and genomics. As an example, we have recently developed one such resource (www.graspdb.net) for twelve drosophilid species that are evolutionarily closely related but ecologically divergent taxa [26]. This resource will enable biologists to look for element-driven MUB in the genomes of drosophilids having divergent diets or evolutionary histories in strongly contrasting habitats. In general, however, there is a need for flexible and scalable systems that will contain an integrated ecological dataset (characterizing key life history traits of species) and genomic data consisting of protein and DNA compositions. Such resources will provide an easy-to-use tool for the exploration and analysis of stoichiometric characteristics of proteins and genomes that will be freely available to the research community.

Summary and Outlook

In this paper we have reviewed recent findings that show how ecological perspectives can inform molecular evolution analyses and so reveal new patterns in the primary structure of biological molecules. The overall picture is one in which the signs of element limitations can be seen in the monomer usage patterns of the molecules involved in processing limiting elements [10]. This is evident in the proteomic and transcriptomic monomer use patterns for major groups of biota (plants vs. animals) that differ in the severity of direct element limitation [13], and in the monomer profiles associated with different cellular components [16]. It is likely that further patterns will emerge as additional studies proceed. Indeed this seems almost assured, as one of the most exciting aspects of biology in the post-genomic era is the wealth of genome sequence and gene expression data available. However, access to large volumes of information does not guarantee that key insights and discovery will emerge. In fact, such a "data glut" might impede discovery because scientists well informed in biology are not always experts in handling large-scale datasets and computational techniques. For this reason, we advocate the development of biologist-centered tools and databases [27]. In addition, insight is more likely to emerge by the synergistic collaboration of computationally skilled analysts with theoretical and empirical biologists working from clearly articulated conceptual frameworks. Biological stoichiometry is one such framework [7] and its emerging application in studies of macromolecular evolution and population genetics promises future insights for connecting the disparate realms of molecular evolution and ecosystem ecology.

Acknowledgements

This work was supported by a National Science Foundation grant to JJE, SK, and W. Fagan, and a National Institutes of Health grant to SK. W. Fagan and three anonymous reviewers provided useful feedback on a draft of the manuscript.

Glossary Box

EST: Expressed Sequence Tag. A short sub-sequence of transcribed DNA. Generally these are analyzed by cloning the associated mRNAs and, thus, can be rapidly assessed and quantified, allowing at least a relative comparison of expression levels of different genes.
MUB: Monomer usage bias. A statistically disproportionate use of individual amino acids (or classes of amino acids) or nucleotides in a protein or nucleic acid sequence compared to some measure of random (equiproportional) use of that monomer.
UTR: Untranslated Region. These are segments of a transcript (and of the DNA encoding it) that flank the protein-coding sequence on the "upstream" (5' UTR) or the "downstream" (3' UTR) end; they are transcribed but not translated into protein.
Stoichiogenomics: The study of the elemental composition of macromolecules (protein, DNA, RNA, etc.) using genomics data and bioinformatics tools.

Reference List

1. Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 2007;8:610–618. doi: 10.1038/nrg2146.
2. Kimura M. The Neutral Theory of Molecular Evolution. Cambridge University Press; 1983.
3. Subramanian S, Kumar S. Higher intensity of natural selection on >90% of the human genes revealed by the intrinsic replacement mutation rates. Mol. Biol. Evol. 2006;23:2283–2287. doi: 10.1093/molbev/msl123.
4. Bohlin J, Skjerve E. Examination of genome homogeneity in prokaryotes using genomic signatures. PLoS ONE. 2009;4:e8113. doi: 10.1371/journal.pone.0008113.
5. Wang H-C, et al. On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: Data quality and confounding factors. Biochem. Biophys. Res. Commun. 2006;342:681–684. doi: 10.1016/j.bbrc.2006.02.037.
6. Bragg JG, et al. Variation among species in proteomic sulphur content is related to environmental conditions. Proc. R. Soc. B. 2006;273:1293–1300. doi: 10.1098/rspb.2005.3441.
7. Elser JJ, Hamilton AL. Stoichiometry and the new biology: the future is now. PLoS Biol. 2007;5:e181. doi: 10.1371/journal.pbio.0050181.
8. Akashi H, Gojobori T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. U.S.A. 2002;99:3695–3700. doi: 10.1073/pnas.062526999.
9. Scheibel T, et al. S. cerevisiae and sulfur: a unique way to deal with the environment. FASEB J. 1997;11:917–921. doi: 10.1096/fasebj.11.11.9285490.
10. Baudouin-Cornu P, et al. Molecular evolution of protein atomic composition. Science. 2001;293:297–300. doi: 10.1126/science.1061052.
11. Bragg JG, Wagner A. Protein material costs: single atoms can make an evolutionary difference. Trends Genet. 2009;25:5–8. doi: 10.1016/j.tig.2008.10.007.
12. Carlson RP. Metabolic systems cost-benefit analysis for interpreting network structure and regulation. Bioinformatics. 2007;23:1258–1264. doi: 10.1093/bioinformatics/btm082.
13. Elser JJ, et al. Signatures of ecological resource availability in the animal and plant proteomes. Mol. Biol. Evol. 2006;23. doi: 10.1093/molbev/msl068.
14. Li N, et al. Low contents of carbon and nitrogen in highly abundant proteins: evidence of selection for the economy of atomic composition. J. Mol. Evol. 2009;68:248–255. doi: 10.1007/s00239-009-9199-4.
15. Acquisti C, et al. Ecological nitrogen limitation shapes the DNA composition of plant genomes. Mol. Biol. Evol. 2009;26:953–956. doi: 10.1093/molbev/msp038.
16. Acquisti C, et al. From elements to biological processes: signatures of nitrogen limitation in the elemental composition of the catabolic apparatus. Proc. R. Soc. B. 2009;276:2605–2610. doi: 10.1098/rspb.2008.1960.
17. Bragg JG, Hyder CL. Nitrogen versus carbon use in prokaryotic genomes and proteomes. Proc. R. Soc. Lond. B. 2004;271:S374–S377. doi: 10.1098/rsbl.2004.0193.
18. Hessen DO, et al. Genome streamlining and the elemental costs of growth. Trends Ecol. Evol. 2010;25:75–80. doi: 10.1016/j.tree.2009.08.004.
19. Sterner RW, Elser JJ. Ecological Stoichiometry: The Biology of Elements from Molecules to the Biosphere. Princeton University Press; 2002.
20. DeLong EF. The microbial ocean from genomes to biomes. Nature. 2009;459:200–206. doi: 10.1038/nature08059.
21. Riesenfeld CS, et al. Metagenomics: Genomic analysis of microbial communities. Annu. Rev. Genet. 2004;38:525–552. doi: 10.1146/annurev.genet.38.072902.091216.
22. Jones N. Undersea project delivers data flood. Nature. 2010;464. doi: 10.1038/4641115a.
23. Van Mooy B, et al. Phytoplankton in the ocean use non-phosphorus lipids in response to phosphorus scarcity. Nature. 2009;458:69–72. doi: 10.1038/nature07659.
24. Lv J, et al. Association between the availability of environmental resources and the atomic composition of organismal proteomes: Evidence from Prochlorococcus strains living at different depths. Biochem. Biophys. Res. Commun. 2008;375:241–246. doi: 10.1016/j.bbrc.2008.08.011.
25. Burke J, et al. Crop evolution: from genetics to genomics. Curr. Opin. Genet. Dev. 2007;17:525–532. doi: 10.1016/j.gde.2007.09.003.
26. Stark A, et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature. 2007;450:219–232. doi: 10.1038/nature06340.
27. Kumar S, Dudley J. Bioinformatics software for biologists in the genomics era. Bioinformatics. 2007;23:1713–1717. doi: 10.1093/bioinformatics/btm239.
