An improved estimation of tRNA expression to better elucidate the coevolution between tRNA abundance and codon usage in bacteria

Yulong Wei; Jordan R Silke; Xuhua Xia

doi:10.1038/s41598-019-39369-x

. 2019 Feb 28;9:3184. doi: 10.1038/s41598-019-39369-x

An improved estimation of tRNA expression to better elucidate the coevolution between tRNA abundance and codon usage in bacteria

Yulong Wei ^1,^#, Jordan R Silke ^1,^#, Xuhua Xia ^1,^2,^✉

PMCID: PMC6395768 PMID: 30816249

Abstract

The degree to which codon usage can be explained by tRNA abundance in bacterial species is often inadequate, partly because differential tRNA abundance is often approximated by tRNA copy numbers. To better understand the coevolution between tRNA abundance and codon usage, we provide a better estimate of tRNA abundance by profiling tRNA mapped reads (tRNA tpm) using publicly available RNA Sequencing data. To emphasize the feasibility of our approach, we demonstrate that tRNA tpm is consistent with tRNA abundances derived from RNA fingerprinting experiments in Escherichia coli, Bacillus subtilis, and Salmonella enterica. Furthermore, we do not observe an appreciable reduction in tRNA sequencing efficiency due to post-transcriptional methylations in the seven bacteria studied. To determine optimal codons, we calculate codon usage in highly and lowly expressed genes determined by protein per transcript. We found that tRNA tpm is sensitive to identify more translationally optimal codons than gene copy number and early tRNA fingerprinting abundances. Additionally, tRNA tpm improves the predictive power of tRNA adaptation index over codon preference. Our results suggest that dependence of codon usage on tRNA availability is not always associated with species growth-rate. Conversely, tRNA availability is better optimized to codon usage in fast-growing than slow-growing species.

Introduction

Codon optimization is critical to researchers seeking to improve protein production. Early experimental studies have shown that replacing rare codons with optimal ones increases protein yields in Escherichia coli^1,2. The optimal codon within a given family is the most frequently used, especially in highly expressed genes (HEGs)^3–5. In order to explain codon preference, early studies in E. coli^6–8 have shown that codon usage coevolves with tRNA abundance. The availability of tRNAs influences the usage of corresponding codons; conversely, high usage of preferred codons drives up the availability of their decoding tRNAs^6,9.

Additionally, tRNA-mediated codon usage bias has been broadly observed in a variety of organisms, including the gram-negative bacterium Salmonella enterica serovar typhimurium¹⁰, the gram-positive Bacillus subtilis¹¹, eukaryotes such as yeast^10,12, a variety of fungal and invertebrate mitochondrial genomes^13,14, and viruses including HIV¹⁵, and bacteriophages^16,17. Nonetheless, the nature of the relationship between tRNA abundance and codon usage across bacterial species has been the subject of debate among researchers, with some suggesting that tRNA availability is the main driving force of codon usage bias^18,19 and others contending that the two are weakly correlated^20,21.

A number of codon usage indices use tRNA gene copy as proxy of tRNA abundance to identify translationally optimal codons. These include the Codon Bias Index²², Frequency of Optimal Codons (F_op)¹⁰, and tRNA Adaptation Index (tAI)¹⁸. All three of these indices define a translationally optimal codon as one that corresponds to the most abundant isoacceptor tRNA, with Codon Bias Index additionally incorporating gene expression information. Nevertheless, the use of tRNA gene copy is often undesirable. This is exemplified in B. subtilis¹⁸ in which tAI (based on tRNA gene copy) fails to accurately predict codon usage, although codon usage correlates strongly with early experimental tRNA abundance¹¹ and conforms well to the selection-mutation-drift theory^21,23. Moreover, while tRNA copy number is correlated with codon usage in a number of fast-growing bacterial species with a high variation in tRNA gene copy number, slow-growing ones exhibit little tRNA gene redundancy with many tRNA genes existing as a single copy. Resultantly, codon usage in such slow-growing species is poorly predicted by tRNA gene copy number²⁴. For example, Leptospira interrogans and Mycobacterium tuberculosis have only 25 and 45 annotated tRNA genes according to the Genomic tRNA Database (GtRNAdb)²⁵, respectively. Obviously, variation in codon usage cannot be well explained by tRNA gene redundancy if there is little variation in tRNA gene copy number. It stands to reason that the coevolutionary relationship between tRNA abundance and codon usage can be better characterized if tRNA abundance can be measured accurately.

To provide a better estimation for tRNA abundance in bacteria, we employed 14 publicly available RNA Sequencing (RNA-Seq) datasets, two for each of the seven species studied (E. coli, S. enterica, B. subtilis, Bacteroides thetaiotaomicron, L. interrogans, M. tuberculosis, and Synechocystis species). We quantified reads mapped to tRNA genes retrieved from GtRNAdb in transcripts per kilobase million (tpm) using kallisto²⁶. To improve mapping efficacy, we have recently developed a new tool for processing RNA-Seq data, ARSDA²⁷, that stores identical reads as single entries to drastically reduce data storage and computation time for analyzing large RNA-Seq datasets relative to previous methods^28,29. These species were selected because their protein abundance data are available in PaxDb³⁰, their growth rates are described on the basis of generation time (bacteria with >2.5 hour generation times are considered slow growing and all those with lower generation times are fast growing)²⁴, and their RNA-Seq data are available (GEO Datasets).

A known issue with tRNA sequencing via standard Illumina protocols in eukaryotes^31,32 is post-transcriptional methylation occurring at a number of specific tRNA sites. Two recent approaches (DM-tRNA-seq³² and ARM-seq³¹) of tRNA sequencing employ the E. coli derived AlkB demethylase enzyme to efficiently remove N¹-methyladenosine (m¹A), N³-methylcytosine (m³C) and N¹-methylguanosine (m¹G) structural modifications that hinder the activity of cDNA reverse transcriptase. Specifically, ARM-Seq³¹ demonstrates that wild-type AlkB alone is sufficient to remove all three of the aforementioned modifications and generate full length tRNA cDNA. These studies are consistent with a prior investigation that similarly concluded that AlkB could capably demethylate m¹G³³. It is important to note that both studies focus on eukaryotes, and AlkB treatments may not be necessary to remove tRNA methylations in bacteria that naturally encode their own AlkB homologs. Besides E. coli^31–34, several lines of evidence suggest AlkB homologous proteins are present in other bacterial species^35–37. Specifically, AlkB homologs are observed in B. subtilis³⁸, S. enterica, M. tuberculosis^36,39, Synechocystis sp.^36,40, and species in Leptospira^36,39 and Bacteroides⁴¹ genera. Nonetheless, bacterial tRNA sequencing efficiency is a point of investigation in this study.

To our knowledge, our results are the first to show that tRNA quantification by RNA-Seq data is well correlated with early tRNA abundance derived from RNA fingerprinting (hereafter referred as RNA fingerprinting abundance) reported previously in E. coli, S. enterica and B. subtilis^10–12,42. Briefly, determining RNA fingerprinting abundances involve separating radiolabelled RNA by 2D gel electrophoresis followed by quantification of radioactivity. Despite the challenges associated with tRNA sequencing in yeast^31,43 and mammals^31,32,44, our results suggest that tRNA methylation may not appreciably influence tRNA sequencing efficiency in the bacterial species studied herein. We devised an integrated approach to show that tRNA tpm better predicts translationally optimal codons in E. coli than F_op, and improves the predictive power of tAI over codon preference. We found that the dependence of codon preference on tRNA availability is not always stronger in fast-growing species, and optimal codons can be well explained by tRNA content in certain slow-growing species. Conversely, tRNA availability is better optimized to codon usage in highly expressed genes of fast-growing than slow-growing species.

Results

RNA-Seq mappings are consistent with tRNA abundance estimates

To accurately profile tRNA transcripts in tpm, we processed RNA-Seq data to remove adapters and low-quality sequences (See Materials and Methods for more detail) and quantified reads mapped to all unique tRNA sequences (Supplementary File S1) in tRNA tpm. To demonstrate the fidelity of tRNA tpm in bacteria, we compared these values with RNA fingerprinting abundances (Fig. 1, Supplementary File S2) previously reported in E. coli^10,42, S. enterica¹⁰, and B. subtilis¹¹. In all cases, tRNA tpm correlates with RNA fingerprinting abundance (Fig. 1: R² > 0.4, P < 0.05).

Comparison between tRNA tpm and fingerprinting abundance in E. *coli*, S. *enterica*, and B. *subtilis*. In panels (a) the averaged tRNA abundances across five growth phases retrieved from Dong, *et al*.⁴², (b,c) are tRNA abundances retrieved from Ikemura¹⁰, and in (c) the tRNA abundances retrieved from Kanaya, *et al*.¹¹.

Documented tRNA methylation does not appreciably affect tRNA sequencing in bacteria studied

To investigate the potential effects of site-specific tRNA methylation on standard RNA-Seq experiments in bacteria, we visualized the RNA-Seq read depths for all seven species studied (Fig. 2, Supplementary File S1). In E. coli, read depths before and after documented tRNA methylation sites (18, 32, 34, 37, 46, and 54) in GenBank annotation (NC_000913, Supplementary File S1) do not vary substantially, and we observe no partial tRNA mappings (Fig. 2a–d) contrary to the “hard-stops” previously described in both yeast and human tRNAs in the absence of demethylation treatment³¹. Additionally, tpm values associated with the set of tRNAs that can be potentially modified at five or all six documented methylation sites do not differ substantially from the set of tRNAs that can be potentially modified at four or less sites (Fig. 2d,e; two-tailed Student’s t-test with unequal variance: P = 0.477). We define tRNAs that can potentially be methylated at >4 sites as heavily methylated with respect to other tRNAs. This cut-off was chosen because it divides the 50 unique tRNA sequences into two subsets of roughly equal size. Similarly, hard-stops were not observed at documented methylated sites in all six other bacteria studied (Supplementary File S1).

RNA-Seq read map for all 50 unique tRNA sequences in E. *coli*, split in three sets (a–c). Each line represents the read depth of entire sequence region of one unique tRNA sequence, with sites susceptible to methylation (m²G18, m²C32 or m²U32, m⁵U34 or m²C34 or cmo⁵U34, m⁶A37, m⁷G46 and m⁵U54) highlighted red. In (d) the distribution of mapped reads across the entire length of each unique tRNA sequence. Red indicates tRNAs that are potentially methylated at >4 sites (heavily methylated) and all others are highlighted blue. In (e) the distribution of total tRNA tpm in sets of heavily methylated and other tRNAs.

An improved estimation of codon preference using tRNA tpm

To investigate how well tRNA tpm explains codon preference across bacterial lineages, we first designed an integrative approach to determine translationally optimal codons. We use two criteria to define what constitutes a translationally optimal codon: 1) it has the highest relative synonymous codon usage (RSCU)⁴⁵ in HEGs, and 2) its RSCU in HEGs is greater than that in lowly expressed genes (LEGs). These two criteria combine the specifications of optimal codons defined by two previous studies: the first criteria is used by Novoa, et al.¹⁹ and the second is adapted from Rocha²⁴ (HEGs vs. others). These criteria are also consistent with those used in the calculation of the Codon Adaptation Index⁴ and Index of Translation Elongation⁵. By our definition, there can be only one codon that is most translationally optimal within each synonymous group, consistent with the original definition of optimal codon determined by F_op¹⁰. Thus, the characterized translationally optimal codons will increase elongation efficiency relative to others. The availability of protein abundance data in parts per million (ppm) for species studied herein and mRNA transcript abundance (in tpm) determined using kallisto²⁶ enables us to calculate protein per transcript (ppm/tpm) in the identification of HEGs and LEGs (see Materials and Methods for more detail).

To establish readable isoacceptor tRNA content, we consider all cognate and near-cognate interactions, as well as those enabled by anticodon modifications. Most studies^10,11,18,42 consider anticodon-codon cognate (e.g., tRNA^Arg_UGC reading GCA) and near-cognate (e.g., tRNA^Arg_UGC reading GCG) pairings; while some^19,24 also consider pairings allowed due to anticodon modifications (e.g., tRNA^Arg_UmGC reading GCC and GCU) which increases the predictive influence of tRNA content on codon usage¹⁹. In order to explain RSCU, we adapt the idea of Relative tRNA Gene frequency from Novoa, et al.¹⁹ to derive Relative tRNA Usage (RTU) (see Materials and Methods for more detail). By considering tRNA abundance for synonymous codons, RTU improves tRNA tpm as an estimator of tRNA abundance (Fig. 3). In particular, the correlation between tRNA tpm and average tRNA abundance⁴² in E. coli (Fig. 1a; R² = 0.498, P < 0.0001) improves if we consider their RTU values (Fig. 3a: R² = 0.646, P < 0.0001). Furthermore, both RTU values correlate with codon usage (Fig. 3b).

Relationship between (a) tRNA tpm and averaged tRNA abundance from RNA fingerprinting⁴² (tRNA abundance) in E. *coli*, and (b) RTUs and RSCU in E. *coli* HEGs.

Using RTU, we estimate how well translationally optimal codons match codons with the highest tRNA availability by adapting the four rules of codon-anticodon constraint¹⁰. Rule one states that codon usage is constrained by tRNA availability⁴⁶. Rules two to four focus on specific base pairing efficiencies and they describe that cognate codon-anticodon pairs are generally more efficient and preferable relative to near-cognate wobble pairs^22,47–50. Hence, we first rank synonymous codons by highest RTU, and then rank by cognate tRNA abundances (Supplementary File S2). Supplementary Fig. S1 provides a flowchart explaining the approach to predict translationally optimal codons by tRNA availability. For example, applying this two-step identification approach for Threonine codons in E. coli, we first select ACC and ACU since they both have the highest RTU and are both readable by the same tRNAs (tRNA^Thr_GGU and tRNA^Thr_UGU) due to anticodon modification by ADATs¹⁹. Next, we rank codon preference by cognate tRNA abundance: tRNA tpm of the cognate tRNA^Thr_GGU for ACC is 46537.7, and ACU is not decoded by a cognate tRNA. Thus, the predicted optimal codon based on tRNA availability and pairing constraints is ACC in Threonine. Indeed, ACC is the translationally optimal codon for Threonine based on our definition: it has the highest RSCU in HEGs (2.111), and (2) its RSCU HEG is higher than RSCU LEG (1.468). We extended this two-step approach to predict translationally optimal codons in all seven species (Table 1).

Table 1.

Translationally optimal codons estimated for synonymous groups in seven bacterial species.

Amino acid¹	Synonymous codons	E. coli	S. enterica	B. subtilis	B. thetaiotaomicron	Synechocystis sp.	M. tuberculosis	L. interrogans
Ala	GCA, GCC, GCG, GCU	—³	—	GCA^a	GCU	GCC	GCC	GCA
Cys	UGC, UGU	UGC ^b	UGC	UGC	UGU	—	UGC	—
Asp	GAC, GAU	—	—	—	GAU	—	GAC	—
Glu	GAA,GAG	GAA^ab	GAA	GAA ^a	GAA	GAA	GAG	GAA
Phe	UUC,UUU	UUC ^ab	UUC	—	UUC	—	UUC	—
Gly	GGN	GGC ^ab	GGC	GGC	GGU	GGU	GGC ^*	—
His	CAC, CAU	CAC ^b	—	—	—	—	CAC	—
Ile	AUA, AUC, AUU	AUC ^ab	AUC	—	AUC	—	AUC	—
Lys	AAA, AAG	AAA ^ab	AAA	AAA ^a	AAA	—	AAG	—
Leu 2-fold²	UUA, UUG	UUG ^b	UUG	UUA^a	—	UUG	UUG	—
Leu 4-fold	CUA, CUC, CUG, CUU	CUG ^a	CUG	CUU	—	CUG^*	CUG	—
Asn	AAC, AAU	AAC ^ab	AAC	—	—	—	AAC	—
Pro	CCA, CCC, CCG, CCU	CCG ^*ab	CCG	—	—	CCC^*	CCG	CCU
Gln	CAA, CAG	CAG ^ab	CAG	CAA ^a	—	—	CAG	—
Arg 2-fold	AGA, AGG	AGA^a	AGA	AGA	AGA	AGG	AGG	AGA
Arg 4-fold	CGA, CGC, CGG, CGU	CGU ^a	CGU	CGC	CGU	CGG ^*	CGC	CGU
Serine 2-fold	AGC, AGU	AGC ^b	AGC	AGC	—	—	AGC	—
Serine 4-fold	UCA, UCC, UCG, UCU	UCU	UCC	UCA	UCU	UCC	—	UCU
Thr	ACA, ACC, ACG, ACU	ACC ^ab	ACC	ACA^a	ACU	ACC	ACC	ACU
Val	GUA, GUC, GUG, GUU	—		GUU	GUA	—	—
Tyr	UAC, UAU	—		—	—	—	UAC

Open in a new tab

Bold are translationally optimal codons that also have the highest tRNA availability estimated by tpm from two independent RNA-Seq datasets.

¹Two amino acids (Met and Trp) are omitted because they are each encoded by a single codon.

²The 6-fold degenerate codon families (Leu, Arg, and Ser) are broken into 2 and 4-fold families because of differences in the first codon base.

³An optimal codon cannot be determined by our definition (e.g., RSCU HEG < RSCU LEG violates the second criterion).

^*Predicted preferred codons match optimal codons in only one of the two RNA-Seq data analyzed.

^aTranslationally optimal codons match optimal codons determined by F_op in Ikemura¹⁰ and Kanaya, et al.¹¹.

^bTranslationally optimal codons identified using average tRNA abundance from RNA fingerprinting approach, retrieved from Dong, et al.⁴².

Many studies have demonstrated that bacterial tRNA abundance fluctuates due to growth phase and culture conditions such as temperature and media^42,51–53. Hence, for each species, we have acquired tRNA tpm values from two independent, but experimentally consistent (i.e., all strains are wildtype, and all cultures are taken during log-phase growth) RNA-Seq datasets (Table 2) that were prepared for sequencing on the same platform (Illumina) to verify consistency in predicting translationally optimal codons using tRNA tpm (Table 1).

Table 2.

The seven bacterial species studied herein due to their availability of protein abundance, growth rate and RNA-Seq data.

Species	Strain	Growth Rate^a	NCBI Accession	Experiment ID^*
Bacteroides thetaiotaomicron	VPI-5482	Slow	NC_004663	SRX020805, SRX860738
Bacillus subtilis	168	Fast	NC_000964	SRX515181, SRX2804667
Escherichia coli	K-12	Fast	NC_000913	SRX515174, SRX669653
Leptospira interrogans	Fiocruz L1-130	Slow	AE016823	SRX2448246, SRX405952
Mycobacterium tuberculosis	H37Rv	Slow	NC_000962	SRX1372108, SRX4374910
Salmonella enterica	LT2	Fast	NC_003197	SRX1638989, SRX1258668
Synechocystis sp.	PCC 6803	Slow	NC_017277	SRX347145, SRX4145044

Open in a new tab

^aInformation on species growth-rate are retrieved from Rocha²⁴.

^*All selected datasets have matching species strains between protein abundance, GtRNAdb, NCBI and RNA-Seq data, with the exception of L. interrogans (SRX2448246) due to the lack of a second RNA-Seq data in GEO Datasets for strain Fiocruz L1-130. All cultures in RNA-Seq experiments were isolated during log phase of growth. The first listed SRX dataset was selected for all analyses, and both were used for Table 1 and Fig. S4.

Implementing tRNA tpm in tAI calculation improves the non-parametric S correlation

We calculated tAI using gene copy number and tRNA tpm (See Materials and Methods for more detail, Supplementary Fig. S2), and Table 3 shows the non-parametric regression S which reflects the correlation between tAI values and effective number of codons⁵⁴ (corrected for silent substitutions¹⁸). For six out of seven species studied, S’ calculated with tRNA tpm is higher than S calculated using tRNA gene copy number (Table 3). Hence, tAI’ values calculated using tRNA tpm better correlate with codon usage than tAI values obtained from tRNA gene copy number (two-tailed Student’s t-test with unequal variance: P < 0.0001; Supplementary Fig. S2) in all species except L. interrogans. Note that S values calculated herein use non-hypothetical and non-pseudo genes, whereas those originally calculated (S₀)¹⁸ use all coding DNA sequences, although value ranks stay consistent (Table 3).

Table 3.

Non-parametric regression S correlations between tAI values and effective number of codons.

Species	S ₀	S	S'
E. coli	0.70^a	0.61^b	0.71^c
S. enterica	0.63	0.59	0.69
B. thetaiotaomicron	0.55	0.4	0.62
Synechocystis sp.	0.38	0.27	0.5
B. subtilis	−0.01	0.18	0.33
M. tuberculosis	−0.04	0.1	0.13
L. interrogans	N/A	0.23	0.2

Open in a new tab

^aS values retrieved from dos Reis, et al.¹⁸, calculated using all coding DNA sequences.

^bS values calculated using tRNA gene copy number, using genes having non-zero protein abundances.

^cS values calculated using tRNA tpm, using genes having non-zero protein abundances.

Abundance of tRNA depends on codon usage in fast-growing species

Codon usage preferences can also drive up tRNA content^6,24. For each tRNA species (distinguished by anticodon), we define its readable codon usage as the sum usage of its cognate and near-cognate codons (e.g., the readable codon usage of tRNA^Ala_GGC = GCC usage + GCU usage). Readable codon usage in HEGs better explains tRNA tpm in fast-growing species than in slow-growing species. More specifically, readable codon usage better correlates with tRNA tpm in E. coli, S. enterica, and B. subtilis, than in B. thetaiotaomicron, Synechocystis sp., M. tuberculosis, and L. interrogans (Supplementary Fig. S2).

Discussion

Two approaches have been taken to characterize the effect of tRNA on codon usage. The first tests whether gain or loss of tRNA genes will lead to predicted changes in codon usage^55–57. In tunicates and bivalves, an additional tRNA^Met/UAU gene is present in the mitochondrial genome. One would expect that the additional tRNA^Met/UAU would favor increased usage of AUA codons, and this expectation is empirically substantiated^55,56. One may also reason that, if a bacteriophage encodes many tRNA genes in its own genome, especially when these tRNAs are rare in the host, then the phage codon usage will be less dependent on the host tRNA pool. This expectation is also consistent with empirical evidence^16,58. The second approach quantifies within-species association of codon usage with tRNA abundance which is often approximated by tRNA gene copy number, but this proxy has two key shortcomings. First, we do not know if it is generally true that tRNA gene copy number serves as a good proxy for tRNA abundance. Second, some bacterial species, such as M. tuberculosis, have only a single tRNA gene for all anticodons. In such cases, we cannot use tRNA gene copy number as a proxy of abundance because of its lack of variability. Thus, accurate quantification of tRNA abundance is crucial for detecting codon adaptation.

Our RNA-Seq-based analysis shows that snapshots of bacterial tRNA pools can be captured using RNA-Seq profiling in spite of claims that standard Illumina RNA Sequencing protocols inefficiently quantify tRNAs in eukaryotes^32,43,44 due to a number of modifications that increase the stability of tRNA secondary structure. While some tRNA modifications are shared among the three kingdoms of life, others are kingdom specific⁵⁹. Nonetheless, tRNA post-transcriptional methylation is extensive in Eubacteria⁵⁹, and we acknowledge that tRNA tpm values may underestimate tRNA abundances since tRNA is highly structured and difficult to denature. Despite this, we do not observe any drastic read count drops or hard-stops in mapped reads at or flanking documented methylated sites in the seven species studied here, nor do we see partially mapped tRNA regions (Fig. 2a–c; Supplementary File S1) that were commonly observed in untreated tRNA sequencing data in Eukaryotes³¹. A caveat is the presence of a hard-stop in most tRNAs at site 50 for B. subtilis (SRX2804667), Synechocystis sp. (SRX4145044) and L. interrogans (SRX2448246) (Supplementary File S1). However, there is no documented methyltransferase activity acting at this site for these species, and the drop in mapped reads is only observed in one of the two SRX datasets retrieved for each species. Nonetheless, we acknowledge that tRNA sequencing may be potentially enhanced with demethylase treatments^31,32 that should be employed in future tRNA-Seq studies in bacteria.

Because eukaryotic tRNA read mapping abundances are considerably higher in demethylated samples than untreated samples³¹, we expected that our read mapping abundances may be similarly impacted by tRNA methylation. In light of this, we considered all annotated methylation sites in bacterial tRNAs, even though they may not be methylated at all times due to structural constraints. Surprisingly, our observations reveal that read mapping abundances of E. coli tRNAs that are potentially heavily methylated do not differ from those that are susceptible to methylation at fewer than five methylation sites (Fig. 2e). This suggests that the requirement for demethylase treatment prior to tRNA sequencing in bacterial species with functional AlkB demethylase homologs may be more relaxed, especially since demethylation treatments in current eukaryotic tRNA sequencing approaches are bacterial AlkB-facilitated^31,32.

In all bacteria, we combined our RNA quantification approach with available protein abundances to determine the most translationally optimal codon in 21 codon groups (Table 1) based on codon usage differences between HEG and LEG subsets. These subsets of genes are established based on protein per transcript (Supplementary Table S1, File S3), with the aim of providing a more accurate estimate of translation efficiency from protein abundance by decoupling rates of transcription. However, species-specific translationally optimal codons cannot always be established because the two described criteria are violated in some groups (e.g., the codon with the highest RSCU is more over-represented in LEGs than HEGs). In these cases, there is no evidence that the most abundantly used synonymous codon would contribute to increase translation efficiency. In particular, L. interrogans represents a slow-growing species wherein codon optimization is very poor (only seven translationally optimal codons can be characterized in 21 codon groups).

Our tRNA quantification approach better predicts translationally optimal codons over F_op¹⁰. In E. coli, synonymous codons with the highest tRNA tpm (ranked by highest TPU followed by highest cognate tRNA abundance) match 15 out of 17 translationally optimal codons, but 13 out of 17 when we replace tRNA tpm with averaged RNA fingerprinting abundance retrieved from Dong, et al.⁴² (Table 1). In contrast, F_op determines 12 such translationally optimal codons¹⁰ of which AGA (Arg) is the only codon that our method does not predict to be optimal (Table 1). Additionally, all translationally optimal codons determined herein are consistent with optimal codons determined by F_op, except in the Serine 4-fold family where F_op predicts UCC whereas we predict UCU to be optimal. In the case of B. subtilis, both tRNA tpm and F_op determine six translationally optimal codons (Table 1). It is worth mentioning that F_op determines 16 optimal codons in B. subtilis¹¹, but most may not be translationally optimal. For example, CCA (Pro) was determined to be an optimal codon, but CCG (Pro) is substantially more preferred than CCA (Pro) (RSCU in HEGs are 1.735 and 0.782, respectively).

Nonetheless, bacterial tRNA abundance may not fully explain the variation in usage of all 61 sense codons (Supplementary Fig. S4). First, codon preference cannot always be inferred reliably from tRNA gene redundancy or experimentally measured tRNA abundance. For example, inosine is expected to pair best with C and U, but less with A (presumably because of the bulky I/A pairing involving two purines)⁶⁰. Second, what matters in translation elongation is the availability of charged tRNAs. It is difficult to determine the level of charged tRNAs, and researchers typically would use transcriptionally determined tRNAs or even the number of tRNA genes in the genome as a proxy of charged tRNAs. Unfortunately, the abundance of tRNAs does not always reflect the abundance of charged tRNA⁶¹. Lastly, other factors such as mutation bias^21,62–65 may exert more pressure on codon usage in certain species.

Conversely, the variation in tRNA tpm is better explained by codon usage (Supplementary Fig. S3) in fast-growing (E. coli, B. subtilis and S. enterica) than in slow-growing species (B. thetaiotaomicron, L. interrogans, M. tuberculosis and Synechocystis sp.). This result supports the theory that tRNA translation machinery is better optimized to codon usage in fast-growing than slow-growing species^9,24. Indeed, duplicating tRNA genes is an effective way to elevate transcript abundance in species that grow and replicate rapidly^10–12,42, but not in slow-growing species (Supplementary Fig. S5).

One potentially important implementation of tRNA tpm is in the calculation of tAI. Our results (Table 3, Supplementary Fig. S2) show that tAI’ (calculated using tRNA tpm) better explains effective number of codons than tAI (calculated using tRNA gene copy number) for all species studied except L. interrogans. Considering S and S₀ calculated using tRNA gene copy numbers, their differences are likely due to our usage of the subset of non-hypothetical and non-pseudo genes that have protein abundance values³⁰ (S) whereas all DNA coding sequences (including hypothetical and pseudo genes) were used in the original calculation¹⁸ (S₀). Additionally, both the GtRNAdb^25,66 and DNA coding sequences (GenBank annotations) have been continuously curated since 2004. Lastly, only wobble base pairings were considered in the original introduction of tAI^18,67; whereas we have also considered possible anticodon modifications^19,24 that further relax codon pairing. These differences improve the calculation of the S correlation using tRNA copy numbers, notably in B. subtilis and M. tuberculosis. In contrast, the originally calculated negative S₀ correlation for B. subtilis was a major shortcoming of the tAI method¹⁸ and was criticized²¹ for suggesting a lack of selective pressure exerted by tRNA abundance on codon preference in this species.

In the case of M. tuberculosis, our tRNA quantification approach is much more sensitive to determining tRNA-mediated codon bias than tAI. The S correlations are consistently the lowest for M. tuberculosis (Table 3), yet we identified 17 out of 19 translationally optimal codons using tRNA tpm (Table 1). Our method recaptures the “weak but significant codon usage preference” previous reported^21,68 in this slow-growing species, and show that the degree to which tRNA availability explains optimal codon usage is species-specific and does not always depend on growth-rate.

We studied the coevolution between codon usage and tRNA abundance in three fast-growing species (E. coli, S. enterica, and B. subtilis) and four slow-growing species (B. thetaiotaomicron, L. interrogans, M. tuberculosis, and Synechocystis sp.). Our findings indicate that tRNA quantification by tpm offers better predictions of translationally optimal codons over F_op in E. coli, and improves the calculation of tAI to better reflect codon preference in all species studied except L. interrogans. The usage of translationally optimal codons can be well explained by relative tRNA tpm in E. coli and S. enterica; however, both tRNA tpm and RNA fingerprinting abundances¹¹ offer weaker explanations for codon preference in B. subtilis. The influence of tRNA availability on codon bias is not always stronger in fast-growing species, and optimal codons can be well explained by tRNA content in certain slow-growing species such as M. tuberculosis. Conversely, the tRNA translation machinery is better optimized to codon usage in HEGs of fast-growing than slow-growing species.

Materials and Methods

Processing genomic, proteomic and RNA-seq data

We retrieved the annotated genomes (Table 2) for three fast growing species (E. coli, B. subtilis, and S. enterica) and four slow-growing species (B. thetaiotaomicron, L. interrogans, M. tuberculosis, and Synechocystis sp.) in GenBank format from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov). For each species, all documented tRNA methyltransferase genes were retrieved from GenBank annotations (Supplementary File S1).

Protein abundance data corresponding with each of these species were retrieved from PaxDb 4.0³⁰ https://pax-db.org/ and abundance values were associated with GeneIDs retrieved from the genomes using DAMBE 7⁶⁹. The integrated protein abundance dataset was taken when available.

RNA-Seq runs of wildtype species were fetched from GEO DataSets (https://www.ncbi.nlm.nih.gov/gds/) in FASTQ format. FASTQ files were converted to FASTQ+ format using ARSDA 1.1²⁷ in order to reduce file sizes by grouping identical reads under a single ID while retaining the copy number for each read (in the format S<read#>_<copy#>). The FASTQ+ data was then processed using both CutAdapt 1.17⁷⁰ and Trimmomatic 0.38⁷¹ to remove flanking adapter sequences and purge low quality reads. For experiments that use the oligo(dT)-adapter primer for cDNA synthesis, RNA fragments are first poly-adenylated at the 3′ end. In these cases, we set CutAdapt to recognize “AAAAA”. We also used CutAdapt to recognize and remove all possible adapters in experiments that used custom adapters, with 10% mismatch error rate. When the adapters conformed to standard Illumina protocols, we simply used the “ILLUMINACLIP” function built into Trimmomatic to trim the relevant library of adapters from reads (ILLUMINACLIP: <adapters.fa> :2:20:6). After adapters were trimmed, we retained reads that were a minimum of 25 nt long (-m 25 in CutAdapt or MINLEN:25 in Trimmomatic) to mitigate bias in expression levels⁷². We filtered trimmed reads to remove poor quality sequences with average Phred scores lower than 30 (0.1% probability of a base calling error)⁷³.

RNA-Seq read mapping for tRNAs and mRNAs

We retrieved the sequences of all genomically encoded tRNAs for each organism from the Genomic tRNA Database (GtRNAdb 2.0⁶⁶) in FASTA format and removed predicted pseudo-tRNAs and those with unspecified anticodons. The FASTA files containing tRNA sequences were read into DAMBE to represent identical sequences with one ID indicating the number of identical copies. Since mature tRNAs are modified to have 5′-CCA-3′ appended to their 3′ end, we manually added CCA to sequences lacking this motif³². The modified tRNA FASTA files for all species were indexed and the associated RNA-Seq reads from processed FASTQ+ files were pseudo-aligned to each tRNA index and tRNA tpm was quantified using kallisto v0.44.0²⁶. The tRNA pseudo-alignments for each species were subsequently sorted and site-specific depth values were generated for each tRNA using the sorted pseudo-alignments via the ‘sort’ and ‘depth’ commands from SAMtools⁷⁴, respectively.

Similarly, all non-pseudo and non-hypothetical DNA coding sequences with non-zero protein abundance values were retrieved using DAMBE in FASTA format and indexed. The associated RNA-Seq reads from processed FASTQ+ files were pseudo-aligned to each mRNA index and mRNA tpm values quantified using kallisto.

Determination of translationally highly and lowly expressed genes by protein per transcript

Protein per transcript (ppm/tpm) was estimated by taking gene protein abundance (ppm) divided by its mRNA tpm, for both RNA-Seq datasets in each species except B. subtilis. For B. subtilis, protein per transcript was obtained with only SRX515181 dataset, because SRX2804667 MiSeq experimental protocol effectively removes large transcripts to study tRNAs⁷⁵. From each dataset, genes with top and bottom 30% ppm/tpm values were selected (Supplementary File S3). A gene is considered to be highly expressed if the gene ID is found in both gene sets for each species; the same was done to identify lowly expressed genes from the bottom 30% ppm/tpm gene sets (Supplementary File S3, Table S1). To verify the validity of this approach, we determined the number of ribosomal protein (30S and 50S subunit) genes that are present in each gene sets. We observed a great number of ribosomal protein genes in the top 30% ppm/tpm gene sets and nearly none in the bottom 30% ppm/tpm gene sets (Supplementary Table S1). This is expected because ribosomal protein genes are commonly accepted and used as highly expressed genes^4,24.

Computation of relative synonymous codon usage and tRNA usage metrics

Relative synonymous codon usage (RSCU)⁴⁵ values were computed for each species by loading HEGs and LEGs (Supplementary File S3) into DAMBE and selecting “Seq. Analysis” > “Codon Usage” > “Relative synonymous codon usage”. DAMBE’s implementation of the RSCU computation automatically splits 6-fold degenerate codon families into a 2-fold and 4-fold degenerate family based on difference at the first codon position.

To acquire relative tRNA usage for each synonymous codon (RTU), we adapt the RSCU formula (1) in the same way that Relative tRNA Gene frequency was employed by Novoa, et al.¹⁹ using tRNA gene copy number:

R T U_{i} = \frac{t p m o f t R N A_{i}}{\frac{\sum_{i}^{n} t R N A_{i}}{n}}

where i is any codon within a 2 or 4-fold degenerate codon family, and n is the total number of codons in the synonymous group. RSCU and RTU are both calculated by breaking 6-fold codon families (Arginine, Leucine, and Serine) into 2 and 4-fold groups (e.g., 4-fold CGN (arg) and 2-fold AGN (arg)). Methionine and Tryptophan have been omitted from RSCU and RTU calculation because they are encoded by a single codon (RSCU and RTU = 1). Similarly, codon groups with RTU = 1 have been omitted from plots when applicable (Supplementary Fig. S4), because RTU will not estimate RSCU for these codons. All correlation coefficients (R²) are calculated by taking the square of Pearson’s correlation r (Figs 1 and 3, Supplementary Figs S3–5).

Computation of tAI and correlation S

We first calculated tAI using the original formulation of the model^18,67 which considers the copy number of all isoacceptor tRNAs for each codon via the author’s tAI R package version 0.2 (https://github.com/mariodosreis/tai) for all species in this study. We additionally computed a modified version of tAI (tAI’) which uses the summed tpm values associated with codon-specific isoacceptor tRNAs in lieu of tRNA copy number. Rather than using all annotated DNA coding sequences in these calculations, as was done originally¹⁸, we only considered genes with non-zero protein abundance values from the integrated datasets stored in PaxDb. The tAI and tAI′ values for each species were plotted (Supplementary Fig. S2) against the effective number of codons corrected for silent substitutions at the third codon position (f[GC3s] - Nc) to determine the S and S’ correlation coefficients, respectively.

Supplementary information

Supplementary File S1^{(999KB, xlsx)}

Supplementary File S2^{(301KB, xlsx)}

Supplementary File S3^{(2.2MB, xlsx)}

Supplementary File S4^{(775.8KB, pdf)}

Acknowledgements

This work was supported by the Discovery Grant of Natural Science and Engineering Research Council of Canada to X.X. (NSERC, RGPIN/2018-03878), and the Ontario Graduate Scholarship 2018-2019 to Y.W.

Author Contributions

Y.W., J.R.S. and X.X. designed the study and wrote the main manuscript text. Y.W. and J.R.S. collected and analyzed the data. X.X. developed the computer programs and supervised the study. All authors reviewed the manuscript. X.X. supervised the project.

Data Availability

Supplementary File S1 contains RNA-Seq read depths and tRNA methylation profile. Supplementary File S2 contains identifications of translationally optimal codons and tRNA abundances (gene copy, tpm and fingerprinting data) and tRNA quantification approaches. Supplementary File S3 contains protein per transcript data, protein abundance information, identified translationally HEGs and LEGs. Supplementary File S4 contains Supplementary Figs S1–S5 and Table S1.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yulong Wei and Jordan R. Silke contributed equally.

Electronic supplementary material

Supplementary information accompanies this paper at 10.1038/s41598-019-39369-x.

References

1.Robinson M, et al. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic acids research. 1984;12:6663–6671. doi: 10.1093/nar/12.17.6663. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Sorensen MA, Kurland CG, Pedersen S. Codon usage determines translation rate in Escherichia coli. J Mol Biol. 1989;207:365–377. doi: 10.1016/0022-2836(89)90260-X. [DOI] [PubMed] [Google Scholar]
3.McPherson DT. Codon preference reflects mistranslational constraints: a proposal. Nucleic Acids Res. 1988;16:4111–4120. doi: 10.1093/nar/16.9.4111. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Sharp PM, Li WH. The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic acids research. 1987;15:1281–1295. doi: 10.1093/nar/15.3.1281. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Xia X. A Major Controversy in Codon-Anticodon Adaptation Resolved by a New Codon Usage Index. Genetics. 2015;199:573–579. doi: 10.1534/genetics.114.172106. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Bulmer M. Coevolution of codon usage and transfer RNA abundance. Nature. 1987;325:728–730. doi: 10.1038/325728a0. [DOI] [PubMed] [Google Scholar]
7.Gouy M, Gautier C. Codon usage in bacteria: correlation with gene expressivity. Nucleic acids research. 1982;10:7055–7074. doi: 10.1093/nar/10.22.7055. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol. 1981;151:389–409. doi: 10.1016/0022-2836(81)90003-6. [DOI] [PubMed] [Google Scholar]
9.Higgs PG, Ran W. Coevolution of codon usage and tRNA genes leads to alternative stable states of biased codon usage. Molecular biology and evolution. 2008;25:2279–2291. doi: 10.1093/molbev/msn173. [DOI] [PubMed] [Google Scholar]
10.Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Molecular biology and evolution. 1985;2:13–34. doi: 10.1093/oxfordjournals.molbev.a040335. [DOI] [PubMed] [Google Scholar]
11.Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene. 1999;238:143–155. doi: 10.1016/S0378-1119(99)00225-5. [DOI] [PubMed] [Google Scholar]
12.Xia X. How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae? Genetics. 1998;149:37–44. doi: 10.1093/genetics/149.1.37. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Carullo M, Xia X. An Extensive Study of Mutation and Selection on the Wobble Nucleotide in tRNA Anticodons in Fungal Mitochondrial Genomes. Journal of Molecular Evolution. 2008;66:484. doi: 10.1007/s00239-008-9102-8. [DOI] [PubMed] [Google Scholar]
14.Xia X. The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses. BMC Evolutionary Biology. 2008;8:211. doi: 10.1186/1471-2148-8-211. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.van Weringh A, et al. HIV-1 modulates the tRNA pool to improve translation efficiency. Molecular biology and evolution. 2011;28:1827–1834. doi: 10.1093/molbev/msr005. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Chithambaram S, Prabhakaran R, Xia X. Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Molecular biology and evolution. 2014;31:1606–1617. doi: 10.1093/molbev/msu087. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Prabhakaran R, Chithambaram S, Xia X. Escherichia coli and Staphylococcus phages: effect of translation initiation efficiency on differential codon adaptation mediated by virulent and temperate lifestyles. J Gen Virol. 2015;96:1169–1179. doi: 10.1099/vir.0.000050. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004;32:5036–5044. doi: 10.1093/nar/gkh834. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Novoa EM, Pavon-Eternod M, Pan T. & Ribas de Pouplana, L. A role for tRNA modifications in genome structure and codon usage. Cell. 2012;149:202–213. doi: 10.1016/j.cell.2012.01.050. [DOI] [PubMed] [Google Scholar]
20.Rojas J, et al. Codon usage revisited: Lack of correlation between codon usage and the number of tRNA genes in enterobacteria. Biochemical and Biophysical Research Communications. 2018;502:450–455. doi: 10.1016/j.bbrc.2018.05.168. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 2005;33:1141–1153. doi: 10.1093/nar/gki242. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Bennetzen JL, Hall BD. Codon selection in yeast. J. Biol. Chem. 1982;257:3026–3031. [PubMed] [Google Scholar]
23.Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991;129:897–907. doi: 10.1093/genetics/129.3.897. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Rocha EP. Codon usage bias from tRNA’s point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004;14:2279–2286. doi: 10.1101/gr.2896904. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37:4. doi: 10.1093/nar/gkn787. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525–527. doi: 10.1038/nbt.3519. [DOI] [PubMed] [Google Scholar]
27.Xia X. ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data. G3: Genes|Genomes|Genetics. 2017;7:3839–3848. doi: 10.1534/g3.117.300271. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Kodama Y, Shumway M, Leinonen R. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 2012;40:18. doi: 10.1093/nar/gkr1103. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:9. doi: 10.1093/nar/gkq1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Wang M, Herrmann CJ, Simonovic M, Szklarczyk D, von Mering C. Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics. 2015;15:3163–3168. doi: 10.1002/pmic.201400441. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Cozen AE, et al. ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nat Methods. 2015;12:879–884. doi: 10.1038/nmeth.3508. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Zheng G, et al. Efficient and quantitative high-throughput tRNA sequencing. Nat Methods. 2015;12:835–837. doi: 10.1038/nmeth.3478. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Falnes PØ, Bjørås M, Aas PA, Sundheim O, Seeberg E. Substrate specificities of bacterial and human AlkB proteins. Nucleic acids research. 2004;32:3456–3461. doi: 10.1093/nar/gkh655. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Falnes PO, Johansen RF, Seeberg E. AlkB-mediated oxidative demethylation reverses DNA damage in Escherichia coli. Nature. 2002;419:178–182. doi: 10.1038/nature01048. [DOI] [PubMed] [Google Scholar]
35.Schulz S, Perez-de-Mora A, Engel M, Munch JC, Schloter M. A comparative study of most probable number (MPN)-PCR vs. real-time-PCR for the measurement of abundance and assessment of diversity of alkB homologous genes in soil. J Microbiol Methods. 2010;80:295–298. doi: 10.1016/j.mimet.2010.01.005. [DOI] [PubMed] [Google Scholar]
36.van den Born E, et al. Bioinformatics and functional analysis define four distinct groups of AlkB DNA-dioxygenases in bacteria. Nucleic Acids Res. 2009;37:7124–7136. doi: 10.1093/nar/gkp774. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Wang L, Wang W, Lai Q, Shao Z. Gene diversity of CYP153A and AlkB alkane hydroxylases in oil-degrading bacteria isolated from the Atlantic Ocean. Environmental microbiology. 2010;12:1230–1242. doi: 10.1111/j.1462-2920.2010.02165.x. [DOI] [PubMed] [Google Scholar]
38.Gao P, et al. An Exogenous Surfactant-Producing Bacillus subtilis Facilitates Indigenous Microbial Enhanced Oil Recovery. Frontiers in microbiology. 2016;7:186–186. doi: 10.3389/fmicb.2016.00186. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Nie, Y. et al. Diverse alkane hydroxylase genes in microorganisms and environments. Scientific reports4 (2014). [DOI] [PMC free article] [PubMed]
40.Cassier-Chauvat C, Veaudor T, Chauvat F. Comparative Genomics of DNA Recombination and Repair in Cyanobacteria: Biotechnological Implications. Frontiers in microbiology. 2016;7:1809–1809. doi: 10.3389/fmicb.2016.01809. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.van den Born E, et al. Viral AlkB proteins repair RNA damage by oxidative demethylation. Nucleic acids research. 2008;36:5451–5461. doi: 10.1093/nar/gkn519. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Dong H, Nilsson L, Kurland CG. Co-variation of tRNA Abundance and Codon Usage inEscherichia coliat Different Growth Rates. Journal of Molecular Biology. 1996;260:649–663. doi: 10.1006/jmbi.1996.0428. [DOI] [PubMed] [Google Scholar]
43.Pang YL, Abo R, Levine SS, Dedon PC. Diverse cell stresses induce unique patterns of tRNA up- and down-regulation: tRNA-seq for quantifying changes in tRNA copy number. Nucleic Acids Res. 2014;42:27. doi: 10.1093/nar/gku945. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Loher, P., Telonis, A. G. & Rigoutsos, I. Accurate Profiling and Quantification of tRNA Fragments from RNA-Seq Data: A Vade Mecum for MINTmap. Methods Mol Biol, 7339-7332_7316 (2018). [DOI] [PubMed]
45.Sharp PM, Li WH. An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986;24:28–38. doi: 10.1007/BF02099948. [DOI] [PubMed] [Google Scholar]
46.Ikemura T, Ozeki H. Codon usage and transfer RNA contents: organism-specific codon-choice patterns in reference to the isoacceptor contents. Cold Spring Harb Symp Quant Biol. 1983;2:1087–1097. doi: 10.1101/SQB.1983.047.01.123. [DOI] [PubMed] [Google Scholar]
47.Grosjean H, Fiers W. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene. 1982;18:199–209. doi: 10.1016/0378-1119(82)90157-3. [DOI] [PubMed] [Google Scholar]
48.Ikemura T. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol. 1982;158:573–597. doi: 10.1016/0022-2836(82)90250-9. [DOI] [PubMed] [Google Scholar]
49.Nishimura, S. Modified nucleosides and isoaccepting tRNA. (MIT Press, 1978).
50.Weissenbach J, Dirheimer G. Pairing properties of the methylester of 5-carboxymethyl uridine in the wobble position of yeast tRNAArg3. Biochimica et Biophysica Acta (BBA) - Nucleic Acids and Protein Synthesis. 1978;518:530–534. doi: 10.1016/0005-2787(78)90171-5. [DOI] [PubMed] [Google Scholar]
51.Avcilar-Kucukgoze I, et al. Discharging tRNAs: a tug of war between translation and detoxification in Escherichia coli. Nucleic Acids Res. 2016;44:8324–8334. doi: 10.1093/nar/gkw697. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Chen D, Texada DE. Low-usage codons and rare codons of Escherichia coli. Gene Therapy and Molecular Biology. 2006;10:1. [Google Scholar]
53.Dittmar KA, Mobley EM, Radek AJ, Pan T. Exploring the regulation of tRNA distribution on the genomic scale. J Mol Biol. 2004;337:31–47. doi: 10.1016/j.jmb.2004.01.024. [DOI] [PubMed] [Google Scholar]
54.Wright F. The ‘effective number of codons’ used in a gene. Gene. 1990;87:23–29. doi: 10.1016/0378-1119(90)90491-9. [DOI] [PubMed] [Google Scholar]
55.Xia X, Huang H, Carullo M, Betran E, Moriyama EN. Conflict between Translation Initiation and Elongation in Vertebrate Mitochondrial Genomes. PLoS ONE. 2007;2:e227. doi: 10.1371/journal.pone.0000227. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Xia, X. In Evolution in the fast lane: Rapidly evolving genes and genetic systems (eds Rama S. Singh, Jianping Xu, & Rob J. Kulathinal) 73–82 (Oxford University Press, 2012).
57.Xia, X. In Bioinformatics and the Cell 197–238. (Springer, Cham, 2018).
58.Prabhakaran R, Chithambaram S, Xia X. Aeromonas phages encode tRNAs for their overused codons. Int J Comput Biol Drug Des. 2014;7:168–182. doi: 10.1504/IJCBDD.2014.061645. [DOI] [PubMed] [Google Scholar]
59.Hori H. Methylated nucleosides in tRNA and tRNA methyltransferases. Frontiers in genetics. 2014;5:144–144. doi: 10.3389/fgene.2014.00144. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Xia, X. “Bioinformatics and Translation Elongation” in Bioinformatics and the Cell. 197–238 (Springer, Cham., 2018).
61.Elf J, Nilsson D, Tenson T, Ehrenberg M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science. 2003;300:1718–1722. doi: 10.1126/science.1083811. [DOI] [PubMed] [Google Scholar]
62.Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002;12:640–649. doi: 10.1016/S0959-437X(02)00353-2. [DOI] [PubMed] [Google Scholar]
63.Muto A, Osawa S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA. 1987;84:166–169. doi: 10.1073/pnas.84.1.166. [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Osawa S, et al. Directional mutation pressure and transfer RNA in choice of the third nucleotide of synonymous two-codon sets. Proc Natl Acad Sci USA. 1988;85:1124–1128. doi: 10.1073/pnas.85.4.1124. [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Yang Z, Nielsen R. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Molecular biology and evolution. 2008;25:568–579. doi: 10.1093/molbev/msm284. [DOI] [PubMed] [Google Scholar]
66.Chan PP, Lowe TM. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic acids research. 2016;44:D184–D189. doi: 10.1093/nar/gkv1309. [DOI] [PMC free article] [PubMed] [Google Scholar]
67.dos Reis M, Wernisch L, Savva R. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res. 2003;31:6976–6985. doi: 10.1093/nar/gkg897. [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Andersson GE, Sharp PM. Codon usage in the Mycobacterium tuberculosis complex. Microbiology. 1996;142:915–925. doi: 10.1099/00221287-142-4-915. [DOI] [PubMed] [Google Scholar]
69.Xia X. DAMBE7: New and Improved Tools for Data Analysis in Molecular Biology and Evolution. Molecular biology and evolution. 2018;35:1550–1552. doi: 10.1093/molbev/msy073. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
71.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Williams CR, Baccarella A, Parrish JZ, Kim CC. Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinformatics. 2016;17:103. doi: 10.1186/s12859-016-0956-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185. doi: 10.1101/gr.8.3.175. [DOI] [PubMed] [Google Scholar]
74.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Ernst FGM, et al. Cold adaptation of tRNA nucleotidyltransferases: A tradeoff in activity, stability and fidelity. RNA Biol. 2018;15:144–155. doi: 10.1080/15476286.2017.1391445. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File S1^{(999KB, xlsx)}

Supplementary File S2^{(301KB, xlsx)}

Supplementary File S3^{(2.2MB, xlsx)}

Supplementary File S4^{(775.8KB, pdf)}

Data Availability Statement

[CR1] 1.Robinson M, et al. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic acids research. 1984;12:6663–6671. doi: 10.1093/nar/12.17.6663. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Sorensen MA, Kurland CG, Pedersen S. Codon usage determines translation rate in Escherichia coli. J Mol Biol. 1989;207:365–377. doi: 10.1016/0022-2836(89)90260-X. [DOI] [PubMed] [Google Scholar]

[CR3] 3.McPherson DT. Codon preference reflects mistranslational constraints: a proposal. Nucleic Acids Res. 1988;16:4111–4120. doi: 10.1093/nar/16.9.4111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Sharp PM, Li WH. The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic acids research. 1987;15:1281–1295. doi: 10.1093/nar/15.3.1281. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Xia X. A Major Controversy in Codon-Anticodon Adaptation Resolved by a New Codon Usage Index. Genetics. 2015;199:573–579. doi: 10.1534/genetics.114.172106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Bulmer M. Coevolution of codon usage and transfer RNA abundance. Nature. 1987;325:728–730. doi: 10.1038/325728a0. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Gouy M, Gautier C. Codon usage in bacteria: correlation with gene expressivity. Nucleic acids research. 1982;10:7055–7074. doi: 10.1093/nar/10.22.7055. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol. 1981;151:389–409. doi: 10.1016/0022-2836(81)90003-6. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Higgs PG, Ran W. Coevolution of codon usage and tRNA genes leads to alternative stable states of biased codon usage. Molecular biology and evolution. 2008;25:2279–2291. doi: 10.1093/molbev/msn173. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Molecular biology and evolution. 1985;2:13–34. doi: 10.1093/oxfordjournals.molbev.a040335. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene. 1999;238:143–155. doi: 10.1016/S0378-1119(99)00225-5. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Xia X. How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae? Genetics. 1998;149:37–44. doi: 10.1093/genetics/149.1.37. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Carullo M, Xia X. An Extensive Study of Mutation and Selection on the Wobble Nucleotide in tRNA Anticodons in Fungal Mitochondrial Genomes. Journal of Molecular Evolution. 2008;66:484. doi: 10.1007/s00239-008-9102-8. [DOI] [PubMed] [Google Scholar]

[CR14] 14.Xia X. The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses. BMC Evolutionary Biology. 2008;8:211. doi: 10.1186/1471-2148-8-211. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.van Weringh A, et al. HIV-1 modulates the tRNA pool to improve translation efficiency. Molecular biology and evolution. 2011;28:1827–1834. doi: 10.1093/molbev/msr005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Chithambaram S, Prabhakaran R, Xia X. Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Molecular biology and evolution. 2014;31:1606–1617. doi: 10.1093/molbev/msu087. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Prabhakaran R, Chithambaram S, Xia X. Escherichia coli and Staphylococcus phages: effect of translation initiation efficiency on differential codon adaptation mediated by virulent and temperate lifestyles. J Gen Virol. 2015;96:1169–1179. doi: 10.1099/vir.0.000050. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004;32:5036–5044. doi: 10.1093/nar/gkh834. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Novoa EM, Pavon-Eternod M, Pan T. & Ribas de Pouplana, L. A role for tRNA modifications in genome structure and codon usage. Cell. 2012;149:202–213. doi: 10.1016/j.cell.2012.01.050. [DOI] [PubMed] [Google Scholar]

[CR20] 20.Rojas J, et al. Codon usage revisited: Lack of correlation between codon usage and the number of tRNA genes in enterobacteria. Biochemical and Biophysical Research Communications. 2018;502:450–455. doi: 10.1016/j.bbrc.2018.05.168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 2005;33:1141–1153. doi: 10.1093/nar/gki242. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Bennetzen JL, Hall BD. Codon selection in yeast. J. Biol. Chem. 1982;257:3026–3031. [PubMed] [Google Scholar]

[CR23] 23.Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991;129:897–907. doi: 10.1093/genetics/129.3.897. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Rocha EP. Codon usage bias from tRNA’s point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004;14:2279–2286. doi: 10.1101/gr.2896904. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37:4. doi: 10.1093/nar/gkn787. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525–527. doi: 10.1038/nbt.3519. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Xia X. ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data. G3: Genes|Genomes|Genetics. 2017;7:3839–3848. doi: 10.1534/g3.117.300271. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Kodama Y, Shumway M, Leinonen R. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 2012;40:18. doi: 10.1093/nar/gkr1103. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] 29.Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:9. doi: 10.1093/nar/gkq1015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Wang M, Herrmann CJ, Simonovic M, Szklarczyk D, von Mering C. Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics. 2015;15:3163–3168. doi: 10.1002/pmic.201400441. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Cozen AE, et al. ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nat Methods. 2015;12:879–884. doi: 10.1038/nmeth.3508. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Zheng G, et al. Efficient and quantitative high-throughput tRNA sequencing. Nat Methods. 2015;12:835–837. doi: 10.1038/nmeth.3478. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Falnes PØ, Bjørås M, Aas PA, Sundheim O, Seeberg E. Substrate specificities of bacterial and human AlkB proteins. Nucleic acids research. 2004;32:3456–3461. doi: 10.1093/nar/gkh655. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Falnes PO, Johansen RF, Seeberg E. AlkB-mediated oxidative demethylation reverses DNA damage in Escherichia coli. Nature. 2002;419:178–182. doi: 10.1038/nature01048. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Schulz S, Perez-de-Mora A, Engel M, Munch JC, Schloter M. A comparative study of most probable number (MPN)-PCR vs. real-time-PCR for the measurement of abundance and assessment of diversity of alkB homologous genes in soil. J Microbiol Methods. 2010;80:295–298. doi: 10.1016/j.mimet.2010.01.005. [DOI] [PubMed] [Google Scholar]

[CR36] 36.van den Born E, et al. Bioinformatics and functional analysis define four distinct groups of AlkB DNA-dioxygenases in bacteria. Nucleic Acids Res. 2009;37:7124–7136. doi: 10.1093/nar/gkp774. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Wang L, Wang W, Lai Q, Shao Z. Gene diversity of CYP153A and AlkB alkane hydroxylases in oil-degrading bacteria isolated from the Atlantic Ocean. Environmental microbiology. 2010;12:1230–1242. doi: 10.1111/j.1462-2920.2010.02165.x. [DOI] [PubMed] [Google Scholar]

[CR38] 38.Gao P, et al. An Exogenous Surfactant-Producing Bacillus subtilis Facilitates Indigenous Microbial Enhanced Oil Recovery. Frontiers in microbiology. 2016;7:186–186. doi: 10.3389/fmicb.2016.00186. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Nie, Y. et al. Diverse alkane hydroxylase genes in microorganisms and environments. Scientific reports4 (2014). [DOI] [PMC free article] [PubMed]

[CR40] 40.Cassier-Chauvat C, Veaudor T, Chauvat F. Comparative Genomics of DNA Recombination and Repair in Cyanobacteria: Biotechnological Implications. Frontiers in microbiology. 2016;7:1809–1809. doi: 10.3389/fmicb.2016.01809. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.van den Born E, et al. Viral AlkB proteins repair RNA damage by oxidative demethylation. Nucleic acids research. 2008;36:5451–5461. doi: 10.1093/nar/gkn519. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Dong H, Nilsson L, Kurland CG. Co-variation of tRNA Abundance and Codon Usage inEscherichia coliat Different Growth Rates. Journal of Molecular Biology. 1996;260:649–663. doi: 10.1006/jmbi.1996.0428. [DOI] [PubMed] [Google Scholar]

[CR43] 43.Pang YL, Abo R, Levine SS, Dedon PC. Diverse cell stresses induce unique patterns of tRNA up- and down-regulation: tRNA-seq for quantifying changes in tRNA copy number. Nucleic Acids Res. 2014;42:27. doi: 10.1093/nar/gku945. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Loher, P., Telonis, A. G. & Rigoutsos, I. Accurate Profiling and Quantification of tRNA Fragments from RNA-Seq Data: A Vade Mecum for MINTmap. Methods Mol Biol, 7339-7332_7316 (2018). [DOI] [PubMed]

[CR45] 45.Sharp PM, Li WH. An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986;24:28–38. doi: 10.1007/BF02099948. [DOI] [PubMed] [Google Scholar]

[CR46] 46.Ikemura T, Ozeki H. Codon usage and transfer RNA contents: organism-specific codon-choice patterns in reference to the isoacceptor contents. Cold Spring Harb Symp Quant Biol. 1983;2:1087–1097. doi: 10.1101/SQB.1983.047.01.123. [DOI] [PubMed] [Google Scholar]

[CR47] 47.Grosjean H, Fiers W. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene. 1982;18:199–209. doi: 10.1016/0378-1119(82)90157-3. [DOI] [PubMed] [Google Scholar]

[CR48] 48.Ikemura T. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol. 1982;158:573–597. doi: 10.1016/0022-2836(82)90250-9. [DOI] [PubMed] [Google Scholar]

[CR49] 49.Nishimura, S. Modified nucleosides and isoaccepting tRNA. (MIT Press, 1978).

[CR50] 50.Weissenbach J, Dirheimer G. Pairing properties of the methylester of 5-carboxymethyl uridine in the wobble position of yeast tRNAArg3. Biochimica et Biophysica Acta (BBA) - Nucleic Acids and Protein Synthesis. 1978;518:530–534. doi: 10.1016/0005-2787(78)90171-5. [DOI] [PubMed] [Google Scholar]

[CR51] 51.Avcilar-Kucukgoze I, et al. Discharging tRNAs: a tug of war between translation and detoxification in Escherichia coli. Nucleic Acids Res. 2016;44:8324–8334. doi: 10.1093/nar/gkw697. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR52] 52.Chen D, Texada DE. Low-usage codons and rare codons of Escherichia coli. Gene Therapy and Molecular Biology. 2006;10:1. [Google Scholar]

[CR53] 53.Dittmar KA, Mobley EM, Radek AJ, Pan T. Exploring the regulation of tRNA distribution on the genomic scale. J Mol Biol. 2004;337:31–47. doi: 10.1016/j.jmb.2004.01.024. [DOI] [PubMed] [Google Scholar]

[CR54] 54.Wright F. The ‘effective number of codons’ used in a gene. Gene. 1990;87:23–29. doi: 10.1016/0378-1119(90)90491-9. [DOI] [PubMed] [Google Scholar]

[CR55] 55.Xia X, Huang H, Carullo M, Betran E, Moriyama EN. Conflict between Translation Initiation and Elongation in Vertebrate Mitochondrial Genomes. PLoS ONE. 2007;2:e227. doi: 10.1371/journal.pone.0000227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Xia, X. In Evolution in the fast lane: Rapidly evolving genes and genetic systems (eds Rama S. Singh, Jianping Xu, & Rob J. Kulathinal) 73–82 (Oxford University Press, 2012).

[CR57] 57.Xia, X. In Bioinformatics and the Cell 197–238. (Springer, Cham, 2018).

[CR58] 58.Prabhakaran R, Chithambaram S, Xia X. Aeromonas phages encode tRNAs for their overused codons. Int J Comput Biol Drug Des. 2014;7:168–182. doi: 10.1504/IJCBDD.2014.061645. [DOI] [PubMed] [Google Scholar]

[CR59] 59.Hori H. Methylated nucleosides in tRNA and tRNA methyltransferases. Frontiers in genetics. 2014;5:144–144. doi: 10.3389/fgene.2014.00144. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR60] 60.Xia, X. “Bioinformatics and Translation Elongation” in Bioinformatics and the Cell. 197–238 (Springer, Cham., 2018).

[CR61] 61.Elf J, Nilsson D, Tenson T, Ehrenberg M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science. 2003;300:1718–1722. doi: 10.1126/science.1083811. [DOI] [PubMed] [Google Scholar]

[CR62] 62.Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002;12:640–649. doi: 10.1016/S0959-437X(02)00353-2. [DOI] [PubMed] [Google Scholar]

[CR63] 63.Muto A, Osawa S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA. 1987;84:166–169. doi: 10.1073/pnas.84.1.166. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Osawa S, et al. Directional mutation pressure and transfer RNA in choice of the third nucleotide of synonymous two-codon sets. Proc Natl Acad Sci USA. 1988;85:1124–1128. doi: 10.1073/pnas.85.4.1124. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Yang Z, Nielsen R. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Molecular biology and evolution. 2008;25:568–579. doi: 10.1093/molbev/msm284. [DOI] [PubMed] [Google Scholar]

[CR66] 66.Chan PP, Lowe TM. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic acids research. 2016;44:D184–D189. doi: 10.1093/nar/gkv1309. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR67] 67.dos Reis M, Wernisch L, Savva R. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res. 2003;31:6976–6985. doi: 10.1093/nar/gkg897. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR68] 68.Andersson GE, Sharp PM. Codon usage in the Mycobacterium tuberculosis complex. Microbiology. 1996;142:915–925. doi: 10.1099/00221287-142-4-915. [DOI] [PubMed] [Google Scholar]

[CR69] 69.Xia X. DAMBE7: New and Improved Tools for Data Analysis in Molecular Biology and Evolution. Molecular biology and evolution. 2018;35:1550–1552. doi: 10.1093/molbev/msy073. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR70] 70.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]

[CR71] 71.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR72] 72.Williams CR, Baccarella A, Parrish JZ, Kim CC. Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinformatics. 2016;17:103. doi: 10.1186/s12859-016-0956-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185. doi: 10.1101/gr.8.3.175. [DOI] [PubMed] [Google Scholar]

[CR74] 74.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR75] 75.Ernst FGM, et al. Cold adaptation of tRNA nucleotidyltransferases: A tradeoff in activity, stability and fidelity. RNA Biol. 2018;15:144–155. doi: 10.1080/15476286.2017.1391445. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

An improved estimation of tRNA expression to better elucidate the coevolution between tRNA abundance and codon usage in bacteria

Yulong Wei

Jordan R Silke

Xuhua Xia

Abstract

Introduction

Results

RNA-Seq mappings are consistent with tRNA abundance estimates

Figure 1.

Documented tRNA methylation does not appreciably affect tRNA sequencing in bacteria studied

Figure 2.

An improved estimation of codon preference using tRNA tpm

Figure 3.

Table 1.

Table 2.

Implementing tRNA tpm in tAI calculation improves the non-parametric S correlation

Table 3.

Abundance of tRNA depends on codon usage in fast-growing species

Discussion

Materials and Methods

Processing genomic, proteomic and RNA-seq data

RNA-Seq read mapping for tRNAs and mRNAs

Determination of translationally highly and lowly expressed genes by protein per transcript

Computation of relative synonymous codon usage and tRNA usage metrics

Computation of tAI and correlation S

Supplementary information

Acknowledgements

Author Contributions

Data Availability

Competing Interests

Footnotes

Electronic supplementary material

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases