ABSTRACT
The presence of tRNA genes in bacteriophages has been explained on the basis of codon usage (tRNA genes are retained in the phage genome if they correspond to codons more common in the phage than in its host) or amino acid usage (independent of codon, the amino acid corresponding to the retained tRNA gene is more common in the phage genome than in the bacterial host). The existence of a large database of sequenced mycobacteriophages, isolated on the common host Mycobacterium smegmatis, allows us to test the above hypotheses as well as explore other hypotheses for the presence of tRNA genes. Our analyses suggest that amino acid rather than codon usage better explains the presence of tRNA genes in mycobacteriophages. However, closely related phages that differ in the presence of tRNA genes in their genomes are capable of lysing the common bacterial host and do not differ in codon or amino acid usage. This suggests that the benefits of having tRNA genes may be associated with either growth in the host or the ability to infect more hosts (i.e., host range) rather than simply infecting a particular host.
KEYWORDS: amino acid use, codon usage, mycobacteriophage, tRNA gene
Introduction
Bacteriophages have “compact” genomes, with little useless baggage. Most of their genome is coding (often over 90% – e.g., refs.1-4) and genes that are not expressed are not thought to be retained. As intracellular parasites, they rely on their hosts' cellular machinery for most housekeeping functions. However, analyses of viral genomes have shown that they sometimes contain genes involved in DNA replication, transcription, or translation.1-2,5 The rapidly growing database of sequenced bacteriophages has allowed us to document the widespread presence of such genes. For example, phage genomes may contain genes for DNA polymerase and/or helicase.2,6 The products of some phage genes modify the bacterial RNA polymerases to increase transcription of early phage genes.7-8 In addition, phages can encode their own tRNA genes, and the occasional presence of the tRNA genes has been called intriguing.9-10
The complement of genes found in a phage is understood to represent a dynamic between gene acquisition via recombination events involving horizontal (or lateral) gene transfer from the genomes of their hosts and/or of co-infecting phages and retention through positive selection.11-12 The prevalent hypothesis to explain the retention of tRNA genes proposes that these correspond to codons that are more common in the phage's genome than in the host's.9-10 This hypothesis makes at least 2 specific predictions: 1) phages that have codon usages similar to their hosts would not benefit from retaining any acquired tRNA genes and their genomes should not retain such genes (although they may contain genes or pseudogenes that have yet to be discarded); 2) phages with codon usages different from their hosts should retain tRNA genes that correspond to the codons more heavily used by the phage than its host. A corollary prediction is that lytic/virulent phages are more likely to retain tRNA genes they acquire since they do not integrate into the genome of their hosts and therefore do not experience the mutational adaptations of temperate phages (changes in GC content to match that of their host). This may explain why some virulent phages have more tRNA genes than temperate phages.10 A related hypothesis proposes that it is amino acid usage, not codon usage, that leads to the retention of particular tRNA genes. The tRNA genes that are selected for retention are those that correspond to amino acids used more by the phage than by its hosts.9 In addition, both hypotheses predict that phages with tRNA genes may grow better in their hosts than similar phages that lack such genes. In support of this prediction, T4 phages whose tRNA genes are deleted have smaller burst sizes and lower rates of protein synthesis than wild-type T4.13
The existence of a large database of mycobacteriophages that have been mostly isolated on a common host, Mycobacterium smegmatis (strain mc2 155; GC content 67.4%), provides an opportunity to examine the presence/absence of tRNA genes within a comparative context.6 These phages belong to 2 families, the Siphoviridae and the Myoviridae. Further, they can be grouped into different clusters based on genetic similarity, with different methodologies (e.g., average nucleotide identity, gene content analyses) providing concordant clustering.14 Cluster C belongs to the Myoviridae, while all other clusters are Siphoviridae. Most important for our purposes, these clusters vary in the character under consideration in this paper: in some clusters, tRNA genes are always present while in other clusters they are always absent. In the largest cluster – cluster A – phages both with and without tRNA genes are common, allowing for more fine-grained comparisons. In this paper, we: 1) describe the distribution and location of tRNA genes among mycobacteriophage clusters; 2) test various predictions about the presence and abundance of tRNA genes among mycobacteriophage clusters; and 3) compare amino acid and codon usage of cluster A phages with tRNA genes to that of their putative hosts and to that of members of their cluster without tRNA genes.
Results
The distribution of tRNA genes among mycobacteriophage clusters
Genes for tRNAs are rather frequent in the genomes of bacteriophages. Of 100 randomly selected sequenced bacteriophages in GenBank as of March 1, 2015, 38 had tRNA genes. Similarly, among our sample of mycobacteriophages, 41.4% contained at least one tRNA gene. The distribution of tRNA genes in mycobacteriophages is cluster specific (Table 1); except for clusters A, E and K, all members of a cluster either had no tRNA genes or possessed at least one tRNA gene. For cluster E, all but one phage possessed at least one tRNA gene. Phages that belong to clusters C, L, M, and V contain many tRNA genes, while phages that belong to cluster B have never been observed to contain tRNA genes despite the reasonably large number of members that have been sequenced. Cluster A is atypical of other mycobacteriophage clusters in having nearly equal numbers of members that contain no tRNA genes and members that contain some (1–5) tRNA genes (Fig. 1). For cluster A, we found that phages without tRNA genes (51,918.4 ± 995.1 bp) were significantly longer (by ∼544 bp) than phages with tRNA genes (51,374.5 ± 1,327.1 bp; t = 2.27, p = 0.0255, df = 94) across all subclusters. We found no significant difference in ORF (open reading frame, i.e., protein-coding gene) number between phages without tRNA genes (89.7 ± 5.8) and those with (90.5 ± 6.1; t = 0.60, p = 0.550, df = 94). Similarly, cluster K phages exhibited no significant difference in genome length between phages without tRNA genes (60,130.6 ± 1,679.3 bp) and those with tRNA genes (59,190.5 ± 4,266.1 bp; t = 0.66, p = 0.511, df = 14) and no significant difference in average ORF number between phages without tRNA genes (97.2 ± 5.6) and those with (95.2 ± 1.9; t = 1.17, p = 0.262, df = 14). Thus, having tRNA genes in these clusters imposes no detectable cost in terms of genome length or number of genes.
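The genome-length comparison for cluster A can be checked from the summary statistics alone. The sketch below computes a pooled (equal-variance) two-sample t statistic. The text reports only df = 94, so the group sizes of 48 and 48 are an assumption made for illustration, and the function name `pooled_t` is ours, not the authors'.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed),
    computed from means, standard deviations and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Cluster A genome lengths (bp) as reported in the text.
# ASSUMPTION: 48 phages per group; the text gives only df = 94 (96 phages).
t_stat, df = pooled_t(51918.4, 995.1, 48, 51374.5, 1327.1, 48)
print(f"difference = {51918.4 - 51374.5:.1f} bp, t = {t_stat:.2f}, df = {df}")
```

Under the equal-group-size assumption, the statistic comes out close to the reported t = 2.27. Unequal group sizes with the same total would shift it slightly.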
Table 1.
Genome characteristics of the sequenced mycobacteriophages. Information based on data found on phagesdb.org as of March 1, 2015. Only annotated phages were included. ‡ Three not yet annotated but sequenced F1 phages contain a tRNA-Trp.
Cluster | Presence of integrase | Number of Members | Number of Subclusters | Average Genome Size (bp) | Avg GC% | Avg # Genes | Avg # tRNAs | % w/ tRNAs |
---|---|---|---|---|---|---|---|---|
A | Yes | 309 | 13 | 51,550 | 63.3 | 90.2 | 1.0 | 50.5 |
B | No | 142 | 5 | 68,670 | 67.1 | 98.3 | 0 | 0 |
C | No | 61 | 2 | 155,716 | 64.7 | 230.1 | 33.1 | 100 |
D | No | 10 | 2 | 64,965 | 59.4 | 88.8 | 0 | 0 |
E | Yes | 53 | – | 75,527 | 63.0 | 143.4 | 1.9 | 95.7 |
F | Yes | 79 | 3 | 57,421 | 61.5 | 104.6 | 0‡ | 0 |
G | Yes | 20 | – | 42,315 | 66.9 | 62.1 | 0 | 0 |
H | No | 5 | 2 | 69,469 | 57.3 | 98.7 | 0 | 0 |
I | Yes | 5 | 2 | 51,129 | 66.3 | 80.2 | 0 | 0 |
J | Yes | 18 | – | 110,417 | 60.9 | 235.0 | 1.6 | 100 |
K | Yes | 47 | 5 | 60,099 | 66.9 | 95.4 | 0.8 | 75.0 |
L | Yes | 17 | 3 | 75,207 | 58.9 | 123.0 | 9.4 | 100 |
M | Yes | 4 | 2 | 81,509 | 61.4 | 141.3 | 18.7 | 100 |
N | Yes | 11 | – | 43,153 | 66.2 | 65.7 | 0 | 0 |
O | No | 6 | – | 70,485 | 65.4 | 120.0 | 0 | 0 |
P | Yes | 13 | – | 47,780 | 67.0 | 79.7 | 0 | 0 |
Q | Yes | 5 | – | 53,755 | 67.4 | 78.0 | 0 | 0 |
R | No | 5 | – | 71,424 | 56.0 | 98.0 | 0 | 0 |
S | No | 3 | – | 65,193 | 63.4 | 107.0 | 0 | 0 |
T | Yes | 3 | – | 42,833 | 66.2 | 61.0 | 0 | 0 |
U | No | 2 | – | 69,942 | 50.4 | 104.0 | 1.0 | 100 |
V | No | 2 | – | 78,263 | 57.0 | 148.0 | 24.0 | 100 |
Singletons | Yes | 7 | – | 63,330 | 63.5 | 87.0 | 0 | 0 |
Figure 1.
Distribution of tRNA genes among the A subcluster mycobacteriophages. For each subcluster, the average number of tRNA genes among members of that subcluster is plotted. Each bar is color coded to indicate the relative frequency of different tRNA genes in that subcluster.
Across clusters, an association between genome length and number of tRNA genes was observed only when Cluster C, the sole Myoviridae cluster, was included (r = 0.729, df = 21, p = 0.00012; Fig. 2). If Cluster C is removed, this association becomes non-significant (r = 0.398, df = 20, p = 0.074); moreover, the second largest phages (Cluster J) average only 1.6 tRNA genes (Table 1). Thus, genome size explains very little of the variation in the number of tRNA genes among mycobacteriophages. Given that all of these phages have been isolated on a common host, we asked whether clusters with a %GC content that differs most from that of their Mycobacterium smegmatis host were more likely to contain tRNA genes and found no correlation between these variables (r = 0.153, df = 21, p > 0.05; Fig. 3). Finally, we did not find that lytic clusters (as defined by the absence of an integrase gene in all phages of that cluster) were more likely to contain tRNA genes than temperate clusters (Table 1; Chi-square = 0.36, df = 1, p > 0.05).
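The lytic versus temperate comparison is a 2 × 2 contingency test and can be rebuilt from Table 1. The sketch below is one possible reconstruction, not the authors' script. It assumes that the Singletons row is excluded (22 named clusters), that a cluster counts as "with tRNA" when its average number of tRNA genes is above zero, and that no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Cluster -> (integrase present, average number of tRNA genes), from Table 1.
# ASSUMPTION: Singletons are excluded; cluster F is scored 0 (annotated phages only).
clusters = {
    "A": (True, 1.0), "B": (False, 0.0), "C": (False, 33.1), "D": (False, 0.0),
    "E": (True, 1.9), "F": (True, 0.0), "G": (True, 0.0), "H": (False, 0.0),
    "I": (True, 0.0), "J": (True, 1.6), "K": (True, 0.8), "L": (True, 9.4),
    "M": (True, 18.7), "N": (True, 0.0), "O": (False, 0.0), "P": (True, 0.0),
    "Q": (True, 0.0), "R": (False, 0.0), "S": (False, 0.0), "T": (True, 0.0),
    "U": (False, 1.0), "V": (False, 24.0),
}

temperate_with = sum(1 for integ, trna in clusters.values() if integ and trna > 0)
temperate_without = sum(1 for integ, trna in clusters.values() if integ and trna == 0)
lytic_with = sum(1 for integ, trna in clusters.values() if not integ and trna > 0)
lytic_without = sum(1 for integ, trna in clusters.values() if not integ and trna == 0)

chi2 = chi_square_2x2(temperate_with, temperate_without, lytic_with, lytic_without)
print(temperate_with, temperate_without, lytic_with, lytic_without, round(chi2, 2))
```

With these assumptions the statistic lands near the reported value of 0.36. Including the Singletons row as a temperate cluster without tRNA genes would give a smaller value.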
Figure 2.
Average number of tRNA genes as a function of genome size among mycobacteriophages. The average values for these variables were calculated for each cluster; N = 22. Including cluster C (red square), the linear relation is stronger (dashed line, R² = 0.531) than when cluster C is excluded (solid line, R² = 0.158). Based on data in Table 1.
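The correlation summarized in this figure can be recomputed from the cluster averages in Table 1. The sketch below uses a plain Pearson correlation. As an assumption, it treats the Singletons row as a 23rd data point, which matches the reported df = 21 for the full data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (cluster, average genome size in bp, average number of tRNA genes), from Table 1.
# ASSUMPTION: the Singletons row is included as one data point.
table1 = [
    ("A", 51550, 1.0), ("B", 68670, 0.0), ("C", 155716, 33.1), ("D", 64965, 0.0),
    ("E", 75527, 1.9), ("F", 57421, 0.0), ("G", 42315, 0.0), ("H", 69469, 0.0),
    ("I", 51129, 0.0), ("J", 110417, 1.6), ("K", 60099, 0.8), ("L", 75207, 9.4),
    ("M", 81509, 18.7), ("N", 43153, 0.0), ("O", 70485, 0.0), ("P", 47780, 0.0),
    ("Q", 53755, 0.0), ("R", 71424, 0.0), ("S", 65193, 0.0), ("T", 42833, 0.0),
    ("U", 69942, 1.0), ("V", 78263, 24.0), ("Singletons", 63330, 0.0),
]

def size_vs_trna(rows):
    return pearson_r([size for _, size, _ in rows], [trna for _, _, trna in rows])

r_all = size_vs_trna(table1)
r_without_c = size_vs_trna([row for row in table1 if row[0] != "C"])
print(f"all clusters: r = {r_all:.3f}, R^2 = {r_all ** 2:.3f}")
print(f"without C:    r = {r_without_c:.3f}, R^2 = {r_without_c ** 2:.3f}")
```

The drop in r once cluster C is removed shows how strongly a single large-genome cluster drives the overall association.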
Figure 3.
Average number of tRNA genes as a function of deviation in %GC content from their isolation host. For each cluster, average %GC content was obtained and the absolute deviation from the %GC of Mycobacterium smegmatis mc2 155 was calculated. Based on data in Table 1.
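The deviation in %GC described in this caption is the absolute difference between each cluster's average %GC and the 67.4% of M. smegmatis mc2 155. A minimal sketch of that calculation, using Table 1 values and, as an assumption, counting the Singletons row as one data point:

```python
import math

HOST_GC = 67.4  # %GC of Mycobacterium smegmatis mc2 155, as given in the text

# (cluster, average %GC, average number of tRNA genes), from Table 1.
# ASSUMPTION: the Singletons row is included as one data point.
gc_and_trna = [
    ("A", 63.3, 1.0), ("B", 67.1, 0.0), ("C", 64.7, 33.1), ("D", 59.4, 0.0),
    ("E", 63.0, 1.9), ("F", 61.5, 0.0), ("G", 66.9, 0.0), ("H", 57.3, 0.0),
    ("I", 66.3, 0.0), ("J", 60.9, 1.6), ("K", 66.9, 0.8), ("L", 58.9, 9.4),
    ("M", 61.4, 18.7), ("N", 66.2, 0.0), ("O", 65.4, 0.0), ("P", 67.0, 0.0),
    ("Q", 67.4, 0.0), ("R", 56.0, 0.0), ("S", 63.4, 0.0), ("T", 66.2, 0.0),
    ("U", 50.4, 1.0), ("V", 57.0, 24.0), ("Singletons", 63.5, 0.0),
]

deviations = [abs(gc - HOST_GC) for _, gc, _ in gc_and_trna]
trna_counts = [trna for _, _, trna in gc_and_trna]

n = len(deviations)
mean_dev, mean_trna = sum(deviations) / n, sum(trna_counts) / n
sxy = sum((d - mean_dev) * (t - mean_trna) for d, t in zip(deviations, trna_counts))
sxx = sum((d - mean_dev) ** 2 for d in deviations)
syy = sum((t - mean_trna) ** 2 for t in trna_counts)
r_gc = sxy / math.sqrt(sxx * syy)
print(f"r = {r_gc:.3f}, df = {n - 2}")
```

Cluster U has the largest deviation (17 percentage points) yet averages a single tRNA gene, which is one reason the correlation stays weak.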
The location of tRNA genes among mycobacteriophage clusters
The location of tRNA genes in the genomes of phages of a particular cluster is typically highly conserved. For example, in cluster A, the tRNA genes are almost always among the “early” genes – the first 10–11 ORFs of a given genome, often before the lysin A gene (Table 2). Among the A1 phages, there is one exception to this pattern, in phages Perseus and Kykar; their single tRNA gene as well as a preceding unique gene are found between 2 minor tail proteins, after the tape measure gene. Although their specific location is usually conserved within a cluster, tRNA genes are found in different locations in different mycobacteriophage clusters. For example, in clusters E, L, and M, tRNA genes are found in the last third of the genome, while in cluster K, like cluster A, they are usually found early in the genome, before the terminase (Table 2). The high conservation of location suggests that gains of tRNA genes are highly constrained in where they can be inserted or retained in phages of a given cluster; otherwise, why would they always be found in the same location?
Table 2.
Location of tRNA genes in mycobacteriophage clusters. Data based on annotated phages available in GenBank. N = number of genomes examined.
Cluster | Number of genes | Number of tRNA genes | Number of tRNA groups | Phage example | Location | N |
---|---|---|---|---|---|---|
A | 71–104 | 1–5 | 1 | D29 | 7–11 | 40 |
A | 92 | 1 | 1 | Perseus | 30 | 1 |
C | 250–273 | 29–35 | 3 | MoMoMixon | 84–88; 153–183; 221–222 | 6 |
E | 142–150 | 2 | 1 | Lilac | 108–109 | 9 |
J | 243 | 2 | 1 | Courthouse | 181–185 | 1 |
K | 95–99 | 1 | 1 | Adephagia | 5 | 5 |
K | 93–95 | 1 | 1 | Fionnbarth | 38 | 2 |
L | 145 | 12 | 1 | Crossroads | 108–126 | 1 |
L | 140 | 11 | 2 | Whirlwind | 50; 104–118 | 1 |
M | 153–175 | 16–21 | 1 | Bongo | 91–114 | 3 |
U | 110 | 1 | 1 | Patience | 87 | 1 |
V | 171–173 | 24 | 1 | Cosmo | 101–130 | 2 |
Amino acid and codon usage among the cluster A phages
Cluster A is the only cluster with a substantial number of phages both with and without tRNA genes, allowing comparison between related phages that differ in this one feature. Moreover, as the largest mycobacteriophage cluster, it provides us with the greatest power for statistical analyses. Eight tRNA genes coding for 7 amino acids were observed among cluster A phages (Fig. 1; Table 3). As can be expected for phages with high GC content isolated on a high GC content host, 7 of these tRNA genes were for codons that ended in G or C, the exception being the second Leu tRNA gene, which is found in only 2 phages.
Table 3.
Distribution of tRNA genes found in cluster A. Data based on all sequenced phages found on phagesdb.org as of March 1, 2015.
Amino acid | Codon | Total number of members | Subcluster (number of members with this tRNA gene) |
---|---|---|---|
Leu | UUA | 2 | A1 (2) |
Tyr | UAC | 3 | A2 (2), A10 (1) |
Lys | AAG | 6 | A2 (3), A8 (3) |
Glu | GAG | 8 | A2 (2), A9 (6) |
Leu | CUG | 32 | A3 (32) |
Gln | CAG | 41 | A2 (19), A6 (17), A10 (3), A12 (1), A13 (1) |
Asn | AAC | 68 | A2 (7), A3 (39), A6 (17), A7 (1), A10 (3), A13 (1) |
Trp | UGG | 123 | A1 (9), A2 (11), A3 (43), A5 (24), A6 (17), A7 (2), A8 (2), A10 (9), A11 (5), A13 (1) |
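Table 3 lists codons, whereas tRNA species elsewhere in the text are named by anticodon (for example, Trp-CCA). The two notations are related by reverse complementation. The short sketch below converts the Table 3 codons and counts those ending in G or C. It assumes strict Watson-Crick pairing and ignores wobble and base modifications.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Reverse complement of an mRNA codon, written 5'->3' (Watson-Crick pairing only)."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

# Amino acid and codon for each tRNA gene in Table 3.
table3 = [
    ("Leu", "UUA"), ("Tyr", "UAC"), ("Lys", "AAG"), ("Glu", "GAG"),
    ("Leu", "CUG"), ("Gln", "CAG"), ("Asn", "AAC"), ("Trp", "UGG"),
]

labels = [f"{aa}-{anticodon(codon)}" for aa, codon in table3]
gc_ending = [codon for _, codon in table3 if codon[-1] in "GC"]

print(labels)
print(f"{len(gc_ending)} of {len(table3)} codons end in G or C")
```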
The major functional explanation for the presence of tRNA genes in phages is that they correspond to tRNAs for codons that are more common in the phage genome than in the host genome. However, it is possible that a given tRNA gene is favored because the phage uses the amino acid it corresponds to – rather than the specific codon - more than its host. We tested these predictions by comparing the use of a particular amino acid or codon in phages with a given tRNA gene to that of their potential hosts. We also compared the use of a particular amino acid or codon in phages with and without a given tRNA gene.
Although almost all of these phages were isolated on a common host, this does not mean that this bacterium is their usual host. To gain a better sense of the possible bacterial sources of the tRNA genes of mycobacteriophages, and thus their potential hosts, we determined whether each unique phage tRNA sequence (55 unique sequences) showed significant nucleotide similarity to bacterial tRNA genes in the Genomic tRNA database.15 We describe 5 potential outcomes: 1) a phage tRNA gene only showed nucleotide similarity to tRNA genes of Mycobacterium sp. (not observed); 2) a phage tRNA gene had best nucleotide similarity to tRNA genes of Mycobacterium and to a lesser extent to other bacterial hosts (observed in 6 sequences: Tyr-GUA in A10; Lys-CUU in A8; Asn-GUU in one A2 and in some A3; Trp-CCA in some A3 and A10 phages); 3) a phage tRNA gene displayed greater nucleotide similarity to a non-mycobacterial tRNA gene and to a lesser extent to a Mycobacterium tRNA gene (32 of the 55 sequences); 4) a phage tRNA gene only had nucleotide similarity to non-mycobacterial tRNA genes (one Glu-CUC sequence in A2, all 3 Leu-CAG sequences in A3, and 12 of 13 Gln-CUG in various subclusters); and 5) a tRNA gene did not exhibit similarity to any bacterial tRNA gene in the database (one Gln-CUG in some A2). In a few cases, the phage tRNA showed greater similarity to a bacterial tRNA gene with a different anticodon.
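The five outcomes amount to a simple decision rule on two quantities: the best similarity score against Mycobacterium tRNA genes and the best score against all other bacteria. The sketch below is a hypothetical encoding of that rule. The function name, the use of `None` for "no significant match", the tie-breaking choice, and the example scores are all assumptions and are not taken from the study.

```python
from typing import Optional

def classify_similarity(myco_score: Optional[float], other_score: Optional[float]) -> int:
    """Map best-hit scores to outcomes 1-5 as listed in the text.

    None means no significant similarity was found in that group.
    ASSUMPTION: a tie between the two scores is assigned to outcome 2.
    """
    if myco_score is None and other_score is None:
        return 5  # no similarity to any bacterial tRNA gene
    if other_score is None:
        return 1  # similarity to Mycobacterium only
    if myco_score is None:
        return 4  # similarity to non-mycobacterial genes only
    return 2 if myco_score >= other_score else 3

# Hypothetical scores, for illustration only.
print(classify_similarity(120.0, 95.0))   # best match in Mycobacterium
print(classify_similarity(80.0, 110.0))   # best match elsewhere
print(classify_similarity(None, 110.0))   # non-mycobacterial matches only
```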
We used the above analysis to generate a list of potential bacterial sources for tRNA genes and to explore the possibility that these sources may also be potential hosts for these phages (see Fig. 4). If that is the case, the codon usage of these bacteria is relevant to the phage's biology. Overall, for 3 tRNA species, we documented significant similarity to tRNA genes of Mycobacterium smegmatis mc2 155 (Tyr-GUA, Lys-CUU, Trp-CCA). For 5 of the tRNA species, we found significant similarity to tRNA genes of Mycobacterium ulcerans (Leu-UAA, Tyr-GUA, Glu-CUC, Asn-GUU, Trp-CCA). For 2 tRNA species (Leu-CAG and Gln-CUG), we observed no sequence similarity to Mycobacterium species. For example, for Leu-CAG, the best match was to the Leu-CAG of Hydrogenivirga sp 128-5-R1-1, while for Gln-CUG it was to the Gln-CUG of Nitrosococcus oceani ATCC 19707.
Figure 4.
Relative frequency of amino acid usage for cluster A phages containing a specific tRNA gene (w/ tRNA), phages lacking that tRNA gene (w/o tRNA), for the capsid protein of phages with and without a specific tRNA gene (capsid w/ and capsid w/o), and for their potential hosts: Mycobacterium smegmatis (M. s.), Mycobacterium ulcerans (M. u.), and bacterial hosts (Other) selected based on BLAST matches for each phage tRNA species (Tyr: Rhodopseudomonas palustris CGA009; Lys: Leptothrix cholodnii SP-6; Glu: Moorella thermoacetica ATCC 39073; Leu: Staphylococcus epidermidis RP62A; Gln: Nitrosococcus oceani ATCC 19707; Asn: Candidatus Koribacter versatilis Ellin345; Trp: Clavibacter michiganensis sepedonicus). Amino acids are listed based on tRNA gene frequency – from rare to most common (Trp); Leu-UAA is excluded due to small sample size (only one annotated phage with this tRNA gene).
On the basis of these matches, we compared the prevalence of a particular amino acid or codon in the called genes of the cluster A mycobacteriophages to that of called genes in Mycobacterium smegmatis and M. ulcerans as well as an additional bacterial host for each tRNA species, selected on the basis of matches in the Genomic tRNA database and availability of a complete genome annotation in GenBank. In all comparisons (Fig. 4), phages use the amino acids Trp, Tyr, and Lys more than all their putative hosts, and use the amino acids Glu, Gln and Asn more than their putative mycobacterial hosts but not more than their other selected host. However, although it is the amino acid with an associated tRNA gene most used by phages (Fig. 4), Leu is used less frequently by phages than by their putative hosts. In contrast, Trp is the amino acid least used by these phages, although the tRNA gene for Trp is the most common tRNA gene in these phages (Table 3). There is also little difference in the use of these amino acids between phages with and without a given tRNA gene (Table 4). Only in the case of Asn was the presence of a tRNA gene associated with greater use of that amino acid; although the difference is statistically significant, it is small (frequency of Asn in phages with vs. without the Asn tRNA gene: 3.33% vs. 3.25%; Fig. 4).
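Amino acid usage as compared here is the fraction of all residues in the predicted proteins of a genome that are a given amino acid. A minimal sketch of that calculation follows. The function name and the toy sequences are illustrative assumptions, and the treatment of ambiguous residues (silently skipped) is a choice made for this sketch.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def amino_acid_usage(proteins):
    """Relative frequency of each standard amino acid across a list of protein sequences.

    Stop symbols and ambiguous residues (e.g. '*', 'X') are ignored.
    """
    counts = Counter(
        residue
        for seq in proteins
        for residue in seq.upper()
        if residue in STANDARD_AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acid residues found")
    return {aa: counts[aa] / total for aa in sorted(STANDARD_AMINO_ACIDS)}

# Toy proteome (made-up sequences, not phage data).
usage = amino_acid_usage(["MKWLN", "MWQ*"])
print(f"Trp: {usage['W']:.2%}, Asn: {usage['N']:.2%}")
```

Applied to the called genes of each genome, such frequencies would give one value per phage, which could then be compared between phages with and without a given tRNA gene.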
Table 4.
T-tests comparing amino acid usage for phages with and without a given tRNA gene across cluster A mycobacteriophages (N = 164 genomes in GenBank) for the whole genome or the capsid protein. Only values in bold were significant after doing sequential Bonferroni correction.
tRNA | N with | N without | Whole genome t | Whole genome P | Capsid protein t | Capsid protein P |
---|---|---|---|---|---|---|
Tyr | 3 | 161 | 0.99 | 0.3241 | 0.29 | 0.7724 |
Lys | 3 | 161 | 0.98 | 0.3304 | 2.49 | 0.0137 |
Glu | 8 | 156 | 1.50 | 0.1351 | 0.63 | 0.5279 |
Leu | 18 | 146 | 0.89 | 0.3744 | 1.28 | 0.2027 |
Gln | 26 | 138 | 1.85 | 0.0658 | 5.05 | <0.0001 |
Asn | 39 | 125 | 3.42 | 0.0008 | 5.50 | <0.0001 |
Trp | 93 | 71 | 0.35 | 0.7252 | 7.04 | <0.0001 |
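The sequential Bonferroni correction mentioned in the caption can be illustrated with the whole-genome P values from Table 4. The sketch below implements Holm's step-down procedure. Treating the seven whole-genome tests as one family and using α = 0.05 are assumptions, since the text does not say how the tests were grouped.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return the labels rejected by Holm's sequential Bonferroni procedure."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for rank, (label, p) in enumerate(ordered):
        # Compare the k-th smallest P value with alpha / (m - k + 1); stop at first failure.
        if p <= alpha / (m - rank):
            rejected.add(label)
        else:
            break
    return rejected

# Whole-genome P values from Table 4.
# ASSUMPTION: these seven tests form one family, alpha = 0.05.
whole_genome_p = {
    "Tyr": 0.3241, "Lys": 0.3304, "Glu": 0.1351, "Leu": 0.3744,
    "Gln": 0.0658, "Asn": 0.0008, "Trp": 0.7252,
}

significant = holm_bonferroni(whole_genome_p)
print(sorted(significant))
```

Under these assumptions only Asn remains significant for the whole genome, in line with the statement in the text that Asn was the only such case.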
We found less support for the hypothesis that tRNA genes are retained based on codon usage. For an amino acid with a unique tRNA species such as Trp, this is obviously not a viable explanation. For the other amino acids, the tRNA gene does correspond to the codon used most frequently in the genome of mycobacteriophages (Fig. 5); in most cases the relative frequency of use of this codon (compared to other codons for this amino acid) is greater than 70%. In addition, the phages usually use the codon that corresponds to their tRNA gene more than their putative hosts, but there are exceptions (Fig. 5). The Leu codon is used by the mycobacterial hosts more than the phages, the Lys codon is used by the “other” host more than the phages, while the Gln codon is used by M. smegmatis at a similar frequency as in mycobacteriophages. For the whole genome, in no case did we find a significant difference in codon usage between phages with and without a particular tRNA gene (Table 5).
Figure 5.
Relative frequency of codon usage for cluster A phages containing a specific tRNA gene (w/ tRNA), phages lacking that tRNA gene (w/o tRNA), for the capsid protein of phages with and without a specific tRNA gene (capsid w/ and capsid w/o), and for their potential hosts (see list in Fig. 4). Amino acids are listed based on tRNA gene frequency – from rare to most common; Leu-UAA is excluded due to small sample size (only one annotated phage with this tRNA gene).
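Relative codon usage in this figure is the share of one codon among all synonymous codons for the same amino acid. The sketch below shows one way to compute it from coding sequences. The function name and toy sequences are assumptions, and the codon table is the standard set of codon assignments built with the usual TCAG ordering.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def relative_codon_usage(coding_sequences, codon):
    """Share of `codon` among all synonymous codons for its amino acid.

    Sequences are read in frame from position 0; RNA input (U) is accepted.
    Returns NaN if the amino acid never occurs.
    """
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    counts = Counter()
    for cds in coding_sequences:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            triplet = seq[i:i + 3]
            if CODON_TABLE.get(triplet) == amino_acid:
                counts[triplet] += 1
    total = sum(counts.values())
    return counts[codon] / total if total else float("nan")

# Toy coding sequences (made up): Met-Gln-Gln-Gln-stop and Met-Trp-stop.
gln_share = relative_codon_usage(["ATGCAGCAGCAATAA"], "CAG")
trp_share = relative_codon_usage(["ATGTGGTAA"], "UGG")
print(f"CAG among Gln codons: {gln_share:.2f}; UGG among Trp codons: {trp_share:.2f}")
```

Because Trp has a single codon, its relative codon usage is always 1 wherever Trp occurs, which is why codon preference cannot account for the Trp tRNA gene.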
Table 5.
T-tests comparing codon usage for phages with and without a given tRNA gene across the cluster A mycobacteriophages (N = 164 genomes in GenBank) for the whole genome or the capsid protein. Only values in bold were significant after doing sequential Bonferroni correction.
tRNA | N in-group (with) | N out-group (without) | Whole genome t | Whole genome P | Capsid protein t | Capsid protein P |
---|---|---|---|---|---|---|
Tyr | 3 | 161 | 0.42 | 0.6752 | 0.35 | 0.7266 |
Lys | 3 | 161 | 1.24 | 0.2172 | 0.44 | 0.6613 |
Glu | 8 | 156 | 2.15 | 0.0333 | 7.11 | <0.0001 |
Leu | 17 | 147 | 1.94 | 0.0537 | 9.92 | <0.0001 |
Gln | 26 | 138 | 1.93 | 0.0549 | 2.32 | 0.0214 |
Asn | 39 | 125 | 1.23 | 0.2221 | 0.78 | 0.4340 |
We repeated these analyses, comparing amino acid or codon usage by phages with and without a particular tRNA gene, focusing on one gene that can be expected to be highly efficiently translated, the gene for the major capsid protein. The capsid proteins in these cluster A phages are predicted to contain 309–397 amino acid residues. For the amino acids associated with tRNA genes, Trp is least used while Leu is most used in the capsid protein (amino acid, # of residues in capsid protein: Trp, 4–6; Tyr, 6–11; Gln, 8–18; Asn, 10–18; Glu, 10–22; Lys, 11–21; Leu, 17–34). The frequency of 4 of these amino acids in the capsid protein differed between phages with and without a given tRNA gene in the direction expected: Lys, Gln, Asn and Trp were used more in the capsid protein of phages with these tRNA genes (Table 4 and Fig. 4). Compared to their use by their putative bacterial hosts, 3 of these amino acids (Lys, Gln, and Asn) are also used more frequently in the capsid protein. In addition, although the codons for all of these tRNA genes are used more frequently in the capsid gene than in the whole genome and more frequently than in their putative hosts, the difference between phages with and without a given tRNA gene is significant only for Glu and Leu (Table 5 and Fig. 5). These data should be interpreted with caution given the small number of residues for some of these amino acids in the capsid protein and the unequal representation of different subclusters in the 2 groups of phages.
Discussion
Many hypotheses have been proposed to explain the presence of tRNA genes in bacteriophages. One hypothesis proposed that temperate bacteriophages integrate into bacterial genomes at one of the host's tRNA genes.16 In these cases the phage's tRNA gene compensates for the inactivation of the host's tRNA. Although possibly true in some specific cases, this hypothesis cannot be generally true since phage integration does not always disrupt the tRNA gene and phages that integrate into a host's tRNA gene do not always bring in their own tRNA genes.10 Moreover, this hypothesis does not explain the presence of more than one tRNA gene or the presence of tRNA genes in lytic phages.
Codon versus amino acid usage
A number of studies have suggested that matching codon usage of their host is a major force in phage evolution.10,17-19 However, a survey of recent studies comparing codon usage by phages and their hosts revealed a mixture of studies where the presence of tRNA genes corresponds to codons overly expressed by the phage compared to its host10,20-23 and studies where that is not the case.24-27 In phage PVP-SE1, only 11 of the codons associated with its 24 tRNA genes were present at a frequency higher than in its Salmonella hosts.28 In the 5 Acinetobacter phages studied by Jin et al., an association between the presence of tRNA genes and codon usage could only be observed for one Acinetobacter phage, ZZ1, and for only 5 out of the 8 tRNA genes.4 In the Acinetobacter Acibel004 phage, 16 out of its 22 tRNAs are found in higher frequency in the phage's genome than in its host A. baumannii.29 In phage B2, only 2 of the 6 tRNA genes corresponded to codons more frequently used by the phage than its Lactobacillus plantarum host.30 Limor-Waisberg et al. looked at cyanophages with very different genome size and GC content: T7-like podoviruses have small genomes, few tRNA genes and only infect hosts with similar GC content, while T4-like myoviruses have larger genomes, more tRNA genes, and can infect hosts that differ in GC content.21 Among the myoviruses, greater difference in GC content between host and phages was associated with more tRNA genes in these phages.21 However, we did not find this pattern in mycobacteriophages (Fig. 3). In all of the above studies, the alternative hypothesis that amino acid usage explained the retention of tRNA genes was not explored.
Similarly, the picture is mixed for eukaryotic viruses. For example, the tRNA genes in mimiviruses do not correspond to the codons and/or amino acids most used by these viruses31 while the tRNA genes of prasinoviruses32 correspond to amino acids used least by their hosts.
Our comparative study of amino acid and codon usage in cluster A mycobacteriophages provides support for both hypotheses, with somewhat greater support for amino acid usage (Fig. 4 and 5). First, the most common tRNA gene in these phages is the tRNA gene for Trp, which is of course encoded by only one codon. Second, either amino acid or codon usage can explain the presence of the remaining tRNA genes. However, the presence of the tRNA genes for Leu is consistent with the codon usage hypothesis only when considering the capsid protein. When considering the whole genome, we found no differences in amino acid usage by phages that could explain why some phages have a given tRNA gene in their genome and very similar phages do not (Table 4). Amino acid composition of particular proteins such as the major capsid protein may be relevant (Table 4), but which proteins should be included in such analyses remains to be determined. Differences in codon usage between these phages could only explain the presence of 2 tRNA genes (Glu-CUC and Leu-CUG), and only when considering the codon usage of the major capsid protein rather than the whole phage genome.
Host use vs. host range
Several factors make testing the amino acid/codon hypotheses difficult. First, we rarely know if a phage has been isolated on its usual/preferred host. Second, whether codon or amino acid usage better explains the retention of tRNA genes, we still need to know how phages benefit from the presence of such genes. The usual assumption, supported by limited experimental work,13 is that phages grow better (larger burst size and shorter latency) in their hosts due to the presence of these tRNA genes. An alternative, and not mutually exclusive, benefit is that phages with tRNA genes are capable of lysing a greater number of hosts (i.e., have a broader host range). Without better knowledge of a phage's host range, testing the hypothesis that tRNA genes are retained due to their correspondence to highly used amino acids or codons is difficult. If the phage and its host have similar amino acid/codon usage, then the presence of tRNA genes remains unexplained. They may be useful in another host or they may have yet to be eliminated. Similarly, if the phage and its host have different codon usage but the phage possesses tRNA genes that do not correspond to its highly used amino acids/codons, the same alternative explanations can be invoked. Without better data on a phage's host range, we can never reject the hypothesis that the tRNA genes would be useful in an alternative host. Third, codon/amino acid usage may differ for phage genes expressed during “latency” versus those expressed during “production.”17,33 Thus, the tRNA genes may only be useful and needed for the expression of some genes and it is the amino acid/codon usage of these genes that matters.34 Differences in amino acid and codon usages between the capsid protein and the whole genome of cluster A mycobacteriophages support this argument (Fig. 4 and 5). We are not asserting that amino acid or codon usage for a single gene explains the presence of tRNA genes. However, determining which genes are relevant for such analysis will require experimental work, determining differences in protein production in the absence vs. presence of a given tRNA gene. Finally, given the rapid rate of evolution of phage genomes, it is quite possible that tRNA genes have different roles in different clusters.34 These considerations suggest that much more work needs to be done to understand the presence of tRNA genes in phages.
Conclusions
Although a difference in codon usage between the host and the phage is an attractive explanation for the presence of tRNA genes in phages, our comparative study in mycobacteriophages suggests that the pattern of amino acid usage by phages fits the data better and that this alternative hypothesis needs to be considered. To better understand the role of tRNA genes in phages, we also need to understand how the presence of tRNA genes affects either host range (do phages with or without tRNA genes differ in host range?) or growth rate (do phages with or without tRNA genes differ in burst size or latency?). In addition to the role of tRNA genes, other questions remain unanswered: 1) why are tRNA genes more common in some (sub)clusters than others? 2) why do all members of some clusters have tRNA genes while some clusters do not contain any tRNA genes?
Our data also suggest that when multiple tRNA genes are found in the same phage they may have ultimately been acquired from different bacteria. Thus, tRNA genes have probably been acquired through a complex series of recombination events involving bacterial hosts that these phages may not themselves infect and the dynamics between insertion and retention in phage genomes constrain them to specific genome locations.
Materials and methods
Database
Our study made use of the large collection of mycobacteriophages isolated as part of the PHIRE and Howard Hughes Medical Institute's SEA-PHAGES programs.3,14 The majority of these bacteriophages have been isolated on a common host, Mycobacterium smegmatis mc2 155. All are double-stranded DNA viruses that belong to 2 families: Myoviridae or Siphoviridae. These phages have been further divided into a number of clusters based on genome similarity, with members of a cluster exhibiting nucleotide similarity over more than 50% of their genome length and having similar genome organization.14 All Myoviridae belong to cluster C, with the remaining clusters belonging to the Siphoviridae (Table 1). The terms of use for this phage collection are available on the Actinobacteriophage Database web site (http://phagesdb.org). Given the dynamic nature of this database, where phages are added on a weekly basis, we “closed” our dataset by only including in our study the 817 sequenced and annotated phages listed as of March 1st, 2015. This data set, maintained by Dan Russell and the Pittsburgh Bacteriophage Institute, provided information on genome length, and on number of coding (ORF) and tRNA genes.
Confirming tRNA genes in Cluster A
For the reasons listed above, many of our analyses focused on Cluster A. For these phages, we verified the tRNA genes by running all genome sequences through Aragorn35 and tRNAscan-SE36-37 with default settings. Only tRNA genes that were called by both programs were included in our analyses. All tRNA genes had COVE scores > 49.0; a COVE score of 20 is usually considered good for a tRNA gene (Pope, personal communication). Both programs called the same tRNA genes except in one case. For the recently isolated (2014) phage DarthPhader, the first member of subcluster A12, Aragorn called a tRNA Pro that was not called by tRNAscan-SE. This is the only instance where a tRNA Pro has been called in this cluster. In addition, this tRNA sequence does not have sequence similarity to any tRNA gene found in the tRNAscan-SE genomic database.15 We excluded this tRNA gene from our analyses, but this has no impact on our analyses or conclusions.
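The consensus rule described above (retain only tRNA genes called by both programs) can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the `TrnaCall` record, the `consensus_calls` function, the matching criterion (same isotype and overlapping coordinates), and all coordinates are hypothetical, and parsing of the programs' output files is omitted.

```python
from typing import List, NamedTuple, Tuple


class TrnaCall(NamedTuple):
    """One predicted tRNA gene (hypothetical record layout)."""
    start: int
    end: int
    isotype: str  # amino acid carried by the tRNA, e.g. "Trp"


def same_gene(a: TrnaCall, b: TrnaCall) -> bool:
    """Assumed matching rule: same isotype and overlapping coordinates."""
    return a.isotype == b.isotype and a.start <= b.end and b.start <= a.end


def consensus_calls(
    first: List[TrnaCall], second: List[TrnaCall]
) -> Tuple[List[TrnaCall], List[TrnaCall], List[TrnaCall]]:
    """Split calls into those shared by both programs and those unique to each."""
    shared = [a for a in first if any(same_gene(a, b) for b in second)]
    first_only = [a for a in first if not any(same_gene(a, b) for b in second)]
    second_only = [b for b in second if not any(same_gene(a, b) for a in first)]
    return shared, first_only, second_only


# Made-up coordinates: a Pro call reported by only one of the two programs.
aragorn = [
    TrnaCall(4500, 4572, "Trp"),
    TrnaCall(4600, 4675, "Asn"),
    TrnaCall(9000, 9074, "Pro"),
]
trnascan = [TrnaCall(4501, 4572, "Trp"), TrnaCall(4600, 4674, "Asn")]

shared, aragorn_only, trnascan_only = consensus_calls(aragorn, trnascan)
print("retained:", [c.isotype for c in shared])
print("called by the first program only:", [c.isotype for c in aragorn_only])
```

Calls that appear in only one list are reported separately rather than silently dropped, so that a discordant call such as the tRNA Pro described above can be inspected by hand.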
Comparative studies of tRNA genes in cluster A
To better understand the homology among tRNA genes, all tRNA sequences for a particular tRNA species were aligned with MUSCLE (default settings) and compared using MEGA6. All unique sequences were then compared to bacterial tRNA sequences using the Genomic tRNA database.15 All but one sequence showed matches to at least one bacterial tRNA gene.
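Reducing the tRNA genes of one isotype to their unique sequences before a database comparison can be sketched as below. This is only an illustration under stated assumptions: the `unique_sequences` function, the phage labels, and the short sequence fragments are hypothetical, and sequences are treated as identical only if they match exactly after case and U/T normalization.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_sequences(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group phages by (isotype, normalized sequence).

    Each record is (phage, isotype, sequence). Sequences are upper-cased and
    U is converted to T so that RNA and DNA spellings compare equal.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for phage, isotype, seq in records:
        key = (isotype, seq.upper().replace("U", "T"))
        groups[key].append(phage)
    return dict(groups)


# Made-up labels and sequence fragments, for illustration only.
records = [
    ("phage_1", "Trp", "GGGGCTATAGCTCA"),
    ("phage_2", "Trp", "ggggcuauagcuca"),
    ("phage_3", "Trp", "GGGGCTGTAGCTCA"),
]
groups = unique_sequences(records)
for (isotype, seq), phages in groups.items():
    print(isotype, seq, phages)
```

Each key of the resulting dictionary is one unique sequence that would be submitted once to the database search, with the list of phages that share it kept alongside.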
Amino acid and codon usage
Based on annotations available on GenBank and called open reading frames (ORFs), we derived amino acid and codon usage for selected phages and bacteria. We used only annotations that were available from GenBank, and had thus been curated (164 phage genomes). For each amino acid/codon, we compared its usage in phages that carry the corresponding tRNA gene to its usage in phages that lack that tRNA gene by t-test (with the p value adjusted for multiple comparisons by sequential Bonferroni). We did these analyses both for the whole genome and for the capsid protein only.
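As a rough sketch of the calculations described above, the following code derives codon counts and amino acid usage from in-frame ORF sequences and adjusts a set of p values for multiple comparisons. Several points are assumptions rather than details taken from this study: the function names are hypothetical, the standard genetic code is used, stop codons and codons with ambiguous bases are skipped, and "sequential Bonferroni" is taken to mean Holm's step-down procedure. The t-tests themselves are not implemented here; the p values would come from a statistics package.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(orfs: Iterable[str]) -> Counter:
    """Count sense codons over in-frame ORF sequences."""
    counts: Counter = Counter()
    for orf in orfs:
        seq = orf.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Stop codons and codons with ambiguous bases are skipped.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def amino_acid_usage(counts: Counter) -> Dict[str, float]:
    """Convert codon counts into the fraction of residues per amino acid."""
    total = sum(counts.values())
    per_aa: Counter = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {aa: n / total for aa, n in per_aa.items()}


def holm_adjust(pvalues: List[float]) -> List[float]:
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Toy ORF and made-up p values, for illustration only.
counts = codon_counts(["ATGTGGTGGTAA"])
usage = amino_acid_usage(counts)
adjusted = holm_adjust([0.01, 0.04, 0.03])
print(dict(counts), usage, adjusted)
```

Restricting the input to the ORF of a single gene (for example, the capsid protein) instead of all ORFs of a genome would give the gene-level counterpart of the whole-genome usage.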
Disclosure of potential conflicts of interest
No potential conflicts of interest were disclosed.
Acknowledgments
We thank the Spring 2014 Phage Genomics class at Gettysburg College, especially Celina Harris, for the initial analyses that spurred this work. This study was only made possible by the work done by the PHIRE and SEA-PHAGES programs, and we are particularly grateful to Graham Hatfull, Debbie Jacobs-Sera, Welkin Pope, and Dan Russell for their help and support, to Dan Russell for the data in Table 1, and to the Howard Hughes Medical Institute for their support of the SEA-PHAGES program. We also thank the anonymous reviewer who suggested doing the amino acid/codon usage analyses using the coding region for the major capsid protein.
Funding
This work was supported by Research and Professional Development grants from Gettysburg College to Véronique Delesalle and Greg Krukonis and by a grant to Gettysburg College from the Howard Hughes Medical Institute through the Precollege and Undergraduate Science Education Program.
References
- [1]. Alonso JC, Lüder G, Stiege AC, Chai S, Weise F, Trautner TA. The complete nucleotide sequence and functional organization of Bacillus subtilis bacteriophage SPP1. Gene 1997; 204:201-12; PMID:9434185; http://dx.doi.org/10.1016/S0378-1119(97)00547-7
- [2]. Pedulla ML, Ford ME, Houtz JM, Karthikeyan T, Wadsworth C, Lewis JA, Jacobs-Sera D, Falbo J, Gross J, Pannunzio NR, et al. Origins of highly mosaic mycobacteriophage genomes. Cell 2003; 113:171-82; PMID:12705866; http://dx.doi.org/10.1016/S0092-8674(03)00233-2
- [3]. Hatfull GF, Jacobs-Sera D, Lawrence JG, Pope WH, Russell DA, Ko C, Weber RJ, Patel MC, Germane KL, Edgar RH, et al. Comparative genomic analysis of 60 mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J Mol Biol 2010; 397:119-43; PMID:20064525; http://dx.doi.org/10.1016/j.jmb.2010.01.011
- [4]. Jin J, Li ZJ, Wang SW, Wang SM, Chen SJ, Huang DH, Zhang G, Li YH, Wang XT, Wang J, Zhao GQ. Genome organisation of the Acinetobacter lytic phage ZZ1 and comparison with other T4-like Acinetobacter phages. BMC Genomics 2014; 15:793; PMID:25218338; http://dx.doi.org/10.1186/1471-2164-15-793
- [5]. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM. The 1.2-megabase genome sequence of Mimivirus. Science 2004; 306(5700):1344-50; PMID:15486256; http://dx.doi.org/10.1126/science.1101485
- [6]. Pope WH, Bowman CA, Russell DA, Jacobs-Sera D, Asai DJ, Cresawn SG, Jacobs WR Jr, Hendrix RW, Lawrence JG, Hatfull GF, et al. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. Elife 2015; 4:e06416; PMID:25919952; http://dx.doi.org/10.7554/eLife.06416
- [7]. Miller ES, Kutter E, Mosig G, Arisaka F, Kunisawa T, Ruger W. Bacteriophage T4 genome. Microbiol Mol Biol Rev 2003; 67:86-156; PMID:12626685; http://dx.doi.org/10.1128/MMBR.67.1.86-156.2003
- [8]. Hinton DM, Pande S, Wais N, Johnson XB, Vuthoori M, Makela A, Hook-Barnard I. Transcriptional takeover by σ appropriation: remodelling of the σ70 subunit of Escherichia coli RNA polymerase by the bacteriophage T4 activator MotA and co-activator AsiA. Microbiology 2005; 151:1729-40; PMID:15941982; http://dx.doi.org/10.1099/mic.0.27972-0
- [9]. Kunisawa T. Functional role of mycobacteriophage transfer RNAs. J Theor Biol 2000; 205:167-70; PMID:10860710; http://dx.doi.org/10.1006/jtbi.2000.2057
- [10]. Bailly-Bechet M, Vergassola M, Rocha E. Causes for the intriguing presence of tRNAs in phages. Genome Res 2007; 17:1486-95; PMID:17785533; http://dx.doi.org/10.1101/gr.6649807
- [11]. Juhala RJ, Ford ME, Duda RL, Youlton A, Hatfull GF, Hendrix RW. Genomic sequences of bacteriophages HK97 and HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. J Mol Biol 2000; 299:27-51; PMID:10860721; http://dx.doi.org/10.1006/jmbi.2000.3729
- [12]. Lawrence JG, Hatfull GF, Hendrix RW. Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. J Bacteriol 2002; 184:4891-905; PMID:12169615; http://dx.doi.org/10.1128/JB.184.17.4891-4905.2002
- [13]. Wilson JH. Function of bacteriophage-T4 transfer-RNAs. J Mol Biol 1973; 74:753-4; PMID:4729526; http://dx.doi.org/10.1016/0022-2836(73)90065-X
- [14]. Hatfull GF, Pedulla ML, Jacobs-Sera D, Cichon PM, Foley A, Ford ME, Gonda RM, Houtz JM, Hryckowian AJ, Kelchner VA, et al. Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet 2006; 2:e92; PMID:16789831; http://dx.doi.org/10.1371/journal.pgen.0020092
- [15]. Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 2009; 37:D93-7; PMID:18984615; http://dx.doi.org/10.1093/nar/gkn787
- [16]. Cheetham BF, Katz ME. A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol Microbiol 1995; 18:201-8; http://dx.doi.org/10.1111/j.1365-2958.1995.mmi_18020201.x
- [17]. Krakauer DC, Jansen VA. Red queen dynamics of protein translation. J Theor Biol 2002; 218:97-109; PMID:12297073; http://dx.doi.org/10.1006/jtbi.2002.3054
- [18]. Carbone A. Codon bias is a major factor explaining phage evolution in translationally biased hosts. J Mol Evol 2008; 66:210-23; PMID:18286220; http://dx.doi.org/10.1007/s00239-008-9068-6
- [19]. Bahir I, Fromer M, Prat Y, Linial M. Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences. Mol Syst Biol 2009; 5:311; PMID:19888206; http://dx.doi.org/10.1038/msb.2009.71
- [20]. Villion M, Chopin M-C, Deveau H, Ehrlich SD, Moineau S, Chopin A. P087, a lactococcal phage with a morphogenesis module similar to an Enterococcus faecalis prophage. Virology 2009; 388:49-56; PMID:19349056; http://dx.doi.org/10.1016/j.virol.2009.03.011
- [21]. Limor-Waisberg K, Carmi A, Scherz A, Pilpel Y, Furman I. Specialization versus adaptation: two strategies employed by cyanophages to enhance their translation efficiencies. Nucleic Acids Res 2011; 39:6016-28; PMID:21470965; http://dx.doi.org/10.1093/nar/gkr169
- [22]. Prabhakaran R, Chithambaram S, Xia X. Aeromonas phages encode tRNAs for their overused codons. Int J Comput Biol Drug Des 2014; 7:168-82; PMID:24878728; http://dx.doi.org/10.1504/IJCBDD.2014.061645
- [23]. Wittmann J, Dreiseikelmann B, Rohde M, Meier-Kolthoff JP, Bunk B, Rohde C. First genome sequences of Achromobacter phages reveal new members of the N4 family. Virol J 2014; 11:14; http://dx.doi.org/10.1186/1743-422X-11-14
- [24]. Samson JE, Moineau S. Characterization of Lactococcus lactis phage 949 and comparison with other lactococcal phages. Appl Environ Microbiol 2010; 76:6843-52; PMID:20802084; http://dx.doi.org/10.1128/AEM.00796-10
- [25]. Dreher TW, Brown N, Bozarth CS, Schwartz AD, Riscoe E, Thrash C, Bennett SE, Tzeng SC, Maier CS. A freshwater cyanophage whose genome indicates close relationships to photosynthetic marine cyanomyophages. Environ Microbiol 2011; 13:1858-74; PMID:21605306; http://dx.doi.org/10.1111/j.1462-2920.2011.02502.x
- [26]. Cornelissen A, Hardies SC, Shaburova OV, Krylov VN, Mattheus W, Kropinski AM, Lavigne R. Complete genome sequence of the giant virus OBP and comparative genome analysis of the diverse ΦKZ-related phages. J Virol 2012; 86:1844-52; PMID:22130535; http://dx.doi.org/10.1128/JVI.06330-11
- [27]. Gervasi T, Curto RL, Narbad A, Mayer MJ. Complete genome sequence of ΦCP51, a temperate bacteriophage of Clostridium perfringens. Arch Virol 2013; 158:2015-7; PMID:23575881; http://dx.doi.org/10.1007/s00705-013-1647-1
- [28]. Santos SB, Kropinski AM, Ceyssens PJ, Ackermann HW, Villegas A, Lavigne R, Krylov VN, Carvalho CM, Ferreira EC, Azeredo J. Genomic and proteomic characterization of the broad-host-range Salmonella phage PVP-SE1: creation of a new phage genus. J Virol 2011; 85:11265-73; PMID:21865376; http://dx.doi.org/10.1128/JVI.01769-10
- [29]. Merabishvili M, Vandenheuvel D, Kropinski AM, Mast J, De Vos D, Verbeken G, Noben JP, Lavigne R, Vaneechoutte M, Pirnay JP. Characterization of newly isolated lytic bacteriophages active against Acinetobacter baumannii. PLoS One 2014; 9:e104853; PMID:25111143; http://dx.doi.org/10.1371/journal.pone.0104853
- [30]. Briggiler MM, Garneau JE, Tremblay D, Quiberoni A, Moineau S. Characterization of two virulent phages of Lactobacillus plantarum. Appl Environ Microbiol 2012; 78:8719-34; PMID:23042172; http://dx.doi.org/10.1128/AEM.02565-12
- [31]. Colson P, Fournous G, Diene SM, Raoult D. Codon usage, amino acid usage, transfer RNA and amino-acyl-tRNA synthetases in Mimiviruses. Intervirology 2013; 56:364-75; PMID:24157883; http://dx.doi.org/10.1159/000354557
- [32]. Michely S, Toulza E, Subirana L, John U, Cognat V, Maréchal-Drouard L, Grimsley N, Moreau H, Piganeau G. Evolution of codon usage in the smallest photosynthetic eukaryotes and their giant viruses. Genome Biol Evol 2013; 5:848-59; PMID:23563969; http://dx.doi.org/10.1093/gbe/evt053
- [33]. Karlin S, Blaisdell BE, Schachtel GA. Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses. J Virol 1990; 64:4264-73; PMID:2166815
- [34]. Pope WH, Anders KR, Baird M, Bowman CA, Boyle MM, Broussard GW, Chow T, Clase KL, Cooper S, Cornely KA, et al. Cluster M mycobacteriophages Bongo, PegLeg, and Rey with unusually large repertoires of tRNA isotypes. J Virol 2014; 88:2461-80; PMID:24335314; http://dx.doi.org/10.1128/JVI.03363-13
- [35]. Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 2004; 32:11-6; PMID:14704338; http://dx.doi.org/10.1093/nar/gkh152
- [36]. Lowe TM, Eddy SR. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997; 25:955-64; http://dx.doi.org/10.1093/nar/25.5.0955
- [37]. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 2005; 33:W686-9; PMID:15980563; http://dx.doi.org/10.1093/nar/gki366