Thermus thermophilus bacteriophage ϕYS40 genome and proteomic characterization of virions

Tatyana Naryshkina; Jing Liu; Laurence Florens; Selene K Swanson; Andrey R Pavlov; Nadejda V Pavlova; Ross Inman; Sergei A Kozyavkin; Michael Washburn; Arcady Mushegian; Konstantin Severinov

doi:10.1016/j.jmb.2006.08.087

. Author manuscript; available in PMC: 2007 Dec 8.

Published in final edited form as: J Mol Biol. 2006 Sep 6;364(4):667–677. doi: 10.1016/j.jmb.2006.08.087

Thermus thermophilus bacteriophage ϕYS40 genome and proteomic characterization of virions

Tatyana Naryshkina ^1,^#, Jing Liu ^2,^#, Laurence Florens ², Selene K Swanson ², Andrey R Pavlov ³, Nadejda V Pavlova ³, Ross Inman ⁴, Sergei A Kozyavkin ³, Michael Washburn ², Arcady Mushegian ^2,⁵, Konstantin Severinov ^1,^6,^7,^*

PMCID: PMC1773054 NIHMSID: NIHMS14318 PMID: 17027029

Abstract

We determined the sequence of the 152,372-bp genome of ϕYS40, a lytic tailed bacteriophage of Thermus thermophilus. The genome contains 170 putative open reading frames and three tRNA genes. Functions for 25% of ϕYS40 gene products were predicted on the basis of similarity to proteins of known function from diverse phages and bacteria. ϕYS40 encodes a cluster of proteins involved in nucleotide salvage, such as flavin-dependent thymidylate synthase, thymidylate kinase, ribonucleotide reductase, and deoxycytidylate deaminase, and in DNA replication, such as DNA primase, helicase, type A DNA polymerase, and predicted terminal protein involved in initiation of DNA synthesis. The structural genes of ϕYS40, most of which have no similarity to sequences in public databases, were identified by mass-spectrometric analysis of purified virions. Various ϕYS40 proteins have different phylogenetic neighbors, including Myovirus, Podovirus, and Siphovirus gene products, bacterial genes, and in one case, a dUTPase from a eukaryotic virus. ϕYS40 has apparently arisen through multiple acts of recombination between different phage genomes as well as through acquisition of bacterial genes.

Keywords: Thermus thermophilus, bacteriophage, genome, virion, proteomics, bioinformatics, DNA polymerase

Introduction

In the last decade, the genomes of several hundred phages have been completely sequenced (282 complete dsDNA phage genomes in the Genome Division of GenBank as of July 2006). While bacterial hosts of these phages are phylogenetically diverse, only ten of those completely sequenced phages are known to infect thermophilic microorganisms. Most of‘thermophilic’ phages were isolated from a small number of archaeal species ¹^-³. Sequence analysis revealed that archaeophages encode mostly uncharacterized proteins with no similarities to sequences in public databases, though more detailed examination revealed a limited number of recognizable ATPases, nucleotide salvage enzymes, and putative transcription factors ⁴. As of the time of this writing, the only sequenced genome of a phage from a thermophilic eubacterium is RM 378 that infects Rhodothermus marinus⁵.

During their development in a bacterial host, phages are known to regulate host macromolecular synthesis by modifying host transcription and translation machinery and making it serve the needs of the virus. Proteins from thermophilic bacteria are particularly amenable to structural studies of large complexes involved in DNA replication, DNA transcription, and RNA translation. Thus, structural and functional analysis of thermophilic phage-encoded regulators and their complexes with RNA polymerases, ribosomes, and other components of thermophilic bacteria can provide insights into molecular mechanisms of regulation of transcription, translation, and other cellular processes. With these ideas in mind, we determined the genomic sequence of ϕYS40, a large myophage hosted by the thermophilic bacterium Thermus thermophilus (temperature range from 56 to 78°C) ⁶. Here, we present the results of a preliminary study of the ϕYS40 genome and the proteome of ϕYS40 virions.

Results

Overview of the ϕYS40 genome

The sequence of the ϕYS40 genome was determined using the fimer technology and assembled into a single 152,372 bp contig using the phredPhrap package (see Materials and Methods). The G + C content of the ϕYS40 genome is 32.59%, which is significantly lower than that of its host (69.4%). Though the GC-content of ϕYS40 is close to values typical of the low-GC Gram-positive bacteria, there is no specific evolutionary affinity between sequences of ϕYS40 and these bacteria, and the GC-content of the phage may instead reflect specific aspects of phage molecular biology, for example distinct mutational bias of its DNA polymerase. ϕYS40 DNA appears to be unmodified as it is susceptible to digestion with all common methylation-sensitive restriction endonucleases tested (data not shown).

A total of 170 ORFs were predicted in the ϕYS40 genome (Table 1, Fig. 1). The intergenic regions were screened for additional genes by searching GenBank, GenPept, and the database of unfinished microbial genomes at NCBI, but no additional conserved ORFs were found. The predicted ϕYS40 ORFs are between 43 and 1744 codons in length. As with most other phages, the genome of ϕYS40 is tightly packed: coding sequences occupy 95% of the ϕYS40 genome. There are 46 cases of overlaps (from 1 to 40 bases long) between neighboring ORFs. The longest non-coding region (390 bp) lies between ORF138 and ORF139. Most of the 170 predicted ORFs start at the AUG codon, 22 ORFs use GUG codon, and three use UUG. At the ends of ϕYS40 genes, there are 90 TAA stop codons, 66 TGA codons, and 16 TAG codons.

Table 1.

Gene products of phage ^ϕYS40 and their predicted molecular functions.

ORF name	ORF strand/position^a	ORF length (amino acids)	The best database match with validated similarity	Taxonomic origin of the best match	Function and other properties^b
1	- / (7..1938)	643	34419532	Vibrio phage KVP40	distal tail fiber protein
2	- / (1941..4586)	881			unknown
3	- / (4573..7410)	945	48696430	Staphylococcus phage K	portal protein
4	- / (7412..8068)	218	90591438	Flavobacterium johnsoniae UW101	TM, unknown
5	- / (8096..8530)	144	19924248	Methanocaldococcus jannaschii	S-adenosylmethionine decarboxylase (adoMetDC)
6	- / (8564..8788)	74			unknown
7	- / (8801..9412)	203			unknown
8	- / (9399..9941)	180	9631083	Lymantria dispar nucleopolyhedrovirus	dUTPase
9	- / (9955..10782)	275	33357605	Thermotoga maritima	flavin-dependent thymidylate synthase
10	- / (10816..11331)	171			unknown
11	- / (11310..11783)	157	33860394	Burkholderia cepacia phage Bcep22	gp18, unknown function
12	- / (11776..12795)	339	23029929	Microbulbifer degradans	RecA/RadA recombinase
13	- / (12792..13367)	191	46200225	Thermus thermophilus HB27	Rad52 strand-exchange protein
14	- / (13413..14756)	447	22978288	Ralstonia metallidurans	DNA helicase DnaB
15	- / (14743..15036)	97			unknown
16	15124..15453	109			unknown
17	15467..16576	369	23029305	Microbulbifer degradans	IMP dehydrogenase / GMP reductase
18	16640..17050	136	23110678	Novosphingobium aromaticivorans	DNA binding HTH-domain protein, transcription regulator
19	- / (17108..18343)	411			Major structural protein
20	- / (18400..18837)	145			unknown
21	- / (18834..19214)	126			unknown
22	- / (19187..19960)	257			unknown
23	- / (19944..21620)	558	27262500	Heliobacillus mobilis	DNA primase bacterial DnaG type
24	- / (21669..22277)	202	37526389	Photorhabdus luminescens	thymidine kinase
25	- / (22302..23015)	237	15595102	Borrelia burgdorferi	ATP-dependent ClpP protease
26	- / (22975..23901)	308	9964625	Roseobacter phage SIO1	RecB family exonuclease
27	- / (23898..25247)	449	15900485	Streptococcus pneumoniae	DEAD domain helicase
28	25396..26796	466			unknown
29	26822..27331	169	52216967	Bacteroides fragilis YCH46	sugar-phosphate nucleotidyltransferas e
30	- / (27328..29085)	585			unknown
31	- / (29090..29803)	237			unknown
32	- / (29818..30291)	157			unknown
33	30387..32498	703	29348669	Bacteroides thetaiotaomicron	DNA polymerase, without N-terminal 5-3 exonuclease domain
34	- / (32491..32781)	96			3 TMs, unknown
35	- / (32768..33034)	88			2 TMs, unknown
36	- / (33031..33309)	92			unknown
37	33381..33746	121			unknown
38	33730..34158	142	21229604	Xanthomonas campestris	deoxycytidylate deaminase
39	34188..34616	142			unknown
40	34631..35155	174			unknown
41	35201..37594	797	23104360	Azotobacter vinelandii	ribonucleotide reductase, alpha subunit, the N-terminus
42	37607..38206	199	20808702	Thermoanaerobacter tengcongensis	ribonucleotide reductase, alpha subunit, the C-terminus
43	38240..38446	68			unknown
44	38459..38911	150			unknown
45	38898..39227	109			unknown
46	39224..39439	71			unknown
47	39441..39884	147			4 TMs, unknown
48	39877..40185	102			unknown
49	40201..40548	115			unknown
50	40558..41013	151			unknown
51	41010..42482	490			unknown
52	42536..43408	290	45914890	Mesorhizobium sp. BNC1	UDP-3-O-[3-hydroxy-myristory] glucosamine N-acyltransferase
53	43411..43938	175			unknown
54	43940..44425	161			unknown
55	- / (44426..45127)	233	23055325	Geobacter metallireducens	unknown
56	45187..46209	340	51891857	Symbiobacterium thermophilum	conserved bacterial protein, unknown
57	46199..47536	466	42521856	Bdellovibrio bacteriovorus	spore cortex synthesis protein SpoVR
58	47564..49414	616			unknown
59	49453..51312	619	23112542	Desulfitobacterium hafniense	putative serine protein kinase
60	51410..51997	195	29366771	Streptomyces phage phi-BT1	putative dNMP kinase
61	52035..52484	149
62	- / (52477..54345)	622	15668504	Methanocaldococcus jannaschii	terminase large subunit
63	- / (54320..55108)	262			unknown
64	- / (55105..55485)	126			unknown
65	- / (55466..56017)	183	22855150	Bacillus phage B103	terminal protein
66	56049..56315	88			unknown
67	- / (56362..57102)	246			unknown
68	- / (57104..57754)	216			unknown
69	- / (57775..59721)	648	22973075	Chloroflexus aurantiacus	tail sheath protein
70	- / (59782..60492)	236			unknown
71	- / (60495..61157)	220	48696435	Staphylococcus phage K	Zn ribbon, similar to archaeal transcription factor IIB
72	- / (61167..61682)	171			unknown
73	- / (61756..63168)	470			Major structural protein
74	- / (63204..64838)	544	48696431	Staphylococcus phage K	unknown, 3 coiled coil regions
75	- / (64838..65098)	86			unknown
76	- / (65085..69662)	1525			unknown
77	- / (69684..74918)	1744			unknown, 3 coiled coil regions
78	- / (74931..75296)	121			unknown
79	- / (75309..79883)	1524	40744644	Aspergillus nidulans	helicase (DEAD motif replaced by DDAE)
80	- / (79880..80743	287			unknown
81	- / (80788..82740)	650			unknown
82	- / (82771..84609)	612			unknown
83	- / (84867..85094)	75			unknown
84	- / (85328..85558)	76			unknown
85	- / (85767..85919)	50			unknown
86	- / (86022..86273)	83			unknown
87	- / (86382..86618)	78			unknown
88	- / (86909..87154)	81			unknown
89	- / (87505..87990)	161	15805515	Deinococcus radiodurans	2 TMs, unknown
90	- / (88074..88529)	151			unknown
91	- / (88642..89250)	202			unknown
92	- / (89349..89783)	144			unknown
93	- / (89796..90221)	141			unknown
94	- / (90481..90927)	148			unknown
95	- / (91036..91212)	58			unknown
96	91231..91359	43			unknown
97	- / (91417..91824)	135			3 TMs, unknown
98	- / (91835..92380)	181			unknown
99	- / (92503..93045)	180			unknown
100	- / (93045..93635)	196			unknown
101	- / (93619..94131)	170			unknown
102	- / (94337..94873)	178			unknown
103	- / (94885..95373)	162			unknown
104	- / (95510..96025)	171			unknown
105	- / (96096..96626)	176			unknown
106	- / (96833..97354)	173			unknown
107	- / (97575..99263)	562			unknown
108	- / (99280..100323)	347	15643692	Thermotoga maritima	ATPase
109	- / (100462..101157)	231			unknown
110	- / (101227..101973)	248			unknown
111	- / (102138..102530)	130			unknown
112	- / (102531..103076)	181			unknown
113	- / (103077..103616)	179			unknown
114	- / (103616..104107)	163	11992695	Escherichia coli	glycosyltransferase
115	- / (104451..104693)	80			unknown
116	- / (104803..105279)	158			unknown
117	- / (105422..105979)	185			unknown
118	- / (105969..106520)	183			unknown
119	- / (106510..107076)	188			unknown
120	- / (107090..107539)	149			unknown
121	- / (107552..108046)	164			unknown
122	- / (108141..108644)	167			unknown
123	- / (108772..109290)	172			unknown
124	- / (109328..109819)	163			unknown
125	- / (109998..110513)	171	18462664	Shigella flexneri	unknown
126	- / (110561..111145)	194			unknown
127	- / (111157..111654)	165			unknown
128	- / (111663..112133)	156			unknown
129	- / (112165..112677)	170			unknown
130	- / (112689..113195)	168			unknown
131	- / (113202..113630)	142			unknown
132	113852..114388	178			coiled coil, unknown
133	- / (114385..115032)	215			unknown
134	- / (115155..115724)	189			unknown
135	- / (115727..116299)	190			unknown
136	- / (116271..116693)	140			unknown
137	116815..117474	219			unknown
138	- / (117442..118005)	187			unknown
139	- / (118395..119999)	534	19552983	Corynebacterium glutamicum	unknown
140	- / (120226..120777)	183			unknown
141	120821..120994	58			unknown
142	- / (120953..123997)	1014			coiled coil, unknown
143	- / (124012..124536)	174			unknown
144	- / (124553..125593)	346	10956653	Rhodococcus equi	M27/M37 peptidase
145	- / (125598..126548)	316			unknown
146	- / (126553..126813)	86			3 TMs, unknown
147	126870..127055	61			unknown
148	127065..127460	131			unknown
149	127471..127959	162			unknown
150	127979..129967	662	34762157	Fusobacterium nucleatum	putative baseplate assembly protein
151	129964..131859	631			unknown
152	131870..134260	796	903862	Escherichia coli phage K3	wac fibritin neck whisker
153	134253..136364	703			unknown
154	136388..137287	299			unknown
155	137294..137644	116			unknown
156	137634..138497	287			unknown
157	138469..139269	266			unknown
158	139253..143296	1347			unknown
159	143322..143846	174			unknown
160	144155..144367	70			unknown
161	- / (144357..145424)	355	15674141	Lactococcus lactis	Radical SAM superfamily enzyme
162	- / (145421..146374)	317			unknown
163	- / (146390..147022)	210			unknown
164	147094..147639	181			unknown
165	147677..148306	209			unknown
166	148300..148689	129			unknown
167	148736..150229	497			unknown
168	150256..151341	361			unknown
169	151338..151907	189			unknown
170	151894..152157	87			unknown

Open in a new tab

^ϕ

YS40 virion proteins detected by MudPIT are indicated in red.

position of the ORFs in the phage YS40 genome; “-” indicates a leftwards transcription orientation.

presence of transmembrane domains (TM) and coiled coil regions are indicated.

The ϕYS40 genome.

Bacteriophage ϕYS40 genome is schematically presented with predicted ORFs indicated by arrows. Arrow direction indicates the direction of transcription. Several ORFs with clear functional predictions for their products are color-coded (see also Table 1 for more details).

Two-thirds of the ϕYS40 genes (114 genes) are transcribed in one direction, designated as leftward in the genome map (Fig. 1), and 56 genes are transcribed in the rightward direction. The G+C content is approximately the same for both sets of ORFs. Taking a set of genes transcribed in the same direction and having no more than three consecutive intruders (i.e., genes transcribed in a different direction) as a cluster, we find four gene clusters in the ϕYS40 genome. The ORF1-ORF36 and ORF62-ORF146 clusters are transcribed in the leftward direction, and ORF37-ORF61 and ORF147-ORF170 clusters are transcribed in the rightwards direction (Fig. 1). The probability of obtaining each of the four clusters by chance, calculated using equation 2 from Durand and Sankoff ⁷ is less than 0.1, indicating that at least part of the clustering may be due to evolutionary or functional constraints.

tRNA genes

Using the tRNA scan-SE program, we identified three tRNA genes in the ϕYS40 genome. The tRNA1 gene overlaps with ORF61, whereas the tRNA2 and tRNA3 genes are both located between ORF139 and ORF140. Other large tailed dsDNA bacteriophages, such as coliphage T4 ⁸, vibriophage KVP40 ⁹, and phage phiKZ of P. aeruginosa ¹⁰ also encode several tRNAs.

The ϕYS40 tRNA1 and tRNA3 recognize ACA (threonine) and AGA (arginine) codons, respectively. These codons, while overrepresented in the ϕYS40 genome, are the rarest threonine and arginine codons in T. thermophilus genes. tRNA2 has a CAU anticodon, which would correspond to methionine codon AUG if C34 in the wobble position is unmodified. In homologous tRNAs from a number of bacteria and bacteriophages, the corresponding cytidine is converted to lysidine, which results in the AUA (Ile) decoding ¹¹^-¹³. Determinants for tRNA^Ile identity are thought to consist of anticodon loop bases A37 and A38, the discriminator base A73, and conserved base pairs in the D-arm (U12·A23), the anticodon arm (C29·G41), and the acceptor arm (C4·G69)¹⁴. All these characteristics are present in ϕYS40 tRNA2, which therefore may decode the isoleucine codon AUA, another rare T. thermophilus codon that is much more frequent in ϕYS40 ORFs. Thus, ϕYS40-encoded tRNAs may ensure efficient decoding of codons that are overrepresented in the phage genome relative to its host.

Sequence analysis of predicted ϕYS40 proteins

Analysis of intrinsic features of protein sequences indicates that seven ϕYS40 ORFs encode proteins with putative transmembrane domains (from one to three) and four ϕYS40 proteins are predicted to have coiled-coil regions. Only one protein, gp107, is predicted to be strongly non-globular, and only one protein, gp35, contains an N-terminal secretion signal peptide. All deduced amino acid sequences were compared to proteins in the non-redundant database at NCBI using the PSI-BLAST program with a slightly relaxed cutoff for profile inclusion (-h parameter). The comparison showed that ∼25% of ϕYS40 proteins display sequence similarity to proteins of known function (Table 1).

ϕYS40 proteins involved in nucleotide metabolism

Like other large phage genomes, ϕYS40 encodes a number of enzymes involved in nucleotide metabolism. They are gp8, a homolog of mammalian/viral UTPase (EC 3.6.1.23); gp9, related to a predicted flavin-dependent thymidylate synthase (EC 2.1.1.148); GMP reductase gp17 (EC 1.7.1.7); thymidine kinase gp24 (EC 2.7.1.21); deoxycytidylate deaminase gp38 (EC 3.5.4.12); dNMP kinase gp60 (EC 2.7.4.-); and the catalytic α subunit of ribonucleotide reductase encoded by two adjoining ORFs, gp41 and gp42 (EC 1.17.4.1). Except for dUTPase gp8, all these gene products show stronger sequence similarity to prokaryotic or phage enzymes than to their eukaryotic or archaeal counterparts. The best database match and closest phylogenetic neighbor for dUTPase gp8 is dUTPase from Lymantria dispar nucleopolyhedrosis virus. Gene exchange between phages and bacteria has been suggested to account for odd gene phylogenies that are sometimes observed in the components of bacterial replication and transcription machinery ¹⁵. Our observation indicates that eukaryotic viruses, and perhaps their hosts, may also be involved in such exchange.

ϕYS40 proteins involved in DNA replication and recombination

ϕYS40 encodes most of the proteins required for replisome formation, namely gp14, a replication initiation helicase DnaB; gp23, a bacterial DnaG-family DNA primase; gp26, a RecB family exonuclease; gp33, a type A DNA polymerase, and gp27, a DEAD box helicase. Another predicted DEAD-box helicase is encoded by gp79. Based on the fact that gp79 is a part of the ϕYS40 virion, we suspect that it is involved in viral DNA packaging. ϕYS40 also encodes two recombination proteins, gp12, a RecA/RadA recombinase, and gp114, an ssDNA-annealing protein of the ERF family. There are no gene products with detectable sequence similarity to known ssDNA-binding proteins ¹⁶ or DNA ligases.

The product of gene 65 is of particular interest for understanding the replication mechanism of ϕYS40. It shows a striking sequence similarity to a portion of the terminal protein (TP) of B. subtilis phage ϕ29. The Ser²³² residue of the TP protein forms a phosphoester bond with the 5'-terminal dAMP of the phage genome, and is essential for protein-primed replication of linear dsDNA genome of ϕ29 ¹⁷^-¹⁹. This serine is conserved in ϕYS40 gp65 (Fig. 2). Thus, it is likely that gp65 primes the replication of ϕYS40 genomic DNA. It should be noted that the ends of the ϕYS40 genome as presented in Fig. 1 are arbitrary, since no defined ends were revealed during genome sequencing and assembly, indicating that the ϕYS40 genome may be circularly permuted or may have direct terminal repeats. This matter requires further investigation.

Sequence alignment of the TP proteins.

Multiple alignment of terminal proteins (TP) from ϕ29 family phages and phage ϕYS40 gp65. The stretch of * indicates a region of a predicted amphipathic alpha-helix in TP. Distances, in amino acid residues, from the ends of each sequence and between blocks, are shown in parentheses. A white font in blue indicates the residue identical in all sequences compared, yellow shading indicates the conservation of hydrophobic residues, grey shading indicates the conservation of polar and charged residues. The white font in red indicates the Ser₂₃₂ that is essential for TP priming activity.

Properties of the ϕYS40 DNA polymerase

The ϕYS40 gp33 is a type A DNA polymerase, which contains a conserved nucleotidyltransferase domain and a 3'-5' exonuclease domain, but lacks the 5'→3' exonuclease domain. Since gp33 is the first known example of a type A DNA polymerase from a thermophilic phage, we expressed recombinant gp33 in E. coli and studied its properties in vitro. At 60-65 °C, recombinant gp33 exhibited moderate polymerization activity and very strong 3'→ 5'exonuclease activity toward both single-stranded DNA and double-stranded DNA substrates, even in the presence of 1 mM dNTP. As a result, at pH > 8.0 and low salt concentrations, the enzyme mostly hydrolyzed the primer. The increase of salt concentration partially inhibited the exonucleolytic activity and allowed primer elongation, until further increase inhibited the polymerase activity as well. The decay of primer-template substrate by gp33 exonuclease was abolished when primers were protected with thiolate modification, but the interference of the exonucleolytic activity during elongation resulted in poor DNA yield.

Gp33 was moderately thermostable. Both polymerase and exonuclease functions were lost after a 3-min incubation at 85 °C. At 75 °C, the polymerase activity decreased faster than the exonuclease activity; as a result, the enzyme produced shorter elongation products after heating. Similarly low thermostability has been reported for type B DNA polymerase from the Rhodothermus marinus phage (a half-life of 2 min at 90 °C ⁵). These observations indicate that both processivity of ϕYS40 DNA polymerase and its stability at elevated temperatures must be conferred by its interactions with other components of the replicative complex, in marked contrast with other DNA polymerases of bacteria and archaea, such as Taq or Pfu, which are processive and thermostable in the absence of cofactors.

Protein composition of ϕYS40 virions

To identify ϕYS40 structural proteins, ϕYS40 virions were purified by double sedimentation in CsCl gradients. The results of SDS-PAGE analysis of purified ϕYS40 virions are shown in Fig. 3. The two major protein components of the virion were identified by mass spectrometry as gp73 and gp19 (Fig. 3). These proteins may correspond to major head and tail proteins, but their function could not have been predicted by sequence comparison because of lack of database homologs.

SDS-PAGE analysis of the ϕYS40 virion proteins.

The SDS gel shows the protein composition of purified ϕYS40 virions. The two major bands identified by mass-spectrometry are indicated.

Three independent ϕYS40 lysates of increasing titer (from 2×10⁷ to 2×10⁹ pfu/ml) were also directly examined by multidimensional protein identification technology, MudPIT ²⁰ a shotgun proteomics approach where proteolytic peptides of a protein complex under study (in our case, phage virions) are generated, loaded onto triphasic microcapillary columns, eluted over several chromatography steps and analyzed directly by tandem mass spectrometry. Peptides matching 33 ϕYS40 proteins were detected in one or more of these samples. There were also 79 host proteins, all of which decreased in abundance when the lysates of higher titer were used as a starting material for CsCl purification (Supplementary Table A). In contrast, the NSAF (Normalized Spectral Abundance Factor, see Materials and Methods) values for ϕYS40 proteins increased with the titer of phage in the starting sample (Supplementary Table B). Gp73 and gp19 were detected at the highest levels in all three analyses in agreement with these being major structural proteins. With the exception of gp52 (annotated as a UDP-3-O-[3-hydroxymyristory] glucosamine N-acyltransferase), gp69 (tail sheath protein), gp79 (DEAD-Box helicase), gp150 (putative baseplate assembly protein), and gp152 (fibritin neck whisker), most ϕYS40 virion proteins identified in this analysis are novel proteins without any detectable database homologs. Interestingly, all multiply detected ϕYS40 virion proteins are the products of adjacent co-transcribed genes, except for ORF19 (Fig. 4C). In particular, a group of 13 proteins detected at high levels are the products of genes at the end of the largest cluster of ϕYS40 genes (ORF62-ORF146, above) that therefore may correspond to the late gene cluster.

MudPIT analysis of ϕYS40 lysates.

Normalized Spectral Abundance Factor (NSAF) values measured for ϕYS40 proteins detected in at least two of the three runs.

NSAFs for contaminating *T. thermophilus* proteins detected in at least two of the three runs.

All 33 ϕYS40 genes for which products were detected are plotted along the genome as a function of the measured NSAF values (when proteins were identified in several runs, maximal NSAF values are reported). The arrows under the x axis represent the position of the leftward and rightward predicted transcription clusters.

DISCUSSION

Bacteriophages may be the most abundant living entities on Earth. It has been proposed that the origin of dsDNA bacteriophages is as ancient as DNA replication itself and that the analysis of the currently known bacteriophages may provide clues to early evolution of cellular and viral genomes ¹⁵.

Here, we report a preliminary analysis of Thermus thermophilus bacteriophage ϕYS40 genome. The analysis shows that ϕYS40 does not easily fit into previously established groups of dsDNA bacterial viruses and may represent a distinct branch of the Myoviridae family. A substantial fraction of ϕYS40 genes codes for predicted proteins to which no function can be assigned; however, 25% of the ϕYS40-encoded proteins show detectable homology to their counterparts in a broad phylogenetic range of microorganisms, and some proteins are homologous to proteins found in other dsDNA bacteriophages infecting diverse hosts, such as Staphylococcus, Rhodothermus marinus, and Vibrio parahaemolyticus. In agreement with morphological data, predicted tail genes are mostly Myoviridae-related. Most of other ϕYS40 genes that have database homologs are, however, closer to either podoviral or siphoviral gene products: for instance, gp26 (RecB family exonuclease) and gp60 (dNMP kinase) are most closely related to homologs from a podovirus SIO1 and a λ-like siphovirus phi-BT1, respectively. Yet other genes are phylogenetically close to bacterial genes, and, in one case, to a homolog from a eukaryotic baculovirus. ϕYS40 has apparently arisen through multiple acts of recombination between different groups of phages and perhaps even their hosts.

Molecular adaptations to thermophily in various species are of great interest. Comparative studies of the genomes of thermophilic, hyperthermophilic, and mesophilic prokaryotes have suggested several attributes of thermostability at the levels of amino acid sequence, properties of folded proteins, and gene content. The proposed sequence level predictors of thermostability, such as large charged-versus-polar (CvP) amino acid ratio or (E + K)/(Q + H) ratio, are not conclusive in the case of ϕYS40, and genes that are indicative of the host ability to survive at extreme temperatures ²¹are missing from the ϕYS40 genome. Moreover, only seven ϕYS40 gene products have closest phylogenetic neighbors in thermophilic microorganisms.

In its genome size, ϕYS40 is similar to T4, an E. coli phage that is known to rely on host RNA polymerase for expression of its genes. During its development, T4 sequentially modifies host RNA polymerase to shut off transcription of host genes and to ensure correct expression of several classes of its own genes (reviewed in Ref. 22). Like T4, ϕYS40 does not encode its own RNA polymerase and therefore has to rely on the host enzyme for transcription of its DNA. The early genes of ϕYS40 should therefore be transcribed by the T. thermophilus RNA polymerase holoenzyme, most likely containing general initiation factor σ^A. Preliminary analysis reveals the presence of sequences with strong similarities to bacterial housekeeping sigma promoters in front of many ϕYS40 genes, but no such sequences are found in front of genes coding for ϕYS40 structural proteins (A. Sevostyanova, M. Gelfand and KS, unpublished observations). Structural genes, which should be expressed late in infection, must be therefore transcribed by a modified form of host RNA polymerase. Further biochemical studies may reveal ϕYS40 proteins that are required for these modifications.

MATERIALS AND METHODS

Cell growth and phage infection

The bacterial strain Thermus thermophilus HB8 and ϕYS40 were generously provided by Dr. Tairo Oshima, Tokyo University of Pharmacy & Life Science. The cells and phage were grown overnight in the Tth medium (0.8% polypeptone, 0.4% yeast extract, 0.2% NaCl, and 0.35 M CaCl₂ and 0.4 M MgSO₄) at 65°C with vigorous agitation.

To isolate individual ϕYS40 plaques, 1 ml of overnight HB8 culture (OD600∼1.6) was centrifuged and resuspended in 100 μl of the Tth medium and combined with 5 μl dilutions of ϕYS40 stock, incubated for 15 min at 65°C, plated in soft Tth agar (0.7 %), and incubated overnight at 65°C. An individual plaque was picked up and subjected to two more rounds of plaque purification, before making a phage lysate stock solution. To this end, a single plaque was resuspended in a small volume of the Tth medium and mixed with 0.1 ml of overnight HB8 culture. The mixture was incubated for 15 minutes at 65 °C to allow phage absorption, 5 ml of fresh Tth medium was added and the culture was incubated on a rotary shaker at 65 °C until complete lysis occurred (usually overnight). Cell debris was removed from the lysate by centrifugation at 12,000g for 15 minutes. The resultant phage stock (6×10⁹ pfu/ml) was saturated with chloroform and stored at 4 °C. The ϕYS40 stock was used to prepare larger amounts of phage lysate using a scale-up of the procedure described above.

Purification of ϕYS40 virions

DNase I and RNase A (each to a final concentration of 1 μg/ ml) were added to ϕYS40 lysed T. thermophilus culture followed by a 30-min incubation at 30°C. Solid NaCl was added a final concentration of 1 M and dissolved by swirling. The lysed culture was left on ice for 1 h and centrifuged at 11,000 g for 10 min at 4°C. To precipitate ϕYS40, PEG 8000 was added to the supernatant to the final concentration of 10% (w/v) followed by a 1-h incubation on ice. Precipitated ϕYS40 particles were recovered by centrifugation at 11,000g for 10 min at 4 °C. The phage pellet was resuspended in 2 ml of SM buffer (NaCl, MgSO₄,Tris-HCl, pH7.5, 2% gelatin). The PEG 8000 and cell debris were extracted from the phage suspension by adding an equal volume of chloroform and centrifuged at 3,000g for 15 min at 4°C. 0.5 g of solid CsCl per milliliter of bacteriophage suspension was added to the aqueous phase, which contained the bacteriophage particles, and dissolved by gentle mixing. CsCl step gradients (three steps with 1.45, 1.50, and 1.70 g/l density) were performed in Beckman SW41 polypropylene centrifuge tubes at 22,000 rpm for 2 hrs at 4 °C and at 38.000 rpm for 24 hrs at 4 °C (Beckman SW50.1 rotor, Beckman Coulter, Fullerton, CA). Purified bacteriophage suspension was dialyzed twice at room temperature for 1 h against a 1000-fold volume of 10 mM NaCl, 50 mM Tris-HCl pH 8.0, 10 mM MgCl₂.

Extraction of phage DNA

EDTA (to a final concentration of 20 mM), proteinase K (to a final concentration of 50 μg/ml), SDS (to a final concentration of 0.5%) were added to bacteriophage solution and incubated at 56°C for 1 h. An equal volume of phenol was added to chilled bacteriophage suspension, mixed, and centrifuged at 3000 g for 5 min at room temperature. The aqueous phase was extracted with a 50:50 mixture of equilibrated phenol and chloroform, and equal volume of chloroform. DNA was precipitated with ethanol.

Genome sequencing

Initial sequence data were obtained using mini shotgun library of phage DNA. Several rounds of sequencing reactions were performed directly on phage DNA using ThermoFidelase and Fimer technology²³^,²⁴. Trace assembly was done with phredPhrap package (http://www.phrap.com/) ²⁵. The final round of sequencing resulted in one pseudocircular contig with a no- errors quality level.

Sequence analysis

ORFs of ϕYS40 were predicted using the GeneMark server (http://opal.biology.gatech.edu/GeneMark/heuristic_hmm2.cgi, Ref. 26). The PSI-BLAST program ²⁷ was used to detect the homologs of ϕYS40 genes in the DNA and protein databases, with profile inclusion cutoff E-value in PSI-BLAST (-h parameter) set at 0.02. Both options for low-complexity filtering (-F parameter) and composition-based statistics (-t parameter) were sometime adjusted for better detection in sequence similarities. Phylogenetic analysis was performed using the programs in the PHYLIP package.²⁸

tRNA genes were searched by using the tRNAscan-SE program. ²⁹ Searches for the presence of the transmembrane helices and coiled coil regions were done with the aid of the SEALS package. ³⁰

MudPIT

Three independent virion lysates were prepared by double sedimentation in CsCl gradients and had phage titers of 2×10⁷ pfu/ml, 4.2×10⁸ pfu/ml and 2×10⁹ pfu/ml. These lysates were treated with for 30 minutes at 37°C with 0.1U of benzonase (Sigma, St. Louis, MO), then precipitated in 20% trichloroacetic acid, 100mM Tris-HCl, ph 8.5, overnight at 4°C. The dried protein pellets were denatured, reduced, alkylated and digested with endoproteinase LysC and trypsin (both from Roche Applied Science, Indianapolis, IN) as described previously. ³¹ Peptide mixtures were pressure-loaded onto split-triphasic microcapillary columns, installed in-line with a Quaternary Agilent 1100 series HPLC pump coupled to Deca-XP ion trap tandem mass spectrometer (ThermoElectron, San Jose, CA) and analyzed via seven-step chromatography as described in Ref. 31.

The MS/MS datasets were searched using SEQUEST ³² against a database of 171 YS40 predicted gene products, combined with 2224 protein sequences from Thermus thermophilus, strain HB8 (chromosome and large plasmid) downloaded from NCBI on 2005-08-01, as well as usual contaminants such as human keratins, IgGs, and proteases. In addition, to estimate background correlations, each sequence in the database was randomized (keeping the same amino acid composition and length) and the resulting “shuffled” sequences were concatenated to the “normal” sequences and searched at the same time (the total number of sequences searched was 5144).

DTASelect/CONTRAST program³³ was used to select spectra/peptide matches with normalized difference in cross-correlation score (DeltCn) of at least 0.11, a minimum cross-correlation score (XCorr) of 1.8 for singly-, 2.5 for doubly-, and 3.5 for triply-charged spectra, a maximum Sp rank of 10, and a minimal length of 7 amino acids. In addition, the peptides had to be fully tryptic. No peptides matching shuffled protein sequences passed this criteria set. Spectral counts are considered to be a good estimation of absolute protein abundance³⁴. To account for the fact that larger proteins tend to contribute more peptide/spectra, spectral counts are divided by protein length defining a Spectral Abundance Factor (SAF).³⁵ SAF values are normalized against the sum of all SAFs for each run (removing redundant proteins) allowing us to compare protein levels across different runs using the Normalized Spectral Abundance Factor (NSAF) value.

ϕYS40 DNA polymerase

The gene encoding ϕYS40 DNA polymerase was PCR amplified using appropriate primers annealing at the beginning and the end of ϕYS40 gene 33 and containing engineered NdeI site CATATG overlapping with the initiating ATG codon of gene 33 and a HindIII site downstream of the termination codon (primer sequences are available from the authors upon request). The amplified fragment with treated with NdeI and HindIII and cloned into appropriately digested pet21d plasmid and transformed into the E. coli expression strain BL-21 pLysS. Cells were grown in 1 L of LB medium and induced with 1 mM IPTG. Cell pellet was dissolved in 15 ml of lysis buffer and centrifuged at 17000 rpm for 30 min (no heat treatment). Lysate was diluted to 0.25M NaCl, and applied on a Heparin Sepharose High-Trap column (GE Healthcare, Newark, NJ), equilibrated with 50 mM Tris pH 7.5, containing 0.25 M NaCl and 2 mM mercaptoethanol. After washing with the same buffer, ϕYS40 DNA polymerase was eluted in about 0.3-0.35 M NaCl and appeared to be over 80% pure by SDS-PAGE. Assays of its enzymatic activities were done essentially as described by Pavlov et al., Ref. 36.

Supplementary Material

NIHMS14318-supplement-01.pdf^{(166.1KB, pdf)}

Acknowledgments

This work was supported by NIH RO1 grant GM64530 (to KS) and NIH GM61898 to Seth Darst. The authors thank Galina Glazko and Frank Emmert-Streib (both from Stowers Institute) for assistance on the gene clustering analysis and the analysis on codon usage, respectively.

References

1.Palm P, Schleper C, Grampp B, Yeats S, McWilliam P, Reiter WD, Zillig W. Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus shibatae. Virology. 1991;185:242–250. doi: 10.1016/0042-6822(91)90771-3. [DOI] [PubMed] [Google Scholar]
2.Arnold HP, Zillig W, Ziese U, Holz I, Crosby M, Utterback T, Weidmann JF, Umayam LA, Teffera K, Kristjansson JK, Klenk HP, Nelson KE, Fraser CM. A novel lipothrixvirus, SIFV, of the extremely thermophilic crenarchaeon Sulfolobus. Virology. 2000;267:252–266. doi: 10.1006/viro.1999.0105. [DOI] [PubMed] [Google Scholar]
3.Wiedenheft B, Stedman K, Roberto F, Willits D, Gleske AK, Zoeller L, Snyder J, Douglas T, Young M. Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses. J. Virol. 2004;78:1954–1961. doi: 10.1128/JVI.78.4.1954-1961.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Prangishvili D, Garrett RA, Koonin EV. Evolutionary genomics of archaeal viruses: Unique viral genomes in the third domain of life. Virus Res. 2006;117:52–67. doi: 10.1016/j.virusres.2006.01.007. [DOI] [PubMed] [Google Scholar]
5.Hjorleifsdottir S, Hreggvidsson GO, Fridjonsson OH, Aevarsson A, Kristjansson JK. Bacteriophage RM 378 of a thermophilic host organism. Patent: WO 0075335-A 14-DEC-2000 Decode Genetics EHF. 2000
6.Sakaki Y, Oshima T. Isolation and characterization of a bacteriophage infectious to an extreme thermophile, Thermus thermophilus HB8. J. Virol. 1975;15:1449–1453. doi: 10.1128/jvi.15.6.1449-1453.1975. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Durand D, Sankoff D. Tests for gene clustering. J. Comput. Biol. 2003;10:453–482. doi: 10.1089/10665270360688129. [DOI] [PubMed] [Google Scholar]
8.Miller ES, Kutter EM, Mosig G, Arisaka F, Kunisawa T, Rüger W. Bacteriophage T4 genome. Microbiol. Mol. Biol. Rev. 2003;67:86–156. doi: 10.1128/MMBR.67.1.86-156.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Miller ES, Heidelberg JF, Eisen JA, Nelson WC, Durkin AS, Ciecko A, Feldblyum TV, White O, Paulsen IT, Nierman WC, Lee J, Szczypinski B, Fraser CM. Complete genome sequence of the broad-host-range vibriophage KVP40: comparative genomics of a T4-related bacteriophage. J. Bacteriol. 2003;185:5220–5233. doi: 10.1128/JB.185.17.5220-5233.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Mesyanzhinov VV, Robben J, Grymonprez B, Kostyuchenko VA, Bourkaltseva MV, Sykilinda NN, Krylov VN, Volckaert G. The genome of bacteriophage ϕKZ of Pseudomonas aeruginosa. J. Mol. Biol. 2002;317:1–19. doi: 10.1006/jmbi.2001.5396. [DOI] [PubMed] [Google Scholar]
11.Matsugi J, Murao K, Ishikura H. Characterization of a B. subtilis minor isoleucine tRNA deduced from tDNA having a methionine anticodon CAT. J. Biochem. (Tokyo) 1996;119:811–816. doi: 10.1093/oxfordjournals.jbchem.a021312. [DOI] [PubMed] [Google Scholar]
12.Muramatsu T, Nishikawa K, Nemoto F, Kuchino Y, Nishimura S, Miyazawa T, Yokoyama S. Codon and amino-acid specificities of a transfer RNA are both converted by a single post-transcriptional modification. Nature. 1988;336:179–181. doi: 10.1038/336179a0. [DOI] [PubMed] [Google Scholar]
13.Muramatsu T, Yokoyama S, Horie N, Matsuda A, Ueda T, Yamaizumi Z, Kuchino Y, Nishimura S, Miyazawa T. A novel lysine-substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli. J. Biol. Chem. 1988;263:9261–9267. doi: 10.1351/pac198961030573. [DOI] [PubMed] [Google Scholar]
14.Nureki O, Niimi T, Muramatsu T, Kanno H, Kohno T, Florentz C, Giege R, Yokoyama S. Molecular recognition of the identity-determinant set of isoleucine transfer RNA from Escherichia coli. J. Mol. Biol. 1994;236:710–724. doi: 10.1006/jmbi.1994.1184. [DOI] [PubMed] [Google Scholar]
15.Filée J, Forterre P, Laurent J. The role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies. Res. Microbiol. 2003;154:237–243. doi: 10.1016/S0923-2508(03)00066-4. [DOI] [PubMed] [Google Scholar]
16.Ponomarev VA, Makarova KS, Aravind L, Koonin EV. Gene duplication with displacement and rearrangement: origin of the bacterial replication protein PriB from the single-stranded DNA-binding protein Ssb. J. Mol. Microbiol. Biotechnol. 2003;4:225–229. doi: 10.1159/000071074. [DOI] [PubMed] [Google Scholar]
17.Hermoso JM, Méndez E, Soriano F, Salas M. Location of the serine residue involved in the linkage between the terminal protein and the DNA of phage ϕ29. Nucleic Acids Res. 1985;13:7715–7728. doi: 10.1093/nar/13.21.7715. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Garmendia C, Salas M, Hermoso JM. Site-directed mutagenesis in the DNA linking site of bacteriophage ϕ29 terminal protein: isolation and characterization of a Ser232----Thr mutant. Nucleic Acids Res. 1988;16:5727–5740. doi: 10.1093/nar/16.13.5727. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Garmendia C, Hermoso JM, Salas M. Functional domain for priming activity in the phage ϕ29 terminal protein. Gene. 1990;88:73–79. doi: 10.1016/0378-1119(90)90061-u. [DOI] [PubMed] [Google Scholar]
20.Washburn MP, Wolters D, Yates JR., 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001;19:242–247. doi: 10.1038/85686. [DOI] [PubMed] [Google Scholar]
21.Makarova KS, Wolf YI, Koonin EV. Potential genomic determinants of hyperthermophily. Trends Genet. 2003;19:172–176. doi: 10.1016/S0168-9525(03)00047-7. [DOI] [PubMed] [Google Scholar]
22.Nechaev S, Severinov K. Bacteriophage-induced modifications of host RNA polymerase. Annu. Rev. Microbiol. 2003;57:301–322. doi: 10.1146/annurev.micro.57.030502.090942. [DOI] [PubMed] [Google Scholar]
23.Slesarev AI, Mezhevaya KV, Makarova KS, Polushin NN, Shcherbinina OV, Shakhova VV, Belova GI, Aravind L, Natale DA, Rogozin IB, Tatusov RL, Wolf YI, Stetter KO, Malykh AG, Koonin EV, Kozyavkin SA. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. U.S.A. 2002;99:4644–4649. doi: 10.1073/pnas.032671499. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Polushin N, Malykh A, Morocho AM, Slesarev A, Kozyavkin S. High-throughput production of optimized primers (fimers) for whole-genome direct sequencing. Methods Mol. Biol. 2005;288:291–304. doi: 10.1385/1-59259-823-4:291. [DOI] [PubMed] [Google Scholar]
25.Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–815. doi: 10.1101/gr.8.3.175. [DOI] [PubMed] [Google Scholar]
26.Besemer J, Borodovsky M. Heuristic approach to deriving models for gene finding. Nucleic Acids Res. 1999;27:3911–392. doi: 10.1093/nar/27.19.3911. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Altschul SF, Madden TI, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6. Department of Genome Sciences, University of Washington; Seattle: 2005. Distributed by the author. [Google Scholar]
29.Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Walker DR, Koonin EV. SEALS: a system for easy analysis of lots of sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1997;5:333–339. [PubMed] [Google Scholar]
31.Tomomori-Sato C, Sato S, Parmely TJ, Banks CA, Sorokina I, Florens L, Zybailov B, Washburn MP, Brower CS, Conaway RC, Conaway JW. A mammalian mediator subunit that shares properties with Saccharomyces cerevisiae mediator subunit Cse2. J. Biol. Chem. 2004;279:5846–5851. doi: 10.1074/jbc.M312523200. [DOI] [PubMed] [Google Scholar]
32.Eng J, McCormack AL, Yates JR., 3rd An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Amer. Mass Spectrom. 1994;5:976–989. doi: 10.1016/1044-0305(94)80016-2. [DOI] [PubMed] [Google Scholar]
33.Tabb DL, McDonald WH, Yates JR., 3rd. DTASelect and Contrast: Tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002;1:21–26. doi: 10.1021/pr015504q. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Liu H, Sadygov RG, Yates JR., 3rd. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004;76:4193–4201. doi: 10.1021/ac0498563. [DOI] [PubMed] [Google Scholar]
35.Powell DW, Weaver CM, Jennings JL, McAfee KJ, He Y, Weil PA, Link AJ. Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Mol. Cell. Biol. 2004;24:7249–7259. doi: 10.1128/MCB.24.16.7249-7259.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Pavlov AR, Belova GI, Kozyavkin SA, Slesarev AI. Helix-hairpin-helix motifs confer salt resistance and processivity on chimeric DNA polymerases. Proc. Natl. Acad. Sci. U. S. A. 2002;99:13510–13515. doi: 10.1073/pnas.202127199. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS14318-supplement-01.pdf^{(166.1KB, pdf)}

[R1] 1.Palm P, Schleper C, Grampp B, Yeats S, McWilliam P, Reiter WD, Zillig W. Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus shibatae. Virology. 1991;185:242–250. doi: 10.1016/0042-6822(91)90771-3. [DOI] [PubMed] [Google Scholar]

[R2] 2.Arnold HP, Zillig W, Ziese U, Holz I, Crosby M, Utterback T, Weidmann JF, Umayam LA, Teffera K, Kristjansson JK, Klenk HP, Nelson KE, Fraser CM. A novel lipothrixvirus, SIFV, of the extremely thermophilic crenarchaeon Sulfolobus. Virology. 2000;267:252–266. doi: 10.1006/viro.1999.0105. [DOI] [PubMed] [Google Scholar]

[R3] 3.Wiedenheft B, Stedman K, Roberto F, Willits D, Gleske AK, Zoeller L, Snyder J, Douglas T, Young M. Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses. J. Virol. 2004;78:1954–1961. doi: 10.1128/JVI.78.4.1954-1961.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Prangishvili D, Garrett RA, Koonin EV. Evolutionary genomics of archaeal viruses: Unique viral genomes in the third domain of life. Virus Res. 2006;117:52–67. doi: 10.1016/j.virusres.2006.01.007. [DOI] [PubMed] [Google Scholar]

[R5] 5.Hjorleifsdottir S, Hreggvidsson GO, Fridjonsson OH, Aevarsson A, Kristjansson JK. Bacteriophage RM 378 of a thermophilic host organism. Patent: WO 0075335-A 14-DEC-2000 Decode Genetics EHF. 2000

[R6] 6.Sakaki Y, Oshima T. Isolation and characterization of a bacteriophage infectious to an extreme thermophile, Thermus thermophilus HB8. J. Virol. 1975;15:1449–1453. doi: 10.1128/jvi.15.6.1449-1453.1975. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Durand D, Sankoff D. Tests for gene clustering. J. Comput. Biol. 2003;10:453–482. doi: 10.1089/10665270360688129. [DOI] [PubMed] [Google Scholar]

[R8] 8.Miller ES, Kutter EM, Mosig G, Arisaka F, Kunisawa T, Rüger W. Bacteriophage T4 genome. Microbiol. Mol. Biol. Rev. 2003;67:86–156. doi: 10.1128/MMBR.67.1.86-156.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Miller ES, Heidelberg JF, Eisen JA, Nelson WC, Durkin AS, Ciecko A, Feldblyum TV, White O, Paulsen IT, Nierman WC, Lee J, Szczypinski B, Fraser CM. Complete genome sequence of the broad-host-range vibriophage KVP40: comparative genomics of a T4-related bacteriophage. J. Bacteriol. 2003;185:5220–5233. doi: 10.1128/JB.185.17.5220-5233.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Mesyanzhinov VV, Robben J, Grymonprez B, Kostyuchenko VA, Bourkaltseva MV, Sykilinda NN, Krylov VN, Volckaert G. The genome of bacteriophage ϕKZ of Pseudomonas aeruginosa. J. Mol. Biol. 2002;317:1–19. doi: 10.1006/jmbi.2001.5396. [DOI] [PubMed] [Google Scholar]

[R11] 11.Matsugi J, Murao K, Ishikura H. Characterization of a B. subtilis minor isoleucine tRNA deduced from tDNA having a methionine anticodon CAT. J. Biochem. (Tokyo) 1996;119:811–816. doi: 10.1093/oxfordjournals.jbchem.a021312. [DOI] [PubMed] [Google Scholar]

[R12] 12.Muramatsu T, Nishikawa K, Nemoto F, Kuchino Y, Nishimura S, Miyazawa T, Yokoyama S. Codon and amino-acid specificities of a transfer RNA are both converted by a single post-transcriptional modification. Nature. 1988;336:179–181. doi: 10.1038/336179a0. [DOI] [PubMed] [Google Scholar]

[R13] 13.Muramatsu T, Yokoyama S, Horie N, Matsuda A, Ueda T, Yamaizumi Z, Kuchino Y, Nishimura S, Miyazawa T. A novel lysine-substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli. J. Biol. Chem. 1988;263:9261–9267. doi: 10.1351/pac198961030573. [DOI] [PubMed] [Google Scholar]

[R14] 14.Nureki O, Niimi T, Muramatsu T, Kanno H, Kohno T, Florentz C, Giege R, Yokoyama S. Molecular recognition of the identity-determinant set of isoleucine transfer RNA from Escherichia coli. J. Mol. Biol. 1994;236:710–724. doi: 10.1006/jmbi.1994.1184. [DOI] [PubMed] [Google Scholar]

[R15] 15.Filée J, Forterre P, Laurent J. The role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies. Res. Microbiol. 2003;154:237–243. doi: 10.1016/S0923-2508(03)00066-4. [DOI] [PubMed] [Google Scholar]

[R16] 16.Ponomarev VA, Makarova KS, Aravind L, Koonin EV. Gene duplication with displacement and rearrangement: origin of the bacterial replication protein PriB from the single-stranded DNA-binding protein Ssb. J. Mol. Microbiol. Biotechnol. 2003;4:225–229. doi: 10.1159/000071074. [DOI] [PubMed] [Google Scholar]

[R17] 17.Hermoso JM, Méndez E, Soriano F, Salas M. Location of the serine residue involved in the linkage between the terminal protein and the DNA of phage ϕ29. Nucleic Acids Res. 1985;13:7715–7728. doi: 10.1093/nar/13.21.7715. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Garmendia C, Salas M, Hermoso JM. Site-directed mutagenesis in the DNA linking site of bacteriophage ϕ29 terminal protein: isolation and characterization of a Ser232----Thr mutant. Nucleic Acids Res. 1988;16:5727–5740. doi: 10.1093/nar/16.13.5727. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Garmendia C, Hermoso JM, Salas M. Functional domain for priming activity in the phage ϕ29 terminal protein. Gene. 1990;88:73–79. doi: 10.1016/0378-1119(90)90061-u. [DOI] [PubMed] [Google Scholar]

[R20] 20.Washburn MP, Wolters D, Yates JR., 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001;19:242–247. doi: 10.1038/85686. [DOI] [PubMed] [Google Scholar]

[R21] 21.Makarova KS, Wolf YI, Koonin EV. Potential genomic determinants of hyperthermophily. Trends Genet. 2003;19:172–176. doi: 10.1016/S0168-9525(03)00047-7. [DOI] [PubMed] [Google Scholar]

[R22] 22.Nechaev S, Severinov K. Bacteriophage-induced modifications of host RNA polymerase. Annu. Rev. Microbiol. 2003;57:301–322. doi: 10.1146/annurev.micro.57.030502.090942. [DOI] [PubMed] [Google Scholar]

[R23] 23.Slesarev AI, Mezhevaya KV, Makarova KS, Polushin NN, Shcherbinina OV, Shakhova VV, Belova GI, Aravind L, Natale DA, Rogozin IB, Tatusov RL, Wolf YI, Stetter KO, Malykh AG, Koonin EV, Kozyavkin SA. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. U.S.A. 2002;99:4644–4649. doi: 10.1073/pnas.032671499. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Polushin N, Malykh A, Morocho AM, Slesarev A, Kozyavkin S. High-throughput production of optimized primers (fimers) for whole-genome direct sequencing. Methods Mol. Biol. 2005;288:291–304. doi: 10.1385/1-59259-823-4:291. [DOI] [PubMed] [Google Scholar]

[R25] 25.Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–815. doi: 10.1101/gr.8.3.175. [DOI] [PubMed] [Google Scholar]

[R26] 26.Besemer J, Borodovsky M. Heuristic approach to deriving models for gene finding. Nucleic Acids Res. 1999;27:3911–392. doi: 10.1093/nar/27.19.3911. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Altschul SF, Madden TI, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] 28.Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6. Department of Genome Sciences, University of Washington; Seattle: 2005. Distributed by the author. [Google Scholar]

[R29] 29.Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] 30.Walker DR, Koonin EV. SEALS: a system for easy analysis of lots of sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1997;5:333–339. [PubMed] [Google Scholar]

[R31] 31.Tomomori-Sato C, Sato S, Parmely TJ, Banks CA, Sorokina I, Florens L, Zybailov B, Washburn MP, Brower CS, Conaway RC, Conaway JW. A mammalian mediator subunit that shares properties with Saccharomyces cerevisiae mediator subunit Cse2. J. Biol. Chem. 2004;279:5846–5851. doi: 10.1074/jbc.M312523200. [DOI] [PubMed] [Google Scholar]

[R32] 32.Eng J, McCormack AL, Yates JR., 3rd An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Amer. Mass Spectrom. 1994;5:976–989. doi: 10.1016/1044-0305(94)80016-2. [DOI] [PubMed] [Google Scholar]

[R33] 33.Tabb DL, McDonald WH, Yates JR., 3rd. DTASelect and Contrast: Tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002;1:21–26. doi: 10.1021/pr015504q. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Liu H, Sadygov RG, Yates JR., 3rd. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004;76:4193–4201. doi: 10.1021/ac0498563. [DOI] [PubMed] [Google Scholar]

[R35] 35.Powell DW, Weaver CM, Jennings JL, McAfee KJ, He Y, Weil PA, Link AJ. Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Mol. Cell. Biol. 2004;24:7249–7259. doi: 10.1128/MCB.24.16.7249-7259.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] 36.Pavlov AR, Belova GI, Kozyavkin SA, Slesarev AI. Helix-hairpin-helix motifs confer salt resistance and processivity on chimeric DNA polymerases. Proc. Natl. Acad. Sci. U. S. A. 2002;99:13510–13515. doi: 10.1073/pnas.202127199. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Thermus thermophilus bacteriophage ϕYS40 genome and proteomic characterization of virions

Tatyana Naryshkina

Jing Liu

Laurence Florens

Selene K Swanson

Andrey R Pavlov

Nadejda V Pavlova

Ross Inman

Sergei A Kozyavkin

Michael Washburn

Arcady Mushegian

Konstantin Severinov

Abstract

Introduction

Results

Overview of the ϕYS40 genome

Table 1.

Figure 1.

tRNA genes

Sequence analysis of predicted ϕYS40 proteins

ϕYS40 proteins involved in nucleotide metabolism

ϕYS40 proteins involved in DNA replication and recombination

Figure 2.

Properties of the ϕYS40 DNA polymerase

Protein composition of ϕYS40 virions

Figure 3.

Figure 4.

DISCUSSION

MATERIALS AND METHODS

Cell growth and phage infection

Purification of ϕYS40 virions

Extraction of phage DNA

Genome sequencing

Sequence analysis

MudPIT

ϕYS40 DNA polymerase

Supplementary Material

Acknowledgments

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases