TABLE 2.
In-depth analysis, using contextual information, of a random subset of proteins classified as genus restricted by all automated methods
Accession no. | Protein name | Taxonomic position (family, genus, species) | Taxonomic distribution found by automated methods | Taxonomic distribution found by in-depth manual analysis | Type of protein (reference) | Evidence |
---|---|---|---|---|---|---|
YP_227375 | Large coat protein | Secoviridae, unclassified, Strawberry latent ringspot virus | 2 species | >40 families | Contains two capsid domains with a jellyroll fold (PFAM clan Viral_ssRNA_CP) (49) | Marginal HHpred hits to PFAM family RhV (E = 0.28 for region 241–308 and E = 7.5 for region 43–159), HHalign comparison with the RhV alignment (E = 5 × 10⁻²), functional confirmation (50) |
NP_042511 | Coat protein | Barnaviridae, Barnavirus, Mushroom bacilliform virus | 1 species | >40 families | Capsid with a jellyroll fold (PFAM clan Viral_ssRNA_CP) (49) | Marginal HHpred hit (E = 0.2) to PFAM family Viral_coat for region 67–181, HHalign comparison with Viral_coat (E = 1.4 × 10⁻³), functional confirmation (51) |
YP_308882 | 6K2 | Potyviridae, Ipomovirus, Cucumber vein yellowing virus | 4 species | 4 genera in the family Potyviridae (Ipomovirus, Poacevirus, Tritimovirus, Macluravirus) | 6K2 (87) | Subsignificant CSI-BLAST hits to other 6K2 proteins (also located between the CI and VPg proteins); significant HHalign comparison between full-length 6K2 of Ipomovirus and 6K2 of Tritimovirus, Poacevirus, and Macluravirus (E = 2.9 × 10⁻⁷) |
NP_776026 | Putative matrix protein M | Flaviviridae, Flavivirus, Tamana bat virus | 1 species | At least 1 whole genus (Flavivirus); may also be homologous to the matrix protein of the related genus Pegivirus | Membrane protein (52) | Subsignificant CSI-BLAST hits of flavivirus M proteins to the Tamana bat virus M protein, which occupies the same position within the polyprotein (between the C and E proteins); significant HHalign comparison between region 53–159 of Tamana bat virus M and the M proteins of other flaviviruses (E = 6 × 10⁻⁶) |
NP_778215 | VPg | Unclassified, Sobemovirus, Turnip rosette virus | 1 species | 1 whole genus (Sobemovirus) | Genome-linked protein VPg (88) | CSI-BLAST finds significant full-length matches to the VPg of many sobemoviruses; further iterative sequence searches identify the VPgs of all other sobemoviruses as homologs, all occupying the same position within the 2a/2b polyprotein (downstream of the serine protease domain) |
YP_293702 | P9 | Closteroviridae, Crinivirus, Tomato chlorosis virus | 1 genus | 1 genus | P9 (53) | |
NP_619694 | Hypothetical protein p34 | Closteroviridae, Crinivirus, Lettuce infectious yellows virus | 1 species | 2 speciesᵃ | Endonuclease | HHpred hit to RNase Dicerᵃ (PDBᵇ accession no. 3c4b, E = 2 × 10⁻⁴) |
NP_851570 | p22 | Closteroviridae, Crinivirus, Cucurbit yellow stunting disorder virus | 1 species | 1 species | | |
YP_293697 | p5 | Closteroviridae, Crinivirus, Tomato chlorosis virus | 1 species | 1 species | | |
YP_053926 | Hypothetical peptide | Comoviridae, Nepovirus (subgroup A), Tobacco ringspot virus | 1 species | 1 species | | |
ᵃ p34 has homologs in only two viral species but is homologous to a vast family of RNases from cellular organisms; it therefore most probably originated by horizontal transfer, which is beyond the scope of this study (see Materials and Methods and Discussion).
ᵇ PDB, Protein Data Bank.
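The two "Taxonomic distribution" columns can be compared mechanically to see which entries were broadened by manual analysis. The Python sketch below transcribes only those two columns, with the longer descriptions shortened (for example, "4 genera in the family Potyviridae (...)" becomes "4 genera"). It then flags rows whose manual distribution is broader than the automated one. The names `TABLE_2`, `RANKS`, `breadth`, and `broadened` are hypothetical. The ordering rule (taxonomic rank first, then a lower-bound count, with ">40" read as 40) is an illustrative assumption and not the authors' procedure.

```python
import re

# Columns 4 and 5 of Table 2: (accession, automated distribution, manual distribution).
# Long descriptions are shortened; this transcription is illustrative only.
TABLE_2 = [
    ("YP_227375", "2 species", ">40 families"),
    ("NP_042511", "1 species", ">40 families"),
    ("YP_308882", "4 species", "4 genera"),
    ("NP_776026", "1 species", "at least 1 whole genus"),
    ("NP_778215", "1 species", "1 whole genus"),
    ("YP_293702", "1 genus", "1 genus"),
    ("NP_619694", "1 species", "2 species"),
    ("NP_851570", "1 species", "1 species"),
    ("YP_293697", "1 species", "1 species"),
    ("YP_053926", "1 species", "1 species"),
]

# Assumed ordering of taxonomic ranks, narrowest first.
RANKS = {"species": 0, "genus": 1, "genera": 1, "family": 2, "families": 2}


def breadth(text: str) -> tuple[int, int]:
    """Hypothetical helper mapping a distribution string to (rank, lower-bound count).

    ">40 families" -> (2, 40); "at least 1 whole genus" -> (1, 1).
    """
    count = int(re.search(r"\d+", text).group())
    rank = next(RANKS[word] for word in text.split() if word in RANKS)
    return rank, count


def broadened(rows: list[tuple[str, str, str]]) -> list[str]:
    """Return accessions whose manual distribution ranks broader than the automated one."""
    return [acc for acc, auto, manual in rows if breadth(manual) > breadth(auto)]


if __name__ == "__main__":
    changed = broadened(TABLE_2)
    print(f"{len(changed)} of {len(TABLE_2)} proteins broadened: {', '.join(changed)}")
```

Python compares tuples lexicographically. Rank therefore dominates, and the count only breaks ties within the same rank, which is how "1 species" versus "2 species" for p34 is caught. Open-ended phrases such as ">40 families" are deliberately reduced to their lower bound.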