Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2019 Apr 16;10:806. doi: 10.3389/fmicb.2019.00806

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

Copyright © 2019 Ponsero and Hurwitz.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

PMC Copyright notice

Influence of the training set composition on the model performances. (A) Recall of VirFinder “phage-prok” model on viral contigs isolated in various aquatic ecosystems. The recall was assessed for VirFinder “phage-prok” model when considering viral contigs isolated in various aquatic ecosystems (pelagic, freshwater, hot spring, coral-associated and wastewater). The sequences were downloaded from the IMG V/R env database (methods described in Supplementary File S1 and list of metagenomes used in Supplementary File S2). The viral sequences were broken down to 5000, 3000, 1000, and 500 bp and used to evaluate VirFinder “phage-prok” model. The mean of the recall was calculated for three evaluation sets of 2000 viral sequences each with the exception of the coral-associated evaluation sets composed of 200 viral examples due to the low amount of sequence available for this ecosystem. The error bars correspond to the standard deviation on the three measures. (B) F1-score of classifiers trained on Tara Oceans Metagenomes. Tara-trained models were trained on 10 000 viral and 10 000 prokaryotic sequences from Tara Oceans metagenomes and viromes broken down to 5000 bp. Previous cleaning steps were performed to ensure a low contamination content of the training set (see Supplementary File S1). The F1-score of a Tara-trained model and of VirFinder’s “phage-prok” model was calculated for evaluation sets composed of viruses and prokaryotes isolated in a marine ecosystem (“marine genomes”) or an evaluation set composed of viral and prokaryotic genomes regardless of their origin (“all genomes”). For the “marine evaluation set,” genomes from phages and prokaryotes isolated in marine ecosystems were downloaded from Genbank and the Patric database, respectively, and the sequences were broken down to 5000, 3000, 1000, and 500 bp (see methods in Supplementary File S1 and list of genomes available in Supplementary File S2). The “all genomes” evaluation set is composed of genomes from phages and prokaryotes from RefSeq database published after 2014 (see methods in Supplementary File S1 and list of genomes available in Supplementary File S2). The mean of the F1-score was calculated for three evaluation sets composed of 2000 viral sequences and 2000 prokaryotic sequences. The error bars correspond to the standard deviation on the three measures.