Table 3.
Protein family hits to described proteins.
Family | DB | Tool | Best hit (protein) | Best hit (species) | Curated annotation |
---|---|---|---|---|---|
1217 | nr | blastp & hmmsearch | unknown | Veillonella sp. CAG:933 | Bacterial protein |
532 | nr | hmmsearch | hypothetical protein | M. rupellensis | Bacterial protein |
565b | nr | blastp | hypothetical protein H257_12751 | A. astaci | Putative replication protein, viral or bacterial |
565b | nr | hmmsearch | putative replication protein | Phytophthora parasitica virus | Putative replication protein, viral or bacterial |
956b | nr | blastp & hmmsearch | hypothetical protein | C. trachomatis | Torque Teno virus ORF |
Four of the 32 ORFan protein families match proteins in the nr database. For each family, the table lists: (1) the database containing the hit, (2) the tool used to detect the similarity, (3) the description of the highest-scoring hit, (4) the annotated species of the highest-scoring hit, and (5) our manually curated annotation, based on all significant hits for the protein family. Manual annotation was required because the best hit for a sequence does not always correspond to the most plausible annotation, either because the hit's metadata is wrong or because the hit is a distant relative of a protein conserved in many different organisms.