TABLE 2.
Factors to compare between databases | GenBank | MicroSeq | RDP-II | RIDOM | Our internal database |
---|---|---|---|---|---|
Total no. of sequences in database^a | About 20,000,000, with about 90,000 16S rRNA gene sequences | About 1,400 | About 6,000 | More than 300 | About 1,500 clinical strains |
Was a similar sequence in the database? | Yes, several; closest were G. bergeri (Y13365) and an unnamed oral strain | No, but Gemella haemolysans was identified as the closest relative at 6.5% difference; thus, one must also compare to other databases | Yes | No | Yes; 0% difference from isolate from finger abscess |
Can sequences be imported from another database? | Yes; import best selected sequences from GenBank | | | | |
Software comparison of our isolate with G. bergeri (Y13365) | 0% difference, 100% homology to G. bergeri (Y13365) | 0% difference from imported G. bergeri (Y13365) | 0.963 related to G. bergeri (Y13365) in database | NA | |
Comparison with a second tier of closely related strains | There are two sequences deposited as G. haemolysans ATCC 10379; the one with GenBank no. L14326 is of good quality, and the one with GenBank no. M58799 has too many N's; L14326 is 93% similar | G. morbillorum and G. haemolysans are equally related at 6.52% dissimilarity | G. haemolysans ATCC 10379 is related at 0.677; these sequences are generally imported from GenBank; the poor-quality sequence, M58799, looks as if it is more closely related, with a relative relatedness of 0.739 | NA | |

^a For all the public databases, total numbers of sequences are increasing rapidly.
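
The percent-difference figures in the table, and the way a sequence with many N's can appear more closely related than it really is, can be illustrated with a minimal sketch. It assumes two sequences that are already aligned to the same length. The function name `percent_difference`, the `skip_ambiguous` flag, and the toy sequences are hypothetical and are not taken from any of the databases or software packages listed; each package uses its own alignment and scoring method.

```python
def percent_difference(seq_a: str, seq_b: str, skip_ambiguous: bool = True) -> float:
    """Percent of compared positions at which two pre-aligned sequences differ.

    Assumption: both sequences are already aligned and of equal length.
    When skip_ambiguous is True, positions where either base is 'N' are
    left out of both the mismatch count and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if skip_ambiguous and "N" in (a, b):
            continue  # ambiguous base call: not counted either way
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy sequences for illustration only (not real 16S rRNA data).
isolate = "ACGTACGTAC"
good_reference = "ACGAACTTAC"  # two true mismatches against the isolate
poor_reference = "ACGNACNTAC"  # same entry, but the mismatches are called as N

print(percent_difference(isolate, isolate))                # 0.0
print(percent_difference(isolate, good_reference))         # 20.0
print(percent_difference(isolate, poor_reference))         # 0.0
print(percent_difference(isolate, poor_reference, False))  # 20.0
```

In this toy case the poor-quality reference scores as identical to the isolate only because its uncertain positions are skipped, which is one way an N-rich deposit can look more closely related than a good-quality sequence of the same strain.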