2011 Jun 27;39(Web Server issue):W38–W44. doi: 10.1093/nar/gkr441

Table 1.

Databases used by the FFAS server that were added or significantly modified since 2005 [databases of profiles such as PDB, PfamA, SCOP and COG, added before 2005, are regularly updated; for details, see (1)]

Database Sources and preparation of the data
Profile preparation database used to calculate sequence profiles
    NR85S (sequences) The NR database from the National Center for Biotechnology Information (NCBI) and the following sets of metagenomic sequences: Global Ocean Sampling (GOS) data from the JCVI and CAMERA consortia (6), microbial metagenome samples from the Joint Genome Institute (http://imgweb.jgi-psf.org/cgi-bin/m/main.cgi), human gut metagenome samples from the Hattori Lab (24), the Human Oral Microbiome Database from The Forsyth Institute (http://www.homd.org/index.php), and the human gut dataset from the Meta-HIT consortium (7). All sequences have been clustered at 85% sequence identity with the CD-HIT program (25). Low-complexity regions have been masked with the SEG program (26).
New annotation databases available for profile–profile searches by FFAS
    VFDB (profiles) Virulence Factors Database (VFDB) (27) from http://www.mgc.ac.cn/VFs/
    HUMSAVAR (profiles) Human polymorphisms and disease mutations (HUMSAVAR) (28) from http://www.uniprot.org/docs/humsavar. Proteins containing >1000 residues were split into overlapping fragments of 500 residues.
    Complete human proteome (profiles) The set of sequences of canonical isoforms of human proteins has been downloaded from the UniProt database page of Complete Proteomes (http://www.uniprot.org/taxonomy/complete-proteomes). Proteins containing >600 residues were split into overlapping fragments of 300 residues. Signal peptides predicted with SignalP (29) were removed from all sequences (similarities between signal peptides present in different proteins tend to increase the number of false positives in profile–profile searches).
    Selected microbial proteomes (pathogens and members of human microbiome) and two eukaryotic proteomes (profiles) The proteomes of Bacillus anthracis, Borrelia burgdorferi, Bacteroides thetaiotaomicron, Caulobacter crescentus, Chlamydia trachomatis, Escherichia coli, Eubacterium rectale, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycobacterium tuberculosis, Neisseria meningitidis, Staphylococcus aureus, Saccharomyces cerevisiae, Salmonella typhi, Thermotoga maritima and Yersinia pestis have been downloaded from the NCBI database of complete microbial genomes (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). When multiple strains of the same organism were available, the strain with the most references in the literature was used. Signal peptides predicted with SignalP were removed from all sequences. Proteins containing >1000 residues were split into overlapping fragments of 500 residues.
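
The fragment-splitting step described in the table (proteins with >1000 residues split into overlapping 500-residue fragments, or >600 into 300-residue fragments for the human proteome) can be illustrated with a minimal Python sketch. The table does not state how much adjacent fragments overlap, so the 50% overlap used here is an assumption, as are the function name split_into_fragments and the choice to anchor the last fragment at the C-terminus so that every fragment has full length. This is an illustration under those assumptions, not the FFAS server's actual code.

```python
def split_into_fragments(seq, max_len=1000, frag_len=500, overlap=250):
    """Split a long protein sequence into overlapping fixed-length fragments.

    Sequences of at most max_len residues are returned whole.
    Returns a list of (start_offset, fragment) tuples, 0-based offsets.
    The overlap value is an assumption; the source table does not give it.
    """
    if not 0 <= overlap < frag_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frag_len")
    if frag_len > max_len:
        raise ValueError("frag_len must not exceed max_len")
    if len(seq) <= max_len:
        return [(0, seq)]

    step = frag_len - overlap
    fragments = []
    start = 0
    while True:
        end = start + frag_len
        if end >= len(seq):
            # Anchor the final fragment at the C-terminus so it keeps
            # the full fragment length (a design choice of this sketch).
            start = len(seq) - frag_len
            fragments.append((start, seq[start:]))
            break
        fragments.append((start, seq[start:end]))
        start += step
    return fragments


# Thresholds taken from the table; the overlaps are assumed values.
long_protein = "A" * 1200
default_frags = split_into_fragments(long_protein)  # >1000 -> 500-residue pieces
human_frags = split_into_fragments("M" * 700, max_len=600, frag_len=300, overlap=150)
```

With these settings, each fragment can be profiled independently and its start offset used to map any hit back onto the coordinates of the full-length protein.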