Abstract
The Fungal Secretome KnowledgeBase (FunSecKB) provides a resource of secreted fungal proteins, i.e. secretomes, identified from all available fungal protein data in the NCBI RefSeq database. The secreted proteins were identified using a well evaluated computational protocol which includes SignalP, WolfPsort and Phobius for signal peptide or subcellular location prediction, TMHMM for identifying membrane proteins, and PS-Scan for identifying endoplasmic reticulum (ER) target proteins. The entries were mapped to the UniProt database and any annotations of subcellular locations that were either manually curated or computationally predicted were included in FunSecKB. Using a web-based user interface, the database is searchable, browsable and downloadable by using NCBI’s RefSeq accession or gi number, UniProt accession number, keyword or by species. A BLAST utility was integrated to allow users to query the database by sequence similarity. A user submission tool was implemented to support community annotation of subcellular locations of fungal proteins. With the complete fungal data from RefSeq and associated web-based tools, FunSecKB will be a valuable resource for exploring the potential applications of fungal secreted proteins.
Database URL: http://proteomics.ysu.edu/secretomes/fungi.php
Introduction
Fungi play an important role in carbon cycling: they use secreted enzymes to break down lignocelluloses and other biopolymers and then transport the resulting products into their cells as food. The secreted proteins of plant-associated fungi play important roles in plant-fungus symbiosis and in fungal pathogenicity (1). Fungal secreted proteins also play important roles in the development of fungal diseases in humans (2,3). Secreted fungal enzymes have found a wide range of applications in the food, feed, pulp and paper, bioethanol and textile industries (4).
Signal-peptide dependent secreted proteins contain a signal peptide (SP) at the N-terminus that directs the ribosomes to the rough endoplasmic reticulum (ER) for completing polypeptide synthesis (5,6). The signal peptide, typically 15–30 amino acids long and consisting of 15–20 hydrophobic amino acid residues, is cleaved off during translocation across the membrane. While some proteins without an N-terminal signal peptide can be found in the ER and the Golgi, over 90% of human secreted proteins (7) and ∼90% of the Aspergillus niger extracellular proteins identified by mass spectrometry contain classical N-terminal signal peptides (8). There are also examples of non-classically secreted proteins in fungi, including the Saccharomyces cerevisiae mating pheromone a-factor (9) and two galectins from Coprinus cinereus (10), but it is generally believed that the vast majority of secreted fungal proteins are processed by the classical secretory pathway (8).
The term secretome is often used to refer to the complete set of secreted proteins in an organism (2,11,12). However, the term has also been used to include the set of proteins involved in the secretory pathway (13,14). In the work described here, the secretome includes only the secreted proteins of an organism. As the number of species with completely sequenced genomes has grown, so has the number of publications on fungal secretome identification and analysis using both computational and experimental approaches (15). For example, secretomes have been reported for A. niger (8), Candida albicans (16), Phanerochaete chrysosporium (17), Sclerotinia sclerotiorum (18), Fusarium graminearum (19) and Ustilago maydis (20). Considering the biological importance of secreted proteins and their potential industrial applications, we developed a knowledgebase of fungal secretomes for identification, annotation and curation of both computationally predicted and experimentally identified fungal secreted proteins. This knowledgebase is designed to serve as a central portal for providing as well as collecting information on fungal secretomes.
Data collection and database implementation
The fungal protein sequences were retrieved from the NCBI Reference Sequence collection (RefSeq) database (release April, 2010) (http://www.ncbi.nlm.nih.gov/RefSeq/). The rationale for choosing the RefSeq protein data set was that RefSeq provides a comprehensive, integrated, non-redundant, well-annotated set of proteins, and that the corresponding nucleotide sequences are linked to these protein sequences in the database (21). The data in the fungal secretome knowledgebase (FunSecKB) were obtained from the following three sources: (i) the features predicted using computational approaches; (ii) subcellular locations annotated in UniProtKB; and (iii) our manual curation with experimental evidence obtained from recent literature.
Computational methods for prediction of secreted proteins
The fungal protein sequences downloaded from the NCBI RefSeq database were processed using SignalP (version 3.0, http://www.cbs.dtu.dk/services/SignalP/) (22), Phobius (http://phobius.binf.ku.dk/) (23,24), WolfPsort (http://wolfpsort.org/) (25,26) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) (27) for signal peptide and subcellular location prediction. We chose these four predictors because they were previously evaluated favorably and are widely used by the fungal secretome research community (8,16,28). TMHMM (http://www.cbs.dtu.dk/services/TMHMM) was used to identify proteins having transmembrane domains (29) and PS-Scan (http://www.expasy.org/tools/scanprosite/) was used to scan for the ER targeting sequence (Prosite: PS00014) (30). With each of the programs, the default parameters for eukaryotes or fungi were used. For SignalP prediction, only entries that were predicted to have a ‘most likely cleavage site’ by the SignalP-NN algorithm and a ‘signal peptide’ by the SignalP-HMM algorithm were considered to be true signal peptide ‘positives’, using the N-terminal 70 amino acids (22). For predicting membrane proteins using TMHMM, the entries having membrane domains not located within the N-terminus (the first 70 amino acids) were treated as real membrane proteins. Protein sequences predicted to have a signal peptide by SignalP were further processed using FragAnchor to identify glycosylphosphatidylinositol (GPI) anchors (http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html) (31). Proteins predicted as having a GPI anchor may be attached to the outside of the plasma membrane or may be secreted and targeted to the cell wall (32).
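Three of the filtering rules described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build FunSecKB: the function names are hypothetical, the regular expression is our reading of the PROSITE PS00014 pattern ([KRHQSA]-[DENQ]-E-L at the C-terminus), and "located within the N-terminus" is assumed to mean that a predicted helix ends at or before residue 70.

```python
import re

N_TERMINAL_WINDOW = 70  # residues; the window size stated in the text

# Assumed regex form of PROSITE PS00014 (ER targeting sequence):
# [KRHQSA]-[DENQ]-E-L anchored at the C-terminus of the protein.
ER_TARGET = re.compile(r"[KRHQSA][DENQ]EL$")


def n_terminal_region(sequence, window=N_TERMINAL_WINDOW):
    """Return the N-terminal residues that a signal peptide predictor would inspect."""
    return sequence[:window]


def has_er_targeting_signal(sequence):
    """True if the sequence ends with a KDEL-like ER targeting motif."""
    return ER_TARGET.search(sequence.upper()) is not None


def is_membrane_protein(helices, window=N_TERMINAL_WINDOW):
    """Decide membrane status from predicted transmembrane helices.

    helices: list of (start, end) 1-based residue positions (hypothetical
    representation of a TM predictor's output). A helix that lies entirely
    within the first `window` residues is ignored, since it may be a signal
    peptide mistaken for a transmembrane segment.
    """
    return any(end > window for _start, end in helices)
```

For example, `is_membrane_protein([(5, 27)])` is false under this rule because the only predicted helix sits inside the N-terminal window, while `is_membrane_protein([(5, 27), (120, 142)])` is true.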
We recently evaluated the accuracy of the computational methods using 241 experimentally identified secreted proteins and 5992 non-secreted proteins in fungi that were retrieved from the UniProt/Swiss-Prot data set, and found that the highest prediction accuracy (92.1% in sensitivity and 98.9% in specificity) was achieved by combining SignalP, WolfPsort and Phobius for signal peptide prediction, TMHMM for eliminating membrane proteins, and PS-Scan for removing ER targeting proteins (28). Thus, the secretomes defined in this study include the manually curated secreted proteins along with the proteins predicted as having a signal peptide at their N-terminus by SignalP and Phobius and with a subcellular location predicted as extracellular by WolfPsort, but not having a transmembrane domain or an ER targeting signal. The information provided by TargetP and FragAnchor was also included in the annotation, which may be useful for identifying mitochondrial targeted proteins or GPI anchored membrane or cell wall proteins. An overview of the database’s features is shown in Figure 1.
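The secretome definition above amounts to a conjunction of the individual tool calls, with manual curation taking precedence. The following minimal sketch expresses that rule together with the standard definitions of sensitivity and specificity. The class and field names are hypothetical, and the booleans are assumed to have been derived beforehand from each tool's output.

```python
from dataclasses import dataclass


@dataclass
class Predictions:
    """Per-protein calls; field names are illustrative, not a real tool API."""
    signalp_sp: bool               # signal peptide according to SignalP
    phobius_sp: bool               # signal peptide according to Phobius
    wolfpsort_extracellular: bool  # WolfPsort location is extracellular
    has_tm_domain: bool            # TMHMM transmembrane domain (outside N-terminus)
    has_er_signal: bool            # PS-Scan hit for the ER targeting sequence
    curated_secreted: bool = False # manually curated as secreted


def in_secretome(p):
    """Apply the combined rule described in the text."""
    if p.curated_secreted:
        return True
    return (
        p.signalp_sp
        and p.phobius_sp
        and p.wolfpsort_extracellular
        and not p.has_tm_domain
        and not p.has_er_signal
    )


def sensitivity(true_pos, false_neg):
    """Fraction of known secreted proteins that are predicted as secreted."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of known non-secreted proteins that are predicted as non-secreted."""
    return true_neg / (true_neg + false_pos)
```

With made-up counts, `sensitivity(9, 1)` gives 0.9. Requiring agreement among the predictors trades some sensitivity for fewer false positives, which is the design choice the evaluation in (28) supports.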
Linking RefSeq proteins to UniProtKB annotation
The fungal protein entries in FunSecKB are linked to UniProtKB using the mapping information generated by UniProtKB (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/) (33). We also integrated the subcellular location information of fungal proteins annotated in UniProtKB, both curated (reviewed, from the UniProtKB/Swiss-Prot data set) and predicted (unreviewed, from the UniProtKB/TrEMBL data set). In addition, we included manually curated protein entries in the UniProtKB/Swiss-Prot data set that could not be mapped to entries in the RefSeq database.
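A mapping step of this kind could be sketched as below. The sketch assumes the mapping file is tab-separated with three columns (UniProtKB accession, identifier type, external identifier) and that RefSeq protein accessions are labeled `RefSeq`. Both points are assumptions about the file layout and should be checked against the README distributed with the mapping files. The accessions in the usage example are invented.

```python
from collections import defaultdict


def refseq_to_uniprot(lines):
    """Build a RefSeq accession -> set of UniProtKB accessions lookup.

    lines: any iterable of text lines, e.g. an open file handle.
    Lines that do not have exactly three tab-separated fields are skipped.
    """
    mapping = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        uniprot_acc, id_type, external_id = parts
        if id_type == "RefSeq":
            mapping[external_id].add(uniprot_acc)
    return dict(mapping)


# Usage with invented identifiers:
example = [
    "X00001\tRefSeq\tXP_000001.1\n",
    "X00001\tGI\t123456\n",
    "X00002\tRefSeq\tXP_000001.1\n",
]
lookup = refseq_to_uniprot(example)
```

A set is used as the value because one RefSeq entry may map to more than one UniProtKB accession.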
Manual curation and community annotation
FunSecKB supports community curation of subcellular locations of fungal proteins based on published experimental evidence. A submission form was developed for users to provide subcellular location annotation and the literature source to support the annotation. After our curator’s validation, these data will be incorporated into the database. Currently we have manually curated more than two hundred secreted proteins from A. niger (8). Manual curation is an ongoing process, thus additional secreted proteins will be manually curated and integrated into the database with time.
The information from the above three sources is integrated in the annotation (Figure 1). The annotated entries are linked to the RefSeq database in NCBI and to UniProtKB, as well as to related literature for entries manually curated by our curators or the community. The data will be updated when a new RefSeq data set is released by NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/).
Data access
FunSecKB can be accessed through the database web interface at http://proteomics.ysu.edu/secretomes/fungi.php. The data can be accessed in three ways: (i) search individual proteins using NCBI’s RefSeq gi or accession number, UniProt accession number, keyword or by species; (ii) search or download the whole secretome or a subset of manually curated secreted proteins of a species; and (iii) search all fungal proteins or fungal secreted proteins using BLAST.
The annotation page contains the summary and the details of subcellular locations predicted by the tools mentioned above and annotation retrieved from UniProtKB. Each entry is linked to both RefSeq and UniProtKB. The secretome, including predicted and curated secreted proteins from a particular species, can be searched and downloaded by selecting a species from the species list for complete genomes or inputting a species name for others not having a complete genome. The protein sequences of the secretome from a species can be downloaded into a fasta file. Manually curated secreted proteins consist of proteins retrieved from UniProtKB/Swiss-Prot with subcellular locations labeled as ‘reviewed’ and proteins curated by our curators and the users. The proteins curated by us and by the community are supported by experimental evidence for their subcellular location annotation and the related literature can be found on the same page. The annotation page also contains the primary protein sequence (Figure 1). The database interface provides a link to the BLAST input interface to search through the proteins retrieved from RefSeq: either all fungal proteins or just the fungal secretomes.
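Once a secretome has been downloaded as a FASTA file, it can be loaded for further analysis with a small parser such as the one below. This is a generic sketch that makes no assumption about FunSecKB's header layout beyond the usual FASTA convention: the identifier is taken as the first whitespace-delimited token after `>`.

```python
def read_fasta(lines):
    """Parse FASTA text into a dict of identifier -> sequence.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Usage with two invented records:
demo = read_fasta([">seq1 example protein", "MKT", "AAA", ">seq2", "MLL"])
```

Here `len(demo)` gives the number of sequences in the file, which can be compared with the secretome size reported for the species.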
Preliminary data analysis
Currently FunSecKB contains a total of 478 073 fungal protein sequences, including 23 878 predicted and/or curated secreted proteins, from a total of 118 fungal species. Of these, 52 species (one of them represented by two varieties) have a complete predicted proteome set. We performed a preliminary analysis on the 53 complete secretomes of these 52 fungal species, comprising 43 Ascomycetes, 7 Basidiomycetes (with Cryptococcus neoformans having two varieties) and 2 Microsporidia (Table 1). Overall, fungal species with larger genomes encode more proteins in their predicted proteomes (r = 0.75) (Figure 2a). Ajellomyces dermatitidis and Postia placenta are two outliers. For the 69 Mb P. placenta genome, RefSeq has only 9083 predicted proteins, whereas Martinez et al. (2009) reported 17 173 proteins predicted from the P. placenta genome (34). Thus the discrepancy may be caused by a lag in database updates. The reason why A. dermatitidis is an outlier is not known.
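The correlation coefficient quoted above is assumed here to be the Pearson r; the text does not name the statistic. A minimal sketch of that calculation, which could be applied to the genome size and predicted proteome columns of Table 1, is:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Three rows of Table 1 (genome size in Mb, predicted proteome size),
# shown only to illustrate the call; the published r uses all 53 rows.
genome_mb = [31, 74, 8]
proteome = [9313, 9587, 4725]
r_subset = pearson_r(genome_mb, proteome)
```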
Table 1.
Species | Phylum | Genome (Mb) | Predicted Proteome | Predicted Secretome | Curated Secretome | GPI-anchored Secretome | Soluble Secretome | Secretome (%) | GPI-anchored Portion (%) |
---|---|---|---|---|---|---|---|---|---|
Ajellomyces capsulatus | Ascomycota | 31 | 9313 | 224 | 0 | 25 | 199 | 2.4 | 11.2 |
Ajellomyces dermatitidis | Ascomycota | 74 | 9587 | 335 | 0 | 51 | 284 | 3.5 | 15.2 |
Ashbya gossypii | Ascomycota | 8 | 4725 | 93 | 2 | 21 | 72 | 2.0 | 22.6 |
Aspergillus clavatus | Ascomycota | 28 | 9121 | 571 | 17 | 71 | 500 | 6.3 | 12.4 |
Aspergillus flavus | Ascomycota | 36 | 13 487 | 951 | 25 | 100 | 851 | 7.1 | 10.5 |
Aspergillus fumigatus | Ascomycota | 29 | 9630 | 624 | 58 | 74 | 550 | 6.5 | 11.9 |
Aspergillus nidulans | Ascomycota | 30 | 9541 | 704 | 29 | 76 | 628 | 7.4 | 10.8 |
Aspergillus niger | Ascomycota | 34 | 14 102 | 832 | 253 | 82 | 750 | 5.9 | 9.9 |
Aspergillus oryzae | Ascomycota | 37 | 12 074 | 843 | 28 | 85 | 758 | 7.0 | 10.1 |
Aspergillus terreus | Ascomycota | 29 | 10 401 | 774 | 23 | 70 | 704 | 7.4 | 9.0 |
Botryotinia fuckeliana | Ascomycota | 39 | 16 389 | 755 | 4 | 92 | 663 | 4.6 | 12.2 |
Candida albicans | Ascomycota | 28 | 14 633 | 449 | 41 | 117 | 332 | 3.1 | 26.1 |
Candida dubliniensis | Ascomycota | 16 | 5860 | 184 | 0 | 55 | 129 | 3.1 | 29.9 |
Candida glabrata | Ascomycota | 12 | 5192 | 121 | 7 | 48 | 73 | 2.3 | 39.7 |
Candida tropicalis | Ascomycota | 15 | 6254 | 212 | 1 | 64 | 148 | 3.4 | 30.2 |
Chaetomium globosum | Ascomycota | 34 | 11 048 | 862 | 1 | 67 | 795 | 7.8 | 7.8 |
Clavispora lusitaniae | Ascomycota | 16 | 5936 | 169 | 0 | 40 | 129 | 2.8 | 23.7 |
Coccidioides immitis | Ascomycota | 29 | 10 440 | 263 | 2 | 41 | 222 | 2.5 | 15.6 |
Debaryomyces hansenii | Ascomycota | 12 | 6335 | 148 | 1 | 38 | 110 | 2.3 | 25.7 |
Gibberella zeae | Ascomycota | 36 | 11 690 | 900 | 1 | 102 | 798 | 7.7 | 11.3 |
Kluyveromyces lactis | Ascomycota | 11 | 5357 | 113 | 5 | 37 | 76 | 2.1 | 32.7 |
Lachancea thermotolerans | Ascomycota | 10 | 5091 | 128 | 0 | 29 | 99 | 2.5 | 22.7 |
Lodderomyces elongisporus | Ascomycota | 16 | 5799 | 139 | 0 | 34 | 105 | 2.4 | 24.5 |
Magnaporthe grisea | Ascomycota | 40 | 14 010 | 1471 | 3 | 127 | 1344 | 10.5 | 8.6 |
Neosartorya fischeri | Ascomycota | 33 | 10 406 | 751 | 21 | 78 | 673 | 7.2 | 10.4 |
Neurospora crassa | Ascomycota | 39 | 9844 | 592 | 10 | 76 | 516 | 6.0 | 12.8 |
Penicillium chrysogenum | Ascomycota | 32 | 12 791 | 703 | 5 | 102 | 601 | 5.5 | 14.5 |
Penicillium marneffei | Ascomycota | 29 | 10 663 | 538 | 0 | 79 | 459 | 5.0 | 14.7 |
Phaeosphaeria nodorum | Ascomycota | 37 | 16 002 | 1103 | 1 | 101 | 1002 | 6.9 | 9.2 |
Pichia guilliermondii | Ascomycota | 11 | 5920 | 159 | 0 | 33 | 126 | 2.7 | 20.8 |
Pichia pastoris | Ascomycota | 9 | 5040 | 105 | 0 | 31 | 74 | 2.1 | 29.5 |
Pichia stipitis | Ascomycota | 15 | 5816 | 144 | 0 | 35 | 109 | 2.5 | 24.3 |
Podospora anserina | Ascomycota | 33 | 10 272 | 789 | 1 | 89 | 700 | 7.7 | 11.3 |
Pyrenophora tritici-repentis | Ascomycota | 37 | 12 169 | 942 | 0 | 93 | 849 | 7.7 | 9.9 |
Saccharomyces cerevisiae | Ascomycota | 12 | 5885 | 156 | 101 | 41 | 115 | 2.7 | 26.3 |
Schizosaccharomyces japonicus | Ascomycota | 11 | 4824 | 109 | 0 | 7 | 102 | 2.3 | 6.4 |
Schizosaccharomyces pombe | Ascomycota | 13 | 5001 | 112 | 43 | 7 | 105 | 2.2 | 6.3 |
Sclerotinia sclerotiorum | Ascomycota | 38 | 14 446 | 623 | 1 | 88 | 535 | 4.3 | 14.1 |
Talaromyces stipitatus | Ascomycota | 36 | 13 252 | 580 | 0 | 65 | 515 | 4.4 | 11.2 |
Uncinocarpus reesii | Ascomycota | 22 | 7760 | 312 | 0 | 45 | 267 | 4.0 | 14.4 |
Vanderwaltozyma polyspora | Ascomycota | 15 | 5376 | 116 | 0 | 28 | 88 | 2.2 | 24.1 |
Yarrowia lipolytica | Ascomycota | 22 | 6472 | 299 | 5 | 78 | 221 | 4.6 | 26.1 |
Zygosaccharomyces rouxii | Ascomycota | 12 | 4994 | 120 | 0 | 33 | 87 | 2.4 | 27.5 |
Coprinopsis cinerea | Basidiomycota | 36 | 13 546 | 917 | 8 | 106 | 811 | 6.8 | 11.6 |
Cryptococcus neoformans (neoformans B-3501A) | Basidiomycota | 19 | 6578 | 186 | 0 | 34 | 152 | 2.8 | 18.3 |
Cryptococcus neoformans (neoformans JEC21) | Basidiomycota | 21 | 6594 | 181 | 0 | 30 | 151 | 2.7 | 16.6 |
Laccaria bicolor | Basidiomycota | 59 | 18 215 | 650 | 0 | 99 | 551 | 3.6 | 15.2 |
Malassezia globosa | Basidiomycota | 9 | 4286 | 134 | 0 | 8 | 126 | 3.1 | 6.0 |
Moniliophthora perniciosa | Basidiomycota | 27 | 13 649 | 465 | 0 | 39 | 426 | 3.4 | 8.4 |
Postia placenta | Basidiomycota | 69 | 9083 | 391 | 0 | 22 | 369 | 4.3 | 5.6 |
Ustilago maydis | Basidiomycota | 20 | 6548 | 431 | 2 | 21 | 410 | 6.6 | 4.9 |
Encephalitozoon cuniculi | Microsporidia | 3 | 1996 | 17 | 2 | 0 | 17 | 0.9 | 0.0 |
Enterocytozoon bieneusi | Microsporidia | 4 | 3632 | 21 | 0 | 0 | 21 | 0.6 | 0.0 |
Other species | | | 998 | 367 | 366 | | | | |
Total | | | 478 073 | 23 878 | 1067 | 3014 | | | |
The proportion of the proteome made up by the secretome varies widely among species, from <1% in Encephalitozoon cuniculi and Enterocytozoon bieneusi, two Microsporidia species (unicellular parasites), to >10% in Magnaporthe grisea, a rice pathogenic fungus (Table 1). Overall, predicted secretome size increases with proteome size in fungal species (r = 0.83) (Figure 2b). We further identified GPI-anchored proteins in the predicted secretome, which represent the insoluble portion of secreted proteins that are components of cell walls or attached to the outside of the cell membrane. Both the insoluble and the soluble portions increase with proteome size across fungal species (Figure 2c and 2d).
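The derived columns of Table 1 appear to follow directly from the three count columns: the soluble secretome is the predicted secretome minus its GPI-anchored members, and the two percentages are ratios rounded to one decimal place. The sketch below encodes that reading, which is our inference from the table values rather than a stated method. The function name is hypothetical.

```python
def derived_columns(proteome, secretome, gpi_anchored):
    """Recompute the derived Table 1 columns from the raw counts."""
    return {
        "soluble": secretome - gpi_anchored,
        "secretome_pct": round(100 * secretome / proteome, 1),
        # Guard against an empty secretome to avoid division by zero.
        "gpi_pct": round(100 * gpi_anchored / secretome, 1) if secretome else 0.0,
    }


# Ajellomyces capsulatus row: proteome 9313, secretome 224, GPI-anchored 25.
row = derived_columns(9313, 224, 25)
```

For this row the result matches the published values: 199 soluble proteins, a secretome share of 2.4% and a GPI-anchored portion of 11.2%.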
The functional categorization of predicted secretomes was analyzed using the rpsBLAST tool in the NCBI BLAST package to search the conserved domain database (35). The most abundant secreted protein families, those having more than 50 members in the whole database, are listed in Table 2. Preliminary functional analysis revealed that the fungal secretomes largely consist of enzymes, particularly hydrolases, which fungi use to break down carbohydrates, lipids, proteins and other types of organic materials (Table 2). Furthermore, a total of 10 397 secreted proteins have GO annotations in UniProtKB. Among them, molecular function classification using GOSlimViewer (http://agbase.msstate.edu/cgi-bin/tools/goslimviewer_select.pl) showed that 43% were hydrolases, including peptidases (Figure 3) (36). These enzymes have potential applications in biofuel production. The database user interface features an easy-to-use option to download predicted secretomes from completely sequenced fungal species. This provides a resource for further detailed species-specific or interspecies comparative analysis.
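The tally behind a table such as Table 2 can be sketched as a simple count with a threshold. The sketch assumes each secreted protein has already been assigned one conserved-domain identifier from the search results; how that assignment was made is not described in the text. The function name and input format are hypothetical.

```python
from collections import Counter


def abundant_families(domain_hits, min_members=50):
    """Return (domain, count) pairs with more than `min_members` proteins.

    domain_hits: iterable of domain identifiers, one per annotated protein.
    Results are ordered from the most to the least abundant family.
    """
    counts = Counter(domain_hits)
    return [(domain, n) for domain, n in counts.most_common() if n > min_members]


# Usage with invented hits: only the family with 51 members passes the cutoff.
hits = ["domA"] * 51 + ["domB"] * 50
top = abundant_families(hits)
```

The comparison is strict (`n > min_members`) to match the "more than 50 members" criterion.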
Table 2.
CDD functional domains | Numbers |
---|---|
pfam00135, COesterase, Carboxylesterase | 314 |
pfam03443, Glyco hydro 61, Glycosyl hydrolase family 61 | 301 |
COG0277, GlcD, FAD/FMN-containing dehydrogenases | 287 |
cd04077, Peptidases S8 PCSK9 ProteinaseK like: Peptidase S8 family domain in ProteinaseK-like proteins | 223 |
pfam00450, Peptidase S10, Serine carboxypeptidase | 215 |
pfam00295, Glyco hydro 28, Glycosyl hydrolases family 28 | 207 |
pfam00067, p450, Cytochrome P450 | 160 |
pfam00933, Glyco hydro 3, Glycosyl hydrolase family 3 N terminal domain | 156 |
cd05474, pepsin-like proteinases secreted from pathogens to degrade host proteins | 154 |
COG2303, BetA, Choline dehydrogenase and related flavoproteins | 152 |
pfam01083, Cutinase | 139 |
pfam09362, DUF1996, Domain of unknown function (DUF1996) | 136 |
pfam00264, Tyrosinase, Common central domain of tyrosinase | 130 |
TIGR03388, ascorbase, L-ascorbate oxidase, plant type | 128 |
cd04056, Peptidases S53, Peptidase domain in the S53 family | 124 |
pfam04389, Peptidase M28, Peptidase family M28 | 122 |
COG5309, COG5309, Exo-beta-1,3-glucanase | 121 |
pfam04616, Glyco hydro 43, Glycosyl hydrolases family 43 | 114 |
cd00519, Lipase 3, Lipase (class 3) | 106 |
PRK02106, PRK02106, choline dehydrogenase | 100 |
COG2730, BglC, Endoglucanase | 99 |
pfam00328, Acid phosphat A, Histidine acid phosphatase | 98 |
pfam03856, SUN, Beta-glucosidase (SUN family) | 97 |
pfam07519, Tannase, Tannase and feruloyl esterase | 97 |
smart00656, Amb all, Amb all domain | 94 |
pfam00457, Glyco hydro 11, Glycosyl hydrolases family 11 | 92 |
cd06097, Aspergillopepsin like: Aspergillopepsin like, aspartic proteases of fungal origin | 91 |
cd02877, GH18 hevamine XipI class III | 88 |
pfam00331, Glyco hydro 10, Glycosyl hydrolase family 10 | 88 |
pfam01565, FAD binding 4, FAD binding domain | 87 |
pfam03583, LIP, Secretory lipase | 87 |
pfam03659, Glyco hydro 71, Glycosyl hydrolase family 71 | 87 |
pfam01185, Hydrophobin, Fungal hydrophobin | 85 |
pfam01532, Glyco hydro 47, Glycosyl hydrolase family 47 | 79 |
cd02181, GH16 MLG1 glucanase | 78 |
cd05471, Pepsin-like aspartic proteases, bilobal enzymes that cleave bonds in peptides at acidic pH | 77 |
cd05384, SCP PRY1 like, SCP-like extracellular protein domain, PRY1-like sub-family restricted to fungi | 75 |
cd07203, Fungal Phospholipase B-like; cPLA2 GrpIVA homologs; catalytic domain | 71 |
pfam00840, Glyco hydro 7, Glycosyl hydrolase family 7 | 71 |
pfam00150, Cellulase, Cellulase (glycosyl hydrolase family 5) | 70 |
pfam11790, Glyco hydro cc, Glycosyl hydrolase catalytic core | 70 |
pfam01522, Polysacc deac 1, Polysaccharide deacetylase | 69 |
pfam07971, Glyco hydro 92, Glycosyl hydrolase family 92 | 68 |
smart00636, Glyco 18, Glycosyl hydrolase family 18 | 68 |
cd00842, MPP ASMase, acid sphingomyelinase and related proteins | 67 |
cd03457, intradiol dioxygenase like, Intradiol dioxygenase supgroup | 67 |
pfam03663, Glyco hydro 76, Glycosyl hydrolase family 76 | 67 |
pfam05577, Peptidase S28, Serine carboxypeptidase S28 | 67 |
pfam12296, HsbA, Hydrophobic surface binding protein A | 65 |
cd02183, GH16 GPI glucanosyltransferase | 64 |
COG0654, 2-polyprenyl-6-methoxyphenol hydroxylase and related FAD-dependent oxidoreductases | 63 |
pfam01055, Glyco hydro 31, Glycosyl hydrolases family 31 | 62 |
cd06248, Peptidase M14 Carboxypeptidase A/B-like subfamily | 61 |
pfam02128, Peptidase M36, Fungalysin metallopeptidase (M36) | 61 |
pfam04185, Phosphoesterase, Phosphoesterase family | 61 |
pfam11765, Hyphal reg CWP, Hyphally regulated cell wall protein | 60 |
pfam01328, Peroxidase 2, Peroxidase, family 2 | 59 |
pfam01828, Peptidase A4, Peptidase A4 family | 58 |
pfam03198, Glyco hydro 72, Glycolipid anchored surface protein | 57 |
cd01846, Fatty acyltransferase-like subfamily of the SGNH hydrolases, a diverse family of lipases and esterases | 56 |
pfam02102, Peptidase M35, Deuterolysin metalloprotease (M35) | 56 |
pfam00723, Glyco hydro 15, Glycosyl hydrolases family 15 | 54 |
pfam00128, Alpha-amylase, Alpha amylase, catalytic domain | 53 |
cd08588, Catalytic domain of Arabidopsis thaliana PI-PLC X domain-containing protein | 52 |
PHA03247, PHA03247, large tegument protein UL36; Provisional | 52 |
pfam01301, Glyco hydro 35, Glycosyl hydrolases family 35 | 51 |
pfam11937, DUF3455, Protein of unknown function (DUF3455) | 51 |
Discussion
While we were constructing our database, a similar fungal secretome database (FSD, http://fsd.snu.ac.kr/) was published by Choi et al. (37). However, there are several important differences between the two databases (Table 3). We used RefSeq data, whereas the FSD used only completely sequenced fungal genome data, including some ‘work in progress’ genomes (37). The prediction methods used for identification of secreted proteins also differed. The FSD used a three-layer hierarchical identification rule based on nine different programs and considered an entry to be a secreted protein as long as any one of the tools predicted it to be secreted; thus the number of secreted proteins was much higher than the number predicted in our database. For example, in A. niger, we predicted 832 secreted proteins in the strain CBS 513.88, while Choi et al. predicted 1831 secreted proteins in the same strain and 2616 secreted proteins in the ATCC1015 strain in the FSD (37). However, only 691 to 881 proteins were predicted to be secreted, with 160 of them confirmed experimentally, in the ATCC1015 strain by Tsang et al. (8). Thus, we believe the methods used in the FSD significantly over-estimated the number of secreted proteins in fungi. In addition, searching the FSD is limited to the sequence locus name; it cannot be searched with an NCBI gi or accession number, a UniProt accession number or keywords. The FSD also provides no curation tool for community annotation (37).
Table 3.
 | FSD | FunSecKB |
---|---|---|
Data source | Fungal genomes | Fungal proteins in RefSeq |
Prediction tools | SignalP3.0; SigCleave; SigPred; RPSP; TMHMM2.0c; TargetP1.1b; PsortII; PredictNLS; SecretomeP1.0f | SignalP 3.0; Phobius1.01; WolfPsort0.2; TargetP1.1b, TMHMM2.0c; PS-Scan |
Data access | Sequence locus name; BLAST | Keywords, RefSeq gi or accession, UniProt accession; BLAST |
Community curation tool | Not available | Available |
In addition to the signal-peptide dependent secreted proteins that use the classical ER-Golgi secretory pathway, there are non-classical, signal-peptide independent secretory pathways in all domains of life. Mammalian and bacterial leaderless secreted proteins have been collected and used to implement the prediction software SecretomeP for predicting these proteins (http://www.cbs.dtu.dk/services/SecretomeP/) (38,39). The tool has not been trained with fungal-specific data and its accuracy for predicting fungal non-classical secreted proteins could not be evaluated; thus we did not include this tool in our data processing. Although the FSD used SecretomeP to predict non-classical secreted proteins, the predicted secreted proteins were not included in the secretome analysis; including them would make the putative secretome >40% of the whole proteome (37). Nevertheless, FunSecKB and the FSD could complement each other, as they implement different data sources, prediction tools and data access utilities.
In summary, we constructed FunSecKB to identify, annotate and curate the secreted proteins in fungi. The data can be searched using protein identifiers or keywords, and by species. Most of the secreted proteins are currently predicted by computational tools. However, the community can use the curation module implemented in our site to manually curate subcellular locations of fungal proteins having experimental evidence. The resource described in the work is expected to provide a query and curation system that will help the community to further understand the secretome biology and explore various potential applications of fungal secreted proteins in bio-processing or environmental remediation industries.
Acknowledgements
We thank Gary Walker at YSU and the anonymous reviewers for providing helpful comments on improving the article.
Funding
Youngstown State University (YSU) Research Council grant (2009-2010 #04-10 to X.J.M.); YSU research professorship (to X.J.M.); College of Science, Technology, Engineering, and Mathematics Dean’s reassigned time (to X.J.M.). Funding for open access charge: the School of Graduate Studies and Research, Youngstown State University, Ohio, USA.
Conflict of interest. None Declared.
References
- 1.Kamoun S. The secretome of plant-associated fungi and oomycetes. In: Deising H, editor. The Mycota V–Plant Relationships. 2nd. Berlin, Heidelberg: Springer; 2009. pp. 173–180. [Google Scholar]
- 2.Cooper KG, Woods JP. Secreted dipeptidyl peptidase IV activity in the dimorphic fungal pathogen Histoplasma capsulatum. Infect. Immun. 2009;77:2447–2454. doi: 10.1128/IAI.01345-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Osherov N. New Insights in Medical Mycology. Netherlands: Springer; 2007. The virulence of Aspergillus fumigatus; pp. 185–212. [Google Scholar]
- 4.O’Toole N, Min XJ, Storms R, Butler G, Tsang A. Sequence-based analysis of fungal secretomes. Appl. Mycol. Biotechnol. Bioinform. 2006;6:277–296. [Google Scholar]
- 5.Blobel G, Dobberstein B. Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J. Cell. Biol. 1975;67:835–851. doi: 10.1083/jcb.67.3.835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.von Heijne G. The signal peptide. J. Membr. Biol. 1990;115:195–201. doi: 10.1007/BF01868635. [DOI] [PubMed] [Google Scholar]
- 7.Scott M, Lu G, Hallett M, et al. The Hera database and its use in the characterization of endoplasmic reticulum proteins. Bioinformatics. 2004;20:937–944. doi: 10.1093/bioinformatics/bth010. [DOI] [PubMed] [Google Scholar]
- 8.Tsang A, Butler G, Powlowski J, et al. Analytical and computational approaches to define the Aspergillus niger secretome. Fungal Genetics Biol. 2009;46:S153–S160. doi: 10.1016/j.fgb.2008.07.014. [DOI] [PubMed] [Google Scholar]
- 9.Chen P, Sapperstein SK, Choi JD, et al. Biogenesis of the Saccharomyces cerevisiae mating pheromone a-factor. J. Cell. Biol. 1997;136:251–269. doi: 10.1083/jcb.136.2.251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Boulianne RP, Liu Y, Aebi M, et al. Fruiting body development in Coprinus cinereus: regulated expression of two galectins secreted by a non-classical pathway. Microbiology. 2000;146:1841–1853. doi: 10.1099/00221287-146-8-1841. [DOI] [PubMed] [Google Scholar]
- 11.Greenbaum D, Luscombe NM, Jansen R, et al. Interrelating different types of genomic data, from proteome to secretome: ‘oming in on function. Genome Res. 2001;11:1463–1468. doi: 10.1101/gr.207401. [DOI] [PubMed] [Google Scholar]
- 12.Hathout Y. Approaches to the study of the cell secretome. Expert Rev. Proteomics. 2007;4:239–248. doi: 10.1586/14789450.4.2.239. [DOI] [PubMed] [Google Scholar]
- 13.Tjalsma H, Bolhuis A, Jongbloed JD, et al. Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol. Mol. Biol. Rev. 2000;64:515–547. doi: 10.1128/mmbr.64.3.515-547.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Simpson JC, Mateos A, Pepperkok R. Maturation of the mammalian secretome. Genome Biol. 2007;8:211. doi: 10.1186/gb-2007-8-4-211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Bouws H, Wattenberg A, Zorn H. Fungal secretomes-nature's toolbox for white biotechnology. Appl. Microbiol. Biotechnol. 2008;80:381–388. doi: 10.1007/s00253-008-1572-5. [DOI] [PubMed] [Google Scholar]
- 16.Lee SA, Wormsley S, Kamoun S, et al. An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. Yeast. 2003;20:595–610. doi: 10.1002/yea.988. [DOI] [PubMed] [Google Scholar]
- 17.Wymelenberg AV, Sabat G, Martinez D, et al. The Phanerochaete chrysosporium secretome: database predictions and initial mass spectrometry peptide identifications in cellulose-grown medium. J. Biotechnol. 2005;118:17–34. doi: 10.1016/j.jbiotec.2005.03.010. [DOI] [PubMed] [Google Scholar]
- 18.Yajima W, Kav NN. The proteome of the phytopathogenic fungus Sclerotinia sclerotiorum. Proteomics. 2006;6:5995–6007. doi: 10.1002/pmic.200600424. [DOI] [PubMed] [Google Scholar]
- 19.Paper JM, Scott-Craig JS, Adhikari ND, et al. Comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus Fusarium graminearum. Proteomics. 2007;7:3171–3183. doi: 10.1002/pmic.200700184. [DOI] [PubMed] [Google Scholar]
- 20.Mueller O, Kahmann R, Aguilar G, et al. The secretome of the maize pathogen Ustilago maydis. Fungal Genet. Biol. 2008;1:S63–S70. doi: 10.1016/j.fgb.2008.03.012. [DOI] [PubMed] [Google Scholar]
- 21. Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35(Database issue):D61–D65. doi: 10.1093/nar/gkl842.
- 22. Bendtsen JD, Nielsen H, von Heijne G, et al. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004;340:783–795. doi: 10.1016/j.jmb.2004.05.028.
- 23. Käll L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 2004;338:1027–1036. doi: 10.1016/j.jmb.2004.03.016.
- 24. Käll L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. Nucleic Acids Res. 2007;35(Web Server issue):W429–W432. doi: 10.1093/nar/gkm256.
- 25. Horton P, Park KJ, Obayashi T, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007;35(Web Server issue):W585–W587. doi: 10.1093/nar/gkm259.
- 26. Sprenger J, Fink JL, Teasdale RD. Evaluation and comparison of mammalian subcellular localization prediction methods. BMC Bioinformatics. 2006;7(Suppl. 5):S3. doi: 10.1186/1471-2105-7-S5-S3.
- 27. Emanuelsson O, Nielsen H, Brunak S, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 2000;300:1005–1016. doi: 10.1006/jmbi.2000.3903.
- 28. Min XJ. Development of computational protocols for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform. 2010;4:143–147.
- 29. Emanuelsson O, Brunak S, von Heijne G, et al. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2007;2:953–971. doi: 10.1038/nprot.2007.131.
- 30. de Castro E, Sigrist CJ, Gattiker A, et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006;34(Web Server issue):W362–W365. doi: 10.1093/nar/gkl124.
- 31. Poisson G, Chauve C, Chen X, et al. FragAnchor: a large scale all Eukaryota predictor of glycosylphosphatidylinositol-anchor in protein sequences by qualitative scoring. Genomics, Proteomics Bioinform. 2007;5:121–130. doi: 10.1016/S1672-0229(07)60022-9.
- 32. de Groot PW, Ram AF, Klis FM. Features and functions of covalently linked proteins in fungal cell walls. Fungal Genet. Biol. 2005;42:657–675. doi: 10.1016/j.fgb.2005.04.002.
- 33. Wu CH, Apweiler R, Bairoch A, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34(Database issue):D187–D191. doi: 10.1093/nar/gkj161.
- 34. Martinez D, Challacombe J, Morgenstern I, et al. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. Proc. Natl Acad. Sci. USA. 2009;106:1954–1959. doi: 10.1073/pnas.0809575106.
- 35. Marchler-Bauer A, Anderson JB, Chitsaz F, et al. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009;37(Database issue):D205–D210. doi: 10.1093/nar/gkn845.
- 36. McCarthy FM, Wang N, Magee GB, et al. AgBase: a functional genomics resource for agriculture. BMC Genomics. 2006;7:229. doi: 10.1186/1471-2164-7-229.
- 37. Choi J, Park J, Kim D, et al. Fungal secretome database: integrated platform for annotation of fungal secretomes. BMC Genomics. 2010;11:105. doi: 10.1186/1471-2164-11-105.
- 38. Bendtsen JD, Jensen LJ, Blom N, et al. Feature based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 2004;17:349–356. doi: 10.1093/protein/gzh037.
- 39. Bendtsen JD, Kiemer L, Fausbøll A, et al. Non-classical protein secretion in bacteria. BMC Microbiol. 2005;5:58. doi: 10.1186/1471-2180-5-58.