Author manuscript; available in PMC: 2009 Oct 29.
Published in final edited form as: Proteomics. 2008 Apr;8(7):1398–1414. doi: 10.1002/pmic.200700804

Determining the protein repertoire of Cryptosporidium parvum sporozoites

Sanya J Sanderson 1, Dong Xia 1, Helena Prieto 2, John Yates 2, Mark Heiges 3, Jessica C Kissinger 3, Elizabeth Bromley 4, Kalpana Lal 5, Robert E Sinden 5, Fiona Tomley 4, Jonathan M Wastling 1
PMCID: PMC2770187  NIHMSID: NIHMS110628  PMID: 18306179

Abstract

The genome of the intracellular parasite Cryptosporidium parvum has recently been sequenced, but protein expression data for the invasive stages of this important zoonotic gastrointestinal pathogen are limited. In this paper a comprehensive analysis of the expressed protein repertoire of an excysted oocyst/sporozoite preparation of C. parvum is presented. Three independent proteome platforms were employed which yielded more than 4800 individual protein identifications representing 1237 nonredundant proteins, corresponding to ∼30% of the predicted proteome. Peptide data were mapped to the corresponding locations on the C. parvum genome and a publicly accessible interface for proteome data was developed for data-mining and visualisation at CryptoDB (http://cryptodb.org). These data provide a timely and valuable resource for improved annotation of the genome, verification of predicted hypothetical proteins and identification of proteins not predicted by current gene models. The data indicated the expression of proteins likely to be important to the invasion and intracellular establishment of the parasite, including surface proteins, constituents of the remnant mitochondrion and apical organelles. Comparison of the expressed proteome with existing transcriptional data indicated only a weak correlation. For approximately half the proteome there was limited functional and structural information, highlighting the limitations in the current understanding of Cryptosporidium biology.

Keywords: 2-DE, Cryptosporidium, Global protein analysis, MudPIT

1 Introduction

Cryptosporidium parvum is a zoonotic apicomplexan protozoan parasite of the subclass Coccidia that causes a self-limiting gastrointestinal infection in humans and other animals, but which can be persistent and life-threatening in immunocompromised individuals, particularly those with AIDS [1]. C. parvum is closely related to Cryptosporidium hominis, which is mostly associated with human infections [2], although both species have complex population structures resulting from sexual recombination within the host [3]. Cryptosporidiosis is a global disease, often occurring as water-borne outbreaks and of considerable importance to public health. In the livestock industry infection is widespread and responsible for significant economic losses [4]. Despite recent advances in the development of anticryptosporidial drugs [5–7], control of infection in humans is largely managed via the removal of oocysts from water supplies. If these control measures fail, outbreaks affecting many thousands of individuals can occur [8]. Infection commences with the ingestion of oocysts. Within the gastrointestinal tract excystation is stimulated, releasing four sporozoites which invade intestinal epithelial cells to occupy a unique intracellular, but extra-cytoplasmic location. The sporozoites undergo asexual reproduction forming merozoites which can invade further cells, eventually producing the sexual stages of the parasite, allowing fertilisation and the formation of new oocysts. Apicomplexan parasites have evolved specialised apical invasion organelles, the micronemes, rhoptries and dense granules. Sequential secretion of their contents facilitates attachment, invasion, penetration and maintenance of the parasite within the host, making them attractive targets for drug and vaccine development. In Toxoplasma gondii proteomic analysis of the rhoptry organelles [9] revealed a range of proteins which form a complex and intimate relationship with the host cell upon invasion [10]. By contrast, little is known about the identity and role of the equivalent repertoire of proteins in Cryptosporidium.

The genomes of C. parvum and C. hominis have been sequenced [11, 12] providing 12–13-fold coverage of the 9.1 Mb genome. This is considerably smaller than that of the related Apicomplexa, T. gondii, Eimeria tenella and Plasmodium sp., and reflects a more compact genome with fewer genes, introns and shorter intergenic regions. Gene prediction programmes estimate the number of genes at ∼3900 (http://cryptodb.org), compared with ∼7800 for Toxoplasma (www.toxodb.org) and ∼5300 for Plasmodium falciparum [13]. Comparative orthologous analyses have allowed predictions of potential metabolic pathways, and of the possible protein content of various organelles [11, 12, 14]. Despite a sophisticated gene prediction analysis, a large (∼40%) proportion of the predicted genes of C. parvum have no known function. At present there is little evidence to confirm the expression of many of these genes. Expressed protein information is also valuable in refining the annotation of the genome. Watanabe et al. [15] recently developed a comparative database of apicomplexan full length cDNAs (http://comparasite.hgc.jp/) which incorporates expressed sequence transcript information for 682 C. parvum genes, but which covers only a small fraction of predicted genes. In addition, 567 ESTs corresponding to 334 C. parvum genes are listed at the Cryptosporidium genome database CryptoDB (http://cryptodb.org) [16]. At the protein level information is even more limited. A recent stage-specific proteome analysis of the sporozoites of Cryptosporidium identified just 217 nonredundant proteins, or less than 6% of the total predicted proteome [17]. Together, these data provide experimental evidence for only 485 (12%) of the estimated ∼3900 protein encoding genes.

The aim of this study was to undertake a comprehensive analysis of the expressed proteome of excysted C. parvum sporozoites. Three independent proteomic approaches were adopted to maximise coverage of the proteome: (i) 2-DE LC-MS/MS; (ii) 1-DE LC-MS/MS; (iii) multi-dimensional protein identification technology (MudPIT) analysis, where tryptically digested peptides were separated by multidimensional LC followed by MS/MS. More than 4800 protein identifications were made representing 1237 nonredundant proteins in this life-cycle stage and accounting for one-third of the entire predicted proteome of C. parvum. Derived peptides were matched to the C. parvum genome (http://cryptodb.org/) and a user-interface developed to make the data publicly available. This information was used to help confirm predicted exon–intron boundaries and allowed the identification of proteins not predicted by the current gene models, as well as providing a valuable and timely resource for those working on the genomics and functional biology of Cryptosporidium.

2 Materials and methods

2.1 Chemicals and materials

Chemicals were AnalaR or HPLC grade (VWR, Poole, UK) except: amidosulphobetaine-14 (ASB-14; Calbiochem, Nottingham, UK); Amberlite IRN 150 (Merck, Darmstadt, Germany); CHCA (Sigma–Aldrich, Poole, UK); deoxycholate (Sigma–Aldrich, Steinheim, Germany); iodoacetamide (Sigma–Aldrich); Invitrosol (Invitrogen, Carlsbad, USA); Mini complete protease inhibitor cocktail (Roche, Penzberg, Germany); bovine pancreas sequencing grade trypsin (Roche); thiourea (Sigma–Aldrich); TCEP (tris(2-carboxyethyl)phosphine hydrochloride; Pierce, Rockford, USA); 2-DE consumables (Amersham Biosciences, Little Chalfont, UK). Urea was purified with Amberlite IRN 150 resin for 3 h at room temperature.

2.2 Oocyst preparation and excystation

Purified oocysts of C. parvum passaged in lambs (IOWA strain) were purchased from Moredun Research Institute, Edinburgh and stored at 4°C in PBS pH 7.2 with 1000 U/mL penicillin/1000 µg/mL streptomycin [18, 19]. Excystation was performed at 37°C using deoxycholate (DOC) and sodium hydrogen carbonate [18] and continued until >80% excystation had been observed by microscopic examination at ×400 magnification (∼2 h). Excystation mixtures were pelleted at 13 000×g for 1 min, washed with 1 mL of PBS and repelleted at 13 000×g for 3 min at 4°C. Pellets were used immediately or stored at −80°C.

2.3 2-DE analysis

Freshly excysted sporozoites (4–9 × 10⁸) were resuspended in 0.5–1 mL of 20 mM Tris/HCl pH 6.7, pelleted at 13 000×g for 3 min at 4°C and solubilised as follows: for the pH 3–10 NL gel, in 150 µL of lysis buffer A (7 M urea, 2 M thiourea, 4% v/v CHAPS, 20 mM Tris base, 60 mM DTT, 1 mM EDTA, 1×Mini Complete protease cocktail inhibitor, 0.5% v/v pH 3–10 NL IPG buffer); for the pH 4–7 L gel, in 280 µL of lysis buffer B (7 M urea, 2 M thiourea, 2% w/v CHAPS, 2% w/v ASB-14, 20 mM Tris base, 60 mM DTT, 1 mM EDTA, 1×Mini Complete protease cocktail inhibitor, 0.5% v/v pH 4–7 IPG buffer). Over 2–4 h at room temperature, the samples were repeatedly vortexed and finally spun at 14 000×g for 2 min. Supernatants were added to equal volumes of rehydration buffer (8 M urea, 2% w/v CHAPS, 0.002% w/v Bromophenol blue, 40 mM DTT), supplemented with 0.5% v/v pH 3–10 NL or pH 4–7 L IPG buffer and used to rehydrate 24 cm Immobiline IPG strips over 24 h at room temperature. Using an Ettan IPGphor II and manifold, the first dimension was as follows: stepped voltage, 500 V for 2 h; gradient voltage, 1000 V over 8 h; gradient voltage, 10 000 V over 3 h; stepped voltage, 10 000 V for 4 h 15 min (∼65 000 V·h). The IPG strips were equilibrated for 15 min each in 6 M urea, 50 mM Tris/HCl pH 8.8, 30% v/v glycerol, 2% w/v SDS, 0.002% w/v Bromophenol blue supplemented with 1% w/v DTT, then with 2.5% w/v iodoacetamide, and mounted on 12.5% w/v precast 24 cm acrylamide gels resolved using an Ettan Dalt 6-MultiTemp III apparatus and buffering kit (Amersham Biosciences).
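As a rough check of the stated ∼65 000 V·h, the focusing programme can be integrated step by step. The sketch below assumes that each gradient step ramps linearly from the voltage of the preceding step and that stepped phases hold a constant voltage; the function and variable names are illustrative and are not part of the instrument software.

```python
# Each step: (kind, target voltage in V, duration in h), taken from the text.
# Assumption: a "gradient" step ramps linearly from the previous step's voltage.
IEF_PROGRAMME = [
    ("step", 500, 2.0),
    ("gradient", 1000, 8.0),
    ("gradient", 10000, 3.0),
    ("step", 10000, 4.25),
]


def volt_hours(programme, start_voltage=0.0):
    """Integrate voltage over time for a list of focusing steps."""
    total = 0.0
    previous = start_voltage
    for kind, target, hours in programme:
        if kind == "gradient":
            # Area under a linear ramp: mean voltage times duration.
            total += 0.5 * (previous + target) * hours
        else:
            # Constant-voltage hold.
            total += target * hours
        previous = target
    return total


print(volt_hours(IEF_PROGRAMME))  # 66000.0
```

Under these assumptions the programme totals 66 000 V·h, which is consistent with the approximate figure quoted above.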

2.4 1-DE analysis

A pellet of 10⁹ sporozoites was solubilised in 40 µL of 60 mM Tris/HCl pH 6.8, 5% v/v glycerol, 2% w/v SDS, 0.01% w/v Bromophenol blue, 100 mM DTT, with three cycles of 5 min at 90°C and 2 min vortexing, then spun at 16 000×g for 3 min. The supernatant was run on a 16 cm 12% v/v acrylamide gel using the denaturing Tris–glycine method of Laemmli [20], at 16 mA for 30 min and 24 mA for 6–7 h at 15°C. The gel was stained with Colloidal CBB, the lane cut into 124 slices <1 mm thickness and each digested with trypsin.

2.5 Colloidal Coomassie staining

Gels were fixed in 40% v/v ethanol, 10% v/v acetic acid overnight at room temperature, rinsed in distilled deionised water, stained for 5 days with Colloidal Coomassie stain (20% v/v methanol, 0.08% w/v CBB G250, 0.8% v/v phosphoric acid, 8% w/v ammonium sulphate), rinsed in distilled deionised water and stored in 1% v/v acetic acid at 4°C.

2.6 In-gel tryptic digestion

Gel plugs/slices were destained at 37°C using 50 mM ammonium bicarbonate/50% ACN. 1-D gel slices were incubated at 37°C with 10 mM DTT/100 mM ammonium bicarbonate for 30 min, then 100 mM iodoacetamide/100 mM ammonium bicarbonate for 1 h in the dark. Gel plugs/slices were dehydrated with 100% v/v ACN at 37°C and rehydrated at 37°C with 10 µL of 10 ng/µL sequencing grade trypsin in 25 mM ammonium bicarbonate. After 1 h, 25 mM ammonium bicarbonate was added to cover the gel pieces which were left at 37°C overnight. The reaction was stopped with 2 µL of 2.6 M formic acid (FA) and the samples stored at −20°C.

2.7 MALDI-PMF

MALDI-TOF was performed on a Waters Micromass mass spectrometer (Waters, Manchester, UK) operated in reflectron mode with positive ion detection. Samples were mixed 1:1 v/v with a saturated solution of CHCA in 50% v/v ACN/ 49% v/v water/0.1% v/v TFA, spotted onto the MALDI target and air-dried. Monoisotopic peptide masses of 1000–4000 m/z (Thomsons) were collected using a laser frequency of 5 Hz. Spectra typically consisted of 50–100 summated acquisitions (equivalent to 500–1000 laser shots) at laser energy 30–50% of the maximum. External mass calibration used a mixture of des-Arg bradykinin, neurotensin, ACTH (Corticotrophin) and oxidised insulin B chain (2.4, 2.4, 2.6 and 30 pmol/µL, respectively) each in 50% v/v ACN/0.1% v/v TFA. Data acquisition and processing used the MassLynx software suite (version 3.5), and the resultant monoisotopic peptide masses searched against (i) a locally installed protein database comprising gene prediction models and all possible ORFs >50 amino acids originating from the Cryptosporidium genome database (http://cryptodb.org/) and (ii) the MSDB database (http://ftp.ncbi.nih.gov/repository/MSDB/msdb.nam, version 20052909) mounted on a local MASCOT search engine. Search parameters: mass error ± 250 ppm; single missed trypsin cleavage; fixed carbamidomethyl modification of cysteine; variable oxidation of methionine.
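The ± 250 ppm mass error used for the PMF searches scales with the mass of each peptide. The following minimal sketch shows how such a tolerance window can be applied when comparing observed and theoretical monoisotopic masses; it is not the MASCOT implementation, and the function names and example masses are hypothetical.

```python
def ppm_window(theoretical_mass, tolerance_ppm=250.0):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = theoretical_mass * tolerance_ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


def within_tolerance(observed_mass, theoretical_mass, tolerance_ppm=250.0):
    """True if the observed mass lies inside the ppm window."""
    low, high = ppm_window(theoretical_mass, tolerance_ppm)
    return low <= observed_mass <= high


def count_matches(observed, theoretical, tolerance_ppm=250.0):
    """Count observed masses that match at least one theoretical mass."""
    return sum(
        any(within_tolerance(o, t, tolerance_ppm) for t in theoretical)
        for o in observed
    )


# Hypothetical masses for illustration only.
theoretical = [1000.50, 1500.75, 2000.00]
observed = [1000.70, 1501.50, 2000.40]
print(count_matches(observed, theoretical))  # 2
```

At 250 ppm the window is ± 0.25 at m/z 1000 and ± 1.0 at m/z 4000, so the absolute tolerance widens across the acquired mass range.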

2.8 MS/MS

LC-MS/MS used an LTQ ion-trap mass spectrometer (Thermo-Electron, Hemel Hempstead, UK) coupled on-line to a Dionex Ultimate 3000 (Dionex Company, Amsterdam, The Netherlands) HPLC system equipped with a nano-pep-Map100 C18 RP column (75 µm; 3 µm, 100 Å) equilibrated in 97.9% v/v water/2% v/v ACN/0.1% v/v FA at 300 nL/min. Tryptic peptides were desalted on a C18 TRAP, and resolved with a linear gradient of 0–50% v/v ACN/0.1% v/v FA over 30 min, followed by 80% v/v ACN/0.1% v/v FA for 5 min. Ionised peptides were analysed using the ‘triple play’ mode (0–06 m/z, global and Msx), consisting initially of a survey (MS) spectrum from which the three most abundant ions were determined (threshold = 200–500 TIC). The charge state of each ion was assigned from the ¹³C isotope envelope ‘zoom scan’, fragmented (collision energy 35% for 30 ms) and subjected to an MS/MS scan. The LTQ was tuned using a 500 fmol/µL solution of glufibrinopeptide (m/z 785.8, [M + 2H]²⁺). The resulting MS/MS spectra were submitted to TurboSequest Bioworks version 3.1 (Threshold cut off 0–1000; group scan default 100; minimum group count 1; minimum ion count 15; peptide tolerance 1.5), the individual spectra (dta files) merged into an mgf file and submitted to MASCOT (Matrix Science) and searched against a locally mounted Cryptosporidium genome database (see above). Search parameters: fixed carbamidomethyl modification of cysteine; variable oxidation of methionine; peptide tolerance ± 1.5 Da; MS/MS tolerance ± 0.8 Da; +1, +2, +3 peptide charge state; single missed trypsin cleavage. The average peptide score (protein score divided by number of peptides matching) and the MASCOT plug-hole score (ion scores above this convey >95% confidence level; p<0.05) were included during manual curation of significant hits [21, 22].
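The two curation metrics described above can be expressed compactly. This is a minimal sketch under stated assumptions: the record layout, the field names and the example threshold are hypothetical, and the manual curation itself involved judgement that is not captured here.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    gene_id: str           # hypothetical identifier
    protein_score: float   # summed protein score reported by the search engine
    n_peptides: int        # number of peptides matching the protein
    best_ion_score: float  # highest individual ion score for the protein


def average_peptide_score(hit):
    """Protein score divided by the number of matching peptides."""
    if hit.n_peptides <= 0:
        raise ValueError("a hit needs at least one matching peptide")
    return hit.protein_score / hit.n_peptides


def exceeds_threshold(hit, significance_threshold):
    """True if the best ion score is above the p<0.05 score threshold."""
    return hit.best_ion_score > significance_threshold


# Hypothetical hit and threshold, for illustration only.
hit = ProteinHit("example_gene", protein_score=180.0, n_peptides=4, best_ion_score=52.0)
print(average_peptide_score(hit), exceeds_threshold(hit, 40.0))
```

Reporting both values side by side allows a hit supported by many weak peptides to be distinguished from one supported by a few strong ones.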

2.9 Sample preparation for MudPIT

A pellet of 10⁹ sporozoites resuspended to ∼500 µg/mL in 500 µL of 100 mM Tris buffer pH 8.5 was lysed by three cycles of freeze/thaw and the Tris-soluble and insoluble protein fractions separated at 16 000×g for 30 min.

Digestion of soluble fractions: MS compatible detergent Invitrosol was added to 1% v/v, the solution heated to 60°C for 5 min, vortexed for 2 min, denatured with 2 M urea, reduced with 5 mM TCEP, carboxyamidomethylated with 10 mM iodoacetamide, followed by addition of 1 mM CaCl2 and trypsin at a ratio of 1:100 (enzyme:protein) and incubated at 37°C overnight.

Digestion of insoluble fractions: 10% v/v Invitrosol was added to the pellet which was heated to 60°C for 5 min, vortexed for 2 min and sonicated for 1 h. The sample was diluted to 1% v/v Invitrosol with 8 M urea/100 mM Tris/HCl pH 8.5, reduced and carboxyamidomethylated as before, and digested with endoproteinase Lys-C for 6 h. The solution was diluted to 4 M urea with 100 mM Tris/HCl pH 8.5 and digested with trypsin as described above.

2.10 MS analysis by MudPIT

Three soluble replicates and one insoluble sample were each subjected to MudPIT analysis with modifications to the method of Link et al. [23], using a quaternary Agilent 1100 series HPLC coupled to a Finnigan LTQ-IT mass spectrometer (Thermo, San Jose, CA) with a nano-LC ESI source [24]. Peptide mixtures were resolved by strong cation exchange LC upstream of RP-LC as described [25]. Each sample (∼100 µg) was loaded onto separate microcolumns and resolved by fully automated 12 step chromatography.

Protein databases: A Cryptosporidium database was assembled (see above). To identify contaminant host-derived proteins the parasite database was supplemented with a contaminant database. As the method for Cryptosporidium oocyst isolation from host faecal material eliminates mammalian cells, the most likely source of contamination derives from natural gut flora; hence the complete prokaryote database from NCBI was used as a contaminant database. To estimate the number of false positives a reverse database was added [26]. Poor quality spectra were removed from the data set using an automated spectral quality assessment algorithm [27]. Tandem mass spectra remaining after filtering were searched with the SEQUEST algorithm version 27 [28]. All searches were run in parallel on a Beowulf computer cluster consisting of 100 1.2 GHz Athlon CPUs [29]. No enzyme specificity was considered for any search. SEQUEST results were assembled and filtered using the DTASelect (version 2.0) program [30], which uses a quadratic discriminant analysis to dynamically set XCorr and DeltaCN thresholds for the entire data set to achieve a user-specified false positive rate (5% in this analysis). The false positive rates are estimated by the program from the number and quality of spectral matches to the decoy database.
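The principle of estimating a false positive rate from matches to a reversed (decoy) database can be illustrated with a deliberately simplified sketch. It assumes a single score per spectral match and uses the ratio of decoy to forward matches, which is one common convention; DTASelect instead combines XCorr and DeltaCN by discriminant analysis, and its internal calculation may differ. All names and example values are hypothetical.

```python
def estimated_false_positive_rate(n_target, n_decoy):
    """Decoy matches divided by forward (target) matches among accepted hits."""
    if n_target <= 0:
        raise ValueError("no forward-database matches accepted")
    return n_decoy / n_target


def score_cutoff_for_rate(matches, max_rate=0.05):
    """Return the lowest score cutoff whose accepted set meets max_rate.

    matches: iterable of (score, is_decoy) pairs, assumed to have distinct scores.
    Returns None if no cutoff satisfies the requested rate.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    n_target = n_decoy = 0
    cutoff = None
    for score, is_decoy in ordered:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Accepting everything down to this score gives the current counts.
        if n_target > 0 and n_decoy / n_target <= max_rate:
            cutoff = score
    return cutoff


# Hypothetical scored matches: (score, matched the reversed database?)
matches = [(5.0, False), (4.5, False), (4.0, True), (3.5, False)]
print(score_cutoff_for_rate(matches, max_rate=0.05))  # 4.5
```

Lowering the cutoff admits more forward matches but also more decoy matches, so the threshold is set at the most permissive point that still satisfies the requested rate.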

3 Results

3.1 2-DE Proteome map of C. parvum excysted sporozoites

Proteins from excysted sporozoites were resolved on broad (pH 3–10) and narrow (pH 4–7) range 2-DE (Fig. 1 and Fig. 2). Four large format gels were run, of which two examples are shown. Individual protein spots were digested with trypsin, and analysed using either MALDI-TOF PMF or MS/MS to obtain protein identifications. Clusters of proteins sharing the same identification were common (shown boxed) and these most likely represent either isoenzymes, or proteins with PTM. Some gel plugs contained more than one protein and this is represented by overlapping boxes. Taking into account redundancy between gels, 282 unique proteins were identified across the four gels (Supporting Information Tables S1 and S2). Assuming post-translational variants are the product of a single gene, these data represent the expression of 115 individual Cryptosporidium genes.

Figure 1.

pH 3–10 NL proteome map of excysted C. parvum sporozoites. Soluble proteins from 4.8 × 10⁸ sporozoites (∼160 µg protein) resolved by IEF over a broad nonlinear pH 3–10 range followed by molecular mass on a 12.5% w/v acrylamide gel under denaturing conditions. Protein spots are visualised using Coomassie Colloidal stain.

Figure 2.

pH 4–7 L proteome map of excysted C. parvum sporozoites. Soluble proteins from 7.5 × 10⁸ sporozoites (∼250 µg protein) resolved by IEF over a narrow linear pH 4–7 range followed by molecular mass on a 12.5% w/v acrylamide gel under denaturing conditions. Protein spots are visualised using Coomassie Colloidal stain.

3.2 Gel-LC-MS/MS analysis of C. parvum excysted sporozoites

An SDS-soluble protein pool from excysted sporozoites was resolved by large format 1-DE and the lane divided into 124 contiguous gel slices of approximately 1 mm thickness (Fig. 3). Each gel slice was treated with trypsin and subjected to MS/MS analysis. An average of ∼23 proteins was identified per gel slice, giving a total of 2805 significant protein identifications from the entire gel. Abundant proteins appeared in more than one slice as the thickness of these protein bands exceeded the dimensions of the gel slices excised. A number of proteins were identified migrating to several positions in the gel corresponding to different Mr values. This could be an indication of: proteolytic processing (biologically relevant, or otherwise); the presence of additional PTM; or the retention of strong protein–protein interactions. Where a gene was identified in more than one gel slice, the hit with the best score is indicated in Supporting Information Table S3. These data represent a total of 642 nonredundant Cryptosporidium proteins.
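Collapsing the slice-level identifications into a nonredundant list, while retaining the best-scoring hit for each gene, can be sketched as follows. The tuple layout and the example identifiers are hypothetical and do not reflect the format of the search-engine output used in this study.

```python
def best_hit_per_gene(identifications):
    """Keep the highest-scoring identification for each gene.

    identifications: iterable of (gene_id, slice_number, score) tuples.
    Returns a dict mapping gene_id -> (slice_number, score).
    """
    best = {}
    for gene_id, slice_number, score in identifications:
        if gene_id not in best or score > best[gene_id][1]:
            best[gene_id] = (slice_number, score)
    return best


# Hypothetical identifications: the same gene detected in adjacent slices.
hits = [
    ("geneA", 12, 85.0),
    ("geneA", 13, 140.0),
    ("geneB", 13, 60.0),
    ("geneA", 47, 55.0),
]
nonredundant = best_hit_per_gene(hits)
print(len(nonredundant), nonredundant["geneA"])  # 2 (13, 140.0)
```

Retaining the slice number alongside the score preserves the apparent Mr at which the best evidence for each protein was obtained.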

Figure 3.

Sporozoite proteins soluble in SDS resolved by 1-D SDS-PAGE. SDS-soluble proteins from ∼10⁹ sporozoites (∼139 µg protein) were resolved on a 12% w/v acrylamide gel under denaturing conditions as follows: protein standards (lane 1), C. parvum soluble protein (lane 2). Proteins were visualised using Coomassie Colloidal stain. The masses of the protein standards and the position of every tenth gel slice are shown.

3.3 MudPIT analysis of C. parvum excysted sporozoites

Excysted sporozoites were lysed to produce ‘soluble’ and ‘insoluble’ protein fractions which were processed for MudPIT analysis. For each fraction, more than 1000 protein hits were obtained. After removing redundancies caused by multiple gene annotations, 853 proteins were identified in the soluble fraction and 819 in the insoluble fraction, giving a total of 1672 proteins (Supporting Information Tables S4 and S5). The cellular fractionation prior to MudPIT was not designed to give optimal separation of soluble and membrane proteins, thus a large proportion of proteins were found in both fractions. A total of 1154 nonredundant proteins were identified (304 unique proteins in the soluble fraction and 338 unique proteins in the insoluble fraction, with 515 shared between both fractions).
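The partition of identifications into fraction-specific and shared proteins is a simple set operation. The sketch below uses hypothetical gene identifiers rather than the study data, and the function name is illustrative.

```python
def partition_fractions(soluble, insoluble):
    """Split two lists of gene IDs into fraction-specific and shared sets."""
    soluble, insoluble = set(soluble), set(insoluble)
    return {
        "soluble_only": soluble - insoluble,
        "insoluble_only": insoluble - soluble,
        "shared": soluble & insoluble,
        "nonredundant": soluble | insoluble,
    }


# Hypothetical identifiers for illustration only.
soluble_ids = ["g1", "g2", "g3", "g4"]
insoluble_ids = ["g3", "g4", "g5"]
parts = partition_fractions(soluble_ids, insoluble_ids)
print({name: len(ids) for name, ids in parts.items()})
```

Converting each list to a set first also removes any duplicate entries arising from multiple annotations of the same gene.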

3.4 Comparison of the different proteomic approaches

The rationale for using a multiplatform proteomic approach has been recently verified [31, 32]. This study adopted three independent approaches and three mass spectrometers to maximise coverage of the proteome, resulting in a combined total of 1237 unique proteins. This represents 32% of the predicted proteome for all life-stages of Cryptosporidium based on current gene prediction models (http://cryptodb.org/). Although the number of proteins that are expected to be expressed from the sporozoite/oocyst excystation mixture is not known, it is likely to be less than the total predicted proteome, since stage-specific proteins from the merozoites and the sexual stages of the parasite are not likely to be represented. A proteomic analysis of the life cycle stages of the related parasite P. falciparum using MudPIT revealed that only 6% of proteins were common to all stages [33]. Figure 4 illustrates the distribution of protein identifications obtained using the three proteomic approaches. MudPIT analysis generated the greatest number of identifications, with almost twice that of Gel LC-MS/MS analysis and tenfold more than that obtained with the 2-DE analysis. Out of the three methods, Gel LC-MS/MS and MudPIT had the greatest overlap (567 proteins). However, proteins were identified which were unique to each approach (MudPIT, 47%; Gel LC-MS/MS, 6%; 2-DE, 0.7%). The statistical quality of the identifications from Gel LC-MS/MS analysis was better than that obtained from both 2-DE analysis and MudPIT, as has been previously observed [31]. Although identified with significant scores, ∼20% of the 1237 protein identifications were dependent upon one peptide and were identified using one approach. This is a relatively low proportion compared to many whole-cell proteomic investigations [33, 34]. The validity of these unique single-peptide identifications was verified by BLAST analysis of the individual peptides against MSDB.
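The coverage figure follows directly from the number of nonredundant proteins and the approximate gene count quoted for C. parvum. The sketch below reproduces that arithmetic and shows how identifications exclusive to one platform could be derived; the platform contents are hypothetical and the function names are illustrative.

```python
PREDICTED_GENES = 3900      # approximate gene count quoted in the text
IDENTIFIED_PROTEINS = 1237  # nonredundant proteins reported in this study


def coverage_percent(n_identified, n_predicted):
    """Percentage of the predicted proteome with expression evidence."""
    return 100.0 * n_identified / n_predicted


def exclusive_to_each(platforms):
    """Return, per platform, the gene IDs seen on no other platform.

    platforms: dict mapping platform name -> set of gene IDs.
    """
    exclusive = {}
    for name, ids in platforms.items():
        others = set().union(*(v for k, v in platforms.items() if k != name))
        exclusive[name] = set(ids) - others
    return exclusive


print(round(coverage_percent(IDENTIFIED_PROTEINS, PREDICTED_GENES)))  # 32

# Hypothetical identifications for illustration only.
toy = {"2-DE": {"g1", "g2"}, "Gel LC-MS/MS": {"g2", "g3"}, "MudPIT": {"g2", "g3", "g4"}}
print(exclusive_to_each(toy))
```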

Figure 4.

Comparison of proteome strategies. Venn diagram illustrating the numbers of unique and shared nonredundant protein identifications obtained for each of the three proteome approaches employed in this study.

3.5 Functional categories of expressed sporozoite proteins

Of the total dataset of 1237 expressed sporozoite/oocyst proteins detected, structural information was available for 1165, of which 213 proteins (18%) were found to contain one or more transmembrane (TM) domain motifs using a hidden Markov prediction program, TMHMM2. MudPIT favoured a greater proportion of the hydrophobic proteins (18%) than did Gel LC-MS/MS analysis (14%) or 2-DE analysis (10%). Of the hydrophobic proteins, 65% were predicted to have a single TM domain. The remainder exhibited a wide distribution of TM domain content with ∼20% of those detected by MudPIT and Gel LC-MS/MS having four or more. Proteins with a large number of TM domains (6–14) fall into the category of transporter proteins, involved in trafficking molecules across membranes or energy production via electron transport, for example, ABC Transporters (e.g. cgd6_5450); electron or cation transporters (genes cgd8_2330 and cgd1_990). Proteins without TM domains belonged to a diverse spectrum of functions, such as metabolism, protein synthesis/folding/degradation and transcription. Significantly, almost 50% of the proteins with TM domains (105) and 40% of soluble proteins (377) were of unknown function.
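Given a predicted TM helix count for each protein, the summary statistics quoted above reduce to a binning step. The sketch below assumes the helix counts have already been obtained from a predictor such as TMHMM2; the bin boundaries, names and example data are illustrative and hypothetical.

```python
from collections import Counter


def tm_bin(n_helices):
    """Assign a protein to a bin according to its predicted TM helix count."""
    if n_helices == 0:
        return "none"
    if n_helices == 1:
        return "single"
    if n_helices < 4:
        return "2-3"
    return "4+"


def summarise_tm(predicted_helices):
    """Return (bin counts, fraction of proteins with at least one TM helix).

    predicted_helices: dict mapping gene ID -> predicted number of TM helices.
    """
    counts = Counter(tm_bin(n) for n in predicted_helices.values())
    n_with_tm = sum(c for name, c in counts.items() if name != "none")
    total = len(predicted_helices)
    fraction = n_with_tm / total if total else 0.0
    return counts, fraction


# Hypothetical predictions for illustration only.
toy = {"g1": 0, "g2": 0, "g3": 1, "g4": 7, "g5": 0}
counts, fraction = summarise_tm(toy)
print(dict(counts), fraction)

# The proportion reported in the text: 213 of 1165 proteins.
print(round(100 * 213 / 1165))  # 18
```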

The proteins in the complete dataset were ascribed a functional category (Fig. 5, grey bars) by assigning gene ontology (GO) classifications listed on CryptoDB to specific Munich Information Centre for Protein Sequences (MIPS) categories within the FunCatDB functional catalogue (http://mips.gsf.de/projects/funcat). Of these, 183 proteins were without a GO classification but it was possible to assign a putative MIPS category using additional information provided by BLAST similarities, Pfam domain alignments (http://www.sanger.ac.uk/Software/Pfam/), Cryptosporidium orthologues and paralogues, and from independent literature searches (Fig. 5, black bars). For a large proportion (39%) of the proteins detected there is virtually no information available and it was not possible to assign a putative function; these are listed as unclassified.

Figure 5.

Functional categorisation of sporozoite proteome. Bar chart of cellular functions for all 1237 identified proteins. Proteins were assigned a MIPS category based on the GO annotation detailed at CryptoDB (grey bars). Proteins with no GO classification were assigned putative MIPS categories from additional published information (black bars). Proteins with no known function are listed as unclassified.

3.6 Evidence for organellar proteins

3.6.1 Mitochondrion

Previous analyses of the C. parvum and C. hominis genomes [11, 12] included predictions for 47 proteins related to iron–sulphur metabolism and the remnant mitochondrion of Cryptosporidium. An additional seven genes were annotated as mitochondrial on the genome scaffold (http://cryptodb.org/cryptodb/). Of these, 18 (34%) were detected in this study including: 11 chaperone/HSP proteins (cgd1_2970, cgd2_20, cgd2_3330, cgd3_2690, cgd3_3440, cgd4_2780, cgd6_4970, cgd7_360, cgd7_3880, cgd8_2820, cgd8_3770), transporters (cgd2_1030, solute carrier protein; cgd6_2360, mitochondrial FAD carrier protein); and enzymes involved in metabolism (cgd3_3120, alternative oxidase; cgd4_690, pyruvate/ferredoxin oxidoreductase; cgd6_3970, glutaredoxin-like protein; cgd6_4570, glutamate synthetase; cgd7_1900, mitochondrial NADH dehydrogenase). This analysis demonstrates that these proteins are present in the sporozoite stage of the parasite and is consistent with the existence of a functional relic mitochondrion of Cryptosporidium [14].

3.6.2 Invasion organelles: Microneme proteins

Only three proteins (TRAP-C1, GP900 and Cpa135) have been definitively located to the micronemes of Cryptosporidium, although the estimated number of microneme-associated proteins in Eimeria and Toxoplasma is currently ∼30 [35, 36]. A search of orthologues and proteins containing domains associated with micronemal proteins in T. gondii, such as Apple, epidermal growth factor (EGF) and thrombospondin (TSP) domains, on the genome scaffold (http://cryptodb.org) produced a further 24 potential micronemal proteins. This group includes 12 TSP domain containing molecules [37], a number of extracellular proteins of unknown function with EGF and/or Apple domains, the LCCL domain containing proteins such as Cpa135 [38] and mucin GP900 [39]. Of these, >50% were detected in the excysted sporozoites (Table 1). A novel TSP-like protein (cgd6_1740) was expressed which has domains involved in protein–protein interaction, e.g. focal adhesion targeting and Armadillo domains. Other domains listed are involved in diverse functions, e.g. carbohydrate binding (concanavalin and ricin lectins), and pentraxin domains are found in pattern recognition proteins involved in innate immunity such as in the detection of pathogens.

Table 1.

Putative micronemal proteins

Gene a) | Description | Domains b) | Sequence identity to other apicomplexan proteins c)
cgd1_3500 | TRAP-C1 | SP, C-terminal TM, 2 Apple, 6 TSP, Cy | TSP4 (7e-36); TSP3* (9e-33)
cgd1_3510 | TSP3 | SP, TM, 2 Apple, 4 TSP | TRAP-C1* (7.6e-49); TSP4 (9.6e-48)
cgd6_2310 | TSP6 | SP, C-terminal TM, Apple, TSP | No
cgd2_3080 | TSP10 | SP, N-terminal TM, 4 TSP, Kringle, C-terminal TM | No
cgd6_1660 | TSP11 | SP, 3 TSP | Hypothetical protein, P. falciparum, gi|23629551 (2.3e-12)
cgd6_1740 | TRAP 1-like adhesion protein | ARM, FAT, PIK | cgd4_2670, DNA repair protein with FAT/PIK (8e-5)
cgd7_1730 | Cpa135/CpCCP1 | SP, Ric, disc, NEC, LCCL, levanase, apicA | CpCCP2* (1.1e-106); CCP2 P. berghei (3.8e-187)
cgd7_300 | CpCCP2 | SP, Ric, disc, NEC, LCCL, 2 levanase, apicA | CpCCP1* (3.3e-120); hypothetical protein gi|23509754 P. falciparum (1.9e-156)
cgd2_790 | CpCCP3 | SP, 2 LH, 3 LCCL, 2 SR, pentraxin | Scavenger receptor protein gi|25990169, T. gondii (9.7e-249)
cgd1_1520 | Extracellular protein | SP, EGF, 9 TM, Con, pentraxin | No
cgd1_3550 | Low complexity mucin | SP, Apple, T-rich | cgd2_1590, fibrillin (7e-8)
cgd6_1080 | gp45/15/60 mucin involved in invasion | SP, TM | No
cgd4_3620 | p23 surface antigen found in gliding trails | / | No
cgd7_4020 | GP900, mucin | SP, TM, T-rich, Hy, Plakin, Glycosylation | cgd7_4330*, threonine rich glycoprotein, possible mucin (3e-39)
a) Shading indicates proteins identified with large numbers of peptides which could indicate a greater level of abundance.

b) SP, signal peptide; TM, transmembrane domain; TSP, thrombospondin domain; Apple, apple/Pan adhesion motif; EGF, epidermal growth factor motif; ARM, Armadillo repeat motif; FAT, focal adhesion targeting domain; PIK, phosphatidylinositol kinase domain; LH, lipase/lipooxygenase repeat; Cy, TRAP-like cytoplasmic tail sequence; NEC, NEC motif; T-rich, stretches rich in threonines; Kringle, kringle domain; Hy, metallo-dependent hydrolase domain; ApicA, apicomplexan-specific cysteine-rich repeat; levanase, levanase activity motif; pentraxin, pentraxin pattern recognition receptor motif; Ric, ricin lectin motif; disc, discoidin motif; LCCL, Limulus factor C cochlear motif; SR, scavenger receptor; Con, concanavalin lectin motif.

c) * indicates a protein which is also expressed.

3.6.3 Invasion organelles: Rhoptry proteins (ROP)

To assess the potential array of rhoptry-related proteins detected in the sporozoite proteome, homologous BLAST searches were performed using protein sequences from the 16 putative ROPs listed on the Toxoplasma genome resource (http://toxodb.org). An additional 38 rhoptry-associated genes were identified by a proteomic analysis of a highly enriched rhoptry fraction from T. gondii [9]. Xu et al. [12] propose a further 14 C. hominis homologues of putative Plasmodium rhoptry-related genes, and two more rhoptry genes (cgd2_340; cgd3_710) are listed on the genome scaffold as potential P. yoelii 235 ROP orthologues (http://cryptodb.org). Of these 60 putative ROPs, 13 had no matching Cryptosporidium sequences via BLAST searches. Altogether 42 Cryptosporidium homologues were identified by BLAST searching; four exhibited very poor identity with the Toxoplasma and/or Plasmodium orthologues and therefore may not be true orthologues. Of the remainder, 12 putative sporozoite ROPs were detected in this study: cgd4_320 (RAB11); cgd8_2530 (a sushi-domain containing protein with identity to T. gondii rhoptry neck protein (RON1), 7.9e−34); cgd8_750 (cGMP dependent protein kinase with identity to T. gondii 583.m00003, 4.1e−227 and to various T. gondii ROPs); cgd4_2040 (235 ROP); cgd2_340, cgd3_710, cgd3_3310, cgd6_3930 and cgd8_1520 (putative P. yoelii p235 protein orthologues, e = 10⁻¹⁵ to 10⁻⁴⁵); cgd2_3320 (secreted protease); cgd2_930 (insulinase like peptidase); cgd3_2020 (PP2C like phosphatase).

3.6.4 Invasion organelles: Dense Granule proteins

Analysis of the C. hominis genome did not previously reveal any specific dense granule-associated proteins [12]; however, T. gondii subtilisin is annotated as a dense granule protein and has two putative orthologues in Cryptosporidium, one of which, cgd6_4840, was detected by this study. A further protein listed on the T. gondii genome resource (http://toxodb.org) as a dense granule protein, p35 (GRA8), shows rather limited similarity (e−11, e−16) to three proline-rich hypothetical proteins of unknown function, cgd1_590, cgd6_3940 and cgd7_4500.

3.7 Surface proteins

Previously Templeton et al. [40] identified a subset of secreted proteins in Cryptosporidium, many with well-known adhesion-type domains which may have a role in surface interactions with the host cell. The majority of these proteins were detected in this study (Table 2). Included in this list are oocyst cell wall proteins, mucins, families of secreted proteins which result from Cryptosporidium-specific gene expansions such as the Wyle and CpLSP gene clusters [11] and a locus of signal peptide-containing proteins of which recent evidence suggests one member, cgd4_3450, is highly immunogenic and may therefore be exposed on the parasite surface [41]. Most of these proteins are uncharacterised. Notably some exhibit motifs which may have a role in host immune evasion, for example, SCP domains are present in proteins involved in stress/pathogen interaction; KAZAL, Kringle and SPI motifs regulate protease activity.

Table 2.

Proteins which may be involved in surface interactions

Gene a) Description Domains b) Sequence identity to other apicomplexan proteins c)
cgd5_690 Extracellular protein SP, 1 TM, Ric, Glycosyl N-Acetylgalactosaminyltransferase gi|37622165 T. gondii (1e-112)
cgd7_4160 Extracellular protein SP, 1 TM, Glycosyl, Ric Extracellular proteins with SP, TM, Glycosyl, Ric: cgd5_690* (3.5e-109); cgd7_1310* (2e-30); cgd6_1960 (8e-25)
cgd7_4560 Gigantic extracellular protein SP, C-terminal TM, 9 sushi, 2 archaeal protease TSP2 (1.4e-11)
cgd1_470 Secreted protein SP, 1 TM, S/T-rich, ARM cgd4_3630 hypothetical protein (3e-5)
cgd2_420 Low mass mucin SP, S-rich No
cgd2_430 Low mass mucin SP, S-rich No
cgd2_440 Low mass mucin SP, S-rich, TM No
cgd3_1540 Putative mucin SP, C/T-rich No
cgd4_3550 Putative mucin SP, TM, 12 KAZAL Hypothetical proteins with KAZAL domains, cgd4_750 (6e-28); cgd5_3380 (3e-9)
cgd5_2060 Putative mucin SP, T-rich, charged aa repeats No
cgd6_710 Putative mucin SP, T-rich, TM Hypothetical protein gi|95007416, T. gondii (2e-13)
cgd6_5400 Putative mucin SP, C/T-rich No
cgd6_5410 Putative mucin/′CP2′ SP, T-rich, charged aa repeats Reticulocyte binding protein 2 gi|33413772 P. falciparum (9e-9)
cgd6_5420 Putative mucin SP No
cgd8_700 Putative mucin SP, TM, T-rich, Spectrin Low complexity proteins cgd8_680* (3e-43) and cgd8_660 (2e-26)
cgd8_3520 Putative mucin SP, TM, C/T-rich No
cgd6_3130 Cytohesin-like molecule 1 TM No
cgd7_4540 15/60K secreted sporozoite protein Vinculin Hypothetical protein gi|71031987, T. parva (4e-11)
cgd6_2330 Gal/GalNAc binding lectin SP, TM No
cgd1_1250 Large secreted protein SP, TM No
cgd5_1440 Large secreted protein SP No
cgd7_2340 Large secreted protein SP, 1 TM, cysteine protease 235 kDa ROP, P. yoelii EAA20475 (3e-6)
cgd7_3800 Large secreted protein locus SP, 1 TM No
cgd7_3810 Large secreted protein locus SP, TM, catenin/vinculin domain No
cgd7_3820 Large secreted protein locus SP No
cgd7_3830 Large secreted protein locus SP No
cgd7_3860 Large secreted protein locus SP, 1 TM Hypothetical proteins: cgd7_3870 (4e-9); cgd5_1440* (2.9e-7)
cgd7_3870 Large secreted protein locus SP, 1 TM Large hypothetical protein, cgd5_1440* (4e-64)
cgd7_4280 Large secreted protein SP, 1 TM No
cgd7_4500 Large secreted protein SP, P-ext No
cgd7_4530 Large secreted protein SP, 1 TM No
cgd4_3430 Protein within SP containing locus SP, Ig fold No
cgd4_3440 Protein within SP containing locus SP, TM No
cgd4_3450 Protein within SP containing locus SP, HMG No
cgd4_3460 Protein within SP containing locus SP No
cgd4_3470 Protein within SP containing locus SP, TM No
cgd4_3530 Protein within SP containing locus SP, TM, t-snare No
cgd4_3580 Protein within SP containing locus SP, protein kinase No
cgd6_3080 Hypothetical protein SP, TM No
cgd3_3430 Amiloride binding protein SP, MAM, Cu amine oxidase No
cgd5_2020 Extracellular protein SP, C-rich, SCP No
cgd7_4810 CpFNPA SP, Kringle, FN2, anthrax Hypothetical LCCL domain protein, P. falciparum gi|23613448 (1.2e-110)
cgd5_2740 Wyle family protein SP, TM No
cgd8_3540 Wyle family protein SP, TM No
cgd8_3590 Wyle family protein SP No
cgd8_2530 Secreted sushi domain protein SP, TM, GETHR repeat, Sushi ROP1*, gi|71559158 T. gondii (5e-17); hypothetical protein, gi|68070831 P. berghei (3e-12)
cgd6_2090 COWP1 SP, GFR, C-rich Other COWPs, e.g. COWP2* (2.6e-97); COWP, gi|39578335 T. gondii (7e-32)
cgd7_1800 COWP2 SP, TM, GFR, Chit Other COWPs, e.g. COWP1*, C. parvum (5.4e-89); COWP, gi|39578335 T. gondii (3e-18)
cgd4_670 COWP3 SP, GFR, Chit, SPI Other COWPs, e.g. COWP1*, C. parvum (2.2e-34); COWP, gi|39578335 T. gondii (4e-6)
cgd8_3350 COWP4 SP, GFR Other COWPs, e.g. COWP1*, C. parvum (9.4e-27); COWP, gi|39578335 T. gondii (3e-6)
cgd4_3090 COWP6 SP, 4 GFR Other COWPs, e.g. COWP1*, C. parvum (1e-48); COWP, gi|39578335 T. gondii (3e-48)
cgd6_200 COWP8 SP, C-rich Other COWPs, e.g. COWP1*, C. parvum (2.5e-257); COWP, gi|39578335 T. gondii (3e-7)
a

Shading indicates proteins identified with large numbers of peptides which could indicate a greater level of abundance.

b

SP, signal peptide; TM, transmembrane domain; Sushi/SRC, sushi motif or complement control element; ARM, Armadillo repeat motif; Glycosyl, family 2 glycosyltransferase; KAZAL, serine protease inhibitor repeat domains; C/S/T-rich, stretches rich in cysteines, serines or threonines; Kringle, kringle domain; Ric, ricin lectin motif; MAM, 4 conserved cysteines, adhesion domain; SCP, sperm coat protein motif; GETHR, pentapeptide repeat; LCCL, Limulus factor C cochlear protein motif; HMG, high mobility group motif; SPI, serine protease inhibitor; t-SNARE, t-SNARE repeat; vinculin, vinculin focal adhesion motif; anthrax, anthrax toxin motif; COWP, Cryptosporidium oocyst wall protein; GFR, growth factor receptor; Chit, chitin binding domain; P-ext, extension of prolines; FN2, FN2 domain.

c

* indicates a protein which is also expressed.

3.8 Genes with highest similarity to bacterial genes

A small but significant proportion of genes in the Cryptosporidium genome are annotated as having closest similarity to genes from bacteria. This is of interest because it indicates a possible bacterial evolutionary origin for these genes and provides evidence for their acquisition via horizontal gene transfer. Of the 50 or so genes listed on CryptoDB (http://cryptodb.org/) and by Abrahamsen et al. [11] with highest identity to bacterial genes, 20 were detected in this study: cgd1_310 (apurinic endonuclease), cgd1_2130 (glutaminyl-tRNA synthetase), cgd2_2890 (biotin-acetyl-CoA carboxylase ligase), cgd3_3120 (AOX1, alternative oxidase), cgd4_300 (lipase/esterase), cgd4_700 (hypothetical protein), cgd5_2600 (leucine aminopeptidase), cgd5_4560 (tryptophan synthase), cgd6_1910 (chagasin), cgd6_4570 (glutamate synthetase), cgd7_230 (membrane protein), cgd7_470 (malate dehydrogenase), cgd7_480 (lactate dehydrogenase), cgd7_2620 (ATPase), cgd7_5330 (thioredoxin/PDI), cgd8_480 (histone deacetylase), cgd8_1700 (NAD-dependent dehydrogenase), cgd8_1720 (acetaldehyde reductase plus alcohol dehydrogenase), cgd8_3290 (methyltransferase involved in ubiquinone/menaquinone biosynthesis), cgd8_3460 (methionyl-tRNA synthetase). These genes span a wide variety of cellular functions, including protein synthesis/folding/degradation and metabolism.

3.9 Comparison with other expression data

Comparison of this dataset with other expression data currently available indicated almost complete coverage (95%) of the Snelling et al. [17] proteome dataset. Comparison with the 682 full-length cDNAs obtained from C. parvum sporozoites (http://fullmal.hgc.jp/cp/, hosted at Comparasite, http://comparasite.hgc.jp/) [15] and 334 nonredundant ESTs listed on CryptoDB confirmed the translation of 275 (40%) and 157 (47%) of the proteins for which there is transcript evidence, respectively. Only 50% of the genes confirmed by the two EST datasets were shared by both datasets, suggesting differences in sampling or preparation of the RNA. Conversely, only 20% of the proteins detected in the sporozoites had corresponding mRNA transcripts, possibly indicating that the majority of proteins within the excysted sporozoites were extant, having been expressed at an earlier stage in parasite development.
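
The percentages quoted above follow directly from the stated counts. The short sketch below simply reproduces that arithmetic; the variable and function names are illustrative and are not taken from the study.

```python
# Counts quoted in the text; all names here are illustrative only.
full_length_cdnas = 682    # full-length cDNAs from C. parvum sporozoites
cdnas_with_protein = 275   # of these, confirmed at the protein level
nonredundant_ests = 334    # nonredundant ESTs listed on CryptoDB
ests_with_protein = 157    # of these, confirmed at the protein level


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


cdna_confirmed = percent(cdnas_with_protein, full_length_cdnas)
est_confirmed = percent(ests_with_protein, nonredundant_ests)
print(cdna_confirmed, est_confirmed)
```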

3.10 Public accessibility of the Cryptosporidium proteome data

To enable the proteomic data generated by this study to be viewed in the context of the genome scaffold of Cryptosporidium, an interface for the proteome data was developed for data-mining and visualisation at CryptoDB (http://cryptodb.org/). The peptide coordinate locations for each identified protein were mapped to their corresponding locations on the genomic contigs in CryptoDB. CryptoDB is a public database resource that integrates different data types (genomic, transcriptomic, proteomic) for multiple Cryptosporidium species. Using the interface developed in this study, users can now search for annotated gene models or ORFs that have evidence of expression based on peptides identified by MS/MS analysis. Detailed record pages for each gene/ORF embed a graphical representation of the peptides in the context of the protein and genomic sequences and show the extent of peptide coverage. The peptide glyphs are colour coded to highlight the experimental conditions from which they were derived (Fig. 6). Additional information about a given peptide is provided in a popup when mousing over the glyph. Users can restrict their searches to a particular experimental protocol and also by the number of unique spectra, or the total number of spectra mapping to an annotation. The results can be combined with other searches supported by the database including searches for genes with EST expression evidence, or GO categories, Pfam domains, etc. [42]. The proteomics interface in CryptoDB was designed to enable proteomics data from other researchers to be deposited alongside data from this study to enable a full proteomics picture for all life-stages of the parasite to be built with time.
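
The mapping of peptide positions onto genomic contigs can be illustrated with a minimal sketch. It is not the CryptoDB implementation: it assumes a forward-strand gene, coding exons given as 1-based inclusive genomic intervals in 5'→3' order, and peptides given as 1-based residue positions in the protein. The function name `peptide_to_genome` and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive genomic interval


def peptide_to_genome(exons: List[Segment], aa_start: int, aa_end: int) -> List[Segment]:
    """Map protein residues aa_start..aa_end to genomic segments.

    Assumes a forward-strand gene whose coding exons are listed 5'->3'.
    A peptide that crosses an exon boundary yields more than one segment.
    """
    # Convert residue positions to 0-based offsets within the spliced CDS.
    cds_start = (aa_start - 1) * 3
    cds_end = aa_end * 3 - 1
    segments: List[Segment] = []
    offset = 0  # CDS offset of the first base of the current exon
    for exon_start, exon_end in exons:
        exon_len = exon_end - exon_start + 1
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + exon_len - 1)
        if lo <= hi:  # the peptide overlaps this exon
            segments.append((exon_start + lo - offset, exon_start + hi - offset))
        offset += exon_len
    return segments


# Hypothetical two-exon gene: 20 codons in exon 1, 30 codons in exon 2.
example_exons = [(101, 160), (201, 290)]
spanning = peptide_to_genome(example_exons, 16, 25)
print(spanning, "spans an intron:", len(spanning) > 1)
```

Under these assumptions, a peptide that maps to more than one segment is one that spans a predicted intron, which is the case drawn with an arched line in the genome view.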

Figure 6.

Graphical representation of data for C. parvum gene cgd6_3190 as displayed at CryptoDB (http://cryptodb.org). The upper grey panel represents the entire length of chromosome 6 (1.2 Mb) and the red box in the middle of this region represents the 5 kb region that is shown in the detail view below. The first track in the figure displays the localisation of peptides from three different experimental conditions as indicated by the three different colours. The bottom track shows the annotated genes in the region with the gene of interest highlighted in yellow. The putative protein product encoded by each gene model is indicated below the gene model and the gene name is indicated above. The arched thin black lines that separate some of the peptides were added to indicate when a peptide spans a predicted intron.

3.11 Annotation of the genome of C. parvum

In the majority of cases the proteomic data from this study confirmed predicted genes in the genome annotation, providing conclusive evidence of the existence of these hypothetical genes and of their expression at the protein level in this life-stage of the parasite. In some instances the peptides mapped to regions that spanned an intron providing experimental support for the existing annotations of intron/exon boundaries (Fig. 6). Peptides that span 13 introns in 10 genes (cgd1_420, cgd1_440, cgd1_530, cgd2_3910, cgd3_3200, cgd4_320, cgd6_3190, cgd6_4760, cgd7_1540, cgd7_4070) were identified. This represents supporting evidence for 5% of 237 introns annotated in the protein coding genes of C. parvum.

MS data were searched against both hypothetical annotated genes and all possible ORFs >50 amino acids in the C. parvum genome; thus protein identification was not dependent on current annotation and the analysis retained the flexibility to match to alternative annotations. Forty-two peptide-containing ORFs were identified which did not overlap with an existing gene annotation and may indicate potential errors and omissions in the present gene annotation. Thirty ORFs were identified which overlapped with a known gene but contained at least one peptide not fully encoded by the gene. Many of these ORFs were hit by a single low-scoring peptide and may be artefacts; however, there are some that warrant further study. AAEE01000001-4-405695-401253, identified by MudPIT, has no corresponding gene annotated in C. parvum but shares high similarity to a corresponding C. hominis hypothetical protein, Chro.70187. A cluster of ORFs on the same strand of contig AAEE01000014 (AAEE01000014-2-77531-77743, AAEE01000014-3-77790-78011, AAEE01000014-2-78011-78295, AAEE01000014-1-78355-78534) bears peptide evidence from multiple assays (Fig. 7). This clustering is suggestive of the exons of a gene, although it may be coincidental, and only one ORF (AAEE01000014-2-78011-78295) shows any similarity to known proteins: Myxococcus xanthus polyketide synthase type I and Lyngbya sp. hypothetical protein L8106_03382.
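
To make the idea of searching "all possible ORFs >50 amino acids" concrete, the sketch below enumerates stop-free stretches in all six reading frames. It is an illustration only: the study does not state how its ORFs were defined, so the stop-to-stop definition, the standard stop codons, the function name `open_frames` and the toy sequence are all assumptions.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frames(seq: str, min_codons: int = 51) -> List[Tuple[str, int, int, int]]:
    """List stop-free runs of at least min_codons codons in six frames.

    Each hit is (strand, frame, start, end); start and end are 0-based
    offsets on the scanned strand, with end exclusive.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if (i - run_start) // 3 >= min_codons:
                        hits.append((strand, frame, run_start, i))
                    run_start = i + 3  # next run begins after the stop
            # Handle the run that reaches the end of the sequence.
            end = frame + ((len(s) - frame) // 3) * 3
            if (end - run_start) // 3 >= min_codons:
                hits.append((strand, frame, run_start, end))
    return hits


# Toy sequence: 61 sense codons, a stop codon, then 10 more codons.
toy = "ATG" + "GCA" * 60 + "TAA" + "GCA" * 10
toy_hits = open_frames(toy)
print(toy_hits)
```

A threshold of 51 codons corresponds to ORFs longer than 50 amino acids. Peptide spectra would then be searched against the translations of these frames as well as the annotated gene models.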

Figure 7.

Cluster of ORFs on contig AAEE01000014 with MS evidence of expression as displayed at CryptoDB (http://cryptodb.org). Contig AAEE01000014 is 85 kb in length as indicated by the grey track at the top of the figure. The region from 77 to 79 kb is expanded below. The middle track indicates the observed peptide fragments from different experiments as indicated by the different colour bars. The track below the peptide fragments displays all ORFs >50 amino acids in length from both DNA strands. Each of the nine potential ORFs is indicated by a different colour (shades of blue are one strand and shades of red the other). The five ORFs identified by the peptide fragments are highlighted in yellow and all occur on the same DNA strand. No annotated genes are present in this region.

4 Discussion

Genome analysis of C. parvum has provided a wealth of information and enabled predictions to be made concerning various aspects of its biology such as metabolic function, evolution and host–parasite interactions [11, 12, 40]. The significance of this proteomic study is that it provides clear in vivo evidence for the expression of these hitherto predicted proteins in Cryptosporidium. The information has been made publicly available and will help to (i) underpin and further inform the annotation of the Cryptosporidium genome; (ii) establish a link between understanding transcription and expression of gene products; (iii) pin-point key/novel proteins that might act as valuable therapeutic targets in future studies.

Gene prediction analysis of the C. parvum and C. hominis genomes indicated that between 25 and 40% of the potential gene products are ‘hypothetical proteins’ with insufficient similarity to known proteins to allow any deductions regarding putative function to be made [11, 12]. This study found proteomic evidence in C. parvum for 482 of these hypothetical proteins, which represents 39% of the total proteome detected in this study. Approximately 69–74% of the coding region of the C. parvum and C. hominis genome is thought to contain genes [11, 12]. This estimate is based on imperfect algorithmic prediction programmes, and evidence from this study suggests that a further 72 regions of the genome scaffold may contain coding exons that have not been recognised by current gene prediction models. Similarly, some of the current gene models for which no peptide evidence exists may be partially or completely incorrect, although further mRNA expression data and deeper proteome analysis across all life-stages will be needed before such models can be eliminated with certainty.

Comparison of the proteome data obtained with EST datasets revealed that for half the genes for which transcriptional data are available, no corresponding protein could be detected by MS analysis. This discrepancy could reflect the expression of low copy number or transient proteins which were not detectable under the conditions used, or that there is an additional level of post-transcriptional control of protein expression. Evidence in support of either explanation has come from comparative studies of proteome and transcriptome datasets of Plasmodium [43–45]. More surprisingly perhaps, for 80% of the proteins detected in this study there were no corresponding mRNA transcripts detected in either the EST dataset, or in the full-length cDNA data. This is likely to reflect the inadequacies of the current EST analysis for C. parvum, but might also indicate that many of the proteins in the sporozoite excystation mixture were extant, having been expressed at an earlier stage in parasite development; such stage-specific expression is documented for Plasmodium [45]. Of the proteins for which there was supporting cDNA evidence, a greater proportion were involved in protein synthesis, binding functions and transcription, whereas those involved in biogenesis of cellular components were under-represented. Proteins involved in the former functions tend to be abundant and would therefore be easily detected by MS. Functional categories of detected proteins for which there was scant EST evidence may indicate that the proteins are preformed or undergo very slow turn-over. Comparison with the CryptoDB EST dataset indicated a greater proportion of proteins involved in invasion (category: cell rescue, defence and virulence) and interaction with the environment, but this was not supported by comparison with the Watanabe full-length cDNAs. As the sporozoite is the invasive form of the parasite with invasion organelles preformed, it is likely that the protein components of these organelles are also preformed. Further data on the transcriptome of C. parvum are essential to resolve this crucial question of the relationship between transcription and protein expression in C. parvum raised by this study.

Cryptosporidium does not possess a mitochondrial genome [11, 12] although a single remnant organelle is present per cell [14, 46]. Current evidence supports the origin of all mitochondria as the endosymbiosis of an alpha proteobacterium [47]. One-third of the predicted relic-mitochondrial proteins were detected in this study spanning a range of functions including iron–sulphur cluster biosynthesis, electron transport and metabolism. This implies that the remnant mitochondrion is present and presumably functional within the sporozoite and is in agreement with previous work from both this laboratory and others [14, 46].

Understanding the metabolic function of the relic mitochondrion is crucial to elucidating the phylogenetic position of these organisms in Eukaryote evolution. A better understanding can also be gleaned from those genes which show closest similarity to bacterial genes. This study provides expressed protein evidence for many bacterial-like orthologues, spanning a wide range of cellular functions. A number were detected with very high MOWSE scores and good peptide coverage (e.g. cgd5_2600, leucine aminopeptidase; score 658, 15 peptides, 34% coverage), which is usually an indicator of high expression levels and should remove lingering doubts that these are true Cryptosporidium orthologues. Light microscopic examination indicated no visible bacterial contamination of the preparation of Cryptosporidium used in this study; as such, any background signal from bacterially derived peptides would be overwhelmed by those derived from the Cryptosporidium proteins and difficult to detect.
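
For readers unfamiliar with the coverage figure quoted above, sequence coverage is conventionally the percentage of a protein's residues that fall within at least one identified peptide. The sketch below shows that calculation under this assumption; the function name and the example peptide positions are hypothetical and do not come from the cgd5_2600 data.

```python
from typing import List, Tuple


def sequence_coverage(protein_length: int, peptides: List[Tuple[int, int]]) -> float:
    """Percentage of residues covered by at least one peptide.

    Peptides are (start, end) residue positions, 1-based and inclusive.
    Overlapping peptides are counted only once per residue.
    """
    covered = [False] * protein_length
    for start, end in peptides:
        for pos in range(start, end + 1):
            covered[pos - 1] = True
    return 100.0 * sum(covered) / protein_length


# Hypothetical 100-residue protein with two overlapping peptides and one other.
example_coverage = sequence_coverage(100, [(1, 10), (5, 20), (91, 100)])
print(example_coverage)
```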

Confirmation of true protein function can only be attained through biochemical analysis and subcellular localisation studies; however, predictions are possible from structural and sequence comparisons. In their analysis of the genome of C. hominis, Xu et al. [12] detailed absolute gene numbers per GO category. In the present study, proteins were assigned a MIPS category deduced from GO and other data sources. A degree of caution must be exercised in drawing comparisons with these data as the GO and MIPS assignments employed are not equivalent. Furthermore, detection of a given protein is dependent upon a number of variables including expression levels, stability, proteolysis by trypsin and the performance of the derived peptides within the mass spectrometer. Nonetheless, the profile of proteins detected in this study approximates the functional profile of gene products predicted from the C. hominis genome, with three exceptions (for comparison see [12] supplementary data, Figure 10). First, proteins classified under ‘binding function’ are under-represented, and those involved in translation, overrepresented. Although Snelling et al. [17] observed up-regulation of ribosomal proteins during excystation suggestive of an increase in relative terms, in the present study the overrepresentation is most likely an artefact of differences in GO assignment, whereby MIPS classify ribosomal proteins under ‘protein synthesis’ and GO (AmiGO) as ‘structural proteins’ not ‘translation’. In the second exception, proteins involved in metabolism are under-represented. It is likely that this is a true reflection of the metabolic state of the parasite. Within the cyst, sporozoites are in a quiescent state, and metabolism is restricted due to the scant availability of nutrients. Freshly in vitro excysted sporozoites have yet to interact with the host cell and the wide array of nutrients derived from the host metabolism, and as cell lysates were produced directly following excystation, cellular activity may have been slowed. Although it is assumed that Cryptosporidium relies on the import of host sugars to fuel a glycolysis-based metabolism, the MIPS definition of ‘metabolism’ also includes biosynthetic processes. Analysis of the Cryptosporidium genome has additionally revealed the presence of several hundred transporter molecules, necessary for the import of essential nutrients such as amino acids and nucleotides to compensate for limited biosynthetic activity. Previously, Snelling et al. [17] detected three transporters, none of which appeared to be upregulated during excystation. This study has expanded the dataset to 64 transporter-like proteins associated with the oocyst/sporozoite life cycle stages. The proportion of the total proteome that these represent (i.e. 9%) is in good agreement with that predicted from the analysis of the genome [12]; however, there are clearly many more genes encoding transporter-like proteins which may be present but not detected, or are specific to other life-cycle stages. The third exception involves proteins assigned to the category ‘cell rescue, defence and virulence’ which also includes putative proteins of the apical organelles. Analysis of apicomplexan genomes has indicated a significant component of molecules involved in pathogenesis, immune evasion and adhesion [40]. Knowledge in this area is currently poor for Cryptosporidium; however, inclusion of the micronemal and rhoptry orthologues identified here indicates an enrichment of these proteins in the invasive form of the parasite, which is not unexpected.

Little is known of the contents of the secretory organelles of Cryptosporidium. Cryptosporidium sporozoites are thought to have several hundred micronemes, far fewer dense granules and perhaps only one rhoptry per cell [48]. Annotation of the genome has allowed the identification of orthologues from the better characterised apical organelles of T. gondii, Plasmodium and Eimeria [9, 36, 48–53]. In T. gondii and E. tenella rapid sequential release of the contents of the apical organelles occurs upon contact with the host cells and these molecules are known to be involved in attachment, penetration, invasion and maintenance of the parasite within the host. Recently it has been proposed that the apicomplexan-specific secreted protein families (e.g. CpLSP, Wyle) and the large expansion of mucin-like molecules in Cryptosporidium may also have a role in invasion [11, 40]. These proteins are largely uncharacterised and poorly annotated, with low sequence similarities to other known proteins. Mucins are large, secreted, glycosylated proteins with threonine and/or cysteine stretches which may be involved in invasion of the host intestinal wall [54]. The best characterised mucins are sporozoite antigens, GP900 and Gp45/15/60, which are thought to act as ligand molecules, facilitating attachment to the host mucus [39, 55]. Both proteins were detected in this study along with orthologues of known apical proteins, and other secreted proteins which carry surface interaction domains, including numerous mucins and members of the Wyle and CpLSP gene families. Measurement of mRNA transcript levels of the TSP and CpLSP proteins 2–72 h postinfection indicated stage-specific expression suggestive of a role in invasion [11, 37]. Detection of expressed proteins in this study from both families in freshly excysted sporozoites supports this conclusion and indicates that some of these proteins are preformed prior to infection of the host.

Although this study has provided definitive confirmation of the expression of a wide range of proteins that may be derived from the invasion organelles, preparation of sufficient amounts of highly enriched organelles is required to characterise fully their contents. Bradley et al. [9, 10] characterised the proteome of highly enriched rhoptries of T. gondii, revealing many new constituents and employing immunofluorescent labelling to confirm a rhoptry localisation for 11 proteins including RON1, ROP4 and RAB11. The expression of orthologues of RON1, ROP4 and RAB11 in the sporozoite stage of C. parvum has now been demonstrated by this analysis. A subcellular proteomic analysis of the secretory organelles of Cryptosporidium is currently underway to extend the information obtained in this study, and to provide in vitro confirmation of an apical localisation.

Acknowledgments

The authors would like to thank Steve Wright, Moredun Research Institute for the C. parvum oocyst preparation, Abu Mohammad Abdul M. Zonaed Siddiki for preparing the excysted sporozoites and Rich Oakes, Institute for Animal Health, Compton. S. J. Sanderson, E. Bromley and K. Lal were funded on a BBSRC grant [BBS/B/03807] awarded to J. M. W., F. T. and R. E. S. M. Heiges, J. C. Kissinger and portions of the bioinformatics analysis and database inclusion were supported by the National Institute of Allergy and Infectious Diseases, NIH, Department of Health and Human Services [HHSN266200400037C]. J. R. Yates and H. Prieto were supported by NIH P41 RR11823 and NIH-NIAID-DMID-BAA-03-38.

Abbreviations

EGF, epidermal growth factor
FA, formic acid
GO, gene ontology
MIPS, Munich Information Centre for Protein Sequences
MudPIT, multi-dimensional protein identification technology
RON, rhoptry neck protein
ROP, rhoptry protein
TM, transmembrane
TSP, thrombospondin

Footnotes

The authors have declared no conflict of interest.

References

  • 1.Griffiths JK, Balakrishnan R, Widmer G, Tzipori S. Paromomycin and geneticin inhibit intracellular Cryptosporidium parvum without trafficking through the host cell cytoplasm: Implications for drug delivery. Infect. Immun. 1998;66:3874–3883. doi: 10.1128/iai.66.8.3874-3883.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Morgan-Ryan UM, Fall A, Ward LA, Hijjawi N, et al. Cryptosporidium hominis n. sp. (Apicomplexa: Cryptosporidiidae) from Homo sapiens. J. Eukaryot. Microbiol. 2002;49:433–440. doi: 10.1111/j.1550-7408.2002.tb00224.x. [DOI] [PubMed] [Google Scholar]
  • 3.Mallon M, Macleod A, Wastling J, Smith H, et al. Population structures and the role of genetic exchange in the zoonotic pathogen Cryptosporidium parvum. J. Mol. Evol. 2003;56:407–417. doi: 10.1007/s00239-002-2412-3. [DOI] [PubMed] [Google Scholar]
  • 4.de Graaf DC, Vanopdenbosch E, Ortega-Mora LM, Abbassi H, Peeters JE. A review of the importance of cryptosporidiosis in farm animals. Int. J. Parasitol. 1999;29:1269–1287. doi: 10.1016/S0020-7519(99)00076-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Smith HV, Corcoran GD. New drugs and treatment for cryptosporidiosis. Curr. Opin. Infect. Dis. 2004;17:557–564. doi: 10.1097/00001432-200412000-00008. [DOI] [PubMed] [Google Scholar]
  • 6.Abubakar I, Aliyu SH, Arumugam C, Usman NK, Hunter PR. Treatment of cryptosporidiosis in immunocompromised individuals: Systematic review and metaanalysis. Br. J. Clin. Pharmacol. 2007;63:387–393. doi: 10.1111/j.1365-2125.2007.02873.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Rossignol JF. Nitazoxanide in the treatment of acquired immune deficiency syndrome-related cryptosporidiosis: Results of the United States compassionate use program in 365 patients. Aliment. Pharmacol. Ther. 2006;24:887–894. doi: 10.1111/j.1365-2036.2006.03033.x. [DOI] [PubMed] [Google Scholar]
  • 8.Fricker CR, Crabb JH. Water-borne cryptosporidiosis: Detection methods and treatment options. Adv. Parasitol. 1998;40:241–278. doi: 10.1016/s0065-308x(08)60123-2. [DOI] [PubMed] [Google Scholar]
  • 9.Bradley PJ, Ward C, Cheng SJ, Alexander DL, et al. Proteomic analysis of rhoptry organelles reveals many novel constituents for host-parasite interactions in Toxoplasma gondii. J. Biol. Chem. 2005;280:34245–34258. doi: 10.1074/jbc.M504158200. [DOI] [PubMed] [Google Scholar]
  • 10.Wastling JM, Bradley PJ. In: Proteomic Analysis of the Rhoptry Organelles of Toxoplasma gondii, Toxoplasma: Molecular and Cellular Biology. Soldati D, Ajioka JW, editors. Norwich, UK: Horizon Scientific Press; 2007. pp. 445–454. [Google Scholar]
  • 11.Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, et al. Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science. 2004;304:441–445. doi: 10.1126/science.1094786. [DOI] [PubMed] [Google Scholar]
  • 12.Xu P, Widmer G, Wang Y, Ozaki LS, et al. The genome of Cryptosporidium hominis. Nature. 2004;431:1107–1112. doi: 10.1038/nature02977. [DOI] [PubMed] [Google Scholar]
  • 13.Gardner MJ, Hall N, Fung E, White O, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419:498–511. doi: 10.1038/nature01097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Putignani L, Tait A, Smith HV, Horner D, et al. Characterization of a mitochondrion-like organelle in Cryptosporidium parvum. Parasitology. 2004;129:1–18. doi: 10.1017/s003118200400527x. [DOI] [PubMed] [Google Scholar]
  • 15.Watanabe J, Wakaguri H, Sasaki M, Suzuki Y, Sugano S. Comparasite: A database for comparative study of transcriptomes of parasites defined by full-length cDNAs. Nucleic Acids Res. 2007;35:D431–D438. doi: 10.1093/nar/gkl1039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Strong WB, Nelson RG. Preliminary profile of the Cryptosporidium parvum genome: An expressed sequence tag and genome survey sequence analysis. Mol. Biochem. Parasitol. 2000;107:1–32. doi: 10.1016/s0166-6851(99)00225-x. [DOI] [PubMed] [Google Scholar]
  • 17.Snelling WJ, Lin Q, Moore JE, Millar BC, et al. Proteomics analysis and protein expression during sporozoite excystation of Cryptosporidium parvum (Coccidia, Apicomplexa) Mol. Cell. Proteomics. 2007;6:346–355. doi: 10.1074/mcp.M600372-MCP200. [DOI] [PubMed] [Google Scholar]
  • 18.Campbell AT, Robertson LJ, Smith HV. Viability of Cryptosporidium parvum oocysts: Correlation of in vitro excystation with inclusion or exclusion of fluorogenic vital dyes. Appl. Environ. Microbiol. 1992;58:3488–3493. doi: 10.1128/aem.58.11.3488-3493.1992.
  • 19.Hill BD. Enteric protozoa in ruminants: Diagnosis and control of Cryptosporidium, the role of the immune response. Rev. Sci. Tech. 1990;9:423–440. doi: 10.20506/rst.9.2.503.
  • 20.Laemmli UK. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature. 1970;227:680–685. doi: 10.1038/227680a0.
  • 21.Chepanoske CL, Richardson BE, von RM, Peltier JM. Average peptide score: A useful parameter for identification of proteins derived from database searches of liquid chromatography/tandem mass spectrometry data. Rapid Commun. Mass Spectrom. 2005;19:9–14. doi: 10.1002/rcm.1741.
  • 22.Shadforth I, Dunkley T, Lilley K, Crowther D, Bessant C. Confident protein identification using the average peptide score method coupled with search-specific, ab initio thresholds. Rapid Commun. Mass Spectrom. 2005;19:3363–3368. doi: 10.1002/rcm.2203.
  • 23.Link AJ, Eng J, Schieltz DM, Carmack E, et al. Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 1999;17:676–682. doi: 10.1038/10890.
  • 24.Gatlin CL, Kleemann GR, Hays LG, Link AJ, Yates JR., III Protein identification at the low femtomole level from silver-stained gels using a new fritless electrospray interface for liquid chromatography-microspray and nanospray mass spectrometry. Anal. Biochem. 1998;263:93–101. doi: 10.1006/abio.1998.2809.
  • 25.Washburn MP, Wolters D, Yates JR., III Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001;19:242–247. doi: 10.1038/85686.
  • 26.Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: The yeast proteome. J. Proteome Res. 2003;2:43–50. doi: 10.1021/pr025556v.
  • 27.Bern M, Goldberg D, McDonald WH, Yates JR., III Automatic quality assessment of peptide tandem mass spectra. Bioinformatics. 2004;20:I49–I54. doi: 10.1093/bioinformatics/bth947.
  • 28.Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994;5:976–989. doi: 10.1016/1044-0305(94)80016-2.
  • 29.Sadygov RG, Eng J, Durr E, Saraf A, et al. Code developments to improve the efficiency of automated MS/MS spectra interpretation. J. Proteome Res. 2002;1:211–215. doi: 10.1021/pr015514r.
  • 30.Tabb DL, McDonald WH, Yates JR., III DTASelect and contrast: Tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002;1:21–26. doi: 10.1021/pr015504q.
  • 31.Breci L, Hattrup E, Keeler M, Letarte J, et al. Comprehensive proteomics in yeast using chromatographic fractionation, gas phase fractionation, protein gel electrophoresis, and isoelectric focusing. Proteomics. 2005;5:2018–2028. doi: 10.1002/pmic.200401103.
  • 32.Elias JE, Haas W, Faherty BK, Gygi SP. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods. 2005;2:667–675. doi: 10.1038/nmeth785.
  • 33.Florens L, Washburn MP, Raine JD, Antony RM, et al. A proteomic view of the Plasmodium falciparum life cycle. Nature. 2002;419:520–526. doi: 10.1038/nature01107.
  • 34.Lasonder E, Ishihama Y, Andersen JS, Vermunt AM, et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature. 2002;419:537–542. doi: 10.1038/nature01111.
  • 35.Tomley FM, Soldati DS. Mix and match modules: Structure and function of microneme proteins in apicomplexan parasites. Trends Parasitol. 2001;17:81–88. doi: 10.1016/s1471-4922(00)01761-x.
  • 36.Zhou XW, Kafsack BF, Cole RN, Beckett P, et al. The opportunistic pathogen Toxoplasma gondii deploys a diverse legion of invasion and survival proteins. J. Biol. Chem. 2005;280:34233–34244. doi: 10.1074/jbc.M504160200.
  • 37.Deng M, Templeton TJ, London NR, Bauer C, et al. Cryptosporidium parvum genes containing thrombospondin type 1 domains. Infect. Immun. 2002;70:6987–6995. doi: 10.1128/IAI.70.12.6987-6995.2002.
  • 38.Tosini F, Agnoli A, Mele R, Gomez Morales MA, Pozio E. A new modular protein of Cryptosporidium parvum, with ricin B and LCCL domains, expressed in the sporozoite invasive stage. Mol. Biochem. Parasitol. 2004;134:137–147. doi: 10.1016/j.molbiopara.2003.11.014.
  • 39.Barnes DA, Bonnin A, Huang JX, Gousset L, et al. A novel multi-domain mucin-like glycoprotein of Cryptosporidium parvum mediates invasion. Mol. Biochem. Parasitol. 1998;96:93–110. doi: 10.1016/s0166-6851(98)00119-4.
  • 40.Templeton TJ, Iyer LM, Anantharaman V, Enomoto S, et al. Comparative analysis of apicomplexa and genomic diversity in eukaryotes. Genome Res. 2004;14:1686–1695. doi: 10.1101/gr.2615304.
  • 41.Trasarti E, Pizzi E, Pozio E, Tosini F. The immunological selection of recombinant peptides from Cryptosporidium parvum reveals 14 proteins expressed at the sporozoite stage, 7 of which are conserved in other apicomplexa. Mol. Biochem. Parasitol. 2007;152:159–169. doi: 10.1016/j.molbiopara.2006.12.010.
  • 42.Heiges M, Wang H, Robinson E, Aurrecoechea C, et al. CryptoDB: A Cryptosporidium bioinformatics resource update. Nucleic Acids Res. 2006;34:D419–D422. doi: 10.1093/nar/gkj078.
  • 43.Coulson RM, Hall N, Ouzounis CA. Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome Res. 2004;14:1548–1554. doi: 10.1101/gr.2218604.
  • 44.Le Roch KG, Johnson JR, Florens L, Zhou Y, et al. Global analysis of transcript and protein levels across the Plasmodium falciparum life cycle. Genome Res. 2004;14:2308–2318. doi: 10.1101/gr.2523904.
  • 45.Hall N, Karras M, Raine JD, Carlton JM, et al. A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 2005;307:82–86. doi: 10.1126/science.1103717.
  • 46.Keithly JS, Langreth SG, Buttle KF, Mannella CA. Electron tomographic and ultrastructural analysis of the Cryptosporidium parvum relict mitochondrion, its associated membranes, and organelles. J. Eukaryot. Microbiol. 2005;52:132–140. doi: 10.1111/j.1550-7408.2005.04-3317.x.
  • 47.Gabaldon T, Huynen MA. Shaping the mitochondrial proteome. Biochim. Biophys. Acta. 2004;1659:212–220. doi: 10.1016/j.bbabio.2004.07.011.
  • 48.Tetley L, Brown SM, McDonald V, Coombs GH. Ultrastructural analysis of the sporozoite of Cryptosporidium parvum. Microbiology. 1998;144:3249–3255. doi: 10.1099/00221287-144-12-3249.
  • 49.Bromley E, Leeds N, Clark J, McGregor E, et al. Defining the protein repertoire of microneme secretory organelles in the apicomplexan parasite Eimeria tenella. Proteomics. 2003;3:1553–1561. doi: 10.1002/pmic.200300479.
  • 50.Kats LM, Black CG, Proellocks NI, Coppel RL. Plasmodium rhoptries: How things went pear-shaped. Trends Parasitol. 2006;22:269–276. doi: 10.1016/j.pt.2006.04.001.
  • 51.Meissner M, Reiss M, Viebig N, Carruthers VB, et al. A family of transmembrane microneme proteins of Toxoplasma gondii contain EGF-like domains and function as escorters. J. Cell. Sci. 2002;115:563–574. doi: 10.1242/jcs.115.3.563.
  • 52.Mercier C, Adjogble KD, Daubener W, Delauw MF. Dense granules: Are they key organelles to help understand the parasitophorous vacuole of all apicomplexa parasites? Int. J. Parasitol. 2005;35:829–849. doi: 10.1016/j.ijpara.2005.03.011.
  • 53.Sam-Yellowe TY, Florens L, Wang T, Raine JD, et al. Proteome analysis of rhoptry-enriched fractions isolated from Plasmodium merozoites. J. Proteome Res. 2004;3:995–1001. doi: 10.1021/pr049926m.
  • 54.Boulter-Bitzer JI, Lee H, Trevors JT. Molecular targets for detection and immunotherapy in Cryptosporidium parvum. Biotechnol. Adv. 2007;25:13–44. doi: 10.1016/j.biotechadv.2006.08.003.
  • 55.O'Connor RM, Kim K, Khan F, Ward HD. Expression of Cpgp40/15 in Toxoplasma gondii: A surrogate system for the study of Cryptosporidium glycoprotein antigens. Infect. Immun. 2003;71:6027–6034. doi: 10.1128/IAI.71.10.6027-6034.2003.
