Abstract
Theoretical tryptic digests of all predicted proteins from the genomes of three organisms of varying complexity were evaluated for specificity and possible utility of combined peptide accurate mass and predicted LC normalized elution time (NET) information. The uniqueness of each peptide was evaluated using its combined mass (+/− 5 ppm and 1 ppm) and NET value (no constraint, +/− 0.05 and 0.01 on a 0–1 NET scale). The set of peptides both underestimates actual biological complexity due to the lack of specific modifications, and overestimates the expected complexity since many proteins will not be present in the sample or observable on the mass spectrometer because of dynamic range limitations. Once a peptide is identified from an LC-MS/MS experiment, its mass and elution time together serve as a fingerprint for that peptide. The uniqueness of that fingerprint relative to those of the other peptides present indicates how confidently the peptide can be identified from accurate mass and NET measurements. These measurements can be made using HPLC coupled with high resolution MS in a high-throughput manner. Results show that for organisms with comparatively small proteomes, such as Deinococcus radiodurans, modest mass and elution time accuracies are generally adequate for peptide identifications. For more complex proteomes, increasingly accurate measurements are required. However, the majority of proteins should be uniquely identifiable by using LC-MS with mass accuracies within +/− 1 ppm and elution time measurements within +/− 0.01 NET.
Keywords: peptide identification, mass tag, retention times, AMT, NET
INTRODUCTION
The current multiplicity of completed genomes has simultaneously expanded the number of possible protein sequences and increased the ambiguity of peptide/protein identifications. Genomes for more than 100 organisms have been sequenced, providing potential protein coding sequences that number in the millions. This large pool of sequences has allowed for the identification and characterization of proteins involved in a range of biological pathways, and generation of proteomic information is rapidly proliferating, especially with the use of methods such as shotgun proteomics, an approach in which John Yates III and colleagues have made critical contributions [1,2]. As exciting as these discoveries have been, this surge in information is presenting statistical challenges related to identification of the correct protein sequence from an increasingly large number of choices. As a result of the ambiguity among data, researchers are being asked to provide more information about the way in which protein identifications are made and the processes involved so that the level of potentially inherent false identifications becomes known [3,4].
Earlier studies have shown [5] that utilizing accurate mass spectrometric measurements for MS based identification of peptides within 0.1 ppm uncertainty (tolerance) can allow significant levels of confidence in protein identifications, even from mixtures with the complexity of some eukaryotic systems. However, as the genomic, and thus proteomic, complexity of an organism increases, identifying proteins (or peptides) on the basis of mass measurement alone becomes increasingly difficult, especially if a technically challenging mass tolerance of +/− 0.1 ppm cannot be maintained. Experimental approaches for dealing with this complexity include focusing on only those peptides with a specific physical characteristic, e.g., isolation of cysteinyl peptides by chemical labeling or solid phase extraction techniques [6,7]; fractionation techniques to add a second dimension of separation, e.g., MudPIT [2,8–13]; and the use of peptide fragmentation patterns from collision induced dissociation (CID). While useful, these methods may hinder sample throughput or result in poor protein coverage. When high performance liquid chromatography (HPLC) separations (e.g., using a micro-capillary C18 column) are combined with high resolution mass spectrometric measurements, reproducible peptide elution times can be acquired in addition to mass measurements. Although the elution time of a particular peptide will vary from run to run due to column drift associated with temperature changes and flow rate, among other factors, this drift can be normalized over the length of the separation by using an appropriate algorithm to align multiple analyses [14].
In this theoretical study, peptide mass and predicted elution times were used as specific measures of peptide uniqueness to evaluate the effect of proteome complexity on protein identifications. This work demonstrates an extension of previous work [5] that proposed that a peptide identification could be distinguished from a mixture of closely related peptide masses, without performing tandem mass spectrometry (i.e., MS/MS), by utilizing high mass accuracy (as can be obtained using Fourier transform ion cyclotron resonance; FTICR) and normalized LC elution time information. For this study, elution time information predicted from theoretical tryptic peptide sequences [14] and calculated masses at several mass accuracy constraints were used to determine the likelihood of correctly identifying a peptide by comparing its mass and elution time with a similar mass and elution time for a previously identified peptide. The applicability of this method to large theoretical datasets was addressed by comparing data for four systems of varying complexity.
METHODS
Databases
Protein lists for three organisms – Deinococcus radiodurans, Saccharomyces cerevisiae, and Homo sapiens – were obtained from the following protein sequence repositories: Deinococcus radiodurans (TIGR, March 21, 2000), Saccharomyces cerevisiae (http://www.yeastgenome.org/ provided through Stanford University, January 6, 2003), and Homo sapiens (IPI, April 1, 2004). In addition, a fourth system composed of the combined protein list from the results of 436 SEQUEST analyses of LC-MS/MS datasets (described in an appendix at the end of this paper) for Human Mammary Epithelial Cells (HMEC) was used to represent an observed subset of human proteins. These HMEC datasets represent a set of potential mass and time tags for HMEC that have been used in other studies utilizing the accurate mass and time (AMT) tag approach [15,16].
Simulated processing and analysis
An in silico digestion was performed on the proteins present in each database using Protein Digestion Simulator, a program written in-house using VB.NET (available online [17]). This program reads a list of protein names and sequences from an input file and performs a virtual tryptic digest on each protein sequence, then uses an improved version of the normalized elution time (NET) prediction program by Petritis, et al., also written in-house, to compute the predicted NET values for each sequence [14,18]. The in silico tryptic digestion cleaves each sequence after either lysine or arginine (K or R) sites, but not if the residue is followed by proline. The resultant peptides were permitted to have up to one "missed cleavage" (internal K or R), and were filtered to only include those with a mass between 600 and 3000 Da. The NET prediction program is a VB.NET DLL that takes as an input a peptide sequence, its length, and its calculated hydrophobic moment, and computes the NET for the sequence. The predicted NET for a given peptide, which generally ranges from 0 to 1 (approximately translated to mean 0% to 100% of a normalized run), is determined by employing a neural network-based model, developed using training data from 20 species and over 200,000 unique filtered peptide identifications from LC-MS/MS analyses. Cysteine-only databases were created for each of the four systems by selecting the subset of cysteine-containing peptides from the original databases.
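The digestion rules described above can be illustrated with a minimal Python sketch. This is not the Protein Digestion Simulator code (which was written in VB.NET); the function names `tryptic_digest` and `monoisotopic_mass` are hypothetical, masses are computed for unmodified residues (an assumption, since the treatment of cysteine alkylation in the mass calculation is not stated here), and NET prediction is omitted entirely.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def monoisotopic_mass(peptide):
    """Monoisotopic mass of an unmodified peptide (assumption: no modifications)."""
    return sum(MONO[aa] for aa in peptide) + WATER


def tryptic_digest(protein, max_missed=1, min_mass=600.0, max_mass=3000.0):
    """Cleave after K or R unless followed by P; allow up to `max_missed`
    internal K/R sites; keep peptides whose mass lies in [min_mass, max_mass]."""
    # Zero-width split: position preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            pep = "".join(fragments[i:j + 1])
            # Skip peptides containing non-standard residue codes (e.g. X).
            if any(aa not in MONO for aa in pep):
                continue
            if min_mass <= monoisotopic_mass(pep) <= max_mass:
                peptides.append(pep)
    return peptides
```

For the toy sequence `AAKRPGGRCCK`, `tryptic_digest` returns `AAKRPGGR` and `RPGGRCCK`: the K-R bond is cleaved, the R-P bond is not, and the fully cleaved fragments fall below the 600 Da cutoff.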
All figures were prepared by using all peptides within their respective databases without filtering, and an algorithm developed by Anderson et al. [19] to quantitatively gauge peptide uniqueness. This algorithm determines a quality of match score, termed the spatially localized confidence (SLiC) score, by utilizing the standardized squared distance between a given peptide's mass and NET and each comparison peptide's mass and NET to estimate the closeness of each match to the given peptide. The distances are then normalized to compute a conditional probability between 0 and 1, gauging how uniquely each comparison peptide matches the given peptide. A peptide was designated as unique provided the only peptide within tolerance was itself, with a match score ≥ 0.7. For the uniqueness plots (Figure 4), created following in silico digestion, each peptide was examined to compare its unique mass and elution time class with that for all other peptides within a particular database to determine whether or not it was uniquely identifiable within the given mass and NET tolerances.
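The general idea of a SLiC-style score can be sketched as follows. The exact formulation of Anderson et al. [19] is not reproduced in this paper, so the Gaussian weighting, the default standard deviations, and the names `slic_scores` and `is_unique` are assumptions made purely for illustration; only the use of a standardized squared distance, the normalization to a value between 0 and 1, and the 0.7 threshold come from the description above.

```python
import math


def slic_scores(obs_mass, obs_net, candidates, sigma_ppm=1.0, sigma_net=0.01):
    """Return {name: score} for candidates given as (name, mass, net) tuples.

    Each candidate gets a weight exp(-d2 / 2), where d2 is the standardized
    squared distance in (ppm, NET) space; weights are normalized to sum to 1.
    The Gaussian form and the sigma defaults are illustrative assumptions.
    """
    weights = {}
    for name, mass, net in candidates:
        d_ppm = (mass - obs_mass) / obs_mass * 1e6
        d2 = (d_ppm / sigma_ppm) ** 2 + ((net - obs_net) / sigma_net) ** 2
        weights[name] = math.exp(-0.5 * d2)
    total = sum(weights.values())
    if total == 0.0:
        # All candidates are too distant to receive any weight.
        return {name: 0.0 for name in weights}
    return {name: w / total for name, w in weights.items()}


def is_unique(scores, name, threshold=0.7):
    """A peptide is called unique when its own score meets the threshold."""
    return scores.get(name, 0.0) >= threshold
```

Under this sketch, a peptide with no close neighbors scores 1.0, while two peptides that are nearly coincident in mass and NET split the probability between them and neither reaches the 0.7 threshold.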
Figure 4.

Percent of peptides that are unique vs. peptide monoisotopic mass for the four systems for no NET constraint, and for +/− 0.05 and 0.01 NET, as well as for different levels of mass accuracy. The peptides plotted have SLiC scores greater than or equal to 0.7. Cysteinyl-only peptides are shown for H. sapiens in addition to all peptides for H. sapiens.
RESULTS AND DISCUSSION
Information from the analysis of the tryptic peptide and tryptic cysteinyl peptide databases for each system used in this study is provided in Table 1 and Table 2, respectively. In general, the H. sapiens database contains roughly 15 times more proteins than the smallest database HMEC, and 13 and 6.5 times more proteins than the D. radiodurans and S. cerevisiae databases, respectively. The H. sapiens database also has approximately 21 and 6 times more peptides than the D. radiodurans and S. cerevisiae databases, respectively. Over 90% of both H. sapiens and HMEC proteins contain cysteine residues, whereas only 63% of D. radiodurans proteins contain cysteine residues. S. cerevisiae lies between these two extremes, with 87% cysteine-containing proteins. It should be noted that while the HMEC derived protein list is smaller than the D. radiodurans protein list, the conforming HMEC peptide list is more than twice that of D. radiodurans, most likely due to differences in average protein size.
Table 1.
Description of Databases of Tryptic Peptides, Mass range 600 to 3000 Da
| Database Name | Proteins | Residues (Millions) | Tryptic Peptides with 0 or 1 missed Cleavages |
|---|---|---|---|
| D. radiodurans | 3,117 | 0.964 | 113,330 |
| S. cerevisiae | 6,360 | 2.99 | 402,861 |
| H. sapiens | 41,216 | 19.3 | 1,503,895 |
| HMEC subset | 2,759 | 1.63 | 228,563 |
Table 2.
Description of Databases Containing Cysteine, Mass range 600-3000 Da
| Database Name | Cysteinyl Peptides | Proteins containing Cysteine |
|---|---|---|
| D. radiodurans | 8% | 63% |
| S. cerevisiae | 14% | 87% |
| H. sapiens | 24% | 93% |
| HMEC subset | 18% | 94% |
Differences in system complexity are readily observed in Figure 1a, where predicted NET values are plotted against the monoisotopic mass for tryptic peptides in each of the four systems. As expected, as the systems increase in complexity, the number of data points increases dramatically. Note that some areas of sparse data points are not consistent across all systems. For instance, the sparse region around 0.4 NET and 2500 Da for D. radiodurans contrasts with the same region for S. cerevisiae, which appears dense, and even more sharply with the same region for human (i.e., H. sapiens and HMEC), likely reflecting the difference in complexity and amino acid distribution of a prokaryotic system compared to eukaryotic systems. The regions above 0.4 NET display an increased relative peptide density for S. cerevisiae, and to an even greater extent for H. sapiens. This observation indicates that eukaryotic systems not only have more peptides in the same regions as prokaryotic systems, but also have peptides in regions where prokaryotic peptides are much less frequent, reflecting biases in the amino acid composition. Unsurprisingly, the data for HMEC appear similar to those for S. cerevisiae; however, S. cerevisiae peptides are denser than HMEC peptides in the NET range >0.6.
Figure 1.

a) Global representation of tryptic digests for all four systems studied. b) Cysteine-containing peptides from tryptic digests for all four systems studied. Predicted Normalized Elution Time (NET) is plotted along the x-axis, and monoisotopic mass in Daltons is plotted along the y-axis. Inset views are representative of the region contained within 1950–2000 Da and 0.3–0.4 NET.
When only cysteinyl peptides are considered (Figure 1b), the complexity of all four systems is substantially reduced compared to the whole proteome (Figure 1a), a significant attraction of approaches that isolate this subset of peptides. Cysteinyl tryptic peptides in Figure 1b are also well distributed across mass and NET values, indicating the viability of this method for obtaining a subset of peptides without biasing the sampling to any particular mass/NET region.
The effect of using isolated cysteinyl peptides for peptide identifications is illustrated in the scatter plots for each system in Figure 2. Here, a common dense region obtained from the plots in Figure 1 for each of the four systems is used to represent the distribution of peptides by their mass and NET values. The circles show the specificity that would be obtained using two different mass and NET constraints. For illustration, the peptide (i.e., mass and elution time feature) represented in the center of the target is the point assumed to be from experimental data. The lower precision constraints of +/− 5 ppm and 0.05 NET, represented by the outer region, include more peptide choices than the tighter constraints of +/− 1 ppm and 0.01 NET represented by the inner region. Visual inspection of all four systems in Figure 2 shows that if only cysteine-containing peptides are used for peptide identification, the number of unique peptide choices is reduced by more than half, illustrated by the disparity of triangles (cysteinyl peptides) compared to dots (non-cysteine containing peptides). For the D. radiodurans system, the number of peptides that fall within the outer boundary is 3, which presents some ambiguity for identification. However, by using the higher mass and NET accuracy (inner circle), the correct peptide can be chosen. H. sapiens and HMEC systems also have multiple peptides within a +/− 5 ppm/0.05 NET tolerance. Because a cysteinyl peptide was chosen as the experimental observation, cysteinyl isolation alone is not adequate for uniquely identifying the peptide using the lower precision constraints, as indicated by the other 3 cysteinyl peptides contained within the outer region for both systems. However, unique peptide identifications may be readily determined by using the more stringent mass and NET tolerances. Figure 2 also illustrates the benefit of using NET and mass rather than mass alone to determine a unique peptide hit. 
If only mass is used as the criterion for identification, one encounters at least 16 peptide choices in the 2160.08 Da region of the H. sapiens plot. The same holds for the other systems; because of their smaller proteomes, the number of choices within a given mass range at +/− 5 ppm is smaller, yet still significant. Similar inspections can be performed using sparse peptide regions (data not shown).
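The effect of adding a NET constraint to a mass constraint can be sketched with a simple window filter. The function name `within_tolerance` and the peptide values in the usage note are hypothetical and are not taken from the databases used in this study. Note that a relative tolerance translates to an absolute window that scales with mass: at 2160.08 Da, +/− 5 ppm corresponds to roughly +/− 0.011 Da and +/− 1 ppm to roughly +/− 0.002 Da.

```python
def within_tolerance(obs_mass, obs_net, peptides, ppm, net_tol=None):
    """Return the (mass, net) entries that match an observed feature.

    `ppm` is the relative mass tolerance; `net_tol` is the absolute NET
    tolerance, or None to apply no elution time constraint.
    """
    mass_window = obs_mass * ppm * 1e-6  # ppm converted to Da at this mass
    hits = []
    for mass, net in peptides:
        if abs(mass - obs_mass) > mass_window:
            continue
        if net_tol is not None and abs(net - obs_net) > net_tol:
            continue
        hits.append((mass, net))
    return hits
```

With five hypothetical peptides at (2160.080, 0.40), (2160.085, 0.43), (2160.083, 0.405), (2160.089, 0.70), and (2160.200, 0.40), an observation at (2160.080, 0.40) has four candidates at 5 ppm with no NET constraint, three at 5 ppm/0.05 NET, and one at 1 ppm/0.01 NET.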
Figure 2.

Focus on dense region of interest from the plots in Figure 1. For clarity, Figures 1a and 1b are combined in the dense plots by representing cysteinyl peptides as triangles and non-cysteine containing tryptic peptides as dots in the same plots. The peptide point chosen for comparison in both the H. sapiens and HMEC systems corresponds to the same cysteinyl tryptic peptide in both systems.
Figure 3 shows a granular view of the cysteinyl peptides from H. sapiens (from a dense region of the plot in Figure 2). Note the two peptides located very close to each other, with the target centered on one point and another point falling on the edge of the smaller circle. This observation highlights that even with the most stringent tolerances in mass and NET of +/− 1 ppm and 0.01, there can still be cases of ambiguity for peptide identifications. An approach for isolating the best match is to calculate a SLiC score based on the distance of the measured NET and mass for a high mass accuracy measurement (e.g., using FTICR) to the nearest neighboring peptide and the number of neighbors in the area for each peptide. The effect of distance and the number of neighboring peptides on these scores is illustrated in Figure 3 based upon +/− 5 ppm/0.05 NET (outer circle) and SLiC values for points A-C with +/− 1 ppm/0.01 NET constraints (values shown as underlined). The SLiC values for all other points at +/− 1 ppm/0.01 NET tolerances have a score of zero because they are too far from the center peptide (and therefore, are not shown). If the mass spectral analysis showed a peptide at mass 2160.065 and a NET value of 0.4044 obtained using the more stringent mass and NET constraints, then the SLiC score for peptide A would be 0.96, and 0.04 for peptide B. A threshold for SLiC values can be applied during data analysis, and minimum acceptable scores selected based upon the aims of a study and the manner of use. Thus, utilization of the SLiC score can be useful for the assessment of peptide identifications.
Figure 3.

Granular view of dense region from H. sapiens. The SLiC scores for +/− 1 ppm/0.01 NET are underlined, while those for +/− 5 ppm/0.05 NET are present for each data point. The peptide sequences for points A-C are KWLYVQETCPLCHYHLK, DPEVCLDLRPGTNYNVSLR, LCVRVVGCEGSSKPFFYNR, respectively.
Figure 4 shows the likely uniqueness of tryptic peptides for each system (assuming up to one missed cleavage) based on their computed SLiC scores. The SLiC score reflects the relative position and number of neighboring peptides for each peptide. If a peptide had a SLiC score ≥0.7, then it was designated as unique, and the number of unique peptides out of the total peptides for each mass range was calculated. The total mass range of 500–4000 Da was divided evenly into 140 binned regions of 25 Da each, and these bins were used for the percent calculations. Figure 4 shows that uniqueness is generally very low at masses <1000 Da, but increases as the peptide size increases. Lower accuracy measurements (5 and 10 ppm) are generally not sufficient to maintain greater than 50% uniqueness when either no or +/− 0.05 NET constraints are applied for any system except D. radiodurans. With a constraint of +/− 0.01 NET, unique identifications can be obtained from essentially all peptides >1000 Da. Larger peptides will predictably have a higher uniqueness than smaller peptides, and above 2000 Da at +/− 1 ppm/0.01 NET, >75% of peptides from all systems are unique. From the plot of D. radiodurans, at a mass of 1300 Da and an accuracy of +/− 5 ppm and +/− 0.05 NET, ~50% of peptides are unique. At +/− 1 ppm and 0.01 NET accuracies, the percentage of unique peptides increases to ~75%.
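The binning used for the percent-unique calculation can be sketched as below, assuming each peptide has already been flagged as unique or not (for example, by a SLiC threshold of 0.7). The function name `percent_unique_by_bin` and the input layout are hypothetical.

```python
def percent_unique_by_bin(peptides, low=500.0, high=4000.0, width=25.0):
    """Percent of unique peptides per mass bin.

    `peptides` is an iterable of (mass, is_unique) pairs. Returns a list of
    (bin_start, percent) tuples; percent is None for bins with no peptides.
    """
    n_bins = int(round((high - low) / width))  # 140 bins for the defaults
    totals = [0] * n_bins
    uniques = [0] * n_bins
    for mass, unique in peptides:
        if not (low <= mass < high):
            continue  # outside the plotted mass range
        idx = int((mass - low) // width)
        totals[idx] += 1
        uniques[idx] += 1 if unique else 0
    return [
        (low + i * width, 100.0 * uniques[i] / totals[i] if totals[i] else None)
        for i in range(n_bins)
    ]
```

Empty bins are reported as `None` rather than 0% so that regions without peptides are not mistaken for regions with no unique peptides.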
When no NET constraints are employed, the number of unique tryptic peptides possible for HMEC is increased by up to 4-fold by using 0.1 ppm mass tolerances compared to 10 ppm. The complexity of some systems, e.g., the cysteine-isolated peptides from H. sapiens, is such that when no NET constraints are considered, even a 0.1 ppm tolerance can still leave significant ambiguity for peptide identification. However, when NET constraints are used in conjunction with accurate mass measurements, the number of available candidates decreases sharply and the percentage of unique peptides increases. Furthermore, as tolerances are tightened from 0.1 to 0.01 ppm and 0.05 to 0.01 NET, the ambiguity of identification is practically eliminated. Thus, the NET restriction compensates for the loss in mass accuracy, a correlation seen across all systems. In the H. sapiens cysteinyl peptides panel with no NET constraint, less than 10% of the unique peptides can be identified using either 5 ppm or 10 ppm mass tolerances. Only the 0.1 ppm mass constraint will yield 75% uniqueness of the available peptides. A minimum of 1 ppm is required to differentiate 75% of the unique peptides, and at 5 ppm, the best possible rate is 50% uniqueness for a system with this complexity. While all the given mass tolerances allow for at least 50% uniqueness of the available peptides on the basis of mass alone, 1 ppm or better is required to obtain a level of more than 75%.
The cysteinyl peptides from H. sapiens show uniqueness trends that are similar to those of the other systems. Not surprisingly, tryptic peptides from H. sapiens (Figure 4, bottom row) show a lower average uniqueness in comparison to the other systems. This is due to the complexity of the H. sapiens system, and it illustrates the advantage of cysteine isolation techniques. With no NET constraints, ~80% of H. sapiens peptides are unique, even with a mass tolerance of +/− 0.1 ppm for peptides with a mass over 3500 Da. With 1 ppm, the uniqueness drops to ~25%, and almost no differentiation can be made with only +/− 5 and 10 ppm tolerances. Upon application of +/− 0.05 LC NET constraints, >50% uniqueness can be obtained for masses over 2250 Da with mass tolerances of 1 ppm and 0.1 ppm. With the strictest NET constraint of +/− 0.01, greater than 50% uniqueness can be achieved using 1 ppm or 0.1 ppm, for peptides 1200 Da or larger.
CONCLUSIONS
The accurate mass and LC NET of a peptide provide information that can support peptide identification in the context of high throughput measurements, and the effective use of both the mass and NET dimensions is important for addressing proteomes of high complexity. As the accuracy of these measurements improves, the extent of protein coverage by confidently identified peptides also increases. Statistically speaking, peptides have the highest possibility of being uniquely identified if the separation device and mass spectrometric detection are as precise as possible. At present, high accuracy mass spectrometers are able to measure masses within a tolerance of 1 ppm or less, and liquid chromatographic separations provide run-to-run performance that supports the use of alignment algorithms to give corrected (i.e., normalized) elution times within 1%, providing the basis for approaches combining accurate mass and NET.
These approaches can be further augmented using a sample preparation technique that isolates cysteinyl peptides, an extension attractive for studies of the most complex proteomes. While there are inherent uncertainties that apply to accurate mass and NET measurements, the high specificity provides a basis for confident identifications that can be made in a high throughput fashion (e.g. without the need to perform MS/MS on every peptide).
Furthermore, unlike the peptide sets used in this study, which included every possible tryptic peptide and every case of a single missed trypsin cleavage site, actual experimental data for a specific tissue or cell culture condition will not include every possible protein or every possible tryptic peptide. Thus, while the present approach certainly underestimates the added complexity introduced by protein modifications and “partial” tryptic peptides, it also incorporates a higher level of complexity contributed by many peptides that will not be present (since many proteins will not be expressed at detectable levels) or will not be detected (e.g., rarely detected highly hydrophobic peptides that are underrepresented in proteome analyses). Consequently, in many cases, the present calculations indicate that modest mass accuracies of +/− 5 ppm and LC NET tolerances of +/− 0.05 may be adequate for identifying the majority of identifiable peptides.
High throughput analyses are required in order to meet the demands of the field as it pertains to proteomic exploration. By simplifying the LC-MS analysis to measurements that can be acquired quickly and accurately, limited chromatographic time is preserved, and the number of possible identifications increased. Yet still there remains the challenge of extending that identification confidently to the protein level. Novel methods of validating protein identifications from the peptide identifications are needed, particularly during the current state of degenerate protein annotations from genomic sequences.
Acknowledgments
Portions of this research were supported by the NIH National Center for Research Resources (RR18522) at Pacific Northwest National Laboratory and National Institute for Allergy and Infectious Diseases contract (N01-AI-40053). The authors would also like to thank Jon Jacobs, Kostas Petritis, Penny Colton and Samuel Purvine for helpful discussions and critical reading of the manuscript. This research was performed in the Environmental Molecular Sciences Laboratory (a national scientific user facility sponsored by the U.S. DOE Office of Biological and Environmental Research) located at Pacific Northwest National Laboratory, operated by Battelle for the U.S. DOE.
APPENDIX: GENERATION OF THE HMEC DATABASE
HMEC sample preparation
The whole cell lysates were split into four groups, each representing a different focus for identification. Preparation conditions for each group are summarized below:
Group 1: Global 3D analysis of HMEC
A protein size exclusion separation was performed, with subsequent trypsin digestion (Promega), half using alkylation by iodoacetamide and half without, followed by strong cation exchange (SCX) fractionation of each of the size exclusion fractions and LC-MS/MS analysis of each SCX fraction. This group had 149 samples.
Group 2: Second global analysis
No protein separation was performed prior to digestion with trypsin. SCX fractionation (plus alkylation) was performed, followed by LC-MS/MS analysis. This group had 67 fractions.
Group 3: Cysteine enrichment global dataset
Same as for Group 2, except that peptides were first treated using Quantitative Cysteinyl-peptide Enrichment Technique (QCET) [7] for cysteine enrichment prior to SCX fractionation. A total of 60 fractions were collected.
Group 4: Secreted protein sample
The media from four different growth treatments of HMEC cell samples were analyzed to target secreted proteins. Each sample was cleaned, protein isolated, and digested (plus alkylation), then SCX fractionated. There were 40 fractions for each of the 4 samples, for a total of 160 fractions.
LC-MS analysis of HMEC samples
The high pressure LC (HPLC) system consisted of a pair of Model 100DM 100-mL syringe pumps and Series D controller (Isco, Inc., Lincoln, NE), an in-house manufactured stir-bar style mobile phase mixer (2.5-mL volume), two 4-port, 2-position valves (Valco Instruments Co., Houston, TX) for mobile phase and capillary column selection, and a 6-port, 2-position Valco valve equipped with a 10-μL sample loop for automated injections. The mixer and valves were mounted on an in-house manufactured rack assembly that was custom fit to a PAL autosampler (Leap Technologies, Carrboro, NC) for unattended routine analysis. Reversed-phase capillary HPLC columns were manufactured in-house by slurry packing 5-μm Jupiter C18 stationary phase (Phenomenex, Torrance, CA) into a 60-cm length of 360 μm o.d. x 150 μm i.d. fused silica capillary tubing (Polymicro Technologies Inc., Phoenix, AZ) incorporating a 2-μm retaining screen in a 1/16” capillary-bore union (Valco Instruments Co., Houston, TX).
The mobile phase consisted of 0.2% acetic acid and 0.05% TFA in water (A) and 0.1% TFA in 90% acetonitrile/10% water (B). Mobile phase was degassed with an in-line Alltech vacuum degasser (Alltech Associates, Inc., Deerfield, IL). The HPLC system was equilibrated at 5000 psi with 100% mobile phase A for initial starting conditions. The mobile phase selection valve was switched from position A to B 20 minutes after injection, creating an exponential gradient as mobile phase B displaced A in the mixer. An ~5-cm length of 360 i.d. fused silica tubing packed with 5 μm C18 was used to split ~25 μL/min of flow before the injection valve. The split flow controls gradient speed under conditions of constant pressure operation. Flow through the capillary HPLC column was ~1.8 μL/min when equilibrated to 100% mobile phase A.
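The shape of such an exponential gradient can be approximated with an idealized model. The sketch below assumes a perfectly stirred mixer and a constant total flow through it equal to the sum of the split and column flows (~26.8 μL/min); both are simplifying assumptions, since under constant pressure operation the flow changes as solvent viscosity changes. The function name `fraction_b` is hypothetical.

```python
import math


def fraction_b(t_min, switch_min=20.0, mixer_ul=2500.0, flow_ul_min=26.8):
    """Fraction of mobile phase B leaving an ideal stirred mixer at time t_min.

    Before the valve switches, the mixer delivers 100% A. Afterwards, B
    displaces A with time constant mixer_ul / flow_ul_min (about 93 min here).
    Assumes ideal mixing and constant flow, which is only an approximation.
    """
    if t_min <= switch_min:
        return 0.0
    return 1.0 - math.exp(-flow_ul_min * (t_min - switch_min) / mixer_ul)
```

Under these assumptions the gradient is steepest immediately after the valve switch and flattens as the mixer contents approach 100% B, which is why increasing the split flow shortens the gradient.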
MS analysis was performed using a Finnigan model LCQ Duo or XP ion trap mass spectrometer (ThermoQuest Corp., San Jose, CA) with electrospray ionization (ESI). The HPLC column was coupled to the mass spectrometer using an in-house manufactured interface. No sheath gas or make-up liquid was used. The heated capillary temperature and spray voltage were 200 °C and 2.2 kV, respectively. Samples were analyzed over a mass (m/z) range of 400–2000. For each cycle, the three most abundant ions from MS analysis were selected for MS/MS analysis using a collision energy setting of 45%. Dynamic exclusion was used to discriminate against previously analyzed ions.
HMEC protein identifications
A total of 436 LC-MS/MS analyses were performed with the digested lysate samples. The datasets were searched against the H. sapiens database using SEQUEST [20]. Proteins with 2 or more peptides of high confidence (XCorr ≥1.9, 2.2, and 3.75 for 1+, 2+, and 3+ charge states, respectively), a mass between 500 and 4000 Da, and no more than one missed tryptic cleavage were compiled, resulting in a list of 2759 unique proteins. The peptide confidence criteria used here are similar to those used by Yates and coworkers [2,20]. This protein list was then treated in the same manner as the protein lists that were downloaded for the other three systems.
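The charge-dependent XCorr filter and the two-peptide requirement can be sketched as follows. The record layout and the name `confident_proteins` are hypothetical and do not correspond to SEQUEST output files; counting distinct peptide sequences toward the two-peptide minimum is an assumption; and the mass and missed-cleavage criteria are omitted for brevity.

```python
from collections import defaultdict

# Minimum XCorr per charge state, as given in the text.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}


def confident_proteins(hits, min_peptides=2):
    """Return sorted protein names supported by enough confident peptides.

    `hits` is an iterable of (protein, peptide, charge, xcorr) tuples.
    Distinct peptide sequences are counted (an assumption); charge states
    without a listed threshold are ignored.
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide, charge, xcorr in hits:
        threshold = XCORR_MIN.get(charge)
        if threshold is not None and xcorr >= threshold:
            peptides_by_protein[protein].add(peptide)
    return sorted(
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    )
```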
References
1. Eng JK, McCormack AL, Yates JR. An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr. 1994;5(11):976–989. doi: 10.1016/1044-0305(94)80016-2.
2. Washburn MP, Wolters D, Yates JR 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol. 2001;19(3):242–247. doi: 10.1038/85686.
3. Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J, Riba-Garcia I, Mohammed S, Deery MJ, Howard JA, Dunkley T, Aebersold R, Kell DB, Lilley KS, Roepstorff P, Yates JR 3rd, Brass A, Brown AJ, Cash P, Gaskell SJ, Hubbard SJ, Oliver SG. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat Biotechnol. 2003;21(3):247–254. doi: 10.1038/nbt0303-247.
4. Carr S, Aebersold R, Baldwin M, Burlingame A, Clauser K, Nesvizhskii A. The Need for Guidelines in Publications of Peptide and Protein Identification Data. Mol Cell Proteomics. 2004;3:531–533. doi: 10.1074/mcp.T400006-MCP200.
5. Conrads TP, Anderson GA, Veenstra TD, Pasa-Tolic L, Smith RD. Utility of accurate mass tags for proteome-wide protein identification. Anal Chem. 2000;72(14):3349–3354. doi: 10.1021/ac0002386.
6. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol. 1999;17(10):994–999. doi: 10.1038/13690.
7. Liu T, Qian WJ, Strittmatter EF, Camp DG 2nd, Anderson GA, Thrall BD, Smith RD. High-throughput comparative proteome analysis using a quantitative cysteinyl-peptide enrichment technology. Anal Chem. 2004;76(18):5345–5353. doi: 10.1021/ac049485q.
8. Adkins JN, Varnum SM, Auberry KJ, Moore RJ, Angell NH, Smith RD, Springer DL, Pounds JG. Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol Cell Proteomics. 2002;1(12):947–955. doi: 10.1074/mcp.m200066-mcp200.
9. Durr E, Yu J, Krasinska KM, Carver LA, Yates JR, Testa JE, Oh P, Schnitzer JE. Direct proteomic mapping of the lung microvascular endothelial cell surface in vivo and in cell culture. Nat Biotechnol. 2004;22(8):985–992. doi: 10.1038/nbt993.
10. Graumann J, Dunipace LA, Seol JH, McDonald WH, Yates JR 3rd, Wold BJ, Deshaies RJ. Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast. Mol Cell Proteomics. 2004;3(3):226–237. doi: 10.1074/mcp.M300099-MCP200.
11. Lipton MS, Pasa-Tolic L, Anderson GA, Anderson DJ, Auberry DL, Battista JR, Daly MJ, Fredrickson J, Hixson KK, Kostandarithes H, Masselon C, Markillie LM, Moore RJ, Romine MF, Shen Y, Stritmatter E, Tolic N, Udseth HR, Venkateswaran A, Wong KK, Zhao R, Smith RD. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc Natl Acad Sci U S A. 2002;99(17):11049–11054. doi: 10.1073/pnas.172170199.
12. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res. 2003;2(1):43–50. doi: 10.1021/pr025556v.
13. Shen Y, Jacobs JM, Camp DG 2nd, Fang R, Moore RJ, Smith RD, Xiao W, Davis RW, Tompkins RG. Ultra-high-efficiency strong cation exchange LC/RPLC/MS/MS for high dynamic range characterization of the human plasma proteome. Anal Chem. 2004;76(4):1134–1144. doi: 10.1021/ac034869m.
14. Petritis K, Kangas LJ, Ferguson PL, Anderson GA, Pasa-Tolic L, Lipton MS, Auberry KJ, Strittmatter EF, Shen Y, Zhao R, Smith RD. Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal Chem. 2003;75(5):1039–1048. doi: 10.1021/ac0205154.
15. Jacobs JM, Mottaz HM, Yu LR, Anderson DJ, Moore RJ, Chen WN, Auberry KJ, Strittmatter EF, Monroe ME, Thrall BD, Camp DG 2nd, Smith RD. Multidimensional proteome analysis of human mammary epithelial cells. J Proteome Res. 2004;3(1):68–75. doi: 10.1021/pr034062a.
16. Liu T, Qian WJ, Chen WJ, Jacobs JM, Moore RJ, Anderson DJ, Gritsenko MA, Monroe ME, Thrall BD, Camp DG 2nd, Smith RD. Improved Proteome Coverage by Using High Efficiency Cysteinyl-peptide Enrichment: The Human Mammary Epithelial Cell Proteome. Proteomics. doi: 10.1002/pmic.200401055. (in press)
17. Protein Digestion Simulator. 2004. In., (v2.0.1846.26324, 01/20/05) Ed.
18. Petritis KKL, Yorn B, Strittmatter EF, Camp DG 2nd, Lipton M, Xu Y, Smith RD. Improved Liquid Chromatography peptide Elution Time Prediction by Using Artificial Neural Networks for Improved Proteomics Analysis. 52nd ASMS Symposium; Nashville, TN: 2004.
19. Anderson KK, Monroe ME, Daly DS. Estimating Probabilities of Peptide Assignments to LC-FTICR-MS Observations. Proc of the Intern Conf METMBS. 2004:151–156.
20. Ducret A, Van Oostveen I, Eng JK, Yates JR 3rd, Aebersold R. High throughput protein characterization by automated reverse-phase chromatography/electrospray tandem mass spectrometry. Protein Sci. 1998;7(3):706–719. doi: 10.1002/pro.5560070320.
