Molecular & Cellular Proteomics : MCP. 2009 Jun 25;8(9):2051–2062. doi: 10.1074/mcp.M800512-MCP200

Unique Ion Signature Mass Spectrometry, a Deterministic Method to Assign Peptide Identity

Jamie Sherman, Matthew J McKay, Keith Ashman, Mark P Molloy
PMCID: PMC2742438  PMID: 19556279

Abstract

The growing use of selected reaction monitoring (SRM) mass spectrometry in proteomic analyses led us to investigate how to identify peptides by SRM using only a minimal number of fragment ions. By using a computational model of the SRM work flow we computed the potential interferences from other peptides in a given proteome. From these results, we selected the deterministic SRM addresses that contained sufficient information to confer peptide and protein identity that we termed unique ion signatures (UIS). We computationally showed that UIS comprised of only two transitions are diagnostic for >99% of Escherichia coli proteins and >96% of human proteins that possess a sequence-unique peptide. We demonstrated an example of experimental use of UIS using a modified SRM methodology to profile the E. coli tricarboxylic acid cycle from a single injection of cell lysate. In addition, we showed the potential of UIS to form the first functionally orthogonal approach to validate peptide assignments obtained from conventional analyses of MS/MS spectra. The UIS methodology is a novel deterministic peptide identification method for MS/MS spectra based on information content. These robust theoretical assays will have widespread use when integrated with previously collected MS/MS data and conventional proteomics technologies.


Shotgun proteomic analyses using multidimensional LC/MS/MS show great capacity for rapid protein analysis. This is arguably the most prevalent work flow for high throughput comparative proteomics, utilizing information-dependent acquisition (IDA) to acquire MS/MS triggered by the signals generated from incoming peptides (1–3). Despite the utility and widespread use of this approach, there remain inherent problems including a relatively high level of ambiguous and false peptide assignments (∼5%) as well as high numbers of unassigned mass spectra (4–6). The reason for this level of ambiguity stems in part from the non-deterministic nature of the identification algorithms. Without the use of reference standards, the only way to know with absolute certainty that a spectrum was generated by a given peptide is for the spectrum to contain a fragment pattern that conclusively demonstrates the presence of each amino acid. Unfortunately, this level of coverage is extremely rare in proteomics data.

More recently, selected reaction monitoring (SRM) or multiple reaction monitoring (MRM) mass spectrometry methods have been deployed for proteomic analyses (7–20). This has occurred as proteomics has matured from a discovery-oriented discipline into a more targeted and quantitative field. The method is conventionally conducted using triple quadrupole mass spectrometers where two rounds of mass selection provide excellent fidelity and sensitivity to monitor one or more predetermined target peptides generally in the context of a complex sample such as a cell lysate. Using this approach the mass spectrometer continually monitors the selected precursor ion m/z (Q1) and a subsequent product ion m/z (Q3) from the target analyte. SRM experiments can be used to conduct several rounds of these scans targeting different product ions in an attempt to bolster the confidence that the Q1 → Q3 transitions monitor the intended analyte with fidelity. A key point of contrast with IDA experiments is the need to preselect target analytes for monitoring. This can be achieved by harvesting data from previous discovery-based experiments or by in silico predictions such as MRM-initiated detection and sequencing (MIDAS) (10, 12). Regardless, the key underlying principle of SRM in proteomics applications is that the selected set of precursor and product ions contains sufficient information to proxy for the target peptide and thereby its protein of origin. Given that proteomics SRM experiments are conducted with a minimal set of transitions, one must accept that a degree of uncertainty resides in any such assay. To date, the magnitude of this uncertainty has not been studied. This remains a key point even with MS instruments capable of conducting subsequent full MS/MS scans triggered by SRM (e.g. QTrap) as these are lower sensitivity scans that may contain insufficient fragmentation data to conclusively confer peptide identity.

The problem of interference is also present in SRM experiments. To achieve acceptable sensitivity a large Q1 m/z window (±0.3–1.0 m/z) is needed. This in turn allows other peptides with similar Q1 m/z and elution properties to interfere with detection of the desired target. The frequency of these interferences would likely increase as the complexity of the sample increases creating a greater likelihood of false positives. Clearly this is not an unexpected result as conventional peptide identification strategies utilizing tandem MS result in some false assignments. Therefore, it would be unreasonable to expect that SRM assays that typically utilize fewer product ions than MS/MS experiments would not also encounter similar interference (21).

In this study we investigated the information content of SRM assays and in doing so exposed the potential redundancy. Computational simulations of the experiment enabled us to demonstrate that directed selection of SRM precursor and product ions can avoid the pitfalls of interference by selecting ion combinations that uniquely map to target peptides within the context of the simulation. We used these unique ion signatures (UIS) in a proof of concept study to direct SRM data acquisition for the exclusive detection of enzymes in the Escherichia coli tricarboxylic acid cycle. In addition, given that UIS have been calculated to uniquely define target peptides in the experimental context, we demonstrated the applicability of UIS as an orthogonal validation of peptide identity for traditional MS/MS experiments.

EXPERIMENTAL PROCEDURES

Calculation of Unique Ion Signatures

Protein sequences for the nominated proteomes were downloaded from Swiss-Prot release 56.1. We determined a set of variables used to calculate UIS including: the order of the UIS (the number of Q3 values, UIS|r), use of trypsin for proteolysis, the number of possible missed cleavage sites (one for the proteome wide calculation and two for the E. coli tricarboxylic acid cycle calculation), variable modifications of methionine, the number of allowed charge states (1+, 2+, and 3+), and the number of heavy isotopes to consider (+1, … +5 amu). Using this description all the possible peptides were generated, X was substituted for isoleucine and leucine, and the peptides were then mapped into a set. If the peptide being loaded was already present in the set it was marked as redundant and excluded as a candidate. From this set the peptides that contain no inappropriate cleavage residues, are non-redundant in the proteome, and fall within a 300–2000 m/z domain are candidates for potential UIS addresses. For each candidate peptide, all peptides whose precursor m/z, at charge states up to 3+, fell within a given tolerance (e.g. ±1 m/z) of the candidate were pooled. From the pooled peptides, the product ions of the candidate peptide are generated (i ions), and all the possible combinations of Q3 m/z were considered. For a UIS|r (r indicating the number of Q3 values) the number of candidate addresses is given by $\binom{i}{r} = \frac{i!}{r!\,(i-r)!}$. These candidates are then challenged with all the combinations of product ions for each of the peptides in the pool. These challenge ions are specified for each experiment but may include b, y, b − H2O, b − NH3, y − H2O, y − NH3, a, a − NH3, M − H2O, M − NH3, and peeling (b + H2O) ions (22). All ions listed are considered with the charge states appropriate for the calculation. Non-unique peptides were removed by determining whether all Q3 values in a combination have a counterpart challenge combination where the ions are within a tolerance (e.g. ±1 m/z) of a candidate combination. All remaining peptide product ions are considered unique and comprise a UIS consisting of a Q1 value and at least one Q3 value. For an example parameter file see supplemental Table S2.
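
The challenge step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function name `find_uis`, the flat lists of m/z values, and the two interfering ion lists are hypothetical. The three candidate ions are the y3, y5, and y7 ions of LPGILELSR as listed in Table I.

```python
from itertools import combinations


def find_uis(candidate_ions, pooled_challenge_ions, order=2, q3_tol=1.0):
    """Return the Q3 combinations of size `order` that no pooled peptide can mimic.

    candidate_ions: product ion m/z values of the candidate peptide.
    pooled_challenge_ions: one list of product ion m/z values per pooled
        (potentially interfering) peptide sharing the candidate's Q1 window.
    """
    def has_counterpart(mz, ions):
        # An ion is matched when any challenge ion lies within the Q3 tolerance.
        return any(abs(mz - other) <= q3_tol for other in ions)

    unique = []
    for combo in combinations(sorted(candidate_ions), order):
        # A combination is redundant when one pooled peptide matches every Q3 in it.
        redundant = any(
            all(has_counterpart(mz, ions) for mz in combo)
            for ions in pooled_challenge_ions
        )
        if not redundant:
            unique.append(combo)
    return unique


# y3, y5, y7 of LPGILELSR (Table I); the interfering ion lists are made up.
candidate = [375.23, 617.36, 787.47]
pool = [[375.0, 618.0], [787.9, 500.0]]

print(find_uis(candidate, pool, order=1))  # no single transition is unique here
print(find_uis(candidate, pool, order=2))  # the two pairs containing 787.47 survive
```

In this toy pool every single ion has a counterpart in some interfering peptide, yet two of the three pairs cannot be reproduced by any one peptide, which is the property that defines a UIS|2.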

Estimating the Likelihood of SRM Redundancy

The E. coli and human proteomes were downloaded from Swiss-Prot release 56.1. A computer program was written to in silico digest the selected proteome allowing for up to two missed cleavage sites and conditional oxidation of methionine. The peptides were then mapped into a set with the redundant peptides noted. A list of 500 randomly selected proteins for each proteome was then in silico digested, and the sequence-unique peptides were listed. For each of the sequence-unique peptides the charge state of 2+ was set, and the b and y ions in the m/z range of 300–2000 were generated. The set of all possible combinations of these ions was created. Each peptide in the proteome with a Q1 or an isotopic Q1 indistinguishable from the candidate's Q1 value, accounting for charges of 1+, 2+, and 3+ and isotopic contributions of up to +5 amu, was used to make sets of b and y series challenge ions. If the SRM ions were present in a set of challenge ions, that combination of SRMs was marked as redundant. For this comparison a Q1 tolerance of ±0.6 m/z and a Q3 tolerance of ±1.0 m/z were used. Once all the possible SRM ion combinations were checked, the probability of choosing a redundant combination was computed by dividing the number of redundant combinations by the total number of SRM ion combinations for that peptide. The average of these probabilities was then computed as an estimate of an expected likelihood of redundancy in SRM analysis.
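
The last two steps (the per-peptide probability and its average) can be written compactly. The following Python sketch assumes the challenge ion sets have already been built for each peptide; the function names and all example ion values are hypothetical and serve only to show the arithmetic.

```python
from itertools import combinations
from statistics import mean


def redundancy_probability(target_ions, challenge_sets, n_transitions, q3_tol=1.0):
    """Fraction of the peptide's SRM ion combinations that another peptide shares."""
    combos = list(combinations(target_ions, n_transitions))
    if not combos:
        raise ValueError("fewer product ions than requested transitions")

    def shared_by(combo, ions):
        # Every ion of the combination must have a challenge ion within tolerance.
        return all(any(abs(mz - c) <= q3_tol for c in ions) for mz in combo)

    redundant = sum(
        1 for combo in combos
        if any(shared_by(combo, ions) for ions in challenge_sets)
    )
    return redundant / len(combos)


def expected_redundancy(peptides, n_transitions, q3_tol=1.0):
    """Average the per-peptide probabilities over all sampled peptides."""
    return mean(
        redundancy_probability(ions, challengers, n_transitions, q3_tol)
        for ions, challengers in peptides
    )


# Two hypothetical peptides: (own product ions, challenge ion sets).
peptides = [
    ([375.23, 617.36, 787.47], [[375.0, 618.0], [787.9, 500.0]]),
    ([400.0, 600.0], [[400.5]]),
]
print(expected_redundancy(peptides, 1))  # 0.75
print(expected_redundancy(peptides, 2))  # one redundant pair out of three, then none
```

With one transition the first peptide is fully redundant and the second half redundant, giving 0.75; with two transitions the average falls to one sixth, mirroring the downward trend the simulation reports as transitions are added.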

Cell Culture

E. coli K-12 (MG1655) was grown in LB medium to mid-log phase (A600 = 1.2) and collected by centrifugation. The cells were washed with 50 mM Tris/HCl, pH 8.0, and then resuspended in 50 mM ammonium bicarbonate, pH 8.5, supplemented with protease inhibitors. The cells were lysed using a French press operated at 12000 p.s.i., and then the supernatant was collected following centrifugation at 2000 × g.

Sample Preparation

1 ml of the E. coli lysate was adjusted to 8 M urea in 50 mM ammonium bicarbonate, pH 8.5, and reduced with tris(2-carboxyethyl)phosphine (5 mM) at room temperature for 1 h. Proteins were alkylated in 10 mM IAA for 1 h in the dark. The sample was diluted 1:10 with 50 mM ammonium bicarbonate and then digested with trypsin (20 μg) at 37 °C for 18 h. The digest was concentrated and desalted using a 1-ml solid-phase extraction cartridge. Peptides were gravity-loaded onto a pre-equilibrated cartridge, desalted with 5 ml of 0.1% TFA, and then eluted with 5 ml of 80% acetonitrile, 0.1% TFA. Acetonitrile was removed by centrifugal evaporation to reduce the volume of the eluent to ∼0.5 ml.

Liquid Chromatography and Mass Spectrometry Analysis

Digested protein samples were analyzed using a 4000 QTrap hybrid triple quadrupole/linear ion trap mass spectrometer (Applied Biosystems, Foster City, CA) operating in positive ion mode. Peptides were separated by nanoflow liquid chromatography using an Eksigent 2D LC system (Eksigent Technologies, Dublin, CA). Digested samples were analyzed by injecting 10 μl of the digest onto a precolumn (Captrap 0.5 × 2 mm, Michrom BioResources Inc., Auburn, CA) for preconcentration with 95:5 mobile phase A:mobile phase B (mobile phase A: 2% (v/v) acetonitrile containing 0.1% (v/v) formic acid; mobile phase B: 80% (v/v) acetonitrile containing 0.1% (v/v) formic acid) at 10 μl/min. Peptides were then separated using a ProteCol C18 column (300 Å, 3 μm, 150 μm × 10 cm; SGE Analytical Sciences, Ringwood, Victoria, Australia). Peptides were eluted from the column using a linear gradient from 95:5 mobile phase A:mobile phase B to 45:55 mobile phase A:mobile phase B over 120 min at a flow rate of 600 nl/min. The LC eluent was subjected to positive ion nanoflow analysis using a NanoSpray II source equipped with a MicroIonSpray II spray head. Column eluent was directed into the MicroIonSpray II spray head via coupling to a distal coated PicoTip fused silica spray tip (360-μm outer diameter, 75-μm inner diameter, 15-μm-diameter emitter orifice; New Objective, Woburn, MA). Samples were analyzed using an ion spray voltage, heater interface temperature, curtain gas flow, and nebulizing gas flow of 2.1 kV, 150 °C, 18, and 12, respectively. Collision energy (CE) was determined using the equation CE = slope × (m/z) + intercept, where slope = 0.050 and intercept = 5.5 for 2+ precursor ions. MS data were searched against all E. coli entries in the Swiss-Prot database (version 53.2) using MASCOT (Matrix Science, London, UK) allowing for one missed cleavage, alkylation of cysteine (IAA), and oxidation of methionine. False discovery rates were determined by searching the MS data against a reversed E. coli database.
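
As a worked instance of the collision energy equation, the short Python sketch below applies the quoted slope and intercept to the 2+ precursor of GPLTTPVGGGIR (Q1 = 562.82 in Table I). The function name is hypothetical, and the source gives no unit for CE, so none is assumed here.

```python
def collision_energy(precursor_mz, slope=0.050, intercept=5.5):
    """CE = slope * (m/z) + intercept; defaults are the values quoted for 2+ precursors."""
    return slope * precursor_mz + intercept


# Q1 of GPLTTPVGGGIR (2+) from Table I.
print(round(collision_energy(562.82), 2))  # 33.64
```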

IDA

IDA experiments utilized an enhanced MS survey scan (m/z 350–1500) followed by three data-dependent product ion scans of the three most intense precursor ions. Precursor ions were fragmented a maximum of two times before being excluded for 2 min.

MIDAS

MIDAS experiments were used in an attempt to identify peptides for each protein in the tricarboxylic acid cycle. MRM transitions were designed for each protein in the tricarboxylic acid cycle using the MIDAS Workflow Designer (Version 1.0.0, Applied Biosystems). Enhanced product ion scans (MS/MS) were triggered when individual MRM signals exceeded 300 counts/s. A list of MRM transitions was obtained by taking the amino acid sequence for each protein and theoretically digesting the sequence in silico. MRM transitions included the potential variable modifications of oxidation of methionine and alkylation of cysteine (IAA). MRM transitions with a maximum of two variable modifications per peptide were considered. Q1 values for tryptic peptides (2+ and 3+ charge states and no missed cleavages) between m/z 350 and 1500 were determined by the MIDAS work flow designer program, and Q3 values were the first 1+ product y ion above the 2+ or 3+ precursor ion. Precursor ions were fragmented a maximum of two times prior to being excluded for 2 min. MRM experiments were conducted for each protein in the tricarboxylic acid cycle using unit resolution settings for Q1 and Q3.

UIS-SRM Scanning

UIS-SRM experiments conducted on a 4000 QTrap utilized two SRM transitions (UIS|2 = Q1 → Q3a and Q1 → Q3b) to detect the target peptide. Wherever possible, UIS experiments utilized a primary Q3 value corresponding to the highest intensity product ion that constituted a UIS and a secondary Q3 value corresponding to the second most intense product ion that constituted the UIS for each peptide candidate. Additional scans utilizing UIS other than the first and second most intense product ion pairs were also assessed wherever possible. Overlay of the extracted ion chromatogram of the Q3 product ions indicated detection of UIS. UIS assays were validated by triggering a product ion scan (MS/MS) when individual SRM signals exceeded 300 counts/s.

RESULTS

Computational Simulation to Assess SRM Assay Redundancy

We developed a computational simulation of an SRM experimental work flow as typically conducted on a triple quadrupole MS instrument. First we considered that each protein was present at equal abundance and calculated all possible peptides that would be formed by trypsin digestion (m/z range from 300 to 2000 m/z) from a proteome considering charge states of 1+, 2+, and 3+ and allowing for up to two missed cleavages. We next determined the precursor and corresponding product ion (b and y ions) masses that would be generated by CID. Those precursor m/z values within an m/z isolation window defined by a seed peptide were combined into a bin. Isotopic abundance was also taken into account as this causes some peptide isotopes to fall within the isolation mass window. For every bin, each peptide was considered, and its product ions were challenged with the product ions from all other peptides residing in the bin. This process allowed us to calculate SRM assay redundancy for each peptide in the chosen proteome.
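
The mass calculations behind this simulation can be illustrated as follows. This Python sketch is an assumption-laden outline rather than the authors' code: it uses standard monoisotopic residue masses, approximates each heavy isotope as a 1.00335 Da step (the text specifies only "+1 … +5 amu"), ignores modifications, and the function names are hypothetical. The ±0.6 m/z window is the Q1 tolerance quoted in the redundancy estimate.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
ISOTOPE_STEP = 1.00335  # assumption: 13C - 12C spacing per heavy isotope


def neutral_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge, isotope=0):
    """m/z of the precursor at a given charge and number of heavy isotopes."""
    return (neutral_mass(peptide) + isotope * ISOTOPE_STEP + charge * PROTON) / charge


def y_ions(peptide):
    """Singly charged y ions, y1 first, excluding the full-length ion."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide):
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions[:-1]


def b_ions(peptide):
    """Singly charged b ions, b1 first, excluding the full-length ion."""
    ions, running = [], PROTON
    for aa in peptide:
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions[:-1]


def q1_bin(seed_mz, peptides, q1_tol=0.6, charges=(1, 2, 3), max_isotope=5):
    """Peptides with any charge/isotope form inside the seed's isolation window."""
    return [
        pep for pep in peptides
        if any(
            abs(precursor_mz(pep, z, k) - seed_mz) <= q1_tol
            for z in charges
            for k in range(max_isotope + 1)
        )
    ]


seed = precursor_mz("LPGILELSR", 2)
print(round(seed, 2), round(y_ions("LPGILELSR")[6], 2))
print(q1_bin(seed, ["LPGILELSR", "IPGLLELSR", "GK"]))
```

The computed 2+ precursor and y7 values agree with the Q1 of 499.30 and Q3 of 787.47 listed for LPGILELSR in Table I. The made-up isobaric sequence IPGLLELSR falls in the same bin, so its product ions would be used to challenge the seed peptide.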

To explore potential SRM assay redundancy we randomly sampled 500 E. coli and 500 human proteins (>12,000 peptides/data point), selected sequence unique peptides, then evaluated the number of redundant SRM assays for each peptide as a function of the number of SRM transitions, and computed the likelihood that a randomly chosen assay would be redundant (Fig. 1). Fig. 1 shows that standard SRM analysis using a single transition (Q1, Q3 pair) for a given peptide had no power to resolve peptide identity when considered in the wider context of a proteome. Even for highly abundant proteins of the E. coli tricarboxylic acid cycle, extracted ion chromatograms (XICs) from SRM transitions showed the presence of multiple peptide signals with high intensity (supplemental Fig. S1). An example is shown in Fig. 2 for the SRM Q1(707.39) → Q3(1102.55) to target the peptide LDGLSDAFSVFR from the iron-sulfur subunit of succinate dehydrogenase that shows the presence of >10 significant peaks. The problem of redundancy is clearly illustrated in Table I by computationally determining the number of peptides that shared a single transition (Table I). Table I shows 10s to 100s of peptides for each targeted SRM transition. For a further discussion on the issue of ion interference in SRM assays see Sherman et al. (21). Clearly there is a significant level of redundancy for single SRM transitions. A common approach to address the problem is to combine multiple transitions; however, as these are normally selected because of favorable fragmentation without consideration of m/z redundancy, this does not solve the problem. This is illustrated in Fig. 1, which shows that even when combining up to five randomly selected product ions there remains considerable likelihood of assay redundancy.

Fig. 1.

Estimation of SRM redundancy with human and E. coli proteomes. The results of randomly sampling 500 proteins (>12,000 peptides/point), extracting the sequence unique peptides, and evaluating the possible combinations of SRMs for redundancy in their respective proteomes for a given number of SRM product ions are shown.

Fig. 2.

An example of experimental SRM redundancy. The XIC resulting from targeting a single SRM transition to detect LDGLSDAFSVFR from the protein succinate dehydrogenase iron-sulfur subunit is shown. The significant number of peaks results from interference from the sample.

Table I. UIS assays used for the detection of 13 proteins of the E. coli tricarboxylic acid cycle and the individual computed redundancy of each SRM transition.

N/A indicates that no MS/MS scan was triggered during the UIS LC/MS/MS analysis preventing independent confirmation for the detection of these peptides. ¦ indicates a fragmentation site. MOWSE, MOlecular Weight SEarch.

Columns Q1 and Q3a, Q3b give the UIS|2 for each peptide. The four redundancy columns give the computational SRM transition redundancy of the Q3a and Q3b transitions, considering b and y ions only (b, y) or all ion types (All).

| UniProtKB/Swiss-Prot entry (gene) | Protein name | Peptide | Q1 | Q3a, Q3b | Peptide confirmed by MS/MS (MOWSE score) | Q3a (b, y) | Q3b (b, y) | Q3a (All) | Q3b (All) |
| ACON2_ECOLI (acnB) | Aconitate hydratase 2 | DLV¦H¦AI¦PLYAIK | 676.90 | 704.43 (y6), 1025.61 (y9) | Yes (37) | 152 | 73 | 260 | 139 |
| | | | 676.90 | 888.55 (y8), 1025.61 (y9) | | 122 | 73 | 263 | 139 |
| CISY_ECOLI (gltA) | Citrate synthase | AMGIP¦SSMFTVI¦FAMAR | 915.45 | 595.30 (y5), 1360.67 (y12) | Yes (75) | 229 | 58 | 425 | 277 |
| DHSA_ECOLI (sdhA) | Succinate dehydrogenase flavoprotein subunit | LP¦GI¦LE¦LSR | 499.30 | 617.36 (y5), 787.47 (y7) | Yes (41) | 231 | 55 | 359 | 80 |
| | | | 499.30 | 375.23 (y3), 787.47 (y7) | | 369 | 55 | 475 | 80 |
| DHSB_ECOLI (sdhB) | Succinate dehydrogenase iron-sulfur subunit | L¦D¦G¦LSD¦AF¦SVFR | 663.83 | 726.39 (y6), 1213.58 (y11) | Yes (55) | 103 | 109 | 234 | 169 |
| | | | 663.83 | 508.29 (y4), 1098.56 (y10) | | 218 | 80 | 414 | 127 |
| | | | 663.83 | 508.29 (y4), 1041.53 (y9) | | 218 | 66 | 414 | 111 |
| DLDH_ECOLI (lpd) | Dihydrolipoyl dehydrogenase | GISY¦ETA¦TFPWAASGR | 857.41 | 1063.53 (y10), 1293.62 (y12) | Yes (65) | 122 | 63 | 242 | 144 |
| FUMA_ECOLI (fumA) | Fumarate hydratase class I, aerobic | V¦A¦P¦EALTLLAR | 577.35 | 886.53 (y8), 983.59 (y9) | Yes (33) | 53 | 70 | 118 | 179 |
| | | | 577.35 | 886.53 (y8), 1054.62 (y10) | | 53 | 92 | 118 | 124 |
| IDH_ECOLI (icd) | Isocitrate dehydrogenase (NADP) | GP¦L¦T¦T¦PVGGGIR | 562.82 | 655.37 (y7), 857.48 (y9) | Yes (56) | 168 | 87 | 332 | 174 |
| | | | 562.82 | 655.39 (y7), 970.57 (y10) | | 168 | 45 | 332 | 98 |
| | | | 562.82 | 756.43 (y8), 970.57 (y10) | | 340 | 45 | 425 | 98 |
| MDH_ECOLI (mdh) | Malate dehydrogenase | VA¦V¦L¦G¦AA¦GGI¦GQA¦LALLLK | 868.04 | 926.60 (y9), 1352.82 (y15) | Yes (63) | 132 | 63 | 253 | 142 |
| | | | 868.04 | 926.60 (y9), 1564.98 (y17) | | 132 | 41 | 253 | 152 |
| | | | 868.04 | 1153.73 (y12), 1295.80 (y14) | | 117 | 60 | 280 | 163 |
| | | | 868.04 | 1153.73 (y12), 1564.98 (y17) | | 117 | 41 | 280 | 152 |
| | | | 868.04 | 670.48 (y6), 1295.80 (y14) | | 268 | 60 | 575 | 163 |
| | | | 868.04 | 670.48 (y6), 1465.91 (y16) | | 268 | 48 | 575 | 108 |
| MQO_ECOLI (mqo) | Malate:quinone oxidoreductase | VV¦L¦FGPFAT¦FSTK | 707.39 | 482.26 (y4), 1102.55 (y10) | Yes (56) | 238 | 57 | 419 | 124 |
| | | | 707.39 | 482.26 (y4), 1215.64 (y11) | | 238 | 42 | 419 | 138 |
| ODO1_ECOLI (sucA) | 2-Oxoglutarate dehydrogenase E1 component | VATL¦EDATEMV¦NLYR | 862.93 | 565.31 (y4), 1340.61 (y11) | Yes (78) | 254 | 64 | 466 | 140 |
| SUCC_ECOLI (sucC) | Succinyl-CoA synthetase β chain | AV¦LVNI¦F¦GGIVR | 629.38 | 648.38 (y6), 1087.67 (y10) | Yes (34) | 141 | 45 | 258 | 149 |
| | | | 629.38 | 501.32 (y5), 1087.67 (y10) | | 258 | 45 | 400 | 149 |
| ACON1_ECOLI (acnA) | Aconitate hydratase 1 | V¦LLE¦NLLR | 485.30 | 515.33 (y4), 870.54 (y7) | N/A | 146 | 52 | 265 | 91 |
| SUCD_ECOLI (sucD) | Succinyl-CoA ligase (ADP-forming) subunit α | S¦GTLTYE¦AVK | 534.78 | 317.22 (y3), 981.52 (y9) | N/A | 304 | 39 | 407 | 95 |

Unique Ion Signatures Are Non-redundant SRM Assays

Using the simulation described previously we observed many instances where particular combinations of m/z ions were non-redundant (Fig. 3). We term these ion combinations UIS as they map exclusively to a given peptide in a proteome under the defined conditions. We observed that two SRMs (Q1 → Q3a and Q1 → Q3b; together they comprise the UIS (Q1, Q3a, and Q3b) and are therein referred to as UIS|2) were necessary and sufficient to define peptide identity in this simulation. These coordinates comprise the set of UIS|2 and provide proteome coverage for the proteins that contain one or more sequence-unique peptides of >99 and >96% for the E. coli and Homo sapiens Swiss-Prot proteomes with Q1 tolerance of ±0.8 amu (Fig. 4A). Interestingly at this Q1 tolerance there are many UIS|2 per protein (estimated average of 26 in E. coli and 16 in humans) (Fig. 4B). Given that there are numerous UIS per protein and individual peptides may possess multiple UIS, the likelihood of experimentally observing at least one unique proteotypic peptide per protein is favorable.

Fig. 3.

Number of false identifications in the E. coli proteome for the tryptic peptide VLLPAFPDIR from glycogen synthase. Indicated on the x axis and y axis are the y and b product ions in ascending m/z order. Each colored block represents the number of redundant peptides sharing the same coordinate. Blue dots indicate UIS.

Fig. 4.

MS and biological variables affecting UIS coverage. For the data displayed, the simulation, which used the E. coli proteome, took into account two miscleavages by trypsin, peptide charges of 1+ … 3+, and ions b and y and all (b, y, b − H2O, b − NH3, y − H2O, y − NH3, a, a − NH3, M − H2O, M − NH3, and peeling ions (b + H2O)) as noted. A, the effect of Q1 tolerance on the percentage of proteins that are addressed by UIS. UIS|1, the red curve, clearly demonstrates that the use of a single transition in a complex mixture is unsuitable for proteome analysis. The addition of a second transition into the same computational context, UIS|2 shown in blue, significantly increases the number of UIS resulting in sufficient coverage of the proteome. When considering a greater set of ions that may interfere with UIS the number of UIS|2 addresses, indicated in yellow, declines. Introduction of a third fragment ion, UIS|3, overcomes this problem leading to sufficient UIS|3 addresses to restore UIS coverage to the entire proteome. Of note is that the order of the UIS (the number of fragment ions) has far greater impact than the Q1 tolerance. B, the mean coverage of UIS per protein in E. coli. The blue line (UIS|2 b & y) shows the mean number of UIS per protein and the impact of Q1 tolerance. The yellow line (UIS|2 all) displays the impact of increasing the number of types of challenge ions, and the green line (UIS|3 all) shows how the numbers recover when the order of the UIS is increased. C, distribution of UIS by protein mass. The figure illustrates that the number of UIS per protein corresponds with the molecular weight of the protein. Interestingly, if one were intentionally targeting a lower molecular weight protein, a higher order UIS may be desirable, increasing the likelihood that one of the UIS could be experimentally observed and used as an assay.

Evaluation of More Stringent UIS Simulations

The simulation described above considers typical experimental conditions that have been reported in publications described to date for SRM work flows. In these experiments, Q3 product ions used in SRM transitions are either y or b ions. Given that gas-phase peptide fragmentation sometimes yields ion species other than y and b ions we evaluated the impact this would make on defining UIS. In this simulation we considered loss and gain of water and ammonia from certain y and b ions, presence of multiply charged product ions, a ions, and peeling ions (22, 23). As would be expected, the consideration of additional ions negatively impacted the number of UIS|2 addresses (Fig. 4A). Given that when an additional ion is added, the number of potential UIS addresses scales with the binomial coefficient, we considered whether UIS addresses with three product ions (i.e. UIS|3) would improve proteome-wide coverage when additional ion series were included in the simulation. Fig. 4A shows that UIS|3 addresses restored >99% proteome coverage for proteins containing one or more sequence-unique peptides even when numerous ion series were considered. In fact, the average number of UIS|3 per protein was greater than 1500 for either E. coli or H. sapiens proteomes (Fig. 4B).

UIS Profiling of E. coli Tricarboxylic Acid Cycle

As a practical, proof of principle example of UIS for targeted proteome profiling we analyzed enzymes of the E. coli tricarboxylic acid cycle (Table I). We applied a combination of both IDA and MIDAS acquisition methods to detect tricarboxylic acid cycle peptides (supplemental Table S1) and matched these to an E. coli UIS atlas that was precalculated for each of the 18 tricarboxylic acid cycle target proteins. UIS-SRM assays using UIS|2 were then selected for each protein and combined into a single MS acquisition method, and an aliquot of trypsin proteolytically cleaved E. coli cell lysate was analyzed by LC/MS/MS (Table I). Clear evidence of UIS|2 detection was apparent when the extracted ion chromatograms of each Q3 m/z were overlaid (Fig. 5). As a further confirmation step we used the SRM signal to trigger MS/MS in the 4000 QTrap and searched these data using MASCOT. Fig. 6 displays the combined UIS-SRM scans detecting 13 of the 18 tricarboxylic acid cycle proteins from a single injection of cell lysate. Using this approach enzymes for each step of the tricarboxylic acid cycle were identified by UIS and validated by MS/MS. In this case MS/MS was conducted only as a validation step, although in principle this is a redundant step when utilizing UIS (supplemental Fig. S2). Thus UIS presents a novel identification strategy for triple quadrupole instruments. A key benefit of using an SRM work flow is that data acquisition is faster than in IDA, and sensitivity is greater if MS/MS scans are not required for peptide identification.

Fig. 5.

Selective detection of the peptide GPLTTPVGGGIR in whole cell lysate from the E. coli protein isocitrate dehydrogenase using UIS. A, overlaid XICs display the targeted detection of GPLTTPVGGGIR using three independent UIS|2. The inset shows an expanded region of the overlaid XICs illustrating the co-elution of each UIS assay (denoted as follows: A, 562.82 → 655.37, 562.82 → 857.48; B, 562.82 → 655.37, 562.82 → 970.57; C, 562.82 → 756.43, 562.82 → 970.57). B, MS/MS spectrum confirming the detection of the peptide GPLTTPVGGGIR in E. coli whole cell lysate. Product ions constituting UIS are indicated (A, B, and C).

Fig. 6.

UIS scans for E. coli tricarboxylic acid cycle proteins. A, time offset XICs for the Q3a and Q3b ions that form the UIS|2 for the peptide GISYETATFPWAASGR from DldH. Two SRM scans were used to detect this UIS. The signals co-elute but are offset for clarity. (¦ indicates a fragmentation site.) B, 26 overlaid XICs from the UIS scans for the 13 peptides shown in Table I. Each UIS|2 is indicated by black dots above the paired co-eluting peak. C, barcode representation of the E. coli tricarboxylic acid cycle obtained by UIS scans in B. The representation was calculated as a function of the product of Q3a and Q3b ion intensities for each UIS. Colored bars correspond to peptides detected by UIS in B.

Of the five tricarboxylic acid cycle proteins not observed in our analysis, we did not detect any UIS candidate peptides from SucB, SdhC, and SdhD using either IDA or MIDAS acquisition methods. SdhC and SdhD are small hydrophobic transmembrane proteins that were most likely not extracted given our sample preparation methods. Peptides from FumB and FumC were detected by MIDAS, but the FumB peptides did not possess UIS because these peptide sequences are also present in FumA. The FumC peptide detected by MIDAS contained a single UIS; however, the b6 product ion that constituted the UIS was not detected using the UIS assay nor could it be readily observed in the MS/MS scan. It is important to note that failure to detect some UIS candidate peptides such as FumC is not a flaw of UIS methodology per se but rather a result of poor detection of the necessary Q3 product ion whose intensity is governed by the physicochemical properties of the specific peptide.

UIS for Validation of Peptide Identity from MS/MS

A valuable additional use of UIS is to underpin a functionally orthogonal method to validate peptide assignments obtained from MS/MS spectra. As a proof of concept demonstration we used MS/MS spectra acquired using the Universal Proteomics Standard, a mixture of 48 human proteins, that was analyzed by IDA on a QSTAR XL mass spectrometer and searched with MASCOT using conditions described previously (24). 36 proteins were identified with a p value <0.05 and appropriate ion score. We computed UIS|2 for these 36 proteins and then searched the MASCOT output for the presence of the ions needed to exclusively identify the peptides proposed by MASCOT. Ions that comprise the UIS were detected in 32 of the 36 proteins proposed by MASCOT, providing a facile mechanism to orthogonally validate the MS/MS assignments (Fig. 7). The four proteins that were not confirmed lacked sufficient intensity of the key product ions that were required to validate these assignments using UIS. The inability to validate these four proteins does not necessarily mean an incorrect assignment by MASCOT as this algorithm relies on the presence of numerous product ions unlike UIS that uses the minimal essential set. However, the intersection of UIS validation and MASCOT assignments sets a new standard for compelling evidence of true positive peptide assignments. Additionally in supplemental Fig. S3 we show evidence of using UIS to rescue two assignments from MS/MS spectra that were poorly informative for MASCOT and therefore were assigned poor expectation values by MASCOT. In isolation, these low scoring spectra would be unassigned, but as they intersect with UIS, they should be considered accurate.
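
The validation step amounts to asking whether every ion of at least one UIS for the proposed peptide appears in the spectrum. The Python sketch below shows one way this check could look. The m/z tolerance, the intensity threshold, the function names, and the peak list are all assumptions made for illustration; only the three UIS|2 are taken from the text (GPLTTPVGGGIR, Fig. 5).

```python
def uis_supported(uis, peaks, mz_tol=0.5, min_intensity=0.0):
    """True when every Q3 ion of the UIS has a sufficiently intense matching peak."""
    return all(
        any(
            abs(q3 - mz) <= mz_tol and intensity >= min_intensity
            for mz, intensity in peaks
        )
        for q3 in uis
    )


def supporting_uis(uis_list, peaks, mz_tol=0.5, min_intensity=0.0):
    """Return the UIS of the proposed peptide that the spectrum supports."""
    return [
        uis for uis in uis_list
        if uis_supported(uis, peaks, mz_tol, min_intensity)
    ]


# UIS|2 for GPLTTPVGGGIR as listed in Fig. 5 (Q3a, Q3b).
gpl_uis = [(655.37, 857.48), (655.37, 970.57), (756.43, 970.57)]

# Hypothetical MS/MS peak list: (m/z, intensity).
peaks = [(500.0, 900.0), (655.4, 1200.0), (857.5, 300.0), (970.6, 40.0)]

print(supporting_uis(gpl_uis, peaks, min_intensity=100.0))
```

In this made-up spectrum only the first signature is supported: the 970.6 peak is present but falls below the chosen intensity threshold, which parallels the unconfirmed assignments whose key product ions lacked sufficient intensity.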

Fig. 7.

Validation of MASCOT-assigned identities by UIS. A–C show examples where a MASCOT identity was assigned and then used to retrieve the corresponding UIS for that peptide. On the right is the list of proteins that MASCOT identified that were validated by UIS.

DISCUSSION

Computational methods for peptide identification are key to proteomics because of the sheer volume of data generated by experiments. We used computational simulation and provided experimental evidence to show that the undirected selection of SRMs to monitor proteins in a proteome will likely result in a significant percentage of assays with ambiguous results because of interference from non-target peptides that share the same SRMs (Figs. 1 and 2). However, we demonstrated by using curated proteomes generated by experimental investigation that a solution to this predicament is available. Our approach was to use the assumptions made in the simulation as a hypothesis for the content of the proteome. It should be noted that it is important to accurately mirror the experimental conditions in the simulations as they are fundamental to the results of the simulations. Thus, any UIS that are shown to be false indicate discordance between the assumptions and the experiment.
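The interference problem and its solution can be illustrated on a toy library. The sketch below assumes a simplified model in which a peptide is a precursor m/z (Q1) plus a list of product ion m/z values (Q3), and a combination of Q3 ions is a UIS if no other peptide in the same Q1 window produces all of them. The function `find_uis`, the tolerances, and the peptide entries are hypothetical and are not the simulation parameters used in this work.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Library = Dict[str, Tuple[float, List[float]]]  # name -> (Q1 m/z, Q3 m/z list)


def within(a: float, b: float, tol: float) -> bool:
    return abs(a - b) <= tol


def find_uis(
    target: str,
    library: Library,
    order: int = 2,
    q1_tol: float = 0.7,  # assumed Q1 isolation tolerance
    q3_tol: float = 0.7,  # assumed Q3 isolation tolerance
) -> List[Tuple[float, ...]]:
    """List the Q3 ion combinations of `target` that no co-isolated peptide shares."""
    q1, ions = library[target]
    # Only peptides whose precursor falls in the same Q1 window can interfere.
    rivals = [
        fragments
        for name, (rival_q1, fragments) in library.items()
        if name != target and within(rival_q1, q1, q1_tol)
    ]
    unique = []
    for combo in combinations(sorted(ions), order):
        # A combination is ambiguous if a single rival produces every ion in it.
        shared = any(
            all(any(within(ion, f, q3_tol) for f in fragments) for ion in combo)
            for fragments in rivals
        )
        if not shared:
            unique.append(combo)
    return unique


# Hypothetical three-peptide "proteome".
library: Library = {
    "PEPTIDE_A": (500.3, [200.1, 300.2, 400.3]),
    "PEPTIDE_B": (500.5, [200.1, 300.2, 650.4]),  # co-isolated with A in Q1
    "PEPTIDE_C": (720.9, [200.1, 300.2, 400.3]),  # different Q1 window
}

print(find_uis("PEPTIDE_A", library))  # [(200.1, 400.3), (300.2, 400.3)]
```

In this toy case the pair (200.1, 300.2) is ambiguous because PEPTIDE_B yields both ions from the same Q1 window, whereas PEPTIDE_C shares every product ion with the target but cannot interfere because its precursor is isolated separately.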

Computationally, UIS occur at surprisingly high frequency in each proteome, enabling detection of at least one peptide in >99 and >96% of the E. coli and human proteomes, respectively. We found that these computations achieve this coverage using only two transitions (UIS|2). A database of UIS, named ProteomeDB, is currently being made available online. There is currently no robust method that accurately predicts product ion intensity; thus we do not have the ability to predict which UIS product ions will be present experimentally. Nonetheless, there are various strategies that could be adopted to increase the likelihood of detecting appropriate product ions, including use of simple rules (e.g. selection of proline), sophisticated evaluation of peptide physicochemical properties and predicted fragmentation based on these properties (25–27), use of data repositories (28), or direct empirical methods. One approach would be to select ions based on their membership in multiple UIS. By this we mean selecting the ions that are present in multiple UIS, thus providing a level of redundancy. Additionally, if the UIS of a given order (UIS|n) do not provide an observable ion signature, then by simply increasing the number of transitions by one (UIS|n + 1), the binomial coefficient dictates that approximately an order of magnitude more addresses are generated, likely providing one that is readily observable. To illustrate, a peptide having 20 ions and using two product ions (UIS|2) results in 20 × 19/2 = 190 possible coordinates, or for UIS|3 the result is 20 × 19 × 18/(3 × 2) = 1140 possible coordinates. The significance of these observations is profound for large scale proteome profiling, and given a high predicted frequency of UIS occurrence this raises the likelihood that a significant portion of the proteome will be “MS-observable” using the sensitive detection methodology provided by SRM. Furthermore, these MS-observable UIS could be considered a higher order proteotypic peptide as they are not only sequence-unique but are non-redundant in the m/z domain for a given computation.
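The counting argument and the membership strategy described above can both be reproduced in a few lines. This is an illustrative sketch only: the helper names `address_count` and `rank_ions_by_membership` and the ion labels in the example are hypothetical. For n ions the ratio of UIS|3 to UIS|2 addresses is (n − 2)/3, which is 6-fold for the 20-ion peptide used in the text.

```python
from collections import Counter
from math import comb
from typing import Iterable, List, Tuple


def address_count(n_ions: int, order: int) -> int:
    """Number of distinct Q3 ion combinations of a given order (n choose r)."""
    return comb(n_ions, order)


def rank_ions_by_membership(uis_list: Iterable[Tuple[str, ...]]) -> List[Tuple[str, int]]:
    """Rank product ions by how many UIS they participate in, most frequent first."""
    counts = Counter(ion for uis in uis_list for ion in uis)
    return counts.most_common()


print(address_count(20, 2))  # 190
print(address_count(20, 3))  # 1140

# Hypothetical UIS|2 list for one peptide; y4 appears in three of the four.
example_uis = [("y4", "y7"), ("y4", "b6"), ("y7", "y9"), ("y4", "y9")]
print(rank_ions_by_membership(example_uis)[0])  # ('y4', 3)
```

Monitoring the highest ranked ion first gives the most opportunities to complete a UIS if one of the partner ions turns out to be unobservable.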

We expect that there is considerable utility in using UIS for validation of peptide identities obtained from conventional analysis of MS/MS data. As this approach is functionally orthogonal to conventional probability-based methods it adds confidence to any assignments that are consistent between these approaches. Any lack of concordance between the two methods is not grounds for rejecting the conventional assignment given that numerous product ions are often used in deriving these assignments. Furthermore a higher order UIS might be present in the MS/MS scan. As we demonstrated in Fig. 7, UIS can also be used to interrogate spectra from low scoring assignments that are below reporting criteria thresholds. Provided that the spectra contain UIS, confident peptide assignments can be made for peptides that have non-uniform fragmentation patterns. This may prove of immense value for proteome profiling given that some estimates suggest that up to 50% of all MS/MS spectra are unassigned (6).

It is important to recognize that the results presented here are only applicable in the context of the simulation, a key parameter of which is the database. If a protein in the database is composed exclusively of peptides found elsewhere in the database (e.g. isoforms, evolutionarily related proteins, etc.), then there are no UIS for that protein. Additionally, giving equal consideration to the presence of each peptide in the proteome is a key variable that prevents some ion combinations from obtaining UIS status. Clearly this does not truly represent the in vivo situation; yet without an accurate method to account for abundance levels and expression patterns relevant to the sample, this variable cannot be reduced. We have previously considered the effect of using LC retention time to overcome SRM assay redundancy (21). Additionally, here we conducted an “order-of-magnitude” analysis to compare the power of (i) LC retention time or (ii) use of an additional SRM Q3 product ion to eliminate assay redundancy (supplemental analysis). Several assumptions were made regarding peptide distribution in the LC time and m/z domain; each of these assumptions was made to favor LC retention time, i.e. the use of a uniform peptide distribution. This analysis indicates that use of LC retention time is 30 times less likely to eliminate redundancy than the use of an additional Q3 product ion. That is, a UIS|2 plus retention time is an order of magnitude less effective than using a UIS|3 without retention time. Including peptide retention time to reduce redundancy does provide benefit, but retention time may be difficult to predict accurately. Nonetheless, as LC separation is an integral component in proteomic analysis, a rapid path to UIS implementation might involve the use of LC retention time coupled with appropriate MS/MS reference libraries and possibly isotopically labeled reference peptides for optimum robustness.
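The dependence on database composition noted above can be checked before any UIS are computed, by flagging proteins that have no sequence-unique peptide. The sketch below assumes the proteome has already been digested in silico into a mapping from protein name to peptide sequences; the function name `proteins_without_unique_peptides` and the protein and peptide entries are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def proteins_without_unique_peptides(proteome: Dict[str, Set[str]]) -> List[str]:
    """Return proteins whose every peptide also occurs in at least one other protein."""
    # Count, for each peptide sequence, how many proteins contain it.
    owners = Counter(pep for peptides in proteome.values() for pep in peptides)
    return sorted(
        name
        for name, peptides in proteome.items()
        if all(owners[pep] > 1 for pep in peptides)
    )


# Hypothetical digest: ISOFORM_2 is fully contained in ISOFORM_1.
proteome = {
    "ISOFORM_1": {"LVNELTEFAK", "AEFVEVTK", "YLYEIAR"},
    "ISOFORM_2": {"LVNELTEFAK", "AEFVEVTK"},
    "UNRELATED": {"GITWGEETLMEYLENPK"},
}

print(proteins_without_unique_peptides(proteome))  # ['ISOFORM_2']
```

Proteins returned by such a check cannot have a UIS in the given database regardless of how many transitions are monitored, which is the situation described for the FumB peptides shared with FumA.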

There are three primary components to MS-based peptide identification, namely 1) signal, 2) noise, and 3) information content. The main thrust of the current work addresses information content. Further development of MS-based peptide identification would benefit from decoupling signal from noise for which many possible solutions could be adapted from the field of signal analysis (29, 30). Optimized methods to deal with noise will provide added confidence in UIS identification and are an important future direction that will likely need instrument-specific solutions.

Supplementary Material

[Supplemental Data]

Acknowledgments

This research was facilitated by access to the Australian Proteome Analysis Facility established under the Australian Government's National Collaborative Research Infrastructure Scheme.

Footnotes

The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.

1 The abbreviations used are: IDA, information-dependent acquisition; MIDAS, MRM-initiated detection and sequencing; SRM, selected reaction monitoring; UIS, unique ion signature(s) (a combination of ions generated by a peptide that maps exclusively to one peptide in the proteome being analyzed); UIS|r, UIS composed of r SRM scans, meaning one Q1 value and r Q3 values; IAA, iodoacetamide; XIC, extracted ion chromatogram.

REFERENCES

1. Wolters D. A., Washburn M. P., Yates J. R., 3rd (2001) An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690
2. Aebersold R., Mann M. (2003) Mass spectrometry-based proteomics. Nature 422, 198–207
3. Domon B., Aebersold R. (2006) Mass spectrometry and protein analysis. Science 312, 212–217
4. Eriksson J., Fenyö D. (2002) A model of random mass-matching and its use for automated significance testing in mass spectrometric proteome analysis. Proteomics 2, 262–270
5. Cargile B. J., Bundy J. L., Stephenson J. L., Jr. (2004) Potential for false positive identifications from large databases through tandem mass spectrometry. J. Proteome Res. 3, 1082–1085
6. Marcotte E. M. (2007) How do shotgun proteomics algorithms identify proteins? Nat. Biotechnol. 25, 755–757
7. Gerber S. A., Rush J., Stemman O., Kirschner M. W., Gygi S. P. (2003) Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. U.S.A. 100, 6940–6945
8. Anderson N. L., Anderson N. G., Haines L. R., Hardie D. B., Olafson R. W., Pearson T. W. (2004) Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J. Proteome Res. 3, 235–244
9. Barnidge D. R., Goodmanson M. K., Klee G. G., Muddiman D. C. (2004) Absolute quantification of the model biomarker prostate-specific antigen in serum by LC-MS/MS using protein cleavage and isotope dilution mass spectrometry. J. Proteome Res. 3, 644–652
10. Cox D. M., Zhong F., Du M., Duchoslav E., Sakuma T., McDermott J. C. (2005) Multiple reaction monitoring as a method for identifying protein posttranslational modifications. J. Biomol. Tech. 16, 83–90
11. Kirkpatrick D. S., Gerber S. A., Gygi S. P. (2005) The absolute quantification strategy: a general procedure for the quantification of proteins and post-translational modifications. Methods 35, 265–273
12. Unwin R. D., Griffiths J. R., Leverentz M. K., Grallert A., Hagan I. M., Whetton A. D. (2005) Multiple reaction monitoring to identify sites of protein phosphorylation with high sensitivity. Mol. Cell. Proteomics 4, 1134–1144
13. Lin S., Shaler T. A., Becker C. H. (2006) Quantification of intermediate-abundance proteins in serum by multiple reaction monitoring mass spectrometry in a single-quadrupole ion trap. Anal. Chem. 78, 5762–5767
14. Anderson L., Hunter C. L. (2006) Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 5, 573–588
15. Stahl-Zeng J., Lange V., Ossola R., Eckhardt K., Krek W., Aebersold R., Domon B. (2007) High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol. Cell. Proteomics 6, 1809–1817
16. Keshishian H., Addona T., Burgess M., Kuhn E., Carr S. A. (2007) Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol. Cell. Proteomics 6, 2212–2229
17. Wolf-Yadlin A., Hautaniemi S., Lauffenburger D. A., White F. M. (2007) Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc. Natl. Acad. Sci. U.S.A. 104, 5860–5865
18. McKay M., Sherman J., Laver M., Baker M., Clarke S., Molloy M. (2007) The development of multiple reaction monitoring assays for liver-derived plasma proteins. Proteomics Clin. Appl. 1, 1570–1581
19. Sandhu C., Hewel J. A., Badis G., Talukder S., Liu J., Hughes T. R., Emili A. (2008) Evaluation of data-dependent versus targeted shotgun proteomic approaches for monitoring transcription factor expression in breast cancer. J. Proteome Res. 7, 1529–1541
20. Lange V., Malmström J. A., Didion J., King N. L., Johansson B. P., Schäfer J., Rameseder J., Wong C. H., Deutsch E. W., Brusniak M. Y., Bühlmann P., Björck L., Domon B., Aebersold R. (2008) Targeted quantitative analysis of Streptococcus pyogenes virulence factors by multiple reaction monitoring. Mol. Cell. Proteomics 7, 1489–1500
21. Sherman J., McKay M. J., Ashman K., Molloy M. P. (2009) How specific is my SRM? The issue of precursor and product ion redundancy. Proteomics 9, 1120–1123
22. Thorne G. C., Gaskell S. J. (1989) Elucidation of some fragmentations of small peptides using sequential mass spectrometry on a hybrid instrument. Rapid Commun. Mass Spectrom. 3, 217–221
23. Biemann K. (1988) Contributions of mass spectrometry to peptide and protein structure. Biomed. Environ. Mass Spectrom. 16, 99–111
24. Saldanha R. G., Molloy M. P., Bdeir K., Cines D. B., Song X., Uitto P. M., Weinreb P. H., Violette S. M., Baker M. S. (2007) Proteomic identification of lynchpin urokinase plasminogen activator receptor protein interactions associated with epithelial cancer malignancy. J. Proteome Res. 6, 1016–1028
25. Mallick P., Schirle M., Chen S. S., Flory M. R., Lee H., Martin D., Ranish J., Raught B., Schmitt R., Werner T., Kuster B., Aebersold R. (2007) Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25, 125–131
26. Zhang Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908–3922
27. Zhang Z. (2005) Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges. Anal. Chem. 77, 6364–6373
28. Prakash A., Tomazela D. M., Frewen B., Maclean B., Merrihew G., Peterman S., Maccoss M. J. (2009) Expediting the development of targeted SRM assays: using data from shotgun proteomics to automate method development. J. Proteome Res. 8, 2733–2739
29. Oppenheim A., Willsky A., Hamid S. (1996) Signals and Systems, Prentice-Hall, Upper Saddle River, NJ
30. Oppenheim A., Schafer R., Buck J. (1999) Discrete-Time Signal Processing, Prentice-Hall, Upper Saddle River, NJ

Articles from Molecular & Cellular Proteomics : MCP are provided here courtesy of American Society for Biochemistry and Molecular Biology
