Author manuscript; available in PMC: 2009 Aug 1.
Published in final edited form as: J Am Soc Mass Spectrom. 2009 Mar 14;20(8):1435–1440. doi: 10.1016/j.jasms.2009.03.006

Post-acquisition ETD spectral processing for increased peptide identifications

David M Good 1, Craig D Wenger 1, Graeme C McAlister 1, Joshua J Coon 1,2,*
PMCID: PMC2716440  NIHMSID: NIHMS90370  PMID: 19362853

Abstract

Tandem mass spectra (MS/MS) produced using electron transfer dissociation (ETD) differ from those derived from collision-activated dissociation (CAD) in several important ways. Foremost, the predominant fragment ion series are different: c- and z-type ions are favored in ETD spectra, while b- and y-type ions comprise the bulk of CAD spectra. Additionally, ETD spectra possess specific neutral losses and charge-reduced precursors. Most database search algorithms were designed to analyze CAD spectra and have only recently been adapted to accommodate c- and z-ions; however, inclusion of these additional spectral features can hinder identification, leading to lower confidence scores and decreased sensitivity. Therefore, it is important to pre-process spectral data prior to submission to a database search to remove those features which cause complications. Here, we demonstrate the effect of removing these features on the number of identifications at a 1% false discovery rate (FDR) using the open mass spectrometry search algorithm (OMSSA). When analyzing two biological replicates of a yeast protein extract in three total analyses, the number of identifications with a ~1% FDR increased from ~4611 to ~5931 upon spectral pre-processing – an increase of ~28.6%. We outline the most effective pre-processing methods, and provide free software containing these algorithms.

Introduction

Collision-activated dissociation (CAD) is the most commonly employed method of peptide fragmentation. During CAD, peptide ions are collided with inert gas atoms, imparting kinetic energy, and causing preferential cleavage of the weakest, most energetically favorable bonds. For unmodified peptides, the weakest bonds tend to be the amide linkages between amino acid residues. Cleavage of these bonds results in the formation of b- and y-type fragment ions. A fundamentally distinct, electron-based fragmentation method was discovered ten years ago by Zubarev et al.1 In electron capture dissociation (ECD), the capture of thermal electrons by peptide cations also induces cleavage of backbone bonds. However, in contrast to CAD, cleavage of the N-Cα bond is favored in ECD, thus forming c- and z-type fragment ions. Recently, an ion/ion analog to ECD was discovered. In electron transfer dissociation (ETD), the ability to produce ECD-like spectra was achieved within an ion trap mass spectrometer through ion/ion reactions with radical anion molecules.2, 3 Subsequent studies have suggested that ETD is better suited than CAD for analysis of certain peptides, particularly those that are large, highly basic, and/or contain post-translational modifications,4, 5 and ETD is considered a complementary dissociation method.6, 7 With the advent of commercially available bench-top ion trap instruments capable of electron-based dissociation, the inclusion of ETD in large-scale proteomics experiments has become increasingly widespread. Due to this increased use, several of the more common database search algorithms have been adapted to allow for searching of c- and z-ions. However, these algorithms were originally designed for searching CAD-generated data, typically using trypsin-digested proteins as the standard for the construction of the searching algorithm. These algorithms therefore do not take into account several of the major spectral differences between CAD- and ETD-produced spectra.

In addition to producing different fragment ion types, ETD spectra differ from those generated by CAD in several important ways. As well as containing significantly more unreacted precursor than is typical for a CAD spectrum generated via resonant excitation, charge-reduced species – those precursor molecules which received an electron but did not dissociate (ETnoD) – often constitute a significant fraction of the total ion current of the spectrum. Another major difference between ETD and CAD spectra is the types of neutral losses observed as a result of the fragmentation. While CAD spectra frequently contain fragment ions with neutral losses of ammonia or water molecules, these are not often seen in those produced from ETD. Recently, several groups have reported on those neutral losses which are specific to ECD spectra.8–10 While it has been suggested that neutral losses produced from the electron-based methods may be used to increase the confidence of peptide identifications, this has been shown only within the context of a spectral database, and has not yet been extended to database search algorithms.11, 12 Therefore, while many algorithms account for several of the well-documented neutral losses observed in CAD, they do not currently take advantage of the extra information available within ETD-generated spectra. On the contrary, those database search algorithms which do allow for searching of ETD or ECD spectra typically produce lower scoring identifications when these species are present within the spectrum, ultimately leading to fewer identifications or a lower confidence threshold.

Here, we demonstrate the effect of removing these features on both the confidence and number of identifications using a commonly employed database search algorithm, the open mass spectrometry search algorithm (OMSSA).13 We outline the most effective pre-processing methods, and provide free software containing these algorithms to aid in automated workflows.

Materials and Methods

Cell culture and protein harvesting

Wildtype Saccharomyces cerevisiae were grown and proteins harvested as previously described.14 Briefly, yeast were grown in rich media to an OD600 of 0.97, spun down, washed with sterile water, and pelleted via centrifugation at 3,000 rpm for 5 minutes. The cell pellet was added to a volume of lysis buffer containing protease inhibitors. The sample was French-pressed 3 times and centrifuged for 15 min at 14,000 rpm at 4 °C. This procedure was performed twice, using biological replicates.

Digestion

To reduce and alkylate cysteine residues, ~4.2 mg of protein (as determined by BCA assay) was incubated in 2.5 mM DTT for 25 min. at 60 °C followed by incubation in 7 mM iodoacetamide in the dark at room temperature for 30 minutes. Alkylation was capped by incubation in 2.5 mM DTT for 15 min. at room temperature. The samples were digested with endoproteinase Lysine-C, desalted, and the eluent lyophilized following the established protocol.14

Fractionation

SCX fractionation of all samples was performed as previously described by Villen et al.15 Peptides were redissolved in 500 µl SCX buffer A (5 mM KH2PO4, pH 2.6/30% acetonitrile) and separated using a 9.4 × 200-mm polysulfoethyl aspartamide column (5-µm particle size; 200-Å pore; PolyLC, Columbia, MD) with a Surveyor HPLC pump and PDA detector (Thermo Fisher Scientific, San Jose, CA). Buffer A was flowed over the column for 3 minutes, and peptides were then separated with a linear gradient from 0% to 21% buffer B (5 mM KH2PO4, pH 2.6/30% acetonitrile/350 mM KCl) over 35 min. The column was equilibrated with multiple washes of 100% buffer B and 100% buffer C (20 mM Tris, pH 8.5). Fractions were collected over three-minute intervals, yielding 12 total fractions. These fractions were lyophilized, resuspended in 0.5% TFA, and desalted using 100-mg tC18 SepPak cartridges (Waters, Milford, MA). The resulting eluent was lyophilized and stored at −20 °C until further use.

Chromatography

SCX fractions were loaded onto a pre-column via a Waters nanoAcquity auto-sampler (Waters, Milford, MA). Columns were prepared in-house as described, with the exception that the analytical column was packed to 12 cm.16 Peptides were chromatographically separated using a vented column setup on a Waters nanoAcquity and employing a 40 min. linear gradient of 1.4% to 49% acetonitrile in 0.2% formic acid.

Mass Spectrometry

Mass spectrometry experiments were performed in an online manner, in which peptides eluting from the above-described nHPLC method were sampled via an integrated electrospray emitter for peptide ionization.16 Analysis was carried out using a hybrid LTQ-Orbitrap which had been modified in-house to perform ETD.17, 18 MS analysis was executed in the Orbitrap at 30,000 resolving power, followed by six data-dependent MS/MS events with product ion analysis performed in the QLT. The list of included precursor ion targets was determined by intensity, followed by dynamic exclusion (30 seconds or a peak list of 500), and charge state inclusion (ions with two or more charges). Each SCX fraction was analyzed using ETD, employing a reaction time of 63 ms, precursor cation target value of 10,000 ions, and an anion target value of 250,000 reagent ions. Duplicate analysis was performed on the first biological replicate to ensure reproducibility of the data, while the other biological replicate was analyzed a single time (3 total analyses).

Spectral Processing

Using software written in-house, Thermo Scientific .raw data files were directly converted into .dta files and batched into text files containing 10,000 .dta’s each. The same software was written to selectively remove unwanted spectral features during the creation of the .dta files. It supported three different options: (1) removal of the precursor: all peaks within ±3.1 m/z of the precursor m/z were removed; (2) removal of those charge-reduced precursor molecules which were the result of ETnoD: all peaks within ±3.1 m/z of the charge-reduced precursor m/z were removed for each charge state from +1 to the precursor charge − 1; and (3) removal of those peaks which were the result of neutral losses from charge-reduced precursors: all peaks from 60 Da (scaled to charge) below the charge-reduced precursor m/z up to the charge-reduced precursor m/z were removed for each charge state from +1 to the precursor charge − 1. For comparison of the impact that removal of different spectral features has on the probability of identification, spectra were processed in four ways: (1) performing all spectral processing steps, (2) performing the first two processing steps, but not removing neutral losses, (3) removing only the precursor, and (4) not processing the spectra in any way, thus providing a baseline for comparison.
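The three removal options described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house software: the function names, the (m/z, intensity) peak representation, and the inclusive window boundaries are assumptions made for the example. The charge-reduced precursor m/z is approximated as the precursor m/z multiplied by the precursor charge and divided by the reduced charge, which ignores the very small mass of the transferred electrons.

```python
from typing import List, Tuple

Peak = Tuple[float, float]      # (m/z, intensity); hypothetical representation
Window = Tuple[float, float]    # inclusive (low m/z, high m/z)


def charge_reduced_mz(precursor_mz: float, precursor_charge: int,
                      reduced_charge: int) -> float:
    """Approximate m/z of an ETnoD species: same mass, fewer charges.

    Assumption: the mass of the transferred electron(s) is neglected.
    """
    return precursor_mz * precursor_charge / reduced_charge


def removal_windows(precursor_mz: float, precursor_charge: int,
                    half_width: float = 3.1, loss_da: float = 60.0,
                    remove_precursor: bool = True,
                    remove_etnod: bool = True,
                    remove_neutral_loss: bool = True) -> List[Window]:
    """Build the m/z windows to delete for one MS/MS spectrum."""
    windows: List[Window] = []
    if remove_precursor:
        # Option 1: +/- 3.1 m/z around the unreacted precursor.
        windows.append((precursor_mz - half_width, precursor_mz + half_width))
    # Charge states from +1 up to (precursor charge - 1).
    for k in range(1, precursor_charge):
        mz_k = charge_reduced_mz(precursor_mz, precursor_charge, k)
        if remove_etnod:
            # Option 2: +/- 3.1 m/z around each charge-reduced precursor.
            windows.append((mz_k - half_width, mz_k + half_width))
        if remove_neutral_loss:
            # Option 3: 60 Da, scaled to charge, below each charge-reduced precursor.
            windows.append((mz_k - loss_da / k, mz_k))
    return windows


def clean_spectrum(peaks: List[Peak], windows: List[Window]) -> List[Peak]:
    """Drop every peak whose m/z falls inside any removal window."""
    return [(mz, inten) for mz, inten in peaks
            if not any(lo <= mz <= hi for lo, hi in windows)]
```

For a triply charged precursor at m/z 500.0, for example, this sketch places the charge-reduced species at m/z 750.0 (2+) and 1500.0 (1+), and the neutral-loss windows at 720.0 to 750.0 and 1440.0 to 1500.0. The four processing levels compared in the text correspond to switching the three boolean flags on or off.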

Database searching

MS/MS spectra were searched against a concatenated target-decoy version of the Saccharomyces Genome Database (database reversed using a modified version of the open-source software available from the Gygi lab at: http://gygi.med.harvard.edu/gygilab/index.php?html=software) using OMSSA (Open Mass Spectrometry Search Algorithm; freely available from NCBI). The search algorithm parameters were set to consider the static modification of +57 Da on cysteine residues (carbamidomethylation), a variable modification of +16 Da on methionine residues (oxidation), a precursor mass tolerance of 4.0 Da, and a fragment ion mass tolerance of 0.5 Da. Search results were trimmed to a one percent false discovery rate using a program written in-house based on identification expectation value and precursor mass error.19
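As a rough illustration of target-decoy trimming, the sketch below ranks peptide-spectrum matches by expectation value and keeps the loosest cutoff at which the estimated FDR stays at or below 1%. It is a simplification under stated assumptions: the in-house program also used precursor mass error, which is omitted here; the FDR is estimated as decoys divided by targets, which is only one of several estimators in common use; and the function name and (e-value, is_decoy) tuples are hypothetical.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (expectation value, is_decoy); hypothetical representation


def trim_to_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs accepted at the loosest e-value cutoff with FDR <= max_fdr.

    Assumption: FDR is estimated as (#decoys / #targets) among accepted PSMs.
    """
    ranked = sorted(psms, key=lambda p: p[0])  # best (lowest) e-value first
    targets = 0
    decoys = 0
    accepted = 0  # number of ranked PSMs inside the best cutoff found so far
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            accepted = i
    return [p for p in ranked[:accepted] if not p[1]]
```

Because the cutoff depends on the ratio of decoys to targets, adding more confident target matches to a data set lets the same decoy count be tolerated at a looser e-value threshold.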

Results and Discussion

A view of how spectral processing affects the data submitted to a protein database search is provided in Figure 1A–D. Here, we display the step-wise removal of specified m/z regions from the original spectrum (Figure 1A), with the complexity of the spectrum clearly decreasing as more m/z regions are removed (Figure 1B–D). As is evident when comparing Panels A and B of Figure 1, the precursor m/z is generally the greatest contributor to the total ion current within the spectrum. This, of course, is a function of ion/ion reaction duration. Note that the precursor can be entirely removed by extending the ETD reaction duration; however, doing so also results in the destruction of the newly formed product ions, as they are likewise subject to reaction with the ETD reagent. For a triply protonated precursor, the optimal reaction duration is one that removes ~2/3 of the initial amount of precursor cation; thus, a substantial amount of unreacted precursor remains. Panel C of Figure 1 displays the spectrum that results from removal of the precursor m/z as well as the removal of all the charge-reduced (ETnoD) precursors. Finally, Panel D illustrates the fully processed spectrum, where neutral losses from the charge-reduced precursors have been removed. This MS/MS event provides an interesting example of how processing affects the possibility for identification, as only the fully processed spectrum provided an identification when searched using the OMSSA database search algorithm and trimming to a 1% FDR. Though OMSSA provided high-quality hits for all versions of the spectrum, regardless of processing, the e-value for the identified peptide decreased (indicating a more confident match) with more extensive processing. In this case, the e-value decreased with each successive processing step, from 1.17 × 10⁻⁸ for no pre-submission processing to 6.38 × 10⁻¹¹ for the fully processed .dta file.
In each case, OMSSA was able to label 21 of the 28 possible fragments for this peptide, relating to 13 of the 14 possible bonds being cleaved. The only bond which was not dissociated was the N-terminal arginine to proline bond, whose cleavage is not observed in ETD due to the ring structure of proline. With each successive processing step, however, non-matching peaks are removed, which in turn decreases the chance that the match is a random occurrence. Interestingly, the more high-confidence identifications there are in a data set, the more low-confidence identifications can be included at a 1% false discovery rate. So, as a result of vigorous spectral processing, the standards for inclusion of an identification at the 1% FDR level are loosened – it is this trend which allowed this particular match to be included when it was subjected to all forms of cleaning.

Figure 1.

The effect of removing interfering peaks from MS/MS scan #3118 from SCX fraction 6, biological replicate #1, first analysis. Panel A shows the unprocessed scan, while panels B–D show the step-wise removal of the precursor, charge-reduced precursors, and neutral losses, respectively. Although OMSSA identified all four versions of the scan, the confidence of the identification increased as the interfering peaks were removed, and only the completely processed version remained after trimming to a 1% FDR.

Referring back to Figure 1, one can see the effect of removing the relatively high-intensity m/z peaks that result from unreacted precursor and ETnoD products on the identification of a peptide. The precursor is the highest intensity peak within the spectrum, with the doubly charged charge-reduced precursor having the second most intense peak. As OMSSA removes “noise” peaks dynamically within a range relative to the most intense peak, removal of these two most intense peaks drops the initial value for the maximum intensity within the spectrum by several-fold. Subsequent removal of the other interfering ions further reduces the theoretical maximum intensity within the spectrum, with the removal of each successively intense ion dropping the maximum by a relatively large percentage. It is therefore probable that in spectra like that illustrated in Figure 1, c- and z-ions are deleted from the spectral matching step in OMSSA because of their low intensity relative to these precursor, ETnoD, and neutral loss peaks. This is especially problematic for ETD-generated spectra, as production of fragment ions is typically less efficient than in CAD (i.e., charge neutralization), thus causing these ions to have a small fraction of the total intensity of the precursor and charge-reduced precursors. The observations made with this particular example will not necessarily hold true for every spectrum generated using ETD – precursor characteristics such as charge state can greatly affect the partitioning between ETD and ETnoD.
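The reasoning above can be made concrete with a toy relative-intensity filter. This is only a sketch of the general idea of thresholding against the most intense peak; it is not OMSSA's actual noise-removal routine, and the 2.5% cutoff fraction, the function name, and the peak values are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def relative_intensity_filter(peaks: List[Peak],
                              cutoff_fraction: float = 0.025) -> List[Peak]:
    """Keep peaks whose intensity is at least cutoff_fraction of the base peak.

    Assumption: a single fixed fraction stands in for the search engine's
    dynamic noise threshold.
    """
    if not peaks:
        return []
    base_intensity = max(inten for _, inten in peaks)
    threshold = cutoff_fraction * base_intensity
    return [(mz, inten) for mz, inten in peaks if inten >= threshold]


# Toy spectrum: unreacted precursor, charge-reduced precursor, two fragment ions.
raw = [(300.2, 120.0), (500.1, 10000.0), (750.3, 4000.0), (900.5, 200.0)]
# The same spectrum after the precursor and charge-reduced precursor are removed.
processed = [(300.2, 120.0), (900.5, 200.0)]

kept_raw = relative_intensity_filter(raw)              # fragments fall below threshold
kept_processed = relative_intensity_filter(processed)  # fragments are retained
```

In the toy spectrum, the two fragment ions fall below the threshold while the intense precursor-derived peaks are present, and survive once those peaks have been removed, which mirrors the behavior proposed in the paragraph above.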

To assess the impact of spectral processing on a large amount of ETD data, we analyzed spectra produced from a large-scale experiment.14 Yeast protein extracts (two replicates) were digested and fractionated by strong cation exchange into twelve pools. An nHPLC-MS/MS experiment was performed on each of the fractions, twice for the first replicate and once for the second, for three sets of 12 analyses. Figure 2 presents the overall effects of removing varying m/z regions from these data sets prior to submission to a protein database search. All identifications were determined to be within a 1% FDR. Due to its contribution to the total ion current of the spectrum, removal of the precursor showed the greatest impact on the quality of the matches for a given spectrum (as evident by the ~15% average increase in identifications). Subsequent removal of other interfering species showed gradual improvement of the total number of unique (sequence exclusive) identifications, with the combined removal of precursor, ETnoD, and neutral loss m/z ranges producing the greatest number of matches (5931). The collective effect of removing all of the interfering species listed above was a substantial improvement in the total number of identifications, an increase of ~28.6% (from ~4611 to ~5931).

Figure 2.

The effect of spectral pre-processing on total ID’s from a proteomics experiment. Panel A provides the average number of unique identifications (sequence exclusive) within a 1% FDR for the three analyses based upon the amount of spectral processing. Panel B illustrates the percent decrease for these identifications as compared with performing the entire processing routine.

Conclusions

We have illustrated the importance of removing interfering peaks from ETD-generated spectra prior to submission to a database search using the OMSSA algorithm. Note that the OMSSA algorithm has recently been updated to remove charge-reduced precursors and some of their associated neutral losses. This command was not employed during the OMSSA searches, and the algorithm presented here varies from that employed in OMSSA. Whereas OMSSA assumes a precursor charge of four and scales the windows removed by mass tolerance, our algorithm determines the windows based upon the charge contained within the .dta file (this information is obtained through collection of the MS data with the high resolution orbitrap mass analyzer). We anticipate the conclusions reached here are applicable for all peptide search algorithms, although the impact of spectral pre-processing could be less significant for algorithms designed specifically for ETD. Removal of these peaks led to a marked increase in both total identifications and also identifications trimmed to a 1% false discovery rate. The step-wise removal of these categories of peaks was characterized and the relative impact of each was analyzed. The removal of the precursor from the spectra produced the greatest increase in total identifications, while subsequent removal of the charge-reduced precursor(s) and neutral loss m/z regions also provided gains in the number of identifications.

The software used to generate .dta’s and remove the above-specified spectral features is freely available for download: http://www.chem.wisc.edu/~coon/index.html.

As this work focuses solely on the impact of removal of certain extraneous ETD-generated features on results produced from the OMSSA search algorithm, future analyses will be performed using multiple, commonly employed search algorithms, and will also investigate the effect of feature intensity on resulting IDs.

Acknowledgements

We dedicate this manuscript to Alexander Makarov and to his wonderful achievement – the development of the orbitrap mass analyzer. Alexander’s innovation has transformed not only our own research, but the entire field of proteomics. We are honored to participate in this issue and to celebrate his considerable contribution. Congratulations Alexander!

DMG and GCM are grateful for support from NIH pre-doctoral fellowships (Biotechnology Training Program, NIH 5T32GM08349). The University of Wisconsin, the Beckman Foundation, Eli Lilly, and the National Institutes of Health (1R01GM080148) provided financial support for this work. We thank Danielle Swaney for generating the large-scale yeast dataset. Finally, we thank Don Hunt, Jeff Shabanowitz, and Dina Bai for the gift of similar software that was the inspiration for our work and for this comparative study.

References

  • 1. Zubarev RA, Kelleher NL, McLafferty FW. Electron capture dissociation of multiply charged protein cations. A nonergodic process. Journal of the American Chemical Society. 1998;120(13):3265–3266.
  • 2. Coon JJ, Ueberheide B, Syka JE, Dryhurst DD, Ausio J, Shabanowitz J, Hunt DF. Protein identification using sequential ion/ion reactions and tandem mass spectrometry. Proc Natl Acad Sci U S A. 2005;102(27):9463–9468. doi: 10.1073/pnas.0503189102.
  • 3. Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci U S A. 2004;101(26):9528–9533. doi: 10.1073/pnas.0402700101.
  • 4. Molina H, Horn DM, Tang N, Mathivanan S, Pandey A. Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(7):2199–2204. doi: 10.1073/pnas.0611217104.
  • 5. Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JEP, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF. Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(7):2193–2198. doi: 10.1073/pnas.0607084104.
  • 6. Good DM, Wirtala M, McAlister GC, Coon JJ. Performance characteristics of electron transfer dissociation mass spectrometry. Molecular & Cellular Proteomics. 2007;6(11):1942–1951. doi: 10.1074/mcp.M700073-MCP200.
  • 7. Huang YY, Triscari JM, Tseng GC, Pasa-Tolic L, Lipton MS, Smith RD, Wysocki VH. Statistical characterization of the charge state and residue dependence of low-energy CID peptide dissociation patterns. Analytical Chemistry. 2005;77(18):5800–5813. doi: 10.1021/ac0480949.
  • 8. Cooper HJ, Hakansson K, Marshall AG, Hudgins RR, Haselmann KF, Kjeldsen F, Budnik BA, Polfer NC, Zubarev RA. Letter: the diagnostic value of amino acid side-chain losses in electron capture dissociation of polypeptides. Comment on: "Can the (M(.)-X) region in electron capture dissociation provide reliable information on amino acid composition of polypeptides?", Eur. J. Mass Spectrom. 8, 461–469 (2002). Eur. J. Mass Spectrom. 2003;9:221–222. doi: 10.1255/ejms.555.
  • 9. Cooper HJ, Hudgins RR, Hakansson K, Marshall AG. Characterization of amino acid side chain losses in electron capture dissociation. J Am Soc Mass Spectrom. 2002;13(3):241–249. doi: 10.1016/S1044-0305(01)00357-9.
  • 10. Fung YME, Chan TWD. Experimental and theoretical investigations of the loss of amino acid side chains in electron capture dissociation of model peptides. Journal of the American Society for Mass Spectrometry. 2005;16(9):1523–1535. doi: 10.1016/j.jasms.2005.05.001.
  • 11. Falth M, Savitski MM, Nielsen ML, Kjeldsen F, Andren PE, Zubarev RA. Analytical utility of small neutral losses from reduced species in electron capture dissociation studied using SwedECD database. Anal Chem. 2008;80(21):8089–8094. doi: 10.1021/ac800944u.
  • 12. Savitski MM, Nielsen ML, Zubarev RA. Side-chain losses in electron capture dissociation to improve peptide identification. Anal Chem. 2007;79(6):2296–2302. doi: 10.1021/ac0619332.
  • 13. Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang XY, Shi WY, Bryant SH. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3(5):958–964. doi: 10.1021/pr0499491.
  • 14. Swaney DL, McAlister GC, Coon JJ. Decision tree-driven tandem mass spectrometry for shotgun proteomics. Nature Methods. 2008;5(11):959–964. doi: 10.1038/nmeth.1260.
  • 15. Villen J, Beausoleil SA, Gerber SA, Gygi SP. Large-scale phosphorylation analysis of mouse liver. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(5):1488–1493. doi: 10.1073/pnas.0609836104.
  • 16. Martin SE, Shabanowitz J, Hunt DF, Marto JA. Subfemtomole MS and MS/MS peptide sequence analysis using nano-HPLC micro-ESI Fourier transform ion cyclotron resonance mass spectrometry. Analytical Chemistry. 2000;72(18):4266–4274. doi: 10.1021/ac000497v.
  • 17. McAlister GC, Berggren WT, Griep-Raming J, Horning S, Makarov A, Phanstiel D, Stafford G, Swaney DL, Syka JEP, Zabrouskov V, Coon JJ. A proteomics grade electron transfer dissociation-enabled hybrid linear ion trap-orbitrap mass spectrometer. Journal of Proteome Research. 2008;7(8):3127–3136. doi: 10.1021/pr800264t.
  • 18. McAlister GC, Phanstiel D, Good DM, Berggren WT, Coon JJ. Implementation of electron-transfer dissociation on a hybrid linear ion trap-orbitrap mass spectrometer. Analytical Chemistry. 2007;79(10):3525–3534. doi: 10.1021/ac070020k.
  • 19. Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods. 2007;4(3):207–214. doi: 10.1038/nmeth1019.
