Preprocessing Significantly Improves the Peptide/Protein Identification Sensitivity of High-resolution Isobarically Labeled Tandem Mass Spectrometry Data

Quanhu Sheng; Rongxia Li; Jie Dai; Qingrun Li; Zhiduan Su; Yan Guo; Chen Li; Yu Shyr; Rong Zeng

doi:10.1074/mcp.O114.041376

2014 Nov 30;14(2):405–417. doi: 10.1074/mcp.O114.041376

PMCID: PMC4350035 PMID: 25435543

Abstract

Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. Each high-resolution tandem mass spectrum (MS/MS) from an isobarically labeled sample can be used not only to quantify the peptide across samples through its reporter ions, but also to identify the peptide from which it is derived. Because the ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that MS/MS spectra contain many high-frequency, high-abundance isobaric related ions, and that removing these ions, combined with deisotoping and deconvolution during MS/MS preprocessing, significantly improves the peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994.

Mass spectrometry-based proteomics has been widely applied to investigate protein mixtures derived from tissue, cell lysates, or body fluids (1, 2). Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most popular strategy for analyzing protein/peptide mixtures in shotgun proteomics (3). Large-scale protein/peptide mixtures are separated by liquid chromatography followed by online detection by tandem mass spectrometry. The capabilities of proteomics rely greatly on the performance of the mass spectrometer. With the improvement of MS technology, proteomics has benefited significantly from high resolution and excellent mass accuracy (4). In recent years, based on the higher efficiency of higher energy collision dissociation (HCD), a new “high–high” strategy (high-resolution MS as well as high-resolution MS/MS (tandem MS)) has been applied instead of the “high–low” strategy (high-resolution MS, e.g. in the Orbitrap, and low-resolution MS/MS, e.g. in the ion trap) to obtain high quality MS/MS data as well as full MS data in shotgun proteomics. Both full MS scans and MS/MS scans can be performed, and the whole cycle time of MS detection is compatible with the chromatographic time scale (5).

High-resolution measurement is one of the most important features in mass spectrometric applications. In the high–high strategy, high-resolution and accurate spectra are obtained in MS/MS scans as well as full MS scans, which makes isotopic peaks distinguishable from one another and thus enables precise charge states and monoisotopic masses to be calculated easily. During an LC-MS/MS experiment, a multiply charged precursor ion (peptide) is usually isolated and fragmented, and fragment ions in multiple charge states are then generated and collected. After the peak lists have been extracted from the original tandem mass spectra, the commonly used search engines (e.g. Mascot (6), Sequest (7)) cannot distinguish isotopic peaks or recognize charge states, so every product ion is considered under all charge state hypotheses during the database search for protein identification. These multiple charge states of fragment ions and their isotopic cluster peaks can be incorrectly assigned by the search engine, which can cause false peptide identifications. To overcome this issue, the high-resolution MS/MS spectra must be preprocessed before they are submitted for identification. Two major preprocessing steps are usually applied to high-resolution MS/MS data: deisotoping and deconvolution (8, 9). Deisotoping removes all peaks of an isotopic cluster except the monoisotopic peak. Deconvolution translates multiply charged ions to singly charged ions and accumulates the intensity of each fragment ion by summing the intensities from all of its charge states. After these two preprocessing steps, the resulting spectra are simpler and cleaner and allow more precise database searching and more accurate bioinformatics analysis.

With the capacity to analyze multiple samples simultaneously, stable isotope labeling approaches have been widely used in quantitative proteomics. Stable isotope labeling approaches are categorized as metabolic labeling (SILAC, stable isotope labeling by amino acids in cell culture) and chemical labeling (10, 11). Peptides labeled by the SILAC approach are quantified by precursor ions in full MS spectra, whereas peptides that have been isobarically labeled by chemical means are quantified by reporter ions in MS/MS spectra. There are two similar isobaric chemical labeling methods: (1) isobaric tag for relative and absolute quantification (iTRAQ), and (2) tandem mass tag (TMT) (12, 13). These reagents contain an amino-reactive group that specifically reacts with N-terminal amino groups and epsilon-amino groups of lysine residues to label digested peptides in a typical shotgun proteomics experiment. Four multiplexing formats of isobaric tags are available: TMT two-plex, iTRAQ four-plex, TMT six-plex, and iTRAQ eight-plex (12–16). The number before “plex” denotes the number of samples that can be analyzed simultaneously in one mass spectrometry run. Peptides labeled with different isotopic variants of the tag have identical or similar masses and appear as a single peak in full scans. This single peak may be selected for subsequent MS/MS analysis. In an MS/MS scan, the masses of the reporter ions (114 to 117 for iTRAQ four-plex, 113 to 121 for iTRAQ eight-plex, and 126 to 131 for TMT six-plex upon CID or HCD activation) are associated with the corresponding samples, and their intensities represent the relative abundances of the labeled peptides. Meanwhile, the other ions in the MS/MS spectra can be used for peptide identification. Because of this multiplexing capability, isobaric labeling methods combined with bottom-up proteomics have been widely applied for accurate quantification of proteins on a global scale (14, 17–19). Although mostly associated with peptide labeling, these isobaric labeling methods have also been applied at the protein level (20–23).

For the proteomic analysis of isobarically labeled peptides/proteins with the “high–high” MS strategy, the common consensus is that accurate reporter ions contribute to more accurate quantification. However, there is no evidence showing how the ions related to isobaric labeling affect peptide/protein identification or what preprocessing steps should be taken for high-resolution isobarically labeled MS/MS spectra. To demonstrate the effectiveness and importance of preprocessing, we examined how combinations of preprocessing steps improved the peptide/protein identification sensitivity in database searching. Several combinations of preprocessing steps were applied in high-throughput data analysis, including deisotoping to keep only monoisotopic peaks, deconvolution of ions with multiple charge states, and preservation of the top 10 peaks in every 100 Dalton mass range. After systematic analysis of high-resolution isobarically labeled spectra, we further processed the spectra by removing interfering ions that were not related to the peptide. Our results suggest that preprocessing isobarically labeled high-resolution tandem mass spectra significantly improves the peptide/protein identification sensitivity.

EXPERIMENTAL PROCEDURES

Sample Preparation

The Goto-Kakizaki (GK) rat liver tissue was mixed with SDT-lysis buffer (2% SDS, 0.1 M DTT, and 0.1 M Tris-HCl, pH = 7.6) and then heated for 5 min at 100 °C. After that, the tissue layers were cooled to room temperature, sonicated for 60 s at 100 W, and then centrifuged at 16,000 × g for 30 min at 20 °C to remove cell debris. The protein concentration was determined by measuring tryptophan fluorescence as described (24). Briefly, 1 μl of sample or tryptophan standard (100 ng/μl) was added to 3 ml of 8 M urea buffer (8 M urea and 20 mM Tris-HCl, pH = 7.6). Fluorescence was excited at 295 nm and measured at 350 nm. The slits were set at 10 nm.

Six hundred micrograms of liver tissue from GK rat was digested by the FASP procedure as described (25) with small modifications. Each sample was transferred to a 10k filter (Pall Corporation, Port Washington, NY) and centrifuged at 10,000 × g for 20 min at 20 °C. Then 200 μl of UA buffer (8 M urea and 0.1 M Tris-HCl, pH = 8.5) was added and the sample was centrifuged at 10,000 × g for 20 min again. This step was repeated once. Then, the concentrate was mixed with 100 μl of 50 mM IAA in UA buffer and incubated for an additional 40 min at room temperature in darkness. After that, IAA was removed by centrifugation at 10,000 × g for 20 min. Following dilution with 200 μl of UA buffer and centrifugation twice, 200 μl of 200 mM triethylammonium bicarbonate (TEAB) buffer (pH 8.5) was added and the sample was centrifuged at 10,000 × g for 20 min. This step was repeated four times. Finally, 100 μl of 50 mM TEAB buffer (pH 8.5) and trypsin (1:50, enzyme to protein) were added to the filter, and after 4 h, another 50 μg of trypsin was added. The samples were digested for 20 h at 37 °C and peptides were collected by centrifugation at 16,000 × g. To increase the yield of peptides, the filter was washed twice with 500 μl of 0.5 M TEAB buffer (pH 8.5). The peptide solutions were dried in a vacuum concentrator.

The trypsin digestion of 100 μg protein from each sample was processed as described elsewhere. iTRAQ labeling was done following the manufacturer's instructions (AB SCIEX, Foster City, CA). Briefly, for each four- or eight-plex experiment, 100 μg of dried peptide mixture powder from each digested sample was reconstituted with 30 μl of 0.5 mM TEAB buffer (pH 8.5). Each peptide solution was labeled at room temperature for 2 h with one iTRAQ reagent vial (four-plex mass tag 114, 115, 116, 117 or eight-plex mass tag 113, 114, 115, 116, 117, 118, 119, 121) previously reconstituted with 70 μl of anhydrous acetonitrile (ACN). After 2 h, 100 μl of ddH₂O was added to each tube to quench the iTRAQ reaction, and the tubes were incubated at room temperature for 30 min. The contents of all iTRAQ reagent-labeled sample tubes were combined into one tube for the four-plex and eight-plex experiments, respectively. Then, the labeled samples were dried down by evaporation in a SpeedVac to obtain a brown pellet. 100 μl of water was added to the tube and the sample was dried completely. Prior to MS analysis, samples were desalted on an Empore C18 47 mm Disk (3M). Just prior to nano-LC, the fractions were resuspended in 20 μl of H₂O with 0.1% (v/v) TFA.

LC-MS/MS Analysis

The reverse phase-high performance liquid chromatography (RP-HPLC) separation was achieved on an UltiMate 3000 RSLC nanoLC System (Dionex, now Thermo Fisher Scientific) equipped with a self-packed tip column (75 μm × 240 mm; C18, 1.9 μm) using a 180 min gradient at a flow rate of 150 nl/min. An LTQ-Orbitrap Velos instrument (Thermo Fisher Scientific) was operated in data-dependent mode. MS full scans were acquired over the range m/z 300–2000. The mass spectrometer was set so that each full MS scan was followed by MS/MS of the ten most intense ions with charge ≥ +2, with the following Dynamic Exclusion™ settings: repeat count, 1; repeat duration, 30 s; exclusion duration, 180 s. The normalized collision energy for MS2 was 45.0%. Full MS scans and MS/MS scans were acquired at resolutions of 30,000 in profile mode and 7500 in centroid mode, respectively, with the lock mass option enabled for the 445.120025 ion. Data were acquired using Xcalibur software.

b/y Free Windows

The b/y free windows of a mass spectrum are two mass windows in which no B ion or Y ion can occur. Assuming that the mass of the isobaric tag is M, that trypsin is used as the protease, and that the isobaric tag is attached to both the peptide N terminus and lysine (K), the b/y free windows of a spectrum with singly charged precursor mass MH⁺ can be calculated as follows. Because only fully tryptic peptides are considered in the data analysis, the last (C-terminal) amino acid of the peptide is either arginine (R) with mass 156 or lysine with mass 128. Given that glycine (G) is the smallest amino acid with mass 57, the minimum and maximum masses of the B and Y ions can be calculated by formulas (1–4):

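$$\text{minimum}(B) = M + G + H \qquad (1)$$

$$\text{minimum}(Y) = R + H_2O + H \qquad (2)$$

$$\text{maximum}(B) = MH^+ - \text{minimum}(Y) + H = MH^+ - R - H_2O \qquad (3)$$

$$\text{maximum}(Y) = MH^+ - \text{minimum}(B) + H = MH^+ - M - G \qquad (4)$$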

where H₂O is the mass of water and H is the mass of hydrogen. The b/y free window in the low mass range then extends from 0 to minimum (minimum (B), minimum (Y)), and the b/y free window in the high mass range extends from maximum (maximum (B), maximum (Y)) to infinity.
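As an illustration, the windows can be computed with a short Python sketch. It is not taken from TurboRaw2MGF. It assumes the nominal masses quoted above (G = 57, R = 156, K = 128), H₂O = 18, H = 1, and the relation that complementary B and Y ions sum to MH⁺ + H; the function name `by_free_windows` is hypothetical. Under these assumptions it reproduces the window limits listed in Table I.

```python
# Nominal (integer) masses quoted in the text; real code would use monoisotopic masses.
GLYCINE = 57
ARGININE = 156
LYSINE = 128
WATER = 18
HYDROGEN = 1


def by_free_windows(tag_mass, precursor_mh):
    """Return ((low_start, low_end), (high_start, high_end)) for one spectrum.

    tag_mass     -- mass M of the isobaric tag (144 for iTRAQ4, 304 for iTRAQ8)
    precursor_mh -- singly charged precursor mass MH+
    """
    # Smallest B ion: tagged N terminus plus glycine.
    min_b = tag_mass + GLYCINE + HYDROGEN
    # Smallest Y ion: untagged C-terminal arginine or tagged lysine, whichever is lighter.
    min_y = min(ARGININE, LYSINE + tag_mass) + WATER + HYDROGEN
    # Assumption: complementary B and Y ions sum to MH+ + H, so the largest ion
    # of one series pairs with the smallest ion of the other series.
    max_b = precursor_mh + HYDROGEN - min_y
    max_y = precursor_mh + HYDROGEN - min_b
    low = (0, min(min_b, min_y))
    high = (max(max_b, max_y), float("inf"))
    return low, high


if __name__ == "__main__":
    print(by_free_windows(144, 1000))  # iTRAQ4 tag, MH+ = 1000
    print(by_free_windows(304, 1000))  # iTRAQ8 tag, MH+ = 1000
```

For both tags the low window ends at 175 and the high window starts at MH⁺ − 174, because the arginine-terminated Y1 ion is lighter than any tagged ion.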

Ion Frequency and Abundance Analysis

Only spectra with precursor charges 2, 3, and 4 were used to detect high frequency ions. The ion frequency and ion abundance distributions in each sample were generated by the software “Raw Ion Frequency Statistic Builder,” which is also part of ProteomicsTools. The charge, mass to charge ratio (m/z), and abundance of each ion were extracted from each MS/MS spectrum through Thermo's MS File Reader interface. The abundances of the ions in each MS/MS spectrum were normalized to the range [0, 1]. Ions with relative abundance less than 0.01 were discarded. All remaining ions were deconvoluted to the corresponding singly charged ions by formula (5). Ions without charge information were treated as singly charged.

$$\text{mass}_{1+} = (m/z) \times z - (z - 1) \times H \qquad (5)$$

where H is the mass of hydrogen.
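The normalization, filtering, and deconvolution steps can be sketched in Python as follows. This is a minimal illustration, not the code of “Raw Ion Frequency Statistic Builder”. It assumes that normalization means dividing by the most abundant ion in the spectrum, that the singly charged mass is (m/z) × z − (z − 1) × H, and that H is the proton mass 1.007276; the function names are hypothetical.

```python
PROTON = 1.007276  # assumption: the charge-carrying hydrogen is taken as a proton


def to_singly_charged(mz, charge):
    """Convert an observed m/z at the given charge to the singly charged m/z."""
    # Ions without charge information (0 or None) are treated as singly charged.
    z = charge if charge else 1
    return mz * z - (z - 1) * PROTON


def normalize_and_deconvolute(peaks, min_relative=0.01):
    """peaks: list of (mz, abundance, charge).

    Returns a list of (singly_charged_mz, relative_abundance), dropping ions
    whose relative abundance is below min_relative.
    """
    if not peaks:
        return []
    # Assumption: abundances are scaled by the most abundant ion of the spectrum.
    top = max(abundance for _, abundance, _ in peaks)
    if top <= 0:
        return []
    result = []
    for mz, abundance, charge in peaks:
        relative = abundance / top
        if relative < min_relative:
            continue
        result.append((to_singly_charged(mz, charge), relative))
    return result
```

A doubly charged ion observed at m/z 500.0, for example, maps to 2 × 500.0 − 1.007276 = 998.992724.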

Ions in different deconvoluted spectra whose masses differed by less than 20 parts per million (ppm) were considered identical. The ion frequency and average relative abundance were calculated from all the MS/MS spectra in the sample. Ions with frequency larger than 0.3 and average relative abundance larger than 0.05 were defined as high frequency ions and classified into five categories: “Rep⁺,” “Label⁺,” “Y1,” “b/y free,” and “Unknown.” “Rep⁺” denotes a reporter ion. “Label⁺” denotes an isobaric tag ion containing both the reporter group and the balance group. “Y1” denotes a first Y series ion. Because trypsin was used in the sample preparation, a Y1 ion was produced from either lysine (K) or arginine (R). “b/y free” denotes that the mass of the ion lies in the b/y free windows of that spectrum. All other ions belonged to the “Unknown” category. An ion in one of the first four categories (Rep⁺, Label⁺, Y1, and b/y free) was considered annotated. For each deconvoluted tandem mass spectrum (forward spectrum), a backward spectrum was generated by subtracting the mass of each forward ion from the mass of the precursor. The backward ions were filtered and annotated in the same fashion as the forward ions, except that ions with mass equal to “Label⁺” were marked as “Precursor-Label⁺.” “Precursor-Label⁺” denotes a precursor ion without the isobaric tag. The ions annotated as Rep⁺, Label⁺, and Precursor-Label⁺ are not related to the peptide and can therefore be confidently removed during data preprocessing. The ions annotated as b/y free in the low mass range are very likely not related to the peptide either, although some of them may be multiply charged ions that lack charge information in the spectrum.
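The forward annotation and the backward spectrum can be sketched in Python as below. This is a hedged illustration for iTRAQ4 only, not the authors' implementation: the reference masses are the observed values from Table II, the window limits (175 and MH⁺ − 174) come from Table I, and the order in which the categories are tested is an assumption. All function and constant names are hypothetical.

```python
# Reference masses for iTRAQ4 taken from Table II (observed values).
ITRAQ4_REPORTERS = (114.1107, 115.1078, 116.1111, 117.1144)
ITRAQ4_LABEL = 145.1080
ITRAQ4_Y1 = {"Y1(R)": 175.1190, "Y1(K)": 291.2155}


def within_ppm(observed, expected, ppm=20.0):
    """True when the two masses differ by no more than the ppm tolerance."""
    return abs(observed - expected) <= expected * ppm * 1e-6


def annotate_forward(mass, precursor_mh, low_end=175.0, high_offset=174.0):
    """Assign one forward ion to Rep+, Label+, Y1, b/y free, or Unknown."""
    # Assumption: specific ion types are tested before the b/y free windows.
    if any(within_ppm(mass, reporter) for reporter in ITRAQ4_REPORTERS):
        return "Rep+"
    if within_ppm(mass, ITRAQ4_LABEL):
        return "Label+"
    for name, y1_mass in ITRAQ4_Y1.items():
        if within_ppm(mass, y1_mass):
            return name
    if mass < low_end or mass > precursor_mh - high_offset:
        return "b/y free"
    return "Unknown"


def backward_spectrum(forward_masses, precursor_mh):
    """Backward ions: precursor mass minus the mass of each forward ion."""
    return [precursor_mh - mass for mass in forward_masses]
```

With a precursor MH⁺ of 1000, the ion at 110.0712 falls in the low mass b/y free window, whereas 429.0888 matches none of the rules and stays “Unknown,” in line with Table II.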

Data Preprocessing

The tandem mass spectra were extracted by TurboRaw2MGF (v1.3.4) for database searching. Four fixed criteria were used to filter out low quality spectra: (1) the required precursor mass range was 400 to 5000 Daltons, (2) the minimum ion absolute abundance was 1.0, (3) the minimum ion count of a spectrum was 15, and (4) the minimum total ion absolute abundance of a spectrum was 100. Four processing options were also provided in TurboRaw2MGF: deisotoping to keep monoisotopic peaks, deconvolution of ions with multiple charge states, preservation of the top 10 peaks in every 100 Dalton mass range, and removal of ions that may not be related to the peptide. The spectra that passed the fixed criteria and were processed with a combination of the four options were saved in mascot generic format for database searching.
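The fixed criteria and the top 10 option can be sketched in Python as follows. This is an illustrative sketch rather than the TurboRaw2MGF code. It assumes that low-abundance ions are dropped before the ion count and total abundance are checked, and that the 100 Dalton windows start at 0 and are aligned to multiples of 100; the function names are hypothetical.

```python
from collections import defaultdict


def passes_fixed_criteria(precursor_mass, peaks,
                          mass_range=(400.0, 5000.0), min_abundance=1.0,
                          min_ion_count=15, min_total_abundance=100.0):
    """peaks: list of (mz, abundance). Returns (ok, kept_peaks)."""
    # Criterion 1: precursor mass range.
    if not mass_range[0] <= precursor_mass <= mass_range[1]:
        return False, []
    # Criterion 2: drop ions below the minimum absolute abundance.
    kept = [(mz, ab) for mz, ab in peaks if ab >= min_abundance]
    # Criterion 3: minimum number of ions in the spectrum.
    if len(kept) < min_ion_count:
        return False, []
    # Criterion 4: minimum total absolute abundance.
    if sum(ab for _, ab in kept) < min_total_abundance:
        return False, []
    return True, kept


def top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most abundant peaks in every window-wide m/z bin."""
    bins = defaultdict(list)
    for mz, ab in peaks:
        bins[int(mz // window)].append((mz, ab))
    kept = []
    for members in bins.values():
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:n])
    return sorted(kept)  # return the peaks in ascending m/z order
```

Applying `top_n_per_window` after `passes_fixed_criteria` corresponds to the "Top 10" column of the method tables; the other options would be chained in the same way.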

Database Searching

Five engines were used for database searching: Mascot (v2.2.2) (6), Comet (2014.01 rev. 1) (26), MyriMatch (v2.2.140) (27), OMSSA (v2.1.9) (28), and X! Tandem (2013.09.01.1) (29). All MS/MS spectra were searched against a composite target-decoy rat Uniprot database (Version 20120222), in which each protein sequence was followed by its reversed amino acid sequence. Trypsin was set as the protease. Carbamidomethylation of cysteine (+57.021464) and iTRAQ labeling of the peptide N terminus and lysine were set as fixed modifications. Oxidation of methionine (+15.994915) was set as a variable modification. One missed cleavage site was allowed. The tolerances for peptide and fragment ions were set at 10 ppm and 0.02 Daltons, respectively. SearchGUI (30) was used for MyriMatch and OMSSA searching. BuildSummary (31) was used to generate confident peptide and protein lists with a false discovery rate ≤ 0.01.
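The composite target-decoy database described above can be illustrated with a small Python sketch. The decoy naming prefix and the false discovery rate formula (decoy matches divided by target matches above a score threshold) are assumptions made for illustration; the text does not state which estimator BuildSummary uses, and the function names are hypothetical.

```python
def add_reversed_decoys(proteins, decoy_prefix="REVERSED_"):
    """proteins: list of (accession, sequence).

    Returns a composite list in which each target entry is followed by its
    reversed-sequence decoy. The prefix is a hypothetical naming convention.
    """
    composite = []
    for accession, sequence in proteins:
        composite.append((accession, sequence))
        composite.append((decoy_prefix + accession, sequence[::-1]))
    return composite


def estimate_fdr(psms, score_threshold):
    """psms: list of (score, is_decoy).

    Assumption: FDR is estimated as decoys / targets among the matches that
    score at or above the threshold.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

In practice the score threshold would be raised until the estimate falls to 0.01 or below, matching the cutoff used in this study.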

Software Development

We implemented our preprocessing steps in a user-friendly software package named TurboRaw2MGF (v2.0). The previous version of TurboRaw2MGF was developed for low-resolution tandem mass spectra and was integrated into the package ProtQuantSuite (32). TurboRaw2MGF (v2.0) was developed in the C# programming language and compiled with Microsoft Visual Studio 2012 Professional Edition. The software is fully compatible with Windows-based operating systems with dotNET framework v4.5. TurboRaw2MGF (v2.0) and its source code can be downloaded freely from https://github.com/shengqh/RCPA.Tools/releases/. The manual of TurboRaw2MGF (v2.0) can be viewed at https://github.com/shengqh/RCPA.Tools/wiki/.

Data Availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (33) via the PRIDE partner repository with the data set identifier PXD000994 and DOI 10.6019/PXD000994.

To access the data please visit: http://tinyurl.com/pdbkesj

Username: reviewer06796@ebi.ac.uk

Password: jWjYoiuT

RESULTS

Isobaric Related Mass Range

Table I illustrates some important ion properties in isobaric labeling methods. For iTRAQ4 spectra, the mass of a Label⁺ ion is within the low mass b/y free window, and the mass of a Precursor-Label⁺ ion is also within the high mass b/y free window. The isobaric related mass ranges include both low and high b/y free windows. For iTRAQ8 spectra, the mass of a Label⁺ ion is not within the low mass b/y free window and the mass of a Precursor-Label⁺ ion is also not within the high mass b/y free window. The isobaric related mass ranges not only include both low and high b/y free windows but also include the mass range around Label⁺ ion and Precursor-Label⁺ ion within a specific tolerance, which was 20 ppm in our study.

Table I. Ion characteristics of isobaric labeling methods.

Property	iTRAQ4	iTRAQ8
^aRep⁺	114–117	113–119,121
^bLabel⁺	145	305
Isobaric ion mass	144	304
Minimum B ion	202	362
Minimum Y ion	175	175
Low mass b/y free window	0∼175	0∼175
Maximum B ion	MH⁺-174	MH⁺-174
Maximum Y ion	MH⁺-201	MH⁺-361
High mass b/y free window	MH⁺-174∼INF	MH⁺-174∼INF
^cPrecursor-Label⁺	MH⁺-144	MH⁺-304

^a Rep⁺: reporter ions.

^b Label⁺: isobaric tag ion.

^c Precursor-Label⁺: the precursor ion without the isobaric tag.

Ion Frequency and Abundance

Tables II and III show the high frequency forward ions in iTRAQ4 and iTRAQ8 tandem mass spectra respectively. Almost all high frequency forward ions in iTRAQ4 tandem mass spectra were annotated, except 429.0888. Even with the majority of high frequency ions annotated, there were still more ions left unannotated in the iTRAQ8 tandem mass spectra than in iTRAQ4 tandem mass spectra.

Table II. High frequency ions in iTRAQ4 tandem mass spectra.

Rep+: singly charged reporter ion, Label+: singly charged isobaric tag ion, Y1(R): y1 ion from peptide with C-terminal amino acid R, Y1(K): y1 ion from peptide with C-terminal amino acid K, b/y free: ion in the b/y free windows.

Charge	Ion	Count	Frequency	Mean^a	S.D.^b	Median^c	Annotation
2	116.1111	37123	0.997	0.837	0.274	1	Rep+
	115.1078	37101	0.9964	0.744	0.243	0.844	Rep+
	117.1144	37090	0.9961	0.799	0.269	0.934	Rep+
	114.1107	37076	0.9957	0.751	0.253	0.858	Rep+
	145.108	36251	0.9735	0.206	0.114	0.194	Label+
	291.2155	35287	0.9477	0.238	0.183	0.201	Y1(K)
	110.0712	34004	0.9132	0.15	0.196	0.078	b/y free
	175.119	33280	0.8938	0.2	0.202	0.111	Y1(R)
	120.0807	25310	0.6797	0.149	0.194	0.077	b/y free
	158.0924	23172	0.6223	0.076	0.061	0.061	b/y free
	429.0889	15170	0.4074	0.304	0.378	0.087	Unknown
3	116.111	23680	0.9917	0.433	0.256	0.383	Rep+
	115.1077	23666	0.9911	0.399	0.243	0.347	Rep+
	117.1144	23644	0.9902	0.41	0.244	0.363	Rep+
	114.1107	23639	0.99	0.387	0.234	0.339	Rep+
	291.2155	22582	0.9457	0.294	0.247	0.225	Y1(K)
	145.1079	22367	0.9367	0.127	0.092	0.103	Label+
	110.0712	21007	0.8798	0.285	0.309	0.149	b/y free
	175.119	19229	0.8053	0.133	0.119	0.098	Y1(R)
	120.0807	17309	0.7249	0.22	0.266	0.105	b/y free
	429.0888	13624	0.5706	0.261	0.329	0.098	Unknown
	136.0756	13377	0.5602	0.153	0.225	0.06	b/y free
	258.1936	9096	0.3809	0.08	0.082	0.054	Unknown
	404.2996	7790	0.3262	0.167	0.225	0.063	Unknown
	101.0709	7675	0.3214	0.078	0.075	0.05	b/y free
4	115.1077	3570	0.938	0.123	0.089	0.101	Rep+
	116.111	3566	0.9369	0.131	0.089	0.112	Rep+
	117.1144	3552	0.9333	0.125	0.085	0.107	Rep+
	114.1107	3533	0.9283	0.118	0.083	0.099	Rep+
	291.2155	3287	0.8636	0.178	0.197	0.103	Y1(K)
	110.0712	2770	0.7278	0.216	0.257	0.103	b/y free
	120.0808	2314	0.608	0.181	0.238	0.082	b/y free
	429.0888	1976	0.5192	0.255	0.333	0.087	Unknown
	163.1188	1659	0.4359	0.249	0.333	0.062	b/y free
	136.0756	1594	0.4188	0.15	0.221	0.056	b/y free

^a mean of relative abundance.

^b standard deviation of relative abundance.

^c median of relative abundance.

Table III. High frequency ions in iTRAQ8 tandem mass spectra. Rep+: singly charged reporter ion, Label+: singly charged isobaric tag ion, Y1(R): y1 ion from peptide with C-terminal amino acid R, Y1(K): y1 ion from peptide with C-terminal amino acid K, b/y free: ion in the b/y free windows.

Charge	Ion	Count	Frequency	Mean^a	S.D.^b	Median^c	Annotation
2	115.1078	28359	0.9886	0.608	0.303	0.725	Rep+
	119.1148	28350	0.9883	0.642	0.322	0.761	Rep+
	114.1107	28346	0.9881	0.642	0.32	0.765	Rep+
	118.1115	28342	0.988	0.646	0.316	0.776	Rep+
	117.1144	28327	0.9875	0.661	0.337	0.782	Rep+
	116.1111	28325	0.9874	0.589	0.312	0.653	Rep+
	113.1074	28271	0.9855	0.599	0.3	0.713	Rep+
	121.1215	28268	0.9854	0.626	0.321	0.742	Rep+
	201.1842	27382	0.9545	0.27	0.154	0.273	Unknown
	203.1837	26311	0.9172	0.16	0.087	0.163	Unknown
	219.1944	26252	0.9152	0.145	0.083	0.136	Unknown
	160.088	26010	0.9067	0.145	0.165	0.087	b/y free
	143.1008	25618	0.893	0.084	0.042	0.085	b/y free
	141.0939	25438	0.8868	0.091	0.042	0.093	b/y free
	161.1106	25354	0.8838	0.082	0.062	0.076	b/y free
	305.2095	25064	0.8737	0.103	0.06	0.094	Label+
	110.0713	24882	0.8674	0.186	0.215	0.106	b/y free
	147.1092	24818	0.8652	0.08	0.036	0.08	b/y free
	159.1038	24575	0.8567	0.1	0.1	0.083	b/y free
	221.1944	24406	0.8508	0.085	0.047	0.08	Unknown
	162.0947	24395	0.8504	0.095	0.108	0.06	b/y free
	163.0982	23646	0.8243	0.147	0.155	0.094	b/y free
	175.119	23383	0.8151	0.26	0.277	0.13	Y1(R)
	205.1905	23329	0.8133	0.084	0.041	0.086	Unknown
	163.1109	21118	0.7362	0.097	0.082	0.084	b/y free
	451.3162	19723	0.6875	0.114	0.099	0.09	Y1(K)
	145.1021	19367	0.6751	0.093	0.048	0.094	b/y free
	120.0807	18968	0.6612	0.21	0.251	0.111	b/y free
	136.0756	15912	0.5547	0.112	0.159	0.054	b/y free
	418.2941	14626	0.5099	0.189	0.235	0.082	Unknown
	429.0889	12637	0.4405	0.288	0.346	0.11	Unknown
	158.0923	12555	0.4377	0.11	0.078	0.096	b/y free
	112.0869	12275	0.4279	0.062	0.039	0.058	b/y free
3	114.1107	22905	0.9831	0.262	0.166	0.225	Rep+
	117.1144	22904	0.983	0.268	0.169	0.23	Rep+
	115.1077	22897	0.9827	0.248	0.161	0.211	Rep+
	118.1115	22886	0.9823	0.264	0.169	0.225	Rep+
	119.1147	22865	0.9814	0.263	0.167	0.225	Rep+
	113.1073	22864	0.9813	0.248	0.161	0.211	Rep+
	121.1215	22846	0.9806	0.254	0.164	0.216	Rep+
	116.1111	22831	0.9799	0.237	0.158	0.201	Rep+
	219.1944	22189	0.9524	0.165	0.103	0.151	Unknown
	201.1841	21927	0.9411	0.142	0.089	0.128	Unknown
	305.2095	21599	0.927	0.12	0.075	0.107	Label+
	451.3159	21094	0.9054	0.344	0.291	0.263	Y1(K)
	221.1943	20777	0.8918	0.095	0.059	0.086	Unknown
	203.1837	20343	0.8731	0.084	0.051	0.075	Unknown
	143.1011	19850	0.852	0.06	0.036	0.052	b/y free
	147.1093	19684	0.8448	0.065	0.037	0.057	b/y free
	160.0879	19452	0.8349	0.111	0.126	0.069	b/y free
	110.0712	19252	0.8263	0.257	0.3	0.123	b/y free
	163.098	18737	0.8042	0.103	0.115	0.065	b/y free
	141.094	17633	0.7568	0.064	0.039	0.056	b/y free
	120.0807	17103	0.7341	0.253	0.292	0.12	b/y free
	175.119	16820	0.7219	0.118	0.098	0.09	Y1(R)
	323.2203	15042	0.6456	0.095	0.137	0.051	Unknown
	191.1294	14861	0.6378	0.087	0.111	0.052	Unknown
	418.294	14369	0.6167	0.203	0.249	0.091	Unknown
	188.1196	14347	0.6158	0.088	0.113	0.051	Unknown
	153.1084	12205	0.5238	0.183	0.248	0.078	b/y free
	136.0756	11924	0.5118	0.174	0.25	0.065	b/y free
	145.1078	11454	0.4916	0.069	0.039	0.061	b/y free
	429.0889	10728	0.4604	0.219	0.293	0.082	Unknown
	102.0549	10138	0.4351	0.089	0.088	0.058	b/y free
	145.1023	10108	0.4338	0.071	0.045	0.061	b/y free
	101.0709	7528	0.3231	0.078	0.074	0.053	b/y free
	404.2781	7393	0.3173	0.168	0.229	0.065	Unknown
4	114.1107	3901	0.8978	0.084	0.051	0.074	Rep+
	117.1144	3881	0.8932	0.087	0.057	0.076	Rep+
	219.1944	3871	0.8909	0.095	0.06	0.086	Unknown
	118.1115	3868	0.8902	0.085	0.058	0.073	Rep+
	113.1073	3867	0.89	0.081	0.051	0.07	Rep+
	119.1148	3853	0.8868	0.085	0.055	0.074	Rep+
	451.3159	3851	0.8863	0.333	0.299	0.234	Y1(K)
	121.1215	3833	0.8822	0.083	0.052	0.073	Rep+
	115.1077	3831	0.8817	0.08	0.053	0.068	Rep+
	116.1111	3776	0.869	0.076	0.051	0.066	Rep+
	305.2096	3643	0.8384	0.082	0.059	0.068	Label+
	221.1943	3282	0.7554	0.057	0.033	0.051	Unknown
	201.1842	3256	0.7494	0.063	0.045	0.054	Unknown
	120.0808	2995	0.6893	0.238	0.27	0.117	b/y free
	110.0713	2600	0.5984	0.154	0.198	0.076	b/y free
	429.0889	2444	0.5625	0.227	0.289	0.097	Unknown
	418.2936	2086	0.4801	0.142	0.204	0.06	Unknown
	452.3188	2013	0.4633	0.069	0.046	0.058	Unknown
	153.1084	2000	0.4603	0.214	0.285	0.087	b/y free
	136.0755	1783	0.4104	0.154	0.21	0.062	b/y free
	102.0549	1611	0.3708	0.08	0.078	0.056	b/y free

^a mean of relative abundance.

^b standard deviation of relative abundance.

^c median of relative abundance.

For backward ions, only 144.1 (frequency = 0.3316, median of abundance = 0.207) from iTRAQ4 tandem mass spectra with doubly charged precursors and 304.1997 (frequency = 0.380, median of abundance = 0.199) from iTRAQ8 tandem mass spectra with doubly charged precursors passed the criteria. Both ions were annotated as Precursor-Label⁺.

In addition, the frequency and abundance of reporter ions in both the iTRAQ4 and iTRAQ8 data sets decreased as the precursor charge increased.

Identification Sensitivity Improvement

We evaluated how the combination of preprocessing steps affected the peptide/protein identification sensitivity at the same peptide/protein false discovery rate of 0.01. Table IV lists the 16 methods, each a different combination of the five processing options used in data preprocessing.

Table IV. 16 preprocessing methods with different combinations of three preprocessing steps.

Method	Deisotoping and Deconvolution	Top 10	Remove Isobaric Ions: Low^c	Remove Isobaric Ions: Label^b	Remove Isobaric Ions: High^a
1
2	+
3		+
4	+	+
5			+
6				+
7					+
8			+	+	+
9	+		+
10	+			+
11	+				+
12	+		+	+	+
13	+	+	+
14	+	+		+
15	+	+			+
16	+	+	+	+	+

^a b/y free window in high mass range.

^b reporter and isobaric tag ions.

^c b/y free window in low mass range.

Fig. 1 illustrates the identification results from the iTRAQ4 and iTRAQ8 data sets using five search engines. The larger the point of a method in the graph, the more identifications that method achieved with the same engine and the same isobaric labeling method. The red circle indicates the preprocessing method that achieved the most identifications among all 16 methods. In the iTRAQ4 data set, Mascot, MyriMatch, OMSSA, and X! Tandem achieved their highest numbers of identified spectra, peptides, and two-hit proteins when isobaric related ions were preprocessed, although the top-performing method differed between engines. In the iTRAQ8 data set, only Mascot, OMSSA, and X! Tandem achieved their highest numbers of two-hit protein identifications when isobaric related ions were preprocessed. Preprocessing did not significantly improve the Comet identification sensitivity in either the iTRAQ4 or the iTRAQ8 data set.

Fig. 2 illustrates the identification improvement of the 15 preprocessing methods compared with no preprocessing (method 1) in the iTRAQ4 and iTRAQ8 data sets. Among the five search engines, the identification sensitivity of Mascot was significantly improved by most of the preprocessing methods. The identification sensitivity of MyriMatch, OMSSA, and X! Tandem was moderately improved by some of the preprocessing methods. The identification sensitivity of Comet was not improved by most of the preprocessing methods. The detailed identification summary is provided in supplemental Tables S1–S10.

Comparing method 2 to method 1 in Tables V and VI indicates that deisotoping and deconvolution significantly improved the Mascot spectrum identification for iTRAQ4 and iTRAQ8, from 16,442 to 18,286 (an increase of 11.2%) and from 8817 to 10,219 (an increase of 15.9%), respectively. Comparing method 3 to method 1 shows that keeping the top 10 ions in each 100 Dalton window decreased the Mascot identification sensitivity for the iTRAQ4 data set but increased it for the iTRAQ8 data set. The identified spectrum count was moderately increased for iTRAQ4 (from 16,442 to 17,912, an increase of 8.9%) and significantly increased for iTRAQ8 (from 8817 to 12,012, an increase of 36.2%) by removing isobaric tag ions and the ions in the low mass range b/y free window (comparing method 5 to method 1). Comparing methods 5, 6, and 7 to method 1 indicates that removing any one of the three isobaric related ion types improved the Mascot identification sensitivity in both the iTRAQ4 and iTRAQ8 data sets, except for the ions in the high mass range b/y free window in the iTRAQ4 data set. Finally, comparing method 10 to method 1 in Table V indicates that deisotoping, deconvolution, and removing isobaric ions improved the Mascot spectrum identification from 16,442 to 19,118 (an increase of 16.3%), the peptide identification from 6275 to 7148 (an increase of 13.9%), and the two-hit protein identification from 950 to 1013 (an increase of 6.6%) in the iTRAQ4 data set. Comparing method 16 to method 1 in Table VI indicates that deisotoping, deconvolution, and removing all possible isobaric related ions improved the Mascot spectrum identification from 8817 to 13,240 (an increase of 50.2%), the peptide identification from 3349 to 4671 (an increase of 39.5%), and the two-hit protein identification from 612 to 766 (an increase of 25.2%) in the iTRAQ8 data set.
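The percentages quoted above follow from the counts in the Mascot result tables as (after − before) / before. A short Python check, using a hypothetical helper name `percent_increase`, is:

```python
def percent_increase(before, after):
    """Relative increase from before to after, in percent, rounded to one decimal."""
    return round(100.0 * (after - before) / before, 1)


if __name__ == "__main__":
    # iTRAQ4, method 1 versus method 10 (spectra, peptides, two-hit proteins).
    print(percent_increase(16442, 19118), percent_increase(6275, 7148), percent_increase(950, 1013))
    # iTRAQ8, method 1 versus method 16 (spectra, peptides, two-hit proteins).
    print(percent_increase(8817, 13240), percent_increase(3349, 4671), percent_increase(612, 766))
```

The same helper reproduces the method 2 and method 5 comparisons from the corresponding table rows.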

Table V. Identification result from iTRAQ4 dataset using Mascot.

Method	Deisotoping/Deconvolution	Top 10	Remove Low^c	Remove Label^b	Remove High^a	Spectrum	Peptide	Two-hits Protein
1						16442	6275	950
2	+					18286	6876	992
3		+				15856	6059	931
4	+	+				18040	6757	989
5			+			17912	6752	989
6				+		17299	6614	973
7					+	16055	6110	931
8			+	+	+	16611	6268	959
9	+		+			18794	6964	1004
10^d	+			+		*19118*	*7148*	*1013*
11	+				+	18169	6787	982
12	+		+	+	+	18775	6918	1000
13	+	+	+			19143	7099	1011
14	+	+		+		19056	7100	1013
15	+	+			+	17735	6617	969
16	+	+	+	+	+	19313	7114	1012

^a b/y free window in high mass range.

^b reporter and isobaric tag ions.

^c b/y free window in low mass range.

^d best method identified most two-hits proteins and then most spectra.

Table VI. Identification result from iTRAQ8 dataset using Mascot.

Method	Deisotoping/Deconvolution	Top 10	Remove Low^c	Remove Label^b	Remove High^a	Spectrum	Peptide	Two-hits Protein
1						8817	3349	612
2	+					10219	3793	657
3		+				9280	3508	634
4	+	+				10596	3934	674
5			+			12012	4356	732
6				+		10677	3978	687
7					+	9393	3557	634
8			+	+	+	12403	4464	736
9	+		+			12962	4594	756
10	+			+		11951	4361	721
11	+				+	10831	3994	671
12	+		+	+	+	13178	4671	763
13	+	+	+			13092	4624	759
14	+	+		+		12339	4496	733
15	+	+			+	11242	4141	690
16^d	+	+	+	+	+	*13240*	*4671*	*766*

^a b/y free window in high mass range.

^b reporter and isobaric tag ions.

^c b/y free window in low mass range.

^d best method identified most two-hits proteins and then most spectra.

Mascot Score Improvement by Data Preprocessing

We evaluated how the Mascot peptide identification scores were improved by preprocessing the tandem mass spectra before database searching. The scores of the peptide-spectrum matches identified by methods 1 and 10 in the iTRAQ4 data set and by methods 1 and 16 in the iTRAQ8 data set were extracted (see supplemental Table S11). Fig. 3 indicates that data preprocessing before database searching improved the identification scores of a majority of spectra in both the iTRAQ4 and iTRAQ8 data sets. A p value of 2.2e-16 from the Wilcoxon rank sum test indicates that the score improvement in the iTRAQ8 data set was significantly higher than in the iTRAQ4 data set.
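The comparison of score improvements between the two data sets can be sketched with a rank sum test. The code below is a minimal pure-Python version (two-sided, normal approximation with tie correction, no continuity correction), which is an assumption about the test variant and may differ from the statistical package the authors used. The score differences shown are made-up placeholders, not values from supplemental Table S11.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # tied values share the mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-spectrum score changes (preprocessed minus raw).
delta_itraq4 = [1.2, 0.4, 3.1, -0.5, 2.0, 0.9]
delta_itraq8 = [5.6, 4.8, 7.3, 2.9, 6.1, 8.4]
u_stat, p_value = rank_sum_test(delta_itraq4, delta_itraq8)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

With many thousands of spectra per data set, the normal approximation is usually adequate; for small samples an exact test would be preferable.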

Fig. 3. — **Mascot score improvement after preprocessing tandem mass spectra.** Both top two density plots and bottom two violin plots indicated that the majority of the spectra gained score improvement with data preprocessing in both iTRAQ4 and iTRAQ8 data sets. p value 2.2e-16 from Wilcoxon rank sum test indicates that the score improvement in iTRAQ8 data set was significantly higher than in iTRAQ4 data set.

C-terminal Peptide Identification

Because the tryptic peptide generated from the protein carboxyl terminus (C-terminal peptide) usually does not follow the assumption that the Y1 ion is either Y1(K) or Y1(R), which we use for calculating the b/y free window, we checked how those peptides were identified before and after data preprocessing. The scores of the C-terminal peptides identified by methods 1 and 10 in the iTRAQ4 data set and by methods 1 and 16 in the iTRAQ8 data set were extracted (see supplemental Table S12). In Fig. 4, the top two Venn diagrams indicate that preprocessing also increased C-terminal peptide identification sensitivity in both the iTRAQ4 and iTRAQ8 data sets, and the bottom two scatter plots indicate that the Mascot scores of a majority of the commonly identified C-terminal peptides also increased after preprocessing.
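To illustrate why C-terminal peptides can break the Y1(K)/Y1(R) assumption, the sketch below computes the singly charged y1 m/z for a given C-terminal residue from standard monoisotopic masses. The optional `tag_mass` argument (for example the iTRAQ 4-plex modification mass of about 144.102063 Da on a labeled lysine) is an assumption added for illustration; this section does not state whether the Y1(K) value used for the window calculation includes the tag.

```python
WATER = 18.010565   # monoisotopic mass of H2O in Da
PROTON = 1.007276   # proton mass in Da

# Monoisotopic residue masses in Da for a few amino acids.
RESIDUE_MASS = {
    "K": 128.094963,
    "R": 156.101111,
    "L": 113.084064,
    "A": 71.037114,
}


def y1_mz(residue, tag_mass=0.0):
    """m/z of the singly charged y1 ion for a C-terminal residue."""
    return RESIDUE_MASS[residue] + WATER + PROTON + tag_mass


for aa in ("K", "R", "L", "A"):
    print(f"y1({aa}) = {y1_mz(aa):.4f}")

# Assumed case: lysine side chain carrying an iTRAQ 4-plex tag.
print(f"y1(K + tag) = {y1_mz('K', tag_mass=144.102063):.4f}")
```

A C-terminal peptide ending in, say, leucine or alanine produces a y1 ion well below the y1 values expected for K or R, so a window computed under the tryptic assumption may contain genuine fragment ions for such peptides.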

Fig. 4. — **C-terminal peptide identification improvement in iTRAQ4 and iTRAQ8 data sets after preprocessing tandem mass spectra.** The top two Venn diagrams indicated that preprocessing also increased C-terminal peptide identification sensitivity in both iTRAQ4 and iTRAQ8 data sets. The bottom two scatter plots indicate that the Mascot scores of the majority of commonly identified C-terminal peptides were also increased after preprocessing.

DISCUSSION

We annotated the high frequency ions in isobarically labeled tandem mass spectra. The majority of high frequency ions in the iTRAQ4 and iTRAQ8 data sets could be annotated as reporter ions (Rep⁺), isobaric tag ions (Label⁺), Y1 ions, or ions in the b/y free window. More unannotated ions were observed in the iTRAQ8 data set than in the iTRAQ4 data set. This may be because the iTRAQ8 labeling tag is more complex than the iTRAQ4 tag and could introduce more byproduct ions during isolation in the mass spectrometer. Reporter ions and isobaric tag ions are isobaric ions and can be confidently removed from the MS/MS spectrum before database searching. The other high frequency ions in the b/y free windows are very likely introduced not by the peptide itself but by either the isobaric labeling procedure or the mass spectrometry system. Those ions might be removed to de-noise the tandem mass spectra and improve identification sensitivity. However, some ions in the low mass range b/y free window may actually be multiply charged b/y ions whose charges cannot be estimated from the mass spectrum; removing such ions may decrease the identification sensitivity. The benefit of removing the ions in the b/y free window may vary between isobaric labeling methods and search engines. Because there are fewer ions in the low mass b/y free window in the iTRAQ4 than in the iTRAQ8 data set (supplemental Fig. S1), removing only isobaric ions may be more suitable for iTRAQ4 data, and removing ions in the low mass b/y free window may be more suitable for iTRAQ8 data. We also observed a few high frequency ions outside of the b/y free windows, including 429.0888. Without conclusive evidence of their origin, we did not remove them in this study.

We also examined the factors that might affect the sensitivity of peptide identification. Our results showed that the combination of deisotoping/deconvolution and removing isobaric related ions significantly improved the Mascot identification sensitivity and moderately improved the MyriMatch, X! Tandem, and OMSSA identification sensitivity for both iTRAQ4 and iTRAQ8 data sets. Comet was only slightly affected by the preprocessing procedure. We further validated our results with Mascot on an independent TMT6 data set, which showed a similar improvement in peptide/protein identification sensitivity (see supplemental Table S13). Based on our results, we highly recommend removing isobaric related ions combined with deisotoping/deconvolution when preprocessing isobarically labeled MS/MS spectra before database search, especially for the Mascot search engine.
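As a rough illustration of what deisotoping and deconvolution do to a high-resolution MS/MS peak list, the sketch below groups peaks spaced by the isotope gap divided by the charge, keeps the monoisotopic peak, and converts it to its singly charged equivalent. It is a deliberately simplified greedy scheme under stated assumptions (fixed tolerance, charges 1 to 3, no intensity pattern check) and should not be read as the algorithm implemented in the authors' software. Peaks with no isotope partner are left as singly charged, which mirrors the caveat that some charges cannot be estimated from the spectrum.

```python
PROTON = 1.007276       # proton mass in Da
ISOTOPE_GAP = 1.003355  # 13C minus 12C mass difference in Da


def to_singly_charged(mz, charge):
    """Convert an m/z observed at the given charge to its 1+ equivalent."""
    return mz * charge - (charge - 1) * PROTON


def deisotope_and_deconvolute(peaks, max_charge=3, tol=0.01):
    """peaks: (m/z, intensity) tuples sorted by m/z.

    Returns monoisotopic peaks converted to singly charged m/z, sorted by m/z.
    """
    used = [False] * len(peaks)
    result = []
    for i, (mz, intensity) in enumerate(peaks):
        if used[i]:
            continue
        charge = 1  # peaks without an isotope partner are treated as 1+
        # Try the highest charge first: a 2+ cluster also has peaks ~1.0 apart.
        for z in range(max_charge, 0, -1):
            members = []
            expected = mz + ISOTOPE_GAP / z
            for j in range(i + 1, len(peaks)):
                if peaks[j][0] > expected + tol:
                    break
                if not used[j] and abs(peaks[j][0] - expected) <= tol:
                    members.append(j)
                    expected = peaks[j][0] + ISOTOPE_GAP / z
            if members:
                charge = z
                for j in members:
                    used[j] = True  # isotope peaks of this cluster are dropped
                break
        result.append((to_singly_charged(mz, charge), intensity))
    return sorted(result)


# Hypothetical peak list: a doubly charged cluster at 300.0 and a lone peak.
example = [(300.0, 100.0), (300.5017, 50.0), (301.0034, 10.0), (500.0, 80.0)]
print(deisotope_and_deconvolute(example))
```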

The complexity of the isobaric labeling tag significantly affects the identification sensitivity improvement after preprocessing tandem mass spectra. Keeping the top 10 ions in each 100 Dalton window slightly decreased the Mascot peptide identification sensitivity in the iTRAQ4 data set, regardless of whether it was combined with deisotoping and deconvolution. This may indicate that the high-resolution mass spectra in our iTRAQ4 data set were so clean that keeping the top 10 ions in each 100 Dalton window was not necessary during data preprocessing. This finding may require additional validation in other independent iTRAQ4 data sets. On the other hand, keeping the top 10 ions in each 100 Dalton window slightly increased the Mascot peptide identification sensitivity in the iTRAQ8 data set. Compared with method 1, the combination of deisotoping/deconvolution, keeping the top 10 ions in each 100 Dalton window, and removing isobaric related ions (method 16) improved identified spectra, peptides, and two-hit proteins more for iTRAQ8 than for iTRAQ4: the gains were larger by 32.7, 26.1, and 18.5 percentage points, respectively. This suggests that preprocessing is more crucial for iTRAQ8 than for iTRAQ4 data.
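The "top 10 ions in each 100 Dalton window" step can be sketched as a simple binning filter. The code below assumes windows aligned at multiples of 100 (0 to 100, 100 to 200, and so on) and breaks intensity ties arbitrarily; both are assumptions for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def keep_top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most intense (m/z, intensity) peaks in each m/z window."""
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept = []
    for members in bins.values():
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:n])
    return sorted(kept)  # restore m/z order


# Hypothetical spectrum: 12 peaks between 100 and 200, one peak at 250.
spectrum = [(100.0 + i, float(i)) for i in range(1, 13)] + [(250.0, 5.0)]
filtered = keep_top_n_per_window(spectrum)
print(len(spectrum), "->", len(filtered), "peaks")
```

In a clean high-resolution spectrum few windows hold more than 10 peaks, so the filter mostly removes low-intensity signal; whether that signal is noise or genuine fragment ions is what decides if the step helps or hurts identification.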

We validated the identification improvement for C-terminal peptides. C-terminal peptides might not end with “K” or “R,” which violates the assumption behind our b/y free window calculation that Y1 ions come from either K or R. The results indicated that data preprocessing not only improved the Mascot scores of the majority of C-terminal peptides but also increased their identification sensitivity, even though the ions in the low mass b/y free window were removed.

We implemented TurboRawToMGF (v2.0) with a user-friendly GUI. The GUI allows users to conveniently convert data generated by high-resolution mass spectrometers (such as the Thermo LTQ-Orbitrap Velos) to mascot generic format files. TurboRawToMGF also supports filtering spectra based on user-defined mass ranges; for example, the user may define the range 428.75–429.25 to remove the 429.0888 ion. TurboRawToMGF (v2.0) also supports conversion from mzData and mzXML files to mascot generic format files, as well as conversion of multiple files in batch mode. TurboRawToMGF is free, and it will be continuously supported in the coming years.
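The mass-range filter described above amounts to dropping every peak whose m/z falls inside a user-defined interval. The following sketch shows the idea on a plain list of (m/z, intensity) pairs; it is not the TurboRawToMGF code, and the function name and the inclusive-boundary choice are assumptions.

```python
def remove_mass_ranges(peaks, ranges):
    """Drop (m/z, intensity) peaks that fall inside any (low, high) range."""
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if not any(low <= mz <= high for low, high in ranges)
    ]


# Hypothetical peak list containing the 429.0888 ion mentioned in the text.
spectrum = [(175.1190, 300.0), (429.0888, 1000.0), (430.2000, 50.0)]
cleaned = remove_mass_ranges(spectrum, [(428.75, 429.25)])
print(cleaned)
```

The same function could take several ranges at once, for example reporter-ion and isobaric-tag windows, provided the boundaries are chosen for the labeling reagent in use.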

Supplementary Material

Supplemental Data

supp_14_2_405__index.html^{(1.6KB, html)}

Acknowledgments

We thank the GSA program by Thermo. We are grateful to Margot Bjoring for her editorial support. The data deposition to the ProteomeXchange Consortium was supported by the PRIDE Team, EBI.

Footnotes

Author contributions: Q.S., J.D., Y.S., and R.Z. designed research; R.L., Q.L., Z.S., and C.L. performed research; Q.S. analyzed data; Q.S., R.L., J.D., and Y.G. wrote the paper.

* This work was supported by grants from Ministry of Science and Technology (2011CB910200, 2014CB910500, 2011CB910600), and a grant from the National Natural Science Foundation of China (31130034).

This article contains supplemental Fig. S1 and Tables S1 to S13.

¹ The abbreviations used are:

MS/MS: Tandem Mass Spectrometry
LC: Liquid Chromatography
m/z: mass-to-charge ratios
SILAC: stable isotope labeling by amino acids in cell culture
iTRAQ: isobaric tag for relative and absolute quantification
TMT: tandem mass tag.

REFERENCES

1. Yates J. R., 3rd, Gilchrist A., Howell K. E., Bergeron J. J. (2005) Proteomics of organelles and large cellular structures. Nat. Rev. Mol. Cell Biol. 6, 702–714
2. Walther T. C., Mann M. (2010) Mass spectrometry-based proteomics in cell biology. J. Cell Biol. 190, 491–500
3. Wolters D. A., Washburn M. P., Yates J. R., 3rd (2001) An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690
4. Mann M., Kelleher N. L. (2008) Precision proteomics: the case for high-resolution and high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 105, 18132–18138
5. Olsen J. V., Schwartz J. C., Griep-Raming J., Nielsen M. L., Damoc E., Denisov E., Lange O., Remes P., Taylor D., Splendore M., Wouters E. R., Senko M., Makarov A., Mann M., Horning S. (2009) A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell Proteomics 8, 2759–2769
6. Perkins D. N., Pappin D. J., Creasy D. M., Cottrell J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567
7. Eng J. K., McCormack A. L., Yates J. R. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectr. 5, 976–989
8. Carvalho P. C., Xu T., Han X., Cociorva D., Barbosa V. C., Yates J. R., 3rd (2009) YADA: a tool for taking the most out of high-resolution spectra. Bioinformatics 25, 2734–2736
9. Liu X., Inbar Y., Dorrestein P. C., Wynne C., Edwards N., Souda P., Whitelegge J. P., Bafna V., Pevzner P. A. (2010) Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Mol. Cell Proteomics 9, 2772–2782
10. Ong S. E., Blagoev B., Kratchmarova I., Kristensen D. B., Steen H., Pandey A., Mann M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell Proteomics 1, 376–386
11. Bantscheff M., Schirle M., Sweetman G., Rick J., Kuster B. (2007) Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389, 1017–1031
12. Thompson A., Schafer J., Kuhn K., Kienle S., Schwarz J., Schmidt G., Neumann T., Johnstone R., Mohammed A. K., Hamon C. (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 75, 1895–1904
13. Ross P. L., Huang Y. N., Marchese J. N., Williamson B., Parker K., Hattan S., Khainovski N., Pillai S., Dey S., Daniels S., Purkayastha S., Juhasz P., Martin S., Bartlet-Jones M., He F., Jacobson A., Pappin D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell Proteomics 3, 1154–1169
14. Aggarwal K., Choe L. H., Lee K. H. (2006) Shotgun proteomics using the iTRAQ isobaric tags. Brief. Funct. Genomics Proteomics 5, 112–120
15. Choe L., D'Ascenzo M., Relkin N. R., Pappin D., Ross P., Williamson B., Guertin S., Pribil P., Lee K. H. (2007) 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer's disease. Proteomics 7, 3651–3660
16. Dayon L., Hainard A., Licker V., Turck N., Kuhn K., Hochstrasser D. F., Burkhard P. R., Sanchez J. C. (2008) Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags. Anal. Chem. 80, 2921–2931
17. Leitner A., Lindner W. (2009) Chemical tagging strategies for mass spectrometry-based phospho-proteomics. Methods Mol. Biol. 527, 229–243
18. Treumann A., Thiede B. (2010) Isobaric protein and peptide quantification: perspectives and issues. Expert Rev. Proteomics 7, 647–653
19. Coombs K. M. (2011) Quantitative proteomics of complex mixtures. Expert Rev. Proteomics 8, 659–677
20. Wiese S., Reidegeld K. A., Meyer H. E., Warscheid B. (2007) Protein labeling by iTRAQ: a new tool for quantitative mass spectrometry in proteome research. Proteomics 7, 340–350
21. Prudova A., auf dem Keller U., Butler G. S., Overall C. M. (2010) Multiplex N-terminome analysis of MMP-2 and MMP-9 substrate degradomes by iTRAQ-TAILS quantitative proteomics. Mol. Cell Proteomics 9, 894–911
22. Sinclair J., Timms J. F. (2011) Quantitative profiling of serum samples using TMT protein labelling, fractionation and LC-MS/MS. Methods 54, 361–369
23. Hung C. W., Tholey A. (2012) Tandem mass tag protein labeling for top-down identification and quantification. Anal. Chem. 84, 161–170
24. Nielsen P. A., Olsen J. V., Podtelejnikov A. V., Andersen J. R., Mann M., Wisniewski J. R. (2005) Proteomic mapping of brain plasma membrane proteins. Mol. Cell Proteomics 4, 402–408
25. Cox J., Mann M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372
26. Eng J. K., Jahan T. A., Hoopmann M. R. (2013) Comet: an open-source MS/MS sequence database search tool. Proteomics 13, 22–24
27. Tabb D. L., Fernando C. G., Chambers M. C. (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6, 654–661
28. Geer L. Y., Markey S. P., Kowalak J. A., Wagner L., Xu M., Maynard D. M., Yang X., Shi W., Bryant S. H. (2004) Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964
29. Craig R., Beavis R. C. (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467
30. Vaudel M., Barsnes H., Berven F. S., Sickmann A., Martens L. (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11, 996–999
31. Sheng Q., Dai J., Wu Y., Tang H., Zeng R. (2012) BuildSummary: using a group-based approach to improve the sensitivity of peptide/protein identification in shotgun proteomics. J. Proteome Res. 11, 1494–1502
32. Mann B., Madera M., Sheng Q., Tang H., Mechref Y., Novotny M. V. (2008) ProteinQuant Suite: a bundle of automated software tools for label-free quantitative proteomics. Rapid Commun. Mass Spectrom. 22, 3823–3834
33. Vizcaino J. A., Deutsch E. W., Wang R., Csordas A., Reisinger F., Rios D., Dianes J. A., Sun Z., Farrah T., Bandeira N., Binz P. A., Xenarios I., Eisenacher M., Mayer G., Gatto L., Campos A., Chalkley R. J., Kraus H. J., Albar J. P., Martinez-Bartolome S., Apweiler R., Omenn G. S., Martens L., Jones A. R., Hermjakob H. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226

Data Availability Statement

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (33) via the PRIDE partner repository with the data set identifier PXD000994 and DOI 10.6019/PXD000994.

To access the data please visit: http://tinyurl.com/pdbkesj

Username: reviewer06796@ebi.ac.uk

Password: jWjYoiuT
