Abstract
Applying high-throughput Top-Down MS to an entire proteome requires a yet-to-be-established model for data processing. Since Top-Down is becoming possible on a large scale, we report our latest software pipeline dedicated to capturing the full value of intact protein data in automated fashion. For intact mass detection, we combine algorithms for processing MS1 data from both isotopically resolved (FT) and charge-state resolved (ion trap) LC-MS data, which are then linked to their fragment ions for database searching using ProSight. Automated determination of human keratin and tubulin isoforms is one result. Optimized for the intricacies of whole proteins, new software modules visualize proteome-scale data based on the LC retention time and intensity of intact masses and enable selective detection of PTMs to automatically screen for acetylation, phosphorylation, and methylation. Software functionality was demonstrated using comparative LC-MS data from yeast strains in addition to human cells undergoing chemical stress. We further these advances as a key aspect of realizing Top-Down MS on a proteomic scale.
Keywords: Bioinformatics, Data reduction, Deconvolution, Intact protein, Tandem MS, Top down
1 Introduction
The steadily advancing field of high-throughput Top-Down Proteomics [1] has shown impressive advantages over Bottom-Up in affording certain types of information. For example, Top-Down can reveal the presence of diverse types of PTMs residing on the whole protein, as well as detect single nucleotide polymorphisms and various splice forms [2]. Combinations of these variations encode detailed biological information and are involved in vital cellular activities, such as replication, signaling, and transcriptional regulation [3]. In addition, whole protein analysis permits the determination of relative abundances of modified forms in a quantitative measurement [4]. As the limits of Top-Down are extended, proteome scale runs similar to those seen with Bottom-Up are on the horizon. Advances in separation techniques have augmented parallel improvements to the mass spectrometer, allowing better handling of high sample complexity and higher throughput [5, 6]. With the increased capabilities, large sets of MS data are generated that are beyond the scope of manual investigation, which is itself a sign of success for Top-Down.
The “deconvolution” of protein charge states has long been used to determine intact protein mass values from pseudo-molecular ions [7, 8]. In particular, the Z-score algorithm was useful for charge state deconvolution of simple, low-resolution ESI-MS data [9]. For high-resolution FTMS, a modified version of the thorough high-resolution analysis of spectra by Horn (THRASH) algorithm [10] was employed to take advantage of isotopically resolved peaks [11]. When the boundaries of molecular weight (MW) are pushed, isotope distributions of highly charged protein ions become harder to resolve due to adducts and modifications present in the sample. Therefore, detection of charge states in the ion trap (IT) of IT-FTMS hybrids becomes an attractive possibility for intact mass detection in complex mixtures of larger proteins [12].
Immense effort has been poured into PTM detection and identification because of the far-reaching biological implications of protein modifications [13]. Mining MS data for PTMs has a rich history, with older tools such as FindMod able to find putative modifications from experimental MALDI-TOF peptide mapping data [14]. Newer tools such as A-score, PTMFinder, and SLoMo have been designed for automated PTM site searching beyond normal search engines by doing large searches on peptide databases for particular PTMs such as phosphorylation [15–17]. Although many PTMs can be found, large-scale PTM analysis of MS/MS spectra has some limitations. During fragmentation, tryptic peptides are not always able to retain PTMs that are otherwise stable with Top-Down MS/MS methods [18]. Current Top-Down data analysis tools (e.g. ProSightPC 2.0) are effective in characterization of protein modifications, but only in cases where proteins are identified through fragmentation [19]. Since the number of proteins detected by intact mass in a typical Top-Down experiment exceeds those identified by fragmentation, it would be advantageous to have software that can determine the presence of PTMs by detection of precursor mass and/or mass differences. To this end, the Giddings group has developed an MS calculator, PROCLAME, to produce putative PTMs found on proteins measured at TOF-type resolution [20]. Rapid MS detection of PTMs, by virtue of their mass shifts, would provide a high-throughput screen at FTMS-type resolution for targets to be identified and characterized by MS/MS.
In addition to rapid PTM detection, data visualization for intra-run and comparative studies will be important for profiling intact proteins on a large scale. Protein expression can be visualized using virtual 2-D maps of scan number and molecular mass for MS-determined intact precursors [21, 22] and recently, intact protein profiling has been demonstrated for MR-1 cell lysate proteins, with MS/MS of intact proteins not typically achieved [23–25]. Differential analyses that have been conducted with TOF instruments between various diseased/treated cell lines or different biological states are bound by an inability to identify the detected intact precursors without using a Bottom-Up approach [26]. With this strong background in protein visualization already established, the next clear step is to create software that can detect, map, and identify modified proteins.
This study describes several improvements for processing of Top-Down data streams. By building on our previous platforms such as cRAWler and ProSightPC [19, 27], which all have their basis in the early study of Senko et al. [28, 29], we can now handle most of the data idiosyncrasies obtained through high-throughput Top-Down Proteomics platforms currently in operation [6, 12]. A unified approach to intact protein detection algorithms for both high- and low-resolution spectra has been established with associated capacities for PTM finding and protein/PTM visualization. This expansive platform has been set up to provide the analysis tools necessary to assure that intact mass information is captured by high-throughput implementations of Top-Down MS.
2 Materials and methods
2.1 HeLa and yeast cellular manipulations
Cytosolic and nuclear extracts were prepared from HeLa S3 cells as described [30]. Yeast were grown to an OD600 of 0.6 in YPD media. Samples enriched in nuclear proteins such as histones were recovered from yeast cells using sulfuric acid extraction, as described previously [31]. The histone-enriched samples were subjected to RPLC, with MS monitoring, also as described previously [31].
2.2 Prefractionation of human proteins using solution IEF-Gel-Eluted Liquid Fraction Entrapment Electrophoresis
HeLa proteins (~2 mg) were reduced and alkylated, then precipitated with cold acetone and focused using an eight-channel solution IEF (sIEF) system as described previously [32]. Following IEF, selected fractions were precipitated using cold acetone and subsequently separated using multiplexed Gel-Eluted Liquid Fraction Entrapment Electrophoresis (GELFrEE) as described elsewhere [33]. Briefly, tubes were cast with polyacrylamide at 12% T for resolving gels (1 cm length) and 4% T for the stacking gels (300 μL volume). A constant voltage of 240 V was applied during separation, and fractions were collected for 1.5 h starting after the elution of the dye front. The electrolyte chambers of the device, as well as the void volume above the gel column, were completely filled with electrode running buffer (0.192 M glycine, 0.025 M Tris, 0.1% SDS) [34].
2.3 LC-tandem MS
Selected sIEF-GELFrEE fractions underwent SDS removal using a method described previously [35]. Methodology for the capillary-LC was recently described by our lab [6, 12]. In summary, 10 μL of the cleaned-up sample was injected onto a 10 cm × 75 μm id PLRP-S (5 μm particle size) capillary column (New Objective, Woburn, MA, USA) fitted with a 2 cm × 150 μm id trap column (same particles). A flow rate of 300 nL/min from an Eksigent 1D Plus system (Eksigent, Dublin, CA, USA) was used under the following gradient: 0 min, 5% B (ACN+0.2% formic acid); 5 min, 20% B; 50 min, 55% B; 55 min, 85% B; 60 min, 5% B. Samples were analyzed on a 12 Tesla linear trap quadrupole FT Ultra (Thermo Fisher Scientific, San Jose, CA, USA). MS parameters for sIEF-GELFrEE fractions containing proteins <25 kDa were similar to those described by Lee et al. [6], whereas fractions >25 kDa used MS parameters recently worked out by Vellaichamy et al. [12]. Protein identification using fragment ion masses was accomplished using ProSightPC (Thermo Fisher Scientific), with optional multiplexed searching engaged [36]. The Database Manager within ProSightPC was used to create yeast or human databases that were shotgun annotated [2] from UniProt data propagated via Swiss-Prot flat files. UniProt contained ~25 000 accession numbers from Swiss-Prot and ~75 000 entries from TrEMBL.
2.4 Intact mass detection from MS1 spectra
A modified version of Online Automation cRAWler [27] that we call cRAWler-Plus was used to analyze MS1 scans from Thermo Scientific .raw files. An algorithm, kDecon, was developed and included alongside the modified THRASH algorithm already present in the cRAWler application, such that kDecon processed charge state distributions from ITMS and THRASH interpreted isotopic distributions resolved by FTMS. kDecon was based on the Z-score algorithm [9] and expanded using an adapted noise reduction approach [37]. Before kDecon was run, an exponential moving average (EMA) was applied to the data points of the spectrum to reduce the level of noise. Every point below the EMA was discarded as noise. Additionally, an open source implementation of QuickHull (based on the Swing library from The Concord Consortium, Concord, MA) was integrated into our software platform as an extra noise filter [38]. The QuickHull code was further optimized for both faster run time and complete charge-state coverage.
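As an illustration only, the EMA pre-filter described above can be sketched in a few lines of Python. The function name `ema_filter`, the smoothing factor `alpha`, and the choice to seed the average with the first data point are assumptions made for this sketch; none of them is taken from the cRAWler-Plus code.

```python
def ema_filter(peaks, alpha=0.1):
    """Discard points that fall below a running exponential moving average.

    peaks: list of (mz, intensity) tuples ordered by m/z.
    alpha: smoothing factor in (0, 1]; a hypothetical value, not from the paper.
    Returns the (mz, intensity) tuples at or above the EMA.
    """
    kept = []
    ema = None
    for mz, intensity in peaks:
        # Seed with the first point, then weight toward the current point.
        if ema is None:
            ema = intensity
        else:
            ema = alpha * intensity + (1 - alpha) * ema
        # Every point below the EMA is treated as noise.
        if intensity >= ema:
            kept.append((mz, intensity))
    return kept
```

A larger `alpha` makes the average track the current data point more closely, which is the property cited in this section as the reason for preferring an EMA over a simple moving average.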
The output of kDecon is a mass that is the average of every peak in a given charge state, found by the equation

$$M = \frac{1}{n}\sum_{i=1}^{n} z_i \left(H_i - 1.0078\right) \qquad (1)$$

where H is the high m/z, n is the number of neighboring peaks, and z, the charge, is calculated as

$$z = \frac{L - 1.0078}{H - L} \qquad (2)$$

with L being the low m/z and 1.0078 being the mass of a proton in Da.
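For two adjacent peaks of one charge-state distribution, the higher-m/z peak carries charge z and the lower-m/z peak carries z + 1, which gives z = (L − 1.0078)/(H − L) and a neutral mass of z(H − 1.0078). A minimal Python sketch of this standard relation follows. It assumes that every consecutive charge state is present in the peak list and that no two peaks share an m/z value; the function names are hypothetical and the sketch is not the kDecon implementation.

```python
PROTON = 1.0078  # Da, the proton mass quoted in the text


def charge_from_pair(low_mz, high_mz):
    """Charge of the higher-m/z peak of two adjacent charge states."""
    return round((low_mz - PROTON) / (high_mz - low_mz))


def average_mass(mz_peaks):
    """Average the neutral mass over all neighboring peak pairs."""
    ordered = sorted(mz_peaks)
    if len(ordered) < 2:
        raise ValueError("need at least two peaks to infer a charge")
    masses = []
    for low, high in zip(ordered, ordered[1:]):
        z = charge_from_pair(low, high)
        masses.append(z * (high - PROTON))
    return sum(masses) / len(masses)
```

Averaging over many neighboring pairs dampens the effect of a single poorly centered peak, which matters at ion trap resolution.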
In cases of low abundance and high MW species, multiple scans (up to 2 min) were averaged to increase signal intensity. Detected peaks from THRASH and kDecon were added to an Automation Warehouse database for storage after the filtering and binning described previously [27].
kDecon iteratively matches peaks to find the highest scoring distribution, so decreasing the amount of noise increases the accuracy of the resulting detected masses. Although averaging the acquired spectra will increase the S/N to an extent, a preprocessing filter that limits the number of noise peaks entered into kDecon is an additional advantage. Noise reduction methods are essential because substantial baseline noise is often present in LC-IT spectra of intact proteins. Furthermore, the iterative matching of kDecon for more than one protein mass in a spectrum will have largely noisy results for high masses if noise reduction measures are not taken. With no data reduction, the noise peaks often overwhelm the signal peaks because there are 15 600 data points in each IT spectrum when acquired using our centroid mode setting with 0.0833 m/z increments between each point. When this occurs, more noise than signal is matched to a charge-state distribution by kDecon. A moving average was added to discard noise valleys while maintaining signal peaks. The EMA was chosen over a simple moving average because the EMA is weighted toward the current data points. The weighting allows for a better tracing of the multiply charged peaks present in IT spectra. As a first pass, the EMA computes the moving average of the data and removes all data points below the average. This prevents lower abundance peaks from interfering with kDecon.
For further noise reduction, QuickHull was run on the smoothed spectrum and retained only the highest intensity remaining peaks. QuickHull is a variation of the convex hull algorithm that removes internal points and therefore leaves only the outliers, which, here, are the highest intensity peaks that make up the boundary of the convex set [38]. The lowest intensity, lowest m/z peak is chosen by QuickHull first. Then the greatest intensity peak is picked, followed by the lowest intensity, highest m/z peak. The points form a triangle with all internal points being excluded. Multiple iterations of QuickHull are done to select a wide range of high-intensity peaks that encompass the majority of charge-state distributions present in each spectrum. After each iteration, the external peaks are added to a list. After all QuickHull iterations are completed, the kDecon algorithm then fits the list of remaining peaks into charge-state distributions. The distributions are scored by the function from the Z-score algorithm:
| (3) |
with n being the number of peaks in the distribution. The mass is assigned to the charge distribution receiving the highest score.
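The iterated hull-based peak selection can be approximated with the sketch below. Two substitutions are made and should be read as assumptions: the hull is computed with Andrew's monotone chain method rather than QuickHull (both yield the same convex hull), and only the upper boundary is kept, since the text describes the retained outliers as the highest intensity peaks. The function names and the `iterations` parameter are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of (mz, intensity) points, ordered by m/z."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    hull = []
    for p in pts:
        # Pop the last vertex while it lies on or below the new edge.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def peel_peaks(points, iterations=3):
    """Collect hull vertices over several passes, removing each layer."""
    remaining = sorted(set(points))
    selected = []
    for _ in range(iterations):
        if not remaining:
            break
        layer = upper_hull(remaining)
        selected.extend(layer)
        layer_set = set(layer)
        remaining = [p for p in remaining if p not in layer_set]
    return sorted(selected)
```

Each pass exposes the next tier of intense peaks once the previous boundary has been removed, which is one way to retain several overlapping charge-state envelopes rather than only the most abundant one.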
2.5 PTMcRAWler
Another new program, PTMcRAWler, was used to locate both singly- and multiply-modified protein forms. PTMcRAWler traverses a list of THRASH-inferred mass values from cRAWler-Plus, finding mass shifts that match theoretical PTMs according to a user-defined list. If two masses differ by a listed shift within a 0.1 Da tolerance and have identical charge, the pair is determined to be a match and is added to a PTM list for optional visualization (see Section 2.6). Using a THRASH S/N setting of 10:1 limits false positives and reliably finds PTMs.
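The pairwise Δm search can be illustrated with a short Python sketch. The monoisotopic shift values are standard, but the data layout (a list of `(mass, charge)` tuples), the function name `find_ptm_pairs`, and the charges used in the usage note are assumptions for illustration and do not describe the PTMcRAWler code itself.

```python
from itertools import combinations

# Monoisotopic mass shifts in Da for the three PTMs screened in this study.
PTM_SHIFTS = {
    "acetylation": 42.0106,
    "phosphorylation": 79.9663,
    "methylation": 14.0157,
}


def find_ptm_pairs(masses, shifts=PTM_SHIFTS, max_count=4, tol=0.1):
    """Return (lighter, heavier, ptm_name, count) for matching mass pairs.

    masses: list of (mass, charge) tuples; only equal-charge pairs are compared.
    max_count: largest number of identical modifications to test.
    tol: allowed deviation in Da (0.1 Da as stated in the text).
    """
    hits = []
    for (m1, z1), (m2, z2) in combinations(sorted(masses), 2):
        if z1 != z2:
            continue
        delta = m2 - m1
        for name, shift in shifts.items():
            for k in range(1, max_count + 1):
                if abs(delta - k * shift) <= tol:
                    hits.append((m1, m2, name, k))
    return hits
```

For example, `find_ptm_pairs([(9900.07, 9), (9980.06, 9)])` reports a single phosphorylation, since the two masses differ by 79.99 Da; the charge of 9 is a placeholder.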
2.6 In silico proteome visualization
Proteome Display was used for all protein heatmaps shown here [6]. The in-house program had several features added, including the ability to view PTM mass differences. Either a single modification or multiple modifications can be shown selectively on the PTM heatmap. Additionally, more than one .raw file can be visualized on a single map to allow comparisons of targeted LC-MS runs or enable large-scale proteome visualization schemes in one image.
3 Results and discussion
3.1 Integrated workflow
An overview of integrated data processing routines is shown in Fig. 1. Although general, it is augmented by using sample streams emerging from the GELFrEE-RPLC platform previously shown to achieve robust performance for Top-Down MS of complex mixtures [6, 12]. Resulting data files were analyzed by cRAWler, which applies THRASH to the high-resolution FTMS scans and kDecon to the lower-resolution ITMS scans (Fig. 1, top), and fills a database with the list of intact protein mass values. ProSightPC takes in fragment ions from MS/MS or nozzle skimmer and outputs identified proteins [19, 39, 40]. The processing supplies detected intact masses that can be merged with the ProSightPC output for enhanced identification confidence and precision (vide infra). Visualization of the detected masses (Fig. 1, bottom) allows for high level browsing for patterns of mass occurrences, with selective mapping of particular mass differences (Δm’s) also enabled (Fig. 5). Proteins of interest, such as those with putative PTMs requiring (better) localization, can be re-run through the pipeline for additional microcharacterization (Fig. 1, bottom left).
Figure 1.

Top-Down data pipeline for capturing detailed information on protein forms. Front end separation techniques generate samples for capillary-LCMS. The intact mass detection algorithms, THRASH and kDecon, are applied to the acquired data, and ProSight identifies proteins using fragment ions. Visual maps can be made from intact mass values and modifications discerned and selectively displayed using PTMcRAWler (Fig. 5). In an iterative process (dotted lines), protein targets can be re-subjected to sample creation for detailed molecular characterization not achieved in the first pass.
Figure 5.

Two data visualization software schemes that show either intact mass values directly (top) or PTM mass differences (middle three panels). The THRASH-detected proteins from fractionated human cell extracts were mapped according to mass and retention time (top image). The middle panels display the output of PTMcRAWler when set to search for phosphorylations, methylations, and acetylations, from left to right, respectively.
3.2 Detection of the high mass proteome
Proteome samples run via Top-Down introduce a large complexity created by adduction, natural modifications, and chemical noise, which is magnified at high mass where dozens of charge states fall in the m/z range of detection. As a result, FT detection is hindered by a lack of resolved isotopic distributions at baseline resolution on an LC-MS timescale. We have found that THRASH-based mass inferences above ~35 kDa at 12 Tesla become unreliable [12]. Therefore, for increased sensitivity, IT spectra are used to detect proteins of moderate size and larger. Using information based on the charge distributions from these IT scans, the kDecon algorithm generates average masses (as opposed to monoisotopic masses obtained from FTMS). The IT often required 20–50 μscans to obtain defined charge states for proteins in the 30–90 kDa range.
The ITMS scans from a GELFrEE fraction containing ~65–75 kDa proteins were analyzed by the kDecon algorithm. On an averaged spectrum from retention times 29–30 min (Fig. 2A, left side), the noise reduction routine within kDecon lowered the 15 600 initial peaks in the spectrum to the final trimmed spectrum of 97 peaks (Fig. 2A, center). The circles highlight peaks selected by kDecon that form the highest scoring charge-state distribution. These peaks have charges between 78+ and 93+ that resulted in a calculated average mass of 70 673 Da that was identified by ProSightPC as a glucose-regulated protein (70 592.8 Da) with an expectation value (E-value) of 5 × 10⁻⁹. Manual investigation of the averaged spectrum yielded the same mass as the calculated kDecon mass. Both kDecon and the manual analysis had a standard deviation of approximately 1300 Da for the distribution, with the large standard deviation hinting at a broad microheterogeneity for this protein. Although it is unrealistic to assume that low ppm accuracy for very high mass can be achieved with the current low resolution IT setup, a close estimate within 0.1–0.5% is readily obtained where charge states are observed at sufficient S/N levels and multiple components <100 Da apart are not present as semi-overlapping species.
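The figures quoted in this paragraph can be checked with a short calculation. The sketch below uses the standard relation m/z = (M + z × 1.0078)/z together with the masses and charges stated above; the function names are hypothetical.

```python
PROTON = 1.0078  # Da, the proton mass used elsewhere in this article


def mz_for_charge(mass, z):
    """Expected m/z of a protein of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def percent_error(observed, theoretical):
    """Absolute mass deviation as a percentage of the theoretical mass."""
    return 100.0 * abs(observed - theoretical) / theoretical


# Envelope of the 70 673 Da species across the reported 78+ to 93+ charges.
envelope = {z: mz_for_charge(70673.0, z) for z in range(78, 94)}

# Deviation of the kDecon mass from the database mass of the identified protein.
deviation = percent_error(70673.0, 70592.8)
```

Under these assumptions the envelope spans roughly m/z 761 to 907, and the deviation is about 0.11%, which sits at the lower end of the 0.1–0.5% range quoted above.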
Figure 2.
kDecon analysis of charge states in spectra from large protein ions, with the algorithm assisted by a priori knowledge of sample MW range by virtue of using the GELFrEE separation approach. (A) At left: spectra of a 70 670 Da protein from a GELFrEE fraction containing ~65–75 kDa proteins (according to a silver-stained gel). At center: remaining peaks after a noise reduction routine embedded in the kDecon algorithm was applied to the spectrum. The charge states (far right) selected by kDecon to determine the average mass value are shown. (B) A spectrum containing multiple protein forms is shown on the left side. The reduced spectrum (center) has the distribution of a 50 840 Da protein outlined with triangles while the other distribution of 50 200 Da has been circled. The inset shows the three semi-overlapping protein forms within 100 Da of 50 200 Da.
In a GELFrEE fraction corresponding to MW range 50–65 kDa (Supporting Information Fig. 1), kDecon reduced the data from an averaged scan of retention times 38–40 min, from 15 600 peaks to 284 peaks (Fig. 2B). From this smaller subset of peaks, multiple proteins were detected by kDecon from the spectrum on the left side of Fig. 2B. The following average masses (in Da) with standard deviations in parentheses were detected: 50 183 (174), 50 255 (265), 50 296 (235), and 50 840 (178). Manual calculation matched the results for both the 50 255 and 50 840 Da species. Differing slightly were the 50 183 and 50 296 Da species, with hand-calculated masses of 50 204 and 50 292 Da, respectively. Noticeably, the accuracy of kDecon as well as manual analysis is vastly improved compared to the 70 kDa example above. The higher mass proteins typically displayed more heterogeneity than lower mass counterparts and as such, suffered a loss in S/N, apparent resolution, and mass accuracy.
3.3 Precursor mass guides protein identifications
From the fragmentation data for the intact proteins of Fig. 3A (middle), ProSight generated 13 candidate hits in two groups corresponding to the two intact masses of 50 183 and 50 840 Da found by kDecon. These groups contained three and ten database entries with equal E-values based on the equal numbers of matching fragment ions. The intact masses from kDecon narrowed the pool of candidate gene products or isoforms to those with theoretical masses of 50 127.4 Da (P07437) and 50 820.4 Da (A8JZY9), respectively. For example, fragment ions from the 50 183 Da species matched three isoforms of β-5 and β-7 tubulin in UniProt which were 50, 36, and 27 kDa (Supporting Information Table 1). Here, the intact mass made the hit unique to the β-5 isoform of tubulin with the start Met retained and unacetylated. For the case with ten candidate hits (which were all α tubulins; Supporting Information Table 2), five had masses <50 kDa, and five were within 320 Da, with all of these lower than the 50 840 Da observed in the experiment. The closest, in terms of mass, was the A8JZY9 candidate, making this the most likely identification. Also detected in the intact mass spectrum were satellite peaks that were 72 and 113 Da higher than the 50 183 Da species (Fig. 3A, middle inset). Whether these are other isoforms or arise from artifactual modifications associated with protein MS requires higher (isotopic) resolution mass measurement, possibly by next generation FTMS.
Figure 3.
Protein isoforms determined automatically. (A) The 50 183 and 50 840 Da forms had precursor masses detected by kDecon that help affirm their existence even in the absence of bidirectional fragmentation. (B and C) Two forms of keratin, 48 344 and 59 008 Da, were detected and identified both with large numbers of N- and C-terminal fragment ions (far right).
In Figs. 3B and C, kDecon detected a mass of 59 008 Da, which was ~40 Da larger than the predicted mass of cyto-keratin-10 (P13645; 584 aa, KRT10) that was most consistent with the fragment ions observed from both termini (E-value of 1 × 10⁻³⁸). With the next gene product having an E-value at 3 × 10⁻¹, KRT10 is therefore the isoform identified in the experiment, with keratin typically contaminating most sensitive workflows in modern proteomics. Also, a related protein was identified unambiguously (E-value of 8 × 10⁻³⁵) in this “multiplexed” fragmentation experiment as cyto-keratin-17 (Q04695; 432 aa, KRT17) with a detected mass of 48 344 Da (~+40 Da apparent error). The next best match scored at 6 × 10⁻¹⁰ (still statistically significant), and was a keratin-related isoform (Q14666) that displayed C-terminal homology.
When both C- and N-terminal fragment ions are not present, protein mass detection from kDecon can facilitate determination of a single isoform in the database during LC-MS. Such was the case with the tubulin isoforms above. Drawing information from Swiss-Prot, there are six genes that encode β tubulins with >91% sequence identity and six genes that encode α tubulins, members of which display >89% sequence identity. In such cases, the combined selectivity of searching with fragment ion masses from a Top-Down experiment and a precursor mass sharply constricts the number of isoforms consistent with the data. These examples show multiplexed detection and gene-/isoform-specific identifications, a timely and open question in contemporary proteomics of higher eukaryotic systems. Such advantages are unique to Top-Down and provide an avenue to get around the “peptide inference” problem [41], where Bottom-Up identifications cannot be correlated to distinct isoforms or precise genes.
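The way a precursor mass narrows a set of equally scored fragment-ion hits, as in the tubulin example, can be sketched as a simple filter. The function name, the candidate tuple layout, and the 0.5% tolerance (taken from the upper end of the accuracy range quoted in Section 3.2) are assumptions, and this is not how ProSightPC combines the two sources of evidence internally.

```python
def rank_by_precursor(observed_mass, candidates, tol_percent=0.5):
    """Keep candidates whose theoretical mass is near the observed intact mass.

    candidates: list of (accession, theoretical_mass) tuples.
    tol_percent: allowed deviation as a percentage of the observed mass.
    Returns the surviving candidates, closest in mass first.
    """
    limit = observed_mass * tol_percent / 100.0
    close = [c for c in candidates if abs(c[1] - observed_mass) <= limit]
    return sorted(close, key=lambda c: abs(c[1] - observed_mass))
```

With the kDecon mass of 50 183 Da and three candidates of about 50, 36, and 27 kDa, only the ~50 kDa entry (P07437, 50 127.4 Da) survives the filter. The 36 and 27 kDa entries in such a call would be placeholders, because the text gives their masses only to the nearest kDa.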
3.4 PTMcRAWler for the low mass proteome
The previous section described our latest efforts in obtaining precursor mass information when IT scans are used for the high mass proteome. For proteins less than ~30 kDa, we continue to use the high-resolution FTMS data. In order to demonstrate this capacity, we utilized acid-extracted yeast samples, containing 100 proteins and enriched for the highly modified histones [31]. PTMcRAWler searched for one to four acetylations (Δm of +42, +84, +126, and +168 Da) on two LC-MS runs. Overall, 198 modifications were detected across the two MS runs for proteins >6 kDa. The yeast runs consisted of both wild type and rpd3 histone deacetylase (HDAC) knock-out strains. Mutants of rpd3 are over-acetylated compared to wild-type cells, due to the loss of the HDAC activity. Modifications were consistently seen on histone H4 (Fig. 4B), which served here as a positive control. The unmodified form was not detected, as expected given that yeast H4 is N-terminally acetylated [42]. Four additional acetylations of H4 can be observed (Fig. 4B) in the wild-type spectra as well as the rpd3 spectra. Also noticeable is the more abundant fifth acetylation on the rpd3 versus the wild-type species, presumably due to the lack of HDAC activity.
Figure 4.
Stathmin phosphorylation (human) and histone H4 acetylation (yeast) dynamics selectively detected by searching for specific Δm values using PTMcRAWler. (A) Phosphorylated stathmin was detected in asynchronous (left) and M-phase-arrested HeLa cells (right). Mass differences due to phosphorylation are shown in red. (B) H4 acetylation states (1–5 acetylations) detected from wild type (left) and rpd3 mutant, the HDAC knock-out, (right) are highlighted in blue.
A phosphorylation search by PTMcRAWler (set for one to four phosphorylations) was run on data collected from GELFrEE fractions from both asynchronous and M-phase-arrested human cells. This type of data filtering based on “delta m” values resulted in 18 putative phosphorylations being detected among the 553 THRASH-inferred masses. A look at the largest MW proteins with phosphorylations detected in both samples revealed a stathmin protein (17 201.9 Da) containing several modifications in a partially known hierarchy of site utilization [43, 44]. In Fig. 4A, an upregulation of di- and tri-phosphorylated forms can be seen in the M-phase-arrested sample compared to the asynchronous sample.
These examples show how potentially important biological cases can be found quickly using PTMcRAWler, which affords the ability to find PTMs on intact proteins with a confidence rooted in the high mass accuracy of FTMS-based measurement. Combined with MS/MS-based detection of PTMs by ProSightPC, we can now detect PTMs on proteins that are related through mass differences and detected with precursor masses alone. Having this capability automated is tremendously useful since the number of detected proteins far exceeds those that are identified. PTMcRAWler finds Δm values that correspond to modification mass shifts between THRASH-detected precursor masses. This method gives us the ability to rapidly screen for PTMs that can later be targeted in MS/MS experiments for full protein characterization (Fig. 1, bottom left). By finding modifications within different cell states, such as between asynchronous and treated samples, various changes to modification levels can be detected and visualized using the software described below.
3.5 Visualization of MS1-detected masses and PTMs
As sample preparation and MS methods continue to improve, protein visualization is a critical component of comparative studies and searching for patterns of dynamic PTMs. Protein MS needs to be harnessed to provide a high-resolution display that is akin to the traditional gel-staining protocols for protein visualization as spots or bands on 1-D or 2-D gels. To actualize this, Fig. 5 is generated as a 2-D heatmap of mass versus LC retention time from data obtained through eight Top-Down LC-FTMS runs of GELFrEE fractions from HeLa cells. Overall, 2632 non-unique masses >5 kDa were detected using THRASH. The well-defined bands are in stark contrast to the typical protein visualization provided by a silver stain of a complex mixture.
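A mass versus retention time map of this kind amounts to binning each detection into a grid cell and summing intensity per cell. The sketch below shows that step only; the bin widths, the function name, and the `(retention_time, mass, intensity)` tuple layout are assumptions and do not describe how Proteome Display is implemented.

```python
from collections import defaultdict


def bin_masses(detections, rt_bin=1.0, mass_bin=100.0):
    """Sum intensity into (retention-time bin, mass bin) cells.

    detections: iterable of (retention_time_min, mass_da, intensity) tuples.
    rt_bin: bin width along the retention time axis, in minutes.
    mass_bin: bin width along the mass axis, in Da.
    Returns a dict mapping (rt_index, mass_index) to summed intensity.
    """
    grid = defaultdict(float)
    for rt, mass, intensity in detections:
        cell = (int(rt // rt_bin), int(mass // mass_bin))
        grid[cell] += intensity
    return dict(grid)
```

The resulting dictionary can be handed to any plotting tool as a sparse image. Coarser `mass_bin` values merge co-eluting modified forms into a single band, whereas finer values keep them apart.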
Beyond mapping all the detected proteins, data interpretation can be done in ways quite different than Bottom-Up. The insets of Fig. 5 (middle panels) show visualized outputs from PTMcRAWler after being instructed to search for mass differences coinciding with 1–3 phosphorylations (left), 1–3 methylations (middle), and 1–3 acetylations (right). Among all of the THRASH-based detections in the eight Top-Down runs, PTMcRAWler found mass differences consistent with 78 methylations, 84 acetylations, and 36 phosphorylations. The insets at the bottom of Fig. 5 show example cases for each, with a phosphoprotein (bottom left) having a THRASH-inferred MW of 9980.06 Da and an unmodified MW of 9900.07 Da (Q9Y4Y9). The detected precursor containing a methylation (Δm = +14; bottom middle, Fig. 5) had a MW of 9703.99 Da, with an unmodified mass of 9689.98 Da (B4DT13). On the bottom right of Fig. 5, PTMcRAWler detected an acetylation (Δm = +42) with MW 10 191.06 Da and an unmodified mass of 10 149.04 Da (O75531). In each of these cases, a confident identification was obtained via ProSightPC searching. This type of visualization exploits the oft-observed co-elution of differentially-modified protein forms during RPLC. This kind of data processing, enabled by combining PTMcRAWler with visualization, allows a survey of large arrays of modified proteins and forms the basis of a scalable browsing environment for Top-Down Proteomics.
4 Concluding remarks
One benefit of having the intact mass of a protein is the ability to provide a precise identification from several related forms, which are otherwise ambiguous when identifications are based on a few fragment ions measured at high accuracy. We therefore developed the upgrades reported above to automate the detection, identification, and characterization of modified proteins. As full proteome runs become more robust for Top-Down MS, advanced software will be required to fully realize the conceptual advantages of Top-Down. Quantification of slightly different isoforms has precedent, but achieving this on the scale of several thousand mammalian proteins is the next big goal associated with this line of development.
As proteome coverage increases, so too will the need for improved mass spectrometer performance for high MW proteins. Of note, when fragment ions are matched to only one terminus in a Top-Down nozzle-skimmer experiment with no precursor mass detected [12], it is unknown whether the identified fragments derive from the full-length protein, an isoform, or a degraded portion of the protein. This issue is also encountered in Bottom-Up experiments since there is normally no intact precursor mass detected. Having accurate intact masses of large proteins will retain advantages of Top-Down, such as accurate precursor matches, to dissect related protein forms and provide the clarity captured under the rubric of “precision proteomics” [45].
Supplementary Material
Acknowledgments
We are very grateful to members of the Kelleher Research Group, including Paul Thomas, Haylee Thomas, Adaikkalam Vellaichamy, Ioanna Ntai, Cong Wu, and Dorothy Ahlf, and we acknowledge US federal government funding of this work under NIH/GM R01067193-07, NSF/DMS Award ID 0800631, and NIH/NIDA P30 DA 018310-05.
Abbreviations
- EMA: exponential moving average
- E-value: expectation value
- GELFrEE: Gel-Eluted Liquid Fraction Entrapment Electrophoresis
- HDAC: histone deacetylase
- IT: ion trap
- MW: molecular weight
- sIEF: solution IEF
- THRASH: thorough high-resolution analysis of spectra by Horn
Footnotes
The authors have declared no conflict of interest.
References
- 1. Kelleher NL. Top-down proteomics. Anal Chem. 2004;76:196A–203A.
- 2. Roth MJ, Forbes AJ, Boyne MT, Kim YB, et al. Precise and parallel characterization of coding polymorphisms, alternative splicing, and modifications in human proteins by mass spectrometry. Mol Cell Proteomics. 2005;4:1002–1008. doi: 10.1074/mcp.M500064-MCP200.
- 3. Pandey A, Mann M. Proteomics to study genes and genomes. Nature. 2000;405:837–846. doi: 10.1038/35015709.
- 4. Pesavento JJ, Mizzen CA, Kelleher NL. Quantitative analysis of modified proteins and their positional isomers by tandem mass spectrometry: human histone H4. Anal Chem. 2006;78:4271–4280. doi: 10.1021/ac0600050.
- 5. Meng FY, Cargile BJ, Patrie SM, Johnson JR, et al. Processing complex mixtures of intact proteins for direct analysis by mass spectrometry. Anal Chem. 2002;74:2923–2929. doi: 10.1021/ac020049i.
- 6. Lee JE, Kellie JF, Tran JC, Tipton JD, et al. A robust two-dimensional separation for top-down tandem mass spectrometry of the low-mass proteome. J Am Soc Mass Spectrom. 2009;20:2183–2191. doi: 10.1016/j.jasms.2009.08.001.
- 7. Covey TR, Bonner RF, Shushan BI, Henion J. The determination of protein, oligonucleotide and peptide molecular weights by ion-spray mass spectrometry. Rapid Commun Mass Spectrom. 1988;2:249–256. doi: 10.1002/rcm.1290021111.
- 8. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM. Electrospray ionization for mass-spectrometry of large biomolecules. Science. 1989;246:64–71. doi: 10.1126/science.2675315.
- 9. Zhang ZQ, Marshall AG. A universal algorithm for fast and automated charge state deconvolution of electrospray mass-to-charge ratio spectra. J Am Soc Mass Spectrom. 1998;9:225–233. doi: 10.1016/S1044-0305(97)00284-5.
- 10. Horn DM, Zubarev RA, McLafferty FW. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J Am Soc Mass Spectrom. 2000;11:320–332. doi: 10.1016/s1044-0305(99)00157-9.
- 11. Patrie SM, Ferguson JT, Robinson DE, Whipple D, et al. Top down mass spectrometry of <60-kDa proteins from Methanosarcina acetivorans using quadrupole FTMS with automated octopole collisionally activated dissociation. Mol Cell Proteomics. 2006;5:14–25. doi: 10.1074/mcp.M500219-MCP200.
- 12. Vellaichamy A, Tran JC, Catherman AD, Lee JE, et al. Size-sorting combined with improved nanocapillary liquid chromatography-mass spectrometry for identification of intact proteins up to 80 kDa. Anal Chem. 2010;82:1234–1244. doi: 10.1021/ac9021083.
- 13. Jensen ON. Interpreting the protein language using proteomics. Nat Rev Mol Cell Biol. 2006;7:391–403. doi: 10.1038/nrm1939.
- 14. Wilkins MR, Gasteiger E, Gooley AA, Herbert BR, et al. High-throughput mass spectrometric discovery of protein post-translational modifications. J Mol Biol. 1999;289:645–657. doi: 10.1006/jmbi.1999.2794.
- 15. Bailey CM, Sweet SMM, Cunningham DL, Zeller M, et al. SLoMo: automated site localization of modifications from ETD/ECD mass spectra. J Proteome Res. 2009;8:1965–1971. doi: 10.1021/pr800917p.
- 16. Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol. 2006;24:1285–1292. doi: 10.1038/nbt1240.
- 17. Tanner S, Payne SH, Dasari S, Shen Z, et al. Accurate annotation of peptide modifications through unrestrictive database search. J Proteome Res. 2008;7:170–181. doi: 10.1021/pr070444v.
- 18. Siuti N, Kelleher NL. Decoding protein modifications using top-down mass spectrometry. Nat Methods. 2007;4:817–821. doi: 10.1038/nmeth1097.
- 19. Zamdborg L, LeDuc RD, Glowacz KJ, Kim YB, et al. ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry. Nucleic Acids Res. 2007;35:W701–W706. doi: 10.1093/nar/gkm371.
- 20. Holmes MR, Giddings MC. Prediction of posttranslational modifications using intact-protein mass spectrometric data. Anal Chem. 2004;76:276–282. doi: 10.1021/ac034739d.
- 21. Simpson RJ, Dorow DS. Cancer proteomics: from signaling networks to tumor markers. Trends Biotechnol. 2001;19:S40–S48. doi: 10.1016/S0167-7799(01)01801-7.
- 22. Jensen PK, Pasa-Tolic L, Peden KK, Martinovic S, et al. Mass spectrometric detection for capillary isoelectric focusing separations of complex protein mixtures. Electrophoresis. 2000;21:1372–1380. doi: 10.1002/(SICI)1522-2683(20000401)21:7<1372::AID-ELPS1372>3.0.CO;2-Y.
- 23. Sharma S, Simpson DC, Tolic N, Jaitly N, et al. Proteomic profiling of intact proteins using WAX-RPLC 2-D separations and FTICR mass spectrometry. J Proteome Res. 2007;6:602–610. doi: 10.1021/pr060354a.
- 24. Wu S, Lourette NM, Tolic N, Zhao R, et al. An integrated top-down and bottom-up strategy for broadly characterizing protein isoforms and modifications. J Proteome Res. 2009;8:1347–1357. doi: 10.1021/pr800720d.
- 25. Zhou F, Hanson TE, Johnston MV. Intact protein profiling of Chlorobium tepidum by capillary isoelectric focusing, reversed-phase liquid chromatography, and mass spectrometry. Anal Chem. 2007;79:7145–7153. doi: 10.1021/ac071147c.
- 26. Zhao J, Zhu K, Lubman DM, Miller FR, et al. Proteomic analysis of estrogen response of premalignant human breast cells using a 2-D liquid separation/mass mapping technique. Proteomics. 2006;6:3847–3861. doi: 10.1002/pmic.200500195.
- 27. Wenger CD, Boyne MT, Ferguson JT, Robinson DE, Kelleher NL. Versatile online-offline engine for automated acquisition of high-resolution tandem mass spectra. Anal Chem. 2008;80:8055–8063. doi: 10.1021/ac8010704.
- 28. Senko MW, Beu SC, McLafferty FW. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J Am Soc Mass Spectrom. 1995;6:229–233. doi: 10.1016/1044-0305(95)00017-8.
- 29. Senko MW, Beu SC, McLafferty FW. Automated assignment of charge states from resolved isotopic peaks for multiply-charged ions. J Am Soc Mass Spectrom. 1995;6:52–56. doi: 10.1016/1044-0305(94)00091-D.
- 30. Trinkle-Mulcahy L, Boulon S, Lam YW, Urcia R, et al. Identifying specific protein interaction partners using quantitative mass spectrometry and bead proteomes. J Cell Biol. 2008;183:223–239. doi: 10.1083/jcb.200805092.
- 31. Jiang L, Smith JN, Anderson SL, Ma P, et al. Global assessment of combinatorial post-translational modification of core histones in yeast using contemporary mass spectrometry. J Biol Chem. 2007;282:27923–27934. doi: 10.1074/jbc.M704194200.
- 32. Tran JC, Doucette AA. Rapid and effective focusing in a carrier ampholyte solution isoelectric focusing system: a Proteome prefractionation tool. J Proteome Res. 2008;7:1761–1766. doi: 10.1021/pr700677u.
- 33. Tran JC, Doucette AA. Gel-eluted liquid fraction entrapment electrophoresis: an electrophoretic method for broad molecular weight range proteome separation. Anal Chem. 2008;80:1568–1573. doi: 10.1021/ac702197w.
- 34. Laemmli UK. Cleavage of structural proteins during assembly of head of bacteriophage-T4. Nature. 1970;227:680–685. doi: 10.1038/227680a0.
- 35. Wessel D, Flugge UI. A method for the quantitative recovery of protein in dilute-solution in the presence of detergents and lipids. Anal Biochem. 1984;138:141–143. doi: 10.1016/0003-2697(84)90782-6.
- 36. Boyne MT, Garcia BA, Li MX, Zamdborg L, et al. Tandem mass spectrometry with ultrahigh mass accuracy clarifies peptide identification by database retrieval. J Proteome Res. 2009;8:374–379. doi: 10.1021/pr800635m.
- 37. Johnson JR, Meng FY, Forbes AJ, Cargile BJ, Kelleher NL. Fourier-transform mass spectrometry for automated fragmentation and identification of 5–20 kDa proteins in mixtures. Electrophoresis. 2002;23:3217–3223. doi: 10.1002/1522-2683(200209)23:18<3217::AID-ELPS3217>3.0.CO;2-K.
- 38. Preparata FP, Shamos MI. Computational Geometry. Springer-Verlag; New York: 1985.
- 39. Taylor GK, Kim YB, Forbes AJ, Meng FY, et al. Web and database software for identification of intact proteins using “top down” mass spectrometry. Anal Chem. 2003;75:4081–4086. doi: 10.1021/ac0341721.
- 40. LeDuc RD, Taylor GK, Kim YB, Januszyk TE, et al. ProSight PTM: an integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res. 2004;32:W340–W345. doi: 10.1093/nar/gkh447.
- 41. Nesvizhskii AI, Aebersold R. Interpretation of shotgun proteomic data – the protein inference problem. Mol Cell Proteomics. 2005;4:1419–1440. doi: 10.1074/mcp.R500012-MCP200.
- 42. Pesavento JJ, Yang H, Kelleher NL, Mizzen CA. Certain and progressive methylation of histone H4 at lysine 20 during the cell cycle. Mol Cell Biol. 2008;28:468–486. doi: 10.1128/MCB.01517-07.
- 43. Belmont LD, Mitchison TJ. Identification of a protein that interacts with tubulin dimers and increases the catastrophe rate of microtubules. Cell. 1996;84:623–631. doi: 10.1016/s0092-8674(00)81037-5.
- 44. Larsson N, Melander H, Marklund U, Osterman O, Gullberg M. G2/M Transition requires multisite phosphorylation of oncoprotein-18 by 2 distinct protein-kinase systems. J Biol Chem. 1995;270:14175–14183. doi: 10.1074/jbc.270.23.14175.
- 45. Mann M, Kelleher NL. Precision proteomics: the case for high resolution and high mass accuracy. Proc Natl Acad Sci USA. 2008;105:18132–18138. doi: 10.1073/pnas.0800788105.