Molecular & Cellular Proteomics: MCP. 2014 Jun 2;13(9):2435–2449. doi: 10.1074/mcp.O113.037135

Tandem Mass Spectral Libraries of Peptides in Digests of Individual Proteins: Human Serum Albumin (HSA)

Qian Dong, Xinjian Yan, Lisa E Kilpatrick, Yuxue Liang, Yuri A Mirokhin, Jeri S Roth, Paul A Rudnick, Stephen E Stein
PMCID: PMC4159660  PMID: 24889059

Abstract

This work presents a method for creating a mass spectral library containing tandem spectra of identifiable peptide ions in the tryptic digest of a single protein. Human serum albumin (HSA) was selected for this purpose owing to its ubiquity, high level of characterization, and the availability of digest data. The underlying experimental data consisted of ∼3000 one-dimensional LC-ESI-MS/MS runs with ion-trap fragmentation. To generate a wide range of peptides, the studies covered a broad set of instrument and digestion conditions using multiple sources of HSA and trypsin. Computer methods were developed to enable the reliable identification and reference spectrum extraction of all peptide ions identifiable by current sequence search methods. This process made use of both MS2 (tandem) spectra and MS1 (electrospray) data. Identified spectra were generated for 2918 different peptide ions, using a variety of manually validated filters to ensure spectrum quality and identification reliability. The resulting library was composed of 10% conventional tryptic and 29% semitryptic peptide ions, along with 42% tryptic peptide ions with known or unknown modifications, which included both analytical artifacts and post-translational modifications (PTMs) present in the original HSA. The remaining 19% contained unexpected missed cleavages or were under- or overalkylated. The methods described can be extended to create equivalent spectral libraries for any target protein. In addition to their known advantages of speed and sensitivity, such libraries have a number of applications, including the ready re-identification of known PTMs, rejection of artifact spectra, and a means of assessing sample and digestion quality.


Shotgun proteomics is a widely used and evolving method for determining the protein composition of a biological mixture (1–3). It most often involves the digestion of denatured proteins by trypsin, followed by the identification of product peptides and the use of this information to infer protein identities and possibly targeted post-translational modifications (PTMs). However, because digestion is a highly complex chemical process, a large proportion of identifiable products are not specifically targeted and therefore remain invisible to the analysis. These include unexpected and unwanted peptides that interfere with the analysis. Others may contain modifications of biological origin, which, unless specifically targeted, can be lost among the forest of artifacts (4–6). This paper describes methods for building a tandem mass spectral library capable of characterizing all identifiable peptides in a tryptic digest of a selected protein. Spectral libraries are known to provide an effective way of reusing this information to quickly, reliably, and sensitively determine peptide identities (7–11). These identifications can serve several purposes, including 1) ensuring that all previously identified peptides are identified regardless of search engine settings, 2) tagging artifact peptides that might otherwise lead to false positive identifications, 3) ensuring the identification of known and identifiable biological post-translational modifications without explicitly looking for them, and 4) providing a list of artifact peptides for assessing the quality of the sample preparation process.

HSA was selected as the target protein for library development partly because of its ubiquity: it makes up >50% of the total protein in blood (12, 13) and is therefore found in many biological samples. It was also chosen because of the considerable background information available for its digestion products (14–19). However, despite long-standing interest in this protein (20, 21), a thorough determination of its digestion products has not been reported. HSA is composed of 585 amino acids and yields a wide range of tryptic peptides, including many with missed or irregular cleavages and a variety of both native and analytical modifications. At first sight, the analysis of just one protein may appear straightforward, because it is common practice in proteomics to search for thousands of proteins in a biological sample. However, a thorough analytical characterization of HSA peptide ions requires a very different approach. It must deal with a wide diversity of digestion products, many of which cannot be predicted in advance and whose relative concentrations are likely to depend on complex chemical processes that cannot be fully controlled. Products include peptides with missed and irregular cleavages, under- or overalkylation, unexpectedly high and low charge states, and an uncertain number of modifications, including unknown modifications (i.e., so-called blind modifications (22, 23)). Furthermore, identifying such peptides is prone to error through accidental “homologies” (two different peptides yielding an overlapping set of y/b ions). Including these variant peptides leads to a dramatic increase in the number of both true and false HSA peptide identifications, compared with searches for the commonly sought tryptic peptides (24, 25), at a given score threshold.
This paper describes a series of methods designed to first produce all possible identifications and then to reject false identifications using a variety of filters to generate a reliable and comprehensive library of reference spectra for a single protein.

Experimental and Computational Procedures

Experimental Methods and Data Sources

Most of the mass spectral data used for building the HSA library came from 2035 LTQ runs and 522 LTQ/Orbitrap runs (Thermo Fisher Scientific, San Jose, CA, see Disclaimer). Many of these were generated for two studies examining digestion variability (26, 27). These served to generate peptides over a wide range of conditions and HSA sources, including 12 HSA samples from five vendors, eight sources of trypsin, and a range of denaturing and digestion conditions. High temperature (90 °C) and urea (6 M) were the most commonly used denaturing conditions. Most commonly, dithiothreitol (DTT) was the reducing agent, iodoacetamide (IAA) the alkylating agent, and tris(hydroxymethyl)aminomethane (TRIS) the buffer. The concentrations of these reagents were varied, as were those of HSA and trypsin. Other runs employed organic denaturants or no denaturant, cleavable surfactants, tris(2-carboxyethyl)phosphine (TCEP) as a reducing agent, and widely varying digestion times (5 min to 2 days). Also included were 355 runs of digests of a plasma-like protein mix from the NIH/NCI-supported Clinical Proteomic Technology Assessment for Cancer (CPTAC) program (http://proteomics.cancer.gov/programs/CPTAC/), comprising 200 LTQ and 155 LTQ/Orbitrap runs (28–30). Some 122 spectra from the NIST Human library were also included (described later).

Initial Peptide Identifications

The method developed for building this single-protein spectral library was derived from the methods currently used for building the NIST tandem mass spectral libraries of tryptic peptides from digests of biological protein samples (31, 32). As in that earlier work, initial identifications were made from ion-trap fragmentation spectra of tryptic digests using four sequence search engines (OMSSA (33), X!Tandem (34), Comet (35), and ProteinProspector (36)), but here with a FASTA file containing only the HSA sequence (see Supplemental Table S1) and its reverse. Reliably identifying both long, highly charged peptides and peptides carrying a wide range of modifications required two separate sets of searches. Otherwise, incorrect high-scoring semitryptic peptides with unusual modifications could overwhelm correct identifications of conventional tryptic peptides, especially those with multiple missed cleavages. The first search allowed up to two missed cleavages, charge states up to 4, and one nontryptic terminus (semitryptic), and included a list of 22 categories of HSA-targeted modifications (16 in Table IV and 6 in Table V). The second search allowed up to four missed cleavages and charge states up to 6, did not allow semitryptic peptides, and permitted only common modifications (variable cysteine alkylation, methionine oxidation, ammonia loss from N-terminal Gln and carbamidomethyl-Cys, and water loss from N-terminal Glu). The results of these searches were merged. To find unidentified modifications, two additional search engines, InSpect (37) and TagRecon (38), were used to identify single, untargeted modifications with mass shifts between −300 and +300 Da at specific residues. The list of 22 specified modifications described above was partly built by examining and assigning some of these identifications. Precursor and fragment tolerances of 0.2 m/z and 0.8 m/z, respectively, were used at this stage.
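The effect of missed-cleavage settings on the search space can be illustrated with a minimal in-silico digest. The Python sketch below enumerates fully tryptic peptides with up to a chosen number of missed cleavages and builds the reversed decoy sequence used for target-decoy searching. The function names are hypothetical, and the rule that trypsin does not cleave before proline is a common convention assumed here rather than a setting stated above; semitryptic and modified forms, which the first search also allowed, would enlarge the candidate list much further.

```python
from typing import List


def cleavage_sites(seq: str) -> List[int]:
    """Boundaries for tryptic peptides: after K/R (assumed: not before P), plus both termini."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    return [0] + sites + [len(seq)]


def tryptic_peptides(seq: str, max_missed: int = 2, min_len: int = 1) -> List[str]:
    """Enumerate fully tryptic peptides with up to max_missed missed cleavages."""
    bounds = cleavage_sites(seq)
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 internal cleavage sites are skipped (missed) in seq[bounds[i]:bounds[j]].
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            pep = seq[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides


def decoy(seq: str) -> str:
    """Reversed sequence, searched alongside the forward (target) sequence."""
    return seq[::-1]
```

For example, `tryptic_peptides("DAHKSEVAHRFKDLGEENFK", max_missed=1)` includes FKDLGEENFK (one missed cleavage) alongside the fully cleaved FK and DLGEENFK.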

Table IV. Sixteen categories of analytical modifications sorted by percent of total ions (the Arg and Lys transpeptidation adducts form a single category).
Modification label | Delta mass (Da) | Modified sites | Modified ions | % of ions | % of total MRAB
Oxidation | +15.9949 | M, H, W | 145 | 5.0 | 1.27
Carbamyl | +43.0058 | N-terminus, K, T, M | 121 | 4.2 | 2.26 [a]
Formyl | +27.9949 | N-terminus, K, S, T | 112 | 3.8 | 0.54
Cation:Na | +21.9819 | D, E | 89 | 3.1 | 0.50
Cation:Fe[II] | +53.9193 | E … [b] | 77 | 2.6 | 0.91
Cation:Ca[II] | +37.9469 | E … [b] | 58 | 2.0 | 1.71
Dehydrated | −18.0106 | D, S, T | 54 | 1.9 | 0.37
Arg [c] | +156.1011 | N- or C-terminus | 45 | 1.5 | 0.08
Lys [c] | +128.0950 | N- or C-terminus | 8 | 0.3 | 0.01
Gln->pyro-Glu | −17.0265 | Q at N-terminus | 46 | 1.6 | 1.73
Methyl | +14.0157 | K, H | 43 | 1.5 | 0.21
Pyro-carbamidomethyl | +39.9949 | C at N-terminus | 31 | 1.1 | 1.64
Glu->pyro-Glu | −18.0106 | E at N-terminus | 19 | 0.7 | 0.06
Deamidated | +0.9840 | N, Q | 4 | 0.1 | 0.01
Vicidisulfide [d] | −2.0157 | C-C | 3 | 0.1 | 0.01
Dioxidation | +31.9898 | W | 2 | 0.1 | 0.004
Delta:H(2)C(2) [e] | +26.0157 | N-terminus | 1 | 0.03 | 0.0003

a Includes only runs with urea as denaturant.

b Our data revealed that Fe and Ca adducts can also be attached to many other residues, such as L, G, S, T, P, and V.

c Addition of arginine or lysine at the N- or C-terminus due to transpeptidation catalyzed by trypsin.

d Vicinal disulfide: an internal disulfide observed between several pairs of adjacent cysteines in HSA. It was observed only in runs without a reducing agent.

e Formation of a Schiff base at the N-terminus; see Reference 60.

Table V. Six categories of posttranslational modifications (PTMs) identified in HSA. The modification is given in parentheses after the modified residue. MRAB, median relative abundance; PIIF, peptide ion identification frequency; Cysteinyl, cysteinylation; Cys34 oxidation adducts, +2O, +3O, or +O and −2H; Acetyl, acetylation; Hex, glycation; Phospho, phosphorylation.
Category | PTM | m/z | z | Peptide sequence | Modified site | Delta mass (Da) | MRAB | PIIF [a]
1 | Cysteinylation | 1276.638 | 2 | ALVLIAFAQYLQQC(Cysteinyl)PFEDHVK | Cys34 | 119.00 | 0.0119 | 0.25
1 | Cysteinylation | 851.428 | 3 | ALVLIAFAQYLQQC(Cysteinyl)PFEDHVK | Cys34 | 119.00 | 0.2585 | 0.58
1 | Cysteinylation | 638.823 | 4 | ALVLIAFAQYLQQC(Cysteinyl)PFEDHVK | Cys34 | 119.00 | 0.1753 | 0.46
1 | Cysteinylation | 871.929 | 4 | DLGEENFKALVLIAFAQYLQQC(Cysteinyl)PFEDHVK | Cys34 | 119.00 | 0.0245 | 0.46
2 [b] | Sulfinic acid | 822.427 | 3 | ALVLIAFAQYLQQC(+2O)PFEDHVK | Cys34 | 31.99 | 0.0066 | 0.17
2 | Sulfonic acid | 1241.136 | 2 | ALVLIAFAQYLQQC(+3O)PFEDHVK | Cys34 | 47.99 | 0.0009 | 0.17
2 | Sulfonic acid | 827.767 | 3 | ALVLIAFAQYLQQC(+3O)PFEDHVK | Cys34 | 47.99 | 0.0050 | 0.12
2 | Sulfinamide | 816.760 | 3 | ALVLIAFAQYLQQC(+O,-2H)PFEDHVK | Cys34 | 13.98 | 0.0002 | 0.01
3 | Truncation | 963.512 | 1 | (-DA)HKSEVAHR | N-term | −186.06 | 0.0003 | 0.05
3 | Truncation | 482.260 | 2 | (-DA)HKSEVAHR | N-term | −186.06 | 0.0094 | 0.07
3 | Truncation | 900.515 | 1 | LVAASQAALG(-L) | C-term | −113.08 | 0.0171 | 0.62
3 | Truncation | 450.762 | 2 | LVAASQAALG(-L) | C-term | −113.08 | 0.0205 | 0.44
4 | Glycation | 931.082 | 3 | LVNEVTEFAK(Hex)TCVADESAENCDK | Lys51 | 162.05 | 0.0022 | 0.29
4 | Glycation | 605.304 | 3 | AEFAEVSK(Hex)LVTDLTK | Lys234 | 162.05 | 0.0122 | 0.53
4 | Glycation | 736.388 | 3 | VFDEFK(Hex)PLVEEPQNLIK | Lys378 | 162.05 | 0.0016 | 0.26
4 | Glycation | 430.923 | 3 | K(Hex)QTALVELVK | Lys525 | 162.05 | 0.0117 | 0.39
5 | Acetylation | 989.545 | 1 | LK(Acetyl)CASLQK [c] | Lys199 | 42.01 | 0.0001 | 0.01
5 | Acetylation | 495.277 | 2 | LK(Acetyl)CASLQK [c] | Lys199 | 42.01 | 0.0121 | 0.82
5 | Acetylation | 585.859 | 2 | K(Acetyl)QTALVELVK | Lys525 | 42.01 | 0.0003 | 0.07
6 | Phosphorylation | 789.776 | 2 | TCVADES(Phospho)AENCDK [c] | Ser58 | 79.97 | 0.0001 | 0.03
6 | Phosphorylation | 860.456 | 2 | KVPQVST(Phospho)PTLVEVSR | Thr420 | 79.97 | 0.0005 | 0.15
6 | Phosphorylation | 573.973 | 3 | KVPQVST(Phospho)PTLVEVSR | Thr420 | 79.97 | 0.0008 | 0.16

a PIIF was calculated (i) for cysteinylation using 24 LTQ non-reducing runs, (ii) for Cys34 oxidation, N- or C-terminal truncation, glycation, and acetylation using 350 LTQ-Orbitrap runs, and (iii) for phosphorylation using 170 LTQ-Orbitrap runs in CPTAC studies (26, 27).

b Category 2, Cys34 oxidation, has three oxidized forms (sulfinic/sulfonic acid and sulfinamide).

c All cysteines in categories 4–6 are alkylated.
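The m/z values in Table V follow from the peptide's monoisotopic mass plus the listed delta mass. The sketch below recomputes them from standard monoisotopic residue masses; the constants and function names are this sketch's own, and it assumes every cysteine is carbamidomethylated unless the caller says otherwise (a cysteinylated Cys34, for instance, carries no carbamidomethyl group).

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O for the free peptide termini
PROTON = 1.007276            # mass of one proton
CARBAMIDOMETHYL = 57.021464  # iodoacetamide alkylation of Cys


def peptide_mass(seq: str, delta: float = 0.0, alkylated_cys: bool = True) -> float:
    """Neutral monoisotopic mass of a peptide plus an optional modification delta."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER + delta
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * seq.count("C")
    return mass


def mz(seq: str, charge: int, delta: float = 0.0, alkylated_cys: bool = True) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq, delta, alkylated_cys) + charge * PROTON) / charge
```

With these values, `mz("ALVLIAFAQYLQQCPFEDHVK", 3, delta=119.0041, alkylated_cys=False)` comes out at about 851.427, within a few thousandths of the 851.428 listed for the triply charged cysteinylated ion.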

Scores from each of the search engines were normalized using the results of searching a combined HSA forward and reversed sequence database. This method refined scores using fractions of unassigned fragment abundance and peptide classes. Tentative identifications were determined at a formal 5% false discovery rate (FDR) using a target-decoy approach (39). Owing to the large variety of peptides allowed, even this single protein generated enough decoy hits to set a statistically meaningful FDR. Manual examination showed that the computed score threshold was low enough not to miss any of the conventional peptides expected from HSA digestion. Note, however, that the actual FDR was far higher than 5% because of the wide search space employed and the consequent generation of many false “homologous” peptide identifications.
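One common way to turn target and decoy hits into a score threshold is sketched below: hits are sorted by score, the decoy-to-target ratio is tracked as the threshold is lowered, and the lowest score at which the ratio stays within the target FDR is kept. This is an assumption about the general technique, not the normalization or FDR estimator used by the authors or by reference 39, and ties between equal scores are not handled specially.

```python
from typing import List, Optional, Tuple


def score_threshold(hits: List[Tuple[float, bool]], max_fdr: float = 0.05) -> Optional[float]:
    """Lowest score at which the estimated FDR (decoys / targets) is <= max_fdr.

    hits: (score, is_decoy) pairs from a search against forward and reversed
    sequences. Returns None if no threshold qualifies.
    """
    threshold = None
    targets = decoys = 0
    # Walk from the best score downward, lowering the threshold one hit at a time.
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```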

Filters

The wide peptide search space generated a large number of incorrect identifications at search scores appropriate for reliable identification of conventional tryptic peptides. Ideally, scores would depend on the “prior probability” (40) that a particular variety of peptide ion is present in the digest, but present methods do not do this. Rejecting these unusual and less predictable peptides therefore requires post-processing. To some degree, this was done by adjusting the scores of certain classes of peptides (31, 32), but this was found to be inadequate for the wide range of modifications considered here. Therefore, a general peptide classification scheme, along with a series of five quality filters and one flag, was developed. These are summarized in Table I, which gives the name of each filter, the type of data it uses, its specifics, and its rejection threshold. A description of the peptide classification method and each of the filters follows.

Table I. Five quality filters and one flag used for quality assessment of HSA peptide ion spectra.
Filter | Name | Data type | Description | Rejection threshold
1 | Ion significance | MS1 | Median relative abundance (MRAB) and peptide ion identification frequency (PIIF) | MRAB = 0 or PIIF ≤ 0.01
2 | m/z error | MS1 | Actual and absolute median m/z deviation | ≥0.25 m/z for LTQ; ≥5 ppm for Orbitrap
3 | Unidentified fragment ions | MS2 | Unassigned abundance (subfilter 1); unassigned abundance and numbers of peaks (subfilter 2) | Subfilter 1 ≥ 0.32; subfilter 2 ≥ 0.36
4 | Insufficient ions above the precursor m/z | MS2 | Fraction of the largest 20 fragment ions above the precursor m/z | ≤0.2 for charge 2, ≤0.3 for charge 3, ≤0.36 for charge states higher than 3
5 | Principal charge state | Peptide charge assignment | Number of basic residues (NBR) and charge state (CS) | NBR − CS > 0
Flag | Name | Data type | Description | Flagging threshold
1 | Gaps in charge state distribution | Peptide charge assignment | Charge states of a given peptide | Gap in the charge states

Peptide Classification

To exclude the most improbable peptides, peptides were first separated into two broad classes, common and unusual. Common peptides are those expected from digestion and most commonly sought in sequence identification searches. Briefly, these include tryptic peptides with normal missed cleavages (near acidic groups or a terminus), Met/Trp oxidation, and loss of ammonia from N-terminal Cys or Gln. In-source semitryptic peptides that co-elute with their precursor peptide are also expected, as is the alkylation of all cysteines. All other peptides are classified as “unusual.” Peptides that contain features of two or more unusual classes or modifications are rejected.
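The classification rule can be expressed as a count of unusual features. In the hypothetical sketch below, each peptide is described by a list of feature labels (the label names are invented for illustration); any label outside the common set counts as unusual, and two or more unusual features lead to rejection. Cysteine alkylation, being expected, is simply not listed as a feature.

```python
from typing import Iterable

# Hypothetical labels for the features the text treats as expected ("common").
COMMON_FEATURES = {
    "expected_missed_cleavage",
    "met_oxidation",
    "trp_oxidation",
    "nterm_cys_ammonia_loss",
    "nterm_gln_ammonia_loss",
    "in_source_semitryptic",
}


def classify(features: Iterable[str]) -> str:
    """Return 'common', 'unusual', or 'reject' for one peptide ion.

    Every feature not in COMMON_FEATURES counts once as unusual, so two
    unusual modifications on the same peptide also trigger rejection.
    """
    n_unusual = sum(1 for f in features if f not in COMMON_FEATURES)
    if n_unusual >= 2:
        return "reject"
    return "unusual" if n_unusual == 1 else "common"
```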

Filter 1: Peptide Ion Significance

This filter rejects identifications with weak signals that occur rarely. It uses two derived values, the median relative abundance (MRAB) and the peptide ion identification frequency (PIIF). The MRAB of each ion was extracted from the raw data by ProMS, an LC-MS/MS ion perception and annotation program developed at NIST and used in the NIST MSQC Pipeline (30, 41). The abundance of each identified ion was determined from extracted ion chromatograms (XIC). For high resolution data, individual isotopic peaks were summed, whereas for low resolution (LTQ) data, in which isotopic peaks are not resolved, signal was summed within a defined window (−0.6 to +1.6) around the m/z calculated from the ion's average mass, a range that generally spans its isotopic components. Relative abundance was then derived by dividing this abundance by that of the largest identified ion in the same run. MRAB is the median of the relative abundance values from all LC-MS runs in which the ion was identified. If a precursor peak could not be found, its abundance was set to zero. PIIF is simply the fraction of runs in which an ion was identified, excluding special cases such as nonalkylated runs. These two values were computed separately for LTQ and LTQ-Orbitrap data. Filtering used LTQ-Orbitrap values when available and LTQ values when identifications were made only on those low resolution instruments.
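A minimal sketch of the two Filter 1 quantities follows, assuming the XIC abundance of every identified ion has already been extracted for each run (the ProMS extraction itself is not modeled, and the data layout is hypothetical). Relative abundance is taken against the largest identified ion in the same run, MRAB is the median over the runs in which the ion was identified, and PIIF is the fraction of runs in which it was identified; the caller is assumed to have already excluded special runs such as nonalkylated ones. The rejection test uses the Table I thresholds.

```python
from statistics import median
from typing import Dict, List, Tuple


def mrab_piif(runs: List[Dict[str, float]], ion: str) -> Tuple[float, float]:
    """runs: one dict per LC-MS/MS run, mapping each identified ion to its XIC
    abundance (0.0 when no precursor peak was found)."""
    rel = []
    for run in runs:
        if ion in run:
            top = max(run.values())  # largest identified ion in this run
            rel.append(run[ion] / top if top > 0 else 0.0)
    mrab = median(rel) if rel else 0.0
    piif = len(rel) / len(runs) if runs else 0.0
    return mrab, piif


def passes_filter1(mrab: float, piif: float) -> bool:
    """Reject when MRAB = 0 or PIIF <= 0.01 (Table I)."""
    return mrab > 0 and piif > 0.01
```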

Filter 2: m/z Error

The difference between the observed and theoretical m/z of each identified ion served as a filter. The m/z of each peptide ion in a run was taken as its intensity-weighted monoisotopic m/z averaged over its elution profile. Each value was corrected for instrument bias by linear regression of these deviations against m/z, based on the confident identifications. Median absolute m/z deviations were then computed. Identifications made from Orbitrap spectra were rejected when these median deviations exceeded 5 ppm, whereas identifications made only on low resolution instruments (ion trap m/z determination) were rejected when these deviations exceeded 0.25 m/z.
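The bias correction and median test can be sketched as follows. The sketch fits an ordinary least-squares line of deviation against m/z for one run's confident identifications, subtracts that line from each deviation, and applies the thresholds from the prose (reject above 5 ppm for Orbitrap data, above 0.25 m/z otherwise; Table I writes these with ≥, so the exact boundary is ambiguous). Evaluating the bias at the theoretical m/z, and the function names, are assumptions of this sketch.

```python
from statistics import median
from typing import List, Tuple


def fit_bias(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of deviation = slope * m/z + intercept.

    points: (theoretical m/z, observed - theoretical m/z) for confident IDs in one run.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def corrected_deviation(observed: float, theoretical: float,
                        slope: float, intercept: float) -> float:
    """Observed minus theoretical m/z, with the run's fitted bias removed."""
    return (observed - theoretical) - (slope * theoretical + intercept)


def passes_filter2(corrected: List[float], theoretical: float, orbitrap: bool) -> bool:
    """Median absolute corrected deviation across runs, against the text's thresholds."""
    med = median([abs(d) for d in corrected])
    if orbitrap:
        return med / theoretical * 1e6 <= 5.0
    return med <= 0.25
```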

Filter 3: Unidentified Fragment Ions

The presence of significant fragment ions that cannot be traced to known fragmentation paths suggests either that the spectrum was contaminated with co-fragmenting ions or that the identification was erroneous. In the NIST human ion trap library, the median percentage of unidentified abundance in a spectrum was 8% and the percentage of unidentified peaks was 15%. Examination of questionable spectra led to the development of a filter that used both abundances and numbers of peaks. Subfilter 1 was the geometric mean of the fraction of unassigned abundance for the 20 most abundant peaks and for all peaks. Subfilter 2 added to this value the geometric mean of the fraction of unassigned peaks (by number) among the 20 most abundant peaks and among all peaks. A spectrum was rejected if subfilters 1 and 2 both exceeded their thresholds of 0.32 and 0.36, respectively. Neutral losses from the precursor were excluded from these calculations, and small peptides shorter than six residues were not subject to this filter.
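Because the description of subfilter 2 can be read more than one way, the sketch below should be taken as one interpretation: subfilter 1 is the geometric mean of the unassigned-abundance fractions among the 20 most abundant peaks and among all peaks, subfilter 2 adds to it the geometric mean of the corresponding unassigned peak-count fractions, and a spectrum is rejected only when both exceed their thresholds. Peaks are assumed to arrive already annotated and with precursor neutral losses removed; the data layout and names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Peak = Tuple[float, float, bool]  # (m/z, intensity, assigned to a known fragment?)


def _unassigned_fractions(peaks: List[Peak]) -> Tuple[float, float]:
    """Unassigned fraction by abundance and by peak count."""
    total = sum(i for _, i, _ in peaks)
    abund = sum(i for _, i, a in peaks if not a) / total if total else 0.0
    count = sum(1 for _, _, a in peaks if not a) / len(peaks) if peaks else 0.0
    return abund, count


def subfilters(peaks: List[Peak], top_n: int = 20) -> Tuple[float, float]:
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    abund_top, count_top = _unassigned_fractions(top)
    abund_all, count_all = _unassigned_fractions(peaks)
    s1 = sqrt(abund_top * abund_all)        # geometric mean, abundance-based
    s2 = s1 + sqrt(count_top * count_all)   # assumption: "added" read as a sum
    return s1, s2


def passes_filter3(peaks: List[Peak], peptide_length: int) -> bool:
    if peptide_length < 6:
        return True  # short peptides are exempt
    s1, s2 = subfilters(peaks)
    return not (s1 > 0.32 and s2 > 0.36)
```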

Filter 4: Sufficient Ions above the Precursor m/z

Multiply charged peptides are generally expected to produce significant product ions above the precursor m/z, because singly charged fragments carrying a large part of the sequence have m/z values higher than that of the multiply charged precursor. Moreover, a common feature of some questionable identifications was the presence of little signal above the precursor m/z. Based on examination of spectra and findings from the NIST human ion trap library, spectra were rejected when the fraction of the 20 largest fragment ions (excluding neutral losses from the precursor) lying above the precursor m/z was less than 0.2 for charge 2, 0.3 for charge 3, or 0.36 for charge states higher than 3.
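A minimal sketch of Filter 4 follows, assuming peaks are (m/z, intensity) pairs with precursor neutral losses already removed and that the fraction is counted over peaks rather than weighted by intensity. It follows the prose (reject when the fraction is less than the threshold); Table I lists the thresholds with ≤, so the boundary case is ambiguous. The function names are hypothetical.

```python
from typing import List, Tuple


def fraction_above_precursor(peaks: List[Tuple[float, float]], precursor_mz: float,
                             top_n: int = 20) -> float:
    """Fraction of the top_n most intense peaks whose m/z exceeds the precursor m/z."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    if not top:
        return 0.0
    return sum(1 for mz, _ in top if mz > precursor_mz) / len(top)


def passes_filter4(peaks: List[Tuple[float, float]], precursor_mz: float, charge: int) -> bool:
    if charge < 2:
        return True  # the filter concerns multiply charged precursors only
    minimum = {2: 0.2, 3: 0.3}.get(charge, 0.36)
    return fraction_above_precursor(peaks, precursor_mz) >= minimum
```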

Filter 5: Principal Charge State

A significant fraction of the abundance of most tryptic peptides appears in the peptide ion whose charge state equals the number of basic residues (NBR; counting Arg, Lys, His, and the N-terminal amine) (42). Relatively little signal is typically carried by charge states differing from this value by more than one. This behavior was confirmed for predominant tryptic peptides and for peptides observed in multiple charge states. Therefore, peptides identified in only one charge state, constituting about 75% of identified HSA peptides, were rejected if their charge state did not match the NBR, with the following exceptions. When basic groups were adjacent, one lower charge state was permitted for each such pair (43, 44). Because of possible long range interactions and the involvement of less basic peptides (42), peptides longer than 20 residues containing multiple basic sites were not subject to this filter.
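The charge-state rule can be sketched as below. NBR counts Arg, Lys, and His plus one for the N-terminal amine, as in the text. How adjacent basic pairs are counted (non-overlapping pairs of neighboring Arg/Lys/His residues) and what counts as "multiple basic sites" for the long-peptide exemption (two or more basic residues) are assumptions of this sketch. It follows the prose, which accepts only charges from NBR down to NBR minus the number of adjacent pairs; Table I states the rejection condition as NBR − CS > 0.

```python
BASIC = set("RKH")


def nbr(seq: str) -> int:
    """Number of basic residues (Arg, Lys, His) plus one for the N-terminal amine."""
    return 1 + sum(aa in BASIC for aa in seq)


def adjacent_basic_pairs(seq: str) -> int:
    """Count non-overlapping pairs of neighboring basic residues (an assumption)."""
    pairs, i = 0, 0
    while i < len(seq) - 1:
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            pairs += 1
            i += 2
        else:
            i += 1
    return pairs


def passes_filter5(seq: str, charge: int) -> bool:
    """Applies to peptides identified in a single charge state only."""
    if len(seq) > 20 and sum(aa in BASIC for aa in seq) > 1:
        return True  # long peptides with multiple basic sites are exempt
    top = nbr(seq)
    return top - adjacent_basic_pairs(seq) <= charge <= top
```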

Flag: Gaps in Charge State Distribution

When a peptide is identified in multiple charge states, all charge states between its minimum and maximum charge are expected to be identified as well.

Any gaps were manually examined to find the origin of the problem. As discussed later, this led to improvements in the methods.
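Detecting a gap is a simple set operation. A minimal sketch, with a hypothetical function name, that returns the missing intermediate charge states for one peptide:

```python
from typing import Iterable, List


def charge_gaps(charges: Iterable[int]) -> List[int]:
    """Charge states between the minimum and maximum observed charge that are missing."""
    observed = set(charges)
    if not observed:
        return []
    return [z for z in range(min(observed), max(observed) + 1) if z not in observed]
```

For example, a peptide identified only as 2+ and 4+ yields `[3]` and would be flagged for manual examination.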

Results

Overview of Library Building Pipeline

Library building proceeds through six stages (Fig. 1): (1) data acquisition, (2) tentative peptide identification, (3) consensus spectrum creation, (4) MS1 and MS2 data extraction, (5) quality filtering, and (6) final library creation.

Fig. 1.

Single-protein spectral library building pipeline. Flow diagram illustrating the six major stages of the library-building process.

In Stage 1, the underlying data were generated or collected. In Stage 2, peptides were tentatively identified using the wide range of search methods and parameters described earlier. This large search space led to many false and conflicting spectrum identifications. This occurred most frequently for groups of peptide ions having high charge states, unusual modifications, and/or irregular cleavages, but with sufficient sequence similarity to more common tryptic peptides to generate an overlapping set of y- or b-ions. These identifications were often found to depend on the search engine and its specific settings; the ambiguities were resolved in later stages. In Stage 3, spectra for these tentative identifications were combined to create an annotated “consensus” spectrum that included information on the origin of the underlying spectra, peak labeling, search engine scores, and other processing details (31, 32). In Stage 4, the MS1 and MS2 information needed for later filters was extracted from the underlying raw data and analyzed for these identifications. In Stage 5, the classifications and filters described above were used to reject uncertain identifications. Many rejected spectra, especially those with high scores and identification frequencies, were examined to find why they were rejected, guiding the development of the present method. In Stage 6, the final library was derived: all spectra were inter-compared, and conflicts between similar spectra having different identifications were resolved. In this process, expected peptides were preferred over unusual peptides. When this did not resolve an ambiguity, the higher scoring identification was kept, with alternatives given in the spectrum annotation.
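The Stage 6 tie-breaking rule can be sketched with a small hypothetical data model: among conflicting identifications for similar spectra, common (expected) peptides are preferred over unusual ones, the higher score wins within the same type, and the remaining candidates are kept as annotated alternatives. How similar spectra are detected in the inter-comparison step is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    peptide: str
    charge: int
    score: float
    common: bool  # True for the common classes, False for unusual ones


@dataclass
class Resolved:
    best: Candidate
    alternatives: List[Candidate] = field(default_factory=list)


def resolve(conflicting: List[Candidate]) -> Resolved:
    """Prefer common over unusual identifications, then the higher score."""
    ranked = sorted(conflicting, key=lambda c: (c.common, c.score), reverse=True)
    return Resolved(best=ranked[0], alternatives=ranked[1:])
```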

Consensus Spectrum Rejection using Quality Filters

Peptides were divided into the nine classes presented in Table II. For each class, the table gives the type of peptide (common or unusual), a peptide description, the number of ions prior to filtering, the number of ions rejected by each filter, the ions in the final library (number and percent), and the contribution of each class to total identified ion intensity.

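Table II. For each peptide class, the type (C, common; U, unusual) and peptide description are given, along with the numbers of ions in the starting and final libraries, the numbers rejected by each of the five filters, and each class's contributions to the ion count and to the total median relative abundance (MRAB). A spectrum can be rejected by more than one filter, so the filter counts for a class can sum to more than the difference between its initial and final counts.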
Type | Class | Peptide description | Initial | Filter 1 | Filter 2 | Filter 3 | Filter 4 | Filter 5 | Final | Ion % | MRAB %
C | 1 | Simple Tryptic | 168 | 3 | 5 | 3 | 4 | 0 | 158 | 5.4 | 46.7
C | 2 | Tryptic with Expected Missed-Cleavage | 173 | 10 | 4 | 2 | 11 | 0 | 154 | 5.3 | 23.7
C | 3 | Common Modifications | 92 | 11 | 5 | 4 | 6 | 0 | 74 | 2.5 | 3.8
C | 4 | In-Source Semitryptic | 332 | 43 | 22 | 6 | 6 | 0 | 263 | 9.0 | 3.6
U | 5 | In-Solution Semitryptic | 788 | 136 | 83 | 2 | 7 | 0 | 577 | 19.8 | 4.7
U | 6 | Artifacts and PTMs | 1068 | 109 | 218 | 120 | 36 | 0 | 673 | 23.1 | 8.1
U | 7 | Unexpected Missed-Cleavage | 404 | 67 | 38 | 4 | 20 | 8 | 293 | 10.0 | 5.3
U | 8 | Under/Over Alkylation | 420 | 96 | 68 | 17 | 20 | 10 | 256 | 8.8 | 1.9
U | 9 | Unidentified Modifications | 2546 | 1739 | 349 | 694 | 114 | 133 | 470 | 16.1 | 2.2
Total | | | 5991* | 2214 | 792 | 852 | 224 | 151 | 2918 | 100 | 100

* This process started with 7359 spectra. After peptides falling into multiple “unusual” classes were discarded, 5991 spectra remained and were then subjected to quality filtering.

Filter 1: Peptide Ion Significance

The ability of the library consensus spectrum of a peptide ion to re-identify this ion in the original data provided a measure of the significance of the ion and the quality of its spectrum. Using the preliminary library (before filters were applied), 2214 consensus spectra had PIIF ≤ 0.01 or MRAB = 0 (the thresholds in Table I) and were rejected. Among them, 83 consensus spectra were not matched in any run. This occurred when a good quality consensus spectrum could not be derived because of low quality source spectra. It was also found that 141 ions produced identifications (score > 0.45) in only 1 or 2 runs, and 473 ions were matched in fewer than 10 runs. This filter had an especially large effect on Class 9 (unidentified modifications), removing 1739 ions, most seen only in low mass accuracy runs. Some examples of the excluded spectra are included in Supplemental Table S2.

Filter 2: m/z Error

The mass accuracy calculations described in Experimental and Computational Procedures rejected hundreds of ambiguous or erroneous peptide identifications. Insufficient precursor m/z accuracy led to the rejection of 792 (13%) of the initially identified ions (Table II). Of these, 65% were from Orbitrap data and the rest from LTQ runs. In Orbitrap runs, the deviations of the 510 rejected ions ranged from 5 to 2181 ppm, with a median of 471 ppm. As shown in the Filter 2 column of Table II, 95% of these rejected ions were from Classes 5–9, with only 36 ions from the common classes. Manual inspection showed that many of these had different assignments from different sequence search engines. Filter 2 rejected many false assignments; some examples are given in Supplemental Table S3.

Filter 3: Unidentified Fragment Ions

This filter removed 852 peptide spectra (14%) from the initial library. Of these rejected spectra, 595 would also have been removed by Filter 1 as insignificant and 114 by Filter 2 because of large mass error. Peptides with unusual modifications constituted 80% of the rejections.

Filter 4: Insufficient Ions above the Precursor m/z

Of 2946 multiply charged ions, 224 did not meet the requirement for sufficient sequence ions above the precursor m/z. Of these, 75% would also have been rejected by Filter 1 because of low identification frequency or by Filter 2 because of large mass error. The absence of significant identified peaks above the precursor m/z thus proved a useful criterion for removing low quality spectra (see Supplemental Fig. S1 for examples).

Filter 5: Principal Charge States

Of the 4275 peptides identified in only one charge state, 151 were rejected because their charge state was not equal to the number of basic residues (NBR = Arg, Lys, His, and N-terminal amine) in the peptide.

Flag: Gaps in the Charge State Distribution

Prior to application of the filters, 53 peptides identified in multiple charge states had gaps in their charge states. In some cases, gaps originated from the erroneous identification of at least one peptide ion. After final filter development, all of these gaps disappeared; the flags therefore greatly assisted the refinement of the other filters. In other cases, consensus spectra of some minor peptides for intermediate charge states were rejected when reliable consensus spectra could not be created, possibly because of contamination. These spectra were retained in the library.

Peptide Classes

The following sections present findings for the peptide classes given in Table II. Peptides are ranked by the peptide identification significance value, PSIG, defined as the geometric mean of MRAB and PIIF values. To better represent typical conditions, the following statistics exclude exceptional runs such as those without reducing or alkylating agents or with unusual m/z ranges.
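PSIG is a plain geometric mean, so it is easy to compute and use for ranking. A minimal sketch, with hypothetical function names:

```python
from math import sqrt
from typing import Iterable, List, Tuple


def psig(mrab: float, piif: float) -> float:
    """Peptide identification significance: geometric mean of MRAB and PIIF."""
    return sqrt(mrab * piif)


def rank_by_psig(ions: Iterable[Tuple[str, float, float]]) -> List[str]:
    """ions: (label, MRAB, PIIF) tuples; returns labels by descending PSIG."""
    ordered = sorted(ions, key=lambda t: psig(t[1], t[2]), reverse=True)
    return [label for label, _, _ in ordered]
```

As an illustrative consistency check, the top entry of Table III (PSIG 0.92, MRAB 0.93) implies a PIIF of roughly 0.92²/0.93 ≈ 0.91, given that PSIG is rounded to two decimals.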

Classes 1 and 2: Tryptic Peptides with and without Expected Missed-cleavages

These peptide classes dominate the field of shotgun proteomics. Table III lists the peptide ions with an identification frequency (PIIF) over 50% in the 350 LTQ-Orbitrap runs. Class 1 includes “proteotypic” peptides (45) with no missed cleavages (including peptides with an N-terminal Lys/Arg resulting from cleavage between adjacent cleavable residues). Class 2 includes peptides with plausible missed cleavages that are often identified in sequence searching; these include only peptides in which D, E, K, or R is near the missed cleavage site (46). Other missed cleavages can occur when digestion is incomplete, and so can be significant for short digestion times. Classes 1 and 2, which represent only 10.7% (312) of identified peptides, account for over 70% of total peptide abundance (Table II). Their sequence lengths ranged from 4 to 51 amino acid residues, covering over 96% of the total protein sequence. Identifications were also made for 16 small peptides of two, three, or four amino acids in special LTQ-Orbitrap runs with a lower m/z range (100–600 m/z). These were identified both by sequence database search and by the NIST MSMS library containing tryptic dipeptides and tripeptides (47). Note that peptides having fewer than six amino acid residues are generally invisible in sequence searching but are readily identified by spectrum library searching.
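Sequence coverage, as quoted above, can be computed as the fraction of protein residues covered by at least one identified peptide. The sketch below assumes exact substring matching of unmodified peptide sequences against the protein sequence; the function name is hypothetical.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence, including overlapping ones
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0
```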

Table III. Predominant tryptic peptides.

Peptides without and with expected missed cleavages are shown in separate sections. All cysteines are alkylated. Sites of missed cleavage in Class 2 are shown in boldface. MRAB, median relative abundance. Ordering is by PSIG, peptide identification significance.

Class 1: Simple tryptic peptides
Rank PSIG m/z z Peptide sequence MRAB
1 0.92 682.371 3 VFDEFKPLVEEPQNLIK 0.93
2 0.89 575.312 2 LVNEVTEFAK 0.79
3 0.85 547.318 3 KVPQVSTPTLVEVSR 0.72
4 0.81 637.649 3 RPCFSALEVDETYVPK 0.66
5 0.78 992.120 3 SHCIAEVENDEMPADLPSLAADFVESK 0.68
6 0.76 480.785 2 FQNALLVR 0.59
7 0.75 722.325 2 YICENQDSISSK 0.58
8 0.72 489.953 3 RHPDYSVVLLLR 0.53
9 0.71 671.822 2 AVMDDFAAFVEK 0.60
10 0.71 686.288 2 AAFTECCQAADK 0.61
11 0.70 464.251 2 YLYEIAR 0.49
12 0.68 829.380 2 QNCELFEQLGEYK 0.46
13 0.67 717.771 2 ETYGEMADCCAK 0.46
14 0.66 467.263 2 LCTVATLR 0.45
15 0.65 386.723 2 AACLLPK 0.43
16 0.65 749.793 2 TCVADESAENCDK 0.43
17 0.64 395.240 2 LVTDLTK 0.42
18 0.64 569.753 2 CCTESLVNR 0.42
19 0.63 440.725 2 AEFAEVSK 0.40
20 0.62 339.851 3 SLHTLFGDK 0.38
21 0.62 500.806 2 QTALVELVK 0.38
22 0.60 840.077 3 MPCAEDYLSVVLNQLCVLHEK 0.41
23 0.59 492.748 2 TYETTLEK 0.36
24 0.59 518.205 3 CCAAADPHECYAK 0.38
25 0.59 754.013 3 EFNAETFTFHADICTLSEK 0.35
26 0.53 581.637 3 HPYFYAPELLFFAK 0.32
27 0.51 470.728 2 DDNPNLPR 0.28
28 0.47 507.304 2 LVAASQAALGL 0.30
29 0.47 337.193 2 AWAVAR 0.30
30 0.45 509.272 2 SLHTLFGDK 0.21
31 0.42 522.465 4 VHTECCHGDLLECADDR 0.18
32 0.40 830.767 3 ALVLIAFAQYLQQCPFEDHVK 0.17
33 0.40 663.322 4 LVRPEVDVMCTAFHDNEETFLK 0.18
34 0.38 564.854 2 KQTALVELVK 0.15
35 0.37 633.670 3 RHPYFYAPELLFFAK 0.15
36 0.36 820.473 2 KVPQVSTPTLVEVSR 0.14
37 0.36 696.285 3 VHTECCHGDLLECADDR 0.14
38 0.34 1023.052 2 VFDEFKPLVEEPQNLIK 0.13
39 0.34 376.905 3 KQTALVELVK 0.12
40 0.32 435.878 3 ECCEKPLLEK 0.11
41 0.31 871.951 2 HPYFYAPELLFFAK 0.17
42 0.31 756.426 2 VPQVSTPTLVEVSR 0.10
43 0.30 476.225 2 DLGEENFK 0.10
44 0.29 656.375 2 HPDYSVVLLLR 0.09
45 0.26 955.970 2 RPCFSALEVDETYVPK 0.10
46 0.25 1013.599 1 LVAASQAALGL 0.09
47 0.24 347.229 1 VTK 0.06
48 0.24 884.093 3 LVRPEVDVMCTAFHDNEETFLK 0.07
49 0.24 669.335 4 RMPCAEDYLSVVLNQLCVLHEK 0.07
50 0.23 623.327 4 ALVLIAFAQYLQQCPFEDHVK 0.06
Class 2: Tryptic peptides with expected missed-cleavage
Rank PSIG m/z z Peptide sequence MRAB
1 0.85 695.35 4 LVRPEVDVMCTAFHDNEETFLKK 0.76
2 0.84 556.48 5 LVRPEVDVMCTAFHDNEETFLKK 0.74
3 0.76 647.04 4 VHTECCHGDLLECADDRADLAK 0.81
4 0.67 543.25 3 ADDKETCFAEEGKK 0.46
5 0.67 409.54 3 FKDLGEENFK 0.45
6 0.64 516.27 3 LKECCEKPLLEK 0.50
7 0.61 387.46 4 LKECCEKPLLEK 0.49
8 0.56 407.69 4 ADDKETCFAEEGKK 0.33
9 0.55 517.83 5 VHTECCHGDLLECADDRADLAK 0.32
10 0.49 358.85 3 LDELRDEGK 0.35
11 0.47 528.05 5 QEPERNECFLQHKDDNPNLPR 0.23
12 0.45 572.27 3 QEPERNECFLQHK 0.24
13 0.42 537.78 2 LDELRDEGK 0.19
14 0.41 659.81 4 QEPERNECFLQHKDDNPNLPR 0.24
15 0.40 613.81 2 FKDLGEENFK 0.20
16 0.35 499.99 4 NECFLQHKDDNPNLPR 0.12
17 0.34 483.77 4 SLHTLFGDKLCTVATLR 0.12
18 0.33 862.38 3 VHTECCHGDLLECADDRADLAK 0.12
19 0.32 633.67 3 HPYFYAPELLFFAKR 0.15
20 0.31 644.68 3 SLHTLFGDKLCTVATLR 0.10
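The m/z values in Table III can be cross-checked against theoretical values computed from standard monoisotopic residue masses. The Python sketch below is illustrative. It assumes every cysteine carries a carbamidomethyl group, consistent with "All cysteines are alkylated", and uses a proton mass of 1.007276 Da. The function names are hypothetical. The tabulated values may be observed rather than theoretical m/z, so agreement is expected only to within a few ppm.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O for the free peptide termini
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # assumed on every Cys (iodoacetamide alkylation)


def precursor_mz(sequence, charge, delta=0.0, alkylated_cys=True):
    """Theoretical m/z of the [M + zH]z+ ion, optionally shifted by delta (Da)."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + delta
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return (mass + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theo = precursor_mz("LVNEVTEFAK", 2)
print(f"{theo:.3f} {ppm_error(575.312, theo):.1f} ppm")  # 575.311 1.5 ppm
```

The same function reproduces YICENQDSISSK (2+, one alkylated Cys) at about 722.325.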
Classes 3 and 6: Common and Less Common Modifications

These peptides are separated into two broad classes. First discussed are the 858 analytical modifications (Table IV). Second, in the next section, are the 22 post-translational modifications likely present in the starting HSA (Table V). The origin of a few, such as methionine oxidation, can be unclear. Table IV lists the identified analytical modifications, all of which have been reported in the literature (48–60). The most frequently observed were:

- oxidation of methionine;
- carbamylation of the N terminus and lysine (when urea is used as a denaturant);
- formylation of the N terminus and of lysine, serine, and threonine;
- adduction by sodium and iron.

These had maximum intensities in the range of 1% to 4% of the most abundant ion. Several adducts, including sodium, iron, and calcium, most often appeared to originate in the electrospray, as indicated by their co-elution with the nonadducted peptide. In some cases, two distinct chromatographic peaks were observed for the same modified peptide, suggesting that some of the adduct was already present in the original digest. This was especially common for methionine-oxidized peptides (49).

One less-discussed modification was transpeptidation, which involves the transfer of a basic residue to the N or C terminus of a peptide. Several papers have highlighted its ubiquity (53–56). Transpeptidation was observed as N- and C-terminal adducts of arginine or lysine. Fifty-three such peptides were identified, contributing 0.09% of the total peptide intensity and covering 50% of the HSA sequence.

Another unusual modification was the vicinal disulfide (57, 58), that is, a disulfide bond between adjacent cysteines. It was observed at Cys90–91, Cys168–169, and Cys476–477. A delta mass of −2.0157 Da was detected on the bridged form of these adjacent cysteines in the MS2 spectra. These peptides had low abundances of 0.15%, 0.23%, and 0.07% of the most abundant ion in the run, respectively. Even so, each was observed in over one-quarter of the LTQ-Orbitrap runs. The MS2 fragmentation pattern of these peptides was consistent with that of their unmodified counterparts, except that no fragment arising from cleavage between the two bridged cysteines was observed. Some adducts, such as Fe and Ca, often appeared to be attached to residues not reported in the Unimod database (59). Work is underway to confirm these results and define the positions more precisely.
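The delta masses quoted above follow from standard monoisotopic masses (assumed here: H = 1.007825 Da). Forming a vicinal disulfide removes two hydrogen atoms. A transpeptidation adduct adds one lysine or arginine residue mass. As a worked check:

```latex
\begin{aligned}
\Delta m_{\mathrm{S\text{-}S}} &= -2\,m_{\mathrm{H}} = -2(1.007825) = -2.01565 \approx -2.0157~\mathrm{Da}\\
\Delta m_{+\mathrm{Lys}} &= m_{\mathrm{res}}(\mathrm{C_6H_{12}N_2O}) = +128.09496~\mathrm{Da}\\
\Delta m_{+\mathrm{Arg}} &= m_{\mathrm{res}}(\mathrm{C_6H_{12}N_4O}) = +156.10111~\mathrm{Da}
\end{aligned}
```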

Subclass: Post-translational Modifications (PTM)

HSA is known to carry various biological modifications (12, 13). Such modifications directly affect the binding and antioxidant properties of the molecule and are associated with various diseases (13, 15–19, 61–63). Therefore, these modifications were examined with special care. Using the methods described above, we detected six categories of PTMs in HSA (Table V):

(a) cysteinylation (cysteine addition to Cys34);
(b) Cys34 oxidation;
(c) protein terminus truncation (loss of aspartate-alanine from the N terminus or of leucine from the C terminus);
(d) glycation;
(e) acetylation;
(f) phosphorylation.

Except for cysteinylation, these identifications were made with mass errors below 3 ppm, from high-resolution LTQ-Orbitrap data under normal digestion conditions. Cysteinylation was identified only in analyses performed without a reducing agent (64), which leave all native disulfide bonds intact.

Cysteinylation at Cys34 was a particularly abundant modification (64, 65). It was roughly 70% as abundant as the unmodified counterpart in the same nonreducing runs. Oxidation of Cys34 to sulfenic acid, sulfonic acid, and sulfinamide was detected under typical digestion conditions in four peptide ions, at abundance levels of about 5% of their unmodified counterparts. The 3+ charge states of all these peptides have been reported by Li et al. and Grigoryan et al. (66, 67). Loss of N-terminal aspartate-alanine (−186.06 Da) and of C-terminal leucine (−113.08 Da) was identified. Their median relative abundances suggested that C-terminal truncation was more prevalent than N-terminal truncation (68, 69).

Several other modifications were also detected in the HSA digests. Glycation was observed at several lysine residues, including the well-documented Lys525 (70–73). This modification was detected with identification frequencies ranging from 26% to 53% of LTQ-Orbitrap runs and abundances of up to 1% of the most abundant ion in the run. Two lysine sites of HSA acetylation were identified: Lys199 was seen in 82% of runs and Lys525 in only 7%. HSA phosphorylation was rare, observed in three ions at very low abundance. All three were observed only in the CPTAC studies (28, 29), which employed recombinant human serum albumin. All modification sites in Table V have been reported by the Universal Protein Resource (UniProt) and PhosphoSitePlus (70, 71) and in other references (72–75).
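The truncation masses quoted above follow directly from residue masses. Removing terminal residues lowers the mass of the protein, and of its terminal peptide, by the sum of the removed residue masses. The small Python check below uses standard monoisotopic residue masses. The helper name is illustrative.

```python
# Standard monoisotopic residue masses (Da) needed for this check.
RESIDUE_MASS = {"D": 115.02694, "A": 71.03711, "L": 113.08406}


def truncation_delta(removed_residues):
    """Mass change (Da) when the given terminal residues are cleaved off."""
    return -sum(RESIDUE_MASS[aa] for aa in removed_residues)


n_term = truncation_delta("DA")  # loss of N-terminal Asp-Ala
c_term = truncation_delta("L")   # loss of C-terminal Leu
print(f"{n_term:.2f} {c_term:.2f}")  # -186.06 -113.08
```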

Classes 4 and 5: In-solution and In-source Semitryptic Peptides

These peptides were generated either by in-source fragmentation in the electrospray (labeled “in-source”) or by nontryptic cleavage during digestion (labeled “in-solution”). In-source fragments were distinguished by two features. They co-eluted with their precursor peptides, generally within 5 seconds. Their m/z also appeared as a major peak in the MS2 spectrum of the precursor peptide. As shown in Table II, 263 of these (Class 4) were identified as in-source fragments, and 577 (Class 5) were generated during in-solution digestion.
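The two in-source criteria can be expressed as a simple test. The Python sketch below applies both of them. The 5 s co-elution window comes from the text. The 0.5 m/z tolerance and the rule that a "major" peak must reach 20% of the MS2 base peak are hypothetical thresholds. The `Ion` record and the retention times and intensities in the example are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ion:
    mz: float
    rt_seconds: float  # retention time of the chromatographic apex
    ms2_peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def looks_in_source(fragment, precursor, rt_window=5.0,
                    mz_tol=0.5, min_rel_intensity=0.2):
    """True if `fragment` co-elutes with `precursor` and its m/z is a major
    peak in the precursor's MS2 spectrum (the two criteria in the text)."""
    if abs(fragment.rt_seconds - precursor.rt_seconds) > rt_window:
        return False  # elutes separately: more consistent with in-solution origin
    if not precursor.ms2_peaks:
        return False
    base_peak = max(intensity for _, intensity in precursor.ms2_peaks)
    return any(
        abs(mz - fragment.mz) <= mz_tol and intensity >= min_rel_intensity * base_peak
        for mz, intensity in precursor.ms2_peaks
    )


# Toy example: TDLTK (1+) as a candidate in-source fragment of LVTDLTK (2+).
precursor = Ion(395.240, 1200.0, [(577.32, 100.0), (676.39, 60.0), (213.16, 5.0)])
print(looks_in_source(Ion(577.320, 1202.5), precursor))  # True
print(looks_in_source(Ion(577.320, 1300.0), precursor))  # False (no co-elution)
```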

Table VI lists the most frequently identified semitryptic peptides and their precursor ions. The ratios of the abundance of in-source or in-solution fragments to that of their probable precursor ion were found to vary by up to 25%. To confirm their identification, the Class 5 peptides of rank 1 and 2 in Table VI, “FSALEVDETYVPK” and “FYAPELLFFAK,” were synthesized and co-injected with digestion mixtures to confirm that they did not originate in the source. Both eluted at distinctly different times from their potential precursor peptides and were not dominant fragmentation products of these precursors. This confirms that they originated in the digestion process. Note that both are characteristic of “pseudotryptic” activity (76). Curiously, three very abundant in-solution peptides, numbers 1, 2, and 4 in the second part of Table VI, have been reported as candidate biomarkers for disease diagnosis (77–79).

Table VI. The most significant semitryptic peptide ions. In-source and in-solution semitryptic peptides are compared with their most probable precursor peptide ions, sorted by PSIG. All cysteines are alkylated. MRAB, median relative abundance; PIIF, peptide identification frequency.
Class 4 - Semitryptic peptide ions from in-source fragmentation
Rank PSIG m/z z Sequence MRAB PIIF | Precursor peptide ion: m/z z Sequence MRAB PIIF
1 0.242 481.233 2 DELRDEGK 0.0921 0.64 358.853 3 LDELRDEGK 0.3544 0.68
2 0.226 577.320 1 TDLTK 0.0852 0.60 395.240 2 LVTDLTK 0.4193 0.98
3 0.187 680.362 1 FAEVSK 0.0553 0.63 440.725 2 AEFAEVSK 0.4006 0.98
4 0.182 676.388 1 VTDLTK 0.0487 0.68 395.240 2 LVTDLTK 0.4193 0.98
5 0.174 685.436 1 NALLVR 0.0532 0.57 480.785 2 FQNALLVR 0.5903 0.99
6 0.151 937.463 1 NEVTEFAK 0.0449 0.51 575.312 2 LVNEVTEFAK 0.7933 0.99
7 0.129 720.378 1 ETTLEK 0.0349 0.47 492.748 2 TYETTLEK 0.3628 0.97
8 0.115 596.352 1 PNLPR 0.0238 0.55 470.728 2 DDNPNLPR 0.2778 0.92
9 0.114 764.431 1 LYEIAR 0.0240 0.55 464.251 2 YLYEIAR 0.4891 0.99
10 0.109 602.341 1 WAVAR 0.0199 0.59 337.193 2 AWAVAR 0.3001 0.73
11 0.103 482.779 2 PELLFFAK 0.0134 0.79 581.637 3 HPYFYAPELLFFAK 0.3168 0.89
12 0.096 450.762 2 PTLVEVSR 0.0104 0.88 756.426 2 VPQVSTPTLVEVSR 0.0972 1.00
13 0.080 809.404 1 EFAEVSK 0.0144 0.45 440.725 2 AEFAEVSK 0.4006 0.98
14 0.076 723.331 1 GEENFK 0.0125 0.46 476.225 2 DLGEENFK 0.1031 0.87
15 0.071 813.495 1 QNALLVR 0.0122 0.41 480.785 2 FQNALLVR 0.5903 0.99
16 0.068 771.498 1 ALVELVK 0.0109 0.43 500.806 2 QTALVELVK 0.3819 1.00
17 0.065 533.293 1 AEVSK 0.0159 0.26 440.725 2 AEFAEVSK 0.4006 0.98
18 0.064 465.756 2 LHTLFGDK 0.0130 0.32 339.851 3 SLHTLFGDK 0.3841 0.99
19 0.062 883.441 1 YETTLEK 0.0142 0.27 492.748 2 TYETTLEK 0.3628 0.97
20 0.059 900.515 1 PTLVEVSR 0.0116 0.30 756.426 2 VPQVSTPTLVEVSR 0.0972 1.00
Class 5 - Semitryptic peptide ions formed during digestion
Rank PSIG m/z z Sequence MRAB PIIF | Precursor peptide ion: m/z z Sequence MRAB PIIF
1 0.356 749.378 2 FSALEVDETYVPK 0.1655 0.77 637.649 3 RPCFSALEVDETYVPK 0.6613 0.99
2 0.277 673.364 2 FYAPELLFFAK 0.0823 0.93 581.637 3 HPYFYAPELLFFAK 0.3168 0.89
3 0.182 630.365 1 CLLPK 0.0505 0.65 386.723 2 AACLLPK 0.4306 0.99
4 0.156 588.279 3 AQYLQQCPFEDHVK 0.0280 0.87 830.767 3 ALVLIAFAQYLQQCPFEDHVK 0.1725 0.93
5 0.149 701.402 1 ACLLPK 0.0335 0.66 386.723 2 AACLLPK 0.4306 0.99
6 0.132 660.404 1 TVATLR 0.0415 0.42 467.263 2 LCTVATLR 0.4499 0.98
7 0.102 300.192 2 PLLEK 0.0113 0.92 516.271 3 LKECCEKPLLEK 0.5031 0.81
8 0.100 478.576 3 KECCEKPLLEK 0.0232 0.43 516.271 3 LKECCEKPLLEK 0.5031 0.81
9 0.097 820.435 1 CTVATLR 0.0291 0.32 467.263 2 LCTVATLR 0.4499 0.98
10 0.094 675.844 2 SALEVDETYVPK 0.0134 0.65 637.649 3 RPCFSALEVDETYVPK 0.6613 0.99
11 0.083 452.703 2 PHECYAK 0.0077 0.90 518.205 3 CCAAADPHECYAK 0.3831 0.92
12 0.083 587.283 2 HADICTLSEK 0.0087 0.79 754.013 3 EFNAETFTFHADICTLSEK 0.3521 0.99
13 0.083 302.138 3 PHECYAK 0.0121 0.57 518.205 3 CCAAADPHECYAK 0.3831 0.92
14 0.072 746.482 1 ALVLIAF 0.0082 0.63 830.767 3 ALVLIAFAQYLQQCPFEDHVK 0.1725 0.93
15 0.066 587.377 1 VELVK 0.0077 0.57 500.806 2 QTALVELVK 0.3819 1.00
16 0.063 487.749 2 ADDRADLAK 0.0052 0.76 647.035 4 VHTECCHGDLLECADDRADLAK 0.8054 0.71
17 0.060 638.843 2 LPSLAADFVESK 0.0043 0.84 992.120 3 SHCIAEVENDEMPADLPSLAADFVESK 0.6785 0.89
18 0.058 710.702 3 AEDYLSVVLNQLCVLHEK 0.0061 0.55 840.077 3 MPCAEDYLSVVLNQLCVLHEK 0.4052 0.88
19 0.056 640.367 2 PLVEEPQNLIK 0.0035 0.90 682.371 3 VFDEFKPLVEEPQNLIK 0.9273 0.91
20 0.055 877.919 2 PCFSALEVDETYVPK 0.0060 0.51 637.649 3 RPCFSALEVDETYVPK 0.6613 0.99
Class 7: Tryptic Peptides with Unexpected Missed-cleavage

Tryptic cleavages after K/R that are not hindered by nearby acidic or cleavable basic residues, or by proline, are expected to be rapid, with a large fraction of such sites cleaved in less than 30 min. Hence, at longer digestion times, the relative amounts of peptides with such missed cleavages are expected to be small. However, a number of these trypsin cleavage sites persisted even after 18 h digestion periods. The corresponding peptides also changed little in relative abundance between 2 and 18 h. A set of 293 such peptides was identified, accounting for 10% of the library peptide ions. Table VII gives the most significant of these persistent ions, those with a PIIF over 0.40. The reason for their stability is not clear. It is plausible, but unproven, that a fraction of these peptides, once formed, isomerize or fold in some way that prevents further trypsin cleavage.
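The expected/unexpected distinction can be sketched in Python, under a simplified and hypothetical rule. A missed site counts as "expected" if D, E, K, or R sits immediately before or after it. This is not the published criterion (46). It reproduces the NUMC values of several rows in Table VII, but it will not agree with every entry. The function names are illustrative.

```python
NEARBY = set("DEKR")  # residues assumed to slow cleavage at an adjacent site


def missed_cleavage_sites(seq):
    """Indices of internal K/R residues that trypsin could have cut.

    The N-terminal residue is skipped (Class 1 allows an N-terminal K/R left
    by cleavage between adjacent K/R), the C-terminal residue is the cleavage
    site itself, and K/R followed by P is not counted.
    """
    return [
        i for i in range(1, len(seq) - 1)
        if seq[i] in "KR" and seq[i + 1] != "P"
    ]


def classify_missed_cleavages(seq):
    """Split missed sites into (expected, unexpected) index lists.

    Hypothetical rule: 'expected' if D, E, K, or R is an immediate neighbor.
    """
    expected, unexpected = [], []
    for i in missed_cleavage_sites(seq):
        if seq[i - 1] in NEARBY or seq[i + 1] in NEARBY:
            expected.append(i)
        else:
            unexpected.append(i)
    return expected, unexpected


# K1 is followed by D (expected); K9 sits between F and A (unexpected).
print(classify_missed_cleavages("FKDLGEENFKALVLIAFAQYLQQCPFEDHVK"))  # ([1], [9])
```

For example, the rule counts two unexpected sites in RHPDYSVVLLLRLAKTYETTLEK and one in LSQRFPK, matching Table VII. It counts four rather than five in FGERAFKAWAVARLSQRFPKAEFAEVSK.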

Table VII. Most frequently observed ions with unexpected missed cleavage sites. Residues in boldface are the unexpected missed cleavage sites. All cysteines are alkylated. Pyro-cmC, Pyro-carbamidomethyl (N-terminus); NUMC, Number of unexpected missed-cleavage sites; MRAB, median relative abundance; PIIF, peptide identification frequency; Rank, based on peptide identification significance, PSIG.
Rank PSIG m/z z NUMC Sequence MRAB PIIF
1 0.126 687.138 4 2 RHPDYSVVLLLRLAKTYETTLEK 0.0245 0.65
2 0.108 549.912 5 2 RHPDYSVVLLLRLAKTYETTLEK 0.0212 0.55
3 0.079 338.150 3 1 (Pyro-cmC)CCKHPEAK 0.0114 0.55
4 0.079 445.771 4 1 RHPDYSVVLLLRLAK 0.0088 0.70
5 0.061 1006.130 5 3 NYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEK 0.0087 0.43
6 0.059 438.259 2 1 LSQRFPK 0.0045 0.78
7 0.051 648.113 4 2 HPDYSVVLLLRLAKTYETTLEK 0.0053 0.49
8 0.049 856.433 4 1 DLGEENFKALVLIAFAQYLQQCPFEDHVK 0.0048 0.50
9 0.037 646.152 5 5 FGERAFKAWAVARLSQRFPKAEFAEVSK 0.0029 0.47
10 0.035 740.381 5 1 FKDLGEENFKALVLIAFAQYLQQCPFEDHVK 0.0025 0.50
Class 8: Under and Over Alkylation

Low accessibility of cysteine sites may lead to incomplete cysteine alkylation (80), which was found for 205 peptides. Conversely, over-alkylation by iodoacetamide can occur when alkylation is not stopped by removing or “quenching” IAA with added DTT (81). In these cases, Glu, His, and Lys were the residues most commonly alkylated. Over-alkylation was observed for 51 peptide ions. Table VIII shows the eight most frequently observed over-alkylated and under-alkylated peptides, all of which were observed in over 40% of LTQ-Orbitrap runs. Peptides with under- or over-alkylation typically amounted to 1.9% of the HSA abundance under conventional digestion conditions.
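Under- and over-alkylation shift a peptide's neutral mass by multiples of the carbamidomethyl group (57.02146 Da). Candidate m/z values can therefore be generated from an observed ion. The minimal Python sketch below does this. The function name is hypothetical, and the proton mass of 1.007276 Da is assumed.

```python
PROTON = 1.007276
CAM = 57.021464  # carbamidomethyl group added by iodoacetamide


def shifted_mz(mz, z, n_cam, new_z=None):
    """m/z after adding (n_cam > 0) or removing (n_cam < 0) carbamidomethyl
    groups, optionally re-expressed at another charge state new_z."""
    new_z = z if new_z is None else new_z
    neutral = mz * z - z * PROTON + n_cam * CAM  # neutral mass after the shift
    return (neutral + new_z * PROTON) / new_z


# YICENQDSISSK (2+) with its single Cys left unalkylated:
print(round(shifted_mz(722.325, 2, -1), 3))  # 693.814
```

Similarly, adding one group to SHCIAEVENDEMPADLPSLAADFVESK (992.120, 3+) and re-expressing the result at 4+ gives about 758.597. This matches the over-alkylated entry in Table VIII.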

Table VIII. Ions with under/over alkylation sites ranked according to peptide identification significance. All bolded residues are either under-alkylated or over-alkylated. Cam, carbamidomethylation; MRAB, median relative abundance; PIIF, peptide identification frequency; Rank, based on peptide identification significance, PSIG.
Rank PSIG m/z z Over-alkylated residue Sequence MRAB PIIF
1 0.158 758.597 4 Glu SHC(Cam)IAEVE(Cam)NDEMPADLPSLAADFVESK 0.0334 0.75
2 0.042 567.882 5 His LVRPEVDVMC(Cam)TAFH(Cam)DNEETFLKK 0.0025 0.69
3 0.039 728.008 3 Lys AAFTEC(Cam)C(Cam)QAADK(Cam)AAC(Cam)LLPK 0.0020 0.78
4 0.033 709.601 4 His LVRPEVDVMC(Cam)TAFH(Cam)DNEETFLKK 0.0018 0.60
5 0.033 652.678 3 Lys HPYFYAPELLFFAK(Cam)R 0.0024 0.45
6 0.022 335.524 3 Lys LK(Cam)C(Cam)ASLQK 0.0010 0.48
7 0.019 358.858 3 His SLH(Cam)TLFGDK 0.0007 0.54
8 0.017 674.068 4 His QEPERNEC(Cam)FLQH(Cam)KDDNPNLPR 0.0006 0.46
Rank PSIG m/z z Unalkylated Cys (n) Sequence MRAB PIIF
1 0.091 524.241 3 1 ADDKETCFAEEGKK 0.0109 0.75
2 0.086 393.432 4 1 ADDKETCFAEEGKK 0.0093 0.80
3 0.069 693.814 2 1 YICENQDSISSK 0.0081 0.58
4 0.066 735.006 3 1 EFNAETFTFHADICTLSEK 0.0072 0.60
5 0.065 545.074 5 1 LVRPEVDVMCTAFHDNEETFLKK 0.0066 0.63
6 0.064 438.753 2 1 LCTVATLR 0.0078 0.52
7 0.063 618.642 3 1 RPCFSALEVDETYVPK 0.0084 0.47
8 0.062 629.266 2 2 AAFTECCQAADK 0.0075 0.51
Class 9: Tryptic Peptides with Unidentified Modifications

In an effort to identify all products of digestion, two unrestricted (blind) modification search engines, InsPecT (37) and TagRecon (38), were used to find any single modification changing the peptide mass by up to 300 Da. Modifications identified in this way were then added to the list of targeted modifications. Exact mass was especially important for identifying members of this class. Therefore, only ions identified at high mass accuracy (Orbitrap) that appeared in at least 10% of the runs were included. This generated 470 peptide ions with unknown modifications, accounting for 16.1% of total library peptides and 2.2% of the total peptide abundance. In most cases, their position in the sequence and even their exact chemical formula are not yet certain. Table IX lists the peptides of this class appearing in over 40% of Orbitrap runs.

One particularly prevalent modification was identified in over 75% of the 350 LTQ-Orbitrap runs. It had a mass of 69.988 Da and appeared at the N terminus (see rows 8 and 16 of Table IX). It appears to be associated with tris(hydroxymethyl)aminomethane (Tris) buffer, because it did not appear when ammonium bicarbonate was used in its place. Work in progress will add localization procedures to locate these modification sites precisely and will attempt to determine their chemical formulas more precisely.
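A first-pass assignment of the "Possible modification" column amounts to matching each observed delta mass against candidate elemental compositions. The Python sketch below shows the idea. The candidate list is hand-picked to cover the compositions named in Table IX. The 0.01 Da tolerance is an illustrative choice. "+Ca" is modeled as calcium replacing two hydrogens, which is an assumption. An empty result means that no listed composition fits.

```python
# Monoisotopic atomic masses (Da).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "S": 31.9720707, "Ca": 39.9625909}

# Illustrative candidate compositions; negative counts denote a loss.
CANDIDATES = {
    "-NH3": {"N": -1, "H": -3},
    "-(H2O + NH3)": {"H": -5, "O": -1, "N": -1},
    "-CH4S (dethiomethyl)": {"C": -1, "H": -4, "S": -1},
    "Oxidation (+O)": {"O": 1},
    "+Ca - 2H": {"Ca": 1, "H": -2},
    "+C2O": {"C": 2, "O": 1},
    "+C3H2S": {"C": 3, "H": 2, "S": 1},
}


def composition_mass(comp):
    """Monoisotopic mass change (Da) of an elemental composition."""
    return sum(ATOM[element] * count for element, count in comp.items())


def match_delta(observed, tol=0.01):
    """Names of candidates within tol Da of the observed delta, best match first."""
    hits = []
    for name, comp in CANDIDATES.items():
        error = observed - composition_mass(comp)
        if abs(error) <= tol:
            hits.append((abs(error), name))
    return [name for _, name in sorted(hits)]


print(match_delta(69.988))  # ['+C3H2S']
print(match_delta(23.957))  # [] (no listed composition fits)
```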

Table IX. Unidentified modifications from “blind search.” All data are from Orbitrap runs; all cysteines are alkylated; the bolded residue is the probable location of the modification, whose mass is given in the Delta mass column. MRAB, median relative abundance; PIIF, peptide identification frequency. Rank, based on peptide identification significance, PSIG.
Rank PSIG Observed m/z z Sequence Delta mass (Da) MRAB PIIF Possible modification
1 0.09 561.242 3 QNCELFEQLGEYK 23.957 0.0110 0.72
2 0.09 554.913 3 AAFTECCQAADK 291.156 0.0125 0.60
3 0.07 432.215 3 AVMDDFAAFVEK −48.005 0.0086 0.41 Dethiomethyl
4 0.07 462.215 2 DDNPNLPR −17.0254 0.0068 0.68 −NH3
5 0.06 836.099 3 ALVLIAFAQYLQQCPFEDHVK 15.996 0.0084 0.45 Oxidation
6 0.06 528.938 3 LKECCEKPLLEK 37.941 0.0059 0.61 +Ca (position?)
7 0.06 839.840 2 QNCELFEQLGEYK 20.919 0.0060 0.58
8 0.05 539.600 3 LKECCEKPLLEK 69.988 0.0038 0.75 +C3H2S
9 0.04 501.287 3 HPDYSVVLLLR 190.105 0.0051 0.40
10 0.04 453.210 2 DDNPNLPR −35.037 0.0023 0.57 −(H2O + NH3)
11 0.04 417.540 3 FKDLGEENFK 23.953 0.0027 0.49
12 0.04 632.778 2 FKDLGEENFK 37.944 0.0022 0.66 +Ca (from search of unidentified mod.)
13 0.03 462.848 3 ETYGEMADCCAK −48.005 0.0020 0.48 −CH4S
14 0.03 497.937 3 HPDYSVVLLLR 180.053 0.0018 0.48
15 0.02 472.271 2 FQNALLVR −17.027 0.0010 0.50 −NH3 (position?)
16 0.02 502.257 2 LCTVATLR 69.988 0.0011 0.41 +C3H2S
17 0.02 652.377 2 LVAASQAALGL 290.147 0.0009 0.48
18 0.01 633.803 2 FKDLGEENFK 39.994 0.0004 0.40 +C2O

The final HSA library contains 651 peptide ions with less common modifications and 470 with unidentified (unknown) modifications.

HSA Spectra in the NIST Human Spectral Library

Spectra from the newly built HSA spectral library were compared with HSA peptides already present in the 2012 NIST library of human tryptic peptides (31). Of the 2918 HSA peptide ions derived in this work, 911 were present in the human library. Conversely, 122 HSA ions in the human library were not in the HSA library. Among these 122 were 72 peptides with new charge states, 15 with common modifications or multiple missed alkylation sites, and 35 semitryptic peptides. All were then added to the HSA library. These additional identifications likely arise from the very wide range of analysis conditions and instruments in the experiments from which the human library was built. In fact, 45% of these additions were peptides also found in the newly created HSA library, but at lower charge states, possibly reflecting lower protonation levels in some electrospray sources.

This comparison also revealed 55 spectra in the human library that matched spectra in the HSA library but were not assigned to HSA. These proved to be false identifications: spectra of unusual peptides in the present HSA library had been assigned to simple tryptic peptides of less common proteins in the comprehensive human library. These spectra have been removed in the 2013 release of the human library.
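At the level of bookkeeping, the library comparison above is set arithmetic on spectrum keys. The Python sketch below assumes each entry is keyed by a (modified sequence, charge) tuple. This key choice is hypothetical. A real comparison also needs spectral similarity checks, as the false assignments described above show. The example data are toy keys, not library contents.

```python
def compare_libraries(hsa_keys, human_keys):
    """Split ion keys, e.g. (modified sequence, charge), into three sets."""
    hsa, human = set(hsa_keys), set(human_keys)
    return {
        "shared": hsa & human,
        "hsa_only": hsa - human,
        "human_only": human - hsa,  # candidates for adding to the HSA library
    }


def new_charge_states(hsa_keys, candidates):
    """Candidate keys whose sequence is already in the HSA library at another charge."""
    hsa_sequences = {sequence for sequence, _ in hsa_keys}
    return {key for key in candidates if key[0] in hsa_sequences}


# Toy keys for illustration.
hsa = [("LVNEVTEFAK", 2), ("AEFAEVSK", 2), ("TDLTK", 1)]
human = [("LVNEVTEFAK", 2), ("LVNEVTEFAK", 3), ("DLGEENFK", 2)]
parts = compare_libraries(hsa, human)
print(new_charge_states(hsa, parts["human_only"]))  # {('LVNEVTEFAK', 3)}
```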

DISCUSSION

Creating a comprehensive library of tandem spectra of peptides for a single protein is quite a different task from building a library of peptides from digests of the thousands of proteins in a “proteome.” Though single-protein libraries may appear easier to build, in some ways they are more difficult. This difficulty arises from the need to deal with the wide variety of peptide classes found even in a simple digest, the unpredictability of their concentrations, and even the uncertainty of some of their identities. The procedures described here employ the wider search space needed to find these peptides. They then add a variety of quality control filters to reject the resulting increase in false identifications.

Fig. 2 is a stacked bar graph of peptide ion identification frequency (PIIF) values for nine peptide classes at each residue position along the HSA sequence. This plot illustrates the wide range of fates of individual residues and their dependence on location in the sequence. Note that 100% sequence coverage is achieved. The ordinate measures the number of different peptides in which each residue can appear in an HSA digest. Maxima occur in regions where a residue, by virtue of its location within observed peptides, can be found in many different peptide ions. Minima occur where residues are poorly represented because they are not part of readily observed tryptic peptides. This is due primarily to their proximity to multiple K/R residues. These residues yield tryptic peptides too short to be observed in these experiments and do not give rise to abundant peptides with missed cleavages. Such regions are typically probed using alternative proteases.
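The quantity plotted in Fig. 2 can be sketched in Python as follows. Each identified ion contributes its PIIF to every residue it spans. Contributions are accumulated separately per class so that the bars can be stacked. The input format (1-based inclusive start and end positions, PIIF, class number) is an assumption, and the example values are toy data.

```python
from collections import defaultdict


def residue_piif_profile(protein_length, peptides):
    """Per-class summed PIIF at each residue position.

    peptides: iterable of (start, end, piif, peptide_class), with 1-based
    inclusive positions in the protein sequence (hypothetical input format).
    """
    profiles = defaultdict(lambda: [0.0] * protein_length)
    for start, end, piif, peptide_class in peptides:
        for pos in range(start - 1, end):  # convert to 0-based indices
            profiles[peptide_class][pos] += piif
    return dict(profiles)


# Toy protein of six residues with two Class 1 ions and one Class 7 ion.
profiles = residue_piif_profile(6, [(1, 3, 0.5, 1), (2, 5, 0.25, 1), (4, 6, 1.0, 7)])
stacked_total = [sum(column) for column in zip(*profiles.values())]
print(stacked_total)  # [0.5, 0.75, 0.75, 1.25, 1.25, 1.0]
```

Under this sketch, a position with a total of zero would be uncovered. A profile with no zeros corresponds to the 100% sequence coverage noted above.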

Fig. 2.

Distribution of nine peptide classes along the amino acid sequence of the protein. At each amino acid position is given the summed peptide ion identification frequency (PIIF) from all peptide classes containing that amino acid. Simple tryptic in blue (Class 1), Expected missed-cleavage in red (Class 2), Common modification in yellow (Class 3), In-source semitryptic in purple (Class 4), In-solution semitryptic in orange (Class 5), Artifact and PTM in light blue (Class 6), Unexpected missed-cleavage in green (Class 7), Under/over alkylation in green-blue (Class 8), and Unidentified modification in pink (Class 9).

As evident in Table II, the bulk of the product ion intensity from the digestion of HSA arises from conventional tryptic peptides. However, as Fig. 2 shows, in terms of numbers the majority of identifiable peptides belong to other varieties that are generally ignored in shotgun proteomics. Some of these peptides have biological significance, as PTMs do. Searching a library containing their spectra ensures that such a modification is not missed. If the modification were not explicitly sought, it could instead be “crowded out” by the large search space. Further, unexpected quantities of unusual peptides may signify problems with the digestion or sample preparation.

Single-protein libraries have a variety of applications. First, they provide a convenient means of storing and re-identifying all identifiable peptides and modifications found in the digest of a given protein. This can help separate true PTMs from analytical artifacts by limiting possible identifications to previously observed peptides. In fact, the relative numbers of spectra identified in prior runs provide a measure of “prior probabilities” (40) of potential value in deriving more accurate identification probabilities. A second application is to identify the large number of possible peptides in a digest (e.g. carbamylated or otherwise modified peptides). This both prevents their misidentification and helps assess the quality of the sample preparation process. A third application is the integration of these spectra with comprehensive proteome libraries, such as the NIST human library (31). This adds more and better-quality peptide spectra for individual proteins. As described earlier, it can also reveal peptides falsely identified as tryptic peptides of minor proteins that actually originate as minor modifications of peptides from major proteins.
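Re-identification against a library of this kind is usually done by scoring a query spectrum against stored reference spectra. The Python sketch below uses a generic normalized dot product (cosine). It is an illustration only, not necessarily the scoring function used by NIST's search software. The 1 m/z bin width is an assumption suited to ion-trap data. Practical tools often also transform intensities, for example by taking square roots, before scoring.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width=1.0):
    """Sum centroided (m/z, intensity) peaks into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return bins


def cosine_score(query, reference, bin_width=1.0):
    """Normalized dot product of two spectra: 1.0 for identical, 0.0 for disjoint."""
    q, r = binned(query, bin_width), binned(reference, bin_width)
    dot = sum(q[b] * r.get(b, 0.0) for b in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0


reference = [(175.12, 30.0), (277.16, 100.0), (390.24, 55.0)]  # toy peaks
print(round(cosine_score(reference, reference), 6))  # 1.0
```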

This first attempt to build a single-protein library involved a considerable amount of manual inspection to refine filters and assess their efficacy. This process was aided by the large number of digest results available from prior studies (26, 27). Far fewer are expected to be needed for future library-building efforts. Future work will extend this method to other proteins and proteases and develop a fully automated method for single-protein library creation. It is hoped that this procedure can then be extended to a large number of proteins of importance in proteomics and become a useful tool for those with a special interest in particular proteins. Other work is ongoing to build libraries of energy-dependent spectra from high-resolution, collision-cell instruments.

We note that certain highly modified proteins remain a challenge for full characterization in libraries of digest peptides. For such proteins, procedures are needed to localize modifications, possibly by extending widely used methods for localizing phosphorylation sites (82, 83). Highly glycosylated proteins present a special challenge: glycan heterogeneity and identity, as well as the analysis of O-linked glycans, require special effort.

The HSA spectral library described in this work is available for download from http://peptide.nist.gov. It contains both 2918 spectra from the filtering described here and 122 spectra from the NIST human library. Occurrence information for the former spectra is given in Supplemental Table S4.

Supplementary Material

Supplemental Data

Footnotes

Author contributions: S.E.S. designed research; L.E.K. and Y.L. performed research; Q.D., X.Y., Y.A.M., and P.A.R. contributed new reagents or analytic tools; Q.D., X.Y., L.E.K., J.S.R., and S.E.S. analyzed data; Q.D., X.Y., J.S.R., P.A.R., and S.E.S. wrote the paper.

* This work was supported by the NIH/NCI CPTAC program (http://proteomics.cancer.gov/) through a series of Interagency Agreements with NIST.

Single Protein Library Building: HSA.

DISCLAIMER: Certain commercial instruments are identified in this document. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available for the purpose.

1 The abbreviations used are:

HSA
Human Serum Albumin
MS1
full MS scan
MS2
tandem MS scan
PTM
post-translational modification
DTT
dithiothreitol
IAA
iodoacetamide
TCEP
tris(2-carboxyethyl)phosphine
TRIS
tris-hydroxymethyl-aminomethane
NIH
National Institutes of Health
NCI
National Cancer Institute
CPTAC
Clinical Proteomic Technology Assessment for Cancer
LTQ
linear trap quadrupole
NIST
National Institute of Standards and Technology
FDR
false discovery rate
MRAB
median relative abundance
PIIF
peptide ion identification frequency
XIC
extracted ion chromatograms
NBR
number of basic residues
PSIG
peptide identification significance.

REFERENCES

  • 1. Washburn M. P., Wolters D., Yates J. R., 3rd (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 [DOI] [PubMed] [Google Scholar]
  • 2. Mallick P., Kuster B. (2010) Proteomics: a pragmatic perspective. Nat. Biotechnol. 28, 695–709 [DOI] [PubMed] [Google Scholar]
  • 3. Nagaraj N., Wisniewski J. R., Geiger T., Cox J., Kircher M., Kelso J., Pääbo S., Mann M. (2011) Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 7, 548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Baldwin M. (2004) Protein identification by mass spectrometry: issues to be considered. Mol. Cell. Proteomics 3, 1–9 [DOI] [PubMed] [Google Scholar]
  • 5. Nesvizhskii A. I., Roos F. F., Grossmann J., Vogelzang M., Eddes J. S., Gruissem W., Baginsky S., Aebersold R. (2005) Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol. Cell. Proteomics 5, 652–670 [DOI] [PubMed] [Google Scholar]
  • 6. Picotti P., Aebersold R., Domon B. (2007) The implications of proteolytic background for shotgun proteomics. Mol. Cell, Proteomics 6, 1589–1598 [DOI] [PubMed] [Google Scholar]
  • 7. Yates J. R., 3rd, Morgan S. F., Gatlin C. L., Griffin P. R., Eng J. K. (1998) Method to compare collision-induced dissociation spectra of peptides: potential for library searching and subtractive analysis. Anal. Chem. 70, 3557–3565 [DOI] [PubMed] [Google Scholar]
  • 8. Frewen B. E., Merrihew G. E., Wu C. C., Noble W. S., MacCoss M. J. (2006) Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 78, 5678–5684 [DOI] [PubMed] [Google Scholar]
  • 9. Craig R., Cortens J. C., Fenyo D., Beavis R. C. (2006) Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 5, 1843–1849 [DOI] [PubMed] [Google Scholar]
  • 10. Lam H., Deutsch E. W., Eddes J. S., Eng J. K., King N., Stein S. E., Aebersold R. (2007) Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 [DOI] [PubMed] [Google Scholar]
  • 11. Lam H., Aebersold R. (2011) Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. Method 54, 424–431 [DOI] [PubMed] [Google Scholar]
  • 12. Theodore P. (1995) All about albumin: biochemistry, genetics, and medical applications. Academic Press San Diego, California [Google Scholar]
  • 13. Fanalia G., Masib A., Trezzab V., Marinob M., Fasanoa M., Ascenzib P. (2012) Human serum albumin: from bench to bedside. Mol. Aspects Med. 33, 209–290 [DOI] [PubMed] [Google Scholar]
  • 14. Kratz F. (2008) Albumin as a drug carrier: design of prodrugs, drug conjugates, and nanoparticles. J. Control. Release 132, 171–183 [DOI] [PubMed] [Google Scholar]
  • 15. Barber M. D., Ross J. A., Fearon K. C. (1999) Changes in nutritional, functional, and inflammatorymarkers in advanced pancreatic cancer. Nutr. Cancer 35, 106–110 [DOI] [PubMed] [Google Scholar]
  • 16. Koga M., Kasayama S. (2010) Clinical impact of glycated albumin as another glycemic control marker. Endocrine J. 57, 751–762 [DOI] [PubMed] [Google Scholar]
  • 17. Roohk H. V., Zaidi A. R. (2008) A review of glycated albumin as an intermediate glycation index for controlling diabetes, J. Diabet. Sci. Technol. 2, 1114–1121 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Gundry R., Fu Q., Jelinek C., Van Eyk J. E., Cotter R. (2007) Investigation of an albumin-enriched fraction of human serum and its albuminome. Proteomics Clin. Appl. 1, 73–88 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. David Bar-Or D., Rael L. T., Bar-Or R., Slone D. S., Craun M. L. (2006) Case report: The formation and rapid clearance of a truncated albumin species in a critically ill patient. Clin. Chim. Acta 365, 346–349 [DOI] [PubMed] [Google Scholar]
  • 20. Mingetti P. P., Ruffner D. E., Kuang W. J., Dennison O. E., Hawkins J. W., Beattie W. G., Dugaiczyk A. (1986) Molecular structure of the human albumin gene is revealed by nucleotide sequence within q11–22 of chromosome 4. J. Biol. Chem. 261, 6747–6757 [PubMed] [Google Scholar]
  • 21. Kobayashi K. (2006) Summary of recombinant human serum albumin development. Biologicals 34, 55–59 [DOI] [PubMed] [Google Scholar]
  • 22. Chen Y., Chen W., Cobb M. H., Zhao Y. (2009) PTMap: A sequence alignment software for unrestricted, accurate, and full-spectrum identification of post-translational modification sites. Proc. Natl. Acad. Sci. U.S.A. 106, 761–766 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Tanner S., Payne S. H., Dasari S., Shen Z., Wilmarth P. A., David L. L., Loomis W. F., Briggs S. P., Bafna V. (2008) Accurate annotation of peptide modifications through unrestrictive database search. J. Proteome Res. 7, 170–181 [DOI] [PubMed] [Google Scholar]
  • 24. Wa C., Cerny R., Hage D. S. (2006) Obtaining high sequence coverage in matrix-assisted laser desorption time-of-flight mass spectrometry for studies of protein modification: analysis of human serum albumin as a model. Anal. Biochem. 349, 229–241 [DOI] [PubMed] [Google Scholar]
  • 25. Aldini G., Gamberoni L., Orioli M., Beretta G., Regazzoni L., Maffei F. R., Carini M. (2006) Mass spectrometric characterization of covalent modification of human serum albumin by 4-hydroxy-trans-2-nonenal. J. Mass Spectrom. 41, 1149–1161 [DOI] [PubMed] [Google Scholar]
  • 26. Lowenthal M. S., Liang Y., Phinney K. W., Stein S. E. (2013) Quantitative bottom-up proteomics depends on digestion conditions. Anal. Chem. 1, 551–558 [DOI] [PubMed] [Google Scholar]
  • 27. Walmsley S. J., Rudnick P. A., Liang L., Dong Q., Stein S. E., Nesvizhskii A. I. (2013) Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 12, 5666–5680 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Tabb D. L., Vega-Montoto L., Rudnick P. A., Variyath A. M., Ham A. J., Bunk D. M., Kilpatrick L. E., Billheimer D. D., Blackman R. K., Cardasis H. L., Carr S. A., Clauser K. R., Jaffe J. D., Kowalski K. A., Neubert T. A., Regnier F. E., Schilling B., Tegeler T. J., Wang M., Wang P., Whiteaker J. R., Zimmerman L. J., Fisher S. J., Gibson B. W., Kinsinger C. R., Mesri M., Rodriguez H., Stein S. E., Tempst P., Paulovich A. G., Liebler D. C., Spiegelman C. (2010) Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J. Proteome Res. 9, 761–776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Paulovich A. G., Billheimer D., Ham A. J., Vega-Montoto L., Rudnick P. A., Tabb D. L., Wang P., Blackman R. K., Bunk D. M., Cardasis H. L., Clauser K. R., Kinsinger C. R., Schilling B., Tegeler T. J., Variyath A. M., Wang M., Whiteaker J. R., Zimmerman L. J., Fenyo D., Carr S. A., Fisher S. J., Gibson B. W., Mesri M., Neubert T. A., Regnier F. E., Rodriguez H., Spiegelman C., Stein S. E., Tempst P., Liebler D. C. (2010) Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Mol. Cell. Proteomics 9, 242–254 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Rudnick P. A., Clauser K. R., Kilpatrick L. E., Tchekhovskoi D. V., Neta P., Blonder N., Billheimer D. D., Blackman R. K., Bunk D. M., Cardasis H. L., Ham A. J., Jaffe J. D., Kinsinger C. R., Mesri M., Neubert T. A., Schilling B., Tabb D. L., Tegeler T. J., Vega-Montoto L., Variyath A. M., Wang M., Wang P., Whiteaker J. R., Zimmerman L. J., Carr S. A., Fisher S. J., Gibson B. W., Paulovich A. G., Regnier F. E., Rodriguez H., Spiegelman C., Tempst P., Liebler D. C., Stein S. E. (2010) Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Mol. Cell. Proteomics 9, 225–241 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Stein S. E., Rudnick P. A., Eds. NIST peptide tandem mass spectral libraries. Human peptide mass spectral reference data, H. sapiens, ion trap, Official Build Date: Feb. 4, 2009. National Institute of Standards and Technology, Gaithersburg, MD, 20899. Downloaded from http://peptide.nist.gov on October 17, 2012 [Google Scholar]
  • 32. Loevenich S. N., Brunner E., King N. L., Deutsch E. W., Stein S. E., Aebersold R., Hafen E. (2009) The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation. BMC Bioinformatics 10, 59 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Geer L. Y., Markey S. P., Kowalak J. A., Wagner L., Xu M., Maynard D. M., Yang X., Shi W., Bryant S. H. (2004) Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964 [DOI] [PubMed] [Google Scholar]
  • 34. Craig R., Beavis R. C. (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467 [DOI] [PubMed] [Google Scholar]
  • 35. Keller A., Eng J., Zhang N., Li X., Aebersold R. (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 1, 2005.0017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Clauser K. R., Baker P., Burlingame A. L. (1999) Role of accurate mass measurement (+/−10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 71, 2871–2882 [DOI] [PubMed] [Google Scholar]
  • 37. Tanner S., Shu H. J., Frank A., Wang L. C., Zandi E., Mumby M., Pevzner P. A., Bafna V. (2005) InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77, 4626–4639 [DOI] [PubMed] [Google Scholar]
  • 38. Dasari S., Chambers M. C., Slebos R. J., Zimmerman L. J., Ham A. J. L., Tabb D. L. (2010) TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 9, 1716–1726 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Elias J. E., Gygi S. P. (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 [DOI] [PubMed] [Google Scholar]
  • 40. Stein S. (2012) Mass spectral reference libraries: an ever-expanding resource for chemical identification. Anal. Chem. 84, 7274–7282 [DOI] [PubMed] [Google Scholar]
  • 41. NIST MSQC Pipeline - Software for Monitoring LC-MS Performance (Data Version: version 1.2.0, June 17, 2011). URL http://peptide.nist.gov/software/nist_msqc_pipeline/NIST_MSQC_Pipeline.html [Google Scholar]
  • 42. Schnier P. D., Gross D. S., Williams E. R. (1995) On the maximum charge state and proton transfer reactivity of peptide and protein ions formed by electrospray ionization. J. Am. Soc. Mass Spectrom. 6, 1086–1097 [DOI] [PubMed] [Google Scholar]
  • 43. Tabb D. L., Huang Y., Wysocki V. H., Yates J. R., 3rd (2004) Influence of basic residue content on fragment ion peak intensities in low-energy collision induced dissociation spectra of peptides. Anal. Chem. 76, 1243–1248 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Pallante G. A., Cassady C. J. (2002) Effects of peptide chain length on the gas-phase proton transfer properties of doubly-protonated ions from bradykinin and its N-terminal fragment peptides. Int. J. Mass Spectrom. 219, 115–131 [Google Scholar]
  • 45. Fusaro V. A., Mani D. R., Mesirov J. P., Carr S. A. (2009) Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Siepen J. A., Keevil E. J., Knight D., Hubbard S. J. (2007) Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. J. Proteome Res. 6, 399–408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. NIST/EPA/NIH Mass Spectral Library with Search Program (Data Version: NIST 11, Software Version 2.0g). URL http://www.nist.gov/srd/nist1a.cfm [Google Scholar]
  • 48. Rebecchi K. R., Go E. P., Xu L., Woodin C. L., Mure M., Desaire H. (2011) A general protease digestion procedure for optimal protein sequence coverage and post-translational modifications analysis of recombinant glycoproteins: application to the characterization of human lysyl oxidase-like 2 glycosylation. Anal. Chem. 83, 8484–8491 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Chen M., Cook K. D. (2007) Oxidation artifacts in the electrospray mass spectrometry of Aβ peptide. Anal. Chem. 79, 2031–2036 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Perdivara I., Deterding L. J., Przybylski M., Tomer K. B. (2010) Mass spectrometric identification of oxidative modifications of tryptophan residues in proteins: chemical artifact or post-translational modification? J. Am. Soc. Mass Spectrom. 21, 1114–1117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Lippincott J., Apostol I. (1999) Carbamylation of cysteine: a potential artifact in peptide mapping of hemoglobins in the presence of urea. Anal. Biochem. 267, 57–64 [DOI] [PubMed] [Google Scholar]
  • 52. Berg M., Parbel A., Pettersen H., Fenyö D., Björkesten L. (2006) Detection of artifacts and peptide modifications in liquid chromatography/mass spectrometry data using two-dimensional signal intensity map data visualization. Rapid Commun. Mass Spectrom. 20, 1558–1562 [DOI] [PubMed] [Google Scholar]
  • 53. Schaefer H., Chamrad D. C., Marcus K., Reidegeld K. A., Blüggel M., Meyer H. E. (2005) Tryptic transpeptidation products observed in proteome analysis by liquid chromatography-tandem mass spectrometry. Proteomics 5, 846–852 [DOI] [PubMed] [Google Scholar]
  • 54. Xu T., Wong C. C., Kashina A., Yates J. R., III (2009) Identification of N-terminally arginylated proteins and peptides by mass spectrometry. Nat. Protoc. 4, 325–332 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Yagüe J., Paradela A., Ramos M., Ogueta S., Marina A., Barahona F., López de Castro J. A., Vázquez J. (2003) Peptide rearrangement during quadrupole ion trap fragmentation: added complexity to MS/MS spectra. Anal. Chem. 75, 1524–1535 [DOI] [PubMed] [Google Scholar]
  • 56. Fodor S., Zhang Z. (2006) Rearrangement of terminal amino acid residues in peptides by protease-catalyzed intramolecular transpeptidation. Anal. Biochem. 356, 282–290 [DOI] [PubMed] [Google Scholar]
  • 57. Mann M., Jensen O. N. (2003) Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261 [DOI] [PubMed] [Google Scholar]
  • 58. Hudaky I., Gaspari Z., Carugo O., Cemazar M., Pongor S., Perczel A. (2004) Vicinal disulfide bridge conformers by experimental methods and by ab initio and DFT molecular computations. Proteins 55, 152–168 [DOI] [PubMed] [Google Scholar]
  • 59. UNIMOD: Protein modifications for mass spectrometry. URL http://www.unimod.org/login.php [DOI] [PubMed] [Google Scholar]
  • 60. Chalkley R. J., Baker P. R., Medzihradszky K. F., Lynn A. J., Burlingame A. L. (2008) In-depth analysis of tandem mass spectrometry data from disparate instrument types. Mol. Cell. Proteomics 7, 2386–2398 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Quinlan G. J., Martin G. S., Evans T. W. (2005) Albumin: biochemical properties and therapeutic potential. Hepatology 41, 1211–1219 [DOI] [PubMed] [Google Scholar]
  • 62. Taverna M., Marie A. L., Mira J. P., Guidet B. (2013) Specific antioxidant properties of human serum albumin. Ann. Intensive Care 3, 4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Gum E. T., Swanson R. A., Alano C., Liu J., Hong S., Weinstein P. R., Panter S. S. (2004) Human serum albumin and its N-terminal tetrapeptide (DAHK) block oxidant-induced neuronal death. Stroke 35, 590–595 [DOI] [PubMed] [Google Scholar]
  • 64. Kleinova M., Belgacem O., Pock K., Rizzi A., Buchacher A., Allmaier G. (2005) Characterization of cysteinylation of pharmaceutical-grade human serum albumin by electrospray ionization mass spectrometry and low-energy collision-induced dissociation tandem mass spectrometry. Rapid Commun. Mass Spectrom. 19, 2965–2973 [DOI] [PubMed] [Google Scholar]
  • 65. Bar-Or D., Bar-Or R., Rael L. T., Gardner D. K., Slone D. S., Craun M. L. (2005) Heterogeneity and oxidation status of commercial human albumin preparations in clinical use. Crit. Care Med. 33, 1638–1641 [DOI] [PubMed] [Google Scholar]
  • 66. Li H., Grigoryan H., Funk W. E., Lu S. S., Rose S., Williams E. R., Rappaport S. M. (2011) Profiling Cys34 adducts of human serum albumin by fixed-step selected reaction monitoring. Mol. Cell. Proteomics 10, M110.004606 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Grigoryan H., Li H., Iavarone A. T., Williams E. R., Rappaport S. M. (2012) Cys34 adducts of reactive oxygen species in human serum albumin. Chem. Res. Toxicol. 25, 1633–1642 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Brennan S. O., George P. M. (2000) Three truncated forms of serum albumin associated with pancreatic pseudocyst. Biochim. Biophys. Acta 1481, 337–343 [DOI] [PubMed] [Google Scholar]
  • 69. Chan B., Dodsworth N., Woodrow J., Tucker A., Harris R. (1995) Site-specific N-terminal auto-degradation of human serum albumin. Eur. J. Biochem. 227, 524–528 [DOI] [PubMed] [Google Scholar]
  • 70. The UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 40, D71–D75. URL http://www.uniprot.org/uniprot/p02768 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Hornbeck P. V., Kornhauser J. M., Tkachev S., Zhang B., Skrzypek E., Murray B., Latham V., Sullivan M. (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270. URL http://www.phosphosite.org [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Anguizola J., Matsuda R., Barnaby O. S., Hoy K. S., Wa C., DeBolt E., Koke M., Hage D. S. (2013) Review: Glycation of human serum albumin. Clin. Chim. Acta 425, 64–76 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Barnaby O. S., Wa C., Cerny R. L., Clarke W., Hage D. S. (2010) Quantitative analysis of glycation sites on human serum albumin using (16)O/(18)O-labeling and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin. Chim. Acta 411, 1102–1110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Liyasova M. S., Schopfer L. M., Lockridge O. (2010) Reaction of human albumin with aspirin in vitro: mass spectrometric identification of acetylated lysines 199, 402, 519, and 545. Biochem. Pharmacol. 79, 784–791 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Han G., Ye M., Zhou H., Jiang X., Feng S., Jiang X., Tian R., Wan D., Zou H., Gu J. (2008) Large-scale phosphoproteome analysis of human liver tissue by enrichment and fractionation of phosphopeptides with strong anion exchange chromatography. Proteomics 8, 1346–1361 [DOI] [PubMed] [Google Scholar]
  • 76. Artimo P., Jonnalagedda M., Arnold K., Baratin D., Csardi G., de Castro E., Duvaud S., Flegel V., Fortier A., Gasteiger E., Grosdidier A., Hernandez C., Ioannidis V., Kuznetsov D., Liechti R., Moretti S., Mostaguir K., Redaschi N., Rossier G., Xenarios I., Stockinger H. (2012) ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res. 40, W597–W603. URL http://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html#Tryps [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Kagedan D., Lecker I., Batruch I., Smith C., Kaploun I., Lo K., Grober E., Diamandis E. P., Jarvi K. A. (2012) Characterization of the seminal plasma proteome in men with prostatitis by mass spectrometry. Clin. Proteomics 9, 2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Li R., Guo Y., Han B. M., Yan X., Utleg A. G., Li W., Tu L. C., Wang J., Hood L., Xia S., Lin B. (2008) Proteomics cataloging analysis of human expressed prostatic secretions reveals rich source of biomarker candidates. Proteomics Clin. Appl. 2, 543–555 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Ying W., Jiang Y., Guo L., Hao Y., Zhang Y., Wu S., Zhong F., Wang J., Shi R., Li D., Wan P., Li X., Wei H., Li J., Wang Z., Xue X., Cai Y., Zhu Y., Qian X., He F. (2006) A dataset of human fetal liver proteome identified by subcellular fractionation and multiple protein separation and identification technology. Mol. Cell. Proteomics 5, 1703–1707 [DOI] [PubMed] [Google Scholar]
  • 80. Sechi S., Chait B. T. (1998) Modification of cysteine residues by alkylation. A tool in peptide mapping and protein identification. Anal. Chem. 70, 5150–5158 [DOI] [PubMed] [Google Scholar]
  • 81. Boja E. S., Fales H. M. (2001) Overalkylation of a protein digest with iodoacetamide. Anal. Chem. 73, 3576–3582 [DOI] [PubMed] [Google Scholar]
  • 82. Beausoleil S. A., Villén J., Gerber S. A., Rush J., Gygi S. P. (2006) A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 [DOI] [PubMed] [Google Scholar]
  • 83. Taus T., Köcher T., Pichler P., Paschke C., Schmidt A., Henrich C., Mechtler K. (2011) Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 10, 5354–5362 [DOI] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental Data

Articles from Molecular & Cellular Proteomics : MCP are provided here courtesy of American Society for Biochemistry and Molecular Biology