Proceedings of the National Academy of Sciences of the United States of America. 2010 Feb 19;107(10):4561–4566. doi: 10.1073/pnas.0914495107

Sampling the N-terminal proteome of human blood

David Wildes, James A Wells
PMCID: PMC2842036  PMID: 20173099

Abstract

The proteomes of blood plasma and serum represent a potential gold mine of biological and diagnostic information, but challenges such as dynamic range of protein concentration have hampered efforts to unlock this resource. Here we present a method to label and isolate N-terminal peptides from human plasma and serum. This process dramatically reduces the complexity of the sample by eliminating internal peptides. We identify 772 unique N-terminal peptides in 222 proteins, ranging over six orders of magnitude in abundance. This approach is highly suited for studying natural proteolysis in plasma and serum. We find internal cleavages in plasma proteins created by endo- and exopeptidases, providing information about the activities of proteolytic enzymes in blood, which may be correlated with disease states. We also find signatures of signal peptide cleavage, coagulation and complement activation, and other known proteolytic processes, in addition to a large number of cleavages that have not been reported previously, including over 200 cleavages of blood proteins by aminopeptidases. Finally, we can identify substrates from specific proteases by exogenous addition of the protease combined with N-terminal isolation and quantitative mass spectrometry. In this way we identified proteins cleaved in human plasma by membrane-type serine protease 1, an enzyme linked to cancer progression. These studies demonstrate the utility of direct N-terminal labeling by subtiligase to identify and characterize endogenous and exogenous proteolysis in human plasma and serum.

Keywords: plasma, protease, proteomics, serum, biomarker


The proteomes of human blood serum and plasma contain a vast amount of useful information about the state of the body. Because blood contacts virtually every cell and tissue, it contains many proteins and other chemicals that may report on health and disease. In addition, blood collection is simple and minimally invasive, making it a medium of choice for many classical diagnostic tests. Unfortunately, the blood proteome has been challenging to exploit for discovery of protein biomarkers, because of the large number of unique proteins and their degradation products and the broad range of protein concentrations (from millimolar to picomolar or below) in serum and plasma. Just 22 proteins are estimated to make up 99% of the blood proteome by mass. Promising candidates for diagnostic markers, such as cytokines, growth factors, and cancer-specific antigens, may be more than a billionfold less abundant than the major blood proteins (1). Immunoaffinity depletion of certain abundant proteins is typically employed to improve dynamic range, though it has potential disadvantages, including high cost and the possibility of removing low-abundance species that bind to highly abundant proteins (2, 3).

Many biomarker discovery efforts search for variations in total abundance of particular proteins. This approach is simple to implement and conceptually straightforward but may miss potentially informative variation in a sample. A given protein in blood may be posttranslationally modified in myriad ways, including differential glycosylation, sulfation, oxidation, glycation, proteolysis, and many others. Modified proteins may be informative about disease states; for instance, glycated hemoglobin (HbA1c) and serum albumin are useful markers for diabetes mellitus (4). This information is lost when protein abundance alone is measured.

Specific enrichment of modified species can address these challenges, because separating modified from unmodified peptides greatly reduces sample complexity. Proteolysis is an excellent candidate for this strategy. Most blood proteins are subject to at least one proteolytic cleavage, when the N-terminal secretory signal is removed during biogenesis. Additional cleavages may occur in the secretory pathway, and proteins may be further processed by endo- and exoproteases acting in biological processes such as coagulation and complement. These proteolytic processes may be perturbed in disease, and disease-specific, protease-derived new N-termini in blood may be a valuable class of biomarkers.

Recently, we and others have developed a number of complementary chemical methods to isolate and identify N-terminal peptides in proteins, on the basis of positive or negative enrichment strategies (reviewed in refs. 5, 6). Here we apply one such method to isolate and identify the products of proteolysis of blood proteins, an underexploited class of potential biomarkers. These methods and data can further illuminate the role of proteases in blood biology and could provide a strategy for blood-based biomarker discovery.

Results and Discussion

N-Terminal Enrichment Strategy.

Specific labeling and isolation of protein N termini is challenging, because of the similar reactivity of N-terminal α-amines and the > 20-fold more abundant ϵ-amines of lysine side chains. We have addressed this challenge by employing an engineered enzyme, called subtiligase. Subtiligase is a double mutant (S221C/P225A) of the serine protease subtilisin BPN′ from Bacillus amyloliquefaciens (7), containing additional modifications that enhance stability (8, 9). It lacks detectable protease activity but is capable of cleaving peptide glycolate esters, forming a thioester enzyme intermediate that can be transferred onto free protein and peptide N termini. Subtiligase exhibits absolute specificity for N-terminal α-amines over lysine ϵ-amines, making it an excellent tool for N-terminal labeling. Our group has previously described a subtiligase-based method for labeling, isolation, and enrichment of protein N termini in cell lysates (10). This protocol was modified for plasma and serum labeling and is shown schematically in Fig. 1 A. N-terminal peptides are isolated with a serine-tyrosine dipeptide tag, which gives a characteristic mass shift to all labeled precursor ions as well as two prominent fragment ions in all MS/MS spectra. This tag provides strong evidence for subtiligase tagging, enrichment, and recovery.

Fig. 1.

Method for specific, enzymatic labeling of N termini in serum. (A) Schematic of workflow. Subtiligase is used to transfer a peptide containing biotin and a TEV protease-cleavable linker onto protein N termini. Proteins are then captured on streptavidin beads and trypsinized, removing all but the N-terminal tryptic peptide. Trypsinization on beads reduces unlabeled background created from sample precipitation in solution digests. N-terminal peptides are released with TEV protease for strong cation exchange fractionation and MS/MS analysis. Release leaves a SY-dipeptide tag on the N terminus.

The N-Terminal Proteome of Blood.

By using our N-terminal enrichment technique, we identified 772 unique N termini in 222 proteins in human serum and plasma (Dataset S1), with an overall peptide false discovery rate estimated at 1.0% by a target-decoy strategy. We found N termini in blood proteins with concentrations spanning at least six orders of magnitude, with excellent coverage in the top four logs of abundance, where we detected over 70% of the 150 most abundant proteins (11) (Fig. 2 and Table S1).
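As an illustration of the target-decoy estimate mentioned above, the following Python sketch computes a false discovery rate from a list of scored peptide-spectrum matches. It assumes a concatenated target-decoy search and the decoys/targets convention; the exact formula applied by the search software is not stated here, and the function names and all scores are hypothetical.

```python
from typing import List, Optional, Tuple

# A peptide-spectrum match is (score, is_decoy). Scores are hypothetical.
PSM = Tuple[float, bool]

def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among matches scoring at or above `threshold`.

    Uses decoy_hits / target_hits, one common convention for
    concatenated target-decoy searches (an assumption of this sketch).
    """
    targets = sum(1 for s, decoy in psms if s >= threshold and not decoy)
    decoys = sum(1 for s, decoy in psms if s >= threshold and decoy)
    if targets == 0:
        return 0.0
    return decoys / targets

def threshold_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for t in sorted({s for s, _ in psms}):
        if estimate_fdr(psms, t) <= max_fdr:
            return t
    return None
```

In practice the score cutoff would be chosen so that the estimate falls at or below the desired rate (1.0% in this study).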

Fig. 2.

Concentration distribution of proteins and reproducibility of the method. (A) A subset of 110 proteins of established abundance (11) is plotted by mean molar concentration in plasma. Representative low, medium, and high abundance proteins are labeled. (B) The number of N termini detected in each protein is shown, arranged in order of abundance. Proteins depicted in this plot are given in Table S1. (C) Venn diagram showing results of three replicate experiments on a single sample of citrated plasma.

The number of unique termini found in each protein varied greatly and was generally consistent with the role of proteolysis in protein function. For example, multiple N termini were found in many coagulation and complement factors. We also found proteolysis of abundant proteins where the biological function of proteolytic cleavage is less clear. It is possible that some of these cleavages represent nonspecific cleavage of abundant proteins by blood proteases, although it is notable that the correlation between number of N termini discovered and protein abundance is weak. Indeed, the abundant plasma protein alpha-1 acid glycoprotein yields no detectable N termini.

We assessed the reproducibility of our labeling and enrichment strategy by performing three technical replicate experiments on a single sample of citrated plasma (Fig. 2 C and Dataset S1). We found substantial overlap between these three samples, with 29% of peptides found in all three experiments and 56% found in at least two. This level of overlap between technical replicates is well within the range expected for the mass spectrometry techniques that we used (12) and suggests that our N-terminal labeling does not result in major variations between samples.
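The overlap figures quoted above can be computed along the following lines. This is a minimal sketch that assumes the percentages are taken relative to the union of peptides identified across all replicates; that denominator and the function name `replicate_overlap` are assumptions, not details given in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def replicate_overlap(replicates: Iterable[Set[str]]) -> Dict[int, float]:
    """For each k, return the fraction of all unique peptides that
    were identified in at least k replicates."""
    reps = [set(r) for r in replicates]
    # Number of replicates in which each peptide was seen.
    seen_in = Counter(pep for rep in reps for pep in rep)
    total = len(seen_in)
    if total == 0:
        return {k: 0.0 for k in range(1, len(reps) + 1)}
    return {
        k: sum(1 for n in seen_in.values() if n >= k) / total
        for k in range(1, len(reps) + 1)
    }
```

For three replicates, the values at k = 3 and k = 2 correspond to "found in all three experiments" and "found in at least two", respectively.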

The subcellular localizations, as annotated in Swiss-Prot, for each of the 222 proteins we report are shown in Fig. 3 A. As would be expected for a survey of blood proteins, 67% are known to be secreted. The proportion of annotated secreted proteins reported here is substantially higher than the 50% found in a recently compiled, high confidence list of proteins found in plasma (13), likely reflecting a bias inherent in N-terminal labeling. Intracellular proteins arising from both tissue leakage in vivo and cell lysis during sample collection and preparation are likely to be acetylated on their native N termini (14), rendering them undetectable in the absence of internal proteolytic processing. Thus our method is more sensitive to secreted proteins whose N termini are free after signal sequence removal, and this is reflected in the proportion of secreted protein identifications.

Fig. 3.

The N-terminal proteome of human blood. (A) The subcellular locations, as annotated in Swiss-Prot (www.uniprot.org), of the proteins detected in this study. (B) Cleavage site annotations of detected N termini, according to Swiss-Prot and MEROPS (40) databases. (C) Evidence of aminopeptidase trimming of N termini in three proteins. Similar degradation was seen in 112 N termini. (D) Comparison of cleavage annotations found in serum and plasma collected with three different anticoagulants.

Endoproteolysis and Exoproteolysis in Blood.

Most blood proteins are subject to proteolytic cleavage by a multitude of proteases in the secretory pathway and the extracellular environment. Tracking these proteolytic events could shed light on important biological processes in health and disease. We compared the N termini that we found to annotations in the Swiss-Prot and MEROPS databases to identify known termini resulting from well-understood biological processes (Fig. 3 B). Interestingly, 81% of the N termini that we found are not annotated in either database. Annotated signal peptide cleavages, within five residues of predicted signal processing sites, account for 11% of our data, with annotated propeptide cleavages making up another 4.5%. Plasmin activity on fibrinogen (2%), cleavage of “bait loops” in protease inhibitors (0.9%), and removal of initiator methionines (0.4%) are also represented. Consistent with previous studies (15), a significant number of N termini (28%) appear to arise from aminopeptidase processing of peptides derived from endoprotease cleavage, indicated by systematic laddering of products (Fig. 3 C). In total, 112 termini in 53 proteins are subject to aminopeptidase trimming, ranging from removal of a single amino acid to long ladders of aminopeptidase-processed termini. Trimming occurs on termini derived from a variety of endoprotease cleavages, including signal and propeptide removal, cleavage of serpin reactive center loops, and cleavages of unknown significance.
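The laddering pattern described above can be detected by grouping the observed N-terminal start positions of a protein into runs of consecutive residues. The sketch below is one simple way to do this; the criterion (steps of exactly one residue, runs of at least two termini) and the name `find_ladders` are assumptions, since the exact rule used for classification is not given here.

```python
from typing import Iterable, List

def find_ladders(starts: Iterable[int], min_length: int = 2) -> List[List[int]]:
    """Group N-terminal start positions into runs of consecutive residues.

    The first position of each run is taken as the putative endoprotease
    site; later positions are candidate aminopeptidase-trimmed termini.
    """
    ladders: List[List[int]] = []
    run: List[int] = []
    for pos in sorted(set(starts)):
        if run and pos == run[-1] + 1:
            run.append(pos)          # extends the current ladder
        else:
            if len(run) >= min_length:
                ladders.append(run)  # close the previous ladder
            run = [pos]
    if len(run) >= min_length:
        ladders.append(run)
    return ladders
```

Applied to the alpha-2-macroglobulin start residues listed in Table 2 (705, 706, 707, 720), this groups 705 to 707 into one ladder and leaves 720 as an isolated terminus.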

Proteolysis in Serum and Plasma.

We investigated the N-terminal proteomes of human serum and plasma collected with three different anticoagulants, with increasingly stringent suppression of proteolytic activity: citrate, EDTA, and the proprietary P100 system (BD). Serum is expected to differ from plasma because of the initiation of coagulation, resulting in cleavage of coagulation factors, release of platelet granule contents, and a general increase in proteolytic activity (16). A comparison of the types of N termini found in serum and the three plasmas is shown in Fig. 3 D. The overall differences are modest but some patterns are evident. Serum and citrated plasma are enriched for N termini of unknown significance. EDTA and P100 plasma have proportionately fewer unknown cleavages and a concomitant increase in signal peptide and other annotated cleavages. This increase is consistent with a higher background of proteolysis in serum and citrated plasma, leading to cleavages of abundant proteins after sample collection. Serum and citrate appear similar in their background proteolysis levels, as has been shown previously (16). EDTA is a stronger inhibitor of plasma proteases than citrate, and the lower proportion of unknown cleavages in EDTA plasma reflects this. In these experiments the additional protease inhibition provided by P100 tubes does not significantly improve the results, though others have shown a reduction in proteolysis of specific substrates (16). Interestingly, aminopeptidase-derived termini comprised an approximately equal proportion of all samples, suggesting either that this activity is not affected by any of these anticoagulants or that these termini reflect in vivo proteolytic processing that is not affected by the conditions of sample collection.

Sequence and Structural Determinants of Proteolytic Cleavage.

To understand the nature of the proteolytic enzymes acting on proteins in the blood, we investigated the cleavage sequences of the 461 endoprotease-derived N termini of unknown significance in our dataset (Fig. 4 A). A preference is seen for basic (R, K) residues preceding the cut site, and small (G, S, A) residues following, which is consistent with many endoproteases, including those of the coagulation (17) and complement (18) pathways. Efforts to discover a simple recognition motif by using the program MotifX (19) did not reveal a clear consensus sequence; evidently either the protease(s) responsible have relatively low specificity at the level of the primary structure of the cleavage site or the proteolysis we see is due to the action of many different proteases with varied specificity.
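For readers who wish to reproduce this kind of analysis, the per-position information content plotted in a sequence logo can be computed from aligned P4–P4′ windows as sketched below. The sketch uses the uncorrected Shannon formula, log2(20) minus the positional entropy; logo software typically also applies a small-sample correction, which is omitted here. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

MAX_BITS = math.log2(20)  # about 4.32 bits for a 20-letter alphabet

def information_content(windows: Sequence[str]) -> List[float]:
    """Return the information content (bits) at each aligned position.

    `windows` are equal-length strings such as the eight residues
    P4-P4' around each cleavage site. No small-sample correction.
    """
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    bits = []
    for i in range(width):
        counts = Counter(w[i] for w in windows)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(MAX_BITS - entropy)
    return bits
```

A position occupied by a single residue in every window reaches the maximum of about 4.32 bits, whereas a position with no preference approaches zero as the number of windows grows.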

Fig. 4.

Patterns of endoproteolysis. (A) Sequence logo of the eight residues (P4–P4′) surrounding the cleavage site of all nonannotated, endoproteolytic cleavages. The y axis denotes information content in bits and has a maximum value of 4.32. Logo created by Weblogo (http://weblogo.berkeley.edu/) (41). (C) Cleavage of two significant blood proteins, prothrombin and complement C3. Swiss-Prot annotated cleavage sites divide the proteins into domains as shown on the schematic. Cleavage sites detected in this study are indicated with arrows. AP, thrombin activation peptide; LC, thrombin light chain.

Proteolytic processing is important for the activation and inactivation of factors involved in coagulation and complement cascades (18, 20). Proteolysis in these systems has been characterized extensively in vitro, and we compared our data to the in vitro findings for some of these proteins (Fig. 4 C). Prothrombin lies at the center of the coagulation cascade and is activated to thrombin by a series of discrete cleavages, shown as gaps in the rectangular representation of the protein in Fig. 4 C. Thrombin cleaves fibrinogen, initiating fibrin polymerization to form a clot (20). We identified the expected activating cleavages of thrombin, in addition to a few cleavages of unknown significance within the activation peptides.

Fig. 5.

Plots of iTRAQ signal for representative putative MT-SP1 substrates. ● A2M 705; ○ A2M 706; ▴ A2M 707; ▵ A2M 720; ▪ complement C3 713; □ complement C3 741.

In contrast to prothrombin, we see much more heterogeneous cleavage of complement C3. C3 is extensively proteolyzed throughout its life cycle in blood. After an activating cleavage to C3a and C3b by C3 convertase complexes, factor I and other enzymes inactivate C3b by additional cleavages. A series of discrete fragments has been defined in vitro (21). Our findings indicate that C3b inactivation by factor I in vivo may be more heterogeneous than previously appreciated. Whereas the overall distribution of fragments is consistent in our data, we detect clusters of cleavages at the domain boundaries (shown as vertical arrows in Fig. 4 C) rather than isolated, discrete cuts. Interestingly, these areas of heterogeneous proteolysis cluster internally only in specific fragments of C3: C3a, C3g, and C3f. C3a in particular is a mediator of inflammation, and internal cleavages within it may antagonize this function (22). These data suggest that even well-studied proteolytic cascades may yield unique insights from such global analysis, made possible by N-terminal proteomics.

Proteome Simplification by N-Terminal Isolation.

N-terminal isolation reduces each protein in a mixture to one or a few peptides, which potentially has the advantage of reducing the interference caused by abundant proteins (23). For example, serum albumin produces about 100 tryptic peptides but has only a single N terminus, meaning that N-terminal labeling may reduce albumin peptides by 100-fold. In practice, we detected internal peptides from serum albumin (22 total) but at substantially reduced abundance compared to the parent protein. This advantage has been described in previous efforts to characterize N-terminal peptides, both by depletion of all internal peptides from a trypsin-digested sample (23) and by positive enrichment of N termini following selective chemical blockage of lysine residues (15). Each of these N-terminal enrichment methods has unique advantages and disadvantages, and thus, in aggregate, they are likely to provide complementary information. For plasma and serum, subtiligase labeling has some advantage in that it does not rely on the absolute efficiency of internal peptide depletion or lysine blocking.
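The scale of this simplification can be illustrated with an in silico tryptic digest. The sketch below assumes the commonly used rule that trypsin cleaves after K or R except when the next residue is P, with no missed cleavages; the toy sequence and function names are hypothetical and do not correspond to any protein in this study.

```python
import re
from typing import List

def tryptic_peptides(sequence: str) -> List[str]:
    """Split a protein sequence after K or R, except before P.

    Assumes complete digestion (no missed cleavages).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def n_terminal_peptide(sequence: str) -> str:
    """Return the only tryptic peptide that carries the protein N terminus."""
    return tryptic_peptides(sequence)[0]

# Hypothetical toy sequence, for illustration only.
toy = "GASQAGKPEVRLLTKDER"
peptides = tryptic_peptides(toy)   # ['GASQAGKPEVR', 'LLTK', 'DER']
first = n_terminal_peptide(toy)    # 'GASQAGKPEVR'
```

A full digest of the toy sequence gives three peptides, whereas N-terminal isolation would ideally retain one; for a protein yielding about 100 tryptic peptides the same logic gives the roughly 100-fold reduction described above.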

Additional reduction in complexity is afforded by the sequence specificity of subtiligase. Subtiligase is very promiscuous toward N-terminal sequences, but it disfavors certain N-terminal residues, including acidic side chains and proline (9). Several abundant serum proteins, including serum albumin and apolipoprotein A-I, have acidic N-terminal residues and are not efficiently labeled by subtiligase, reducing their relative abundance in our experiments. Other abundant plasma proteins (e.g., alpha-1 acid glycoprotein) have chemically blocked N termini. Perhaps because of these factors, we found little benefit from pretreatment of plasma to remove the 12 most abundant proteins (Fig. S1), although it is likely that targeted depletion of abundant proteins with many N termini (e.g., C3, fibrinogen) would improve coverage.

It should be noted that sampling only particular peptides from each protein can limit protein identification. N-terminal tryptic peptides may be too short or too long or ionize poorly, rendering them difficult to identify by database matching. With only a single digest (in this case trypsin), this method is limited to sampling rather than comprehensive coverage of the N-terminal proteome. Alternate digestions with different specificities should increase coverage, as demonstrated in other N-terminal proteomic studies (24).

N-Terminal Proteomics and Biomarker Discovery.

Specific proteases may be up-regulated in diverse disease states, and their activity may leave a mark on the plasma proteome. Proteolytic products of certain intracellular events, including apoptotic and necrotic cell death (25), may also be released into the blood, where they may serve as useful markers of these processes. Proteolytic fragments of normal plasma proteins are also directly implicated in the pathogenesis of certain diseases, such as amyloidoses and atherosclerosis (26, 27). Within our N-terminal peptide dataset, we find examples of all of these classes (Table 1).

Table 1.

Putative disease-associated proteolytic cleavages detected in this study

Protein or peptide | Disease(s) | References
Exopeptidase-derived | Thyroid carcinoma | (28)
IGF/IGFBPs | Colorectal cancer, androgen-insensitive prostate cancer | (30)
ApoE | Alzheimer’s Disease | (31)
TTR | Senile systemic amyloidosis | (32)
Cystatin-C | Cerebral Hemorrhage with Amyloidosis | (26)
ApoAI | Cardiovascular Disease | (27)

The widespread exoproteolysis we observe in our experiments may represent useful patterns to monitor health and disease. Tempst and coworkers have recently correlated exoprotease activities in blood to metastatic cancer (28). The 112 exoprotease-sensitive peptide sequences we identify here, derived from 53 proteins, greatly increase the number of potential sequences available and should expand the scope of this approach to biomarker discovery.

Proteolytic products of apoptosis are of significant interest in biomarker discovery. Apoptosis, a form of programmed cell death implicated in the response of cancer cells to chemotherapy and radiation, is executed by a family of cysteine proteases called caspases (25). Products of caspase proteolysis could serve as markers of successful cancer therapy. For example, increase in a caspase-derived peptide from the intracellular protein cytokeratin 18 (CK-18) in the serum of breast cancer patients was correlated with 5-year survival in one study (29). The method we describe here may identify other such caspase-derived markers. Intriguingly, we find a peptide derived from cleavage of the abundant protein gelsolin after the sequence DQTD. This cleavage has been reported in cell culture screens for apoptotic caspase substrates (10). In addition to this putative caspase-derived peptide, we also find some intracellular proteins with internal proteolytic cleavages, including abundant cellular proteins such as actin, as well as less abundant proteins, including the Ran-specific GTPase activating protein. How these proteins are cleaved and how they reach the blood is unclear at present, but they may also represent useful markers for disease states.

Proteolytically cleaved peptides and proteins in blood are not only proxies for intracellular disease states; in some cases, they are directly involved in pathogenesis. Several possible examples of this occur in the data we report here, including proteolysis of components of the insulin-like growth factor (IGF) signaling axis that have been implicated in cancer progression (30), proteolysis of transthyretin and apolipoprotein E that may be involved in senile systemic amyloidosis and Alzheimer’s disease (31, 32), and proteolysis of apolipoprotein AI that may play a role in atherosclerosis and cardiovascular disease (27).

Multiple reaction monitoring (MRM) methods combined with stable isotope labeled peptide standards have shown great promise for quantification of biomarkers in plasma. These methods are sensitive, showing a limit of quantitation in the low nanogram/milliliter range in some cases (33, 34), and reproducible across multiple laboratories (35). The sensitivity of MRM methods can be further enhanced by specific enrichment of peptides by using peptide-directed antibodies (36). N-terminal enrichment may offer a similar benefit for enhancing detection of peptides specifically associated with protease activity in blood. Whereas the sensitivity and reproducibility of N-terminal peptide isolation remain to be demonstrated in this context, the fact that we have detected proteins present at the nanomolar to high picomolar level (e.g., VEGF-D and osteopontin) by using relatively insensitive survey MS/MS methods suggests that this is a promising area for future study.

Identification of Membrane-Type Serine Protease 1 (MT-SP1) Sites in Human Plasma.

In addition to identifying proteolyzed products generally present in blood, our method can also be used to identify substrates of specific proteases. Proteolytic enzymes make up 2% of the human genome, but the biological significance of most is unknown, owing partly to the difficulty in identifying natural substrates (37). A sensitive method to detect specific protease substrates must rely on quantitative proteomic methods, in order to identify N termini that increase with time after exogenous addition or activation of a protease. As a proof of concept, we explored the substrates of MT-SP1 in plasma.

MT-SP1 is present on a variety of epithelial cell types, is naturally shed into the blood, and is up-regulated in certain cancers. It is essential for development of a functional epidermal barrier, likely because of its role in processing profilaggrin to filaggrin (38). However, MT-SP1 has also been shown to process a number of other substrates, including prohepatocyte growth factor activator, prourokinase plasminogen activator, and protease activated receptor 2 (39).

MT-SP1 is rapidly inhibited in plasma, with a half-life of activity of 30 s (Fig. S2 and SI Text). We therefore explored a 60-s time course of proteolysis, labeling successive time points with isobaric tag for relative and absolute quantitation (iTRAQ) reagents of increasing reporter ion mass. Of 86 peptides identified in this experiment, 13 showed a large (> 3-fold) change in iTRAQ signal over the time course and were identified as putative substrates, listed in Table 2. Plots of the iTRAQ ratio vs. time for representative peptides are shown in Fig. 5.

Table 2.

Putative substrates of MT-SP1

Protein | Start residue* | P4–P1 | P1′–P4′ | Fold change
Alpha-2-macroglobulin | 705 | EGLR | VGFY | 8.2
Alpha-2-macroglobulin | 706 | GLRV | GFYE | 10.9
Alpha-2-macroglobulin | 707 | LRVG | FYES | 5.5
Alpha-2-macroglobulin | 720 | GHAR | LVHV | 13.1
Apolipoprotein E | 210 | GRVR | AATV | 5.6
Complement C3 | 713 | RRTR | FISL | 5.4
Complement C3 | 741 | QHAR | ASHL | 21.3
Fibrinogen α chain | 468 | TVTK | TVIG | 11.5
Fibrinogen α chain | 582 | SYSK | QFTS | 8.2
Gelsolin | 33 | TASR | GASQ | 330
Inter-α-trypsin inhibitor heavy chain H1 | 126 | QYRK | AAIS | 4.5
Inter-α-trypsin inhibitor heavy chain H2 | 140 | TVGR | ALYA | 77.5
Inter-α-trypsin inhibitor heavy chain H4 | 658 | AGSR | MNFR | 12.6

*Numbering according to Swiss-Prot database.

Substrates are defined as those peptides showing more than 3-fold change in iTRAQ signal after 60 s.
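The selection rule in this footnote can be applied directly to the fold changes listed in Table 2. The sketch below transcribes those values and filters them by a cutoff; the function name `putative_substrates` is hypothetical.

```python
from typing import List, Tuple

# (protein, start residue, fold change), transcribed from Table 2.
TABLE_2: List[Tuple[str, int, float]] = [
    ("Alpha-2-macroglobulin", 705, 8.2),
    ("Alpha-2-macroglobulin", 706, 10.9),
    ("Alpha-2-macroglobulin", 707, 5.5),
    ("Alpha-2-macroglobulin", 720, 13.1),
    ("Apolipoprotein E", 210, 5.6),
    ("Complement C3", 713, 5.4),
    ("Complement C3", 741, 21.3),
    ("Fibrinogen alpha chain", 468, 11.5),
    ("Fibrinogen alpha chain", 582, 8.2),
    ("Gelsolin", 33, 330.0),
    ("Inter-alpha-trypsin inhibitor heavy chain H1", 126, 4.5),
    ("Inter-alpha-trypsin inhibitor heavy chain H2", 140, 77.5),
    ("Inter-alpha-trypsin inhibitor heavy chain H4", 658, 12.6),
]

def putative_substrates(rows, min_fold: float = 3.0):
    """Keep rows whose fold change after 60 s exceeds `min_fold`."""
    return [row for row in rows if row[2] > min_fold]
```

With the 3-fold cutoff all 13 entries are retained; a stricter cutoff of 5 would exclude the 4.5-fold cleavage in inter-α-trypsin inhibitor heavy chain H1.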

Several putative substrates of MT-SP1 may be of functional interest. We observe multiple cuts in the bait loop of the protease inhibitor α-2 macroglobulin (A2M), consistent with apparent A2M inhibition in peptide experiments (Fig. S2 and SI Text). Two cleavages in complement C3, located in the C3a anaphylatoxin domain, are also of interest. Free C3a is a potent mediator of inflammation, whose function requires key C-terminal amino acids (22). Interestingly, these residues are removed by one of the two observed cleavages. Whereas our data cannot distinguish whether this cleavage occurs in free C3a or in intact C3, it is an intriguing observation, suggesting the possibility that cell-surface MT-SP1 has the ability to inactivate C3a and reduce inflammatory response.

N termini that increase in abundance after addition of MT-SP1 are not necessarily direct substrates of the protease. MT-SP1 is known to activate uPA (39), which may process other proteins, or activate plasmin, leading to further indirect proteolysis. We detect two cleavages after lysine residues in fibrinogen Aα that could result from either direct cleavage by MT-SP1 or indirect cleavage through plasmin activation (Table 2). In addition, we detect evidence of rapid aminopeptidase processing following MT-SP1 cleavage of a single site in A2M. Cleavage after R704 in A2M is consistent with the known sequence preferences of MT-SP1. However, we also find cleavages after V705 and G706. These are expected to be poor substrates of MT-SP1 and are unlikely to be cleaved at the same rate as R704. More likely, this terminus is subject to rapid aminopeptidase processing after exposure by MT-SP1 cleavage. This observation is consistent with an active interplay between endo- and exoproteases in blood.

Conclusion

Here we have described a method for labeling and enrichment of N-terminal peptides from proteins in blood serum and plasma, allowing us to identify the sequences of the sites of proteolytic action in blood. We discovered many N termini corresponding to known biological processes. In addition, over half of the N termini we found have not been reported in protein databases. Some of these cuts may represent biologically significant substrates for blood proteases, whereas others may be cleavages that occur during sample collection and storage. These results may impact the choice of representative peptides for MS-based quantification of blood proteins; we identify protease-sensitive species that could introduce significant variability if they were used for this purpose. We also have demonstrated the utility of N-terminal labeling to identify substrates of proteases acting in blood. We anticipate using this method, particularly in combination with sensitive label-free quantification approaches, as a means to rapidly profile the actions of proteases in blood, ranging from endogenous blood coagulation and tissue surface proteases to pathogen-associated enzymes.

N-terminal peptide isolation should simplify blood proteome digests by enriching for one or a few peptides per protein. Whereas this may reduce the sensitivity for detecting certain proteins, such as those with N-terminal peptides that perform poorly in the mass spectrometer or are too short for reliable database matching, it likely has advantages for discovery of biomarkers associated with proteolytic processes. It selects for a suite of analytes that can be monitored by sensitive MRM methods, potentially without the need to enrich each individual analyte peptide by immunoaffinity methods. We believe that the application of this method holds promise for future biomarker candidate discovery.

Materials and Methods

Materials.

Details of materials used in this study may be found in SI Text.

Sample Labeling and Workup.

Samples (2 mL) were biotinylated with subtiligase and peptide ester for 60 min at room temperature. Proteins were reduced, alkylated with iodoacetamide, captured on immobilized streptavidin, digested with trypsin, and released from the resin with TEV protease. Additional details of the labeling and workup are provided in SI Text.

LCMS/MS Acquisition and Data Processing.

Peptides were subject to offline strong cation exchange fractionation and then to C18 chromatography coupled directly to a QSTAR Pulsar or QSTAR Elite mass spectrometer. Additional details of data acquisition and processing are in SI Text.

Peptide Identification.

Database searches to identify peptides were performed by using Protein Prospector v. 5.2.2 (UCSF Mass Spectrometry Facility, prospector.ucsf.edu) and the December 2008 release of Swiss-Prot. Additional details of peptide identification are in SI Text. Labeled and unlabeled peptides are listed in Dataset S1, and annotated peaklists are provided in Dataset S2.

Detection of MT-SP1 Substrates.

Samples (1 mL) of EDTA plasma were treated with 1 μM MT-SP1 for 10, 30, and 60 s and then quenched by addition of 4-(2-aminoethyl)benzenesulfonyl fluoride and PMSF. An untreated sample was used for a zero time point. Samples were processed as described above. After desalting, they were labeled with iTRAQ reagents as follows: 0 s, mass 114; 10 s, mass 115; 30 s, mass 116; 60 s, mass 117, by using the protocol provided by the manufacturer. Additional details of iTRAQ analysis are provided in SI Text.
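The labeling scheme above maps each iTRAQ reporter mass to a time point, so a peptide's time course can be read from its reporter ion intensities. The following sketch expresses each intensity relative to the zero time point (mass 114); this normalization is an assumption made for illustration (the actual analysis is described in SI Text), and the intensities in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

# Reporter ion mass -> incubation time in seconds, as described above.
REPORTER_TO_SECONDS: Dict[int, int] = {114: 0, 115: 10, 116: 30, 117: 60}

def time_course(intensities: Dict[int, float]) -> List[Tuple[int, float]]:
    """Return (seconds, intensity relative to the 0-s channel), by time."""
    baseline = intensities[114]
    if baseline <= 0:
        raise ValueError("zero time point intensity must be positive")
    return sorted(
        (seconds, intensities[mass] / baseline)
        for mass, seconds in REPORTER_TO_SECONDS.items()
    )

# Hypothetical reporter intensities for one peptide.
example = time_course({114: 100.0, 115: 200.0, 116: 400.0, 117: 820.0})
```

The last entry of the result is the fold change after 60 s, the quantity used to select putative substrates.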

Acknowledgments.

We thank A.L. Burlingame, D. Maltby, and J.C. Trinidad for assistance with design and execution of mass spectrometry experiments, N.J. Agard, E.D. Crawford, and C.M. Jackson for critical reading of the manuscript, and members of the Wells and Burlingame groups for helpful discussions. Active MT-SP1 was a generous gift from E.L. Madison (Catalyst Biosciences, South San Francisco, CA). This work was supported by National Institutes of Health (NIH) Grant F32GM079931 (to D.W.) and NIH Grant R01 GM081051 (to J.A.W.). Mass spectrometry was performed at the Bio-Organic Biomedical Mass Spectrometry Resource at UCSF (A.L. Burlingame, Director) supported by the Biomedical Research Technology Program of the NIH National Center for Research Resources, NIH NCRR P41RR001614 and NIH NCRR RR015804.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0914495107/DCSupplemental.

References

  1. Anderson NL, Anderson NG. The human plasma proteome: History, character, and diagnostic prospects. Mol Cell Proteomics. 2002;1:845–867. doi: 10.1074/mcp.r200007-mcp200.
  2. Liu T, et al. Evaluation of multiprotein immunoaffinity subtraction for plasma proteomics and candidate biomarker discovery using mass spectrometry. Mol Cell Proteomics. 2006;5:2167–2174. doi: 10.1074/mcp.T600039-MCP200.
  3. Zolotarjova N, et al. Differences among techniques for high-abundant protein depletion. Proteomics. 2005;5:3304–3313. doi: 10.1002/pmic.200402021.
  4. Cohen MP, Clements RS. Measuring glycated proteins: Clinical and methodological aspects. Diabetes Technol Ther. 1999;1:57–70. doi: 10.1089/152091599317585.
  5. Agard NJ, Wells JA. Methods for the proteomic identification of protease substrates. Curr Opin Chem Biol. 2009;13:503–509. doi: 10.1016/j.cbpa.2009.07.026.
  6. Doucet A, et al. Metadegradomics: Toward in vivo quantitative degradomics of proteolytic post-translational modifications of the cancer proteome. Mol Cell Proteomics. 2008;7:1925–1951. doi: 10.1074/mcp.R800012-MCP200.
  7. Abrahmsen L, et al. Engineering subtilisin and its substrates for efficient ligation of peptide bonds in aqueous solution. Biochemistry. 1991;30:4151–4159. doi: 10.1021/bi00231a007.
  8. Atwell S, Wells JA. Selection for improved subtiligases by phage display. Proc Natl Acad Sci USA. 1999;96:9497–9502. doi: 10.1073/pnas.96.17.9497.
  9. Chang TK, Jackson DY, Burnier JP, Wells JA. Subtiligase: A tool for semisynthesis of proteins. Proc Natl Acad Sci USA. 1994;91:12544–12548. doi: 10.1073/pnas.91.26.12544.
  10. Mahrus S, et al. Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell. 2008;134:866–876. doi: 10.1016/j.cell.2008.08.012.
  11. Hortin GL, Sviridov D, Anderson NL. High-abundance polypeptides of the human plasma proteome comprising the top 4 logs of polypeptide abundance. Clin Chem. 2008;54:1608–1616. doi: 10.1373/clinchem.2008.108175.
  12. Elias JE, Haas W, Faherty BK, Gygi SP. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods. 2005;2:667–675. doi: 10.1038/nmeth785.
  13. States DJ, et al. Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol. 2006;24:333–338. doi: 10.1038/nbt1183.
  14. Brown JL, Roberts WK. Evidence that approximately eighty per cent of the soluble proteins from Ehrlich ascites cells are N-alpha-acetylated. J Biol Chem. 1976;251:1009–1014.
  15. Timmer JC, et al. Profiling constitutive proteolytic events in vivo. Biochem J. 2007;407:41–48. doi: 10.1042/BJ20070775.
  16. Yi J, Kim C, Gelfand CA. Inhibition of intrinsic proteolytic activities moderates preanalytical variability and instability of human plasma. J Proteome Res. 2007;6:1768–1781. doi: 10.1021/pr060550h.
  17. Page MJ, Macgillivray RTA, Di Cera E. Determinants of specificity in coagulation proteases. J Thromb Haemost. 2005;3:2401–2408. doi: 10.1111/j.1538-7836.2005.01456.x.
  18. Sim RB, Tsiftsoglou SA. Proteases of the complement system. Biochem Soc Trans. 2004;32:21–27. doi: 10.1042/bst0320021.
  19. Schwartz D, Gygi SP. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol. 2005;23:1391–1398. doi: 10.1038/nbt1146.
  20. Jackson CM, Nemerson Y. Blood coagulation. Annu Rev Biochem. 1980;49:765–811. doi: 10.1146/annurev.bi.49.070180.004001.
  21. Sahu A, Lambris JD. Structure and biology of complement protein C3, a connecting link between innate and acquired immunity. Immunol Rev. 2001;180:35–48. doi: 10.1034/j.1600-065x.2001.1800103.x.
  22. Hugli TE. Structure and function of C3a anaphylatoxin. Curr Top Microbiol Immunol. 1990;153:181–208. doi: 10.1007/978-3-642-74977-3_10.
  23. McDonald L, Beynon RJ. Positional proteomics: Preparation of amino-terminal peptides as a strategy for proteome simplification and characterization. Nat Protocols. 2006;1:1790–1798. doi: 10.1038/nprot.2006.317.
  24. Schilling O, Overall CM. Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nat Biotechnol. 2008;26:685–694. doi: 10.1038/nbt1408.
  25. Taylor RC, Cullen SP, Martin SJ. Apoptosis: Controlled demolition at the cellular level. Nat Rev Mol Cell Biol. 2008;9:231–241. doi: 10.1038/nrm2312.
  26. Janowski R, Abrahamson M, Grubb A, Jaskolski M. Domain swapping in N-truncated human cystatin C. J Mol Biol. 2004;341:151–160. doi: 10.1016/j.jmb.2004.06.013.
  27. Liz MA, Gomes CM, Saraiva MJ, Sousa MM. ApoA-I cleaved by transthyretin has reduced ability to promote cholesterol efflux and increased amyloidogenicity. J Lipid Res. 2007;48:2385–2395. doi: 10.1194/jlr.M700158-JLR200.
  28. Villanueva J, et al. A sequence-specific exopeptidase activity test (SSEAT) for "functional" biomarker discovery. Mol Cell Proteomics. 2008;7:509–518. doi: 10.1074/mcp.M700397-MCP200.
  29. Olofsson MH, et al. Cytokeratin-18 is a useful serum biomarker for early determination of response of breast carcinomas to chemotherapy. Clin Cancer Res. 2007;13:3198–3206. doi: 10.1158/1078-0432.CCR-07-0009.
  30. Fuchs CS, et al. Plasma insulin-like growth factors, insulin-like binding protein-3, and outcome in metastatic colorectal cancer: Results from intergroup trial n9741. Clin Cancer Res. 2008;14:8263–8269. doi: 10.1158/1078-0432.CCR-08-0480.
  31. Harris FM, et al. Carboxyl-terminal-truncated apolipoprotein E4 causes Alzheimer's disease-like neurodegeneration and behavioral deficits in transgenic mice. Proc Natl Acad Sci USA. 2003;100:10966–10971. doi: 10.1073/pnas.1434398100.
  32. Mizuguchi M, et al. Unfolding and aggregation of transthyretin by the truncation of 50 N-terminal amino acids. Proteins. 2008;72:261–269. doi: 10.1002/prot.21919.
  33. Keshishian H, et al. Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics. 2007;6:2212–2229. doi: 10.1074/mcp.M700354-MCP200.
  34. Keshishian H, et al. Quantification of cardiovascular biomarkers in patient plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics. 2009;8:2339–2349. doi: 10.1074/mcp.M900140-MCP200.
  35. Addona TA, et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol. 2009;27:633–641. doi: 10.1038/nbt.1546.
  36. Anderson NL, et al. Mass spectrometric quantitation of peptides and proteins using stable isotope standards and capture by anti-peptide antibodies (SISCAPA). J Proteome Res. 2004;3:235–244. doi: 10.1021/pr034086h.
  37. Marnett AB, Craik CS. Papa's got a brand new tag: Advances in identification of proteases and their substrates. Trends Biotechnol. 2005;23:59–64. doi: 10.1016/j.tibtech.2004.12.010.
  38. List K, Bugge TH, Szabo R. Matriptase: Potent proteolysis on the cell surface. Mol Med. 2006;12:1–7. doi: 10.2119/2006-00022.List.
  39. Uhland K. Matriptase and its putative role in cancer. Cell Mol Life Sci. 2006;63:2968–2978. doi: 10.1007/s00018-006-6298-x.
  40. Rawlings ND, et al. MEROPS: The peptidase database. Nucleic Acids Res. 2008;36:D320–D325. doi: 10.1093/nar/gkm954.
  41. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
