Anal Chem. 2023 Sep 7;95(37):13746–13749. doi: 10.1021/acs.analchem.3c02269

Assessment and Prediction of Human Proteotypic Peptide Stability for Proteomics Quantification

Cristina Chiva †,#, Zahra Elhamraoui †,#, Amanda Solé †, Marc Serret †, Mathias Wilhelm §, Eduard Sabidó †,‡,*
PMCID: PMC10515110  PMID: 37676919

Abstract

Mass spectrometry coupled to liquid chromatography is one of the most powerful technologies for proteome quantification in biomedical samples. In peptide-centric workflows, protein mixtures are enzymatically digested to peptides prior to their analysis. However, proteome-wide quantification studies rarely identify all potential peptides for any given protein, and targeted proteomics experiments focus on a set of peptides for the proteins of interest. Consequently, proteomics relies on the use of a limited subset of all possible peptides as proxies for protein quantitation. In this work, we evaluated the stability of the human proteotypic peptides over 21 days and trained a deep learning model to predict peptide stability directly from tryptic sequences. Together, these constitute a resource of broad interest for prioritizing and selecting peptides in proteome quantification experiments.


Mass spectrometry coupled to liquid chromatography is one of the most powerful technologies for proteome quantification in biomedical samples.1 In peptide-centric workflows, protein mixtures are enzymatically digested to peptides prior to their analysis.2 Experimental spectra are then used for peptide identification, and extracted peptide areas, peak heights, or spectral counts are used to infer protein quantities.3–5 However, proteome-wide quantification studies rarely identify all potential peptides for any given protein, and targeted proteomics experiments focus on a set of peptides for the proteins of interest. Consequently, proteomics relies on the use of a limited subset of all possible peptides as proxies for protein quantitation. Because only some peptides are used to infer protein abundances, the choice of peptides is critical for accurate and precise protein quantification.

Several targeted and nontargeted proteomics studies have evaluated the quantitative response of tryptic peptides and have described rules to guide peptide selection for protein quantification.6–8 We and others have evaluated the influence of the digestion technique, protease, missed cleavages, and amino acid composition on protein quantitation9,10 and showed the importance of experimental data to assess the quantitative behavior of peptides.11 In addition, various tools have been developed that use previous experimental data, peptide amino acid composition, and physicochemical properties to derive suitability scores and thus suggest the best candidate peptides for quantitative proteomics.8,12–15 Beyond the effect of tryptic digestion and missed cleavages, short- and midterm peptide stability can have a profound effect on quantification, especially in large proteomics experiments that extend over several days or weeks. Despite its importance in other fields,16 to date only a few studies have investigated peptide stability, either in the context of handling and storage of liquid biopsies17,18 or in the development of specific targeted proteomics assays.19–21 Here, we expanded this knowledge by assessing the stability of the human proteotypic peptides under autosampler conditions for a period of 21 days, and we used these data to train a deep learning model that predicts the stability of new tryptic peptide sequences.

We initially assessed the stability of the human proteotypic peptide set of the ProteomeTools collection,22 containing 124,875 peptides and mapping to 15,990 human UniProt/Swiss-Prot annotated genes (Table ST1). Synthetic peptide pools consisted of approximately 1000 peptides (∼5 pmol each) dissolved in water with 0.1% formic acid and kept at 4 °C in polypropylene vials during the entire experiment. Each pool was analyzed by data-dependent acquisition LC-MS/MS once every 3.5 days, for a total of 6 time points within a period of 21 days (Figure 1A). Acquired data were analyzed using MaxQuant v1.6.0,23 with match between runs enabled among the six independent injections of each peptide pool. Peptides were identified at FDR < 5% given the known composition of each peptide pool, and peptide areas were extracted for relative peptide quantification (Table ST2). On average, more than 85% of the analyzed peptides were identified, with some pools reaching an identification success rate of up to 95% and most identifications being based on fragmentation spectra evidence (Figure S1A, B). Profiles with fewer than three quantitative points were discarded, and the remaining missing values in each profile were imputed as the average of their neighboring temporal quantitative values. Quantitative temporal profiles comprising 6 injections within 21 days (822 injections in total) were obtained for 101,903 proteotypic peptides in 137 pools, yielding over half a million data points. Peptide intensities were normalized within each pool by median equalization (Figure S1C).
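
The profile preprocessing described above (filtering, neighbor imputation, and median equalization) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the handling of gaps at the first or last time point, the order of imputation and normalization, and the choice of the pool-wide median as the normalization target are all assumptions made for illustration.

```python
from statistics import median

MIN_POINTS = 3  # profiles with fewer quantified time points are discarded


def impute_profile(profile):
    """Fill missing values (None) with the mean of the nearest observed neighbors.

    A gap at the start or end of the profile has a single observed neighbor
    and copies its value (assumption: the text does not cover this case).
    """
    observed = [i for i, v in enumerate(profile) if v is not None]
    filled = list(profile)
    for i, value in enumerate(profile):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        neighbors = [profile[j] for j in (left, right) if j is not None]
        filled[i] = sum(neighbors) / len(neighbors)
    return filled


def preprocess_pool(profiles):
    """profiles: dict mapping peptide -> list of areas per time point (None = missing)."""
    kept = {
        pep: impute_profile(p)
        for pep, p in profiles.items()
        if sum(v is not None for v in p) >= MIN_POINTS
    }
    if not kept:
        return {}
    n_points = len(next(iter(kept.values())))
    # Median equalization (assumed form): rescale every injection so that its
    # median peptide area equals the median of the per-injection medians.
    medians = [median(p[t] for p in kept.values()) for t in range(n_points)]
    target = median(medians)
    return {
        pep: [v * target / medians[t] for t, v in enumerate(p)]
        for pep, p in kept.items()
    }
```

Calling `preprocess_pool` on the six-point profiles of one pool returns only the peptides with at least three quantified time points, each rescaled so that all injections in the pool share the same median area.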

Figure 1. (A) Schematic representation of the experimental workflow for peptide stability assessment. (B) Peptide stability profile clusters obtained from experimental data. (C) Schematic representation of the deep learning model trained in this work to predict peptide stability. (D) Sensitivity, specificity, accuracy, Matthews correlation coefficient, and average receiver operating characteristic (ROC) for the deep learning model.

Peptide profiles were used to assess the stability of each peptide and to distinguish between stable and unstable peptides. We used a deep embedding clustering (DEC)24 algorithm, an unsupervised deep learning model that combines deep embedding and k-means clustering (see Supporting Information). The peptide abundance profiles were grouped based on pattern similarity, resulting in three subgroups containing 76,173 (74.75%), 17,046 (16.72%), and 8,684 (8.52%) peptide sequences. Based on their average profiles, we labeled these clusters as “stable” (cluster #3), “unstable” (cluster #2), and “noisy data” (cluster #1), respectively (Figure 1B, Table ST2). Although peptide stability is affected by specific matrix effects, plastic and glassware, and storage conditions, our results demonstrate that most proteotypic peptides from the human proteome are stable throughout a period of 21 days under measurement conditions that are common in many proteomics laboratories.
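
To make the clustering step more concrete, the sketch below shows the soft cluster assignment and the sharpened target distribution that define DEC in the cited method paper.24 It is a simplified illustration, not the code used in this work. The encoder network and its training loop are omitted, and the embeddings, centroids, and function names are hypothetical.

```python
def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedded point i to centroid j."""
    q = []
    for z in embeddings:
        sims = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            sims.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(sims)
        q.append([s / total for s in sims])
    return q


def target_distribution(q):
    """Sharpened target p[i][j], proportional to q[i][j]**2 / f[j] with f[j] = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def hard_labels(q):
    """Assign each point to the cluster with the highest soft assignment."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Toy 2-D embeddings and two centroids (hypothetical values)
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(embeddings, centroids)
p = target_distribution(q)
labels = hard_labels(q)
```

In the full algorithm, the encoder and the centroids are updated to minimize the KL divergence between `p` and `q`, with centroids typically initialized by k-means on the embedded profiles.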

Next, we used these labeled experimental data to build a deep learning model that learns to predict peptide stability for any tryptic peptide of interest using only its amino acid sequence. A total of 77,338 experimental peptide stability profiles were used as the training set, whereas 13,649 peptide profiles were set aside as an independent test data set to evaluate model performance. We used a deep learning model with a hybrid architecture that included two Bidirectional Gated Recurrent Unit (BiGRU) layers and an attention mechanism layer (Figure 1C). To assess the performance of our model, we used 5-fold cross-validation and a holdout set. Due to the imbalanced nature of our data set, we used the area under the ROC curve (AUC) as our evaluation metric. Our model obtained an AUC of 0.809 on the holdout set and 0.811 during cross-validation, demonstrating its robustness in peptide stability prediction (Figure 1D). Finally, we built an online web server hosting both the experimental stability data for the measured proteotypic peptides and the model to predict peptide stability for new tryptic peptide sequences (http://peptidestability.crg.eu). The prediction model is limited to peptide sequences shorter than 20 amino acids with no chemical or post-translational modifications.
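
The evaluation metrics reported in Figure 1D can be computed from predicted scores as in the following sketch. It is an illustration under stated assumptions, not the evaluation code of this study. The labels, scores, the 0.5 decision threshold, and the coding of the positive class as 1 are hypothetical. The AUC is computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, which does not depend on a decision threshold or on class proportions.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy, and MCC at a fixed score threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(y == 1 and yhat == 1 for y, yhat in zip(labels, pred))
    tn = sum(y == 0 and yhat == 0 for y, yhat in zip(labels, pred))
    fp = sum(y == 0 and yhat == 1 for y, yhat in zip(labels, pred))
    fn = sum(y == 1 and yhat == 0 for y, yhat in zip(labels, pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        # MCC is conventionally set to 0 when a margin of the confusion matrix is empty
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical labels (1 = positive class) and predicted scores
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.1]
auc = roc_auc(y_true, y_score)
metrics = confusion_metrics(y_true, y_score)
```

The pairwise AUC above is quadratic in the number of examples, which is adequate for illustration. A rank-based implementation would be preferable for test sets of several thousand peptides.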

Overall, we evaluated the stability of the human proteotypic peptides over 21 days and trained a deep learning model to predict peptide stability directly from tryptic sequences. Together, these constitute a resource of broad interest within the community for prioritizing and selecting peptides in proteome quantification experiments.

Acknowledgments

We acknowledge support from the Spanish Ministry of Science, Innovation and Universities (PID2020-115092GB-I00), the German Federal Ministry of Education and Research (BMBF; Grant No. 031L0008A), “Centro de Excelencia Severo Ochoa 2013-2017”, SEV-2012-0208, and “Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya” (2021SGR01225). This project has also received support from Marie Skłodowska-Curie Actions – European Training Networks PROTrEIN: Computational Proteomics Training European Innovative Network (GA 956148). The CRG/UPF Proteomics Unit is part of the Spanish Infrastructure for Omics Technologies (ICTS OmicsTech).

Data Availability Statement

The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium via the PRIDE repository with identifier PXD025766,25 and the model source code can be found at https://github.com/proteomicsunitcrg/peptide-stability.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.3c02269.

  • Figure S1: Heatmap representing the number of peptides identified in each pool; number of time points with valid identifications; peptide areas distribution before and after median normalization; peptide abundance profiles along time (T1–T6) for protein MICAL3; peptide abundance profiles along time (T1–T6) for protein LMF2; number of amino acid containing peptides that are stable, unstable, or noisy, either as total counts or relative frequencies. Supplementary methods. (PDF)

  • Table ST1: (A) List of human proteotypic peptides from the ProteomeTools collection measured in this study and (B) list of quality control peptides included in each peptide pool of the ProteomeTools collection (XLSX)

  • Table ST2: Identification and quantification of peptide stability profiles within 21 days (3.5 days per time point) (XLSX)

Author Contributions

# Cristina Chiva and Zahra Elhamraoui contributed equally.

The authors declare the following competing financial interest(s): M.W. is founder and shareholder of MSAID GmbH and OmicScouts GmbH, although he has no operational role in any of the two companies.

Supplementary Material

ac3c02269_si_001.pdf (372.1KB, pdf)
ac3c02269_si_002.xlsx (7MB, xlsx)
ac3c02269_si_003.xlsx (56.5MB, xlsx)

References

  1. Aebersold R.; Mann M. Mass-Spectrometric Exploration of Proteome Structure and Function. Nature 2016, 537 (7620), 347. 10.1038/nature19949.
  2. Duncan M. W.; Aebersold R.; Caprioli R. M. The Pros and Cons of Peptide-Centric Proteomics. Nat. Biotechnol. 2010, 28 (7), 659–664. 10.1038/nbt0710-659.
  3. Silva J. C.; Gorenstein M. V.; Li G.-Z.; Vissers J. P. C.; Geromanos S. J. Absolute Quantification of Proteins by LCMSE: A Virtue of Parallel MS Acquisition. Mol. Cell Proteomics 2006, 5 (1), 144–156. 10.1074/mcp.M500230-MCP200.
  4. Ishihama Y.; Oda Y.; Tabata T.; Sato T.; Nagasu T.; Rappsilber J.; Mann M. Exponentially Modified Protein Abundance Index (EmPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein. Mol. Cell Proteomics 2005, 4 (9), 1265–1272. 10.1074/mcp.M500061-MCP200.
  5. Schwanhäusser B.; Busse D.; Li N.; Dittmar G.; Schuchhardt J.; Wolf J.; Chen W.; Selbach M. Global Quantification of Mammalian Gene Expression Control. Nature 2011, 473 (7347), 337–342. 10.1038/nature10098.
  6. Lange V.; Picotti P.; Domon B.; Aebersold R. Selected Reaction Monitoring for Quantitative Proteomics: A Tutorial. Mol. Syst. Biol. 2008, 4, 222. 10.1038/msb.2008.61.
  7. Worboys J. D.; Sinclair J.; Yuan Y.; Jørgensen C. Systematic Evaluation of Quantotypic Peptides for Targeted Analysis of the Human Kinome. Nat. Methods 2014, 11 (10), 1041–1044. 10.1038/nmeth.3072.
  8. Searle B. C.; Egertson J. D.; Bollinger J. G.; Stergachis A. B.; MacCoss M. J. Using Data Independent Acquisition (DIA) to Model High-Responding Peptides for Targeted Proteomics Experiments. Mol. Cell Proteomics 2015, 14 (9), 2331–2340. 10.1074/mcp.M115.051300.
  9. Chiva C.; Ortega M.; Sabidó E. Influence of the Digestion Technique, Protease, and Missed Cleavage Peptides in Protein Quantitation. J. Proteome Res. 2014, 13 (9), 3979–3986. 10.1021/pr500294d.
  10. Peng M.; Taouatas N.; Cappadona S.; van Breukelen B.; Mohammed S.; Scholten A.; Heck A. J. R. Protease Bias in Absolute Protein Quantitation. Nat. Methods 2012, 9 (6), 524–525. 10.1038/nmeth.2031.
  11. Chiva C.; Sabidó E. Peptide Selection for Targeted Protein Quantitation. J. Proteome Res. 2017, 16 (3), 1376–1380. 10.1021/acs.jproteome.6b00115.
  12. Mohammed Y.; Domański D.; Jackson A. M.; Smith D. S.; Deelder A. M.; Palmblad M.; Borchers C. H. PeptidePicker: A Scientific Workflow with Web Interface for Selecting Appropriate Peptides for Targeted Proteomics Experiments. J. Proteomics 2014, 106, 151–161. 10.1016/j.jprot.2014.04.018.
  13. Fusaro V. A.; Mani D. R.; Mesirov J. P.; Carr S. A. Prediction of High-Responding Peptides for Targeted Protein Assays by Mass Spectrometry. Nat. Biotechnol. 2009, 27 (2), 190–198. 10.1038/nbt.1524.
  14. Kuster B.; Schirle M.; Mallick P.; Aebersold R. Scoring Proteomes with Proteotypic Peptide Probes. Nat. Rev. Mol. Cell Biol. 2005, 6 (7), 577–583. 10.1038/nrm1683.
  15. Mallick P.; Schirle M.; Chen S. S.; Flory M. R.; Lee H.; Martin D.; Ranish J.; Raught B.; Schmitt R.; Werner T.; Kuster B.; Aebersold R. Computational Prediction of Proteotypic Peptides for Quantitative Proteomics. Nat. Biotechnol. 2007, 25 (1), 125–131. 10.1038/nbt1275.
  16. Cavaco M.; Andreu D.; Castanho M. A. R. B. The Challenge of Peptide Proteolytic Stability Studies: Scarce Data, Difficult Readability, and the Need for Harmonization. Angew. Chem., Int. Ed. Engl. 2021, 60 (4), 1686–1688. 10.1002/anie.202006372.
  17. Pérez V.; Juega-Mariño J.; Bonjoch A.; Negredo E.; Clotet B.; Romero R.; Bonet J. Evaluation of Protease Inhibitors Containing Tubes for MS-Based Plasma Peptide Profiling Studies. J. Clin. Lab. Anal. 2014, 28 (5), 364–367. 10.1002/jcla.21694.
  18. Ignjatovic V.; Geyer P. E.; Palaniappan K. K.; Chaaban J. E.; Omenn G. S.; Baker M. S.; Deutsch E. W.; Schwenk J. M. Mass Spectrometry-Based Plasma Proteomics: Considerations from Sample Collection to Achieving Translational Data. J. Proteome Res. 2019, 18 (12), 4085–4097. 10.1021/acs.jproteome.9b00503.
  19. Whiteaker J. R.; Halusa G. N.; Hoofnagle A. N.; Sharma V.; MacLean B.; Yan P.; Wrobel J. A.; Kennedy J.; Mani D. R.; Zimmerman L. J.; Meyer M. R.; Mesri M.; Rodriguez H.; Paulovich A. G.; Clinical Proteomic Tumor Analysis Consortium (CPTAC). CPTAC Assay Portal: A Repository of Targeted Proteomic Assays. Nat. Methods 2014, 11 (7), 703–704. 10.1038/nmeth.3002.
  20. Michaud S. A.; Sinclair N. J.; Pětrošová H.; Palmer A. L.; Pistawka A. J.; Zhang S.; Hardie D. B.; Mohammed Y.; Eshghi A.; Richard V. R.; Sickmann A.; Borchers C. H. Molecular Phenotyping of Laboratory Mouse Strains Using 500 Multiple Reaction Monitoring Mass Spectrometry Plasma Assays. Commun. Biol. 2018, 1, 78. 10.1038/s42003-018-0087-6.
  21. Percy A. J.; Chambers A. G.; Smith D. S.; Borchers C. H. Standardized Protocols for Quality Control of MRM-Based Plasma Proteomic Workflows. J. Proteome Res. 2013, 12 (1), 222–233. 10.1021/pr300893w.
  22. Zolg D. P.; Wilhelm M.; Schnatbaum K.; Zerweck J.; Knaute T.; Delanghe B.; Bailey D. J.; Gessulat S.; Ehrlich H.-C.; Weininger M.; Yu P.; Schlegl J.; Kramer K.; Schmidt T.; Kusebauch U.; Deutsch E. W.; Aebersold R.; Moritz R. L.; Wenschuh H.; Moehring T.; Aiche S.; Huhmer A.; Reimer U.; Kuster B. Building ProteomeTools Based on a Complete Synthetic Human Proteome. Nat. Methods 2017, 14 (3), 259–262. 10.1038/nmeth.4153.
  23. Tyanova S.; Temu T.; Cox J. The MaxQuant Computational Platform for Mass Spectrometry-Based Shotgun Proteomics. Nat. Protoc. 2016, 11 (12), 2301–2319. 10.1038/nprot.2016.136.
  24. Xie J.; Girshick R.; Farhadi A. Unsupervised Deep Embedding for Clustering Analysis. In International Conference on Machine Learning; PMLR, 2016; pp 478–487.
  25. Vizcaíno J. A.; Deutsch E. W.; Wang R.; Csordas A.; Reisinger F.; Ríos D.; Dianes J. A.; Sun Z.; Farrah T.; Bandeira N.; Binz P.-A.; Xenarios I.; Eisenacher M.; Mayer G.; Gatto L.; Campos A.; Chalkley R. J.; Kraus H.-J.; Albar J. P.; Martinez-Bartolomé S.; Apweiler R.; Omenn G. S.; Martens L.; Jones A. R.; Hermjakob H. ProteomeXchange Provides Globally Coordinated Proteomics Data Submission and Dissemination. Nat. Biotechnol. 2014, 32 (3), 223–226. 10.1038/nbt.2839.

Articles from Analytical Chemistry are provided here courtesy of American Chemical Society
