msproteomics sitereport: reporting DIA-MS phosphoproteomics experiments at site level with ease

Thang V Pham; Alex A Henneman; Nam X Truong; Connie R Jimenez

doi:10.1093/bioinformatics/btae432

Bioinformatics. 2024 Jun 29;40(7):btae432. doi: 10.1093/bioinformatics/btae432

Editor: Janet Kelso

PMCID: PMC11239223 PMID: 38944032

Abstract

Summary

Identification and quantification of phosphorylation sites are essential for biological interpretation of a phosphoproteomics experiment. For data independent acquisition mass spectrometry-based (DIA-MS) phosphoproteomics, extracting a site-level report from the output of current processing software is not straightforward as multiple peptides might contribute to a single site, multiple phosphorylation sites can occur on the same peptides, and protein isoforms complicate site specification. Currently only limited support is available from a commercial software package via a platform-specific solution with a rather simple site quantification method. Here, we present sitereport, a software tool implemented in an extendable Python package called msproteomics to report phosphosites and phosphopeptides from a DIA-MS phosphoproteomics experiment with a proven quantification method called MaxLFQ. We demonstrate the use of sitereport for downstream data analysis at site level, allowing benchmarking different DIA-MS processing software tools.

Availability and implementation

sitereport is available as a command line tool in the Python package msproteomics, released under the Apache License 2.0 and available from the Python Package Index (PyPI) at https://pypi.org/project/msproteomics and GitHub at https://github.com/tvpham/msproteomics.

1 Introduction

Protein phosphorylation is one of the most important regulatory mechanisms of cellular signaling in biological systems. Profiling phosphorylation states of biological samples is feasible by mass spectrometry-based proteomics. A recent approach called data independent acquisition (DIA-MS) offers advantages for site localization (Bekker-Jensen et al. 2020) and data completeness in comparison to the traditional data dependent acquisition (DDA-MS). However, current major DIA-MS data processing software tools, namely Spectronaut (Biognosys) and DIA-NN (Demichev et al. 2020), do not offer a common reporting scheme for phosphosites and phosphopeptides, which complicates downstream analysis and assessing the performance of the processing pipelines. Furthermore, the source code for data reporting is not available for inspection when a large discrepancy is observed (Skowronek et al. 2022).

In their seminal work on DIA-MS phosphoproteomics, Bekker-Jensen et al. (2020) provide a software plugin for Perseus (Tyanova et al. 2016), which creates site- and peptide-level reports similar to those of MaxQuant (Cox and Mann 2008) for DDA data. The software was adopted in the Spectronaut software (Biognosys) for site-level reporting. However, the plugin does not support later versions of Spectronaut or other software such as DIA-NN; as a result, the user needs to resort to in-house scripts (Kitata et al. 2021, Skowronek et al. 2022). Furthermore, site quantification is currently done by summing up associated intensities, ignoring missing values and not taking into account differences among observed peptides such as ionization or fragmentation efficiency. Here, we present an open-source software tool called sitereport for both site-level and peptide-level reporting for DIA-MS phosphoproteomics. In particular, we implement the MaxLFQ algorithm (Cox et al. 2014) for quantification as in the R package iq (Pham et al. 2020), an approach that has demonstrated excellent performance for quantifying protein expression in DIA-MS experiments. This allows quantification using peptide intensity (MS1), peptide fragment intensity (MS2), or both sources of information.

2 Implementation

The input of the software is a long-format report as exported by the DIA-MS processing software tools Spectronaut and DIA-NN. We support MS/MS fragment-level exports to allow quantification using MS2 information. We implement data format conversion to accommodate different processing tools. A Spectronaut export can be processed without alteration, while for DIA-NN we provide a tool to adapt the output to the required format, taking into account the input proteome used either in the library-free search or in the library construction phase to properly map detected peptides to protein isoforms.

2.1 Phosphosite report

Figure 1A outlines the four main steps in sitereport. The first step, which is optional, normalizes the input data by making the median intensities equal across all samples. This is typically used to adjust for variation in the amount of input material.
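As a rough illustration of the optional normalization step, the sketch below shifts log2 intensities so that every sample ends up with the same median. The function name, the use of log2 space, and the choice of the mean of the sample medians as the common target are assumptions made for this example; they are not taken from the sitereport source code.

```python
from statistics import median


def median_normalize(log_intensities):
    """Shift each sample so that all sample medians become equal.

    log_intensities: dict mapping a sample name to a list of log2 intensities.
    The common target (mean of the sample medians) is an assumption of this sketch.
    """
    medians = {s: median(v) for s, v in log_intensities.items()}
    target = sum(medians.values()) / len(medians)
    return {s: [x - medians[s] + target for x in v]
            for s, v in log_intensities.items()}


# Two hypothetical samples whose medians differ by 2 log2 units.
data = {"a": [1.0, 2.0, 3.0], "b": [3.0, 4.0, 5.0]}
normalized = median_normalize(data)
print(normalized)  # {'a': [2.0, 3.0, 4.0], 'b': [2.0, 3.0, 4.0]}
```

Working in log space turns the correction into a simple per-sample shift, which corresponds to a per-sample scaling factor on the original intensity scale.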

In the second step, we filter out peptides with low confidence in identification and localization, e.g. retaining only peptides with a q-value <0.01 and a site localization confidence of at least 0.75. The filtering on site localization confidence might differ among software tools. Lou et al. (2023) suggest that a threshold of 0.75 on site localization in Spectronaut is equivalent to a threshold of 0.01 in DIA-NN.
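The filtering step can be pictured with the minimal sketch below, which keeps only rows that pass both thresholds. The column name EG.PTMAssayProbability is the Spectronaut column used for site localization elsewhere in this article, whereas the q-value column name EG.Qvalue and the function name are assumptions of this example.

```python
def keep_confident(rows, q_col="EG.Qvalue", loc_col="EG.PTMAssayProbability",
                   max_q=0.01, min_loc=0.75):
    """Keep rows with a q-value below max_q and a localization of at least min_loc.

    rows: list of dicts, one per line of a long-format report.
    Column names and default thresholds are illustrative assumptions.
    """
    return [r for r in rows
            if float(r[q_col]) < max_q and float(r[loc_col]) >= min_loc]


# Hypothetical report rows: only the first passes both thresholds.
rows = [
    {"id": "p1", "EG.Qvalue": "0.001", "EG.PTMAssayProbability": "0.99"},
    {"id": "p2", "EG.Qvalue": "0.020", "EG.PTMAssayProbability": "0.99"},
    {"id": "p3", "EG.Qvalue": "0.001", "EG.PTMAssayProbability": "0.50"},
]
print([r["id"] for r in keep_confident(rows)])  # ['p1']
```

For a tool with a different localization score, such as DIA-NN, the `min_loc` argument would be set to the corresponding threshold (0.01 following Lou et al. 2023).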

In the third step, we construct phosphosite identifiers. Each identifier is composed of a protein identifier, the site position in the protein, and the so-called phosphorylation multiplicity. The concept of phosphorylation multiplicity originates from MaxQuant, in which 1 denotes single phosphorylation, 2 double phosphorylation of a peptide, and 3 three or more phosphorylation sites. We consider five scenarios: (i) a singly phosphorylated peptide mapped to a single protein, (ii) phosphorylated peptides mapped to multiple locations of a single protein, (iii) doubly phosphorylated peptides, (iv) peptides phosphorylated at three or more locations, and (v) peptides mapped to multiple protein isoforms. Following this strategy for phosphosite identifier construction, we obtain results identical to the pivot report (wide format) from the latest Spectronaut release (version 18).
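The following sketch illustrates how identifiers combining protein, site position, and multiplicity could be derived for the scenarios listed above. The identifier layout (protein_residue-position_M-multiplicity), the function names, and the decision to emit one identifier per mapping are hypothetical choices for this example and may differ from what sitereport writes.

```python
def find_all(haystack, needle):
    """Return every 0-based start index of needle in haystack (overlaps allowed)."""
    starts, pos = [], haystack.find(needle)
    while pos != -1:
        starts.append(pos)
        pos = haystack.find(needle, pos + 1)
    return starts


def site_identifiers(peptide, phospho_offsets, proteins):
    """Build site identifiers for one phosphopeptide.

    peptide: unmodified peptide sequence.
    phospho_offsets: 0-based indices of phosphorylated residues in the peptide.
    proteins: dict mapping a protein (isoform) identifier to its sequence.
    """
    # Multiplicity follows the MaxQuant convention: 1, 2, or 3 (three or more).
    multiplicity = min(len(phospho_offsets), 3)
    ids = []
    for prot_id, seq in proteins.items():           # scenario (v): isoforms
        for start in find_all(seq, peptide):        # scenario (ii): repeats
            for off in phospho_offsets:             # scenarios (iii) and (iv)
                position = start + off + 1          # 1-based protein position
                ids.append(f"{prot_id}_{peptide[off]}{position}_M{multiplicity}")
    return ids


# Hypothetical protein with a repeated peptide, plus a shorter isoform.
proteins = {"P1": "MKSPEPTIDEKSPEPTIDER", "P1-2": "AASPEPTIDER"}
print(site_identifiers("SPEPTIDE", [0], proteins))
# ['P1_S3_M1', 'P1_S12_M1', 'P1-2_S3_M1']
```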

Finally, for each site identifier, we form a quantitative table of associated, observed intensities, and subsequently perform site-level quantification. The options are summing the intensities, taking the maximal intensity, and MaxLFQ quantification. Here we have integrated the fast implementation of the MaxLFQ algorithm from the R package iq (Pham et al. 2020), originally developed for protein summarization.
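To make the idea behind MaxLFQ concrete, the sketch below computes, for one site, the median log-ratio between every pair of samples and then finds the sample profile that best fits these ratios in the least-squares sense. It is a simplified, pure-Python illustration and not the optimized implementation used in the package. The function names are hypothetical, the profile is anchored so that its mean equals the mean of all observed values (an assumption of this sketch), and all samples are assumed to be connected through shared observations.

```python
from itertools import combinations
from statistics import median


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("samples are not all connected by shared ions")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def maxlfq(log_rows):
    """Estimate one log2 value per sample from an ions x samples table.

    log_rows: list of rows, each a list of log2 intensities (None if missing).
    """
    n = len(log_rows[0])
    # Start from the all-ones matrix; adding it to the normal equations
    # pins the sum of the solution to the value added to the right-hand side.
    mat = [[1.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i, j in combinations(range(n), 2):
        diffs = [r[i] - r[j] for r in log_rows
                 if r[i] is not None and r[j] is not None]
        if not diffs:
            continue
        ratio = median(diffs)          # median pairwise log-ratio
        mat[i][i] += 1
        mat[j][j] += 1
        mat[i][j] -= 1
        mat[j][i] -= 1
        rhs[i] += ratio
        rhs[j] -= ratio
    observed = [v for r in log_rows for v in r if v is not None]
    target_sum = n * sum(observed) / len(observed)
    return solve_linear(mat, [b + target_sum for b in rhs])


# Three hypothetical ions over three samples, with missing values.
table = [[10.0, 11.0, 12.0], [8.0, 9.0, None], [None, 5.0, 6.0]]
profile = maxlfq(table)
print([round(v, 3) for v in profile])  # [7.714, 8.714, 9.714]
```

In this toy table every ion increases by one log2 unit from sample to sample, and the estimated profile reproduces that trend even though no single ion with a different baseline is simply summed with the others.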

2.2 Phosphopeptide report

The steps for peptide-level reporting are similar to those of site-level reporting, except that filtering on site localization confidence is optional and a different strategy for constructing phosphopeptide identifiers is used. We follow the scheme described by Bekker-Jensen et al. (2020) to produce modification-specific peptides. Specifically, for each modified peptide, the number of sites for each modification is appended to the unmodified sequence, as illustrated in Fig. 1B. After the creation of modified peptide keys, we populate the observations for each key and perform quantification as for site quantification.
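A modification-specific peptide key of this kind can be sketched as follows. The input is assumed to be a Spectronaut-style modified sequence with modifications in square brackets; the separators used in the key and the function name are hypothetical, since the exact layout is shown only in Fig. 1B.

```python
import re
from collections import Counter

MOD_PATTERN = re.compile(r"\[([^\]]+)\]")


def modified_peptide_key(modified_sequence):
    """Return the unmodified sequence followed by per-modification site counts.

    modified_sequence: e.g. "_AAS[Phospho (STY)]PEPTK_" (assumed input format).
    The "|", ";" and ":" separators are illustrative choices.
    """
    counts = Counter(MOD_PATTERN.findall(modified_sequence))
    stripped = MOD_PATTERN.sub("", modified_sequence).strip("_")
    suffix = ";".join(f"{name}:{n}" for name, n in sorted(counts.items()))
    return f"{stripped}|{suffix}" if suffix else stripped


key = modified_peptide_key(
    "_AAS[Phospho (STY)]PEPT[Phospho (STY)]M[Oxidation (M)]K_")
print(key)  # AASPEPTMK|Oxidation (M):1;Phospho (STY):2
```

Because only the counts are kept, positional isomers of the same peptide share one key, which is what makes site localization confidence optional at the peptide level.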

3 Example usage

3.1 Phosphosite and phosphopeptide reporting

To demonstrate the usage of sitereport, we analyze a dataset from the study of Skowronek et al. (2022), in which the authors observed a considerable difference between Spectronaut and DIA-NN. First, we process the outputs from Spectronaut 16 and DIA-NN 1.8 as published in dataset PXD034128 in the ProteomeXchange repository. Spectronaut output can be used immediately. For DIA-NN, we convert the output to another text format where fragment intensities are unrolled into multiple rows. In addition, the proteome used in the library construction stage is used to properly locate phosphosite positions in the proteins. The output can be processed by invoking a single command:

sitereport report.tsv -tool sn
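The unrolling of DIA-NN fragment intensities mentioned above can be pictured with the sketch below, which expands one report row into one row per fragment. It assumes that the fragment intensities are stored as a semicolon-separated string in a column named Fragment.Quant.Corrected; this column name, the output field names, and the function name are assumptions of this example and do not describe the converter shipped with the package.

```python
def unroll_fragments(row, column="Fragment.Quant.Corrected"):
    """Expand one long-format row into one row per fragment intensity.

    row: dict representing a single report line.
    column: name of the semicolon-separated intensity column (assumed).
    """
    values = [v for v in row[column].split(";") if v]
    unrolled = []
    for index, value in enumerate(values, start=1):
        new_row = {k: v for k, v in row.items() if k != column}
        new_row["fragment_index"] = index           # hypothetical field name
        new_row["fragment_intensity"] = float(value)
        unrolled.append(new_row)
    return unrolled


# Hypothetical row with three fragment intensities and a trailing separator.
row = {"Run": "run1", "Precursor.Id": "pep1", "Fragment.Quant.Corrected": "100.5;200;0;"}
for r in unroll_fragments(row):
    print(r["Run"], r["fragment_index"], r["fragment_intensity"])
```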

Figure 2A shows the results of sitereport processing. The numbers of phosphosites are similar to those reported by the authors of the dataset, confirming that Spectronaut results in more phosphosites than DIA-NN. Note that for DIA-NN, we used a threshold of 0.01 for the site localization confidence cut-off as suggested by Lou et al. (2023). Nevertheless, we obtain a lower number of phosphosites for Spectronaut (25 006 versus 28 980) and a higher number of phosphosites for DIA-NN (12 754 versus 10 510), with a larger proportion of overlapping sites (33% versus 26%).

To verify our implementation, we examine the site-level report produced by Spectronaut 18 (the pivot report). We obtain an identical list of phosphosites for this dataset and all other datasets we have considered. Next, we ran the dataset through the latest peptide collapse plugin (Perseus 2.0.11.0, plugin 1.4.4) with settings identical to those used by sitereport. Specifically, we used column EG.PTMAssayProbability for site localization confidence, PG.ProteinGroups for protein identifiers, and all modifications [Phospho (STY)], [Acetyl (Protein N-term)], [Carbamidomethyl (C)], and [Oxidation (M)]. We have discovered that the Perseus plugin misses several phosphosites due to repetition of peptides in the protein sequence (scenario ii) and multiple isoforms (scenario v). The data and analysis scripts are publicly available for inspection.

Next, we assess the performance of the latest releases of the processing software. For Spectronaut 18, we process the DIA data with a spectral library created from the DDA data. For the latest DIA-NN 1.8.2 beta 27, we use the library construction pipeline of the FragPipe platform version 20.0 (https://fragpipe.nesvilab.org/) (Demichev et al. 2022, Yu et al. 2023) with MSFragger 3.8 (Kong et al. 2017), MSBooster 1.1.11 (Yang et al. 2023), Percolator 3.06.0 (Käll et al. 2007), and the spectral library building module EasyPQP 0.1.40 (https://github.com/grosenberger/easypqp). Figure 2B shows a substantial increase in the number of phosphosites and phosphopeptides for DIA-NN, from 12 754 sites to 19 730 sites and from 12 628 peptides to 18 023 peptides. There is a small increase in the number of identified phosphosites for Spectronaut, but a large decrease in the number of phosphopeptides. Overall, while Spectronaut 18 still gives a higher number of phosphosites and phosphopeptides than DIA-NN 1.8.2 beta 27, the difference is not as pronounced as that between the previous versions. Note that the true false discovery rates of Spectronaut and the different versions of DIA-NN are unknown since there is no unbiased external control available. Therefore, the differences reported here might be due to different approaches to false discovery rate control.

3.2 Site quantification

The package supports different methods for quantification. To demonstrate this feature, we process dataset PXD014525 published by Bekker-Jensen et al. (2020) using Spectronaut 18. The dataset consists of 36 DIA-MS phosphoproteomics runs, of which 30 are mixed-species runs in 5 conditions called yeast 25, yeast 50, yeast 100, yeast 150, and yeast 200 with 6 replicates each (the remaining 6 runs are yeast only and human only). For each phosphosite, the average values in conditions yeast 25, yeast 50, yeast 150, and yeast 200 are divided by the corresponding values in yeast 100, resulting in expected ratios of 0.25, 0.5, 1.5, and 2.0 for yeast phosphosites, and 1.0 for phosphosites belonging to the human background. We used the directDIA mode in Spectronaut 18 with a human proteome from UniProt (downloaded 30 March 2023, 42 420 entries), a yeast proteome from UniProt (downloaded 3 May 2023, 6757 entries), and Biognosys iRT peptide sequences. Figure 2C shows the quantification results for yeast phosphosites by the Spectronaut export and by sitereport with MaxLFQ, and Fig. 2D the results for phosphosites in the human background. It can be seen that relative site quantification by sitereport using MS2 fragments gives results comparable to those reported by Spectronaut. For the 0.25 ratio comparison, quantification of yeast phosphosites is better by Spectronaut, while quantification of the human background is better by sitereport. For the other group comparisons, the results by sitereport are better for both yeast and human background. Supplementary Figure S1 shows the quantitative results stratified by phosphorylation multiplicity. Supplementary Figures S2 and S3 show the results of the three quantification methods with and without normalization. Here, the normalization step appears to have a large effect on quantification for phosphosites in the human background.
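The ratio calculation used in this benchmark can be written down in a few lines. The sketch below averages the replicate intensities of one phosphosite per condition and divides by the average in the reference condition; the function name and the toy intensities are hypothetical and serve only to show how the expected ratios arise.

```python
from statistics import mean


def ratios_to_reference(site_values, reference="yeast 100"):
    """Divide the mean intensity per condition by the mean in the reference.

    site_values: dict mapping a condition name to replicate intensities
    (linear scale) of a single phosphosite.
    """
    ref = mean(site_values[reference])
    return {cond: mean(vals) / ref
            for cond, vals in site_values.items() if cond != reference}


# Idealized yeast phosphosite whose intensity scales with the yeast amount.
yeast_site = {
    "yeast 25": [25.0, 25.0], "yeast 50": [50.0, 50.0],
    "yeast 100": [100.0, 100.0],
    "yeast 150": [150.0, 150.0], "yeast 200": [200.0, 200.0],
}
print(ratios_to_reference(yeast_site))
# {'yeast 25': 0.25, 'yeast 50': 0.5, 'yeast 150': 1.5, 'yeast 200': 2.0}
```

A phosphosite from the constant human background would give ratios close to 1.0 in every condition under the same calculation.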

4 Conclusion

We have implemented a reference reporting scheme for DIA-MS phosphoproteomics experiments for both site-level and peptide-level quantification. The Python software package currently supports two major DIA-MS processing software tools, Spectronaut and DIA-NN, and is readily extendable to support other tools via format conversion. Three quantification methods, namely summing associated intensities, taking the maximal value, and MaxLFQ, are implemented together with data normalization. Our software will facilitate downstream data analysis at the phosphosite level and aid in benchmarking upstream data processing tools.

Acknowledgements

The authors thank the students Do Duy Luc, Tran Thanh Kien, Tuong Dang Vuong Quoc, Nguyen Gia Bao, and Truong Tuan Dung for their contributions to the implementation of the Python package.

Contributor Information

Thang V Pham, Amsterdam UMC location Vrije Universiteit Amsterdam, OncoProteomics Laboratory, Medical Oncology, De Boelelaan 1117, Amsterdam, the Netherlands; Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, the Netherlands.

Alex A Henneman, Amsterdam UMC location Vrije Universiteit Amsterdam, OncoProteomics Laboratory, Medical Oncology, De Boelelaan 1117, Amsterdam, the Netherlands; Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, the Netherlands.

Nam X Truong, Thuyloi University, Faculty of Computer Science and Engineering, 175 Tay Son, Hanoi, Vietnam.

Connie R Jimenez, Amsterdam UMC location Vrije Universiteit Amsterdam, OncoProteomics Laboratory, Medical Oncology, De Boelelaan 1117, Amsterdam, the Netherlands; Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, the Netherlands.

Author contributions

Thang V. Pham and Connie R. Jimenez conceived the study. Thang V. Pham, Nam X. Truong, and Alex A. Henneman developed the method and experimental software. Thang V. Pham and Alex A. Henneman performed experiments. All authors interpreted the results, wrote, and reviewed the manuscript.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by the Netherlands eScience Center [ASDI.2020.014].

Data availability

The data used in this article are publicly available at doi.org/10.5281/zenodo.11494771.

References

Bekker-Jensen DB, Bernhardt OM, Hogrebe A et al. Rapid and site-specific deep phosphoproteome profiling by data-independent acquisition without the need for spectral libraries. Nat Commun 2020;11:787. doi:10.1038/s41467-020-14609-1
Cox J, Hein MY, Luber CA et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics 2014;13:2513–26. doi:10.1074/mcp.M113.031591
Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 2008;26:1367–72. doi:10.1038/nbt.1511
Demichev V, Messner CB, Vernardis SI et al. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat Methods 2020;17:41–4. doi:10.1038/s41592-019-0638-x
Demichev V, Szyrwiel L, Yu F et al. dia-PASEF data analysis using FragPipe and DIA-NN for deep proteomics of low sample amounts. Nat Commun 2022;13:3944. doi:10.1038/s41467-022-31492-0
Käll L, Canterbury JD, Weston J et al. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 2007;4:923–5. doi:10.1038/nmeth1113
Kitata RB, Choong WK, Tsai CF et al. A data-independent acquisition-based global phosphoproteomics system enables deep profiling. Nat Commun 2021;12:2539. doi:10.1038/s41467-021-22759-z
Kong AT, Leprevost FV, Avtonomov DM et al. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 2017;14:513–20. doi:10.1038/nmeth.4256
Lou R, Cao Y, Li S et al. Benchmarking commonly used software suites and analysis workflows for DIA proteomics and phosphoproteomics. Nat Commun 2023;14:94. doi:10.1038/s41467-022-35740-1
Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020;36:2611–3. doi:10.1093/bioinformatics/btz961
Skowronek P, Thielert M, Voytik E et al. Rapid and in-depth coverage of the (phospho-)proteome with deep libraries and optimal window design for dia-PASEF. Mol Cell Proteomics 2022;21:100279. doi:10.1016/j.mcpro.2022.100279
Tyanova S, Temu T, Sinitcyn P et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 2016;13:731–40. doi:10.1038/nmeth.3901
Yang KL, Yu F, Teo GC et al. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023;14:4539. doi:10.1038/s41467-023-40129-9
Yu F, Teo GC, Kong AT et al. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat Commun 2023;14:4154. doi:10.1038/s41467-023-39869-5
