Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: J Proteome Res. 2022 Jan 27;21(4):899–909. doi: 10.1021/acs.jproteome.1c00669

multiFLEX-LF: A Computational Approach to Quantify the Modification Stoichiometries in Label-free Proteomics Datasets

Pauline Hiort 1,2, Christoph N Schlaffner 1,2,3,4, Judith A Steen 3,4,5, Bernhard Y Renard 2,§, Hanno Steen 1,5,6,7,§,*
PMCID: PMC9936407  NIHMSID: NIHMS1868457  PMID: 35086334

Abstract

In LC-MS/MS-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics datasets in a precursor-centric manner without preselecting a protein of interest. To analyze modification dynamics and co-regulated modifications, the precursors of all proteins are hierarchically clustered based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent acquisition (DIA)-based dataset acquired using the Anaphase Promoting Complex/Cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events.

Overall, multiFLEX-LF enables fast identification of potentially differentially modified peptide precursors and quantification of their differential modification extent in large datasets using a personal computer. Additionally, multiFLEX-LF can drive large-scale investigation of modification dynamics of peptide precursors in time series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.

Keywords: bioinformatics tool, label-free quantification, LC-MS/MS, post-translational modification, modification stoichiometry, PTM quantification


Introduction

Post-translational modifications (PTMs) of proteins in biological systems allow for location- or time-specific changes of protein properties1. Thereby, the number of proteins encoded in the genome and their functions are expanded extensively. Technological advances have made liquid chromatography–tandem mass spectrometry (LC–MS/MS) the key technology in proteome and PTM research2–4. Changes in the quantity and the pattern of PTMs of proteins can have physiological implications in a wide range of diseases, including cancer5,6, Alzheimer’s disease7–9 and other age-related diseases10. Knowing the PTM stoichiometry aids in understanding physiological and pathophysiological systems. For PTM identification and quantification, specific enrichment methods are often necessary to detect modified peptides among the highly abundant unmodified peptides11. As a result, absolute PTM quantification with good precision and accuracy is difficult to accomplish without prior knowledge about the type of modification and additional experimental steps. By contrast, the analysis of the unmodified peptides is cheaper, less time-intensive and more accurate.

Singh et al. (2009)12 developed an MS-based experimental workflow, termed FLEXIQuant (full-length expressed stable isotope labeled proteins for quantification). FLEXIQuant quantifies changes in the abundance of unmodified peptides relative to a standard to identify modified peptides and to quantify the extent of modification. Briefly, a protein of interest is synthesized in a wheat germ extract with heavy isotope labels and tagged with a so-called FLEX-peptide before being spiked into the sample as a heavy full-length protein standard. After digestion and LC-MS/MS analysis, the intensities of the unmodified heavy-labeled peptides from the protein standard can be compared to the intensities of the unmodified light peptides from the endogenous protein, and light/heavy intensity ratios of the unmodified peptides can be calculated. All peptides of a completely unmodified natural protein exhibit similar light/heavy ratios compared to the unmodified peptides of the protein standard. However, if a peptide is modified, the intensity of its corresponding unmodified peptide is lower, and the light/heavy ratio of the unmodified peptide is therefore reduced. Based on these deviating ratios, the degree of modification of the peptides is determined.

The FLEXIQuant12 method requires a purified and labeled standard for every protein of interest and is therefore laborious, time-intensive and expensive. Hence, building on the same concept, Schlaffner et al. (2020)13 developed the label-free computational method FLEXIQuant-LF (FLEXIQuant-Label Free). FLEXIQuant-LF infers the degree of differential modification solely based on the intensities of the unmodified peptides of a single protein in time series or case-control studies, where a certain time point or control group is used as a reference. The median intensities of the precursors, i.e., peptides with different charge states, of the reference group are compared to the intensities of each sample through robust linear regression. FLEXIQuant-LF employs the random sample consensus (RANSAC) algorithm14 to train a linear regression model for each sample compared to the reference. Based on the model, the distance to the regression line is calculated for every peptide precursor13. Afterwards, the distances are normalized by the slope of the regression line and the intensity of the reference precursor and subtracted from one. Next, outliers with a normalized distance, a so-called raw score, above a sample-specific cutoff are removed. Lastly, raw scores are normalized by the median of the three highest raw scores of the protein within a sample, resulting in the relative modification (RM) score. The RM score is a measure of the modification extent and is analogous to the light/heavy ratio of FLEXIQuant. An RM score of 1 or higher corresponds to 0% differential modification, while an RM score of 0.5 corresponds to a differential modification of 50%. With FLEXIQuant-LF, modified peptide precursors are identified indirectly via intensities of unmodified precursors that deviate strongly from those in a sample defined as reference. The minimal requirements of FLEXIQuant-LF for analyzing a protein are the intensities of at least five unmodified precursors and a reference sample.

The FLEXIQuant and FLEXIQuant-LF methods quantify the modification extent of peptide precursors of a single protein of interest based on changes of the intensities of the unmodified precursors. While the targeted analysis of single proteins is interesting by itself, the dynamics of and interplay between multiple proteins are of particular interest to understand the processes within cells. Therefore, multiFLEX-LF was built on top of FLEXIQuant-LF to analyze all proteins in a dataset in parallel. Additionally, the multiFLEX-LF workflow implements a method to distinguish the potentially differentially modified precursors and investigate potential modification stoichiometries.

We applied our multiFLEX-LF method to a data-independent acquisition (DIA)-based dataset from the anaphase promoting complex/cyclosome (APC/C) co-immunoprecipitated at various time points after nocodazole treatment of synchronized HeLa cells, i.e., at various time points during mitosis. The multiFLEX-LF analysis allowed us to identify the peptide precursors that show potential cell cycle phase-dependent differential modifications in a protein-independent manner. Apart from providing insights into the ordering of differential protein modifications during nocodazole treatment, we also identified additional non-APC/C precursors and proteins that show potential differential modification dynamics similar to those of the APC/C, thereby providing evidence that they are also substrates of enzymes that are regulated in a mitosis-dependent manner.

Materials and Methods

multiFLEX-LF

The bioinformatics tool multiFLEX-LF was developed in Python 3.8.11 to analyze all proteins in a given dataset of precursor intensities, i.e., peptides measured at different charge states. To this end, FLEXIQuant-LF13 is applied to every protein in the dataset consecutively. The FLEXIQuant-LF method employs the random sample consensus (RANSAC) algorithm14 to compute a robust linear regression between the precursor intensities of a protein from a sample of interest and from a reference sample13. RANSAC linear regression iteratively identifies outliers and fits the regression line solely based on the inliers. For every peptide precursor of the protein in the sample of interest, the distance to the regression line is calculated and normalized by the slope of the regression line and the intensity of the corresponding reference precursor. The normalized distances, called raw scores, are subtracted from one, and outliers with a normalized distance above a computed sample-specific cutoff are removed. The relative modification (RM) scores are then computed by normalizing the raw scores by the median of the three highest raw scores of the protein within a sample. The RM score is a measure of the modification extent, i.e., an RM score of 1 corresponds to a modification extent of 0% and an RM score of 0 to a modification extent of 100%. multiFLEX-LF can also apply the FLEXIQuant-LF method to all proteins in parallel in a user-defined number of processes employing the multiprocessing library of Python.

The input format for multiFLEX-LF is a list-like CSV file that has to contain protein IDs, peptide precursor IDs, the samples, the sample groups and the intensities in separate columns. multiFLEX-LF requires the samples of a dataset to be grouped into at least two different sample groups, a reference group and another sample group, e.g., a disease group. Additionally, multiFLEX-LF requires at least five precursors per protein for a reliable RM score calculation. This requirement stems from the linear regression and RANSAC components of the algorithm: three data points are required for any regression model with standard errors, and two further points are allowed for as outliers to obtain a stable RANSAC linear regression model13. After protein-wise application of FLEXIQuant-LF, imputation of missing RM scores, an additional normalization and hierarchical clustering of the peptide precursors of all proteins are performed in multiFLEX-LF. A graphical overview of the multiFLEX-LF workflow is shown in Figure 1.
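The score computation described above can be illustrated with a minimal sketch. It is not the multiFLEX-LF implementation: the function names are hypothetical, the regression line is assumed to pass through the origin, the distance is assumed to be the predicted minus the observed intensity, the sample-specific outlier cutoff is omitted, and the consensus fit is a deterministic stand-in for RANSAC that tries every single-point model instead of random subsets.

```python
import statistics

def fit_slope_consensus(ref, obs, tol=0.2):
    """Deterministic stand-in for RANSAC (assumption, not the published code):
    every point defines a candidate line through the origin; the candidate
    with the most inliers wins and the slope is refitted on those inliers."""
    best_inliers = []
    for r, o in zip(ref, obs):
        slope = o / r
        inliers = [i for i, (x, y) in enumerate(zip(ref, obs))
                   if abs(y - slope * x) <= tol * slope * x]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Least-squares slope through the origin on the consensus set.
    sxy = sum(ref[i] * obs[i] for i in best_inliers)
    sxx = sum(ref[i] ** 2 for i in best_inliers)
    return sxy / sxx

def rm_scores(ref, obs, slope):
    """Raw score = 1 - (predicted - observed) / (slope * reference);
    RM score = raw score / median of the three highest raw scores."""
    raw = [1 - (slope * r - o) / (slope * r) for r, o in zip(ref, obs)]
    norm = statistics.median(sorted(raw, reverse=True)[:3])
    return [score / norm for score in raw]

# Hypothetical intensities of five precursors of one protein.
reference = [100.0, 200.0, 300.0, 400.0, 500.0]
sample = [50.0, 100.0, 150.0, 100.0, 250.0]   # fourth precursor is depleted
slope = fit_slope_consensus(reference, sample)
scores = rm_scores(reference, sample, slope)
```

In this toy example the fourth precursor is the only one that falls off the consensus line, so it is the only one that receives an RM score well below 1.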

Figure 1:

Overview of multiFLEX-LF. multiFLEX-LF requires label-free quantification data of unmodified peptide precursors of a set of proteins and a within-study reference to identify differentially modified precursors and compute the modification extent. To this end, an adapted FLEXIQuant-LF13 using robust linear regression is applied to every protein consecutively. The intensities of the unmodified precursors of a protein are compared to the reference intensities of that protein by fitting a linear regression line. Based on the distances to the regression line, normalized relative modification (RM) scores are calculated for every precursor and every sample. Quality control is provided in the form of intermediate results with measures such as reproducibility and a visualization of the regression for every sample and protein. The RM scores of all proteins can be subjected to an additional normalization step employing DESeq2. To highlight different groups of peptide precursors with different modification dynamics, the precursors are hierarchically clustered based on the RM scores. These different groups can be analyzed further, for example by looking at the trajectory of the median RM score with a confidence band for each group or by creating sequence logos of the precursors in the clusters to find potential enzyme motifs.

multiFLEX-LF implementation

The tool multiFLEX-LF was implemented in Python 3.8.11 employing the packages pandas15 (version 1.3.1), numpy16 (version 1.20.3), scipy17 (version 1.6.2), scikit-learn18 (version 0.24.2), seaborn19 (version 0.11.1), matplotlib20 (version 3.4.2), plotly21 (version 5.1.0). A normalization using DESeq222 (version 1.26.0) was applied in R23 (version 3.6.3 (2020)). The package click24 (version 8.0.1) was used for implementation of the command line interface (CLI). Additionally, a graphical user interface (GUI) of multiFLEX-LF was developed utilizing PyQt525 (version 5.9.2) and Qt26 (version 5.9.7). The readily available executables of the CLI and the GUI were created with PyInstaller27 (version 3.6).

Missing value handling and imputation of the RM scores

To broaden the application scope of multiFLEX-LF, missing value handling was added. Missing value imputation was implemented for the subsequent DESeq2 normalization and hierarchical clustering of the RM scores, as these downstream methods do not handle missing values. Due to multiFLEX-LF's requirement of at least two sample groups for differential analysis, peptide precursors with RM scores in fewer than two sample groups are removed before imputation. After this filtering step, the missing RM scores of a precursor are imputed with the median of the RM scores of the closest precursors in the dataset. The closest precursors are determined by the cosine similarity of their RM scores over the samples. For every precursor with missing RM scores, the cosine similarity to all other precursors in the dataset is calculated. Next, the median of the RM scores of all precursors with a similarity larger than a user-defined threshold (default threshold: 0.98) is calculated. The missing RM scores of the current precursor are then filled with the computed median values. If no close precursors are found, the precursor is removed from further analyses. Additionally, all precursors that still have missing RM scores after imputation are removed.
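A minimal sketch of this imputation scheme follows. It is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, missing values are represented as None, and the cosine similarity is assumed to be computed over the samples in which both precursors have an RM score. The preceding filter on the number of sample groups is not shown.

```python
import math
import statistics

def cosine_similarity(a, b):
    """Cosine similarity over the samples where both rows have a value
    (how missing entries enter the similarity is an assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def impute_rm_scores(scores, threshold=0.98):
    """scores maps a precursor ID to its RM scores per sample (None = missing).
    Missing entries are filled with the median of the close precursors;
    precursors that stay incomplete are dropped."""
    result = {}
    for pid, row in scores.items():
        neighbours = [other for oid, other in scores.items()
                      if oid != pid and cosine_similarity(row, other) > threshold]
        filled = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [n[j] for n in neighbours if n[j] is not None]
                if donors:
                    filled[j] = statistics.median(donors)
        if None not in filled:
            result[pid] = filled
    return result

# Hypothetical RM scores over three samples.
rm = {
    "A": [1.0, 0.5, None],     # close to B and C, third value is imputed
    "B": [1.0, 0.5, 0.2],
    "C": [1.0, 0.52, 0.3],
    "D": [0.1, 1.0, 0.9],      # dissimilar profile, not used as a donor for A
    "E": [None, None, None],   # no close precursor, removed
}
imputed = impute_rm_scores(rm)
```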

DESeq2 normalization of the RM score

To enable adequate RM score comparison of precursors across the samples in the hierarchical clustering independent of protein assignment, an additional normalization step employing DESeq222 was added to multiFLEX-LF. The optional RM score normalization with DESeq2 is computed after missing value imputation. The RM scores are normalized per group, i.e., for every sample group the DESeq2 size factors are calculated for each sample in the group. After normalization, the previously missing RM scores are imputed again based on the DESeq2-normalized RM scores. The precursors are then clustered based on the newly normalized and imputed RM scores.
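For orientation, the median-of-ratios estimator that underlies DESeq2 size factors can be sketched in a few lines. This is a plain Python re-implementation for illustration only; multiFLEX-LF calls DESeq2 in R, and the function names and the example matrix here are hypothetical.

```python
import math
import statistics

def size_factors(matrix):
    """Median-of-ratios size factors. matrix[i][j] is the value of
    precursor i in sample j. Rows containing non-positive values are
    skipped because their geometric mean is not defined on the log scale.
    At least one strictly positive row is required."""
    n_samples = len(matrix[0])
    rows = [row for row in matrix if all(v > 0 for v in row)]
    log_geo_means = [sum(math.log(v) for v in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(rows, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

def normalize(matrix):
    """Divide every column by its size factor."""
    factors = size_factors(matrix)
    return [[v / factors[j] for j, v in enumerate(row)] for row in matrix]

# Hypothetical values: the second sample is systematically twice the first.
example = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
factors = size_factors(example)
normalized = normalize(example)
```

After normalization, the systematic factor of two between the two samples is removed and each row has equal values in both columns.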

Hierarchical clustering of the RM scores

To group precursors with similar differential modification behavior over the samples, the peptide precursors across all proteins in the dataset are clustered based on their RM scores. For this, agglomerative hierarchical clustering is used in multiFLEX-LF employing the hierarchical clustering module from scipy17 (version 1.6.2). A custom distance measure was defined to account for the modification cutoff used in multiFLEX-LF. The modification cutoff is a user-defined value (default value: 0.5) that indicates the threshold for the RM score below which a precursor is considered potentially differentially modified. The distance measure is based on the Manhattan distance, adding a penalty for jumps from below to above the modification cutoff. In more detail, to compute the distance between two precursors, the absolute difference of the RM scores of the precursors is calculated per sample. If the RM score of one of the precursors is below the user-defined modification cutoff for a given sample and the RM score of the other precursor is above the cutoff in the same sample, a penalty of 1 is added to the absolute difference of the two RM scores. As in the Manhattan distance, the penalized absolute differences between the RM scores are summed over the samples to give the distance between two precursors. The results of the hierarchical clustering are displayed in an interactive dendrogram together with a heatmap of the RM scores. Based on this, precursors with similar potential differential modification dynamics can be investigated and subjected to further analyses.
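The distance measure described above can be written as a short function. The sketch below follows the textual description; the function name is hypothetical, and treating an RM score exactly equal to the cutoff as "not below" is an assumption.

```python
def penalized_manhattan(a, b, cutoff=0.5, penalty=1.0):
    """Manhattan distance between two RM score profiles with an extra
    penalty for every sample in which exactly one of the two precursors
    lies below the modification cutoff."""
    total = 0.0
    for x, y in zip(a, b):
        total += abs(x - y)
        if (x < cutoff) != (y < cutoff):
            total += penalty
    return total

# Hypothetical RM score profiles over three samples.
precursor_1 = [1.0, 0.9, 0.4]
precursor_2 = [1.0, 0.6, 0.6]
distance = penalized_manhattan(precursor_1, precursor_2)
```

Here the plain Manhattan distance would be 0.5; the third sample adds a penalty of 1 because only the first precursor is below the cutoff there. A callable of this form can be passed as the metric to scipy.spatial.distance.pdist, although whether multiFLEX-LF wires it up this way is not stated in the text.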

multiFLEX-LF application

Dataset

multiFLEX-LF was applied to an in-house dataset acquired for the analysis of the anaphase promoting complex/cyclosome (APC/C). The previously published FLEXIQuant-LF method was tested with a small subset of proteins from this dataset (ProteomeXchange identifier: PXD018411)13. Briefly, HeLa S3 cells were treated with thymidine for 20 hours to synchronize them in S (synthesis) phase (0h). For M (mitotic) phase, the cells were treated with nocodazole after having been cultured for 3 hours in fresh media. Cells were collected after 4, 8 and 10 hours of treatment with nocodazole. Each sample was analyzed once on a TripleTOF® 5600 mass spectrometer (Sciex, Framingham, MA) in data-dependent acquisition (DDA) mode coupled with an online nanoLC system (Sciex/Eksigent, Dublin, CA). Additionally, each time point was analyzed in SWATH (DIA) mode in triplicate using the same mass spectrometer and LC system as for the DDA data.

The raw DDA data were newly searched against the human proteome database from UniProt28 containing 20,397 canonical and reviewed sequences (downloaded: Oct. 20, 2020). The database was concatenated with the iRT sequence from Biognosys and common laboratory contaminants (245 entries). The data were searched with MaxQuant v1.5.2.829 with full tryptic specificity and allowing up to 2 missed cleavages. Carbamidomethyl on C was set as a fixed modification and oxidation on M and phosphorylation on S, T and Y were set as variable modifications. The error tolerance was set to 20 ppm during the first search and 4.5 ppm during the second search. The database was reversed for false-discovery rate (FDR) calculation. The FDR was set to 1% at peptide and protein level. The minimum length of peptides was set to seven amino acids. The database search results were used to generate a spectral library in Spectronaut30,31 (version 14.5) with the above-described database and default parameters. The DIA data were searched against the spectral library employing Spectronaut 14.5 with default parameters. The precursor intensities of the DIA data were used for the multiFLEX-LF analysis. Due to a low number of protein identifications (<10% of the other two replicates), the third replicate of each time point was omitted from the data analyses. For each time point, the mean precursor intensities of the remaining two replicates were calculated for each precursor and analyzed with multiFLEX-LF. In general, when more than two replicates are available, we recommend using the median across replicates instead.

Additionally, the DDA data were searched with ProteinPilot (Paragon Algorithm: 5.0.0.0; Sciex, Framingham, MA)32 to improve PTM identification. The above-described database was used. Sample type was set to identification, cys alkylation to iodoacetamide, digestion to trypsin, instrument to TripleTOF 5600, special factors to phosphorylation emphasis and gel-based ID, ID focus to biological modifications, and search effort to thorough ID. The peptide FDR was set to 1%. The modified peptide precursors identified in this way in the DDA data were compared to the potentially differentially modified peptide precursors identified with multiFLEX-LF.

Data processing

Before multiFLEX-LF analysis the identified and quantified peptide precursors of the DIA data search results were filtered. Precursors without a unique assignment to one protein were removed from further analyses. Only unmodified precursors and precursors with methionine oxidation or cysteine carbamidomethylation were processed with multiFLEX-LF. Peptide precursors with an unmodified version and a modified counterpart with one or more of those two modifications were considered as two separate precursors with different precursor identifiers. Additionally, only proteins with at least five identified and quantified peptide precursors were used for multiFLEX-LF analysis, as the tool does not work with fewer precursors. Charge states of the precursor ions were added to the peptide sequences to generate unique precursor identifiers. If more than one intensity was given for a precursor with the same charge state in a sample the intensities were summed. Additionally, human keratin proteins were removed as common contaminants. For each time point the mean precursor intensity of the replicates was calculated before multiFLEX-LF was applied using the following settings: modification cutoff was set to 0.5 (default), cosine similarity threshold to 0.98 (default), other parameters were set to default, and the reference was the 0-hours sample.
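The precursor-level preprocessing steps (building precursor identifiers from sequence and charge, summing duplicate intensities, and keeping proteins with at least five precursors) can be sketched as follows. The record layout, the identifier format and the function name are hypothetical and serve only to illustrate the steps.

```python
from collections import defaultdict

def preprocess(records, min_precursors=5):
    """records: iterable of (protein, peptide, charge, sample, intensity).
    Returns {(protein, precursor_id, sample): summed intensity} restricted
    to proteins with at least min_precursors distinct precursors."""
    summed = defaultdict(float)
    for protein, peptide, charge, sample, intensity in records:
        precursor_id = f"{peptide}_{charge}"   # hypothetical identifier format
        summed[(protein, precursor_id, sample)] += intensity
    precursors_per_protein = defaultdict(set)
    for protein, precursor_id, _sample in summed:
        precursors_per_protein[protein].add(precursor_id)
    keep = {protein for protein, ids in precursors_per_protein.items()
            if len(ids) >= min_precursors}
    return {key: value for key, value in summed.items() if key[0] in keep}

# Hypothetical records: P1 has five precursors, P2 only two.
records = [("P1", f"PEPTIDE{i}", 2, "0h", 100.0) for i in range(5)]
records.append(("P1", "PEPTIDE0", 2, "0h", 50.0))   # duplicate entry, summed
records += [("P2", "SEQA", 2, "0h", 10.0), ("P2", "SEQA", 3, "0h", 20.0)]
table = preprocess(records)
```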

Follow-up analysis

After hierarchical clustering, six clusters of potentially differentially modified peptide precursors were selected for further analysis to highlight a use case. Manual investigation of the dendrogram and heatmap informed the application of a clustering distance of 0.45 to build the flat clusters. The acetylation, methylation, phosphorylation and ubiquitination site databases from PhosphoSitePlus®33–35 (version: Feb. 17, 2021; downloaded: March 16, 2021) were searched for known modification sites. These databases contain previously identified modification sites in proteins of humans, mice and several other species. For every potentially differentially modified precursor in the selected clusters, all modification sites within the respective sequence were searched in the PhosphoSitePlus® databases. After finding all known phosphorylation sites in the sequences of the potentially differentially modified precursors, short amino acid sequences containing the seven preceding and the seven following amino acids around the identified sites were extracted. If multiple modification sites of a precursor were found, the 15-amino-acid sequence surrounding each site was extracted. Phosphorylation sites that were found multiple times due to, e.g., different charge states of the same precursor were analyzed only once. The resulting amino acid sequences were divided into three groups based on their phosphorylation site being a serine, threonine or tyrosine residue. These groups of sequences were then subjected to the sequence logo generator PSP LogoGenerator33–35, a web interface provided by PhosphoSitePlus®. The logos are constructed with overrepresented amino acids as positive and underrepresented amino acids as negative values. The sum of the absolute values of the positive and negative values is kept equal to one by the PSP Production algorithm. For the logo generation, the PSP Production algorithm was selected and the background for logo generation was set to input sequences.
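The extraction of the 15-residue windows and their grouping by phosphorylated residue can be sketched as follows. The function names are hypothetical, and padding windows that run past a protein terminus with "_" is an assumption; the text does not state how such sites were handled.

```python
def site_window(protein_seq, site_pos, flank=7, pad="_"):
    """Return the 2*flank+1 residues centred on a 1-based site position,
    padded with `pad` where the window runs past a terminus (assumption)."""
    i = site_pos - 1
    left = protein_seq[max(0, i - flank):i]
    right = protein_seq[i + 1:i + 1 + flank]
    return left.rjust(flank, pad) + protein_seq[i] + right.ljust(flank, pad)

def group_by_residue(windows):
    """Split windows by their central residue (S, T or Y); identical
    windows, e.g. from different charge states, are counted once."""
    groups = {"S": [], "T": [], "Y": []}
    for window in dict.fromkeys(windows):      # ordered de-duplication
        centre = window[len(window) // 2]
        if centre in groups:
            groups[centre].append(window)
    return groups

# Hypothetical sequence with a serine at position 8 and a threonine at 2.
sequence = "ATAAAAASAAAAAAA"
windows = [site_window(sequence, 8), site_window(sequence, 8),
           site_window(sequence, 2)]
groups = group_by_residue(windows)
```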

Robustness and runtime analysis

To analyze the runtime of multiFLEX-LF, 25%, 50% and 75% of the proteins from our dataset were randomly selected ten times each and analyzed with multiFLEX-LF. The number of CPUs was set to 5 for these analyses using a common personal laptop with 16GB RAM and a 6-Core AMD Ryzen 5 CPU. Clustering was turned on and the other settings were left at the default values.

The impact of missing values on the multiFLEX-LF RM score computation and missing RM score imputation was analyzed by random removal of values. Prior to this analysis, the precursors with missing intensities were removed from the dataset. Afterwards, 10% to 80% of the intensities were randomly masked as missing per sample. The generated datasets were analyzed with multiFLEX-LF using the default settings. For error analysis, the RM scores of each masked dataset were compared to the RM scores of the unmasked dataset. The analysis was repeated ten times for estimation of 95% confidence intervals. The RM scores of the reference (0-hours) with itself were removed, as all calculated RM scores would result in values of 1.0 regardless of missing values.
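The masking experiment can be sketched as below. The data layout, the function names and the use of the mean absolute difference as the error summary are assumptions for illustration; the multiFLEX-LF run between masking and error calculation is not shown.

```python
import random

def mask_per_sample(intensities, fraction, seed=0):
    """intensities: {sample: {precursor_id: intensity}}. In every sample,
    the given fraction of precursors is set to None at random."""
    rng = random.Random(seed)
    masked = {}
    for sample, values in intensities.items():
        ids = sorted(values)
        n_drop = int(round(fraction * len(ids)))
        drop = set(rng.sample(ids, n_drop))
        masked[sample] = {pid: (None if pid in drop else value)
                          for pid, value in values.items()}
    return masked

def mean_abs_error(full_scores, masked_scores):
    """Mean absolute difference over entries available in both score sets."""
    diffs = [abs(full_scores[key] - masked_scores[key])
             for key in full_scores if masked_scores.get(key) is not None]
    return sum(diffs) / len(diffs) if diffs else float("nan")

# Hypothetical intensities of ten precursors in one sample.
data = {"4h": {f"p{i}": float(i + 1) for i in range(10)}}
masked = mask_per_sample(data, 0.3)
```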

The imputation of RM scores was analyzed by randomly removing RM scores of the dataset without missing values prior to imputation. Here too, 10 to 80% of the intensities were removed randomly per sample ten times each excluding the reference sample (0-hours). The reference sample was excluded because missing values in the reference would have caused the precursor to be removed by multiFLEX-LF before RM score computation. The missing RM scores were imputed with a cosine similarity cutoff of 0.98. To compute the error the absolute difference between the RM scores of the imputed data and the RM scores without missing values was calculated. The RM scores of the reference sample were again removed prior to error calculation.

Additionally, a simulated dataset was created for statistical analysis of multiFLEX-LF results using the setup of our biological dataset. For the simulation, the precursor intensities of the 0-hours sample were used as reference. Of the precursors, 15% were randomly chosen to be simulated as differentially modified. For the 4-, 8-, and 10-hours samples, the precursor intensities were replaced with simulated intensities with varying noise levels. The simulated precursor intensities were drawn from normal distributions with their respective reference intensity as the mean and different amounts (between 10% and 90%) of the respective reference intensity as the standard deviation, resulting in the aforementioned varying noise levels. To simulate the intensities of the randomly chosen differentially modified precursors, their respective reference intensities were reduced by 50%, thus simulating a differential modification extent of 50%. The simulated intensities of these precursors were then each drawn from a normal distribution centered around the reduced reference intensity with a standard deviation of between 10% and 90% of the reduced intensity. The simulated dataset was then analyzed with multiFLEX-LF using the default parameters. For the subsequent statistical analysis, RM scores below the modification cutoff of the simulated differentially modified precursors were considered true positives. RM scores below the modification cutoff of precursors that were not simulated as differentially modified were regarded as false positives. Specificity, sensitivity, and accuracy were computed for different modification cutoffs between 0.1 and 1.
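A minimal sketch of the simulation and of the evaluation metrics is given below. The function names and the example values are hypothetical, a single noise level is drawn per call, and the multiFLEX-LF run that turns simulated intensities into RM scores is not shown.

```python
import random

def simulate_sample(reference, modified, noise=0.1, extent=0.5, seed=0):
    """Draw one simulated intensity per precursor from a normal distribution.
    For precursors in `modified`, the mean is the reference reduced by
    `extent`; the standard deviation is `noise` times the mean."""
    rng = random.Random(seed)
    simulated = []
    for i, ref in enumerate(reference):
        mean = ref * (1 - extent) if i in modified else ref
        simulated.append(rng.gauss(mean, noise * mean))
    return simulated

def confusion_metrics(rm_scores, is_modified, cutoff=0.5):
    """Sensitivity, specificity and accuracy when RM scores below the
    cutoff are called differentially modified."""
    tp = fp = tn = fn = 0
    for score, truth in zip(rm_scores, is_modified):
        called = score < cutoff
        if called and truth:
            tp += 1
        elif called:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(rm_scores)
    return sensitivity, specificity, accuracy

# Hypothetical RM scores and ground truth for five precursors.
rm = [0.4, 0.9, 0.45, 0.6, 1.0]
truth = [True, False, False, True, False]
sens, spec, acc = confusion_metrics(rm, truth)
```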

Results and Discussion

multiFLEX-LF software

multiFLEX-LF was developed to allow for the unbiased analysis of modification dynamics in large-scale label-free proteomics time-series or case-control studies. multiFLEX-LF is unbiased in the sense that it does not require the pre-selection of proteins or protein modifications of interest. Figure 1 shows a graphical overview of the computation with multiFLEX-LF. The input format for multiFLEX-LF is a list-like text file of the proteins, the peptide precursors, the samples, the sample groups and the precursor intensities of quantified proteomics data. multiFLEX-LF employs an adapted FLEXIQuant-LF13 to analyze each protein consecutively. Optionally, the proteins can be analyzed in parallel. multiFLEX-LF creates several output files, including a text file containing the RM scores per protein, peptide precursor and sample and an HTML file containing the interactive dendrogram and heatmap of the hierarchical clustering. Additionally, intermediate results are saved for quality control purposes, including a text file of the raw scores with measures such as reproducibility and plots of the regression line with the precursor intensities vs. the reference intensities for each protein and sample. After computation of the RM scores for the peptide precursors of each protein, the missing RM scores are imputed and, optionally, normalized using the DESeq2 script before hierarchical clustering.

The interactive figure of the hierarchical clustering supports zooming in and out of parts and taking screenshots of the current view in any common web browser. In every cell of the heatmap, the respective protein ID, peptide precursor ID, sample and RM score are shown when hovering the cursor over it. Additionally, an ID number is shown, which corresponds to the rank of the precursor in the ordering of the clustering. After clustering, an input prompt opens asking for a user-defined clustering distance. The distance value is used to create flat clusters of the precursors with a clustering distance below this value. An appropriate distance can be determined by the user through investigation of the dendrogram. After the cutoff is applied, the interactive figure is updated and saved in a separate file. In this figure, the dendrogram is color coded to highlight the different clusters. The hover information of the cells of the heatmap is updated to include the cluster IDs of the precursors. The input prompt is repeated and the cutoff can be updated until the user decides to quit. A common personal computer is sufficient for the multiFLEX-LF computations.

Analysis of the dataset with multiFLEX-LF

The DIA dataset comprised 5,184 precursors (with 4,995 unique peptide sequences) from 893 proteins/protein groups. Before multiFLEX-LF computation, the dataset was filtered for precursors with a unique protein assignment and for proteins with at least five precursors. Removing all precursors with modifications except those with only methionine oxidation and/or cysteine carbamidomethylation resulted in a total of 4,049 precursors (with 3,582 unique peptide sequences) which were used for the subsequent multiFLEX-LF analysis. The only requirement for multiFLEX-LF is the association of at least five peptide precursors per protein; this step reduced the 893 proteins identified in the dataset to 305 proteins. The 0-hours sample was set as reference for our multiFLEX-LF computation of this dataset.

As described above, multiFLEX-LF was implemented with an optional parallel analysis of the proteins within a dataset. Analyzing the 305 proteins and four samples in the dataset without parallelization took ~5 minutes on a common personal laptop (16GB RAM, 6-Core AMD Ryzen 5 CPU) using ~1GB of RAM. The parallelized analysis of the proteins using 5 parallel processes on the same laptop reduced the runtime to two and a half minutes. Ultimately, the runtime depends on the number of proteins in a dataset, the number of samples to analyze, and the parallelization settings chosen by the user. The runtime analysis of random subsets with different numbers of proteins shows that the runtime increases linearly with the number of proteins (Figure S1). Based on this, we estimate that 8,000 quantifiable proteins can be processed with multiFLEX-LF in less than 1.5 hours on a common personal laptop.
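The runtime estimate can be reproduced with a simple proportional extrapolation. The sketch assumes that the runtime scales in direct proportion to the number of proteins with no fixed overhead and that the number of samples stays at four; the function name is hypothetical.

```python
def extrapolate_minutes(measured_minutes, measured_proteins, target_proteins):
    """Proportional extrapolation of the runtime (assumes zero fixed overhead)."""
    return measured_minutes * target_proteins / measured_proteins

# Measured: 305 proteins in ~2.5 min (5 processes) and ~5 min (no parallelization).
parallel_estimate = extrapolate_minutes(2.5, 305, 8000)   # about 66 minutes
serial_estimate = extrapolate_minutes(5.0, 305, 8000)     # about 131 minutes
```

Under these assumptions, the estimate of less than 1.5 hours for 8,000 proteins corresponds to the parallelized run; the same extrapolation for the run without parallelization gives a little over two hours.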

Before hierarchical clustering with multiFLEX-LF, the missing values were imputed by the median RM score of the closest precursors. The cosine similarity cutoff was set to 0.98. Therefore, precursors considered close had a cosine similarity above 0.98 to the precursor with missing RM scores. In the end, 82 precursors were removed during multiFLEX-LF analysis because they either did not have intensities in the reference samples or their missing values could not be imputed. The modification cutoff was set to 0.5 for clustering. The dendrogram and heatmap of the clustering are shown in Figure 2A.

Figure 2:

Differential modification stoichiometries in the cell cycle. (A) Heatmap of the hierarchical clustering. The precursors of the dataset were clustered with a modification cutoff of 0.5; therefore, values below 0.5 (red colors) in the heatmap show precursors with potential differential modification. A decreased RM score corresponds to an increased modification extent. (B) Zoom-in of the heatmap into the part containing different clusters of potentially differentially modified precursors. (C) Trajectories of the median RM scores with the 95% confidence interval around the median for each identified cluster. (D) Examples of sequence logos of the amino acid sequences around the identified known serine and threonine phosphorylation sites from PhosphoSitePlus®33–35 for the clusters with the most precursors (clusters 1, 4, 5).

After inspection of the heatmap and the dendrogram, a clustering distance cutoff of 0.45 was applied to build flat clusters from the hierarchical clustering. In total, 26 flat sub-clusters were built with this distance cutoff. As shown in Figure 2A, the hierarchical clustering divided the dataset into two main clusters. The larger cluster comprised mostly peptide precursors that remained unchanged during the time course experiment (top cluster in Figure 2A). The other main cluster contained potentially differentially modified precursors of varying potential modification stoichiometries (bottom cluster in Figure 2A). This cluster of potentially differentially modified precursors contained nine flat sub-clusters. The zoom-in of the heatmap in Figure 2B displays these different sub-clusters in greater detail. Six of these sub-clusters with increasing levels of potential differential modification were selected for further analysis. While four sub-clusters showed their largest drop in the RM score after 4 hours (clusters 1, 3, 4, and 5), only two dropped below our RM cutoff of 0.5 after 4 hours (clusters 3 and 4). The other four sub-clusters reached this RM cutoff of 0.5 only after 8 hours (clusters 1 and 2) or 10 hours (clusters 5 and 6). Together with previous results from FLEXIQuant-LF13, FLEXIQuant12 and derivatives, this highlights that multiFLEX-LF is capable of detecting differences in potential modification dynamics, i.e., differences in potential modification onset and extent. Of note: we excluded from further consideration two small sub-clusters with only one or two precursors, as well as a sub-cluster comprising 20 precursors at the bottom of the heatmap that showed non-monotonic trajectories.
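Cutting a hierarchical clustering at a distance threshold to obtain flat clusters can be sketched with SciPy as follows. The linkage method and distance metric used here (average linkage, Euclidean distance) are assumptions for illustration only, and the RM-score matrix is simulated toy data, not the APC/C dataset; only the distance cutoff of 0.45 is taken from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Toy RM scores (rows = precursors, columns = four time points)
unchanged = 1.0 + rng.normal(0, 0.02, size=(6, 4))
modified = np.array([1.0, 0.4, 0.3, 0.2]) + rng.normal(0, 0.02, size=(4, 4))
rm_scores = np.vstack([unchanged, modified])

# Assumed settings: average linkage on Euclidean distances
Z = linkage(rm_scores, method="average", metric="euclidean")

# Flat clusters: merge everything that joins below the distance cutoff
flat_ids = fcluster(Z, t=0.45, criterion="distance")
print(flat_ids)
```

With the `distance` criterion, a lower cutoff splits the dendrogram into more and smaller flat clusters, which is why the cutoff is best chosen after inspecting the dendrogram and heatmap.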

Analysis of the selected clusters

The trajectories of potential differential modification over time for each of the clusters 1 to 6 are displayed in Figure 2C. The plots show the median RM scores per time point with the 95% confidence interval around this median. We analyzed the trajectories in more detail using two criteria: i) time point of largest delta, i.e., strongest increase in extent of modification, and ii) the time point when the RM score drops below 0.5, i.e., a value at which we consider a potential differential modification.

Based on these criteria the following order becomes apparent: clusters 3 and 4, clusters 1 and 5, cluster 2, and cluster 6. This ordering reflects the time of onset of potential differential modification (clusters 3, 4, 1, and 5 for 0–4 hours after nocodazole addition, cluster 2 for 4–8 hours after nocodazole addition, and cluster 6 for 8–10 hours after nocodazole addition), as well as the subsequent kinetics to reach our RM score cutoff of 0.5. Combining both criteria allows the chronological sorting of potential differential modification sites that occur within 4 hours of nocodazole treatment into the very early (cluster 3), the early (cluster 4), the medium (cluster 1) and the late (cluster 5) potential differential modification sites. Such insights will be essential for a better understanding of the exact timing of the various events that occur during a biological process of interest, such as the nocodazole-induced arrest in prometaphase investigated in our dataset.

Altogether, 312 precursors from 127 proteins (~8% of all precursors and ~42% of all proteins of the dataset) were associated with the six clusters of potentially differentially modified precursors with varying potential modification dynamics. A breakdown of the number of precursors and proteins for each cluster is shown in Table 1. For 73 of these 127 proteins, only a single potentially differentially modified precursor was identified, while for 24 proteins two potentially differentially modified precursors each were found. The remaining 30 proteins had at least three precursors across the six clusters. The proteins with at least six potentially differentially modified precursors across the selected clusters are shown in Figure 3A, together with their respective number of precursors per cluster. As expected, the protein with the most precursors in the potentially differentially modified clusters was APC1 with 21 precursors. APC1 is the largest subunit of the anaphase promoting complex/cyclosome (APC/C). A strong enrichment in APC/C components is apparent (the components are marked with asterisks in Figure 3A), indicating that they are potentially differentially modified at the beginning of mitosis. In addition, several interacting actin filament binding proteins show cell cycle dependent potential differential modifications similar to those of the APC/C components. Beyond identifying the proteins that show potential cell cycle dependent modifications, it is also possible to derive information about the ordering of the substrates. While APC1, CDC27, APC5, PLEC, MYH9 and MYH10 show (very) early potential differential modifications within 4 hours of nocodazole arrest (clusters 3 and 4), all proteins show most of their potential differential modifications late during the first 4 hours post nocodazole arrest (clusters 1 and 5). In addition, APC7, TOP2A and APC1 also show potential differential modifications very late during the nocodazole arrest (clusters 2 and 6). Given the known pleiotropic effects of nocodazole arrest36, it is possible that these late potential differential modifications are nocodazole-associated artefacts.

Table 1:

Number of proteins, precursors, precursors with known modification and phosphorylation sites, and precursors with modifications found in the DDA data for each of the potentially differentially modified clusters. The known modification and phosphorylation sites were found in the PhosphoSitePlus®33–35 acetylation, methylation, phosphorylation, and ubiquitination site databases.

| Cluster | Number of proteins | Number of precursors | Precursors with known modification sites | Precursors with known phosphorylation sites | Precursors with modifications found in the DDA data |
|---|---|---|---|---|---|
| 1 | 48 | 97 | 79 (81%) | 74 (76%) | 24 (25%) |
| 2 | 10 | 10 | 10 (100%) | 9 (90%) | 2 (20%) |
| 3 | 10 | 10 | 8 (80%) | 7 (70%) | 1 (10%) |
| 4 | 19 | 31 | 24 (77%) | 16 (52%) | 4 (13%) |
| 5 | 90 | 153 | 122 (80%) | 96 (63%) | 16 (10%) |
| 6 | 10 | 11 | 9 (82%) | 7 (64%) | 2 (18%) |
| 1–6 | 127 | 312 | 252 (81%) | 208 (67%) | 49 (16%) |

Figure 3:

Proteins with at least six potentially differentially modified precursors in the clusters 1 to 6. (A) Number of precursors of the proteins in each of the clusters. (B) STRING39 analysis of the proteins. The analysis of the proteins was conducted with default parameters in the web interface of STRING.

Early potential modification events during the nocodazole block, i.e., low RM scores after 4 hours, are apparent for 16% of the precursors in Figure 3A (see clusters 3 and 4). Later potential modification events associated with low RM scores after 8 hours are seen for 45% of the precursors (see clusters 1 and 2). Late potential modification events observable after 10 hours of nocodazole block are associated with 39% of the precursors (see clusters 5 and 6).

Analysis of the potentially differentially modified precursors

To provide an example of a subsequent analysis of multiFLEX-LF results, the PhosphoSitePlus®33–35 database was searched for known acetylation, methylation, phosphorylation and ubiquitination sites within the 312 potentially differentially modified precursors of the six clusters. These precursors included 603 unique known modification sites from PhosphoSitePlus®. Modification sites found in precursors with the same sequence or in precursors with missed cleavage sites were counted only once. The number of precursors in each of the six analyzed clusters with known modification sites is shown in Table 1. About 80% of the precursors in these clusters had at least one known modification site, indicating that they contain a site that can be differentially modified. The remaining potentially differentially modified precursors did not contain a known modification site within their sequence. These precursors could contain as yet undiscovered modification sites. Alternatively, their intensity could be reduced because modifications of amino acid residues close to cleavage sites interfered with the tryptic cleavage.

Additionally, the potentially differentially modified precursors in the six clusters were searched against the modified precursors from the ProteinPilot database search of the DDA dataset. As shown in Table 1, between about 10% and 25% of the potentially differentially modified precursors had a modified counterpart in the DDA data. The mean RM score per cluster and the frequency of identified modified peptides in the DDA data per cluster show a strong negative Spearman correlation in the six potentially differentially modified clusters (see Figure S5), i.e., the lower the mean RM score (the higher the modification extent), the higher the frequency of modified peptides found within the cluster. The other clusters show no or only a weak correlation.

Sequence logos/motifs around the phosphorylation sites

To highlight a use case of multiFLEX-LF results, sequence logos of the amino acid sequences around known phosphorylation sites were generated. Phosphorylation is known to play a major functional role in the cell cycle37. Therefore, we focused on protein phosphorylation and investigated this modification in more detail. As shown in Table 1, about 67% of the precursors in the six clusters contained at least one known phosphorylation site. Hence, about 83% of the precursors with known modification sites had a known phosphorylation site within their sequence that could be differentially modified during mitosis.

After identifying the known phosphorylation sites from PhosphoSitePlus® in the sequences of the potentially differentially modified precursors, the sites were investigated utilizing sequence logos. Sequences containing the seven amino acids preceding and following the modification site were divided into two groups containing either a serine or a threonine residue as the phosphorylation site. Duplicate phosphorylation sites arising from different charge states or cleavage sites of the precursors were removed before the sequences were submitted to the PSP LogoGenerator sequence logo generator from PhosphoSitePlus®33–35. Given the low frequency and abundance of tyrosine phosphorylation during mitosis reported in the literature, supported by its sparseness in our dataset, we excluded tyrosine phosphorylation from further analysis. The logos that were generated around the serine and threonine phosphorylation sites for the clusters containing the most precursors (clusters 1, 4 and 5) are shown in Figure 2D. The sequence logos of the remaining clusters are displayed in Supplemental Figure S4.
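Preparing the input for a sequence logo amounts to cutting a 15-residue window around each site, grouping by the central residue and removing duplicates. The following sketch shows one way to do this; the function name, the 1-based site positions, the `_` padding at protein termini and the example sequence are all illustrative assumptions and not part of the PSP LogoGenerator or multiFLEX-LF.

```python
def site_windows(protein_seq, site_positions, flank=7, pad="_"):
    """Group unique site-centred windows by phosphorylated residue (S or T).

    site_positions are 1-based positions in protein_seq. Sites on other
    residues (e.g. tyrosine) are skipped. Windows that run past a
    terminus are padded with `pad` (an assumption of this sketch).
    """
    groups = {"S": set(), "T": set()}
    padded = pad * flank + protein_seq + pad * flank
    for pos in site_positions:
        residue = protein_seq[pos - 1]
        if residue not in groups:
            continue
        center = pos - 1 + flank  # index of the site in the padded sequence
        groups[residue].add(padded[center - flank:center + flank + 1])
    return {residue: sorted(windows) for residue, windows in groups.items()}


# Hypothetical sequence; site 5 is listed twice, as it would be for two
# charge states of the same peptide, and site 13 is a tyrosine
toy_seq = "MAAASPEKTPLLYQRSPVK"
windows = site_windows(toy_seq, [5, 9, 13, 5])
print(windows)
# prints {'S': ['___MAAASPEKTPLL'], 'T': ['AAASPEKTPLLYQRS']}
```

Using a set per residue is what collapses the duplicate sites into a single window before logo generation.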

Despite the noisiness of this analysis, several different potential substrate motifs can be found in the sequence logos such as the expected S/T-P motif associated with cyclin-dependent kinases (CDK), which are well described as being active and of high importance during mitosis38. The S/T-P motif is the most frequent motif in 4 out of 6 logos (green boxes) in Figure 2D. The S/T-P motif is markedly present in the clusters associated with the earlier modification events, i.e., clusters 1 and 4. The lower frequency of the S/T-P motif for the later potential differential modification events supports the notion that non-CDK driven, possibly nocodazole-associated artefacts are responsible for the observed potential differential modifications.
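How often the S/T-P motif occurs in a set of site-centred windows can be quantified by checking for a proline at position +1. The sketch below is a minimal illustration with hypothetical 15-residue windows (seven flanking residues on each side); it is not how the logos in Figure 2D were generated.

```python
def fraction_with_proline_plus_one(windows, flank=7):
    """Fraction of site-centred windows with proline directly after the site."""
    if not windows:
        return 0.0
    hits = sum(1 for window in windows if window[flank + 1] == "P")
    return hits / len(windows)


# Hypothetical windows; the phosphorylated S/T is at index 7
toy_windows = [
    "AAASPEKTPLLYQRS",  # T followed by P
    "___MAAASPEKTPLL",  # S followed by P
    "GSRKRLSSVEDLKKA",  # S followed by V
]
sp_fraction = fraction_with_proline_plus_one(toy_windows)
print(round(sp_fraction, 2))  # prints 0.67
```

Comparing this fraction between clusters gives a simple numerical counterpart to the visual impression from the logos.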

Analysis of the robustness of multiFLEX-LF

The impact of missing values on the RM scores was investigated by masking random intensity values as missing at different overall rates prior to multiFLEX-LF analysis. The resulting RM scores were compared to the RM scores of the unmasked dataset. The mean absolute error is below 0.1 (10% difference in modification extent) for up to 60% of missing values per sample, as shown in Supplemental Figure S2A. This highlights the robustness of the multiFLEX-LF RM score calculation even at high levels of missing values. Of note: our unfiltered dataset showed the highest rate of missing intensities in the 10-hour sample, with only 0.8% missing values.
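The two building blocks of this robustness test, random masking and the mean absolute error between two RM-score matrices, can be sketched as follows. The function names and the simulated intensities are illustrative assumptions; the multiFLEX-LF run between masking and comparison is not reproduced here.

```python
import numpy as np


def mask_at_random(intensities, rate, rng):
    """Return a copy in which roughly `rate` of the values are set to NaN."""
    masked = np.array(intensities, dtype=float)
    masked[rng.random(masked.shape) < rate] = np.nan
    return masked


def mean_absolute_error(reference_scores, test_scores):
    """MAE over the positions that are observed in both score matrices."""
    ref = np.asarray(reference_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    both = ~np.isnan(ref) & ~np.isnan(test)
    return float(np.mean(np.abs(ref[both] - test[both])))


rng = np.random.default_rng(1)
toy_intensities = rng.lognormal(mean=10, sigma=1, size=(1000, 4))  # simulated
masked = mask_at_random(toy_intensities, rate=0.6, rng=rng)
masked_fraction = float(np.isnan(masked).mean())

# Hypothetical RM scores from an unmasked and a masked run
mae = mean_absolute_error([[1.0, 0.5], [np.nan, 0.8]],
                          [[0.9, 0.6], [0.7, np.nan]])
```

Restricting the error to positions present in both matrices keeps precursors that drop out after masking from distorting the comparison.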

Similar to the missing value analysis, the imputation was examined by randomly removing increasing fractions of RM scores from the dataset before imputation. The mean absolute errors are shown in Supplemental Figure S2B. The figure shows that the imputation causes a mean absolute error below 0.07 for up to 80% of missing values. About half of the precursors were removed during imputation in the 80% missing values analysis, either because they did not have two RM scores or because no close precursors with a cosine similarity above 0.98 were found.

Additionally, a simulated dataset containing simulated precursor intensities with different levels of noise was analyzed with multiFLEX-LF. Fifteen percent of the precursors were randomly selected and simulated with a 50% reduction of the intensities compared to the reference and with the different levels of noise. RM scores of these 15% were considered true positives if they were below the modification cutoff. Supplemental Figure S3A–C displays the specificity, sensitivity and accuracy of the RM scores for different levels of noise and with different modification cutoffs. The specificity was close to 1 for 10% noise and a modification cutoff of 0.5. With 90% noise, the specificity dropped below 0.6 at the modification cutoff of 0.5. The sensitivity at a modification cutoff of 0.5 stayed almost constant between 0.75 and 0.8 for all noise levels from 10% to 90%. The accuracy, similar to the specificity, starts at almost 1 for a modification cutoff of 0.5 and 10% noise and drops below 0.6 with 90% noise. For an intensity reduction of 50% and up to 20% noise, the modification cutoffs 0.5 and 0.6 balance specificity, sensitivity and accuracy. Modification cutoffs above 0.6 have a higher sensitivity but a loss of specificity, while modification cutoffs below 0.5 have a higher specificity but a lower sensitivity.
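The three performance measures follow from comparing the cutoff-based calls with the simulated ground truth. The sketch below shows the calculation for a handful of hypothetical RM scores; the function name and values are illustrative assumptions, and the function expects at least one truly modified and one truly unmodified precursor.

```python
import numpy as np


def classification_metrics(rm_scores, truly_modified, cutoff=0.5):
    """Sensitivity, specificity and accuracy of calling RM < cutoff as modified."""
    rm = np.asarray(rm_scores, dtype=float)
    truth = np.asarray(truly_modified, dtype=bool)
    called = rm < cutoff
    tp = np.sum(called & truth)
    tn = np.sum(~called & ~truth)
    fp = np.sum(called & ~truth)
    fn = np.sum(~called & truth)
    return {
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "accuracy": float((tp + tn) / rm.size),
    }


# Hypothetical RM scores; the first three precursors were simulated as modified
toy_rm = [0.45, 0.55, 0.48, 0.95, 1.0, 0.4, 0.9, 1.05]
toy_truth = [True, True, True, False, False, False, False, False]

at_05 = classification_metrics(toy_rm, toy_truth, cutoff=0.5)
at_06 = classification_metrics(toy_rm, toy_truth, cutoff=0.6)
print(at_05)
print(at_06)
```

In this toy example, raising the cutoff from 0.5 to 0.6 recovers the modified precursor with an RM score of 0.55 and thereby increases the sensitivity, in line with the trade-off described above.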

Conclusion

PTM detection and quantification of the extent of modification without laborious and expensive experimental procedures are challenging. Here we presented multiFLEX-LF, a bioinformatics tool to detect protein-independent differentially modified peptide precursors in large-scale proteomics datasets and analyze their modification stoichiometries. To this end, differentially modified precursors are identified and their differential modification extent is quantified based on the intensities of the unmodified precursors compared to a reference. Furthermore, the hierarchical clustering of precursors based on their relative modification scores as part of multiFLEX-LF enables the analytical and visual interrogation of different modification stoichiometries in the data using commonly available tools such as a web browser. multiFLEX-LF focuses on analyzing increasing differential modification extent, as this is often most relevant. To analyze decreasing differential modifications, we suggest analyzing the same dataset with a different sample group set as the reference, e.g., the last time point.

Through the application of multiFLEX-LF to a proteomics dataset of the APC/C isolated at various time points after nocodazole arrest, we not only showed the efficiency of the tool itself but also illustrated a potential downstream analysis highlighting the potential differential modification dynamics and time-series resolved involvement of cyclin-dependent kinases during mitosis.

Overall, multiFLEX-LF enables the detection and quantification of differential modification extent in large-scale proteomics data while only requiring label-free quantified unmodified peptide precursors and a reference sample. This allows for a broad range of applications and follow-up analyses where specific experimental methods are too time-intensive or costly to investigate changes of the modification landscape over time or across conditions. Therefore, we envision that multiFLEX-LF analysis will be carried out routinely on discovery proteomics datasets to obtain additional information that is inherently present in these datasets. As such, multiFLEX-LF can become a main contributing tool for elucidating modification changes in many biological systems.

Data and code availability

The raw mass spectrometry proteomics data and the database and spectral library search results are available via the ProteomeXchange Consortium and the PRIDE40 partner repository with identifier PXD027970. The code and executable of multiFLEX-LF are available at https://gitlab.com/SteenOmicsLab/multiflex-lf.

Supplementary Material

Suppl. Table 1

Table S1: Raw intensities of the precursors analyzed with multiFLEX-LF

Suppl. Table 2

Table S2: multiFLEX-LF RM scores and flat cluster ids of the multiFLEX-LF analysis

Supplementary info, Figure S1 to S5

Figure S1: Runtime analysis of multiFLEX-LF

Figure S2: Missing value impact on multiFLEX-LF results

Figure S3: multiFLEX-LF analysis of simulated dataset

Figure S4: Sequence logos of phospho-sites

Figure S5: Spearman correlation of mean RM score and fraction of identified modified DDA peptides per cluster

Acknowledgements

We acknowledge the following funding from the US National Institutes of Health: S10OD0107060 to H.S. for the TripleTOF 5600 mass spectrometer, R01CA196703, R01AI099204 and U01AI124284 covering fractions of the efforts from H.S., and R01GM112007 to J.A.S. covering part of her and C.N.S.’ effort. B.Y.R. gratefully acknowledges financial support from Deutsche Forschungsgemeinschaft (RE3474/2–2) covering part of his efforts. P.H. acknowledges the funding of the HPI Research School of Data Science and Engineering for covering parts of her efforts. We would like to acknowledge Drs. Ruchi Chauhan and Jan Muntel for preparing the APC/C co-IPs and analyzing the co-immunoprecipitates by LC-MS/MS, respectively. Furthermore, we would like to thank Konstantin Kahnert for his contributions to the development of the original FLEXIQuant-LF software.

Abbreviations

PTM: post-translational modification

LC–MS/MS: liquid chromatography–tandem mass spectrometry

DIA: data-independent acquisition

FLEXIQuant: full-length expressed stable isotope labeled proteins for quantification

RM score: relative modification score

APC/C: anaphase promoting complex/cyclosome

CDK: cyclin-dependent kinase

References

  • (1) Prabakaran S; Lippens G; Steen H; Gunawardena J Post-Translational Modification: Nature’s Escape from Genetic Imprisonment and the Basis for Dynamic Information Encoding. Wiley Interdiscip. Rev.: Syst. Biol. Med. 2012, 4 (6), 565–583. 10.1002/wsbm.1185.
  • (2) Aebersold R; Mann M Mass-Spectrometric Exploration of Proteome Structure and Function. Nature 2016, 537 (7620), 347–355. 10.1038/nature19949.
  • (3) Zhang Z; Wu S; Stenoien DL; Paša-Tolić L High-Throughput Proteomics. Annu. Rev. Anal. Chem 2014, 7 (1), 427–454. 10.1146/annurev-anchem-071213-020216.
  • (4) Sinitcyn P; Rudolph JD; Cox J Computational Methods for Understanding Mass Spectrometry–Based Shotgun Proteomics Data. Annu. Rev. Biomed. Data Sci. 2018, 1 (1), 207–234. 10.1146/annurev-biodatasci-080917-013516.
  • (5) Singh V; Ram M; Kumar R; Prasad R; Roy BK; Singh KK Phosphorylation: Implications in Cancer. The Protein Journal 2017, 36 (1), 1–6. 10.1007/s10930-017-9696-z.
  • (6) Mansour MA Ubiquitination: Friend and Foe in Cancer. Int. J. Biochem. Cell Biol. 2018, 101, 80–93. 10.1016/j.biocel.2018.06.001.
  • (7) Wesseling H; Mair W; Kumar M; Schlaffner CN; Tang S; Beerepoot P; Fatou B; Guise AJ; Cheng L; Takeda S; Muntel J; Rotunno MS; Dujardin S; Davies P; Kosik KS; Miller BL; Berretta S; Hedreen JC; Grinberg LT; Seeley WW; Hyman BT; Steen H; Steen JA Tau PTM Profiles Identify Patient Heterogeneity and Stages of Alzheimer’s Disease. Cell 2020, 183 (6), 1699–1713.e13. 10.1016/j.cell.2020.10.029.
  • (8) Grundke-Iqbal I; Iqbal K; Tung YC; Quinlan M; Wisniewski HM; Binder LI Abnormal Phosphorylation of the Microtubule-Associated Protein Tau (Tau) in Alzheimer Cytoskeletal Pathology. Proc. Natl. Acad. Sci. U. S. A. 1986, 83 (13), 4913–4917. 10.1073/pnas.83.13.4913.
  • (9) Dujardin S; Commins C; Lathuiliere A; Beerepoot P; Fernandes AR; Kamath TV; De Los Santos MB; Klickstein N; Corjuc DL; Corjuc BT; Dooley PM; Viode A; Oakley DH; Moore BD; Mullin K; Jean-Gilles D; Clark R; Atchison K; Moore R; Chibnik LB; Tanzi RE; Frosch MP; Serrano-Pozo A; Elwood F; Steen JA; Kennedy ME; Hyman BT Tau Molecular Diversity Contributes to Clinical Heterogeneity in Alzheimer’s Disease. Nat. Med. 2020, 26 (8), 1256–1263. 10.1038/s41591-020-0938-9.
  • (10) Santos AL; Lindner AB Protein Posttranslational Modifications: Roles in Aging and Age-Related Disease. Oxid. Med. Cell. Longevity 2017, 2017 (Article ID 5716409), 1–19. 10.1155/2017/5716409.
  • (11) Bantscheff M; Lemeer S; Savitski MM; Kuster B Quantitative Mass Spectrometry in Proteomics: Critical Review Update from 2007 to the Present. Anal. Bioanal. Chem. 2012, 404 (4), 939–965. 10.1007/s00216-012-6203-4.
  • (12) Singh S; Springer M; Steen J; Kirschner MW; Steen H FLEXIQuant: A Novel Tool for the Absolute Quantification of Proteins, and the Simultaneous Identification and Quantification of Potentially Modified Peptides. J. Proteome Res. 2009, 8 (5), 2201–2210. 10.1021/pr800654s.
  • (13) Schlaffner CN; Kahnert K; Muntel J; Chauhan R; Renard BY; Steen JA; Steen H FLEXIQuant-LF to Quantify Protein Modification Extent in Label-Free Proteomics Data. eLife 2020, 9, e58783. 10.7554/eLife.58783.
  • (14) Fischler MA; Bolles RC Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24 (6), 381–395. 10.1145/358669.358692.
  • (15) Reback J; jbrockmendel; McKinney W; Bossche J. V. den; Augspurger T; Cloud P; Hawkins S; gfyoung; Sinhrks; Roeschke M; Klein A; Petersen T; Tratner J; She C; Ayd W; Hoefler P; Naveh S; Garcia M; Schendel J; Hayden A; Saxton D; Shadrach R; Gorelli ME; Jancauskas V; Li F; attack68; McMaster A; Battiston P; Seabold S; Dong K Pandas-Dev/Pandas: Pandas 1.3.1; Zenodo, 2021. 10.5281/zenodo.5136416.
  • (16) Harris CR; Millman KJ; van der Walt SJ; Gommers R; Virtanen P; Cournapeau D; Wieser E; Taylor J; Berg S; Smith NJ; Kern R; Picus M; Hoyer S; van Kerkwijk MH; Brett M; Haldane A; del Río JF; Wiebe M; Peterson P; Gérard-Marchant P; Sheppard K; Reddy T; Weckesser W; Abbasi H; Gohlke C; Oliphant TE Array Programming with NumPy. Nature 2020, 585 (7825), 357–362. 10.1038/s41586-020-2649-2.
  • (17) Virtanen P; Gommers R; Oliphant TE; Haberland M; Reddy T; Cournapeau D; Burovski E; Peterson P; Weckesser W; Bright J; van der Walt SJ; Brett M; Wilson J; Millman KJ; Mayorov N; Nelson ARJ; Jones E; Kern R; Larson E; Carey CJ; Polat İ; Feng Y; Moore EW; VanderPlas J; Laxalde D; Perktold J; Cimrman R; Henriksen I; Quintero EA; Harris CR; Archibald AM; Ribeiro AH; Pedregosa F; van Mulbregt P; Vijaykumar A; Bardelli AP; Rothberg A; Hilboll A; Kloeckner A; Scopatz A; Lee A; Rokem A; Woods CN; Fulton C; Masson C; Häggström C; Fitzgerald C; Nicholson DA; Hagen DR; Pasechnik DV; Olivetti E; Martin E; Wieser E; Silva F; Lenders F; Wilhelm F; Young G; Price GA; Ingold G-L; Allen GE; Lee GR; Audren H; Probst I; Dietrich JP; Silterra J; Webber JT; Slavič J; Nothman J; Buchner J; Kulick J; Schönberger JL; de Miranda Cardoso JV; Reimer J; Harrington J; Rodríguez JLC; Nunez-Iglesias J; Kuczynski J; Tritz K; Thoma M; Newville M; Kümmerer M; Bolingbroke M; Tartre M; Pak M; Smith NJ; Nowaczyk N; Shebanov N; Pavlyk O; Brodtkorb PA; Lee P; McGibbon RT; Feldbauer R; Lewis S; Tygier S; Sievert S; Vigna S; Peterson S; More S; Pudlik T; Oshima T; Pingel TJ; Robitaille TP; Spura T; Jones TR; Cera T; Leslie T; Zito T; Krauss T; Upadhyay U; Halchenko YO; Vázquez-Baeza Y; SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17 (3), 261–272. 10.1038/s41592-019-0686-2.
  • (18) Pedregosa F; Varoquaux G; Gramfort A; Michel V; Thirion B; Grisel O; Blondel M; Prettenhofer P; Weiss R; Dubourg V; Vanderplas J; Passos A; Cournapeau D; Brucher M; Perrot M; Duchesnay É Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  • (19) Waskom M; Gelbart M; Botvinnik O; Ostblom J; Hobson P; Lukauskas S; Gemperline DC; Augspurger T; Halchenko Y; Warmenhoven J; Cole JB; Ruiter J. de; Vanderplas J; Hoyer S; Pye C; Miles A; Swain C; Meyer K; Martin M; Bachant P; Quintero E; Kunter G; Villalba S; Brian; Fitzgerald C; Evans C; Williams ML; O’Kane D; Yarkoni T; Brunner T Mwaskom/Seaborn: V0.11.1 (December 2020); Zenodo, 2020. 10.5281/zenodo.4379347.
  • (20) Hunter JD Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9 (3), 90–95. 10.1109/MCSE.2007.55.
  • (21) Plotly Technologies Inc. Collaborative Data Science; Plotly Technologies Inc.: Montreal, QC, 2015.
  • (22) Love MI; Huber W; Anders S Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2. Genome Biol. 2014, 15 (12), 550. 10.1186/s13059-014-0550-8.
  • (23) R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  • (24) The Pallets Projects. Click 8.0.1; The Pallets Projects, 2021.
  • (25) Riverbank Computing Limited. PyQT 5.9.2; 2017.
  • (26) The QT Company. QT 5.9.7; 2018.
  • (27) PyInstaller Development Team. PyInstaller 3.6; 2020.
  • (28) The UniProt Consortium. UniProt: A Worldwide Hub of Protein Knowledge. Nucleic Acids Res. 2019, 47 (D1), D506–D515. 10.1093/nar/gky1049.
  • (29) Cox J; Mann M MaxQuant Enables High Peptide Identification Rates, Individualized p.p.b.-Range Mass Accuracies and Proteome-Wide Protein Quantification. Nat. Biotechnol. 2008, 26 (12), 1367–1372. 10.1038/nbt.1511.
  • (30) Bernhardt O; Selevsek N; Gillet L; Rinner O; Picotti P; Aebersold R; Reiter L Spectronaut: A Fast and Efficient Algorithm for MRM-like Processing of Data Independent Acquisition (SWATH-MS) Data. In Proceedings of the 60th ASMS Conference on Mass Spectrometry and Allied Topics; 2012.
  • (31) Bruderer R; Bernhardt OM; Gandhi T; Miladinović SM; Cheng L-Y; Messner S; Ehrenberger T; Zanotelli V; Butscheid Y; Escher C; Vitek O; Rinner O; Reiter L Extending the Limits of Quantitative Proteome Profiling with Data-Independent Acquisition and Application to Acetaminophen-Treated Three-Dimensional Liver Microtissues. Mol. Cell. Proteomics 2015, 14 (5), 1400–1410. 10.1074/mcp.M114.044305.
  • (32) Shilov IV; Seymour SL; Patel AA; Loboda A; Tang WH; Keating SP; Hunter CL; Nuwaysir LM; Schaeffer DA The Paragon Algorithm, a Next Generation Search Engine That Uses Sequence Temperature Values and Feature Probabilities to Identify Peptides from Tandem Mass Spectra. Mol. Cell. Proteomics 2007, 6 (9), 1638–1655. 10.1074/mcp.T600050-MCP200.
  • (33) Hornbeck PV; Kornhauser JM; Tkachev S; Zhang B; Skrzypek E; Murray B; Latham V; Sullivan M PhosphoSitePlus: A Comprehensive Resource for Investigating the Structure and Function of Experimentally Determined Post-Translational Modifications in Man and Mouse. Nucleic Acids Res. 2012, 40 (D1), D261–D270. 10.1093/nar/gkr1122.
  • (34) Hornbeck PV; Zhang B; Murray B; Kornhauser JM; Latham V; Skrzypek E PhosphoSitePlus, 2014: Mutations, PTMs and Recalibrations. Nucleic Acids Res. 2015, 43 (D1), D512–D520. 10.1093/nar/gku1267.
  • (35) Hornbeck PV; Kornhauser JM; Latham V; Murray B; Nandhikonda V; Nord A; Skrzypek E; Wheeler T; Zhang B; Gnad F 15 Years of PhosphoSitePlus®: Integrating Post-Translationally Modified Sites, Disease Variants and Isoforms. Nucleic Acids Res. 2019, 47 (D1), D433–D441. 10.1093/nar/gky1159.
  • (36) Steen JAJ; Steen H; Georgi A; Parker K; Springer M; Kirchner M; Hamprecht F; Kirschner MW Different Phosphorylation States of the Anaphase Promoting Complex in Response to Antimitotic Drugs: A Quantitative Proteomic Analysis. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (16), 6069–6074. 10.1073/pnas.0709807104.
  • (37) Olsen JV; Vermeulen M; Santamaria A; Kumar C; Miller ML; Jensen LJ; Gnad F; Cox J; Jensen TS; Nigg EA; Brunak S; Mann M Quantitative Phosphoproteomics Reveals Widespread Full Phosphorylation Site Occupancy During Mitosis. Sci. Signaling 2010, 3 (104), ra3. 10.1126/scisignal.2000475.
  • (38) Murray AW Recycling the Cell Cycle: Cyclins Revisited. Cell 2004, 116 (2), 221–234. 10.1016/S0092-8674(03)01080-8.
  • (39) Szklarczyk D; Gable AL; Lyon D; Junge A; Wyder S; Huerta-Cepas J; Simonovic M; Doncheva NT; Morris JH; Bork P; Jensen LJ; Mering C von. STRING V11: Protein–Protein Association Networks with Increased Coverage, Supporting Functional Discovery in Genome-Wide Experimental Datasets. Nucleic Acids Research 2019, 47 (D1), D607–D613. 10.1093/nar/gky1131.
  • (40) Perez-Riverol Y; Csordas A; Bai J; Bernal-Llinares M; Hewapathirana S; Kundu DJ; Inuganti A; Griss J; Mayer G; Eisenacher M; Pérez E; Uszkoreit J; Pfeuffer J; Sachsenberg T; Yılmaz Ş; Tiwary S; Cox J; Audain E; Walzer M; Jarnuczak AF; Ternent T; Brazma A; Vizcaíno JA The PRIDE Database and Related Tools and Resources in 2019: Improving Support for Quantification Data. Nucleic Acids Res. 2019, 47 (D1), D442–D450. 10.1093/nar/gky1106.
