Improved Detection of Differentially Abundant Proteins through FDR-Control of Peptide-Identity-Propagation

Alexander J Solivais; Hannah Boekweg; Lloyd M Smith; William S Noble; Michael R Shortreed; Samuel H Payne; Uri Keich

doi:10.1021/acs.jproteome.5c00065

. 2025 Jul 30;24(9):4437–4449. doi: 10.1021/acs.jproteome.5c00065

Improved Detection of Differentially Abundant Proteins through FDR-Control of Peptide-Identity-Propagation

Alexander J Solivais ^†,^*, Hannah Boekweg ^‡, Lloyd M Smith ^†, William S Noble ^∥, Michael R Shortreed ^†, Samuel H Payne ^‡, Uri Keich ^§,^*

PMCID: PMC12488043 NIHMSID: NIHMS2105554 PMID: 40736352

Abstract

The goal of proteomics is to identify and quantify peptides and proteins within a biological sample. Almost all algorithms for the identification of peptides in LC–MS/MS data employ two steps: peptide/spectrum matching and peptide-identity-propagation (PIP), also known as match-between-runs. PIP can routinely account for up to 40% of all results, with that proportion rising as high as 75% in single-cell proteomics. Unlike peptide identities derived through peptide/spectrum matches, for which error estimation has been strictly enforced for decades, peptide identities derived through PIP have not historically been subject to statistical evaluation. As an indispensable component of label-free quantification, PIP needs a statistically rigorous method for estimating its false-discovery rate (FDR). We present a method for FDR control of PIP, called PIP-ECHO, and devise a rigorous protocol for evaluating FDR control of any PIP method. Using three different benchmark data sets, we evaluate PIP-ECHO alongside the PIP procedures implemented by FlashLFQ, IonQuant, and MaxQuant. These analyses show that only PIP-ECHO can accurately control the FDR of PIP at 1% across all data sets. When analyzing a spike-in data set, PIP-ECHO increases both the accuracy and sensitivity of differential expression analysis, yielding substantially more differentially abundant proteins than either MaxQuant or IonQuant.

Keywords: peptide-identity-propagation, match-between-runs, false-discovery rate, FDR, label-free quantification

graphic file with name pr5c00065_0007.jpg

graphic file with name pr5c00065_0005.jpg

Introduction

Liquid chromatography–tandem mass spectrometry (LC–MS/MS) is the basis of modern proteomics. In the most common form of proteomic analysis, bottom-up proteomics, complex mixtures of peptides obtained through the enzymatic digestion of proteins are separated using LC and analyzed using MS/MS. The mass spectrometer records MS1 scans, which contain the mass-to-charge ratios (m/z’s) of all analytes eluting at a given time. In data-dependent acquisition (DDA), a common MS/MS analysis method, each MS1 spectrum is followed by a few MS2 fragmentation scans that are generated by isolating and fragmenting all analytes in a narrow m/z range corresponding to a series of isotopic peaks observed in the MS1 scan. Database search engines analyze the resulting data by matching the MS2 fragmentation spectra to theoretical fragmentation spectra generated using a peptide database. This procedure yields a list of peptide-spectrum-matches, or PSMs. We refer to peptides identified in this way as peptide detections.

A key component of the database search strategy is a statistical procedure to control the rate of errors among the reported PSMs. Some of the PSMs output by the search engine are correct, meaning that the matched peptide was present in the sample and generated the MS2 spectrum. Conversely, other PSMs are incorrect, meaning that the match was made by chance, and the reported peptide did not generate the MS2 spectrum. Canonically, target-decoy competition (TDC) is used to determine which PSMs are reported. TDC involves augmenting the peptide database with decoy peptides that are created by reversing or shuffling the original, or target, peptides. Because the decoy peptides are artificial, any PSM where a spectrum is matched to a decoy peptide is incorrect. Hence, the number of decoy peptides that are detected (in one or more PSMs) can be used to estimate and subsequently control the false discovery rate (FDR). The FDR is the expected value of the proportion of incorrect peptides among the reported target peptides, i.e., the false discovery proportion (FDP). ,

With the MS2-based peptide detection phase completed, we are often interested in quantifying the relative abundances of the detected peptides. Label-free quantification (LFQ) is a popular approach that uses information in MS1 scans to accomplish this task. Briefly, consider a peptide that was detected in a specific MS2 scan. As that peptide eluted it would have left a so-called “peak trace” across a short sequence of MS1 scans that are temporally adjacent to the detecting MS2 scan. These MS1 scans would contain the characteristic collection of m/z peaks corresponding to different isotopologues of the peptide (a graphical depiction of a “peak trace” is shown in Figure A, and a more thorough definition is provided in Methods). Pairing the detected peptide with its corresponding peak trace allows us to deduce the peptide’s relative abundance from the intensities of the peaks in the trace.

Key Terms: (A) Graphical depictions of MS1 spectra highlighting an MS peak, an isotopic envelope, and a precursor peak. (B) Illustration of the basic PIP workflow, where the precursor peak corresponding to an MS2-detection in the donor run is used to identify a matching precursor peak in the acceptor run, resulting in an PIP-detection. (C) Illustration of the two different types of error that can occur during PIP.

LFQ allows researchers to compare relative peptide or protein abundances between samples. Of course, we can only compare abundances of peptides that are consistently detected across multiple runs. However, inherent run-to-run variability is created by both real differences between biological samples and also technical instrument limitations. For example, a peptide may be present in multiple runs but only fragmented in some runs, resulting in missing peptide detections. For this reason, LFQ proteomics experiments yield quantitative data with 20–50% data incompleteness. To mitigate this stochastic variability and improve data completeness, a family of algorithms has been developed to transfer the identity of detected peptides between runs based on pairings of peak traces. This procedure is known as “peptide-identity-propagation” (PIP) or sometimes “match-between-runs” (MBR).

The key idea in PIP is to identify pairs of peak traces with similar retention times and m/z values that occur in different LC–MS/MS runs. Only one of the two peak traces, the “donor,” is linked to an MS2 peptide detection. The other peak trace, the “acceptor”, originates in a run where that peptide was not MS2 detected. The identity of the peptide is then transferred from the donor to the acceptor peak. An illustration of the PIP procedure is shown in Figure B. This approach was originally designed to increase the throughput of home-built ultrahigh resolution FTICRs, but has since been adapted to many applications in proteomics. − In most current applications, PIP is thought of as a backup method to supplement peptide detections from MS2 spectra. In practice, PIP can be responsible for a large fraction of all identified peptides, i.e., peptides that are paired with specific MS1 features. In single-cell proteomics in particular, PIP can account for more than 75% of all peptide identifications.

Because PIP generates a large fraction of all peptide identifications, it is essential that we understand and control errors in the method. Similar to the PSMs produced by database search engines, many of the peptide identifications arising from PIP are correct: the MS1 peaks in the acceptor run were generated by the same peptide that was MS2-detected and linked to the donor peak trace. However, others are not. Thus, as a vital component of quantitative proteomics, it is essential that we develop procedures for controlling the error rate of PIP.

Error analysis in PIP is complex, as two distinct types of errors can occur. First, peak-matching errors occur when the donor–acceptor peak traces are incorrectly paired, either because the acceptor peak trace was generated by a different analyte or is a noise-generated artifact. Second, peptide-identification errors occur when the identity of the donor peptide that was propagated was falsely matched to the donor peak trace prior to PIP. This type of error mostly occurs when the donor peak trace is linked to an incorrect peptide detection. If the MS2 detection procedure used an FDR threshold of 1%, then on average we expect that approximately 1% of all peptides used as donors in the PIP process will be incorrect, resulting in peptide-identification errors. Both types of errors are depicted in Figure C. Although this type of error is not caused by failures of the PIP algorithm, these errors do have important implications for the accuracy of the results. When a peptide-identification error occurs, even if the peak traces in the donor and acceptor are correctly matched, they now have an incorrect label and contribute to the overall error rate among reported transfers.

Initial implementations of PIP did not provide any measure of statistical confidence. Subsequent work evaluated the potential for errors in the method but did not provide a general solution for how to estimate statistical confidence. For example, with a two-proteome experimental design, Lim et al. used a combined human+yeast data set as donors and human-only data as acceptors to quantify error rates. In this approach, some of the peak-matching errors can easily be detected; in particular, any yeast peptide matched in a human-only sample constitutes such an error. Surprisingly, 44% of detected yeast proteins were incorrectly transferred to a human-only sample. Although most of these false transfers were “one-hit-wonders,” meaning that only one peptide from the protein was transferred, this demonstrates that false transfers can have an outsized impact on errors when rolled up to the protein level. This is especially concerning in single cell proteomics, where protein detections frequently rely on only one peptide.

More recently, several methods have been introduced that attempt to account for peak-matching errors during PIP. For example, IonQuant attempts to sample from the null distribution of incorrect feature–feature matches by looking for matches to “decoy” features in the acceptor run. These decoy features’ masses are selected to be shifted by 5–11 m/z from the donors’ feature masses; hence, they presumably represent incorrect transfers. Quandenser uses a similar strategy to model the null distribution, relying on decoy acceptor features that are offset by 5 m/z from the donors’ masses. However, neither tool accounts for possible peptide-identification errors, making the implicit assumption that all donor peptide detections are correct. While the combination of Quandenser with triqler does consider both types of error, this is done internally as part of the overall goal of producing a list of differentially abundant proteins, and it is not clear how this method can be used for controlling the error in the PIP process itself.

Here we introduce PIP Error Control via Hybrid cOmpetition (PIP-ECHO), a procedure to carry out PIP while accounting for both peak-matching and peak-identification errors. As its name suggests, PIP-ECHO leverages two types of competition to control the FDR among the reported list of transferred peptide identities. The first competition mimics the canonical target-decoy peptide competition used in FDR control for database search. The second competition involves generating matches to peak traces detected at randomized retention times. This step is similar in spirit to the methods employed by Quandenser and IonQuant. The heart of PIP-ECHO is the novel way in which it leverages both types of competitions to estimate and control the overall FDR for PIP.

Although PIP-ECHO is analytically motivated, it is important to gauge how successful the method is in controlling the FDR in practice. This goal leads to the second major contribution of this paper, a procedure that uses the two-proteome experimental design to quantifiably gauge how well a PIP procedure controls the overall FDR among its reported list of transfers. Applying this novel procedure to multiple two-proteome data sets, including one simulating single-cell proteomics data, we show that MaxQuant, IonQuant, and the previous version of FlashLFQ all fail to control the overall FDR for PIP. Moreover, these experiments show that the new version of FlashLFQ, which implements PIP-ECHO, successfully controls the overall PIP FDR while delivering a comparable number of discoveries as MaxQuant and IonQuant.

We demonstrate the practical utility of PIP-ECHO by performing an experiment designed to detect proteins that are differentially abundant in two conditions. Specifically, we used FlashLFQ + PIP-ECHO, as well as MaxQuant and IonQuant, to infer a PIP-boosted list of MS1-quantified peptides from a spike-in data set originally published by Shen et al. This data set was generated by varying the proportion of E. coli proteins that were spiked into an essentially fixed concentration of human proteins. Thus, the E. coli peptides are differentially abundant with known expected fold-change values, whereas the human peptide abundances are not expected to change. We subsequently applied two types of differential abundance analyses comparing the empirical fold change computed from the MS1-quantified peptides with the expected one. In both analyses, we found that when the error rate is held constant, FlashLFQ + PIP-ECHO allows us to identify substantially more peptides and proteins at the expected fold change than either MaxQuant or IonQuant. In the second analysis, the results from FlashLFQ + PIP-ECHO enabled the discovery of 47% and 915% more differentially abundant proteins as compared to MaxQuant and IonQuant, respectively.

PIP-ECHO is implemented in a new version of FlashLFQ, which is open source and freely available both as a stand-alone program and as a component of the MetaMorpheus software suite.

Methods

PIP-ECHO

The input to PIP-ECHO includes a list of potential donor peptides that are generated by searching a concatenated target-decoy database. In the results reported here we used MetaMorpheus to generate this list, but this can also be done using other search tools so long as the output contains both target and decoy PSMs. The number of peptides the tool reports can affect the power of PIP-ECHO, but as long as the tool is oblivious to the target/decoy label, PIP-ECHO should be able to control the PIP FDR.

Every peptide in the list of donor peptides is represented by one PSMthe highest scoring PSM that was matched to the donor peptide sequence. In addition to the highest scoring PSM, PIP-ECHO is also told the run from which the PSM originated, as well as a list of charges of all other PSMs from the same run that were matched to the same peptide sequence. PIP-ECHO then seeks to propagate the identity of the donor peptides to acceptor runs where the peptide was not MS2-detected.

PIP-ECHO attempts to match each potential donor peptide, including decoy peptides, to two peak traces in each acceptor run. The first peak trace is sought within a window anchored at the peptide’s predicted acceptor RT, which is derived by locally matching the donor and acceptor runs’ RTs of peptides that were MS2-detected in both runs (Algorithm 5). The second peak trace in the same run is sought within a window anchored at a randomized RT drawn as follow. First, PIP-ECHO randomly draws a peptide from a list of MS2-detected peptides in the donor run that have (a) an unambiguously associated peak trace (Algorithm 2), (b) a mass close to, but at least 5 Da away from that of the original donor peptide, (c) a RT that is not too close to that of the original donor peptide, and (d) a stem or unmodified form that is distinct from that of the original donor peptide (Algorithm 13). Next, the RT of this randomly drawn donor peptide is mapped to the acceptor RT scale using the same Algorithm 5 that we used for mapping the RT of the original donor peptide.

Both the predicted-RT anchored window and the one anchored at the randomized RT have the same window size. Matching a donor peptide with a peak trace within either window requires scoring each eligible peak trace within the window and picking the highest scoring one. A peak trace is eligible if (a) it consists of one or more valid MS1 scans, i.e., the scans contain a peak that matches the theoretical most abundant isotopologue of the donor peptide, and they have a valid isotopic envelope matching the donor peptide’s theoretical envelope (Algorithm 7), and (b) the RT of the trace’s apex scan is within the window. An “apex scan” here is the MS1 scan in the peak trace with the maximal sum of log-intensities of peaks that match the theoretical isotopologues of the donor peptide (Algorithm 12). If neither the predicted-RT nor the randomized-RT based windows contain an eligible peak trace matching any of the observed donor peptide charges, then the window size is increased by 30 s. This process repeats until at least one such eligible peak trace is found or the maximal window size is reached and no peak trace is associated with the current donor (Algorithm 4).

Matches to peak traces are scored based on the following features: mass error, intensity, difference between observed and predicted/randomized retention time, isotopic distribution, and the number of MS1 scans in which an isotopic envelope was observed. These features are detailed in Table S2. Each feature is calibrated by computing the proportion of peak traces associated with MS2-detected peptides for which the value of the feature, as calculated for the match to the MS2-detected peptide, is at least as extreme as it is for the evaluated acceptor peak trace. For example, the mass error feature is the difference between the theoretical mass of the peptide’s most abundant ion and the deconvolved mass of the most abundant ion in the peak trace. This feature’s calibrated version is the proportion of MS2-detected peptides in the acceptor run for which the analogous difference, calculated with respect to their associated peak traces, is at least as large. The combined score for each matched peak trace is the geometric mean of these calibrated feature scores (Algorithm 11). This score is used to decide between multiple acceptor peak traces that are found within the same retention time window.

After peak trace matching has been performed for every acceptor run, the resulting list of candidate PIPs or matched peak traces are rescored using a semisupervised machine learning approach. This approach employs a gradient boosted binary decision tree classifier in order to assign every peak trace a posterior error probability (PEP) that, ideally, represents the likelihood that the peak trace is correctly matched. The same classifier, an implementation of the MART algorithm, is also used in MetaMorpheus, and a complete description is provided by Burges, 2010. , Classification is performed in the style of Percolator using 3-fold cross validation where the randomized-RT peaks are used as negative training examples and the predicted-RT peaks in the top 25% of all peaks are used as positive training examples. The first iteration of the semisupervised learning model uses the combined score described in the preceding paragraph to select the top scoring peaks. Each subsequent iteration uses the PEP calculated in the previous iteration. Five rounds of training are performed in total, after which each peak trace has a well calibrated PEP (Algorithm 15). Finally, for each donor peptide that is linked to both a predicted-RT and a randomized-RT acceptor peak trace, the peak trace with the higher PEP is removed.

After rescoring is complete, the PIP transfers are sorted by their PEPs. Then, the FDR is calculated on a run-by-run basis. For each run, let T _k be the number of candidate target-peptides PIPs with a predicted RT among the top k scoring candidates, and let T _k be the corresponding number of target-peptide PIPs with a randomized RT among the same top k. Similarly, let D _k and D _k be the analogous numbers of decoy-peptide PIPs. The assumptions above imply that T _k is an unbiased estimate of the number of peak-matching errors. Similarly, D _k – D _k is an estimate of the number of peptide-identification errors (with correctly matched peaks). D _k is subtracted from D _k to avoid double-counting of errors, as incorrect peak transfers are accounted for in the first term. Because D _k – D _k can be negative, we take the maximum of it and 0, arriving at the following estimate for the FDR among the T _k top scoring predicted-RT target-peptide PIPs:

{\hat{FDR}}_{k} = \frac{T_{k}^{r} + \max {0, D_{k}^{p} - D_{k}^{r}}}{T_{k}^{p}}

In attempting to control the FDR among the run’s reported PIPs at level α, PIP-ECHO then utilizes the above estimate analogously to by adding +1 to the numerator and looking for the maximal k such that the estimate is still ≤ α:

k (α) = \max_{k} {k : {\hat{FDR}}_{k}^{+} \leq α} = \max_{k} {k : \frac{1 + T_{k}^{r} + \max {0, D_{k}^{p} - D_{k}^{r}}}{T_{k}^{p}} \leq α}

The procedure then reports the corresponding T _k(α) top scoring predicted-RT-based target PIPs.

A more detailed description of the classes, default parameters, and algorithms used by PIP-ECHO can be found in Tables S1 and S3, and Algorithms 1–23 in the Supporting Information.

Estimating False Discovery Proportion Using Two-Proteome Data sets

Protein Databases

Each data set was searched against a concatenated database composed of two separate databases. For the Lim et al. and single-cell equivalent data sets, the first database consisted of the Saccharomyces cerevisiae (UP000002311, 6060 proteins) proteome, whereas for the E. coli data set, the database consisted of the Escherichia coli (UP000000625, 4404 proteins) proteome.

For all three data sets the second database was made of the human proteins fused with entrapment sequences designed to capture peptide identification errors. This database was created by appending an entrapment sequence to the end of each protein in the Homo sapiens (UP000005640, 82499 proteins) proteome. Entrapment sequences were created by randomly shuffling every residue in the target protein sequence while keeping all R and K residues (tryptic cleavage sites) in place. If any of the tryptic shuffled peptides were identical to a peptide found in the Homo sapiens, or in Saccharomyces cerevisiae (for the Lim et al. and single-cell data sets) or Escherichia coli (E. coli data set) proteomes, they were removed. Half of each entrapment sequence was appended to the target protein sequence, resulting in a fused protein. This resulted in an approximately equal number of unique peptide sequences arising from the entrapment sequences and the Homo sapiens sequences.

All proteomes were downloaded from UniProt in March of 2024. Additionally, the default contaminant database provided by each search engine (MaxQuant, MetaMorpheus, MSFragger) was used during search.

Data Formatting

All data files were converted from.raw to.mzML format using msconvert prior to analysis by FragPipe+IonQuant and MetaMorpheus+FlashLFQ. MaxQuant analysis was performed using data in the.raw format because MaxQuant does not support the.mzML file type.

Database Search in MetaMorpheus

Each protein in each database was digested in silico to produce tryptic peptides. To be considered, peptides needed to have minimum length of 7, no more than two variable modifications, and no more than two missed cleavage sites. Oxidation of methionine was the only variable modification that was considered. More information about the search and FDR control procedures in MetaMorpheus can be found in Supplementary Note 1.

FlashLFQ v1.0

PSMs from MetaMorpheus were filtered to a PEP q-value of 0.01 before being passed into FlashLFQ v1.0 for quantification and PIP. Default argument were used for FlashLFQ.

FlashLFQ + PIP-ECHO

PSMs from MetaMorpheus were filtered to a PEP q-value of 0.01 before being passed into FlashLFQ + PIP-ECHO for quantification and PIP. Additionally, a list of peptides detected across all runs, along with their corresponding peptide-level PEP q-values, were passed as arguments to FlashLFQ + PIP-ECHO. Peptide-level PEP q-values were used to select donors for the PIP procedure. The donor peptide PEP q-value threshold differed depending on PIP FDR. PIP FDRs of 1, 2.5, 5% corresponds to donor peptide PEP q-value thresholds of 0.002, 0.005, and 0.01, respectively. For PIP FDRs of 100% (i.e., no PIP FDR threshold), a donor peptide PEP q-value threshold of 0.01 was used.

IonQuant

Data sets were searched and quantified using FragPipe v21.1, MSFragger v4.0, and IonQuant v1.10.12. , The “Default” workflow setting in FragPipe was used with several parameters changed as follows. The “Add decoys” option was enabled to add decoy and contaminant proteins to the protein database used for search. The “Run MS1 Quant” and “Match between runs (MBR)” options were enabled. The “MBR ion FDR” was adjusted based on the analysis being performed. For the spike-in data set, the “Normalize intensity across runs” option was enabled. For the single-cell data set, mixed samples were assigned to the “library” condition.

MaxQuant

Data sets were searched and quantified using MaxQuant v2.6.1.0. ,, All default settings were used with the exception of the “Match between runs” option, which was enabled.

Data Censoring

Data censoring was performed to estimate each PIP algorithm’s native peak matching error rate. This involved comparing peak traces that were unambiguously associated with MS2-detected peptides with corresponding peak traces identified by PIP using the same donor peptide but with the detecting MS2 spectra removed from the acceptor run. Data censoring was performed separately for each software tool that was evaluated as follows.

Let MS2 Peptides _i be the list of all peptides that were detected in the pure-sample run i. MS2 Peptides _i was filtered according to the following criteria:

1.
The peptide must have been detected at an FDR < 0.1%.
2.
The peptide sequence must be derived from the H. sapiens proteome and must not occur in any other proteome used during the search.
3.
The peptide must have been detected in at least one other run in the data set with a score greater than the score of the peptide detection in run i.
4.
The peptide must only be associated with one MS2 spectrum in run i.

Then, Censored peptides _i was created by randomly selecting 500 peptides from MS2 Peptides _i for censoring. Censoring was performed by editing the data file from run i. The MS2 spectrum associated with a peptide in Censored peptides _i was replaced with an MS2 spectrum that contained only one peak at 150 m/z. This was performed for every data file associated with a pure-sample run, and then the resulting censored data files were searched and quantified alongside the original mixed-sample data files. A different set of censored data files was generated for each software.

The results of the original and the censored data analyses were compared. Let Native Peak Pairs _i be the list of peak traces pairs where a peak trace corresponding to the same peptide was identified in the original as well as in the censored data analyses of run i.

Let MS2-RT _x,i be the retention time of the apex scan of the peak trace associated with Peptide _x in run i that was identified in the original analysis, and let PIP-RT _x,i be the corresponding retention time of the PIP involving Peptide _x in run i in the censored-data analysis. Let PIP RT-Diff _x,i = |MS2-RT _x,i – PIP-RT _x,i|. Let Native Peak Errors _i be every pair in Native Peak Pairs _i where PIP RT-Diff _x,i > 1% the length of the LC gradient (Lim et al. - 54 s, E. coli - 36 s, Single-cell -24 s). Then, let All Native Peak Errors = ∪ _i = 1 Native Peak Errors _i and All Native Peak Pairs = ∪_i = 1 Native Peak Pairs _i

Calculating FDP

1.
Let PIP-Identifications be the list of all peak traces identified by PIP in pure (human-only) runs.
2.
Let Foreign MS2-Detections be the list of all foreign species (yeast or E. coli, depending on the data sets) peptides that were detected in pure (human-only) runs.
3.
All identifications in PIP-Identifications that correspond to a peptide in Foreign MS2-Detections were removed from PIP-Identifications (these are removed as suspected carryovers).
4.
Each identification in PIP-Identifications was then assigned to one of the following lists:
- a.
  Human PIP-IDslist of PIPs associated with Homo sapiens peptides,
- b.
  Foreign PIP-IDslist of PIPs associated with a foreign species (E. coli or S. cerevisiae) peptide that is not present in the Homo sapiens proteome,
- c.
  Entrapment PIP-IDslist PIPs associated with an entrapment peptide.
5.
The estimated number of foreign peak matching errors is the observed number of foreign PIPs:eFPE = |Foreign PIP-IDs|.
6.
The estimated number of peptide identification errors is derived from the entrapment PIPs:ePIE = |Entrapment PIP-IDs|× S, where, S is a scaling factor that corresponds to the ratio of the combined target database size to the entrapment database size: $S = \frac{| E n t r a p m e n t p e p t i d e s | + | H . s a p i e n s p e p t i d e s |}{| E n t r a p m e n t p e p t i d e s |} = 1.86$ .
7.
The estimated number of native peak matching errors is $e N P E = | H u m a n P I P ‐ I D s | \times \frac{| A l l N a t i v e P e a k E r r o r s |}{| A l l N a t i v e P e a k P a i r s |}$ .
8.
The estimated FDP is then equal to the sum of the three estimated error counts over the number of PIPs: $e F D P = \frac{e F P E + e P I E + e N P E}{| P I P ‐ I d e n t i f i c a t i o n s |}$ .
9.
Note the dividing each of the three estimated counts by |PIP-Identifications| gives an estimate of the FDP due to each of those three types of error.

E. coli Two-Proteome Data Set

Human protein digest was obtained from Promega (MS Compatible Human Protein Extract, Digest, V695A) and resuspended in 5% acetonitrile, 0.2% formic acid buffer to a final concentration of 1 μg/μL. E. coli protein digest was obtained from Waters (MassPREP E. coli Digest Standard, 186003196) and resuspended in 5% acetonitrile, 0.2% formic acid buffer to a final concentration of 1 μg/μL. The mixed species sample was prepared by adding 3.5 μL of the E. coli protein digest solution to 35 μL of the human protein digest solution, for a final ratio of 10:1 human:E. coli peptides.

Both samples were analyzed over 10 replicate injections using an ultrahigh performance LC-MS/MS (UPLC-MS/MS) system consisting of a Vanquish Neo ultrahigh-pressure liquid chromatography system and an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific, RRID:SCR_020562). Injected peptide samples were loaded at a pressure of 400 bar onto a 25 cm long fused silica capillary nanocolumn packed with C18 resin (3.0-μm diameter, 130 Å pore size from Waters). Approximately 1 μg of peptide was loaded onto the column for each run. Peptides eluted over 140 min at a flow rate of 350 nL/min with the following gradient, where buffer A was aqueous 0.2% formic acid and buffer B was 80% acetonitrile with 0.2% formic acid: time 1 min-5% buffer B; time 35 min-30% buffer B; time 53 min-42% buffer B; time 60 min-55% acetonitrile; time 60.5– 85 min-85% buffer B. The nanocolumn was held at 60 C using a column heater constructed in-house.

The nanospray source voltage was set to 2300 V. Full-mass profile scans were performed in the orbitrap between 375 and 1500 m/z at a resolution of 120,000, followed by MS/MS higher-energy collisional dissociation (HCD) scans in the orbitrap of the highest intensity parent ions in a 3-s cycle time at 30% relative collision energy and 15,000 resolution, with a 2.5 m/z isolation window. Charge states 2–6 were included and dynamic exclusion was enabled with a repeat count of one over a duration of 30 s and a 10 ppm exclusion width both low and high. The automatic gain control (AGC) target was set to “standard,” maximum inject time was set to “auto,” and 1 μscan was collected for the MS/MS orbitrap HCD scan.

Differential Expression Analysis

Differential expression analysis was conducted using a spike-in data set originally published by Shen et al. This data set consists of E. coli and human lysate digests mixed at five different ratios, ranging from 3 to 9% total E. coli lysate (w/w). These ratios are denoted by the relative amounts of E. coli included in each: 1×, 1.5×, 2×, 2.5×, and 3×. Four technical replicates were used for each concentration, resulting in 20 LC-MS/MS runs in total.

This data set was searched against the Escherichia coli (UP000000625) and Homo sapiens (UP000005640) proteomes. Default contaminant databases for each database search engine were also included during search. Settings for each database search program are identical to those described in “Estimating False Discovery Proportion using Two-Proteome Datasets.” Quantification was performed using IonQuant, MaxQuant, FlashLFQ v1.0, and FlashLFQ + PIP-ECHO. For IonQuant and FlashLFQ + PIP-ECHO, PIP FDR thresholds ranging from 1 to 100% were tested. For all software, we compared quantification with and without PIP enabled. Normalization was performed using the normalization procedure implemented by each software.

The quantitative results for each file were analyzed as follows. The intensity values were log ₂ transformed, and missing or zero values were removed. Then, each of the five conditions were compared (e.g., the 1× spike-in condition was compared to the 1.5×, 2×, 2.5× and 3× conditions), for a total of (5 2) = 10 comparisons.

In each comparison, for each analyte (peptide or protein), the log ₂ intensity values that were reported for a given spike-in condition were averaged. Let O _FC denote the observed log₂ fold change and let E _FC = log₂ (higher spike-in/lower spike-in) denote the expected log₂ fold change of an E. coli peptide. For each analyte we calculated Δ_FC = T _FC – O _FC, where perfectly accurate quantification should result in all human analytes having a Δ_FC = 0, and all E. coli analytes displaying a Δ_FC = T _FC. Analytes were ordered by descending Δ_FC, and the FDP among the top k analytes was calculated as $FDP = \frac{| Human analytes in {top}_{k} |}{k}$ . Analytes that were ambiguous between the human and E. coli proteomes were excluded from the analysis. The number of E. coli analytes discovered at FDP = 0.05 were reported for each comparison and summed across all 10 comparisons.

Differential expression analysis using limma (v3.19) was conducted in a similar fashion. Peptides/proteins with fewer than two observations in either condition were excluded from the analysis. Then, log ₂ transformed intensity values were used as an input into the standard limma workflow. A linear model was fit for each of the 10 pairwise comparisons using the lmFit function with default parameters and the eBayes function with “trend” set to True resulting in 10 sets of limma p-values, a p-value for each analyte. For each comparison the analytes with significant Benjamini-Hochberg-adjusted p-values at FDR thresholds of 0.01 and of 0.05 with a log₂ fold change >1.0 were deemed significantly differentially abundant. We reported the total number of E. coli peptides/proteins that were thus deemed significant summed up across the 10 comparisons, as well as the estimated FDP which we computed as the corresponding total number of human analytes that were deemed significant divided the sum of the human and E. coli totals.

Results and Discussion

PIP-ECHO: Controlling for Both Types of Errors

PIP-ECHO is designed to report as many correct PIPs as possible while controlling the overall FDR. This requires a procedure to estimate the number of incorrect PIPs included in a set of candidate PIPs, as well as a scoring function that quantifies the quality of each putative PIP. To estimate the number of incorrect PIPs, we need to account for both peak-matching and peptide-identification errors. PIP-ECHO uses two types of competitions to estimate, and therefore control, the frequency of these two types of errors.

Peptide-identification errors are accounted for using the first competition, which relies on the canonical TDC conducted during the database search step, where each MS2 spectrum was searched against a concatenated target-decoy database. PIP-ECHO transfers the identity of both target and decoy peptides that were detected (at a given FDR threshold) during this database search step (Figure A). This produces target-peptides PIPs, where the identity of a target peptide is transferred, and decoy-peptide PIPs, where the identity of a decoy peptide is transferred. By counting the number of decoy-peptide PIPs, we are able to estimate the number of PIPs where the identity of an incorrectly detected target peptide is transferred, i.e., peptide-identification errors.

Workflow for PIP-ECHO FDR Control: (A) A depiction of the MS1 signal from a single LC-MS run showing peak traces associated with target and decoy peptides. (B) Illustrated example of predicted and random-RT peak selection. (C) Illustrated example of decoy peptide peak selection and decoy peptide, random-RT peak selection. (D) The equation used to calculate the estimated FDR in PIP-ECHO.

Peak-matching errors are accounted for using the second competition, where PIPs that are located at a predicted retention time (RT)the retention time of the donor peptide mapped onto the acceptor runare competed against PIPs located at a randomized RT. These random RTs are selected by considering all peptides in the donor run whose mass is greater than 5 Da away and no further than 11 Da away from that of the donor peptide (Figure B,C). Then, one of these peptides is selected at random, and its RT is mapped onto the acceptor run. Finally, we attempt to locate a peak trace with the same mass and isotopic distribution as the donor peptide but anchored at this randomized RT. Such random-RT peak traces model peak-matching errors: the peak trace is observed at the wrong RT and therefore should not be linked to the donor peak trace.

Assuming that an incorrect PSM is equally likely to involve a target or a decoy peptide, as is canonically done in TDC, the number of decoy-peptide PIPs allows us to estimate the number of incorrect target-peptide PIPs. We similarly assume that an incorrect transfer for a given donor peak trace is equally likely to involve the predicted RT as it is to involve the random RT. Accordingly, PIP-ECHO competes each such pair of predicted-RT and random-RT transfers, keeping only the higher scoring of the two, and uses the number of times a random-RT PIP wins the competition to estimate the number of target-peptide PIPs that are actually peak-matching errors. FDR is then calculated using eq and controlled using eq , as described in the methods.

In order to maximize PIP-ECHO’s power, i.e., the number of PIPs reported at a given FDR threshold, we need a scoring function that is able to distinguish between correct and incorrect transfers. We therefore developed an iterative rescoring that uses gradient boosted decision trees to gradually learn to separate correct and incorrect transfers, modeled by predicted-RT and random-RT PIPs, respectively. Like Percolator, we use cross-validation to ensure that the tree learns to distinguish between correct and incorrect transfers rather than only learning how to distinguish between predicted and random-RT PIPs (i.e., overfitting).

Estimating False Discovery Proportions in PIP Using Two-Proteome Data Sets

A procedure that controls the FDR in practice controls only the expected value of the false discovery proportion (FDP). The latter varies from data set to data set but should on average be ≤ α, where α is the procedure’s target FDR threshold. For such a procedure, we expect that when the number of discoveries is large, the actual FDP should be at most about α. In our case, the FDP is the proportion of PIPs in the final list of reported PIPs that involve either a peak-matching or a peptide-identification error.

Computational entrapment procedures are often used to assess the FDR control of database search tools by estimating the FDP in the reported result. However, the two types of errors present in the PIP context require a different experimental design. We therefore present a new entrapment procedure to estimate the FDP of PIP using data from two-proteome experiments. In a two-proteome experiment, cell lysate digests from two different species are used to generate two samples: a pure sample, which contains peptides only from species A; and a mixed sample, which contains a mix of peptides from both species A and species B. Each sample is analyzed by LC–MS/MS multiple times to generate replicates, and these replicates are analyzed together.

Our procedure focuses on estimating the FDP due to peptide-identification and peak-matching errors among all PIPs reported in the pure (species A) runs. Analogously to the canonical entrapment setup we estimate the number of peptide-identification errors by adding entrapment peptides that are not expected to be in the sample to the target database. Any PIP that is derived from an entrapment donor peptide is therefore presumed to be a peptide-identification error, and similarly to TDC we can use those observed errors to estimate the number of unobserved peptide-identification errors among the target-peptide PIPs.

To estimate the number of peak-matching errors, we note that each can be classified either as a native-peak error, where the donor peptide is present in the acceptor run but it is matched to an incorrect peak trace, or as a foreign-peak error, where the donor peptide is not present in (or foreign to) the acceptor run. In a two-proteome experiment, the pure samples only contain peptides from species A. Therefore, any PIP reported in a pure sample with a donor peptide from species B clearly constitutes an observed foreign peak-matching error. Moreover, we assume that the sample from species A is shared across all runs and hence that there are no other PIPs with foreign-peak errors in the pure runs (entrapment PIPs are already counted as peptide identification errors).

In order to estimate the number of native-peak errors that occur in the pure sample runs, we mask a subset of peptide detections by editing the data to remove MS2 spectra in which the peptides were detected. We then reanalyze the edited data, relying on PIP to locate peak traces corresponding to the masked peptides. The peak traces identified through PIP in the reanalysis are compared to the peak traces identified based on MS2 spectra in the first analysis. If the two peak traces have apex retention times separated by more than 1% the length of the LC gradient) (54 s for a 90 min gradient), it is considered a native peak error.

Finally, we combine the estimated number of native and foreign peak-matching errors with the estimated number of peptide-identifications errors to estimate the overall PIP FDP. This approach enables an accurate estimation of the FDP of PIP during analysis of two-proteome data sets (see Methods for more details).

Only PIP-ECHO Consistently Controls the Overall FDR

We compared the FDP among the PIPs reported by FlashLFQ + PIP-ECHO, MaxQuant, IonQuant, and FlashLFQ v1.0 by applying the above method to three different two-proteome data sets. The first data set, originally published by Lim et al., consists of mixed samples containing human and yeast peptides and pure samples containing only human peptides. The second data set consists of mixed samples containing human and E. coli peptides, and pure samples containing only human peptides. The final data set, originally published by Truong et al., was designed to mimic a single-cell proteomics experiment, where it is common to analyze single-cell samples alongside samples derived from bulk cell lysate in order to increase the number of peptides detected in single cells. Specifically, ten nanograms of peptides were injected for each mixed human and yeast sample replicate and only 200 picograms of peptides (equivalent to the expected amount in a single human cell) were injected for each pure human sample replicate.

The resulting FDP analysis suggests that only FlashLFQ + PIP-ECHO appears to control the FDR in the single-cell equivalent data set (Figure ): at the 1% PIP FDR threshold PIP-ECHO’s estimated FDP is 1.0% compared with IonQuant’s 3.4%. For FlashLFQ v1.0 we could only filter at the donor level (PSMs were filtered to an FDR of 1%), the same applies to MaxQuant (PSM and protein level FDR were both set to 1%) and the corresponding estimated PIP FDPs were 6.1 and 2.0%, respectively. These results suggest that existing PIP tools are prone to errors when applied to single-cell proteomics data.

PIP Errors by FlashLFQ + PIP-ECHO, IonQuant, MaxQuant, and FlashLFQ v1.0: (A) Plot showing the number of PIP detection errors made by each software tool when evaluated using each of three different two-proteome data sets. For IonQuant and FlashLFQ + PIP-ECHO, three different PIP FDR thresholds were tested: 1, 2.5, and 5%. For FlashLFQ v1.0 and MaxQuant, N/A denotes that the software reports a list of PIP discoveries without controlling the FDR. (B) Illustration of how we estimate the three different types of errors that can occur during PIP. (C) Plot showing the number of PIPs reported by each software tool.

For the other two data sets, FlashLFQ v1.0s estimated FDP is about 2.5% for both, while the estimated FDP for MaxQuant is 1.4 and 0.9%. While IonQuant reports a higher number of discoveries than FlashLFQ + PIP-ECHO at the same 1% threshold, the estimated FDP is higher than 1%: 1.3 and 1.2%, whereas FlashLFQ + PIP-ECHO’s estimated FDP is consistently at or below 1%.

Looking closer at the breakdown of the estimated FDP we find that for MaxQuant, IonQuant, and FlashLFQ v1.0, approximately half of the errors among PIPs reported for the E. coli and Lim et al. data sets are peptide-identification errorserrors that these tools do not account for. The corresponding estimated fraction of peptide-identification errors in FlashLFQ + PIP-ECHO’s case increases with the overall FDR threshold, but remains significantly lower. This is thanks to a heuristic that PIP-ECHO employs, where it sets the donor peptide FDR threshold at one-fifth of the overall PIP FDR threshold, e.g., for a PIP-FDR of 1%, the considered donor peptides will be those that are discovered in a database-search using an FDR cutoff of 0.2%. Note that this heuristic should only impact the sensitivity of PIP-ECHO and has no effect on its FDR control. In Figure S1 we show a similar breakdown of PIP-ECHO’s estimated FDP where we disable this heuristic, defining the list of potential donor peptides using a fixed 0.2% database-search FDR cutoff, and consider a continuous range of FDR thresholds.

FlashLFQ + PIP-ECHO Quantification Improves Downstream Differential Abundance Analysis

We next compared how the quantitative results generated by IonQuant, MaxQuant, FlashLFQ v1.0, and FlashLFQ + PIP-ECHO impact downstream differential abundance analysis. In order to do so, we analyzed a spike-in data set originally published by Shen et al. This data set consists of E. coli and human lysate digests mixed at five different ratios, ranging from 3 to 9% total E. coli lysate (w/w). Four technical replicates were used for each concentration, resulting in 20 LC-MS/MS runs in total. Pairwise analyses were performed between each of the five concentrations for a total of 10 pairwise analyses. Notably, because this is a controlled data set we know which peptides and proteins are differentially expressed and what is the expected log-fold change (E. coli’s), and which are not supposed to be changed (human). This setup allows us to compute the FDP in any list of presumed differentially expressed analytes.

A common approach to differential expression analysis is to apply limma, a popular t test based R package. Originally designed for the analysis of microarray data, limma was adopted by the proteomics community to detect peptides or proteins that are differentially expressed between alternate experimental conditions. We used limma to analyze the outputs from IonQuant, FlashLFQ v1.0, MaxQuant, and FlashLFQ + PIP-ECHO and find that the latter delivers consistently more discoveries than IonQuant and MaxQuant and slightly fewer discoveries than FlashLFQ v1.0. Comparing IonQuant and FlashLFQ + PIP-ECHO, we find comparable FDP levels for every selected level of PIP FDR (including no-PIP and all PIPs). For example, using FlashLFQ + PIP-ECHO, limma is able to detect, at the 1% DE FDR level, 22% more proteins than when using IonQuant, while still yielding a DE FDP of less than 1%, and this holds for all PIP FDR thresholds (Figure S2A).

When comparing FlashLFQ + PIP-ECHO to FlashLFQ v1.0s implementation of PIP, we see that limma’s analysis based on FlashLFQ v1.0s quantification often exceeds the selected FDR threshold (Figure S2A,C) and is consistently higher than the FDP of the corresponding analysis based on FlashLFQ + PIP-ECHO. This is true even when the latter does not filter PIP transfers based on FDR. Presumably, this is thanks to PIP-ECHO’s improved ability to select correct transfers, something we will return to below. Unsurprisingly, when PIP is disabled, limma’s analysis based on FlashLFQ v1.0 delivers identical to that based on FlashLFQ + PIP-ECHO (Figure S2A–D).

Although the limma package is popular and commonly used for group comparisons, in our specific experimental design, it is not well suited for this comparison. First, limma implicitly assumes that most of the analytes (genes/peptides/proteins) are not differentially expressed, but this assumption is violated in the controlled spike-in data sets that we analyze, where a substantial fraction of the analytes is expected to be differentially expressed. Thus, it is not surprising that there are cases where limma’s differential analyses based on each of the peptide-level quantification tools apparently fails to control the FDR (Figure S2C). Notably, these apparent failures occur regardless of whether PIP is enabled as is evident by the estimated FDP in the no-PIP columns. Second, limma is based on t tests that require at least two observations in each condition being compared. In practice, this can reward inaccurate implementations of PIP, because adding more observations enables a greater number of peptides/proteins to be quantified, regardless of the correctness of the additional observations. Thus, a tool that erroneously adds more PIP identified peptides is rewarded by a limma differential expression test.

To resolve these issues, we designed our own statistical evaluation to compare the effect of the choice of quantifications tool on the differential analysis. This evaluation takes advantage of the fact that for each of 10 pairwise abundance comparisons there is a known difference in the relative abundance between the respective spike-in levels. For example, when comparing the 1× and 3× spike-in conditions, the log₂ fold-change is equal to log₂(3/1) = 1.36. So in this case, all E. coli peptides/proteins should display a log₂ fold-change of roughly 1.36, whereas all human peptides/proteins should display a log ₂ fold-change of about 0. To determine the number of accurately quantified E. coli peptides/proteins, we count the number of those whose log ₂ fold-change falls within a distance of ΔFC of their expected fold-change (1.36 in our example). We define ΔFC so as to set the FDP at 5% by looking for its smallest value for which no more than 5% of the analytes whose log ₂ fold-change falls within the same window are human. This represents a 5% FDP, and is analogous to the adjusted p-value, or FDR cutoff, of 5% that is commonly used in limma. This approach is illustrated in Figure B.

Differential Abundance Analysis of an *E. coli* Spike-In Data set. (A) Bar chart showing the number of PIP detections made by FlashLFQ v1.0, MaxQuant, IonQuant, and FlashLFQ v2.0 under a variety of analysis conditions. (B) Illustrated example of the sensitivity analysis. Results shown are from FlashLFQ v2.0, 5% PIP FDR, 1× vs 2.5× spike-in comparison. (C,D) Sensitivity analysis of the different software. Plots show the number of differentially abundant peptides and proteins detected at 5% FDP (Cprotein level analysis, Dpeptide level analysis).

When we perform this differential abundance analysis, we find that at the same 5% FDP threshold, the analysis based on FlashLFQ + PIP-ECHO’s quantifications reports the largest number of differentially abundant proteins (4476 vs 411, 3040, 4112 for IonQuant, MaxQuant and v1.0 respectively) and peptides (5821 vs 11, 2075, and 73 for IonQuant, MaxQuant and v1.0 respectively). These results are shown in Figure C–D. This increase in performance occurs despite the fact that at 1% PIP FDR threshold, PIP-ECHO reports about half the number of PIPs that IonQuant and MaxQuant report and about a third of the number reported by FlashLFQ v1.0 (Figure A).

Figure also shows that, for FlashLFQ v1.0 and IonQuant, enabling PIP actually decreases the number of differentially abundant peptides detected. This is because when PIP is performed, a greater number of human peptides are found (presumably falsely) to be differentially expressed between conditions. Notably, this is not the case for PIP-ECHO, where the addition of PIP transfers increases the number of peptides found to be differentially abundant. Interestingly, this observation holds regardless of PIP-ECHO’s selected FDR threshold, suggesting that PIP-ECHO is better at selecting correct transfers, thus generating cleaner quantitative data. This improved selection is probably thanks to a combination of the PEP-scoring of candidate traces with the competition between the randomized and properly predicted RT transfers.

When performing this analysis, it came to our attention that IonQuant demonstrates a systematic shift in the reported fold-change values for E. coli peptides and proteins, which results in the discovery of fewer differentially abundant peptides and proteins. To adjust for this, we repeated the analysis but shifted the target fold-change for IonQuant to maximize the number of differentially abundant peptides/proteins discovered (Figure S3). Even after adjusting for the systemic bias, IonQuant’s quantification yielded fewer differentially abundant proteins and peptides that of FlashLFQ + PIP-ECHO. When PIP was enabled in IonQuant, fewer differentially abundant peptides were discovered. This is due to the fact that, with the exception of FlashLFQ + PIP-ECHO, applying PIP increases the variability of both human and E. coli analytes, resulting in more human analytes displaying extreme fold-change values and increasing the FDP. This is shown in Figures S4 and S5 for the peptide and protein levels, respectively. We note that the results discussed here were obtained using IonQuant version 1.10.12. Recent updates to IonQuant have improved quantitative accuracy, and differential abundance analysis based on this improved quantification yields substantially more differentially abundant peptides and proteins.

Conclusions

While two-proteome experiments have been used in the past to evaluate PIP results, we show here for the first time how such experiments can be used to rigorously estimate the FDP of PIP, accounting for both peak-matching and peptide-identification errors. Our analysis suggests that only the new PIP-ECHO consistently controls the PIP FDR, particularly when considering single-cell analysis.

We further demonstrate that quantitative results from FlashLFQ + PIP-ECHO enable a substantially more powerful differential abundance analysis compared with competing analysis methods. In the context of the differential abundance that we devised, FlashLFQ + PIP-ECHO enables the discovery of more differentially abundant peptides and proteins than any other software tool, including the previous version of FlashLFQ. These improvements are most dramatic at the peptide level. This result suggests that FlashLFQ + PIP-ECHO is particularly well suited to the quantitative analysis of post-translational modifications or variant amino acids, two areas where changes can only be observed at the level of the individual peptide.

We plan on investigating future improvements to PIP-ECHO, including incorporating information from unidentified MS2 spectra, developing a better understanding of the effect of the FDR threshold that is used by the database-search tool to define potential donor peptides, and replacing the Percolator-like cross-validation scheme that is used in computing the PEP of candidate peak traces with the more rigorous Percolator-RESET type of approach. We stress that PIP-ECHO was designed for DDA analysis, and it is not clear to us how easy it would be to extend it to DIA tools.

We note that PIP-ECHO controls the PIP FDR separately in each acceptor run, and PIPs from different acceptor runs that involve the same donor peptide are not independent. Thus, even if PIP-ECHO controls the FDR separately in each acceptor run, strictly speaking, it is not guaranteed to control the FDR in the combined list of PIPs from all runs. However, in practice with hundreds and even thousands of PIPs from each run, the law of large numbers implies that we can safely aggregate the PIPs across all runs while essentially controlling the FDR.

An implementation of PIP-ECHO is freely available through both FlashLFQ, a standalone tool for label-free quantification, and within the MetaMorpheus software suite. While the analysis of PIP-ECHO in this paper was within the framework of MetaMorpheus and FlashLFQ, in principle it can be combined with any database-search or LFQ tool.

Supplementary Material

pr5c00065_si_001.pdf^{(15.7MB, pdf)}

Acknowledgments

Figures , , and B were created with BioRender.com.

Code used for data analysis and figure generation can be found on GitHub: https://github.com/Alexander-Sol/PIP-ECHOanalysis, 10.5281/zenodo.14112130. The Lim et al. two-proteome data set can be found at the ProteomeXchange Consortium Web site with identifier PXD014415. The Shen et al. spike-in data set can be found at the ProteomeXchange Consortium Web site with identifier PXD003881. The single-cell equivalent data set can be found on PRIDE, with project identifier PXD037527. The E. coli data set was generated specifically for this work and can be found at the ProteomeXchange Consortium Web site with the data set identifier PXD057758, alongside the outputs from every software for every analysis that was performed. FlashLFQ + PIP-ECHO was developed in the C# language as part of the mzLib library for mass spectrometry: https://github.com/smith-chem-wisc/mzLib. A standalone version of FlashLFQ + PIP-ECHO can be downloaded from https://github.com/smith-chem-wisc/FlashLFQ/releases.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.5c00065.

The following Supporting Information is available free of charge at ACS Web site http://pubs.acs.org Analysis of false discovery proportion versus false discovery rate for PIP-ECHO when analyzing two-proteome data sets; differential expression analysis using limma; differential abundance analysis adjusted for systematic error of IonQuant; peptide-level fold change histograms from the analysis of a spike-in experiment; protein-level fold change histograms from the analysis of a spike-in experiment; explanation of FDR calculation in MetaMorpheus; classes and properties used in supplementary algorithms; parameters used in scoring the match between a precursor and an MS1 peak trace; and detailed explanations of the PIP-ECHO algorithm (PDF)

Conceptualization: A.J.S., H.B., W.S.N., M.R.S., S.H.P., U.K.; Data curation: A.J.S., H.B.; Formal analysis: A.J.S., U.K.; Funding acquisition: W.S.N., M.R.S., S.H.P.; Investigation: A.J.S., H.B.; Methodology: A.J.S., H.B., M.R.S., S.H.P., U.K.; Project administration: S.H.P., U.K.; Software: A.J.S., H.B., M.R.S.; Resources: L.M.S., M.R.S., S.H.P.; Supervision: L.M.S., W.S.N., M.R.S., S.H.P., U.K.; Validation: A.J.S., M.R.S., U.K.; Visualization: A.J.S.; Writingoriginal draft: A.J.S., H.B., S.H.P., U.K.; Writingreview and editing: A.J.S., H.B., L.M.S., W.S.N., M.R.S., S.H.P., U.K.

This work was supported by NIGMS/NIH award R01GM147653 and R35GM126914, an award from the Chan Zuckerberg Initiative (Single Cell Biology 2023-323305), and an NSF award 2245300.

The authors declare no competing financial interest.

References

Elias J. E., Gygi S. P.. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007;4(3):207–214. doi: 10.1038/nmeth1019. [DOI] [PubMed] [Google Scholar]
He, K. ; Fu, Y. ; Zeng, W.-F. ; Luo, L. ; Chi, H. ; Liu, C. ; Qing, L.-Y. ; Sun, R.-X. ; He, S.-M. ; A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv, 2015. https://arxiv.org/abs/1501.00537. [Google Scholar]
Webb-Robertson B. M., Wiberg H. K., Matzke M. M., Brown J. N., Wang J., McDermott J. E., Smith R. D., Rodland K. D., Metz T. O., Pounds J. G., Waters K. M.. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J. Proteome Res. 2015;14:1993–2001. doi: 10.1021/pr501138h. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang B., Kall L., Zubarev R. A. Demix-q: Quantification-centered data processing workflow. Mol. Cell. Proteomics. 2016;15:1467. doi: 10.1074/mcp.O115.055475. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cox J., Hein M. Y., Luber C. A., Paron I., Nagaraj N., Mann M.. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed maxlfq. Mol. Cell. Proteomics. 2014;13:2513. doi: 10.1074/mcp.M113.031591. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lipton M. S., Paša-Tolić L., Anderson G. A., Anderson D. J., Auberry D. L., Battista J. R., Daly M. J., Fredrickson J., Hixson K. K., Kostandarithes H., Masselon C., Markillie L. M., Moore R. J., Romine M. F., Shen Y., Stritmatter E., Tolić N., Udseth H. R., Venkateswaran A., Wong K. K., Zhao R., Smith R. D.. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. U. S. A. 2002;99(17):11049–11054. doi: 10.1073/pnas.172170199. [DOI] [PMC free article] [PubMed] [Google Scholar]
Silva J. C., Denny R., Dorschel C. A., Gorenstein M., Kass I. J., Li G. Z., McKenna T., Nold M. J., Richardson K., Young P., Geromanos S.. et al. Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 2005;77(7):2187–2200. doi: 10.1021/ac048455k. [DOI] [PubMed] [Google Scholar]
Cox J., Mann M.. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26(12):1367–1372. doi: 10.1038/nbt.1511. [DOI] [PubMed] [Google Scholar]
Ivanov M. V., Bubis J. A., Gorshkov V., Tarasova I. A., Levitsky L. I., Lobas A. A., Solovyeva E. M., Pridatchenko M. L., Kjeldsen F., Gorshkov M. V.. DirectMS1: MS/MS-Free Identification of 1000 Proteins of Cellular Proteomes in 5 minutes. Anal. Chem. 2020;92(6):4326–4333. doi: 10.1021/acs.analchem.9b05095. [DOI] [PubMed] [Google Scholar]
Zhu Y., Clair G., Chrisler W. B, Shen Y., Zhao R., Shukla A. K, Moore R. J, Misra R. S, Pryhuber G. S, Smith R. D.. Proteomic analysis of single mammalian cells enabled by microfluidic nanodroplet sample preparation and ultrasensitive nanoLC-MS. Angew. Chem., Int. Ed. 2018;57(38):12370. doi: 10.1002/anie.201802843. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bielow C., Mastrobuoni G., Kempa S.. Proteomics quality control: quality control software for maxquant results. J. Proteome Res. 2016;15(3):777–787. doi: 10.1021/acs.jproteome.5b00780. [DOI] [PubMed] [Google Scholar]
Lim M. Y., Paulo J. A., Gygi S. P.. Evaluating false transfer rates from the match-between-runs algorithm with a two-proteome model. J. Proteome Res. 2019;18(11):4020–4026. doi: 10.1021/acs.jproteome.9b00492. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nesvizhskii A., Yu F., Haynes S.. IonQuant enables accurate and sensitive label-free quantification with FDR-controlled Match-Between-Runs. Mol. Cell. Proteomics. 2021;20:100077. doi: 10.1016/j.mcpro.2021.100077. [DOI] [PMC free article] [PubMed] [Google Scholar]
The M., Käll L.. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat. Commun. 2020;11(1):3234. doi: 10.1038/s41467-020-17037-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Truong P., The M., Käll L.. Triqler for protein summarization of data from data-independent acquisition mass spectrometry. J. Proteome Res. 2023;22(4):1359–1366. doi: 10.1021/acs.jproteome.2c00607. [DOI] [PMC free article] [PubMed] [Google Scholar]
Millikin R., Solntsev S., Shortreed M., Smith L.. Ultrafast peptide label-free quantification with flashlfq. J. Proteome Res. 2018;17:386–391. doi: 10.1021/acs.jproteome.7b00608. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shen X., Shen S., Li J., Hu Q., Nie L., Tu C., Wang X., Orsburn B., Wang J., Qu J.. An ionstar experimental strategy for ms1 ion current-based quantification using ultrahigh-field orbitrap: reproducible, in-depth, and accurate protein measurement in large cohorts. J. Proteome Res. 2017;16(7):2445–2456. doi: 10.1021/acs.jproteome.7b00061. [DOI] [PMC free article] [PubMed] [Google Scholar]
Solntsev S. K., Shortreed M. R., Frey B. L., Smith L. M.. Enhanced global post-translational modification discovery with metamorpheus. J. Proteome Res. 2018;17(5):1844–1851. doi: 10.1021/acs.jproteome.7b00873. [DOI] [PubMed] [Google Scholar]
Shortreed M. R., Millikin R. J., Liu L., Rolfs Z., Miller R. M., Schaffer L. V., Frey B. L., Smith L. M.. Binary classifier for computing posterior error probabilities in metamorpheus. J. Proteome Res. 2021;20(4):1997–2004. doi: 10.1021/acs.jproteome.0c00838. [DOI] [PMC free article] [PubMed] [Google Scholar]
Burges C. J. C.. From RankNet to LambdaRank to LambdaMART: An overview. Learning. 2010;11:81. [Google Scholar]
Kong A. T., Leprevost F. V., Avtonomov D. M., Mellacheruvu D., Nesvizhskii A. I.. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods. 2017;14(5):513–520. doi: 10.1038/nmeth.4256. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tyanova S., Temu T., Cox J.. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 2016;11(12):2301–2319. doi: 10.1038/nprot.2016.136. [DOI] [PubMed] [Google Scholar]
Smyth G. K.. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004;3:3. doi: 10.2202/1544-6115.1027. [DOI] [PubMed] [Google Scholar]
Käll L., Canterbury J. D., Weston J., Noble W. S., MacCoss M. J.. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods. 2007;4(11):923–925. doi: 10.1038/nmeth1113. [DOI] [PubMed] [Google Scholar]
Wen B., Freestone J., Riffle M., MacCoss M. J., Noble W. S., Keich U.. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. Nat. Methods. 2025;22:1454. doi: 10.1038/s41592-025-02719-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Truong T., Webber K. G. I., Madisyn Johnston S., Boekweg H., Lindgren C. M., Liang Y., Nydegger A., Xie X., Tsang T., Jayatunge D. A. D. N., Andersen J. L., Payne S. H., Kelly R. T.. Data-dependent acquisition with precursor coisolation improves proteome coverage and measurement throughput for label-free single-cell proteomics. Angew. Chem., Int. Ed. 2023;62(34):e202303415. doi: 10.1002/anie.202303415. [DOI] [PMC free article] [PubMed] [Google Scholar]
Freestone, J. ; Käll, L. ; Noble, W. S. ; Keich, U. ; Semi-supervised learning while controlling the FDR with an application to tandem mass spectrometry analysis. bioRxiv, 2023. https://www.biorxiv.org/content/10.1101/2023.10.26.564068v3. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

pr5c00065_si_001.pdf^{(15.7MB, pdf)}

Data Availability Statement

[ref1] Elias J. E., Gygi S. P.. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007;4(3):207–214. doi: 10.1038/nmeth1019. [DOI] [PubMed] [Google Scholar]

[ref2] He, K. ; Fu, Y. ; Zeng, W.-F. ; Luo, L. ; Chi, H. ; Liu, C. ; Qing, L.-Y. ; Sun, R.-X. ; He, S.-M. ; A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv, 2015. https://arxiv.org/abs/1501.00537. [Google Scholar]

[ref3] Webb-Robertson B. M., Wiberg H. K., Matzke M. M., Brown J. N., Wang J., McDermott J. E., Smith R. D., Rodland K. D., Metz T. O., Pounds J. G., Waters K. M.. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J. Proteome Res. 2015;14:1993–2001. doi: 10.1021/pr501138h. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] Zhang B., Kall L., Zubarev R. A. Demix-q: Quantification-centered data processing workflow. Mol. Cell. Proteomics. 2016;15:1467. doi: 10.1074/mcp.O115.055475. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref5] Cox J., Hein M. Y., Luber C. A., Paron I., Nagaraj N., Mann M.. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed maxlfq. Mol. Cell. Proteomics. 2014;13:2513. doi: 10.1074/mcp.M113.031591. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] Lipton M. S., Paša-Tolić L., Anderson G. A., Anderson D. J., Auberry D. L., Battista J. R., Daly M. J., Fredrickson J., Hixson K. K., Kostandarithes H., Masselon C., Markillie L. M., Moore R. J., Romine M. F., Shen Y., Stritmatter E., Tolić N., Udseth H. R., Venkateswaran A., Wong K. K., Zhao R., Smith R. D.. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. U. S. A. 2002;99(17):11049–11054. doi: 10.1073/pnas.172170199. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] Silva J. C., Denny R., Dorschel C. A., Gorenstein M., Kass I. J., Li G. Z., McKenna T., Nold M. J., Richardson K., Young P., Geromanos S.. et al. Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 2005;77(7):2187–2200. doi: 10.1021/ac048455k. [DOI] [PubMed] [Google Scholar]

[ref8] Cox J., Mann M.. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26(12):1367–1372. doi: 10.1038/nbt.1511. [DOI] [PubMed] [Google Scholar]

[ref9] Ivanov M. V., Bubis J. A., Gorshkov V., Tarasova I. A., Levitsky L. I., Lobas A. A., Solovyeva E. M., Pridatchenko M. L., Kjeldsen F., Gorshkov M. V.. DirectMS1: MS/MS-Free Identification of 1000 Proteins of Cellular Proteomes in 5 minutes. Anal. Chem. 2020;92(6):4326–4333. doi: 10.1021/acs.analchem.9b05095. [DOI] [PubMed] [Google Scholar]

[ref10] Zhu Y., Clair G., Chrisler W. B, Shen Y., Zhao R., Shukla A. K, Moore R. J, Misra R. S, Pryhuber G. S, Smith R. D.. Proteomic analysis of single mammalian cells enabled by microfluidic nanodroplet sample preparation and ultrasensitive nanoLC-MS. Angew. Chem., Int. Ed. 2018;57(38):12370. doi: 10.1002/anie.201802843. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] Bielow C., Mastrobuoni G., Kempa S.. Proteomics quality control: quality control software for maxquant results. J. Proteome Res. 2016;15(3):777–787. doi: 10.1021/acs.jproteome.5b00780. [DOI] [PubMed] [Google Scholar]

[ref12] Lim M. Y., Paulo J. A., Gygi S. P.. Evaluating false transfer rates from the match-between-runs algorithm with a two-proteome model. J. Proteome Res. 2019;18(11):4020–4026. doi: 10.1021/acs.jproteome.9b00492. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] Nesvizhskii A., Yu F., Haynes S.. IonQuant enables accurate and sensitive label-free quantification with FDR-controlled Match-Between-Runs. Mol. Cell. Proteomics. 2021;20:100077. doi: 10.1016/j.mcpro.2021.100077. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref14] The M., Käll L.. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat. Commun. 2020;11(1):3234. doi: 10.1038/s41467-020-17037-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] Truong P., The M., Käll L.. Triqler for protein summarization of data from data-independent acquisition mass spectrometry. J. Proteome Res. 2023;22(4):1359–1366. doi: 10.1021/acs.jproteome.2c00607. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] Millikin R., Solntsev S., Shortreed M., Smith L.. Ultrafast peptide label-free quantification with flashlfq. J. Proteome Res. 2018;17:386–391. doi: 10.1021/acs.jproteome.7b00608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] Shen X., Shen S., Li J., Hu Q., Nie L., Tu C., Wang X., Orsburn B., Wang J., Qu J.. An ionstar experimental strategy for ms1 ion current-based quantification using ultrahigh-field orbitrap: reproducible, in-depth, and accurate protein measurement in large cohorts. J. Proteome Res. 2017;16(7):2445–2456. doi: 10.1021/acs.jproteome.7b00061. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] Solntsev S. K., Shortreed M. R., Frey B. L., Smith L. M.. Enhanced global post-translational modification discovery with metamorpheus. J. Proteome Res. 2018;17(5):1844–1851. doi: 10.1021/acs.jproteome.7b00873. [DOI] [PubMed] [Google Scholar]

[ref19] Shortreed M. R., Millikin R. J., Liu L., Rolfs Z., Miller R. M., Schaffer L. V., Frey B. L., Smith L. M.. Binary classifier for computing posterior error probabilities in metamorpheus. J. Proteome Res. 2021;20(4):1997–2004. doi: 10.1021/acs.jproteome.0c00838. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] Burges C. J. C.. From RankNet to LambdaRank to LambdaMART: An overview. Learning. 2010;11:81. [Google Scholar]

[ref21] Kong A. T., Leprevost F. V., Avtonomov D. M., Mellacheruvu D., Nesvizhskii A. I.. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods. 2017;14(5):513–520. doi: 10.1038/nmeth.4256. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref22] Tyanova S., Temu T., Cox J.. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 2016;11(12):2301–2319. doi: 10.1038/nprot.2016.136. [DOI] [PubMed] [Google Scholar]

[ref23] Smyth G. K.. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004;3:3. doi: 10.2202/1544-6115.1027. [DOI] [PubMed] [Google Scholar]

[ref24] Käll L., Canterbury J. D., Weston J., Noble W. S., MacCoss M. J.. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods. 2007;4(11):923–925. doi: 10.1038/nmeth1113. [DOI] [PubMed] [Google Scholar]

[ref25] Wen B., Freestone J., Riffle M., MacCoss M. J., Noble W. S., Keich U.. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. Nat. Methods. 2025;22:1454. doi: 10.1038/s41592-025-02719-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref26] Truong T., Webber K. G. I., Madisyn Johnston S., Boekweg H., Lindgren C. M., Liang Y., Nydegger A., Xie X., Tsang T., Jayatunge D. A. D. N., Andersen J. L., Payne S. H., Kelly R. T.. Data-dependent acquisition with precursor coisolation improves proteome coverage and measurement throughput for label-free single-cell proteomics. Angew. Chem., Int. Ed. 2023;62(34):e202303415. doi: 10.1002/anie.202303415. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref27] Freestone, J. ; Käll, L. ; Noble, W. S. ; Keich, U. ; Semi-supervised learning while controlling the FDR with an application to tandem mass spectrometry analysis. bioRxiv, 2023. https://www.biorxiv.org/content/10.1101/2023.10.26.564068v3. [Google Scholar]

PERMALINK

Improved Detection of Differentially Abundant Proteins through FDR-Control of Peptide-Identity-Propagation

Alexander J Solivais

Hannah Boekweg

Lloyd M Smith

William S Noble

Michael R Shortreed

Samuel H Payne

Uri Keich

Abstract

Introduction

1.

Methods

PIP-ECHO

Estimating False Discovery Proportion Using Two-Proteome Data sets

Protein Databases

Data Formatting

Database Search in MetaMorpheus

FlashLFQ v1.0

FlashLFQ + PIP-ECHO

IonQuant

MaxQuant

Data Censoring

Calculating FDP

E. coli Two-Proteome Data Set

Differential Expression Analysis

Results and Discussion

PIP-ECHO: Controlling for Both Types of Errors

2.

Estimating False Discovery Proportions in PIP Using Two-Proteome Data Sets

Only PIP-ECHO Consistently Controls the Overall FDR

3.

FlashLFQ + PIP-ECHO Quantification Improves Downstream Differential Abundance Analysis

4.

Conclusions

Supplementary Material

Acknowledgments

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases