We have determined that the results reported in “On the Importance of Well Calibrated Scores for Identifying Shotgun Proteomics Spectra” are problematic due to the way that precursor charge state was handled. Correcting the error leads to systematic changes in most of our results; however, the overall trends that we observe and the main conclusions of our study remain unchanged.
In addition, we made one other change to the analysis. The corrected data in Table 2 are now based on the XCorr score as implemented in the Crux tool Tide, whereas the original, erroneous table was generated using an older Crux search engine called “search-for-matches.” We made this change so that all of the results in the paper would be generated by the same search engine.
Table 2. Variability in PSM Discoveries Reported by Different Applications of TDC Using Calibrated and Raw XCorr Scores.
Entries are the % only in one T-TDC, for the raw score and for the calibrated score, at each FDR threshold.
set | statistic | raw score, FDR 0.01 | raw score, FDR 0.05 | raw score, FDR 0.10 | calibrated score, FDR 0.01 | calibrated score, FDR 0.05 | calibrated score, FDR 0.10 |
---|---|---|---|---|---|---|---|
yeast | 0.05 quantile | 0.03 | 0.4 | 1.1 | 0.0 | 0.1 | 0.4 |
yeast | median | 0.2 | 0.6 | 1.5 | 0.05 | 0.3 | 0.6 |
yeast | 0.95 quantile | 5.8 | 3.0 | 3.3 | 3.6 | 1.9 | 2.2 |
worm | 0.05 quantile | 0.0 | 0.2 | 1.1 | 0.0 | 0.0 | 0.07 |
worm | median | 0.2 | 0.6 | 1.2 | 0.09 | 0.2 | 0.4 |
worm | 0.95 quantile | 7.8 | 5.2 | 4.6 | 5.2 | 3.5 | 3.3 |
Plasmodium | 0.05 quantile | 0.0 | 0.3 | 1.0 | 0.0 | 0.0 | 0.1 |
Plasmodium | median | 0.2 | 0.7 | 1.5 | 0.08 | 0.2 | 0.4 |
Plasmodium | 0.95 quantile | 7.0 | 3.9 | 4.0 | 2.9 | 1.7 | 2.4 |
Note that in this correction the method that the paper refers to as “Käll et al.” is now called “STDS-PIT” for “separate target–decoy search with percentage of incorrect targets.” This change was made to be consistent with our subsequent work and also because the new name is more descriptive.
We would also like to take this opportunity to clarify that we implemented TDC by comparing two competing separate searches, one against the target database and one against the decoy database. This approach coincides with our model. In the case of Tide and the MS-GF+ raw score, it is the same as searching the concatenated database; in the MS-GF+ E-value case, however, the order of the PSMs might differ.
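The competition between two separate searches can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the tie-breaking rule (ties go to the decoy), and the simple decoys/targets FDR estimate are all assumptions made for illustration, and each spectrum is assumed to have one best target score and one best decoy score.

```python
from typing import Dict, List, Tuple


def compete(target_scores: Dict[str, float],
            decoy_scores: Dict[str, float]) -> List[Tuple[str, float, bool]]:
    """For each spectrum, keep the better of its best target and best decoy PSM.

    Returns (spectrum_id, winning_score, is_target) tuples.
    Assumption: higher scores are better, and ties are assigned to the decoy.
    """
    winners = []
    for spectrum_id, t_score in target_scores.items():
        d_score = decoy_scores.get(spectrum_id, float("-inf"))
        if t_score > d_score:
            winners.append((spectrum_id, t_score, True))
        else:
            winners.append((spectrum_id, d_score, False))
    return winners


def tdc_discoveries(winners: List[Tuple[str, float, bool]], alpha: float) -> int:
    """Count target PSMs in the longest score-ranked prefix of winners
    whose estimated FDR (decoy wins / target wins) is at most alpha."""
    ranked = sorted(winners, key=lambda w: w[1], reverse=True)
    n_target = 0
    n_decoy = 0
    best = 0
    for _, _, is_target in ranked:
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        if n_target > 0 and n_decoy / n_target <= alpha:
            best = n_target
    return best


# Toy scores, invented purely for illustration.
toy_targets = {"s1": 5.0, "s2": 4.0, "s3": 1.0, "s4": 3.0}
toy_decoys = {"s1": 2.0, "s2": 1.0, "s3": 2.5, "s4": 0.5}
toy_winners = compete(toy_targets, toy_decoys)
```

Calling `tdc_discoveries(toy_winners, 0.1)` on the toy data counts the target PSMs accepted at a 10% threshold under these assumptions. A production implementation would also need an explicit policy for score ties within the ranked list.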
We also clarify that, to estimate π0 (the proportion of incorrect target hits) in STDS-PIT, we used the R package qvalue with the option bootstrap.
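To make the role of π0 concrete, the following Python sketch shows one common form of a separate-search FDR estimate scaled by π0. It is an illustration under stated assumptions, not the study's implementation: the function name `stds_pit_fdr` is hypothetical, π0 is passed in as a number rather than estimated (the correction obtained it from the R package qvalue), and higher scores are assumed to be better.

```python
from typing import Sequence


def stds_pit_fdr(target_scores: Sequence[float],
                 decoy_scores: Sequence[float],
                 threshold: float,
                 pi0: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Assumed form: pi0 * (fraction of decoys >= threshold)
                      / (fraction of targets >= threshold), capped at 1.
    Returns 0.0 when no target PSM passes the threshold.
    """
    if not target_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    frac_targets = sum(s >= threshold for s in target_scores) / len(target_scores)
    frac_decoys = sum(s >= threshold for s in decoy_scores) / len(decoy_scores)
    if frac_targets == 0:
        return 0.0
    return min(1.0, pi0 * frac_decoys / frac_targets)


# Toy scores and a toy pi0, invented purely for illustration.
toy_target_scores = [5.0, 4.0, 3.0, 1.0]
toy_decoy_scores = [2.5, 2.0, 1.0, 0.5]
toy_fdr = stds_pit_fdr(toy_target_scores, toy_decoy_scores, threshold=2.0, pi0=0.5)
```

With π0 below 1, the estimate is less conservative than the unscaled decoy-to-target ratio. In this form, that scaling is the only place where π0 enters.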
Detailed List of Changes
Table 2 and Figures 3–8, 11, and 12 have been updated to reflect the new results. In addition, the following three sections of text from the original paper contain numeric values that have changed due to the reanalysis. In the quoted text, the new numbers are followed by their old values in parentheses.
“At FDR 1% and using Xcorr, we observe an increase in the number of discoveries of 31 (22), 12 (8.0), and 30 (31)% for the yeast, worm, and Plasmodium data sets, respectively. Using the MS-GF+ raw score, the corresponding improvements at the same 1% FDR level are 26 (37), 71 (61), and 27 (27)%. Presumably MS-GF+’s raw score is even less calibrated than XCorr.”
“The MS-GF+ E-value score is designed to be calibrated; thus, it is not surprising that at 1% FDR level there is little difference between using the E-value score and its 10K-calibrated version: 0.2 (1.2) and 1.4 (3.3)% more calibrated TDC discoveries in the yeast and worm data sets and 0.5 (0.5)% fewer discoveries in the Plasmodium data set. Similarly, at 5% FDR level, the calibrated version of the MS-GF+ E-value identifies 0.6 (1.5)% more discoveries in the yeast data set and 0.2% more (0.2% fewer) discoveries in the Plasmodium data set. It is, however, surprising that at the same 5% FDR level the calibrated version yields 8.4 (12)% more discoveries in the worm data and that number increases to 9.7 (16)% at 10% FDR level. We suspect that some of the assumptions that go into computing the MS-GF+ E-value are violated for the worm data set, but these do not affect our robust albeit costly calibration procedure.”
“For example, at FDR 1%, STDS-PIT (Käll’s method) suggests that when using XCorr calibration increases the number of discoveries by 39 (43), 21 (20), and 49 (53)% for the yeast, worm, and Plasmodium data sets (again, these are median improvements using 1000 independently drawn decoy sets). Notably, this increase in the number of discoveries is substantially larger than we observed previously for TDC using XCorr, which yielded corresponding percentages of 31 (22), 12 (8.0), and 30 (31)%. The corresponding increases at 1% FDR when using MS-GF+’s raw score are 54 (70), 100 (79), and 46 (49)% for the 10-K calibrated STDS-PIT over using the raw score (compared with 26 (37), 71 (61), and 27 (27)% increases when using TDC).”