Abstract
The peptide-centric strategy is widely applied in data-independent acquisition (DIA) proteomics to analyze multiplexed MS2 spectra. However, current software tools often rely on single-run data for peptide peak identification, leading to inconsistent quantification across heterogeneous datasets. Match-between-runs (MBR) algorithms address this by aligning peaks or elution profiles post-analysis, but they are often ad hoc and lack statistical frameworks for controlling peak quality, causing false positives and reduced quantitative reproducibility. Here we present DreamDIAlignR, a cross-run peptide-centric tool that integrates peptide elution behavior across runs with a deep learning peak identifier and alignment algorithm for consistent peak picking and FDR-controlled scoring. DreamDIAlignR outperformed state-of-the-art MBR methods, identifying up to 21.2% more quantitatively changing proteins in a benchmark dataset and 36.6% more in a cancer dataset. Additionally, DreamDIAlignR establishes an improved methodology for performing MBR compatible with existing DIA analysis tools, thereby enhancing the overall quality of DIA analysis.
Subject terms: Mass spectrometry, Proteomics
Data-independent acquisition (DIA) proteomics often struggles with inconsistent peptide quantification across heterogeneous datasets due to reliance on single-run data. Here, the authors introduce DreamDIAlignR, a cross-run peptide-centric tool that leverages deep learning for consistent peak picking and FDR-controlled scoring to improve peak identification and alignment, enhancing protein and peptide detection accuracy and reproducibility in DIA analyses.
Introduction
The data-independent acquisition (DIA)-based liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) strategy enables accurate and reproducible protein identification and quantification for large-scale molecular biology research1,2. It is one of the most widely utilized high-throughput proteome profiling methods, often used for its superior performance in cross-run quantitative cohort studies3–7. During the DIA data acquisition process, all peptide precursors in a relatively wide mass-to-charge (m/z) ratio window are selected for cofragmentation in an unbiased manner8, resulting in highly multiplexed MS2 spectra that are not suitable for direct analysis by peptide search engines for the identification of peptides9–11. To address this issue, peptide-centric analysis (PECA) methods12,13 were developed, where a spectral library containing predefined m/z values, retention time (RT), fragment ion intensities, and all the necessary information of the peptides of interest is used to query against raw DIA data to find evidence of the presence of each peptide in the library at a certain confidence level14. In PECA, only the chromatograms of the peptides and fragment ions in the library are extracted from the raw data and analyzed in a targeted manner, injecting strong priors into the data analysis and helping to alleviate the interference from cofragmented ions and improve identification sensitivity.
Since the concept of targeted data extraction was introduced, several software tools have been developed for PECA on DIA proteomics data15. OpenSWATH16 pioneered automated DIA data analysis by extracting chromatograms (XICs), scoring co-eluting peak groups, and employing semi-supervised learning with FDR control14,17, establishing a foundational framework for subsequent tools18. DIA-NN19 built upon this foundation by incorporating additional sub-scores and leveraging a neural network model to enhance peptide identification. Taking these advancements further, DreamDIA20 replaced traditional scores with a deep learning model, demonstrating superior capability in capturing complex chromatogram features21.
Although existing PECA tools excel in single-run analysis, achieving consistent analyte identification and quantification across heterogeneous sample cohorts remains challenging22–25. Most existing algorithms treat each run independently, without integrating information from other runs16,19,20. When iterating over each run in a sample set one at a time, PECA tools attempt to find viable chromatographic signals, referred to as “peak groups”, for peptides in the spectral library sequentially and score these peak groups separately. However, these peak group scores reflect only the elution behavior of peptides in individual runs. Consequently, the target-decoy binary discriminative model relies solely on single-run peak group scores, overlooking the relationships among peak groups of the same peptide across all runs. This limitation can lead to inconsistent peak identification for a single peptide across different runs, especially when the highest-scoring peak group in one run actually originates from a different, interfering peptide while that in another run does not. Treating each LC-MS/MS run independently substantially hinders the ability of a statistical model to analyze large-scale datasets and correct for experimental idiosyncrasies in these datasets25. While Group-DIA26 attempts to leverage correlations between chromatograms across multiple runs, its effectiveness is limited to homogeneous datasets with highly similar elution signals. The fragmentation of information across runs remains a critical, unresolved challenge in computational proteomics, and overcoming it is essential for ensuring robust, reproducible analyses and advancing the field’s capacity to handle large-scale datasets.
To enhance cross-run peptide identification and quantification reproducibility, several match-between-runs (MBR) algorithms have been developed19,22,23,27. These MBR algorithms compare and align signals among multiple runs after regular PECA, effectively correcting the RT locations and boundaries of falsely identified peak groups after statistical scoring. However, despite these advances, MBR algorithms still rely on the peak groups picked and the scores assigned by single-run analysis approaches. Their performance depends on high-scoring “reference” peaks observed in one or a few runs. However, the peak with the highest single-run score across all runs may not always be the correct one due to variations in interfering signals across different runs. Consequently, if the reference peak is chosen incorrectly, the identified peaks in all runs could be completely erroneous. Moreover, MBR is typically applied after statistical scoring, and thus undermines the guarantees and safeguards provided by FDR control, potentially leading to a substantial increase in the number of false positive peak groups. Currently, no statistical methods are available for estimating the identification confidence of aligned peaks. In practice, MBR algorithms often require running with a heuristically increased, less strict FDR. While this approach reports more candidate peak groups, it unintentionally leads to decreased quantification performance. Consequently, MBR algorithms face a dilemma in balancing the trade-off between identification and quantification: they must either accept more low-quality but well-aligned peaks, compromising quantification accuracy, or reduce identification numbers to improve quantification.
Here, we introduce DreamDIAlignR, a cross-run peptide-centric DIA proteomics data analysis software tool. It integrates the DreamDIA20 deep learning peak scorer with the MBR algorithm DIAlignR22, enabling consistent cross-run peptide identification at the entire dataset level. Instead of processing each MS injection sequentially, it considers all-run performance collectively for each library peptide. By aligning raw chromatograms first, it calculates a multi-run score for each peak, which is a weighted combination of single-run quality scores that reflects the overall performance across all runs in the dataset. Considering both single-run peak quality and multi-run global performance, the peak identification confidence can be automatically estimated using a statistical model between target peptides and decoys, eliminating the need for manual ad hoc tuning of the FDR threshold. Meanwhile, it enables the discriminative model to learn the alignment relationship among peak groups across runs, thereby reducing inconsistent cross-run peak selection. In experiments conducted on both standard datasets and highly heterogeneous datasets, DreamDIAlignR substantially outperformed state-of-the-art software tools, achieving more consistent identification and enhanced quantification accuracy. Moreover, our results show that analyzing highly heterogeneous DIA data across multiple runs simultaneously yields superior cross-run protein quantification results compared to using single-run-only methods.
Results
Design of the DreamDIAlignR workflow
DreamDIAlignR implements a general DIA data analysis workflow with match-between-runs (MBR) safeguarded by rigorous statistical control (Fig. 1a). Unlike existing tools, DreamDIAlignR performs MBR prior to FDR estimation, eliminating the need for ad hoc FDR tuning and ensuring that cross-run analysis adheres to a statistically principled quality control framework. The DreamDIAlignR workflow incorporates two recent innovations in targeted DIA analysis. First, it uses a deep learning-based method to generate a continuous quality scoring profile for peptide peak groups along the RT axis in each run. Second, a dynamic programming algorithm aligns chromatographic traces, ensuring one-to-one mapping of data points across runs. These strategies seamlessly integrate peptide identification information across all runs without discontinuity or hard cutoffs. The workflow comprises the following four main steps (Fig. 1c, see Methods for details).
Fig. 1. Schematic illustration of DreamDIAlignR.
a Working principle of the DreamDIAlignR pipeline versus existing tools. Existing software, such as OpenSWATH and DIA-NN, perform match-between-runs (MBR) after statistical scoring, placing the matched peaks outside the scope of the original statistical error model. DreamDIAlignR instead aligns chromatographic signals before peak scoring, enabling rigorous, model-based confidence estimation (false discovery rate control) for the aligned peaks. b Working principle of the DreamDIA deep learning peak group scorer. First, a sliding window traverses the chromatogram time point by time point. Then, the signal within each window is fed into the pre-trained deep learning model to estimate its probability of being a real peptide signal (high score) or noise (low score). Finally, a continuous scoring profile is obtained for the corresponding chromatogram. c DreamDIAlignR cross-run peptide-centric analysis workflow. Instead of processing individual runs sequentially, DreamDIAlignR considers all runs together for each peptide. First, DreamDIAlignR extracts chromatograms from all runs and scores them using the DreamDIA deep learning peak scorer. Next, the MBR algorithm aligns the chromatograms and scoring profiles across runs. Then, the peaks are picked based on a majority voting of multiple runs represented by an averaged cross-run scoring profile instead of relying on a reference peak, and the multi-run scores are calculated by aggregating scores of corresponding peaks across runs. Finally, DreamDIAlignR considers both single-run scores and multi-run scores for statistical analysis.
Chromatogram extraction and scoring
DreamDIAlignR first extracts chromatograms of precursor and fragment ions across all runs for each library peptide. It then applies a pre-trained deep learning model to slide a scoring window along the RT axis, producing continuous scoring profiles that estimate the likelihood that each time point, together with its surrounding context, represents a true peptide peak group (Fig. 1b). By default, DreamDIAlignR uses a Long Short-Term Memory (LSTM) model28 consisting of two LSTM layers20, which outputs a scalar score between 0 and 1 for each window. The model was trained on approximately one million chromatographic peaks collected from multiple instrument vendors to ensure robust and accurate peak scoring20.
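As a conceptual illustration, the sliding-window scoring step can be sketched as follows; `score_window` is a hypothetical stand-in for the pre-trained LSTM scorer and is not part of the published interface.

```python
import numpy as np

def scoring_profile(xic, score_window, window_size=16):
    """Slide a fixed-width window along the RT axis of an XIC matrix
    (fragment ions x time points) and score each window with a pre-trained
    peak group scorer, yielding one score per time point."""
    half = window_size // 2
    # Pad the chromatogram so every time point has a full-width context.
    padded = np.pad(xic, ((0, 0), (half, half)), mode="edge")
    return np.array([
        score_window(padded[:, t:t + window_size])  # probability of a true peak group
        for t in range(xic.shape[1])
    ])
```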
Multi-run chromatogram alignment
To address RT misalignment caused by sample heterogeneity and experimental variations29, we then employ MBR algorithms to synchronize chromatograms and scoring profiles across runs, including both run-wide global alignment27,30 and peptide-wide dynamic alignment via DIAlignR22,23 to account for peptide-specific elution behavior.
Multi-run peak picking and peak scoring
By averaging aligned single-run scoring profiles, DreamDIAlignR generates a cross-run scoring profile that represents the collective elution behavior of each peptide across all runs. Candidate peak groups with the highest averaged scores are then picked, ensuring identification is based on majority consensus rather than relying solely on single-run data16. Simultaneously, we introduce a multi-run score for each peak group, which combines single-run scores weighted by global RT similarity, providing a balanced assessment of both individual and cross-run elution behavior to improve peak group identification.
Statistical analysis
In its final step, DreamDIAlignR builds a discriminative model to distinguish between real peptides and artificially created decoys14,17,19. In contrast to routine methods that consider only the single-run behavior of peak groups, DreamDIAlignR extends existing statistical approaches14,17 to learn the peak group correspondence across runs by utilizing both single-run scores and multi-run scores, thereby enabling more consistent peak selection and more comprehensive confidence estimation from a cross-run perspective.
Feasibility of cross-run signal alignment and integration
We first performed a feasibility test of our cross-run signal alignment and integration strategy using a pilot dataset, the Streptococcus pyogenes (S. pyogenes) dataset27,31, which included approximately 7000 manually annotated peak groups across 16 LC-MS/MS injections (Supplementary Fig. S20a).
As an example, the DreamDIA scoring profile of a target peptide in each run (Fig. 2b) is obtained by moving the DreamDIA peak group scorer along the RT axis of the corresponding chromatograms (Fig. 2a). Notably, the score apex regions show strong concordance with the manually annotated peak group regions, indicating the peak identification accuracy of the DreamDIA scorer. In addition, although RT shifts among peak groups across multiple runs are clearly visible, the signals can be effectively synchronized when MBR algorithms like DIAlignR are applied (Fig. 2c). While all the MBR algorithms mitigate RT discrepancies across multiple runs, the chromatograms aligned by either the global lowess method or DIAlignR demonstrate superior synchronization compared to the global linear method (Supplementary Fig. S1).
Fig. 2. Chromatogram alignment and integration methods in the DreamDIAlignR workflow and their performance on a Streptococcus pyogenes (S. pyogenes) dataset.
a–d Intermediate results of an example peptide “ADGQTVSGGSILYR3+” in the S. pyogenes dataset being processed by DreamDIAlignR. Results of 6 out of 16 runs are shown due to limited space. Gray boxes represent runs with 10% human plasma, while white boxes represent runs without human plasma. a Chromatograms 500–1000 seconds wide are extracted in all runs. Red dashed lines denote manually annotated peak boundaries. b A continuous scoring profile is calculated for each run by the DreamDIA deep learning peak group scorer. c Aligned scoring profiles of all runs by DIAlignR. d Averaged scoring profile of all runs. Signal-to-noise ratio (SNR) is calculated as the maximum score of the manually annotated peak regions divided by the maximum score of the other regions. e Comparison of the number of correctly and incorrectly identified peak groups using different signal alignment methods. The incorrect identification number of DreamDIA without any alignment strategy is regarded as a baseline (N0), which is used to calculate the decreasing rates (RΔN) when applying signal alignment algorithms.
Moreover, although DreamDIA can correctly identify the locations of peak groups, some high-scoring noisy signals still remain in the scoring profiles outside the true peak group regions (Fig. 2b, c). To address this, we averaged the aligned scoring profiles from multiple runs. This process diluted and suppressed random noise originating from any single experiment, resulting in a cross-run scoring profile with an over 2-fold higher signal-to-noise ratio (SNR) compared to the single-run profiles (Fig. 2d). The averaged scoring profile is then used for cross-run peak picking, which yields greater accuracy and consistency than sequential single-run peak picking.
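For illustration, the averaging and SNR computation described above can be expressed as a short sketch, assuming the aligned per-run scoring profiles are stored as a runs × time-points array and the annotated peak region as a boolean mask (both names are hypothetical):

```python
import numpy as np

def cross_run_snr(aligned_profiles, peak_mask):
    """Average aligned single-run scoring profiles and compute the SNR as the
    maximum averaged score inside the annotated peak region divided by the
    maximum averaged score outside it (cf. Fig. 2d)."""
    avg_profile = aligned_profiles.mean(axis=0)   # consensus cross-run scoring profile
    signal = avg_profile[peak_mask].max()         # apex within annotated boundaries
    noise = avg_profile[~peak_mask].max()         # highest score elsewhere
    return avg_profile, signal / noise
```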
We next evaluated the accuracy of peak group identification using manual annotations as ground truth, comparing different MBR approaches and DreamDIA without multi-run analysis (Fig. 2e). Compared to regular DreamDIA, global lowess alignment and DIAlignR reduced false identifications by 41% and 36%, respectively, while global linear alignment showed no improvement. This aligns with previous findings22, where global linear models failed to address run-to-run RT variation for this dataset. Non-linear lowess alignment also achieved lower residuals than linear alignment, demonstrating its robustness for datasets with significant RT shifts (Supplementary Fig. S2). DIAlignR, on the other hand, directly aligns chromatograms across runs for individual peptides. It provides the flexibility to adjust the alignment function for each peptide, resulting in comparable results to the global lowess alignment. It is noteworthy that different datasets exhibit varying cross-run RT discrepancy patterns, indicating that the performance of various MBR algorithms can vary across datasets. When choosing and benchmarking MBR algorithms, flexibility and robustness play a crucial role.
Simultaneously improved identification and quantification performance
We benchmarked DreamDIAlignR’s peptide and protein identification and quantification performance using the LFQbench HYE110 dataset, which includes two replicated samples with known inter-species abundance ratios (human 1:1, yeast 10:1, and E. coli 1:10) for evaluating DIA tools’ ability to recover these ratios32 (Supplementary Fig. S20b).
We first evaluated the performance of the MBR algorithms implemented in different software tools (Fig. 3a and Supplementary Fig. S3a). At 1% precursor FDR, DreamDIAlignR uniquely improved both identification and quantification, identifying 26.7% more peptides and reducing quantification bias by 17.6%. In contrast, DIA-NN’s MBR increased peptide identification by 11.9% but raised quantification bias by 46.1%. OpenSWATH’s MBR reduced quantification bias by 28.9% but at the expense of a slight 3.0% loss in peptide identification. Benchmarking at the protein level showed similar trends (Supplementary Fig. S3a). These findings indicate that both OpenSWATH and DIA-NN have to trade off quantitative accuracy against identification performance during MBR (DIA-NN produces more, but quantitatively worse, peptide identifications, while OpenSWATH produces fewer, but quantitatively better, ones). This is not the case for DreamDIAlignR, which achieved better results in both.
Fig. 3. Identification and quantification performance benchmark on the LFQbench dataset.
a Benchmark of match-between-runs (MBR) performance for DreamDIA, DIA-NN, and OpenSWATH. The number of valid peptide ratios and the total quantification bias before and after applying MBR at 1% precursor false discovery rate (FDR) are compared. Total quantification bias is computed as the geometric mean of three normalized metrics provided by the LFQbench software suite: 1 – species separation ability (SSA), median bias, and dispersion (see Methods). b Peptide-level LFQbench results for OpenSWATH + MBR, DIA-NN + MBR, and DreamDIAlignR at 1% precursor FDR. Colored dashed lines indicate log-transformed ground truth Sample A to Sample B ratios (human: 1:1; yeast: 10:1; E. coli: 1:10). Boxplot elements: center line, median; boxes, interquartile range; whiskers, 1.5 × interquartile range; points, outliers. c Number of valid peptide ratios identified for human, yeast, and E. coli. d Venn diagram showing the overlap of identified peptides across software tools. e FDR calibration curves of different software tools using the two-species method. An identical number of Arabidopsis peptides was spiked into a down-sampled LFQbench library as entrapment peptides to calculate the two-species FDR. The x-axis represents the FDR reported by each software tool, while the y-axis shows the actual two-species FDR, calculated as the number of identified entrapment peptides divided by the total number of identifications. f Total number of valid peptide ratios and corresponding 1 – SSA values across a range of FDR thresholds. SSA, as defined by the LFQbench package, is the area under the receiver operating characteristic (ROC) curve of a binary classifier separating species. Shown are the mean 1 – SSA values for human, yeast, and E. coli peptides. “♢” and “▿” indicate results at 1% and 5% precursor FDR, respectively. The vertical dashed line indicates performance at DIA-NN's 1% FDR quantification level, which serves as a benchmark for comparing the number of identifications across different software tools in (g). g Number of valid peptide ratios identified by each tool at DIA-NN's 1% FDR level (SSA = 0.989), as indicated by the dashed line in (f).
Next, we benchmarked DreamDIAlignR’s overall performance with the state-of-the-art tools. DreamDIAlignR showed superior cross-run quantification accuracy compared to OpenSWATH and DIA-NN, with the A:B quantification ratios closer to the ground truth ratio line for each species and slightly higher dispersion than DIA-NN (Fig. 3b and Supplementary Fig. S3b). In many experimental settings, accurate identification and quantification of analytes that change in abundance are of particular interest since these could indicate proteins of interest for a drug target, biomarker or mechanistic study. We therefore focused our analysis next on the yeast and E. coli proteomes, which comprise the variable part of the A:B mixture experiment. Notably, all software tools yielded comparable numbers of human peptides and proteins, while DreamDIAlignR outperformed the other tools in identifying yeast and E. coli analytes (Fig. 3c and Supplementary Fig. S3c). It improved yeast and E. coli peptide precursor identifications by 23.5% and 57.5% over DIA-NN and OpenSWATH (Fig. 3c), and proteins by 21.2% and 47.3% (Supplementary Fig. S3c), respectively. The identified peptides and proteins that form valid A:B ratios show strong concordance with DIA-NN and OpenSWATH, with additional analytes uniquely identified by DreamDIAlignR, supporting the reliability of its results (Fig. 3d and Supplementary Fig. S3d). These additional peptides and proteins are predominantly from yeast and E. coli, indicating that DreamDIAlignR can reliably detect analytes across both high- and low-abundance samples to generate valid and accurate ratios (Supplementary Fig. S4). Manual inspection confirmed superior cross-run peak group identification consistency and quantification accuracy, particularly in low-abundance runs (Supplementary Fig. S5). These findings highlight DreamDIAlignR’s ability to detect more quantitatively changing proteins without compromising accuracy, offering significant advantages for multi-run analyses.
To estimate potential bias from overly optimistic FDR control by a single tool, we performed an entrapment analysis by spiking Arabidopsis peptides into the LFQbench library. Before MBR, all software tools exhibited well-calibrated FDRs, with the two-species FDR slightly lower than the reported FDR, indicating conservative estimation (Fig. 3e). However, DIA-NN’s MBR inflated the FDR by nearly threefold, reflecting a fundamental limitation of applying MBR after single-run-based FDR estimation. This approach bypasses the statistical calibration step for newly matched identifications, making it difficult to maintain reliable FDR control and potentially leading to inflated confidence in the results33,34. In contrast, OpenSWATH’s MBR produced overly conservative FDR estimates, likely due to the ad hoc removal of low-confidence identifications. DreamDIAlignR, however, maintained a well-calibrated FDR closely matching the two-species FDR, underscoring the robustness and accuracy of its FDR control strategy.
Next, we analyzed performance across a broad range of quality thresholds by evaluating the number of identified peptides versus the quantitative accuracy achieved on the benchmark dataset at different FDR cutoffs. This approach enables direct comparison, even in cases where the self-reported FDR is not well-calibrated, as observed with tools such as DIA-NN and OpenSWATH with MBR. The results showed that DreamDIAlignR substantially improved MBR performance and achieved the best overall results among all tools, with or without the MBR strategy (Fig. 3f and Supplementary Fig. S6). Across commonly used FDR thresholds ranging from 1% to 5%, DreamDIAlignR identified more analytes while maintaining superior quantification accuracy. To facilitate a more intuitive comparison, we added a vertical cutoff line at DIA-NN’s default 1% FDR level, enabling a fair comparison of the number of peptides identified by different tools at a consistent quantification accuracy. At a species separation ability (SSA) of 0.989, DreamDIAlignR identified 6.8% and 40.0% more peptides than DIA-NN with MBR and OpenSWATH with MBR, respectively (Fig. 3g). Results using alternative quantification metrics and cutoff values showed consistent trends (Supplementary Fig. S7). Furthermore, benchmarking different MBR algorithms available in DreamDIAlignR showed that the DIAlignR algorithm outperformed the other alignment methods (Supplementary Fig. S8). Notably, even when benchmarked against DIA-NN under its inflated FDR setting, DreamDIAlignR still achieved superior performance.
The multi-run score ensures reliable cross-run peptide identification
Statistical error control is essential for ensuring reliable multi-run proteomics analysis14. Lim et al.33 highlighted this by designing an entrapment experiment, showing that MBR algorithms for DDA data caused an 8-fold increase in false identifications, including the erroneous identification of yeast peptides in human-only samples in their 2-Sample, 2-Proteome Challenge34. To mitigate erroneous alignment, DreamDIAlignR employs an exponential weight decay function governed by a penalty parameter k, which calculates multi-run scores by assigning higher weights to similar runs while penalizing contributions from distant ones. This approach prioritizes relevant scoring information, minimizing false identifications and avoiding inflated scores, particularly in highly heterogeneous datasets.
Herein, we investigated the impact of the parameter k on the identification performance by using an entrapment experiment as well. A subset of 24 samples, consisting of 12 human-proteome-only runs and 12 human-yeast mixed runs from the Procan large-scale cancer study25, was selected for testing (Supplementary Fig. S20d). As k increased, the number of peptides identified declined, with a watershed k value of 50 (Fig. 4a). Compared to DreamDIAlignR without weight decay (k = 0), the number of entrapment (yeast) peptides falsely identified in the human-only runs decreased by 92.1% when k = 50. This reduction was significantly greater than the decreases observed in the yeast peptides identified in human-yeast mixed runs (10.4%) and human peptides in human-only runs (1.6%). We also monitored changes in FDR before and after applying the weight decay function. Two-species FDRs were calculated both with and without accounting for the differing likelihoods of human and yeast peptide identification, represented as the upper-bound and lower-bound FDRs35, respectively (See Methods). With a k value of 50, the upper-bound two-species FDR significantly decreased from 9.05% to 0.83%, while the lower-bound FDR also dropped from 1.91% to 0.18%. To further validate robustness, we conducted sensitivity tests across multiple datasets by tracking the total number of identifications at varying k values. A similar decreasing trend followed by a plateau was consistently observed, indicating the suppression of potential false positives arising from inflated multi-run scores (Supplementary Fig. S9). Based on this, we implemented an elbow-point method to automatically select an appropriate k, ensuring adaptability across datasets. Overall, these results demonstrate that the weight decay function substantially reduces false positives while preserving true identifications, enabling reliable cross-run FDR control.
Fig. 4. Optimization of weight decay parameters for multi-run score calculation on the Two-Sample, Two-Proteome (TSTP) dataset.
a The number of peptide precursors identified and the corresponding two-species false discovery rate (FDR), calculated from mixed and human-only samples using various weight decay parameters, denoted as k. The parameter k controls the penalization of distant runs in the exponential weight decay function. A k value of 0 indicates that the weight decay approach has been deactivated. The number of yeast peptides identified in human-only runs serves as a measure of false targets. The two-species FDR was calculated using the “combined” method reported by Wen et al35 (See Methods). Error bars represent the standard deviation across 12 independent replicates (n = 12). b Run weight matrix without weight decay, showing hierarchical clustering based on similarity metrics. c Run weight matrix with weight decay (k = 50), showing hierarchical clustering based on similarity metrics.
In DreamDIAlignR, the final score assigned to a peak is a combination of its single-run score, which reflects the quality of its individual chromatographic signal, and a multi-run score, which captures the consistency of the peak across aligned runs. For the multi-run component, contributions from other runs are weighted according to the global RT similarity between the target run and each reference run. This strategy ensures that peaks from poorly matched runs do not disproportionately influence the multi-run score, even if they align with high-quality peaks. To validate this weighting scheme, we examined the run weight matrices before and after applying the weight decay function (Fig. 4b, c). As expected, highly similar runs (e.g., technical replicates) contributed more prominently to the multi-run score, while dissimilar runs (e.g., different sample types) were largely downweighted.
We also evaluated the impact of different global RT similarity metrics (see Methods) on cross-run performance. Among the four tested metrics, NC similarity consistently performed best: it effectively clustered similar runs, distinguished between different sample types, even those acquired on different instruments (Supplementary Fig. S10), and resulted in more true identifications while maintaining fewer false identifications (Supplementary Fig. S11). Based on this evidence, NC similarity was selected as the default global RT similarity metric in DreamDIAlignR.
To assess whether false peptides in the library affect global RT similarity estimation, we conducted a spike-in experiment using varying amounts of entrapment Arabidopsis peptides36 in the LFQbench dataset. We found that DreamDIAlignR could still estimate global similarity accurately and robustly, even with library specificity reduced to 5%, provided a sufficient number of peptides were used for alignment (Supplementary Fig. S12).
In addition, the multi-run score, which indicates the overall quality of the aligned peaks across all runs, is a concept that current PECA software tools do not possess to the best of our knowledge. Therefore, we investigated whether the multi-run score could enhance the discriminative power of the statistical model to distinguish between target peptides and decoys. Results showed that the multi-run scores of yeast peptides in human-yeast mixed runs (true targets) were significantly higher than those in human-only runs (false targets) after applying weight decay (Supplementary Fig. S13a, b). Moreover, the multi-run score ranked second among all the sub-scores used by the statistical model, with a feature importance of 27.0% (Supplementary Fig. S13c). An ablation test also showed that discarding all multi-run scores before building the statistical model caused the number of identifications to revert to levels comparable to DreamDIA (Supplementary Fig. S13d). These results highlight the critical role of the multi-run score in helping the model accurately identify peptides in appropriate runs during entrapment testing.
Better identification and quantification performance for highly heterogeneous datasets
One notable feature of DreamDIAlignR is its ability to analyze highly heterogeneous datasets through signal alignment and integration of multiple runs. Therefore, we evaluated its performance on the Procan25 cancer dataset, featuring mixed-species proteomes similar to LFQbench (Ovary 1:4, Prostate 1:1, Yeast 1.75:1) but with greater heterogeneity due to acquisition on different instruments over extended intervals (See Methods, Supplementary Fig. S20e).
Similar to the LFQbench analysis, we began by benchmarking the MBR algorithms across different software tools. At a 1% precursor FDR, DreamDIAlignR was the only MBR approach that consistently improved both identification and quantification performance. Compared to DreamDIA single-run analysis, DreamDIAlignR increased the number of identified peptides by 40.6% while reducing quantification bias by 50.6% (Fig. 5a). In contrast, OpenSWATH maintained a similar number of identifications but primarily reduced quantification bias by 24.1% after applying MBR. DIA-NN increased identifications by 24.5%, but at the cost of a 56.0% increase in quantification bias. Among all tools employing MBR, DreamDIAlignR achieved the highest identification count alongside the best quantification accuracy.
Fig. 5. Identification and quantification performance benchmark on a highly heterogeneous dataset.
a Performance comparison of match-between-runs (MBR) methods for DreamDIA, DIA-NN, and OpenSWATH at 1% precursor false discovery rate (FDR). Total quantification bias is computed as the geometric mean of two normalized quantification metrics: median bias (MB) and dispersion (DISP). b, c Number of valid peptide ratios (Sample A vs. Sample B) for yeast and ovary peptides, plotted against the corresponding median bias (MB; b) and dispersion (DISP; c) across a range of FDR thresholds. A ratio is considered valid if the peptide is identified in at least 3 runs from both Sample A and Sample B. Data points represent the mean MB and DISP across yeast and ovary peptides. “♢” and “▿” indicate results at 1% and 5% precursor FDR, respectively. Vertical dashed lines denote quantification cut-offs at 1% and 5% FDR, based on DIA-NN, which were used to compare the number of identifications across different software tools in (d). d Number of valid yeast and ovary peptides identified by each tool at the benchmark quantification levels indicated in (b, c). e Venn diagram showing overlap of identified peptides across software tools. f Benchmark of quantification matrix completeness. Percentages represent the proportion of validly quantified peptides out of all peptides identified. Solid bars denote the average number of quantified peptides per run; dashed bars show the total matrix dimensionality divided by the number of runs (36). Quantification matrices were filtered at 1% precursor FDR, and peptides identified in fewer than 3 runs were excluded.
In the overall performance benchmark independent of FDR threshold selection, DreamDIAlignR significantly outperformed other tools in both identification and quantification (Fig. 5b, c). At equivalent quantification accuracy levels, DreamDIAlignR identified substantially more peptides than both OpenSWATH and DIA-NN; compared to DIA-NN, for example, it identified 22.7% more peptides at MB = 0.530 and 23.0% more at DISP = 0.056 (Fig. 5d). Identifications were also consistent with those from other tools, confirming the reliability of the results (Fig. 5e). Among the three MBR algorithms implemented in DreamDIAlignR, DIAlignR showed the best performance for this dataset, yielding lower median bias than global alignment-based methods (Supplementary Fig. S14). Given the higher number of identified analytes, we further examined matrix completeness. DreamDIAlignR improved data completeness by 13.1% relative to DreamDIA single-run analysis, and by 4.9% and 7.0% compared to DIA-NN and OpenSWATH with MBR, respectively (Fig. 5f).
Moreover, to evaluate DreamDIAlignR on large-scale datasets, we analyzed the Procan494 dataset, consisting of 494 runs with technical replicates and varying species concentration ratios, enabling calibration curve generation (Supplementary Fig. S20f). Compared to DIA-NN, DreamDIAlignR demonstrated superior identification and quantification, identifying 21.2% more peptides and achieving a 2.7% higher median R2 at 1% FDR (Supplementary Fig. S15). This result highlights the robust performance of DreamDIAlignR in large-scale, high-throughput proteome profiling. We further assessed DreamDIAlignR using an LFQbench dataset acquired on an Orbitrap QE HF-X instrument with a different acquisition strategy, staggered all-ion fragmentation (AIF) (Supplementary Fig. S20c). In this setting, DreamDIAlignR again achieved the best overall identification and quantification performance (Supplementary Fig. S16a, b). When controlling for the number of identified peptides, DreamDIAlignR produced quantification ratios that were closer to the ground truth and exhibited lower dispersion than DIA-NN (Supplementary Fig. S16c, d). These results demonstrate the robustness of DreamDIAlignR across different instrument platforms and acquisition methods.
Improved cross-run quantification provides more insightful data for biological analysis
Drawing meaningful biological conclusions in cohort proteomics studies hinges on the precise identification of differentially expressed signature proteins across samples. Achieving this requires robust and accurate cross-run peptide quantification, which enables the reliable detection of significant fold changes through statistical analysis amidst a large background of non-changing signals. Given that DreamDIAlignR demonstrated superior cross-run quantification performance in the benchmarking studies above, we next sought to evaluate its potential for providing deeper insights into biological changes in proteomic analyses.
We conducted differential expression analysis on the 2-Sample, 2-Proteome dataset using limma37. Since only the Sample B runs contain ovarian cancer cells (Supplementary Fig. S20d), the up-regulated proteins in these runs can be considered ovarian cancer-related proteins. Our results demonstrated that DreamDIAlignR identified 36.6% and 109.6% more differentially expressed proteins at p < 0.01 using limma compared to DIA-NN with MBR and OpenSWATH with MBR, respectively (Fig. 6a). While a substantial overlap was observed in the proteins identified by the different software tools, DreamDIAlignR detected a greater number of unique proteins that are quantitatively changing (Fig. 6b). This suggests that DreamDIAlignR has a higher potential to uncover meaningful functional proteins or biomarkers in cohort-based proteomics studies.
Fig. 6. Comparison of differential expression analysis on the Two-Sample, Two-Proteome (TSTP) dataset conducted by three match-between-runs (MBR)-enabled software tools.
a The numbers of differentially expressed proteins identified by DreamDIAlignR, DIA-NN with MBR and OpenSWATH with MBR in the 50% ovarian cancer tissue runs. The horizontal dashed lines indicate an adjusted p-value threshold of 0.01. The vertical lines represent a fold-change cut-off of 2. Proteins outside these thresholds are considered significantly up-regulated in the runs with 50% ovarian cancer tissue. b Consistency comparison of differentially expressed proteins identified by various software tools. c Over-representation analysis using Disease Ontology (DO) for the up-regulated proteins shown in a, highlighting DO terms related to ovarian cancer. d Ovarian cancer-related proteins identified in the DO analysis by the three software tools. e Comparison of the top 5 over-represented gene sets in ProteomicsDB for the up-regulated proteins, identified using different software tools.
To assess the biological relevance of the increased protein identifications by DreamDIAlignR, we performed over-representation analysis38 on the up-regulated proteins from the Sample B runs using two databases: Disease Ontology (DO)39 and ProteomicsDB40. In the DO analysis, the up-regulated proteins identified by DreamDIAlignR showed a stronger over-representation in ovarian cancer-related DO terms and yielded significantly lower p-values compared to the results from the other two software tools (Fig. 6c). Consistent with the overall overlap in differentially expressed proteins (Fig. 6b), DreamDIAlignR detected the majority of ovarian cancer-related proteins identified by the other two software tools, with its uniquely identified proteins also showing a strong association with ovarian cancer (Fig. 6d). Manual inspection of peaks identified by various software tools further showed that DreamDIAlignR delivered more complete and accurate cross-run quantification while effectively filtering out low-quality peaks (Supplementary Fig. S17, S18). This capability enhances differential expression analysis, providing higher statistical confidence compared to other tools.
Since Disease Ontology serves as a general semantic database primarily focused on linking genomic data with disease features and mechanisms39, we additionally utilized ProteomicsDB40-a database providing orthogonal disease information represented at the protein level-to perform over-representation analysis from a proteomics perspective. To ensure the completeness of the analysis results for different software tools, here we set the p-value cutoff for the over-representation analysis to 0.1 (while maintaining a 1% precursor FDR cutoff for all peptide identifications). Our results indicate that both DreamDIAlignR and DIA-NN with MBR rank OVCAR-4, a human ovarian cancer cell line, as the top gene set in the over-representation analysis (Fig. 6e). Notably, DreamDIAlignR identifies 45.5% more OVCAR-4 proteins than DIA-NN and achieves a significantly lower p-value (below 0.01). If the p-value cutoff is set to 0.05 or stricter, DreamDIAlignR remains the only tool capable of ranking the ovarian cancer cell line gene set first. Our analysis reveals that rather than merely improving the number of identifications, DreamDIAlignR strengthens DIA data analysis by enhancing the accuracy and coverage of identifying quantitatively changing analytes across samples. This capability supports the discovery of biologically meaningful signature proteins, providing valuable insights in real biomedical studies that involve highly heterogeneous data.
Discussion
Consistently identifying and accurately quantifying peptides in biologically or technically heterogeneous samples remains not only the ultimate goal in DIA-based high-throughput proteomics research, but also a highly demanding task for all DIA data analysis tools20. Among the challenges encountered by DIA data analysis tools, a prominent one stems from the increased stochasticity of peptide elution signal selection across multiple samples, particularly when these samples have highly varied biological compositions or data acquisition conditions. Hence, the concept of harnessing multiple-run signal to alleviate interference and collectively identify chromatographic peaks is intuitive and has also been explored26,41,42. However, these existing tools were developed with the assumption of data homogeneity and only incorporated rudimentary signal alignment approaches, which were not suited for highly heterogeneous data. As a result, their use has been significantly limited in large-scale biomedical studies. To address this, current MBR algorithms22,23,27 prioritized fixing the retention time shift in heterogeneous data after a single-run analysis routine. However, they leave users with a dilemma: whether to accept aligned peaks of potentially lower quality or whether to reject potentially correct peaks to increase the stringency, due to the absence of a statistical framework for MBR. In essence, current “multi-run analysis tools” fall short of true multi-run capabilities, still confined by the conventional single-run analysis perspective.
In this study, we present an approach for performing MBR in DIA, which integrates data from all available LC-MS/MS runs and applies statistical error control after the match-between-runs, thereby ensuring that cross-run identification and quantification adhere to a rigorous statistical framework. This concept is in principle applicable to all DIA analysis tools, such as DIA-NN, OpenSWATH or Spectronaut, and addresses one of the last remaining challenges of DIA analysis: handling large-scale, heterogeneous study designs. Our findings demonstrate that DreamDIAlignR stands out as the only MBR method capable of simultaneously improving both identification and quantification performance, achieving a 36.6% increase in differentially expressed proteins in a realistic case-control dataset. Moreover, it is the only tool that reliably controls the FDR of aligned peaks by leveraging a rigorous statistical framework, rather than relying on ad hoc alignment strategies. While the Python-based implementation of DreamDIA introduces some computational overhead, primarily during single-run analysis, benchmark results show that DreamDIAlignR’s MBR runtime remains comparable to that of other tools (Supplementary Fig. S19). As LC-MS/MS technology and throughput continue to improve, we anticipate that DreamDIAlignR will not only assist proteomics researchers but also provide deeper insights into the analysis of large-scale heterogeneous omics data.
Methods
DreamDIAlignR workflow
DreamDIAlignR was developed based on the DreamDIA20 and DIAlignR22 software libraries. The DreamDIAlignR workflow includes four major steps (see Fig. 1c), described as follows.
Chromatogram extraction and scoring
Before extracting chromatograms, DreamDIAlignR performs RT normalization using endogenous peptides sub-sampled from the spectral library. Initially, 4000 iRT peptides are randomly selected from the library by default. To ensure comprehensive iRT coverage and a representative sampling of the full library, the entire iRT range (e.g., −50 to 250) is divided into equal-width bins (layers), and peptides are randomly selected from each bin. Additionally, a 20% oversampling is applied to the first and last layers to improve model accuracy at the boundaries. Then, for each run, DreamDIAlignR identifies the RT locations with the highest scores for all the endogenous iRT peptides across the entire RT gradient using a pre-trained deep learning peak group scorer. This scorer is the built-in LSTM deep learning model in DreamDIA20, used without re-training or parameter fine-tuning. The RT location with the highest score is designated as the best RT. Peak groups located at these best RTs, with spectral cosine similarity scores above a certain threshold (0.95 by default), are considered validly identified iRT peptides. These best RTs are then used to fit a linear or non-linear model against their corresponding iRT values in the library. In DreamDIAlignR, the RT normalization step serves to both fit an RT versus iRT model to narrow down the peak group searching range16,43,44 and to obtain the global similarity among all runs. iRT peptides with fitting residuals below a specified threshold in each run are deemed confidently identified, or inliers. Global RT similarity between runs is then calculated based on the selected inlier peptides. DreamDIAlignR implements four similarity metrics: NC similarity, intensity similarity, XIC cosine similarity, and aligned XIC cosine similarity. The NC similarity23 between each run pair in the dataset is defined as:
where w_ij denotes the NC similarity, and N_common represents the number of common inlier peptide IDs in both run_i and run_j. Intensity similarity is calculated as the cosine similarity of the total extracted MS2 ion intensities over the union of inliers. XIC cosine similarity is the mean cosine similarity of XICs for the union of inliers between two runs. Aligned XIC cosine similarity is similar to the XIC cosine similarity but uses aligned XICs for the comparison. In our benchmark, NC similarity performed best and is therefore set as the default. The similarity scores between all run pairs collectively form a global similarity matrix used for subsequent alignment steps.
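The sketch below illustrates how a run-by-run global similarity matrix can be assembled from the inlier peptide IDs of each run. Because the exact NC normalization is given by the formula above, the overlap normalization used here (shared inliers divided by the smaller inlier set) is only an illustrative stand-in, not the DreamDIAlignR definition.

```python
import numpy as np

def global_similarity_matrix(inlier_ids):
    """Build a symmetric run-by-run similarity matrix from the sets of
    confidently identified (inlier) iRT peptide IDs of each run.
    NOTE: the overlap normalization used here (shared IDs over the smaller
    inlier set) is an illustrative assumption, not the exact NC formula."""
    n_runs = len(inlier_ids)
    sim = np.eye(n_runs)
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            n_common = len(inlier_ids[i] & inlier_ids[j])
            denom = min(len(inlier_ids[i]), len(inlier_ids[j]))
            sim[i, j] = sim[j, i] = n_common / denom if denom else 0.0
    return sim
```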
Subsequently, DreamDIAlignR extracts the chromatograms for the top 6 fragment ions by default for all the target peptides and decoys in each run. With the extracted chromatograms as input, the deep learning scorer slides along the RT range of 200–1500 seconds and scores the signals in each sliding window as a candidate peak group. To enhance the robustness and transferability of our approach, the scoring model also incorporates theoretical fragment ions based on the amino acid sequences of the peptides for peak group identification20. Consequently, both archived fragment ions from libraries and theoretical fragment ions have their respective chromatograms extracted in this process. DreamDIAlignR then calculates three additional single-run scores for each peak group, in addition to the DreamDIA deep learning score: a spectral cosine similarity score, an MS1 area score, and an MS2 area score. All four scores form continuous traces along the RT axis, referred to as scoring profiles. The XICs and scoring profiles for each run are stored in SQLite database files for future use.
Multi-run chromatogram alignment
Due to sample heterogeneity and variations in experimental conditions across acquisitions, the XICs and scoring profiles of different runs often exhibit RT misalignment29. To make the chromatograms and scoring profiles of multiple runs comparable, we introduced MBR algorithms, including global RT alignment (run-wide) and DIAlignR alignment (peptide-wide). The global alignment strategy assumes a systematic RT variation between run pairs, where the elution order of peptides is preserved27,30. In DreamDIAlignR, the iRT peptides sub-sampled from the spectral library are initially selected as anchor peptides to build the global RT alignment models. These peptides are broadly distributed across the RT range and serve as a representative subset of the library. For each pair of runs, a global RT model is constructed using the best RTs of commonly and confidently identified anchor peptides. DreamDIAlignR provides both linear and lowess modeling options to capture RT correspondence between run pairs, enabling alignment of chromatograms for each peptide across all runs. However, this approach might smooth out peptide-specific RT shifts and lead to false identifications if the elution order of peptides differs across runs30,45,46. Therefore, we further introduced peptide-wide DIAlignR alignment that can be optionally applied on top of the global alignment constraint. DIAlignR aligns chromatograms for each peptide between two runs, but its time complexity, O(N_peptides × N_runs²), can become prohibitive if all run pairs are aligned, especially in large sample cohorts. To address this, we constructed a minimum spanning tree based on the global similarity matrix, ensuring alignment only between run pairs connected by the tree’s edges23. This approach reduces the time complexity to O(N_peptides × N_runs) and confines signal alignment to directly connected, highly similar runs, thereby minimizing false alignments and preventing information transfer between highly heterogeneous runs. After aligning chromatograms across multiple runs, the aligned RT vectors are interpolated and collapsed to produce a synchronized RT matrix for all runs. Users can flexibly choose between global RT alignment alone or the hybrid DIAlignR alignment. Benchmark results indicate that while the global RT alignment is more time-efficient, DIAlignR achieves higher accuracy.
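A minimal sketch of the MST-constrained pair selection is given below, assuming the global similarity matrix from the previous step and using SciPy's minimum spanning tree on (1 − similarity) distances:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def alignment_pairs(similarity):
    """Select the run pairs to align directly: build a minimum spanning tree
    over the run graph using (1 - similarity) as edge weight, so that only
    the N_runs - 1 most similar connections are aligned with DIAlignR."""
    distance = np.clip(1.0 - similarity, 1e-9, None)  # keep all edges strictly positive
    np.fill_diagonal(distance, 0.0)                   # no self-edges
    mst = minimum_spanning_tree(distance).toarray()
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mst))]
```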
Multi-run peak picking and peak scoring
With the RTs of all runs aligned, the single-run scoring profiles are also synchronized. We then average the single-run scoring profiles across all runs to obtain the cross-run scoring profile. The top 10 candidate peak groups across all runs with the highest averaged deep learning scores are picked. This approach guarantees uniform peak selection and enhances comparability across the dataset.
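The candidate selection step can be sketched as follows; selecting local maxima of the averaged profile as candidate apexes is a simplified illustration of the idea rather than the exact implementation.

```python
import numpy as np
from scipy.signal import find_peaks

def pick_candidate_apexes(cross_run_profile, top_k=10):
    """Pick candidate peak group apexes from the averaged cross-run scoring
    profile: local maxima ranked by their averaged score, keeping the top_k."""
    apexes, _ = find_peaks(cross_run_profile)
    ranked = apexes[np.argsort(cross_run_profile[apexes])[::-1]]
    return ranked[:top_k]
```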
For each candidate peak group, we compute multi-run scores to integrate evidence from all runs. Specifically, for each run, the multi-run score is calculated as a weighted linear combination of the corresponding single-run scores across runs. These single-run scores, including the deep learning score, spectral cosine similarity score, MS1 area score, and MS2 area score, are derived directly from the chromatographic signal, capturing peak shape, intensity, and coelution patterns. The resulting multi-run scores retain these characteristics and serve as robust indicators of peak quality across runs, helping to refine statistical confidence estimation. The weight of each single-run score in the linear combination is determined by the global RT similarity between the run it belongs to and the target run. This strategy helps prevent low-quality peak groups from being picked and accepted just because their multi-run scores have been over-boosted by high-quality peak groups in other runs. To further minimize false information transfer between distant runs, we introduced an exponential weight decay function, ϕ:
where w represents the single-run score weight and k is a parameter termed the weight decay coefficient. This function significantly penalizes lower weight values while having minimal impact on higher weight values. As a result, single-run scores from highly similar runs predominantly contribute to the multi-run score. To improve robustness across diverse datasets, DreamDIAlignR implements an automated method to select an optimal k. It begins by downsampling the spectral library and analyzing the data using the reduced library, tracking the total number of identified peptides. As k increases, the number of peptide identifications initially drops sharply, reflecting the suppression of potential false positives caused by over-inflated multi-run scores, and then enters a plateau phase where the number of IDs declines more gradually. The reduction curve is smoothed using a Gaussian filter, and the optimal k is determined by identifying the first elbow point.
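A minimal sketch of the multi-run score and the k-selection heuristic is shown below. The exact form of ϕ is defined by the formula above; the exponential form used here (which reduces to the identity at k = 0 and shrinks small weights most strongly) and the 10% plateau threshold are illustrative assumptions only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def weight_decay(w, k):
    """Illustrative exponential decay: near-identity for weights close to 1,
    strong shrinkage for small weights, exact identity at k = 0.
    The actual form of phi used by DreamDIAlignR is given in the text."""
    return w * np.exp(-k * (1.0 - w))

def multi_run_score(single_run_scores, similarity_to_target, k):
    """Weighted linear combination of a peak group's single-run scores across
    runs, weighted by the (decayed) global RT similarity to the target run."""
    weights = weight_decay(similarity_to_target, k)
    return float(np.dot(weights, single_run_scores) / weights.sum())

def select_k(k_values, n_identifications, sigma=2.0, plateau_frac=0.1):
    """Pick k at the first elbow of the Gaussian-smoothed identification curve,
    i.e. where the initial sharp drop flattens into a plateau (heuristic)."""
    smoothed = gaussian_filter1d(np.asarray(n_identifications, float), sigma)
    drops = -np.diff(smoothed)                              # per-step decrease in IDs
    elbow = int(np.argmax(drops < plateau_frac * drops.max()))
    return k_values[elbow]
```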
Ultimately, we obtain four single-run scores, four multi-run scores, and several additional simple scores (such as the delta RT score, peptide length, and charge) for each peak group, which are then passed to the subsequent statistical analysis.
Statistical analysis
DreamDIAlignR constructs a semi-supervised learning model to distinguish between target peptides and decoys, utilizing either random forest or XGBoost algorithms. This approach mirrors the strategy employed by PyProphet14, with the key difference being the inclusion of multi-run scores in the analysis. The false discovery rate (FDR) is then estimated based on the distribution of discriminant scores between the target and decoy groups.
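A simplified sketch of this step is given below, using a random forest classifier and a basic decoy-based q-value estimate; the actual PyProphet-style procedure (semi-supervised iteration, cross-validation, and score calibration) is more involved, so this is a conceptual stand-in only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def score_and_estimate_fdr(features, is_decoy):
    """Train a target-vs-decoy classifier on single-run and multi-run scores,
    then assign a simple decoy-based q-value to every peak group.
    `features` is a peak-groups x scores matrix; `is_decoy` is a boolean array."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(features, (~is_decoy).astype(int))        # 1 = target, 0 = decoy
    d_scores = clf.predict_proba(features)[:, 1]      # discriminant scores

    order = np.argsort(-d_scores)                     # best-scoring first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)             # decoy-based FDR estimate
    qvals = np.minimum.accumulate(fdr[::-1])[::-1]    # monotone q-values
    out = np.empty_like(qvals)
    out[order] = qvals
    return d_scores, out
```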
Data preprocessing
All the DIA raw data files were converted to open file format mzML using MSConvert47 (version: 3.0.23080). Format conversion was conducted twice: once with the “Peak Picking” filter to obtain centroided files for DIA-NN, DreamDIA, and DreamDIAlignR, and once without the filter to produce profile files for OpenSWATH. For the Orbitrap LFQbench dataset, “Demultiplex” filter with “Overlap Only” option was applied to deconvolute the spectra prior to analysis. Software parameters used for all the experiments are shown in Supplementary Note 2.
Feasibility test on the S. pyogenes dataset
The S. pyogenes dataset16 was acquired on a SCIEX TripleTOF 5600 instrument. It comprises a total of 16 runs, with 8 runs containing 10% human plasma as background and the remaining 8 runs lacking this component (Supplementary Fig. S20a). A spectral library built in the original paper16,27,31 was used. We manually checked the peak boundary annotation file to discard ambiguously and falsely annotated peaks (see Supplementary Note 1). Eventually, 6870 peaks were retained in total, including identification results from 434 peptides across 16 runs. Then we used the annotated peak locations to benchmark the peak identification performance of DreamDIAlignR. A correct identification is defined as an identified peak apex that falls into the range of the annotated peak boundaries.
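For reference, the correctness criterion can be expressed as a small helper; the dictionary layout used here is hypothetical.

```python
def benchmark_identifications(identified_apexes, annotated_boundaries):
    """Count correct vs. incorrect identifications: an identified peak apex is
    correct if it falls within the manually annotated boundaries for the same
    peptide and run. Both arguments map (peptide, run) keys to an apex RT and
    a (left, right) boundary tuple, respectively."""
    correct = incorrect = 0
    for key, apex in identified_apexes.items():
        boundaries = annotated_boundaries.get(key)
        if boundaries is not None and boundaries[0] <= apex <= boundaries[1]:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect
```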
LFQbench test
The LFQbench HYE110 dataset32, acquired on a SCIEX TripleTOF 6600 instrument, was chosen for benchmarking (Supplementary Fig. S20b). To compare the quantification performance of different software tools across varying numbers of identified peptides, irrespective of hard FDR cutoffs, all tools were configured to output all results without applying any FDR filtering. The results were then manually filtered using a range of FDR thresholds from 0.001 to 0.1. These filtered results were input into the LFQbench software package to calculate quantification bias metrics: one minus the species separation ability (1 - SSA), median bias (MB), and dispersion (DISP). SSA is defined as the area under the receiver operating characteristic (ROC) curve of a binary classifier separating two species. MB is defined as the distance between the median of the quantified Sample A-to-Sample B ratios and the ground-truth ratio line. DISP is defined as the standard deviation of the log-transformed ratios for each species32. To summarize overall quantification performance, we normalized the three quantification bias metrics using OpenSWATH as a baseline and defined the total quantification bias as the geometric mean of these three normalized metrics at 1% FDR. To account for the impact of varying FDR thresholds on the identification results of OpenSWATH with MBR, we executed DIAlignR eight times after OpenSWATH analysis, each time with a different FDR threshold. To facilitate intuitive comparison across software tools, we introduced cut-off lines at specific quantification accuracy levels, enabling a direct assessment of the number of identified peptides at those thresholds. To avoid any impression of unfair optimization in favor of DreamDIAlignR, we used the quantification levels of DIA-NN (without MBR) at 1% and 5% precursor FDR as standardized benchmarks, given its well-calibrated FDR in our evaluation. When exact values at these FDR thresholds were unavailable, the number of identifications was estimated by linear interpolation between neighboring FDR levels.
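The following sketch re-implements the three bias metrics and the total quantification bias in simplified form; the official LFQbench package was used for the reported results, and the per-species averaging and orientation handling here are assumptions made for brevity.

```python
import numpy as np
from scipy.stats import gmean
from sklearn.metrics import roc_auc_score

def lfqbench_metrics(log_ratios, expected, species_a, species_b):
    """log_ratios: dict species -> array of log2(Sample A / Sample B) peptide ratios.
    expected:   dict species -> ground-truth log2 ratio."""
    # Species separation ability (SSA): AUROC of separating two species by their
    # measured log-ratios, reported as 1 - SSA (lower is better).
    labels = np.r_[np.zeros(len(log_ratios[species_a])), np.ones(len(log_ratios[species_b]))]
    scores = np.r_[log_ratios[species_a], log_ratios[species_b]]
    auc = roc_auc_score(labels, scores)
    one_minus_ssa = 1.0 - max(auc, 1.0 - auc)   # orientation-invariant
    # Median bias (MB): distance between the median measured ratio and the ground truth.
    mb = np.mean([abs(np.median(v) - expected[s]) for s, v in log_ratios.items()])
    # Dispersion (DISP): standard deviation of the log-transformed ratios.
    disp = np.mean([np.std(v) for v in log_ratios.values()])
    return one_minus_ssa, mb, disp

def total_bias(metrics, baseline):
    """Geometric mean of the three metrics after normalizing by a baseline tool."""
    return float(gmean(np.asarray(metrics) / np.asarray(baseline)))
```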
To assess whether cross-run analysis is affected by false peptides in the library, we performed an entrapment experiment using the LFQbench dataset. The original spectral library was first down-sampled to 8000 peptide precursors. A series of Arabidopsis peptides were then spiked into this library to generate libraries with varying specificity levels, ranging from 80% to 5%. These entrapment libraries were used to analyze the dataset with DreamDIAlignR and assess the impact on global RT alignment. Additionally, an entrapment library with 50% specificity was used to evaluate the FDR calibration of different software tools. Before analysis, Arabidopsis peptides sharing three or more fragment ion m/z values with true positive peptides (human, yeast, and E. coli) were removed to avoid ambiguous assignments. The two-species FDR was calculated as the number of identified Arabidopsis peptides divided by the total number of peptide identifications. All identified peptide precursors were included in the calculation without additional filtering.
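A brief sketch of the entrapment bookkeeping is shown below; the fragment m/z tolerance and the table columns ("species") are assumptions, and the actual filtering was performed on the spectral library entries as described above.

```python
import pandas as pd

def shares_fragments(frags_a, frags_b, tol=0.02, min_shared=3):
    """True if two peptides share at least `min_shared` fragment ion m/z values
    within `tol` Da; such Arabidopsis entrapment peptides were removed."""
    shared = sum(any(abs(a - b) <= tol for b in frags_b) for a in frags_a)
    return shared >= min_shared

def two_species_fdr(ids: pd.DataFrame) -> float:
    """Two-species FDR: identified Arabidopsis peptides over all identifications."""
    return (ids["species"] == "Arabidopsis").sum() / len(ids)
```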
To evaluate the robustness of DreamDIAlignR across different instrument vendors and acquisition methods, we further conducted a similar benchmarking experiment using an Orbitrap AIF LFQbench dataset (Supplementary Fig. S20c). Given the larger number of replicates in this dataset (nine per sample), peptides were included in the quantification ratio analysis only if identified in at least three runs from both Sample A and Sample B.
Two-sample, two-proteome test
To create a two-sample, two-proteome entrapment dataset, we selected 24 DIA runs from a previously published dataset that had been used to evaluate the analysis reproducibility of large-scale DIA proteomics studies25 (referred to as the TSTP dataset). The selected runs were technical replicates of two samples acquired on three different SCIEX TripleTOF 6600 mass spectrometers (#M2, #M4, and #M6) on two different days (Days 14 and 103). The 12 runs consisting of 50% prostate cancer tissue and 50% yeast cell lysate can be regarded as human-yeast mixed samples, whereas the remaining 12 runs, containing 50% ovarian cancer tissue and 50% prostate cancer tissue, can be regarded as human-only samples (Supplementary Fig. S20d). The human and yeast spectral libraries were combined to obtain the mixed library for analysis. Peptide precursors in the library were filtered to retain only their six most intense fragment ions. All software tools were configured to output identification results at 1% precursor FDR. Peptide precursors detected in fewer than one-third of the runs or with intensities below 100 were excluded from the benchmark analysis. The two-species FDR was calculated using the “combined” method described by Wen et al.35, with the following formulas:
$$\mathrm{FDR}_{\mathrm{lower}}=\frac{N_{\mathrm{yeast}}}{N_{\mathrm{yeast}}+N_{\mathrm{human}}},\qquad \mathrm{FDR}_{\mathrm{upper}}=\frac{N_{\mathrm{yeast}}+N_{\mathrm{yeast}}/r}{N_{\mathrm{yeast}}+N_{\mathrm{human}}}$$
Here, N_yeast and N_human denote the numbers of yeast and human peptides, respectively, identified across all runs lacking the yeast component. The parameter r represents the effective ratio of yeast peptides in the mixed spectral library, which is 0.267 in this experiment. The upper-bound FDR provides a more conservative estimate, though it risks overestimating the actual FDR. Conversely, the lower-bound FDR may overlook differences in the likelihood of identifying human versus yeast peptides35. To provide a more comprehensive assessment, both upper-bound and lower-bound FDRs were presented.
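As a minimal sketch, the two bounds can be computed as follows, mirroring the formulas above under the assumption that yeast identifications in the human-only runs are the entrapment hits.

```python
def entrapment_fdr(n_yeast: int, n_human: int, r: float = 0.267):
    """Lower- and upper-bound two-species FDR for the human-only runs,
    following the combined entrapment estimation described in the Methods."""
    total = n_yeast + n_human
    lower = n_yeast / total
    upper = (n_yeast + n_yeast / r) / total
    return lower, upper
```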
Heterogeneous dataset tests
To comprehensively benchmark the identification and quantification performance of DreamDIAlignR on a highly heterogeneous dataset, we selected an additional 36 runs from the Procan cancer dataset25 (referred to as the Procan36 dataset). This dataset also includes replicates of two distinct samples: 18 runs contain 6.25% ovarian cancer tissue, 50% prostate cancer tissue, and 43.75% yeast cell lysate, while the other 18 runs contain 25%, 50%, and 25% of these three components, respectively (Supplementary Fig. S20e). The two sample sets therefore establish ground-truth species ratios (Ovary, 1:4; Prostate, 1:1; Yeast, 1.75:1) similar to those in the LFQbench dataset. The Procan36 dataset comprises DIA runs acquired on three different SCIEX TripleTOF 6600 mass spectrometers (#M1, #M3, and #M5) across two different days (Days 14 and 105). The diversity of cancer tissue introduces biological heterogeneity, while variations in experimental conditions add technical heterogeneity; together these factors make the runs considerably more heterogeneous than those in the LFQbench dataset. We computed quantification bias metrics, including median bias and dispersion, following the guidelines provided by the LFQbench32 software package. The total quantification bias was calculated as in the LFQbench test, but without the species separation ability term, to avoid bias from differing metric calculation methods. Additionally, we defined ovary-specific peptides as those identified by DIA-NN (at 1% FDR) in at least 8 of the 12 Ovary 50% runs but in at most 1 of the 12 Ovary 0% runs of the TSTP dataset. For data matrix completeness benchmarking, we filtered the quantification matrices produced by all software tools to include only analytes identified in at least three runs and defined data completeness as the number of validly quantified entries divided by the total number of entries.
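The two filtering rules can be summarized with the sketch below; the matrix layout (analytes as rows, runs as columns, missing values as NaN) and the run-group arguments are assumptions about how the intermediate tables are organized.

```python
import pandas as pd

def data_completeness(matrix: pd.DataFrame, min_runs: int = 3) -> float:
    """Keep analytes quantified in at least `min_runs` runs, then report
    valid entries divided by total entries of the filtered matrix."""
    kept = matrix[matrix.notna().sum(axis=1) >= min_runs]
    return float(kept.notna().to_numpy().mean())

def ovary_specific_peptides(id_matrix: pd.DataFrame, ovary50_runs, ovary0_runs):
    """Peptides identified (non-NaN) in >= 8 of the Ovary 50% runs and in
    <= 1 of the Ovary 0% runs of the TSTP dataset."""
    hits_50 = id_matrix[ovary50_runs].notna().sum(axis=1)
    hits_0 = id_matrix[ovary0_runs].notna().sum(axis=1)
    return id_matrix.index[(hits_50 >= 8) & (hits_0 <= 1)]
```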
Furthermore, we selected 494 runs from the Procan dataset to evaluate performance on large-scale data (referred to as the Procan494 dataset). This dataset includes replicates of four distinct samples, each with a different species proteome ratio (Supplementary Fig. S20f). The data were acquired using six different SCIEX TripleTOF 6600 mass spectrometers (#M1, #M2, #M3, #M4, #M5, and #M6) across seven different days (Days 7, 14, 21, 28, 56, 84, and 107). To assess quantification performance, we calculated the median of the R2 values from the calibration curves for both ovarian and yeast peptides. For identification performance, we defined a valid peptide as one identified in at least ten runs within each sample and then counted the total number of valid peptides.
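A simplified stand-in for the calibration-curve evaluation is sketched below: each peptide's log2 intensity is regressed against the log2 expected proteome fraction of its sample, and the median R2 is reported. The mapping of runs to expected fractions and the minimum-point requirement are assumptions; the actual curves follow the dilution design of the original dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def median_calibration_r2(intensity: pd.DataFrame, expected_fraction: dict) -> float:
    """intensity: peptides x runs; expected_fraction: run -> expected proteome fraction."""
    x = np.log2([expected_fraction[c] for c in intensity.columns])
    r2 = []
    for _, row in intensity.iterrows():
        y = np.log2(row.to_numpy(dtype=float))
        mask = np.isfinite(y)
        if mask.sum() >= 3:                      # require at least three points
            fit = linregress(x[mask], y[mask])
            r2.append(fit.rvalue ** 2)
    return float(np.median(r2))
```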
Differential expression analysis and over-representation analysis
Differential expression analysis was performed on the TSTP dataset following a widely accepted workflow48. Protein quantification matrices were generated by summing the peak areas of the three most intense peptide precursors for each protein for DreamDIAlignR, DIA-NN with MBR, and OpenSWATH with MBR, all at 1% precursor FDR. Human peptides with intensities below 100, as well as all yeast peptides, were excluded. Peptide precursors identified in fewer than one-third of the runs were also discarded. The remaining quantification matrices were imputed using the k-nearest neighbors (KNN) algorithm with three neighbors, quantile-normalized, log-transformed, and then analyzed with limma37 for protein signature identification and statistical analysis.
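The preprocessing steps up to the limma test (which was run in R and is not shown here) can be sketched as follows; the column names, the per-run top-3 summation, and the quantile-normalization routine are simplifying assumptions rather than the exact workflow code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def protein_matrix(peptides: pd.DataFrame, run_cols) -> pd.DataFrame:
    """Top-3 protein quantification sketch: for each protein, sum the three
    highest peptide-precursor intensities in every run."""
    return peptides.groupby("protein")[run_cols].apply(
        lambda g: g.apply(lambda col: col.nlargest(3).sum()))

def preprocess(mat: pd.DataFrame, min_frac: float = 1 / 3) -> pd.DataFrame:
    """Filter sparse rows, impute with KNN (3 neighbors), quantile-normalize,
    and log2-transform before the differential test."""
    mat = mat[mat.notna().sum(axis=1) >= min_frac * mat.shape[1]]
    imputed = pd.DataFrame(KNNImputer(n_neighbors=3).fit_transform(mat),
                           index=mat.index, columns=mat.columns)
    # Quantile normalization: replace each column's values by the mean quantile profile.
    ranks = imputed.rank(method="first").astype(int) - 1
    ref = np.sort(imputed.to_numpy(), axis=0).mean(axis=1)
    qn = imputed.copy()
    for c in qn.columns:
        qn[c] = ref[ranks[c].to_numpy()]
    return np.log2(qn)
```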
Lastly, we utilized the ClusterProfiler38,49,50 R package to perform over-representation analysis for the differentially expressed proteins using two different databases: Disease Ontology (DO)39 and ProteomicsDB40,51. For the DO analysis, we focused on ovarian cancer-related DO terms to facilitate comparison across different software tools. In the ProteomicsDB enrichment analysis, we selected the top five enriched gene sets with adjusted p-values below 0.1 as benchmarks for evaluation.
Ethics
This study does not involve human participants, animal subjects, or data collection requiring ethical approval. Inclusion and diversity considerations are not applicable.
Acknowledgements
This work was supported by the Canadian Institutes of Health Research (Grant No. 507496) and the China Scholarship Council (Grant No. 202206310091).
Author contributions
M.G., H.R., and R.Y. designed the study. M.G., S.G., and W.Y. implemented the algorithms. M.G. analyzed the data. M.G. wrote the first manuscript. H.R. and R.Y. supervised the study.
Peer review
Peer review information
Communications Chemistry thanks Robbin Bouwmeester and the other anonymous reviewers for their contribution to the peer review of this work. Peer review reports are available.
Data availability
All the datasets used in this work are published and publicly available. The S. pyogenes dataset was deposited to PeptideAtlas with accession code PASS00788. The LFQbench dataset, Procan dataset, and the Orbitrap LFQbench dataset were deposited to ProteomeXchange Consortium52,53 with dataset identifiers PXD002952, PXD015912, and PXD028735, respectively. The code and data used to generate the figures have been uploaded to Zenodo (10.5281/zenodo.16284655).
Code availability
DreamDIAlignR is implemented in Python and integrated in the DreamDIA20 software package (version 3.2.0). DreamDIAlignR was benchmarked with OpenSWATH16 (v2.7), DIAlignR22,23 (v2.6.0), DreamDIA20 (v2.0.3) and DIA-NN19 (v1.8). Software parameters used for all the experiments are shown in Supplementary Note 2. DreamDIAlignR is fully open-source under the Apache-2.0 license and available at https://github.com/xmuyulab/DreamDIA.
Competing interests
R.Y. and W.Y. are shareholders of Aginome Scientific. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Rongshan Yu, Email: rsyu@xmu.edu.cn.
Hannes L. Röst, Email: hannes.rost@utoronto.ca
Supplementary information
The online version contains supplementary material available at 10.1038/s42004-025-01734-5.
References
- 1. Purvine, S., Eppel, J.-T., Yi, E. C. & Goodlett, D. R. Shotgun collision-induced dissociation of peptides using a time of flight mass analyzer. Proteomics 3, 847–850 (2003).
- 2. Venable, J. D., Dong, M.-Q., Wohlschlegel, J., Dillin, A. & Yates, J. R. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 1, 39–45 (2004).
- 3. Hebert, A. S. et al. The one hour yeast proteome. Mol. Cell. Proteom. 13, 339–347 (2014).
- 4. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
- 5. Tabb, D. L. et al. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J. Proteome Res. 9, 761–776 (2010).
- 6. Bruderer, R. et al. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell. Proteom. 14, 1400–1410 (2015).
- 7. Ludwig, C. et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Mol. Syst. Biol. 14, e8126 (2018).
- 8. Chapman, J. D., Goodlett, D. R. & Masselon, C. D. Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33, 452–470 (2014).
- 9. Krasny, L. & Huang, P. H. Data-independent acquisition mass spectrometry (DIA-MS) for proteomic applications in oncology. Mol. Omics 17, 29–42 (2021).
- 10. Peckner, R. et al. Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics. Nat. Methods 15, 371–378 (2018).
- 11. Bilbao, A. et al. Processing strategies and software solutions for data-independent acquisition in mass spectrometry. Proteomics 15, 964–980 (2015).
- 12. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteom. 11, O111.016717 (2012).
- 13. Ting, Y. S. et al. Peptide-centric proteome analysis: an alternative strategy for the analysis of tandem mass spectrometry data. Mol. Cell. Proteom. 14, 2301–2307 (2015).
- 14. Rosenberger, G. et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat. Methods 14, 921–927 (2017).
- 15. Zhang, F., Ge, W., Ruan, G., Cai, X. & Guo, T. Data-independent acquisition mass spectrometry-based proteomics and software tools: a glimpse in 2020. Proteomics 20, 1900276 (2020).
- 16. Röst, H. L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).
- 17. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).
- 18. Röst, H. L., Aebersold, R. & Schubert, O. T. Automated SWATH data analysis using targeted extraction of ion chromatograms. Methods Mol. Biol. 1550, 289–307 (2017).
- 19. Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
- 20. Gao, M. et al. Deep representation features from DreamDIAXMBD improve the analysis of data-independent acquisition proteomics. Commun. Biol. 4, 1190 (2021).
- 21. Amodei, D. et al. Improving precursor selectivity in data-independent acquisition using overlapping windows. J. Am. Soc. Mass Spectrom. 30, 669–684 (2019).
- 22. Gupta, S., Ahadi, S., Zhou, W. & Röst, H. DIAlignR provides precise retention time alignment across distant runs in DIA and targeted proteomics. Mol. Cell. Proteom. 18, 806–817 (2019).
- 23. Gupta, S., Sing, J. C. & Röst, H. L. Achieving quantitative reproducibility in label-free multisite DIA experiments through multirun alignment. Commun. Biol. 6, 1101 (2023).
- 24. Collins, B. C. et al. Multi-laboratory assessment of reproducibility, qualitative and quantitative performance of SWATH-mass spectrometry. Nat. Commun. 8, 291 (2017).
- 25. Poulos, R. C. et al. Strategies to enable large-scale proteomics for reproducible research. Nat. Commun. 11, 3793 (2020).
- 26. Li, Y. et al. Group-DIA: analyzing multiple data-independent acquisition mass spectrometry data files. Nat. Methods 12, 1105–1106 (2015).
- 27. Röst, H. L. et al. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics. Nat. Methods 13, 777–783 (2016).
- 28. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
- 29. Nigjeh, E. N. et al. Quantitative proteomics based on optimized data-independent acquisition in plasma analysis. J. Proteome Res. 16, 665–676 (2017).
- 30. Smith, R., Ventura, D. & Prince, J. T. LC-MS alignment in theory and practice: a comprehensive algorithmic review. Brief. Bioinforma. 16, 104–117 (2015).
- 31. Gupta, S. & Röst, H. Automated workflow for peptide-level quantitation from DIA/SWATH-MS data. Quant. Methods Proteomics 453–468 (2021).
- 32. Navarro, P. et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat. Biotechnol. 34, 1130–1136 (2016).
- 33. Lim, M. Y., Paulo, J. A. & Gygi, S. P. Evaluating false transfer rates from the match-between-runs algorithm with a two-proteome model. J. Proteome Res. 18, 4020–4026 (2019).
- 34. Yu, F., Haynes, S. E. & Nesvizhskii, A. I. IonQuant enables accurate and sensitive label-free quantification with FDR-controlled match-between-runs. Mol. Cell. Proteom. 18, 100077 (2021).
- 35. Wen, B. et al. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. bioRxiv (2024).
- 36. Zhang, H. et al. Arabidopsis proteome and the mass spectral assay library. Sci. Data (2019).
- 37. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
- 38. Xu, S. et al. Using clusterProfiler to characterize multiomics data. Nat. Protoc. 19, 3292–3320 (2024).
- 39. Schriml, L. M. et al. The Human Disease Ontology 2022 update. Nucleic Acids Res. 50, D1255–D1261 (2022).
- 40. Samaras, P. et al. ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucleic Acids Res. 48, D1153–D1163 (2020).
- 41. Yan, B. et al. Data-driven tool for cross-run ion selection and peak-picking in quantitative proteomics with data-independent acquisition LC-MS/MS. Anal. Chem. 95, 16558–16566 (2023).
- 42. Zhang, B., Käll, L. & Zubarev, R. A. DeMix-Q: quantification-centered data processing workflow. Mol. Cell. Proteom. 15, 1467–1478 (2016).
- 43. Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121 (2012).
- 44. Bruderer, R., Bernhardt, O. M., Gandhi, T. & Reiter, L. High-precision iRT prediction in the targeted analysis of data-independent acquisition and its impact on identification and quantitation. Proteomics 16, 2246–2256 (2016).
- 45. Spicer, V., Grigoryan, M., Gotfrid, A., Standing, K. G. & Krokhin, O. V. Predicting retention time shifts associated with variation of the gradient slope in peptide RP-HPLC. Anal. Chem. 82, 9678–9685 (2010).
- 46. Wu, L., Amon, S. & Lam, H. A hybrid retention time alignment algorithm for SWATH-MS data. Proteomics 16, 2272–2283 (2016).
- 47. Chambers, M. C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).
- 48. Peng, H., Wang, H., Kong, W., Li, J. & Goh, W. W. B. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat. Commun. 15, 3922 (2024).
- 49. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A J. Integr. Biol. 16, 284–287 (2012).
- 50. Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
- 51. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinforma. 14, 128 (2013).
- 52. Ma, J. et al. iProX: an integrated proteome resource. Nucleic Acids Res. 47, D1211–D1217 (2019).
- 53. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).