This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Sep 3:2023.02.02.526809. Originally published 2023 Feb 3. [Version 3] doi: 10.1101/2023.02.02.526809

Data-Driven Optimization of DIA Mass Spectrometry by DO-MS

Georg Wallmann 1, Andrew Leduc 1, Nikolai Slavov 1,2
PMCID: PMC9915643  PMID: 36778474

Abstract

Mass spectrometry (MS) enables specific and accurate quantification of proteins with ever increasing throughput and sensitivity. Maximizing this potential of MS requires optimizing data acquisition parameters and performing efficient quality control for large datasets. To facilitate these objectives for data independent acquisition (DIA), we developed a second version of our framework for data-driven optimization of mass spectrometry methods (DO-MS). The DO-MS app v2.0 (do-ms.slavovlab.net) allows optimizing and evaluating results from both label-free and multiplexed DIA (plexDIA) and supports optimizations particularly relevant for single-cell proteomics. We demonstrate multiple use cases, including optimization of duty cycle methods, peptide separation, number of survey scans per duty cycle, and quality control of single-cell plexDIA data. DO-MS allows for interactive data display and generation of extensive reports, including publication-quality figures, that can be easily shared. The source code is available at: github.com/SlavovLab/DO-MS.

Introduction

Mass spectrometry (MS) allows for comprehensive quantification and sequence identification of proteins from complex biological samples1. Reliable sequence identification of peptides by MS relies on the fragmentation of peptides2. This can be performed for one precursor at a time, as in the case of data dependent acquisition (DDA) or for multiple precursors in parallel as in the case of data independent acquisition (DIA). Using real-time instrument control for DDA can achieve high sensitivity, depth and data completeness3,4 but remains limited to fragmenting only a subset of the available precursors. This limitation is relaxed by DIA, which systematically selects groups of precursors for fragmentation which cover the whole m/z range5,6. This parallel analysis of multiple precursors can have many benefits, including: (1) consistent collection of data from all detectable peptides7, (2) high sensitivity due to long ion accumulation times8, and (3) high-throughput due to the parallel data acquisition9. Despite these benefits, parallel fragmentation of all precursors within the isolation window results in highly complex spectra.

This complexity initially challenged the interpretation of DIA spectra, but advances in machine learning and computational power have gradually increased sequence identification from DIA spectra. Initial approaches were based on sample-specific spectral libraries, but newer methods have allowed for direct library-free DIA and deeper proteome coverage10–14. Many current approaches use computationally predicted peptide properties (libraries),15 which removes the overhead of experimentally generated libraries. These improvements continue with new acquisition methods16–18 and contribute to achieving high proteome depth, data completeness, reproducibility, and throughput19,20. This has enabled the quantitative analysis of proteomes down to the single-cell level21–24, and can continue to increase the throughput and accuracy of single-cell proteomics towards its biological applications25.

Orthogonal to the acquisition method, performance can be further increased when labeling samples with non-isobaric mass tags and analyzing them with the plexDIA framework26–28. Multiple labeled samples can be combined and analysed in a single acquisition, multiplicatively increasing the number of protein data points29. At the same time, quantitative accuracy and proteome coverage are preserved, as identifications can be translated between different samples labeled by non-isobaric mass tags26.

To further empower these emerging capabilities, we sought to extend the DO-MS app to optimization and quality control of DIA experiments by developing and releasing its second major version, v2.0. Indeed, optimization of DIA workflows requires setting multiple acquisition method parameters, such as the number of MS1 survey scans and the placement of fragmentation windows. These parameters must be simultaneously optimized for multiple objectives, including throughput, sensitivity and coverage. Defining the optimal acquisition method therefore becomes a multi-objective, multi-parameter optimization30,31. Many tools already exist which cover some aspects of method optimization, like MS2 window placement18,32,33. Others focus on quality control34,35. DO-MS takes a different approach and offers a holistic view of the acquisition and data processing method specifically designed to diagnose analytical bottlenecks31. With this release, DO-MS v2.0 can be used with both DDA data from tools like MaxQuant and DIA data from tools like DIA-NN, while having an open interface that allows adaptation to other search engines.

DO-MS is particularly useful for optimizing single-cell proteomic and plexDIA analysis by displaying numerous features relevant to these workflows. These features include intensity distributions for each channel of n-plexDIA27,29 and ion accumulation times, which are useful for optimizing single-cell analysis36,37, particularly when using isobaric and isotopologous carriers27,38. In addition to optimization, DO-MS also facilitates data quality control and experimental standardization with large sample cohorts, especially large-scale single-cell proteomic experiments39,40. Here we demonstrate how DO-MS helps achieve these aims in concrete use cases.

Results

We developed DIA-specific modules of the DO-MS app31 to enable monitoring and optimization of DIA experiments. The DO-MS v2.0 app consists of two parts: a post-processing step, which collects additional metrics on the performance of the acquisition method in use, and an interactive application to visualize the metrics and results reported by DIA search engines, Fig. 1. All components are built in a modular way, which allows creating new visualization modules and extending the input source to other search engines (the default engine is DIA-NN13). The base functionality is available for all input formats compatible with the respective search engine, which includes Thermo Fisher Scientific Orbitrap as well as Bruker timsTOF data.

Figure 1 |. Schematic of the DO-MS pipeline version 2.0.

Figure 1 |

A schematic of the processing and intermediate steps of the updated DO-MS pipeline. Input files (blue) in the raw format are searched by a search engine (the default one is DIA-NN13) and converted to mzML using a custom version of the ThermoRawFileParser41. The search report from DIA-NN and the mzML are then used by the post-processing step to analyze and display data about MS1 and MS2 accumulation times, total ion current (TIC) information, precursor-wise signal-to-noise levels and MS1 features.

Further, instrument-specific information is collected in a post-processing step which is only implemented for Thermo Fisher Scientific Orbitrap42 raw files. However, the user has the flexibility to adapt the method to other vendors, given that their files can be converted to the open mzML format43 using tools like msConvert44. The current implementation uses a custom version of the ThermoRawFileParser,41 which reports additional instrument-specific information like the noise level. It is implemented in Python45 and can be called from the command line, which allows the search engine to automatically call post-processing after it has finished the search. General metrics like the TIC and the MS1 and MS2 accumulation times are extracted and reported in individual files. Precursor-specific metrics, such as the signal-to-noise level (S/N), are reported based on the search engine results. Peptide-like features are identified using the Dinosaur feature finder46. This step is independent of the amino acid sequence identification of a precursor and only based on the shape of its elution profile and isotopic envelope distribution. The metrics are then visualized in an interactive R shiny47,48 app, which allows generating portable HTML reports. All metrics shown in this article are accessible with DO-MS and all figures resemble figures generated with DO-MS unless explicitly noted otherwise. An overview of all metrics available in DO-MS can be found in the supplement (supp. table 2).

Systematic Optimization of Precursor Isolation Window Placement

In DIA experiments fragmentation spectra are highly complex due to parallel fragmentation of multiple precursors. To reduce complexity, the range of precursor masses is distributed across multiple MS2 windows, which need to be designed by the experimenter. While increasing the number of MS2 windows results in less complex spectra, it comes at the expense of an increased duty cycle length. The more MS2 scans are incorporated, the fewer data points are collected across each and every elution peak, impeding identification and optimal quantification. This trade-off needs to be optimized in a context-specific manner, depending on the sample complexity, abundance, choice of chromatography and gradient length.

DO-MS helps optimize this trade-off by systematically assessing the impact of different parameters with respect to multiple performance metrics at the same time. This is exemplified for a plexDIA experiment consisting of a 3-plex bulk lysate diluted down to the single-cell level, Fig. 2. The fastest duty cycle with a single MS1 and two MS2 scans has a duration of approximately 0.9 seconds, which allows for frequent sampling of the elution profile. This results in a higher chance to sample the elution apex and is reflected in the increased MS1 peak height compared to methods with more MS2 windows, Fig. 2A,B. An acquisition method with 16 MS2 scans samples precursors only every 5.1 seconds, and thus may fail to sample the elution peak apex (supp. table 1). This becomes evident when the intensity of the same peptide is compared across runs. The median ratio between shared peptides is more than two-fold lower for a method with more than 12 MS2 windows compared to 2 MS2 windows, Fig. 2B. In contrast, optimal sampling of the elution apex requires more frequent sampling, which comes at the cost of fewer MS2 isolation windows. Indeed, sampling the most intense precursor signal is achieved in our experiment when using only 2 isolation windows. At the same time, such an acquisition method distributes fragment ions across only two isolation windows, resulting in high co-isolation and reduced proteome coverage. DO-MS allows systematic and comprehensive exploration of this inherent trade-off between proteome coverage and sampling elution peak apexes.
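The sampling side of this trade-off can be made concrete with a little arithmetic. The following is a minimal sketch, not part of DO-MS: it divides an elution peak width by the duty cycle duration to estimate how often a precursor is sampled. The cycle times are the ones quoted above; the 15-second peak width and the function name `points_per_peak` are assumptions chosen only for illustration.

```python
def points_per_peak(peak_width_s: float, cycle_time_s: float) -> float:
    """Average number of times a precursor is sampled across one elution peak."""
    if cycle_time_s <= 0:
        raise ValueError("cycle_time_s must be positive")
    return peak_width_s / cycle_time_s


# Duty cycle durations quoted in the text for 2 and 16 MS2 windows (seconds).
cycle_times = {2: 0.9, 16: 5.1}

# ASSUMPTION: a 15 s elution peak width at base, used only for illustration.
peak_width_s = 15.0

samples = {n: points_per_peak(peak_width_s, t) for n, t in cycle_times.items()}
for n_windows, n_samples in samples.items():
    print(f"{n_windows} MS2 windows: ~{n_samples:.1f} samples per peak")
```

Under this assumed peak width, the 16-window method samples each peak fewer than three times, which illustrates why the apex is easily missed with long duty cycles.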

Figure 2 |. Optimizing the number of MS2 windows in the duty cycle of plexDIA methods.

Figure 2 |

Example DO-MS output for a plexDIA experiment using 3-plex bulk lysate diluted down to the single-cell level with different numbers of MS2 windows. All intensities were extracted as peak heights. A Histogram of precursor (MS1) intensities for each plexDIA channel shown separately. B Distributions of ratios between precursor intensities for precursors identified across all conditions. All ratios are displayed on a log2 scale relative to the first condition. C The total number of identified precursors per run is shown. Numbers are shown for precursors with MS1 (yellow) and MS2 (red) level quantification. D The number of protein identifications in a plexDIA set is shown for each non-isobarically labeled sample (channel). Proteins shared across all three sets and the entirety of all proteins across sets are shown in grey. Identifications which were propagated within the set are highlighted with lighter colours.

For the chosen chromatography and specimen, the DO-MS report indicates that the largest number of precursors is identified with an acquisition method of 6, 8 or 10 MS2 windows, Fig. 2C. Across all three channels about 10,000 precursors are identified on the MS2 level and quantified on the MS1 level. As we required MS2 information for sequence identification, our identifications did not benefit from the higher temporal resolution of MS1 scans, and thus these identifications cannot exceed the number of MS2 identifications. The results indicate that overall performance balancing quantification and coverage depth is best when using 4 or 6 MS2 scans, Fig. 2. This trade-off may be mitigated by using multiple MS1 scans per duty cycle26,27, and such methods can be optimized with DO-MS using the metrics displayed in Fig. 2.

Data Driven Optimization of Window Placement

DO-MS also allows for refinement of the precursor isolation window placement, Fig. 3. The MS2 windows can be selected to utilize equal m/z ranges49 or to optimize the distribution of ions across MS2 windows and thereby increase the proteome coverage18,50. Recently, even dynamic on-line optimization has been proposed51. The metrics provided by DO-MS allow users to implement previously suggested strategies or develop new ones and to continuously monitor the performance, including metrics which are often not easily accessible.

Figure 3 |. Optimizing MS2 window placement.

Figure 3 |

A 3-plex experiment of 100 cell equivalent bulk lysate was analysed with 8 MS2 windows whose ranges were chosen to achieve equal distribution of (i) m/z range, (ii) ion current per window or (iii) number of precursors. A The total number of precursors identified on the MS2 level and quantified on the MS1 level is shown for the three different strategies. B The average MS2 accumulation time is shown for every MS2 window across the retention time.

As the distribution of peptide masses is not uniform across the m/z range, equal-sized isolation windows will result in more precursors per window in the lower m/z range. Thus, placement of isolation windows across an equal m/z range is likely suboptimal, as manifested by the lower proteome coverage shown in Fig. 3A. One of the reasons for this is the associated suboptimal MS2 accumulation time, which is limited by the capacity of the ion trap. When analysing a 3-plex experiment of 100 cell equivalent bulk lysate, the lowest m/z windows will fill up in a few milliseconds, while windows with higher m/z will accumulate ions for the maximum accumulation time of 251 ms, Fig. 3B. This leads to complex fragmentation spectra, loss in sensitivity in lower mass ranges and unused ion capacity in higher m/z ranges. The effect of accumulation times on the sensitivity is likewise reflected in the lower coverage of the proteome at the MS1 level than at the MS2 level. The wider isolation windows at the MS1 level lead to shorter accumulation before the maximum ion trap capacity is reached. This limits sensitivity and leads to fewer quantified peptides at the MS1 than at the MS2 level (see also the supplementary full DO-MS report).

Windows placed based on an equal total ion current (TIC) per window, determined in a previous experiment, or based on the precursor m/z can lead to improved proteome coverage. The metrics available in DO-MS, such as accumulation times, data completeness and number of identifications as a function of FDR, allow evaluating different choices of window placement, detecting bottlenecks and improving them.
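One way such window boundaries could be derived from a previous run is sketched below. This is an illustrative approach under stated assumptions, not the procedure implemented in DO-MS: equal-count windows are taken from quantiles of the precursor m/z values, and equal-TIC windows from the cumulative intensity along the m/z axis. The function names and the synthetic precursor list are hypothetical.

```python
import numpy as np


def equal_count_edges(mz, n_windows, mz_min, mz_max):
    """Window edges holding roughly equal numbers of precursors."""
    mz = np.asarray(mz, dtype=float)
    mz = mz[(mz >= mz_min) & (mz <= mz_max)]
    # Inner edges are the k/n quantiles of the precursor m/z distribution.
    inner = np.quantile(mz, np.linspace(0.0, 1.0, n_windows + 1)[1:-1])
    return np.concatenate(([mz_min], inner, [mz_max]))


def equal_tic_edges(mz, intensity, n_windows, mz_min, mz_max):
    """Window edges holding roughly equal summed intensity (TIC)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = (mz >= mz_min) & (mz <= mz_max)
    order = np.argsort(mz[keep])
    mz_sorted = mz[keep][order]
    cum = np.cumsum(intensity[keep][order])
    # Find the m/z values at which the cumulative TIC reaches k/n of the total.
    targets = cum[-1] * np.linspace(0.0, 1.0, n_windows + 1)[1:-1]
    inner = np.interp(targets, cum, mz_sorted)
    return np.concatenate(([mz_min], inner, [mz_max]))


# ASSUMPTION: synthetic precursors skewed toward low m/z, for illustration only.
rng = np.random.default_rng(0)
mz = 380.0 + rng.gamma(shape=3.0, scale=90.0, size=5000)
intensity = rng.lognormal(mean=0.0, sigma=0.5, size=5000)

count_edges = equal_count_edges(mz, 8, 380.0, 1400.0)
tic_edges = equal_tic_edges(mz, intensity, 8, 380.0, 1400.0)
print(np.round(count_edges, 1))
print(np.round(tic_edges, 1))
```

With a distribution skewed toward low m/z, both strategies yield narrow windows at low m/z and wide windows at high m/z, qualitatively matching the window widths listed in the methods.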

Optimizing Chromatographic Profile and Length

To reduce the complexity of peptide sample mixtures, dimensions of separation including liquid chromatography or gas phase fractionation like trapped ion mobility spectrometry are used. Separation by liquid chromatography has been the default separation method for MS proteomics. The improved separation with longer gradients comes at the cost of increased measurement time. DO-MS allows balancing this trade-off and performing routine quality control on peptide separation.

Longer LC gradients improve proteome coverage in DIA in two different ways. First, longer gradients lead to better separation of different peptide species, reducing coelution of interfering species and improving spectral quality. Second, they elongate elution profiles, so that precursors are sampled for a longer duration. This allows sampling each ion species less frequently and gives room for more specific isolation, improving spectral quality. Thereby, while identifying fewer peptides per unit time, longer gradients facilitate identifying more peptides per sample. The general trend is shown by the DO-MS output for a 3-plex 100-cell equivalent bulk dilution analyzed with 15, 30 and 60 minutes of active gradient using the same duty cycle, Fig. 4. One benefit of the longer gradients can be seen when the ion accumulation time of the Orbitrap instrument is plotted as a function of the retention time, Fig. 4A. Longer gradients distribute the analytes and lead to longer accumulation of ions before the maximum capacity is reached. Individual spectra therefore contain fewer ion species and sample sufficient ions even from low-abundance peptides. This improves not only the absolute numbers of identifications but also the fraction of precursors quantified at the MS1 level, Fig. 4B.

Figure 4 |. Optimizing gradient profile and length.

Figure 4 |

DO-MS allows optimizing the LC gradient of experiments based on metrics capturing the whole LC-MS workflow. A The distribution of MS1 accumulation times across the LC gradient. B Number of quantified precursors in relation to the gradient length. C Number of identified precursors by the search engine across the gradients and D ion features identified by Dinosaur. E Ion map displaying the TIC and mean m/z (red curve) as a function of the retention time. All data are from 100× 3-plexDIA samples as described in the methods.

DO-MS also allows optimizing the slope and profile of the gradient to evenly distribute ions across a gradient while keeping its duration constant. Depending on the sample, peptides might not elute evenly across the gradient. This information becomes accessible in three different ways. DO-MS reports the accumulation time of the ion trap (Fig. 4A), peptide identifications across the gradient (Fig. 4C), and peptide-like features or potential contaminants assembled by Dinosaur across the gradient (Fig. 4D).

Having access to gradient-specific parameters facilitates effective quality control and problem identification. Identified MS1 features provide useful information for ion clusters not assigned to a peptide sequence, including singly charged species and peptide-like ions not mapped to a sequence, Fig. 4D. This can be useful to identify contaminants31 and estimate the ions accessible to MS analysis that may be interpreted by improved algorithms8,52. The binned TIC output allows identifying errors in the method setup and gives a quick overview of the sampled mass range, Fig. 4E.

Improving Sampling Using Additional Survey Scans

The conflict between reducing spectral complexity and increasing the number of data points per peak mentioned in Fig. 2 can be partially alleviated by increasing the number of survey scans27. When duty cycles are long, more frequent sampling on the MS1 level can increase the fraction of precursors with MS1 information and the probability of sampling close to the elution apex19,26. The DO-MS framework can be used to assess the contribution of such additional MS1 scans to improved precursor sampling.

The effect can be exemplified based on a 3-plexDIA set whose samples correspond to 100 cells per channel, analyzed with 60 minutes of active gradient. A method with a single survey scan is compared to a method with two survey scans evenly distributed between the eight MS2 scans, Fig. 5A. The additional survey scan increases the duty cycle length only marginally, while increasing the frequency of precursor sampling almost 2-fold. Thus, the adapted method increases the probability that precursors are sampled close to their elution apex and that peptides with a shorter elution profile and potentially lower intensity, which would otherwise be missed, can be quantified on the MS1 level. These expectations are supported by the results shown in Fig. 5B–D.
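The "marginal increase" and "almost 2-fold" statements follow from simple counting, sketched here under a simplifying assumption that is not taken from the source: every MS1 and MS2 scan is treated as taking the same amount of time, and survey scans are evenly spaced within the cycle. The function name `ms1_sampling` is hypothetical.

```python
def ms1_sampling(n_ms2: int, n_ms1: int, scan_time_s: float = 1.0):
    """Return (cycle_time, ms1_interval), assuming all scans take scan_time_s
    and the MS1 scans are evenly spaced within the duty cycle."""
    cycle_time = (n_ms1 + n_ms2) * scan_time_s
    ms1_interval = cycle_time / n_ms1
    return cycle_time, ms1_interval


# Eight MS2 scans with one or two survey scans, as in the compared methods.
single_cycle, single_interval = ms1_sampling(n_ms2=8, n_ms1=1)
double_cycle, double_interval = ms1_sampling(n_ms2=8, n_ms1=2)

cycle_increase = double_cycle / single_cycle        # 10 scans vs 9 scans
frequency_gain = single_interval / double_interval  # MS1 every 5 vs every 9 scans

print(f"duty cycle grows by a factor of {cycle_increase:.2f}")
print(f"MS1 sampling frequency grows by a factor of {frequency_gain:.2f}")
```

Under these assumptions the duty cycle lengthens by about 11% while the MS1 sampling frequency rises 1.8-fold. Real scan durations depend on accumulation times, so the exact factors on an instrument will differ.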

Figure 5 |. Effect of additional survey scans per duty cycle.

Figure 5 |

Data acquisition methods can employ multiple survey scans to improve precursor sampling and reduce stochastic sampling effects. A Diagrams of a duty cycle with a single survey scan (orange) and a duty cycle with two survey scans (blue). B All peptide-like features identified by Dinosaur46 are displayed with their elution length at base and MS1 intensity. The associated marginal distributions are shown. The additional survey scan allows detecting many additional peptide-like features with shorter elution profiles. C The MS1 intensity of intersected precursors is increased upon introduction of an additional survey scan. D The fraction of MS1 quantified precursors is increased with additional survey scans while maintaining the total number of identifications, independent of the slightly increased duty cycle time. The data show a 100-cell equivalent 3-plex dataset acquired on a 60 min active gradient as described in the methods. Panel B was plotted outside of DO-MS using the peptide-like feature information as stated in the methods.

More survey scans lead to almost doubling the number of identified peptide-like features, with the increase being particularly pronounced for features with short elution length, Fig. 5B. The improvements also result in higher MS1 intensity estimates by the search engine for intersected precursors, since more precursors are sampled close to their apexes. Furthermore, a larger fraction of precursors is quantified at the MS1 level, Fig. 5C,D. These improvements are observed without associated negative effects due to the longer overall duty cycle. These results indicate that the duty cycle with 2 MS1 survey scans outperforms the one with a single MS1 survey scan.

Quality Control for Routine Sample Acquisition

When acquiring large datasets, it is important to continuously monitor the performance of the acquisition method and identify potential failed experiments37. This monitoring for plexDIA experiments should include metrics for each labeled sample i.e., channel level metrics.

DO-MS provides a convenient way to perform such quality control, exemplified by the single-cell plexDIA set by Derks et al.26 shown in Fig. 6. Using nPOP sample preparation53, 10 sets with 3 single cells each were prepared and measured on a timsTOF instrument, resulting in about 1,000 quantified proteins per single cell on average, Fig. 6A. As plexDIA can benefit from translating precursor identifications between channels26,27, the impact of translation on identifications and data completeness is reported by DO-MS. With single cells it is vital to identify potential dropouts where sample preparation might have failed and exclude them from processing. One useful metric for this is the precursor intensity distribution for every single cell, which is displayed by DO-MS, Fig. 6B. Another metric to assess the single-cell proteome quality is the quantification variability between peptides originating from the same protein, which has been proposed as a metric for single-cell proteome quality54, Fig. 6C. In this dataset, the cells in channel Δ0, set 06 and Δ8, set 10 both show a lower number of proteins before translation and a higher quantification variability, and should potentially be excluded from further analysis.
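The quantification variability metric can be illustrated with a short sketch. This is not the DO-MS implementation: the table layout, column names and toy intensities below are assumptions, and any normalization applied to peptide intensities before computing the coefficient of variation (CV) is omitted.

```python
import pandas as pd

# ASSUMPTION: hypothetical long-format table with one row per peptide and channel.
# Column names and values are illustrative, not DO-MS or DIA-NN fields.
df = pd.DataFrame({
    "channel": ["d0"] * 5 + ["d8"] * 5,
    "protein": ["P1", "P1", "P1", "P2", "P2"] * 2,
    "intensity": [100.0, 110.0, 90.0, 50.0, 55.0,
                  100.0, 300.0, 20.0, 50.0, 150.0],
})


def protein_cv(table: pd.DataFrame) -> pd.Series:
    """CV (sample std / mean) across peptides of each protein, per channel."""
    stats = table.groupby(["channel", "protein"])["intensity"].agg(
        ["mean", "std", "count"]
    )
    # A CV is only defined for proteins with at least two peptides.
    stats = stats[stats["count"] >= 2]
    return (stats["std"] / stats["mean"]).rename("cv")


cv = protein_cv(df)
median_cv = cv.groupby(level="channel").median()
print(median_cv)
```

In this toy table the `d8` channel has a much higher median CV than `d0`, which is the kind of pattern that would flag a single cell for closer inspection.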

Figure 6 |. Routine quality control.

Figure 6 |

When acquiring data from a large number of single cells, DO-MS can be used to get a quick overview of the quality of the processing results. A Number of protein identifications per single cell before and after translating identifications between channels. Only identifications quantified on the MS1 level are shown. B Channel wise intensity distribution of identified precursors. C Quantification variability calculated as the coefficient of variation between peptides of the same protein. The report was generated from the data published by Derks et al.26 for 10 single-cell 3-plex sets analyzed on a timsTOF instrument.

Conclusion

The DO-MS framework provides a systematic approach to benchmarking, optimizing, and reporting results from label-free and multiplexed DIA-MS. We exemplified how key method parameters such as the number of precursor scans or isolation window placement can be benchmarked and optimized. DO-MS aims to foster understanding from first principles, considering fundamental trade-offs such as spectral complexity and sampling frequency. By adopting this approach, it becomes possible to design methods tailored to specific application needs, such as emphasizing data completeness, quantitative accuracy, or proteome depth. DO-MS should enable broader adoption of cutting-edge DIA and plexDIA methods for driving biological research55.

Methods

Data Acquisition

Apart from the 30 single cells acquired on the timsTOF as part of plexDIA, all samples consist of bulk cellular lysates diluted down to the respective number of single-cell equivalents, assuming 250 pg of protein per cell. Melanoma cells (WM989-A6-G3, a kind gift from Arjun Raj, University of Pennsylvania), U-937 cells (monocytes), and HPAF-II cells (PDACs, American Type Culture Collection (ATCC), CRL-1997) were cultured as previously described by Derks et al.26 (methods, cell culture). Cells were harvested, processed, and labeled with mTRAQ as described by Derks et al.26 (methods, Preparation of bulk plexDIA samples).

All bulk data were acquired on the Thermo Fisher Scientific Q-Exactive Classic Orbitrap mass spectrometer. Samples of 1 μl were injected with the Dionex UltiMate 3000 UHPLC using a 25 cm × 75 μm IonOpticks Aurora Series UHPLC column (AUR2–25075C18A). Two buffers, A and B, were used, with buffer A made of 0.1% formic acid (Pierce, 85178) in LC–MS-grade water and buffer B made of 80% acetonitrile and 0.1% formic acid mixed with LC–MS-grade water.

Systematic optimization of precursor isolation windows

A combined sample consisting of one single-cell equivalent PDAC lysate labeled with mTRAQd0, one single-cell equivalent U937 lysate labeled with mTRAQd4 and one single-cell equivalent Melanoma lysate labeled with mTRAQd8 was injected in a volume of 1 μl. Liquid chromatography was performed at 200 nl/min for 30 minutes of active gradient starting with 4% Buffer B (minutes 0–2.5), 4–8% B (minutes 2.5–3), 8–32% B (minutes 3–33), 32–95% B (minutes 33–34), 95% B (minutes 34–35), 95–4% B (minutes 35–35.1), 4% B (minutes 35.1–53). All acquisition methods had a single MS1 scan covering the range of 380–1400 m/z followed by DIA MS2 scans: 2xMS2 starting at 380 m/z: 240Th, 780Th width; 4xMS2 starting at 380 m/z: 120Th, 120Th, 200Th, 580Th width; 6xMS2 starting at 380 m/z: 80Th, 80Th, 80Th, 120Th, 240Th, 420Th width; 8xMS2 starting at 380 m/z: 60Th, 60Th, 60Th, 60Th, 100Th, 100Th, 290Th, 290Th width; 10xMS2 starting at 380 m/z: 50Th, 50Th, 50Th, 50Th, 50Th, 75Th, 75Th, 150Th, 150Th, 320Th width; 12xMS2 starting at 380 m/z: 40Th, 40Th, 40Th, 40Th, 40Th, 40Th, 60Th, 60Th, 120Th, 120Th, 210Th, 210Th width; 16xMS2 starting at 380 m/z: 30Th, 30Th, 30Th, 30Th, 30Th, 30Th, 30Th, 30Th, 50Th, 50Th, 50Th, 50Th, 145Th, 145Th, 145Th, 145Th width. All MS1 and MS2 scans were performed with 70,000 resolving power, 3×10^6 AGC maximum, 300-ms maximum accumulation time, NCE at 27%, default charge of 2, and RF S-lens at 80%.
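The window schemes above are specified as a start m/z plus a list of widths. A small sketch such as the following, which is only an illustration and not part of DO-MS, converts these widths into explicit window boundaries and checks that every scheme ends at the 1400 m/z upper limit of the MS1 scan. The helper name `window_edges` is hypothetical; the widths are copied from the text.

```python
from itertools import accumulate


def window_edges(start_mz, widths):
    """Cumulative window boundaries for consecutive isolation windows."""
    return list(accumulate(widths, initial=start_mz))


# MS2 window widths in Th for each scheme, all starting at 380 m/z.
schemes = {
    2: [240, 780],
    4: [120, 120, 200, 580],
    6: [80, 80, 80, 120, 240, 420],
    8: [60, 60, 60, 60, 100, 100, 290, 290],
    10: [50] * 5 + [75, 75, 150, 150, 320],
    12: [40] * 6 + [60, 60, 120, 120, 210, 210],
    16: [30] * 8 + [50] * 4 + [145] * 4,
}

edges = {n: window_edges(380, widths) for n, widths in schemes.items()}
for n, e in edges.items():
    windows = list(zip(e[:-1], e[1:]))
    print(f"{n}xMS2: {len(windows)} windows, {e[0]}-{e[-1]} m/z")
```

Every scheme has the stated number of windows and covers 380 to 1400 m/z, so the methods differ only in how finely that range is divided.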

Data Driven optimization of window placement

A combined sample consisting of 100 single-cell equivalents each of PDAC, U937, and Melanoma cells was labeled with mTRAQd0, mTRAQd4 and mTRAQd8 respectively. Liquid chromatography was performed at 200 nl/min for 30 minutes of active gradient starting with 4% Buffer B (minutes 0–2.5), 4–8% B (minutes 2.5–3), 8–32% B (minutes 3–33), 32–95% B (minutes 33–34), 95% B (minutes 34–35), 95–4% B (minutes 35–35.1), 4% B (minutes 35.1–53). Both MS1 and MS2 scans covered a range of 380 m/z to 1400 m/z with a single MS1 scan and 8 MS2 scans. The distribution of precursors was determined based on the DO-MS report using equal-sized windows, starting at 380 m/z: 127.5Th, 127.5Th, 127.5Th, 127.5Th, 127.5Th, 127.5Th, 127.5Th, 127.5Th width. MS2 windows were then distributed to have equal TIC based on the DO-MS output, starting at 380 m/z: 100Th, 64Th, 61Th, 66Th, 91Th, 100Th, 153Th, 385Th. For the equal number of precursors, the original sample was searched with DIA-NN as described and MS2 windows were distributed to have an equal number of precursors, starting at 380 m/z: 84Th, 63Th, 49Th, 66Th, 59Th, 101Th, 176Th, 422Th. All MS1 and MS2 scans were performed with 70,000 resolving power, 3×10^6 AGC maximum, 251-ms maximum accumulation time, NCE at 27%, default charge of 2, and RF S-lens at 80%.

Optimizing gradient profile and length

A combined sample consisting of 100 single-cell equivalents each of PDAC, Melanoma and U937 was labeled with mTRAQd0, mTRAQd4 and mTRAQd8 respectively. Liquid chromatography was performed at a 200 nl/min flow rate starting with 4% Buffer B (minutes 0–2.5) followed by 4–8% B (minutes 2.5–3). The active gradient from 8% buffer B to 32% buffer B stretched across 15, 30 or 60 minutes, followed by a 1 minute 32–95% B ramp, 1 minute at 95% and 18 minutes at 4% B. All acquisition methods had a single MS1 scan covering the range of 478–1500 m/z followed by 8 DIA MS2 scans, starting at 380 m/z: 60Th, 60Th, 60Th, 60Th, 100Th, 100Th, 290Th, 290Th. All MS1 and MS2 scans were performed with 70,000 resolving power, 3×10^6 AGC maximum, 300-ms maximum accumulation time, NCE at 27%, default charge of 2, and RF S-lens at 80%.

Effect of additional survey scans

A 100 single-cell equivalent each of PDAC, U937 and Melanoma cells was labeled with mTRAQd0, mTRAQd4 and mTRAQd8 respectively and injected in a volume of 1 μl. Liquid chromatography was performed at 200 nl/min for 60 minutes of active gradient starting with 4% Buffer B (minutes 0–2.5), 4–8% B (minutes 2.5–3), 8–32% B (minutes 3–63), 32–95% B (minutes 63–64), 95% B (minutes 64–65), 95–4% B (minutes 65–65.1), 4% B (minutes 65.1–83). A single MS1 scan with a range of 478–1500 m/z was followed by MS2 scans starting at 380 m/z with 60Th, 60Th, 60Th, 60Th, 100Th, 100Th, 290Th, 290Th width. For the method with increased MS1 sampling, a second MS1 scan was incorporated after the fourth MS2 scan. All MS1 and MS2 scans were performed with 70,000 resolving power, 3×10^6 AGC maximum, 251-ms maximum accumulation time, NCE at 27%, default charge of 2, and RF S-lens at 80%.

Data Analysis

Data were analysed with DIA-NN 1.8.1 using the 5,000 protein group human-only spectral library published previously by Derks et al.26 (methods, Spectral library generation). Data were then processed with DO-MS. For preprocessing of Orbitrap data, DO-MS uses ThermoRawFileParser 1.4.0 to convert the proprietary raw format to the open mzML standard and Dinosaur 1.2.0 for feature detection. All other preprocessing steps are performed in the Python programming language version 3.10 and make use of its extensive ecosystem for scientific programming, including NumPy, pandas, pymzML and scikit-learn. All plots were created in DO-MS, which utilizes the R programming language version 4.3.1. Figure 5B was created using matplotlib.

Data completeness is shown for all pairwise comparisons in a plexDIA set. It is calculated as the Jaccard index between two sets of identifications A and B given by:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
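As a minimal sketch of this calculation (not the DO-MS code itself), the Jaccard index can be computed for every pair of channels in a set. The channel names and precursor identifiers below are made-up placeholders, and returning NaN for two empty sets is a choice made here for illustration.

```python
import math
from itertools import combinations


def jaccard(a, b) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of identifications."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Undefined for two empty sets; NaN is an illustrative choice.
        return math.nan
    return len(a & b) / len(union)


# ASSUMPTION: hypothetical precursor identifications per plexDIA channel.
identified = {
    "d0": {"PEPTIDEA2", "PEPTIDEB2", "PEPTIDEC3"},
    "d4": {"PEPTIDEB2", "PEPTIDEC3", "PEPTIDED2"},
    "d8": {"PEPTIDEA2", "PEPTIDEB2", "PEPTIDEC3"},
}

completeness = {
    (x, y): jaccard(identified[x], identified[y])
    for x, y in combinations(sorted(identified), 2)
}
for pair, value in completeness.items():
    print(pair, round(value, 2))
```

A value of 1 means both channels contain exactly the same identifications, while lower values indicate missing data in at least one of the two channels.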

Acknowledgments

We thank Luke Khoury for support with sample processing and acquisition and Jason Derks for sample preparation. The work was funded by an Allen Distinguished Investigator award through The Paul G. Allen Frontiers Group to N.S., a Seed Networks Award from CZI CZF2019–002424 to N.S., an NIGMS award R01GM144967 to N.S., and an NCI award UG3CA268117 to N.S.

Footnotes

Competing interests

Nikolai Slavov is a founding director and CEO of Parallel Squared Technology Institute, which is a non-profit research institute.

Availability

Further documentation on the use of DO-MS is available at do-ms.slavovlab.net. The current version 2.0 is open source and freely available at github.com/SlavovLab/DO-MS. All data shown as example applications are available at do-ms.slavovlab.net/docs/DO-MS examples. The plexDIA dataset of 30 single cells acquired on the timsTOF has been published as part of plexDIA and is available at http://scp.slavovlab.net/Derks_et_al_2022. All other data acquired for this study have been deposited on MassIVE under the accession MSV000091733.

References

  • 1. MacCoss M. J. et al. Sampling the proteome by emerging single-molecule and mass spectrometry methods. Nat. Methods 20, 339–346. https://www.nature.com/articles/s41592-023-01802-5 (Mar. 2023).
  • 2. Eng J. K., McCormack A. L. & Yates J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry 5, 976–989 (1994).
  • 3. Huffman R. G. et al. Prioritized mass spectrometry increases the depth, sensitivity and data completeness of single-cell proteomics. Nat. Methods. ISSN: 1548–7091, 1548–7105. 10.1038/s41592-023-01830-1 (Apr. 2023).
  • 4. Extending the sensitivity, consistency and depth of single-cell proteomics. Nat. Methods. ISSN: 1548–7091, 1548–7105. 10.1038/s41592-023-01786-2 (Apr. 2023).
  • 5. Venable J. D., Dong M.-Q., Wohlschlegel J., Dillin A. & Yates J. R. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nature Methods 1, 39–45. ISSN: 1548–7105 (Oct. 2004).
  • 6. Dong M.-Q. et al. Quantitative Mass Spectrometry Identifies Insulin Signaling Targets in C. elegans. Science 317, 660–663 (2007).
  • 7. Ludwig C. et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Mol. Syst. Biol. 14, e8126 (Aug. 2018).
  • 8. Slavov N. Driving Single Cell Proteomics Forward with Innovation. Journal of Proteome Research 20, 4915–4918. 10.1021/acs.jproteome.1c00639 (2021).
  • 9. Slavov N. Increasing proteomics throughput. Nature Biotechnology 39, 809–810. 10.1038/s41587-021-00881-z (2021).
  • 10. Tsou C.-C. et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–64, 7 p following 264 (Mar. 2015).
  • 11. Bruderer R. et al. Extending the Limits of Quantitative Proteome Profiling with Data-Independent Acquisition and Application to Acetaminophen-Treated Three-Dimensional Liver Microtissues. Mol. Cell. Proteomics 14, 1400–1410 (May 2015).
  • 12. Egertson J. D., MacLean B., Johnson R., Xuan Y. & MacCoss M. J. Multiplexed peptide analysis using data-independent acquisition and Skyline. Nat. Protoc. 10, 887–903 (June 2015).
  • 13. Demichev V., Messner C. B., Vernardis S. I., Lilley K. S. & Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods 17, 41–44 (2020).
  • 14. Sinitcyn P. et al. MaxDIA enables library-based and library-free data-independent acquisition proteomics. Nature Biotechnology, 1–11 (2021).
  • 15. Cox J. Prediction of peptide mass spectral libraries with machine learning. Nature Biotechnology. ISSN: 1546–1696. 10.1038/s41587-022-01424-w (Aug. 2022).
  • 16. Distler U. et al. midiaPASEF maximizes information content in data-independent acquisition proteomics 2023.
  • 17. Szyrwiel L., Sinn L., Ralser M. & Demichev V. Slice-PASEF: fragmenting all ions for maximum sensitivity in proteomics. bioRxiv. eprint: https://www.biorxiv.org/content/early/2022/10/31/2022.10.31.514544.full.pdf. https://www.biorxiv.org/content/early/2022/10/31/2022.10.31.514544 (2022).
  • 18. Skowronek P. et al. Rapid and In-Depth Coverage of the (Phospho-)Proteome With Deep Libraries and Optimal Window Design for dia-PASEF. Molecular & Cellular Proteomics 21, 100279. ISSN: 1535–9476 (2022).
  • 19. Xuan Y. et al. Standardization and harmonization of distributed multi-center proteotype analysis supporting precision medicine studies. Nature Communications 11, 5248 (2020).
  • 20. Demichev V. et al. dia-PASEF data analysis using FragPipe and DIA-NN for deep proteomics of low sample amounts. Nat. Commun. 13, 3944 (July 2022).
  • 21. Li Y. et al. An integrated strategy for mass spectrometry-based multiomics analysis of single cells. Analytical Chemistry 93, 14059–14067 (2021).
  • 22. Gebreyesus S. T. et al. Streamlined single-cell proteomics by an integrated microfluidic chip and data-independent acquisition mass spectrometry. Nature Communications 13, 37 (2022).
  • 23. Brunner A.-D. et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. Mol. Syst. Biol. 18, e10798 (2022).
  • 24. Phlairaharn T. et al. High Sensitivity Limited Material Proteomics Empowered by Data-Independent Acquisition on Linear Ion Traps. J. Proteome Res. 21, 2815–2826 (Nov. 2022).
  • 25. Slavov N. Learning from natural variation across the proteomes of single cells. PLOS Biology 20, 1–4. 10.1371/journal.pbio.3001512 (Jan. 2022).
  • 26. Derks J. et al. Increasing the throughput of sensitive proteomics by plexDIA. Nature Biotechnology. 10.1038/s41587-022-01389-w (2022).
  • 27. Derks J. & Slavov N. Strategies for increasing the depth and throughput of protein analysis by plexDIA. Journal of Proteome Research 22, 697–705. 10.1021/acs.jproteome.2c00721 (2023).
  • 28. Singh A. Sensitive protein analysis with plexDIA. Nat. Methods 19, 1032 (Sept. 2022).
  • 29. Framework for multiplicative scaling of single-cell proteomics. Nat. Biotechnol., 1–2. https://www.nature.com/articles/s41587-022-01411-1 (July 2022).
  • 30. Ludwig C. et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Molecular Systems Biology 14, e8126 (2018).
  • 31. Huffman G., Chen A. T., Specht H. & Slavov N. DO-MS: Data-Driven Optimization of Mass Spectrometry Methods. J. of Proteome Res. 18, 2493–2500. 10.1021/acs.jproteome.9b00039 (June 2019).
  • 32. Bittremieux W., Valkenborg D., Martens L. & Laukens K. Computational quality control tools for mass spectrometry proteomics. PROTEOMICS 17, 1600159 (2017).
  • 33. Trachsel C. et al. rawDiag: An R Package Supporting Rational LC–MS Method Optimization for Bottom-up Proteomics. Journal of Proteome Research. ISSN: 1535–3893. 10.1021/acs.jproteome.8b00173 (July 2018).
  • 34. Bielow C., Mastrobuoni G. & Kempa S. Proteomics Quality Control: Quality Control Software for MaxQuant Results. Journal of Proteome Research 15, 777–787. ISSN: 1535–3893 (Mar. 2016).
  • 35. Soneson C., Iesmantavicius V., Hess D., Stadler M. B. & Seebacher J. einprot: flexible, easy-to-use, reproducible workflows for statistical analysis of quantitative proteomics data. bioRxiv (2023).
  • 36. Slavov N. Single-cell protein analysis by mass spectrometry. Current Opinion in Chemical Biology 60, 1–9. ISSN: 1367–5931. 10.1016/j.cbpa.2020.04.018 (2020).
  • 37. Gatto L. et al. Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments. Nat. Methods 20, 375–386. 10.1038/s41592-023-01785-3 (2023).
  • 38. Specht H. & Slavov N. Optimizing Accuracy and Depth of Protein Quantification in Experiments Using Isobaric Carriers. Journal of Proteome Research 20, 880–887. 10.1021/acs.jproteome.0c00675 (2021).
  • 39. Petelski A. A. et al. Multiplexed single-cell proteomics using SCoPE2. Nature Protocols 16, 5398–5425. 10.1038/s41596-021-00616-z (2021).
  • 40. Slavov N. Scaling Up Single-Cell Proteomics. Molecular & Cellular Proteomics 21, 100179. ISSN: 1535–9476. 10.1016/j.mcpro.2021.100179 (2022).
  • 41. Hulstaert N. et al. ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. J Proteome Res 19, 537–542 (2020).
  • 42. Zubarev R. A. & Makarov A. Orbitrap mass spectrometry. Anal Chem 85, 5288–96. ISSN: 1520–6882 (Electronic) 0003–2700 (Linking) (2013).
  • 43. Martens L. et al. mzML a community standard for mass spectrometry data. Mol Cell Proteomics 10, R110 000133 (2011).
  • 44. Adusumilli R. & Mallick P. in Proteomics: Methods and Protocols (eds Comai L., Katz J. E. & Mallick P.) 339–368 (Springer; New York, New York, NY, 2017). ISBN: 978–1–4939–6747–6.
  • 45. Rossum G. v. Python tutorial. Technical Report CS-R9526, Centrum voor Wiskunde en Informatica (CWI), Amsterdam (1995).
  • 46. Teleman J., Chawade A., Sandin M., Levander F. & Malmström J. Dinosaur: A Refined Open-Source Peptide MS Feature Detector. Journal of Proteome Research 15, 2143–2151 (2016).
  • 47. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (Vienna, Austria, 2022). https://www.R-project.org/.
  • 48. Chang W. et al. shiny: Web Application Framework for R. R package version 1.7.2.9000 (2022). https://shiny.rstudio.com/.
  • 49. Gillet L. C. et al. Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis. Molecular & Cellular Proteomics 11, O111.016717. ISSN: 1535–9476 (2012).
  • 50. Kawashima Y. et al. Optimization of Data-Independent Acquisition Mass Spectrometry for Deep and Highly Sensitive Proteomic Analysis. International Journal of Molecular Sciences 20. ISSN: 1422–0067. https://www.mdpi.com/1422-0067/20/23/5932 (2019).
  • 51. Heil L. R. et al. Dynamic Data Independent Acquisition Mass Spectrometry with Real-Time Retrospective Alignment. bioRxiv (2022).
  • 52. Chen A. T., Franks A. & Slavov N. DART-ID increases single-cell proteome coverage. PLOS Computational Biology 15, 1–30. 10.1371/journal.pcbi.1007082 (July 2019).
  • 53. Leduc A., Huffman R. G., Cantlon J., Khan S. & Slavov N. Exploring functional protein covariation across single cells using nPOP. Genome Biology 23, 261. 10.1186/s13059-022-02817-5 (2022).
  • 54. Specht H. et al. Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2. Genome Biology 22 (2021).
  • 55. Slavov N. Single-cell proteomics: quantifying post-transcriptional regulation during development with mass-spectrometry. Development 150, dev201492. ISSN: 0950–1991. 10.1242/dev.201492 (June 2023).

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
