pyQms enables universal and accurate quantification of mass spectrometry data

Johannes Leufken; Anna Niehues; L Peter Sarin; Florian Wessel; Michael Hippler; Sebastian A Leidel; Christian Fufezan

doi:10.1074/mcp.M117.068007

. 2017 Jul 20;16(10):1736–1745. doi: 10.1074/mcp.M117.068007

pyQms enables universal and accurate quantification of mass spectrometry data^*

Johannes Leufken ^‡,^§, Anna Niehues ^‡, L Peter Sarin ^§, Florian Wessel ^‡,^¶, Michael Hippler ^‡, Sebastian A Leidel ^§,^‖,^**,^§§, Christian Fufezan ^‡,^‡‡,^§§

PMCID: PMC5629261 PMID: 28729385

Abstract

Quantitative mass spectrometry (MS) is a key technique in many research areas (1), including proteomics, metabolomics, glycomics, and lipidomics. Because all of the corresponding molecules can be described by chemical formulas, universal quantification tools are highly desirable. Here, we present pyQms, an open-source software for accurate quantification of all types of molecules measurable by MS. pyQms uses isotope pattern matching that offers an accurate quality assessment of all quantifications and the ability to directly incorporate mass spectrometer accuracy. pyQms is, due to its universal design, applicable to every research field, labeling strategy, and acquisition technique. This opens ultimate flexibility for researchers to design experiments employing innovative and hitherto unexplored labeling strategies. Importantly, pyQms performs very well to accurately quantify partially labeled proteomes in large scale and high throughput, the most challenging task for a quantification algorithm.

Current mass spectrometric workflows use a plethora of labeling strategies (Fig. 1). Established examples are label-free quantification (Fig. 1A) and metabolic labeling with partially or fully enriched isotopes (Figs. 1B and 1C). Furthermore, labeled molecules can be added to the culture, as in stable isotope labeling with amino acids in cell culture (2) or can be introduced in vitro by chemical tagging, e.g. TMT10 (3, 4) (Fig. 1D). For certain research areas and labeling strategies, powerful data analysis tools are well established (5–14). However, existing software solutions are generally not universal as they have been tailored to specific research fields and are often restricted to defined experimental protocols (see Supplemental Table S1 for a summary). For example, some software can quantify peptides with artificial isotope distributions, but not metabolites, or can quantify molecules when they are metabolically labeled with nitrogen isotope ¹⁵N, but not if labeled with carbon isotope ¹³C. Finally, some labeling strategies can currently not be combined within the same experiment (Figs. 1E and 1F).

These limitations, however, are artificial since all quantified entities are molecules defined by chemical formulas with known isotope distributions and masses. pyQms takes advantage of this knowledge and treats all molecules as formulas to calculate accurate isotope patterns that are based on the labeling strategy. This liberates the algorithm to perform analyses irrespective of the type of molecule (protein, metabolite, lipid, glycan, etc.), the type of label (metabolic or fixed) or MS level. Accordingly, there is no restriction in quantifying any combination of labels within the same experiment, paving the way for innovative experimental designs that would be precluded with most quantification tools. pyQms has been evaluated for accuracy and sensitivity in label-free proteomics (Supplemental Figs. 1 A–1C) and for pulse (chase) metabolic labeling data analysis using a novel partially labeled proteome gold standard data set (Supplemental Figs. 1 D–1J).

EXPERIMENTAL PROCEDURES

Partially Labeled Proteome Gold Standard Data Set Sample Preparation

Chlamydomonas reinhardtii (strain CW15) cells were grown in photoheterotrophic conditions in tris-acetate-phosphate (TAP) medium (15) at a light intensity of 50 μE m⁻² s⁻¹ at 22 °C on a rotary shaker at 120 rpm or on TAP-agar plates containing 1.5% agar at a light intensity of 40–50 μE m⁻² s⁻¹. Metabolic labeling was performed by mixing unlabeled TAP medium with fully labeled TAP medium containing 100% ¹⁵N at different proportions (0, 20, 40, 60, 80, and 100% (w/w)). Fully labeled medium was created using 99.4% ¹⁵N enriched ¹⁵NH₄Cl (Cambridge Isotope Laboratories, Tewksbury, MA). Cells were grown several generations on ¹⁵N containing agar plates to ensure complete metabolic labeling with the defined ¹⁵N proportion. Cells were then grown for 3 days in liquid medium, maintaining the labeling proportion and harvested at 5000 × g (Beckmann Coulter J 20 XP), suspended in H6 buffer (5 mm HEPES, pH 7.5, 10 mm EDTA), and stored at −80 °C. Protein samples were digested with trypsin using a modified filter-aided sample preparation protocol (16) as described in Barth et al. (17) with the following modifications: Samples were mixed based on equal chlorophyll content (6.25 μg) and washing steps were repeated four times.

LC-MS/MS Measurement

Liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) measurements were done employing a Dionex Ultimate 3000 UPLC system (Thermo Scientific) and Q Exactive Plus (Thermo Scientific, Bremen, Germany) instrument. Software versions, which were used are: Exactive Series (Tune) 2.3 Build 1765 and Xcalibur 3.0.63. Peptides were separated by reversed phase chromatography. Peptide samples were loaded on a trap column (Acclaim PepMap100, 300 μm × 5 mm, 5 μm particle size, 100 Å pore size; Thermo Scientific, Bremen, Germany). Samples were desalted utilizing a flow rate of 5 μl/min for 5 min using 2% (v/v) acetonitrile/0.05% (v/v) trifluoroacetic acid in ultrapure water. Peptide separation was done using a mobile phase composed of 0.1% (v/v) formic acid in ultrapure water (A) and 80% (v/v) acetonitrile/0.1% (v/v) formic acid in ultrapure water (B). The trap column was switched for peptide elution in-line with a C18 capillary column (Acclaim PepMap 100, 75 μm × 150 mm, 2 μm particle size, 100 Å pore size, Thermo Scientific, Bremen, Germany). The gradient used was: 2.5–35% B (90 min), 35–99% B (5 min), 99% B (5 min). Ions were generated by electrospray ionization. For full scans a resolution of 70,000 at m/z 200 was used (maximum injection time: 50 ms, automatic gain control target: 1e⁶, range: 400–1600 m/z). For each full scan, the 12 most abundant precursor ions (charge 2+ to 7+) were selected for fragmentation (MS²) by higher energy c-trap dissociation. For MS² scans a resolution of 17,500 at m/z 200 was used (maximum injection time: 50 ms, AGC target: 5e⁴, underfill ratio: 1%). A dynamic exclusion of 60 s for fragmented precursor ions was used. In total, 35 LC-MS/MS runs were recorded consisting of 636,910 MS¹ and 1,258,099 MS² scans.

Data Analysis

LC-MS/MS files in RAW format were converted to mzML (18, 19) using Proteome Discoverer (version 1.4.0.0). Subsequently, pymzML (20) was used to convert the mzML files to the mascot generic format (mgf), if required. It is noteworthy that although MS1 scans were recorded in profile mode, all spectra used for quantification in pyQms need to be transformed to a centroided format, which can be done automatically during conversion to mzML or alternatively if mzML parsing is performed using pymzML (20).

All peptide identification algorithms were executed using Ursgal (21), a Python framework for performing peptide identifications, statistical postprocessing, and data visualization using unified parameters. Briefly, peptides were identified using the algorithms OMSSA (version 2.19, (22)), X!Tandem (version 2013.09.01, (23)), MS-GF+ (version 9979, (24)), and MyriMatch (version 2.1.138 (25)). Default values were used for most of the search parameters. Precursor mass accuracy was set to 5 parts per million (ppm), fragmentation mass accuracy was set to 20 ppm. Trypsin was defined as protease. A shuffled-peptide-based target–decoy database (conserving trypsin cleavage sites) was generated as described previously (17) using Ursgal. The Chlamydomonas reinhardtii database from the Joint Genome Institute version 5.3.1/236 (26) with Augustus 11.6 (27), the chloroplastic (28) and mitochondrial proteome as well as a contaminant database (cRAP, (29)) were used for the generation of the target decoy database containing in total 19,537 (target) protein sequences (see supplemental material). Further search parameters were variable modifications: oxidation of methionine (+15.9949 Da) and acetylation of the N terminus (+42.0106 Da). No fixed modifications were defined. Two missed cleavage sites were permitted. Database searches were conducted for ¹⁴N and ¹⁵N labeling separately. The posterior error probability was determined for each peptide-spectrum match using Percolator (version 2.08, (30, 31)). All peptide-spectrum matches with posterior error probabilities ≤1% at the level of the database search engine were used for all subsequent analyses. A total of 19,976 unique peptides were identified, mapping to 18,285 unique chemical formulas. The MS proteomics data have been deposited to the ProteomeXchange (32) via the PRIDE partner repository with the dataset identifier PXD003236. Venn diagrams of identified peptides and overlap of the different samples can be found in Supplemental Figs. 2 and 3, respectively. Spectrum annotations of proteins with one peptide-spectrum match or one distinct peptide can be found in the supplemental material. All identified proteins and peptides, including sequence coverage can be found in Supplemental Tables S2 and S3.

Peptides were quantified using pyQms (v0.5.0). Retention-time (RT) alignment and enhancement defining RT windows for all peptides (Supplemental Table S4) was carried out using piqDB as described earlier (17).

pyQms scoring

The pyQms matching score (mScore) is based on the work of Gower (33). The matching and scoring is performed on the m/z values and the intensity values independently yielding two scores, i.e. S^mz and S^intensity. In both cases, each peak k is scored, comparing the measured value i with the calculated value j (Equation 1), whereas a perfect match is 1. Each peak of the isotopologue that has a relative intensity (relative to the maximum intensity isotope peak) r_k above the matching threshold (by default 1% of the maximum intensity isotope peak) is matched and scored.

S_{i j k} \in [0, 1]

(1)

The m/z Score: S^mz

For each peak k, the m/z similarity between measured value i and the calculated value j is defined as

S_{i j k}^{m z} = 1 - (\frac{δ_{i j k}^{m z}}{α})

(2)

Whereas δ_ijk^mz the difference in ppm between measured m/z_ik and calculated m/z_jk and α defines the range in ppm, in which the score decreases from 1 to 0 in a linear fashion. In principle, α is equal to the precision of the measurement defined by the user (pyQms parameter REL_MZ_RANGE, default 5 ppm, http://pyqms.readthedocs.io/en/latest/params.html). For example, if the difference between measured and theoretical m/z values would be 2.5 ppm, then the s_ijk^mz score for this peak k would be 0.5.

The total m/z score for all peaks termed S^mz is the weighted sum of all single similarity m/z scores s_ijk^mz (Equation 3). The weighting is defined by the theoretical intensity of the peak k relative to the highest peak in the theoretical isotope pattern, termed r_k.

S^{m z} = \frac{\sum_{}^{k} S_{i j k}^{m z} r_{k}}{\sum_{}^{k} r_{k}}

(3)

The Intensity Score: S^intensity

Prior to intensity scoring, the scaling factor σ is calculated by comparing the intensities of the measured i and calculated j intensities for all peaks k within the matching threshold (see above). This scaling factor is calculated by dividing the weighted sum of the measured intensity by the weighted sum of the theoretical intensities (Equation 4).

σ = \frac{\sum_{}^{k} i n t e n s i t y_{i k} r_{k}}{\sum_{}^{k} i n t e n s i t y_{j k} r_{k}}

(4)

Using this scaling factor, which is equal to the abundance of the measured molecule, one can calculate δ_ijk^intensity, which is the relative intensity error between measured and theoretical intensity for each peak k (Equation 5).

δ_{i j k}^{i n t e n s i t y} = \frac{| i n t e n s i t y_{i k} - σ \times i n t e n s i t y_{j k} |}{α \times i n t e n s i t y_{j k}}

(5)

The intensity score of peak k is then defined (Equation 6).

S_{i j k}^{i n t e n s i t y} = 1 - (\frac{δ_{i j k}^{i n t e n s i t y}}{1 - r_{k} + ϵ})

(6)

In analogy to the m/z score (s_ijk^mz), the denominator defines the range in which the peak-based intensity score decreases from 1 to 0. However, in contrast to the m/z score, the intensity error has to be weighted by the abundance of each peak (1 – r_k) as more abundant peaks can be measured more accurately than smaller peaks. Additionally, we introduced ϵ (pyQms parameter REL_I_RANGE, default 0.2), which represents the most conservative relative error applied to the most precisely measured peak (r_k = 1). Thus, the overall relative error (denominator) will increase with lower peaks (see online documentation http://pyqms.readthedocs.io/en/latest).

The total intensity score S^intensity is the weighted sum of all similarity scores k in analogy to the S^mz score:

S^{i n t e n s i t y} = \frac{\sum_{}^{k} S_{i j k}^{i n t e n s i t y} r_{k}}{\sum_{}^{k} r_{k}}

(7)

The Combined Final Score: mScore

The final score is termed mScore and is a sum of S^mz and S^intensity. However, because some machines can measure m/z much more accurately then intensities, we introduced ξ to allow for flexibilities depending on the type of mass spectrometer used. ξ (the pyQms parameter MZ_SCORE_PERCENTILE, default 0.4) is the fraction the S^mz score is weighted into the sum. Thus, the final mScore is defined as

m S c o r e = ξ S^{m z} + (1 - ξ) S^{i n t e n s i t y}

(8)

Statistical Evaluation of the Gold Standard Data Set

Statistical evaluation was performed by reducing and grouping the matched isotope pattern chromatograms (MICs). A MIC is defined as all spectra matches of a peptide with a certain charge state in one LC-MS/MS run. As the name suggests, an MIC is similar to a XIC but based on all matched isotope patterns instead of a single m/z value, thus containing an additional data dimension, that is, the mScore. Grouping of the MICs was based on the chemical formula and charge of the respective peptide. First, each MIC was reduced to one single match, which showed the highest mScore, yielding two values per MIC (i.e. score and intensity). Second, the reduced MICs belonging to one chemical formula and charge state were grouped depending on their ¹⁵N labeling into one of 11 label percentile bins ranging from 0–5, 6–15, 16–25, …, 96–100%. Third, each bin was reduced to a single match based on the maximum score. Finally, for each mixed sample, we considered only one ground truth at a time. Given this setup, we reduced the evaluation of a quantified chemical formula and charge combination for each ground truth in each sample to ten bins, for which the true and false positives/negatives were counted at different score thresholds. For this, MICs (with a certain charge in one MS run) were grouped as following: for true positives (TP), the expected labeling percentile bin has matches, and no other bin has matches; for false positives (FP), the expected bin has no matches, but another unexpected bin has matches; for false negative (FN) the expected bin has no match at all, and for true negative (TN) the unexpected bin has no match at all. False discovery rate (FDR) was defined as FP/(FP+TP), true positive rate (TPR, sensitivity) as TP/(TP+FN) and false positive rate (FPR) as FP/(FP+TN). In total, 13.9e⁶ MICs were grouped according to molecular formula, charge state, and LC-MS/MS run.

Label-Free Quantification

The data set from Bruderer et al. (34) was used to evaluate label-free peptide quantification performance and quality of pyQms for data-dependent acquisition (DDA)¹ and data-independent acquisition (DIA) mode. RAW MS files were obtained from www.peptideatlas.org and converted into mzML using msconvert, which is part of Proteowizard (version 3.0.7408, (35)). MaxQuant/Andromeda (version 1.4.1.2, (5)) peptide identification results were used from Bruderer et al. (34) and stored in piqDB. All peptides were subsequently quantified by pyQms using default parameters except an adjustment of the machine offset in ppm to correct for measuring error (21, 34). Carbamidomethylation of cysteine was defined as fixed modification. RT alignment (Supplemental Table S5), and enhancements were carried out using piqDB as described earlier (17, 36). Intensity alignment (Supplemental Table S6) of all samples was performed as described earlier (37). The signal intensity of a given peptide (i.e. peptide charge combination) within each MS run was defined as the maximum intensity over all spectra within the predefined retention time window. RT alignment functions and raw peptide amounts for the DDA data set can be found in Supplemental Tables S5 and S7, respectively. The linear correlation of peptide concentrations and their matched peptide intensities over a wide concentration range (0.8 fmol/μl–819.2 fmol/μl) can be found in Supplemental Figs. 4 and 5.

Peptide ratios (log₂) between LC-MS/MS runs were calculated using these raw quantification data (mScore > = 0.8). MaxQuant raw peptide quantification results were obtained from the supplement material of Bruderer et al. (34). Correlation plots between the MaxQuant and pyQms amounts for the spiked-in proteins and for the human background proteins can be found in Supplemental Figs. 6 and 7, respectively. In order to compare pyQms with MaxQuant, the peptide (peptide charge combinations) ratios were equally calculated using the raw peptide amounts reported by Bruderer et al. (34).

A similar procedure was applied to the DIA measurements of the same samples. For all peptides that are proteotypic to the spiked-in proteins, all fragment ions were determined and ions suitable for quantification were determined using an algorithm that will be described elsewhere. These selected fragment ions were quantified in the cycle window corresponding to their peptide precursor m/z. A complete table of all ions used for quantification can be found in Supplemental Table S8. DIA runs were intensity aligned with the same method as for the DDA runs as described earlier (37). DIA peptide quantification was based on summing up all fragment ion intensities (maximum intensity in each MS run within the retention time window). Only ions that could be quantified in both samples were taken into account for the ratio calculation. All pyQms peptide quantification results for the DIA data set can be found in Supplemental Table S9. Spectronaut DIA raw peptide amounts were taken from the supplement material of Bruderer et al. (34), and those were equally summed up at the peptide level. Finally, to ensure comparability, peptide ratios (log2) were calculated similar as for the pyQms results.

Peptide Ratio Evaluation

The data set published by Bruderer et al. (34) contains 12 non-human proteins spiked in a background of human HEK293 cells. The setup contained three different master mixes (MMs) spanning several orders of magnitudes of various protein concentration (MM1: 1.1 to 13.33 fold, MM2: 1- to 200-fold, MM3: 1- to 16,384-fold; eight concentrations in total for each MM). Three technical replicates for each of the eight different samples containing a distinct concentration of each master mix were measured by LC-MS/MS (Supplemental Fig. 1). pyQms was used to quantify the peptides belonging to the spiked-in proteins. Subsequently, the quantification was used to calculate ratios for all peptide charge combinations between all samples (see above), including all technical replicates. This resulted in 276 potential ratios for each peptide (defined by all combinations of the 24 measurements). All peptides belonging to one spiked-in protein were used to evaluate the observed and calculated log₂ ratio against the known ground-truth using a two-sided t test implemented in Scipy (www.scipy.org, (38)). At least three peptides (or peptide charge combination ratios) were required per sample comparison. We grouped the calculated p values into four bins that classify the quality of the quantification, i.e. the similarity to the expected ground truth: a p value (1) above 0.05 was classified as 'not significant different' and colored blue, (2) 0.05–0.01 (*) colored green, (3) 0.01–0.001 (**) colored yellow, and (4) smaller than 0.001 (***) colored red. This color scheme is consistent throughout this work (Fig. 2 and Supplemental Figs. 8–10).

Fig. 2. — **Label-free pyQms quantification of metabolites and peptides.** (A) Dilution series of chemically synthesized adenosine (axes plotted in log scale), each dilution comprising three technical replicates. Line shows the linear regression function (R² = 0.998). *Inset* shows axes plotted in linear scale; x-axes, dilution of nucleoside; y-axes, maximum intensity. (B) Example of the statistical evaluation of pyQms peptide quantifications of the Bruderer *et al.* data sets (DDA, *top left*; DIA, *bottom right*). Shown is the heat map for spiked-in protein P61823 (ribonuclease pancreatic, *Bos Taurus*, master mix 2). The colors reflect the p value for the two-sided t test obtained after testing whether the calculated peptide ratios between two samples (x and y axes, eight samples, ticks represent each three technical replicates) differ significantly from the ground truth ratios. Histograms at the axes show the spiked-in protein concentrations for the eight samples (master mix 2, for sample setup and other master mix composition refer to Supplemental Fig. 1 and Bruderer *et al.* (34), x-axes, sample; y-axes, concentration of spiked-in proteins. The number of peptide charge combination ratios (for DDA samples) or peptide ratios (DIA) used for the t test are shown as numbers in the bins. (C) Stacked bar plot for relative occurrences of the p values over all 12 spiked-in proteins across all three master mixes (MM1, MM2, and MM3). Legend: x axis, relative occurrence of p value; y axis, master mix, and data acquisition method combination (DDA or DIA). p value legend for (B) and (C), blue: reflecting the ground truth; green (p value ≤0.05), yellow (p value ≤0.01), and red (p value ≤0.001): not reflecting the ground truth.

Liquid Chromatography MS of Ribonucleosides

Chemically synthesized MS-grade adenosine (C₁₀O₄N₅H₁₃; Carbosynth, Ltd., Berkshire, UK) was dissolved in 5 mm ammonium formate, pH 5.3, at a concentration of 50 ng/μL and further diluted to yield the following concentrations; 1000 pg/μL, 500 pg/μL, 250 pg/μL, 125 pg/μL, 62.5 pg/μL, 31.25 pg/μL, 20 pg/μL, 10 pg/μL, 5 pg/μL, 2.5 pg/μL, 1 pg/μL, 0.5 pg/μL, 0.2 pg/μL, 0.1 pg/μL, 0.05 pg/μL, 0.02 pg/μL, and 0.01 pg/μL. 4 μL of each dilution was subjected to reversed phase LC-MS analysis on a self-packed 75 μm × 500 mm porous graphitic carbon column connected to a Q Exactive mass spectrometer (Thermo Scientific). Full MS spectra (m/z 100–700) were recorded with three technical replicates for each sample, and the resulting LC-MS runs were analyzed by pyQms. Full details of the chromatography and MS conditions will be published elsewhere.

Experimental Design and Statistical Rationale

The pulse (-chase) gold standard data set contains in total six samples (0, 20, 40, 60, 80, and 100% ¹⁵N labeling) representing six biological replicates. These samples were mixed (0I100, 20I100, 40I100, 0I60, and 0I80), resulting in three technical replicates for the 0% sample and in two technical replicates for the 100% sample. All other samples have no technical replicate within these mixes. Furthermore, each of the seven filter-aided sample preparation fractions (16) was measured individually in order to achieve higher sensitivity. In total 35 LC-MS/MS runs, including 636,910 MS¹ and 1,258,099 MS² scans of the pulse gold standard data set, were evaluated. Please refer to the methods section “Statistical Evaluation of the Gold Standard Data Set” for all statistical tests used for the analysis of this data set.

Requirements, Availability, and Documentation

pyQms requires Python 3.4+ and is platform independent (OS X, macOS, Linux, and Windows). The module is freely available on https://github.com/pyQms/pyqms or pypi, published under Massachusetts Institute of Technology (MIT) license and requires no additional modules to be installed. We recommend pymzML (20) for fast access to spectra from mzML files. To run example scripts, it is necessary to install pymzML or to change the code for alternative spectrum access. Some scripts also require the openpyxl or rpy2 modules.

The documentation of pyQms including parameter description (http://pyqms.readthedocs.io/en/latest/params.html), a quick-start tutorial (http://pyqms.readthedocs.io/en/latest/quick_start.html), and example scripts (http://pyqms.readthedocs.io/en/latest/example_scripts.html) can be found online. pyQms can be run on standard desktop computers. For further hardware requirements, please refer to the online documentation.

RESULTS

Algorithm Design

The core of pyQms is a standalone open source Python module that can be incorporated easily into any analytical workflow, can be run on standard desktop computers, and that benefits from the resources of a rich scientific computing community (20, 21, 39–42). The workflow of pyQms can be divided into two steps. First, an isotope pattern library is built, based on user-defined molecules (Fig. 1G). These molecules are specified as chemical formulas or peptide sequences, optionally with modifications following the unimod standard. The metabolic isotopic distribution and the artificial isotopic distribution of the fixed label are taken into account. pyQms uses this information to calculate an accurate isotope pattern for each molecule based on its elemental composition, the isotope distributions, and the respective masses. Second, the predicted isotope patterns are compared with the MS measurements (Fig. 1H) and scored by calculating a similarity coefficient (33) optimized for MS data yielding a similarity match score, termed mScore. The matching and scoring algorithm initially uses the m/z values and the intensity values independently and combines both later, offering the possibility to adjust the scoring algorithm to the machine type, as some instruments can measure m/z values much more accurately than intensities. Furthermore, the similarity between each measured and calculated isotope pattern peak is weighted by its relative abundance. Thus, the more abundant peaks contribute more to the mScore (see M&M for details). We developed this similarity matching approach as the basis for pyQms since it provides several advantages: (i) The isotope pattern contains multiple peaks that are all used for quantification, thus increasing the robustness over using only a single value. (ii) The matching approach tags each quantification with a quality assessment reflected by its mScore. (iii) Coeluting molecules can either be distinguished reliably since the isotope patterns differ or they receive a low mScore. (iv) The matching algorithm can adapt to any mass spectrometer accuracy. Thus, future technological advancement in machine accuracy will automatically translate into higher matching sensitivity without the need to adjust the algorithm. Overall, any MS data analysis workflow that relies on single m/z observations will benefit from the incorporation of isotope pattern matching and our scoring methodology.

Furthermore, pyQms accepts so called evidence files as input (e.g. peptide identifications from Ursgal (21) or manually curated data). These files allow molecules with the identical chemical formula but different identities to be distinguished by associating their identity with a given retention-time window. pyQms offers a modular system to use custom functions that can be used to determine the abundance of a given molecule. The basic built-in function determines abundance by the maximum intensity within a retention-time window that is defined in the evidence files. However, pyQms was primarily build for bioinformaticians, thus functions to define retention windows or to determine abundances can easily be incorporated if required.

pyQms Provides Quantification with High Accuracy and Sensitivity

To assess the accuracy and sensitivity of pyQms for metabolomics, we analyzed chemically synthesized MS-grade adenosine in a dilution series ranging from 0.04–4000 pg/μL (Fig. 2A). Importantly, the quantification remained linear over a broad range of concentrations (four to five orders of magnitude) with an R² of 0.998, showing that pyQms can be used for the sensitive quantification of metabolites.

Next, to compare pyQms against established quantification tools in proteomics, we used the data sets of Bruderer et al. (34) that contain eight biological replicates of human HEK293 cells, each measured three times in DDA and DIA mode. We compared the published quantification results of the DDA data set analyzed with MaxQuant (5) and of the DIA data set analyzed with Spectronaut (43) to our results obtained with pyQms (Fig. 2B).

We found that 17.4% of the comparisons reflect the ground truth in the DDA samples. However, concentrations below 4 fmol/μL could not be quantified confidently, which is similar to the results reported for MaxQuant (5) (21.3%, Supplemental Fig. 8, 9I2). Furthermore, we found that 82.2% of the ratios obtained with pyQms reflect the ground truth for the DIA data set (Fig. 2B, bottom right half), similar to what was reported for Spectronaut (43) (76.8%, Supplemental Fig. 8, 9I3). These results are comparable to the remaining spiked-in proteins (Fig. 2C). In summary, pyQms quantifications reflected 71.2%/85.2% (n = 1380 ratios) of the ground truth in the DDA/DIA runs for master mix 1 (MM1), 34.9%/66.3% (n = 1380 ratios) for MM2, and 20.8%/52.5% (n = 552 ratios) for MM3 (Fig. 2C), which is similar to the results reported for MaxQuant and Spectronaut (34) (MM1: 62.8%/78.8%, MM2: 23.1%/68.5%, MM3: 15%/72.1%) (Supplemental Figs. 8–11, Supplemental Tables S10 and S11). This shows that pyQms covers the detection limits of data acquired in DDA mode or DIA modes.

Quantification of Pulse (-Chase) Samples

In order to evaluate the quantification performance of pyQms for pulse (-chase) samples, we create a gold-standard data set of well-defined partially labeled proteomes. This data set can be used to benchmark the accuracy of any quantitative software aiming at analyzing partially labeled proteomes. We cultivated the green algae Chlamydomonas reinhardtii in media containing 0 to 100% ¹⁵N metabolic label in 20% increments. Extracted proteins were combined to create five mixes: 0I100, 20I100, 40I100, 0I60, and 0I80 (Fig. 3A), which were analyzed by LC-MS/MS. Subsequently, we quantified all identified peptides, allowing all possible combinations of charge states and all ¹⁵N enrichment percentiles to be matched in all MS¹ scans. This summed up to a total of 5.8e¹² performed matches, which resulted in 1.6e⁸ positive matches with mScores ≥0.5. We grouped matches with an identical chemical formula, charge state, and labeling percentile from one LC-MS/MS run into MICs. Altogether, we found 1.4e⁷ MICs. For each sample, we assessed the matching quality by aggregating the single matches of all MICs into one heat map (Fig. 3D, Supplemental Fig. 12). We used the ¹⁵N-labeling percentile and the mScore of each match as coordinates for the heat map bin and added the abundance of that match to this bin (including peptide abundances of all spectra assembled in a MIC). While the 0I100 mix can be separated reliably (Supplemental Fig. 12A), this was not the case for the other mixes (Supplemental Figs. 12C, 12E, and 12G). Even though it was possible to identify the ground truth, the correct result was obscured by false positives leading to a high background signal that was distributed over all labeling percentiles.

Fig. 3. — **The partially labeled proteome gold standard.** (A) Cultures of *C. reinhardtii* were grown on medium containing 20, 40, 60, 80, and 100% ¹⁵N. Each partially labeled proteome was mixed with an unlabeled (¹⁴N) or fully labeled (¹⁵N) proteome, yielding five mixed samples (0 100, 0 60, 0 80, 20 100, and 40 100). (B) Unbiased quantification workflow. All identified peptides (19,976 peptides, 18,285 distinct chemical formulas) were quantified in all MS¹ spectra (6.3e⁵) of all LC-MS/MS runs in five charge states performing in total 5.8e¹² matches. Matches were filtered using an mScore threshold of 0.5 and assembled into 1.4e⁷ MICs. Example isotopologues for 0 and 60% ¹⁵N incorporation are shown on the *left*. (C) 3D visualization of a typical MIC, colors indicate mScores for each match within the MIC; x axis, *m/z*; y axis, spectrum id; z axis, intensity. (D) Visual evaluation of all matches in the 0 60 sample, x axis labeling percentile, y axis mScore, heat equals summed up intensities of matches in all MS¹ spectra in a given bin (representing all identified and quantified peptides in all matched charge states). (E) ROC curves of pyQms performance in all samples shown as specificity (x axis, log scale) versus true positive rate (y axis). (F) mScore-dependent FDR; x axis, mScore; y axis, FDR (log scale). (*G-I*) as (*D-F*), but quantifications are limited to retention time windows. Legends (D, E, G, H): 0% (blue), 100% (red), 20% (green), 40% (purple), 60% (orange), 80% (brown); mixtures: 0I100 (circles), 0I60 (triangles), 20I100 (diamonds), 0I80 (squares), 40I100 (reverse triangles).

We statistically evaluated our results by creating receiver-operating characteristics (ROC) of the MICs against the ground truth for each mixed sample (Fig. 3E). The ROC curves show that pyQms reliably matches the 0I100 mix (71.1% and 76.9% TPR at 99% specificity, blue and red circles). In contrast, the partially labeled samples result in TPRs between 12.7% and 21.5% at 99% specificity. The fully labeled (100%) or unlabeled samples (0%), if mixed with a partially labeled sample, have lower ROC curves compared with their counterpart in the 0I100 mix (30.9%–35.4% TPR at 99% specificity). The mixes containing partially labeled samples show a high FDR even at very high mScores (e.g. the 0 and 60% samples show FDRs of 4.5% and 29.3%, respectively, at mScores ≥ 0.9 in the 0I60 mix; Fig. 3F). Only in the 0I100 mix, we detected FDRs of 2.7% and 2.1% for the 0 and 100% samples, respectively, at mScores ≥ 0.7.

Decreasing the FDR in Pulse (-Chase) Experiments

To obtain a lower FDR, we limited quantification to stringent (2–2.5 min) RT windows (Supplemental Fig. 13, Supplemental Table S4) based on the RT alignment strategy described earlier (17). This strategy significantly reduced FP matches, as shown by reduced noise in the heat map (Fig. 3G). Furthermore, the ROC curves improve significantly for all five mixes (Fig. 3H). Finally, the mScore-dependent FDR (Fig. 3I) shows rates of 3.2% and 3.8% for the 40% (40 100 mix) and 60% (0 60 mix) samples, respectively, at very stringent mScores of ≥0.9 (Fig. 3I). This shows, that applying RT windows can significantly reduce the FDR in pulse (-chase) experiments.

DISCUSSION

pyQms Performs State-of-the-Art DDA and DIA Quantification

The presented results emphasize that pyQms performs optimally within the detection limits of data acquired in DDA mode or DIA modes similar to what is regarded as state-of-the art in proteomics. However, pyQms combines the functionality to quantify both DDA and DIA data in a single software. Furthermore, pyQms uniquely offers a highly accessible bioinformatics library, so other packages can incorporate its isotope matching procedure to increase their matching quality with ease. Ultimately, isotope pattern matching will replace approaches relying on single m/z values, especially in the advent of the broad availability of high-resolution mass spectrometers.

Benchmarked Partial Label Quantification

To demonstrate that pyQms goes beyond the current state-of-the-art, we analyzed a data set of differentially labeled proteomes, like observed during pulse or pulse-chase experiments. This reflects the ultimate challenge for any quantification algorithm since each molecule has different enriched isotope incorporation levels depending on the time of synthesis. Thus, one single labeling state cannot be observed, complicating analyses for several reasons: First, the isotope distribution is different for each labeling state, yet, it is difficult to distinguish closely related labeling states. Each isotope pattern must therefore be evaluated independently or in a mixed model. Second, partially labeled molecules broaden the isotope pattern and lead to more peaks, which are consequently less intense (Supplemental Fig. 14). This effect leads to a loss of signal and quantification accuracy. Finally, the number of isotope patterns that need to be matched increases by two orders of magnitude when compared with label-free quantification, thus requiring significant computational resources.

Our pulse or partially labeled gold standard data set revealed a low TPR for samples containing partially labeled peptides at high specificities in contrast to a sample containing only unlabeled or fully labeled proteins (Fig. 3E). This confirms that the complexity of the sample and the number of detected peaks strongly influences the classification. Similarly, this is reflected in the mScore-dependent FDR (Fig. 3F). These results illustrate the challenge of assessing partially labeled molecules in general. These difficulties are rarely addressed as they only become apparent by benchmarking against a defined partially labeled sample. Our gold standard data set can therefore be used to benchmark new software tools for the quantification of pulse and pulse-chase data.

The benchmarking results of pyQms against the gold standard data set underlines the universal application and high quality of pyQms, which goes beyond what is currently available in single algorithms.

pyQms Is Suitable for Large-Scale Pulse (-Chase) Studies

The approach to lower the FDR and increase the TPR by applying stringent RT windows demonstrates that pyQms allows accurate large-scale high-throughput quantification of pulse (-chase) MS experiments to be performed. Interestingly, the sample resembling the most common proteomic samples (0I100 mix) shows an FDR of only 1.2% at mScores ≥ 0.7 using accurate isotope pattern matching in combination with RT windows (Fig. 3I, blue and red circles). This indicates that pyQms could be sufficient for MS¹-based identifications and quantification in proteomics. Importantly, this should also pave the way to reliably overcome the undersampling issue in proteomics (44), especially if one takes the mScore as a quality criterion into account. In fact, if the Bruderer et al. (34) data set is evaluated with respect to mScore, this effect becomes very obvious (Supplemental Fig. 15).

CONCLUSIONS

In conclusion, we have demonstrated that pyQms is a powerful software to quantify any molecule measured by MS, independent of the type of molecule, label, research field, or acquisition mode. These unique features enable novel experimental designs and multiplexing strategies, which are urgently required in the era of personalized medicine in order to simultaneously compare hundreds of clinical samples. Using pyQms, researchers will be able to quantify MS data from proteomics, metabolomics, lipidomics, glycomics, or other research fields with a single, universal software solution. pyQms is open source and freely available via https://github.com/pyQms/pyqms under an MIT license.

Data availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange (32) via the PRIDE partner repository (http://www.ebi.ac.uk/pride/archive/) with the data set identifier PXD003236.

Supplementary Material

Supplemental Data

supp_16_10_1736__index.html^{(3.8KB, html)}

Acknowledgments

We thank Dr. R.P. Zahedi, Dr. M. Stitt, Dr. K. Bartscherer, Dr. G. Bentley, and the members of the Leidel, Hippler, and Fufezan groups and the eBio group of Cellzome for their valuable feedback and suggestions. Dr. H. Drexler and Dr. M. Scholz are acknowledged for the measurement of the LC-MS samples.

Footnotes

Author contributions: C.F. designed research; J.L., P.S., and C.F. performed research; J.L., A.N., F.W., M.H., and C.F. contributed new re-agents or analytic tools; J.L., P.S., and C.F. analyzed data; C.F., J.L., and A.N. programmed the algorithm; C.F. and S.A.L. super-vised project; and J.L., S.A.L., and C.F. wrote the paper.

* This work was supported by the Max Planck Society, the European Research Council (ERC-2012-StG 310489-tRNAmodi) and DFG SPP1784 to S.A.L.

This article contains supplemental material Supplemental Tables S1–S11 and Supplemental Figs. 1–15.

¹ The abbreviations used are:

DDA: data-dependent acquisition
DIA: data-independent acquisition
FDR: false discovery rate
FN: false negative
FP: false positive
FPR: false positive rate
LC-MS/MS: liquid chromatography coupled tandem mass spectrometry
MIC: matched isotope pattern chromatogram
MS: mass spectrometry
ROC: receiver-operating characteristic
RT: retention time
TAP: tris-acetate-phosphate
TN: true negative
TP: true positive
TPR: true positive rate.

REFERENCES

1. Yates J. R., Ruse C. I., and Nakorchevsky A. (2009) Proteomics by mass spectrometry: Approaches, advances, and applications. Annu. Rev. Biomed. Eng. 11, 49–79 [DOI] [PubMed] [Google Scholar]
2. Ong S. E., Blagoev B., Kratchmarova I., Kristensen D. B., Steen H., Pandey A., and Mann M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386 [DOI] [PubMed] [Google Scholar]
3. Filla L. A., Yuan W., Feldman E. L., Li S., and Edwards J. L. (2014) Global metabolomic and isobaric tagging capillary liquid chromatography-tandem mass spectrometry approaches for uncovering pathway dysfunction in diabetic mouse aorta. J. Proteome Res. 13, 6121–6134 [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Werner T., Sweetman G., Savitski M. F., Mathieson T., Bantscheff M., and Savitski M. M. (2014) Ion coalescence of neutron encoded TMT 10-plex reporter ions. Anal. Chem. 86, 3594–3601 [DOI] [PubMed] [Google Scholar]
5. Cox J., and Mann M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 [DOI] [PubMed] [Google Scholar]
6. Park S. K., Venable J. D., Xu T., and Yates J. R. 3rd. (2008) A quantitative analysis software tool for mass spectrometry-based proteomics. Nat. Methods 5, 319–322 [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Sturm M., Bertsch A., Gröpl C., Hildebrandt A., Hussong R., Lange E., Pfeifer N., Schulz-Trieglaff O., Zerck A., Reinert K., and Kohlbacher O. (2008) OpenMS—An open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163. [DOI] [PMC free article] [PubMed] [Google Scholar]
8. Sachsenberg T., Herbst F.-A., Taubert M., Kermer R., Jehmlich N., Bergen M. Von, Seifert J., and Kohlbacher O. (2015) MetaProSIP: Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics. J. Proteome Res. 14, 619–627 [DOI] [PubMed] [Google Scholar]
9. Tsou C.-C., Avtonomov D., Larsen B., Tucholska M., Choi H., Gingras A.-C., and Nesvizhskii A. I. (2015) DIA-Umpire: Comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–264 [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Röst H. L., Rosenberger G., Navarro P., Gillet L., Miladinović S. M., Schubert O. T., Wolski W., Collins B. C., Malmström J., Malmström L., and Aebersold R. (2014) OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 [DOI] [PubMed] [Google Scholar]
11. MacLean B., Tomazela D. M., Shulman N., Chambers M., Finney G. L., Frewen B., Kern R., Tabb D. L., Liebler D. C., and MacCoss M. J. (2010) Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Reiter L., Rinner O., Picotti P., Hüttenhain R., Beck M., Brusniak M.-Y., Hengartner M. O., and Aebersold R. (2011) mProphet: Automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 [DOI] [PubMed] [Google Scholar]
13. Mitchell C. J., Kim M.-S., Na C. H., and Pandey A. (2016) PyQuant: A versatile framework for analysis of quantitative mass spectrometry data. Mol. Cell. Proteomics 15, 2829–2838 [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Palmer A., Phapale P., Chernyavsky I., Lavigne R., Fay D., Tarasov A., Kovalev V., Fuchser J., Nikolenko S., Pineau C., Becker M., and Alexandrov T. (2017) FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry. Nat. Methods 14, 57–60 [DOI] [PubMed] [Google Scholar]
15. Harris E. H. (1989) The Chlamydomonas Sourcebook [Google Scholar]
16. Wiśniewski J. R., Zougman A., Nagaraj N., and Mann M. (2009) Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 [DOI] [PubMed] [Google Scholar]
17. Barth J., Bergner S. V., Jaeger D., Niehues A., Schulze S., Scholz M., and Fufezan C. (2014) The interplay of light and oxygen in the reactive oxygen stress response of Chlamydomonas reinhardtii dissected by quantitative mass spectrometry. Mol. Cell. Proteomics 13, 969–989 [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Martens L., Chambers M., Sturm M., Kessner D., Levander F., Shofstahl J., Tang W. H., Römpp A., Neumann S., Pizarro A. D., Montecchi-Palazzi L., Tasman N., Coleman M., Reisinger F., Souda P., Hermjakob H., Binz P.-A., and Deutsch E. W. (2011) mzML—A community standard for mass spectrometry data. Mol. Cell. Proteomics 10, R110.000133. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Deutsch E. (2008) mzML: A single, unifying data format for mass spectrometer output. Proteomics 8, 2776–2777 [DOI] [PubMed] [Google Scholar]
20. Bald T., Barth J., Niehues A., Specht M., Hippler M., and Fufezan C. (2012) pymzML—Python module for high-throughput bioinformatics on mass spectrometry data. Bioinformatics 28, 1052–1053 [DOI] [PubMed] [Google Scholar]
21. Kremer L. P., Leufken J., Oyunchimeg P., Schulze S., and Fufezan C. (2016) Ursgal, universal Python module combining common bottom-up proteomics tools for large-scale analysis. J. Proteome Res. 15, 788–794 [DOI] [PubMed] [Google Scholar]
22. Geer L. Y., Markey S. P., Kowalak J. A., Wagner L., Xu M., Maynard D. M., Yang X., Shi W., and Bryant S. H. (2004) Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964 [DOI] [PubMed] [Google Scholar]
23. Craig R., and Beavis R. C. (2003) A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17, 2310–2316 [DOI] [PubMed] [Google Scholar]
24. Kim S., Mischerikow N., Bandeira N., Navarro J. D., Wich L., Mohammed S., Heck A. J., and Pevzner P. A. (2010) The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: Applications to database search. Mol. Cell. Proteomics 9, 2840–2852 [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Tabb D. L., Fernando C. G., and Chambers M. C. (2007) MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6, 654–661 [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Merchant S. S., Prochnik S. E., Vallon O., Harris E. H., Karpowicz S. J., Witman G. B., Terry A., Salamov A., Fritz-Laylin L. K., Maréchal-Drouard L., Marshall W. F., Qu L.-H., Nelson D. R., Sanderfoot A. A., Spalding M. H., Kapitonov V. V., Ren Q., Ferris P., Lindquist E., Shapiro H., Lucas S. M., Grimwood J., Schmutz J., Cardol P., Cerutti H., Chanfreau G., Chen C.-L., Cognat V., Croft M. T., Dent R., Dutcher S., Fernández E., Fukuzawa H., González-Ballester D., González-Halphen D., Hallmann A., Hanikenne M., Hippler M., Inwood W., Jabbari K., Kalanon M., Kuras R., Lefebvre P. A., Lemaire S. D., Lobanov A. V., Lohr M., Manuell A., Meier I., Mets L., Mittag M., Mittelmeier T., Moroney J. V., Moseley J., Napoli C., Nedelcu A. M., Niyogi K., Novoselov S. V., Paulsen I. T., Pazour G., Purton S., Ral J.-P., Riaño-Pachón D. M., Riekhof W., Rymarquis L., Schroda M., Stern D., Umen J., Willows R., Wilson N., Zimmer S. L., Allmer J., Balk J., Bisova K., Chen C.-J., Elias M., Gendler K., Hauser C., Lamb M. R., Ledford H., Long J. C., Minagawa J., Page M. D., Pan J., Pootakham W., Roje S., Rose A., Stahlberg E., Terauchi A. M., Yang P., Ball S., Bowler C., Dieckmann C. L., Gladyshev V. N., Green P., Jorgensen R., Mayfield S., Mueller-Roeber B., Rajamani S., Sayre R. T., Brokstein P., Dubchak I., Goodstein D., Hornick L., Huang Y. W., Jhaveri J., Luo Y., Martínez D., Ngau W. C., Otillar B., Poliakov A., Porter A., Szajkowski L., Werner G., Zhou K., Grigoriev I. V., Rokhsar D. S., and Grossman A. R. (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250 [DOI] [PMC free article] [PubMed] [Google Scholar]
27. Stanke M., and Waack S. (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 [DOI] [PubMed] [Google Scholar]
28. Maul J. E., Lilly J. W., Cui L., de Pamphilis C. W., Miller W., Harris E. H., and Stern D. B. (2002) The Chlamydomonas reinhardtii plastid chromosome : Islands of genes in a sea of repeats. Plant Cell 14, 2659–2679 [DOI] [PMC free article] [PubMed] [Google Scholar]
29. Mellacheruvu D., Wright Z., Couzens A. L., Lambert J., St-Denis N. A., Li T., Miteva Y. V., Hauri S., Sardiu M. E., Low T. Y., Halim V. A., Bagshaw R. D., Hubner N. C., Al-Hakim A., Bouchard A., Faubert D., Fermin D., Dunham W. H., Goudreault M., Lin Z.-Y., Badillo B. G., Pawson T., Durocher D., Coulombe B., Aebersold R., Superti-Furga G., Colinge J., Heck A. J., Choi H., Gstaiger M., Mohammed S., Cristea I. M., Bennett K. L., Washburn M. P., Raught B., Ewing R. M., Gingras A.-C., and Nesvizhskii A. I. (2013) The CRAPome: A contaminant repository for affinity purification–mass spectrometry data. Nat. Methods 10, 730–736 [DOI] [PMC free article] [PubMed] [Google Scholar]
30. Käll L., Canterbury J. D., Weston J., Noble W. S., and MacCoss M. J. (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 4, 923–925 [DOI] [PubMed] [Google Scholar]
31. Käll L., Storey J. D., and Noble W. S. (2008) Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 24, i42–i48 [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Vizcaíno J. A., Deutsch E. W., Wang R., Csordas A., Reisinger F., Ríos D., Dianes J. A., Sun Z., Farrah T., Bandeira N., Binz P.-A., Xenarios I., Eisenacher M., Mayer G., Gatto L., Campos A., Chalkley R. J., Kraus H.-J., Albar J. P., Martinez-Bartolomé S., Apweiler R., Omenn G. S., Martens L., Jones A. R., and Hermjakob H. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 [DOI] [PMC free article] [PubMed] [Google Scholar]
33. Gower J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27, 857 [Google Scholar]
34. Bruderer R., Bernhardt O. M., Gandhi T., Miladinović S. M., Cheng L.-Y., Messner S., Ehrenberger T., Zanotelli V., Butscheid Y., Escher C., Vitek O., Rinner O., and Reiter L. (2015) Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell. Proteomics 14, 1400–1410 [DOI] [PMC free article] [PubMed] [Google Scholar]
35. Kessner D., Chambers M., Burke R., Agus D., and Mallick P. (2008) ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 [DOI] [PMC free article] [PubMed] [Google Scholar]
36. Hoehner R., Barth J., Magneschi L., Jaeger D., Niehues A., Bald T., Grossman A., Fufezan C., and Hippler M. (2013) The metabolic status drives acclimation of iron deficiency responses in Chlamydomonas reinhardtii as revealed by proteomics based hierarchical clustering and reverse genetics. Mol. Cell. Proteomics, 1–59 [DOI] [PMC free article] [PubMed] [Google Scholar]
37. Bergner S. V., Scholz M., Trompelt K., Barth J., Gäbelein P., Steinbeck J., Xue H., Clowez S., Fucile G., Goldschmidt-Clermont M., Fufezan C., and Hippler M. (2015) STATE TRANSITION7-dependent phosphorylation is modulated by changing environmental conditions, and its absence triggers remodeling of photosynthetic protein complexes. Plant Physiol. 168, 615–634 [DOI] [PMC free article] [PubMed] [Google Scholar]
38. Oliphant T. E. (2007) Python for scientific computing. Comp. Sci. Eng. 9, 10–20 [Google Scholar]
39. Röst H. L., Schmitt U., Aebersold R., and Malmström L. (2014) pyOpenMS: A Python-based interface to the OpenMS mass-spectrometry algorithm library. Proteomics 14, 74–77 [DOI] [PubMed] [Google Scholar]
40. Kiefer P., Schmitt U., and Vorholt J. A. (2013) EMZed: An open source framework in Python for rapid and interactive development of LC/MS data analysis workflows. Bioinformatics 29, 963–964 [DOI] [PMC free article] [PubMed] [Google Scholar]
41. Perez-Riverol Y., Wang R., Hermjakob H., Müller M., Vesada V., and Vizcaíno J. A. (2014) Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective. Biochim. Biophys. Acta 1844, 63–76 [DOI] [PMC free article] [PubMed] [Google Scholar]
42. Goloborodko A. A., Levitsky L. I., Ivanov M. V., and Gorshkov M. V (2013) Pyteomics—A Python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass Spectrom. 24, 301–304 [DOI] [PubMed] [Google Scholar]
43. Reiter L., Rinner O., Picotti P., Hüttenhain R., Beck M., Brusniak M.-Y., Hengartner M. O., and Aebersold R. (2011) mProphet: Automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 [DOI] [PubMed] [Google Scholar]
44. Nilsson T., Mann M., Aebersold R., Yates J. R. 3rd, Bairoch A., and Bergeron J. J. (2010) Mass spectrometry in high-throughput proteomics: ready for the big time. Nat. Methods 7, 681–685 [DOI] [PubMed] [Google Scholar]
45. Yao X., Freas A., Ramirez J., Demirev P. A., and Fenselau C. (2001) Proteolytic 18O labeling for comparative proteomics: Model studies with two serotypes of adenovirus. Anal. Chem. 73, 2836–2842 [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Data

supp_16_10_1736__index.html^{(3.8KB, html)}

10.1074_M117.068007_mcp.M117.068007-1.zip^{(48.4MB, zip)}

10.1074_M117.068007_mcp.M117.068007-2.pdf^{(6.3MB, pdf)}

10.1074_M117.068007_mcp.M117.068007-3.pdf^{(81.3KB, pdf)}

10.1074_M117.068007_mcp.M117.068007-4.pdf^{(64.9KB, pdf)}

10.1074_M117.068007_mcp.M117.068007-5.xlsx^{(19.7KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-6.xlsx^{(1.8MB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-7.xlsx^{(356.8KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-8.xlsx^{(2.6MB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-9.xlsx^{(444.8KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-10.xlsx^{(9.4KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-11.xlsx^{(86.8KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-12.xlsx^{(721.1KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-13.xlsx^{(54.2KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-14.xlsx^{(9.2KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-15.xlsx^{(10.6KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-16.xlsx^{(9.4KB, xlsx)}

10.1074_M117.068007_mcp.M117.068007-17.zip^{(15.7MB, zip)}

Data Availability Statement

The mass spectrometry proteomics data have been deposited to the ProteomeXchange (32) via the PRIDE partner repository (http://www.ebi.ac.uk/pride/archive/) with the data set identifier PXD003236.

[B1] 1. Yates J. R., Ruse C. I., and Nakorchevsky A. (2009) Proteomics by mass spectrometry: Approaches, advances, and applications. Annu. Rev. Biomed. Eng. 11, 49–79 [DOI] [PubMed] [Google Scholar]

[B2] 2. Ong S. E., Blagoev B., Kratchmarova I., Kristensen D. B., Steen H., Pandey A., and Mann M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386 [DOI] [PubMed] [Google Scholar]

[B3] 3. Filla L. A., Yuan W., Feldman E. L., Li S., and Edwards J. L. (2014) Global metabolomic and isobaric tagging capillary liquid chromatography-tandem mass spectrometry approaches for uncovering pathway dysfunction in diabetic mouse aorta. J. Proteome Res. 13, 6121–6134 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] 4. Werner T., Sweetman G., Savitski M. F., Mathieson T., Bantscheff M., and Savitski M. M. (2014) Ion coalescence of neutron encoded TMT 10-plex reporter ions. Anal. Chem. 86, 3594–3601 [DOI] [PubMed] [Google Scholar]

[B5] 5. Cox J., and Mann M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 [DOI] [PubMed] [Google Scholar]

[B6] 6. Park S. K., Venable J. D., Xu T., and Yates J. R. 3rd. (2008) A quantitative analysis software tool for mass spectrometry-based proteomics. Nat. Methods 5, 319–322 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7. Sturm M., Bertsch A., Gröpl C., Hildebrandt A., Hussong R., Lange E., Pfeifer N., Schulz-Trieglaff O., Zerck A., Reinert K., and Kohlbacher O. (2008) OpenMS—An open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] 8. Sachsenberg T., Herbst F.-A., Taubert M., Kermer R., Jehmlich N., Bergen M. Von, Seifert J., and Kohlbacher O. (2015) MetaProSIP: Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics. J. Proteome Res. 14, 619–627 [DOI] [PubMed] [Google Scholar]

[B9] 9. Tsou C.-C., Avtonomov D., Larsen B., Tucholska M., Choi H., Gingras A.-C., and Nesvizhskii A. I. (2015) DIA-Umpire: Comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–264 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10. Röst H. L., Rosenberger G., Navarro P., Gillet L., Miladinović S. M., Schubert O. T., Wolski W., Collins B. C., Malmström J., Malmström L., and Aebersold R. (2014) OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 [DOI] [PubMed] [Google Scholar]

[B11] 11. MacLean B., Tomazela D. M., Shulman N., Chambers M., Finney G. L., Frewen B., Kern R., Tabb D. L., Liebler D. C., and MacCoss M. J. (2010) Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12] 12. Reiter L., Rinner O., Picotti P., Hüttenhain R., Beck M., Brusniak M.-Y., Hengartner M. O., and Aebersold R. (2011) mProphet: Automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 [DOI] [PubMed] [Google Scholar]

[B13] 13. Mitchell C. J., Kim M.-S., Na C. H., and Pandey A. (2016) PyQuant: A versatile framework for analysis of quantitative mass spectrometry data. Mol. Cell. Proteomics 15, 2829–2838 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14. Palmer A., Phapale P., Chernyavsky I., Lavigne R., Fay D., Tarasov A., Kovalev V., Fuchser J., Nikolenko S., Pineau C., Becker M., and Alexandrov T. (2017) FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry. Nat. Methods 14, 57–60 [DOI] [PubMed] [Google Scholar]

[B15] 15. Harris E. H. (1989) The Chlamydomonas Sourcebook [Google Scholar]

[B16] 16. Wiśniewski J. R., Zougman A., Nagaraj N., and Mann M. (2009) Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 [DOI] [PubMed] [Google Scholar]

[B17] 17. Barth J., Bergner S. V., Jaeger D., Niehues A., Schulze S., Scholz M., and Fufezan C. (2014) The interplay of light and oxygen in the reactive oxygen stress response of Chlamydomonas reinhardtii dissected by quantitative mass spectrometry. Mol. Cell. Proteomics 13, 969–989 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18. Martens L., Chambers M., Sturm M., Kessner D., Levander F., Shofstahl J., Tang W. H., Römpp A., Neumann S., Pizarro A. D., Montecchi-Palazzi L., Tasman N., Coleman M., Reisinger F., Souda P., Hermjakob H., Binz P.-A., and Deutsch E. W. (2011) mzML—A community standard for mass spectrometry data. Mol. Cell. Proteomics 10, R110.000133. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19. Deutsch E. (2008) mzML: A single, unifying data format for mass spectrometer output. Proteomics 8, 2776–2777 [DOI] [PubMed] [Google Scholar]

[B20] 20. Bald T., Barth J., Niehues A., Specht M., Hippler M., and Fufezan C. (2012) pymzML—Python module for high-throughput bioinformatics on mass spectrometry data. Bioinformatics 28, 1052–1053 [DOI] [PubMed] [Google Scholar]

[B21] 21. Kremer L. P., Leufken J., Oyunchimeg P., Schulze S., and Fufezan C. (2016) Ursgal, universal Python module combining common bottom-up proteomics tools for large-scale analysis. J. Proteome Res. 15, 788–794 [DOI] [PubMed] [Google Scholar]

[B22] 22. Geer L. Y., Markey S. P., Kowalak J. A., Wagner L., Xu M., Maynard D. M., Yang X., Shi W., and Bryant S. H. (2004) Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964 [DOI] [PubMed] [Google Scholar]

[B23] 23. Craig R., and Beavis R. C. (2003) A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17, 2310–2316 [DOI] [PubMed] [Google Scholar]

[B24] 24. Kim S., Mischerikow N., Bandeira N., Navarro J. D., Wich L., Mohammed S., Heck A. J., and Pevzner P. A. (2010) The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: Applications to database search. Mol. Cell. Proteomics 9, 2840–2852 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25. Tabb D. L., Fernando C. G., and Chambers M. C. (2007) MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6, 654–661 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26. Merchant S. S., Prochnik S. E., Vallon O., Harris E. H., Karpowicz S. J., Witman G. B., Terry A., Salamov A., Fritz-Laylin L. K., Maréchal-Drouard L., Marshall W. F., Qu L.-H., Nelson D. R., Sanderfoot A. A., Spalding M. H., Kapitonov V. V., Ren Q., Ferris P., Lindquist E., Shapiro H., Lucas S. M., Grimwood J., Schmutz J., Cardol P., Cerutti H., Chanfreau G., Chen C.-L., Cognat V., Croft M. T., Dent R., Dutcher S., Fernández E., Fukuzawa H., González-Ballester D., González-Halphen D., Hallmann A., Hanikenne M., Hippler M., Inwood W., Jabbari K., Kalanon M., Kuras R., Lefebvre P. A., Lemaire S. D., Lobanov A. V., Lohr M., Manuell A., Meier I., Mets L., Mittag M., Mittelmeier T., Moroney J. V., Moseley J., Napoli C., Nedelcu A. M., Niyogi K., Novoselov S. V., Paulsen I. T., Pazour G., Purton S., Ral J.-P., Riaño-Pachón D. M., Riekhof W., Rymarquis L., Schroda M., Stern D., Umen J., Willows R., Wilson N., Zimmer S. L., Allmer J., Balk J., Bisova K., Chen C.-J., Elias M., Gendler K., Hauser C., Lamb M. R., Ledford H., Long J. C., Minagawa J., Page M. D., Pan J., Pootakham W., Roje S., Rose A., Stahlberg E., Terauchi A. M., Yang P., Ball S., Bowler C., Dieckmann C. L., Gladyshev V. N., Green P., Jorgensen R., Mayfield S., Mueller-Roeber B., Rajamani S., Sayre R. T., Brokstein P., Dubchak I., Goodstein D., Hornick L., Huang Y. W., Jhaveri J., Luo Y., Martínez D., Ngau W. C., Otillar B., Poliakov A., Porter A., Szajkowski L., Werner G., Zhou K., Grigoriev I. V., Rokhsar D. S., and Grossman A. R. (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B27] 27. Stanke M., and Waack S. (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 [DOI] [PubMed] [Google Scholar]

[B28] 28. Maul J. E., Lilly J. W., Cui L., de Pamphilis C. W., Miller W., Harris E. H., and Stern D. B. (2002) The Chlamydomonas reinhardtii plastid chromosome : Islands of genes in a sea of repeats. Plant Cell 14, 2659–2679 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] 29. Mellacheruvu D., Wright Z., Couzens A. L., Lambert J., St-Denis N. A., Li T., Miteva Y. V., Hauri S., Sardiu M. E., Low T. Y., Halim V. A., Bagshaw R. D., Hubner N. C., Al-Hakim A., Bouchard A., Faubert D., Fermin D., Dunham W. H., Goudreault M., Lin Z.-Y., Badillo B. G., Pawson T., Durocher D., Coulombe B., Aebersold R., Superti-Furga G., Colinge J., Heck A. J., Choi H., Gstaiger M., Mohammed S., Cristea I. M., Bennett K. L., Washburn M. P., Raught B., Ewing R. M., Gingras A.-C., and Nesvizhskii A. I. (2013) The CRAPome: A contaminant repository for affinity purification–mass spectrometry data. Nat. Methods 10, 730–736 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30. Käll L., Canterbury J. D., Weston J., Noble W. S., and MacCoss M. J. (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 4, 923–925 [DOI] [PubMed] [Google Scholar]

[B31] 31. Käll L., Storey J. D., and Noble W. S. (2008) Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 24, i42–i48 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 32. Vizcaíno J. A., Deutsch E. W., Wang R., Csordas A., Reisinger F., Ríos D., Dianes J. A., Sun Z., Farrah T., Bandeira N., Binz P.-A., Xenarios I., Eisenacher M., Mayer G., Gatto L., Campos A., Chalkley R. J., Kraus H.-J., Albar J. P., Martinez-Bartolomé S., Apweiler R., Omenn G. S., Martens L., Jones A. R., and Hermjakob H. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 33. Gower J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27, 857 [Google Scholar]

[B34] 34. Bruderer R., Bernhardt O. M., Gandhi T., Miladinović S. M., Cheng L.-Y., Messner S., Ehrenberger T., Zanotelli V., Butscheid Y., Escher C., Vitek O., Rinner O., and Reiter L. (2015) Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell. Proteomics 14, 1400–1410 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] 35. Kessner D., Chambers M., Burke R., Agus D., and Mallick P. (2008) ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B36] 36. Hoehner R., Barth J., Magneschi L., Jaeger D., Niehues A., Bald T., Grossman A., Fufezan C., and Hippler M. (2013) The metabolic status drives acclimation of iron deficiency responses in Chlamydomonas reinhardtii as revealed by proteomics based hierarchical clustering and reverse genetics. Mol. Cell. Proteomics, 1–59 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B37] 37. Bergner S. V., Scholz M., Trompelt K., Barth J., Gäbelein P., Steinbeck J., Xue H., Clowez S., Fucile G., Goldschmidt-Clermont M., Fufezan C., and Hippler M. (2015) STATE TRANSITION7-dependent phosphorylation is modulated by changing environmental conditions, and its absence triggers remodeling of photosynthetic protein complexes. Plant Physiol. 168, 615–634 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B38] 38. Oliphant T. E. (2007) Python for scientific computing. Comp. Sci. Eng. 9, 10–20 [Google Scholar]

[B39] 39. Röst H. L., Schmitt U., Aebersold R., and Malmström L. (2014) pyOpenMS: A Python-based interface to the OpenMS mass-spectrometry algorithm library. Proteomics 14, 74–77 [DOI] [PubMed] [Google Scholar]

[B40] 40. Kiefer P., Schmitt U., and Vorholt J. A. (2013) EMZed: An open source framework in Python for rapid and interactive development of LC/MS data analysis workflows. Bioinformatics 29, 963–964 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B41] 41. Perez-Riverol Y., Wang R., Hermjakob H., Müller M., Vesada V., and Vizcaíno J. A. (2014) Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective. Biochim. Biophys. Acta 1844, 63–76 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B42] 42. Goloborodko A. A., Levitsky L. I., Ivanov M. V., and Gorshkov M. V (2013) Pyteomics—A Python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass Spectrom. 24, 301–304 [DOI] [PubMed] [Google Scholar]

[B43] 43. Reiter L., Rinner O., Picotti P., Hüttenhain R., Beck M., Brusniak M.-Y., Hengartner M. O., and Aebersold R. (2011) mProphet: Automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 [DOI] [PubMed] [Google Scholar]

[B44] 44. Nilsson T., Mann M., Aebersold R., Yates J. R. 3rd, Bairoch A., and Bergeron J. J. (2010) Mass spectrometry in high-throughput proteomics: ready for the big time. Nat. Methods 7, 681–685 [DOI] [PubMed] [Google Scholar]

[B45] 45. Yao X., Freas A., Ramirez J., Demirev P. A., and Fenselau C. (2001) Proteolytic 18O labeling for comparative proteomics: Model studies with two serotypes of adenovirus. Anal. Chem. 73, 2836–2842 [DOI] [PubMed] [Google Scholar]

PERMALINK

pyQms enables universal and accurate quantification of mass spectrometry data*

Johannes Leufken

Anna Niehues

L Peter Sarin

Florian Wessel

Michael Hippler

Sebastian A Leidel

Christian Fufezan