Abstract
Global and phosphoproteome profiling has demonstrated great utility for the analysis of clinical specimens. One barrier to the broad clinical application of proteomic profiling is the large amount of biological material required, particularly for phosphoproteomics, which currently requires on the order of 25 mg wet tissue weight. For hematopoietic cancers such as acute myeloid leukemia (AML), the sample requirement is ≥10 million peripheral blood mononuclear cells (PBMCs). Across large study cohorts, this requirement will exceed what is obtainable for many individual patients/time points. For this reason, we were interested in the impact of differential peptide loading across multiplex channels on proteomic data quality. To this end, we tested a range of channel loading amounts (approximately the material obtainable from 5E5, 1E6, 2.5E6, 5E6, and 1E7 AML patient cells) to assess proteome coverage, quantification precision, and peptide/phosphopeptide detection in experiments utilizing isobaric tandem mass tag (TMT) labeling. As expected, fewer missing values were observed in TMT channels with higher peptide loading amounts than in those with lower loadings. Moreover, channels with lower loadings showed greater quantitative variability than channels with higher loadings. A statistical analysis showed that decreased loading amounts result in an increase in the type I error rate. We then examined the impact of differential loading on the detection of known differences between distinct AML cell lines. Similar patterns of increased data missingness and higher quantitative variability were observed as loading was decreased, resulting in fewer statistical differences; however, we found good agreement in features identified as differential, demonstrating the value of this approach.
Keywords: TMT, isobaric labeling, clinical proteomics, phosphoproteomics, acute myeloid leukemia, differential loading
Introduction
Mass spectrometry-based proteomic profiling has become a powerful tool for broad quantification of proteins and their post-translational modifications in cancer research.1−6 Due to extensive efforts in the field to benchmark and standardize complex workflows, particularly those of the Clinical Proteomics Tumor Analysis Consortium (CPTAC), proteomics is now being more broadly utilized for clinical research.7,8 Reliable and reproducible quantification of >10 000 proteins and >30 000 phosphosites is now routinely attainable from numerous mammalian tissue types.7 The integration of a deep-scale proteomic analysis of human tumors with genomic data has been shown to improve specificity for identifying pathway alterations caused by tumor associated mutations. Furthermore, phosphoproteome measurements provide information on pathway activation not discernible from genetic measurements and thus offer unique insights into potential therapeutic targets.9−14
A substantial challenge in the field of clinical proteomics is obtaining the quantity of protein necessary for deep coverage of the proteome, as clinical specimens are often limited in size and available material. This challenge is particularly acute in the case of phosphoproteomics, which requires 100-fold more starting material than global profiling, due to the lower number of peptides with potential phosphosites and the low percentage of phosphorylation on those sites. The most widely utilized workflow for clinical proteomics studies seeking to achieve deep quantitative proteomic and phosphoproteomic measurements combines a tandem mass tag (TMT) isobaric labeling approach with an enrichment step to increase phosphopeptide specificity.15−17 In settings where the deepest achievable coverage of the phosphoproteome is a priority, it is recommended to use on the order of 300–400 μg of peptides in each TMT channel. While high-quality data and coverage of the global proteome can be achieved with significantly less material, phosphoproteome coverage is negatively impacted or requires specialized methods to recover.18,19 This creates a key challenge: the amount of protein available per patient varies because of external variables in the study that affect sample size. In these cases, the investigators need to decide whether to exclude sample-limited patients, to reduce the protein loading per patient for the entire study, or to include those individual patients at reduced protein loading.
In a data-dependent LC-MS/MS experiment, peptide identification is achieved by fragmenting tryptic peptides followed by database searching to obtain sequence information, which is then matched to the parent protein. To obtain confident peptide identifications, a sufficient number of peptide ions are needed to generate high-quality MS/MS spectra. A notable advantage of the TMT approach is that many patient samples can be combined to increase the available peptide amount, allowing the identification of low-abundance species, particularly when coupled with 2-dimensional LC separation.20 Recently, a number of groups have demonstrated a modified TMT labeling scheme that utilizes a “boosting” or “carrier” channel that increases the sensitivity for sample-limited samples.21−24 In these approaches, one or more TMT channels are used to label a larger representative sample that is then mixed with the patient samples for analysis. The high abundance of peptides in the boosting channel triggers MS/MS selection and provides the ion flux necessary for quality MS/MS spectra and confident peptide identification, while the reporter ions provide quantitative information on the samples of interest. 
While this approach enables profound gains in sensitivity for both global23 and phosphoproteomics24 workflows, the increased dynamic range of peptide concentration induced by the boosting sample(s) results in compromised quantification with issues such as reduced measurement precision and an increase in missing values.24,25 Further complicating the quantitative precision of TMT experiments is the concept of compositional data—as mass spectrometry measurements are made on a constrained number of ions allowed into the instrument, increasing the proportion of one component will impact the observable amount from other components.26 This issue will be exacerbated in settings where there are significant differences in the quantity of peptides loaded per channel; however, the boundaries for acceptable variability between samples remain unclear.
In this study, we set out to interrogate two common challenges of clinical proteomics experiments: the limited amount of protein available from patient samples of various sizes and the impact of differential channel loading on global and phosphoproteomics results using a well-established clinical workflow. The overarching goal of this study is to investigate the data quality trade-offs incurred when including patients with limited biomaterial in clinical proteomics investigations. First, we determined protein yields from patient samples of varying cell count to better define the range of possible peptide loadings we might expect to encounter in executing a clinical proteomics study. We then used these results to guide the design of two separate experiments using TMT 11-plexes: first, to evaluate the impact of loading differential amounts of peptide from the same biological sample across channels; next, to determine how differential peptide loading affects our ability to detect true differences between distinct acute myeloid leukemia (AML) cell lines. The results were examined for protein/phosphosite coverage and various aspects of quantitative reproducibility. Our results highlight the protein yields achievable from representative AML samples and demonstrate a thorough examination of the impacts of differential channel loading to provide researchers with a resource to make informed decisions concerning their study design.
Methods
Cell Counting
The clinical specimen used for this study was collected with informed consent from the patient according to a protocol approved by the Oregon Health & Science University institutional review board (IRB 4422; NCT01728402). Three cell pellets, each containing approximately 20 million peripheral blood mononuclear cells (PBMCs) isolated from a single deidentified patient, were combined into a single sterile 1.5 mL microcentrifuge tube. Each pellet was transferred to the sterile tube with 250 μL of 1× phosphate buffered saline (PBS). The original pellet tubes were rinsed with an additional 250 μL of 1× PBS that was quantitatively transferred to the pooled sample tube for a final volume of 500 μL. A 20× dilution stock was created to determine cell concentration and live cell count utilizing an Invitrogen Countess II FL cell counter and Invitrogen Countess disposable slides. The diluted cells were combined 1:1 with Trypan blue, and 20 μL was loaded on the slide and read four times; the values were then averaged. Once the total cell concentration was determined, a concentration series was created with three replicates each of 1E7, 5E6, 1E6, 5E5, 1E5, 5E4, and 1E4 cells.
Cell Lysis and Protein Extraction
Fresh lysis buffer was prepared, containing 8 M urea (Sigma-Aldrich), 50 mM Tris pH 8.0, 75 mM sodium chloride, 1 mM ethylenediamine tetra-acetic acid, 2 μg/mL aprotinin (Sigma-Aldrich), 10 μg/mL leupeptin (Roche), 1 mM PMSF in EtOH, 10 mM sodium fluoride, 100 μL of phosphatase inhibitor cocktail 2 and 3 (Sigma-Aldrich), 20 μM PUGNAc, and 0.01 U/μL Benzonase. Lysis buffer was added to samples based on cell concentration: 200 μL for 1E7 and 5E6 cells, 40 μL for 1E6 cells, 20 μL for 5E5 cells, 4 μL for 1E5 cells, and 1 μL for 5E4 and 1E4 cells. Once the lysis buffer was added, the samples were vortexed for 10 s and then placed in a thermomixer for 15 min at 4 °C and 800 rpm. To ensure cell lysis, samples were vortexed for an additional 10 s and incubated again for 15 min utilizing the same settings. After incubation, the samples were centrifuged for 10 min at 4 °C and 18 000 rcf to remove cell debris. Because the samples were viscous, additional lysis buffer was added to all samples to double the initial volume. Samples were then incubated twice more for 15 min at 25 °C and 500 rpm, with vortexing between incubations, and then centrifuged to remove cell debris. A single 5× dilution BCA (ThermoFisher) was performed on the supernatant to determine the protein yields for the varying cell concentrations. After examining the results of the protein yields, all of the samples were mixed into a large protein pool and concentrated with a 10k spin filter for use in downstream sample prep and MS analysis. A final 10× dilution BCA (ThermoFisher) was performed on the supernatant to determine final protein yield for digestion. Protein extraction from MOLM-14 and K652 human AML cells (1E8 of each cell type) was performed in a similar fashion in a total volume of 2 mL of lysis buffer.
Protein Digest
The pooled protein samples (separate pools from patient cells, MOLM-14 cells, or K652 cells) were diluted to a concentration of 8 μg/μL total protein with 50 mM Tris, pH 8.0, before reducing the samples with 5 mM dithiothreitol (DTT) (Sigma-Aldrich) for 1 h in a thermomixer set to 37 °C and 800 rpm. Reduced cysteine residues were alkylated with 10 mM iodoacetamide (IAA) (Sigma-Aldrich) for 45 min in a thermomixer set to 25 °C and 800 rpm in the dark. The samples were diluted 5-fold with 50 mM Tris-HCl, pH 8.0, and then initially digested with Lys-C (Wako) at a 1:50 enzyme:substrate ratio for 2 h in a thermomixer set to 25 °C and 800 rpm. Following the initial digest, trypsin (Promega) was added at a 1:50 enzyme:substrate ratio, followed by a 14 h incubation in a thermomixer set to 25 °C and 800 rpm. The digestions were quenched by acidifying the solution to 1% formic acid (FA), and the samples were centrifuged for 15 min at 1500 rcf to remove any remaining cell debris. Peptides were desalted using C18 solid phase extraction (SPE) cartridges (Waters Sep-Pak).
TMT Labeling
The concentrations of the pooled peptides from patient PBMCs or human AML cell lines were determined by BCA assay, and peptides were aliquoted in discrete amounts based on the experimental designs, ranging from 20 to 400 μg per channel for TMT labeling (Thermo Fisher). After drying the peptides down in a speed-vac, each sample was reconstituted with 50 mM HEPES, pH 8.5, to a concentration of 5 μg/μL. Each isobaric tag aliquot was dissolved in 40 μL of anhydrous acetonitrile to a final concentration of 20 μg/μL. The tag was added to the samples at a 1:1 μg/μg peptide:label ratio,27 incubated in a thermomixer for 1 h at 25 °C and 400 rpm, and then diluted to 2.5 μg/μL with 50 mM HEPES pH 8.5, 20% acetonitrile (ACN). Finally, the reactions were quenched with 5% hydroxylamine and incubated on the thermomixer for 15 min at 25 °C and 400 rpm. The samples for each multiplex set were then combined and concentrated in a speed-vac before a final C18 SPE cleanup. Each 11-plex experiment was fractionated into 96 fractions by high-pH reversed phase separation using a 3.5 μm Agilent Zorbax 300 Extend-C18 column (4.6 mm ID × 250 mm length). Peptides were loaded onto the column in buffer A [4.5 mM ammonium formate (pH 10) in 2% (v/v) acetonitrile] and eluted off the column using a gradient of buffer B [4.5 mM ammonium formate (pH 10) in 90% (v/v) acetonitrile], described in more detail below, for 96 min at a flow rate of 1 mL/min. After fractionation, samples were concatenated into 12 fractions.28
| time interval (min) | gradient (% mobile phase B) |
|---|---|
| 0 | 0 |
| 7 | 0 |
| 13 | 16 |
| 73 | 40 |
| 77 | 44 |
| 82 | 60 |
| 96 | 60 |
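The reagent volumes implied by the labeling conditions above follow from simple arithmetic, sketched below in Python. The function name `tmt_labeling_volumes` and its return keys are hypothetical and introduced only for illustration; the concentrations (5 μg/μL peptide, 20 μg/μL tag stock, 1:1 μg/μg ratio, 2.5 μg/μL after dilution) are taken from the text. The sketch ignores pipetting dead volume and the later hydroxylamine addition.

```python
def tmt_labeling_volumes(peptide_ug: float) -> dict:
    """Return illustrative reagent volumes (in microliters) for one TMT channel."""
    if peptide_ug <= 0:
        raise ValueError("peptide_ug must be positive")
    hepes_ul = peptide_ug / 5.0    # reconstitute peptides at 5 ug/uL in 50 mM HEPES
    label_ug = peptide_ug          # 1:1 ug/ug peptide:label ratio
    label_ul = label_ug / 20.0     # tag stock is 20 ug/uL in anhydrous acetonitrile
    final_ul = peptide_ug / 2.5    # target peptide concentration of 2.5 ug/uL
    # Volume of 50 mM HEPES / 20% ACN diluent needed to reach the target volume.
    diluent_ul = final_ul - hepes_ul - label_ul
    return {
        "hepes_ul": hepes_ul,
        "label_ul": label_ul,
        "diluent_ul": diluent_ul,
        "final_ul": final_ul,
    }


# Loading amounts used in the differential loading design (ug per channel).
for amount in (400, 200, 100, 40, 20):
    print(amount, tmt_labeling_volumes(amount))
```

Under these assumptions, the tag stock makes up one fifth of the reaction volume before dilution, so the mixture is already at 20% acetonitrile and the diluent keeps that proportion constant.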
Phosphopeptide Enrichment Using IMAC
A small aliquot (5% volume) of each of the 12 fractions from all multiplex sets was removed and vialed at 0.1 μg/μL in 3% ACN and 0.1% FA for an MS analysis of global protein abundance. For phosphopeptide enrichment, the remaining 95% of the 12 fractions were further combined to create six fractions per plex and dried by speed-vac. Fe3+-NTA-agarose beads were freshly prepared for phosphopeptide enrichment using Ni-NTA-agarose beads (Qiagen). Sample peptides were reconstituted to a 0.5 μg/μL concentration with 80% ACN and 0.1% TFA and incubated with 40 μL of the bead suspension for 30 min at room temperature (RT) in a thermomixer set at 800 rpm. After incubation, the beads were washed with 100 μL of 80% ACN and 0.1% TFA and 50 μL of 1% FA to remove any nonspecific binding. Phosphopeptides were eluted off beads with 210 μL of 500 mM K2HPO4, pH 7.0, directly onto C18 stage tips and eluted from C18 material with 60 μL of 50% ACN and 0.1% FA. Samples were dried in a speed-vac concentrator and reconstituted with 12 μL of 3% ACN and 0.1% FA prior to MS analysis.
LC-MS/MS Analysis
The pooled proteomics fractions were separated using a Waters nanoACQUITY UPLC system equipped with a homemade 75 μm I.D. × 50 cm length C18 PicoFrit (New Objective) column packed with ReproSil-Pur 120 Å, C18-AQ, 1.9 μm. A 110 min gradient of 100% mobile phase A [0.1% (v/v) formic acid in water] to 60% (v/v) mobile phase B [0.1% (v/v) FA in acetonitrile] was applied to each fraction. The column was equipped with a 20 cm Nanospray column heater (Phoenix S & T). The separation was coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher) for MS/MS analysis. MS spectra were collected from 350 to 1800 m/z at an MS1 resolution setting of 60 000, with a maximum injection time of 50 ms and the Orbitrap AGC set to 400 000. The top 20 most intense ions were selected with an isolation width of 0.7 m/z for higher energy collision dissociation (HCD); +1 charged species were excluded, and the dynamic exclusion window was set at 45 s. MS2 spectra were acquired at a mass resolution of 50 000, with a maximum injection time of 105 ms and the Orbitrap AGC set to 100 000. Phosphoproteomics fractions were separated as described above, with the LC gradient length extended to 200 min for each fraction. The separation was coupled to the same Lumos mass spectrometer with the same acquisition method.
Proteomics Data Processing
The obtained MS/MS spectra were searched with the MS-GF+ tool29 against the UniProt human database (downloaded in October 2018) for peptide sequence identification. Carbamidomethylation on cysteine residues and TMT-11 modifications on lysine residues and the N termini were set as fixed modifications, with oxidation on methionine residues as a dynamic modification. For a determination of TMT labeling efficiency, a separate MS-GF+ search was performed with TMT-11 modifications set as a dynamic modification, and the numbers of identified peptides with and without TMT-11 modifications were used to calculate labeling efficiency. For phosphoproteomics, phosphorylation on serine, threonine, and tyrosine residues was set as a dynamic modification. Localization of phosphorylation modifications was performed using the Ascore algorithm.30 A target-decoy approach was used to control false discovery. Peptide spectrum matches (PSMs) were filtered using a precursor mass tolerance of 10 ppm and PepQvalue <0.01. In both data sets, this resulted in a less than 1% FDR at the unique peptide level. For identified peptides, the TMT reporter ion intensities were extracted by MASIC31 with the following filtering thresholds: signal-to-noise ratio = 0; interference score = 0.9. Data from all fractions of a multiplex (12 fractions for global abundance and 6 fractions for phosphopeptide abundance per plex) were aggregated based on common peptides or proteins (global proteomics) or phosphopeptides (phosphoproteomics). Data aggregation and processing were performed with RStudio software-based tools developed by our team and available via GitHub (https://github.com/vladpetyuk).
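As a minimal illustration of the PSM filtering criteria and the labeling efficiency calculation described above, the following Python sketch may be helpful. It is not the pipeline used in this study: the function names are hypothetical, the 10 ppm tolerance is assumed to be symmetric, and labeling efficiency is assumed to be the fraction of identified peptides that carry the TMT modification in the dynamic-modification search.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def passes_psm_filter(
    observed_mass: float,
    theoretical_mass: float,
    pep_q_value: float,
    ppm_tolerance: float = 10.0,  # precursor mass tolerance from the text
    q_cutoff: float = 0.01,       # PepQvalue threshold from the text
) -> bool:
    """Keep a PSM only if it meets both the mass tolerance and q-value criteria."""
    within_tolerance = abs(ppm_error(observed_mass, theoretical_mass)) <= ppm_tolerance
    return within_tolerance and pep_q_value < q_cutoff


def labeling_efficiency(n_labeled: int, n_unlabeled: int) -> float:
    """Assumed definition: labeled peptides / all identified peptides."""
    total = n_labeled + n_unlabeled
    if total == 0:
        raise ValueError("no peptides identified")
    return n_labeled / total
```

For example, `passes_psm_filter(1000.004, 1000.0, 0.005)` evaluates a PSM with a mass error of about 4 ppm and a q-value of 0.005, which satisfies both criteria.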
Statistical Analysis
Quantification reproducibility was assessed by calculating the percent coefficient of variation (%CV) among replicates of each loading group using TMT reporter ion intensities that had been aggregated to the peptide or protein level and global median shift-normalized. For downstream analysis, raw reporter ion intensities were aggregated and log2-transformed, and data for each sample within a TMT set were divided by the universal reference channel of that plex (designated as channel 131C). Within each sample, the central tendency method was used for data normalization for the global data based on medians.32 For phosphoproteomics, each TMT channel was normalized based on the coefficients derived from global data.33 Principal component analysis (PCA) was applied to demonstrate the clustering of the different loading groups. Unequal variance t tests were performed to determine statistical differences between the individual loading groups. Statistical results were adjusted for multiple hypothesis testing using the Benjamini–Hochberg procedure, and an adjusted p-value <0.05 was considered as the statistical significance cutoff. Data analysis, statistical tests, and visualizations were implemented in the R language for statistical computing using the following packages: dplyr, reshape2, vp.misc, tibble, tidyverse, ggplot2, and ComplexHeatmap.
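The reference-ratio, median-centering, and %CV steps described above can be sketched as follows. The analysis in this study was implemented in R; the Python below is only an illustrative approximation using the standard library, with hypothetical function names and `None` standing in for missing values. The %CV is assumed to be 100 × sample standard deviation / mean on linear-scale intensities.

```python
import math
import statistics


def log2_ratio_to_reference(sample, reference):
    """Per-feature log2(sample / reference); None where either value is missing."""
    ratios = []
    for s, r in zip(sample, reference):
        if s is None or r is None or s <= 0 or r <= 0:
            ratios.append(None)
        else:
            ratios.append(math.log2(s / r))
    return ratios


def median_center(values):
    """Subtract the sample's median from every observed value (central tendency)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    med = statistics.median(observed)
    return [None if v is None else v - med for v in values]


def percent_cv(replicates):
    """%CV across replicates of one feature; None if it cannot be computed."""
    observed = [v for v in replicates if v is not None]
    if len(observed) < 2:
        return None
    mean = statistics.fmean(observed)
    if mean == 0:
        return None
    return 100.0 * statistics.stdev(observed) / mean
```

In this sketch, each channel would first be passed through `log2_ratio_to_reference` against the 131C channel of its plex and then through `median_center`, while `percent_cv` would be applied per feature across the replicates of a loading group.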
Results
Experimental Design, Protein Yield, and Peptide Identifications
In clinical proteomic studies, biospecimen availability can be a constraining factor depending on the tissue of interest being analyzed. In this work, we used peripheral blood mononuclear cells (PBMCs) isolated from AML patients as a model and followed a modified version of the CPTAC TMT-based clinical proteomics pipeline.7 An illustration of the general workflow used to process the samples, from cell lysis through LC-MS/MS analysis, is presented in Figure 1. To determine the amount of protein obtainable from patient samples of varying sizes and volumes, we set up three replicates each of AML patient cells representing 1E7, 5E6, 1E6, 5E5, 1E5, 5E4, and 1E4 total cell counts. Samples were lysed in proportional amounts of lysis buffer; the protein concentration was determined by BCA assay, and total protein yield was calculated. As shown in Figure 2, obtaining 400 μg of total protein—an amount commonly used in clinical experiments across the proteomics community—required greater than 10 million cells (Figure 2A). Lower numbers of cells resulted in a linear decrease in protein yields, with less than 1 μg of total protein extracted from smaller amounts of cells (5E4 and 1E4 cells). While these lower protein yields are substantially less than what is preferred for multiplexed proteomic studies, the sample sizes represent total amounts of biomaterial isolated in real-world clinical settings where available material is constrained. In these cases, researchers may face difficult decisions about the inclusion of samples in experimental designs; thus, we set out to evaluate the impact of lower and varying protein amounts on LC-MS/MS peptide and protein identification.
Figure 1.
Clinical proteomics workflow. Diagram illustrating the steps involved in sample processing and data acquisition for our clinical proteomics workflow.
Figure 2.
Design of TMT multiplexes with differential peptide loading. (A) Protein extraction yields obtained from cell pellets of decreasing cell counts. (B) Design of the two TMT11 multiplexes with differential amounts of peptide loaded per channel. (C) Total TMT reporter ion intensity obtained per channel from global proteomics data sets. TMT labeling efficiency was determined to be >99% for each plex and is reported in the plot headers. (D) Relationship between the amount of peptide loaded per channel and the median TMT reporter ion intensities acquired.
As technical variability in proteomics workflows (i.e., protein extraction and digestion) is unavoidable—and likely amplified when sample size, protein amount, and buffer volumes differ—we opted to pool all protein extracted from the different cell pellets and carry out digestion in a single reaction (Figure 1). From this homogeneous pool of starting material, we generated aliquots containing either 400 μg (a conventional amount loaded into each channel of clinical TMT proteomics experiments), 200 μg, 100 μg, 40 μg, or 20 μg of total peptides to assess the effect of variable peptide amount. We then prepared 2 sets of samples for labeling with TMT 11-plex reagents, with each set containing 2 replicates of each peptide aliquot distributed randomly through the first 10 TMT channels (Figure 2B). An additional aliquot of 400 μg of peptides was included in each plex for use as a universal reference and was assigned to the 131C channel. This experimental design—standardizing the source of peptides in each channel from a large pool homogenized after digest and clean up—eliminates several sample handling-related sources of variability and helps ensure that differences detected in downstream data processing and analysis are based on peptide loading. Following TMT labeling, the samples were mixed into their respective multiplexes, desalted by C18 SPE, fractionated by high-pH reverse-phase HPLC, and concatenated into 12 total fractions. An aliquot of each fraction was removed for global proteomics analysis, and the remaining material was further concatenated into 6 fractions and underwent immobilized metal affinity chromatography (IMAC) phosphopeptide enrichment. Global and phosphopeptide-enriched samples were analyzed by LC-MS/MS, and data were used to evaluate reporter ion intensities for each TMT channel as well as calculate total numbers of peptide and protein identifications.
Figure 2C displays the overall labeling efficiency calculated in each plex as well as the total reporter ion intensities acquired for each channel across both TMT11 multiplexes. In general, channels with equivalent peptide loadings showed consistent reporter ion intensities within and across the 2 plexes (Figure 2C). Furthermore, the median intensities for different channels increase linearly with the amount of peptide loaded per channel, demonstrating the general quantitative information achievable through the utilization of the TMT methodology (Figure 2D). Data from global and phospho-enriched fractions were used to determine the number of unique peptides and proteins identified from these samples (Table 1). Additionally, Table 1 displays the number of peptides/proteins from global and phosphoproteomic data sets that are quantified in 25%, 33%, 50%, and 100% of samples—cut-offs commonly applied to proteomics data sets prior to statistical analysis.
Table 1. Peptide and Protein Identifications
| | global proteomics | phosphoproteomics |
|---|---|---|
| unique peptide identifications (total) | 138 373 | 27 351 |
| unique peptides quantified in >25% of samples | 137 302 (99.2%) | 26 753 (97.8%) |
| unique peptides quantified in >33% of samples | 134 399 (97.1%) | 24 230 (88.6%) |
| unique peptides quantified in >50% of samples | 118 418 (85.6%) | 17 409 (63.7%) |
| unique peptides quantified in 100% of samples | 45 925 (33.2%) | 3 366 (12.3%) |
| unique protein identifications (total) | 8926 | NA |
| unique proteins quantified in >25% of samples | 8910 (99.8%) | NA |
| unique proteins quantified in >33% of samples | 8887 (99.6%) | NA |
| unique proteins quantified in >50% of samples | 8722 (97.7%) | NA |
| unique proteins quantified in 100% of samples | 7641 (85.6%) | NA |
Number of unique peptides and proteins identified from global proteomics data sets and phosphoproteomics data sets across the two experimental TMT11 multiplexes. Unique peptide counts were filtered based on presence in >25%, 33%, 50%, and 100% of sample channels, and percentages displayed represent the fraction of overall unique peptide identifications that pass the filtering criteria.
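The completeness cut-offs summarized in Table 1 can be illustrated with a short Python sketch. The data layout (a dictionary mapping each feature to its per-channel values, with `None` marking a missing value) and the function names are assumptions made for illustration, not the structure used by the study's R tools.

```python
def fraction_quantified(channel_values):
    """Fraction of channels in which a feature has a non-missing value."""
    present = sum(value is not None for value in channel_values)
    return present / len(channel_values)


def completeness_summary(table, cutoffs=(0.25, 1 / 3, 0.5)):
    """Count features quantified in more than each cutoff fraction of channels.

    `table` maps a feature identifier to its list of per-channel values.
    """
    fractions = [fraction_quantified(values) for values in table.values()]
    summary = {"total": len(fractions)}
    for cutoff in cutoffs:
        # Strict inequality, matching the ">25%", ">33%", ">50%" rows of Table 1.
        summary[f">{cutoff:.0%}"] = sum(f > cutoff for f in fractions)
    summary["100%"] = sum(f == 1.0 for f in fractions)
    return summary


# Toy example with four channels; the values are placeholders, not study data.
toy = {
    "pepA": [1.0, 2.0, 3.0, 4.0],
    "pepB": [1.0, None, 3.0, None],
    "pepC": [None, None, None, 4.0],
    "pepD": [1.0, 2.0, 3.0, None],
}
print(completeness_summary(toy))
```

The strict comparison is a design choice that mirrors the table rows; a feature present in exactly 50% of channels is not counted in the ">50%" row.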
Effects of Differential Peptide Loading on Missing Data
While our results demonstrate that reporter ion intensities correlate strongly with peptide loadings, and multiplex experiments with differentially loaded channels yield good proteome coverage, a larger question remains regarding the impact of differential loading on quantitative data reproducibility and reliability. In multiplex proteomic experiments, missing data pose a significant challenge, especially when comparing across multiple TMT plexes.25 Indeed, when evaluating these data sets, a clear trend of increasing missingness was evident in the data as peptide loading amounts decreased (Figure 3A–C). In general, this issue was more pronounced in the phosphoproteomic data sets (Figure 3C), as the differences in samples were likely exacerbated by the phosphopeptide enrichment protocol. The effects of missing data in global proteomics data sets can be largely mitigated by rolling peptide identifications/quantifications up to the protein level (Figure 3B); however, in cases where comparisons are to be made between individual peptide intensities (i.e., phosphoproteomic data sets), missing data can have tremendous implications on downstream statistical analysis. A comparison of the peptides identified in all replicates of each loading group illustrates that as channel loading decreases to 40 μg or 20 μg, we begin to see increasing numbers of peptides that are not quantified in these samples (Figure 3D,E).
Figure 3.
Missing data increase in channels with lower peptide loadings. Percentage of the total identified features [peptides in global proteomics (A), proteins in global proteomics (B), or phosphopeptides in phosphoproteomics (C)] that are quantified in the 4 replicates of each peptide loading group. (D, E) Upset plots demonstrating the overlap of peptides that were quantified (in all four replicates) of each differential peptide loading group. (F–H) Comparison of rates of missing data across channels of TMT plexes loaded with differential amounts of peptide (differential loading) vs loaded with equal peptide amounts (standard loading). Missing data were evaluated at the peptide level in phosphoproteomics data sets (F), or the peptide level (G) or protein level (H) in global proteomics data sets.
We then sought to compare the levels of missing data in these differentially loaded TMT plexes with those that might arise in a standard TMT experiment where all channels contain equivalent peptide loadings. To this end, we analyzed data generated in our laboratory from an experiment using two TMT11 multiplexes where all channels were loaded with 400 μg of peptides derived from similar biological material (human AML cell lines), processed with the same sample preparation protocols, fractionated into 12 global fractions and 6 phospho fractions per plex, and analyzed on the same instrument with the same acquisition settings (these data are deposited on the MassIVE repository under the same accession as the data from the differential loading experiment). This comparison clearly shows that differential peptide loading results in higher levels of missing data within multiplexes: on average, only 36% of phosphopeptides were quantified in all 10 channels, and only 83% of phosphopeptides were quantified in more than 6 channels (Figure 3F). These rates of missing data are significantly higher than those seen in the standard loading experiment—on average, greater than 95% of phosphopeptides are observed in all channels, and over 99% are observed in more than 6 channels (Figure 3F). While the issue of missing data is more apparent in phosphoproteomics measurements likely due to the low abundance of enriched phosphopeptides, the problem still exists in global proteomics. At the peptide level, only 76% of observations were quantified in all channels of either multiplex, while 96% of observations were quantified in more than 6 channels (Figure 3G). Global proteomics measurements benefit from the aggregation of data to the protein level; when evaluating quantification at the protein level, 96% of observations have values in all channels of either plex (Figure 3H). 
Again, these values are lower than standard, equally loaded TMT experiments, where ∼99% of peptides and proteins are typically observed in all channels (Figure 3G,H). In all cases, the higher levels of missing data occur in channels with lower peptide loading, which we attribute to the reduced signal-to-noise ratio for these channels (Figure S2).
Statistical Impacts of Differential Channel Loading
Before making any comparisons across differential loading groups, data for each sample were normalized by the central tendency method based on median values,32,33 a standard approach in proteomics data analysis that accounts for technical variations between samples (Figure S1). Following median normalization, a principal component analysis (PCA) of both global and phosphoproteomic data sets indicates that peptide loading influences data quantification at a certain threshold: while 400, 200, and 100 μg samples all group reasonably close to one another postnormalization, samples with 40 or 20 μg of peptides drift away from the other samples and show more variation within the replicates (Figure 4A,B). Additionally, the reproducibility within each loading group decreases as a function of peptide quantity, demonstrated by plotting the percent coefficient of variation calculated from raw TMT reporter ion intensities among replicates in both global and phosphoproteomic data sets (Figure 4A,B). These data indicate that as the amount of peptide loaded per channel decreases, the precision of the reporter ion intensity measurement decreases. In settings where comparisons are to be made between channels (i.e., when comparing patient samples), large variations in the amount of material loaded may impair the ability to discern statistically relevant biological differences.
Figure 4.
Impact of loading quantity on data reproducibility and statistics. (A, B) Visualization of intragroup data reproducibility through principal component analysis (PCA) and coefficients of variation (%CV) plots calculated across the four replicates of each loading group. (C, D) Density plots of p-value histograms resulting from unequal variance t tests comparing individual loading groups with the 400 μg standard loading amount at the protein level from global proteomics data sets or peptide level from phosphoproteomics data sets.
Although visualization by principal component analysis (PCA) suggests that samples with lower peptide loading cluster less tightly than samples with higher loadings, we sought a better understanding of the effects of differential channel loading on statistical data analysis. Based on the unequal variances detected among the loading groups, we used unequal variance t tests to compare each sample loading group to the 400 μg sample group. As the samples were all derived from a common pool of peptide digest, we assume that there should be no statistically significant differences between peptide loading levels. In both global and phosphoproteomic data sets, we observe more differences from the 400 μg sample group as peptide loading decreases. While few proteins or peptides remain significantly different after a correction for multiple hypothesis testing (defined as Benjamini–Hochberg adjusted p-value <0.05), p-value histograms comparing the 20 μg or 40 μg samples with the 400 μg sample group show a more anticonservative distribution, suggesting larger quantitative differences (Figure 4C,D). Combined, these data demonstrate that, using standard data normalization methods, 4-fold differences in channel loading can be effectively corrected without significant impacts on quantification precision, while more drastic differences in channel loading (i.e., 10-fold or 20-fold) result in quantitative challenges that can lead to the detection of statistical differences that are not truly present in the sample set (i.e., false-positives).
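For reference, the unequal variance (Welch) t test statistic and its Welch–Satterthwaite degrees of freedom can be sketched as below. This is a generic textbook formulation written with the Python standard library only, not the code used for the analysis; the function name and the example values are hypothetical. Converting the statistic to a p-value requires the Student t distribution, which is omitted here to keep the sketch dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values for one feature: four replicates per loading group.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

With four replicates per group, the degrees of freedom fall below the pooled value of 6 whenever the group variances differ, so a noisier low-loading group yields a more conservative test for the same difference in means.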
Impact of Differential Channel Loading on the Detection of Known Differences
While false-positives detected across replicates of peptides from the same biological material are problematic, real-world clinical proteomics multiplex experiments are not designed with an identical biological sample in every channel. Rather, the goal is to detect true differences between patients. A concern in this setting is the inability to detect these differences (i.e., false-negatives) due to decreased data precision and quantitative reproducibility resulting from differential channel loading. To investigate this, we designed a new experiment comparing two distinct human AML cell lines (MOLM-14 and K652). We sought to evaluate the differences in the proteome and phosphoproteome of these cell lines when comparing channels loaded in equal amounts (400 vs 400 μg) or differentially (400 vs 100 μg; 400 vs 40 μg; 400 vs 20 μg). Peptides from MOLM-14 cells were loaded at the standard amount of 400 μg per channel, while peptides from K652 cells were loaded at differential amounts ranging from 20 to 400 μg per channel (Figure S3). After TMT-labeling, samples were processed in an identical manner to the previous experiment and analyzed on the same LC-MS system with identical instrument settings (with 12 global proteomic fractions and 6 phosphoproteomic fractions per multiplex). An analysis of the collected data sets was performed in the same manner as prior data sets, and these data were used to evaluate the differences between the cell types at the protein, peptide, or phosphopeptide level and determine the impact of loading differential peptide amounts on detecting these differences.
We first compared global protein abundance between the cell types using t tests, controlling the FDR by a Benjamini–Hochberg (BH) p-value adjustment. We treated each set of K652 loading replicates (400, 100, 40, or 20 μg) as a distinct group with which to compare the MOLM-14 replicates (loaded with 400 μg per channel). Overlaid histograms of the adjusted p-value distributions for each comparison show a notable loss of features with statistical significance as the K652 peptide loading amounts decrease (Figure 5A). As displayed in Figure 5B, major differences between the cell types are detectable when comparing the channels loaded with 400 μg of peptides: of the 7522 proteins quantified, 5457 were statistically significant with an adjusted p-value <0.05. When comparing 400 μg MOLM-14 channels to those loaded with less K652 peptide, the ability to detect differences decreases as a function of peptide loading. While a 1:4 ratio retains most of the differences (4863 statistically significant proteins), ratios of 1:10 and 1:20 show dramatic decreases in statistical significance, capturing only 2666 and 763 statistical differences, respectively. Additionally, as we evaluate the fold change values calculated from the different comparisons, we see that results from unequal peptide loading comparisons correlate less well with results from the 400 μg comparison (Figure 5C–E). Again, we see that the impact of decreasing the loading by a factor of 4 is relatively minor, as fold change values are fairly consistent (adjusted R² = 0.97, slope = 0.89, Figure 5C). However, 1:10 and 1:20 loading differences skew the observed fold changes compared to results from equal 400 μg loadings, resulting in decreased R² values and lower slopes of the linear regressions (Figure 5D,E). The decreased slopes are indicative of ratio compression in the lower channel loadings, resulting in artificially lower fold change values in general.
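The Benjamini–Hochberg adjustment referred to above can be sketched as follows. This is the standard step-up procedure written in plain Python for illustration; the function name and the example p-values are hypothetical, and the study's own implementation is not shown in this section.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone: each is the minimum of p * m / rank over all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adj]
```

A feature is then counted as statistically significant when its adjusted value is below 0.05, which is the threshold used throughout this section.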
Figure 5.
Differential loading affects the detection of true biological differences. (A) Density plots of adjusted p-values from comparisons of global protein abundances in MOLM-14 and K652 cells with the indicated peptide loadings. (B) Upset plots comparing the statistically significant protein-level results from each of the indicated MOLM-14 vs K652 comparisons. (C–E) Correlation of the calculated fold change for each protein when comparing various K652 peptide loadings with 400 μg of MOLM-14 peptides. (F) Density plots of adjusted p-values from comparisons of phosphopeptide abundances in MOLM-14 and K652 cells with the indicated peptide loadings. (G) Upset plots comparing the statistically significant phosphopeptide-level results from each of the indicated MOLM-14 vs K652 comparisons. (H–J) Correlation of the calculated fold change for each phosphopeptide when comparing various K652 peptide loadings with 400 μg of MOLM-14 peptides.
The effects of differential loading in this experiment are again more pronounced at the phosphopeptide level: in this setting, even a 1:4 reduction of the K652 peptide quantity results in the loss of over 40% of statistically significant results (Figure 5F,G). While the majority of the significant phosphopeptides from this comparison are also statistically significant when comparing 400 μg loadings of the cell types, approximately 14% of the significant results are unique, suggesting that false-positives are more prominent in the phosphoproteomics data (Figure 5G). Further reduction of the loading amount to 10- or 20-fold less has an even larger impact, with nearly 90% of statistical differences not being detected in the 400 vs 20 μg comparison (Figure 5G). As seen with the global protein abundance data, the resulting fold changes from each comparison are less consistent (and generally compressed) as the K652 loading amount decreases (Figure 5H–J). Together, these data demonstrate the negative impact of differential peptide loading on the ability of researchers to detect true biological differences between samples and again suggest that differences greater than 4-fold should be carefully weighed when designing experiments.
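The slope and R² used to summarize agreement between fold changes can be computed with an ordinary least-squares fit, sketched below. The sketch returns the unadjusted R² rather than the adjusted R² reported in the text, and the function name and example values are hypothetical; a slope below 1 with a high R² corresponds to the ratio compression described above.

```python
import statistics
from typing import Sequence, Tuple


def slope_and_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of y on x and the (unadjusted) R-squared of the fit."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical log2 fold changes: equal loading (x) vs. reduced loading (y),
# where every reduced-loading value is compressed to half its reference value.
reference_fc = [-2.0, -1.0, 0.0, 1.0, 2.0]
reduced_fc = [-1.0, -0.5, 0.0, 0.5, 1.0]
slope, r2 = slope_and_r2(reference_fc, reduced_fc)
```

In this constructed example the fold changes remain perfectly correlated but uniformly compressed, which shows why the slope and R² need to be read together.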
Discussion
In this study, we explored the practical limits for loading differential amounts of peptides across channels in TMT11 multiplex experiments using multiple experimental approaches. First, biologically identical material derived from a common peptide pool generated from the white blood cells of a single patient was analyzed in aliquots ranging from an amount commonly used for clinical proteomics experiments (400 μg of peptide per TMT channel) down to 20 μg per channel (representing 20-fold less peptide). This experimental design isolated the effect of peptide loading quantity and eliminated much of the variation that would be introduced during sample processing steps, particularly variation related to the processing of samples of varying quantity or concentration; this should be taken into consideration when interpreting the results. Samples were randomly divided across 2 TMT11 multiplexes and processed for LC-MS/MS-based global proteomics and phosphoproteomics measurements using a standard clinical proteomics protocol developed under the CPTAC. The analysis of the data generated from this work aimed to address two aspects of concern for quantitative proteomics experiments: missing data and data reproducibility.
It is readily clear from our results that peptide quantity and missing data are inversely correlated at the channel level; that is, as lower amounts of peptide are labeled in a channel, the number of features for which no quantitative information is obtained increases. While it is well-established that missing data are a major challenge in proteomics analyses,34,35 this typically only arises when comparing data acquired across multiple TMT sets, and there are generally very low rates of missingness within a single TMT plex when all channels contain equivalent amounts of labeled peptide.36 Our data illustrate an exacerbation of the cross-plex missing data problem, with more than 50% of phosphopeptide identifications failing to be quantified in channels loaded with only 20 μg, as well as a significant increase in the levels of missing data within a single plex (Figure 3). In addition to higher levels of missing data, TMT channels loaded with less peptide also displayed increased variation among the replicates. Coefficient of variation values steadily increased as peptide loadings were reduced, with noticeable differences present in the 40 and 20 μg samples, particularly in phosphoproteomics data. Additionally, PCA illustrated that samples in the 40 and 20 μg groups began to separate from the 400, 200, and 100 μg samples, which all clustered tightly together. Statistical testing of each loading group against the 400 μg group revealed differences in the 40 and 20 μg groups, and while most differences are not statistically significant following a multiple hypothesis testing adjustment, the distributions of p-values from these comparisons are weighted toward lower p-values, indicative of discrepancies in quantitation.
Together, these data suggest that 10-fold or greater sample loading differences lead to increased variation and negative effects on TMT measurement precision that will likely impact the confidence in statistical interpretations of the samples. Importantly, it should be noted that this work was performed using high-resolution MS2 data acquisition; while other acquisition methods such as SPS-MS3 and RTS-MS3 may allow for more variability in sample loading amounts, these were not evaluated in this study.
The impacts of differential peptide loading on missing data and quantitative reproducibility have a compounding effect when one is interested in comparing samples. First, the amount of missing data present in samples with lower loadings reduces the number of observations for which a statistical analysis can be confidently performed. Second, among the features that have enough observations for statistical testing, quantitation is negatively impacted in samples with lower peptide loadings—likely due to signals closer to the noise level of the mass spectrometer (Figure S2), leading to increased variation when compared with higher loading groups. Importantly, the samples in this experiment were all derived from a common biological source; in true clinical studies, negative impacts on reproducibility will increase quantitative variability and reduce the statistical power, hindering the ability to confidently detect differences between patients, tissue types, or normal vs diseased samples.33,37 Furthermore, as true clinical samples are processed individually in the laboratory, differences in protein yields will likely increase the variation among samples and thus decrease the range of loading differences that are compatible with TMT multiplexing.
We next designed an experiment to more closely mimic a true clinical study and evaluate the impacts of differential loading on it, using peptides from distinct AML cell lines loaded in varying amounts and looking for differences in the global proteome and phosphoproteome. Consistent with our first experiment, channels with lower peptide loadings had increased missing values and quantitative variability. When comparing equal 400 μg loadings of MOLM-14 and K652 cells, differences were readily detectable at both the global protein and phosphopeptide level (approximately 73% and 31% of detected features statistically different with BH-adjusted p-value <0.05, respectively). The more we lowered the amount of K652 peptides loaded, the fewer of these differences we were able to detect; at the 1:20 ratio (comparing 20 μg loadings of K652 peptides with 400 μg loadings of MOLM-14 peptides), only 10–15% of these differences were detected as significant (Figure 5). This appears to result from a combination of increased data missingness and variability: many features were not quantified in channels loaded with less peptide, and those that were had less reproducible measurements. Additionally, we observe an increase in ratio compression as channel loading decreases, as the calculated fold changes for proteins or phosphopeptides generally decrease as loading differences become more extreme.
To further illustrate the impact differential loading can have on clinical proteomics experiments, we calculated and plotted the standard deviations for each phosphopeptide when using only the samples loaded with 400 μg of peptide, or when combining any of the other loading groups with the 400 μg samples. As lower loading groups are combined with the 400 μg samples, standard deviation measurements increase (Figure S4A). Using the mean standard deviation for each sample set, we estimated the sample size per group necessary to measure fold changes ranging from 1.2- to 2-fold with statistical significance (Figure S4B). While larger fold changes are detectable in any sample set with a reasonable number of patients per group (<2 patients per group for equal 400 μg loadings; 7.8 patients per group with 20-fold loading differences), smaller fold changes (1.2- and 1.4-fold) require a dramatic increase in the number of patients per group to detect statistical significance when combining the more variable 20 μg loading group with the 400 μg loading group. Compared to equal 400 μg loading, where only 4.8 or 2.4 patients per group are required to detect fold changes of 1.2 and 1.4, respectively, combining 400 and 20 μg loadings increases the necessary sample sizes to 100 and 29 patients per group. From a clinical standpoint, obtaining and analyzing samples from this number of subjects can be a major challenge.
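The dependence of the required sample size on the standard deviation and the target fold change can be illustrated with the common normal-approximation formula for a two-group comparison, n = 2(z₁₋α/₂ + z_power)²σ²/δ², where δ is the log2 fold change. The method actually used for Figure S4B is not stated in this section, so the formula, the significance level (0.05), the power (0.8), and the standard deviation values below are all assumptions chosen for illustration and are not expected to reproduce the reported numbers.

```python
import math
from statistics import NormalDist


def n_per_group(sd_log2: float, fold_change: float,
                alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate samples per group needed to detect a given fold change.

    sd_log2 is the standard deviation of log2 abundances; alpha is two-sided.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = math.log2(fold_change)
    return 2 * (z * sd_log2 / delta) ** 2


# Hypothetical standard deviations: doubling the SD quadruples the required n.
n_low_sd = n_per_group(0.2, 1.2)
n_high_sd = n_per_group(0.4, 1.2)
n_two_fold = n_per_group(0.5, 2.0)
```

Because the required n scales with the square of the standard deviation and with the inverse square of the log2 fold change, modest increases in variability have a disproportionate effect on the detection of small fold changes, in line with the trend described above.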
A reproducible and in-depth proteomic analysis of samples smaller than 500 000 cells requires significant improvements in sensitivity over standard approaches. The necessary technological improvements are actively being pursued in our lab24,38−41 and others23,42−44 with great success in recent years. However, much more work is needed to make these technologies available to less specialized laboratories.
Conclusions
From this work, we conclude that, when designing TMT multiplex experiments, researchers should carefully consider the potential impacts on data quality of loading differential amounts of peptides across channels. Although minimal, decreases in signal-to-noise and increased variance can be observed for even 2-fold loading differences. Further, even with 20-fold differential peptide loading, true significant differences can accurately be detected. Using these experimental data as a guide, it is possible to minimize the effects of 4-fold peptide loading differences on downstream statistical analysis, when using standard instrument parameters and common data normalization techniques such as global median-centering. In cases where larger differences in loading are desired, researchers should consider adjusting instrument parameters, such as increasing AGC and maximum injection time, to accommodate the reduced signal-to-noise.45,46 It is important to note that the instrument parameters chosen in this study were optimized for protein coverage, and increasing AGC and injection time will also result in trade-offs: importantly, reduced protein coverage. Although we would recommend maintaining loading differences of 4-fold or less based on these results, researchers can use these data to independently assess the trade-offs associated with including sample-limited patients in their study design. For example, in cases where only a small number of patients have limited material and n is large, it may be most prudent to leave out these patients to preserve the highest-quality data. Conversely, the use of sample-limited patients may be required to create a properly balanced study design, and the researcher needs to account for loss of statistical power resulting from increased variance and data missingness.
Acknowledgments
Parts of this research were performed using EMSL, a national scientific user facility sponsored by the Department of Energy’s Office of Biological and Environmental Research and located at PNNL.
Glossary
Abbreviations
- AML: acute myeloid leukemia
- PBMCs: peripheral blood mononuclear cells
- TMT: tandem mass tag
- LC-MS/MS: liquid chromatography tandem mass spectrometry
- PBS: phosphate buffered saline
- DTT: dithiothreitol
- IAA: iodoacetamide
- SPE: solid phase extraction
- IMAC: immobilized metal affinity chromatography
- CPTAC: Clinical Proteomics Tumor Analysis Consortium
- PCA: principal component analysis
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jasms.1c00169.
Additional figures including boxplots, signal-to-noise ratios, peptide loading in the experimental design, and sample size calculations (PDF)
Accession Codes
Mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium47 via the MassIVE partner repository (http://massive.ucsd.edu) with the data set identifier MSV000086417.
Author Contributions
The study design was conceived by P.D.P., W.-J.Q., T.L., B.J.D., and K.D.R. Patient samples were provided by B.J.D. Patient samples were prepared by C.E.T., J.R.H., and M.A.G. Mass spectrometry analysis was carried out by K.K.W. Data analysis was done by Y.W., J.A.S., T.J.S., V.A.P., and P.D.P. The manuscript was written by J.A.S. and P.D.P., with critical review from K.D.R. and C.E.T.
This work was supported by The Proteogenomic Translational Research Center for Clinical Proteomics Tumor Analysis Consortium, U01 CA214116, from NIH NCI to B.J.D. and K.D.R., U24CA210955 from NIH NCI to T.L., and R01 DK122160 from NIH NIDDK to W.-J.Q.
The authors declare no competing financial interest.
References
- Doll S.; Gnad F.; Mann M. The Case for Proteomics and Phospho-Proteomics in Personalized Cancer Medicine. Proteomics: Clin. Appl. 2019, 13 (2), 1800113. 10.1002/prca.201800113.
- Piehowski P. D.; Petyuk V. A.; Sontag R. L.; Gritsenko M. A.; Weitz K. K.; Fillmore T. L.; Moon J.; Makhlouf H.; Chuaqui R. F.; Boja E. S.; Rodriguez H.; Lee J. S. H.; Smith R. D.; Carrick D. M.; Liu T.; Rodland K. D. Residual tissue repositories as a resource for population-based cancer proteomic studies. Clin. Proteomics 2018, 15 (1), 26. 10.1186/s12014-018-9202-4.
- Bradshaw R. A.; Hondermarck H.; Rodriguez H. Cancer Proteomics and the Elusive Diagnostic Biomarkers. Proteomics 2019, 19, 1800445. 10.1002/pmic.201800445.
- Jimenez C. R.; Zhang H.; Kinsinger C. R.; Nice E. C. The cancer proteomic landscape and the HUPO Cancer Proteome Project. Clin. Proteomics 2018, 15 (1), 4. 10.1186/s12014-018-9180-6.
- Shah P.; Wang X.; Yang W.; Toghi Eshghi S.; Sun S.; Hoti N.; Chen L.; Yang S.; Pasay J.; Rubin A.; Zhang H. Integrated Proteomic and Glycoproteomic Analyses of Prostate Cancer Cells Reveal Glycoprotein Alteration in Protein Abundance and Glycosylation. Mol. Cell. Proteomics 2015, 14 (10), 2753–63. 10.1074/mcp.M115.047928.
- Zhou Y.; Lih T. M.; Yang G.; Chen S. Y.; Chen L.; Chan D. W.; Zhang H.; Li Q. K. An Integrated Workflow for Global, Glyco-, and Phospho-proteomic Analysis of Tumor Tissues. Anal. Chem. 2020, 92 (2), 1842–1849. 10.1021/acs.analchem.9b03753.
- Mertins P.; Tang L. C.; Krug K.; Clark D. J.; Gritsenko M. A.; Chen L.; Clauser K. R.; Clauss T. R.; Shah P.; Gillette M. A.; Petyuk V. A.; Thomas S. N.; Mani D. R.; Mundt F.; Moore R. J.; Hu Y.; Zhao R.; Schnaubelt M.; Keshishian H.; Monroe M. E.; Zhang Z.; Udeshi N. D.; Mani D.; Davies S. R.; Townsend R. R.; Chan D. W.; Smith R. D.; Zhang H.; Liu T.; Carr S. A. Reproducible workflow for multiplexed deep-scale proteome and phosphoproteome analysis of tumor tissues by liquid chromatography–mass spectrometry. Nat. Protoc. 2018, 13 (7), 1632–1661. 10.1038/s41596-018-0006-9.
- Mertins P.; Qiao J. W.; Patel J.; Udeshi N. D.; Clauser K. R.; Mani D. R.; Burgess M. W.; Gillette M. A.; Jaffe J. D.; Carr S. A. Integrated proteomic analysis of post-translational modifications by serial enrichment. Nat. Methods 2013, 10, 634. 10.1038/nmeth.2518.
- Mertins P.; Mani D. R.; Ruggles K. V.; Gillette M. A.; Clauser K. R.; Wang P.; Wang X.; Qiao J. W.; Cao S.; Petralia F.; Kawaler E.; Mundt F.; Krug K.; Tu Z.; Lei J. T.; Gatza M. L.; Wilkerson M.; Perou C. M.; Yellapantula V.; Huang K.-l.; Lin C.; McLellan M. D.; Yan P.; Davies S. R.; Townsend R. R.; Skates S. J.; Wang J.; Zhang B.; Kinsinger C. R.; Mesri M.; Rodriguez H.; Ding L.; Paulovich A. G.; Fenyö D.; Ellis M. J.; Carr S. A.; NCI CPTAC. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 2016, 534, 55. 10.1038/nature18003.
- Zhang B.; Wang J.; Wang X.; Zhu J.; Liu Q.; Shi Z.; Chambers M. C.; Zimmerman L. J.; Shaddox K. F.; Kim S.; Davies S. R.; Wang S.; Wang P.; Kinsinger C. R.; Rivers R. C.; Rodriguez H.; Townsend R. R.; Ellis M. J. C.; Carr S. A.; Tabb D. L.; Coffey R. J.; Slebos R. J. C.; Liebler D. C.; Carr S. A.; Gillette M. A.; Klauser K. R.; Kuhn E.; Mani D. R.; Mertins P.; Ketchum K. A.; Paulovich A. G.; Whiteaker J. R.; Edwards N. J.; McGarvey P. B.; Madhavan S.; Wang P.; Chan D.; Pandey A.; Shih I.-M.; Zhang H.; Zhang Z.; Zhu H.; Whiteley G. A.; Skates S. J.; White F. M.; Levine D. A.; Boja E. S.; Kinsinger C. R.; Hiltke T.; Mesri M.; Rivers R. C.; Rodriguez H.; Shaw K. M.; Stein S. E.; Fenyo D.; Liu T.; McDermott J. E.; Payne S. H.; Rodland K. D.; Smith R. D.; Rudnick P.; Snyder M.; Zhao Y.; Chen X.; Ransohoff D. F.; Hoofnagle A. N.; Liebler D. C.; Sanders M. E.; Shi Z.; Slebos R. J. C.; Tabb D. L.; Zhang B.; Zimmerman L. J.; Wang Y.; Davies S. R.; Ding L.; Ellis M. J. C.; Reid Townsend R. Proteogenomic characterization of human colon and rectal cancer. Nature 2014, 513, 382. 10.1038/nature13438.
- Zhang H.; Liu T.; Zhang Z.; Payne S. H.; Zhang B.; McDermott J. E.; Zhou J.-Y.; Petyuk V. A.; Chen L.; Ray D. Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 2016, 166 (3), 755–765. 10.1016/j.cell.2016.05.069.
- Vasaikar S.; Huang C.; Wang X.; Petyuk V. A.; Savage S. R.; Wen B.; Dou Y.; Zhang Y.; Shi Z.; Arshad O. A.; Gritsenko M. A.; Zimmerman L. J.; McDermott J. E.; Clauss T. R.; Moore R. J.; Zhao R.; Monroe M. E.; Wang Y. T.; Chambers M. C.; Slebos R. J. C.; Lau K. S.; Mo Q.; Ding L.; Ellis M.; Thiagarajan M.; Kinsinger C. R.; Rodriguez H.; Smith R. D.; Rodland K. D.; Liebler D. C.; Liu T.; Zhang B. Proteogenomic Analysis of Human Colon Cancer Reveals New Therapeutic Opportunities. Cell 2019, 177 (4), 1035–1049. 10.1016/j.cell.2019.03.030.
- Clark D. J.; Dhanasekaran S. M.; Petralia F.; Pan J.; Song X.; Hu Y.; da Veiga Leprevost F.; Reva B.; Lih T. M.; Chang H. Y.; Ma W.; Huang C.; Ricketts C. J.; Chen L.; Krek A.; Li Y.; Rykunov D.; Li Q. K.; Chen L. S.; Ozbek U.; Vasaikar S.; Wu Y.; Yoo S.; Chowdhury S.; Wyczalkowski M. A.; Ji J.; Schnaubelt M.; Kong A.; Sethuraman S.; Avtonomov D. M.; Ao M.; Colaprico A.; Cao S.; Cho K. C.; Kalayci S.; Ma S.; Liu W.; Ruggles K.; Calinawan A.; Gumus Z. H.; Geiszler D.; Kawaler E.; Teo G. C.; Wen B.; Zhang Y.; Keegan S.; Li K.; Chen F.; Edwards N.; Pierorazio P. M.; Chen X. S.; Pavlovich C. P.; Hakimi A. A.; Brominski G.; Hsieh J. J.; Antczak A.; Omelchenko T.; Lubinski J.; Wiznerowicz M.; Linehan W. M.; Kinsinger C. R.; Thiagarajan M.; Boja E. S.; Mesri M.; Hiltke T.; Robles A. I.; Rodriguez H.; Qian J.; Fenyo D.; Zhang B.; Ding L.; Schadt E.; Chinnaiyan A. M.; Zhang Z.; Omenn G. S.; Cieslik M.; Chan D. W.; Nesvizhskii A. I.; Wang P.; Zhang H. Integrated Proteogenomic Characterization of Clear Cell Renal Cell Carcinoma. Cell 2020, 180 (1), 207. 10.1016/j.cell.2019.12.026.
- Dou Y.; Kawaler E. A.; Cui Zhou D.; Gritsenko M. A.; Huang C.; Blumenberg L.; Karpova A.; Petyuk V. A.; Savage S. R.; Satpathy S.; Liu W.; Wu Y.; Tsai C. F.; Wen B.; Li Z.; Cao S.; Moon J.; Shi Z.; Cornwell M.; Wyczalkowski M. A.; Chu R. K.; Vasaikar S.; Zhou H.; Gao Q.; Moore R. J.; Li K.; Sethuraman S.; Monroe M. E.; Zhao R.; Heiman D.; Krug K.; Clauser K.; Kothadia R.; Maruvka Y.; Pico A. R.; Oliphant A. E.; Hoskins E. L.; Pugh S. L.; Beecroft S. J. I.; Adams D. W.; Jarman J. C.; Kong A.; Chang H. Y.; Reva B.; Liao Y.; Rykunov D.; Colaprico A.; Chen X. S.; Czekanski A.; Jedryka M.; Matkowski R.; Wiznerowicz M.; Hiltke T.; Boja E.; Kinsinger C. R.; Mesri M.; Robles A. I.; Rodriguez H.; Mutch D.; Fuh K.; Ellis M. J.; DeLair D.; Thiagarajan M.; Mani D. R.; Getz G.; Noble M.; Nesvizhskii A. I.; Wang P.; Anderson M. L.; Levine D. A.; Smith R. D.; Payne S. H.; Ruggles K. V.; Rodland K. D.; Ding L.; Zhang B.; Liu T.; Fenyo D. Proteogenomic Characterization of Endometrial Carcinoma. Cell 2020, 180 (4), 729–748. 10.1016/j.cell.2020.01.026.
- McAlister G. C.; Huttlin E. L.; Haas W.; Ting L.; Jedrychowski M. P.; Rogers J. C.; Kuhn K.; Pike I.; Grothe R. A.; Blethrow J. D.; Gygi S. P. Increasing the multiplexing capacity of TMTs using reporter ion isotopologues with isobaric masses. Anal. Chem. 2012, 84 (17), 7469–78. 10.1021/ac301572t.
- Kreuzer J.; Edwards A.; Haas W. Multiplexed quantitative phosphoproteomics of cell line and tissue samples. Methods Enzymol. 2019, 626, 41–65. 10.1016/bs.mie.2019.07.027.
- Paulo J. A.; Jedrychowski M. P.; Chouchani E. T.; Kazak L.; Gygi S. P. Multiplexed Isobaric Tag-Based Profiling of Seven Murine Tissues Following In Vivo Nicotine Treatment Using a Minimalistic Proteomics Strategy. Proteomics 2018, 18 (10), e1700326. 10.1002/pmic.201700326.
- Ahrne E.; Martinez-Segura A.; Syed A. P.; Vina-Vilaseca A.; Gruber A. J.; Marguerat S.; Schmidt A. Exploiting the multiplexing capabilities of tandem mass tags for high-throughput estimation of cellular protein abundances by mass spectrometry. Methods 2015, 85, 100–107. 10.1016/j.ymeth.2015.04.032.
- Satpathy S.; Jaehnig E. J.; Krug K.; Kim B. J.; Saltzman A. B.; Chan D. W.; Holloway K. R.; Anurag M.; Huang C.; Singh P.; Gao A.; Namai N.; Dou Y.; Wen B.; Vasaikar S. V.; Mutch D.; Watson M. A.; Ma C.; Ademuyiwa F. O.; Rimawi M. F.; Schiff R.; Hoog J.; Jacobs S.; Malovannaya A.; Hyslop T.; Clauser K. R.; Mani D. R.; Perou C. M.; Miles G.; Zhang B.; Gillette M. A.; Carr S. A.; Ellis M. J. Microscaled proteogenomic methods for precision oncology. Nat. Commun. 2020, 11 (1), 532. 10.1038/s41467-020-14381-2.
- High A. A.; Tan H.; Pagala V. R.; Niu M.; Cho J. H.; Wang X.; Bai B.; Peng J. Deep Proteome Profiling by Isobaric Labeling, Extensive Liquid Chromatography, Mass Spectrometry, and Software-assisted Quantification. J. Visualized Exp. 2017, (129), e56474. 10.3791/56474.
- Leoni E.; Bremang M.; Mitra V.; Zubiri I.; Jung S.; Lu C.-H.; Adiutori R.; Lombardi V.; Russell C.; Koncarevic S.; Ward M.; Pike I.; Malaspina A. Combined Tissue-Fluid Proteomics to Unravel Phenotypic Variability in Amyotrophic Lateral Sclerosis. Sci. Rep. 2019, 9 (1), 4478. 10.1038/s41598-019-40632-4.
- Russell C. L.; Heslegrave A.; Mitra V.; Zetterberg H.; Pocock J. M.; Ward M. A.; Pike I. Combined tissue and fluid proteomics with Tandem Mass Tags to identify low-abundance protein biomarkers of disease in peripheral body fluid: An Alzheimer’s Disease case study. Rapid Commun. Mass Spectrom. 2017, 31 (2), 153–159. 10.1002/rcm.7777.
- Budnik B.; Levy E.; Harmange G.; Slavov N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 2018, 19 (1), 161. 10.1186/s13059-018-1547-5.
- Yi L.; Tsai C.-F.; Dirice E.; Swensen A. C.; Chen J.; Shi T.; Gritsenko M. A.; Chu R. K.; Piehowski P. D.; Smith R. D.; Rodland K. D.; Atkinson M. A.; Mathews C. E.; Kulkarni R. N.; Liu T.; Qian W.-J. Boosting to Amplify Signal with Isobaric Labeling (BASIL) Strategy for Comprehensive Quantitative Phosphoproteomic Characterization of Small Populations of Cells. Anal. Chem. 2019, 91 (9), 5794–5801. 10.1021/acs.analchem.9b00024.
- Brenes A.; Hukelmann J.; Bensaddek D.; Lamond A. I. Multibatch TMT Reveals False Positives, Batch Effects and Missing Values. Mol. Cell. Proteomics 2019, 18 (10), 1967–1980. 10.1074/mcp.RA119.001472.
- O’Brien J. J.; O’Connell J. D.; Paulo J. A.; Thakurta S.; Rose C. M.; Weekes M. P.; Huttlin E. L.; Gygi S. P. Compositional Proteomics: Effects of Spatial Constraints on Protein Quantification Utilizing Isobaric Tags. J. Proteome Res. 2018, 17 (1), 590–599. 10.1021/acs.jproteome.7b00699.
- Zecha J.; Satpathy S.; Kanashova T.; Avanessian S. C.; Kane M. H.; Clauser K. R.; Mertins P.; Carr S. A.; Kuster B. TMT Labeling for the Masses: A Robust and Cost-efficient, In-solution Labeling Approach. Molecular & Cellular Proteomics 2019, 18 (7), 1468–1478. 10.1074/mcp.TIR119.001385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Y.; Yang F.; Gritsenko M. A.; Wang Y.; Clauss T.; Liu T.; Shen Y.; Monroe M. E.; Lopez-Ferrer D.; Reno T.; Moore R. J.; Klemke R. L.; Camp D. G. II; Smith R. D. Reversed-phase chromatography with multiple fraction concatenation strategy for proteome profiling of human MCF10A cells. Proteomics 2011, 11 (10), 2019–2026. 10.1002/pmic.201000722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim S.; Pevzner P. A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 2014, 5, 5277. 10.1038/ncomms6277. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beausoleil S. A.; Villen J.; Gerber S. A.; Rush J.; Gygi S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24 (10), 1285–92. 10.1038/nbt1240. [DOI] [PubMed] [Google Scholar]
- Monroe M. E.; Shaw J. L.; Daly D. S.; Adkins J. N.; Smith R. D. MASIC: A software program for fast quantitation and flexible visualization of chromatographic profiles from detected LC–MS(/MS) features. Comput. Biol. Chem. 2008, 32 (3), 215–217. 10.1016/j.compbiolchem.2008.02.006.
- Callister S. J.; Barry R. C.; Adkins J. N.; Johnson E. T.; Qian W. J.; Webb-Robertson B. J.; Smith R. D.; Lipton M. S. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 2006, 5 (2), 277–286. 10.1021/pr050300l.
- Piehowski P. D.; Petyuk V. A.; Orton D. J.; Xie F.; Moore R. J.; Ramirez-Restrepo M.; Engel A.; Lieberman A. P.; Albin R. L.; Camp D. G. Sources of technical variability in quantitative LC–MS proteomics: human brain tissue sample analysis. J. Proteome Res. 2013, 12 (5), 2128–2137. 10.1021/pr301146m.
- Webb-Robertson B.-J. M.; Wiberg H. K.; Matzke M. M.; Brown J. N.; Wang J.; McDermott J. E.; Smith R. D.; Rodland K. D.; Metz T. O.; Pounds J. G.; Waters K. M. Review, Evaluation, and Discussion of the Challenges of Missing Value Imputation for Mass Spectrometry-Based Label-Free Global Proteomics. J. Proteome Res. 2015, 14 (5), 1993–2001. 10.1021/pr501138h.
- Lazar C.; Gatto L.; Ferro M.; Bruley C.; Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res. 2016, 15 (4), 1116–1125. 10.1021/acs.jproteome.5b00981.
- Chen L. S.; Wang J.; Wang X.; Wang P. A Mixed-Effects Model for Incomplete Data from Labeling-Based Quantitative Proteomics Experiments. Ann. Appl. Stat. 2017, 11 (1), 114–138. 10.1214/16-AOAS994.
- Levin Y. The role of statistical power analysis in quantitative proteomics. Proteomics 2011, 11 (12), 2565–2567. 10.1002/pmic.201100033.
- Zhu Y.; Piehowski P. D.; Zhao R.; Chen J.; Shen Y.; Moore R. J.; Shukla A. K.; Petyuk V. A.; Campbell-Thompson M.; Mathews C. E.; Smith R. D.; Qian W.-J.; Kelly R. T. Nanodroplet processing platform for deep and quantitative proteome profiling of 10–100 mammalian cells. Nat. Commun. 2018, 9 (1), 882. 10.1038/s41467-018-03367-w.
- Piehowski P. D.; Zhu Y.; Bramer L. M.; Stratton K. G.; Zhao R.; Orton D. J.; Moore R. J.; Yuan J.; Mitchell H. D.; Gao Y.; Webb-Robertson B.-J. M.; Dey S. K.; Kelly R. T.; Burnum-Johnson K. E. Automated mass spectrometry imaging of over 2000 proteins from tissue sections at 100-μm spatial resolution. Nat. Commun. 2020, 11 (1), 8. 10.1038/s41467-019-13858-z.
- Dou M.; Clair G.; Tsai C.-F.; Xu K.; Chrisler W. B.; Sontag R. L.; Zhao R.; Moore R. J.; Liu T.; Pasa-Tolic L.; Smith R. D.; Shi T.; Adkins J. N.; Qian W.-J.; Kelly R. T.; Ansong C.; Zhu Y. High-Throughput Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a Nanodroplet Sample Preparation Platform. Anal. Chem. 2019, 91 (20), 13119–13127. 10.1021/acs.analchem.9b03349.
- Dou M.; Tsai C.-F.; Piehowski P. D.; Wang Y.; Fillmore T. L.; Zhao R.; Moore R. J.; Zhang P.; Qian W.-J.; Smith R. D.; Liu T.; Kelly R. T.; Shi T.; Zhu Y. Automated Nanoflow Two-Dimensional Reversed-Phase Liquid Chromatography System Enables In-Depth Proteome and Phosphoproteome Profiling of Nanoscale Samples. Anal. Chem. 2019, 91 (15), 9707–9715. 10.1021/acs.analchem.9b01248.
- Lombard-Banek C.; Moody S. A.; Nemes P. Single-Cell Mass Spectrometry for Discovery Proteomics: Quantifying Translational Cell Heterogeneity in the 16-Cell Frog (Xenopus) Embryo. Angew. Chem., Int. Ed. 2016, 55 (7), 2454–2458. 10.1002/anie.201510411.
- Myers S. A.; Rhoads A.; Cocco A. R.; Peckner R.; Haber A. L.; Schweitzer L. D.; Krug K.; Mani D. R.; Clauser K. R.; Rozenblatt-Rosen O.; Hacohen N.; Regev A.; Carr S. A. Streamlined Protocol for Deep Proteomic Profiling of FAC-sorted Cells and Its Application to Freshly Isolated Murine Immune Cells. Mol. Cell. Proteomics 2019, 18 (5), 995–1009. 10.1074/mcp.RA118.001259.
- Wu R.; Pai A.; Liu L.; Xing S.; Lu Y. NanoTPOT: Enhanced Sample Preparation for Quantitative Nanoproteomic Analysis. Anal. Chem. 2020, 92 (9), 6235–6240. 10.1021/acs.analchem.0c00077.
- Cheung T. K.; Lee C. Y.; Bayer F. P.; McCoy A.; Kuster B.; Rose C. M. Defining the carrier proteome limit for single-cell proteomics. Nat. Methods 2021, 18 (1), 76–83. 10.1038/s41592-020-01002-5.
- Tsai C. F.; Zhao R.; Williams S. M.; Moore R. J.; Schultz K.; Chrisler W. B.; Pasa-Tolic L.; Rodland K. D.; Smith R. D.; Shi T.; Zhu Y.; Liu T. An Improved Boosting to Amplify Signal with Isobaric Labeling (iBASIL) Strategy for Precise Quantitative Single-cell Proteomics. Mol. Cell. Proteomics 2020, 19 (5), 828–838. 10.1074/mcp.RA119.001857.
- Deutsch E. W.; Bandeira N.; Sharma V.; Perez-Riverol Y.; Carver J. J.; Kundu D. J.; Garcia-Seisdedos D.; Jarnuczak A. F.; Hewapathirana S.; Pullman B. S.; Wertz J.; Sun Z.; Kawano S.; Okuda S.; Watanabe Y.; Hermjakob H.; MacLean B.; MacCoss M. J.; Zhu Y.; Ishihama Y.; Vizcaino J. A. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 2019, 48 (D1), D1145–D1152. 10.1093/nar/gkz984.