Abstract
Formalin-fixed, paraffin-embedded (FFPE) patient tissues are a valuable resource for proteomic studies with the potential to associate derived molecular insights with clinical annotations and outcomes. Here, we present an optimized, partially automated, plate-based workflow for FFPE proteomics combining pathology-guided macrodissection, xylene-free deparaffinization using Adaptive Focused Acoustics sonication for lysis and decrosslinking, optimized suspension trapping digestion and cleanup of peptides, and LC-MS/MS using Exploris 480, Orbitrap Astral, and timsTOF HT instrumentation. The workflow enables analysis of up to 96 dissected FFPE tissue samples or 10 μm scrolls, identifying 8000 to 10,000 unique proteins per sample with median CVs <20%. Application to lung adenocarcinoma FFPE blocks confirms the platform’s effectiveness in processing complex, clinically relevant samples, achieving deep proteome coverage and quantitative robustness comparable to tandem mass tag–based methods. Using the Orbitrap Astral with short, 24-min gradients, the workflow identifies up to 10,000 unique proteins and 11,000 fully localized phosphosites in lung adenocarcinoma FFPE tissue, demonstrating the ability to derive biologically relevant phosphoprotein/peptide results from clinically derived FFPE tumor samples. This high-throughput, scalable workflow advances biomarker discovery and proteomic research in archival tissue samples.
Keywords: FFPE, proteomics, phosphoproteomics, clinical proteomics, cancer, biomarker
Highlights
• High-throughput proteomics workflow for FFPE tissue.
• Deep coverage with ∼10,000 proteins and ∼11,000 phosphosites per FFPE sample.
• Enhanced lysis boosts peptide recovery and cleanliness.
• Fast DIA methods built for Orbitrap Astral or timsTOF HT mass spectrometers.
• Supports biomarker discovery from archival tumor blocks.
In Brief
We present a high-throughput, plate-based proteomics workflow for FFPE tissues that combines Covaris AFA lysis, S-Trap digestion, and DIA-MS analysis to quantify ∼10,000 proteins and ∼11,000 phosphosites per sample. The semiautomated pipeline enables tumor dissection-to-data in under a week and is compatible with Astral and timsTOF platforms. Applied to different cancer tissues, this scalable method supports biomarker discovery and pathway analysis from real-world, archival clinical samples.
Formalin-fixed, paraffin-embedded (FFPE) tissue samples with clinical annotation represent an invaluable resource. Patient tissue profiling by global proteomics using LC-MS/MS provides a molecular foundation for exploration of disease mechanisms and prediction of clinical outcomes that goes beyond what can be achieved by traditional pathology or immunohistochemistry (1, 2, 3, 4, 5). The extensive collections of FFPE samples in biobanks and pathology archives facilitate large-scale studies, aiding in biological discovery and the identification of candidate biomarkers and molecular alterations associated with diverse pathologies (6).
Mass spectrometry–based proteomic analyses of FFPE tissues present a number of challenges. Many FFPE proteomics workflows have relied on laborious xylene and ethanol washes for deparaffinization (7, 8), although alternative approaches have started to emerge, such as the use of the Beatbox (PreOmics), which leverages bead-based tissue lysis (9); substitution of SafeClear (Thermo Fisher Scientific) for xylene (10); and lysis using the Covaris Adaptive Focused Acoustics (AFA) technology (3, 11, 12, 13). Despite these innovations, extraction of proteins and peptides from FFPE tissues has proven difficult, constraining depth of coverage and quantitative characterization of post-translational modifications (PTMs) (4, 7, 14). Tissue preservation techniques can also impact quantitative reproducibility relative to fresh frozen tissue (15), and contaminants such as residual paraffin wax in FFPE samples can be problematic for downstream nanoflow LC-MS/MS applications. Thus, process improvements are needed to achieve deep-scale analysis of FFPE tissues and take full advantage of the invaluable resources afforded by repositories of well-annotated FFPE specimens.
In this study, we describe a highly optimized and partially automated workflow that enables reproducible, deep-scale proteomic analysis of FFPE tissues, yielding ∼8000 to 10,000 unique proteins and up to ∼14,000 localized phosphosites per sample. The plate-based workflow employs the Covaris AFA ultrasonicator for FFPE sample processing and decrosslinking, an Opentrons OT-2 robot for complex liquid handling steps, positive pressure manifold-assisted suspension trapping (S-Trap)-based digestion and purification, and data-independent acquisition (DIA) analysis on Orbitrap or timsTOF instrumentation. DIA data were uploaded to Terra (https://app.terra.bio/), a Linux-based cloud environment configured to run Spectronaut (https://biognosys.com/software/spectronaut) (16), FragPipe (https://fragpipe.nesvilab.org) (17), and DIA-NN (https://github.com/vdemichev/DiaNN) (18), enabling fast, scalable processing to meet the intensive computational demands of DIA pipelines in large-scale experiments. Starting with macrodissected FFPE tissues arrayed in plates, the processing and data-generation pipeline has a turnaround time of <1 week for proteome-level analysis of 96 samples, with an additional ∼3 days for phosphoproteome analysis.
Experimental Procedures
Samples
Lung cancer samples were obtained from Washington University, under Institutional Review Board (IRB) protocol number 201409101. Breast and colon samples were obtained from Van Andel Institute under IRB protocol 11010. All samples from both institutions were fully deidentified. Animal studies were performed in accordance with Dana-Farber Cancer Institute's Institutional Animal Care and Use Committee–approved protocols (10-055, 15-020). All human studies reported herein abide by the Declaration of Helsinki principles.
Deparaffinization of FFPE Tissue Using Covaris Ultrasonicator
Breast cancer (BRC) and colorectal cancer (CRC) FFPE blocks (∼2 × 3 cm tissue area) were sectioned into serial 10 μm scrolls using the HistoCore MULTICUT (Leica). Scrolls were transferred immediately into a Covaris 96 AFA-Tube TPX Plate (Covaris, Part No. 520291) and fully submerged in wells prefilled with Covaris tissue lysis buffer (TLB) (Covaris, Part No. 52-284). FFPE tissue was deparaffinized by ultrasonication in the Covaris LE220Rsc Focused-ultrasonicator at 3 mm Y-dither at 10 mm/s, followed by heating to 90 °C in a thermocycler (Eppendorf) with a heated lid at 105 °C for 90 min to decrosslink proteins. The tissue was then sonicated again for homogenization at 1 mm Z-dither at 20 mm/s. For both deparaffinization and homogenization, the ultrasonicator was maintained at 20 °C and operated with a peak power of 350 W and 200 cycles per burst; 5 min of sonication was applied to each lane of the plate. Evaluated parameters included increasing the Covaris default sonication duty factor (DF) from 25% to 50% and the TLB volume from 100 μl to 150 μl; the optimal sonication conditions of 50% DF in 150 μl TLB were used for subsequent experiments. For protein quantification after sonication, the Covaris plate was spun down briefly at 1500 g in a centrifuge (Beckman Coulter), and 5 μl of the semiclarified lysate was diluted 5x in water. Total protein was then quantified using a bicinchoninic acid (BCA) protein assay kit (Thermo Fisher Scientific, Cat No. 23225), with standards prepared in 5x TLB.
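The back-calculation from a BCA reading on the diluted aliquot to total protein per well can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the BCA result is expressed in μg/ml for the 5x-diluted aliquot, that the full 150 μl TLB volume is recovered, and that average sonication power is peak power multiplied by duty factor (the usual convention for pulsed sonication). The function names are hypothetical.

```python
def lysate_protein_ug(measured_ug_per_ml: float,
                      dilution_factor: float = 5.0,
                      lysate_volume_ul: float = 150.0) -> float:
    """Total protein (ug) in a well, back-calculated from a BCA reading.

    Hypothetical helper: assumes the reading (ug/ml) was taken on the
    5x water-diluted aliquot and that the whole TLB volume is recovered.
    """
    if measured_ug_per_ml < 0 or dilution_factor <= 0 or lysate_volume_ul <= 0:
        raise ValueError("inputs must be non-negative / positive")
    lysate_ug_per_ml = measured_ug_per_ml * dilution_factor  # undo the 5x dilution
    return lysate_ug_per_ml * lysate_volume_ul / 1000.0      # ug/ml x ml


def average_power_w(peak_power_w: float, duty_factor: float) -> float:
    """Time-averaged power, assuming average = peak power x duty factor."""
    if not 0.0 < duty_factor <= 1.0:
        raise ValueError("duty factor must be in (0, 1]")
    return peak_power_w * duty_factor


if __name__ == "__main__":
    # Made-up reading of 400 ug/ml in the diluted aliquot.
    print(lysate_protein_ug(400.0))
    # Raising DF from 25% to 50% at 350 W peak doubles the average power.
    print(average_power_w(350.0, 0.25), average_power_w(350.0, 0.50))
```

With the made-up reading above, the well would contain 300 μg of protein; the 25% and 50% DF settings correspond to 87.5 W and 175 W average power under the stated assumption.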
Sample Cleanup and Digestion Optimization
For protein digestion and cleanup, S-Trap and single-pot, solid-phase-enhanced sample preparation (SP3) workflows were evaluated. All lysates were initially treated with 2 mM magnesium chloride and 0.8 U/μl of Benzonase nuclease (Sigma-Aldrich, Cat No. E8263-5KU). BRC and CRC samples were normalized to 50 μg of protein, followed by reduction and alkylation with 5 mM tris(2-carboxyethyl)phosphine (Thermo Fisher Scientific, Cat No. 77720) and 50 mM 2-chloroacetamide (VWR, Cat No. TCC2536-25G) for 1 h with mixing in the dark at room temperature.
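Because only final reagent concentrations are given, the stock volumes follow from C1V1 = C2V2. The sketch below assumes hypothetical 500 mM stocks of both TCEP and 2-chloroacetamide and a hypothetical 150 μl final reaction volume; none of these three values is stated in the text.

```python
# Hypothetical stock concentrations: the source reports only final concentrations.
TCEP_STOCK_MM = 500.0  # assumed 0.5 M TCEP stock
CAA_STOCK_MM = 500.0   # assumed 500 mM 2-chloroacetamide stock


def stock_volume_ul(final_mm: float, final_volume_ul: float, stock_mm: float) -> float:
    """Volume of stock (ul) to add, from C1*V1 = C2*V2.

    final_volume_ul is the total volume after all additions.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * final_volume_ul / stock_mm


if __name__ == "__main__":
    total_ul = 150.0  # hypothetical final reaction volume
    tcep_ul = stock_volume_ul(5.0, total_ul, TCEP_STOCK_MM)
    caa_ul = stock_volume_ul(50.0, total_ul, CAA_STOCK_MM)
    lysate_ul = total_ul - tcep_ul - caa_ul  # remaining volume available for lysate
    print(tcep_ul, caa_ul, lysate_ul)
```

Under these assumptions, 1.5 μl of TCEP stock and 15 μl of chloroacetamide stock leave 133.5 μl for lysate.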
For the SP3 workflow, the beads (Cytiva Sera-Mag Hydrophilic and Sera-Mag Hydrophobic Magnetic Particles: Cat No. 44152105050250 and 24152105050250) were equilibrated at room temperature for 30 min, prepared at a 1:1 ratio, washed three times with 80% acetonitrile (MeCN), and resuspended at 50 mg/ml. The bead mix was spiked into each sample at a bead-to-protein ratio of 10:1. Protein binding to the beads was initiated by adding MeCN to a final concentration of 80%. The beads were mixed during binding for 5 min and pelleted using a magnetic rack, prior to removal of the supernatant by manual aspiration. The beads were then washed three times with 500 μl of 80% MeCN. On-bead protein digestion was performed overnight in 50 mM triethylammonium bicarbonate (TEAB) pH 9 with mass spectrometry–grade trypsin (Promega) and LysC (Fujifilm Bioscience) at a ratio of 1:50 (weight/weight). Following overnight digestion, the beads were centrifuged at 3000 g and the samples transferred into a magnetic rack for collection of the supernatant. The beads were then resuspended with 50 μl of water followed by 1% formic acid (FA); in each resuspension step, the samples were centrifuged and transferred to a magnetic rack, where the supernatants were collected and combined.
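The SP3 ratios above translate into per-sample amounts as sketched here. This is an illustration under stated assumptions: the beads are treated as an aqueous suspension, 100% MeCN is added to reach exactly 80% final, each protease is dosed at 1:50 (w/w), and the 150 μl aqueous volume in the test is hypothetical.

```python
def sp3_amounts(protein_ug: float,
                aqueous_volume_ul: float,
                bead_to_protein: float = 10.0,
                bead_stock_mg_per_ml: float = 50.0,
                final_mecn: float = 0.80,
                protease_ratio: float = 50.0) -> dict:
    """Bead, MeCN, and protease amounts for one SP3 sample (illustrative).

    Assumes beads are added as an aqueous suspension and 100% MeCN is
    added until MeCN makes up final_mecn of the total volume.
    """
    bead_ug = protein_ug * bead_to_protein
    bead_ul = bead_ug / bead_stock_mg_per_ml       # 1 mg/ml == 1 ug/ul
    pre_mecn_ul = aqueous_volume_ul + bead_ul
    # Solve mecn / (pre_mecn + mecn) = final_mecn for mecn.
    mecn_ul = final_mecn * pre_mecn_ul / (1.0 - final_mecn)
    protease_ug = protein_ug / protease_ratio      # applied to each of trypsin and LysC
    return {"bead_ug": bead_ug, "bead_ul": bead_ul, "mecn_ul": mecn_ul,
            "trypsin_ug": protease_ug, "lysc_ug": protease_ug}


if __name__ == "__main__":
    for key, value in sp3_amounts(50.0, 150.0).items():
        print(f"{key}: {value:.1f}")
```

For 50 μg of protein this gives 500 μg (10 μl) of beads and 1 μg of each protease; the MeCN volume scales as four times the pre-existing aqueous volume.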
For suspension trapping, 1.2% phosphoric acid was added to each reduced and alkylated sample, followed by six times the volume of 100 mM TEAB in 90% methanol (MeOH) (pH 7.1). Samples were mixed thoroughly via repeated pipette aspirations and loaded onto the S-Trap 96-well Mini Plate (ProtiFi) using a vacuum manifold. After loading, the wells were washed three times with 300 μl of a 50/50 (volume/volume) chloroform/MeOH solution and three times with 300 μl of 100 mM TEAB in 90% MeOH pH 7.1. Samples were digested overnight in 125 μl of 50 mM TEAB pH 9, using trypsin and LysC at a ratio of 1:50 (weight/weight). Peptides were eluted into a deep-well plate (Cytiva, Part No. 7701-5200) with sequential 80 μl additions of 50 mM TEAB pH 9, 0.2% FA, and 0.2% FA in 50% MeCN.
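The S-Trap loading volumes can be estimated as below. The text does not specify how the 1.2% phosphoric acid is delivered, so this sketch assumes that 1.2% is the final concentration reached from a hypothetical 12% stock, and that the binding buffer is added at six times the acidified volume.

```python
def strap_volumes(lysate_ul: float,
                  acid_stock_pct: float = 12.0,
                  acid_final_pct: float = 1.2,
                  binding_fold: float = 6.0) -> tuple:
    """Phosphoric acid and binding-buffer volumes for S-Trap loading.

    Assumptions (not from the source): 1.2% is the final phosphoric acid
    concentration, reached from a hypothetical 12% stock, and the binding
    buffer (100 mM TEAB in 90% MeOH) is added at 6x the acidified volume.
    """
    if acid_stock_pct <= acid_final_pct:
        raise ValueError("acid stock must exceed the final concentration")
    # Solve stock * v / (lysate + v) = final for the acid volume v.
    acid_ul = acid_final_pct * lysate_ul / (acid_stock_pct - acid_final_pct)
    acidified_ul = lysate_ul + acid_ul
    binding_ul = binding_fold * acidified_ul
    return acid_ul, binding_ul, acidified_ul + binding_ul


if __name__ == "__main__":
    acid, binding, total = strap_volumes(150.0)
    print(f"acid {acid:.2f} ul, binding buffer {binding:.1f} ul, total {total:.1f} ul")
```

Under these assumptions, a 150 μl lysate needs about 16.7 μl of acid and 1000 μl of binding buffer.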
StageTip Desalting Optimization
Peptide digests were quantified using a quantitative fluorometric peptide assay (Thermo Fisher Scientific, Cat No. 23290), acidified with 1% FA, and a specified amount was StageTip-desalted with Empore C18 disks (CDS Analytical, Part No. 98-0604-0217-3). The StageTips were conditioned with 100% MeOH and 0.1% FA in 50% MeCN and equilibrated with 1% FA. Peptides were loaded, washed three times with 0.1% FA, and eluted with 0.1% FA in 50% MeCN. Samples were dried down in a speedvac vacuum concentrator and stored at −80 °C until LC-MS/MS analysis.
SDB-RPS StageTips (Empore, Cat No. ST7600196) were implemented to reduce buildup on the column emitter. Peptides were acidified with 1% FA and StageTips were conditioned with 100% MeCN. Peptides were immediately loaded onto the tips, as equilibration is not required. The peptides were washed twice with 1% FA in isopropyl alcohol followed by 1% FA. Peptides were eluted with 5% ammonium hydroxide in 80% MeCN. Samples were dried down and stored at −80 °C until LC-MS/MS analysis.
SDB-XC StageTips were prepared by packing SDB-XC (Empore, Cat. No 98060402231EA) punches at a ratio of 1 punch for each 1 μg of peptides. The SDB-XC StageTip was conditioned with 100% MeOH and 0.1% FA in 50% MeCN and then equilibrated with 1% FA. Peptides were acidified to 1% FA and loaded on the conditioned SDB-XC StageTip for fractionation. The StageTip was washed twice with 1% FA in water. Peptides were eluted from the StageTip in five steps with 0.1% ammonium hydroxide in 5%, 10%, 15%, 20%, and 80% MeCN. Samples were dried and stored at −80 °C until LC-MS/MS analysis.
Semiautomated 96-Well Plate Workflow
Automation was implemented to enable efficient 96-well plate–based processing of FFPE tissues. Following plate-based Covaris sonication and S-Trap loading of full pellets, the requisite trypsin and LysC quantities were determined based on per-sample protein yield. An Opentrons OT-2 robot running a custom Python script was used to prepare the digestion enzymes, aliquoting the amount of protease needed for each sample and then bringing each to a uniform final volume of 100 μl with 50 mM TEAB. A multichannel pipette was used to transfer the digestion buffers to the S-Trap wells. A similar custom Opentrons protocol was used to normalize peptide input amounts for StageTip desalting by transferring specified volumes from the S-Trap deep-well elution plate to a secondary plate (Bio-Rad, Cat. No HSP9601) and adjusting them to a standard volume. StageTip desalting was carried out in a 96-well plate format by removing the base from a used S-Trap plate and inserting the StageTips into the wells, allowing processing with a benchtop centrifuge (Beckman Coulter) and elution into a 96-well plate (Bio-Rad, Cat. No HSP9601). The samples were dried in the elution plate and stored at −80 °C until LC-MS/MS analysis.
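The per-well planning step of such a protease-normalization script might look like the following. This is a hypothetical sketch, not the authors' script and not the Opentrons Protocol API: it only computes the volumes that a robot protocol would then dispense. The 0.5 μg/μl enzyme stock concentrations are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    well: str
    reagent: str
    volume_ul: float


def plan_digestion_buffers(protein_ug_by_well: dict[str, float],
                           trypsin_ug_per_ul: float = 0.5,   # assumed stock
                           lysc_ug_per_ul: float = 0.5,      # assumed stock
                           protease_ratio: float = 50.0,
                           final_volume_ul: float = 100.0) -> list[Transfer]:
    """Per-well protease and TEAB volumes giving a uniform final volume."""
    plan: list[Transfer] = []
    for well, protein_ug in protein_ug_by_well.items():
        enzyme_ug = protein_ug / protease_ratio           # 1:50 (w/w) for each enzyme
        trypsin_ul = enzyme_ug / trypsin_ug_per_ul
        lysc_ul = enzyme_ug / lysc_ug_per_ul
        teab_ul = final_volume_ul - trypsin_ul - lysc_ul  # top up with 50 mM TEAB
        if teab_ul < 0:
            raise ValueError(f"{well}: protease volume exceeds {final_volume_ul} ul")
        plan += [Transfer(well, "trypsin", trypsin_ul),
                 Transfer(well, "LysC", lysc_ul),
                 Transfer(well, "50 mM TEAB", teab_ul)]
    return plan


if __name__ == "__main__":
    for t in plan_digestion_buffers({"A1": 50.0, "A2": 120.0}):
        print(f"{t.well}\t{t.reagent}\t{t.volume_ul:.1f} ul")
```

Computing the plan separately from the pipetting commands makes it easy to check volumes (for example, flagging wells whose protease volume would exceed the target) before the robot runs.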
BRC and CRC FFPE Tissue Macrodissection
For BRC and CRC blocks, 10 μm sections were stained with H&E to identify tumor-rich areas and guide macrodissection of five successive 10 μm sections, yielding a total macrodissected surface area of ∼30 mm² per replicate (performed at VARI). Material was transferred to a Covaris 96-well plate; all experiments were performed in quadruplicate. To assess tissue processing and storage conditions, two of the four experimental replicates from each block were stored with TLB and two without. Macrodissected samples were processed through the S-Trap digestion and SDB-RPS StageTip desalting workflows described above. A pooled sample consisting of all BRC and CRC peptide samples was fractionated with SDB-XC StageTips as described above.
LC-MS Analysis of BRC and CRC FFPE Samples
For each of the BRC and CRC FFPE samples, 1 μg of peptides (measured prior to StageTip desalting) was loaded on a Vanquish Neo UHPLC system (Thermo Fisher Scientific) coupled to an Orbitrap Exploris 480 mass spectrometer (Thermo Fisher Scientific) operating in DIA mode. Peptides were separated on a 25 cm New Objective PicoFrit column packed in-house with 1.5 μm C18 beads (Reprosil-Pur, Dr Maisch). The separation used a 94-min active gradient at a flow rate of 200 nl/min: 1.6% B to 4.8% B in 1 min, 4.8% B to 24% B in 83 min, and 24% B to 48% B in 10 min. Full MS scans (MS1) covered a mass range of 350 to 1800 m/z at a resolution of 120,000 at 200 m/z. The MS1 automatic gain control (AGC) target was set to 3E6 with a maximum injection time of 25 ms. For DIA analysis, 30 variable windows were used to cover m/z 500 to 740, as shown in Supplemental Table S2. The DIA scans were performed at an AGC target of 5E5 with a maximum injection time of 54 ms. For fragmentation, higher energy collisional dissociation (HCD) was set to 27% normalized collision energy (NCE), with the default charge state set to +3. The DIA method had an average cycle time of 2.2 s, equating to ∼5 data points per chromatographic peak. The pooled and fractionated BRC/CRC samples were run on the same LC-MS setup, with the instrument operating in data-dependent acquisition (DDA) mode. MS1 data were acquired at a resolution of 60,000 with an AGC target of 3E6. MS2 spectra were acquired using a Top-20 method at a resolution of 120,000 with an AGC target of 3E5. Precursors of charge 2 to 6 were fragmented using HCD at 30% NCE.
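The cycle-time and points-per-peak figures are related by points ≈ peak width / cycle time. The peak widths used below are inferences, not reported values: ∼5 points at a 2.2 s cycle implies peaks of roughly 11 s, and ∼6 points at the 1.2 s diaPASEF cycle implies roughly 7 s.

```python
def points_per_peak(peak_width_s: float, cycle_time_s: float) -> float:
    """Number of DIA cycles that sample one chromatographic peak."""
    return peak_width_s / cycle_time_s


def max_cycle_time_s(peak_width_s: float, target_points: float) -> float:
    """Longest cycle time that still gives target_points across a peak."""
    return peak_width_s / target_points


if __name__ == "__main__":
    # Assumed ~11 s peak width (inferred, not stated in the source).
    print(round(points_per_peak(11.0, 2.2), 2))
    # Cycle time needed for 8 points on the same assumed peak.
    print(max_cycle_time_s(11.0, 8))
```

The second function shows the usual design trade-off: more or narrower windows lengthen the cycle and reduce sampling across each peak.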
Evaluation of Ncoa4 Overexpression by Tandem Mass Tag or Label-Free DIA
Four mouse FFPE blocks were obtained from Dr Joseph D. Mancias at the Dana-Farber Cancer Institute (19). The blocks included embedded tissues from multiple organs, with two blocks from WT animals and two from animals with induced overexpression of Ncoa4. The blocks were sectioned into four experimental replicates and processed with the optimized Covaris, digestion, and StageTip parameters outlined above.
For DIA analyses, 1 μg of peptides (measured prior to SDB-RPS StageTip desalting) was injected on several LC-MS systems: a variable-window DIA method on the Orbitrap Exploris 480, narrow-window (4 m/z) DIA on the Orbitrap Astral (Thermo Fisher Scientific), and ion mobility-based DIA (diaPASEF) on the timsTOF HT (Bruker Daltonics) with a variable window scheme as shown in Supplemental Table S3. The wide-window DIA method on the Exploris 480 was the same as the method described in the previous section, except for a wider mass range (400–850 Th) to enhance peptide depth, as shown in Supplemental Table S4.
For diaPASEF, peptides were separated on a nanoElute-2 (Bruker Daltonics) using a PepSep ULTRA C18 column (Bruker Daltonics, 25 cm length/75 μm inner diameter/1.5 μm particle size) and a 30-min active gradient at a flow rate of 500 nl/min: 3% B to 28% B in 28 min and 28% B to 32% B in 2 min. The diaPASEF method on the timsTOF HT covered an m/z range of 300 to 1400 and was optimized with py-diAID using two ion mobility (IM) windows per m/z area (20), yielding a total of 20 diaPASEF windows plus an MS1 scan. The IM scan range was set to 1/K0 = 0.7 to 1.45 V·s cm−2 with 50 ms ramp and accumulation times, equating to a cycle time of 1.2 s and ∼6 data points per peak. Collision energy was synchronized with the IM range, decreasing from 59 eV at 1/K0 = 1.6 V·s cm−2 to 20 eV at 1/K0 = 0.6 V·s cm−2.
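Ion mobility-synchronized collision energy can be pictured as interpolation between the two anchor points given above. This sketch assumes a linear ramp between the anchors and clamping outside them. Both assumptions are for illustration only; the instrument software's exact behavior is not described here.

```python
from __future__ import annotations


def collision_energy_ev(inv_k0: float,
                        low: tuple[float, float] = (0.6, 20.0),
                        high: tuple[float, float] = (1.6, 59.0)) -> float:
    """Collision energy (eV) at a given 1/K0 (V*s/cm^2), assuming a linear ramp.

    Values of 1/K0 outside the anchor points are clamped (an assumption).
    """
    (k_lo, ce_lo), (k_hi, ce_hi) = low, high
    k = min(max(inv_k0, k_lo), k_hi)
    return ce_lo + (ce_hi - ce_lo) * (k - k_lo) / (k_hi - k_lo)


if __name__ == "__main__":
    for k in (0.7, 1.0, 1.45):  # edges of the configured IM scan range and a midpoint
        print(k, round(collision_energy_ev(k), 2))
```

Because the scan range (0.7 to 1.45) lies inside the anchors, the clamp never triggers for these settings.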
For the Astral DIA analysis, peptides were separated using a Vanquish Neo UHPLC system (Thermo Fisher Scientific) coupled to an Orbitrap Astral (Thermo Fisher Scientific) with an Aurora Elite TS C18 UHPLC column (IonOpticks, 15 cm length/75 μm inner diameter/1.7 μm particle size) at a flow rate of 800 nl/min. The 21-min active gradient was as follows: 2% to 4% B in 0.5 min, 4% to 8% B in 0.6 min, 8% to 22.5% B in 12.8 min, 22.5% to 35% B in 6.9 min, and 35% to 55% B in 0.4 min. MS1 spectra covered a mass range of 390 to 980 m/z at a resolution of 240,000 at 200 m/z in the Orbitrap mass analyzer, with an AGC target of 5E6 and a maximum injection time of 3 ms. MS1 spectra were acquired with a timed loop control of 0.6 s, and DIA MS2 scans used 4 m/z isolation windows spanning m/z 390 to 900, with a maximum injection time of 5 ms and an AGC target of 2E5. HCD fragmentation was set to 25% NCE.
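The gradient table and isolation scheme above can be checked with a short script. It converts the segments into cumulative breakpoints, interpolates %B, and counts the 4 m/z windows needed to span m/z 390 to 900. The window count assumes contiguous, non-overlapping windows with no margins, which may differ from the actual instrument method.

```python
import math

# (start %B, end %B, duration in min) for each Astral gradient segment
ASTRAL_SEGMENTS = [(2.0, 4.0, 0.5), (4.0, 8.0, 0.6), (8.0, 22.5, 12.8),
                   (22.5, 35.0, 6.9), (35.0, 55.0, 0.4)]


def breakpoints(segments):
    """Convert segments to cumulative (time_min, %B) breakpoints."""
    points = [(0.0, segments[0][0])]
    for start_b, end_b, duration in segments:
        if not math.isclose(points[-1][1], start_b):
            raise ValueError("segments are not contiguous in %B")
        points.append((points[-1][0] + duration, end_b))
    return points


def percent_b(t_min, points):
    """Linearly interpolated %B at time t_min (held constant outside the table)."""
    if t_min <= points[0][0]:
        return points[0][1]
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return points[-1][1]


def window_count(mz_low, mz_high, width):
    """Contiguous, non-overlapping isolation windows needed to span a range."""
    return math.ceil((mz_high - mz_low) / width)


if __name__ == "__main__":
    pts = breakpoints(ASTRAL_SEGMENTS)
    print(f"active gradient: {pts[-1][0]:.1f} min")  # sum of segment durations
    print(f"%B at 10 min: {percent_b(10.0, pts):.2f}")
    print(window_count(390, 900, 4), "windows of 4 m/z")
```

The segment durations sum to 21.2 min, consistent with the stated ∼21-min active gradient, and spanning 510 m/z at 4 m/z requires 128 windows.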
For tandem mass tag (TMT) analysis, peptides (∼50 μg) were dried down and reconstituted to ∼1 mg/ml in 100 mM TEAB. Peptides were labeled at a 1:5 ratio with TMTpro 18-plex reagents (Thermo Fisher Scientific, Cat. No. A52045). All channels were fully labeled, and two channels were designated to represent a mixed pool of peptides from all experimental replicates; these pooled channels were later excluded from bioinformatic analysis. Following labeling, the 18 channels were quenched with hydroxylamine at a final concentration of 0.3%. The TMT-labeled samples were then pooled and dried down before desalting with a Sep-Pak tC18 cartridge (Waters, Cat. No. WAT036820). Offline fractionation of TMT-labeled peptides was performed using a high-pH reversed-phase gradient on a ZORBAX C18 column (Agilent, 250 mm length/4.6 mm inner diameter/3.5 μm particle size) as described previously (21). The resulting fractions were concatenated into 24 fractions for LC-MS analysis in DDA mode. One microgram of each fraction was injected on a Thermo Scientific Vanquish Neo UHPLC coupled to a Thermo Scientific Orbitrap Exploris 480 with a 25 cm New Objective PicoFrit column packed in-house with 1.5 μm C18 beads. A 99-min active gradient at a flow rate of 200 nl/min was used: 1.8% B to 5.4% B in 1 min, 5.4% B to 27% B in 86 min, and 27% B to 54% B in 12 min. MS1 spectra covered a mass range of 350 to 1800 m/z at a resolution of 60,000, with an AGC target of 3E6 and a maximum injection time of 25 ms. The top 20 precursors were isolated, and the monoisotopic peak selection filter was set to peptides, with the “relaxed restrictions when too few precursors were found” toggle turned on. The precursor intensity threshold for selection was 5E3, with a fit threshold of 50% and a window of 1.2 m/z. Precursors of charge states 2 to 6 were selected for fragmentation with an isolation width of 0.7 m/z, dynamic exclusion was set to 20 s, and NCE was set to 34%. MS2 scans were acquired at a resolution of 45,000, with an AGC target of 3E5 and a maximum injection time of 105 ms.
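Concatenation pools early, middle, and late high-pH fractions so that each pooled sample spans the full low-pH gradient. The number of collected fractions is not stated, so the 96 used below is hypothetical. The scheme shown (fraction i combined with i + 24, i + 48, ...) is one common approach, not necessarily the one used in (21).

```python
def concatenation_scheme(n_collected: int, n_pooled: int) -> list:
    """Pool fraction i with i + n_pooled, i + 2*n_pooled, ... (1-based numbering)."""
    if n_pooled <= 0 or n_collected % n_pooled:
        raise ValueError("n_collected must be a positive multiple of n_pooled")
    per_pool = n_collected // n_pooled
    return [[i + k * n_pooled for k in range(per_pool)]
            for i in range(1, n_pooled + 1)]


if __name__ == "__main__":
    pools = concatenation_scheme(96, 24)  # 96 collected fractions is an assumption
    for number, members in enumerate(pools[:3], start=1):
        print(f"pool {number}: fractions {members}")
```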
Proteome and Phosphoproteome Analysis of LUAD FFPE Blocks
The lung adenocarcinoma (LUAD) FFPE samples were macrodissected by VARI as described above and placed directly into a Covaris 96 AFA-Tube TPX Plate prefilled with TLB. The samples had varying total tissue areas (9–63 mm²). Covaris plates with dissected tissue were kept at −80 °C until processing with the optimal Covaris, digestion, and StageTip parameters described above. Proteome and phosphoproteome data were collected on the Orbitrap Astral using the same liquid chromatography (LC) setup and 4 m/z isolation window DIA method described above.
For phosphopeptide enrichment, 50 μg of peptides following S-Trap digestion were desalted using a Sep-Pak tC18 96-well plate (Waters, Cat No. 196002320) and eluted in 200 μl of 80% MeCN/0.1% TFA. An additional 1% TFA was added prior to phosphopeptide enrichment on the AssayMAP Bravo liquid handling system (Agilent). Fe(III)-NTA cartridges (Agilent, 5 μl) were primed with 100 μl of 50% MeCN/0.1% TFA and equilibrated with 50 μl of 80% MeCN/0.1% TFA, followed by sample loading (195 μl, 5 μl/min). The cartridges were then washed with 50 μl of 80% MeCN/0.1% TFA, and the phosphorylated peptides were eluted with 20 μl of 1% NH4OH directly into a 96-well plate (Bio-Rad, Cat. no HSP9601) containing 2.5 μl of 100% FA. Following elution, 20 μl of 100% MeCN was added to each sample using a multichannel pipette. The samples were dried down and stored at −80 °C until LC-MS analysis.
Data Analysis
Unless otherwise noted, data were searched against the reviewed human proteome database obtained from UniProt on January 8, 2024, without isoforms (20,462 entries). Contaminants were appended to the FASTA file, and decoys were generated using the reverse sequence approach.
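The reverse-sequence decoy approach can be illustrated with a minimal FASTA round trip. Each target sequence is reversed end to end and given a prefixed header. The "rev_" prefix is an assumption chosen to match common FragPipe practice, and this is not the tool the authors used to build their database.

```python
from __future__ import annotations


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_reverse_decoys(records, prefix="rev_"):
    """Targets followed by fully reversed decoys (prefix is an assumed convention)."""
    return list(records) + [(prefix + h, s[::-1]) for h, s in records]


def format_fasta(records, width=60):
    """Write records back to FASTA with fixed-width sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">sp|P00001|DEMO_HUMAN toy entry\nMKTAYIAK\nQR\n"
    print(format_fasta(add_reverse_decoys(parse_fasta(demo))), end="")
```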
DIA data from the FFPE workflow optimization and macrodissections were searched in library-free mode in FragPipe V20.0 (https://fragpipe.nesvilab.org/), using the default parameters of the DIA_Speclib_Quant workflow:
• Peak matching: precursor mass tolerance, 20 ppm; fragment mass tolerance, 20 ppm; calibration and optimization, mass calibration and parameter optimization; isotope error, 0/1/2. Require precursor was enabled to discard peptide-spectrum matches (PSMs) without any identified parent peaks.
• Protein digestion was set to strict trypsin, with cleavage C-terminal to K and R and a maximum of 2 missed cleavages. Peptide length was set to a minimum of 7 and a maximum of 50 residues. The peptide mass range was kept at 500 to 5000 Da. N-terminal methionine excision was enabled.
• Methionine oxidation and N-terminal acetylation were set as variable modifications, and carbamidomethylation of cysteine was set as a fixed modification.
• An additional 1% run-specific protein false discovery rate (FDR) filter was applied during the DIA-NN quantification step.
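The digestion rules listed above (strict trypsin, up to 2 missed cleavages, 7 to 50 residues, 500 to 5000 Da, N-terminal methionine excision) can be reimplemented in simplified form. This is an illustrative sketch, not MSFragger's code: it ignores variable modifications and uses standard monoisotopic residue masses, with cysteine carrying the fixed carbamidomethylation.

```python
# Monoisotopic residue masses (Da); C includes fixed carbamidomethylation.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 160.03065, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide (fixed CAM on C)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def strict_trypsin(seq, max_missed=2, min_len=7, max_len=50,
                   min_mass=500.0, max_mass=5000.0, clip_n_term_met=True):
    """Peptides from cleavage after every K/R ('strict': no proline exception)."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]
    bounds = [0] + sites + [len(seq)]
    candidates = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages in seq[bounds[i]:bounds[j]]
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            candidates.append((bounds[i], bounds[j]))
    if clip_n_term_met and seq.startswith("M"):
        # Protein N-terminal Met removed: same end points, start shifted by one.
        candidates += [(1, bounds[j]) for j in range(1, min(2 + max_missed, len(bounds)))]
    peptides = set()
    for start, end in candidates:
        pep = seq[start:end]
        if min_len <= len(pep) <= max_len and min_mass <= peptide_mass(pep) <= max_mass:
            peptides.add(pep)
    return peptides


if __name__ == "__main__":
    print(sorted(strict_trypsin("MGGGGGGGKLLLLLLLRPPPPPPP", max_missed=1)))
```

Note that "strict" trypsin cleaves even before proline (as in LLLLLLLR|P above). Excision adds a Met-less copy of each protein N-terminal peptide.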
The DDA data from the pooled and fractionated BRC and CRC samples were analyzed in FragPipe using the localization-aware open-search default parameters in MSFragger and PTM-Shepherd (https://ptmshepherd.nesvilab.org/), against the human UniProt FASTA file obtained on January 8, 2024, without isoforms (20,462 entries). Contaminants were appended to the FASTA file, and decoys were generated using the reverse sequence approach.
The search parameters and filtering criteria were loaded from the default MSFragger open search parameters as follows:
• Peak matching: precursor mass tolerance, 20 ppm; fragment mass tolerance, 20 ppm; calibration and optimization, mass calibration and parameter optimization; isotope error, 0/1/2. Require precursor was enabled to discard PSMs without any identified parent peaks.
• Protein digestion was set to strict trypsin, with cleavage C-terminal to K and R and a maximum of 2 missed cleavages. Peptide length was set to a minimum of 7 and a maximum of 50 residues. The peptide mass range was kept at 500 to 5000 Da. N-terminal methionine excision was enabled.
• Methionine oxidation and N-terminal acetylation were set as variable modifications, and carbamidomethylation of cysteine was set as a fixed modification.
• For the open search options in MSFragger, default parameters were used: report mass shift as a variable was set to No, track zero top N: 0, zero bin accept expect: 0, delta mass exclude range: (−1.5, 3.5), zero bin multiply expect: 1, and localize mass shift was checked.
• For PSM validation, Crystal-C was applied with the default open search parameters, and retention time and spectral predictions were enabled for Percolator. PeptideProphet parameters were set to open search defaults. ProteinProphet default parameters were used, and the FDR filter was set to 1%.
• To characterize the identified mass shifts against the UniMod database, PTM-Shepherd was enabled using the default settings, with the extended PTM report option enabled. In PTM profiling, the smoothing factor was set to 2, prominence ratio to 0.3, maximum fragment charge to 2, precursor tolerance to 0.01 Da, peak picking width to 0.002 Da, fragment mass tolerance to 20 ppm, and peak minimum to 10 PSMs, and the data were normalized to PSMs. The UniMod database was used for annotation with a tolerance of 0.01 Da. Localization scores were based on b and y fragment ions with a background level of 4, and enrichment scores were calculated using all PSMs with localizable mass shifts in the entire dataset as background.
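The annotation step, matching an observed mass shift to UniMod entries within 0.01 Da, can be sketched as a tolerance lookup. PTM-Shepherd uses the full UniMod database and its own peak detection and localization. This sketch covers only the matching step, against a small hand-picked subset of monoisotopic deltas chosen for illustration.

```python
# Monoisotopic mass deltas (Da) for a few UniMod entries, chosen for illustration.
UNIMOD_SUBSET = {
    "Deamidated": 0.984016,
    "Methyl": 14.015650,
    "Oxidation": 15.994915,
    "Formyl": 27.994915,
    "Dimethyl": 28.031300,
    "Acetyl": 42.010565,
    "Carbamidomethyl": 57.021464,
    "Phospho": 79.966331,
}


def annotate_mass_shift(delta_da: float, tol_da: float = 0.01,
                        table: dict = UNIMOD_SUBSET) -> list:
    """Candidate annotations within tol_da of an observed mass shift, closest first."""
    hits = [(name, mass) for name, mass in table.items() if abs(delta_da - mass) <= tol_da]
    return sorted(hits, key=lambda hit: abs(delta_da - hit[1]))


if __name__ == "__main__":
    for shift in (15.995, 27.996, 100.0):
        print(shift, annotate_mass_shift(shift) or "unannotated")
```

The 0.01 Da tolerance separates Formyl (27.9949 Da) from Dimethyl (28.0313 Da); a looser tolerance would return both, ranked by mass error.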
All Ncoa4 overexpression DIA data were searched using DIA-NN 1.8.1, both because of its compatibility with all acquired raw file types and to control for database search variability. An in silico digest spectral library was generated in DIA-NN from a mouse FASTA file downloaded from UniProt on May 13, 2023, containing 55,275 protein entries, including both reviewed (Swiss-Prot) and unreviewed (TrEMBL) sequences. The fragment ion m/z range was 100 to 2000, and the precursor ion m/z range was 200 to 1600, with charge states 2 to 4. Peptide length was 7 to 30, with a maximum of 3 variable modifications and 1 missed cleavage. Carbamidomethylation was selected as a fixed modification, and N-terminal methionine excision, methionine oxidation, and N-terminal acetylation were chosen as variable modifications.
TMT data were searched in FragPipe V20 using parameters from the TMT16 workflow. The precursor mass tolerance was set to ±20 ppm. Peptide length was set to 7 to 50 with 2 missed cleavages. Carbamidomethylation of cysteines and TMT labeling of lysines and peptide N-termini were set as static modifications. N-terminal acetylation, methionine oxidation, pyroglutamic acid, pyro-carbamidomethylation of cysteine, proline hydroxylation, and N-terminal deamidation were searched as variable modifications, with a maximum of 3 variable modifications per peptide.
Phosphorylation data were searched in Spectronaut 19.1 (https://biognosys.com/software/spectronaut/) in directDIA mode. Cysteine carbamidomethylation was set as a fixed modification, while methionine oxidation, N-terminal acetylation, and phosphorylation of serine, threonine, and tyrosine were set as variable modifications, with a maximum of 5 variable modifications.
A PTM localization probability threshold of 0 was used so that phosphosites of class II (sites with only a moderate probability of accurate localization) and above were retained, and a cutoff of ≥0.75 was applied postanalysis to define accurately localized class I phosphosites (22).
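The postanalysis class I filter can be sketched as follows. The sketch collapses observations to the best localization probability per site, then keeps sites at or above 0.75. Collapsing by maximum probability is an assumption made for illustration. The record type and function names are hypothetical and do not correspond to Spectronaut's report columns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SiteObservation:
    protein: str
    residue: str              # S, T, or Y
    position: int             # 1-based position in the protein
    localization_prob: float


def class_i_sites(observations, cutoff: float = 0.75) -> dict:
    """Best localization probability per site, keeping sites with p >= cutoff."""
    best: dict[tuple[str, str, int], float] = {}
    for obs in observations:
        key = (obs.protein, obs.residue, obs.position)
        best[key] = max(best.get(key, 0.0), obs.localization_prob)
    return {key: p for key, p in best.items() if p >= cutoff}


if __name__ == "__main__":
    obs = [SiteObservation("P1", "S", 10, 0.74),
           SiteObservation("P1", "S", 10, 0.90),  # same site seen on another precursor
           SiteObservation("P2", "T", 5, 0.50)]
    print(class_i_sites(obs))
```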
MSFragger-DIA and Spectronaut Instantiation on the Terra Platform
DIA data were searched using the MSFragger-DIA workflow, which was added in FragPipe v20.0 (17). MSFragger-DIA was implemented as a Google Cloud-based pipeline on the Terra platform (https://app.terra.bio) using a headless configuration that passes commands to a FragPipe Docker container. To enable searching of DIA files collected on Thermo Fisher Scientific instruments, MSConvert (23) was adapted to Terra for parallelized, FragPipe-compatible raw-to-mzML conversion. The standard Terra configuration reflects the default MSConvert conversion settings: MS1 peak picking (centroiding) and removal of consecutive extra zeros in MS1 spectra. The Terra-hosted FragPipe pipeline requires users to provide a workflow file generated in the FragPipe GUI that contains the parameters for running the relevant FragPipe modules with full functionality. Additional documentation for the Terra MSConvert and FragPipe workflows is available on GitHub (https://github.com/broadinstitute/PANOPLY) (24). Spectronaut version 19.1 was also implemented as a Terra pipeline, with functionality for multiple report generation, customized enzyme database integration, and combination of SNE files.
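The two conversion settings named above correspond to standard ProteoWizard msconvert filters. The sketch builds an equivalent command line for a single file. It is an assumption about how one might reproduce the settings locally, not the PANOPLY Terra configuration, which may pass different or additional options. Vendor peak picking is listed as the first filter because msconvert requires vendor centroiding to come first.

```python
import shlex


def msconvert_command(raw_path: str, out_dir: str) -> list:
    """Argument list for converting one Thermo .raw file to mzML.

    Mirrors the two settings named in the text: vendor peak picking
    (centroiding) of MS1 spectra and removal of extra consecutive
    zero-intensity samples in MS1.
    """
    return [
        "msconvert", raw_path,
        "--mzML",
        "-o", out_dir,
        "--filter", "peakPicking vendor msLevel=1",
        "--filter", "zeroSamples removeExtra 1",
    ]


if __name__ == "__main__":
    # File and directory names are placeholders.
    print(shlex.join(msconvert_command("sample01.raw", "mzml")))
```

Building the argument list as data (rather than a shell string) makes it straightforward to fan the same command out over many files in a parallel job.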
Experimental Design and Statistical Rationale
Experimental design and statistical rationale are provided in the individual method sections and are also indicated in the Results section or figure legends corresponding to the technology development workflows that were optimized using BRC and CRC blocks. An FDR of 0.01 was set as the cutoff for peptide and protein identifications. For Ncoa4 OE and WT mice scatter plots, fold changes between OE and WT samples were calculated and plotted on the x-axis and y-axis, respectively, and the Pearson correlation is indicated in the upper left corner. Differential expression between OE2 and WT2 was calculated using a two-sample t-test. Non-negative matrix factorization (NMF) clustering and single-sample gene set enrichment analysis of the LUAD manual macrodissection (MMD) samples were performed using the NMF module within Panoply (https://github.com/broadinstitute/PANOPLY). Details have been described previously (24, 25, 26, 27). Pathways with FDR <0.05 are indicated with an asterisk.
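The statistics named above can be illustrated with the standard library. The sketch covers a log2 fold change, a two-sample t statistic, Pearson correlation, and Benjamini–Hochberg FDR adjustment. It assumes log2-transformed intensities and a pooled-variance (Student) t-test; the text does not say whether a Student or Welch test was used. P-values for the t statistic would come from a library such as `scipy.stats.ttest_ind`, which is not reproduced here.

```python
import math
import statistics


def log2_fold_change(oe_log2, wt_log2):
    """Difference of group means on log2-scale intensities."""
    return statistics.fmean(oe_log2) - statistics.fmean(wt_log2)


def student_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    oe, wt = [10.0, 11.0, 12.0], [8.0, 9.0, 10.0]  # made-up log2 intensities
    print(log2_fold_change(oe, wt), round(student_t(oe, wt), 3))
    print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
```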
Results
Workflow and Method Optimization
A schematic representation of the FFPE processing workflow is shown in Figure 1A. The final workflow was the product of systematic optimization of critical sample processing parameters that affect protein/peptide yields, sample cleanup, and the reproducibility of proteomic identifications, as detailed below. Our initial experiments focused on refining the workflow recommended by Covaris for clinical proteomics (28) using 10 μm scrolls from CRC and BRC blocks that had been stored for ∼20 years (reflecting real-world FFPE repositories). We found that increasing the sonication DF on the Covaris LE220-plus focused-ultrasonicator to 50% and the TLB volume to 150 μl more than doubled protein yields compared with the Covaris-recommended protocol of 25% DF in 100 μl of TLB (Fig. 1B). A small pellet remained in most wells after processing; our initial optimization studies used only the clear lysate (see below).
Fig. 1.
Method optimization. A, the FFPE workflow processes entire scrolls using a Covaris TPX plate. Breast cancer (BRC) and colorectal cancer (CRC) blocks were scrolled and transferred into a Covaris plate containing tissue lysis buffer (TLB). Sonication conditions were optimized to maximize protein extraction. Reduction and alkylation were performed in the Covaris TPX plate, and then SP3 beads and S-Trap plates were evaluated, starting with 50 μg of protein. Samples were analyzed on a Vanquish Neo LC coupled to an Exploris 480 (Thermo Fisher Scientific) in data-independent acquisition (DIA) mode. B, optimization of protein extraction efficiency by adjusting sonication duty factors (DFs) and buffer volume. Bar plots represent data from n = 4 scrolls per block, showing means and SD error bars. C, comparison of SP3 and S-Trap methods for digestion and cleanup using a fluorometric peptide assay. Bar plots represent data from n = 4 scrolls per block, with means and SD error bars. D, unique protein identifications using MSFragger-DIA across the three tested workflows, from 500 ng of peptide input for LC-MS analysis. Bar plots represent data from n = 4 scrolls per block, with means and SD error bars. E, Venn diagram illustrating protein overlap, with each circle representing the total number of unique proteins identified by each tested workflow. FFPE, formalin-fixed, paraffin-embedded; S-Trap, suspension trapping.
Following FFPE sample lysis, the protein content of each sample was quantified colorimetrically using the Micro BCA Protein Assay Kit (Thermo Fisher Scientific), followed by reduction with 5 mM tris(2-carboxyethyl)phosphine and alkylation with 50 mM 2-chloroacetamide. Next, two sample digestion methods, S-Trap (ProtiFi) and SP3 (Cytiva), were compared with respect to peptide yield, starting with 50 μg of protein from BRC and CRC FFPE replicate sections (Fig. 1C). To streamline the workflow, Covaris TLB, which contains >2% SDS as formulated, was used directly for SP3 without spiking in additional SDS. For the S-Trap condition, both the >2% SDS formulation and the recommended 5% SDS concentration were evaluated. Yields from CRC scrolls were ∼3× greater than those from BRC scrolls, possibly due to the higher fat content of the latter, which may interfere with the BCA protein assay. BRC peptide yields were roughly comparable across the three tested conditions, each performed in four replicates: S-Trap with TLB (∼2% SDS), S-Trap with TLB spiked with additional SDS (∼5% SDS), and SP3 beads with TLB (∼2% SDS). In contrast, CRC peptide yields were more than twice as high with S-Trap as with SP3 at 2% SDS, and somewhat lower when the S-Trap SDS concentration was increased to 5% (Fig. 1C). The high lipid content of BRC tissue and the elevated SDS concentrations may have biased the BCA protein assay, but the overall trend of S-Trap relative to SP3 was similar across conditions. This is consistent with a recent report in which S-Trap gave the highest peptide yields among several FFPE processing methods, including SP3 (29). That study also described incompletely homogenized tissue that continued to solubilize during downstream wash and digestion steps, consistent with our findings. Comparable numbers of proteins were identified for CRC and BRC samples when analyzing the equivalent of 500 ng of peptide (normalized based on the peptide yields shown in Fig. 1C), with over 90% overlap in identified proteins irrespective of the processing method used (Fig. 1, D and E). The S-Trap method, however, identified ∼10% more peptides than SP3, providing increased sequence coverage for a subset of identified proteins (Supplemental Fig. S1A, Supplemental Table S1). Furthermore, total ion chromatograms had greater overall intensity and were more reproducible for samples processed by S-Trap than by SP3 (Supplemental Fig. S1B, Supplemental Table S1), despite slightly better digestion efficiency with SP3 (Supplemental Fig. S1C, Supplemental Table S1A). In addition, the S-Trap workflow yielded cleaner samples owing to an additional chloroform/MeOH wash step. Hence, all further evaluations used S-Trap rather than SP3.
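The ">90% overlap" in the Venn diagram depends on how overlap is defined, which the text does not specify. A minimal sketch, assuming overlap is the fraction of the union of identified proteins detected by every workflow (a Jaccard-style index); the protein IDs and workflow labels are hypothetical.

```python
def overlap_fraction(*protein_sets):
    """Fraction of the union of identified proteins that every workflow detected.

    Requires at least one set.
    """
    sets = [set(s) for s in protein_sets]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


def unique_to_each(named_sets):
    """Proteins identified by exactly one workflow, keyed by workflow name."""
    result = {}
    for name, proteins in named_sets.items():
        others = set().union(*(set(s) for n, s in named_sets.items() if n != name))
        result[name] = set(proteins) - others
    return result


# Hypothetical protein IDs per workflow (illustrative only).
workflows = {
    "S-Trap, ~2% SDS": {"P1", "P2", "P3", "P4"},
    "S-Trap, ~5% SDS": {"P1", "P2", "P3", "P5"},
    "SP3, ~2% SDS": {"P1", "P2", "P3", "P4"},
}
print(overlap_fraction(*workflows.values()))
print(unique_to_each(workflows))
```

Other definitions (for example, shared proteins divided by the smaller set) give higher values on the same data, so the definition should be stated alongside any reported percentage.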
Application of the Optimized Method to Macrodissected Tissue
Laser microdissection (13) of tumor-rich areas and pathologist-guided manual macrodissection (MMD) are both used to enrich tumor epithelial cells within heterogeneous cancer tissue. Here, we optimized parameters for pathology-guided macrodissection of tumor-rich areas from three to five 10 μm thick scrolls from a subset of CRC and BRC samples (Fig. 2A, Supplemental Table S2). Transferring MMD tissue fragments to empty wells was impeded by electrostatic interactions; transfer was easier and more consistent when wells were prefilled with TLB. To assess the downstream impact of wet versus dry storage, samples were transferred to either dry or TLB-prefilled wells. Plates were frozen at −80 °C and processed by AFA after several weeks. Protein yields from the clear lysates were nearly identical between MMD samples stored dry and those stored in wells prefilled with TLB (Fig. 2B).
Fig. 2.
Method optimization for MMD samples. A, schematic showing MMD sample collection and processing. Proteomic data for this figure were generated on an Exploris 480. B, bar plots showing comparable protein yields (n = 2 scrolls per condition) across both storage conditions obtained after lysis using Covaris AFA. The height of each bar represents the mean, with error bars indicating the SD. C, bar plots showing enhanced peptide yields from whole tissue loaded on the S-Trap compared to clear lysates only. Demonstrated with breast cancer (BRC1, n = 2) and colorectal cancer (CRC1, n = 2) samples. The height of each bar represents the mean, with error bars indicating the SD. D, comparison of relative abundance across shared proteins identified from clear lysate and whole tissue. Relative abundance is calculated by dividing each protein's abundance by the total protein abundance for each sample. Both methods show strong Spearman correlation. E, unique protein and peptide identifications from MMD tissue with a median area of ∼57 mm² for CRC and BRC samples (n = 4), from 1000 ng of peptide input for LC-MS analysis. The height of each bar represents the mean, with error bars indicating the SD. F, protein abundance CVs among experimental replicates (n = 4) for each block. The violin plot illustrates the data distribution based on the kernel density estimate (KDE). The inner lines mark the 25th, 50th, and 75th percentiles; the 25th and 75th percentiles bound the IQR. AFA, Adaptive Focused Acoustics; IQR, interquartile range; TLB, tissue lysis buffer.
Peptide yields obtained by processing 50 μg of protein were inconsistent, with a single dry-stored BRC2 sample exhibiting an unexpectedly high peptide yield (Supplemental Fig. S2A). We hypothesized that this higher yield may have resulted from inadvertently loading more of the pellet material onto the S-Trap. To examine this possibility, we processed additional BRC1 and CRC1 lysates, this time including the undissolved tissue (referred to here as “whole tissue”) (Fig. 2C). We reasoned that S-Traps could accommodate whole tissue because the S-Trap mesh retains insoluble tissue material, allowing further protein digestion and extraction of additional peptides. Based on this finding, we subsequently processed only whole, lysed tissue by S-Trap, which increased peptide yields >2-fold (Fig. 2C) while maintaining equivalent proteome depth and protein abundance profiles (Supplemental Fig. S2, B and C).
To investigate whether specific proteins were extracted more efficiently with the whole-tissue approach, we calculated the relative abundance of each protein within a sample by dividing its abundance by the summed protein abundance of that sample. The relative abundances of certain structural proteins, such as the collagen COL1A2 and the cytokeratin KRT19, were increased in whole-tissue preparations (Fig. 2D). Nevertheless, relative abundances across all shared proteins remained highly correlated (Spearman correlation = 0.92), indicating strong global agreement between clear-lysate and whole-tissue processing (Fig. 2D). Overall, we identified up to 8000 unique proteins in each of the MMD samples. The increase of 600 to nearly 1000 identified proteins (Fig. 2E) relative to the numbers shown in Figure 1D resulted in part from analyzing 1 μg of peptides rather than the 500 ng used previously; this higher peptide input was adopted for all subsequent experiments. The median protein abundance CV was <20% across 16 MMD samples, representing four replicates each from two CRC blocks and two BRC blocks, demonstrating a high level of reproducibility (Fig. 2F).
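The relative-abundance normalization and rank correlation described above can be sketched in plain Python. This is a minimal illustration, not the authors' analysis code: it assumes abundances are supplied as dicts keyed by protein ID, computes relative abundance over all proteins in each sample, restricts the correlation to proteins present in both samples, and assigns tied values their average rank. The abundance values are hypothetical.

```python
def relative_abundance(abundances):
    """Divide each protein's abundance by the summed abundance of the sample."""
    total = sum(abundances.values())
    return {protein: value / total for protein, value in abundances.items()}


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_shared(sample_a, sample_b):
    """Spearman correlation (Pearson on ranks) of relative abundances of shared proteins."""
    ra, rb = relative_abundance(sample_a), relative_abundance(sample_b)
    shared = sorted(set(ra) & set(rb))
    return pearson(average_ranks([ra[p] for p in shared]),
                   average_ranks([rb[p] for p in shared]))


# Hypothetical abundances (illustrative only).
clear_lysate = {"COL1A2": 50.0, "KRT19": 30.0, "ACTB": 100.0, "GAPDH": 80.0, "ALB": 10.0}
whole_tissue = {"COL1A2": 120.0, "KRT19": 70.0, "ACTB": 150.0, "GAPDH": 110.0, "ALB": 12.0}
print(round(spearman_shared(clear_lysate, whole_tissue), 3))
```

Because dividing by a per-sample total does not change ranks, the Spearman coefficient itself is unaffected by this normalization; the relative values matter when comparing individual proteins such as COL1A2 between preparations.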
Despite desalting peptides with standard tC18 StageTips, we consistently observed deposition of white material on the column emitter, likely contributing to performance degradation. To address this, we tested peptide cleanup with styrene-divinylbenzene reversed-phase sulfonate (SDB-RPS) as the StageTip material, which has been reported to improve peptide recovery (30, 31). SDB-RPS cleanup led to the near-complete disappearance of these emitter deposits, enabling LC-MS/MS analysis to run over extended periods without interruption. Although no improvement in peptide recovery was observed with SDB-RPS, miscleavage rates were slightly reduced. The number of identifications was close to that obtained with tC18 (Supplemental Fig. S, D–E, Supplemental Table S2), and protein abundance distributions were consistent (Supplemental Fig. S2F, Supplemental Table S2).
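Miscleavage rate, used here and in the S-Trap/SP3 comparison as a readout of digestion efficiency, can be estimated directly from identified peptide sequences. A minimal sketch, assuming the simple trypsin rule (cleavage C-terminal to K or R, except before P) and plain one-letter sequences without modification annotations; search engines may apply slightly different rules, and the peptides below are illustrative.

```python
def missed_cleavages(peptide):
    """Count internal K/R residues not followed by P (simple trypsin rule)."""
    count = 0
    # Skip the C-terminal residue: for a tryptic peptide it is the cleavage site itself.
    for i, residue in enumerate(peptide[:-1]):
        if residue in "KR" and peptide[i + 1] != "P":
            count += 1
    return count


def miscleavage_rate(peptides):
    """Fraction of peptides carrying at least one missed cleavage."""
    if not peptides:
        return 0.0
    return sum(missed_cleavages(p) > 0 for p in peptides) / len(peptides)


peptides = ["LVNELTEFAK", "KLVNELTEFAK", "AKPGEK", "SLHTLFGDKLCTVATLR"]
print([missed_cleavages(p) for p in peptides])
print(miscleavage_rate(peptides))
```

"AKPGEK" counts as fully cleaved because its internal K precedes P, which trypsin does not cleave under this rule.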
Fixation of tissue with formalin results in a range of poorly defined chemical modifications of proteins and peptides (4, 32, 33). The decrosslinking step aims to reverse these formalin-induced modifications. To investigate the decrosslinking efficiency of our workflow, we used FragPipe's open-search tools to identify fixation-related mass shifts among peptide-spectrum matches (PSMs) from DDA analyses of a pooled BRC/CRC sample (Supplemental Fig. S3A, Supplemental Table S2; see Methods) (34, 35). The observed mass shifts spanned ∼400 Da, with a peak at zero (i.e., no adventitious modification) representing 70% of the PSM library (Supplemental Fig. S3B, Supplemental Table S2). Approximately 10% of all PSMs exhibited mass shifts, including common formalin modifications such as methylation, formylation, formaldehyde adducts, and methylol groups (Supplemental Fig. S3C, Supplemental Table S2). These FFPE-related modifications were largely localized to lysine residues (Supplemental Fig. S3D, Supplemental Table S2), consistent with previous reports (4). Globally, we identified approximately 74,000 peptidoforms; 13% of peptides were identified only in a chemically modified form, and 25% were observed in both modified and unmodified forms (Supplemental Fig. S3E, Supplemental Table S2). At the protein level, 20% of proteins carried PSMs with common formalin-induced artifacts (methylation (+14.03 Da), formaldehyde adduct (+12.01 Da), formylation (+27.99 Da), and methylol (+30.00 Da); Supplemental Fig. S3F). The average fraction of modified PSMs was 5.8%, with a maximum of 60% (Supplemental Fig. S3F), suggesting efficient decrosslinking early in the process.
Vesicle-associated compartments (secretory vesicle, secretory granule, vesicle lumen), supramolecular complexes (e.g., ribonucleoprotein complex), and membrane junctions (anchoring junction, cell-substrate junction) were overrepresented among formalin-modified proteins (Supplemental Fig. S3G). These results suggest that, although decrosslinking was largely effective at the protein level, residual modifications persist in specific subcellular compartments, possibly owing to differences in protein accessibility, structural constraints, or local crosslinking density. Our results are in line with previous work by Coscia et al. (4), which reported lysine methylation in the 2 to 6% range in FFPE ovarian cancer and glioma tissue relative to frozen tissue.
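Assigning open-search mass shifts to candidate formalin modifications amounts to matching each observed delta mass against a small table of expected shifts within a tolerance. A minimal sketch, assuming monoisotopic deltas computed from elemental composition (CH2, C, CO, and CH2O, which differ slightly from the rounded values quoted in the text) and an illustrative ±0.01 Da window; FragPipe's open-search tooling summarizes mass shifts far more thoroughly, and its settings may differ.

```python
# Monoisotopic deltas (Da) from elemental composition; an illustrative assumption.
FORMALIN_SHIFTS = {
    "methylation (+CH2)": 14.01565,
    "formaldehyde adduct (+C)": 12.00000,
    "formylation (+CO)": 27.99491,
    "methylol (+CH2O)": 30.01056,
}


def annotate_shift(delta_mass, tolerance=0.01):
    """Return 'unmodified' near 0 Da, the closest formalin label within tolerance, else None."""
    if abs(delta_mass) <= tolerance:
        return "unmodified"
    best_label, best_error = None, tolerance
    for label, expected in FORMALIN_SHIFTS.items():
        error = abs(delta_mass - expected)
        if error <= best_error:
            best_label, best_error = label, error
    return best_label


def summarize_psms(delta_masses, tolerance=0.01):
    """Count PSMs per annotation; shifts matching no entry are grouped as 'other'."""
    counts = {}
    for delta in delta_masses:
        label = annotate_shift(delta, tolerance) or "other"
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical PSM delta masses (Da); 79.966 is a non-formalin shift (phosphorylation).
deltas = [0.001, 14.016, 12.002, 27.995, 30.011, 79.966, -0.004]
print(summarize_psms(deltas))
```

Dividing the count of formalin-labeled PSMs by the total per protein gives the kind of modified-PSM fraction summarized in the text.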
Evaluation of Quantitative Accuracy of TMT and DIA Analyses of FFPE Using Multiple Instrument Platforms
Quantitative accuracy and reproducibility of DIA-generated data can be compromised as a result of fragment ion interference, especially as gradients are shortened. To assess this, we tested instruments that offer high-throughput capabilities, including the Bruker timsTOF HT and the Thermo Fisher Orbitrap Astral. Detailed timsTOF HT and Orbitrap Astral data acquisition methods are described in the Methods section (Supplemental Fig. S4A and B, Supplemental Table S3).
To assess quantification, we used FFPE blocks obtained from wild-type (WT) mice and Ncoa4-overexpressing (OE) mice (19). As illustrated in Figure 3, A and B, we conducted three independent experiments using scrolls obtained from the two WT and two OE blocks. Each FFPE block contained multiple organs from a single WT or Ncoa4-overexpressing mouse, with a few specific exceptions highlighted in Supplemental Figure S4, C and D. Aliquots of each sample were analyzed on the Exploris 480 (110 min gradient), timsTOF HT (35 min gradient), and Orbitrap Astral (24 min gradient). DDA data were also generated by TMT labeling ∼50 μg peptide aliquots of four replicates each of WT and OE in an 18-plex format (Supplemental Table S3). Following offline fractionation into 24 fractions, data were acquired on the Exploris 480 using MS2-based reporter ion quantification. DIA data were searched using DIA-NN on Terra (see Methods), whereas DDA data were processed using FragPipe.
Fig. 3.
Evaluation of quantitative accuracy of TMT and DIA of FFPE tissue using multiple instrument platforms. A, schematic representation of mouse Ncoa4 WT and OE FFPE blocks. Peptides were either TMT-labeled or analyzed label-free using various DIA methods and instruments, as highlighted in the figure. TMT data were searched using FragPipe, whereas all DIA searches were performed using DIA-NN because it supports both Thermo and Bruker raw files. B, protein identifications for each experiment (n = 16 per experiment), from 1000 ng of peptide input for LC-MS analysis. Bars represent the average, with error bars indicating the SD. C, median-normalized protein abundance CVs from experimental replicates (n = 4 per sample). The box represents the interquartile range (IQR), with the top and bottom edges indicating the 75th and 25th percentiles, respectively. The line within the box denotes the median, and whiskers extend to include data within 1.5× the IQR. D, quantified Ncoa4 fold change between WT and OE mouse models for each TMT and DIA method (x-axis) compared with DIA on the Astral (y-axis). E, scatter plots showing protein fold changes for two WT and OE pairs, labeled WT1, WT2, OE1, and OE2. F, volcano plot illustrating differential protein abundance between WT2 and OE2. Significantly upregulated and downregulated proteins (p < 0.05) are shown in red and blue, respectively. DIA, data-independent acquisition; FFPE, formalin-fixed, paraffin-embedded; OE, overexpressing; TMT, tandem mass tag.
As anticipated, the highly fractionated TMT DDA data provided the deepest coverage, encompassing over 10,000 unique proteins. However, analyzing each of the 24 fractions with a 110 min gradient resulted in an effective throughput of just 8 samples per day (SPD) (Fig. 3B). In contrast, DIA on the Astral (4 m/z windows) yielded ∼9900 unique proteins per sample using a 24 min gradient, corresponding to 60 SPD. diaPASEF on the timsTOF HT yielded ∼8300 unique proteins per sample using a 35 min gradient, corresponding to 30 SPD. Wide-window (∼15 m/z) DIA on the Exploris 480 identified ∼8000 unique proteins using a 110 min gradient, corresponding to 12 SPD (Fig. 3B, Supplemental Table S3). DDA analysis of the highly fractionated TMT-labeled sample quantified ∼108,000 unique peptides, whereas Astral DIA achieved a depth of ∼90,000 and diaPASEF ∼74,000 unique peptides per sample (Supplemental Fig. S4B, Supplemental Table S3). Given the slower scan speed of the Exploris 480, DIA acquisition on this instrument used wider windows of variable width and a narrower mass range than on the Astral (25 m/z windows, m/z 400–850, versus 4 m/z windows, m/z 380–980) and quantified ∼56,000 unique peptides, roughly 25 to 50% fewer than the other methods tested. The median protein abundance CVs for each set of four experimental replicates were all below 20%, calculated using the DIA-NN quantification algorithm in FragPipe after sample-wise median normalization (Fig. 3C).
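The CV summary above depends on first removing sample-level loading differences. A minimal sketch of sample-wise median normalization followed by per-protein CV, assuming a protein × replicate matrix of linear-scale (not log-transformed) intensities with no missing values, and scaling each sample so its median equals the mean of all sample medians (one common variant; the convention used in FragPipe may differ). The toy matrix is hypothetical.

```python
from statistics import mean, median, stdev


def median_normalize(matrix):
    """Scale each column (sample) so its median equals the mean of all sample medians."""
    n_samples = len(matrix[0])
    col_medians = [median(row[j] for row in matrix) for j in range(n_samples)]
    target = mean(col_medians)
    factors = [target / m for m in col_medians]
    return [[value * factors[j] for j, value in enumerate(row)] for row in matrix]


def protein_cvs(matrix):
    """Coefficient of variation (SD / mean, in %) across replicates for each protein row."""
    return [100.0 * stdev(row) / mean(row) for row in matrix]


def median_cv(matrix):
    """Median per-protein CV after sample-wise median normalization."""
    return median(protein_cvs(median_normalize(matrix)))


# Hypothetical 3 proteins x 4 replicates; replicate 2 was loaded at twice the amount.
matrix = [
    [100.0, 200.0, 100.0, 100.0],
    [50.0, 100.0, 50.0, 50.0],
    [10.0, 20.0, 10.0, 10.0],
]
print(protein_cvs(matrix))
print(median_cv(matrix))
```

Without normalization, the single over-loaded replicate inflates every protein's CV; after median scaling, the loading difference is removed and only genuine protein-level variation remains.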
Differential expression between OE and WT was evaluated to assess the quantitative reliability and reproducibility of the TMT and DIA datasets. Ncoa4 was prominently upregulated in the OE FFPE samples in each of the DIA datasets (Fig. 3D), and protein fold changes between OE and WT were highly correlated across platforms (Pearson's correlation >0.8) when the Astral DIA dataset was used as the comparator (Fig. 3E). As expected, TMT fold changes were highly compressed relative to the DIA results despite the use of 24 offline fractions, consistent with the ratio compression of MS2-based reporter ion quantification caused by co-isolated precursors.
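The cross-platform comparison correlates per-protein log2 fold changes (OE versus WT) computed independently on each instrument. A minimal sketch, assuming replicate intensities per protein for each condition and a fold change taken as the ratio of replicate means; the least-squares slope is added here as one simple way to separate compression from disagreement. All values are hypothetical.

```python
from math import log2
from statistics import mean


def log2_fold_changes(oe, wt):
    """Per-protein log2(mean OE / mean WT); inputs map protein -> replicate intensities."""
    return {p: log2(mean(oe[p]) / mean(wt[p])) for p in oe.keys() & wt.keys()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def slope(x, y):
    """Least-squares slope of y on x; values below 1 indicate compressed fold changes in y."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def compare_platforms(fc_reference, fc_other):
    """Return (Pearson r, slope) over proteins quantified on both platforms."""
    shared = sorted(fc_reference.keys() & fc_other.keys())
    x = [fc_reference[p] for p in shared]
    y = [fc_other[p] for p in shared]
    return pearson(x, y), slope(x, y)


# Hypothetical log2 fold changes (OE vs WT); the second platform is uniformly compressed.
astral = {"Ncoa4": 3.0, "Steap4": 1.0, "Tfr2": 0.8, "Actb": 0.0, "Alb": -0.2}
tmt = {"Ncoa4": 1.2, "Steap4": 0.4, "Tfr2": 0.32, "Actb": 0.0, "Alb": -0.08}
r, b = compare_platforms(astral, tmt)
print(f"r = {r:.3f}, slope = {b:.2f}")
```

A correlation near 1 combined with a slope well below 1 is the signature of ratio compression rather than disagreement between platforms.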
Ncoa4 selectively regulates autophagic degradation of ferritin, the key cytosolic iron storage complex (36). In addition to the expected upregulation of Ncoa4 protein, we observed increased levels of proteins involved in iron metabolism, such as Steap4, a metalloreductase that reduces Fe³⁺ to Fe²⁺ (37), and transferrin receptor 2, which mediates cellular uptake of transferrin-bound iron, likely reflecting feedback mechanisms in response to Ncoa4 overexpression (Fig. 3F).
Semiautomated 96-Well Plate FFPE Proteome and Phosphoproteome Workflow
Having established a processing pipeline capable of deep-scale proteome profiling of FFPE BRC and CRC tissue, we next assessed the feasibility of phosphoproteome analysis and extended the workflow with automated components to enable efficient 96-well plate processing (Fig. 4A). Two enabling features of the Opentrons OT-2 robot are the ability to vary digestion buffer volumes for loading onto S-Trap plate wells and to normalize peptide input amounts postdigestion, ensuring precision and reproducibility throughout the workflow. By implementing StageTip desalting in a 96-well plate format (see Methods), along with plate-based phosphopeptide enrichment and LC-MS injections, the workflow was optimized for high-throughput analysis. The performance of this optimized pipeline was benchmarked using a cohort of LUAD FFPE blocks of resection tissue from 13 patients. The blocks were macrodissected to enrich for tumor-rich regions, and the tissue was digested for global proteome analysis. Twelve of the 13 samples yielded enough peptide to allow enrichment of phosphopeptides from 50 μg of peptide input using Fe(III)-NTA AssayMAP cartridges on the automated Agilent Bravo liquid handling robot (see Methods).
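The peptide-normalization step performed on the OT-2 reduces to computing, for each well, a sample volume and a diluent volume from a measured peptide concentration. A minimal, instrument-agnostic sketch, assuming a hypothetical CSV worksheet with `well` and `conc_ug_per_ul` columns; it only computes volumes and does not use the Opentrons Python Protocol API, whose protocol structure is not described here. The target amount and final volume are illustrative.

```python
import csv
import io


def normalization_volumes(csv_text, target_ug, final_ul):
    """Return {well: (sample_ul, diluent_ul)} delivering target_ug in final_ul per well.

    Wells too dilute to reach target_ug within final_ul are flagged as (None, None)
    so they can be handled manually.
    """
    plan = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        conc = float(row["conc_ug_per_ul"])
        sample_ul = target_ug / conc if conc > 0 else float("inf")
        if sample_ul > final_ul:
            plan[row["well"]] = (None, None)
        else:
            plan[row["well"]] = (round(sample_ul, 2), round(final_ul - sample_ul, 2))
    return plan


# Hypothetical worksheet: measured peptide concentrations per well.
worksheet = """well,conc_ug_per_ul
A1,1.0
A2,0.5
A3,0.02
"""
print(normalization_volumes(worksheet, target_ug=1.0, final_ul=10.0))
```

Keeping the per-plate values in a worksheet, rather than in the protocol code, is what allows volumes to change from plate to plate without editing the robot script.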
Fig. 4.
Application to clinical lung cancer FFPE samples. A, semiautomated plate-based workflow for processing of LUAD (lung adenocarcinoma) samples. Opentrons OT-2 automation steps used CSV worksheet inputs for easily adjusting plate-to-plate volumes (see Methods for details). B and C, bar plots showing unique protein and peptide depth from LUAD FFPE blocks processed with manual macrodissection (MMD) (n = 13), from 1000 ng of peptide input for LC-MS/MS analysis. Bar colors indicate the protein identification rate: cumulative (across all samples), total proteins detected in ≥70% of samples, median protein count per sample, and protein groups consistently quantified in all 13 samples. D, bar plot showing the number of unique class I phosphosites (localization probability ≥0.75) identified across the experiment. Each sample represents 50 μg of peptide input for phosphopeptide enrichment. Bar colors indicate the site identification rate: cumulative (across all samples), total sites detected in ≥70% of samples, median site count per sample, and phosphosites consistently quantified in all 12 samples. E, rank plot displaying the dynamic range of quantified proteins across all blocks; each point represents median expression, and error bars indicate the quantified range for the experiment. F, pathway diagram highlighting non–small cell lung cancer pathway-relevant proteins and phosphosites quantified by the workflow. Nodes detected by proteome alone or by both proteome and phosphoproteome are marked with yellow and pink-yellow boxes, respectively. G, NMF clustering of the proteome dataset, resulting in three clusters (C1–3). The heat map shows pathway terms from gene set enrichment analysis (GSEA) using the MSigDB hallmark gene sets; pathways with FDR < 0.05 are indicated with an asterisk. H, heat map showing the significant phosphosites driving NMF clustering, with a handful of sites highlighted.
FFPE, formalin-fixed, paraffin-embedded; LUAD, lung adenocarcinoma; NMF, non-negative matrix factorization.
Based on the robust performance of the Orbitrap Astral in prior experiments, this instrument was selected to analyze the LUAD FFPE samples (see Methods) using the 60 SPD (24 min) Astral DIA (4 m/z) method. This approach cumulatively quantified approximately 9700 unique proteins and 120,000 peptides, with a median of ∼9100 unique proteins and ∼98,000 peptides per sample (Fig. 4, B and C, Supplemental Table S4). Anticipating that the Astral's enhanced performance would similarly benefit phosphoproteome analysis, we applied the same Astral DIA method (4 m/z) to the phosphopeptide-enriched samples. In total, we cumulatively identified ∼28,000 unique class I phosphosites (localization probability ≥0.75) (Fig. 4D), with per-sample medians of ∼11,000 class I sites (Fig. 4D) and ∼30,000 class II sites (localization probability <0.75) (Supplemental Table S4). The experiment-wide median protein abundances spanned a dynamic range of ∼20 log2 units (Fig. 4E).
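The class I/class II split and the cumulative, per-sample, and ≥70%-of-samples counts reported above follow from simple filtering of site-level localization probabilities. A minimal sketch, assuming a hypothetical long-format list of (sample, site, localization probability) records and the thresholds stated in the text; the site and sample names are placeholders.

```python
from statistics import median

CLASS_I_MIN = 0.75  # class I: localization probability >= 0.75, as stated in the text


def class_i_by_sample(records):
    """Map each sample to its set of class I sites; samples with none map to an empty set."""
    by_sample = {}
    for sample, site, probability in records:
        sites = by_sample.setdefault(sample, set())
        if probability >= CLASS_I_MIN:
            sites.add(site)
    return by_sample


def summarize_sites(records):
    """Return (cumulative unique class I sites, median class I sites per sample)."""
    by_sample = class_i_by_sample(records)
    cumulative = set().union(*by_sample.values())
    return len(cumulative), median(len(s) for s in by_sample.values())


def sites_in_fraction(records, min_fraction=0.7):
    """Class I sites detected in at least min_fraction of samples."""
    by_sample = class_i_by_sample(records)
    counts = {}
    for sites in by_sample.values():
        for site in sites:
            counts[site] = counts.get(site, 0) + 1
    n_samples = len(by_sample)
    return {site for site, c in counts.items() if c / n_samples >= min_fraction}


# Hypothetical records: (sample, site, localization probability).
records = [
    ("S1", "site_a", 0.98), ("S1", "site_b", 0.60), ("S1", "site_c", 0.90),
    ("S2", "site_a", 0.91), ("S2", "site_b", 0.80),
    ("S3", "site_a", 0.70), ("S3", "site_c", 0.88),
]
print(summarize_sites(records))
print(sites_in_fraction(records, min_fraction=0.6))
```

A site can be class I in one sample and class II in another, which is why the cumulative count can exceed what any single sample contributes.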
To verify the biological relevance of the quantified proteins, we mapped them to the non–small cell lung cancer pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We captured 36 proteins implicated in the KEGG non–small cell lung cancer pathway (38), of which 21 also exhibited phosphorylation events (Fig. 4F). NMF clustering using the proteome and phosphoproteome, either combined or as individual -omes (25, 26), resulted in three clusters (Supplemental Fig. S5, A–C) with minor variation in cluster assignments across the different inputs (Supplemental Fig. S5D). The first proteomic cluster (C1) showed upregulation of gene sets related to immune signaling, epithelial-to-mesenchymal transition, and apical junction. The second cluster (C2) showed upregulation of metabolic pathways, including oxidative phosphorylation and fatty acid metabolism, and downregulation of MYC and E2F targets. The third cluster (C3) was characterized by downregulation of immune signaling pathways and upregulation of MYC targets and DNA repair (Fig. 4G), consistent with the phosphoproteomic cluster assignments (Supplemental Fig. S5E). Although the cohort is small, the gene set enrichment analysis results illustrate that standard data analysis approaches applied to FFPE proteomic data can yield biologically relevant molecular characterization of clinical samples. We also identified differential phosphorylation of several proteins with established roles in LUAD (Supplemental Fig. S5F). Phosphoproteomic NMF cluster C1 showed decreased phosphorylation of PRKAA1 at S496 and PTPN1 at S378, potentially reflecting shifts in cellular processes such as energy metabolism (39) and regulation of phosphatase activity (40) (Fig. 4H). In C2, phosphorylation of TRAPPC12 at S184 was relatively decreased; dephosphorylation of TRAPPC12 is known to be necessary for cell cycle progression (41).
Phosphorylation of BRAF at S729, a site known to act in negative feedback regulation, was lower in C3 than in the other clusters. Reduced phosphorylation at this inhibitory site may enhance BRAF-mediated MAPK/ERK signaling and cellular proliferation (42, 43). Additionally, increased phosphorylation was noted at SHC1 S139, a key site in receptor tyrosine kinase signaling reported to be upregulated in clear cell renal cell carcinoma and downregulated in colon cancer (38), highlighting its variable role in oncogenic processes. Also notable was increased phosphorylation of CTND1 Y257; CTND1 is a critical modulator of the Ras-MAPK pathway that influences cellular proliferation, survival, and migration (44).
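NMF-based clustering of the kind used above can be illustrated with the classic Lee–Seung multiplicative updates. A minimal, dependency-free sketch, not the implementation of refs. 25 and 26: it assumes a features × samples matrix, handles signed data (for example, log ratios) by stacking their positive and negative parts (a common workaround, stated here as an assumption), and assigns each sample to the component with the largest coefficient. Real analyses use optimized libraries and choose the number of clusters with stability metrics; the toy matrix is hypothetical.

```python
import random


def nonneg_split(matrix):
    """Stack max(x, 0) and max(-x, 0) rows so signed data become non-negative."""
    pos = [[max(v, 0.0) for v in row] for row in matrix]
    neg = [[max(-v, 0.0) for v in row] for row in matrix]
    return pos + neg


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf_clusters(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor V (features x samples) ~ W H; return the argmax component of each sample."""
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_features)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_features)]
    return [max(range(k), key=lambda c: h[c][j]) for j in range(n_samples)]


# Hypothetical features x samples matrix with two sample groups.
v = [
    [5.0, 5.5, 0.0, 0.0],
    [4.0, 4.4, 0.0, 0.0],
    [0.0, 0.0, 3.0, 2.7],
    [0.0, 0.0, 6.0, 5.4],
]
print(nmf_clusters(v, k=2))
```

Component labels are arbitrary (cluster 0 versus 1 depends on the random start); only the grouping of samples is meaningful.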
In summary, using this small sample set, we demonstrate the overall feasibility of identifying over 9000 unique proteins from FFPE LUAD tumors and mapping those proteins to relevant biological processes. The identification of ∼28,000 localized unique phosphosites, with a median of 11,000 phosphosites per sample, further demonstrates the feasibility of phosphopeptide analysis of FFPE tissue samples using our workflow.
Discussion
We have developed a pipeline comprising high-throughput FFPE tissue processing workflows and optimized data acquisition methods for proteomic and phosphoproteomic applications with increased data depth and completeness as compared to previous DIA-based FFPE analyses. By utilizing ultrasonication with adaptive focused acoustics, we eliminate the need for xylene-based deparaffinization. Tissue digestion using plate-based S-Trap methods with an additional chloroform/MeOH wash and SDB-RPS StageTip desalting allows for cleaner samples. Label-free DIA analysis shortens the time needed for sample preparation as it avoids chemical labeling, achieving a turnaround time of <1 week for proteome-level analysis of a 96-well plate of FFPE material, with an additional 3 days required for phosphoproteome analysis. The use of DIA in this platform achieves a proteome depth similar to that obtained with TMT, identifying up to ∼10,000 unique proteins from each sample, but with much faster sample processing and data generation. These same approaches may also be applicable to tissue microarray samples. While many individual components of the workflow have been previously described, the value of our work is in the integrated optimization and validation of an end-to-end workflow, with clear demonstration of applicability for proteome and phosphopeptide analysis from FFPE tissue, including the ability to derive biologically relevant information.
Using an automated workflow and the Orbitrap Astral, we identified a median of ∼11,000 localized (class I) phosphosites per sample from phosphopeptide-enriched, macrodissected LUAD FFPE tissue. Bioinformatic analysis of the proteome and phosphoproteome data generated on a cohort of LUAD samples from archived blocks extends the established utility of proteomics in cancer biology to FFPE sample characterization, with the potential to add critical insights into oncogenic mechanisms across multiple -omes. The application of proteogenomic data analysis techniques to this small cohort demonstrates the potential of high-throughput, plate-based FFPE proteomics and phosphoproteomics for deriving meaningful biology from larger clinical cohorts. Future studies aim to obtain a deeper understanding of the biological relevance of these and other posttranslationally modified peptides and their stability in FFPE samples compared with flash-frozen tissue.
Data Availability
Data is available on MassIVE (http://massive.ucsd.edu) with identifier MSV000096596.
Supplemental Data
This article contains supplemental data.
Conflict of Interest
S. A. C. is a member of the scientific advisory boards of Kymera, PTM BioLabs, MOBILion, and PrognomIQ. M. A. G. is on the scientific advisory board of PrognomIQ. The work described in this manuscript was conducted while S. S. was at the Broad Institute. S. S. is currently a full-time employee of AstraZeneca; AstraZeneca had no role in this study. The other authors declare no competing interests.
Acknowledgments
We also thank Dr Joseph Mancias and Matthew J. Dorman (Dana-Farber Cancer Institute, Harvard Medical School) for the Ncoa4 WT and overexpressing mouse FFPE blocks. We also thank Lia Abarzua, Robert Riegelhaupt, Eugenio Daviso, and Sameer Vasantgadkar from Covaris (Woburn, MA) for guidance with protocols and Lilian Heil from Thermo Fisher Scientific, San Jose, for assistance with data generation in the Thermo Fisher Scientific demo laboratory in San Jose, California. BioRender was used to make multiple figure panels in this article (agreement number: SR259LM2NJ). We also thank Natalie Clark and C. Williams for assistance with data analysis.
Author Contributions
M. H., M. A. G., S. S., and S. A. C. conceptualization; M. H., J. R. T., S. G., C. C., C. N., D. C. R., G. H., D. R. M., M. A. G., S. S., and S. A. C. writing–review and editing; M. H., J. R. T., M. A. G., S. S., and S. A. C. writing–original draft; M. H. visualization; M. H. validation; M. H., C. N., D. C. R., G. H., and S. S. resources; M. H., D. C. R., M. A. G., and S. S. project administration; M. H., J. R. T., M. A. G., S. S., and S. A. C. methodology; M. H., J. R. T., M. A. G., and S. S. investigation; M. H., J. R. T., and S. M. formal analysis; M. H., J. R. T., and C. C. data curation; S. G. and D. R. M. software; C. C., D. R. M., M. A. G., S. S., and S. A. C. supervision; S. S., M. A. G., D. R. M., and S. A. C. funding acquisition.
Funding and Additional Information
This work was supported by the following awards and grants: Broad Institute SPARC (#800444) awards to S. S. and S. A. C.; National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) grants U24CA270823 to S. S., M. A. G., and S. A. C., U01CA271402 to M. A. G. and S. A. C., and U24CA271075 to D. R. M.; and the Dr Miriam and Sheldon G. Adelson Medical Research Foundation to S. A. C.
Footnotes
Present address for [Shankha Satpathy]: AstraZeneca, Waltham, Massachusetts, USA
Contributor Information
Shankha Satpathy, Email: shankha@broadinstitute.org.
Steven A. Carr, Email: scarr@broad.mit.edu.
Supplementary Data
References
1. Faktor J., Kote S., Bienkowski M., Hupp T.R., Marek-Trzonkowska N. Novel FFPE proteomics method suggests prolactin induced protein as hormone induced cytoskeleton remodeling spatial biomarker. Commun. Biol. 2024;7:708. doi: 10.1038/s42003-024-06354-8.
2. Buczak K., Ori A., Kirkpatrick J.M., Holzer K., Dauch D., Roessler S., et al. Spatial tissue proteomics quantifies Inter- and intratumor heterogeneity in hepatocellular carcinoma (HCC). Mol. Cell. Proteomics. 2018;17:810–825. doi: 10.1074/mcp.RA117.000189.
3. Zhu Y., Weiss T., Zhang Q., Sun R., Wang B., Yi X., et al. High-throughput proteomic analysis of FFPE tissue samples facilitates tumor stratification. Mol. Oncol. 2019;13:2305–2328. doi: 10.1002/1878-0261.12570.
4. Coscia F., Doll S., Bech J.M., Schweizer L., Mund A., Lengyel E., et al. A streamlined mass spectrometry-based proteomics workflow for large-scale FFPE tissue analysis. J. Pathol. 2020;251:100–112. doi: 10.1002/path.5420.
5. Tüshaus J., Eckert S., Schliemann M., Zhou Y., Pfeiffer P., Halves C., et al. Towards routine proteome profiling of FFPE tissue: insights from a 1,220-case pan-cancer study. EMBO J. 2024;44:304–329. doi: 10.1038/s44318-024-00289-w.
6. Velasquez E., Szadai L., Zhou Q., Kim Y., Pla I., Sanchez A., et al. A biobanking turning-point in the use of formalin-fixed, paraffin tumor blocks to unveil kinase signaling in melanoma. Clin. Transl. Med. 2021;11:e466. doi: 10.1002/ctm2.466.
7. Friedrich C., Schallenberg S., Kirchner M., Ziehm M., Niquet S., Haji M., et al. Comprehensive micro-scaled proteome and phosphoproteome characterization of archived retrospective cancer repositories. Nat. Commun. 2021;12:1–15. doi: 10.1038/s41467-021-23855-w.
8. Eckert S., Chang Y.-C., Bayer F.P., Matthew T., Kuhn P.-H., Weichert W., et al. Evaluation of disposable trap column nanoLC–FAIMS–MS/MS for the proteomic analysis of FFPE tissue. J. Proteome Res. 2021;20:5402–5411. doi: 10.1021/acs.jproteome.1c00695.
9. Soni R.K. Protocol for deep proteomic profiling of formalin-fixed paraffin-embedded specimens using a spectral library-free approach. STAR Protoc. 2023;4. doi: 10.1016/j.xpro.2023.102381.
10. Mar D., Babenko I.M., Zhang R., Noble W.S., Denisenko O., Vaisar T., et al. A high-throughput PIXUL-Matrix-Based toolbox to profile frozen and formalin-fixed paraffin-embedded tissues multiomes. Lab. Invest. 2024;104. doi: 10.1016/j.labinv.2023.100282.
11. Pujari G.P., Mangalaparthi K.K., Madden B.J., Bhat F.A., Charlesworth M.C., French A.J., et al. A high-throughput workflow for FFPE tissue proteomics. J. Am. Soc. Mass Spectrom. 2023;34:1225–1229. doi: 10.1021/jasms.3c00099.
12. Marchione D.M., Ilieva I., Devins K., Sharpe D., Pappin D.J., Garcia B.A., et al. HYPERsol: high-quality data from archival FFPE tissue for clinical proteomics. J. Proteome Res. 2020;19:973–983. doi: 10.1021/acs.jproteome.9b00686.
13. Herrera J.A., Mallikarjun V., Rosini S., Montero M.A., Lawless C., Warwood S., et al. Laser capture microdissection coupled mass spectrometry (LCM-MS) for spatially resolved analysis of formalin-fixed and stained human lung tissues. Clin. Proteomics. 2020;17:24. doi: 10.1186/s12014-020-09287-6.
14. Wakabayashi M., Yoshihara H., Masuda T., Tsukahara M., Sugiyama N., Ishihama Y. Phosphoproteome analysis of formalin-fixed and paraffin-embedded tissue sections mounted on microscope slides. J. Proteome Res. 2014;13:915–924. doi: 10.1021/pr400960r.
15. Zeneyedpour L., Stingl C., Dekker L.J.M., Mustafa D.A.M., Kros J.M., Luider T.M. Phosphorylation ratio determination in fresh-frozen and formalin-fixed paraffin-embedded tissue with targeted mass spectrometry. J. Proteome Res. 2020;19:4179–4190. doi: 10.1021/acs.jproteome.0c00354.
16. Bruderer R., Bernhardt O.M., Gandhi T., Miladinović S.M., Cheng L.-Y., Messner S., et al. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell. Proteomics. 2015;14:1400–1410. doi: 10.1074/mcp.M114.044305.
17. Yu F., Teo G.C., Kong A.T., Fröhlich K., Li G.X., Demichev V., et al. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat. Commun. 2023;14:4154. doi: 10.1038/s41467-023-39869-5.
18. Demichev V., Messner C.B., Vernardis S.I., Lilley K.S., Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods. 2020;17:41–44. doi: 10.1038/s41592-019-0638-x.
- 19.Santana-Codina N., Del Rey M.Q., Kapner K.S., Zhang H., Gikandi A., Malcolm C., et al. NCOA4-Mediated ferritinophagy is a pancreatic cancer dependency via maintenance of iron bioavailability for iron-sulfur cluster proteins. Cancer Discov. 2022;12:2180–2197. doi: 10.1158/2159-8290.CD-22-0043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Skowronek P., Thielert M., Voytik E., Tanzer M.C., Hansen F.M., Willems S., et al. Rapid and In-Depth coverage of the (Phospho-)Proteome with deep libraries and optimal window design for diaPASEF. Mol. Cell. Proteomics. 2022;21 doi: 10.1016/j.mcpro.2022.100279. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Satpathy S., Jaehnig E.J., Krug K., Kim B.-J., Saltzman A.B., Chan D.W., et al. Microscaled proteogenomic methods for precision oncology. Nat. Commun. 2020;11:532. doi: 10.1038/s41467-020-14381-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Bekker-Jensen D.B., Bernhardt O.M., Hogrebe A., Martinez-Val A., Verbeke L., Gandhi T., et al. Rapid and site-specific deep phosphoproteome profiling by data-independent acquisition without the need for spectral libraries. Nat. Commun. 2020;11:787. doi: 10.1038/s41467-020-14609-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Chambers M.C., Maclean B., Burke R., Amodei D., Ruderman D.L., Neumann S., et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012;30:918–920. doi: 10.1038/nbt.2377. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Mani D.R., Maynard M., Kothadia R., Krug K., Christianson K.E., Heiman D., et al. PANOPLY: a cloud-based platform for automated and reproducible proteogenomic data analysis. Nat. Methods. 2021;18:580–582. doi: 10.1038/s41592-021-01176-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Satpathy S., Krug K., Jean Beltran P.M., Savage S.R., Petralia F., Kumar-Sinha C., et al.; Clinical Proteomic Tumor Analysis Consortium. A proteogenomic portrait of lung squamous cell carcinoma. Cell. 2021;184:4348–4371.e40. doi: 10.1016/j.cell.2021.07.016.
- 26. Gillette M.A., Satpathy S., Cao S., Dhanasekaran S.M., Vasaikar S.V., Krug K., et al.; Clinical Proteomic Tumor Analysis Consortium. Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma. Cell. 2020;182:200–225.e35. doi: 10.1016/j.cell.2020.06.013.
- 27. Krug K., Jaehnig E.J., Satpathy S., Blumenberg L., Karpova A., Anurag M., et al.; Clinical Proteomic Tumor Analysis Consortium. Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy. Cell. 2020;183:1436–1456.e31. doi: 10.1016/j.cell.2020.10.036.
- 28. Müller T., Kalxdorf M., Longuespée R., Kazdal D.N., Stenzinger A., Krijgsveld J. Automated sample preparation with SP3 for low-input clinical proteomics. Mol. Syst. Biol. 2020;16 doi: 10.15252/msb.20199111.
- 29. Humphries E.M., Loudon C., Craft G.E., Hains P.G., Robinson P.J. Quantitative comparison of deparaffinization, rehydration, and extraction methods for FFPE tissue proteomics and phosphoproteomics. Anal. Chem. 2024;96:13358–13370. doi: 10.1021/acs.analchem.3c04479.
- 30. Myers S.A., Rhoads A., Cocco A.R., Peckner R., Haber A.L., Schweitzer L.D., et al. Streamlined protocol for deep proteomic profiling of FAC-sorted cells and its application to freshly isolated murine immune cells. Mol. Cell. Proteomics. 2019;18:995–1009. doi: 10.1074/mcp.RA118.001259.
- 31. Kulak N.A., Pichler G., Paron I., Nagaraj N., Mann M. Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods. 2014;11:319–324. doi: 10.1038/nmeth.2834.
- 32. Zhang Y., Muller M., Xu B., Yoshida Y., Horlacher O., Nikitin F., et al. Unrestricted modification search reveals lysine methylation as major modification induced by tissue formalin fixation and paraffin embedding. Proteomics. 2015;15:2568–2579. doi: 10.1002/pmic.201400454.
- 33. Lai Z.W., Weisser J., Nilse L., Costa F., Keller E., Tholen M., et al. Formalin-fixed, paraffin-embedded tissues (FFPE) as a robust source for the profiling of native and protease-generated protein amino termini. Mol. Cell. Proteomics. 2016;15:2203–2213. doi: 10.1074/mcp.O115.056515.
- 34. Yu F., Teo G.C., Kong A.T., Haynes S.E., Avtonomov D.M., Geiszler D.J., et al. Identification of modified peptides using localization-aware open search. Nat. Commun. 2020;11:4065. doi: 10.1038/s41467-020-17921-y.
- 35. Geiszler D.J., Kong A.T., Avtonomov D.M., Yu F., Leprevost F. da V., Nesvizhskii A.I. PTM-Shepherd: analysis and summarization of post-translational and chemical modifications from open search results. Mol. Cell. Proteomics. 2021;20 doi: 10.1074/mcp.TIR120.002216.
- 36. Mancias J.D., Wang X., Gygi S.P., Harper J.W., Kimmelman A.C. Quantitative proteomics identifies NCOA4 as the cargo receptor mediating ferritinophagy. Nature. 2014;509:105–109. doi: 10.1038/nature13148.
- 37. Scarl R.T., Lawrence C.M., Gordon H.M., Nunemaker C.S. STEAP4: its emerging role in metabolism and homeostasis of cellular iron and copper. J. Endocrinol. 2017;234:R123–R134. doi: 10.1530/JOE-16-0594.
- 38. Kanehisa M., Sato Y. KEGG mapper for inferring cellular functions from protein sequences. Protein Sci. 2020;29:28–35. doi: 10.1002/pro.3711.
- 39. Zhang Y., Zhou X., Cheng L., Wang X., Zhang Q., Zhang Y., Sun S. PRKAA1 promotes proliferation and inhibits apoptosis of gastric cancer cells through activating JNK1 and Akt pathways. Oncol. Res. 2020;28:213–223. doi: 10.3727/096504019X15668125347026.
- 40. Davidson B., Bock A.J., Holth A., Nymoen D.A. The phosphatase PTPN1/PTP1B is a candidate marker of better chemotherapy response in metastatic high-grade serous carcinoma. Cytopathology. 2021;32:161–168. doi: 10.1111/cyt.12921.
- 41. Milev M.P., Hasaj B., Saint-Dic D., Snounou S., Zhao Q., Sacher M. TRAMM/TrappC12 plays a role in chromosome congression, kinetochore stability, and CENP-E recruitment. J. Cell. Biol. 2015;209:221–234. doi: 10.1083/jcb.201501090.
- 42. Shen C.-H., Yuan P., Perez-Lorenzo R., Zhang Y., Lee S.X., Ou Y., et al. Phosphorylation of BRAF by AMPK impairs BRAF-KSR1 association and cell proliferation. Mol. Cell. 2013;52:161. doi: 10.1016/j.molcel.2013.08.044.
- 43. Chen J., Gao G., Li L., Ding J., Chen X., Lei J., et al. Pan-cancer study of SHC-adaptor protein 1 (SHC1) as a diagnostic, prognostic and immunological biomarker in human cancer. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.817118.
- 44. Yadav L., Pietilä E., Öhman T., Liu X., Mahato A.K., Sidorova Y., et al. PTPRA phosphatase regulates GDNF-dependent RET signaling and inhibits the RET mutant MEN2A oncogenic potential. iScience. 2020;23 doi: 10.1016/j.isci.2020.100871.
Data Availability Statement
The data are available on MassIVE (http://massive.ucsd.edu) under identifier MSV000096596.




