Spectronaut and DIA-NN: A Comparison of their Performance in the Analysis of Lung Adenocarcinoma Biopsies

Zhaoyan Yu; Anqi Du; Xiaona Xu; Yinan Li; Xiaoxue Ma; Wenyang Zhang; Yunzeng Zhang; Ivan K Chu; K W Michael Siu

doi:10.1021/acsomega.5c10421

ACS Omega. 2026 Jan 26;11(5):8080–8093.



PMCID: PMC12903166 PMID: 41696233

Abstract

Direct data-independent acquisition (DIA) in liquid chromatography–tandem mass spectrometry (LC–MS/MS) has developed into a powerful methodology for proteomics. Herein, we directly compare the performance of two widely used DIA computational platforms, Spectronaut and DIA-NN, using paired tumor and peritumor tissue biopsies from human lung adenocarcinoma (LUAD) patients, samples that one would encounter in a clinical research lab in a hospital setting. The evaluations include (1) protein identification depth, (2) quantitative consistency, and (3) differential expression on the same set of LC–MS/MS runs, using nominally identical computational parameters and settings. Spectronaut and DIA-NN identified 7597 and 7787 proteins, respectively, in the LUAD samples, with 7180 proteins common to both platforms. Spectronaut reported 1250 upregulated and 266 downregulated proteins (tumor versus peritumor), while DIA-NN reported 1819 and 174 proteins, respectively. A total of 1130 differentially expressed proteins (DEPs) were common to both platforms. Of the top 50 DEPs, 34 were shared between Spectronaut and DIA-NN. Two of these DEPs, HSPA5 and CAV1, were also among the top 50 proteins showing the highest degrees of protein–protein interaction. The DEPs enabled classification of the LUAD patients into three subtypes; the clinical validity of these subtypes will need to be substantiated in the future with additional LUAD samples and experimentation.

Introduction

Data-independent acquisition (DIA) has evolved into a foundational method in liquid chromatography (LC)–tandem mass spectrometry (MS/MS) for deep proteomic analysis of complex samples. DIA began as a means to obtain “a single, permanent digital file representing the mass spectrometry (MS)-measurable proteome of the sample” in “sequential window acquisition of all theoretical fragment ion spectra (SWATH)-MS”, a description that accurately summarizes the salient features of this technological advance. The now more widely used, and arguably more generic, name “data-independent acquisition” highlights the contrast with the more conventional data-dependent acquisition (DDA), in which MS/MS time is prioritized for the abundant peptides eluting from the LC column, a strategy necessitated by the much slower scan speeds of older-generation mass spectrometers. DDA has worked well and has led to the documentation of many proteomes; however, its strength in targeting abundant peptides is also its weakness when the analytical goal shifts toward less-abundant proteins, many of which are more functionally significant in disease, and toward documenting the complete proteome. Because the DIA precursor isolation window is typically much wider than unit mass (e.g., 15 m/z units), the fragment ions in each MS2 spectrum are products of multiple precursor (peptide) ions, which makes identification much more complicated than in DDA. Initially, peptide identification required matching to product-ion spectral libraries pregenerated by the users under DDA conditions that otherwise mimicked those used for DIA. In recent years, however, advances in DIA software have lessened this reliance on user-generated spectral libraries, and direct DIA, which does not rely on pre-existing DDA-derived spectral libraries, is now viable.

Recent advances in DIA methodologies have significantly enhanced the reproducibility, depth, and throughput of proteomic analysis. Among the available computational platforms for DIA data processing, Spectronaut and DIA-NN have emerged as two of the most widely adopted tools due to their applicability in, and robust support for, both spectral library-based and library-free (direct DIA) workflows. Spectronaut is a commercial product available with many documented and supported tools; DIA-NN is a research product with limited support, but is widely used. In particular, the capacity to perform direct DIA, which enables identification and quantification of peptides without the need for pre-existing spectral libraries, has gained considerable traction in clinical and translational research where sample availability or spectral library completeness may be limited. These two computational platforms were judged to be the best-performing DIA data analysis software in a comprehensive study that examined four computational platforms. That evaluation was performed using yeast samples spiked with known quantities of mouse proteins. Other DIA platforms are also available.

Herein we aim to directly compare the performance of Spectronaut and DIA-NN in the context of library-free DIA proteomics using paired tumor and peritumor tissue samples from human lung adenocarcinoma (LUAD) patients. These are the types of samples that one would come across in a clinical research lab in a hospital setting, and their examination by direct DIA has not been previously addressed. Our primary objectives are to evaluate (1) protein identification depth, (2) quantitative consistency, and (3) differential expression profiling between disease and nondisease tissues across the two platforms. Previous benchmarking studies have shown that both Spectronaut and DIA-NN offer high proteome coverage and precise quantification. A particularly important goal of ours is to examine the overlap and divergence of differentially expressed proteins (DEPs) determined by the two computational platforms on the same set of LC–MS/MS runs, using nominally identical computational parameters and settings. Identification of DEPs is a crucial first step in biomarker discovery, enabling tumor subtyping as well as personalized medicine and treatment.

Lung cancer is one of the most frequently diagnosed cancers and the leading cause of cancer-related deaths worldwide. Traditionally, lung cancer is classified as non-small-cell lung cancer (NSCLC), which represents ∼85% of total diagnoses, or small-cell lung cancer (SCLC), which represents ∼15%. The major classes of NSCLC are lung adenocarcinoma (LUAD) and squamous cell carcinoma (SCC). Notably, lung cancer incidence is changing: the incidence of SCC, which used to be the most common class, has decreased significantly, partly as a consequence of reduced smoking. LUAD is also strongly associated with smoking, but it is the most frequent type among patients with no history of tobacco smoking. LUAD incidence also correlates with patient gender and ethnicity, being more common in Asian women.

Methods

Patient Cohort and Sampling

We examined 24 pairs of LUAD tumor samples and their corresponding peritumor samples from patients treated at the Shandong Public Health Clinical Center Affiliated to Shandong University. All samples originated from lung tissues resected during surgical care. This study was approved by the hospital’s ethics board (certificate number: GWLCZXEC-SOP-K-2025-124); all patients provided informed consent. Relevant patient demographic and clinical information is available in Supporting Information, Table S1. Tumor and peritumor tissues were sampled from each patient’s resected tissue for proteomic analysis as soon as possible after surgery. The tissue pieces were examined and dissected to remove nontumor tissue from designated tumor samples and vice versa. The samples were then snap-frozen and stored at −80 °C until analysis.

Tissue Homogenization, Treatment and Digestion

Approximately 25 mg of LUAD tissue was mixed with 100 μL of lysis buffer (Reagent 0, OSFP0001, OmicSolution, Shanghai, China); the buffer, as specified by the manufacturer, contains 50 mM Tris-HCl (pH 7.4), 150 mM NaCl, 1% (v/v) NP-40, 0.5% (w/v) sodium deoxycholate, and 0.1% (w/v) sodium dodecyl sulfate, and was supplemented with 1× protease inhibitor cocktail (P8340, Sigma-Aldrich, U.S.A.). The resulting tissue-to-buffer ratio was approximately 1:4 (w/v). Homogenization was performed using a Cryo High-Throughput Tissue Grinder (Model: JXFSTPRP-CLN, Shanghai Jingxin Industrial Development Co., Ltd.) with oscillating grinding for 1 min per cycle, repeated for a total of four cycles at 4 °C. The homogenate was then heated at 95 °C for 5 min to facilitate protein denaturation. After cooling to room temperature, the sample was centrifuged at 12,000 × g for 10 min at 4 °C using a High-Speed Refrigerated Microcentrifuge (Model: D1524R, Dalong Xingchuang Laboratory Instrument (Beijing) Co., Ltd.). The supernatant was collected, and the total protein concentration was determined using the Pierce Rapid Gold BCA Protein Assay Kit (A53225, ThermoFisher Scientific, U.S.A.) in accordance with the manufacturer’s instructions.

For each proteomic analysis, 30 μg of total protein per sample was used. The protein solution was adjusted to a final volume of 50 μL with 50 mM ammonium bicarbonate. Protein reduction was carried out by adding 2 μL of 0.1 M dithiothreitol (DTT, D8220-5g, Solarbio Life Sciences, Shanghai, China) (final concentration ∼3.8 mM) and incubating at 37 °C for 1 h. Alkylation was then performed in the dark by adding 2 μL of 0.2 M iodoacetamide (IAA, I1149-5g, Sigma-Aldrich, U.S.A.) (final concentration ∼7.4 mM) and incubating at room temperature for 1 h. Trypsin digestion was initiated by adding 2 μg of sequencing-grade modified trypsin (Promega V5111, U.S.A.), giving an enzyme-to-substrate ratio of 1:15 (w/w). The digestion mixture was incubated overnight at 37 °C. The reaction was terminated by adding 25 μL of 10% (v/v) formic acid (FA) to a final concentration of ∼3.2% FA, which lowered the pH and thereby inactivated trypsin. The mixture was vortexed thoroughly and then centrifuged at 12,000 × g for 5 min to pellet any insoluble material. The peptide-containing supernatant was desalted using Easy Pept C18 desalting columns (OSFP0050W, Shanghai Easy Biotechnology Co. Ltd.) as follows: prior to use, each column was activated with 100 μL of methanol followed by 100 μL of 0.1% (v/v) FA. The sample was then loaded onto the column and allowed to pass through by gravity. The column was washed twice with 100 μL of 0.1% (v/v) FA to remove salts and contaminants. Finally, the peptides were eluted with 50 μL of 90% (v/v) acetonitrile (ACN) in 0.1% (v/v) FA. The eluate was collected and lyophilized using a vacuum concentrator (SCIENTZ-10N/LS, Ningbo Scientz Biotechnology Co., Ltd., Ningbo, Zhejiang). The dried peptides were reconstituted in 30 μL of 0.1% (v/v) FA for subsequent LC–MS/MS analysis.
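The reagent concentrations quoted above follow from simple dilution arithmetic (final concentration = stock concentration × added volume / total volume). The sketch below reproduces them, assuming that the volume of the trypsin solution is negligible (it is not stated) and that each addition simply increases the total volume.

```python
# Dilution arithmetic for the digestion protocol (sketch).
# Assumption: the trypsin solution volume is negligible and is ignored.

def diluted(stock: float, v_added: float, v_total: float) -> float:
    """Concentration after adding v_added of a stock into a final volume v_total."""
    return stock * v_added / v_total

volume = 50.0                                # μL protein solution in 50 mM NH4HCO3
volume += 2.0                                # + 2 μL of 0.1 M DTT
dtt_mM = diluted(100.0, 2.0, volume)         # 0.1 M = 100 mM stock
volume += 2.0                                # + 2 μL of 0.2 M IAA
iaa_mM = diluted(200.0, 2.0, volume)         # 0.2 M = 200 mM stock
volume += 25.0                               # + 25 μL of 10% (v/v) formic acid
fa_percent = diluted(10.0, 25.0, volume)

enzyme_to_substrate = 2.0 / 30.0             # 2 μg trypsin : 30 μg protein
tissue_to_buffer = 25.0 / 100.0              # 25 mg tissue : 100 μL buffer (w/v)

print(f"DTT {dtt_mM:.1f} mM, IAA {iaa_mM:.1f} mM, FA {fa_percent:.1f}%")
print(f"trypsin 1:{1 / enzyme_to_substrate:.0f}, tissue:buffer 1:{1 / tissue_to_buffer:.0f}")
```

Running it prints `DTT 3.8 mM, IAA 7.4 mM, FA 3.2%` and `trypsin 1:15, tissue:buffer 1:4`, consistent with the values given in the text.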

LC–MS/MS

LC–MS/MS was performed on an Orbitrap Exploris 480 mass spectrometer (ThermoFisher Scientific, Bremen, Germany) coupled to a Vanquish Neo LC system with binary pumps, an autosampler, and a column compartment. All samples in the autosampler were kept at 4 °C. The LC column was an Acclaim PepMap RSLC column (75 μm × 25 cm nanoViper C18, 2 μm particles, 100 Å pores) maintained at 55 °C. Solvent A was 0.1% FA in water; solvent B was 0.1% FA in 80/20 ACN/water. The 1-h gradient comprised the following steps: 0 min, 2% B; 2 min, 8% B; 46 min, 22% B; 58 min, 30% B; and 60 min, 55% B.
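The gradient above is specified as a list of breakpoints. As a minimal sketch, assuming linear ramps between breakpoints (the usual behavior of LC gradient programs) and a hold at 55% B after 60 min (an assumption, since no wash or re-equilibration step is given), %B at any time can be interpolated as follows; because solvent B is 80% ACN, %B can also be converted to the actual ACN content.

```python
# Breakpoints of the 1-h gradient: (time in min, %B)
GRADIENT = [(0.0, 2.0), (2.0, 8.0), (46.0, 22.0), (58.0, 30.0), (60.0, 55.0)]

def percent_b(t: float, program=GRADIENT) -> float:
    """%B at time t (min), assuming linear ramps between breakpoints."""
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]  # assumption: hold the last composition after 60 min

def percent_acn(b: float, acn_in_b: float = 80.0) -> float:
    """Convert %B to % acetonitrile; solvent B is 80/20 ACN/water."""
    return b * acn_in_b / 100.0

# Midway through the main 2-46 min ramp the column sees 15% B, i.e. 12% ACN.
print(percent_b(24.0), percent_acn(percent_b(24.0)))
```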

The Orbitrap Exploris 480 was operated in DIA mode with a Nanospray Flex Ionization (NSI) source under the following conditions: spray voltage, 2.6 kV (positive ion mode); capillary temperature, 320 °C; S-lens RF level, 70%. For MS1 scanning: Full scans were acquired over an m/z range of 400–1200 at a resolving power of 120,000, with an automatic gain control (AGC) target of 1 × 10⁶, and a maximum injection time of 22 ms. For DIA MS2 acquisition: Precursor ions were isolated using fixed 15 m/z isolation windows with a 1 m/z overlap (resulting in 53 consecutive windows spanning m/z 400–1200). Each window was fragmented via higher-energy collisional dissociation (HCD) with a fixed normalized collision energy (NCE) of 32%. MS2 scans were acquired at a resolving power of 30,000, with an AGC target of 5 × 10⁴, and a maximum injection time of 50 ms. The total cycle time for MS1 + MS2 scans was approximately 2 s.
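How many fixed-width windows are needed to tile a precursor range depends on the width, the overlap, and how consecutive windows are placed. The sketch below assumes that each window starts (width − overlap) m/z after the previous one and that the final window may run past the upper bound; the function name and this placement convention are illustrative assumptions, not the algorithm of the instrument's method editor.

```python
def dia_windows(start: float, end: float, width: float, overlap: float):
    """Tile [start, end] m/z with fixed-width isolation windows.

    Assumed placement: consecutive windows start (width - overlap) m/z apart,
    and the last window may extend past `end` so the whole range is covered.
    """
    if not 0 <= overlap < width:
        raise ValueError("overlap must satisfy 0 <= overlap < width")
    step = width - overlap
    windows = []
    lo = start
    while True:
        hi = lo + width
        windows.append((lo, hi))
        if hi >= end:          # range fully covered
            return windows
        lo += step

windows = dia_windows(400, 1200, width=15, overlap=1)
print(len(windows), windows[:2], windows[-1])
```

Under this convention, 15 m/z windows with a 1 m/z overlap need 58 windows to reach 1200 m/z, starting with (400, 415) and (414, 429); software that defines width or overlap differently will give a different count.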

Computational Platforms and Parameters

Spectronaut is commercially available from Biognosys AG (Schlieren, Switzerland). Analyses were first performed using Version 19.9 and then repeated using Version 20.1 when the later version became available. The two versions gave similar results, but there were also some notable differences; we present the Version 20.1 results in this article. DIA-NN was downloaded from https://github.com/vdemichev/DiaNN; we initially used Version 1.9.2 but then switched to Version 2.1.0 for the bulk of the work. Again, the two DIA-NN versions gave similar results, but there were also notable differences; all DIA-NN data presented herein were generated with the later Version 2.1.0.

The raw DIA data were used directly for both Spectronaut and DIA-NN analyses. For peak extraction and quantification, Spectronaut employed a targeted peak-picking strategy (DirectDIA workflow), in which peptide signals were extracted based on predicted retention times and m/z values, followed by peak integration with interference correction to ensure quantitative accuracy. Protein quantities were computed with MaxLFQ, an algorithm optimized for proteome-wide label-free quantification through delayed normalization and maximal peptide ratio extraction. DIA-NN used a deep-learning-assisted peak extraction approach, combining spectral library information and in silico predicted fragment ion patterns to deconvolve DIA spectra; peak quantification was based on summed fragment ion intensities after background subtraction. Quantification used QuantUMS, a model-based framework designed for uncertainty minimization, which refines peptide-to-protein rollup and reduces quantification variability. A human protein FASTA sequence database obtained from Swiss-Prot (20,429 sequence entries, released in December 2023) was used. The following processing parameters were employed: a maximum of two missed cleavages was permitted; cysteine carbamidomethylation was set as a fixed modification, while N-terminal acetylation and methionine oxidation were set as variable modifications. For false discovery rate (FDR) control, Spectronaut implemented a three-level target-decoy strategy (at the PSM, peptide, and protein levels) to enforce a 1% FDR at each level. DIA-NN controlled the FDR at 1% via the target-decoy approach, with validation at the peptide (precursor) and protein levels, using its integrated decoy generation and scoring.
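The core idea of MaxLFQ, deriving a protein's abundance profile from median peptide log-ratios between every pair of samples and then solving for per-sample values by least squares, can be sketched in a few lines. This is a simplified illustration, not Spectronaut's implementation: it accepts a ratio from a single shared peptide (MaxQuant's implementation uses a minimum ratio count, by default two), and it fixes the overall scale by anchoring the profile's mean to the mean per-sample peptide level, whereas MaxLFQ rescales to the summed intensity.

```python
import numpy as np

def maxlfq_sketch(log2_int: np.ndarray) -> np.ndarray:
    """Simplified MaxLFQ-style protein profile.

    log2_int: peptides x samples matrix of log2 intensities (NaN = missing).
    Returns one log2 abundance per sample. Assumes every sample shares at
    least one peptide with the others; otherwise the profile is underdetermined.
    """
    n = log2_int.shape[1]
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            shared = ~np.isnan(log2_int[:, i]) & ~np.isnan(log2_int[:, j])
            if not shared.any():
                continue  # no common peptides, so no ratio for this pair
            row = np.zeros(n)
            row[i], row[j] = 1.0, -1.0
            rows.append(row)
            # median peptide log-ratio between samples i and j
            rhs.append(np.median(log2_int[shared, i] - log2_int[shared, j]))
    # Ratios fix the profile only up to a constant; as a simplification, make the
    # profile's mean equal the mean of the per-sample mean peptide log2 intensities.
    rows.append(np.ones(n))
    rhs.append(np.sum(np.nanmean(log2_int, axis=0)))
    solution, *_ = np.linalg.lstsq(np.vstack(rows), np.asarray(rhs), rcond=None)
    return solution

peptides = np.array([[10.0, 11.0, np.nan],
                     [8.0, 9.0, 10.0]])
print(maxlfq_sketch(peptides))
```

Because the missing value in the first peptide does not block the pairwise ratios, the third sample still receives an estimate from the shared second peptide, which is the property that makes this approach tolerant of sparse peptide coverage.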
Both platforms employed match-between-runs (MBR) to reduce missing values: Spectronaut’s MBR aligned peaks across samples based on retention times and fragment ion similarity, transferring identifications from well-measured samples to those with low signal; DIA-NN’s MBR used predicted retention times and spectral similarity to propagate identifications and quantifications across runs, with a built-in confidence filter to minimize false transfers.
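Both platforms ultimately rely on target-decoy estimation for their 1% FDR thresholds. The generic sketch below shows how q-values follow from a ranked list of target and decoy scores, with the FDR at each threshold estimated as the ratio of decoys to targets scoring at least that well. It is not either platform's scoring model (which produces the scores in the first place), it breaks tied scores arbitrarily, and the function name is hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy):
    """q-values from target-decoy competition (generic sketch).

    Higher score = better. For each hit, FDR is estimated as
    (#decoys scoring at least as well) / (#targets scoring at least as well);
    the q-value is the smallest such FDR over all thresholds that still accept the hit.
    Ties are broken arbitrarily by sort order.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    fdr = [0.0] * len(scores)
    n_target = n_decoy = 0
    for k in order:
        if is_decoy[k]:
            n_decoy += 1
        else:
            n_target += 1
        fdr[k] = n_decoy / max(n_target, 1)
    # Running minimum from the worst score upward makes q-values monotone.
    q = [0.0] * len(scores)
    best = float("inf")
    for k in reversed(order):
        best = min(best, fdr[k])
        q[k] = best
    return q

scores = [10.0, 9.0, 8.0, 7.0, 6.0]
decoy = [False, False, True, False, True]
q = target_decoy_qvalues(scores, decoy)
accepted = [k for k in range(len(scores)) if not decoy[k] and q[k] <= 0.01]
print(accepted)  # targets passing a 1% q-value threshold
```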

The 24 pairs of LUAD samples were analyzed qualitatively and quantitatively using both software platforms. After data collation and identification of commonly detected proteins, differential expression analysis was conducted. From the protein expression matrices generated by Spectronaut and DIA-NN, proteins identified by both software tools were selected and subjected to median normalization. Following log2 transformation, Pearson correlation coefficients were calculated between and within the tumor and peritumor groups. After median correction of the intersecting proteins, two sets of proteins were filtered: those without missing values in at least 12 pairs, and in at least 17 pairs, of tumor and peritumor tissues. Missing protein abundance values (due to uncontrollable changes in instrumental or experimental conditions, or to noise) were imputed using the following strategies: imputation with 100%, 70%, 50%, or 20% of the minimum value of each protein, as well as inclusion of only patient samples without missing data. Missing value imputation was also performed using the MinProb algorithm implemented in the R package imputeLCMD. The algorithm was configured with a detection limit quantile (q = 0.05) and a noise parameter (tune.sigma = 0.25), and imputed values for missing entries were drawn from a normal distribution model. A fixed random seed (123) was set to ensure reproducibility. Only missing values in the numerical matrix were imputed; the original quantitative values remained unmodified. All imputed values were constrained to be positive numbers greater than 1/10 of the smallest positive value in the original data set, thus avoiding interference from extremely small values in subsequent statistical analyses.
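The normalization, correlation, filtering, and minimum-fraction imputation steps above can be sketched with NumPy. Several details are assumptions rather than the authors' documented code: each sample is rescaled so that its median matches the median of all sample medians, the minimum-fraction imputation is applied to linear (unlogged) intensities, and the tumor and peritumor matrices hold the same patients in the same column order. All function names are hypothetical.

```python
import numpy as np

def median_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each sample (column) so its median equals the median of all sample medians.
    NaN marks a missing value and is ignored."""
    medians = np.nanmedian(x, axis=0)
    return x * (np.nanmedian(medians) / medians)

def sample_correlations(log2_x: np.ndarray) -> np.ndarray:
    """Pearson r between samples (columns), using proteins quantified in every sample."""
    complete = ~np.isnan(log2_x).any(axis=1)
    return np.corrcoef(log2_x[complete], rowvar=False)

def quantified_in_pairs(tumor: np.ndarray, peri: np.ndarray, min_pairs: int) -> np.ndarray:
    """Mask of proteins (rows) quantified in both tissues of at least min_pairs patients.
    Column k of `tumor` and `peri` must belong to the same patient."""
    both = ~np.isnan(tumor) & ~np.isnan(peri)
    return both.sum(axis=1) >= min_pairs

def impute_fraction_of_min(x: np.ndarray, fraction: float) -> np.ndarray:
    """Replace each protein's NaNs with `fraction` times its smallest observed value."""
    row_min = np.nanmin(x, axis=1, keepdims=True)
    return np.where(np.isnan(x), fraction * row_min, x)

# Toy example: 3 proteins x 2 patients per tissue (linear intensities)
tumor = np.array([[1.0e6, 2.0e6], [3.0e5, np.nan], [5.0e4, 6.0e4]])
peri = np.array([[9.0e5, 1.8e6], [2.0e5, 2.5e5], [np.nan, 5.0e4]])
keep = quantified_in_pairs(tumor, peri, min_pairs=2)   # only protein 0 passes here
tumor_imp = impute_fraction_of_min(median_normalize(tumor), 0.5)
```

In the study the same filter was applied with thresholds of 12 and 17 pairs out of 24; the toy arrays only show the mechanics.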

Differential analysis of all imputed data was performed using Student’s paired t test, comparing each patient’s tumor tissue sample with the corresponding peritumor tissue sample to control for interindividual differences in genetic/proteomic background and physiological status. Multiple testing correction was implemented via the Benjamini–Hochberg method to control the false discovery rate (FDR), with an adjusted P-value ≤0.05 set as the threshold of statistical significance. Differentially expressed proteins (DEPs) between tumor and peritumor tissues were defined based on quantitative protein expression intensity values (rather than protein detection counts), with screening criteria of |log₂ fold change (FC)| ≥ 0.58 (i.e., FC ≥ 1.5 for upregulated or FC ≤ 1/1.5 for downregulated proteins) and adjusted P-value ≤0.05 (i.e., −log₁₀(adjusted P-value) ≥ 1.3). Protein expression intensity was derived from peptide ion intensities quantified by Spectronaut or DIA-NN; raw intensity values were normalized and aggregated to the protein level following the standard workflow of each software. To maximize the number of LUAD samples that could be analyzed within the available MS time, we prioritized assaying more samples over performing multiple technical replicates. This experimental design is supported by the high Pearson coefficients (r > 0.9) observed among biological replicates (see the Results section), which indicate good reproducibility and stability of our proteomic quantification data.
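A minimal sketch of the paired test, Benjamini–Hochberg adjustment, and fold-change/FDR screen described above, using `scipy.stats.ttest_rel`. It assumes imputed, log2-transformed matrices with proteins as rows and patients as matching columns, and it defines log₂FC as the mean of the per-patient log2 differences, which is one common choice; the text does not specify how the fold change was computed.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(p) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # cumulative minimum from the largest p-value down keeps adjusted values monotone
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def call_deps(log2_tumor, log2_peri, min_log2fc=0.58, alpha=0.05):
    """Paired t test per protein (rows), BH adjustment, and FC/FDR screening.

    Columns are patients in the same order in both matrices; no NaN expected.
    Assumption: log2FC = mean of per-patient (tumor - peritumor) log2 differences.
    """
    _, p = stats.ttest_rel(log2_tumor, log2_peri, axis=1)
    log2fc = np.mean(log2_tumor - log2_peri, axis=1)
    p_adj = benjamini_hochberg(p)
    up = (log2fc >= min_log2fc) & (p_adj <= alpha)
    down = (log2fc <= -min_log2fc) & (p_adj <= alpha)
    return log2fc, p_adj, up, down

peri = np.array([[10.0, 11.0, 12.0, 13.0],
                 [10.0, 11.0, 12.0, 13.0]])
tumor = np.array([[12.1, 12.9, 14.2, 14.8],    # consistently ~4-fold higher
                  [10.1, 10.9, 12.05, 12.95]])  # no consistent change
log2fc, p_adj, up, down = call_deps(tumor, peri)
print(up, down)
```

The paired design means each protein's test uses within-patient differences, so a protein that is uniformly higher in one patient's tissues does not by itself produce a difference.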

Bioinformatics and Subtyping Analyses

To gain a deeper understanding of the interrelationships among differentially expressed proteins, we first identified the DEPs shared between the Spectronaut and DIA-NN protein expression matrices. On this basis, the top 50 shared DEPs derived from the Spectronaut matrix and the top 50 shared DEPs derived from the DIA-NN matrix were separately imported into the STRING database (https://cn.string-db.org/). The species was set as “Homo sapiens” and the interaction threshold was set to “medium confidence >0.4”. The network display option “hide disconnected nodes in the network” was enabled, while all other parameters were kept at their default settings. From this, protein–protein interaction (PPI) networks were constructed. The PPI network diagrams generated by STRING were then imported into Cytoscape 3.9.1 (https://cytoscape.org), and the degree of each node (its number of interaction partners) was displayed using the “Analyze Network” function in the toolbar. Based on the degree rankings, we identified the most highly connected proteins in each network.
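For an undirected network without duplicate edges or self-loops, the degree reported by Cytoscape's “Analyze Network” is simply the number of distinct interaction partners of each node. As an illustrative sketch, assuming the STRING network is available as a list of (protein A, protein B) pairs, degrees can be counted and ranked as follows; duplicate and self-interactions are dropped, and the edge list shown is hypothetical.

```python
from collections import Counter

def node_degrees(edges):
    """Degree (number of distinct partners) per node of an undirected network.

    edges: iterable of (protein_a, protein_b) pairs; duplicate pairs in either
    orientation and self-interactions are dropped before counting.
    Returns (node, degree) tuples sorted from highest to lowest degree.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return degree.most_common()

# Hypothetical edge list (placeholder protein names, not study data)
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "D"), ("C", "C")]
print(node_degrees(edges))
```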

Additional bioinformatic analyses were performed on the results obtained from the two DIA platforms for comparison. These included Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses as curated by DAVID (https://davidbioinformatics.nih.gov/home.jsp). The background genes were set as the well-defined “Gene List” in the study to avoid false positives in the enrichment results caused by mismatched background gene sets. For the pathway entries returned by DAVID, the FDR was adopted as the statistical screening threshold to correct for multiple testing, and FDR < 0.05 was used as the criterion for identifying significantly enriched pathways, limiting the proportion of false-positive pathway calls.
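Over-representation tests of the kind DAVID reports are based on a one-sided Fisher exact (hypergeometric) test. The pure-Python sketch below computes the probability of observing at least k annotated genes in a list, given the list, term, and background sizes; the optional `ease=True` flag subtracts one from k, a more conservative variant in the spirit of DAVID's EASE score. The resulting p-values across terms would still need FDR adjustment (e.g., Benjamini–Hochberg). This is not DAVID's code, and the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= k): chance of at least k term-annotated genes when list_size genes
    are drawn without replacement from a background containing term_size annotated genes."""
    total = comb(background, list_size)
    hits = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(k, min(list_size, term_size) + 1))
    return hits / total

def enrichment_p(k: int, list_size: int, term_size: int, background: int,
                 ease: bool = False) -> float:
    """One-sided over-representation p-value; ease=True removes one hit first,
    a more conservative variant in the spirit of DAVID's EASE score."""
    if ease:
        k = max(k - 1, 0)
    return hypergeom_upper_tail(k, list_size, term_size, background)

# Toy numbers: 3 of 4 term genes appear in a 20-gene list drawn from 100 genes
print(enrichment_p(3, 20, 4, 100), enrichment_p(3, 20, 4, 100, ease=True))
```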

Three clustering tools were used for subtype analysis of the 24 tumor samples: ConsensusClusterPlus (CCP), non-negative matrix factorization (NMF), and hierarchical clustering. The R packages ConsensusClusterPlus and NMF, as well as a Python script for hierarchical clustering, were used to classify the 24 LUAD tumor tissues into subtypes based on three sets of proteins: (1) the 1130 DEPs common to both Spectronaut and DIA-NN; (2) the 1516 DEPs from Spectronaut and the 1993 DEPs from DIA-NN; and (3) all proteins included in the differential analysis from both software tools. After subtype classification of the 24 tumor tissues, KEGG and GO analyses were performed on all detected proteins in each subtype.
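ConsensusClusterPlus repeatedly clusters random subsets of samples and records how often each pair of samples lands in the same cluster; the resulting consensus matrix, and the cumulative distribution function (CDF) of its entries, are what the CDF-based choice of the number of clusters examines. The Python sketch below illustrates that idea under stated assumptions (resampling 80% of samples, average-linkage hierarchical clustering on correlation distance, seed 123); it is not the R package's implementation, and its parameters are not those used in this study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def consensus_matrix(data, k, n_resamples=100, frac=0.8, seed=123):
    """Consensus matrix in the spirit of ConsensusClusterPlus (simplified sketch).

    data: samples x features. Each resample clusters a random subset (fraction
    `frac`) of the samples with average linkage on correlation distance, cut into
    k clusters. Entry (i, j) is the fraction of resamples containing both i and j
    in which they were assigned to the same cluster.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(round(frac * n)), replace=False)
        tree = linkage(data[idx], method="average", metric="correlation")
        labels = fcluster(tree, t=k, criterion="maxclust")
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

def consensus_cdf(cmat, grid=None):
    """Empirical CDF of the off-diagonal consensus values on a grid of thresholds."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    values = cmat[np.triu_indices_from(cmat, k=1)]
    return np.array([(values <= g).mean() for g in grid])
```

For stable clusters the consensus values pile up near 0 and 1, so the CDF is flat in between; comparing such CDFs across candidate k values is the basis for choosing the number of subtypes.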

Results

Figure shows a summary of the number of proteins identified using Spectronaut and DIA-NN at the various stages and categories of LUAD analyses. Overall, the numbers of proteins identified and categorized are comparable, especially at the top and at the beginning of the workflow. Spectronaut identified 7597 proteins in the 24 pairs of LUAD biopsies, while DIA-NN identified 7787 proteins; 7180 proteins were common to both platforms. We expect that proteins shared between Spectronaut and DIA-NN have a higher probability of being correct identifications. The coefficients of variation (CVs) in our laboratory for triplicate analyses, covering a variety of biological matrices and using the two computational platforms, are as follows for log2-transformed protein abundances: median CV, 1.10%; 90th percentile CV, 3.65%; these translate to natural (unlogged) abundances: median CV, 2.14%; 90th percentile CV, 7.52%. Among the 24 pairs of LUAD tumor and peritumor samples, intragroup (tumor or peritumor) Pearson correlations were consistently high, >0.9166 for Spectronaut and >0.9087 for DIA-NN, and intergroup (tumor versus peritumor) correlations were slightly lower but still high, 0.9073 ± 0.0495 for Spectronaut and 0.8990 ± 0.0544 for DIA-NN (Figure S1). The robust intragroup correlations (≥0.9087) support the effectiveness of our normalization procedure in handling sample-to-sample changes in instrumental conditions and technical noise, and hence in ensuring reliable protein quantification. The high intergroup correlations indicate substantial consistency in protein quantification, providing a sound basis for the search for DEPs. Of the 7180 proteins identified by both computational platforms, Spectronaut measured 2541 proteins in all 24 pairs of LUAD samples (those “without missing values”) (see the left half of Figure ). The remainder all contained missing quantitative values.
Missing values in proteomics data can compromise data integrity, interfere with quantification accuracy, and impair the reliability of downstream analyses, including differential expression analysis and clustering. The processing workflow requires first filtering out proteins/samples with high missing-value rates and then imputing the remaining missing values. We selected proteins quantified in at least 12 pairs of samples and those quantified in at least 17 pairs of samples, and then performed missing value imputation on the data sets with missing values. A number of imputation strategies have been proposed; for our work, we evaluated the following. Missing values were imputed with the minimum quantity of the given protein among all 24 samples, the 70th percentile value, the 50th percentile value (median), and the 20th percentile value; alternatively, only patient samples without missing data were included. To avoid overfiltering, we retained proteins quantified in at least 12 pairs of samples. For missing values, we decided to use the 50th percentile value (median) of each protein for imputation, as it is not affected by extreme values and reflects the central tendency of protein expression [see Discussion section]. A total of 5564 proteins (those without missing values and those quantified in at least 12 pairs of samples) observed in Spectronaut were further processed for differential protein analysis, which led to 1250 upregulated and 266 downregulated proteins. The results for DIA-NN are also shown in Figure (see the right half). The two sets of data resemble each other superficially but differ in detail (see later). At a reviewer’s suggestion, we also performed imputation using MinProb, which resulted in 1038 upregulated and 252 downregulated proteins in Spectronaut, numbers that are comparable to those obtained by imputation with 100% of the minimum protein abundance values.
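For readers without R, the MinProb idea can be sketched in Python. This is a re-implementation modeled on imputeLCMD's `impute.MinProb` as we understand it, not the package itself: each sample's missing values are drawn from a normal distribution centred on that sample's q-quantile, with a standard deviation equal to the median protein standard deviation (over proteins observed in more than half of the samples) multiplied by `tune_sigma`. The optional floor for imputed values is an assumption mirroring the constraint described in the Methods, and NumPy's random stream will not reproduce R's draws even with the same seed.

```python
import numpy as np

def minprob_like(log_x, q=0.05, tune_sigma=0.25, seed=123, floor=None):
    """MinProb-style left-censored imputation (sketch modeled on impute.MinProb).

    log_x: proteins x samples on a log scale, NaN = missing.
    Missing values in sample j are drawn from N(q-quantile of sample j, sigma),
    where sigma = tune_sigma * median SD of proteins observed in > 50% of samples.
    Observed values are returned unchanged.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(log_x)
    well_observed = observed.mean(axis=1) > 0.5
    protein_sd = np.nanstd(log_x[well_observed], axis=1, ddof=1)
    sigma = tune_sigma * np.nanmedian(protein_sd)
    out = log_x.copy()
    for j in range(log_x.shape[1]):
        centre = np.nanquantile(log_x[:, j], q)
        draws = rng.normal(centre, sigma, size=log_x.shape[0])
        missing = ~observed[:, j]
        out[missing, j] = draws[missing]
    if floor is not None:
        # optional lower bound applied to imputed entries only (assumption)
        out = np.where(observed, out, np.maximum(out, floor))
    return out
```

Unlike per-protein median imputation, this approach assumes values are missing because they fall below the detection limit, so imputed values sit at the low end of each sample's distribution.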
We consider these differences modest; they should not significantly affect the present work, especially as we are primarily interested in the top 50 DEPs (see later). Raw data are available in Table S2 (Spectronaut) and Table S3 (DIA-NN).

Details of Spectronaut and DIA-NN analyses. Spectronaut data on the left; DIA-NN data on the right.

Figure shows some of these details. Spectronaut identified 2541 proteins that were present in all 24 sample pairs, and DIA-NN identified 2289 proteins. Of these, 2129 proteins were commonly identified by both platforms (Figure a). We reasoned that these proteins were likely to be high in abundance; this is supported by the observation that 2010 of the 2541 proteins (79.74%) ranked among the 2541 most abundant proteins in Spectronaut, and 1855 of the 2289 proteins (81.04%) ranked among the 2289 most abundant proteins in DIA-NN. For the 3023 Spectronaut proteins and 2592 DIA-NN proteins (2169 proteins common to both) that were identified in at least 12 pairs of LUAD biopsies (Figure b), we expected moderate abundance, and this is borne out by the observation that 1780 of the 3023 proteins (58.89%) examined by Spectronaut ranked between 2541st and 5564th. Similarly, 1625 of the 2592 proteins (62.70%) identified by DIA-NN ranked between 2289th and 4881st. An impetus of our current study is to use the Spectronaut and DIA-NN platforms to validate each other; thus, we regard proteins identified by both platforms as carrying a higher measure of fidelity.
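The rank-window argument above can be made explicit: given proteins ordered by abundance, one counts how many members of a set fall within a rank interval. The sketch assumes a ranking from most to least abundant (the text does not state the ranking statistic, e.g., median intensity across samples), and the function name and example IDs are hypothetical.

```python
def fraction_in_rank_window(proteins, ranking, first, last):
    """Fraction of `proteins` whose abundance rank lies in [first, last].

    ranking: protein IDs ordered from most to least abundant (rank 1 = most abundant).
    Proteins absent from the ranking count as outside the window.
    """
    rank = {pid: r for r, pid in enumerate(ranking, start=1)}
    inside = sum(1 for p in proteins if first <= rank.get(p, 0) <= last)
    return inside / len(proteins)

# Hypothetical example: 2 of 3 proteins rank among the 5 most abundant
ranking = [f"P{i}" for i in range(1, 11)]
print(fraction_in_rank_window(["P1", "P3", "P9"], ranking, 1, 5))
```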

Comparison of Spectronaut and DIA-NN data: (a) proteins without missing values; (b) proteins with data in at least 12 pairs of LUAD; and (c) proteins eligible for differential analysis and DEPs.

Results of the differential analyses are shown in Figure c. Of the 5564 Spectronaut proteins that were analyzed for DEPs, 4838 were also identified by DIA-NN. Figure also shows that 5564 and 4881 proteins were eligible for DEP analysis from Spectronaut and DIA-NN, respectively. Figure c illustrates that 1516 DEPs were identified by Spectronaut and 1993 DEPs by DIA-NN; of these, 1130 were common to both platforms. Figure shows volcano plots of the differential protein analyses. As summarized in Figure , we observed 1250 upregulated (shown in red in Figure ) and 266 downregulated (shown in blue in Figure ) proteins in the Spectronaut data, and 1819 upregulated (red) and 174 downregulated (blue) proteins in the DIA-NN data. GO and KEGG enrichment analyses (Figure ) show that the DEPs are involved in protein binding and are enriched in the cytosol, extracellular exosome, and mitochondria (GO, Figure b), and that they play significant roles in metabolic pathways (KEGG, Figure a). As our sample size of 24 is modest and our analysis is elementary, these enrichment results are best treated as preliminary and require confirmation by more rigorous analysis and a larger patient cohort.

Volcano plots: (a) Spectronaut data; (b) DIA-NN data. DEPs: P11021 = HSPA5; Q03135 = CAV1 (see text). The x-axis shows log₂(fold change) in protein expression intensity (tumor versus peritumor), and the y-axis shows −log₁₀(adjusted P-value). Red dots indicate upregulated proteins (log₂(fold change) ≥ 0.58 and adjusted P-value ≤ 0.05); blue dots indicate downregulated proteins (log₂(fold change) ≤ −0.58 and adjusted P-value ≤ 0.05); and gray dots represent nondifferentially expressed proteins. Statistical significance was determined by paired t test followed by Benjamini–Hochberg multiple testing correction.

GO and KEGG enrichment analyses: (a) KEGG; (b) GO.

We deem DEPs with the largest FC and smallest P-value (i.e., those located toward the upper left and upper right corners of the volcano plots in Figure ) to be potentially the most significant for examination. The top 50 such proteins are shown in Figure and Table ; those common to both Spectronaut and DIA-NN (34/50, or 68%) are highlighted in yellow in Table , with additional details provided in Table S4. The majority of our top 50 DEPs have been reported in a combination of previous studies. Xu et al. used label-free quantification with MaxQuant after DDA acquisition. Gillett et al. employed 10-plex TMT mass tagging with a common reference pool for quantification. Lehtiö et al. used DDA for protein identification and quantification, and DIA for validation on an independent cohort. We regard the comparability of our DEPs with those identified in these earlier studies as validation of our methodology and our data. Our data show the best concordance with those of Xu et al.; this may partly reflect the likely preponderance of East Asian, and especially Chinese, patients in both the Xu et al. cohort and ours.

Top 50 DEPs as determined by (a) Spectronaut; (b) DIA-NN. The x-axis shows log₂(fold change) in protein expression intensity (tumor versus peritumor), and the y-axis shows −log₁₀(adjusted P-value). Upregulated proteins with red UniProt numbers and downregulated proteins with blue UniProt numbers are common to both Spectronaut and DIA-NN; DEPs labeled with black UniProt numbers are specific to the particular computational platform. Statistical significance was determined by paired t test followed by Benjamini–Hochberg multiple testing correction.

Table 1. Top 50 DEPs and PPI Proteins (PPIs) as Determined by Spectronaut and DIA-NN.

Protein–protein interactions were investigated to determine how interlinked the DEPs are. Table also displays the extent of PPI among the top 50 DEPs, of which 35 proteins exhibit interactions; the proteins identified by both platforms (25/35, i.e., 71.4%) are highlighted in yellow. The extent of interactions among the top DEPs, as measured by degree (number of interaction partners), is shown in Table and detailed in Figure . Two DEPs, P11021 (HSPA5, heat shock protein family A (Hsp70) member 5) and Q03135 (CAV1, caveolin 1), are among the top 50 DEPs (Figures and ) as well as among the top interacting proteins (Figure ). The ranks and abundances of HSPA5 and CAV1 in the 24 pairs of LUAD biopsies are shown in Figure . The expression of these two abundant proteins differs significantly between the tumor and peritumor samples, as demonstrated by the pairwise comparisons in Figure b, further substantiating the data shown in Figures and .

Interactions of the top 50 PPI proteins determined by (a) Spectronaut and (b) DIA-NN. Proteins identified by both computational platforms are highlighted in yellow.

Rank and expression of HSPA5 and CAV1. (a) Rank as quantified by Spectronaut and DIA-NN; both DEPs are high in abundance (low rank number). (b) Expression of HSPA5 and CAV1 in tumor and peritumor samples.
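
The rank convention in the caption (high abundance, low rank number) can be made explicit with a short sketch. Ranking by a single per-protein intensity summary, and the intensity values themselves, are assumptions for illustration; `PROT_C` is a hypothetical placeholder.

```python
def abundance_ranks(intensities):
    """Rank proteins by abundance: rank 1 is the most abundant.

    intensities: dict mapping accession -> one intensity summary per protein
    (e.g., the mean across samples; this summary is an assumption).
    """
    ordered = sorted(intensities, key=lambda acc: intensities[acc], reverse=True)
    return {acc: rank for rank, acc in enumerate(ordered, start=1)}


# Hypothetical intensities
ranks = abundance_ranks({"P11021": 5.2e8, "Q03135": 1.1e8, "PROT_C": 3.0e6})
print(ranks)
```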

Protein expression in the 24 pairs of tumor and peritumor samples enabled subtyping of the patients. Of the three subtyping algorithms that we used, namely ConsensusClusterPlus (CCP), non-negative matrix factorization, and hierarchical clustering, CCP gave the most internally consistent and interpretable results (Table ; data for the other two algorithms are not shown). CCP subtyping of DEP data, as evaluated by the consensus cumulative distribution functions (CDFs) and silhouette scores, consistently yielded three groups (two large and one small; see Figure S2 for details), irrespective of whether subtyping was performed on the 1516 DEPs observed by Spectronaut, the 1993 DEPs found by DIA-NN, or the 1130 DEPs common to both computational platforms. Subtyping on the larger set of proteins eligible for differential analysis (5564 in Spectronaut and 4881 in DIA-NN) yielded two major groups, one of 11 patients and the other of 13, and these results were completely consistent between the two platforms (Table and Figure S2). The two large and one small subtypes (Subtypes 1 to 3) comprise a core of consistently assigned samples, irrespective of the computational platform used or the protein set examined (Table and Figure ). GO and KEGG analyses show that these subtypes are enriched in different types of proteins (Figure g–j). Cancer subtyping is important for identifying differences in patients' protein expression and their potential responses to specific therapeutic regimens. As our sample size (24 patients) is modest, the subtyping results are best treated as preliminary and exploratory, pending confirmation or refinement with a larger future cohort.
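
The core of consensus clustering can be sketched as follows: samples are repeatedly subsampled and clustered, and the consensus value for a pair of samples is the fraction of runs that include both in which they fall in the same cluster. The empirical CDF of these values is what is inspected when choosing the number of clusters. This sketch takes the per-run cluster labels as given (the clustering step itself is omitted) and is not the ConsensusClusterPlus R code, where arguments such as `reps`, `pItem`, and `maxK` control the resampling. Assigning 0 to pairs never sampled together is an assumption of this sketch.

```python
from itertools import combinations


def consensus_matrix(n_samples, runs):
    """Consensus matrix from resampled clustering runs.

    runs: list of dicts, one per run, mapping each sampled sample index
    to its cluster label in that run (unsampled indices are absent).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = []
    for i in range(n_samples):
        row = []
        for j in range(n_samples):
            if i == j:
                row.append(1.0)
            elif sampled[i][j]:
                row.append(together[i][j] / sampled[i][j])
            else:
                row.append(0.0)  # assumption: never co-sampled -> 0
        matrix.append(row)
    return matrix


def consensus_cdf(matrix, x):
    """Empirical CDF of the off-diagonal consensus values at x."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(v <= x for v in values) / len(values)


# Hypothetical labels from three resampling runs over 3 samples
runs = [{0: 1, 1: 1, 2: 2}, {0: 1, 1: 1}, {1: 3, 2: 3}]
M = consensus_matrix(3, runs)
print(M, consensus_cdf(M, 0.5))
```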

Table 2. Subtyping of 24 LUAD Samples/Patients by CCP.

Subtyping of the 24 LUAD sample pairs: (a) Spectronaut data on the 1130 DEPs common to both Spectronaut and DIA-NN; (b) DIA-NN data on the 1130 DEPs common to both Spectronaut and DIA-NN; (c) Spectronaut data on its 1516 DEPs; (d) DIA-NN data on its 1993 DEPs; (e) Spectronaut data on its 5564 proteins eligible for differential analysis; (f) DIA-NN data on its 4881 proteins eligible for differential analysis; (g) GO enrichment analyses of the Spectronaut subtypes; (h) GO enrichment analyses of the DIA-NN subtypes; (i) KEGG enrichment analyses of the Spectronaut subtypes; (j) KEGG enrichment analyses of the DIA-NN subtypes. The heatmap shows z-scores from −4 (deep blue) to 4 (deep red).
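
Heatmap z-scores of this kind are typically computed per protein across samples. The sketch below assumes row-wise scaling with the sample standard deviation and clips values to ±4 to match the color range in the caption; whether the authors scaled and clipped in exactly this way is an assumption, and `row_zscores` is a hypothetical name.

```python
import statistics


def row_zscores(values, clip=4.0):
    """Z-score one protein's intensities across samples, then clip to
    [-clip, clip] so the values fit the heatmap color scale."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [max(-clip, min(clip, (v - mu) / sd)) for v in values]


print(row_zscores([1.0, 2.0, 3.0]))
```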

Discussion

Our results show that both Spectronaut and DIA-NN work well for protein identification and quantification in direct DIA. Concordance in protein identification is strong: 7180 proteins were identified by both platforms, representing 94.5% of the 7597 Spectronaut proteins and 92.2% of the 7787 DIA-NN proteins. Concordance decreases only slightly with quantification: the 2129 shared proteins with complete quantification data (no missing values) account for 83.8% of the 2541 such Spectronaut proteins and 93.0% of the 2289 such DIA-NN proteins. For proteins with some missing quantification data but values in at least 12 pairs, the 2169 shared proteins account for 71.7% of the 3023 such Spectronaut proteins and 83.7% of the 2592 such DIA-NN proteins. Overall, 4838 of the proteins considered for differential analysis are shared, representing 87.0% of the 5564 Spectronaut proteins and 99.1% of the 4881 DIA-NN proteins. However, concordance decreases more substantially for DEPs (Figure c): the 1130 shared DEPs represent only 74.5% of the 1516 Spectronaut DEPs and 56.7% of the 1993 DIA-NN DEPs. As stated previously, we consider the 1130 DEPs common to both Spectronaut and DIA-NN to be of higher confidence with respect to differential expression.
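
Each concordance percentage above is the shared count divided by one platform's total. A minimal sketch that reproduces this arithmetic (the function name is hypothetical):

```python
def overlap_percentages(shared, total_a, total_b):
    """Percent of each platform's list that is shared, rounded to 0.1%."""
    return round(100 * shared / total_a, 1), round(100 * shared / total_b, 1)


print(overlap_percentages(7180, 7597, 7787))  # identified proteins: (94.5, 92.2)
print(overlap_percentages(1130, 1516, 1993))  # DEPs: (74.5, 56.7)
```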

Figure shows 1250 upregulated and 266 downregulated DEPs among a total of 5564 proteins in the Spectronaut data (Figure ), and 1819 upregulated and 174 downregulated DEPs among a total of 4881 proteins in the DIA-NN data (Figure ). These DEP numbers were determined after imputing missing values with 50% of the minimum protein quantity (see Tables S5 and S6 for details of imputation). Imputation at other levels gave the following numbers of upregulated DEPs (Spectronaut/DIA-NN): 100%, 1109/1649; 70%, 1202/1751; 50%, 1250/1819; and 20%, 1326/1921. The corresponding downregulated DEPs (Spectronaut/DIA-NN) were: 100%, 260/169; 70%, 262/171; 50%, 266/174; and 20%, 266/174 (see Tables S7 and S8 for DEP details). In response to reviewer comments, we also investigated differential expression with MinProb imputation, which gave 1038/1586 upregulated and 252/166 downregulated DEPs (Spectronaut/DIA-NN). Changes in the number of DEPs with the imputation level are expected given the presence of missing values, whose handling remains an area of ongoing research. Even the largest changes relative to 50% imputation are, however, arguably moderate: up to 17.0% (Spectronaut) and 12.8% (DIA-NN) for upregulated DEPs, and up to 5.3% (Spectronaut) and 4.6% (DIA-NN) for downregulated DEPs. These results were obtained from proteins with values in at least 12 pairs of LUAD samples (Figure ). Raising this requirement to 17 pairs reduced the number of proteins eligible for differential analysis to 4684 in Spectronaut and 4139 in DIA-NN (see Tables S5 and S6), although the resulting changes were modest at 11.6%. In this work, we chose to analyze proteins quantified in at least 12 of the 24 LUAD sample pairs and to impute at 50% of the minimum protein quantity; however, these parameters may be adjusted in future studies with a larger patient cohort.
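
The filtering and imputation scheme described above can be sketched per protein as follows. Two points are assumptions not settled by the text: a pair is counted only when both the tumor and the peritumor values are present, and the imputation floor is the minimum observed intensity of that protein (rather than of the whole data set). The function names are hypothetical, and MinProb imputation is not reproduced here.

```python
def filter_and_impute(tumor, peritumor, min_pairs=12, fraction=0.5):
    """Eligibility filter plus fixed-fraction-of-minimum imputation.

    tumor, peritumor: per-patient intensities for one protein, aligned by
    patient, with None for missing values. Returns None if the protein has
    fewer than `min_pairs` complete pairs; otherwise returns both lists
    with missing values replaced by `fraction` x the protein's minimum.
    """
    complete_pairs = sum(
        t is not None and p is not None for t, p in zip(tumor, peritumor)
    )
    if complete_pairs < min_pairs:
        return None  # not eligible for differential analysis
    # Assumption: the floor is this protein's minimum observed intensity
    observed = [v for v in list(tumor) + list(peritumor) if v is not None]
    fill_value = fraction * min(observed)

    def fill(values):
        return [fill_value if v is None else v for v in values]

    return fill(tumor), fill(peritumor)


def percent_change(reference, other):
    """Relative change (%) in a DEP count versus a reference setting."""
    return 100 * abs(other - reference) / reference


print(filter_and_impute([10.0, None, 8.0], [5.0, 4.0, None], min_pairs=1))
print(round(percent_change(1250, 1038), 2))  # prints 16.96
```

The second call reproduces the largest Spectronaut change quoted in the text (50% imputation versus MinProb, upregulated DEPs).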
Notably, the top 50 DEPs are highly concordant irrespective of the imputation level and of whether proteins quantified in ≥12 or ≥17 pairs of samples are considered (Table S4): 44 DEPs were observed in ≥10 imputation conditions with Spectronaut, and 39 DEPs in ≥10 imputation conditions with DIA-NN.
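
Robustness across settings can be tallied by counting how often each protein appears in the top-50 list of each imputation and filtering setting. A sketch with hypothetical input lists and a hypothetical function name:

```python
from collections import Counter


def stable_members(top_lists, min_settings):
    """Proteins appearing in at least `min_settings` of the top-N lists,
    one list per imputation/filtering setting."""
    counts = Counter(acc for top in top_lists for acc in set(top))
    return sorted(acc for acc, c in counts.items() if c >= min_settings)


# Hypothetical top-3 lists from three settings
settings = [["P1", "P2", "P3"], ["P1", "P3", "P4"], ["P1", "P2", "P3"]]
print(stable_members(settings, min_settings=3))
```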

The two proteins that rank among both the top 50 DEPs and the top 50 PPI proteins, HSPA5 and CAV1, are well known for their significance in cancer. HSPA5 is an intracellular protein and a member of the heat shock protein 70 family. It is localized to the lumen of the endoplasmic reticulum (ER) and functions as a chaperone involved in the folding and assembly of proteins in the ER. HSPA5 has been found to be a prognostic marker in glioblastoma multiforme, renal cancer, and urothelial cancer. Although LUAD patients with higher HSPA5 expression had poorer survival statistics, this protein is not considered prognostic for LUAD. Immunohistochemical (IHC) staining of HSPA5 in LUAD cancer cells is described as medium, with moderate intensity in >75% of cancer cells. CAV1 can be both intracellular and membranous, and is localized to the Golgi apparatus. It is a scaffolding protein within caveolar membranes, and CAV1 has been described as a tumor suppressor gene. CAV1 is considered prognostic in lung, pancreatic, and stomach cancers. The protein is typically not detected in LUAD tumor cells by IHC. These Human Protein Atlas (HPA) data are in qualitative agreement with, and serve as external verification of, our DIA results (Figure ), which show higher expression of HSPA5 in the majority of our tumor samples than in the peritumor samples, and lower expression of CAV1 in the majority of tumor versus peritumor samples, consistent with CAV1's tumor-suppressor function.

The LUAD subtypes share many enriched pathways, as expected; however, some differences stand out (Figure ). Comparing the two major subtypes, Subtype 1 is enriched, relative to Subtype 2, in terms related to mitochondrial translation, protein homodimerization activity, the endoplasmic reticulum (ER), and GTP binding, and depleted in terms related to aerobic respiration, protein folding, ficolin-1-rich granule lumen, actin binding, and actin filament binding. Subtype 3 shares some of the pathways that differentiate Subtypes 1 and 2 and can be regarded as intermediate between the two larger subtypes.
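
Over-representation of a GO term or KEGG pathway in a subtype's protein list is commonly assessed with a one-sided hypergeometric test. The sketch below computes that tail probability. DAVID, which this study cites, reports a modified Fisher exact test (the EASE score) together with multiple-testing corrections, so this plain version is a simplified stand-in, and its function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    N: proteins in the background, K: background proteins annotated to
    the term, n: proteins in the subtype list, k: list proteins annotated.
    """
    upper = min(n, K)
    # Sum exact integer counts first, then divide once
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical example: 8 of 50 listed proteins carry a term that
# annotates 100 of 5000 background proteins
print(hypergeom_enrichment_p(k=8, n=50, K=100, N=5000))
```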

Cancer cells can retain robust mitochondrial function. Although glycolysis has long been viewed as the preferred metabolic pathway in tumors, accumulating evidence shows that mitochondrial oxidative phosphorylation (OXPHOS) plays a significant role in tumor progression. Notable progress has been made in examining the aberrant regulation, in cancer cells, of the FIH (factor-inhibiting HIF) gene, which is associated with aerobic respiration; this research has revealed complex molecular mechanisms and identified potential therapeutic targets. The ER is a key organelle responsible for calcium ion storage, protein synthesis, and lipid metabolism. In multiple tumor types, the combined action of diverse oncogenes, transcriptional abnormalities, and metabolic disorders can induce a persistent state of ER stress, which has emerged as a critical regulator of tumor growth, metastasis, and responses to chemotherapy, targeted therapy, and immunotherapy. ER-associated proteins such as TRAF3IP3 can suppress proliferation and promote apoptosis in LUAD cells by inducing excessive ER stress; ER-associated proteins may therefore be worth pursuing as novel therapeutic targets in LUAD. Actin filaments are core components of the cytoskeleton and participate in key cellular activities such as maintaining cell shape, migration, adhesion, division, and signal transduction. Actin filament binding has been implicated in the development of LUAD: its dynamic regulation reshapes the cytoskeleton and can promote migration and invasion.

Our current work shows that LUAD subtyping is viable and that the two computational platforms give fairly comparable results; clinical validity of the current subtypes will need to be substantiated in the future with additional LUAD samples and experimentation.

Conclusions

Our results show that both Spectronaut and DIA-NN function well for direct DIA and that their data are, by and large, comparable in terms of both proteins identified and differential expression. Many of the DEPs identified herein by DIA have been reported in studies using different, DDA-based protein identification and quantification methodologies. We consider proteins identified and quantified by both Spectronaut and DIA-NN likely to be of higher fidelity. Subtyping of LUAD using the results of the two platforms led to comparable subtypes, which will need to be verified in a future study with a much larger LUAD patient cohort. We are also interested in using some of our DEPs for the possible diagnosis and prognosis of LUAD from fluid-based patient samples.

Supplementary Material

ao5c10421_si_001.pdf (1.4 MB, PDF)

ao5c10421_si_002.xlsx (17.3 KB, XLSX)

ao5c10421_si_003.xlsx (17.3 MB, XLSX)

ao5c10421_si_004.xlsx (15.1 MB, XLSX)

ao5c10421_si_005.xlsx (20.9 KB, XLSX)

ao5c10421_si_006.xlsx (37.9 MB, XLSX)

ao5c10421_si_007.xlsx (33.3 MB, XLSX)

ao5c10421_si_008.xlsx (1 MB, XLSX)

ao5c10421_si_009.xlsx (1.4 MB, XLSX)

Acknowledgments

This study was approved by the ethics board of the Shandong Public Health Clinical Center Affiliated to Shandong University (certificate number: GWLCZXEC-SOP-K-2025-124). All patients provided informed consent regarding the use of their surgical specimens for clinical research. We are grateful to all patients and surgical staff for their invaluable contributions to research and the advancement of clinical care; this work would not have been possible without their dedication. We also thank the reviewers for their constructive and informative critiques, which have benefited our work and this manuscript.

The proteomic data have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository with the data set identifier PXD068441.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c10421.

Correlation matrix of 24 LUAD samples/patients (Pearson correlation coefficient) (Figure S1). Consensus clustering analysis indicators for CCP subtypes in 24 LUAD patients (Figure S2) (PDF)
LUAD patient demographic details (Table S1) (XLSX)
Spectronaut analysis raw data (Table S2) (XLSX)
DIA-NN analysis raw data (Table S3) (XLSX)
Top 50 DEP details (Table S4) (XLSX)
Spectronaut imputation details (Table S5) (XLSX)
DIA-NN imputation details (Table S6) (XLSX)
Spectronaut DEPs at different levels of imputation (Table S7) (XLSX)
DIA-NN DEPs at different levels of imputation (Table S8) (XLSX)


Z.Y., A.D., and X.X. contributed equally to this work. A.D. performed the bioinformatic analyses; X.X., Y.L., and X.M. carried out the sample preparation and proteomic analyses; Y.Z. performed the surgery and provided all clinical samples; all authors discussed and edited the manuscript; Z.Y., I.K.C., and K.W.M.S. conceptualized the study and wrote the manuscript with contributions from all authors.

This work is funded by the Shandong Public Health Clinical Center Affiliated to Shandong University, as well as the Shandong Science and Technology Department. Y.Z. thanks the Shandong Provincial Medical Health Science and Technology Development Plan (Project No. 202402050731) for support. I.K.C. acknowledges support from the Hong Kong Research Grants Council (HKU 17303821 and HKU 17304919).

The authors declare no competing financial interest.

References

Ludwig C., Gillet L., Rosenberger G., Amon S., Collins B. C., Aebersold R. Data-Independent Acquisition-Based SWATH-MS for Quantitative Proteomics: A Tutorial. Mol. Syst. Biol. 2018;14:e8126. doi: 10.15252/msb.20178126.
Eliuk S., Makarov A. Evolution of Orbitrap Mass Spectrometry Instrumentation. Annu. Rev. Anal. Chem. 2015;8:61–80. doi: 10.1146/annurev-anchem-071114-040325.
Krasny L., Huang P. H. Data-Independent Acquisition Mass Spectrometry (DIA-MS) for Proteomic Applications in Oncology. Mol. Omics. 2020;17:29–42. doi: 10.1039/D0MO00072H.
Mehta D., Scandola S., Uhrig R. G. BoxCar and Library-Free Data-Independent Acquisition Substantially Improve the Depth, Range, and Completeness of Label-Free Quantitative Proteomics. Anal. Chem. 2022;94:793–802. doi: 10.1021/acs.analchem.1c03338.
Muntel J., Kirkpatrick J., Bruderer R., Huang T., Vitek O., Ori A., Reiter L. Comparison of Protein Quantification in a Complex Background by DIA and TMT Workflows with Fixed Instrument Time. J. Proteome Res. 2019;18:1340–1351. doi: 10.1021/acs.jproteome.8b00898.
Kitata R. B., Choong W.-K., Tsai C.-F., Lin P.-Y., Chen B.-S., Chang Y.-C., Nesvizhskii A. I., Sung T.-Y., Chen Y.-J. A Data-Independent Acquisition-Based Global Phosphoproteomics System Enables Deep Profiling. Nat. Commun. 2021;12:2539. doi: 10.1038/s41467-021-22759-z.
Tsai C.-F., Wang Y.-T., Hsu C.-C., Kitata R. B., Chu R. K., Velickovic M., Zhao R., Williams S. M., Chrisler W. B., Jorgensen M. L., Moore R. J., Zhu Y., Rodland K. D., Smith R. D., Wasserfall C. H., Shi T., Liu T. A Streamlined Tandem Tip-Based Workflow for Sensitive Nanoscale Phosphoproteomics. Commun. Biol. 2023;6:70. doi: 10.1038/s42003-022-04400-x.
Dellar E. R., Vendrell I., Talbot K., Kessler B. M., Fischer R., Turner M. R., Thompson A. G. Data-Independent Acquisition Proteomics of Cerebrospinal Fluid Implicates Endoplasmic Reticulum and Inflammatory Mechanisms in Amyotrophic Lateral Sclerosis. J. Neurochem. 2024;168:115–127. doi: 10.1111/jnc.16030.
Biognosys Spectronaut User Manual; Biognosys AG: Switzerland.
Demichev V., Messner C. B., Vernardis S. I., Lilley K. S., Ralser M. DIA-NN: Neural Networks and Interference Correction Enable Deep Proteome Coverage in High Throughput. Nat. Methods. 2020;17:41–44. doi: 10.1038/s41592-019-0638-x.
Lou R., Cao Y., Li S., Lang X., Li Y., Zhang Y., Shui W. Benchmarking Commonly Used Software Suites and Analysis Workflows for DIA Proteomics and Phosphoproteomics. Nat. Commun. 2023;14:94. doi: 10.1038/s41467-022-35740-1.
Yu F., Teo G. C., Kong A. T., Frohlich K., Li G. X., Demichev V., Nesvizhskii A. I. Analysis of DIA Proteomics Data Using MSFragger-DIA and FragPipe Computational Platform. Nat. Commun. 2023;14:4154. doi: 10.1038/s41467-023-39869-5.
Xu J.-Y., Zhang C., Wang X., Zhai L., Ma Y., Mao Y., Qian K., Sun C., et al. Integrative Proteomic Characterization of Human Lung Adenocarcinoma. Cell. 2020;182:245–261. doi: 10.1016/j.cell.2020.05.043.
Gillette M. A., Satpathy S., Cao S., Dhanasekaran S. M., Vasaikar S. V., Krug K., Petralia F., Li Y., et al. Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma. Cell. 2020;182:200–225.e35. doi: 10.1016/j.cell.2020.06.013.
Lehtio J., Arslan T., Siavelis I., Pan Y., Socciarelli F., Berkovska O., Umer H. M., Mermelekas G., et al. Proteogenomics of Non-Small Cell Lung Cancer Reveals Molecular Subtypes Associated with Specific Therapeutic Targets and Immune-Evasion Mechanisms. Nat. Cancer. 2021;2:1224–1242. doi: 10.1038/s43018-021-00259-9.
Xia C., Dong X., Li H., Cao M., Sun D., He S., Yang F., Yan X., et al. Cancer Statistics in China and United States, 2022: Profiles, Trends, and Determinants. Chin. Med. J. 2022;135:584–590. doi: 10.1097/CM9.0000000000002108.
Travis W. D., Brambilla E., Nicholson A. G., Yatabe Y., Austin J. H., Beasley M. B., Chirieac L. R., Dacic S., et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances since the 2004 Classification. J. Thorac. Oncol. 2015;10:1243–1260. doi: 10.1097/JTO.0000000000000630.
Thai A. A. Lung Cancer. Lancet. 2021;398:535–554. doi: 10.1016/S0140-6736(21)00312-3.
Peng H., Wang H., Kong W., Li J., Goh W. W. B. Optimizing Differential Expression Analysis for Proteomics Data via High-Performing Rules and Ensemble Inference. Nat. Commun. 2024;15:3922. doi: 10.1038/s41467-024-47899-w.
Hui H. W. H., Chan W. X., Goh W. W. B. Assessing the Impact of Batch Effect Associated Missing Values on Downstream Analysis in High-Throughput Biomedical Data. Brief. Bioinform. 2025;26:bbaf168. doi: 10.1093/bib/bbaf168.
Snel B., Lehmann G., Bork P., Huynen M. A. STRING: a Web-Server to Retrieve and Display the Repeatedly Occurring Neighbourhood of a Gene. Nucleic Acids Res. 2000;28:3442–3444. doi: 10.1093/nar/28.18.3442.
Szklarczyk D., Kirsch R., Koutrouli M., Nastou K., Mehryary F., Hachilif R., Gable A. L., Fang T., et al. The STRING Database in 2023: Protein–Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest. Nucleic Acids Res. 2023;51:D638–D646. doi: 10.1093/nar/gkac1000.
Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B., et al. Cytoscape: a Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
Ono K., Fong D., Gao C., Churas C., Pillich R., Lenkiewicz J., Pratt D., Pico A. R., et al. Cytoscape Web: Bringing Network Biology to the Browser. Nucleic Acids Res. 2025;53:W203–W212. doi: 10.1093/nar/gkaf365.
Sherman B. T., Hao M., Qiu J., Jiao X., Baseler M. W., Lane H. C., Imamichi T., Chang W. DAVID: a Web Server for Functional Enrichment Analysis and Functional Annotation of Gene Lists (2021 Update). Nucleic Acids Res. 2022;50:W216–W221. doi: 10.1093/nar/gkac194.
Huang D. W., Sherman B. T., Lempicki R. A. Systematic and Integrative Analysis of Large Gene Lists using DAVID Bioinformatics Resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
Wilkerson M. D., Hayes D. N. ConsensusClusterPlus: a Class Discovery Tool with Confidence Assessments and Item Tracking. Bioinformatics. 2010;26:1572–1573. doi: 10.1093/bioinformatics/btq170.
Gaujoux R., Seoighe C. A Flexible R Package for Nonnegative Matrix Factorization. BMC Bioinf. 2010;11:367. doi: 10.1186/1471-2105-11-367.
Han W., Zhang S., Gao H., Bu D. Clustering on Hierarchical Heterogeneous Data with Prior Pairwise Relationships. BMC Bioinf. 2024;25:40. doi: 10.1186/s12859-024-05652-6.
Karpievitch Y. V., Dabney A. R., Smith R. D. Normalization and Missing Value Imputation for Label-Free LC-MS Analysis. BMC Bioinf. 2012;13(Suppl 16):S5. doi: 10.1186/1471-2105-13-S16-S5.
Lazar C., Gatto L., Ferro M., Bruley C., Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res. 2016;15:1116–1125. doi: 10.1021/acs.jproteome.5b00981.
Human Protein Atlas. https://www.proteinatlas.org/; accessed on October 7, 2025.
Vander Heiden M. G., Cantley L. C., Thompson C. B. Understanding the Warburg Effect: The Metabolic Requirements of Cell Proliferation. Science. 2009;324:1029–1033. doi: 10.1126/science.1160809.
Ashton T. M., McKenna W. G., Kunz-Schughart L. A., Higgins G. S. Oxidative Phosphorylation as an Emerging Target in Cancer Therapy. Clin. Cancer Res. 2018;24:2482–2490. doi: 10.1158/1078-0432.CCR-17-3070.
García-Del Río A., Prieto-Fernández E., Egia-Mendikute L., Antonana-Vildosola A., Jimenez-Lasheras B., Lee S. Y., Barreira-Manrique A., Zanetti S. R., et al. Factor-Inhibiting HIF (FIH) Promotes Lung Cancer Progression. JCI Insight. 2023;8:e167394. doi: 10.1172/jci.insight.167394.
Schwarz D. S., Blower M. D. The Endoplasmic Reticulum: Structure, Function and Response to Cellular Signaling. Cell. Mol. Life Sci. 2016;73:79–94. doi: 10.1007/s00018-015-2052-6.
Chen X., Cubillos-Ruiz J. R. Endoplasmic Reticulum Stress Signals in the Tumour and its Microenvironment. Nat. Rev. Cancer. 2021;21:71–88. doi: 10.1038/s41568-020-00312-2.
Zhao G., Qi J., Li F., Ma H., Wang R., Yu X., Wang Y., Qin S., et al. TRAF3IP3 Induces ER Stress-Mediated Apoptosis with Protective Autophagy to Inhibit Lung Adenocarcinoma Proliferation. Adv. Sci. 2025;12:e2411020. doi: 10.1002/advs.202411020.
Casalou C., Faustino A., Barral D. C. Arf Proteins in Cancer Cell Migration. Small GTPases. 2016;7:270–282. doi: 10.1080/21541248.2016.1228792.
Wang Y., Lu Y., Wan R., Wang Y., Zhang C., Li M., Deng P., Cao L., et al. Profilin 1 Induces Tumor Metastasis by Promoting Microvesicle Secretion through the ROCK 1/p-MLC Pathway in Non-Small Cell Lung Cancer. Front. Pharmacol. 2022;13:890891. doi: 10.3389/fphar.2022.890891.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ao5c10421_si_001.pdf^{(1.4MB, pdf)}

ao5c10421_si_002.xlsx^{(17.3KB, xlsx)}

ao5c10421_si_003.xlsx^{(17.3MB, xlsx)}

ao5c10421_si_004.xlsx^{(15.1MB, xlsx)}

ao5c10421_si_005.xlsx^{(20.9KB, xlsx)}

ao5c10421_si_006.xlsx^{(37.9MB, xlsx)}

ao5c10421_si_007.xlsx^{(33.3MB, xlsx)}

ao5c10421_si_008.xlsx^{(1MB, xlsx)}

ao5c10421_si_009.xlsx^{(1.4MB, xlsx)}

Data Availability Statement

The proteomic data have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository with the data set identifier PXD068441.

[ref1] Ludwig C., Gillet L., Rosenberger G., Amon S., Collins B. C., Aebersold R.. Data-Independent Acquisition-Based SWATH-MS for Quantitative Proteomics: A Tutorial. Mol. Syst. Biol. 2018;14:e8126. doi: 10.15252/msb.20178126. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] Eliuk S., Makarov A.. Evolution of Orbitrap Mass Spectrometry Instrumentation. Annual. Rev. Anal. Chem. 2015;8:61–80. doi: 10.1146/annurev-anchem-071114-040325. [DOI] [PubMed] [Google Scholar]

[ref3] Krasny L., Huang P. H.. Data-Independent Acquisition Mass Spectrometry (DIA-MS) for Proteomic Applications in Oncology. Mol. Omics. 2020;17:29–42. doi: 10.1039/D0MO00072H. [DOI] [PubMed] [Google Scholar]

[ref4] Mehta D., Scandola S., Uhrig R. G.. BoxCar and Library-Free Data-Independent Acquisition Substantially Improve the Depth, Range, and Completeness of Label-Free Quantitative Proteomics. Anal. Chem. 2022;94:793–802. doi: 10.1021/acs.analchem.1c03338. [DOI] [PubMed] [Google Scholar]

[ref5] Muntel J., Kirkpatrick J., Bruderer R., Huang T., Vitek O., Ori A., Reiter L.. Comparison of Protein Quantification in a Complex Background by DIA and TMT Workflows with Fixed Instrument Time. J. Proteome Res. 2019;18:1340–1351. doi: 10.1021/acs.jproteome.8b00898. [DOI] [PubMed] [Google Scholar]

[ref6] Kitata R. B., Choong W.-K., Tsai C.-F., Lin P.-Y., Chen B.-S., Chang Y.-C., Nesvizhskii A. I., Sung T.-Y., Chen Y.-J.. A Data-Independent Acquisition-Based Global Phosphoproteomics System Enables Deep Profiling. Nat. Commun. 2021;12:2539. doi: 10.1038/s41467-021-22759-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] Tsai C.-F., Wang Y.-T., Hsu C.-C., Kitata R. B., Chu R. K., Velickovic M., Zhao R., Williams S. M., Chrisler W. B., Jorgensen M. L., Moore R. J., Zhu Y., Rodland K. D., Smith R. D., Wasserfall C. H., Shi T., Liu T.. A Streamlined Tandem Tip-Based Workflow for Sensitive Nanoscale Phosphoproteomics. Commun. Biol. 2023;6:70. doi: 10.1038/s42003-022-04400-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] Dellar E. R., Vendrell I., Talbot K., Kessler B. M., Fischer R., Turner M. R., Thompson A. G.. Data-Independent Acquisition Proteomics of Cerebrospinal Fluid Implicates Endoplasmic Reticulum and Inflammatory Mechanisms in Amyotrophic Lateral Sclerosis. J. Neurochem. 2024;168:115–127. doi: 10.1111/jnc.16030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] Biognosys Spectronaut User Manual; Biognosys AG: Switzerland. [Google Scholar]

[ref10] Demichev V., Messner C. B., Vernardis S. I., Lilley K. S., Ralser M.. DIA-NN: Neural Networks and Interference Correction Enable Deep Proteome Coverage in High Throughput. Nat. Methods. 2020;17:41–44. doi: 10.1038/s41592-019-0638-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] Lou R., Cao Y., Li S., Lang X., Li Y., Zhang Y., Shui W.. Benchmarking Commonly Used Software Suites and Analysis Workflows for DIA Proteomics and Phosphoproteomics. Nat. Commun. 2023;14:94. doi: 10.1038/s41467-022-35740-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] Yu F., Teo G. C., Kong A. T., Frohlich K., Li G. X., Demichev V., Nesvizhskii A. I.. Analysis of DIA Proteomics Data Using MSFragger-DIA and FragPipe Computational Platform. Nat. Commun. 2023;14:4154. doi: 10.1038/s41467-023-39869-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] Xu J.-Y., Zhang C., Wang X., Zhai L., Ma Y., Mao Y., Qian K., Sun C.. et al. Integrative Proteomic Characterization of Human Lung Adenocarcinoma. Cell. 2020;182:245–261. doi: 10.1016/j.cell.2020.05.043. [DOI] [PubMed] [Google Scholar]

[ref14] Gillett M. A., Satpathy S., Cao S., Dhanasekaran S. M., Vasaikar S. V., Krug K., Petralia F., Li Y.. et al. Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma. Cell. 2020;182:200–225.e35. doi: 10.1016/j.cell.2020.06.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] Lehtio J., Arslan T., Siavelis I., Pan Y., Socciarelli F., Berkovska O., Umer H. M., Mermelekas G.. et al. Proteogenomics of Non-Small Cell Lung Cancer Reveals Molecular Subtypes Associated with Specific Therapeutic Targets and Immune-Evasion Mechanisms. Nat. Cancer. 2021;2:1224–1242. doi: 10.1038/s43018-021-00259-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] Xia C., Dong X., Li H., Cao M., Sun D., He S., Yang F., Yan X.. et al. Cancer Statistics in China and United States, 2022: Profiles, Trends, and Determinants. Clin. Med. J. 2022;135:584–590. doi: 10.1097/CM9.0000000000002108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] Travis W. D., Brambilla E., Nicholson A. G., Yatabe Y., Austin J. H., Beasley M. B., Chirieac L. R., Dacic S.. et al. The 2015 World Health Organization Classification of Lung Tumors. Impact of Genetic, Clinical and Radiologic Advances since the 2004 Classification. J. Thoracic Oncol. 2015;10:1243–1260. doi: 10.1097/JTO.0000000000000630. [DOI] [PubMed] [Google Scholar]

[ref18] Thai A. A.. Lung Cancer. Lancet. 2021;398:535–554. doi: 10.1016/S0140-6736(21)00312-3. [DOI] [PubMed] [Google Scholar]

[ref19] Peng H., Wang H., Kong W., Li J., Goh W. W. B.. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat. Commun. 2024;15:3922. doi: 10.1038/s41467-024-47899-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] Hui H. W. H., Chan W. X., Goh W. W. B.. Assessing the impact of batch effect associated missing values on downstream analysis in high-throughput biomedical data. Brief. Bioinform. 2025;26:bbaf168. doi: 10.1093/bib/bbaf168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref21] Snel B., Lehmann G., Bork P., Huynen M. A.. STRING: a Web-Server to Retrieve and Display the Repeatedly Occurring Neighbourhood of a Gene. Nucleic Acids Res. 2000;28:3442–3444. doi: 10.1093/nar/28.18.3442. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref22] Szklarczyk D., Kirsch R., Koutrouli M., Nastou K., Mehryary F., Hachilif R., Annika G. L., Fang T.. et al. The STRING Database in 2023: Protein–Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest. Nucleic Acids Res. 2023;51:D638–D646. doi: 10.1093/nar/gkac1000. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B.. et al. Cytoscape: a Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref24] Ono K., Fong D., Gao C., Churas C., Pillich R., Lenkiewicz J., Pratt D., Pico A. R.. et al. Cytoscape Web: Bringing Network Biology to the Browser. Nucleic Acids Res. 2025;53:W203–W212. doi: 10.1093/nar/gkaf365. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref25] Sherman B. T., Hao M., Qiu J., Jiao X., Baseler M. W., Lane H. C., Imamichi T., Chang W. DAVID: a Web Server for Functional Enrichment Analysis and Functional Annotation of Gene Lists (2021 Update). Nucleic Acids Res. 2022;50:W216–W221. doi: 10.1093/nar/gkac194.

[ref26] Huang D. W., Sherman B. T., Lempicki R. A. Systematic and Integrative Analysis of Large Gene Lists using DAVID Bioinformatics Resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.

[ref27] Wilkerson M. D., Hayes D. N. ConsensusClusterPlus: a Class Discovery Tool with Confidence Assessments and Item Tracking. Bioinformatics. 2010;26:1572–1573. doi: 10.1093/bioinformatics/btq170.

[ref28] Gaujoux R., Seoighe C. A Flexible R Package for Nonnegative Matrix Factorization. BMC Bioinf. 2010;11:367. doi: 10.1186/1471-2105-11-367.

[ref29] Han W., Zhang S., Gao H., Bu D. Clustering on Hierarchical Heterogeneous Data with Prior Pairwise Relationships. BMC Bioinf. 2024;25:40. doi: 10.1186/s12859-024-05652-6.

[ref30] Karpievitch Y. V., Dabney A. R., Smith R. D. Normalization and Missing Value Imputation for Label-Free LC-MS Analysis. BMC Bioinf. 2012;13(Suppl 16):S5. doi: 10.1186/1471-2105-13-S16-S5.

[ref31] Lazar C., Gatto L., Ferro M., Bruley C., Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res. 2016;15:1116–1125. doi: 10.1021/acs.jproteome.5b00981.

[ref32] Human Protein Atlas. https://www.proteinatlas.org/; accessed on October 7, 2025.

[ref33] Vander Heiden M. G., Cantley L. C., Thompson C. B. Understanding the Warburg Effect: The Metabolic Requirements of Cell Proliferation. Science. 2009;324:1029–1033. doi: 10.1126/science.1160809.

[ref34] Ashton T. M., McKenna W. G., Kunz-Schughart L. A., Higgins G. S. Oxidative Phosphorylation as an Emerging Target in Cancer Therapy. Clin. Cancer Res. 2018;24:2482–2490. doi: 10.1158/1078-0432.CCR-17-3070.

[ref35] García-Del Río A., Prieto-Fernández E., Egia-Mendikute L., Antonana-Vildosola A., Jimenez-Lasheras B., Lee S. Y., Barreira-Manrique A., Zanetti S. R., et al. Factor-inhibiting HIF (FIH) Promotes Lung Cancer Progression. JCI Insight. 2023;8:e167394. doi: 10.1172/jci.insight.167394.

[ref36] Schwarz D. S., Blower M. D. The Endoplasmic Reticulum: Structure, Function and Response to Cellular Signaling. Cell. Mol. Life Sci. 2016;73:79–94. doi: 10.1007/s00018-015-2052-6.

[ref37] Chen X., Cubillos-Ruiz J. R. Endoplasmic Reticulum Stress Signals in the Tumour and its Microenvironment. Nat. Rev. Cancer. 2021;21:71–88. doi: 10.1038/s41568-020-00312-2.

[ref38] Zhao G., Qi J., Li F., Ma H., Wang R., Yu X., Wang Y., Qin S., et al. TRAF3IP3 Induces ER Stress-Mediated Apoptosis with Protective Autophagy to Inhibit Lung Adenocarcinoma Proliferation. Adv. Sci. 2025;12:e2411020. doi: 10.1002/advs.202411020.

[ref39] Casalou C., Faustino A., Barral D. C. Arf Proteins in Cancer Cell Migration. Small GTPases. 2016;7:270–282. doi: 10.1080/21541248.2016.1228792.

[ref40] Wang Y., Lu Y., Wan R., Wang Y., Zhang C., Li M., Deng P., Cao L., et al. Profilin 1 Induces Tumor Metastasis by Promoting Microvesicle Secretion through the ROCK 1/p-MLC Pathway in Non-Small Cell Lung Cancer. Front. Pharmacol. 2022;13:890891. doi: 10.3389/fphar.2022.890891.
