mSphere. 2024 May 23;9(6):e00793-23. doi: 10.1128/msphere.00793-23

A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease

Katherine Do 1, Subina Mehta 1, Reid Wagner 2, Dechen Bhuming 1, Andrew T Rajczewski 1, Amy P N Skubitz 3, James E Johnson 2, Timothy J Griffin 1, Pratik D Jagtap 1
Editor: Angela L Rasmussen 4
PMCID: PMC11332332  PMID: 38780289

ABSTRACT

Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, which supports the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.

IMPORTANCE

Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.

KEYWORDS: microbiome, metaproteomics, clinical analysis, bioinformatics

INTRODUCTION

Mass spectrometry (MS)-based metaproteomics enables the analysis of proteins expressed by microbial communities and can be applied to clinical samples to understand microorganism contributions to disease (1). Metaproteomics provides insight into how the microbiome responds to a diseased condition by direct characterization of functional molecules (proteins) that are beyond the capabilities of metagenomics approaches, which mainly focus on taxonomic characterization (2). Moreover, clinical metaproteomics can provide insights into how the microbiome interacts with its host environment. However, one current challenge of metaproteomic analysis of clinical samples is that the high relative abundance of host (human) proteins can hamper the detection and identification of lower abundance microbial proteins. Moreover, identifying microbial peptides derived from tryptic digestion of isolated proteins involves searching tandem mass spectrometry (MS/MS) spectra against large sequence databases comprising all microbial proteomes present in the sample, decreasing sensitivity and increasing the potential for false positives (3). In addition, assigning taxonomy to the detected peptide sequences presents challenges due to the conservation of protein sequences across taxa (4). Assigning functions to detected proteins can also present challenges mainly due to a lack of confident annotation of encoded proteins (4, 5). Here, we offer a novel bioinformatics workflow that overcomes many of these challenges and enables effective metaproteomic analysis in clinical samples relevant to studies of disease.

To demonstrate the effectiveness of our clinical metaproteomics workflow, here we analyze selected MS/MS data from Pap test fluid (PTF) samples collected from ovarian cancer (OC) and non-OC patients. The bioinformatics workflow is accessible via the Galaxy ecosystem, which offers access to powerful bioinformatic tools for metaproteomic data analysis that facilitate the development and execution of complex workflows necessary for complete clinical metaproteomics (6). Galaxy is a free, browser-based, scalable platform that is maintained by a thriving community to meet emerging needs in bioinformatics analysis across omic domains (7, 8). New users can also access online and on-demand training resources, such as step-by-step instructions, access to workflows, and example data sets via the Galaxy Training Network (GTN) resources (6, 9). We envision these collective bioinformatic resources will find use in many clinical studies, such as potential secondary infections inherent to human infectious diseases and other broad research questions regarding host-microbe interactions underlying human disease and cancer (10, 11). Such clinical metaproteomic studies offer the discovery of new microbe-host responses and interactions, as well as the potential to define peptide targets of interest for the development of targeted MS-based clinical assays for diagnostics and health monitoring.

RESULTS

Database generation module: MetaNovo

All modules described in this workflow are depicted in Fig. 1 and summarized in Table S1, including inputs, software tools, and outputs. The first module of the clinical metaproteomics workflow is database generation, wherein a large database comprising 3,383,217 protein sequences (generated from 118 species) was used (Fig. 2). This database was generated using the UniProt XML Downloader tool. This Galaxy tool can also generate a database from proteomes collected at the genus, family, order, or any other higher taxonomic clade, and from any type of microorganism (e.g., bacteria, viruses, or fungi). To generate a reduced protein sequence database from this large starting database that is compatible with standard sequence database searching tools, the MetaNovo tool was used (12, 13). As an input, four example Mascot generic format (MGF) files were processed using the DirecTag tool within MetaNovo to generate a reduced database of 1,908 protein sequences. This reduced database was merged with the Human SwissProt database and contaminants database to generate a database with 21,289 protein sequences for use in the Discovery module.

Fig 1.

Overview of Galaxy bioinformatics workflow. This figure summarizes the workflow into five modules: Database Generation, Discovery, Verification, Quantification, and Data Interpretation.

Fig 2.

Database Generation module. Overview of a large comprehensive database for input into MetaNovo and reduced database generation using MetaNovo.

Discovery module: SearchGUI/PeptideShaker and MaxQuant

The Discovery module uses the four example RAW files, an experimental design file (for MaxQuant), and the MetaNovo protein sequence database (generated from the Database Generation module) (Fig. 3). Using the msconvert tool, MS/MS data (RAW files) were converted to MGF files for the SearchGUI/PeptideShaker tool suite, whereas MaxQuant processed the RAW files; both tools searched the data against the MetaNovo-generated database (14–18). The peptides identified in this module were used for the subsequent Verification module. For our example data set, 184 microbial peptides were detected using MaxQuant, and 32 microbial peptides were detected using SearchGUI/PeptideShaker. Peptides from both tools were merged, filtered, and grouped to retain distinct peptides, and as a result, 196 unique microbial peptides were identified.
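The merge-and-group step described above amounts to taking the union of the peptide lists reported by the two search tools. The following is a minimal sketch of that idea, not the Galaxy tools used in the study; the function name and the toy sequences are hypothetical, and the filtering for confident peptides is omitted.

```python
def distinct_peptides(*peptide_lists):
    """Return the sorted union of peptide sequences across search engines."""
    merged = set()
    for peptides in peptide_lists:
        # Normalise case and whitespace so the same sequence is counted once.
        merged.update(p.strip().upper() for p in peptides if p.strip())
    return sorted(merged)


# Toy sequences for illustration only, not peptides from the study.
maxquant_hits = ["LVNELTEFAK", "AEFVEVTK", "YLYEIAR"]
peptideshaker_hits = ["AEFVEVTK", "DLGEEHFK"]

combined = distinct_peptides(maxquant_hits, peptideshaker_hits)
print(len(combined), combined)
```

Because peptides found by both tools are counted once, the size of the merged list can be smaller than the sum of the two input lists.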

Fig 3.

Discovery module. Microbial peptide identification using SearchGUI/PeptideShaker and MaxQuant.

Verification module: PepQuery2

For verification, 196 microbial peptides from the Discovery module, MGF files, and peptide reports generated from MaxQuant and SearchGUI/PeptideShaker were used as inputs. The module uses the PepQuery2 tool to further evaluate MS/MS evidence for candidate microbial sequences identified in the Discovery module (19, 20). The PepQuery2 tool generated 164 confident PSMs, which were used to select 134 microbial peptides that passed the verification criteria for PepQuery2. The Query Tabular tool extracted 73 accession numbers of proteins associated with these 134 microbial peptides verified by PepQuery2 (Fig. 4A) (21). The 73 accession numbers associated with these verified microbial peptides were used to generate a verified microbial protein sequence database for the Quantification module (Fig. 4B).

Fig 4.

Verification module. Overview of peptide verification using PepQuery that consists of (A) generation of PepQuery-verified peptides, which will be used to (B) generate a PepQuery-verified protein database.

Quantification module: MaxQuant

Using the microbial protein sequences containing the verified peptides from PepQuery2, a “compact” database was constructed by merging these with human protein sequences and known contaminants. The MS data sets were searched against this compact database using MaxQuant to generate a peptide report file (3,203 peptides), which was filtered to retain distinct microbial peptides, resulting in a total of 155 quantified microbial peptides (Fig. 5) (16, 17, 22). MaxQuant also identified a total of 1,313 protein groups, which included 1,178 non-contaminant protein groups (1,117 human and 61 microbial; Fig. 5 and 6A). A protein group from MaxQuant contains one or more related protein isoforms that share identified peptide sequences.

Fig 5.

Quantification module. Overview of peptide quantification using MaxQuant.

Fig 6.

Data interpretation module. Overview of data interpretation module using (A) Unipept and MSstatsTMT. Examples of the microbial outputs: (B) microbial taxonomy tree, (C) microbial enzyme commission proteins tree, (D) microbial proteins volcano plot (gray denotes no regulation, red/blue for up-/down-regulated), and (E) a comparison plot for microbial protein LP1.

Data interpretation module: Unipept and MSstatsTMT

Unipept provided taxonomic and functional annotation for the 155 quantified microbial peptides (Fig. 6A) (23, 24). The taxonomy tree depicted the likely taxonomic assignments for each peptide sequence (Fig. 6B). Lactobacillus was the most abundant genus detected with 90 sequences assigned (56 sequences assigned at genus level and 34 sequences assigned to L. crispatus), followed by Citrobacter (one sequence assigned to C. portucalensis) and Escherichia (one sequence assigned to E. coli). Unipept also generated an enzyme commission (EC) proteins tree that displays the numerical classification of what chemical reaction is catalyzed. The most prominent classification was transferase with 33 sequences, followed by hydrolase with 29 sequences, and lyase with 24 sequences (Fig. 6C).
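The split between genus-level and species-level assignments can be understood through a lowest-common-ancestor rule, which Unipept is generally described as using: a peptide found in several taxa is assigned to the deepest rank those taxa share. The sketch below illustrates that rule only; it does not call Unipept, and the abbreviated lineages are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage (root-to-leaf lists)."""
    if not lineages:
        return None
    shared = None
    # Walk the ranks in parallel and stop at the first disagreement.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        shared = taxa_at_rank[0]
    return shared


# Hypothetical, abbreviated lineages of the proteomes that contain one peptide.
genus_level = [
    ["Bacteria", "Lactobacillaceae", "Lactobacillus", "Lactobacillus crispatus"],
    ["Bacteria", "Lactobacillaceae", "Lactobacillus", "Lactobacillus jensenii"],
]
species_level = [
    ["Bacteria", "Lactobacillaceae", "Lactobacillus", "Lactobacillus crispatus"],
]

print(lowest_common_ancestor(genus_level))
print(lowest_common_ancestor(species_level))
```

A peptide conserved across several Lactobacillus species resolves only to the genus, whereas a peptide seen in a single species resolves to that species.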

The Protein Group text file obtained through MaxQuant quantification contained 1,313 protein groups. After selecting human and microbial protein groups, this resulted in 1,117 human and 61 microbial proteins (Fig. 6A). Additionally, MSstatsTMT generated volcano plots (Fig. 6D) and comparisons (OC, healthy, benign) to identify differentially expressed microbial proteins (Fig. 6E; Fig. S1 and S2). The tandem-mass tag (TMT) module in MSstats was used for quantification as this selected data set was labeled with the TMT reagents for quantitative proteomics (22, 25, 26). As shown in Fig. 6D, a volcano plot visualizes the log-fold changes and negative log10 of adjusted P-values for a comparison between OC and benign cases for the 61 quantified microbial proteins. The horizontal dashed line represents the false discovery rate (FDR) cutoff (where adjusted P-value = 0.05), and data points above this line denote statistically significant proteins. The data points are colored to denote any differential abundance between cases: gray for no regulation and red/blue for up-/down-regulation. In this study, three microbial proteins from Lactobacillus were determined to be statistically significant when comparing OC versus benign cases. For demonstration purposes, these proteins have been labeled Lactobacillus protein (LP)1, LP2, and LP3 in Fig. 6D. As shown in Fig. 6E, a comparison plot visualizes log2-fold changes and variation of multiple comparisons for a single protein. An example of one microbial protein (LP1) that was statistically significant when comparing the OC cases to the benign cases is shown in Fig. 6E.
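The volcano plot logic described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the MSstatsTMT implementation: fold changes are taken to be on a log2 scale, no additional fold-change cutoff is applied, and the protein names and values are hypothetical.

```python
import math


def classify(log2_fc, adj_p, alpha=0.05):
    """Label one protein the way the volcano plot colours it."""
    if adj_p >= alpha:
        return "no regulation"
    return "up-regulated" if log2_fc > 0 else "down-regulated"


def volcano_point(log2_fc, adj_p):
    """Return plot coordinates: (log2 fold change, -log10 adjusted P-value)."""
    return log2_fc, -math.log10(adj_p)


# Hypothetical values for illustration; not results from the study.
proteins = {
    "protein_A": (1.8, 0.004),
    "protein_B": (-1.2, 0.03),
    "protein_C": (0.3, 0.60),
}

# Height of the horizontal dashed line (adjusted P-value = 0.05).
cutoff_y = -math.log10(0.05)

for name, (fc, p) in proteins.items():
    x, y = volcano_point(fc, p)
    print(f"{name}\t{x:+.2f}\t{y:.2f}\t{classify(fc, p)}")
```

Points whose y coordinate exceeds `cutoff_y` (about 1.30) correspond to adjusted P-values below 0.05.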

DISCUSSION

MS-based clinical metaproteomics, which offers insights into the functional molecules expressed by microorganisms as well as taxonomic composition within human samples, has been gaining attention in recent years (1, 11, 27). Moreover, the approach offers insights into how the host environment might interact with microbial communities, via simultaneous analysis of the host proteome. Clinical metaproteomics has been used in unraveling pathogenic mechanisms in Alzheimer’s disease, autism, colorectal cancer, cystic fibrosis, diabetes, inflammatory bowel disease, and COVID-19 co-infection in a variety of clinical sample types including feces, bronchoalveolar lavage fluid, vaginal swabs, and oral cavity specimens (10, 28–40).

This study was meant to demonstrate the effectiveness of our Galaxy-based workflow for clinical metaproteomics, operating on a relatively small example set of quantitative proteomics data from an ongoing study of OC in fluid from cervical swabs. Even though limited, this demonstration data set did identify a protein from the emerging pathogen Citrobacter portucalensis. First isolated from an aquatic sample in Portugal, C. portucalensis has been detected in blood and fecal samples, and in recent years, it has been identified as an emerging global multi-drug resistant pathogen, stemming from its acquisition of clinically relevant resistance determinants (41–46). The confirmation of the presence of C. portucalensis in cervical swabs and its pathogenic potential will require further validation but demonstrates the potential of this bioinformatics pipeline to enable discoveries.

Despite its immense potential in offering functional insights into the microbiome as well as the host response to infection, implementing clinical metaproteomics as a research approach still faces challenges. These include creating and analyzing large protein sequence databases and linking them to the necessary tools for confident peptide and protein identification and quantification, functional and taxonomy annotation, and statistical analysis. Ideally, all these steps would be encapsulated in workflows made publicly available and accessible to the bench researcher (1). Foremost among these challenges is the relatively low abundance of microbial proteins as compared to the host proteins in clinical samples, thus making it difficult to detect and characterize these microbial proteins (10, 31). Although our study does not address sample preparation or instrumental analysis methods that may increase MS detection of microbial proteins, our workflow does employ bioinformatics tools aimed at ensuring the most sensitive and confident identification of MS-detected peptides. Additionally, this bioinformatics workflow simultaneously identifies and quantifies human proteins found within these samples at no additional computational cost (Fig. S3).

Our workflow addresses the fundamental challenge of very large protein sequence databases inherent to MS-based metaproteomics. We used the MetaNovo tool to generate a reduced database from an initial database containing millions of sequences (12). Alternative searching approaches to MetaNovo include the sectioning method and the two-step database methods (47–49). MetaNovo uses de novo sequence tag matching along with an algorithm for probabilistic optimization of the large input database to generate a customized, reduced-size protein sequence database (12, 13).

It should also be noted that our workflow does not require prior knowledge of sample microorganism composition or the use of metagenomic information (12). MetaNovo is capable of matching MS/MS data to any FASTA database that is generated either from a review of the literature to estimate microorganisms most likely present in a sample, or from collective protein sequence databases of organisms identified from 16S rRNA or metagenomic sequencing data of the sample. MetaNovo simultaneously identifies proteins from any organism type that are included in the database, including human, bacterial, fungal, viral, or any other organism type. The ability of MetaNovo to operate on large and unbiased sequence databases also avoids the potential for false positives when constraining the size of the database to only selected organisms (12).

The workflow also prioritizes rigorous analysis to generate confident matches to microbial peptides and proteins, which are oftentimes found at lower abundance than human host proteins in clinical samples. This analysis comprises several steps: (i) the Discovery module: the use of two peptide identification programs (SearchGUI/PeptideShaker and MaxQuant) increases the range of detected peptides and results in the detection of multiple microbial peptides; (ii) the Verification module: only microbial peptides that pass the PepQuery2 tool’s verification thresholds are retained; and (iii) the Quantification module: only microbial proteins from verified peptides are used to build the compact database for final peptide and protein identification and quantification.

In the Discovery module, to enhance initial identification of microbial peptides we have used multiple complementary database search algorithms for PSM generation. In this demonstration, we used X!Tandem and MS-GF+ within SearchGUI, which also offers more search algorithms to use if desired (14, 15). The Galaxy implementation of our workflow offers flexibility to use other Galaxy-deployed search algorithms, including emerging methods such as FragPipe and Scribe, that could further enhance advanced metaproteomic and related multi-omic analysis (9, 50, 51).

One step that is often overlooked when using MS-based metaproteomics is the need to ascertain the quality of the PSMs for microbial peptide identification and protein inference. This is especially important since most of the modern search algorithms rely on FDR analysis for the identification of peptides and proteins from data-dependent-acquisition MS data (52). This might lead to false positive PSMs, especially from microbes of low abundance that are identified using large protein sequence databases, which can erroneously bolster the number and quality of PSMs. Therefore, there is a need to verify these proteins using bioinformatic approaches which can lead to further validation via targeted proteomic methods (53, 54). We have described methods such as Peptide-Spectral-Match-Evaluation along with BLAST-P verification and PSM visualization using the Multi-omics Visualization Platform within the Galaxy platform (7, 55, 56). For this workflow, we have used the PepQuery2 tool which verifies putative peptide identifications using peptide-centric database searching (20). This rigorous algorithm evaluates the evidence for MS/MS spectra that support the presence of peptides of interest. The PepQuery2 tool compares other sequences within the human reference and microbial protein sequences in the reduced databases against the queried peptide sequences to ensure that the putatively identified sequence is indeed the best match to the MS/MS spectra within the raw data. The Verification module helps to further reduce the database size, enabling quantitative analysis.

For the Quantification module, we have used MaxQuant software within Galaxy which generates quantitative protein and peptide outputs ready for MSstatsTMT analysis in the Data Interpretation module (22, 25, 26). Although MaxQuant was used in this study, newer software such as FragPipe and Scribe are planned for Galaxy implementation and could also be used for quantitative analysis (50, 51). The Data Interpretation module uses MSstatsTMT and Unipept which enable data interpretation via visualization (23–25). We used the MSstatsTMT tool for this demonstration data set, as it was TMT labeled; however, the MSstats package itself is compatible with label-free quantitative proteomics data as well. We have described the use of variations of this interpretation module for prior clinical metaproteomics studies (57).

The workflow we describe here is built with an eye toward flexibility to incorporate emerging methods and technologies that may further improve clinical metaproteomic studies. We believe that the outputs from the modules described above will aid in prioritizing proteins and peptides of the highest interest. The workflow outputs can be used to provide the necessary information needed to develop targeted assays for validation and possible implementation within the clinic. For example, methods for enriching microbial proteins from clinical samples containing high-abundance host proteins would enhance the depth of detection via MS as well as provide improved starting data, immediately compatible with our workflow. Metaproteomics researchers have also started using data-independent acquisition for quantitative studies, with growing evidence indicating that this approach can greatly improve the depth and quantitative accuracy of measurements (37, 58, 59). Along with the sensitive mass spectrometers available now for deep quantification studies, this offers an opportunity for metaproteomics researchers to go deeper into the microbiome functions along with deciphering taxonomic contributions (38, 60).

Implementation of our described workflows in Galaxy offers a number of benefits, including scalability for handling large data sets and compute-intensive analyses, as well as the aforementioned flexibility for incorporating new software as it emerges (8, 22, 55, 57, 60, 61). Perhaps the biggest advantage of Galaxy is the training resources it offers. Comprehensive training remains a requirement to promote the adoption of advanced bioinformatics tools (6). The GTN (https://training.galaxyproject.org/) was created to provide learners and instructors with free online training materials and access to globally-maintained resources while promoting open data analysis practices (6, 62). The Galaxy platform empowers learners and researchers worldwide, regardless of expertise, with the tools and skills to perform their own data analyses, all readily accessible through a standard web browser.

In summary, we have developed a clinical metaproteomics workflow within the Galaxy platform. This accessible workflow offers researchers all necessary modules for success in analyzing their metaproteomics data, including generating customized protein sequence databases, matching MS/MS data to these sequences using multiple sequence database search algorithms, verifying the quality of spectral matches, quantifying peptides and inferred proteins, and interpreting the taxonomically and functionally annotated data. We anticipate that the availability of this workflow and the underlying software tools will enable metaproteomics researchers to undertake challenging clinical metaproteomics investigations and make important new discoveries about human health.

MATERIALS AND METHODS

Sample processing to generate MS/MS spectra

For this workflow development, four example RAW files were used as input MS data sets. These RAW files were a subset of a total of 40 PTF samples from 20 OC and 20 non-OC patients. De-identified residual Pap test sample vials were obtained from the University of Minnesota BioNet Tissue Procurement Facility with approval from the University of Minnesota Institutional Review Board (Protocol 1112M07362), which does not require patient consent for de-identified clinical specimen use. All procedures followed the University of Minnesota Institutional Review Board guidelines and regulations. Clinical specimens were collected from the ectocervix of women undergoing routine cervical cancer screening. Pap tests were processed, stained, and examined by a pathologist for diagnosis, as previously described (63).

Proteins from each sample were isolated, digested into peptides with trypsin, and labeled with a distinct Tandem Mass Tag-11-plex tagging reagent. Each experimental group included one pooled reference sample labeled with a unique TMT reagent that served as a common reference for comparison to each patient sample across all four separate experiments. The pooled samples were then separated by offline pH reversed-phase liquid chromatography and analyzed by liquid chromatography-tandem MS (LC-MS/MS), using a hybrid quadrupole-Orbitrap mass spectrometer to generate RAW MS/MS data sets. As required, the four example RAW files were converted into MGF files for software compatibility, such as for SearchGUI/PeptideShaker searches in the Discovery module. In the writing, the abbreviations “RAW” and “MGF” were included to denote the file format of the input MS data sets, and in the figures, the input MS data sets were represented by the same RAW data set icons for simplification. As shown in Fig. 1, the Galaxy bioinformatics workflow consists of five modules (Database Generation, Discovery, Verification, Quantification, and Data Interpretation). All modules are summarized in Table S1, including details regarding inputs, outputs, and software tools. The complete workflow, data sets, and additional training resources are accessible via the Galaxy ecosystem and the GTN website (Tables S2 and S3).

Database generation module for microbial peptide identification

To generate a comprehensive protein sequence database, a list of 118 taxonomic species that are commonly associated with the female reproductive tract was obtained from a 2018 metaproteomic study investigating the cervical-vaginal microbiome (64). In the 2018 study, a microbial database of 131 species was constructed using Human Microbiome Project reference genomes (64). For demonstration purposes in this study, the original list was shortened to 118 species, which consisted of 117 bacterial species and the yeast Candida albicans (Table S3).

Using this list of 118 taxonomic species, a protein sequence FASTA database (3,383,217 sequences) was generated using the UniProt XML Downloader tool within the Galaxy framework. Additionally, Human SwissProt (reviewed-only; 20,408 sequences, as of September 2023) and contaminant (cRAP; 116 sequences) protein sequence databases were generated using the Protein Database Downloader tool. The Species UniProt protein sequence database was then merged with the Human SwissProt (reviewed-only) and cRAP databases, using the FASTA Merge Files and Filter Unique Sequences tool to filter out duplicates and contaminants. This resulted in a comprehensive protein sequence database (2,595,745 sequences) (Fig. 2).
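Merging several FASTA databases while discarding duplicate entries can be sketched as follows. This is a simplified illustration, not the Galaxy FASTA Merge Files and Filter Unique Sequences tool: it assumes that uniqueness is judged on the sequence alone, and the headers and sequences are hypothetical.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_unique(*fasta_texts):
    """Merge FASTA inputs, keeping the first record for each distinct sequence."""
    seen, merged = set(), []
    for text in fasta_texts:
        for header, seq in read_fasta(text):
            if seq and seq not in seen:
                seen.add(seq)
                merged.append((header, seq))
    return merged


# Hypothetical records for illustration only.
microbial = ">mic1 hypothetical protein\nMKTAYIAK\nQRQISF\n>mic2 hypothetical protein\nMSEQNNTE\n"
human = ">hum1 hypothetical protein\nMSEQNNTE\n>hum2 hypothetical protein\nMALWMRLL\n"

merged = merge_unique(microbial, human)
for header, seq in merged:
    print(f">{header}\n{seq}")
```

Dropping duplicate sequences in this way is one reason a merged database can contain fewer entries than the sum of its inputs.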

This comprehensive database, along with the four MS data sets (MGF) generated by LC-MS/MS analysis of PTF samples, were inputs for the MetaNovo tool to generate a reduced database (1,908 sequences) (12, 13). The MetaNovo tool infers proteins and organisms directly from raw MS data and the input protein sequence database to generate a reduced (targeted) database, without requiring exact prior knowledge of sample composition or metagenomic data generation (12). The MetaNovo tool has three components: DirecTag (for generating de novo sequence tags), PeptideMapper (for mapping sequence tags to a large FASTA sequence database), and an algorithm for probabilistic ranking of sequence database proteins and filtering based on estimated species and protein abundance (12). The resulting customized, targeted protein sequence database can be searched against raw MS data in PSM-based target-decoy analysis with greater sensitivity and FDR-controlled protein identification (12). The MetaNovo-generated database was then merged with the Human SwissProt (reviewed only) and cRAP databases to generate a compact database of 21,289 human and microbial sequences that were used for peptide identification.
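The general idea of using sequence tags to shrink a database can be illustrated with a toy example. The sketch below is a deliberate simplification and not MetaNovo's algorithm: it treats tags as plain substrings, ignores the flanking masses that real de novo tags carry, and replaces the probabilistic ranking with a simple count. The accessions, sequences, tags, and the `min_tags` threshold are all hypothetical.

```python
def reduce_database(proteins, tags, min_tags=2):
    """Keep proteins matched by at least `min_tags` distinct sequence tags."""
    ranked = []
    for accession, sequence in proteins.items():
        hits = sum(1 for tag in set(tags) if tag in sequence)
        if hits >= min_tags:
            ranked.append((hits, accession))
    # Highest tag support first; ties broken alphabetically by accession.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [accession for _, accession in ranked]


# Hypothetical database entries and de novo tags.
proteins = {
    "P_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "P_B": "MSEQNNTEMTFQIQRIYTK",
    "P_C": "MALWMRLLPLLALLALWGPD",
}
tags = ["TAYI", "QISF", "SHFS", "NNTE", "LWMR", "TFQI"]

reduced = reduce_database(proteins, tags)
print(reduced)
```

Proteins with little tag support are dropped, which is what makes the reduced database small enough for conventional target-decoy searching.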

Discovery module using peptide identification programs

The four example MS data sets were searched against the compact database (21,289 sequences) to identify peptide sequences. Two peptide identification programs, SearchGUI/PeptideShaker and MaxQuant, were used for the searches (Fig. 3; Tables S4 and S5). For software compatibility, SearchGUI/PeptideShaker required the RAW files to be converted to MGF using the msconvert tool, whereas MaxQuant can process the RAW files.
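For readers unfamiliar with the MGF format produced by the conversion step, the following minimal parser shows its basic layout: each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` parameter lines followed by m/z and intensity pairs. This is a sketch that assumes well-formed input and is not part of the workflow; the example spectrum is hypothetical.

```python
def parse_mgf(text):
    """Return a list of spectra, each a dict with 'params' and 'peaks'."""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=, PEPMASS= or CHARGE=.
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


# Hypothetical spectrum for illustration only.
example = """BEGIN IONS
TITLE=hypothetical_scan_1
PEPMASS=503.2764
CHARGE=2+
147.1128 1200.5
276.1554 830.0
END IONS
"""

spectra = parse_mgf(example)
print(spectra[0]["params"]["TITLE"], len(spectra[0]["peaks"]))
```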

SearchGUI is a database-searching tool that comprises different search engines to match sample MS/MS spectra to known peptide sequences (14, 18). In this analysis, the search algorithms MS-GF+ and X!Tandem within SearchGUI were employed to match spectra from MS data against peptides from the compact database (14, 15). Subsequently, PeptideShaker was used to organize the detected PSMs, assess the confidence of the data by using FDR analysis, and infer the identities of proteins based on the matched peptide sequences (18). Moreover, PeptideShaker generates outputs that can be used to visualize and interpret the data.
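The FDR analysis mentioned above is commonly based on a target-decoy strategy, in which the number of decoy matches above a score cutoff estimates the number of false target matches. The sketch below shows that estimate in its simplest form; it is not PeptideShaker's scoring model, the default `max_fdr` of 0.01 is an assumption, and the scores are hypothetical.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the limit.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: -p[0]):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at or above this score.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical PSM scores; True marks a match to a decoy sequence.
psms = [
    (9.1, False), (8.7, False), (8.0, False), (7.4, True),
    (7.0, False), (6.2, False), (5.5, True), (5.0, False),
]

cutoff = fdr_threshold(psms, max_fdr=0.25)
accepted = [s for s, is_decoy in psms if not is_decoy and s >= cutoff]
print(cutoff, len(accepted))
```

With a very large database, more decoy and random target sequences compete for each spectrum, which is one reason reducing the database size helps sensitivity at a fixed FDR.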

MaxQuant is an MS-based proteomics platform that is capable of processing raw data and provides improved mass precision and high precursor mass accuracy, which results in increased protein identification and more in-depth proteomic analysis (16, 17). Following database searching, microbial peptides from SearchGUI/PeptideShaker and MaxQuant were identified, merged, filtered to retain confident peptides, and grouped to obtain a list of distinct microbial peptides (Fig. 3). This list of distinct peptides was then extracted to use as input for PepQuery2 verification.

Verification module of distinct microbial peptides using PepQuery2

Inputs for the PepQuery2 tool consisted of the list of distinct microbial peptides (from the Discovery module), the four example MS data sets (MGF), the Human UniProt Reference proteome (with isoforms; 82,678 sequences), and contaminants (cRAP) protein sequence databases. The PepQuery2 tool further verified the microbial peptides detected via the Discovery module to ensure that they were indeed of microbial origin and not a result of human peptides being misassigned (Fig. 4A).

The PepQuery tool was developed as a peptide-centric search engine for MS/MS data analysis to verify the quality of non-host sequences by assigning P-values corresponding to the confidence level in peptide detection (19). PepQuery enables users to search peptide sequences against a proteomics database to verify high-confidence PSMs and rigorously examines peptide modifications, which greatly reduces false discoveries in novel peptide identification (19). The PepQuery2 tool builds upon the capabilities of the initial PepQuery software release by providing new MS/MS spectrum indexing, which results in highly efficient, targeted peptide identification (20). Parameters for PepQuery2 verification in this study are detailed in Table S6, and both versions of the tool are available as web-based and stand-alone applications (http://www.pepquery.org/).

For each MGF MS data set used as input, PepQuery2 generated a PSM rank file containing peptide identification information. These PSM rank files were compiled and filtered to retain confident PSMs. Then, the Query Tabular tool was used to select microbial peptides detected with confident PSMs and to remove potential contaminants (21). This list of microbial peptides was grouped to obtain distinct peptides (Fig. 4A). Next, as shown in Fig. 4B, the PepQuery2-verified peptides and the peptides from SearchGUI/PeptideShaker and MaxQuant (from database searching in the Discovery module) were used as inputs for Query Tabular to extract the accession numbers of proteins associated with the PepQuery2-verified peptides. These protein accession numbers were used to generate a protein sequence FASTA file using the UniProt XML Downloader tool within the Galaxy platform. This microbial protein sequences database (73 sequences) was then merged with the Human UniProt Reference proteome (with isoforms; 82,678 sequences) and the cRAP sequence database to generate a protein sequences database (82,562 sequences) for protein and peptide quantification using MaxQuant software (Fig. 4B).
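
The accession-extraction step can be pictured as a relational join between the verified peptide list and the peptide-to-protein assignments from the Discovery module. The sketch below uses Python's built-in `sqlite3` module to show the idea; the table names, column names, peptide sequences, and accession strings are all hypothetical, and the query actually used in the workflow is defined in the Galaxy workflow itself and may differ.

```python
import sqlite3

# In-memory database standing in for the tabular inputs (all values are toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verified (peptide TEXT)")
conn.execute("CREATE TABLE discovery (peptide TEXT, accession TEXT)")

conn.executemany(
    "INSERT INTO verified VALUES (?)",
    [("AAAGK",), ("DDDLK",)],
)
conn.executemany(
    "INSERT INTO discovery VALUES (?, ?)",
    [
        ("AAAGK", "ACC0001"),
        ("AAAGK", "ACC0002"),  # one peptide can map to several proteins
        ("DDDLK", "ACC0003"),
        ("EEEFK", "ACC0004"),  # not verified, so its accession is dropped
    ],
)

# Keep only the accessions linked to at least one verified peptide.
rows = conn.execute(
    """
    SELECT DISTINCT d.accession
    FROM discovery AS d
    JOIN verified AS v ON v.peptide = d.peptide
    ORDER BY d.accession
    """
).fetchall()
accessions = [row[0] for row in rows]
conn.close()
print(accessions)
```

The resulting accession list is the kind of input that would then be passed to a sequence downloader to build the verified microbial FASTA database.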

Quantification module using MaxQuant

The Quantification module takes the following inputs: the verified microbial database from the previous module together with the Human UniProt and contaminants databases, the RAW MS files, the Experimental Design template (for MaxQuant), and the Human protein sequences database (Fig. 5; Table S5). Proteins and peptides from the MS files were quantified using MaxQuant against the verified microbial protein sequences database (merged with human protein sequences and contaminants sequences; 82,562 protein sequences). The MaxQuant outputs of interest consist of a Protein Group text file, a Peptides text file, and an Evidence file. Within the Protein Group file, MaxQuant reported all proteins containing the sequences of identified peptides. For human proteins, wherein peptides are shared among isoforms, a group of proteins that share the peptides was reported. For microbial proteins, the protein groups are reported across organisms based on shared microbial peptides.

Since the inputs contained a mixture of human and microbial protein sequences and contaminants, the MaxQuant outputs did not differentiate between microbial and human sequences. To analyze the microbial community present in the sample, microbial peptides were extracted from the list of identified peptides. This allowed us to prioritize the identification and quantification of microbial proteins, despite their lower abundance relative to the host proteins in the clinical samples. This module generated lists of quantified proteins and peptides that were processed for data interpretation and visualization (22).
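
A minimal sketch of the extraction step is given below. It assumes that each peptide record lists its matching proteins as a semicolon-separated string and that host and contaminant entries can be recognized by the "_HUMAN" and "_CON" tags used in the Data interpretation module. The record keys, the protein identifiers, and the rule that any human member makes the whole group "human" are illustrative assumptions, not the workflow's exact logic.

```python
from typing import List


def classify_protein_group(protein_ids: str) -> str:
    """Label a semicolon-separated protein group by the tags in its identifiers."""
    members = [p for p in protein_ids.split(";") if p]
    if not members:
        return "unassigned"
    if any("_CON" in p for p in members):
        return "contaminant"
    # Assumption: a peptide shared with any human protein is treated as host.
    if any("_HUMAN" in p for p in members):
        return "human"
    return "microbial"


def extract_microbial(rows: List[dict]) -> List[str]:
    """Return sequences whose protein group carries no host or contaminant tag."""
    return [
        row["sequence"]
        for row in rows
        if classify_protein_group(row["proteins"]) == "microbial"
    ]


# Toy peptide table; identifiers are hypothetical placeholders.
peptide_rows = [
    {"sequence": "AAAGK", "proteins": "ALBU_HUMAN"},
    {"sequence": "CCCVR", "proteins": "TRYP_PIG_CON"},
    {"sequence": "DDDLK", "proteins": "ENO_LACCR;ENO_LACGA"},
    {"sequence": "EEEFK", "proteins": "G3P_HUMAN;G3P_ECOLI"},
]
microbial_peptides = extract_microbial(peptide_rows)
print(microbial_peptides)
```

Treating peptides shared between human and microbial proteins as host-derived is the conservative choice here, since such peptides cannot be attributed unambiguously to the microbiome.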

Data interpretation module using Unipept and MSstatsTMT

The Protein Group text file obtained through MaxQuant quantification was first divided into human and microbial protein groups. Proteins were designated with tags: “_HUMAN” for human host proteins and “_CON” for contaminant proteins. The remaining proteins with microbial taxonomy tags were designated as microbial proteins. To gain deeper insights into the microbial protein groups, the Unipept tool was employed to perform taxonomic and functional annotation. First, Unipept determined taxonomic composition by performing the Lowest Common Ancestor (LCA) analysis on the microbial peptide sequences (7, 23, 24). The Unipept tool has a variety of functions that allow for biodiversity analysis and data visualization of metaproteomics samples (23). The Unipept tool takes tryptic peptides as inputs, and for each given peptide, Unipept retrieves UniProt entries, which include accession numbers, protein names, and NCBI taxon IDs, from the UniProtKB database as well as a complete set of organisms in which the peptide occurs. Each peptide’s set of organisms is processed using the Unipept LCA algorithm to obtain the most specific taxonomic rank that the organisms share. For this study, the Unipept LCA analysis allowed for the assignment of the most likely genus, and species whenever possible, as well as functional categories for each of the 155 quantified microbial peptides (Fig. 6A).
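
To make the LCA idea concrete, the sketch below computes the deepest taxon shared by a set of lineages, each written from the root down to the species. It is a simplified illustration: the function name is hypothetical, the lineages are abbreviated, and the actual Unipept implementation may handle cases (such as missing or invalid ranks) that this sketch ignores.

```python
from typing import List, Optional, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the most specific taxon shared by all lineages (root first)."""
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel until the lineages diverge.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break
    return lca


# Abbreviated example lineages for organisms in which a peptide might occur.
l_crispatus: List[str] = ["Bacteria", "Bacillota", "Bacilli", "Lactobacillales",
                          "Lactobacillaceae", "Lactobacillus",
                          "Lactobacillus crispatus"]
l_iners: List[str] = ["Bacteria", "Bacillota", "Bacilli", "Lactobacillales",
                      "Lactobacillaceae", "Lactobacillus", "Lactobacillus iners"]
e_coli: List[str] = ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                     "Enterobacterales", "Enterobacteriaceae", "Escherichia",
                     "Escherichia coli"]

genus_level = lowest_common_ancestor([l_crispatus, l_iners])
domain_level = lowest_common_ancestor([l_crispatus, l_iners, e_coli])
print(genus_level, domain_level)
```

A peptide found only in the two Lactobacillus species resolves to the genus, whereas a peptide that also occurs in Escherichia coli resolves only to Bacteria. This is why species-level assignment is possible for some peptides but not for others.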

Unipept outputs included a microbial taxonomic tree and enzyme commission proteins tree, hierarchical taxonomic annotation with EC numbers, InterPro, and Gene Ontology terms (Table S7). These outputs offered a comprehensive understanding of the microbial ecology of the quantified proteins.

Statistical analysis using MSstatsTMT relied on the Protein Group text file (.txt) obtained through MaxQuant quantification as well as the MaxQuant Evidence file (Fig. 6A; Tables S8 and S9). The human and microbial protein groups were analyzed separately using the MSstatsTMT tool. MSstatsTMT is a free, open-source R/Bioconductor package that is compatible with data processing tools such as MaxQuant and allows for sensitive and specific detection of differentially expressed proteins in large-scale experiments with multiple biological samples (22, 25). MSstatsTMT, which requires annotation and comparison matrix files, removed all proteins labeled as contaminants from the MaxQuant protein groups and performed statistical analysis to discern the differential expression of quantified proteins across the three sample groups: healthy, benign, and OC. The annotation file dictated how the quantifications were combined, and the comparison matrix file was needed to accommodate the three different sample groups (OC, healthy, benign). MSstatsTMT generated tabular files with protein abundance ratios as well as comparison and volcano plots, all of which showcased differentially expressed proteins in the human and microbial protein groups. Tutorial material for using MaxQuant and MSstatsTMT for TMT data analysis is on the Galaxy Training Network (https://gxy.io/GTN:T00220).
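
The role of a comparison matrix for three sample groups can be illustrated with a small sketch. As we understand it, such a matrix has one row per comparison and one column per condition, with +1 and -1 marking the two groups being contrasted. The code below builds all pairwise contrasts in plain Python; the function name and row labels are hypothetical, and the comparisons actually specified in this study are given in the supplemental tables rather than here.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_contrasts(conditions: List[str]) -> Dict[str, Dict[str, int]]:
    """Build one +1/-1 contrast row for every pair of conditions."""
    matrix: Dict[str, Dict[str, int]] = {}
    for numerator, denominator in combinations(conditions, 2):
        row = {condition: 0 for condition in conditions}
        row[numerator] = 1      # group in the numerator of the ratio
        row[denominator] = -1   # group in the denominator of the ratio
        matrix[f"{numerator}-{denominator}"] = row
    return matrix


contrasts = pairwise_contrasts(["OC", "benign", "healthy"])
for name, row in contrasts.items():
    print(name, [row[c] for c in ["OC", "benign", "healthy"]])
```

Three groups give three pairwise rows (OC-benign, OC-healthy, benign-healthy), and each row sums to zero, which is the defining property of a contrast.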

ACKNOWLEDGMENTS

We thank Dr. Kristin Boylan at the University of Minnesota and members of the Pacific Northwest National Laboratories, especially Drs. Paul Piehowski, Tao Liu, and Karin Rodland for their technical expertise in the collection and processing of the Pap test samples and generation of the TMT MS data used in this study.

This project was funded in part by the Minnesota Ovarian Cancer Alliance and by National Institutes of Health/National Cancer Institute grants 5R01CA262153 (A.P.N.S.), 1R21CA267707 (P.D.J. and T.J.G.), and P30CA077598 (P.D.J. and T.J.G.).

Contributor Information

Pratik D. Jagtap, Email: pjagtap@umn.edu.

Angela L. Rasmussen, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

DATA AVAILABILITY

Input files can be accessed on Zenodo at https://doi.org/10.5281/zenodo.10720030 (last updated: February 2024). For future updates (if any), use the DOI 10.5281/zenodo.10105820 to access the most current version. All information for each module presented in this study is mentioned in the supplemental material, including example files and tools (and parameters) hosted on Galaxy Europe Server as well as Galaxy Training Network tutorials deposited on GitHub. All data and links are current as of February 2024.

ETHICS APPROVAL

De-identified residual Pap test sample vials were obtained from the University of Minnesota BioNet Tissue Procurement Facility with approval from the University of Minnesota Institutional Review Board (Protocol 1112M07362), which does not require patient consent for de-identified clinical specimen use. All procedures followed the University of Minnesota Institutional Review Board guidelines and regulations.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/msphere.00793-23.

Supplemental Information. msphere.00793-23-s0001.docx.

Supplemental figures and tables.

DOI: 10.1128/msphere.00793-23.SuF1

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Zhang X, Li L, Butcher J, Stintzi A, Figeys D. 2019. Advancing functional and translational microbiome research using meta-omics approaches. Microbiome 7:154. doi: 10.1186/s40168-019-0767-6
  • 2. Van Den Bossche T, Arntzen MØ, Becher D, Benndorf D, Eijsink VGH, Henry C, Jagtap PD, Jehmlich N, Juste C, Kunath BJ, Mesuere B, Muth T, Pope PB, Seifert J, Tanca A, Uzzau S, Wilmes P, Hettich RL, Armengaud J. 2021. The metaproteomics initiative: a coordinated approach for propelling the functional characterization of microbiomes. Microbiome 9:243. doi: 10.1186/s40168-021-01176-w
  • 3. Tanca A, Palomba A, Deligios M, Cubeddu T, Fraumene C, Biosa G, Pagnozzi D, Addis MF, Uzzau S. 2013. Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture. PLoS One 8:e82981. doi: 10.1371/journal.pone.0082981
  • 4. Seifert J, Herbst F-A, Halkjaer Nielsen P, Planes FJ, Jehmlich N, Ferrer M, von Bergen M. 2013. Bioinformatic progress and applications in metaproteogenomics for bridging the gap between genomic sequences and metabolic functions in microbial communities. Proteomics 13:2786–2804. doi: 10.1002/pmic.201200566
  • 5. Muth T, Renard BY, Martens L. 2016. Metaproteomic data analysis at a glance: advances in computational microbial community proteomics. Expert Rev Proteomics 13:757–769. doi: 10.1080/14789450.2016.1209418
  • 6. Hiltemann S, Rasche H, Gladman S, Hotz H-R, Larivière D, Blankenberg D, Jagtap PD, Wollmann T, Bretaudeau A, Goué N, et al. 2023. Galaxy training: a powerful framework for teaching! PLoS Comput Biol 19:e1010752. doi: 10.1371/journal.pcbi.1010752
  • 7. Blank C, Easterly C, Gruening B, Johnson J, Kolmeder CA, Kumar P, May D, Mehta S, Mesuere B, Brown Z, Elias JE, Hervey WJ, McGowan T, Muth T, Nunn B, Rudney J, Tanca A, Griffin TJ, Jagtap PD. 2018. Disseminating metaproteomic informatics capabilities and knowledge using the Galaxy-P framework. Proteomes 6:7. doi: 10.3390/proteomes6010007
  • 8. Galaxy Community. 2022. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 50:W345–W351. doi: 10.1093/nar/gkac247
  • 9. Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. 2023. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 20:251–266. doi: 10.1080/14789450.2023.2265062
  • 10. Bihani S, Gupta A, Mehta S, Rajczewski AT, Johnson J, Borishetty D, Griffin TJ, Srivastava S, Jagtap PD. 2023. Metaproteomic analysis of nasopharyngeal SWAB samples to identify microbial peptides in COVID-19 patients. J Proteome Res 22:2608–2619. doi: 10.1021/acs.jproteome.3c00040
  • 11. Armengaud J. 2023. Metaproteomics to understand how microbiota function: the crystal ball predicts a promising future. Environ Microbiol 25:115–125. doi: 10.1111/1462-2920.16238
  • 12. Potgieter MG, Nel AJM, Fortuin S, Garnett S, Wendoh JM, Tabb DL, Mulder NJ, Blackburn JM. 2023. MetaNovo: an open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol 19:e1011163. doi: 10.1371/journal.pcbi.1011163
  • 13. O’Bryon I, Jenson SC, Merkley ED. 2020. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 29:1864–1878. doi: 10.1002/pro.3919
  • 14. Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L. 2011. SearchGUI: an open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11:996–999. doi: 10.1002/pmic.201000595
  • 15. Kim S, Pevzner PA. 2014. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 5:5277. doi: 10.1038/ncomms6277
  • 16. Tyanova S, Temu T, Cox J. 2016. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319. doi: 10.1038/nprot.2016.136
  • 17. Jagtap P, Bandhakavi S, Higgins L, McGowan T, Sa R, Stone MD, Chilton J, Arriaga EA, Seymour SL, Griffin TJ. 2012. Workflow for analysis of high mass accuracy salivary data set using MaxQuant and ProteinPilot search algorithm. Proteomics 12:1726–1730. doi: 10.1002/pmic.201100097
  • 18. Batut B, Hiltemann S, Bagnacani A, Baker D, Bhardwaj V, Blank C, Bretaudeau A, Brillet-Guéguen L, Čech M, Chilton J, et al. 2018. Community-driven data analysis training for biology. Cell Syst 6:752–758. doi: 10.1016/j.cels.2018.05.012
  • 19. Wen B, Wang X, Zhang B. 2019. PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations. Genome Res 29:485–493. doi: 10.1101/gr.235028.118
  • 20. Wen B, Zhang B. 2023. PepQuery2 democratizes public MS proteomics data for rapid peptide searching. Nat Commun 14:2213. doi: 10.1038/s41467-023-37462-4
  • 21. Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. 2018. Improve your Galaxy text life: the Query Tabular Tool. F1000Res 7:1604. doi: 10.12688/f1000research.16450.2
  • 22. Pinter N, Glätzer D, Fahrner M, Fröhlich K, Johnson J, Grüning BA, Warscheid B, Drepper F, Schilling O, Föll MC. 2022. MaxQuant and MSstats in Galaxy enable reproducible cloud-based analysis of quantitative proteomics experiments for everyone. J Proteome Res 21:1558–1565. doi: 10.1021/acs.jproteome.2c00051
  • 23. Mesuere B, Willems T, Van der Jeugt F, Devreese B, Vandamme P, Dawyndt P. 2016. Unipept web services for metaproteomics analysis. Bioinformatics 32:1746–1748. doi: 10.1093/bioinformatics/btw039
  • 24. Verschaffelt P, Collier J, Botzki A, Martens L, Dawyndt P, Mesuere B. 2022. Unipept visualizations: an interactive visualization library for biological data. Bioinformatics 38:562–563. doi: 10.1093/bioinformatics/btab590
  • 25. Huang T, Choi M, Tzouros M, Golling S, Pandya NJ, Banfai B, Dunkley T, Vitek O. 2020. MSstatsTMT: statistical detection of differentially abundant proteins in experiments with isobaric labeling and multiple mixtures. Mol Cell Proteomics 19:1706–1723. doi: 10.1074/mcp.RA120.002105
  • 26. Choi M, Chang C-Y, Clough T, Broudy D, Killeen T, MacLean B, Vitek O. 2014. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics 30:2524–2526. doi: 10.1093/bioinformatics/btu305
  • 27. Wolf M, Schallert K, Knipper L, Sickmann A, Sczyrba A, Benndorf D, Heyer R. 2023. Advances in the clinical use of metaproteomics. Expert Rev Proteomics 20:71–86. doi: 10.1080/14789450.2023.2215440
  • 28. Ayan E, DeMirci H, Serdar MA, Palermo F, Baykal AT. 2023. Bridging the gap between gut microbiota and Alzheimer’s disease: a metaproteomic approach for biomarker discovery in transgenic mice. Int J Mol Sci 24:12819. doi: 10.3390/ijms241612819
  • 29. Levi Mortera S, Vernocchi P, Basadonne I, Zandonà A, Chierici M, Durighello M, Marzano V, Gardini S, Gasbarrini A, Urbani A, Vicari S, Roncada P, Furlanello C, Venuti P, Putignani L. 2022. A metaproteomic-based gut microbiota profiling in children affected by autism spectrum disorders. J Proteomics 251:104407. doi: 10.1016/j.jprot.2021.104407
  • 30. Long S, Yang Y, Shen C, Wang Y, Deng A, Qin Q, Qiao L. 2020. Metaproteomics characterizes human gut microbiome function in colorectal cancer. NPJ Biofilms Microbiomes 6:14. doi: 10.1038/s41522-020-0123-4
  • 31. Hardouin P, Chiron R, Marchandin H, Armengaud J, Grenga L. 2021. Metaproteomics to decipher CF host-microbiota interactions: overview challenges and future perspectives. Genes (Basel) 12:892. doi: 10.3390/genes12060892
  • 32. Levi Mortera S, Marzano V, Vernocchi P, Matteoli MC, Guarrasi V, Gardini S, Del Chierico F, Rapini N, Deodati A, Fierabracci A, Cianfarani S, Putignani L. 2022. Functional and taxonomic traits of the gut microbiota in type 1 diabetes children at the onset: a metaproteomic study. Int J Mol Sci 23:15982. doi: 10.3390/ijms232415982
  • 33. Gonzalez CG, Mills RH, Zhu Q, Sauceda C, Knight R, Dulai PS, Gonzalez DJ. 2022. Location-specific signatures of Crohn’s disease at a multi-omics scale. Microbiome 10:133. doi: 10.1186/s40168-022-01331-x
  • 34. Thuy-Boun PS, Mehta S, Gruening B, McGowan T, Nguyen A, Rajczewski AT, Johnson JE, Griffin TJ, Wolan DW, Jagtap PD. 2021. Metaproteomics analysis of SARS-CoV-2-infected patient samples reveals presence of potential coinfecting microorganisms. J Proteome Res 20:1451–1454. doi: 10.1021/acs.jproteome.0c00822
  • 35. Grenga L, Pible O, Miotello G, Culotta K, Ruat S, Roncato M-A, Gas F, Bellanger L, Claret P-G, Dunyach-Remy C, Laureillard D, Sotto A, Lavigne J-P, Armengaud J. 2022. Taxonomical and functional changes in COVID-19 faecal microbiome could be related to SARS-CoV-2 faecal load. Environ Microbiol 24:4299–4316. doi: 10.1111/1462-2920.16028
  • 36. Biemann R, Buß E, Benndorf D, Lehmann T, Schallert K, Püttker S, Reichl U, Isermann B, Schneider JG, Saake G, Heyer R. 2021. Fecal metaproteomics reveals reduced gut inflammation and changed microbial metabolism following lifestyle-induced weight loss. Biomolecules 11:726. doi: 10.3390/biom11050726
  • 37. Gómez-Varela D, Xian F, Grundtner S, Sondermann JR, Carta G, Schmidt M. 2023. Increasing taxonomic and functional characterization of host-microbiome interactions by DIA-PASEF metaproteomics. Front Microbiol 14:1258703. doi: 10.3389/fmicb.2023.1258703
  • 38. Jagtap PD, Viken KJ, Johnson J, McGowan T, Pendleton KM, Griffin TJ, Hunter RC, Rudney JD, Bhargava M. 2018. BAL fluid metaproteome in acute respiratory failure. Am J Respir Cell Mol Biol 59:648–652. doi: 10.1165/rcmb.2018-0068LE
  • 39. Masson L, Wilson J, Amir Hamzah AS, Tachedjian G, Payne M. 2023. Advances in mass spectrometry technologies to characterize cervicovaginal microbiome functions that impact spontaneous preterm birth. Am J Reprod Immunol 90:e13750. doi: 10.1111/aji.13750
  • 40. Bankvall M, Carda-Diéguez M, Mira A, Karlsson A, Hasséus B, Karlsson R, Robledo-Sierra J. 2023. Metataxonomic and metaproteomic profiling of the oral microbiome in oral lichen planus - a pilot study. J Oral Microbiol 15:2161726. doi: 10.1080/20002297.2022.2161726
  • 41. Cao X, Xie H, Huang D, Zhou W, Liu Y, Shen H, Zhou K. 2021. Detection of a clinical carbapenem-resistant Citrobacter portucalensis strain and the dissemination of C. portucalensis in clinical settings. J Glob Antimicrob Resist 27:79–81. doi: 10.1016/j.jgar.2021.04.027
  • 42. Khorsand B, Asadzadeh Aghdaei H, Nazemalhosseini-Mojarad E, Nadalian B, Nadalian B, Houri H. 2022. Overrepresentation of Enterobacteriaceae and Escherichia coli is the major gut microbiome signature in Crohn’s disease and ulcerative colitis; a comprehensive metagenomic analysis of IBDMDB datasets. Front Cell Infect Microbiol 12:1015890. doi: 10.3389/fcimb.2022.1015890
  • 43. Luo X, Yu L, Feng J, Zhang J, Zheng C, Hu D, Dai P, Xu M, Li P, Lin R, Mu K. 2022. Emergence of extensively drug-resistant ST170 Citrobacter portucalensis with plasmids pK218-KPC, pK218-NDM, and pK218-SHV from a tertiary hospital, China. Microbiol Spectr 10:e0251022. doi: 10.1128/spectrum.02510-22
  • 44. Sellera FP, Fernandes MR, Fuga B, Fontana H, Vásquez-Ponce F, Goldberg DW, Monte DF, Rodrigues L, Cardenas-Arias AR, Lopes R, Cardoso B, Costa DGC, Esposito F, Lincopan N. 2022. Phylogeographical landscape of Citrobacter portucalensis carrying clinically relevant resistomes. Microbiol Spectr 10:e0150621. doi: 10.1128/spectrum.01506-21
  • 45. Chen Y, Li X, Yu C, Wang E, Luo C, Jin Y, Zhang L, Ma Y, Jin Y, Yang L, Sun B, Qiao J, Zhou X, Rasche L, Einsele H, Song J, Bai T, Hou X. 2023. Gut microbiome alterations in patients with COVID-19-related coagulopathy. Ann Hematol 102:1589–1598. doi: 10.1007/s00277-023-05186-6
  • 46. Camargo CH, Yamada AY, de Souza AR, Sacchi CT, Reis AD, Santos MBN, de Assis DB, de Carvalho E, Takagi EH, Cunha MPV, Tiba-Casas MR. 2024. Genomic characterization of New Delhi metallo‐beta‐lactamase–producing species of Morganellaceae, Yersiniaceae, and Enterobacteriaceae (other than Klebsiella) from Brazil over 2013–2022. Microbiol Immunol 68:1–5. doi: 10.1111/1348-0421.13100
  • 47. Jagtap P, Goslinga J, Kooren JA, McGowan T, Wroblewski MS, Seymour SL, Griffin TJ. 2013. A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. Proteomics 13:1352–1357. doi: 10.1002/pmic.201200352
  • 48. Kertesz-Farkas A, Keich U, Noble WS. 2015. Tandem mass spectrum identification via cascaded search. J Proteome Res 14:3027–3038. doi: 10.1021/pr501173s
  • 49. Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, Jagtap PD, Griffin TJ. 2020. A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases. J Proteome Res 19:2772–2785. doi: 10.1021/acs.jproteome.0c00260
  • 50. He T, Liu Y, Zhou Y, Li L, Wang H, Chen S, Gao J, Jiang W, Yu Y, Ge W, Chang H-Y, Fan Z, Nesvizhskii AI, Guo T, Sun Y. 2022. Comparative evaluation of proteome discoverer and FragPipe for the TMT-based proteome quantification. J Proteome Res 21:3007–3015. doi: 10.1021/acs.jproteome.2c00390
  • 51. Searle BC, Shannon AE, Wilburn DB. 2023. Scribe: next generation library searching for DDA experiments. J Proteome Res 22:482–490. doi: 10.1021/acs.jproteome.2c00672
  • 52. Elias JE, Gygi SP. 2007. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4:207–214. doi: 10.1038/nmeth1019
  • 53. Heil LR, Damoc E, Arrey TN, Pashkova A, Denisov E, Petzoldt J, Peterson AC, Hsu C, Searle BC, Shulman N, Riffle M, Connolly B, MacLean BX, Remes PM, Senko MW, Stewart HI, Hock C, Makarov AA, Hermanson D, Zabrouskov V, Wu CC, MacCoss MJ. 2023. Evaluating the performance of the Astral mass analyzer for quantitative proteomics using data-independent acquisition. J Proteome Res 22:3290–3300. doi: 10.1021/acs.jproteome.3c00357
  • 54. Pino LK, Searle BC, Bollinger JG, Nunn B, MacLean B, MacCoss MJ. 2020. The Skyline ecosystem: informatics for quantitative mass spectrometry proteomics. Mass Spectrom Rev 39:229–244. doi: 10.1002/mas.21540
  • 55. Jagtap PD, Johnson JE, Onsongo G, Sadler FW, Murray K, Wang Y, Shenykman GM, Bandhakavi S, Smith LM, Griffin TJ. 2014. Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res 13:5898–5908. doi: 10.1021/pr500812t
  • 56. McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. 2020. Multi-omics visualization platform: an extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 9:giaa025. doi: 10.1093/gigascience/giaa025
  • 57. Jagtap P, Mehta S. 2023. Bioinformatic workflow for metaproteomic analysis of host-microbe dynamics in clinical samples. Available from: 10.5281/ZENODO.10079566
  • 58. Aakko J, Pietilä S, Suomi T, Mahmoudian M, Toivonen R, Kouvonen P, Rokka A, Hänninen A, Elo LL. 2020. Data-independent acquisition mass spectrometry in metaproteomics of gut microbiota-implementation and computational analysis. J Proteome Res 19:432–436. doi: 10.1021/acs.jproteome.9b00606
  • 59. Zhao J, Yang Y, Xu H, Zheng J, Shen C, Chen T, Wang T, Wang B, Yi J, Zhao D, Wu E, Qin Q, Xia L, Qiao L. 2023. Data-independent acquisition boosts quantitative metaproteomics for deep characterization of gut microbiota. NPJ Biofilms Microbiomes 9:4. doi: 10.1038/s41522-023-00373-9
  • 60. Chambers MC, Jagtap PD, Johnson JE, McGowan T, Kumar P, Onsongo G, Guerrero CR, Barsnes H, Vaudel M, Martens L, Grüning B, Cooke IR, Heydarian M, Reddy KL, Griffin TJ. 2017. An accessible proteogenomics informatics resource for cancer researchers. Cancer Res 77:e43–e46. doi: 10.1158/0008-5472.CAN-17-0331
  • 61. Boekel J, Chilton JM, Cooke IR, Horvatovich PL, Jagtap PD, Käll L, Lehtiö J, Lukasse P, Moerland PD, Griffin TJ. 2015. Multi-omic data analysis using Galaxy. Nat Biotechnol 33:137–139. doi: 10.1038/nbt.3134
  • 62. Serrano-Solano B, Föll MC, Gallardo-Alba C, Erxleben A, Rasche H, Hiltemann S, Fahrner M, Dunning MJ, Schulz MH, Scholtz B, Clements D, Nekrutenko A, Batut B, Grüning BA. 2021. Fostering accessible online education using Galaxy as an e-learning platform. PLoS Comput Biol 17:e1008923. doi: 10.1371/journal.pcbi.1008923
  • 63. Boylan KL, Afiuni-Zadeh S, Geller MA, Hickey K, Griffin TJ, Pambuccian SE, Skubitz AP. 2014. A feasibility study to identify proteins in the residual Pap test fluid of women with normal cytology by mass spectrometry-based proteomics. Clin Proteomics 11:30. doi: 10.1186/1559-0275-11-30
  • 64. Afiuni-Zadeh S, Boylan KLM, Jagtap PD, Griffin TJ, Rudney JD, Peterson ML, Skubitz APN. 2018. Evaluating the potential of residual Pap test fluid as a resource for the metaproteomic analysis of the cervical-vaginal microbiome. Sci Rep 8:10868. doi: 10.1038/s41598-018-29092-4


Articles from mSphere are provided here courtesy of American Society for Microbiology (ASM)
