Abstract
PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for analyzing shotgun proteomic data. PatternLab contains modules for formatting sequence databases, performing peptide spectrum matching, statistically filtering and organizing shotgun proteomic data, extracting quantitative information from label-free and chemically labeled data, performing statistics for differential proteomics, displaying results in a variety of graphical formats, performing similarity-driven studies with de novo sequencing data, analyzing time-course experiments, and helping with the understanding of the biological significance of data in the light of the Gene Ontology. Here we describe PatternLab for proteomics 4.0, which closely knits together all of these modules in a self-contained environment, covering the principal aspects of proteomic data analysis as a freely available and easily installable software package. All updates to PatternLab, as well as all new features added to it, have been tested over the years on millions of mass spectra.
Keywords: shotgun proteomics, data analysis, bioinformatics, computational proteomics
INTRODUCTION
Shotgun proteomics has revolutionized biochemical and biomedical research by enabling the identification and quantitation of thousands of proteins in complex biological samples such as organelles, cell lysates, biological fluids, and tissues1. The field’s denomination as shotgun proteomics comes from the name coined at the Yates lab for a strategy to characterize proteins analyzed indirectly through peptides obtained by proteolysis, in analogy to shotgun genomic sequencing2. The core of the discipline relies on state-of-the-art nano-chromatography coupled with mass spectrometry, one of the most sensitive methods of analytical chemistry, to dissociate peptide ions in the mass spectrometer, and ultimately obtain peptide sequences and from them infer and quantitate the proteins found in complex mixtures. Hundreds of thousands of tandem mass spectra are commonly generated in an experiment; therefore, advanced bioinformatics algorithms are required to make sense of all the data. In a typical experiment, peptides are fractionated by liquid chromatography online with tandem mass spectrometry, and protein identification is achieved by comparing experimental spectra against those theoretically generated from a sequence database. Proteins are then inferred by matching the identified peptide sequences to the sequences in the database; as peptides can match more than one protein, proteins can be further grouped according to a maximum parsimony criterion3. Peptide spectrum matching (PSM) algorithms commonly leverage data from existing genomic projects. As a post-genomic discipline, the goals of shotgun proteomics are far more ambitious than those of genome sequencing, as it aims to report protein expression, interaction, localization, post-translational modifications, turnover time, etc., when comparing different biological states. During the past decade, shotgun proteomics has been applied in many different ways to advance biological discovery. Notable examples can be found in studies describing differential protein expression between subcellular compartments4, pinpointing changes in proteomic profiles of cancer biopsies5, and describing the contents of venoms to ultimately aid in biotechnological applications6,7.
The field was jump-started by the creation of SEQUEST, an algorithm that correlates tandem mass spectra with theoretical spectra generated from a sequence database8. In what followed, the coupling of strong-cation exchange chromatography with reversed phase chromatography online with tandem mass spectrometry set new heights in terms of number of peptide identifications. This technology, later renamed as Multi-dimensional Protein Identification Technology (MudPIT)9, as well as competing strategies that use ultra-long chromatography gradients10, were adopted and thus raised the bar in terms of challenges, both in the handling of the new computational burden and in how to statistically deal with that time’s ‘big data’. In response, a new class of algorithms appeared, geared towards post-processing the search engine results in order to statistically pinpoint identifications with confidence; examples of pioneering efforts are Peptide Prophet11 and DTASelect12. At the same time, breakthroughs on how to quantitate complex peptide mixtures analyzed by mass spectrometry were being attained, the two main pillars being the labeled and label-free approaches. Examples of the former are the isobaric tags13 and stable isotope amino acid labeling14; as for the latter, spectral counting15,16 and extracted-ion chromatograms (XICs)17. Naturally, intensive software development tailored towards enabling these quantitation approaches became necessary. As the possibilities for how to mine the ‘proteomosphere’18 continued to expand, a plethora of new software began to be sparsely distributed among members of the community, each addressing very specific niches. These have included, for example, algorithms for scoring phosphosites19, deconvoluting mass spectra20, and even for dealing with unsequenced organisms18.
PatternLab and other widely adopted proteomic pipelines
With so many options to choose from to analyze shotgun proteomic data, efforts were shifted towards the creation of unified pipelines: indeed, deciding which software to pick and making the chosen tools interact with one another were challenging problems. Thus, the first pipelines emerged, viz., the trans-proteomic pipeline (TPP)21,22, OpenMS23, MaxQuant24,25, and PatternLab for proteomics26, each having its own set of advantages and limitations. In what followed, Skyline27 and Galaxy28 emerged to overcome some of the limitations of the aforementioned tools at the time. While there is great overlap among these software pipelines, each has a special set of features that provides advantages when analyzing data originating from a certain setup.
TPP and Galaxy are tailored towards (but not limited to) working on computing clusters, thus relieving the users from the burden of processing large amounts of data (and therefore vastly mobilizing resources, such as storage) locally on their own computers. As these tools are generally remotely accessed through a web-based interface or command-line tools, no requirements are imposed on the local operating system or hardware configuration. Through the years, several leading groups have worked together on developing modules for TPP, so ultimately, questions regarding details of how each module works can be addressed directly by the corresponding specialists. On the other hand, the team behind Galaxy focuses on making available a customizable workflow management system, and thus efforts have been channeled towards providing a sophisticated environment for users to integrate several data analysis tools and protocols (as opposed to developing the data analysis modules themselves). In fact, this strategy culminated in making Galaxy an environment capable of integrating genomic, transcriptomic, proteomic, and metabolomic data29.
In contrast, MaxQuant, OpenMS, Skyline, and PatternLab are all designed exclusively to be used locally on one’s computer, with some clear benefits over their web-based counterparts. For example, when an update is done on a web-based pipeline, there is the possibility of immediate (and sometimes undesired) impact on ongoing analyses. Desktop users, on the other hand, have control over when to update their software. Moreover, it must be noted that today’s high-end desktops, and even notebooks, have become so powerful that they are fully capable of analyzing the data of large-scale proteomic experiments efficiently.
MaxQuant, Skyline, and PatternLab all require Microsoft’s Windows 7 (or later) operating system, as they are based on .NET, a software framework that runs primarily on Microsoft Windows. In contrast, OpenMS can be executed on any operating system, as it is based on the C++ programming language. Another advantage of OpenMS is that its modules are all available as stand-alone tools, which facilitates integration into third-party workflows or the design of custom, local bioinformatics pipelines. As for the other tools, MaxQuant has been known to excel on SILAC experiments and Skyline for its unmatched capabilities in experiments addressing Selected Reaction Monitoring (SRM)/Multiple Reaction Monitoring and Parallel Reaction Monitoring. More recently, Skyline became capable of analyzing DIA data, as described in a previous protocol30. PatternLab, in turn, provides one of the most complete and user-friendly experiences, owing to its very refined and interactive graphical user interface. As for its hallmarks, we believe they lie in analyzing label-free data through the T-Fold31 module and in the isobaric (e.g., iTRAQ/TMT) analyzer module. Some of its unique features include providing an integrated cloud service32, modules for statistically filtering and performing assembly of de novo sequencing data33, statistically scoring phosphopeptides34, dealing with time-course experiments35, and offering a module for integrated Gene Ontology analysis36. Modules yet to be integrated in future versions are capable of deisotoping and decharging mass spectra18, and of identifying cross-linked peptides to address protein-protein interaction and aid in providing structural data37 (the latter is described in a recent protocol38). Therefore, even though all mentioned tools, web- and desktop-based alike, overlap significantly with one another, each has its own hallmarks and unique features, and may as such be more suitable for one’s working style and needs.
PatternLab is freely available software and is flexible enough to be used in the analysis of most shotgun proteomic experiments. We advise using PatternLab on any experiment requiring label-free quantitation, or in which the data have been chemically labeled with isobaric markers.
Development of the protocol
Since its launch in 2008, PatternLab has undergone continual improvement as well as expansion. The very first version was limited to working with spectral counting, and offered strategies for pinpointing differentially expressed proteins, but all modules from that time have since been replaced by more sophisticated versions. Such major updates led us to release the system’s first major protocol in 201039. Thanks to the continual influx of suggestions from its various users, the modules continued to evolve and new modules appeared, such as the Search Engine Processor40 (SEPro) for filtering and organizing shotgun proteomics data, and a module for XICs. A revised version of that first protocol was then published in 201241. The PatternLab version at the time, PatternLab for proteomics 2.0, consisted of a series of software modules. One major request from its community of users was for the installation process of so many modules (one at a time) to be simplified, as well as for greater integration among the (then independent) modules so they would not have to be dealt with separately. Moreover, installing the modules could sometimes require installing third-party software such as, say, the Java Runtime Environment, and often having to deal with configuration files. Simply put, PatternLab needed to be reengineered to be completely installable at a single click of the mouse as well as to work as a unit. PatternLab for proteomics 3.0 achieved this in 2013, uniting all modules under a single graphical user interface and thus fulfilling all user requests of that time.
Since 2013, PatternLab has acquired new modules and functionalities. Some examples are: Búzios, which allows the clustering of similar proteomic profiles5; the XD Scoring system, for evaluating the confidence in phosphosites34; PepExplorer, a tool for analyzing shotgun proteomic data of unsequenced organisms33; tools for performing ANOVA analysis; the incorporation of the Comet search engine, wrapped in a graphical user interface42, for analyzing isobaric experiments (e.g., iTRAQ and TMT); and a cloud service that enables large-scale quantitative predictions and comparisons of protein domains32. Some existing modules were significantly upgraded, such as the one for XICs. PatternLab for proteomics 4.0 is the culmination of these various changes, some of them major to the point of spanning the complete workflow but always aiming to simplify even more the process of analyzing shotgun proteomic data in an even more integrated environment. This protocol introduces the freely available PatternLab for proteomics 4.0 and shows how to operate the latest modules as well as how to deal with the new, simplified workflow. For those modules that underwent no changes, readers are referred to the corresponding sections of the previously published protocols.
Experimental design
PatternLab is adaptable to many experimental designs, and as such is applicable to analyzing data from most proteomic experiments. The topic of sample preparation and data acquisition in the mass spectrometer is an extensive one and encompasses tasks that must be performed prior to analyzing the data; in this regard, we recommend following the steps in the protocol by Richards et al.43.
Sequence database preparation
Databases of protein sequences are required so that theoretical mass spectra generated from them can be compared to experimental spectra. For the widely adopted PSM approach, we recommend downloading sequences from UniProt44, as some downstream analysis tools (e.g., the Gene Ontology) can take advantage of this knowledgebase. Regardless, any type of sequence database in the FASTA format is supported, so the user can download sequences from NCBI or even use an in-house generated database. The UniProt knowledgebase comprises the Swiss-Prot and the TrEMBL databases, the former containing manually annotated and reviewed sequences while the latter’s sequences are automatically annotated but not reviewed. We recommend downloading, whenever possible, only the species-specific database, containing entries from both Swiss-Prot and TrEMBL. This is achieved by navigating to the UniProt website at http://www.uniprot.org, clicking on the large ‘Proteomes’ square, and then naming the species in the search box. The sequences can be obtained by clicking on the number in the ‘Protein count’ column beside the desired species, clicking on the download button, and then selecting the FASTA format. In case one wishes to use the Gene Ontology as a downstream tool, an additional download of the sequences, in the ‘Text’ format, must be done.
Subsequently, a target-decoy database must be generated prior to searching with PatternLab’s integrated version of Comet. PatternLab contains a module that allows the automatic generation of decoys by reversing each sequence of the target database. A PatternLab option that we strongly recommend is to automatically include the 127 common contaminants found in proteomic experiments (e.g., Keratin, BSA, etc.). Even though there are many possible ways for generating decoy sequences, sequence reversal has been the most widely adopted one, as it conserves the complexity of the database (i.e., approximately the same number of decoy peptides and target peptides after an in silico digestion45).
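The sequence-reversal strategy described above can be illustrated with a minimal Python sketch. It is not PatternLab's implementation: the function names are hypothetical, the contaminant sequences are omitted, and the only convention taken from this protocol is the ‘Reverse_’ prefix that marks decoy identifiers.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def add_reversed_decoys(text: str, prefix: str = "Reverse_") -> str:
    """Return FASTA text in which each target is followed by its reversed decoy."""
    records: List[str] = []
    for header, seq in parse_fasta(text):
        records.append(f">{header}\n{seq}")
        # The decoy keeps the length and composition of the target sequence.
        records.append(f">{prefix}{header}\n{seq[::-1]}")
    return "\n".join(records) + "\n"


# Hypothetical two-line entry, used only for illustration.
example = ">sp|P00001|TEST_HUMAN Example protein\nMKTAYIAK\nQR\n"
print(add_reversed_decoys(example))
```

Because each decoy is an exact reversal, the decoy half of the database has the same size and amino acid composition as the target half, which is the property the target-decoy approach depends on.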
Peptide identification from tandem mass spectra
PatternLab adopts Comet for the comparison of experimental and theoretically generated mass spectra. Comet is a fast and sensitive open-source search engine that stemmed from the widely adopted SEQUEST8. Comet is constantly being updated, and PatternLab’s automatic updates may include an updated built-in Comet search engine. A complete description of Comet’s parameters is available at the Comet project’s website, http://comet-ms.sourceforge.net/parameters/parameters_201502/; PatternLab allows the setting of these parameters through its graphical user interface.
When searching for peptide candidates within a database, a precursor mass tolerance must be specified. When using high-resolution instruments such as an Orbitrap Velos (Thermo, San Jose), we recommend using no less than 40 ppm, even if the mass spectrometer used provides a mass accuracy of, say, 5 ppm. The suggestion to adopt wide search windows is empirical and comes from experimenting with the search engine. Nevertheless, our experience is aligned with that of John S. Cottrell and David M. Creasy, from whom we quote, “The common observation is that FDR increases rather than decreases for very narrow precursor tolerances because the reliability of the scoring is reduced by the small numbers of candidates”46. Finally, we note that Comet’s results will later be statistically filtered and post-processed by SEPro. At that final stage, any match whose precursor mass error exceeds a tighter tolerance, say, 5 ppm, will be discarded.
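To make the two-stage use of tolerances concrete, the following Python sketch computes a precursor mass error in ppm and checks it against a wide search window and a tighter post-filter. The function and constant names are hypothetical, and the masses are invented for illustration; only the 40 ppm and 5 ppm values come from the text.

```python
from typing import Tuple

SEARCH_TOLERANCE_PPM = 40.0   # wide window used during the database search
FILTER_TOLERANCE_PPM = 5.0    # tighter window applied during post-processing


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def mass_window(theoretical_mass: float, tolerance_ppm: float) -> Tuple[float, float]:
    """Lower and upper mass bounds for a given ppm tolerance."""
    delta = theoretical_mass * tolerance_ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


def within(observed_mass: float, theoretical_mass: float, tolerance_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


# Invented example: a candidate that is about 20 ppm away from the precursor.
observed, theoretical = 1500.0300, 1500.0000
print(within(observed, theoretical, SEARCH_TOLERANCE_PPM))   # considered during the search
print(within(observed, theoretical, FILTER_TOLERANCE_PPM))   # discarded by the tighter filter
```

In this example the candidate enters the pool scored by the search engine but would not survive the final 5 ppm filter, which is the behavior described above.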
Peptides absent from the database cannot be identified by classical PSM. The PSM strategy is therefore blind to mutations/polymorphisms and may not work satisfactorily on organisms lacking a reference peptide sequence database. Moreover, post-translational modifications must be specified a priori. Often these are unknown for the experiment at hand, so usually only carbamidomethylation of cysteine and oxidation of methionine are specified as fixed and variable modifications, respectively. By having a quick look at UniMod (http://www.unimod.org), the protein modification for mass spectrometry database, one can take note of the variety of modifications that can occur in a sample. To cope with these limitations, approaches stemming from de novo sequencing have emerged. Among them we highlight Spectral Networks47, Mod-A48, MS-Blast49, and PepExplorer33. The first two are capable of pinpointing unanticipated modifications while the last two start with de novo sequencing results and align them against sequence databases of homologue organisms so that similar proteins can be determined. In particular, PepExplorer is integrated into PatternLab’s workflow, but notwithstanding this we recommend that the user consider other applications when working with unsequenced organisms. Being based on different paradigms, such applications may provide complementary results.
Statistically filtering peptide spectrum matches
The sensitivity of a PSM search engine is intimately related to how the search results are post-processed. PatternLab relies on SEPro40 to statistically filter its results in order to achieve a pre-determined FDR. The filtered results can be saved as a ‘sepr’ file and shared with collaborators. In this regard, anyone can open these files and have access to a dynamic report that enables sorting proteins according to various criteria (e.g., coverage, Normalized Spectral Abundance Factors (NSAF), spectral counts, etc.), as well as access to annotated mass spectra and search engine scores, and also accomplish much more within a few clicks of the mouse. Even though PatternLab houses Comet, SEPro (and consequently PatternLab) is compatible with ProLuCID50, SEQUEST8, and the Spectrum Identification Machine for PITC51. Our 2012 protocol provides the main steps for using SEPro41. At the time of its writing, PatternLab still required several separate downloads for installation and relied mostly on ProLuCID, but SEPro has now been ported to the main interface. Only the features that were implemented since 2012 are highlighted herein.
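The idea of filtering to a pre-determined FDR with a target-decoy database can be sketched in a few lines of Python. This is a deliberately simplified illustration and not SEPro's algorithm, which is described in the cited reference: here a single score per PSM is assumed, higher scores are assumed to be better, and the FDR is estimated as the ratio of decoy to target matches above the score threshold. All names are hypothetical.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); higher score means a better match


def filter_by_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return the target PSMs in the longest top-scoring list whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = cutoff = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the FDR estimate is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p[1]]


# Invented scores, for illustration only.
psms = [(5.0, False), (4.5, False), (4.0, False),
        (3.5, True), (3.0, False), (2.5, True)]
accepted = filter_by_fdr(psms, max_fdr=0.25)
print([score for score, _ in accepted])
```

The decoy matches serve only to estimate the error rate; they are removed from the accepted list at the end.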
Quantitative proteomics
PatternLab can work with label-free quantitation and with chemically labeled relative quantitation. Among the label-free strategies, spectral counting has often been used in experiments with multi-dimensional separation (e.g., MudPIT). A spectral count refers simply to the number of tandem mass spectra associated with a protein and is used as a surrogate for the protein’s relative abundance. The community has proposed various ways for normalizing data of this type, and PatternLab optionally allows normalization by the Normalized Spectral Abundance Factors, which takes into account a protein’s length during the normalization process52. PatternLab also allows quantitation by XICs, which are frequently used in single-shot experiments and are obtained by plotting the intensity of a given m/z, plus or minus a given tolerance, over a given span of time. The area underneath this curve, or integral, can then be used as a surrogate for a peptide’s relative abundance in the mixture and as such provide a basis for comparison against the XIC of the same peptide in different mixtures.
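Both label-free measures described above can be written down compactly. The Python sketch below is illustrative only and does not reproduce PatternLab's code: the NSAF of a protein is its spectral count divided by its length, normalized by the sum of that ratio over all proteins in the run, and the XIC area is approximated with the trapezoidal rule. The data layout, the function names, and the choice to sum all peak intensities inside the m/z window are assumptions.

```python
from typing import Dict, List, Tuple

Scan = Tuple[float, List[Tuple[float, float]]]  # (retention time, [(m/z, intensity), ...])


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized Spectral Abundance Factor for each protein in one run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {p: 0.0 for p in saf}
    return {p: value / total for p, value in saf.items()}


def xic(scans: List[Scan], target_mz: float, tol_ppm: float,
        rt_start: float, rt_end: float) -> List[Tuple[float, float]]:
    """Intensity of target_mz (plus or minus tol_ppm) over a retention-time span."""
    delta = target_mz * tol_ppm / 1e6
    trace = []
    for rt, peaks in sorted(scans, key=lambda s: s[0]):
        if rt_start <= rt <= rt_end:
            intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= delta)
            trace.append((rt, intensity))
    return trace


def xic_area(trace: List[Tuple[float, float]]) -> float:
    """Area under the chromatogram, by the trapezoidal rule."""
    return sum((t1 - t0) * (i0 + i1) / 2.0
               for (t0, i0), (t1, i1) in zip(trace, trace[1:]))


# Invented values, for illustration only.
print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 400}))
scans = [(0.0, [(500.0, 0.0)]),
         (1.0, [(500.0, 10.0), (500.1, 99.0)]),
         (2.0, [(500.0, 0.0)])]
print(xic_area(xic(scans, 500.0, 10.0, 0.0, 2.0)))
```

In the NSAF example the two proteins have equal spectral counts, yet the shorter one receives the larger value, which is the length correction mentioned above.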
A popular strategy for chemically labeling peptides to increase confidence in relative quantitation has been the use of isobaric tags; PatternLab also makes available modules for analyzing such data. Examples of widely adopted, commercially available tags are iTRAQ13 and TMT53, which enable experiments to be multiplexed. Currently the most commonly adopted configurations are the 4-plex iTRAQ, 6-plex TMT, and 8-plex iTRAQ; we point out that higher degrees of multiplexing are also available. These reagents rely on stable isotope-labeled molecules that covalently bind to the side-chain amines and the N-terminus of polypeptide chains. PatternLab used to rely on the now deprecated SEProQ module (then available as a separate download) for dealing with XICs and isobaric tag data, but this module has been significantly redesigned and integrated into PatternLab for proteomics 4.0. A limitation of relative quantitation by isobaric tags has been the interference of the nearly-isobaric peptides that are co-fragmented in the mass spectrometer along with the desired precursor ion, generating a false relative quantitation as the reporter ions’ signals get mixed with those from the nearly-isobaric molecules. To overcome this limitation, elaborate methods such as MultiNotch, only applicable to state-of-the-art or customized mass spectrometers, have been developed54. As far as we know, PatternLab’s isobaric module, described herein, is the only one to support MultiNotch acquisition while still providing a solution to standard data acquisition by automatically identifying and discarding multiplexed spectra.
The project must be organized in terms of which run belongs to which condition. This is performed using PatternLab’s Project Organization module, which ultimately generates a file containing all identifications and the quantitation data of all runs from the entire experiment for use in downstream analyses by several modules. Examples of such analyses are clustering proteins/peptides with similar expression profiles for time-course experiments, clustering data, pinpointing differentially expressed proteins or proteins found in only one condition, performing ANOVA, and even Gene Ontology analyses. In this protocol we provide the main steps, highlighted in the graphical summary of Figure 1, involved in these analyses. An accompanying video that demonstrates PatternLab for proteomics 4.0 in action and provides an overview of the software is also available (Supplementary Video 1).
Figure 1.

Overview of PatternLab’s workflow. In a general workflow, a target-decoy database is prepared (I), the mass spectra are searched (II) and statistically filtered to meet a user-defined FDR (III), the project is organized in terms of which mass spectral files belong to what biological conditions (IV), quantitative information is extracted (V), and then the various downstream modules for data analysis can be used (VI). The main modules for database generation, peptide identification, statistically filtering and quantitating PSMs, and data analysis are presented. The protocol steps pertinent to each module are also given.
Limitations of PatternLab for proteomics 4.0
The following are the major limitations of the current PatternLab version:
1. No handling of data from N15 labeling quantitative proteomic experiments.
2. No handling of SILAC data.
3. No handling of Selected Reaction Monitoring/Parallel Reaction Monitoring data55.
4. Not yet fully integrated with a public repository such as PRIDE56.
5. Cannot handle top-down data (i.e., mass spectrometry of intact proteins).
6. Seamless integration is available only for raw data from Thermo mass spectrometers; data from other instruments must be exported to text-based formats such as ms2, mzXML, mzML, or MGF.
7. Requires a computer with Microsoft Windows 7 or later.
We are working to overcome most of these limitations, though we are not currently looking into addressing limitation #3, as Skyline already does a good job on that. Tackling limitation #7 requires updates in the .NET environment from Microsoft’s end. Limitation #6 can be overcome by referring to the ProteoWizard project57.
Materials
Equipment
Hardware requirements
A personal computer (PC) with at least 6 GB of RAM and an x86-64 processor.
CRITICAL: We strongly recommend having a multi-core processor, since it can effectively deal with the parallel computation performed by some of the modules, and having at least 16 GB of RAM.
Local storage is required for processing mass spectrometer RAW files. The space occupied by these files can significantly vary depending on the mass spectrometer used.
Data files
Software requirements
Microsoft Windows 7 or later (64-bit version).
CRITICAL: ‘Regional and Language Options’ must be set to English, as several modules rely on the English decimal notation (i.e., a period as the decimal separator).
.NET Framework 4.5 or later needs to be installed. The .NET Framework is made freely available by Microsoft; a new computer should already have this requirement fulfilled. Nonetheless, if the .NET Framework is not detected during PatternLab’s installation, an attempt will be made to automatically install it through Microsoft’s website. The latest version, as of the time of this writing, is available from http://www.microsoft.com/en-us/download/details.aspx?id=42642.
Thermo Scientific MSFileReader should be installed in case the user wishes to work directly from the RAW instrument files. Instructions on obtaining this file are available from http://sjsupport.thermofinnigan.com/public/detail.asp?id=703.
Equipment setup
PatternLab setup
Go to the PatternLab home page at http://patternlabforproteomics.org and click on the ‘Download’ link. If the .NET Framework 4.5 or later is already installed on the computer, clicking on the ‘launch’ link will automatically install PatternLab; otherwise, click on the ‘Install’ button. After PatternLab is installed for the first time, its main screen will pop up (Figure 2).
CRITICAL Administrative access privileges are required for installation.
CRITICAL If PatternLab fails to install, you may need to update to .NET 4.5 or later. You can manually download and install the latest version of the .NET framework from Microsoft’s website.
Figure 2.

PatternLab’s main screen. The general PatternLab workflow is indicated by the order in which the pull-down menus appear. Generally, a target-decoy sequence database is prepared, searched with Comet, and filtered to achieve a given FDR using SEPro, or PepExplorer (in the case of de novo sequencing). The project is then organized in order to indicate which files belong to which biological condition. Downstream analysis is achieved by using the modules in the ‘Select and Analyze’ menus.
PROCEDURE
Generating a target-decoy sequence database
CRITICAL A target-decoy sequence database must be generated prior to PSM.
1. Click on ‘Generate Search DB’ in the upper-left corner of the interface. The sequence database module will load (Supplementary Figure 1).
2. Select an input database file format (e.g., UniProt, NCBI, etc.). A generic format called ‘Identifier Space Description’ can be used for any FASTA file.
3. Choose the output database format. We strongly recommend using the target-decoy approach that automatically includes a reverse version of each sequence in the database (with a ‘Reverse_’ attached to the beginning of the identifier). The other formats are made available for very specific purposes of software benchmarking.
4. Check the ‘Include common contaminants in the Targets’ checkbox to include the sequences of 127 common contaminants to mass spectrometry (e.g., Keratins) at the beginning of the output sequence database.
5. Click on the ‘Browse’ button in the Input group box and select sequence databases that were downloaded from the Internet. More than one database can be selected by pressing the Ctrl key while clicking on the file names in the file selection window.
6. Click on the ‘Save as’ button in the Output group box and specify the name of the new database. A checkbox reading ‘Eliminate subset sequences’ is available for the elimination of sequences that meet a user-specified identity within other sequences in the database. When this happens, a note is appended to the remaining protein’s sequence description with a reference to the eliminated sequence. Specifying an identity below 100% will significantly increase the time for generating the database.
7. Press the ‘Go’ button to generate the new database. For proteogenomic studies, we recommend taking the extra measures described by Alexey Nesvizhskii60 so that the FDR is not underestimated.
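As an illustration of what eliminating subset sequences can mean at 100% identity, the Python sketch below removes every sequence that occurs verbatim inside a longer one and records the removal in the description of the sequence that remains. This is an assumption about the behavior at the 100% setting and not PatternLab's implementation; lower identity thresholds would require sequence alignment and are not covered. The function name and the format of the appended note are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence); headers are assumed to be non-empty


def eliminate_subsets(records: List[Record]) -> List[Record]:
    """Drop sequences fully contained in another sequence (100% identity)."""
    # Longest first, so a containing sequence is always seen before its subsets.
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    kept: List[List[str]] = []
    for header, seq in ordered:
        for entry in kept:
            if seq in entry[1]:
                # Note the eliminated identifier on the remaining protein.
                entry[0] += f" [contains: {header.split()[0]}]"
                break
        else:
            kept.append([header, seq])
    return [(h, s) for h, s in kept]


# Invented entries, for illustration only.
records = [("P1 short", "KTAY"), ("P2 long", "MKTAYIAK"), ("P3 other", "GGGG")]
for header, seq in eliminate_subsets(records):
    print(header, seq)
```

The pairwise comparison used here scales quadratically with the number of sequences, which is acceptable for a sketch but not for a large database.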
Performing PSM with the integrated Comet search engine
-
8
Click on the ‘Search (Comet PSM)’ option from the main menu. The Comet graphical user interface will appear (Figure 3).
-
9
Indicate a directory containing Thermo RAW, MS2, mzXML, or mzML mass spectra files in the topmost textbox. The ‘Recursive Directory Search’ box must be checked for multiple directories to be searched.
-
10
Specify a target-decoy sequence database.
-
11
Specify a Precursor Mass Tolerance. We suggest using the default 40 ppm, even for high-resolution mass spectrometers as discussed in the introduction.
-
12
For species-specific databases, set the parameter ‘Enzyme specificity’ to ‘semi-specific’. This is recommended and will increase the search space and reduce the search engine speed. However, having an estimate of how many semi-tryptic peptides were obtained after a tryptic digest can shed light on how well the sample was digested. If the sample was significantly degraded, we expect more than 20% of the peptides to be semi-specific. Contrasting with this, samples with no more than 5% semi-specific peptides should be taken as having undergone almost no degradation (optional; see Box 1). We note that some degradation is always expected.
-
13
Specify the number of missed cleavages allowed. We recommend allowing up to two misses for standard shotgun proteomic searches.
-
14
Specify the ‘Fragment Bin Tolerance’, ‘Fragment Bin Offset’, and ‘Theoretical Fragment Ions’ parameters. For low-resolution MS2, as generally provided in a Thermo LTQ, we recommend setting these values to 1.0005, 0.4, and ‘M peak only’, respectively. For high-resolution MS2 such as provided by, say, a Thermo Q-Exactive, we recommend experimenting also with 0.02, 0, and ‘default peak shape’, respectively. This latter setting may slow the software significantly and, in our hands, has usually led to little improvement in the search results.
-
15
Post-translational modifications (PTMs) should be specified by clicking on the ‘Add Modification from Lib’ button, which makes the modification library window pop up (Supplementary Figure 2). To select one or more PTMs click on the corresponding row header, thus highlighting the entire row, then on the ‘Add selected row to my Search.xml’ button.
-
16
Optionally, new PTMs can be saved to the library. To do this, simply fill out the empty row (always the bottommost one) with the corresponding information and click on the ‘Update my Lib’ button.
-
17
Indicate whether the modification is variable, and which of the two termini it applies to, by checking the corresponding boxes. For example, if not all methionines in the sample are expected to be oxidized, then the modification should be checked as variable. However, for modifications that are expected in all occurrences of the amino acid, such as, say, carbamidomethylation of cysteine, leave the variable option unchecked. Figure 3 exemplifies a situation in which the iTRAQ 8-plex is to be considered as a fixed modification on the N-terminus and for the K and Y amino acids while variable oxidation is expected for the M amino acid.
-
18
For experiments making use of isobaric tags (e.g., iTRAQ or TMT), enter the m/z range that spans the reporter ions as a ‘Clear MZ Range’ option. This will have the software ignore the signal of these reporter ions when matching the theoretical spectra with the experimental one.
-
19
Click on the ‘Generate Comet Params’ button. The user will be transferred to the next tab, ‘Step 2: Verify and Execute’. The user should then simply click on the ‘Save Comet Params’ button, thus saving all search engine specifications in a text file in the search directory. We note that the contents of this file are made available in the upper section of the window, providing the experienced user with the possibility of manually altering the search engine specifications.
-
20
Click on the ‘Go!’ button. The user will be automatically transferred to the ‘Step 3: Monitor progress’ tab, which in turn is automatically updated as the search makes progress. Comet’s terminal screen will also pop up for each new search. The results files in the SQT format will be generated.
CAUTION Closing the Comet pop-up terminal screen will terminate the search.
? TROUBLESHOOTING
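For users who prefer to alter the saved search specifications outside the graphical interface (Step 19), the following is a minimal sketch of how a ‘key = value’ line in a Comet-style parameter text could be rewritten while preserving its trailing comment. The helper `set_param` is hypothetical and not part of PatternLab or Comet. The key names `fragment_bin_tol`, `fragment_bin_offset`, and `theoretical_fragment_ions`, and the coding of ‘M peak only’ as 1, are assumed to match the parameter file written by PatternLab and should be checked against the actual file.

```python
import re


def set_param(params_text: str, key: str, value: str) -> str:
    """Replace the value of a 'key = value  # comment' line, keeping the comment."""
    pattern = re.compile(
        rf"^({re.escape(key)}[ \t]*=[ \t]*)([^#\n]*?)([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    # A function replacement avoids any special handling of backslashes in value.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", params_text
    )
    if count == 0:
        raise KeyError(f"parameter not found: {key}")
    return new_text


# Illustrative excerpt only; a real parameter file contains many more lines.
example = (
    "fragment_bin_tol = 0.02          # binning to use on fragment ions\n"
    "fragment_bin_offset = 0.0        # offset position to start the binning\n"
    "theoretical_fragment_ions = 0    # assumed: 0 = default peak shape, 1 = M peak only\n"
)

# Low-resolution MS2 values recommended in Step 14.
low_resolution_ms2 = {
    "fragment_bin_tol": "1.0005",
    "fragment_bin_offset": "0.4",
    "theoretical_fragment_ions": "1",
}

for key, value in low_resolution_ms2.items():
    example = set_param(example, key, value)

print(example)
```

Raising an error for a missing key, rather than appending a new line, is a deliberate choice here: it guards against silently adding a misspelled parameter that the search engine would ignore.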
Figure 3.

PatternLab’s Comet search engine graphical user interface.
BOX 1. ON ENZYMATIC SPECIFICITY.
The Comet search can be performed in the fully specific or semi-specific search spaces. Fully specific refers to considering only peptides originating from a complete digestion (i.e., with enzyme cleavage sites at both the C- and the N-terminus). Semi-specific makes Comet lift the constraint that both cleavage sites be present, allowing instead the presence of only one. For example, in the sequence R.APBCK.A, where ‘.’ denotes the occurrence of cleavage, selecting semi-specific will make Comet consider A, AP, APB, APBC, K, CK, BCK, and PBCK, in addition to APBCK. Otherwise (i.e., if fully specific is selected), the search space will be limited to APBCK.
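The example above can be reproduced with a short sketch that enumerates the additional peptides entering the search space when semi-specific is selected. The function name `semi_specific_peptides` is ours, for illustration only; actual search engines also apply constraints not modeled here (e.g., minimum peptide length and mass range).

```python
def semi_specific_peptides(peptide: str) -> list[str]:
    """Return the peptides added by a semi-specific search for one fully specific peptide."""
    n = len(peptide)
    # Prefixes keep the N-terminal cleavage site and may end anywhere inside the peptide.
    prefixes = [peptide[:i] for i in range(1, n)]
    # Suffixes keep the C-terminal cleavage site and may start anywhere inside the peptide.
    suffixes = [peptide[i:] for i in range(n - 1, 0, -1)]
    return prefixes + suffixes


print(semi_specific_peptides("APBCK"))
# ['A', 'AP', 'APB', 'APBC', 'K', 'CK', 'BCK', 'PBCK']
```

A fully specific peptide of length n thus contributes 2(n - 1) additional candidates, which illustrates why the semi-specific search space is considerably larger.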
Statistically filtering Comet results with SEPro
-
21
Load SEPro by clicking on the ‘Filter’ menu and then on ‘Search Engine Processor (SEPro – for PSM)’. SEPro’s entry screen will appear as in Supplementary Figure 3.
-
22
Copy and paste the directory containing the SQT files into the topmost textbox. This can also be achieved by clicking on the corresponding ’Browse’ button and navigating to the directory. If the corresponding directory contains a comet.params file, then SEPro will automatically detect the path to the sequence database and fill out the next textbox (Protein DB).
-
23
Choose from one of SEPro’s default filtering parameter configurations. For this, click on one of the appropriate radio buttons in the lower panel, ‘High Resolution MS1’ or ‘Low Resolution MS1’. Regardless, all SEPro parameters, as described in the 2012 protocol41, can be set and are readily available by clicking on the ‘Advanced parameters’ tab.
CRITICAL STEP The ‘High Resolution MS1’ mode is advised for data from instruments that provide a mass accuracy better than 20 ppm and a resolution above 20,000 for MS1. For example, if an Orbitrap was used to obtain MS1 and an LTQ to obtain the MS2, then the ‘High Resolution MS1’ option should be chosen; this configuration is also suitable for instruments that provide high-resolution MS2, such as a Q-Exactive HF. The ‘Low Resolution MS1’ mode is recommended when all data are obtained, say, on an LTQ-Velos (Thermo, San Jose).
-
24
Check the ‘Include MS2 in results’ box if the mass spectra of the identified peptides should be included in the report. This allows the spectrum browser to be opened by double-clicking on an identification.
CRITICAL STEP In case the experiment uses isobaric tags for downstream relative quantitation, checking this option is required.
-
25
Select the ‘Experiment with more than 50k spectra’ option if the data are estimated to contain about 50,000 or more mass spectra; such a volume is easily obtained when performing MudPIT experiments or using latest-generation instruments (e.g., Orbitrap Elite) with long (e.g., 3-or-more-hour) gradients. This will make SEPro group identifications according to precursor charge state and enzymatic status (i.e., fully specific and semi-specific) in order to generate discriminatory functions that are independent of both charge state and enzymatic status.
-
26
Click on the ‘Go’ button. The user will be redirected to the ‘Follow up’ tab, where the tool’s progress is reported.
? TROUBLESHOOTING
-
27
When the tool finishes processing, click on the ‘Result Browser’ tab to have access to the results (Supplementary Figure 3 and Figure 4).
-
28
Save the results by accessing the ‘File’ menu and then choosing ‘Save SEPro results’. Note that many formats are made available other than SEPro’s own; for example, one can save in the DTASelect format12 or in a tab-delimited file for opening with spreadsheet software.
CRITICAL STEP In case the user performed a ‘Batch Processing’ by checking the corresponding box in the entry page, the SEPro results files will be automatically saved to their corresponding directories. Batch processing is useful when there are several directories lying directly one level below a main directory; in this case, the user needs only to specify the path to filter the main directory, select the batch processing option, and press the ‘Go’ button. Figure 4 shows SEPro’s graphical user interface while browsing through filtered results.
-
29
(Optional) A frequent community request has been for the user to be able to concatenate the results of several SEPro files. To do this, place the desired files in the same directory, select the option ‘SEPro Fusion’ from the ‘Tools’ menu, then click on the ‘Save new SEPro file’ button in the pop-up window. A new SEPro file will be generated that joins the data from all the SEPro files pertaining to that directory.
Figure 4.

SEPro’s Result Browser. SEPro provides a dynamic report that can be sorted according to any column. The top panel lists protein identifications, and clicking on any one of them causes the lower panel to display all matches associated with the corresponding protein, together with their respective scores. Double-clicking on a protein result brings up a window (in the upper-right corner) displaying a graphical coverage representation, a FASTA coverage representation, and a group view (i.e., other proteins that share peptides). By double-clicking on a row in the lower panel, the annotated mass spectrum pops up. The lower-left corner displays one of the many new features in PatternLab for proteomics 4.0; clicking on the ‘Tools’ menu and then on ‘Evaluation of Enzyme Specificity’ will display a window informing how many fully specific, semi-specific, and non-specific peptides were identified in the mixture.
Quantitation analysis using spectral counting/XIC or analysis of multiplex experiments with isobaric tags
-
30 At this point in the procedure, it is possible to choose option A for quantitation analysis by spectral counting, B for XIC, or C to analyze experiments using isobaric labels. Once this step is finalized, downstream data analysis involving differential proteomics (Box 3), scoring phosphopeptide sites (Box 4), or analyzing results in the light of the Gene Ontology (Box 5) becomes possible.
- (A) Quantitation analysis with spectral counting
- Project organization: Click on the ‘Project Organization’ menu, then on the ‘SEPro or PepExplorer’ button. The interface will look like that of Figure 5.
- Include a brief (about 10 words) description of the experiment in the Project Description textbox.
- Click on the ‘Load’ button.
- To obtain Spectral counting data for downstream analysis, click on the ‘Step 2: Spectral Counting’ tab. There you can optionally select for NSAF52 normalization, and choose whether the quantitation will be mapped at the peptide or protein level. Then, click on the ‘Go’ button, followed by the ‘Save PatternLab project’ button.
- Optionally, map spectral counts to protein domains by selecting the ‘Step 2: Differential Domain Expression’ tab. This tab offers controls that enable generating a PatternLab project file as previously described32.
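For reference, the NSAF normalization offered in option A can be sketched as follows, using the commonly published definition: each protein's spectral count is divided by its length, and the result is divided by the sum of these ratios over all proteins in the run. This is an illustrative sketch under that definition, not PatternLab's source code; the protein identifiers, counts, and lengths are made up.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor for each protein in one run."""
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    total = sum(saf.values())
    # Dividing by the total makes the values of one run sum to 1.
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical example values.
counts = {"P1": 10, "P2": 30, "P3": 5}
lengths = {"P1": 100, "P2": 600, "P3": 50}
print(nsaf(counts, lengths))
```

In this example, P2 has the most spectra but also the longest sequence, so its NSAF ends up lower than that of P1 and P3.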
- (B) Quantitation analysis with XIC
- Follow Steps 30A i – iv.
- Click on the ‘Step 2: XIC Analysis’ tab if XICs are to be obtained. This tab offers controls that will ultimately produce an XIC file, viewable within PatternLab’s XIC Browser module, available through the ‘Quant’ menu by selecting ‘XIC Browser’. The XIC Browser module is then used to generate a PatternLab project file, as described in the steps that follow.
- Click on the ‘Quant’ menu and then select ‘XIC Browser’.
- Click on ‘File’, then on ‘Load’ and ‘Bin’ to load an xic file generated using PatternLab’s Project Organizer. This is a binary file by default, yet the XIC Browser also allows files to be saved in JavaScript Object Notation (JSON), a lightweight text-based data-interchange format that simplifies parsing by other software.
- Review the list of cross-experiment identified peptides that will appear as soon as the file finishes loading. Note that each column will be named after a search file (e.g., SQT) and list the XIC values for each peptide. Double-click on an XIC value to open an XIC plot together with a table discriminating the plotted values, as exemplified in Figure 6. The table discriminating the quantitation values can also be copied and the values pasted into spreadsheet software.
- Click on the ‘Graphical Analysis’ tab to view a histogram of the label-free quantitation values for all peptides (Supplementary Figure 4). Note that many experiments can be simultaneously assessed.
- Optionally, use the XIC Browser to reduce the effects from undersampling. Undersampling is a common problem in proteomics, since not all peptides are sampled by the mass spectrometer. The XIC Browser can help with this limitation by relying on the retention times and precursor masses of peptides identified in a run to estimate the XIC of a peptide in another run, one in which that peptide was not sampled. To accomplish this, first click on the ‘Completion’ tab; a list of all LC/MS/MS runs in the experiment will be provided in one column, along with another column to which the user can input a number for each run. Label the runs that should be grouped for inferring XICs by placing the same number beside each one (Figure 7). Finally, click on ‘Filter’ and then on ‘Fill in the gaps’. The new XICs, completed by using the retention times and the precursor masses of peptides identified in compatible runs, will be listed in the XIC Browser in green. Identifications with no XICs, or XICs not passing a minimum quality criterion, will have values of -1 and be listed in red.
- The same peptide is usually identified through different charge states and consequently with different precursor m/z values. The XIC Browser makes available an option, through the ‘Filter’ menu and then by selecting ‘Retain Optimum Signal’, for only the best (higher-value) XICs for a given charge state to be retained. So for example, if in general the charge-(+2) peptide precursors for a given peptide have XIC values greater than their charge-(+3) counterparts, then all XICs from the latter version of that peptide will be discarded. Arguably, by considering only the more intense XIC versions of the peptide, less noise gets into the model and a more accurate relative quantitation can be obtained (data not shown).
- Click on the ‘File’ menu, followed by ‘Save’ and then by ‘PatternLab project file’ to generate a PatternLab project file for downstream analysis.
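The idea behind the ‘Retain Optimum Signal’ filter in option B can be illustrated with the sketch below. It rests on assumptions of ours: the data layout (peptide, then charge state, then one XIC value per run) is hypothetical, and ‘in general greater’ is approximated by the larger summed XIC across runs, with the -1 placeholders of missing XICs ignored. PatternLab's actual criterion may differ.

```python
def retain_optimum_signal(xics: dict[str, dict[int, list[float]]]) -> dict[str, dict[int, list[float]]]:
    """Keep, for each peptide, only the charge state with the strongest overall XIC."""
    kept = {}
    for peptide, by_charge in xics.items():
        # Assumption: rank charge states by summed XIC, skipping negative placeholders.
        best_charge = max(
            by_charge,
            key=lambda charge: sum(v for v in by_charge[charge] if v >= 0),
        )
        kept[peptide] = {best_charge: by_charge[best_charge]}
    return kept


# Hypothetical XIC values for one peptide observed in three runs.
example = {
    "PEPTIDEK": {
        2: [1.0e6, 1.2e6, -1],      # -1 marks a run with no usable XIC
        3: [4.0e5, 5.0e5, 3.0e5],
    }
}
print(retain_optimum_signal(example))
```

Here the charge-(+2) values are retained and all charge-(+3) values are discarded, even though the latter were measured in every run.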
-
(C) Analyzing multiplex experiments labeled with isobaric tags
CRITICAL SEPro files to be analyzed with the ‘Isobaric module’ must have been processed using the ‘include MS2 in results’ option.
- Click on the ‘Quant’ menu, then on ‘Isobaric Analyzer’.
- If data were acquired according to the MultiNotch approach, extract the MS3 data from the RAW file. For this, click on PatternLab’s ‘Utils’ menu, select the RawReader module, then check the MS3 checkbox and the directory containing the mass spectra raw files, and click on the ‘Go’ button. We note that this step can also be accomplished by any software capable of extracting MS3, such as RawExtractor, for example, made available at http://fields.scripps.edu/researchtools.php (ref. 59). Once done, click on the ‘MultiNotch’ tab, specify the path to the SEPro file and to the MS3 directory, and click on the ‘Go’ button. This procedure will patch the SEPro file to include the MS3 data from the reporter ions so that downstream analysis can be performed.
- (Optional) Remove multiplexed tandem mass spectra from the dataset. This step is recommended for data not acquired using MultiNotch. For this, execute YADA20 with its default configuration on the MS1 and MS2 extracted files. This will generate a corrected batch of MS2 files where the multiplexed MS2 data have their multiple precursors indicated in the spectrum heading. Then, back in PatternLab’s Isobaric Analyzer module, specify the YADA output directory; multiplexed spectra will no longer be considered.
- Specify the reporter ion masses in the third textbox from the top; predefined masses can be automatically filled in by pressing the ‘iTRAQ 4’, ‘TMT6’, or ‘iTRAQ8’ buttons.
- Specify a data normalization strategy; we strongly recommend using the ‘Channel Signal’ normalization (default). This normalization adds up the signals of all spectra for each channel (i.e., isobaric marker), and the normalized values for each spectrum are obtained by dividing each reporter ion signal by the corresponding channel’s sum.
- (Optional) Check the ‘Apply purity correction’ box to correct for the distortions inherent to isobaric tags. The reagents are not 100% pure and therefore come with a datasheet per batch indicating, for each reporter ion reagent, the percentages of reagent whose mass differs from the quoted mass by -2, -1, +1, and +2 Da. This enables PatternLab to use Cramer’s rule to account for and correct such distortions. If the purity correction numbers provided by the manufacturer differ from those provided in the Isobaric Analyzer’s ‘Purity Correction’ tab, manually alter the values in the software to reflect those provided by the manufacturer. This correction tends to yield very subtle improvements, particularly when compared to the ‘Channel Signal’ normalization of the previous step.
- Click on the ‘Generate Report’ button. This will generate a text file discriminating each peptide contained in the SEPro results, together with its spectral count and redundancy (i.e., how many proteins in the database it matches), followed by the scan numbers and the corresponding normalized TMT or iTRAQ signals in each channel. PatternLab’s screen will look like the one in Supplementary Figure 5.
- Generate a PatternLab project file by clicking on the ‘PatternLab project file’ radio button, and then on the ‘Generate Report’ button. This file is useful when analyzing experiments with more than two biological conditions.
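The ‘Channel Signal’ normalization recommended above can be sketched in a few lines: the reporter ion signals of all spectra are summed per channel, and each signal is then divided by its channel's sum. The data layout (one list of reporter ion signals per spectrum) and the example values are hypothetical, and the handling of an all-zero channel is our own choice.

```python
def channel_signal_normalize(spectra: list[list[float]]) -> list[list[float]]:
    """Divide each reporter ion signal by the total signal of its channel."""
    n_channels = len(spectra[0])
    # Sum the signals of all spectra for each channel (i.e., isobaric marker).
    channel_sums = [sum(spectrum[c] for spectrum in spectra) for c in range(n_channels)]
    return [
        [
            # Assumption: a channel with no signal at all is reported as 0.0.
            spectrum[c] / channel_sums[c] if channel_sums[c] else 0.0
            for c in range(n_channels)
        ]
        for spectrum in spectra
    ]


# Two spectra, two channels (hypothetical intensities).
signals = [[100.0, 200.0], [300.0, 200.0]]
print(channel_signal_normalize(signals))
# [[0.25, 0.5], [0.75, 0.5]]
```

After normalization every channel sums to 1, which compensates for differences in the total amount of material labeled with each tag.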
- Comparing isobaric tag results from different channels: Click on the ‘Two conditions experiment’ button; a new window will pop up.
- Specify the ‘Class labels’ parameter for each channel. As this is a pairwise comparison, only 1 and 2 should be used as labels. In case a channel is not to be included in the statistics, it should be labeled as -1. So for example, if an iTRAQ 8-plex experiment was carried out, channels 1, 2, and 3 are related to biological condition 1 (i.e., class 1), 5, 6, and 7 to class 2, and 4 and 8 are not related to the experiment, then the class labels should be 1, 1, 1, -1, 2, 2, 2, and -1, respectively.
- Click on the ‘Browse’ button and select the Peptide Quantitation Report generated in the previous section.
- Press the ‘Go’ button. The software will load the report and then automatically switch to the next tab, ‘Result Browser’, and display results as in Figure 8.
-
Specify values for the parameters given in the following table.
| Parameter | Description |
|---|---|
| Only unique peptides | Makes the software consider only peptides that map to one protein in the sequence database. |
| No. Peptides | For example, setting this to 2 means only proteins that have 2 or more peptides will be considered in the analysis. |
| Peptide Log Fold Change Cutoff | Establishes a lower boundary on the absolute value of the natural logarithm of peptides’ fold changes. Peptides falling below the bound will be eliminated. |
| Peptide p-value Cutoff | Peptides whose paired t-test or binomial p-value does not fall below this cutoff will be eliminated. |
| Corrected p-value for q | Allows the user to control the theoretical false-discovery rate by specifying a q-value. A corrected p-value is calculated according to the Benjamini–Hochberg procedure. |
- Click on the ‘File’ menu, then on ‘Export Protein Results’, to export the filtered proteins, together with information on the corresponding peptides, to a text file.
- Click on the ‘Peptide Browser’ tab to review the list of identified peptides. Recall that peptides appearing only in one biological condition achieve low binomial p-values. The paired t-test p-value, on the other hand, indicates whether the peptide achieved a statistical change in the mean of its reporter ions when comparing the two biological conditions.
- Click on the ‘Peptide Distribution’ tab to view a volcano plot at the peptide level. Green circles stand for peptides having a higher abundance in condition 1, red circles in condition 2. The gray translucent circles stand for peptides that did not pass the user-specified criteria. Hover the mouse over a circle to review the pop-ups discriminating the corresponding peptide sequence, fold change, and p-value. An iTRAQ 8-plex example dataset is available for practice. It can be downloaded and the results obtained on it can be compared against those provided on PatternLab’s website.
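To clarify what the ‘Corrected p-value for q’ parameter refers to, the standard Benjamini–Hochberg procedure can be sketched as follows: the p-values are sorted, and the cutoff is the largest p-value that does not exceed (rank / number of tests) × q. How PatternLab implements this internally is not detailed in this protocol, so the function below is only a textbook illustration with made-up p-values.

```python
def benjamini_hochberg_cutoff(p_values: list[float], q: float) -> float:
    """Largest p-value accepted at false-discovery rate q (0.0 if none is accepted)."""
    m = len(p_values)
    cutoff = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Keep updating: the procedure accepts everything up to the LAST rank that passes.
        if p <= rank / m * q:
            cutoff = p
    return cutoff


# Hypothetical p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg_cutoff(p_values, q=0.05))
# 0.008
```

In this example only the two smallest p-values are accepted, although five of the eight fall below the uncorrected 0.05 threshold.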
BOX 3. DIFFERENTIAL PROTEOMICS USING THE ACFOLD/TFOLD/VENN DIAGRAM MODULES/PCA.
Once a PatternLab project file is generated, the ACFold/TFold61 and area-proportional Venn Diagram modules can be used for pinpointing differentially expressed proteins and proteins exclusive to a biological condition, respectively. Other modules for performing ANOVA, PCA (Búzios), and for analyzing time-course experiment data (TrendQuest) are also available.
These modules are all demonstrated in a supplementary video and have been described in our previous protocols, so we refer the reader to them41. Notwithstanding this, we note that these modules’ previous versions required the use of the ‘index.txt’ and ‘sparseMatrix.txt’ files to store all the identification and quantitation data of the experiment. In the current version they were replaced by a single PatternLab project file, generated in the Project Organization module as explained in Box 2. PatternLab for proteomics 4.0 provides a tool for migrating the legacy format to the updated PatternLab project file in the ‘Utils’ menu.
BOX 4. SCORING PHOSPHOPEPTIDE LOCALIZATIONS WITH THE XD SCORING MODULE.
Confidently determining phosphorylation sites is crucial to understanding the regulatory mechanisms in biological systems. PatternLab for proteomics 4.0 includes a false-localization rate (FLR) probabilistic module, termed XD Scoring, that enables unbiased phosphoproteomics studies25. Briefly, the XD Scoring algorithm infers a probabilistic function from the distribution of the identified phosphopeptides’ XCorr Delta scores (XD-Scores) and provides p-values by relying on Gaussian mixture models and a logistic function.
For a mass spectrum whose top scoring candidate is a phosphopeptide, the XD Score is calculated as the difference between the top two XCorr scores of alternative phosphorylation sites in the same peptide sequence. In this regard, for this module to work efficiently, we recommend having the search engine report at least the top 20 scoring candidates in its search results. When using the Comet search in PatternLab, this amounts to editing the line that starts with ‘num_output_lines = ‘ to indicate 20, after clicking on the ‘Generate Comet Params’ button.
Access the XD Scoring module by clicking on the ‘Utils’ menu and then on ‘XD Scoring (Phosphosite)’.
Click on the ‘Load SQT files’ button and select the Comet results files by pressing and holding the Ctrl key while left-clicking on the desired search results files.
Click on the ‘Calculate’ button. A list containing the logarithms of the Delta scores for all phosphopeptides will appear in the lower textbox.
Click on the ‘Generate GMM’ button. This will enable PatternLab to generate a Gaussian mixture model whose two Gaussians come from a histogram on the natural logarithms of the XD-Score. At the bottom of the interface, a green curve shows the cumulative distribution of the green Gaussian and a red curve shows the complementary cumulative distribution of the red Gaussian (Supplementary Figure 6). A complementary logistic function is then generated based on the former two distributions (purple curve). The desired p-values are given by this function.
Specify a SEPro file; this enables the program to output a table associating a p-value to each site attribution.
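The XD Score definition given in this box can be illustrated with a small sketch. The input format is hypothetical: each candidate for a spectrum is a pair of an annotated sequence, in which a lowercase letter marks a phosphorylated residue, and its XCorr. The score is the difference between the top candidate's XCorr and that of the best-scoring candidate with the same sequence and the same number of phosphorylations placed at different sites.

```python
import math


def count_sites(annotated: str) -> int:
    """Number of phosphorylated residues (lowercase letters in this hypothetical notation)."""
    return sum(ch.islower() for ch in annotated)


def xd_score(candidates: list[tuple[str, float]]):
    """XCorr difference between the top phosphopeptide and its best alternative site assignment."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_seq, top_xcorr = ranked[0]
    if count_sites(top_seq) == 0:
        return None  # the top-scoring candidate is not a phosphopeptide
    for seq, xcorr in ranked[1:]:
        same_backbone = seq.upper() == top_seq.upper()
        if seq != top_seq and same_backbone and count_sites(seq) == count_sites(top_seq):
            return top_xcorr - xcorr
    return None  # no alternative localization among the reported candidates


# Hypothetical candidates for one spectrum.
candidates = [("AsPTK", 3.2), ("PEPTIDEK", 2.9), ("ASPtK", 2.5)]
xd = xd_score(candidates)
if xd is not None and xd > 0:
    print(xd, math.log(xd))  # the natural logarithm is what enters the histogram
```

The second `return None` shows why reporting many candidates per spectrum matters: if the alternative localization is not among the reported lines, no XD Score can be computed for that spectrum.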
BOX 5. THE GENE ONTOLOGY EXPLORER.
The Gene Ontology Explorer (GOEx) allows users to analyze their data in the light of the Gene Ontology; this module has been well documented21,40. In order to analyze the data, a ‘precomp’ object must be generated; this is done by joining the Gene Ontology OBO file (available at http://geneontology.org/page/download-ontology) with an annotation file. Our original version worked only with annotation files provided at the Gene Ontology website, but the updated GOEx module can work with any organism available in the UniProt database. As this has been the only update to this module, what follows pertains exclusively to the steps for generating a precomp file using UniProt.
Download the data for the desired organism from UniProt as previously described, but instead of selecting the FASTA format, choose the Text format.
Download the latest Gene Ontology OBO file.
Access the Gene Ontology Explorer by clicking on the ‘Analyze’ menu and then on ‘GOEx (Gene Ontology Explorer)’. The GOEx interface will appear.
Click on the ‘Load GO DAG’ button and select the GO.OBO file. This will cause GOEx to perform some optimizations that should take about 2 minutes.
Click on the ‘Load Associations’ button; a window will pop up. The new option for using UniProt text files will be available and selected by default.
Click on the ‘Browse for conversion file’ button and load the file downloaded from UniProt.
Click on the ‘Save Precomp’ button. The next time a GO analysis is performed, instead of having to repeat all these steps, the user can proceed directly to loading the precomp file by clicking on the ‘Load precomp’ button.
Refer to the previous publications on GOEx36,39 for a complete set of instructions for operating this module.
Figure 5.

PatternLab’s Project Organizer. This module is responsible for joining the information of the various biological/technical replicates from all biological conditions. Directories containing results filtered by SEPro should be indicated for each biological condition.
BOX 2. PROJECT ORGANIZATION.
One of the goals of proteomics is the study of differences in protein expression throughout different biological states. Others include analyzing time series data or samples originating from different tissues. In this regard, PatternLab must be informed which samples come from which biological condition or point in time. The Project Organization module deals with this matter. For example, suppose that one performed a 5-point time course experiment with three biological replicates at each point. Data were acquired using 12-step MudPIT and now the user wishes to perform relative quantitation by spectral counting. This hypothetical experiment would encompass a total of 180 LC/MS files. These files would need to be arranged in directories as follows. First, a directory for each time point would need to be created; for this example, say, T0, T1, T2, T3, and T4. Within each directory, directories for each biological replicate would also need to be created, so for example, within the T0 directory we would create the directories T0B1, T0B2, and T0B3. (We urge the user not to provide simplified names as, say, only B1, because this same name might ambiguously refer to B1 in directory T1 and some modules of PatternLab require each directory to have a unique name.) Finally, within T0B1, for example, the RAW files, SQT files, and the sepr file would be placed. We note that this organization can also be arranged prior to using Comet; in this way, only the main directory would need to be provided and PatternLab would have Comet search within each directory (consequently making the SQT files already appear in the corresponding directories). Likewise, SEPro can perform batch filtering if the main directory is provided. Structuring the files as described enables PatternLab to ultimately compile a PatternLab project file, containing cross-experiment identification and quantitation data, these in turn required for downstream analysis. 
During the next steps of the protocol, the user should decide if quantitation should be performed by spectral counting, by XICs, or through reporter ion signals provided by isobaric markers. While the latter originates from sample preparation, the former two remain an open choice; we recommend using spectral counting for MudPIT experiments and XICs for single shots.
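The directory arrangement described in this box can be prepared in advance with a short script. The sketch below creates the layout for the hypothetical 5-point time course with three biological replicates per point, using the naming scheme from the example (T0/T0B1, and so on). It creates the directories in a temporary location purely for demonstration; in practice, `root` would point to the project's main directory.

```python
import tempfile
from pathlib import Path


def create_layout(root: Path, time_points: int = 5, replicates: int = 3) -> list[Path]:
    """Create one directory per time point, each holding uniquely named replicate directories."""
    created = []
    for t in range(time_points):
        for b in range(1, replicates + 1):
            # The replicate name repeats the time point (e.g., T0B1) so that it is unique project-wide.
            directory = root / f"T{t}" / f"T{t}B{b}"
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory)
    return created


with tempfile.TemporaryDirectory() as tmp:
    directories = create_layout(Path(tmp))
    for directory in directories:
        print(directory.relative_to(tmp))
    # 15 replicate directories x 12 MudPIT steps = 180 LC/MS files in total.
    print(len(directories) * 12)
```

Each replicate directory would then receive its RAW files, SQT files, and the sepr file.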
Figure 6.

PatternLab’s XIC Browser. By clicking on the XIC values (blue numbers), a window displaying the corresponding XIC plot will pop up.
Figure 7.

The XIC Browser’s completion tab allows for establishing rules for grouping files that can be used to search for m/z and chromatographic retention times of possibly under-sampled peptides. Search results originating from each biological condition have their column header in a different color to facilitate the process. In this example, the user labeled runs from biological conditions 1 and 2 with ‘1’ and ‘2’, respectively, as indicated by the blue arrows. This will make the software use as references only files with the same labels to try and complete the XICs of under-sampled peptides.
Figure 8.

Result Browser for PatternLab’s Isobaric Analyzer, Two Conditions Experiment. Panel A displays the main view when browsing results. The top section displays controls that allow the user to dynamically filter acceptable results according to: only unique peptides, only peptides that present an absolute fold change greater than a specified log fold change value, peptides with a binomial or paired t-test p-value lower than a given cutoff, and finally, only proteins containing at least a user-specified number of peptides satisfying these constraints. In what follows, the software reports the total number of peptides identified in the experiment and how many mass spectra, peptides, and proteins abide by the cutoff values. The software also suggests a p-value cutoff at the protein level (corrected p-value) based on the Benjamini-Hochberg procedure. The middle panel displays the protein identifications and various details. For example, we highlight the ‘StouffersPValue’ column, which represents a meta-analysis of the p-values of the various peptides belonging to that protein as to whether the protein can be considered as presenting a differential abundance or not. Another highlighted column is ‘Coverage’, where green sections represent identified peptides with a higher abundance in condition 1, red for condition 2, and gray sections for peptides not satisfying the user-established criteria. When clicking on a protein row, the lower panel refreshes to provide details, at the peptide level, for that protein. Double-clicking on a peptide row in the lower panel causes a window to pop up displaying the reporter ion signals for each pertinent mass spectrum, as exemplified in panel B.
TIMING
Generating a target-decoy sequence database (Steps 1–7)
This step usually takes 5 seconds of computing time. However, when the ‘Eliminate subset sequences’ option is selected, time quickly scales up to minutes or even hours, growing quadratically with the number of sequences in the database. For the RefSeq H. sapiens database (20,247 sequences), selecting this option led to approximately 2 minutes for the step to complete.
Performing PSM with the integrated Comet search engine (Steps 8–20)
By far, the most time-demanding step is the search itself (Step 20). Search time can range from a few minutes up to more than a day, varying mostly with sample complexity, the number of variable PTMs considered, the mass spectrometer used, LC gradient length, etc., as well as the computer’s processor. We exemplify the computational burden of an iTRAQ 8-plex experiment obtained from human biopsies of gastric cancer; two fractions of HILIC were obtained and each analyzed using a two-hour RP chromatography coupled online to an Orbitrap Velos. This example dataset and sequence database are made available on PatternLab’s website as an exercise to certify that one can reproduce our results as indicated. The search, considering only the fixed modifications of carbamidomethylation of cysteine, and the iTRAQ 8 modification at the N-terminus and at the K and Y residues, took 1035 seconds on our 24-core (2 × X5675 Xeon) server. All other steps happen almost instantaneously (30 seconds at most), but users will want to spend time on the modules to assess results (e.g., browse through the list of identified proteins and the annotated spectra, experiment with the Gene Ontology, etc.).
Statistically filtering Comet results with SEPro (Steps 21–29)
Filtering time can vary greatly according to experimental design and number of spectra. It is expected to fall somewhere near 30 seconds for a typical two-hour LC/MS/MS analysis acquired on an Orbitrap Velos.
Quantitation Analysis with spectral counting (Step 30, option A)
Computing time should be about 20 seconds per SEPro file, assuming each file originated from a typical two-hour LC/MS/MS analysis acquired on an Orbitrap Velos.
Quantitation Analysis with XIC (Step 30, option B)
Computing time should be about 30–40 seconds for each mass spectrum raw file, assuming each file originated from a typical two-hour LC/MS/MS analysis acquired on an Orbitrap Velos.
Analyzing multiplexed experiments labeled with isobaric tags (Step 30, option C)
Computing time should be about 20–50 seconds for each mass spectrum raw file, assuming each file originated from a typical two-hour LC/MS/MS analysis acquired on an Orbitrap Velos.
Differential proteomics (Box 3)
This step usually takes less than 3 seconds of computing time for any of the modules.
Scoring phosphopeptides (Box 4)
The overall computing time is about 35 seconds.
Setting up the Gene Ontology Explorer module to analyze the experiment at hand (Box 5)
Generating or loading a precomp file can take about 5 minutes. Computing time for exploring one’s data is practically negligible.
TROUBLESHOOTING
For troubleshooting advice, please consult Table 1. If you require help for anything not covered in this protocol, describe the problem in our PatternLab Google group, made available through the project’s website at http://patternlabforproteomics.org or through the ‘Help’ menu in the graphical user interface by clicking on ‘Troubleshooting and user forum’.
Table 1.
Troubleshooting table
| Step | Problem | Possible reason | Possible solution |
|---|---|---|---|
| Step 20 | Comet tries to read Thermo RAW files and displays the message: ‘Retrieving the COM class factory for component with CLSID failed due to the following error: 80040154 Class not registered.’ | The MSFileReader lib is not installed. | Install the MSFileReader, available from Thermo’s website. |
| Step 26 | The message ‘Not enough spectra in decoy or target class to make robust statistic. ANALYSIS WILL BE DISCONTINUED.’ | There are not sufficient decoy peptide/spectra. | Disable the options ‘Group by charge state’ and/or ‘Group by enzymatic no termini’ in SEPro’s advanced parameter tab. |
| Box 3 | There are results from previous versions of PatternLab (i.e., index and sparse matrix) that cannot be opened in the current version. | Results must be upgraded to the new PatternLab project file. | Use the module ‘IndexSparseMatrixLegacy’ available in the ‘Utils’ menu. |
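
The Step 26 entry in Table 1 can be illustrated with a small sketch of why disabling grouping may help: splitting the peptide spectrum matches (PSMs) by charge state leaves fewer decoys in each subset, so a subset can fall short even when the pooled data do not. This is not SEPro's code; the threshold `MIN_PER_CLASS`, the helper names, and the toy counts below are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical minimum number of PSMs required in each class (target and decoy).
# SEPro's actual criterion is not described in this protocol.
MIN_PER_CLASS = 50


def class_counts(psms):
    """Return (number of target PSMs, number of decoy PSMs)."""
    counts = Counter("decoy" if p["is_decoy"] else "target" for p in psms)
    return counts["target"], counts["decoy"]


def has_enough(psms, minimum=MIN_PER_CLASS):
    """True if both the target and the decoy class reach the minimum size."""
    targets, decoys = class_counts(psms)
    return targets >= minimum and decoys >= minimum


def groups_by_charge(psms):
    """Split PSMs into subsets keyed by precursor charge state."""
    groups = {}
    for p in psms:
        groups.setdefault(p["charge"], []).append(p)
    return groups


# Toy data set: (charge, is_decoy, how many PSMs).
layout = [(2, False, 900), (2, True, 60),
          (3, False, 300), (3, True, 20),
          (4, False, 40), (4, True, 5)]
psms = [{"charge": z, "is_decoy": d} for z, d, n in layout for _ in range(n)]

# Pooled: 1240 targets and 85 decoys, so both classes reach the minimum.
pooled_ok = has_enough(psms)

# Grouped by charge: only charge 2 keeps at least 50 decoys.
per_charge_ok = {z: has_enough(g) for z, g in groups_by_charge(psms).items()}

print(pooled_ok)      # True
print(per_charge_ok)  # {2: True, 3: False, 4: False}
```

In this toy case the pooled data pass the check while the charge 3 and charge 4 subsets do not, which mirrors the situation the suggested solution addresses.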
ANTICIPATED RESULTS
PatternLab for proteomics 4.0 is the culmination of the interaction between our group and the proteomics community since 2008. It has been tested on millions of spectra by various groups and has aided research into a wide range of biological questions. Indeed, PatternLab’s goal has been to help scientists identify, quantitate, and attempt to make sense of the thousands of proteins identified by shotgun proteomics in order to ultimately make a difference in the understanding of biological processes62,63. The present protocol emphasizes only the new features and major changes, including some modules that were replaced with completely redesigned substitutes. For example, PatternLab’s new Project Organizer replaces the former ‘Regrouper’, doing away with the ‘index.txt’ and ‘SparseMatrix.txt’ files and introducing instead the PatternLab project file, which many modules use for quantitative proteomic analyses. The current version also includes a tool, accessible through the ‘Utils’ menu, that upgrades the legacy format to the new one. Additionally, the SEProQ functionalities (XIC and Isobaric Browser) were significantly upgraded and are now integrated into the same graphical user interface. New modules, such as PepExplorer, whose functionality is similar to that of SEPro but for de novo sequencing33, and the XD Scoring system for phosphopeptide localization, are also part of the new version.
Some representative works illustrating the types of results that can be expected from this protocol are the following. Webb et al. used PatternLab to analyze data originating from an online two-dimensional liquid chromatography separation consisting of 39 strong cation exchange steps followed by a short 18.5 min reversed-phase gradient64. This large-scale data generation approach enabled the identification of 4269 proteins from 4189 distinguishable protein families from yeast during log phase growth. In this study, PatternLab’s T-Fold module was used to pinpoint differentially abundant proteins, according to spectral counting, during yeast cellular quiescence, thus providing an overview of most of the yeast proteome. The works from Christie-Oleza et al. constitute another example where PatternLab and spectral counting were used to pinpoint differentially abundant proteins, this time comparing marine bacteria under several natural conditions65,66. Aquino et al. used PatternLab’s XIC module to explore the proteomic landscape of a gastric tumor biopsy5. In that study, the biopsy was sectioned into ten parts and each was subjected to MudPIT analysis; the authors identified several proteins whose abundance gradually increases or decreases as a function of the distance to the center of the tumor. Chaves et al. used PatternLab’s Isobaric Analyzer module to analyze TMT data from aging soleus and extensor digitorum longus rat muscles, disclosing quantitative data for more than 4000 proteins67. Finally, Shah et al. used PatternLab’s TrendQuest module to group protein expression profiles of J. curcas seeds during five developmental stages68.
One should always be able, when following a protocol, to reproduce previous results. To help make sure that this is the case, PatternLab’s project website (patternlabforproteomics.org) makes available, through its download tab, previously analyzed datasets; we strongly recommend that those using PatternLab for the first time download and reanalyze them. All intermediate files, generated step by step along the protocol, are also available. The new user can then practice with the protocol to reproduce our results. Figure 9 exemplifies good results provided by PatternLab’s Isobaric module on data acquired using the MultiNotch approach on TMT-labeled peptides analyzed using an Orbitrap Fusion (Thermo, San Jose). The results are considered good because the peptides (dots) are evenly distributed along the y-axis and assume a disposition similar to the eruption of a volcano, thus constituting a so-called volcano plot.
Figure 9.

PatternLab’s Isobaric Analyzer. The screenshot shows the result of an analysis. Each dot represents a peptide that is mapped according to its Log fold change (y-axis) and its differential abundance p-value (x-axis). Peptides colored in green or red are those that satisfied user-specified cutoff criteria for fold-change and p-value.
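The cutoff logic described in the Figure 9 caption can be sketched in a few lines of Python. This is a minimal illustration and not PatternLab's code: the `Peptide` class, the `classify` function, the peptide sequences, and the default cutoffs (an absolute log2 fold change of 1 and a p-value of 0.05) are assumptions chosen for the example, and the caption does not state which color marks which direction of change.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    sequence: str
    log2_fold_change: float  # log2 ratio between the two conditions
    p_value: float           # differential abundance p-value


def classify(pep, min_abs_log2_fc=1.0, max_p=0.05):
    """Label a peptide according to user-specified cutoffs.

    A peptide is highlighted only if it satisfies BOTH criteria:
    a small enough p-value and a large enough absolute fold change.
    """
    if pep.p_value > max_p:
        return "unchanged"
    if pep.log2_fold_change >= min_abs_log2_fc:
        return "up"
    if pep.log2_fold_change <= -min_abs_log2_fc:
        return "down"
    return "unchanged"


# Hypothetical peptides, for illustration only.
peptides = [
    Peptide("LVNELTEFAK", 1.5, 0.01),    # passes both cutoffs, more abundant
    Peptide("AEFVEVTK", -2.0, 0.001),    # passes both cutoffs, less abundant
    Peptide("YLYEIAR", 0.2, 0.001),      # significant, but fold change too small
    Peptide("QTALVELLK", 3.0, 0.2),      # large fold change, but not significant
]

labels = {p.sequence: classify(p) for p in peptides}
print(labels)
```

Requiring both criteria at once is what leaves only the upper corners of a volcano plot highlighted; peptides with a large fold change but a weak p-value, or the reverse, stay uncolored.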
As with any software pipeline or even individual scientist, it is the feedback from collaborators and other peers that drives improvement. In the case of PatternLab, the feedback on its features, along with suggestions and even bug fixes, has been by far the most important asset we could count on, helping our suite of tools become more and more sophisticated and hopefully ever closer to supporting answers to questions that were previously intangible. In this regard, we look forward to receiving user feedback through the newly created forum, so we can continue to improve this community-driven and freely available tool.
Supplementary Material
Supplementary Figure 1. PatternLab’s target-decoy sequence database generation module. This module provides options for parsing data from UniProt, NCBI, IPI, and a Generic Format. The module can automatically include the sequences of 127 common proteomics contaminants and simplify datasets by eliminating subset sequences or sequences having an identification threshold above a given user specification. In these cases, a note is appended to the description of the remaining sequence to indicate the eliminated sequence(s).
Supplementary Figure 2. The modification library window. New modifications can be included by typing the data in the corresponding cells and then clicking on the ‘Update my lib’ button. Modifications can be included in the search by selecting the desired rows and then clicking on the ‘Add selected row to my search.xml’ button.
Supplementary Figure 3. SEPro’s Entry Screen. PatternLab for proteomics 4.0 makes available preset configurations for filtering results from high-resolution and low-resolution MS1 acquisitions. Regardless, all SEPro filtering parameters are made available in the ‘Advanced Parameters’ tab.
Supplementary Figure 4. A histogram of minus the logarithm of the label-free quantitation values for all the XICs obtained by simultaneously analyzing 26 3-hour LC/MS/MS shotgun proteomic experiments on an Orbitrap Elite (Thermo, San Jose).
Supplementary Figure 5. PatternLab’s Isobaric Analyzer. The lower-right panel contains three plots. The topmost one shows the total signal, obtained only from identified spectra of a given run, for each isobaric marker before normalization. The middle one shows the signals after applying the Channel Signal normalization. The bottommost plot shows the total signal, obtained from all mass spectra of a given run, regardless of identification status, for each isobaric marker before normalization.
Supplementary Figure 6. PatternLab’s XD Scoring module. This module relies on the delta XCorr distribution to fit a Gaussian mixture model that ultimately results in p-values for the phosphosites.
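The idea behind Supplementary Figure 6 can be sketched as follows. This is not the XD Scoring implementation, and the text above does not say how the p-values are derived from the mixture. As an assumption made only for this example, the sketch fits a two-component one-dimensional Gaussian mixture by expectation-maximization, treats the lower-mean component as the null distribution, and reports the upper-tail probability of a score under it. The function names and the synthetic scores are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns two (weight, mean, sigma) tuples sorted by increasing mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n) or 1.0
    # Initialize the two means from the lower and upper halves of the data.
    mu = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for each observation.
        resp0 = []
        for x in xs:
            a = weight[0] * normal_pdf(x, mu[0], sigma[0])
            b = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp0.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weight, mean, and sigma of each component.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - v for v in resp0]
            nk = max(sum(r), 1e-12)
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            weight[k] = nk / n
    return sorted(zip(weight, mu, sigma), key=lambda c: c[1])


def upper_tail_p(x, mu, sigma):
    """P(X >= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


# Synthetic delta scores (not real delta XCorr values): a large population
# near zero and a smaller, well-separated population of higher scores.
rng = random.Random(0)
scores = [rng.gauss(0.05, 0.03) for _ in range(300)] + \
         [rng.gauss(0.60, 0.10) for _ in range(100)]

(null_w, null_mu, null_sigma), (alt_w, alt_mu, alt_sigma) = fit_two_gaussians(scores)

# Assumed p-value: probability of a score at least this large under the null.
p_small = upper_tail_p(0.06, null_mu, null_sigma)
p_large = upper_tail_p(0.50, null_mu, null_sigma)
print(null_mu, alt_mu, p_small, p_large)
```

Under these assumptions, a score deep inside the low-scoring population receives a large p-value, while a score far above it receives a very small one.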
Acknowledgments
We thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação do Câncer, Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ) for its BBP grant, and Programa de Apoio à Pesquisa Estratégica em Saúde da Fiocruz (PAPES VII). JRY acknowledges funding from the National Institutes of Health (P41 GM103533, R01 MH067880, and R01 MH100175) and the UCLA/NHLBI Proteomics Centers (HHSN268201000035C). JJM acknowledges NIH Research Resources (5P41RR011823) and the National Institute of General Medical Sciences (8 P41 GM103533).
Footnotes
AUTHOR CONTRIBUTIONS
PCC, JRY, and VCB have participated in the PatternLab project since its beginning in 2008. DBL participated in updating features from several modules and the graphical user interface, as well as in helping migrate to the new PatternLab project file format. FVL developed the PepExplorer module together with PCC. MMD developed several functions in PepExplorer and played a major role in the development of the isobaric quantification module. JSGF, PFA, and JJM have participated in PatternLab since its early versions by continuously performing beta testing, pointing out required features, and providing suggestions on how to make the software more user-friendly. PCC and DBL created the supplementary video. PCC and VCB wrote the manuscript. All authors read and approved the manuscript.
COMPETING INTERESTS
The authors declare that they have no competing financial interests.
References
- 1. Hebert AS, et al. The one hour yeast proteome. Mol Cell Proteomics MCP. 2014;13:339–347. doi: 10.1074/mcp.M113.034769.
- 2. Yates JR. Mass spectrometry and the age of the proteome. J Mass Spectrom JMS. 1998;33:1–19. doi: 10.1002/(SICI)1096-9888(199801)33:1<1::AID-JMS624>3.0.CO;2-9.
- 3. Zhang B, Chambers MC, Tabb DL. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J Proteome Res. 2007;6:3549–3557. doi: 10.1021/pr070230d.
- 4. Hwang SI, et al. Systematic characterization of nuclear proteome during apoptosis: a quantitative proteomic study by differential extraction and stable isotope labeling. Mol Cell Proteomics MCP. 2006;5:1131–1145. doi: 10.1074/mcp.M500162-MCP200.
- 5. Aquino PF, et al. Exploring the proteomic landscape of a gastric cancer biopsy with the shotgun imaging analyzer. J Proteome Res. 2014;13:314–320. doi: 10.1021/pr400919k.
- 6. Calvete JJ, Sanz L, Angulo Y, Lomonte B, Gutiérrez JM. Venoms, venomics, antivenomics. FEBS Lett. 2009;583:1736–1743. doi: 10.1016/j.febslet.2009.03.029.
- 7. Valente RH, Dragulev B, Perales J, Fox JW, Domont GB. BJ46a, a snake venom metalloproteinase inhibitor. Isolation, characterization, cloning and insights into its mechanism of action. Eur J Biochem FEBS. 2001;268:3042–3052. doi: 10.1046/j.1432-1327.2001.02199.x.
- 8. Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5:976–989. doi: 10.1016/1044-0305(94)80016-2.
- 9. Washburn MP, Wolters D, Yates JR 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol. 2001;19:242–247. doi: 10.1038/85686.
- 10. Köcher T, Pichler P, Swart R, Mechtler K. Analysis of protein mixtures from whole-cell extracts by single-run nanoLC-MS/MS using ultralong gradients. Nat Protoc. 2012;7:882–890. doi: 10.1038/nprot.2012.036.
- 11. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74:5383–5392. doi: 10.1021/ac025747h.
- 12. Cociorva DL, Tabb D, Yates JR. Validation of tandem mass spectrometry database search results using DTASelect. Curr Protoc Bioinforma Ed Board Andreas Baxevanis Al. 2007. Chapter 13, Unit 13.4. doi: 10.1002/0471250953.bi1304s16.
- 13. Ross PL, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics MCP. 2004;3:1154–1169. doi: 10.1074/mcp.M400129-MCP200.
- 14. Oda Y, Huang K, Cross FR, Cowburn D, Chait BT. Accurate quantitation of protein expression and site-specific phosphorylation. Proc Natl Acad Sci U S A. 1999;96:6591–6596. doi: 10.1073/pnas.96.12.6591.
- 15. Carvalho PC, Hewel J, Barbosa VC, Yates JR 3rd. Identifying differences in protein expression levels by spectral counting and feature selection. Genet Mol Res GMR. 2008;7:342–356. doi: 10.4238/vol7-2gmr426.
- 16. Liu H, Sadygov RG, Yates JR 3rd. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem. 2004;76:4193–4201. doi: 10.1021/ac0498563.
- 17. Neilson KA, et al. Less label, more free: approaches in label-free quantitative mass spectrometry. Proteomics. 2011;11:535–553. doi: 10.1002/pmic.201000553.
- 18. Shevchenko A, Valcu CM, Junqueira M. Tools for exploring the proteomosphere. J Proteomics. 2009;72:137–144. doi: 10.1016/j.jprot.2009.01.012.
- 19. Beausoleil SA, Villén J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol. 2006;24:1285–1292. doi: 10.1038/nbt1240.
- 20. Carvalho PC, et al. YADA: a tool for taking the most out of high-resolution spectra. Bioinformatics. 2009;25:2734–2736. doi: 10.1093/bioinformatics/btp489.
- 21. Keller A, Eng J, Zhang N, Li X, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol. 2005;1:2005.0017. doi: 10.1038/msb4100024.
- 22. Deutsch EW, et al. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin Appl. 2015. doi: 10.1002/prca.201400164.
- 23. Kohlbacher O, et al. TOPP–the OpenMS proteomics pipeline. Bioinforma Oxf Engl. 2007;23:e191–197. doi: 10.1093/bioinformatics/btl299.
- 24. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26:1367–1372. doi: 10.1038/nbt.1511.
- 25. Cox J, et al. A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat Protoc. 2009;4:698–705. doi: 10.1038/nprot.2009.36.
- 26. Carvalho PC, Fischer JSG, Chen EI, Yates JR, Barbosa VC. PatternLab for proteomics: a tool for differential shotgun proteomics. BMC Bioinformatics. 2008;9:316. doi: 10.1186/1471-2105-9-316.
- 27. MacLean B, et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinforma Oxf Engl. 2010;26:966–968. doi: 10.1093/bioinformatics/btq054.
- 28. Giardine B, et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005;15:1451–1455. doi: 10.1101/gr.4086505.
- 29. Boekel J, et al. Multi-omic data analysis using Galaxy. Nat Biotechnol. 2015;33:137–139. doi: 10.1038/nbt.3134.
- 30. Egertson JD, MacLean B, Johnson R, Xuan Y, MacCoss MJ. Multiplexed peptide analysis using data-independent acquisition and Skyline. Nat Protoc. 2015;10:887–903. doi: 10.1038/nprot.2015.055.
- 31. Carvalho PC, Yates JR 3rd, Barbosa VC. Improving the TFold test for differential shotgun proteomics. Bioinforma Oxf Engl. 2012;28:1652–1654. doi: 10.1093/bioinformatics/bts247.
- 32. Leprevost FV, et al. Pinpointing differentially expressed domains in complex protein mixtures with the cloud service of PatternLab for Proteomics. J Proteomics. 2013;89:179–182. doi: 10.1016/j.jprot.2013.06.013.
- 33. Leprevost FV, et al. PepExplorer: A Similarity-driven Tool for Analyzing de Novo Sequencing Results. Mol Cell Proteomics MCP. 2014;13:2480–2489. doi: 10.1074/mcp.M113.037002.
- 34. Fischer JSG, et al. A scoring model for phosphopeptide site localization and its impact on the question of whether to use MSA. J Proteomics. 2015. doi: 10.1016/j.jprot.2015.01.008.
- 35. Fischer JSG, et al. Dynamic proteomic overview of glioblastoma cells (A172) exposed to perillyl alcohol. J Proteomics. 2010;73:1018–1027. doi: 10.1016/j.jprot.2010.01.003.
- 36. Carvalho PC, et al. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data. Proteome Sci. 2009;7:6. doi: 10.1186/1477-5956-7-6.
- 37. Lima DB, et al. SIM-XL: A powerful and user-friendly tool for peptide cross-linking analysis. J Proteomics. 2015. doi: 10.1016/j.jprot.2015.01.013.
- 38. Borges D, et al. Using SIM-XL to identify and annotate cross-linked peptides analyzed by mass spectrometry. Protoc Exch. 2015. doi: 10.1038/protex.2015.015.
- 39. Carvalho PC, Yates JR 3rd, Barbosa VC. Analyzing shotgun proteomic data with PatternLab for proteomics. Curr Protoc Bioinforma Ed Board Andreas Baxevanis Al. 2010:1–15. Chapter 13, Unit 13.13. doi: 10.1002/0471250953.bi1313s30.
- 40. Carvalho PC, et al. Search engine processor: Filtering and organizing peptide spectrum matches. Proteomics. 2012;12:944–949. doi: 10.1002/pmic.201100529.
- 41. Carvalho PC, Fischer JSG, Xu T, Yates JR 3rd, Barbosa VC. PatternLab: from mass spectra to label-free differential shotgun proteomics. Curr Protoc Bioinforma Ed Board Andreas Baxevanis Al. 2012. Chapter 13, Unit 13.19. doi: 10.1002/0471250953.bi1319s40.
- 42. Eng JK, Jahan TA, Hoopmann MR. Comet: an open-source MS/MS sequence database search tool. Proteomics. 2013;13:22–24. doi: 10.1002/pmic.201200439.
- 43. Richards AL, et al. One-hour proteome analysis in yeast. Nat Protoc. 2015;10:701–714. doi: 10.1038/nprot.2015.040.
- 44. UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2013;41:D43–47. doi: 10.1093/nar/gks1068.
- 45. Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods. 2007;4:207–214. doi: 10.1038/nmeth1019.
- 46. Cottrell JS, Creasy DM. Response to: The Problem with Peptide Presumption and Low Mascot Scoring. J Proteome Res. 2011;10:5272–5273. doi: 10.1021/pr200726c.
- 47. Bandeira N. Spectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications. BioTechniques. 2007;42:687, 689, 691 passim. doi: 10.2144/000112487.
- 48. Na S, Bandeira N, Paek E. Fast multi-blind modification search through tandem mass spectrometry. Mol Cell Proteomics MCP. 2012;11:M111.010199. doi: 10.1074/mcp.M111.010199.
- 49. Shevchenko A, et al. Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal Chem. 2001;73:1917–1926. doi: 10.1021/ac0013709.
- 50. Xu T, et al. ProLuCID, a Fast and Sensitive Tandem Mass Spectra-based Protein Identification Program. Mol Cell Proteomics. 2006;5:S174.
- 51. Borges D, et al. Effectively addressing complex proteomic search spaces with peptide spectrum matching. Bioinforma Oxf Engl. 2013;29:1343–1344. doi: 10.1093/bioinformatics/btt106.
- 52. Zybailov B, et al. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res. 2006;5:2339–2347. doi: 10.1021/pr060161n.
- 53. Thompson A, et al. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem. 2003;75:1895–1904. doi: 10.1021/ac0262560.
- 54. McAlister GC, et al. MultiNotch MS3 Enables Accurate, Sensitive, and Multiplexed Detection of Differential Expression across Cancer Cell Line Proteomes. Anal Chem. 2014;86:7150–7158. doi: 10.1021/ac502040v.
- 55. Picotti P, Aebersold R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat Methods. 2012;9:555–566. doi: 10.1038/nmeth.2015.
- 56. Vizcaíno JA, et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013;41:D1063–1069. doi: 10.1093/nar/gks1262.
- 57. Chambers MC, et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. 2012;30:918–920. doi: 10.1038/nbt.2377.
- 58. Martens L, et al. mzML--a Community Standard for Mass Spectrometry Data. Mol Cell Proteomics. 2011;10:R110.000133. doi: 10.1074/mcp.R110.000133.
- 59. McDonald WH, et al. MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications. Rapid Commun Mass Spectrom RCM. 2004;18:2162–2168. doi: 10.1002/rcm.1603.
- 60. Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods. 2014;11:1114–1125. doi: 10.1038/nmeth.3144.
- 61. Carvalho PC, Yates JR 3rd, Barbosa VC. Improving the TFold test for differential shotgun proteomics. Bioinforma Oxf Engl. 2012;28:1652–1654. doi: 10.1093/bioinformatics/bts247.
- 62. de Miguel N, et al. Proteome Analysis of the Surface of Trichomonas vaginalis Reveals Novel Proteins and Strain-dependent Differential Expression. Mol Cell Proteomics. 2010;9:1554–1566. doi: 10.1074/mcp.M000022-MCP201.
- 63. Clair G, Armengaud J, Duport C. Restricting Fermentative Potential by Proteome Remodeling: an adaptive strategy evidenced in Bacillus cereus. Mol Cell Proteomics. 2012;11:M111.013102. doi: 10.1074/mcp.M111.013102.
- 64. Webb KJ, Xu T, Park SK, Yates JR. Modified MuDPIT separation identified 4488 proteins in a system-wide analysis of quiescence in yeast. J Proteome Res. 2013;12:2177–2184. doi: 10.1021/pr400027m.
- 65. Christie-Oleza JA, Piña-Villalonga JM, Bosch R, Nogales B, Armengaud J. Comparative proteogenomics of twelve Roseobacter exoproteomes reveals different adaptive strategies among these marine bacteria. Mol Cell Proteomics MCP. 2012;11:M111.013110. doi: 10.1074/mcp.M111.013110.
- 66. Christie-Oleza JA, Fernandez B, Nogales B, Bosch R, Armengaud J. Proteomic insights into the lifestyle of an environmentally relevant marine bacterium. ISME J. 2012;6:124–135. doi: 10.1038/ismej.2011.86.
- 67. Chaves DFS, et al. Comparative proteomic analysis of the aging soleus and extensor digitorum longus rat muscles using TMT labeling and mass spectrometry. J Proteome Res. 2013;12:4532–4546. doi: 10.1021/pr400644x.
- 68. Shah M, et al. Proteomic Analysis of the Endosperm Ontogeny of Jatropha curcas L. Seeds. J Proteome Res. 2015;14:2557–2568. doi: 10.1021/acs.jproteome.5b00106.