Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2022 Jan 26.
Published in final edited form as: Nat Methods. 2021 Jun;18(6):580–582. doi: 10.1038/s41592-021-01176-6

PANOPLY: A cloud-based platform for automated and reproducible proteogenomic data analysis

D R Mani 1,*, Myranda Maynard 1, Ramani Kothadia 1, Karsten Krug 1, Karen E Christianson 1, David Heiman 2, Karl R Clauser 1, Chet Birger 2, Gad Getz 2,3,4, Steven A Carr 1
PMCID: PMC8791030  NIHMSID: NIHMS1765107  PMID: 34040252

Abstract

Proteogenomics involves the integrative analysis of genomic, transcriptomic, proteomic and post-translational modification data produced by next-generation sequencing and mass spectrometry-based proteomics. Several publications by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and others have highlighted the impact of proteogenomics in enabling deeper insight into the biology of cancer and identification of potential drug targets. In order to encapsulate the complex data processing required for proteogenomics, and provide a simple interface to deploy a range of algorithms developed for data analysis, we have developed PANOPLY—a cloud-based platform for automated and reproducible proteogenomic data analysis. A wide array of algorithms have been implemented, and we highlight the application of PANOPLY to the analysis of cancer proteogenomic data.


Proteogenomics involves the integrative analysis of genomic, transcriptomic, proteomic and post-translational modification (PTM) data produced by next-generation sequencing and mass spectrometry-based proteomics. Several publications by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and others have highlighted the impact of proteogenomics in enabling deeper insight into the biology of cancer and identification of potential drug targets14. In order to encapsulate the complex data processing required for proteogenomics, and provide a simple interface to deploy a range of algorithms developed for data analysis, we have developed PANOPLY—a cloud-based platform for automated and reproducible proteogenomic data analysis. A wide array of algorithms have been implemented, and we highlight the application of PANOPLY to the analysis of cancer proteogenomic data.

PANOPLY uses state-of-the-art statistical and machine learning algorithms to transform multi-omic data from cancer samples into biologically meaningful and interpretable results. PANOPLY provides a comprehensive collection of proteogenomic data analysis methods including sample QC (sample quality evaluation using profile plots and tumor purity scores1, identify sample swaps, etc.), association analysis, RNA and copy number correlation (to proteome), connectivity map (CMAP) analysis5, outlier analysis using BlackSheep6, PTM Signature Enrichment Analysis (PTM-SEA)7, Gene Set Enrichment Analysis (GSEA)8 and single-sample GSEA7, consensus clustering, and multi-omic clustering using non-negative matrix factorization (NMF) (Figure 1A). Most analysis modules include a report generation task that outputs an interactive HTML report summarizing results from the analysis (Figure 1B). A complete list of PANOPLY task modules with documentation can be found at the PANOPLY wiki (https://github.com/broadinstitute/PANOPLY/wiki). The types of liquid chromatography-tandem mass spectrometry-based (LC-MS/MS) proteomics data that are amenable to analysis by PANOPLY includes label-free and isobaric mass tag label-based LC-MS/MS approaches like iTRAQ and TMT profiling of the proteome and multiple PTM-omes including phospho-, acetyl- and ubiquitylomes. PANOPLY can analyze LC-MS/MS data input as (log-transformed) intensities or ratios.

Figure 1.

Figure 1.

Overview of PANOPLY architecture and the various tasks that constitute the complete workflow. Tasks can also be run independently, or combined into custom workflows, including new tasks added by users. (A) Inputs to PANOPLY consists of (i) externally characterized genomics and proteomics data (in gct format); (ii) sample phenotypes and annotations (in csv format); and (iii) parameters settings (in yaml format). Panoply modules are grouped into Data Preparation Modules (green box), Data Analysis Modules (blue box) and Report Modules (red box). Data preparation modules perform quality checks on input data followed by optional normalization and filtering for proteomics data. Analysis ready data tables are then used as inputs to the data analysis modules. Results from the data analysis modules are summarized in interactive reports generated by appropriate report modules. (B) A sampling of reports from PANOPLY for the tutorial BRCA dataset 1 that can be explored in a web browser. The full suite of reports (html) derived from the tutorial dataset can be accessed and explored at http://prot-shiny-vm.broadinstitute.org:3838/panoply-tutorial/reports, with an interpretation summary at https://github.com/broadinstitute/PANOPLY/wiki/Navigating-Results. Multi-omics NMF clustering: Heatmap (left) depicts multi-omics expression patterns across three NMF clusters derived from integrative clustering of copy-number alterations (CNA), RNA, protein and phosphosite data. Annotation tracks depict mutation status of key BRCA genes (TP53, GATA3, PIK3CA), hormone receptor status (ER, PR, HER2) and PAM50 transcriptional subtypes. Right heatmap shows activity scores of cancer hallmark pathways for the three NMF clusters. Association analysis: Interactive volcano plot illustrating differentially abundant signatures of cancer hallmark pathways between tumors with GATA3 mutations (right side) compared to GATA3 wild-type tumors (left side). A browseable table summarizes the results. RNA correlation analysis: Histogram of all (blue) and significant (red, FDR < 0.05) gene-level Pearson correlations between RNA and protein data. A table summarizing high-level results and an interactive correlation (y-axis) rank (x-axis) plot enables further exploration of the results. CNA correlation analysis: pairwise gene-level correlations between CNA (x-axis in both panels) and RNA (y-axis, left panel) and proteome (y-axis, right panel). Significant (FDR < 0.05) positive (red) and negative (green) correlations are indicated. CNA cis-effects appear as a red diagonal line, CNA trans-effects as vertical stripes. The histograms show the number of significant CNA trans-effects for each CNA gene. Outlier analysis: Phospho-level outlier analysis showing outliers in the HER2 PAM50 subtype in a searchable table and accompanying heatmap. Immune analysis: Computational approaches to infer immune/stromal tumor components, and immune subtype from RNA expression data are implemented in PANOPLY. Shown are immune scores derived from ESTIMATE9 (x-axis) and xCell10 (y-axis) for 77 BRCA tumors.

PANOPLY leverages Terra (http://app.terra.bio )—a Google Cloud-based cloud-native platform for extreme-scale data analysis, sharing, and collaboration—to host proteogenomic workflows, and is designed to be flexible, automated, reproducible, scalable, and secure. Using the underlying architecture of Terra, PANOPLY analysis algorithms and methods are implemented as tasks, interchangeably referred to as modules. A series of tasks constitutes a workflow, which represents a pipeline with outputs of one or more tasks feeding into inputs of downstream tasks. Tasks and workflows are defined using the Workflow Description Language (WDL), allowing users to customize pipelines and quickly add their own tasks to the PANOPLY workflow. The central element of PANOPLY is the Terra workspace—a shareable collection of everything needed for a data analysis project, including data located on cloud storage linked with the workspace, workflows encapsulating algorithms and pipelines, analysis parameters, results, and a data model that organizes sample meta-data into collections of participants, samples, and sample sets. With versioned workflows, workspaces enable reproducible research by completely encapsulating input data, analysis methods, settings, and results in a shareable cloud-based space.

PANOPLY v1.0 consists of the following components:

PANOPLY has been used for analyzing data from many CPTAC and other cancer proteogenomic landscape studies, and workspaces with results for published data sets, including CPTAC BRCA1,4 (https://app.terra.bio/#workspaces/broad-firecloud-cptac/PANOPLY_CPTAC_LUAD) and LUAD3 (https://app.terra.bio/#workspaces/broad-firecloud-cptac/PANOPLY_CPTAC_LUAD) are available. While PANOPLY has been developed and presented in the context of cancer proteogenomic studies, it can be applied to any proteogenomics study with appropriate data, vastly simplifying the analysis and integration process. While the current release of PANOPLY contains algorithms for integrative proteogenomic data analysis, future versions will include workflows to perform raw data characterization of genomics and proteomics data. New analysis methods from recent CPTAC proteogenomic studies will also be added in new releases. We envision that use of PANOPLY will provide a quick and comprehensive baseline analysis for proteogenomic studies, leading to many disease specific hypotheses that can be explored further using additional computational and wet-lab experiments.

Supplementary Material

Supplementary Table 1

Acknowledgements

This work was supported by grants from the National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium grants NIH/NCI U24CA210979 (to DRM and GG) and NIH/NCI U24-CA210986 and NIH/NCI U01 CA214125 (to SAC).

Footnotes

Competing Interests

The authors declare no competing interests.

Data and Code Availability

All code and data are accessible online at GitHub (https://github.com/broadinstitute/PANOPLY) and Terra (https://app.terra.bio/). More detailed information is provided in Supplementary Table 1.

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Table 1

Data Availability Statement

All code and data are accessible online at GitHub (https://github.com/broadinstitute/PANOPLY) and Terra (https://app.terra.bio/). More detailed information is provided in Supplementary Table 1.

RESOURCES