JCO Clinical Cancer Informatics. 2022 Aug 25;6:e2200032. doi: 10.1200/CCI.22.00032

MTPpilot: An Interactive Software for Visualization of Next-Generation Sequencing Results in Molecular Tumor Boards

Abdullah Kahraman 1,2, Fabian M Arnold 1, Jacob Hanimann 1, Marta Nowak 1, Chantal Pauli 1,3, Christian Britschgi 4, Holger Moch 1,3, Martin Zoche 1
PMCID: PMC9470140  PMID: 36007219

PURPOSE

Comprehensive targeted next-generation sequencing (NGS) panels are routinely used in modern molecular cancer diagnostics. In molecular tumor boards, the detected genomic alterations are often discussed to decide the next treatment options for patients with cancer. With the increasing size and complexity of NGS panels, the discussion of these results becomes increasingly complex, especially if they are reported in a text-based form, as is standard in current molecular pathology.

METHODS

We have developed the Molecular Tumor Profiling Pilot (MTPpilot) webservice using HTML, PHP, JavaScript, and MySQL to support the clinical discussion of NGS results at molecular tumor boards.

RESULTS

MTPpilot integrates various public genome, network, and cancer mutation databases with interactive visualization tools to assess the functional impact of mutations and support clinical decision making at tumor boards.

CONCLUSION

MTPpilot is tailored for discussion of NGS gene panel results at molecular tumor boards. It is freely available as a webservice at MTPpilot.

INTRODUCTION

Next-generation sequencing (NGS) has become a cornerstone of clinical decision making in oncology. NGS cancer panels are widely used and constitute an integral part of modern molecular diagnostics.1 Results of NGS cancer panels are often discussed at molecular tumor boards, where oncologists, molecular pathologists, and staff scientists decide on the best treatment options for a patient.2-6 With the transition from small panels covering hotspot regions of actionable genes to larger comprehensive panels, NGS results have become more challenging to understand and interpret.7 Typically, only a subset of the detected alterations are well-known actionable mutations. Most alterations are scarcely described or are of unknown significance.8 Different institutions and private companies have tackled this issue by providing automated molecular reports that classify alterations according to evidence in the literature or cancer databases. However, these reports are generally text-heavy and lack visualization of the results, which often makes them cumbersome to read, especially when many alterations are reported. In addition, text-based reports are less convenient for discussion at molecular tumor boards, where interactive analysis of the results is preferred.

CONTEXT

  • Key Objective: How can complex next-generation sequencing results from large cancer panels be effectively used for clinical decision making at molecular tumor boards?

  • Knowledge Generated: We developed a webservice called Molecular Tumor Profiling Pilot (MTPpilot) to automatically annotate and interactively visualize genomic alterations from comprehensive genomics profiling assays.

  • Relevance: For many years, a local version of MTPpilot has supported clinical decision making at the weekly molecular tumor boards of the Comprehensive Cancer Center in Zürich (CCCZ). We offer MTPpilot as a free webservice to the clinical and bioinformatics community worldwide, with tools and annotations for the analysis and interpretation of next-generation sequencing results similar to those of our local version.

Several solutions tackling this problem have been recently developed, such as cBioPortal, Swiss-PO, or AML Varan. cBioPortal visualizes NGS results of large research cohorts like The Cancer Genome Atlas (TCGA).9,10 Swiss-PO is a tool for structural analysis of mutations in common cancer genes.11 AML Varan is a comprehensive solution, which offers the complete workflow from raw data analysis to medical report generation, but lacks the interactive visualization for molecular tumor boards.12 Local software solutions have also been developed by commercial providers but require access fees. With the increasing complexity of NGS data, a simple comprehensive software solution is required that integrates all aspects of NGS data analysis in the context of precision oncology.

Here, we have developed the Molecular Tumor Profiling Pilot (MTPpilot) software in close collaboration with oncologists and pathologists within the molecular tumor board of the University Hospital Zurich. MTPpilot is tailored for various comprehensive panels and provides a set of fully automated annotations and interactive tools for a real-time interpretation of genomic alterations from several different perspectives. To our knowledge, this is the first freely available software that offers a holistic set of tools to easily analyze NGS data and to aid clinical decision making at molecular tumor boards. MTPpilot is freely available as a web service.13

METHODS

Database Sources

MTPpilot uses at its core a MySQL database populated with tables from other publicly available databases that are routinely used in clinical applications. The public databases form a reference, evidence, and annotation data layer within MTPpilot. They are updated every 6 months. The web application is implemented with PHP for the backend, and HTML and JavaScript for the frontend (Fig 1).

FIG 1.

The MTPpilot software has at its core a MySQL database with tables holding data from various publicly available databases and a web interface implemented in PHP, HTML, and JavaScript. The publicly available databases function as a reference, evidence, or annotation source. Mutation data, biomarker information, and pathology data are provided by the user via the MTPpilot website and are matched automatically against MTPpilot DB. The matching results are presented on a website in a tabular form and extended with interactive visualizations, including many out links to the reference, evidence, annotation, and other databases.

Reference Databases

Reference databases provide information on the genomic location of genes, transcripts, exons, and proteins. For MTPpilot, we use the Ensembl database version 105.14 Genomic location information for GRCh37 coordinates including UniProt ID15 for chromosomes 1 to 22, X, Y, and M was downloaded using the BioMart data mining tool provided on the Ensembl website. Haplotypes and genome patches were ignored.

Evidence Databases

MTPpilot uses Evidence databases to assess the pathogenicity of mutations. At the moment, the Evidence databases include TCGA, ClinVar, gnomAD, tumorfusions.org,16 and the ARUP database. Because of licensing issues, the inclusion of the COSMIC database had to be dropped.

Because TCGA is the largest exome cancer mutation database, we downloaded from the TCGA data repository17 MuTect2-annotated simple nucleotide variation VCF files for 11,037 whole-exome sequencing cases covering 33 TCGA projects. In addition, 11,104 gene-level copy number data sets for the same 33 TCGA projects were downloaded.

To obtain information on the clinical relevance and pathogenicity of mutations, we downloaded from ClinVar 862,195 GRCh37 variants via the tab-separated file variant_summary.txt.gz that is provided on ClinVar's FTP server.18

To judge whether a mutation is benign, we downloaded the gnomAD GRCh37 allele frequency file (af-only-gnomad.raw.sites.vcf) from GATK's best practice Google Cloud repository.19 The GRCh37 variants can be found at Somatic b37 gnomAD sites.20

As no gene fusion data are available at the TCGA data repository, we took all 20,731 gene fusion events from Supplementary Table S1 of the recent publication on the tumorfusions.org database.16

The ARUP database is a gold-standard reference database for classifying benign and pathogenic BRCA1 and BRCA2 mutations in the clinic. We downloaded the database directly from the ARUP tables at BRCA1 Database21 and BRCA2 Database.22

Annotation Databases

Annotation databases provide MTPpilot with various sets of functional information on mutations. MTPpilot's annotation databases include the Protein Data Bank (PDB), STRING23 protein interaction database, SMART domain database, Gene Ontology (GO) database,24 KEGG pathway database,25 and ConSurf protein conservation database.26

MTPpilot maps mutations on PDB structures to visualize their potential effect on protein functions. The amino acid sequence in PDB structures, however, often deviates from the sequence in protein databases, which is why a simple amino acid number selection in PDB structures is difficult. To circumvent this problem, we pairwise aligned each PDB sequence to its associated UniProt sequence and stored an amino acid mapping table into the MTPpilot database.
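As an illustration of such a mapping table, the following Python sketch walks over one pairwise alignment and records which PDB sequence index corresponds to which UniProt position. It is not the MTPpilot code: the function name `residue_mapping`, the `-` gap character, the sequential 1-based PDB index, and the toy sequences are assumptions, and the alignment itself is assumed to come from an external alignment tool.

```python
def residue_mapping(aligned_uniprot, aligned_pdb, gap="-"):
    """Map UniProt positions to PDB sequence indices (both 1-based).

    Both arguments are the two rows of one pairwise alignment and must have
    equal length. Positions aligned to a gap have no counterpart and are
    left out of the mapping.
    """
    if len(aligned_uniprot) != len(aligned_pdb):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    uniprot_pos = 0
    pdb_pos = 0
    for u, p in zip(aligned_uniprot, aligned_pdb):
        if u != gap:
            uniprot_pos += 1
        if p != gap:
            pdb_pos += 1
        if u != gap and p != gap:
            mapping[uniprot_pos] = pdb_pos
    return mapping


# Toy alignment (hypothetical): the structure lacks the first two residues
# and residue 6 of the UniProt sequence.
mapping = residue_mapping("MKTAYIAK", "--TAY-AK")
print(mapping)  # {3: 1, 4: 2, 5: 3, 7: 4, 8: 5}
```

A lookup such as `mapping.get(6)` returns `None`, which signals that the mutated position is not resolved in this structure.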

URLs for retrieving secondary structure images were obtained from the SMART and STRING interaction network developers. The URLs are adjusted at runtime during result page loading to accommodate a red label at the location of the mutation in the protein sequence.

To obtain information on a gene's molecular function, cellular location, and biologic process, GO information for each gene was retrieved via the BioMart data mining tool as described above. Information on which cancer pathway a gene is participating was retrieved via the KEGG pathway database and KEGG Markup Language (KGML) files that are available at the KEGG pathway websites. The KGML files were downloaded on March 19, 2019, and processed to extract the pathway name, KEGG pathway ID, and gene name. The resulting files were uploaded as an SQL table to the MTPpilot DB.
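The KGML processing step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original processing script: it assumes that the pathway ID and name are stored in the `name` and `title` attributes of the `pathway` element and that the gene symbol is the first comma-separated item in the `graphics` name of each `entry` of type `gene`. The sample document is hand-written for illustration and the function name `kgml_gene_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML-like sample, for illustration only.
SAMPLE_KGML = """<pathway name="path:hsa05223" org="hsa" number="05223" title="Non-small cell lung cancer">
  <entry id="1" name="hsa:1956" type="gene">
    <graphics name="EGFR, ERBB1, HER1" type="rectangle"/>
  </entry>
  <entry id="2" name="hsa:3845" type="gene">
    <graphics name="KRAS, K-RAS2A" type="rectangle"/>
  </entry>
  <entry id="3" name="cpd:C00076" type="compound">
    <graphics name="C00076" type="circle"/>
  </entry>
</pathway>"""


def kgml_gene_rows(kgml_text):
    """Return (pathway name, pathway ID, gene name) rows for all gene entries."""
    root = ET.fromstring(kgml_text)
    pathway_id = root.get("name", "").replace("path:", "")
    pathway_name = root.get("title", "")
    rows = []
    for entry in root.iter("entry"):
        if entry.get("type") != "gene":
            continue  # skip compounds, maps, and other non-gene entries
        graphics = entry.find("graphics")
        if graphics is None or not graphics.get("name"):
            continue
        # Assumption: the first listed name is the primary gene symbol.
        symbol = graphics.get("name").split(",")[0].strip()
        rows.append((pathway_name, pathway_id, symbol))
    return rows


rows = kgml_gene_rows(SAMPLE_KGML)
for row in rows:
    print("\t".join(row))
```

The resulting tab-separated rows have the three columns named in the text and could be loaded into an SQL table.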

The ConSurf algorithm was applied to all human canonical protein sequences in UniProt. ConSurf is one of the first algorithms to include phylogenetic relations in estimating evolutionary conservation and has a proven track record of identifying functionally important amino acids.26 A copy of the ConSurf software was provided on request by the ConSurf developers. The software was run against the UniRef90 database (downloaded on December 29, 2020) with a maximum of 300 homologs per ConSurf calculation, one iteration of PSI-BLAST search with an E-value < 0.0001, and the MUSCLE multiple sequence alignment program.27 Subsequently, the .grades output files of ConSurf were parsed and the conservation scores were saved as a table in the MTPpilot DB.

All files generated as described above were further filtered for 1,035 relevant genes that are part of Foundation Medicine's FoundationOne CDx panel,28 Illumina's TruSight Oncology 500 panel,29 and Thermo Fisher’s Oncomine Focus and Comprehensive Assay panels.30

Implementation of Interactive Visualizations

MTPpilot provides various interactive visualization tools for exploring the mutational data.

Ideogram

For drawing a chromosome ideogram that provides a quick summary view of all mutations and mutated chromosomes, MTPpilot uses the ideogram.js library version 1.5.0 developed by Eric Weitz.31 In the ideogram, short variants are shown as green circles, amplifications as red squares, copy number losses as blue squares, and fusions as pink triangles. Variants of unknown significance are shown in the same colors but opaque.

TCGA Histology Matcher

For each histology of the TCGA short nucleotide variant (SNV) data set, the top 20 mutated genes were calculated and the cancer histologies were categorized into 37 cancer tissues. The user first selects a tissue and subsequently the associated TCGA histology that best matches the uploaded data. The SNVs provided by the user are then matched against the top 20 mutated genes of that histology in the TCGA data set.
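The matching logic can be sketched in a few lines of Python. The sketch assumes that genes are ranked by the number of TCGA cases in which they are mutated, which the text does not specify. The function names and the toy data are hypothetical.

```python
from collections import Counter


def top_mutated_genes(cases, k=20):
    """Rank genes by the number of cases in which they are mutated.

    cases: iterable of sets of gene symbols, one set per TCGA case.
    """
    counts = Counter(gene for genes in cases for gene in genes)
    return [gene for gene, _ in counts.most_common(k)]


def match_histology(user_genes, top_genes):
    """Return the top genes of a histology that are also mutated in the user's case."""
    return [gene for gene in top_genes if gene in user_genes]


# Toy histology with three cases (hypothetical data); k=2 keeps the example short.
tcga_cases = [{"TP53", "KRAS"}, {"TP53", "EGFR"}, {"TP53", "KRAS", "STK11"}]
top = top_mutated_genes(tcga_cases, k=2)
matched = match_histology({"KRAS", "BRAF"}, top)
print(top, matched)  # ['TP53', 'KRAS'] ['KRAS']
```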

TCGA Patient Matcher

MTPpilot’s TCGA patient matcher compares the mutational profile of a tumor board case with MTPpilot’s TCGA database. For the comparison, all pathogenic mutations of the tumor board case are matched against all mutations in the TCGA database. A float number score is computed for each match in the format n.m, where the predecimal number n is the number of identical mutated genes between the tumor board case and a TCGA case and the decimal number m is the number of identical mutations. The value of m is always equal to or smaller than the value of n, depending on whether all or some mutations are identical between the tumor board case and a TCGA case. Two cases with the same number of identical mutated genes n but different numbers of identical mutations are ranked such that the case with a larger m is higher than the case with the smaller m. The score can be found by hovering over the values in the Similarity column. To put the score into context and report the score in percent similarity, we divide each score by the score of the tumor board case. As a result, the tumor board case is always 100% similar to itself and equal or less similar to the other TCGA cases.
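The scoring scheme can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the MTPpilot implementation: mutations are represented as hypothetical `(gene, protein change)` pairs, the pair `(n, m)` is kept for ranking, and the n.m value is built by string formatting before the division by the self-score.

```python
def match_score(case, other):
    """Return (n, m): shared mutated genes and shared identical mutations."""
    shared_genes = {gene for gene, _ in case} & {gene for gene, _ in other}
    shared_mutations = case & other
    return len(shared_genes), len(shared_mutations)


def score_as_float(n, m):
    """Encode the pair as the n.m number described in the text."""
    return float(f"{n}.{m}")


def percent_similarity(case, other):
    """Score against another case, relative to the score of the case against itself."""
    if not case:
        raise ValueError("the tumor board case must contain at least one mutation")
    reference = score_as_float(*match_score(case, case))
    return 100.0 * score_as_float(*match_score(case, other)) / reference


# Hypothetical tumor board case and TCGA case.
board_case = {("EGFR", "L858R"), ("TP53", "R273H"), ("KRAS", "G12D")}
tcga_case = {("EGFR", "L858R"), ("TP53", "R175H")}

print(match_score(board_case, tcga_case))                    # (2, 1)
print(round(percent_similarity(board_case, tcga_case), 1))   # 63.6
print(percent_similarity(board_case, board_case))            # 100.0
```

Sorting TCGA cases by the `(n, m)` pair in descending order reproduces the ranking rule in the text. The pair also avoids the ambiguity of the decimal form once m reaches two digits, where 12.1 and 12.10 would be the same number.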

TCGA Prevalence Viewer

The prevalence viewer eases the recognition of hotspot mutations in cancer genes by providing a bar plot of mutational frequencies for all amino acid positions, ordered from left to right from higher to lower frequency. The frequencies are computed on the basis of more than 1,754,000 SNVs from TCGA. Hotspot regions are typically mutated more frequently than other regions of the gene. To determine hotspot regions, we first excluded all amino acid positions with a single mutation in TCGA. Next, using the frequencies of the remaining positions, we computed the mean and standard deviation of the frequencies for the gene and defined a position as a hotspot region if its frequency is higher than 10 and higher than the sum of the mean and standard deviation.
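The hotspot rule can be written down as a short Python sketch. The text does not state whether the sample or the population standard deviation was used, so the sample standard deviation (`statistics.stdev`) is an assumption here, as are the function name and the toy counts.

```python
from statistics import mean, stdev


def hotspot_positions(counts, min_count=10):
    """Return amino acid positions that qualify as hotspots.

    counts: dict mapping amino acid position -> number of mutations in TCGA.
    """
    # Step 1: drop positions that were mutated only once.
    recurrent = {pos: c for pos, c in counts.items() if c > 1}
    if len(recurrent) < 2:
        return []  # a standard deviation needs at least two values
    # Step 2: threshold = mean + standard deviation of the remaining counts.
    values = list(recurrent.values())
    threshold = mean(values) + stdev(values)
    # Step 3: a hotspot must exceed both the fixed and the gene-specific threshold.
    return sorted(pos for pos, c in recurrent.items()
                  if c > min_count and c > threshold)


# Hypothetical per-position mutation counts for one gene.
counts = {858: 60, 746: 40, 790: 12, 719: 5, 768: 3, 200: 2, 100: 1}
print(hotspot_positions(counts))  # [858]
```

In this toy example the mean of the recurrent positions is about 20.3 and the standard deviation about 24.1, so only position 858 passes the combined threshold of about 44.4.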

MTP3D PDB Viewer

The MTP3D PDB viewer is based on the JavaScript NGL Viewer library version 2.0.0-dev.39.32 By default, the MTP3D viewer shows the PDB structure in cartoon representation with an opaque visualization of the molecular surface. Small molecules and HETATM groups are represented as multicolored spheres. Buttons are available to toggle between structural representations of the protein, including cartoon, sticks + balls, spheres, and surface representations. Checkboxes allow the user to hide or show HETATMs, water + ions, all mutations in TCGA, or a protein structure with the wild-type amino acid at the mutation site. The latter shows any conformational change caused by the mutation.

By default, MTP3D shows a protein structure that was crystallized with the mutation of the case. If none is available, a structure that contains coordinates for the mutation site, eg, of the wild-type sequence, is displayed. In either case, a drop-down menu at the top edge of the MTP3D viewer allows the user to switch to any other protein structure of the mutated protein. To ease the selection, the PDB title is given next to the PDB ID. A small information panel at the top right corner of the MTP3D viewer presents information on the availability of the mutation in the protein structure, the resolution of the PDB structure, the number of residues in the structure, and the percentage of the original protein sequence covered by the PDB structure.

MTP Fusion Viewer

For rendering gene structures in fusions, MTPpilot uses the Snap.svg JavaScript library version 0.5.1.33 Exon and intron elements including untranslated regions are drawn on the basis of exon, intron, and untranslated regions start and end positions stored in MTPpilot's reference Ensembl database (see above). Protein family domains are indicated with additional green boxes labeled with protein family domain names, whose position is determined by amino acid start and end positions retrieved from the Ensembl database. Whether a fusion event is in-frame or out-of-frame is determined via codon-phase information from Ensembl's BioMart data mining tool (see above).
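A common way to decide on the reading frame from exon phases is sketched below in Python. The sketch is an assumption about how such a check can be done, not the MTPpilot source: it assumes that both breakpoints lie in introns so that whole exons are joined, and that phases follow the Ensembl convention of 0, 1, or 2 for coding exon boundaries and -1 for non-coding ones. The function name is hypothetical.

```python
CODING_PHASES = (0, 1, 2)


def is_in_frame(five_prime_end_phase, three_prime_start_phase):
    """Check whether joining two exons preserves the reading frame.

    five_prime_end_phase: end phase of the last retained exon of the 5' partner.
    three_prime_start_phase: start phase of the first retained exon of the 3' partner.
    """
    if five_prime_end_phase not in CODING_PHASES:
        return False  # breakpoint lies outside the coding region of the 5' gene
    if three_prime_start_phase not in CODING_PHASES:
        return False  # breakpoint lies outside the coding region of the 3' gene
    # The frame is kept when the 3' exon continues the codon where the 5' exon stopped.
    return five_prime_end_phase == three_prime_start_phase


print(is_in_frame(0, 0))    # True
print(is_in_frame(1, 2))    # False
print(is_in_frame(-1, 0))   # False
```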

GDPR Compliance

The MTPpilot website does not collect any personally identifiable information and anonymizes any collected data. Users cannot be identified and are never tracked across websites. No website cookies are used. Thus, the website is GDPR-compliant.

We urge users not to upload any germline mutations that could potentially reveal the identity of a patient. All data uploaded to the MTPpilot web server should be anonymized before upload. All data of a case are automatically deleted after 30 days via a MySQL Event Scheduling job.

RESULTS

Data Upload and Graphical Interface

Users can either upload files in tab-separated value (tsv) format or enter the mutational data manually via a table upload form. The tsv format is specified by an example file available in the csv upload section. The upload is limited to 200 lines of alterations. After uploading, the user can visually inspect the data in a table view. Conventions for annotating nucleotide and amino acid changes are given in a tutorial section. After successful submission of the data, the user is redirected to the graphical interface of the MTPpilot software (Fig 2).
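Users who prepare upload files programmatically may want to check the line limit beforehand. The following Python sketch reads a tab-separated file and rejects it when it exceeds 200 alteration rows. The column names (`gene`, `protein_change`, `vaf`) are hypothetical placeholders, since the actual columns are defined by the example file on the website, and the sketch assumes that the header line does not count toward the limit.

```python
import csv
import io

MAX_ALTERATIONS = 200  # upload limit stated in the text


def read_alterations(handle, max_rows=MAX_ALTERATIONS):
    """Read tab-separated alterations and enforce the row limit."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        if len(rows) >= max_rows:
            raise ValueError(f"more than {max_rows} alterations in upload")
        rows.append(row)
    return rows


# Hypothetical two-line upload with placeholder column names.
SAMPLE_TSV = "gene\tprotein_change\tvaf\nEGFR\tL858R\t0.31\nTP53\tR273H\t0.45\n"
rows = read_alterations(io.StringIO(SAMPLE_TSV))
print(len(rows), rows[0]["gene"])  # 2 EGFR
```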

FIG 2.

The MTPpilot graphical interface. The interface is subdivided into (1) a case-specific bookmark; (2) a header with case information; (3) a mutational profile analysis panel, which features the biomarkers provided by the user, profile and patient matching tools, and a plot of the mutations on the affected chromosomes (ideogram); (4) a short variant panel, offering several tools to analyze short nucleotide variants; (5) a copy-number-alterations panel; and (6) a panel for the inspection of rearrangements, such as fusions or truncations.

Case Bookmark and Case Information

The case bookmark includes a five-character alphanumeric hash ID that a user can use to retrieve the current session after closing the MTPpilot website, without having to upload the data again. Note that no permission check is performed when an MTPpilot page is loaded via a bookmark. Each bookmark is valid for 30 days, after which all data associated with the bookmark are automatically deleted.
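The bookmark mechanism can be pictured with the following Python sketch, which generates a five-character alphanumeric ID and checks the 30-day validity. How MTPpilot actually derives its IDs is not described, so the random generation, the mixed-case alphabet, and the function names are assumptions. In the web service itself the deletion is performed by a MySQL event, not by application code like this.

```python
import secrets
import string
from datetime import datetime, timedelta

ALPHABET = string.ascii_letters + string.digits  # assumption: case-sensitive IDs
VALID_DAYS = 30  # validity period stated in the text


def new_bookmark_id(length=5):
    """Draw a random alphanumeric bookmark ID."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_expired(created_at, now, valid_days=VALID_DAYS):
    """A case expires once more than valid_days have passed since submission."""
    return now - created_at > timedelta(days=valid_days)


bookmark = new_bookmark_id()
submitted = datetime(2022, 1, 1, 12, 0)
print(bookmark)
print(is_expired(submitted, datetime(2022, 1, 20, 12, 0)))  # False
print(is_expired(submitted, datetime(2022, 2, 5, 12, 0)))   # True
```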

The case information displays the case identifier, the tissue and the histology provided by the user, and the submission time and date.

Genomic Biomarkers and Mutational Profiles

The genomic biomarkers and mutational profile panel (Fig 3) displays the tumor mutational burden,34 microsatellite status,35 and loss of heterozygosity score.36 The profile matching section offers two matching tools: (1) a TCGA histology matcher and (2) a TCGA patient matcher. In the TCGA histology matcher, SNVs are matched against the top 20 mutated genes for a preselected cancer histology in the TCGA data set.10 Via a pop-up, a histogram highlights the match between TCGA and user-provided altered genes (Fig 3A). This allows the user to assess how well the provided mutations fit the selected histology. In the TCGA patient matcher tool, all alterations are matched against the mutations of each TCGA sample. Matched samples are listed in table form ordered by similarity, and the tissue frequencies of matched samples are reported graphically (Fig 3B). This allows the user to identify TCGA samples with similar alteration profiles. These tools are especially beneficial for checking and predicting cancer histologies, eg, in the case of cancers of unknown primary. All mutations are plotted on an ideogram showing the alterations by type (single-nucleotide variants, amplifications, losses, and rearrangements) on the affected chromosomes (Fig 2). The ideogram gives an overview of the mutational landscape of a case and helps to identify, for example, highly mutated tumors or, in the case of many copy number alterations, homologous recombination deficiency.

FIG 3.

Profile analysis with TCGA data. (A) In the profile matching tool, the altered genes are matched against the top 20 most frequently altered genes of the TCGA histology (adenocarcinoma, NOS [samples N = 328]) provided by the user. Matched genes are highlighted in red. (B) Alterations are matched against each TCGA sample in the TCGA patient matcher tool. The most similar samples are displayed in table form, and the tissue distribution of these samples is reported in a pie chart. The user can manually set the minimum similarity threshold for matching. TCGA, The Cancer Genome Atlas.

Short Variants Analysis

The short variants analysis (SNV) panel offers comprehensive annotations and several tools for the inspection of SNVs, insertions and deletions (indels), and frameshifts (Fig 4A). Variants of unknown significance can be displayed or hidden by using the toggle switch at the top of the panel. The first five columns provide standard annotations such as protein sequence, coding sequence, and exon annotations with outlinks to bioinformatic databases (Fig 4A, columns 1-5). The TCGA prevalence tool shows a histogram of the most frequently mutated amino acid positions within a gene in the TCGA data set (Fig 4C and Fig 4A, column 7). The tool helps to determine whether a user-provided alteration lies in a mutational hotspot. The evidence column indicates whether an alteration is described in the TCGA, ClinVar,37 ARUP,38 or gnomAD39 databases. If no evidence is available, the database links are greyed out (Fig 4A, column 8). The SMART40 domain viewer shows the position of the alteration within the secondary structure of the protein and conserved domains (Fig 4D and Fig 4A, column 9). The MTP3D tool displays the spatial coordinates of the altered amino acid on protein structures of the PDB41 (Fig 4B and Fig 4A, column 11).

FIG 4.

Features of the single-nucleotide variant panel. (A) The different columns of the panel comprising annotation and links to interactive tools are highlighted by numbers: (1) protein name link to the UniProt entry; (2) variant allele frequency provided by the user; (3) protein mutation; (4) nucleotide alteration; (5) chromosome and exon number of the alteration; (6) TCGA amino acid position prevalence tool; (7) further information such as chromosomal coordinates or amino acid conservation score; (8) evidence inspection tool comprising the databases TCGA, ClinVar, gnomAD, and for BRCA genes, ARUP; (9) visualization of the alteration (red tag) within the 2-dimensional SMART domains of the protein; (10) biologic role of the gene according to KEGG pathways; (11) link to the MTP3D viewer tool for analysis of alterations on three-dimensional protein structures; and (12) further links to other databases and NGS panels. (B) The MTP3D viewer tool displays the altered amino acid as a sphere model on the structure with the highest resolution of the affected protein in cartoon representation. Different views such as stick and balls, spheres, or surfaces can be chosen, as well as other structures from the PDB database. In this example, the EGFR alteration C797S is displayed on PDB ID 6SBA. (C) The TCGA prevalence tool shows the frequency of alterations at a given amino acid position for the affected protein (gene EGFR) and highlights the alteration provided by the user in red. In this example, the EGFR E746 position was altered and is highlighted as one of the 20 most common mutated amino acid positions of EGFR. (D) The SMART domain tool displays the location of the short nucleotide variant (red line) within the protein domains. In this example, the P53 R196* terminating mutation is shown within the DNA-binding domain of P53. NGS, next-generation sequencing; PDB, Protein Data Bank; TCGA, The Cancer Genome Atlas.

Copy Number Variants Panel

The copy number analysis panel (Fig 2) displays information such as the chromosomal location and the evidence in the TCGA data set and in ClinVar. This panel is best used together with the ideogram: the panel lists the copy number changes, whereas the ideogram highlights the distribution of gains and losses over the affected chromosomes.

Rearrangement Analysis Panel

The rearrangement analysis panel (Fig 2) displays the chromosomal coordinates of the fusion breakpoints, the TCGA evidence viewer, and a link to the MTP fusion viewer tool. The TCGA evidence viewer shows common fusion partners in the TCGA data set and the frequencies of the tissues in which the rearranged gene was observed. The MTP fusion viewer gives a graphical representation of the resulting fusion event. The user can choose between different protein transcripts and sequence orientations (Fig 5).

FIG 5.

The MTP fusion viewer tool. The tool displays a case of a canonical EML4-ALK fusion. Breakpoints of the involved genes provided by the user are highlighted in red. The tool allows the user to change the isoform of each of the two genes involved in the rearrangement. The orientation of the rearrangement can be flipped with the reverse selector.

DISCUSSION

In conclusion, at molecular tumor boards, patients are discussed in a limited amount of time. To support efficient discussions, MTPpilot offers automatic alteration annotations and numerous interactive tools for clinical NGS data interpretation. For several years, a local version of the MTPpilot application has been used at the molecular tumor board of the Comprehensive Cancer Center at the University Hospital Zurich in Switzerland. It is fully integrated into the local pathology, NGS laboratory, and mutation database infrastructure. By providing MTPpilot as a free web application, we want to offer a similar experience to the clinical community. The application is available at the URL.13 Contact the authors for a local integration of the software at a hospital.

ACKNOWLEDGMENT

We thank Domingo Aguilera for his continuous support in this project.

Footnotes

* A.K. and F.M.A. contributed equally to this work.

DATA SHARING STATEMENT

The MTPpilot database was built using publicly available data from the Ensembl database version 105, PDB (retrieved on January 11, 2021), gnomAD (af-only-gnomad.raw.sites.vcf from GATK best practice Google cloud repository), TCGA (retrieved on July 17, 2021), ClinVar (retrieved on January 10, 2022), the ARUP Database (retrieved on January 4, 2022), KEGG database (retrieved on March 19, 2019), tumorfusions.org (retrieved on August 13, 2021), and ConSurf (computed with UniProt data retrieved on December 29, 2020).

AUTHOR CONTRIBUTIONS

Conception and design: Abdullah Kahraman, Fabian M. Arnold, Marta Nowak, Holger Moch, Martin Zoche

Financial support: Holger Moch

Collection and assembly of data: Abdullah Kahraman, Fabian M. Arnold, Chantal Pauli

Data analysis and interpretation: Abdullah Kahraman, Fabian M. Arnold, Jacob Hanimann, Chantal Pauli, Christian Britschgi

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Christian Britschgi

Consulting or Advisory Role: AstraZeneca, Pfizer, Roche, Takeda, Janssen-Cilag, Boehringer Ingelheim, Roche

Travel, Accommodations, Expenses: AstraZeneca, Takeda

Holger Moch

Consulting or Advisory Role: Targovax

Travel, Accommodations, Expenses: Roche Pharma AG

Martin Zoche

Consulting or Advisory Role: GlaxoSmithKline, Bayer Schering Pharma

Research Funding: Roche

Travel, Accommodations, Expenses: F. Hoffmann-La Roche AG, Basel

No other potential conflicts of interest were reported.

Articles from JCO Clinical Cancer Informatics are provided here courtesy of American Society of Clinical Oncology
