Cancer Informatics. 2019 Mar 13;18:1176935119835546. doi: 10.1177/1176935119835546

Visual Analytics of Genomic and Cancer Data: A Systematic Review

Zhonglin Qu 1, Chng Wei Lau 1, Quang Vinh Nguyen 1,2, Yi Zhou 1, Daniel R Catchpoole 3,4,5
PMCID: PMC6416684  PMID: 30890859

Abstract

Visual analytics and visualisation can leverage the human perceptual system to interpret and uncover hidden patterns in big data. The advent of next-generation sequencing technologies has allowed the rapid production of massive amounts of genomic data and created a corresponding need for new tools and methods for visualising and interpreting these data. Visualising genomic data requires more than simply plotting the data: it involves deciding what message a particular plot should convey and which methodologies should represent the results, so that clinicians, experts, and researchers have an easy, clear, and accurate way to interact with the data. Genomic data visual analytics is rapidly evolving in parallel with advances in high-throughput technologies and in fields such as artificial intelligence (AI) and virtual reality (VR). Personalised medicine requires new genomic visualisation tools that can efficiently extract knowledge from genomic data and speed up expert decisions about the best treatment for an individual patient’s needs. However, meaningful visual analytics of such large genomic data remains a serious challenge. This article provides a comprehensive systematic review and discussion of the tools, methods, and trends for visual analytics of cancer-related genomic data. We review methods for genomic data visualisation including traditional approaches such as scatter plots, heatmaps, coordinates, and networks, as well as emerging technologies using AI and VR. We also demonstrate the development of genomic data visualisation tools over time and analyse the evolution of visualising genomic data.

Keywords: multidimensional data, genomic data, analytics, visualisation, virtual reality, augmented reality, immersive, artificial intelligence, machine learning, personalised medicine

Introduction

Visual analytics of genomic data is widely used in biology to help understand the data, communicate its contents, generate ideas, and gain insight into biological processes. Visualisation plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as to communicate findings to others. Visual analytics combines visualisation with analysis tools to enable seamless use of both approaches for scientific enquiry and offers a powerful method for performing complex genomic analyses.

Genomics is the convergence of many sciences including genetics, molecular biology, biochemistry, statistics, and computer science.1 Since Gregor Mendel discovered the basic principles of heredity, which became the foundation of modern genetics,2 huge amounts of genomic data have been collected around the world by different organisations. For example, one of the world’s largest pharmaceutical companies, AstraZeneca, launched a massive effort to compile genome sequences and health records from 2 million people. The company and its collaborators hoped to unearth rare genetic sequences that are associated with diseases and with responses to treatment.3 By the end of 2003, the Human Genome Project4 had successfully completed the ambitious goal of collecting sequence code covering 3 billion base pairs in the human genome, 2 years ahead of the previous projects.5 Sequencing is becoming the most popular high-throughput technology, with applications including the study of various genetic diseases as well as drug design and discovery for those diseases. With the development of computer technologies, genomic data can be collected at a faster pace and at a lower cost. This has launched the age of individual genome sequencing, which supports an era of personalised medicine.6 Personalised cancer medicine based on the molecular characteristics of a tumour from an individual patient has great potential in the therapy of any type of cancer.7 DNA sequencing capacities continue to grow rapidly. If the growth continues at the current rate, doubling every 7 months, then we should reach more than 1 exabyte (10^18) of sequence per year in the next 5 years and then approach 1 zettabyte (10^21) of sequence per year by 2025.8
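The growth figures quoted above can be checked with a short calculation. The sketch below assumes only what the text states (steady doubling every 7 months, and the 1 exabyte and 1 zettabyte levels); the function name is ours and purely illustrative.

```python
import math

DOUBLING_MONTHS = 7  # doubling time quoted in the text


def months_to_grow(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for capacity to grow by `factor` under steady doubling."""
    return math.log2(factor) * doubling_months


# 1 exabyte (10**18) to 1 zettabyte (10**21) is a 1000-fold increase.
months = months_to_grow(10**21 / 10**18)
years = months / 12
print(f"{months:.1f} months, about {years:.1f} years")
```

A 1000-fold increase needs roughly ten doublings, so under this assumption the step from the exabyte to the zettabyte scale takes a little under 6 years, which is broadly in line with the timeline given above.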

In human health, the major need driven by the vast amount of genomic data is how to interpret genomic sequences and how to find patterns9 across large, high-dimensional collections. Data visualisation is a way to convey meaningful concepts in a universal manner that is rapid and efficient and can allow humans to find potential value in big data. How we visualise complex data is becoming an increasingly significant part of the cognitive system, and visualisation can provide the highest bandwidth channel from the computer to the human. The term visualisation, in the past, meant constructing a visual image in the mind. Now, it means a physical or graphical representation of data or concepts, such as the graphs that clinicians or researchers create from big genomic data. Visualisation, as a cognitive tool, has the following advantages: it provides an ability to comprehend huge amounts of data; allows the perception of emergent properties that were not anticipated; enables problems with the data to become immediately apparent; facilitates understanding of both large- and small-scale features of the data; and facilitates hypothesis formation.10-12 Some intuitive visualisation tools are used to visualise multidimensional cancer genomic data, and they integrate different types of alterations with clinical data to extract useful knowledge from the vast amount of data generated by high-throughput technologies.13,14

New technologies such as artificial intelligence (AI) and virtual reality (VR) or augmented reality (AR) are starting to be used in visual analytics of genomics. Artificial intelligence is already a part of our everyday lives and has been heralded as the key to our civilisation’s brightest future.15 Machine learning, as an approach to achieving AI, is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.16 Machine learning drives the next generation of visualisation, which is termed intelligent visualisation. Intelligent visualisation assists a human user in handling tedious or repetitive tasks by learning from previous sessions and input data. It incorporates machine-learning algorithms to make high-level, goal-oriented decisions, which makes data visualisation technology directly accessible to a wide range of application scientists.17,18

Intelligent data visualisation can be used to find the relationship between genomic data and diseases and to aid in the process of targeted personalised therapy.19 In the analysis of genomic data, current statistical analysis methods alone are not sufficient for gaining insight from the data. Meanwhile, applications of machine learning and data visualisation have become more attractive. Combining intelligent visualisation with machine-learning algorithms for genomic data is a big challenge and is becoming a new trend in the evolution of genomic visualisation. Some modern data visualisation tools use AI technology, modern three-dimensional (3D) plots, mobile devices, and VR or AR techniques to tell the full story of genomic data. Three-dimensional and VR/AR techniques immerse the user in a digitally created space and simulate movement in three dimensions to greatly increase the bandwidth of data available to our brains.20-22 These tools allow users to interact with the data in a way that is more natural to human cognition and movement. This includes reaching out to manipulate virtual objects constructed from the data with our hands, moving around them to view them from a clearer perspective, and highlighting objects of interest with a point of the finger.

In this article, we focus on selected intelligent visual analytic tools for genomic and cancer data that are essential to support effective disease and patient assessment. We provide a comprehensive comparison of the tools in two respects: (1) the visualisation methods in genomic and cancer data fields and (2) the trends of visualisation in genomic analytic fields from 2000 to now. We review the current state of genomic and cancer data, the potential application to personalised medicine, and methods for genomic data visualisation. Here, we assess traditional approaches such as scatter plots, heatmaps, coordinates, networks, and clustering, as well as emerging technologies involving AI and VR. We also review the evolution of genomic data visualisation tools in terms of the speed of technology development, effective interactions, current tool status, tool integrations, and new features.

Review Strategy

Methods

This systematic review was conducted in accordance with the guidelines provided in the PRISMA statement. Earlier reviews include ‘Computational methods and resources for the interpretation of genomic variants in cancer’23 (2015) and ‘Expanding the computational toolbox for mining cancer genomes’24 (2014). In this article, we focus on tools, methods, and trends for visual analytics of genomic data, particularly cancer data. This study did not directly involve the handling or inclusion of personal data, so ethical approval was not necessary.

Search strategy

We commenced with a general search on a search engine, such as Google, and then searched several databases, namely, BMC Genomics, Nature, Genome Research, IEEE, and ACM. We also searched relevant reports such as Scientific Report. In addition, we conducted a forward search of the authors mentioned in selected articles and of the websites of the tools they described. The search terms included ‘Genomic visualisation’, ‘Genomic visual analytics’, ‘Cancer data visualisation’, and ‘Genomic data visualisation tools’. The same terms were used for all database searches. Only studies published in English from the year 2000 onwards were included for review. The main reviewer extracted and analysed data from all articles in consultation with the other authors.

Bias assessment

In this article, we focused on reviewing the methods and trends of all the selected genomic data visualisation tools. There is no specific data collection process and no specific source of data, so this systematic review has no bias related to data. Because no meta-analysis was performed, the review also avoids statistical procedure bias. We classified the tools in tabular form and discussed both positive and negative aspects in the main document. We aimed to minimise bias in the discussion by referring to details presented in previous publications or reputable sources.

Outcomes

Related work

Massive genomic datasets are generated by different projects and are stored and shared among different groups of professionals. To help downstream analysts access and manipulate the massive sequencing datasets programmatically, new feature-rich, efficient, and robust analysis tools have been developed to process data and answer specific scientific questions.25,26 Through this, knowledge about associations between genomic factors and diseases has rapidly accumulated. Genomic analyses have provided new biological insights into the pathogenesis and classification of diseases and into determinants of the success and failure of therapies, which has led to the development of analytic approaches that use multidimensional datasets and embrace the complexity of genomic data for personalised medicine.27,28

Personalised medicine is the tailoring of medical treatment to the individual characteristics, needs, and preferences of each patient. It presents a unique challenge that calls for new tools that can efficiently extract knowledge from the data, explore the multiple relationships within the data, and speed up experts’ decisions about individual patients. Patients can then be treated and monitored in specific ways that meet their individual needs.29-32

Personal health data are soaring with the increasing number of mobile health applications. Mobile health has grown exponentially over the last several years and was expected to be worth about $20.7 billion by 2018, with nearly 96 million users.33 Thousands of applications are being developed and used to collect personal health and lifestyle data, which makes personalised health more personal than ever imagined. Data analytical tools can be used to visualise data at levels ranging from the population to the individual and to support a shift from reactive to proactive methods that focus on prevention, wellness, and, most importantly, the individual.34,35

In the following two sections, we provide a comprehensive comparison of the tools in both aspects: (1) the visualisation methods in genomic and cancer data fields (section ‘Comparison of traditional and new methods for genomic data visualisation’) and (2) the trend of visualisation in genomic analytic fields from 2000 to 2018 (section ‘The trend of genomic data visual analytics’).

Comparison of traditional and new methods for genomic data visualisation

Along with the development of personalised cancer medicine, cancer genomic data visualisation in the clinical setting is becoming a key topic. Alongside computational and statistical methodologies, effective visualisation is crucial to the successful extraction of knowledge from oncogenomic data by experts. High-throughput technologies allow the comparison of the genomic sequences, epigenomic profiles, and transcriptomes of tumour cells with those of normal cells. Visualisation techniques and tools can integrate different types of alterations with clinical experience to show vast amounts of multidimensional oncogenomic data in different types of plots such as heatmaps, genomic coordinates, and networks.13,36,37 Efficient tools that support the visual stratification of tumour genomic profiles and that highlight their relationships to known drugs or treatments will be more useful than the existing research-oriented tools.13,38

Researchers and doctors usually combine different visualisation methods in a typical analysis procedure to assist their work. For example, they first need to normalise experimental and batch differences between samples and then identify differentially regulated genes based on a fold-change level when comparing across samples, such as between healthy and non-healthy tissue. In this procedure, principal component analysis or partitioned clustering algorithms39,40 can be used to group together genes with similar behaviour patterns, and a scatter plot is the typical visualisation to represent such groupings. Then, to categorise genes with similar behaviour patterns across time, hierarchical clustering based on expression correlation can be performed and shown with clustering heatmaps, which allow data from distant genome loci to be grouped and visualised together for comparison.41,42
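The fold-change step of this procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any reviewed tool: the expression values are assumed to be already normalised, the pseudocount and the threshold of 1 (a twofold change) are arbitrary choices, and the gene names and numbers are hypothetical.

```python
import math


def log2_fold_change(tumour, healthy, pseudocount=1.0):
    """log2 ratio of mean tumour expression to mean healthy expression."""
    mean_t = sum(tumour) / len(tumour)
    mean_h = sum(healthy) / len(healthy)
    # The pseudocount (an assumption) avoids division by zero for silent genes.
    return math.log2((mean_t + pseudocount) / (mean_h + pseudocount))


def differentially_regulated(expression, threshold=1.0):
    """Return {gene: log2FC} for genes with |log2FC| >= threshold.

    `expression` maps gene -> (tumour values, healthy values); values are
    assumed to be normalised across samples and batches beforehand.
    """
    hits = {}
    for gene, (tumour, healthy) in expression.items():
        lfc = log2_fold_change(tumour, healthy)
        if abs(lfc) >= threshold:
            hits[gene] = lfc
    return hits


# Hypothetical toy data, not taken from the review.
toy = {
    "GENE_A": ([31.0, 31.0], [7.0, 7.0]),    # up-regulated in tumour
    "GENE_B": ([10.0, 12.0], [11.0, 11.0]),  # unchanged
    "GENE_C": ([3.0, 3.0], [15.0, 15.0]),    # down-regulated in tumour
}
hits = differentially_regulated(toy)
print(hits)
```

The genes that pass the threshold are the ones that would then be passed on to grouping and plotted, for example as points in a scatter plot or rows in a clustering heatmap.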

Nowadays, new visualisation tools and methods such as cluster analysis, AI, and VR are being introduced by different groups of people including designers, software developers, and scientists. They try to combine existing visualisation tools with new technological opportunities, especially AI and VR, to maximise human knowledge and intuition.43-45 Figure 1 shows the genomic visualisation methods used in recent years (scatter plots, clusters, matrix heatmaps, genomic coordinates, networks, AI, and VR) using screenshots of tools that are frequently used in cancer genomics research, arranged according to their visualisation principles. Two-dimensional and 3D scatter plots, networks, heatmaps, and coordinates are four traditional statistical visualisation methods for genomic data that are still key methods in current popular visual analytic tools, and clustering can support all four methods to enhance their classification of the data. Clustering is also an AI technique that groups data points so that each data point is assigned to a specific group. Artificial intelligence algorithms support visualisation by automatically identifying patterns and making highly accurate predictions, while visualisation methods can help interpret AI by framing the predictive modelling problem and evaluating the outcome. Interactive visualisation work has been extended to emerging environments such as VR, AR, and large, high-resolution displays as well as mobile devices. Virtual reality, augmented reality, immersive, and mobile environments are the new environments for data visualisation that make interactions with data more natural or easier. Genomic and cancer visualisation tools have supported these new environments to enhance human perception.46 The tools usually include multiple visualisation methods; for example, Integrative Genomics Viewer (IGV) uses both scatter plots and genomic coordinates, and UCSC uses scatter plots, clustering, and genomic coordinates.
We provide a summary of popular visualisation methods, their description, and the tools in Table 1. We also illustrate the popular genomic data visualisation methods and the environments in Figure 1, including scatter plots, cluster, heatmap, networks, genomic coordinates, AI, and VR.

Figure 1.


Genomic data visualisation methods and environments: scatter plots, cluster, heatmap, networks, genomic coordinates, AI, and VR for visualisation. Two-dimensional and 3D scatter plots, networks, heatmaps, and coordinates are four traditional statistical visualisation methods for genomic data that are still the main methods in current popular visual analytic tools, and clustering can support all four methods to enhance their classification of the data. Artificial intelligence algorithms support visualisation by automatically identifying patterns and making highly accurate predictions, while visualisation methods can aid or interpret AI by framing the predictive modelling problem and evaluating the model. Virtual reality, augmented reality, immersive displays, big screens, and tablets are new environments for data visualisation that make interactions with data more natural or easier.

Table 1.

Summary of popular visualisation methods, their description, and the tools.

| Method | Description | Example visualisation tools |
| --- | --- | --- |
| Two-dimensional scatter plot | The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will hug the line.47 | IGV48; UCSC49 |
| Three-dimensional scatter plot | Three-dimensional scatter plots are used to plot data points on three axes in an attempt to show the relationship between three variables. Each row in the data table is represented by a marker whose position depends on its values in the columns set on the X, Y, and Z axes. A fourth variable can be set to correspond to the colour or size of the markers, thus adding yet another dimension to the plot.50 | Medical Data Visualisation51 |
| Heatmap | A heatmap is a graphical representation of data that uses a system of colour-coding to represent different values. A common method of visualising gene expression data is to display it as a heatmap. In heatmaps, the data are displayed in a grid where each row represents a gene and each column represents a sample. The colour and intensity of the boxes are used to represent changes in gene expression.52 | ngs.plot53; Gitools54; PARADIGM55 |
| Clustering | A cluster is a group of similar elements. Each cluster can be represented by a profile, either a summary measure such as a cluster mean or one of the elements itself, which is called a medoid or centroid.56 | Medical Data Visualisation51; UCSC57 |
| Network | A network graph uses information from both the link and the node datasets to generate a graphical depiction of the network. The nodes and links in a network graph can be arranged in a variety of layout patterns.58 | Cytoscape59 |
| Genomic coordinate | Genomic coordinates can visualise single-nucleotide polymorphisms (SNPs), including their physical location relative to their host gene and the structure of the relevant transcripts, to provide intuitive supplements to the understanding of their functions.60 | UCSC57; IGV48; RNASeqBrowser61; GATK25; Savant Genome62 |
| Artificial intelligence (AI) | Artificial intelligence is an umbrella term for cognitive technologies and a big forest of academic and commercial work around the science and engineering of intelligent machines. Artificial intelligence has many branches with many significant connections and commonalities among them, of which machine learning is one.15 | DeepVariant63; GDC DAVE64 |
| Virtual reality (VR) | By immersing the user in a digitally created space and simulating movement in three dimensions, virtual reality should make it possible to greatly increase the bandwidth of data available to our brains.65 | UWS Microsoft HoloLens Visualisation |

We now explain and evaluate each visualisation method with example tools in the following paragraphs. We also analyse the combinations between these methods and how to use them in research and clinical fields.

Scatter plots

Scatter plots use horizontal and vertical axes to plot data points and display how much one variable is affected by another. The diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them.47 A scatter plot is a simple way to visualise the genetic similarity of patients. For example, Figure 2 shows a 2D scatter plot of 100 acute lymphoblastic leukaemia patients that displays their genetic similarity. Patients’ locations are determined by their genetic properties: two patients are close together if their genes are similar and far apart if their genetic properties differ. The visual mapping includes the following: (1) colour → risk stratification (red, very high risk; orange, high risk; blue, medium risk; green, normal; and purple, unknown), (2) shape → gender (O, female; X, male), and (3) bar → status (top-bar, deceased; no-bar, survived). The plot shows that most of the deceased patients are located in the top-left area.51
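The visual mapping described for Figure 2 amounts to a lookup from patient attributes to marker attributes. The sketch below illustrates that idea only; the record field names (`risk`, `gender`, `status`) and their values are hypothetical and are not taken from the tool itself.

```python
# Colour encodes risk stratification; anything unrecognised is shown as unknown.
RISK_COLOURS = {
    "very high": "red",
    "high": "orange",
    "medium": "blue",
    "normal": "green",
}
UNKNOWN_COLOUR = "purple"


def marker_style(patient):
    """Translate one patient record into scatter-plot marker attributes."""
    return {
        "colour": RISK_COLOURS.get(patient.get("risk"), UNKNOWN_COLOUR),
        "shape": "O" if patient["gender"] == "female" else "X",
        "top_bar": patient["status"] == "deceased",  # bar drawn only if deceased
    }


example = marker_style({"risk": "high", "gender": "male", "status": "deceased"})
print(example)
```

Keeping the mapping in one small function makes it easy to apply the same encoding consistently across 2D and 3D views of the same patients.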

Figure 2.


A 2D scatter plot of 100 acute lymphoblastic leukaemia patients showing their genetic similarity. The visual mapping includes the following: (1) colour → risk stratification (red, very high risk; orange, high risk; blue, medium risk; green, normal; and purple, unknown), (2) shape → gender (O, female; X, male), and (3) bar → status (top-bar, deceased; no-bar, survived). It shows that most of the deceased patients are located in the top-left area.51

UCSC Cancer Genomics Browser is a web-based application for hosting, visualising, and analysing cancer genomic datasets with multidimensional visualisations.57 The UCSC scatter plots are used to quickly and easily see the relationship between any two variables or columns of data such as glioblastoma multiforme (GBM) and lower grade glioma (LGG) samples.49

The three-dimensional scatter plot is used to discover relationships between three variables at the same time, and its use has been boosted by the recent widespread adoption of VR devices. Even though VR has been in development for decades, only recently has it begun to produce compelling experiences. Virtual reality reveals spatially complex structures behind 3D data and 3D scatter plots and can solve problems of common 2D scatter plots such as overlapping data and the absence of depth perception.66 Some genomic and cancer data visualisation tools such as Medical Data Visualisation have started to use 3D scatter plots and to support mixed reality devices such as Microsoft HoloLens.

Heatmaps

A heatmap is a 2D false-colour graphical representation of data that uses a predefined colour scheme in which different colours display different values and variations in a data matrix. The heatmap plot is a fundamental method in genomic data visualisation and is broadly used to unravel patterns hidden in genomic data; it is especially popular for gene expression analysis and methylation profiling.67 Many genomic visualisation tools provide heatmap plots, such as ngs.plot, Gitools, and PARADIGM. Figure 3 shows a heatmap for comparing genes of interest between four user-selected patients: ALL92, ALL129, ALL321, and ALL323.

Figure 3.


Heatmap for comparing genes between different patients: ALL92, ALL129, ALL321, and ALL323.51

Heatmaps are very handy for visualising large, multidimensional datasets. High-throughput gene expression data are often displayed using heatmaps: data are displayed in a grid where each row represents a gene and each column represents a sample. The colour and intensity of each box represent variations in gene expression. Scientists often use green-black-red heatmaps to visualise gene expression data from microarrays.68
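As a rough illustration of how a green-black-red heatmap assigns colours, the sketch below standardises each gene (row) and maps the result to an RGB triple. The per-row z-score scaling and the clipping limit of 2 are our assumptions; real tools offer several scaling options.

```python
def zscores(row):
    """Standardise one gene's expression values across samples."""
    mean = sum(row) / len(row)
    sd = (sum((x - mean) ** 2 for x in row) / len(row)) ** 0.5
    if sd == 0:
        return [0.0] * len(row)  # a flat gene carries no variation to colour
    return [(x - mean) / sd for x in row]


def green_black_red(z, limit=2.0):
    """Map a z-score to RGB: below average -> green, average -> black, above -> red."""
    z = max(-limit, min(limit, z))  # clip extreme values to the colour range
    intensity = round(255 * abs(z) / limit)
    return (intensity, 0, 0) if z > 0 else (0, intensity, 0)


def heatmap_colours(matrix):
    """Rows are genes, columns are samples; returns a grid of RGB triples."""
    return [[green_black_red(z) for z in zscores(row)] for row in matrix]


colours = heatmap_colours([[1.0, 2.0, 3.0]])
print(colours)
```

Scaling each row separately emphasises how a gene varies between samples rather than its absolute expression level, which is why it is a common default for expression heatmaps.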

Most heatmap representations are also combined with clustering methods to group genes or samples based on their expression patterns. Each gene is represented as a row and is colour-coded to represent the intensity of its variation, such as positive or negative, relative to a reference value, and biological samples are represented as columns in the grid.69

Genomic coordinates

A genomic coordinate plot is a common way to visualise oncogenomic data and show alterations tied to their genomic loci. UCSC, IGV, RNASeqBrowser, GATK, and Savant Genome provide genomic coordinates. Different tools may have different focuses, but most of them can display the genomic topography of alterations in each tumour sample as genomic tracks for inspecting particular genome loci.

Integrative Genomics Viewer is a lightweight visualisation tool for interactive exploration of integrated genomic datasets. It makes use of efficient, multi-resolution file formats to enable intuitive real-time exploration of diverse, large-scale genomic datasets on standard desktop computers. Integrative Genomics Viewer can handle large heterogeneous datasets and provide a smooth and intuitive user experience at all levels of genome resolution. It uses a data tiling technique based on a pyramidal data structure to support this interactive exploration.70
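The general idea behind a pyramidal, multi-resolution structure can be sketched as follows. This is a simplified illustration and not IGV's actual file format or implementation: each coarser level simply averages adjacent bins, and a view picks the finest level that fits the number of points it can draw. The coverage values are hypothetical.

```python
def build_pyramid(values, min_bins=1):
    """Return a list of resolution levels, finest first.

    Each coarser level halves the number of bins by averaging adjacent
    pairs; a trailing unpaired bin is carried over unchanged.
    """
    min_bins = max(1, min_bins)  # guard against a loop that never terminates
    levels = [list(values)]
    while len(levels[-1]) > min_bins:
        prev = levels[-1]
        coarser = [
            sum(prev[i:i + 2]) / len(prev[i:i + 2])
            for i in range(0, len(prev), 2)
        ]
        levels.append(coarser)
    return levels


def level_for_view(levels, max_points):
    """Pick the finest level that has at most `max_points` bins."""
    for level in levels:
        if len(level) <= max_points:
            return level
    return levels[-1]


# Hypothetical per-position signal, e.g. read coverage along a region.
coverage = [1, 3, 5, 7, 2, 4, 6, 8]
levels = build_pyramid(coverage)
overview = level_for_view(levels, max_points=3)
print(overview)
```

Because the summaries are computed once in advance, a zoomed-out view only has to read a handful of bins, which is what keeps exploration responsive at whole-chromosome scale.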

In IGV, all tracks can be annotated with colour-coded sample and clinical information, and genomic regions can be annotated with text labels.71 Figure 4 shows an IGV attribute panel that displays a colour-coded matrix of phenotypic and clinical data. Just below the command bar is a header panel with an ideogram representation of the currently viewed chromosome, along with a genome coordinate ruler that indicates the size of the region in view. The remainder of the window is divided into one or more data panels and an attribute panel. Data are mapped to the genomic coordinates of the reference genome and are displayed in the data panels as horizontal rows called ‘tracks’. Each track typically represents one sample, experiment, or genomic annotation. If any sample or track attributes have been loaded, they are displayed as a colour-coded matrix in the attribute panel. Each column in the matrix corresponds to an attribute, and a track’s attribute values are displayed as a row of coloured cells adjacent to the track.70

Figure 4.


Integrative Genomics Viewer’s genomic coordinates show a colour-coded matrix of phenotypic and clinical data. Just below the command bar is a header panel with an ideogram representation of the currently viewed chromosome, along with a genome coordinate ruler that indicates the size of the region in view. Data are mapped to the genomic coordinates of the reference genome and are displayed in the data panels as horizontal rows called ‘tracks’. Each track typically represents one sample, experiment, or genomic annotation.70

Networks

Networks can show functional relationships between different genomic entities, allowing researchers to visually explore clusters of nodes representing highly interconnected altered genes that can constitute driver pathways or subnetworks. Cytoscape provides network visualisation in genomic research.

Cytoscape is open-source software for visualising complex networks and integrating these with any type of attribute data, such as genomic data and clinical patient information. Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. The software is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features.59
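Independently of any particular tool, the notion of highly interconnected genes forming subnetworks can be illustrated with a small graph sketch. It does not use the Cytoscape API; the node labels and edges are hypothetical, and node degree is used here as a simple stand-in for interconnectedness.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency map from (node, node) interaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into subnetworks that are linked by at least one path."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def hubs(graph, min_degree):
    """Nodes with at least `min_degree` neighbours, sorted by name."""
    return sorted(n for n, nbrs in graph.items() if len(nbrs) >= min_degree)


# Hypothetical interactions between placeholder genes G1..G6.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G5", "G6")]
graph = build_graph(edges)
subnetworks = connected_components(graph)
hub_genes = hubs(graph, min_degree=3)
print(subnetworks, hub_genes)
```

In a network view, each component would appear as a separate cluster of nodes, and hub nodes are natural candidates for visual emphasis through size or colour.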

Figure 5 shows a breast cancer genomic data visualisation produced with the network method in Cytoscape v3.4.0. The upper network shows the gene ontology (GO) analysis based on the biological process of the 513 differentially expressed genes (DEGs), and the bottom network shows the KEGG pathway analysis of the 513 DEGs.72

Figure 5.


Network visualisation from Cytoscape v3.4.0. The upper network shows the GO analysis based on the biological process of the 513 DEGs, and the bottom network shows the KEGG pathway analysis of the 513 DEGs.72

Cluster

Clustering is a strategy that is used in combination with other visualisation methods such as scatter plots, heatmaps, and networks. For example, Medical Data Visualisation uses scatter plot clustering, while UCSC uses heatmap clustering. A cluster is usually a group of similar elements that can be represented by a profile, either a summary measure such as a cluster mean or one of the elements itself.

Clustering combined with heatmaps enables genes or samples, whose data can be obtained through high-throughput methods such as RNA sequencing or DNA microarray studies, to be grouped together. Clustering is useful in visualising the similarity of gene expression patterns.68 Figure 6 shows a clustering heatmap to explore relationships between somatic mutation profiles, genomic subtypes, and survival. It illustrates the somatic mutation profile of the significantly mutated genes in The Cancer Genome Atlas (TCGA) project acute myeloid leukaemia (AML) cohort, as well as the corresponding AML subtype designations for these samples.73
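How clustering determines the row order of a clustering heatmap can be sketched with a small agglomerative procedure. This is an illustrative sketch only: it assumes average linkage with a distance of 1 minus the Pearson correlation, which is one common choice rather than the method of any specific tool, and the profile names and values are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    sd_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0


def cluster_order(profiles):
    """Average-linkage agglomerative clustering on 1 - correlation.

    Returns gene names in merge order, so that similar genes end up in
    adjacent heatmap rows.
    """
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


# Hypothetical expression profiles across four samples.
profiles = {
    "up1": [1, 2, 3, 4],
    "down1": [4, 3, 2, 1],
    "up2": [2, 4, 6, 9],
    "down2": [8, 6, 5, 1],
}
order = cluster_order(profiles)
print(order)
```

Although the input interleaves rising and falling profiles, the returned order places the two rising genes next to each other and the two falling genes next to each other, which is what makes blocks of colour visible in the heatmap.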

Figure 6.


UCSC shows clustering heatmaps to explore relationships between somatic mutation profiles, genomic subtypes, and survival. (A) Somatic mutations for the most significantly mutated genes in The Cancer Genome Atlas (TCGA) project AML tumour samples. Samples are arranged in rows and genes in columns. A strong concordance is observed between miRNA cluster 3 (orange), DNA methylation cluster 3 (also orange), and intermediate cytogenetic risk (light blue); and between miRNA cluster 5 (green), DNA methylation cluster 5 (also green), and favourable cytogenetic risk (dark blue). (B) Column 1 represents the miRNA expression clusters, Column 2 represents the DNA methylation clusters, and Column 3 represents cytogenetic risk category for the AML cohort.

Clustering can be combined with scatter plot, network, and genomic coordinate methods to show groups of similar elements. Clustering data can identify a subset of representative examples, which is useful for processing sensory signals and detecting patterns in data. Clustering data based on a measure of similarity is a critical step in scientific data analysis and in engineering systems. A common approach is to use data to learn a set of centres such that the sum of squared errors between data points and their nearest centres is small.74
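The centre-based approach described above can be sketched with the standard k-means (Lloyd) iteration, which alternates between assigning points to their nearest centre and moving each centre to the mean of its points. We chose this algorithm as a familiar example of the approach; it is not claimed to be what any reviewed tool uses, and the points and starting centres are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda k: sq_dist(p, centres[k])) for p in points]


def update(points, labels, centres):
    """Move each centre to the mean of its members (empty clusters stay put)."""
    new_centres = []
    for k, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centres.append(old)
    return new_centres


def kmeans(points, centres, max_iter=100):
    """Return (centres, labels, sum of squared errors) after convergence."""
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = update(points, labels, centres)
        if new_centres == centres:
            break
        centres = new_centres
    labels = assign(points, centres)
    sse = sum(sq_dist(p, centres[label]) for p, label in zip(points, labels))
    return centres, labels, sse


# Hypothetical 2D points, e.g. samples projected onto two components.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centres, labels, sse = kmeans(points, centres=[(0, 0), (10, 0)])
print(centres, labels, sse)
```

The returned sum of squared errors is the quantity the text refers to; comparing it across different numbers of centres is one simple way to judge how many clusters a scatter plot should be coloured by.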

Artificial intelligence for genomic data visualisation

In recent years, AI has started to be used in big data visualisation, including the visualisation of multivariate genomic data, helped by the development of faster hardware.75 Machine learning is one branch of the field of AI; it is a way of solving problems without explicitly codifying the solution and a way of building systems that improve themselves over time. Machine-learning methods are typically used to build predictive or descriptive models from characteristic features within a dataset and then to use those models to draw conclusions from other similar datasets. For example, in cancer detection, diagnosis, and management, machine learning helps identify significant factors in high-dimensional datasets of genomic, proteomic, chemical, or clinical data, which can be used to understand the underlying diseases as well as to provide possible insights into effective disease-management strategies. Machine learning combined with data visualisation should have three stages: developing an algorithm, applying genomic data to the algorithm, and predicting labels for new unlabelled data.76 Figure 7 shows a canonical example of a machine-learning application with these three stages. A training set of DNA sequences is provided as input to a learning procedure, along with binary labels indicating whether each sequence is centred on a transcription start site (TSS) or not. The learning algorithm produces a model that can then be used, in conjunction with a prediction algorithm, to assign predicted labels (such as ‘TSS’ or ‘not TSS’) to unlabelled test sequences. In Figure 7, the red-blue gradient might represent, for example, the scores of various motif models (one per column) against the DNA sequence.

Figure 7.


A canonical example of a machine-learning application for DNA sequences. A training set of DNA sequences is provided as input to a learning procedure, along with binary labels indicating whether each sequence is centred on a transcription start site (TSS) or not. The learning algorithm produces a model that can then be used, in conjunction with a prediction algorithm, to assign predicted labels (such as ‘TSS’ or ‘not TSS’) to unlabelled test sequences.76
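The train-then-predict workflow in Figure 7 can be sketched in a few lines. The sketch below is a toy under stated assumptions: the three motifs, the training sequences, and the choice of a perceptron as the learning procedure are all hypothetical and are not specified by the source. Each sequence is turned into one motif score per column, loosely mirroring the gradient described for the figure.

```python
# Hypothetical motif set used as features (an assumption of this sketch).
MOTIFS = ("TATA", "CCAAT", "GGGCGG")


def featurise(seq):
    """One score per motif: the number of non-overlapping occurrences."""
    return [seq.count(motif) for motif in MOTIFS]


def score(model, seq):
    """Weighted sum of motif counts plus bias."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, featurise(seq))) + bias


def train(sequences, labels, epochs=20):
    """Learning procedure: a simple perceptron (a stand-in for any learner).

    labels: 1 if the sequence is centred on a TSS, otherwise 0.
    Returns the model as (weights, bias).
    """
    weights = [0.0] * len(MOTIFS)
    bias = 0.0
    for _ in range(epochs):
        for seq, label in zip(sequences, labels):
            predicted = 1 if score((weights, bias), seq) > 0 else 0
            error = label - predicted  # 0 when the prediction is correct
            features = featurise(seq)
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias


def predict(model, seq):
    """Prediction algorithm: assign a label to an unlabelled sequence."""
    return "TSS" if score(model, seq) > 0 else "not TSS"


if __name__ == "__main__":
    # Toy labelled training set (invented for illustration).
    training = ["GCTATAAAGGCG", "ACGTACGTACGT", "TTCCAATGGGCGGA", "AAAAAAAAAAAA"]
    labels = [1, 0, 1, 0]
    model = train(training, labels)
    for test_seq in ("CCTATATGC", "GGGGCCCCGGGG"):
        print(test_seq, predict(model, test_seq))
```

A real application would use far larger training sets and richer features; the point here is only the separation between the learning step, which produces a model, and the prediction step, which applies it.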

DeepVariant is a tool that uses the latest AI techniques to build a more accurate picture of a person’s genome from sequencing data. Its developers fed data from millions of high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, a public-private effort to promote genomic sequencing tools and techniques, to a deep-learning system and painstakingly tweaked the parameters of the model until it learned to interpret sequenced data with a high level of accuracy.77 DeepVariant is a genomic variant caller which uses deep neural networks to call genetic variants in germline genomes. It was originally developed by Google Brain and Verily Life Sciences, and it won the 2016 PrecisionFDA Truth Challenge award for Highest SNP Performance.63

The future of big data visual exploration will involve the tight integration of visualisation tools with traditional techniques from such disciplines as statistics, machine learning, operations research, and simulation. Visual exploration also needs to combine fast automatic data mining algorithms with the intuitive power of the human mind which can improve the quality and speed of the data exploration process.78

Virtual reality and augmented reality

Virtual reality enables a psychophysical immersive experience in an artificial, computer-generated virtual environment.79 Augmented reality is usually built upon VR, integrating and overlaying the virtual environment onto the user’s real world and allowing the user to interact with virtual objects in the context of his or her actual surroundings.80,81 Special equipment such as a head-mounted display (HMD) or cave automatic virtual environment (CAVE) system is required for the use of VR/AR technologies. The sensors and cameras on the equipment help the system to determine and track the user’s movement and move the point of view accordingly.

Shan et al80 developed an AR visualisation which runs on the mobile platform to deliver real-time 3D brain tumour volume rendering. It allows the clinician to visualise and communicate with the patients on their tumour sizes and locations. The visualisation uses the facial features of the patient as the tracking point to project the reconstructed brain tumour model onto the same location as the subject’s actual anatomy. Chang et al81 have created a 3D AR visualisation for archaeological purposes. It uses the ARToolKit in rendering the objects. The purpose of the visualisation is to create a platform for underground cultural heritage protection and research.

Some analysts even think that applying AI to VR opens important possibilities, such as AI-based continuous image recognition that reports its results in a VR display.82 One of the biggest challenges of big data is extracting information in a way that enables clinicians to use the vast amount of data quickly for the purpose of making better decisions in a timely manner. Immersive environments such as AR and VR can measure people’s reactions to large datasets to understand the subconscious processes of the human brain and determine the optimum amount of information. Virtual reality either simplifies the visualisations so as to reduce the cognitive load, thus keeping the user less stressed and more able to focus, or guides the person to the areas of the data representation that are not as heavy in information.83,84

The Children Cancer Data Visualisation tool can show the data of a whole group of patients in a 3D scatter plot and display a single patient’s details. Users can also zoom and rotate the plot, compare genes among several patients, and interactively view comparison visualisations between selected patients.22 The tool supports mobile operating systems such as iOS and Android, as well as VR devices. Figure 8 shows a 3D scatter plot from the tool running on Microsoft HoloLens, a pair of mixed reality smart glasses developed and manufactured by Microsoft. HoloLens gained popularity as one of the first computers running the Windows Mixed Reality platform under the Windows 10 operating system, and it can trace its lineage to Kinect, an add-on for Microsoft’s Xbox gaming console that was introduced in 2010.85

Figure 8.


Children Cancer Data Visualisation tool running on Microsoft HoloLens. It shows a 3D scatter plot and an individual patient’s details.

The trend of genomic data visual analytics

We compared genomic data visualisation tools along a timeline starting from the 2000s. We evaluated the trend of visual analytics and the current status of these tools, in particular the usefulness of the software and how the tools assist with genomic analysis.

Rapidly evolving genomic and cancer data, and intelligent visualisations

The adage ‘a picture is worth a thousand words’ applies especially to the life sciences, which are among the biggest generators of enormous datasets because of recent and rapid technological advances. The complexity of genomic data makes these datasets incomprehensible without effective visualisation methods. Genomic data visualisation is a rapidly evolving field, and great progress has been achieved in many areas including hardware acceleration, standardised exchangeable file formats, dimensionality reduction, visual feature selection, multivariate data analyses, interoperability, 3D rendering, and visualisation of complex data at different resolutions, especially in the area of image processing combined with AI-based pattern recognition.86

Interactive visualisation of complex genomic data is an effective way to gain insight from information and to discover the relationships, non-trivial structures, and irregularities that may pertain to the disease course of the patient. Basic statistics and visualisations without effective interaction and capabilities to control the visual data mining process are often insufficient for the analysis and exploration process. Intelligent visualisation can focus on patient-to-patient comparisons through the biological data and then display the multidimensional data in cooperation with the automated analysis.51 Intelligent genomic visualisation can support experts in generating hypotheses concerning the roles of genes in diseases and in finding the complex interdependencies between genes by bringing gene expressions into context with pathways.87

The evolution of genomic data visual analytics

Figure 9 shows the tools for visual analytics of genomic and cancer data, grouped by the year in which their development started or by the year of the papers from which they were extracted. We can see that between 2000 and 2015, most genomic data visualisation tools used only traditional methods such as scatter plots, heatmaps, genomic coordinates, networks, and clustering. From 2016, new visualisation techniques started to be used, such as machine-learning algorithms for predictions and personalised medicine. Some visualisation tools can be run in environments such as mobile devices and VR/AR/immersive big screens. Some tools, such as X:Map and GenomeComp, were used only for a short time, while others, such as GATK and Cytoscape, were developed before 2010 but have continued to be updated with new features and are still very popular genomic data visualisation tools. Integration among tools is also a key to keeping a tool in use for a longer time. For example, Epiviz can obtain annotation data from the UCSC, Gitools can get heatmaps from IGV, and RNASeqBrowser is compatible with UCSC, as shown in Figure 9 with purple arrows.

Figure 9.


Timeline and integration of tools; the blue arrows stand for timeline and the purple arrows stand for integration. Between 2000 and 2015, most genomic data visualisation tools only used some traditional methods such as scatter plots, heatmaps, genomic coordinates, networks, and clustering. From 2016, new visualisation methods started to be used such as machine-learning algorithms for predictions and personalised medicine.

Table 2 lists the tools for visual analytics of genomic and cancer data. Some tools, such as GenomeComp, X:Map, PARADIGM, and ngs.plot, have not been updated recently, while most tools, such as IGV and the other tools shown in Figure 9, are still well maintained or are being upgraded with new technologies. Some tools that are no longer updated are still used and can be downloaded online. GenomeComp is a visualisation tool implemented as a stand-alone programme that can compare, parse, and visualise large genomic sequences, especially closely related genomes such as interspecies or interstrain.90 It was developed by the Laboratory of Bioinformatics, Institute of Biophysics, Beijing, uses Perl/Tk, and can run on Linux, Unix, Mac OS X, and Microsoft Windows operating systems. The last version update happened in 2004.98 X:Map is a genome annotation database browser developed by the University of Manchester, UK, around 2008. It is a tool designed for annotation and visualisation of genome structure for Affymetrix exon array analysis.89 PARADIGM is a tool which focuses on inferring patient-specific genetic activities incorporating curated pathway interactions among genes and can predict the degree to which a pathway’s activities are altered in the patient using probabilistic inference.55

Table 2.

Tools for visual analytics of genomic and cancer data.

Tool name: Genome Analysis Toolkit (GATK)
Website: https://software.broadinstitute.org/gatk/
Description: Genome Analysis Toolkit (GATK) is designed to process exomes and whole genomes generated with Illumina sequencing technology and can also be adapted to handle a variety of other technologies and experimental designs. This toolkit focuses on variant discovery and also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data.88
Visualisation methods: Genomic coordinates, cluster, 2D scatter plot, AI
Developer/year: Broad Institute of MIT and Harvard; 2004 to current
Tool type: Structured Java programming framework

Tool name: X:Map
Website: http://xmap.picr.man.ac.uk
Description: X:Map is a tool designed specifically for high-density microarrays, which are required to show, for each gene, transcript, and exon, the probe sets that match it and their specificity; for each probe, the locations of potential hybridisation; and for each individual exon, its sequence.89
Visualisation methods: Heatmap, genomic coordinates
Developer/year: University of Manchester, UK; 2008
Tool type: Genome annotation database browser

Tool name: GenomeComp
Website: http://www.mgc.ac.cn/GenomeComp/
Description: GenomeComp is a visualisation tool implemented as a stand-alone programme that can compare, parse, and visualise large genomic sequences, especially closely related genomes such as interspecies or interstrain.90
Visualisation methods: Genomic coordinates
Developer/year: Laboratory of Bioinformatics, Institute of Biophysics, Beijing; 2002-2004
Tool type: Uses Perl/Tk; runs on Linux, Unix, Mac OS X, and Microsoft Windows

Tool name: Epiviz
Website: http://epiviz.cbcb.umd.edu/
Description: Epiviz is a genomic information visualisation tool which can quickly and easily visualise and compare large amounts of genomic information resulting from high-throughput sequencing experiments. It is the first system to provide tight integration between a state-of-the-art analytics platform and a modern, powerful, integrative visualisation system for functional genomics.91
Visualisation methods: Heatmaps, 2D scatter plot, genomic coordinates
Developer/year: University of Maryland; 2014 to current
Tool type: Web-based genome browsing application

Tool name: Gitools
Website: http://www.gitools.org/
Description: Gitools is a desktop application for analysis and visualisation of matrices using interactive heatmaps which contain multiple dimensions. It has interactive capabilities to allow the user to filter, sort, move, and hide rows and columns in the heatmaps. Gitools is especially useful for cancer genomic analysis as it includes all the methods implemented for some integrative sources and can import data directly from some other tools.54
Visualisation methods: Heatmaps
Developer/year: Biomedical Genomics Group, Biomedical Research Park, Barcelona; 2011 to current
Tool type: Desktop application

Tool name: UCSC
Website: https://genome-cancer.ucsc.edu/
Description: UCSC Cancer Genomics Browser is a web-based application for hosting, visualising, and analysing cancer genomic datasets. The browser provides interactive views of data from genomic regions to annotated biological pathways and user-contributed collections of genes.57
Visualisation methods: Heatmap, cluster
Developer/year: UCSC in the University of California system; 2015 to current
Tool type: Web-based application

Tool name: Integrative Genomics Viewer (IGV)
Website: http://software.broadinstitute.org/software/igv/
Description: Integrative Genomics Viewer (IGV) is a lightweight visualisation tool for interactive exploration of integrated genomic datasets, and it supports a wide range of genomic data including aligned sequence reads, mutations, copy number, RNAi screen, gene expression, methylation, and genomic annotations.71
Visualisation methods: Heatmap, genomic coordinates, cluster, 2D scatter plot
Developer/year: Broad Institute, the University of California; 2013 to current
Tool type: Visualisation tool for integrated genomic datasets

Tool name: Savant Genome Browser
Website: http://www.genomesavant.com/p/home/index/
Description: Savant Genome Browser is a sequence annotation, desktop visualisation, and analysis browser for genomic data. This tool was primarily developed for the effective visualisation of large sets of high-throughput sequencing data. Multiple visualisation modes enable the exploration of genome-based sequence, points, intervals, or continuous datasets. Plug-ins are available, among which is the WikiPathways plug-in, which aids the navigation of the data by the integration of pathways.62
Visualisation methods: Genomic coordinates, heatmap, cluster
Developer/year: The Computational Biology Lab at the University of Toronto;92 2010 to current
Tool type: Desktop visualisation and analysis browser for genomic data

Tool name: PARADIGM
Website: http://sbenz.github.io/Paradigm/
Description: PARADIGM is a tool which focuses on inferring patient-specific genetic activities incorporating curated pathway interactions among genes and can predict the degree to which a pathway’s activities are altered in the patient using probabilistic inference. CircleMap is one of the PARADIGM visualisation methods that produce heatmaps with a circular layout.55
Visualisation methods: Heatmap
Developer/year: Charles Vaske, Steve Benz, University of California, Santa Cruz; 2010
Tool type: A factor graph framework for pathway inference on high-throughput genomic data

Tool name: CaleydoStratomeX
Website: http://caleydo.org/tools/stratomex/
Description: CaleydoStratomeX is a visual analytic framework prepared for the visualisation of interdependencies between multiple datasets. It allows exploration of relationships between multiple groupings and different datasets. It can cluster genomic data of different alterations and represents them as matrix heatmaps. The different groupings are connected by ribbons whose width corresponds to the number of samples shared by the connected clusters. Clinical data and pathway maps can be integrated to characterise the clusters.93
Visualisation methods: Heatmap, cluster
Developer/year: Marc Streit, Linz, Alexander Lex, Nils Gehlenborg, Christian Partl, Samuel Gratzl, Hanspeter Pfister, Dieter Schmalstieg, and Peter J. Park;94 2012 to current
Tool type: StratomeX is a visual analytic framework for the analysis of multiple stratified datasets

Tool name: Regulome Explorer
Website: http://explorer.cancerregulome.org
Description: Regulome Explorer is a tool whose visualisation options include circular and linear genomic coordinates and networks.95 The Cancer Genome Atlas takes an integrated approach towards a systems-level understanding of regulatory disruptions in cancer which are intertwined within complex dynamical networks through a multitude of interactions among different types of molecules.96
Visualisation methods: Heatmap, genomic coordinates
Developer/year: Institute for Systems Biology and MD Anderson Cancer Centre; 2016 to current
Tool type: A tool for the integrative exploration of associations between clinical and molecular features of data

Tool name: Cytoscape
Website: http://www.cytoscape.org
Description: Cytoscape is an open-source software for visualising complex networks and integrating these with any type of attribute data such as genomic data and clinical patient information.59
Visualisation methods: Networks
Developer/year: US National Institute of General Medical Sciences (NIGMS) and National Resource for Network Biology (NRNB); 2003 to current
Tool type: An open-source software platform for visualising complex networks

Tool name: ngs.plot
Website: https://code.google.com/p/ngsplot
Description: ngs.plot is a tool to help understand the relationship between the millions of functional DNA elements and their protein regulators and demonstrate how they work in conjunction to manifest diverse phenotypes. ngs.plot uses two steps to quickly mine and visualise genome samples: the first step is to define a region of interest and the second step is to plot something meaningful.53
Visualisation methods: Heatmap
Developer/year: Peter Briggs from the University of Manchester, supported by the Friedman Brain Institute and the National Institutes of Health; 2014
Tool type: A quick mining and visualisation tool for NGS data; programming languages are R and Python

Tool name: GDC DAVE (Genomic Data Commons Data Analysis, Visualisation, and Exploration)
Website: https://gdc.cancer.gov/analyse-data/gdc-dave-tools
Description: GDC DAVE Tools allow users to interact intuitively with GDC data and promote the development of a true cancer genomics knowledge base, including the following key features: view most frequently mutated genes, plot high-impact mutations using oncoGrid, perform survival analysis, visualise mutations for protein-coding regions, view cancer distribution, view top mutated genes across projects, view genes annotated by COSMIC, build and compare custom cohorts, and perform set operations.64
Visualisation methods: Heatmap, 2D scatter plot, cluster
Developer/year: The National Cancer Institute (NCI) Centre for Cancer Genomics (CCG), Maryland, USA; 2016 to current
Tool type: GDC Data Portal

Tool name: VarDict
Website: https://github.com/AstraZeneca-NGS/VarDict
Description: VarDict is a novel and versatile variant caller for both DNA- and RNA-sequencing data, and it simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumours.97
Visualisation methods: Heatmap, genomic coordinates
Developer/year: AstraZeneca, United States; 2016 to current
Tool type: VarDict is implemented in Perl

Tool name: DeepVariant
Website: https://github.com/google/deepvariant
Description: DeepVariant is a tool that uses the latest AI techniques to build a more accurate picture of a person’s genome from sequencing data. Its developers fed data from millions of high-throughput reads and fully sequenced genomes to a deep-learning system and painstakingly tweaked the parameters of the model until it learned to interpret sequenced data with a high level of accuracy.77
Visualisation methods: Artificial intelligence, genomic coordinates, heatmap
Developer/year: Google Brain and Verily Life Sciences; 2016 to current
Tool type: Deep neural networks to call genetic variants in germline genomes

Tool name: RNASeqBrowser
Description: RNASeqBrowser is a visualisation tool that incorporates and extends the function of the UCSC genome browser and NGS visualisation tools such as IGV.61
Visualisation methods: Genomic coordinates, cluster
Developer/year: JA, Australian Government Department of Health; 2015 to current
Tool type: A visualisation tool that incorporates and extends the function of UCSC and IGV

Tool name: Children Cancer Data Visualisation
Description: Children Cancer Data Visualisation tool can show the data of a whole group of patients in a 3D scatter plot and display a single patient’s details. Users can zoom and rotate the plot, compare genes among several patients, and interactively view comparison visualisations between selected patients.22
Visualisation methods: 3D scatter plot, heatmap, cluster, VR
Developer/year: Western Sydney University; 2016 to current
Tool type: Developed with Java and Unity 3D

CircleMap is one of the PARADIGM visualisation methods that produce heatmaps with a circular layout. Different datasets coming from the same samples can be plotted as different layered circles that form a node. The data layers are plotted while maintaining the sample order, which can be adjusted by the user. CircleMap visualisation can be used to display multiple datasets centred around each gene in a pathway.55 The tool is a factor graph framework for pathway inference on high-throughput genomic data and was developed by Charles Vaske and Steve Benz at the University of California, Santa Cruz, in around 2010.

ngs.plot is a tool for quick mining and visualisation of next-generation sequencing data by integrating genomic databases. The tool visualises massive datasets and genomic information based on big sequencing data and it can produce 1 billion sequencing reads in a few days. ngs.plot uses two steps to quickly mine and visualise genome samples. The first step is to define a region of interest and the second step is to plot something meaningful.53 It is platform independent, and the programming languages are R and Python. It was produced by Peter Briggs from the University of Manchester, supported by the Friedman Brain Institute and the National Institutes of Health, and was developed in around 2014.

New visualisation techniques are applied to tools

More and more modern visualisation methods are being applied to popular genomic visualisation tools. For example, Genome Analysis Toolkit (GATK) now has features for deep learning with AI technology using variants and annotations encoded as tensors, which carry the precise read and reference sequences, read flags, as well as base and mapping qualities.99 Genome Analysis Toolkit is a structured programming framework designed to process exomes and whole genomes generated with Illumina sequencing technology and can also be adapted to handle a variety of other technologies and experimental designs. This toolkit focuses on variant discovery and also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data.88 The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs, and it separates specific analysis calculations from common data management infrastructure for correctness, stability, and efficiency.25

DeepVariant is also a visualisation tool that uses a machine-learning technique, modelled loosely on the networks of neurons in the human brain, to identify all the mutations that an individual inherits from their parents.100 DeepVariant helps turn high-throughput sequencing readouts into a picture of a full genome. The tool developers are researchers from the Google Brain team, who fed the data to a deep-learning system to interpret sequenced data with a high level of accuracy.77

VarDict is well suited to studies in which polymerase chain reaction (PCR) is used to amplify genes before they are submitted to sequencing. VarDict’s ability to detect PCR artefacts, such as amplicon bias and mispaired primers, together with its linear scalability to depth, makes it desirable in such studies to reduce both false positives and false negatives. VarDict is a novel and versatile variant caller for both DNA- and RNA-sequencing data, and it simultaneously calls single-nucleotide variants (SNVs), multi-nucleotide variants (MNVs), insertions and deletions (InDels), and complex and structural variants, expanding the detected genetic driver landscape of tumours. VarDict has three main features: (1) performance that scales linearly with sequencing depth, enabling the ultra-deep sequencing used to explore tumour evolution or detect tumour DNA circulating in blood; (2) amplicon-aware variant calling for PCR-based targeted sequencing, which is often used in diagnostic settings; and (3) detection of differences in somatic and loss of heterozygosity variants between paired samples. Applied to the TCGA Lung Adenocarcinoma dataset, VarDict called known driver mutations in KRAS, EGFR, BRAF, PIK3CA, and MET in 16% more patients than previously published variant calls.97
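To make the idea of comparing paired samples concrete, the following toy sketch labels a variant from its allele frequencies in a tumour sample and a matched normal sample. It is not VarDict's statistical model; the function names, category labels, and every threshold are hypothetical values chosen only for illustration.

```python
def allele_frequency(alt_reads, depth):
    """Fraction of reads at a site that support the variant allele."""
    return alt_reads / depth if depth else 0.0


def classify_paired(tumour_alt, tumour_depth, normal_alt, normal_depth,
                    min_af=0.05, min_depth=10):
    """Toy classification of one variant in a tumour/normal pair.

    All thresholds are illustrative assumptions, not published values.
    """
    if tumour_depth < min_depth or normal_depth < min_depth:
        return "insufficient depth"
    t = allele_frequency(tumour_alt, tumour_depth)
    n = allele_frequency(normal_alt, normal_depth)
    # Present in the tumour but absent from the normal sample.
    if t >= min_af and n < min_af:
        return "somatic"
    # Heterozygous in the normal sample, but one allele dominates in the tumour.
    if 0.3 <= n <= 0.7 and (t >= 0.85 or t <= 0.15):
        return "loss of heterozygosity"
    # Present at comparable levels in both samples.
    if t >= min_af and n >= min_af:
        return "germline"
    return "other"


if __name__ == "__main__":
    print(classify_paired(30, 100, 0, 100))
    print(classify_paired(95, 100, 50, 100))
    print(classify_paired(48, 100, 52, 100))
```

A production caller would replace the fixed cut-offs with statistical tests and would account for strand bias, mapping quality, and PCR artefacts such as those described above.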

Some visualisation tools, such as the Children Cancer Data Visualisation tool, are becoming available on VR/AR/immersive big screens and mobile devices. The tool can show the data of a whole group of patients in a 3D scatter plot and display a single patient’s details; users can zoom and rotate the plot, compare genes among several patients, and interactively view comparison visualisations between selected patients.22 The tool now supports mobile devices, VR devices, and other immersive environments.

Tools are integrated with each other

Some visualisation tools can be integrated with one another through tool-to-tool communication. For example, Epiviz can obtain annotation data from the UCSC57 genome browser.91 Epiviz is a genomic information visualisation tool that can quickly and easily visualise and compare large amounts of genomic information resulting from high-throughput sequencing experiments. As the first system to provide tight integration between a state-of-the-art analytic platform and a modern, powerful, integrative visualisation system for functional genomics, Epiviz can interactively support a number of widely used, state-of-the-art methods for (1) ChIP-seq, where iterative visualisation of data and results of peak-calling algorithms is necessary; (2) RNA-seq analysis, where both location-based coverage and feature-based expression levels are required; and (3) methylation analyses using location-based analysis at multiple genomic scales.91

Gitools can get heatmaps from IGV71 through a load command and then send locate commands for selected rows in the heatmaps to IGV via the IGV logo in the Gitools toolbar, which makes it easy to spot and compare genes of interest within IGV.101 Gitools is a desktop application for analysis and visualisation of matrices using interactive heatmaps which contain multiple dimensions. It has interactive capabilities to allow the user to filter, sort, move, and hide rows and columns in the heatmaps. Gitools is especially useful for cancer genomic analysis as it includes all the methods implemented for some integrative sources and can import data directly from some other tools. Gitools can be used by researchers without advanced knowledge of bioinformatics as well as by more experienced users who need to perform many of the operations available using the command line.54
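Tool-to-tool communication of this kind is often implemented as short text commands sent to a running application. As a hedged sketch, the code below builds `load` and `goto` commands of the kind accepted by IGV's batch command port (60151 by default, when the port is enabled in IGV) and optionally sends them over a socket. How Gitools itself talks to IGV is not described in the source, so mapping its 'locate' action onto `goto` is an assumption, and the file name and loci are hypothetical.

```python
import socket


def igv_commands(track_path, loci):
    """Build one 'load' command followed by one 'goto' command per locus."""
    commands = [f"load {track_path}"]
    commands.extend(f"goto {locus}" for locus in loci)
    return commands


def send_to_igv(commands, host="127.0.0.1", port=60151, timeout=5.0):
    """Send commands to a running IGV instance and collect its replies.

    Requires IGV to be running with its command port enabled.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        stream = conn.makefile("rw", newline="\n")
        for command in commands:
            stream.write(command + "\n")
            stream.flush()
            replies.append(stream.readline().strip())  # one reply line per command
    return replies


if __name__ == "__main__":
    # Hypothetical track file and loci; only the command strings are printed.
    for line in igv_commands("tumour_sample.bam", ["chr7:55,086,714-55,275,031", "KRAS"]):
        print(line)
```

Keeping command construction separate from the network call makes the command strings easy to inspect or log without a running IGV instance.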

Savant Genome Browser is a sequence annotation, desktop visualisation, and analysis browser for genomic data. This tool was primarily developed for the effective visualisation of large sets of high-throughput sequencing data. Multiple visualisation modes enable the exploration of genome-based sequence, points, intervals, or continuous datasets. Plug-ins are available, among which is the WikiPathways plug-in, which aids the navigation of the data by the integration of pathways.62 Savant also planned to expand by allowing users to automatically download annotation tracks from various public resources such as the UCSC Genome Browser.62

RNASeqBrowser is another tool that is compatible with UCSC files and extends the functionality of IGV. RNASeqBrowser is a visualisation tool that adds several new types of tracks to show NGS data such as individual raw reads, SNPs, and InDels; it can dynamically generate RNA secondary structure, which is useful for identifying non-coding RNA such as miRNA, and it overlays NGS wiggle data to display differential expression. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. Strand-specific RNA-seq data are also supported by RNASeqBrowser, which displays reads above (positive strand transcripts) or below (negative strand transcripts) the central line.61

Tools allow more interactions and more visual analytical methods

The active tools usually allow users to interact intuitively with data and to choose among multiple visualisation methods to support different research purposes. For example, the Genomic Data Commons Data Analysis, Visualisation, and Exploration (GDC DAVE) tools use scatter plots to visualise mutations and their frequency across cases, mapped to a graphical visualisation of protein-coding regions, and use heatmaps to visualise the top mutated genes across projects and the number of cases affected. The GDC DAVE Tools’ web interface can analyse cancer genomic data in real time, online, without the need to download or process the data. Users can navigate from project cohorts to individual patients, to specific genes and mutations of interest. DAVE uses specialised graphs to visualise genomic signatures of cancer and identify potential drivers of disease, and it can also visualise patient survival curves and identify the molecular consequence of a mutation on the resultant protein.102 DAVE Tools allow users to interact intuitively with GDC data and promote the development of a true cancer genomic knowledge base, which includes the following key features: view most frequently mutated genes, plot high-impact mutations using oncoGrid, perform survival analysis, visualise mutations for protein-coding regions, view cancer distribution, view top mutated genes across projects, view genes annotated by COSMIC, build and compare custom cohorts, and perform set operations.64
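Patient survival curves of the kind mentioned above are commonly drawn with the Kaplan-Meier estimator. The source does not state which estimator GDC DAVE uses, so the sketch below assumes Kaplan-Meier purely for illustration; the function name and the follow-up data are invented for the example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical cohort: follow-up in months, 1 = death observed, 0 = censored.
    months = [2, 3, 3, 5, 8]
    observed = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(months, observed):
        print(t, round(s, 3))
```

Plotting the returned points as a step function gives the familiar survival curve, and computing one curve per cohort allows two patient groups to be compared visually.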

Discussion

Genomic research is critical to progress against cancer. Through the study of cancer genomes, abnormalities in genes have been revealed to drive the development and growth of many types of cancer. Genomic and cancer data visualisation tools can assist in improving our understanding of the biology of cancer and lead to new methods of diagnosing and treating the disease. Over the past decade, large-scale research projects have begun to survey and catalogue the genomic changes associated with a number of types of cancer which have revealed unexpected genetic similarities across different types of tumours. For instance, mutations in the HER2 gene, distinct from amplifications of this gene, for which therapies have been developed for breast, esophageal, and gastric cancers, have been found in a number of cancers, including breast, bladder, pancreatic, and ovarian.103

Personalised medicine refers to diagnosis and treatment based on a person’s entire DNA sequence. Variants in the DNA sequence determine the differences between individuals and the differences between types of cells, such as tumour cells and non-tumour cells. Targeted genomic cancer medicine uses the latest genome sequencing to examine the genetics of a cancer, rather than treating it based on its location, so that inherited cancer risk can be understood and more effective treatments can be found for people with cancer.104

The cancer genomic research field is rapidly evolving in parallel with advances in high-throughput genomic technologies. This evolution requires continuous advancement in visualisation techniques and tools. As this rapid scientific evolution continues, cancer researchers are highly dependent on computers to manage, analyse, and visualise data. The conventional genomic and cancer data visualisation tools are two-dimensional and enhance the presentation of data through the creative use of colour and size, the combination of space and time, and advanced computer graphics. Most visualisation tools offer one or more of four visualisation methods: two-dimensional scatter plots, networks, heatmaps, and genomic coordinates. These traditional visualisation methods are used to graph genomic and cancer data. For example, IGV supports all four visualisation methods.

Genomic and cancer data visualisation is entering a new era with the emergence of AI and of new visual environments and equipment such as VR, AR, immersive big screens, and mobile devices. New technologies and evolving cognitive frameworks are opening new horizons for more accurate and contextual data visualisation.

Artificial intelligence is playing an integral role in the evolution of the field of genomics. Genomics is closely related to precision medicine, whose market size is projected to reach $87 billion by 2023.105 Personalised medicine is an approach to patient care that encompasses genetics, behaviours, and environment, with the goal of implementing a patient- or population-specific treatment method in contrast to a one-size-fits-all approach. Artificial intelligence and machine learning have been applied in genomics to genome sequencing analysis, gene editing, clinical workflow, and direct-to-consumer genomics. Future applications of machine learning in genomics are diverse and may contribute to the development of patient- or population-specific pharmaceutical drugs by examining the role of genetics in how an individual responds to drugs.106 Although the field is still quite new, there is already some evidence of research involving machine learning. For example, what is regarded as the first study to apply machine-learning models to determine a stable dose of Tacrolimus in renal transplant patients was published in February 2017. Tacrolimus is commonly administered to patients following a solid organ transplantation to prevent ‘acute rejection’ of the new organ.107

Virtual reality and related technologies have been adopted in the health care industry. Medical researchers have been exploring ways to create 3D models of patients’ internal organs using VR since the 1990s. More recently, VR and related technologies have been used to plan complex operations, reduce anxiety in cancer patients, and help patients overcome balance and mobility problems resulting from stroke or head injury. Virtual reality environments are expected to bring a revolution in genomic data visualisation, as one could integrate meta-genomic data in virtual worlds. Approaching the problem from a different angle, mixed reality devices such as Google Glass, HoloLens, and Magic Leap offer an AR experience that can facilitate learning about biological systems because it builds on exploratory learning.

In summary, genomic and cancer data visualisation tools are essential to facilitate decision-making about treatment methods or targeted medicine. New technologies have been used in recent years to create visualisation tools that can explore complex genomic data. Further efforts are needed to develop new tools to meet the changing needs of the field.

Acknowledgments

The authors would like to thank Hien Dang and Jesse Tran for their invaluable comments and proofreading.

Footnotes

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been partially supported by the Impact Funding at Western Sydney University and Big Data, Big Impact Grant-Stage 2 from Cancer Institute of NSW, Australia.

Author Contributions: ZQ led the writing of the manuscript and conducted the pilot group study with the end-users of genomic visualisation tools, who are cancer researchers and medical doctors. CWL contributed partially to the manuscript. QVN and YZ provided guidance and revision on the article, particularly on the technologies and methodology. DRC gave general direction from the genomics and cancer research perspective and revised the manuscript. DRC and QVN provided oversight and leadership to the team and initiated the projects.

References


Articles from Cancer Informatics are provided here courtesy of SAGE Publications