Author manuscript; available in PMC: 2022 Apr 26.
Published in final edited form as: J Biomed Inform. 2021 Mar 16;117:103732. doi: 10.1016/j.jbi.2021.103732

Search and visualization of gene-drug-disease interactions for pharmacogenomics and precision medicine research using GeneDive

Mike Wong b, Paul Previde a, Jack Cole a, Brook Thomas a, Nayana Laxmeshwar a, Emily Mallory c, Jake Lever d, Dragutin Petkovic a,b, Russ B Altman e, Anagha Kulkarni a,*
PMCID: PMC9042200  NIHMSID: NIHMS1792106  PMID: 33737208

Abstract

Background:

Understanding the relationships between genes, drugs, and disease states is at the core of pharmacogenomics. Two leading approaches for identifying these relationships in medical literature are manual curation led by human experts and automated approaches based on modern data mining. The former generates small amounts of high-quality data, and the latter offers large volumes of mixed-quality data. The algorithmically extracted relationships are often accompanied by supporting evidence, such as confidence scores, source articles, and surrounding contexts (excerpts) from the articles, that can be used as data quality indicators. Tools that can leverage these quality indicators to help the user gain access to larger volumes of high-quality data are needed.

Approach:

We introduce GeneDive, a web application for pharmacogenomics researchers and precision medicine practitioners that makes gene, disease, and drug interactions data easily accessible and usable. GeneDive is designed to meet three key objectives: (1) provide functionality to manage the information-overload problem and facilitate easy assimilation of supporting evidence, (2) support longitudinal and exploratory research investigations, and (3) offer integration of user-provided interactions data without requiring data sharing.

Results:

GeneDive offers multiple search modalities, visualizations, and other features that guide the user efficiently to the information of interest. To facilitate exploratory research, GeneDive makes the supporting evidence and context for each interaction readily available and allows the user to control the data quality threshold according to their risk tolerance level. The interactive search-visualization loop enables relationship discoveries between diseases, genes, and drugs that might not be explicitly described in literature but are emergent from the source medical corpus and deductive reasoning. The ability to utilize the user’s data either in combination with the GeneDive native datasets or in isolation promotes richer data-driven exploration and discovery. These functionalities, along with GeneDive’s applicability for precision medicine (bringing the knowledge contained in biomedical literature to bear on particular clinical situations and improving patient care), are illustrated through detailed use cases.

Conclusion:

GeneDive is a comprehensive, broad-use biological interactions browser. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net. A GeneDive Docker image is also available from the same URL, allowing users to (1) import their own interaction data securely and privately; and (2) generate and test hypotheses across their own and other datasets.

Keywords: Gene Interactions, Retrieval and Visualization, Gene Sets, Gene-Disease and Gene-Drug Relationships, Biomedical Information Retrieval

1. Introduction

Advancements in pharmacogenomics and precision medicine hinge on the availability of accurate information about gene-drug-disease interactions [1, 2, 3?, 4, 5]. Having easy and intuitive access to the known interactions between genes, drugs, and diseases is necessary for medical researchers and practitioners to understand cellular processes, disease states, and drug responses that are relevant to the medical case under consideration. Long-standing curation efforts, such as PharmGKB [6], underscore the importance and necessity of such data.

Reviewing the rapidly growing biomedical literature and maintaining a high-quality database of gene, drug, and disease interactions is a labor-intensive task, requiring highly trained curators to assess the quality of the supporting evidence and to identify relevant entities. Capturing gene, drug, and disease interactions in a way that is computationally accessible further complicates curation. Due to the scale and complexity of the task, this manual approach to data curation is expensive in both time and effort. Consequently, curators have to limit the scope of the task by starting with key genes, drugs, or literature central to a given pathology, disease state, or species of interest. This selective and subjective coverage of the literature in turn leads to curation bias towards well-studied or high-impact work, while potentially missing important relationships that are under-studied or under-reported in the literature.

Modern data mining approaches designed to algorithmically identify and extract gene interactions from biomedical literature have demonstrated promising ability to scale to big data and thus also reduce curation bias [7, 8, 9, 10, 11, 12]. However, the interactions data extracted by these techniques, although abundant, is of mixed quality. As a result, for this interactions data to be useful to medical researchers and practitioners, it is essential for it to be presented such that (1) the data volume does not overwhelm the user; (2) the supporting evidence and context for each interaction is made easily available; (3) the data quality threshold can be controlled via an interactive and easy-to-use interface, as per the user’s risk tolerance level; and (4) longitudinal and exploratory research investigations are well supported.

Many interaction databases, such as CTD, DisGeNet, and the OpenTargets platform, offer search and visualization tools and provide features well-suited to their particular data, but each lacks at least one of the four aforementioned qualities. They do not address information overload by integrating an association score and supporting evidence with each interaction. In addition, these databases are tied to their own data sets. Existing gene interaction visualization tools such as Cytoscape[13], VisANT[14], and Literome[15] provide some of these features but not all. For example, Cytoscape, a powerful search and visualization application and JS framework, does not easily sort or filter interactions by confidence score or display supporting evidence. Allowing users to investigate interaction data by putting evidence side-by-side with the extracted interactions can accelerate curation and discovery. Motivated by these observations, we have designed and developed GeneDive, a web-based information retrieval, filtering, topology discovery, and visualization tool that processes millions of gene-drug-disease interactions efficiently [16]. Additionally, GeneDive easily integrates user-provided interactions data with other datasets without requiring the user to share or disseminate their interaction data.

The following biomedical resources are also worth mentioning here: OpenTargets platform[17]1, DisGeNet[18]2, and CTD[12]3, along with MetaCore and Data4Cure. To the best of our knowledge, the latter two (MetaCore and Data4Cure) are commercial products. The other three (OpenTargets, DisGeNet, and CTD) are databases with search and visualization tools, and as such do not allow user-provided data. CTD does not provide interaction confidence score ordering/filtering, and DisGeNet does not display supporting evidence. Furthermore, none of the three offers the following key functionalities: 1) search for multiple entities in a single query (they allow only single-entity queries or batch queries); 2) multi-hop search modalities; and 3) network visualization of results. As such, the search and visualization tools in these databases do not directly address the issue of information overload, although OpenTargets comes closest to achieving this objective.

GeneDive offers a powerful suite of search and visualization modalities that reveal direct and indirect relations between Disease, Gene, and dRug (DGR) entities, and uncover networks of interactions while also providing ready access to the supporting evidence.

1.1. Design

GeneDive’s design is driven by three key objectives:

  1. Provide tools to combat the information-overload problem while promoting assimilation of supporting evidence: To manage the overabundance of data, GeneDive is designed to offer a multi-pronged solution involving various search modes, multiple result views, a sophisticated toolkit for data filtering, sorting, and highlighting, and data visualization capabilities. To facilitate information assimilation, GeneDive juxtaposes the supporting evidence and context for each DGR interaction, which can be used to calibrate the quality threshold that aligns with the user’s risk tolerance level. The quality threshold can then be used to filter out interactions and thus provides an additional information-overload control.

  2. Facilitate data integration while respecting data ownership: GeneDive enables users to import their own DGR interaction dataset(s) without ever uploading their data to the GeneDive server. Any combination of GeneDive native and user dataset(s) may be selected for use within the GeneDive application, which enables consolidation of multiple datasets without compromising data ownership and privacy. The ability to coalesce information from varied sources is crucial to formulating and testing hypotheses.

  3. Support longitudinal and exploratory investigations that can be reproduced: To enable research investigations that span long periods of time, GeneDive users can save the current state of their activity and restart it from exactly that state at a later point in time. To facilitate worry-free data exploration, GeneDive provides Undo and Redo capabilities, which are available even after the session has been restarted. Being able to return to a known-good-state (either via restoring a saved state or undoing to a known state) encourages branched hypothesis testing, changing the line of investigation based on multiple what-if query results.

1.2. Implementation

GeneDive was developed using agile software development approaches [19] and User Centered Design, including iterative design and development by a cross-functional team consisting of computer scientists, developers, biomedical researchers, a physician, a curation director, and curators.

1.2.1. System Architecture

GeneDive is a single-page web application following the widely used Model-View-Controller (MVC) architecture [20]. Fig 1 provides the system architecture diagram for the GeneDive application. The Controller component of the GeneDive architecture is shown at the top of the diagram (teal background box). The various modules under this component denote the features that control and manipulate the data that is searched, retrieved, and displayed. The View component of GeneDive is represented by the two modules in the bottom right corner (blue background) of the diagram, indicating that two presentation modes (tabular and graphical) are used by GeneDive to show the results. The Model component of the GeneDive architecture is shown by two boxes, the Backend modules (green background) and the Interactome modules (beige background). The data that will be searched and the core application logic reside in the Model component. The separation between the three components (Model, View, and Controller) that is promoted by the MVC design pattern has allowed the GeneDive application to evolve and absorb new features, functionalities, and data sources without needing complete refactoring of the codebase. The sub-components of each module are described next.

Figure 1:

GeneDive architecture diagram showing data sources and workflow.

1.2.2. GeneDive: Native Data

As shown in the GeneDive Interactome Model component of Fig 1, GeneDive collates information from several public databases. Gene IDs, symbols, and aliases are obtained from NCBI Entrez [21], and gene sets are obtained from the MSigDB C2-CP collection: Reactome, KEGG, BioCarta, and PID. The core data, comprising DGR interactions, comes from two sources. The first is the PharmGKB knowledge base, which is a high-quality information source since it is manually curated by biomedical experts [22]. The second source is a large collection of DGR interactions that were algorithmically extracted from biomedical literature [11]. In total, there are over 3.2 million DGR interactions. We refer to this collection of DGR interactions as the GeneDive native data sources. Every DGR interaction is expected to have a confidence score which models the likelihood of that interaction being valid. The expert-curated interactions from PharmGKB are assigned the near-perfect confidence score of 0.99, while the extracted DGR interactions are assigned scores algorithmically based on the prevalence of the supporting evidence for the interaction in the literature [23]. To facilitate evidence-based research, the articles that provide the relevant attestation for each interaction are associated with it. All of this data is organized into an SQLite relational database. To provide fast responsiveness, GeneDive conducts most of the processing on the client side through modules connected in series, assisted by cached lookup tables.
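As a rough illustration of how scored interactions and their supporting articles might sit in a relational store, the sketch below builds a small in-memory SQLite table and queries it with a confidence cutoff. The table name, column names, helper function, and sample rows are all assumptions made for illustration; they are not GeneDive's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per interaction mention, with its evidence.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE interactions (
        entity1    TEXT NOT NULL,
        entity2    TEXT NOT NULL,
        confidence REAL NOT NULL,   -- likelihood-style score
        article_id TEXT NOT NULL,   -- supporting article
        excerpt    TEXT NOT NULL,   -- surrounding context
        source     TEXT NOT NULL    -- e.g. curated vs. extracted
    )
    """
)

# Placeholder rows; names and scores are invented for the example.
rows = [
    ("GENE_A", "DRUG_X", 0.99, "article-1", "placeholder excerpt 1", "curated"),
    ("GENE_A", "DRUG_X", 0.72, "article-2", "placeholder excerpt 2", "extracted"),
    ("GENE_A", "GENE_B", 0.91, "article-3", "placeholder excerpt 3", "extracted"),
]
conn.executemany("INSERT INTO interactions VALUES (?, ?, ?, ?, ?, ?)", rows)


def interactions_above(conn, entity, min_confidence):
    """Return interactions involving `entity` at or above the cutoff."""
    cur = conn.execute(
        "SELECT entity1, entity2, confidence, article_id FROM interactions "
        "WHERE (entity1 = ? OR entity2 = ?) AND confidence >= ? "
        "ORDER BY confidence DESC",
        (entity, entity, min_confidence),
    )
    return cur.fetchall()


high = interactions_above(conn, "GENE_A", 0.9)
```

Keeping the article identifier and excerpt on the same row as the score is one simple way to make the evidence retrievable together with each interaction.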

Input/Output:

Acceptable search inputs for genes, drugs, and diseases include the symbols and names from several major data repositories. Specifically, GeneDive DGR accessions are standardized by entity type. Genes use NCBI Entrez gene identifiers (symbols), and gene sets use their names from their source database (Reactome, KEGG, and others). Drugs use PharmGKB drug and chemical identifiers from PubChem or the World Health Organization’s Anatomical Therapeutic Chemical classification system. Lastly, diseases use MeSH and SNOMED CT identifiers. Gene symbols can be ambiguous: for example, PSP-A may refer to either of two genes, SFTPA-1 and SFTPA-2, that encode distinct pulmonary surfactant proteins. In this case, information about the distinct genes that correspond to the same input gene symbol is provided, and the user is prompted to choose the intended gene. All gene symbol ambiguities, if any, have to be resolved before proceeding.
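The disambiguation step described above can be pictured as a lookup from an input symbol to the set of genes it may denote. The following is a minimal sketch under stated assumptions: the alias records, function names, and status labels are hypothetical, and the gene names are spelled as in the text above.

```python
from collections import defaultdict

# Hypothetical alias records: (input symbol or alias, gene it may refer to).
ALIAS_RECORDS = [
    ("PSP-A", "SFTPA-1"),
    ("PSP-A", "SFTPA-2"),
    ("SFTPA-1", "SFTPA-1"),
    ("SFTPA-2", "SFTPA-2"),
    ("NOD2", "NOD2"),
]


def build_alias_index(records):
    """Map each upper-cased alias to the set of genes it can denote."""
    index = defaultdict(set)
    for alias, gene in records:
        index[alias.upper()].add(gene)
    return index


def resolve(index, query):
    """Return (status, candidates) for a user-entered symbol.

    'ambiguous' means the user must choose one candidate before searching.
    """
    candidates = sorted(index.get(query.strip().upper(), ()))
    if not candidates:
        return "unknown", []
    if len(candidates) == 1:
        return "resolved", candidates
    return "ambiguous", candidates


alias_index = build_alias_index(ALIAS_RECORDS)
```

Here `resolve(alias_index, "PSP-A")` reports an ambiguous symbol with two candidates, which corresponds to the point at which the user would be prompted to pick the intended gene.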

1.2.3. Tackling the information-overload problem

Search modalities:

The first tool that GeneDive offers to deal with information overload is a set of four search modalities: 1-Hop, 2-Hop, 3-Hop, and Clique. These search modes are provided by the Search Manager as seen in the GeneDive Controller subsystem in Fig 1. The input to the search modes consists of one or more of the following entities: gene symbols; gene set names; disease names and symbols; or drug symbols. 1-Hop search (default mode) retrieves all interactions containing the input entity (immediate neighbors). In the case of two or more input entities, 1-Hop mode retrieves interactions containing all possible input pairs4. 2-Hop and 3-Hop search modes require at least two input entities and retrieve interactions with intermediary entities along a pathway. 2-Hop shows paths with one or no intermediaries, and 3-Hop shows paths with two or fewer intermediaries between the input entities. Clique search mode accepts only one entity symbol, and retrieves all the complete networks5 with three nodes where the input entity is one of the nodes.
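The search modes above can be read as simple graph queries over an undirected interaction network. The sketch below illustrates 1-Hop, 2-Hop, and Clique on a toy graph; the node names, edge list, and function names are placeholders, and the code is an interpretation of the descriptions above rather than GeneDive's Search Manager implementation.

```python
from itertools import combinations

# Toy undirected interaction network with placeholder entity names.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Build a symmetric adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def one_hop(adj, entities):
    """One entity: all of its interactions. Several: interactions among input pairs."""
    if len(entities) == 1:
        (entity,) = entities
        return {frozenset((entity, n)) for n in adj.get(entity, ())}
    return {
        frozenset((u, v))
        for u, v in combinations(entities, 2)
        if v in adj.get(u, ())
    }


def two_hop(adj, source, target):
    """Paths between two entities with one or no intermediary."""
    paths = []
    if target in adj.get(source, ()):
        paths.append((source, target))
    for mid in sorted(adj.get(source, set()) & adj.get(target, set())):
        paths.append((source, mid, target))
    return paths


def clique(adj, entity):
    """Complete three-node networks (triangles) that contain the entity."""
    neighbors = sorted(adj.get(entity, ()))
    return [(entity, u, v) for u, v in combinations(neighbors, 2) if v in adj[u]]


adj = build_adjacency(EDGES)
```

A 3-Hop search would extend `two_hop` by also allowing paths with two intermediaries; it is omitted here to keep the sketch short.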

Result presentation:

Each of the search modes helps the user focus on the entities of interest and the corresponding immediate neighborhood. Search results are presented simultaneously in tabular and graphical modes, as seen in the GeneDive View subsystem shown in Fig 1. In tabular mode, a summary view of the results is provided by default, by grouping the individual interactions based on unique entity pairs or based on their source article. Grouping is managed by the Grouping module of the GeneDive Controller subsystem in Fig 1, and allows users to scan for interactions and gain an overall understanding of the subnetwork structure and evidence. Summary statistics, such as the number of interactions in the group and the distribution of confidence scores for the group, are provided for each group to guide the user’s investigation. The user can examine the details of a group by simply clicking on its summary row. The second tool is data visualization, one of the most effective and well-established approaches for combating information overload. GeneDive employs Cytoscape’s force-directed layout engine[13], with nodes representing entities and edges representing interactions, to provide an interactive graphical visualization of the data.
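The grouped summary view can be sketched as a group-by over individual interaction mentions, keyed either by unordered entity pair or by source article. The record layout, sample values, and the particular statistics chosen below are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Placeholder interaction mentions; names and scores are invented.
MENTIONS = [
    {"pair": ("GENE_A", "GENE_B"), "article": "article-1", "confidence": 0.95},
    {"pair": ("GENE_B", "GENE_A"), "article": "article-2", "confidence": 0.80},
    {"pair": ("GENE_A", "DRUG_X"), "article": "article-1", "confidence": 0.60},
]


def summarize(mentions, by="pair"):
    """Group mentions by unordered entity pair or by article and summarize scores."""
    groups = defaultdict(list)
    for m in mentions:
        # Sorting the pair makes (A, B) and (B, A) fall into the same group.
        key = tuple(sorted(m["pair"])) if by == "pair" else m["article"]
        groups[key].append(m["confidence"])
    return {
        key: {
            "count": len(scores),
            "min": min(scores),
            "max": max(scores),
            "mean": mean(scores),
        }
        for key, scores in groups.items()
    }


by_pair = summarize(MENTIONS, by="pair")
by_article = summarize(MENTIONS, by="article")
```

Each summary entry plays the role of one summary row; expanding a row would correspond to listing the underlying mentions for that key.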

Control mechanisms:

The third strategy that GeneDive adopts to tackle the potentially large numbers of interactions is to offer multiple results control mechanisms to the user, shown in Fig 1 as steps occurring after the Search Manager in the GeneDive Controller subsystem (i.e., Filter and Highlight). The Filter feature allows the user to retain or remove certain search results. The filters are additive; users can add or remove filters as desired. Filters are applied independently, so removal of a filter does not impact other filters; likewise, reordering does not affect the outcome. Users select filter criteria from a drop-down menu for partial text matching against certain fields (i.e., entity name, journal, specific article, and excerpt), and use a slider control to set the minimum confidence score cutoff. The other control mechanism, the Highlight feature, allows the user to emphasize interactions that match user-specified criteria. In the graphical view, edges that match the highlighting criteria are rendered in a different color; in the tabular view, the background of the matching rows is highlighted in the same color. This strategy uses a visual aid to spotlight the subset of data that is of interest.
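One way to picture additive, order-independent filters is as a set of predicates that must all hold for a row to be kept, combined with a minimum confidence cutoff. The sketch below follows that reading; the class, field names, and sample rows are hypothetical and do not reflect GeneDive's internal API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFilter:
    """Partial, case-insensitive text match on one field."""
    field: str
    text: str
    keep: bool = True  # True retains matching rows; False removes them

    def accepts(self, row):
        matched = self.text.lower() in str(row[self.field]).lower()
        return matched if self.keep else not matched


def apply_filters(rows, filters, min_confidence=0.0):
    """Keep rows that meet the cutoff and satisfy every filter."""
    return [
        r for r in rows
        if r["confidence"] >= min_confidence and all(f.accepts(r) for f in filters)
    ]


def highlight(rows, text):
    """Flag rows whose excerpt contains the text, without removing any row."""
    return [text.lower() in r["excerpt"].lower() for r in rows]


# Placeholder result rows.
ROWS = [
    {"entity": "GENE_A", "journal": "Journal One", "excerpt": "allele increases risk", "confidence": 0.96},
    {"entity": "GENE_B", "journal": "Journal Two", "excerpt": "no association found", "confidence": 0.97},
    {"entity": "GENE_A", "journal": "Journal Two", "excerpt": "allele frequency reported", "confidence": 0.70},
]

only_gene_a = TextFilter("entity", "gene_a")
not_journal_two = TextFilter("journal", "two", keep=False)
kept = apply_filters(ROWS, [only_gene_a, not_journal_two], min_confidence=0.9)
flags = highlight(ROWS, "allele")
```

Because every filter is evaluated against each row independently and the results are combined with a logical AND, the order in which filters are listed cannot change the outcome, and dropping one filter leaves the others untouched.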

1.2.4. Enabling data integration with uncompromising data ownership

GeneDive allows users to import their own gene, disease, and/or drug relationships, provided these interactions are assigned confidence scores. This feature is provided by the Datasource Importer in the GeneDive Backend subsystem, shown in Fig 1. The score can be manually assigned (e.g., a high score of 0.999 for expert curation) or algorithmically calculated. A downloadable version of the GeneDive application is provided as a Docker image. Docker is virtualization software that enables researchers to deploy reproducible software systems, called containers, on heterogeneous hosts [24]; the packaged software file is called an image. The GeneDive container allows users to keep their data completely local and private on their own computers, while securely accessing the GeneDive database server for queries. The user’s data is never transmitted to the GeneDive server or any other server. These features satisfy the requirements for user data ownership and privacy. After import, the user-provided data are immediately available for selection as a data source for subsequent queries. The user can select any combination of their own dataset(s) and native datasets, and then proceed to query, filter, highlight, and visualize the results from all the selected sources seamlessly.
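The import requirement (every user interaction must carry a confidence score) and the local combination of selected sources can be sketched as follows. This is an illustrative outline only: the record layout, function names, dataset names, and the assumption that scores lie in [0, 1] are not taken from GeneDive's Datasource Importer.

```python
def validate_user_rows(rows):
    """Reject user interactions that lack a usable confidence score.

    Assumes scores are numbers in [0, 1]; this range is an assumption.
    """
    valid = []
    for i, row in enumerate(rows):
        score = row.get("confidence")
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            raise ValueError(f"row {i}: a confidence score in [0, 1] is required")
        valid.append(dict(row))
    return valid


def combine(selected_sources):
    """Merge the selected datasets in local memory, tagging each row with its source."""
    merged = []
    for name, rows in selected_sources.items():
        merged.extend({**row, "source": name} for row in rows)
    return merged


# Placeholder data: one native row and one user-provided row.
native_rows = [{"pair": ("GENE_A", "DRUG_X"), "confidence": 0.99}]
user_rows = validate_user_rows([{"pair": ("GENE_A", "GENE_C"), "confidence": 0.999}])

merged = combine({"native": native_rows, "my_lab": user_rows})
```

Nothing in this sketch sends data anywhere; the merge happens entirely in the local process, which mirrors the idea that user data stays on the user's machine.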

1.2.5. Facilitating reproducible, longitudinal, exploratory investigations

(The following paragraph describes parts of the Controller component of the MVC architecture — the contents of the teal background box in Fig 1.)

GeneDive manages application state to promote longitudinal and reproducible investigations (as shown in Fig 2). GeneDive users can undo their last action, all the way back to the starting state, and redo their last undone action, all the way forward to their latest action. Users can download their search session as a GeneDive file, along with an Excel-compatible CSV file of their search results and a PNG file of the graphical view. The downloaded files are portable; users can share and re-use previously saved sessions to reproduce research and findings. The GeneDive file can be re-uploaded to resume where the user stopped. These features allow users to boldly investigate multiple lines of inquiry, knowing that they can save and restore a previously known good state. GeneDive’s interactive search and reporting encourages exploratory investigations. The user determines the minimum confidence score for DGR interactions for search, and reviews multiple forms of supporting evidence for DGR interactions returned in the search results: source articles, excerpt text that provides specific context, the number of instances of the DGR interaction, and the number of articles that mention the DGR interaction. These two features together provide a mechanism to balance risk and exploration, and allow the user to calibrate the confidence score threshold that is appropriate for their specific investigations, useful for either a clinical setting (requiring evidence with higher confidence) or a curation standpoint (as higher confidence-scored interactions are annotated, unannotated interactions with lower confidence scores gain priority).
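The undo, redo, save, and restore behavior described above resembles a linear history of application states with a movable cursor. The sketch below shows that pattern under stated assumptions: the class name, the JSON session format, the state layout, and the choice to discard undone states when a new action is taken are illustrative, not GeneDive's documented behavior.

```python
import json


class SessionHistory:
    """Linear history of states with undo/redo and a serializable form."""

    def __init__(self, initial_state):
        self._states = [initial_state]
        self._cursor = 0

    @property
    def current(self):
        return self._states[self._cursor]

    def apply(self, new_state):
        # Assumption: a new action discards any previously undone states.
        self._states = self._states[: self._cursor + 1] + [new_state]
        self._cursor += 1

    def undo(self):
        # Stops at the starting state.
        if self._cursor > 0:
            self._cursor -= 1
        return self.current

    def redo(self):
        # Stops at the latest state.
        if self._cursor < len(self._states) - 1:
            self._cursor += 1
        return self.current

    def save(self):
        """Serialize the full history so undo/redo survive a restart."""
        return json.dumps({"states": self._states, "cursor": self._cursor})

    @classmethod
    def load(cls, payload):
        data = json.loads(payload)
        history = cls(data["states"][0])
        history._states = data["states"]
        history._cursor = data["cursor"]
        return history


history = SessionHistory({"query": []})
history.apply({"query": ["NOD2"]})
history.apply({"query": ["NOD2"], "min_confidence": 0.85})
history.undo()

restored = SessionHistory.load(history.save())
```

Saving the whole history rather than only the current state is what would let undo and redo remain available after a session is restored.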

Figure 2:

GeneDive Application State Diagram showing application states and transitions.

1.2.6. Client Automation for Documentation and Testing

GeneDive is an academic project; as such, it is vulnerable to periodic loss of developers when students graduate. This poses a recurring cost to software projects, namely interruption in training, knowledge, and hence project momentum. These challenges can be mitigated with good cohesive modular design, easy-to-run and easy-to-interpret tests, and well-written and up-to-date documentation. Maintaining tests and documentation is an arduous task, so a design decision was made to automate these processes as much as possible.

An automation framework, complete with class and API definitions, was developed initially to capture screenshots for documentation. The framework builds on jQuery’s DOM manipulation API6 and the Google Puppeteer package7, which provides a version of the popular Chrome browser8 that is fully programmable and can be run headless (no browser window) or headed (the user can watch the automation framework interact with the browser window in an autopilot mode). The GeneDive codebase is documented as per JavaDoc conventions. The documentation and code structure are parsed by Doxygen9, an automated documentation system which emits structured documents in HTML format. Doxygen generates class inheritance diagrams and call graphs to illustrate the flow of execution of the program for both the frontend and backend of GeneDive, across multiple programming languages. Online documentation is available at https://www.genedive.net/docs/html/index.html. The automation framework also serves as the backbone for front-end testing. Functional tests that check GeneDive’s key features are employed to accomplish UI smoke testing. In addition to detailed logging, Puppeteer is used to capture screenshots of the UI when a test fails. This helps gather evidence for root causes of failure, reducing the developer’s time and effort. The automation framework is developed in Node.js10 and is implemented using asynchronous calls in the form of Promises11.

2. Results

With its repository of gene-drug, gene-disease, and gene-gene interactions, GeneDive can facilitate exploration and discovery in a wide variety of research contexts and clinical settings. This section demonstrates these abilities through eight use cases. The first three use cases demonstrate the features and benefits of GeneDive for research and biomedical literature curation in connection with gene-related searching: single-gene searches, multi-gene searches, and searches involving related genes (gene sets or pathways). The next three use cases demonstrate the versatility and applicability of GeneDive for precision medicine research, bringing the knowledge contained in biomedical literature to bear on particular clinical situations and improving patient care. These three use cases are closely based on the Training Exercises made available by PharmGKB12. The last two use cases demonstrate the ability to use user-provided data with the GeneDive application.

2.1. Use Case 1: Single-Gene Query

A scientist is studying Crohn’s disease and wants to investigate the gene NOD2 in more detail. NOD2 is involved in immune system pathways and is associated with Crohn’s disease [25]. The scientist enters NOD2 in the search bar and limits the results to confidence scores greater than or equal to 0.85 (Fig 3).

Figure 3:

Search results for the NOD2 gene with a confidence score cutoff of 0.85, showing references to genes related to immune response.

In the DGR Pair group view, the result table contains interactions between NOD2 and over forty other DGR entities, including 24 other genes. Each individual DGR pair result row displays the highest-scoring sentence for the given pair. After expanding an interaction row with a single DGR, the scientist browses the supporting evidence that GeneDive displays for the interaction. The scientist employs a DGR filter for RIPK2 and reads sentences that support this interaction from three different articles. After removing the RIPK2 filter criterion, the scientist switches to the article group view in order to review all sentences with interactions from individual articles at once. This feature accelerates the scientist’s full-article review. After re-applying the DGR group view, the scientist uses the sentences shown in the Sample Excerpts column to discern several genes related to immune response (RIPK2 [26], TRAF6 [27], DUOX2 [28]). After uncovering genes potentially related to the underlying mechanism for Crohn’s disease, the scientist queries the resulting genes and constructs networks with their associations using the graphical view. This example highlights the utility to the scientist of single-gene searches for understanding the underlying mechanisms of a disease.

2.2. Use Case 2: Multi-Gene Query

Given a group of disease-associated genes, GeneDive can facilitate the discovery of potential connections between the genes or between other genes that interact with one or more of them. In this example, a scientist studying follicular lymphoma wishes to investigate the genes which impact this disease. The scientist is aware that translocation and subsequent over-expression of BCL2 is found in follicular lymphoma [29], but an additional seven genes (EZH2, ARID1A, MEF2B, EP300, FKH1, CREBBP, and CARD11) are used for predicting disease risk [30]. The scientist wishes to query GeneDive using the official symbols for these risk genes.

For example, starting with CREBBP and EP300, the scientist enters these genes and conducts a search in 1-Hop mode. GeneDive returns sentences from five publications as evidence for interactions between these genes. After setting the confidence score cutoff at 0.9, the scientist finds that the only qualifying sentence describes the interaction of the input genes as actually being a mutual interaction with a third gene, c-Myb, not between the input genes themselves. Next, the scientist investigates whether any of the risk genes have interactions two or fewer hops (i.e., 2-Hop) from the known causative gene BCL2. Using a confidence score cutoff of 0.9, the scientist discovers three risk genes connected to BCL2 via a 2-Hop search (EZH2, EP300, CREBBP). For example, applying an excerpt filter for “lymphoma” in the results of the 2-Hop search for BCL2 and EZH2 reveals that c-Myc expression may be a prognostic marker in diffuse large B-cell lymphoma, a disease related to follicular lymphoma. From here, the scientist can explore the intermediate genes and any interactions with pathway gene sets included in GeneDive.

2.3. Use Case 3: Pathway Gene Set Query

GeneDive is able to detect potential pathway overlaps within a set of disease-related genes using curated pathway gene sets from many database sources, such as KEGG or Reactome, that are pre-loaded in GeneDive. As an example, a scientist is studying Type 2 diabetes mellitus (T2DM), a complex disease characterized by insulin resistance. Over the last decade, a number of genetic variants have been shown to impact disease risk [31]. The scientist intends to use GeneDive to explore two KEGG pathways included in the MSigDB gene sets: KEGG_TYPE_II_DIABETES_MELLITUS and KEGG_INSULIN_SIGNALING_PATHWAY. The scientist is able to manually input a custom gene set if desired; however, these gene sets are pre-loaded in GeneDive and can be specified in the search area in the same way as any other DGR. Exploiting this feature, the scientist selects the gene sets for the KEGG T2DM pathway13. The network view (Fig 4, (a)) shows that numerous genes and their interactions from the T2DM pathway overlap directly with the insulin signaling pathway. The scientist then selects and compares the KEGG pathway for Type 1 diabetes mellitus (T1DM) (KEGG_TYPE_1_DIABETES_MELLITUS). The scientist finds that the T1DM genes interact with those from the insulin signaling pathway, but the T1DM genes are not found in the pathway itself (Fig 4, (b)). The lack of overlap between T1DM genes and the insulin signaling pathway is due to T1DM resulting from deficiency in insulin production, not resistance [32]. Because GeneDive makes gene sets readily available for analysis, the scientist can not only query her own custom gene set of interest, but can also investigate possible disease and biological pathway associations using curated gene sets.
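The distinction drawn in this use case, between genes that belong to both gene sets and genes that merely interact across the sets, can be expressed with basic set operations. The sketch below uses placeholder gene names and an invented edge list; it does not reproduce the actual KEGG gene sets or GeneDive results.

```python
def compare_gene_sets(set_a, set_b, edges):
    """Return (shared genes, cross-set interactions between non-shared genes)."""
    shared = set_a & set_b
    only_a, only_b = set_a - shared, set_b - shared
    cross = {
        frozenset((u, v))
        for u, v in edges
        if (u in only_a and v in only_b) or (u in only_b and v in only_a)
    }
    return shared, cross


# Placeholder gene sets and interactions, invented for the example.
SET_T2 = {"G1", "G2", "G3"}
SET_INSULIN = {"G2", "G3", "G4"}
SET_T1 = {"G5", "G6"}
EDGES = [("G1", "G4"), ("G5", "G4"), ("G2", "G3")]

shared_t2, cross_t2 = compare_gene_sets(SET_T2, SET_INSULIN, EDGES)
shared_t1, cross_t1 = compare_gene_sets(SET_T1, SET_INSULIN, EDGES)
```

In this toy data the first comparison yields shared genes (the analogue of two-colored nodes), while the second yields no shared genes but still finds an interaction across the two sets.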

Figure 4:

Comparison of partial network overlap of two gene sets.

(a) Search results for genes in Type 2 diabetes mellitus pathway set and in insulin signaling pathway set. Multiple genes exist in both sets, indicated by two-colored nodes.

(b) Search results for genes in Type 1 diabetes mellitus pathway set and in insulin signaling pathway set. No genes exist in both sets, indicated by single colored nodes.

2.4. Use Case 4: Infectious Disease Treatment

A patient is co-infected with HIV and hepatitis C genotype I and needs to begin treatment to manage both infections. An attending physician wishes to prescribe peginterferon alfa and ribavirin to treat the hepatitis C infection, and abacavir, atazanavir, or raltegravir to control the HIV infection. The patient’s record indicates that he has had his genome sequenced, so the physician decides to check the data to see whether the patient carries any variants that may affect his reaction to antiviral treatment.

First, the physician wishes to determine whether there are any dosing guidelines available for peginterferon alfa and ribavirin treatment. Using the default search mode of GeneDive, 1-Hop, for peginterferon alfa, dosing guidelines and FDA drug labels are easily and quickly accessed by clicking on the link-out provided in the search area to the appropriate PharmGKB webpage (Fig 5). In the retrieved results for peginterferon alfa, by clicking on the ‘# Interactions’ column, the gene(s) with the highest number of interactions with the drug can be identified. For the above two drugs, that gene is IFNL3 (also known as IL28B). Next, the physician repeats the above steps for the antiretroviral medications abacavir, atazanavir, and raltegravir. GeneDive presents links for ready access to each medication’s information on PharmGKB, and, based on occurrence frequency, helps identify the genes that should be checked for variants. For abacavir it is HLA-B, for atazanavir it is UGT1A1, and for raltegravir it is ABCB1.

Figure 5:

GeneDive provides link-outs to appropriate NCBI and PharmGKB resource pages for convenient referencing.

The patient’s genome sequence analysis indicates that the patient’s genotypes/diplotypes at the genes IFNL3, HLA-B, and UGT1A1 are CC, *08:73/*57:01, and *28/*37, respectively. The physician needs to examine the significance of these genotypes. Using the 1-Hop mode, the physician can search for direct interactions between any of the above entities. The physician starts with a 1-Hop search for abacavir and HLA-B, which finds 70 interactions from 34 different articles for this pair. In order to examine only the high-confidence interactions, the physician selects the ‘High’ confidence cutoff on the left control panel. This narrows the results to 45 interactions with a confidence score of 0.95 or higher, drawn from 25 articles. Next, the physician uses the ‘Highlight Rows’ functionality with the value ‘57:01 allele’ to zoom in on the allele. Out of the 45 interactions, 7 are highlighted since they contain the ‘57:01’ or ‘5701’ allele (Fig 6). The highlighted interactions and their corresponding articles can be quickly examined, and the physician can determine that the presence of the 57:01 allele suggests an increased risk that the patient will be hypersensitive to abacavir. The physician then uses the ‘Download Results’ functionality to retain the steps that retrieved this data, and the data itself.

Figure 6:

The search results for ‘abacavir’ and ‘HLA-B’, with highlight on ‘57:01 allele’, reveal the patient’s risk of hypersensitivity to abacavir.

Similarly, the physician uses 1-Hop mode to search atazanavir and UGT1A1 (Fig 7). The physician next uses the ‘Filter Results’ functionality to select only those interactions that contain ‘allele’ in their excerpts. The resulting handful of interactions clearly indicate that a patient with the UGT1A1 *28/*37 diplotype has an increased risk of developing jaundice as a result of atazanavir treatment. As such, raltegravir would be the preferred treatment option.

Figure 7:

The search results for ‘atazanavir’ and ‘UGT1A1’, with filter on Excerpts for ‘allele’, reveal the risk of the patient developing jaundice.

Finally, the physician searches for ‘ribavirin’ and filters the results to contain ‘coinfect’. The retrieved interaction results (Fig 8) indicate that the patient with HCV and HIV coinfection is likely to respond favorably to a combination treatment with peginterferon alfa and ribavirin.

Figure 8:

The search results for ‘ribavirin’ filtered on Excerpts for ‘coinfect’.

2.5. Use Case 5: Rheumatoid Arthritis Treatment

A physician has diagnosed a patient with rheumatoid arthritis and is deciding between azathioprine and methotrexate for treatment. The physician uses GeneDive to first determine whether there are any Level 1A clinical annotations associated with either of these choices, and if so, which genes are involved. Entering each drug name as the search term yields links to PharmGKB data sheets, which yield a Level 1A clinical annotation for azathioprine and the gene TPMT, which encodes thiopurine methyltransferase. There are no Level 1A annotations associated with methotrexate.

Next, the physician explores the relationship between TPMT and azathioprine. Using the 1-Hop search mode, the physician limits the search results to only those interactions between these two entities. The Excerpt column quickly shows that the enzyme encoded by TPMT metabolizes azathioprine: the enzyme catalyzes the S-methylation of the molecule. The third-to-last row of the tabular view (Fig 9) reveals this relationship. The confidence score cutoff of 0.85, corresponding to the “medium” setting, assists the physician by further limiting the results to those interactions reported with high confidence in the literature. The links to the source publications for each interaction allow the physician to assess the evidence presented in the publication, if desired.

Figure 9:

The tabular view reveals the optimum dosage of azathioprine and its methylation by TPMT.

The patient has provided the physician with genetic information received from a direct-to-consumer testing company. The genetic information indicates that the patient’s TPMT diplotype is *1/*3C. Referring to the PharmGKB data sheet linked to by GeneDive, the physician determines that the patient is expected to be an intermediate metabolizer of azathioprine based on the indicated diplotype. The physician then further filters the TPMT-azathioprine interactions for publications discussing the safety of this medication, revealing that its most common serious side effects are dose-dependent hepatotoxicity and myelotoxicity. Based on these findings, the physician determines that, if azathioprine is chosen, then the patient should be started on a reduced dose of this medication. Dosing information is also presented: the tabular view of GeneDive reveals a publication excerpt identifying an optimum azathioprine dosage of 2.5-3.5 mg/kg/day, with a reduced dosage of 1 mg/kg/day (Fig 9, second-to-last row of tabular view).

2.6. Use Case 6: Post-Operative Transplant Surgery Care

A physician has prescribed tacrolimus (an immunosuppressant medication) to a patient who recently had a kidney transplant. Prior to the operation, the patient advised that he had been genotyped and gave the physician access to the data. The physician wishes to determine whether any of the data inform the starting dose of tacrolimus. The physician first checks whether there are any Dosing Guidelines, FDA Drug Labels, or Level 1 Clinical Annotations associated with tacrolimus. A GeneDive search provides quick links to the relevant pages for tacrolimus; the PharmGKB link-out provides ready reference to the drug labels and Level 1 Clinical Annotations. Next, the physician wishes to determine which genes (if any) should be checked for variants. To answer this question, tacrolimus and the phrase “kidney transplantation” are entered as search terms in GeneDive, and the 2-Hop mode is chosen to find which gene(s) are reported in the literature as interacting with both entities. To focus on those interactions reported with the highest confidence, the confidence score cutoff is set to the “high” setting of 0.95. In the Excerpt column in the tabular view, the physician observes that CYP3A5 is known to play a key role in the pharmacokinetics of tacrolimus and in its dosing.
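Read literally, the 2-Hop query above asks for entities that interact with both search terms, which amounts to intersecting two neighbor sets. The sketch below illustrates that reading only; it is not GeneDive's code. The edge list is made up, and `GENE_X` and `GENE_Y` are placeholder names.

```python
from collections import defaultdict

# Made-up undirected interaction pairs for illustration only.
edges = [
    ("tacrolimus", "CYP3A5"),
    ("CYP3A5", "kidney transplantation"),
    ("tacrolimus", "GENE_X"),
    ("GENE_Y", "kidney transplantation"),
]

def build_neighbors(edges):
    """Map each entity to the set of entities it interacts with."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def two_hop_intermediates(edges, source, target):
    """Entities that interact directly with both `source` and `target`."""
    neighbors = build_neighbors(edges)
    shared = neighbors[source] & neighbors[target]
    return sorted(shared - {source, target})

print(two_hop_intermediates(edges, "tacrolimus", "kidney transplantation"))
# prints ['CYP3A5']
```

`GENE_X` and `GENE_Y` are excluded because each touches only one of the two search terms.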

Now aware of the importance of CYP3A5, the physician wishes to determine the patient’s CYP3A5 metabolizer status and whether any dosage adjustment is warranted. The physician replaces “kidney transplantation” with “CYP3A5” and performs a 1-Hop search with tacrolimus. Using the excerpts from the tabular view of GeneDive and following the publication links, the physician learns that tacrolimus is metabolized primarily to 13-O-demethyl-tacrolimus and 31-O-demethyl-tacrolimus in the liver and intestine. Referring to the literature and PharmGKB link for tacrolimus, the physician finds that the patient can be characterized as an intermediate metabolizer, and that the starting dose should be increased up to two times the recommended starting dose, but not exceeding 0.3 mg/kg/day.

2.7. Use Case 7: User Data Only

A biologist is in the early stages of studying the interaction of genes within a specific disease, and is trying to understand which genes likely work together. They are using gene expression data from a large number of disease samples and have built a gene co-expression matrix to find highly correlated genes that are likely co-regulated. There is currently only one gene known to be important in this disease, and they wish to understand which genes work with it and whether any gene groups appear to participate in the same function. As this research is at an early stage, they do not want to upload their expression data to a public website yet.

After downloading and installing the GeneDive Docker image14, the biologist would preprocess their gene expression correlations to import into GeneDive locally. The biologist would deselect all data sources other than their own, search for the known associated gene, and select the 1-Hop search mode to reveal known neighbors and potentially new interactions. Next, they adjust the cutoffs for their correlation scores to a high threshold to display only strong correlations. Finally, they use the Clique search mode to view interactions between neighbors and look for coregulation subnetworks that include the gene of interest.
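The preprocessing step could look roughly like the sketch below, which flattens a symmetric correlation matrix into a list of scored gene pairs. Everything here is an assumption made for illustration: the column names, the CSV layout, the gene names, and the use of absolute correlation are hypothetical, and the text does not specify GeneDive's actual import format.

```python
import csv
import io

# Made-up co-expression (correlation) matrix for three placeholder genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
corr = [
    [1.00, 0.92, 0.30],
    [0.92, 1.00, -0.88],
    [0.30, -0.88, 1.00],
]

def strong_pairs(genes, corr, cutoff):
    """List each unordered gene pair once if |correlation| >= cutoff."""
    pairs = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):  # upper triangle only
            if abs(corr[i][j]) >= cutoff:
                pairs.append((genes[i], genes[j], corr[i][j]))
    return pairs

def to_csv(pairs):
    """Serialize pairs as CSV text; the header names are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene1", "gene2", "score"])
    writer.writerows(pairs)
    return buffer.getvalue()

pairs = strong_pairs(genes, corr, cutoff=0.85)
print(to_csv(pairs))
```

Walking only the upper triangle avoids emitting both AB and BA, which matches the non-directional treatment of interactions. Whether strong negative correlations should be kept is a choice for the researcher to make.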

2.8. Use Case 8: User Data with Native Data

A biomedical researcher wants to visualize SARS-CoV-2 (COVID-19) data from BioGrid15 to investigate potential mechanisms that influence viral infection. The spike protein (S) binds to transmembrane ACE2 receptors in human heart, lung, and other cells, and is critical to initiating the mechanism of viral infection [33, 34].

Using the curated COVID-19 interactions, the researcher can visualize the relationship of genes for spike proteins and ACE2 (Fig 10). By enriching this network with DeepDive-extracted interactions, the researcher can see how the COVID-19 genome interacts with the human genome, ML-identified drugs, and diseases. Extending the search to low confidence casts a wider net to look for potential therapeutic approaches to thwart the viral infection mechanism. The researcher can then use the tabular view to prioritize investigations for lowest cost and most supporting evidence to direct future study.

Figure 10:

Visualizing BioGrid’s COVID-19 interaction data with native GeneDive data sources using low confidence for broad exploration

3. Discussion

Collectively, the eight use cases demonstrate representative scenarios where the GeneDive application proves most useful. In addition to the aforementioned direct use cases, future use cases for GeneDive may include informing gene therapy to help design novel CRISPR/Cas applications. GeneDive might be used to prioritize gene therapy targets for specific diseases. GeneDive topology search can potentially foresee adverse gene therapy interactions, either with patient-specific gene mutations or a patient’s current prescription regimen.

3.1. Feature Comparison with Other Tools

Existing tools such as Cytoscape[13] and VisANT[14] have multi-species support for integrating and visualizing curated datasets (e.g., GO, KEGG pathways). The approach by Poon et al. [15] uses natural language processing techniques to extract interactions from biomedical literature, and the corresponding web-application, Literome, provides search and visualization capabilities for the extracted interactions data. Table 1 provides a side-by-side comparison of GeneDive features with those of Literome, Cytoscape, and VisANT. This comparison considers features that help the user combat information-overload (#1 and #2), promote longitudinal and exploratory research analyses (#3), and support integration of user-provided interactions data (#4). This comparison shows that GeneDive provides features that are traditionally supported by tools built on curated datasets (Cytoscape and VisANT), as well as features typical of tools built on algorithmically created databases (Literome).

Table 1:

Comparison of features and functionalities provided by GeneDive (GD) with three prominent search and visualization tools: Literome (Lit), Cytoscape (CS), and VisANT (VA).

Feature GD Lit CS VA
1. Search Modalities
 1-hop
 2-hop
 3-hop
 Clique

2. Information Assimilation
 Tabular View
 Graphical View
 Supporting evidence (Excerpt, Articles, Conf. score)

3. Longitudinal Investigations
 Save/Load Session
 Undo/Redo
 Client Automation

4. User-provided Interactions
 Integration Ability
 Data Privacy

The following biomedical resources are also worth mentioning here: OpenTargets platform16, DisGeNet17, CTD18, MetaCore19, Data4Cure20. To the best of our knowledge, two of these resources (MetaCore and Data4Cure) are commercial products. The other three (OpenTargets, DisGeNet, and CTD) do not offer two key functionalities: 1) the multi-hop search modalities, and 2) the graphical view of results.

3.2. Availability and Future Directions

The GeneDive web application is freely available to everyone at https://www.genedive.net/. A minimal registration step is needed to start using the application. The GeneDive application is also available for download as a Docker container image at the aforementioned URL (log in to the GeneDive web application, click on the hamburger icon in the top left corner, and select “Download GeneDive” from the dropdown menu). The GeneDive source code is available at: https://github.com/mikewong-sfsu/GeneDive/releases/tag/v2.5.2.

One of the limitations of the current GeneDive version is cross-browser compatibility: GeneDive is currently optimized only for the Chrome web browser. Future releases of GeneDive will expand the set of supported browsers and browser versions that offer the full suite of GeneDive features. Also, GeneDive currently supports a specific set of gene, drug, or disease entity accessions: NCBI, Entrez, Reactome, KEGG, and PharmGKB entity identifiers are included in the application. While these databases are widely used and offer GeneDive users access to information about over three million DGR interaction pairs, this set of entity accessions is not exhaustive, and future releases may expand on the available identifiers. Every interaction sourced from PharmGKB is supported by a published article, but short excerpts are not available for these interactions. Extracting the excerpts from the associated articles is part of future work. Another limitation of GeneDive that will be addressed in a near-future release is the static view model: the search results presented in the tabular mode use a predefined set of columns that cannot be modified by the user. Future releases of GeneDive will provide a flexible view model that allows the user to (1) hide/unhide columns, (2) add new columns that consolidate data from existing columns, and (3) add new columns that are transformations of the original data.

Acknowledgments

This work was partially supported by NIH grant LM005652, and by CoSE Computing for Life Sciences at San Francisco State University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agency. We are grateful to the PharmGKB team and Dan Sosa for their valuable suggestions and insights.

Footnotes

4. Declaration of Conflicting Interests

The authors declare that there is no conflict of interest.

4

If the user has entered genes A, B, and C, then 1-Hop search will return gene-gene interactions for the following gene pairs: AB, AC, and BC. Since these interactions are non-directional, the interactions AB and BA are equivalent.
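The pair enumeration in this footnote corresponds to taking unordered combinations of the entered entities. A minimal sketch follows, with `one_hop_pairs` as a hypothetical helper name rather than a GeneDive function.

```python
from itertools import combinations

def one_hop_pairs(entities):
    """Enumerate each unordered pair once, so AB and BA are not both listed."""
    return list(combinations(sorted(set(entities)), 2))

print(one_hop_pairs(["A", "B", "C"]))
# prints [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Sorting before pairing gives each non-directional pair a single canonical form regardless of input order.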

5

A complete network is formed when every member entity directly interacts with every other entity in the network.
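This definition can be checked directly: a set of entities forms a complete network when every unordered pair of members appears among the known interactions. The sketch below assumes non-directional interactions, and `is_complete` is a hypothetical helper, not part of GeneDive.

```python
from itertools import combinations

def is_complete(members, interactions):
    """True if every pair of distinct members has a recorded interaction."""
    known = {frozenset(pair) for pair in interactions}
    return all(
        frozenset(pair) in known
        for pair in combinations(set(members), 2)
    )

triangle = [("A", "B"), ("C", "A"), ("B", "C")]
print(is_complete(["A", "B", "C"], triangle))      # prints True
print(is_complete(["A", "B", "C"], triangle[:2]))  # prints False
```

Storing each interaction as a `frozenset` makes the lookup independent of the order in which the two entities were recorded.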

References

  • [1] Mooney S, Progress towards the integration of pharmacogenomics in practice, Hum Genet 134 (5) (2015) 459–465.
  • [2] Relling M, Evans W, Pharmacogenomics in the clinic, Nature 526 (2015) 343–350.
  • [3] Lee J, Aminkeng F, Bhavsar A, et al., The emerging era of pharmacogenomics: current successes, future potential, and challenges, Clin Genet 86 (1) (2014) 21–28.
  • [4] Thorn C, Klein T, Altman R, Pharmacogenomics and bioinformatics: Pharmgkb, Pharmacogenomics 11 (4) (2010) 501–505.
  • [5] Samwald M, Coulet A, Huerga I, et al., Semantically enabling pharmacogenomic data for the realization of personalized medicine, Pharmacogenomics 13 (2) (2012) 201–212.
  • [6] Whirl-Carrillo M, McDonagh E, Hebert J, et al., Pharmacogenomics knowledge for personalized medicine, Clin Pharmacol Ther 92 (4) (2012) 414–417.
  • [7] Tari L, Anwar S, Liang S, Cai J, Baral C, Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism, Bioinformatics 26 (18) (2010) 547–553.
  • [8] Vazquez M, Krallinger M, Leitner F, Valencia A, Text mining for drugs and chemical compounds: Methods, tools and applications, Mol Inform 30 (2011) 506–519.
  • [9] Percha B, Garten Y, Altman R, Discovery and explanation of drug-drug interactions via text mining, Pac Symp Biocomput (2012) 410–421.
  • [10] Poon H, Toutanova K, Quirk C, Distant supervision for cancer pathway extraction from text, Pac Symp Biocomput (2015) 120–131.
  • [11] Mallory E, Zhang C, Re C, Altman R, Large-scale extraction of gene interactions from full-text literature using deepdive, Bioinformatics 32 (1) (2016) 106–113.
  • [12] Davis A, Wiegers T, Roberts P, et al., A ctd-pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions, Database (Oxford) 2013 (bat080).
  • [13] Shannon P, Markiel A, Ozier O, et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res 13 (2003) 2498–504.
  • [14] Hu Z, Mellor J, Wi J, DeLisi C, Visant: an online visualization and analysis tool for biological interaction data, BMC Bioinformatics 5 (17).
  • [15] Poon H, Quirk C, DeZiel C, Heckerman D, Literome: Pubmed-scale genomic knowledge base in the cloud, Bioinformatics 30 (19) (2014) 2840–2842.
  • [16] Previde P, Thomas B, Wong M, Mallory E, Petkovic D, Altman R, Kulkarni A, Genedive: A gene interaction search and visualization tool to facilitate precision medicine, Pac Symp Biocomput (World Scientific) 590–601.
  • [17] Koscielny G, et al., Open targets: a platform for therapeutic target identification and validation, Nucleic Acids Research 45 (D1).
  • [18] Pinero J, et al., Disgenet: a comprehensive platform integrating information on human disease-associated genes and variants, Nucleic Acids Research 45 (D1) (2016) D833–D839.
  • [19] Highsmith J, Cockburn A, Agile software development: The business of innovation, Computer 34 (9) (2001) 120–127.
  • [20] Buschmann F, Meunier R, Rohnert H, Sommerlad P, Stal M, Pattern-oriented software architecture: a system of patterns, Wiley, 2000.
  • [21] Baxevanis A, Searching the ncbi databases using entrez. current protocols in human genetics, chapter 6, Cur Protocols in Hum Gen 51 (1) (2006) 6.10.1–6.10.24.
  • [22] Thorn C, Klein T, Altman R, Pharmgkb: the pharmacogenomics knowledge base, Pharmacogenomics 1015 (2013) 311–320.
  • [23] Niu F, Zhang C, Re C, Shavlik J, Deepdive: Web-scale knowledge-base construction using statistical learning and inference, VLDS 884 (2012) 25–28.
  • [24] Boettiger C, An introduction to docker for reproducible research, with examples from the r environment, ACM SIGOPS Oper. Syst 49 (1) (2015) 71–79.
  • [25] Yamamoto S, Ma X, Role of nod2 in the development of crohn’s disease, Microbes Infect 11 (12) (2009) 912–918.
  • [26] Ruefli-Brasse A, Lee W, Hurst S, Dixit V, Rip2 participates in bcl10 signaling and t-cell receptor-mediated nf-kappab activation, J Biol Chem 279 (2) (2004) 1570–1574.
  • [27] Manna S, Ramesh G, Interleukin-8 induces nuclear transcription factor-kappab through a traf6-dependent pathway, J Biol Chem 280 (8) (2005) 7010–7021.
  • [28] Joo J, Ryu J, Kim C, et al., Dual oxidase 2 is essential for the toll-like receptor 5-mediated inflammatory response in airway mucosa, Antioxid Redox Signal 16 (1) (2012) 57–70.
  • [29] Kridel R, Sehn L, Gascoyne R, Pathogenesis of follicular lymphoma, J Clin Invest 122 (10) (2012) 3424–3431.
  • [30] Pastore A, Jurinovic V, Kridel R, et al., Integration of gene mutations in risk prognostication for patients receiving first-line immunochemotherapy for follicular lymphoma: a retrospective analysis of a prospective clinical trial and validation in a population-based registry, Lancet Oncol 16 (9) (2015) 1111–1122.
  • [31] DeFronzo R, Ferrannini E, Groop L, et al., Type 2 diabetes mellitus, Nat Rev Dis Primers 1 (15019).
  • [32] Katsarou A, Gudbjornsdottir S, Rawshani A, et al., Type 1 diabetes mellitus, Nat Rev Dis Primers 3 (17016).
  • [33] Li W, Moore M, Vasilieva N, et al., Angiotensin-converting enzyme 2 is a functional receptor for the sars coronavirus, Nature 426 (2003) 450–454.
  • [34] Verdecchia P, Cavallini C, Spanevello A, Angeli F, The pivotal link between ace2 deficiency and sars-cov-2 infection, Eu J Int Med 76 (2020) 14–20.
