Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database

Shur-Jen Wang; Stanley J F Laulederkind; G Thomas Hayman; Victoria Petri; Jennifer R Smith; Marek Tutaj; Rajni Nigam; Melinda R Dwinell; Mary Shimoyama

doi:10.1152/physiolgenomics.00046.2016

2016 Jun 10;48(8):589–600. doi: 10.1152/physiolgenomics.00046.2016


PMCID: PMC5005459 PMID: 27287925

Abstract

Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality.

Keywords: cardiovascular disease, animal models of human disease, functional genomics, model organism database

Complex diseases are caused by a combination of multiple factors such as genetic predisposition, diet, and environmental components. Individuals carrying susceptibility genes have a higher risk of developing disease under permissive environmental conditions compared with the general population. Genetic predisposition to complex diseases is likely governed by groups of genes through interacting networks/pathways. Therefore, functional analysis of a gene set associated with the disease, rather than one single gene, would provide a more complete picture of the disease. The disease portals at the Rat Genome Database (RGD, http://rgd.mcw.edu) provide a comprehensive platform for genetic and genomic research in complex diseases. In the disease portals, three types of data objects, namely genes, strains, and quantitative trait loci (QTLs), are annotated with disease-related data and integrated together for easy access. So far RGD has targeted 10 disease areas, with aging and age-related diseases being the newest and cardiovascular diseases being one of the most comprehensive.

Disease curation at RGD is facilitated by using the text-mining tool OntoMate (12). The tool performs an extensive literature search of PubMed publications (http://www.ncbi.nlm.nih.gov/pubmed) on a daily basis and tags articles with gene names, ontology terms, and species names. Curators manually curate targeted diseases from the tagged publications across rat, mouse, and human literature. The inclusion of disease curation from rat and mouse literature enables researchers to find suitable rodent models for human diseases. RGD uses ontologies/controlled vocabularies to systematically organize curated disease data. An ontology is a controlled vocabulary of well-defined hierarchically organized terms with specified relationships between them. For example, in the RGD disease vocabulary (7) “Cardiovascular Diseases (RDO:0005134)” is a parent to “Cardiovascular Abnormalities (RDO:0000746)”, “Heart Diseases (RDO:0004906)” and “Vascular Diseases (RDO:0003177).” These three children are siblings to one another. A “cardiovascular diseases” query will retrieve all data annotated with “cardiovascular diseases” and its child terms. The ontologically organized knowledge facilitates data mining and sharing across data sources.
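The retrieval behavior described above can be illustrated with a minimal Python sketch. The term identifiers and parent-child links are taken from the text (the link from “Vascular Diseases” to “aneurysm (RDO:0000744)” is the one stated in the Discussion). The annotated object names, the CHILDREN table, and both function names are hypothetical and are not meant to represent how RGD implements its queries.

```python
from collections import deque

# Parent -> child links stated in the text (RDO identifiers).
CHILDREN = {
    "RDO:0005134": ["RDO:0000746", "RDO:0004906", "RDO:0003177"],  # Cardiovascular Diseases
    "RDO:0003177": ["RDO:0000744"],  # Vascular Diseases -> aneurysm
}

# Hypothetical annotated objects; the names are placeholders, not RGD data.
ANNOTATIONS = [
    ("object_A", "RDO:0005134"),
    ("object_B", "RDO:0004906"),
    ("object_C", "RDO:0003177"),
    ("object_D", "RDO:0000744"),
]


def term_and_descendants(term_id, children=CHILDREN):
    """Return the term plus every term reachable through child links."""
    seen = {term_id}
    queue = deque([term_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def query(term_id, annotations=ANNOTATIONS):
    """Return objects annotated with the term or any of its descendants."""
    terms = term_and_descendants(term_id)
    return sorted(obj for obj, term in annotations if term in terms)


print(query("RDO:0005134"))  # broad query: the term and all of its descendants
print(query("RDO:0003177"))  # narrower query: vascular diseases and aneurysm only
```

A breadth-first walk is used here only because it is short; any traversal that collects all descendants of the query term would give the same result.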

In addition to serving the research community as a data warehouse, RGD also provides genome tools to analyze and visualize data sets. The comprehensive list of tools can be found at Genome Tools (http://www.rgd.mcw.edu/wg/tool-menu?100). There are genome browsers for rat, mouse, and human and analysis tools for functional analysis of genes. Each tool has an entry point from the Genome Tools page and some also have links from the recently developed tool OLGA (Object List Generator & Analyzer). OLGA integrates links to the Gene Annotator (GA), Variant Visualizer, and Genome Viewer (GViewer) tools in one place to guide users to explore the full functionality of analysis tools available at RGD. In this article, the cardiovascular disease portal is used to showcase how disease data are presented and interpreted at RGD. Finally, the cardiovascular disease genes were analyzed with RGD tools, and the results were corroborated with other enrichment tools.

METHODS

Gene Retrieval

Human cardiovascular disease genes were queried from the RGD database according to the source of annotation. The disease gene files are available for download from the “FTP Download” tab at RGD (http://www.rgd.mcw.edu). There were 1,552 protein coding genes in the cardiovascular disease portal at the time of data query in April 2015. The RGD curation team had contributed manual annotations for 1,294 genes and had imported 368 disease genes from ClinVar (10) and 375 from Online Mendelian Inheritance in Man (OMIM) (1). These three disease gene sets (the RGD manually curated set and the imported ClinVar and OMIM sets), hereafter named RGD Curated, ClinVar, and OMIM, respectively, are listed in Supplementary Table S1.

Analysis Tools

The GA tool was used to analyze cardiovascular disease genes from different data sources. A video tutorial and step-by-step instructions for use of the GA tool can be found on the RGD website (http://www.rgd.mcw.edu/wg/home/rgd_rat_community_videos/gene-annotator-tutorial). The official gene symbols from the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) were used to upload disease genes to the tool. Version 10 of STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) (http://string-db.org/) was used to analyze enrichment of disease genes in KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. The details for the enrichment widget in the tool are described in one of the STRING publications (6). The enrichment tables were downloaded and pathways related to cardiovascular diseases were compared. The Gene Ontology (GO) enrichment patterns of cardiovascular disease genes were studied using the overrepresentation test (13) in PANTHER (Protein Analysis Through Evolutionary Relationships). There are eight biological process (BP) terms, five cellular component (CC) terms, and eight molecular function (MF) terms selected from the top three or four terms from each disease gene set for comparison (see Fig. 8).

Fig. 8. — The cardiovascular disease genes from RGD manual curation, ClinVar, and OMIM disease databases were subjected to KEGG and GO enrichment analyses using STRING (KEGG Pathway, A) and PANTHER-Gene List Analysis (biological process, B; cellular component, C; and molecular function, D). The top overrepresented terms were selected from each gene list and compared with the enrichment P values, shown as “−Log P” in each bar chart.
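As a rough illustration of what an overrepresentation test and the “−Log P” axis in Fig. 8 express, the following sketch computes a one-sided hypergeometric P value and its negative base-10 logarithm. This is a generic textbook formulation assumed here for illustration only; STRING and PANTHER apply their own statistics and multiple-testing corrections, which may differ. The function names are hypothetical and the counts are toy numbers, not data from this study.

```python
from math import comb, log10


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-annotated genes in a
    list of n genes drawn from a background of N genes, K of which carry
    the term."""
    upper = min(n, K)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def minus_log_p(p):
    """Convert a P value to the -log10 scale used for bar charts."""
    return -log10(p)


# Toy example (assumed numbers): background of 10 genes, 3 carry the term,
# and a gene list of 4 contains 2 of them.
p = hypergeom_upper_tail(k=2, n=4, K=3, N=10)
print(p, minus_log_p(p))
```

On this scale a taller bar means a smaller P value, so terms can be compared across gene lists of different sizes.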

RESULTS

Cardiovascular Disease Portal

The disease portals at the RGD (http://rgd.mcw.edu) provide a comprehensive platform for genetic and genomic research in diseases. Multifaceted datasets are integrated into the context of the genome by use of standardized controlled vocabularies/ontologies. For in-depth coverage of specific disease areas, RGD has targeted approaches to manually curate gene-disease associations from the biomedical literature. This project has resulted in 10 specifically targeted disease areas with the cardiovascular disease portal being one of the most comprehensive. The cardiovascular disease data are integrated into one portal so that users can access different data types from tabs on the top of the disease portal page (Fig. 1). The clickable tabs for diseases, phenotypes, biological processes, and pathways allow users to search disease data by ontologies. For example, users can search disease data by selecting a disease category like “vascular diseases” and then narrow down the search by selecting a child term such as “aortic aneurysm, abdominal” under the secondary dropdown menu.

Fig. 1. — Cardiovascular disease portal home page. Clickable tabs at the *top* allow searches using different ontologies available at the Rat Genome Database (RGD). For the disease search example shown here, the disease category “Vascular Diseases” is selected on the *left* dropdown menu and “Aortic Aneurysm, Abdominal” is selected in a secondary dropdown menu for more granular child terms.

Gene Annotations

GO (2) was the first controlled vocabulary implemented at RGD to organize functional knowledge of gene products curated manually from the biomedical literature. GO uses three ontology branches to describe the MF of a gene product, the BP that a gene product is involved in, and the CC where the gene product is localized. The appropriate GO term reflecting functional knowledge extracted from the literature is associated with a gene to make a GO annotation during the manual curation process. In addition to curating rat literature from PubMed, RGD also imports human and mouse GO annotations from the Gene Ontology Consortium (http://geneontology.org/). These imported GO annotations are transferred to rat orthologs and presented in the gene report pages.

Disease Annotations

The disease annotations at RGD are associated with three types of data objects: genes, strains, and QTLs. These disease-object associations curated from the literature are based on association to biomarkers, molecular mechanism, and therapeutic target, as well as genetic association. Each disease annotation has an evidence code to describe what type of experiment supports the association (http://rgd.mcw.edu/wg/help3/genes/rgd-gene-report-pages/evidence-codes-guide). Four experimental evidence codes are used to reflect different types of associations during disease curation. The evidence code IAGP (inferred by association of genotype with phenotype) is used in genetic association studies where genome variations are linked to diseases. The evidence code IED (inferred from experimental data) is used in studies where molecular mechanisms or therapeutic targets are examined. The evidence code IEP (inferred from expression pattern) is used in studies where biomarkers for a disease are examined. The fourth evidence code, IPM (inferred from phenotype manipulation), is assigned to annotations such as the one derived from Apoe knockout mice described under Genes. In addition to the manual disease annotations, RGD has imported gene-disease associations from ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/), OMIM (http://www.omim.org/), and the Genetic Association Database (GAD) (http://geneticassociationdb.nih.gov/). These imported disease annotations are assigned to human genes with the IEA evidence code to indicate that the association is inferred from electronic annotation. Each primary annotation, manual or imported, is propagated to orthologs in the other two species using the ISS (inferred from sequence similarity) evidence code (7). These disease annotations are integrated with other data and available in several places at RGD such as the gene report pages (Fig. 2) and the pathway diagram pages (Fig. 3B).

Fig. 2. — The human APOE gene report page. The *top* of the page provides general information about the human APOE gene including gene description, synonyms (“Also known as”), orthologs, and the “Annotation” section. The “Disease Annotations” section is expanded and toggled to show full annotations.

Fig. 3. — Pathway diagram page of the “angiotensin II signaling pathway.” A: manually curated pathway diagram showing genes, processes, and pathways involved in angiotensin II signaling. Icons for genes and pathways in this diagram provide links to the respective report pages. B: the pathway gene annotation tables list disease and pathway annotations associated with genes in the diagram.

Pathway Annotations

The pathway ontology was developed concurrently with the manual pathway curation project at RGD. It is a structured, controlled vocabulary used to describe related reactions/interactions in biological systems. The complexity of the pathway ontology and pathway annotations at RGD (16) has been expanded by incorporation of data from the Pathway Interaction Database (18), KEGG (3), and the Small Molecule Pathway Database (http://smpdb.ca) (8). The pathway data are accessible through the pathway portal landing page (http://www.rgd.mcw.edu/wg/home/pathway2) or from the “Molecular Pathway Annotations” section of the gene report pages. The pathway data have been integrated into the cardiovascular disease portal where users can retrieve cardiovascular disease genes annotated with pathway terms. RGD has created interactive pathway diagrams to illustrate how genes and their interactions are orchestrated in a pathway. Using blood pressure regulator angiotensin II as an example (Fig. 3A), the angiotensin II signaling pathway is activated through the AT1 or AT2 receptor. These two receptor signaling pathways involve different molecules and interactions and work coordinately to maintain vascular tone. Users can find pathway gene-associated disease and pathway annotations in tables below the diagram (Fig. 3B). To provide broader, system-level views of molecular pathways, related or connected pathways are organized into pathway suites or suite networks. One example is the “Balancing Blood Pressure Regulatory Mechanisms Pathway Suite Network,” which contains three interconnected pathway suites related to blood pressure regulation (http://rgd.mcw.edu/wg/pathway/the-balancing-blood-pressure-regulatory-mechanisms-pathway-suite-network).

Data Object Report Pages

Genes.

The RGD gene catalogue is based on large-scale genomic data imported from the National Center for Biotechnology Information (17). These imported data are organized to display a brief overview about the gene, and in-depth information can be retrieved from links provided on the page (Fig. 2). RGD also provides information on orthologs for rat, mouse, and human. The official gene nomenclature and symbol (19) are used for gene names and symbols, and common names used by research communities are listed as synonyms. Navigation among orthologs within RGD or to the associated gene reports from HGNC and Mouse Genome Informatics (MGI) is accessible from links provided. The ortholog relationship allows for ISS assignment of disease annotations across species. Using the human APOE gene as an example, the human gene is annotated with “Aortic Aneurysm, Abdominal (RDO:0007052)” based on the genomic analysis data provided in the reference RGD:1578483 (PMID:10848855) (Fig. 2, circled). Thus, this annotation is assigned the IAGP evidence code and is propagated to rat and mouse orthologs with ISS as the evidence code. The association of the mouse Apoe gene with “Aortic Aneurysm, Abdominal” was demonstrated in Apoe knockout mice in the reference RGD:6903247 (PMID:0841519) and is propagated to the rat and human orthologs with the ISS evidence code. The RGD identifier for mouse Apoe (RGD:733604) is provided in the “with” column on the human APOE report page and hyperlinks back to the mouse Apoe gene report page (http://www.rgd.mcw.edu/rgdweb/report/gene/main.html?id=733604) where this primary annotation has been assigned the inferred from phenotype manipulation (IPM) evidence code.
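The propagation rule in the APOE example can be sketched in a few lines of Python. The term identifier, evidence codes, and the mouse Apoe identifier RGD:733604 come from the text. The Annotation class, the propagate_iss function, and the ortholog table layout are hypothetical names introduced for illustration, and the rat symbol Apoe is assumed from standard nomenclature rather than stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SPECIES = ("rat", "mouse", "human")


@dataclass(frozen=True)
class Annotation:
    species: str
    gene: str
    term: str
    evidence: str
    with_id: Optional[str] = None  # identifier of the gene the annotation was inferred from


def propagate_iss(primary: Annotation, orthologs: Dict[str, str],
                  source_id: str) -> List[Annotation]:
    """Copy a primary annotation to the orthologs in the other two species,
    replacing the evidence code with ISS and recording the source gene."""
    return [
        Annotation(species, orthologs[species], primary.term, "ISS", source_id)
        for species in SPECIES
        if species != primary.species
    ]


# Ortholog symbols (rat symbol is an assumption, see lead-in).
APOE_ORTHOLOGS = {"rat": "Apoe", "mouse": "Apoe", "human": "APOE"}

# Primary mouse annotation: "Aortic Aneurysm, Abdominal", evidence code IPM.
mouse_primary = Annotation("mouse", "Apoe", "RDO:0007052", "IPM")

for inferred in propagate_iss(mouse_primary, APOE_ORTHOLOGS, "RGD:733604"):
    print(inferred)
```

Keeping the source identifier on each inferred annotation mirrors the “with” column on the report page, so a reader can always trace an ISS annotation back to its primary evidence.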

Strains.

Rats are one of the preferred animal models for studying human diseases. Many rat strains have been selectively bred for resistance/susceptibility to various diseases such as cardiovascular diseases, renal diseases, and obesity (5). To facilitate usage of rat strains in disease research, RGD systematically organizes rat strains with official nomenclature and rat strain identifiers (15). Figure 4A is an example that shows the ontological relationship among SS.LEW congenics. These congenics were generated from matings of hypertension resistant LEW (RS:0000121, RGD:60999) rats with the susceptible SS/Jr (RS:0000817, RGD:10041). From back-crossing of the congenic strain “SS.LEW-(D1Mco36-D1Rat49)/Jr” (RS:0000830, RGD:728159) to the “SS/Jr” (RS:0000817, RGD:10041), one of the offspring strains “SS.LEW-(D1Mco36-D1Mco101)/Mco” (RS:0001534, RGD:2292653), exhibiting decreased systemic arterial systolic blood pressure compared with SS/Jr, was found to carry a blood pressure regulatory region that is smaller than the original region identified in the parent donor strain (reference RGD:2291850, PMID:18324438). The relationship among these rat strains can be visualized with the Rat Strain Ontology. The SS/Jr and the two SS.LEW congenics are included in the cardiovascular disease portal based on their association with hypertension curated from the literature (22). Curated strain data are displayed in the “Annotation” section of strain report pages (Fig. 4B) with links to other resources. One of the tabs “Phenotype Values via Phenominer” leads to the quantitative database Phenominer, where numerical values of physiological measurements from rat strains are available (11, 23).

Fig. 4. — Rat strain ontology browser page (A) and strain report page (B). A: the strain browser page shows the parent (*left*) of the highlighted “SS.LEW-(D1Mco36-D1Mco101)/Mco” rat strain, its siblings (*center*) and its child entries (*right*). The orange icon with a white “A” links to annotations associated with the rat strain. These annotations can also be viewed by clicking on the link “View Strain Report.” B: the strain report page contains a general information section and an “Annotation” section.

QTLs.

A QTL is a polymorphic chromosome region that contains genes/alleles that differentially affect the expression of a continuously distributed phenotypic trait. A wide variety of traits associated with multiple physiological systems are represented by the rat QTLs at RGD (14). As with strains, an official name and symbol are assigned to each QTL according to nomenclature rules. RGD collects and curates all known rat QTLs from direct submissions by rat researchers as well as those published in the literature. A QTL report page (Fig. 5) contains parental information, statistical details such as the logarithm of odds (LOD) score and P value that determined the QTL, chromosome location, and genome markers curated from the literature or provided by the researcher. QTLs in RGD are annotated with the Mammalian Phenotype Ontology (20) and the RDO disease ontology (7). Those QTLs annotated with cardiovascular-related phenotypes or diseases are integrated into the cardiovascular disease portals. Since each QTL is defined experimentally, RGD recently developed three ontologies (21) to capture experimental parameters used to define QTLs. These include the clinical measurement ontology (CMO) defining the measurement, the measurement method ontology (MMO) defining the measuring method, and the experimental condition ontology (XCO) defining the conditions during the experiment. QTLs have been used as tools to narrow down the genomic regions associated with specific diseases. A disease-specific QTL in rats is often obtained by intensive genetic crossings and analysis and is expected to contain one or more genetic elements that contribute to the phenotype observed. These disease-associated QTLs are then used to narrow down genome regions regulating the association to the disease. 
In the SS.LEW example mentioned in the previous section, the congenic “SS.LEW-(D1Mco36-D1Rat49)/Jr” carries a genome region from a LEW chromosome that rescues the hypertension phenotype contributed from SS parents. This blood pressure regulatory region from the LEW chromosome is further divided into two regions, named QTL BP313 and QTL BP314 (RGD:2293140 and RGD:2293142, respectively; data not shown), that have differential effects on blood pressure regulation. Nr2f2 is found to be one of the genes that is differentially expressed in the congenics carrying BP313, and its role in cardiovascular diseases is further confirmed in Nr2f2 mutant rats (9).

Fig. 5. — The quantitative trait locus (QTL) Bp155 report page contains general information about the QTL followed by curated Annotation and Region sections. “Disease Annotations” and “Experimental Data Annotations” sections are expanded to show annotation terms associated with the QTL. The QTL peak and position markers that define Bp155 are shown in the expanded “Position Markers” section.

Genome Analysis Tools

The comprehensive list of tools can be found at Genome Tools (http://www.rgd.mcw.edu/wg/tool-menu?100). The GA tool takes a list of genes or a chromosomal region and retrieves ontology annotations from RGD. The GViewer provides users with a complete genome view of genes and QTLs annotated to an ontology term. These two tools along with the Variant Visualizer are integrated into the OLGA tool so that users can move data sets between tools. The OLGA tool takes a premade gene list or generates gene, strain, or QTL lists from annotations or genome information. From the OLGA tool, users have options to send the gene list to the GA tool for further analysis, to view chromosomal positions of genes using the GViewer tool, or to download the gene sets generated.

Cardiovascular Disease Gene Analysis

Cardiovascular disease genes.

The cardiovascular disease genes were classified into four groups according to the source of disease annotations: ClinVar (368 genes), GAD (30 genes), OMIM (375 genes), and RGD Curated (1,294 genes). Figure 6A shows the distribution of the three major groups: ClinVar, OMIM, and RGD Curated (gene lists are presented in Supplementary Table S1). The ClinVar gene set and OMIM gene set are about the same size and have about a 75% (280 genes) overlap. The RGD Curated set has 1,089 unique genes and 205 genes that overlap with genes from ClinVar and/or OMIM annotations. The disease annotations imported from ClinVar and OMIM are based on genetic association, which makes them comparable to the RGD manual annotations assigned the IAGP evidence code. We determined the distribution of annotations based on evidence codes among the Curated genes to see how many of them overlapped with the imported disease annotations. When RGD Curated gene annotations are broken down into four groups by evidence codes (Fig. 6B) (gene lists are presented in Supplementary Table S2), there are 481 genes with the IAGP evidence code, and a third of those (160 genes) also have imported annotations. This is the highest overlap, in both number and percentage, among the four evidence code categories. The remaining three evidence code groups have similar percentages of genes with imported annotations.

Fig. 6. — A: Venn diagram showing distribution of the cardiovascular disease genes among 3 major sources: ClinVar, OMIM, and RGD Curated. The total gene count is 1,552; the number in each area represents the gene count of that section. B: the RGD Curated genes are grouped according to the evidence code assigned to the disease annotations. The number above each blue bar represents the gene count of the evidence group. The number within parentheses above each orange bar represents the percentage of Curated genes with imported annotations, from ClinVar and/or OMIM. IAGP, inferred by association of genotype with phenotype; IED, inferred from experimental data; IEP, inferred from expression pattern; IPM, inferred from phenotype manipulation.
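The gene counts reported for Fig. 6 can be cross-checked with simple set arithmetic. The sketch below uses only numbers stated in the text; the function names are hypothetical, and the 30 GAD genes are left out because Fig. 6A covers only the three major sources.

```python
def union_size(size_a, size_b, overlap):
    """Size of the union of two sets, given their sizes and their overlap."""
    return size_a + size_b - overlap


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Counts stated in the text.
CLINVAR, OMIM, RGD_CURATED = 368, 375, 1294
CLINVAR_OMIM_OVERLAP = 280
CURATED_ALSO_IMPORTED = 205
IAGP_GENES, IAGP_ALSO_IMPORTED = 481, 160

imported_genes = union_size(CLINVAR, OMIM, CLINVAR_OMIM_OVERLAP)   # ClinVar and/or OMIM
curated_unique = RGD_CURATED - CURATED_ALSO_IMPORTED               # Curated only
portal_total = curated_unique + imported_genes                     # all three sources

print(imported_genes, curated_unique, portal_total)
print(round(percent(CLINVAR_OMIM_OVERLAP, CLINVAR), 1),
      round(percent(CLINVAR_OMIM_OVERLAP, OMIM), 1),
      round(percent(IAGP_ALSO_IMPORTED, IAGP_GENES), 1))
```

The totals reproduce the 1,089 unique Curated genes and the 1,552 genes in the portal, and the overlap percentages agree with the “about 75%” and “a third” figures quoted above.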

GA tool.

Annotations associated with these three gene sets are examined with the GA tool. The annotation distribution analysis in the GA tool shows the percentage of genes associated with various ontologies (Fig. 7A) (diseases and pathways are shown). Users can select a specific category such as “heart diseases” to explore genes associated with heart diseases and its child terms. The comparison heat map visualizes the distribution of genes across two ontologies such as disease and pathway (Fig. 7B). Each of the top-level ontology terms is expandable to its children for more granular analysis. The ontology terms in blue indicate that there are child term annotations included in the comparison, while terms in red indicate that the exact term annotation is compared. The diseases vs. pathway heat map analyses of the cardiovascular disease genes are shown in Fig. 7B. Under cardiovascular diseases, clusters of genes associated with the cardiomyopathy pathway (and its children) and the metabolic syndrome pathway (and its children) were found among the RGD Curated genes; however, among ClinVar and OMIM genes, only one major cluster associated with cardiomyopathy was observed. The ClinVar genes and OMIM genes have fewer genes associated with the metabolic syndrome pathway. The percentage of genes associated with the cardiomyopathy pathway or the metabolic syndrome pathway is shown in Fig. 7C. The ClinVar and OMIM gene sets have higher percentages of genes associated with the cardiomyopathy group and lower percentages of genes associated with the metabolic syndrome pathway compared with the RGD Curated group. It is not surprising that the Total Cardiovascular Portal genes exhibit a similar distribution to the Curated group since it is the largest contributor.

Fig. 7. — Gene Annotation (GA) tool analysis of the cardiovascular disease genes. A: “Annotation Distribution” tables show the percentage of RGD Curated genes associated with diseases and pathways. B: “Comparison Heat Map” of the cardiovascular disease genes from different sources. The distribution of cardiovascular disease genes across diseases and the cardiovascular system disease pathway is shown. The ontology terms in blue indicate that there are child term annotations included in the comparison, while terms in red indicate that the exact term annotation is compared. C: bar charts show the percentage of cardiovascular disease genes associated with “cardiomyopathy pathway” and “metabolic syndrome pathway.” The comparison is based on the ontological structures of disease and pathway annotations; therefore, the child terms of the compared terms are included in the results.
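The counting behind a comparison heat map and the percentage bars can be approximated as follows. This is a simplified sketch, not the GA tool's implementation: gene names are placeholders, the function names are hypothetical, and the expansion of each term to its child terms (which the GA tool performs) is omitted for brevity.

```python
from collections import Counter
from itertools import product


def heat_map_counts(disease_terms, pathway_terms):
    """Count genes for every (disease term, pathway term) pair.
    Both arguments map a gene name to a set of ontology terms."""
    counts = Counter()
    for gene in disease_terms.keys() & pathway_terms.keys():
        for pair in product(disease_terms[gene], pathway_terms[gene]):
            counts[pair] += 1
    return counts


def percent_with_term(gene_set, pathway_terms, term):
    """Percentage of genes in gene_set annotated with the given pathway term."""
    hits = sum(1 for gene in gene_set if term in pathway_terms.get(gene, set()))
    return 100.0 * hits / len(gene_set)


# Placeholder annotations (illustrative only, not RGD data).
DISEASE = {
    "gene_1": {"heart diseases"},
    "gene_2": {"heart diseases", "vascular diseases"},
    "gene_3": {"vascular diseases"},
}
PATHWAY = {
    "gene_1": {"cardiomyopathy pathway"},
    "gene_2": {"cardiomyopathy pathway", "metabolic syndrome pathway"},
}

counts = heat_map_counts(DISEASE, PATHWAY)
for (disease, pathway), n in sorted(counts.items()):
    print(disease, "|", pathway, "|", n)
print(percent_with_term(set(DISEASE), PATHWAY, "cardiomyopathy pathway"))
```

Each cell of the heat map corresponds to one (disease, pathway) pair, and the bar charts in Fig. 7C correspond to the per-term percentage computed over a whole gene set.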

STRING.

According to the heat map analysis in the GA tool, the RGD Curated genes are different from the ClinVar and OMIM genes in terms of their association with both the cardiovascular system disease pathway and the metabolic syndrome pathway. To corroborate this finding, we submitted these three gene sets to the STRING tool for KEGG pathway enrichment analysis. Figure 8A shows that the ClinVar and OMIM gene sets had similar enrichment patterns for the top cardiovascular system disease pathways, highly enriched in “Dilated cardiomyopathy pathway,” “Hypertrophic cardiomyopathy pathway,” and “Arrhythmogenic right ventricular cardiomyopathy pathway.” On the other hand, the RGD Curated genes are enriched both in cardiomyopathy pathway terms and “Type II diabetes mellitus pathway,” which is a child term of “metabolic syndrome pathway.” This result confirms that both “Cardiomyopathy pathway” and “Metabolic syndrome pathway” are overrepresented in the Curated genes (Fig. 8A).

GO enrichment analysis.

In general, ClinVar genes and OMIM genes have similar GO enrichment patterns (Fig. 8, B–D). Two BP terms “mesoderm development” and “anatomical structure morphogenesis,” along with their parent “developmental process,” were highly overrepresented among these two gene sets. For Curated genes these developmental terms were overrepresented according to P values, but other terms such as “cell communication,” “immune system process,” and “biological regulation” were ranked above them with smaller P values. In contrast, “cell death” and “immune system process” are the two least overrepresented BP terms among ClinVar genes and OMIM genes but were highly overrepresented in the Curated genes. For MF terms, “structural molecule activity” and “voltage-gated ion channel activity” were highly overrepresented in ClinVar genes and OMIM genes but not in the Curated genes. For CC terms, the three gene sets showed similar enrichment patterns.

DISCUSSION

The cardiovascular disease portal at RGD provides disease-associated data across rat, mouse, and human. These disease data are manually curated by RGD curators and imported from other disease databases. In this study, we have compared the annotation profiles of disease genes from three different sources: RGD Curated genes, ClinVar genes, and OMIM genes. ClinVar and OMIM are two major archives of human genetic disorders. The cardiovascular disease genes imported from these two sources were comparable in number and have over 70% overlap. Their annotation profiles in pathways and GO are similar but are distinct from those of the Curated genes. According to comparison heat map analysis in the GA tool, the majority of all three gene sets are represented in the “cardiomyopathy pathway.” However, the Curated gene set shows many more genes annotated to “metabolic syndrome pathway” compared with the two imported gene sets. The overrepresentation of “metabolic syndrome pathway” genes in the Curated gene set is confirmed by KEGG enrichment in “type II diabetes mellitus pathway.” The diversity of pathway annotation in the Curated gene set reflects the variety of evidence used for the corresponding disease annotations compared with the imported disease annotations. The main difference appears to be the genes annotated as disease biomarkers (IEP evidence code). The biomarker gene set should include a subset of the genetic association disease genes, as well as many more that are peripherally involved with the disease.

The Curated gene set is also different from the imported disease gene sets in GO enrichment analyses. GO is a rich bioinformatics resource for summarizing functional knowledge of gene products across species. It is interesting to compare how the Curated gene set is different from the imported disease sets in terms of functional knowledge. We used PANTHER enrichment analysis to demonstrate the functional differences between the Curated and imported gene sets. There are more than three times the number of RGD Curated genes compared with imported genes, with the largest group of annotated genes being in the disease biomarker group. That probably explains the increased GO enrichment for cytokines, cell communication, and extracellular region in the Curated gene set compared with the imported gene sets. Those genes that are peripheral to or downstream from the disease-causing genetic lesion are often found initially as biomarkers. The imported gene sets are enriched in GO annotations encompassing development and structural molecules, which relate to the pathogenesis of many of the cardiac diseases caused by genetic variants. Overall, our analysis results show that the manual annotations and automated imported data are complementary to each other in identifying disease genes with distinctive functions. In part, the RGD Curated disease annotations with genetic association evidence confirm the information imported through the disease annotation pipeline. The RGD Curated disease annotations also add depth to the knowledge of genetic association with disease by describing gene variants not found in the sets of imported gene annotations. In addition, the RGD Curated gene set identifies many other genes related to disease based on biomarkers, molecular mechanisms, and therapeutic target data.

RGD systematically organizes biological data and integrates these data by targeted diseases into the disease portals. Standard official nomenclatures are used to name genes, strains, and QTLs, and ontologies/controlled vocabularies are used to organize data so that users can perform comprehensive data mining. To keep up with the rapid growth of disease research, RGD has weekly releases of imported annotations downloaded from source databases. The manual annotations are updated on the same schedule, adding annotations in newly targeted disease areas as well as existing ones. To keep the existing disease portals up to date in the future, RGD plans to implement an alert system to inform curators of related disease publications through the text-mining tool OntoMate.

Using the cardiovascular disease portal as an example, we show how integrated data are presented in the RGD disease portals. Users can approach disease data from several aspects such as diseases, phenotypes, biological processes, pathways, or other options provided in the portal. The ontological relationship established among disease annotations allows users to perform a broad search on vascular diseases or a specific search on aneurysm using just one query term “vascular diseases (RDO:0003177)” or “aneurysm (RDO:0000744),” respectively. Searching for “vascular diseases” will also retrieve data associated with “aneurysm” since “aneurysm” is a child term of “vascular diseases.” The disease vocabulary RDO used here is a collaborative result of several institutions, and the continued development of a complete, structured human disease ontology for the research community is underway (7).
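The parent/child retrieval behavior described above can be illustrated with a small sketch. Only the relationship between “vascular diseases” and “aneurysm” comes from the text; the other terms, the gene names, and the data structures are hypothetical and do not reflect how RGD stores or queries its ontology.

```python
from collections import deque

# Toy parent -> children map. Only "aneurysm" as a child of "vascular diseases"
# is taken from the text; the remaining terms are illustrative.
CHILDREN = {
    "vascular diseases": ["aneurysm", "hypertension"],
    "aneurysm": ["aortic aneurysm"],
}

# Hypothetical gene annotations attached directly to each term.
ANNOTATIONS = {
    "vascular diseases": {"GeneA"},
    "aneurysm": {"GeneB"},
    "aortic aneurysm": {"GeneC"},
    "hypertension": {"GeneD"},
}


def descendants(term):
    """Collect every term below `term` with a breadth-first walk."""
    found, queue = set(), deque(CHILDREN.get(term, []))
    while queue:
        child = queue.popleft()
        if child not in found:
            found.add(child)
            queue.extend(CHILDREN.get(child, []))
    return found


def search(term):
    """Return genes annotated to `term` or to any of its descendant terms."""
    genes = set(ANNOTATIONS.get(term, set()))
    for child in descendants(term):
        genes |= ANNOTATIONS.get(child, set())
    return genes


print(sorted(search("vascular diseases")))  # ['GeneA', 'GeneB', 'GeneC', 'GeneD']
print(sorted(search("aneurysm")))           # ['GeneB', 'GeneC']
```

The broad query returns everything annotated beneath the parent term, while the specific query is limited to the aneurysm branch.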

Complex diseases are the manifestation of altered physiological pathways. Related genes involved in shared pathways might contribute dynamically to development of the disease. Therefore, it is important in disease research to consider the interactions within a group of related genes rather than a single gene. RGD has organized genes in pathways, pathway suites, and pathway suite networks to allow users to look at the bigger picture at the system level or to drill down to the single gene level. Another powerful tool for studying complex diseases is the use of rat models. RGD has curated biological data from rat strains developed for biomedical research. The catalogue includes the extensively used hypertension model SS rats (RGD:69369, RS:0000789) and the mutant strains developed by modern gene-editing tools, e.g., SS/JrHsdMcwi (ZFN) mutants (RS:0002576). The mutant data have been expanding since the advancement of genome modification techniques in rats (4). In the future, the phenotype database PhenoMiner (http://www.rgd.mcw.edu/phenotypes/) will be integrated into the disease portals to provide links between genetic variation and quantitative phenotypes.

RGD started as a data resource to house curated genetic and genomic data for rats and expanded its data coverage to mouse and human to better serve the disease research community. The disease annotations are organized by a controlled vocabulary and annotated to genes, strains, and QTLs. The ontological organization of disease data allows researchers to examine data visually or to use analysis tools on large data sets. In this article, we explain how disease data are organized in the cardiovascular disease portal and on object report pages. We also analyze cardiovascular disease genes grouped by their sources with the GA tool and find that the RGD Curated genes are functionally distinct from the imported genes. We hope to facilitate cardiovascular research by presenting the organized cardiovascular disease data as a useful resource to the research community.

GRANTS

National Heart, Lung, and Blood Institute Grants HL-064541, HL-094271.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

AUTHOR CONTRIBUTIONS

S.-J.W., S.J.F.L., and M.S. conception and design of research; S.-J.W. and S.J.F.L. analyzed data; S.-J.W., S.J.F.L., G.T.H., V.P., and J.R.S. interpreted results of experiments; S.-J.W. and S.J.F.L. prepared figures; S.-J.W. drafted manuscript; S.-J.W., S.J.F.L., G.T.H., V.P., J.R.S., R.N., M.R.D., and M.S. edited and revised manuscript; S.-J.W., S.J.F.L., G.T.H., V.P., J.R.S., M.T., R.N., M.R.D., and M.S. approved final version of manuscript; M.T. performed experiments.

Supplementary Material

Supplementary Table 1: Supplementary_Table_1.xlsx (32.4 KB, xlsx)

Supplementary Table 2: Supplementary_Table_2.xlsx (28.3 KB, xlsx)

Footnotes

The online version of this article contains supplemental material.

REFERENCES

1. Amberger J, Bocchini C, Hamosh A. A new face and new challenges for Online Mendelian Inheritance in Man [OMIM(R)]. Hum Mutat 32: 564–567, 2011.
2. Consortium GO. Gene Ontology Consortium: going forward. Nucleic Acids Res 43: D1049–D1056, 2015.
3. Du J, Yuan Z, Ma Z, Song J, Xie X, Chen Y. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model. Mol Biosyst 10: 2441–2447, 2014.
4. Dwinell MR. Online tools for understanding rat physiology. Brief Bioinform 11: 431–439, 2010.
5. Dwinell MR, Lazar J, Geurts AM. The emerging role for rat models in gene discovery. Mamm Genome 22: 466–475, 2011.
6. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41: D808–D815, 2013.
7. Hayman GT, Laulederkind SJ, Smith JR, Wang SJ, Petri V, Nigam R, Tutaj M, De Pons J, Dwinell MR, Shimoyama M. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database (Oxford) 2016: baw034, 2016.
8. Jewison T, Su Y, Disfany FM, Liang Y, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D, Djoumbou Y, Liu Y, Deng L, Guo AC, Han B, Pon A, Wilson M, Rafatnia S, Liu P, Wishart DS. SMPDB 2.0: big improvements to the Small Molecule Pathway Database. Nucleic Acids Res 42: D478–D484, 2014.
9. Kumarasamy S, Waghulde H, Gopalakrishnan K, Mell B, Morgan E, Joe B. Mutation within the hinge region of the transcription factor Nr2f2 attenuates salt-sensitive hypertension. Nat Commun 6: 6252, 2015.
10. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42: D980–D985, 2014.
11. Laulederkind SJ, Liu W, Smith JR, Hayman GT, Wang SJ, Nigam R, Petri V, Lowry TF, De Pons J, Dwinell MR, Shimoyama M. PhenoMiner: quantitative phenotype curation at the rat genome database. Database (Oxford) 2013: bat015, 2013.
12. Liu W, Laulederkind SJ, Hayman GT, Wang SJ, Nigam R, Smith JR, De Pons J, Dwinell MR, Shimoyama M. OntoMate: a text-mining tool aiding curation at the Rat Genome Database. Database (Oxford) 2015: bau129, 2015.
13. Mi H, Muruganujan A, Thomas PD. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 41: D377–D386, 2013.
14. Nigam R, Laulederkind SJ, Hayman GT, Smith JR, Wang SJ, Lowry TF, Petri V, De Pons J, Tutaj M, Liu W, Jayaraman P, Munzenmaier DH, Worthey EA, Dwinell MR, Shimoyama M, Jacob HJ. Rat Genome Database: a unique resource for rat, human, and mouse quantitative trait locus data. Physiol Genomics 45: 809–816, 2013.
15. Nigam R, Munzenmaier DH, Worthey EA, Dwinell MR, Shimoyama M, Jacob HJ. Rat Strain Ontology: structured controlled vocabulary designed to facilitate access to strain data at RGD. J Biomed Semantics 4: 36, 2013.
16. Petri V, Jayaraman P, Tutaj M, Hayman GT, Smith JR, De Pons J, Laulederkind SJ, Lowry TF, Nigam R, Wang SJ, Shimoyama M, Dwinell MR, Munzenmaier DH, Worthey EA, Jacob HJ. The pathway ontology - updates and applications. J Biomed Semantics 5: 7, 2014.
17. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Krasnov S, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Karsch-Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 40: D13–D25, 2012.
18. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the Pathway Interaction Database. Nucleic Acids Res 37: D674–D679, 2009.
19. Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA. genenames.org: the HGNC resources in 2011. Nucleic Acids Res 39: D514–D519, 2011.
20. Smith CL, Goldsmith CA, Eppig JT. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol 6: R7, 2005.
21. Smith JR, Park CA, Nigam R, Laulederkind SJ, Hayman GT, Wang SJ, Lowry TF, Petri V, Pons JD, Tutaj M, Liu W, Worthey EA, Shimoyama M, Dwinell MR. The clinical measurement, measurement method and experimental condition ontologies: expansion, improvements and new applications. J Biomed Semantics 4: 26, 2013.
22. Toland EJ, Saad Y, Yerga-Woolwine S, Ummel S, Farms P, Ramdath R, Frank BC, Lee NH, Joe B. Closely linked non-additive blood pressure quantitative trait loci. Mamm Genome 19: 209–218, 2008.
23. Wang SJ, Laulederkind SJ, Hayman GT, Petri V, Liu W, Smith JR, Nigam R, Dwinell MR, Shimoyama M. PhenoMiner: a quantitative phenotype database for the laboratory rat, Rattus norvegicus. Application in hypertension and renal disease. Database (Oxford) 2015: bau128, 2015.
