Author manuscript; available in PMC: 2014 May 23.
Published in final edited form as: Chem Biol. 2013 May 23;20(5):629–635. doi: 10.1016/j.chembiol.2013.03.018

Pathway Databases: Making Chemical and Biological Sense of the Genomic Data Flood

Peter D’Eustachio
PMCID: PMC3678733  NIHMSID: NIHMS462970  PMID: 23706629

Summary

Pathway databases are a means to systematically associate proteins with their functions and link them into networks that describe the reaction space of an organism. Here the Reactome Knowledgebase provides a convenient example to illustrate strategies used to assemble such a reaction space based on manually curated experimental data; approaches to semi-automated extension of these manual annotations to infer annotations for a large fraction of a species’ proteins; and the use of networks of functional annotations to infer pathway relationships among variant proteins that have been associated with disease risk through genome-wide surveys and resequencing studies of tumors.

Keywords: biological networks, data mining, ontology, protein-protein interaction


A reductionist but compelling view of the physiology of a cell is that it is fully determined by the properties of its parts: a catalog of its molecules and their functions is sufficient to specify its behavior; feedback loops and regulatory interactions specified in the catalog allow inference of responses to external stimuli at the cellular level. Proteins, the key components of this parts list, can be comprehensively catalogued. Accurate genomic DNA sequences exist for many species including humans. High-throughput tools for identifying mRNAs and proteins, combined with computational tools for identifying open reading frames in a genomic DNA sequence, provide near-complete lists of the proteins a genome can encode (The ENCODE Project Consortium 2012). A variety of technologies provide views of the proteins actually present in specific tissues and their changes as a function of tissue state (Uhlen et al. 2010). DNA resequencing projects are beginning to support inference of comprehensive catalogs of individual variations in protein sequences (e.g., Tennessen et al. 2012; 1000 Genomes Consortium 2012). But these protein lists do not explain an organism’s functions. Explanation requires a wiring diagram that embodies functional information: what components interact with what other ones and with what consequences. Interactions can include transformations of molecules including changes in chemical structure and location, and association of proteins and other molecules to form complexes with novel properties. A pathway database captures these functional relationships among the molecular parts of a cell.

A variety of pathway databases have been developed in the past decade, notably HumanCyc (Romero et al. 2005), KEGG (Kanehisa et al. 2012), NCI-PID (Schaefer et al. 2009), Panther (Mi et al. 2013), and Reactome (Croft et al. 2011). Features of broadly useful pathway databases are emerging. Capturing the diverse functions of a cell in a single consistent representation is critical. For example, a hormone binds a cell-surface receptor, setting off a cascade of phosphorylation events some of which activate or inactivate proteins already in the cell while others activate transcription factors, causing synthesis of proteins not previously present. The changed array of active proteins in turn might switch the cell from synthesizing and storing energy-rich molecules to breaking them down or from a dividing state to a non-dividing one. For this description to capture a cell’s normal behavior or predict its response to a stress like a mutational change in a protein’s function or the presence of a novel small molecule in its environment, the description must be comprehensive and internally consistent. A single vocabulary is needed to describe biochemical transformations of small molecules, binding and transport events, and signal transduction. The description should also account for location. Cells are not homogeneous; the same molecule in different subcellular locations can have entirely different fates.

Organization and Content of a Pathway Database

The problem of representing diverse physical entities and molecular functions in a consistent and detailed way can be illustrated by human “Hippo” signaling and linked processes that regulate the YAP1 and WWTR1 transcription factors (Figure 1). In their unphosphorylated states, cytosolic YAP1 and WWTR1 can be taken up into the nucleus and function as transcriptional co-activators for an array of genes that promote cell proliferation. In their phosphorylated states YAP1 and WWTR1 are instead sequestered in the cytosol. Phosphorylation is mediated by the three-step Hippo kinase cascade, first described in Drosophila (Oh and Irvine 2010). There are two human homologues of each of the Drosophila kinases. Autophosphorylated STK3 (MST2) and STK4 (MST1) (homologues of Drosophila Hippo) catalyze the phosphorylation and activation of LATS1 and LATS2 and of the accessory proteins MOB1A and MOB1B. LATS1 and LATS2 in turn catalyze the phosphorylation of the transcriptional co-activators YAP1 and WWTR1 (TAZ). Several accessory proteins are required for the kinase cascade. STK3 (MST2) and STK4 (MST1) each form a complex with SAV1, and LATS1 and LATS2 form complexes with MOB1A and MOB1B.

Figure 1.


Human Hippo signaling pathway. Kinase cascade components STK, LATS, MOB, and SAV1 interact, leading to the phosphorylation and cytosolic sequestration of the YAP1 and WWTR1 transcription factors. Caspase 3 may activate this process by cleaving STK3, increasing its kinase activity. Cytosolic sequestration of YAP1 and WWTR1 may also be mediated by AMOT and ZO proteins.

Additional processes can modulate YAP1 and WWTR1 activity. Caspase 3 protease cleaves STK3 (MST2) and STK4 (MST1), releasing inhibitory carboxyterminal domains from each and leading to increased kinase activity and YAP1/WWTR1 phosphorylation (Lee et al. 2001). Also, cytosolic AMOT and ZO proteins can bind YAP1 and WWTR1 in their unphosphorylated states to form complexes that are sequestered in the cytosol (Chan et al. 2011; Oka et al. 2010; Remue et al. 2010).

To represent processes like these, a data model is needed that can classify and catalog physical entities (e.g., proteins and other macromolecules, small molecules, complexes of these entities and post-translationally modified forms of them), the transformations they can undergo (e.g., biochemical reaction, association to form a complex, translocation from one cellular compartment to another), and their subcellular locations. It is convenient to provide an overview of these features as they are implemented in the Reactome data model (Vastrik et al. 2007), which builds on earlier work by Kanehisa (2000) and Karp et al. (1999) (Figure 2). A more rigorous data model specification is available in documentation linked to the Reactome web site (www.reactome.org).

Figure 2.


Schematic view of Reactome data model. Relationships of event and physical entity classes to one another and to external reference databases and ontologies are shown.

Formally, the data model is a list of classes and subclasses, notably a physical entity class with protein, small molecule, and complex subclasses, and an event class with reaction and pathway subclasses. Each class has a set of attributes that identify individual class members (instances) and distinguish them from one another.

For the protein class, attributes include the protein’s name, length, covalent modifications if any, and subcellular location. Phosphorylated STK3, for example, is 491 residues in length with a phosphate group attached to the side chain hydroxyl group of threonine residue 180 and is cytosolic. Different instances are created not only for different proteins (STK3 versus caspase 3) but for forms of the same protein that differ in their locations or covalent modifications.

Changes in the amino acid sequence of a protein due to mutation can be represented as a kind of covalent modification by identifying the affected residue in the protein and the replacement amino acid residue (Milacic et al. 2012).

For the complex class, attributes are the proteins and small molecules that form the complex, the numbers of copies of each if known, and a subcellular location. The p-STK3:p-SAV1 complex, for example, consists of two copies of the full-length form of each protein, each covalently modified by phosphorylation (STK3 at residue 180, SAV1 at an unknown residue), located in the cytosol.
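The protein and complex classes described above can be sketched as plain Python dataclasses. This is only an illustration of the idea, not the Reactome schema: the class and field names (`Modification`, `Protein`, `Complex`, `components`) are assumptions made for this example, and values not stated in the text (the length of SAV1, the position of its phosphate group) are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Modification:
    kind: str                 # e.g. "phosphorylation"
    residue: Optional[int]    # None when the modified position is unknown


@dataclass(frozen=True)
class Protein:
    name: str
    length: Optional[int]     # None when the length is not stated in the text
    location: str
    modifications: Tuple[Modification, ...] = ()


@dataclass(frozen=True)
class Complex:
    name: str
    location: str
    # (component, copy number); the copy number is None when unknown
    components: Tuple[Tuple[Protein, Optional[int]], ...]


# Unmodified and phosphorylated STK3 are separate instances of the same protein.
stk3 = Protein("STK3", 491, "cytosol")
p_stk3 = Protein("STK3", 491, "cytosol", (Modification("phosphorylation", 180),))
p_sav1 = Protein("SAV1", None, "cytosol", (Modification("phosphorylation", None),))

# Two copies of each phosphorylated protein, located in the cytosol.
p_stk3_p_sav1 = Complex("p-STK3:p-SAV1", "cytosol", ((p_stk3, 2), (p_sav1, 2)))

print(stk3 == p_stk3)  # False: the modification state distinguishes the instances
```

Making the instances immutable (`frozen=True`) reflects the point that a change in modification or location yields a new instance rather than an update to an existing one.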

A reaction’s attributes are its substrates (inputs), products (outputs), location, and when appropriate a catalyst and one or more regulators. Inputs and outputs are physical entities, i.e., protein, small molecule, and complex instances. Catalysts are annotated by creating instances of a catalyst-activity class with two key attributes, a physical entity and its activity described with a molecular function term from the Gene Ontology (GO) (Ashburner et al. 2000; The Gene Ontology Consortium 2010). For example, the proteolytic cleavage of the p-STK3 component of the p-STK3:p-SAV1 complex is represented by a reaction instance that has the intact complex as input; a new complex consisting of intact p-SAV1 and the amino-terminal portion of p-STK3, plus the free carboxyterminal portion of p-STK3, as outputs; a cytosolic location; and an associated catalyst-activity that has cytosolic caspase-3 as its physical entity and “cysteine-type endopeptidase activity” as its activity. If one wanted to annotate the effect of a drug that accelerated the reaction by increasing caspase-3 activity, one could create an instance of the regulation class with the drug as its regulator attribute and the cleavage reaction as its regulated-event attribute.
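The caspase-3 cleavage example can be written out as a small set of linked records. The sketch below is a simplified illustration under stated assumptions: the class names (`Entity`, `CatalystActivity`, `Reaction`, `Regulation`), the entity labels for the cleavage fragments, and the drug name are hypothetical and are not taken from Reactome.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    location: str


@dataclass(frozen=True)
class CatalystActivity:
    entity: Entity
    go_molecular_function: str   # GO molecular function term name


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: Tuple[Entity, ...]
    outputs: Tuple[Entity, ...]
    location: str
    catalyst: Optional[CatalystActivity] = None


@dataclass(frozen=True)
class Regulation:
    regulator: Entity
    regulated_event: Reaction
    effect: str                  # "positive" or "negative"


intact = Entity("p-STK3:p-SAV1", "cytosol")
# Fragment labels below are illustrative names, not database identifiers.
cleaved_complex = Entity("p-STK3 (amino-terminal portion):p-SAV1", "cytosol")
free_fragment = Entity("p-STK3 (carboxyterminal portion)", "cytosol")

caspase_activity = CatalystActivity(
    Entity("caspase-3", "cytosol"),
    "cysteine-type endopeptidase activity",
)

cleavage = Reaction(
    name="Cleavage of p-STK3 in the p-STK3:p-SAV1 complex",
    inputs=(intact,),
    outputs=(cleaved_complex, free_fragment),
    location="cytosol",
    catalyst=caspase_activity,
)

# A hypothetical drug that accelerates the reaction is a positive regulator.
drug_effect = Regulation(Entity("drug X (hypothetical)", "cytosol"), cleavage, "positive")
```

Keeping the catalyst activity as its own record, rather than a bare protein name on the reaction, is what allows the same physical entity to be paired with different GO molecular function terms in different reactions.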

A transport reaction instance would have the cargo entity in its initial location as input, the same molecular entity but in its final location, hence a different physical-entity instance in the data model, as output, and a “catalyst-activity” instance containing the protein or complex that acted as a transporter or channel plus a GO molecular-function transport activity term. Binding reactions to form complexes are annotated simply with small molecules, protein monomers, or subcomplexes as inputs and the complex as output.
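Transport and binding events follow the same pattern. In the sketch below, which uses hypothetical helper names (`transport`, `bind`), the nuclear uptake of YAP1 is a reaction whose input and output share a name but differ in location, and the association of AMOT with YAP1 is a binding reaction with no catalyst. The transporter and its GO term are placeholders, since the text does not name them.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    location: str


@dataclass(frozen=True)
class Reaction:
    kind: str                              # "transport" or "binding"
    inputs: Tuple[Entity, ...]
    outputs: Tuple[Entity, ...]
    catalyst: Optional[Entity] = None
    go_molecular_function: Optional[str] = None


def transport(cargo, destination, transporter=None, go_term=None):
    """Same molecule, new location: the output is a distinct Entity instance."""
    moved = replace(cargo, location=destination)
    return Reaction("transport", (cargo,), (moved,), transporter, go_term)


def bind(complex_name, location, *parts):
    """Binding: the parts are the inputs and the complex is the only output."""
    return Reaction("binding", tuple(parts), (Entity(complex_name, location),))


yap1 = Entity("YAP1", "cytosol")
uptake = transport(
    yap1,
    "nucleoplasm",
    transporter=Entity("transporter (placeholder)", "nuclear envelope"),
    go_term="transport activity term (placeholder)",
)
sequestration = bind("AMOT:YAP1", "cytosol", Entity("AMOT", "cytosol"), yap1)
```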

A pathway instance has as its attributes a list of the reactions that comprise it and an appropriate descriptive term from the GO biological process ontology.

To annotate the experimental evidence that supports these assertions about the properties of physical entities, events, and catalytic and regulatory activities, all of these classes have two additional attributes, one for literature references and one for a free-text summary.

This annotation strategy can readily capture disease processes, by supporting the annotation of variant physical entities formed by mutated proteins, activities of biological toxins, and regulatory effects of xenobiotic molecules (Milacic et al. 2012).

The annotation of alternative locations, post-translational modifications and genetic variations and conformations of a molecule causes instances of a physical entity to proliferate. The basic chemical information that all forms share is stored in a separate class of reference (canonical) physical entities, explicitly linking all the alternative forms of a single entity. The attributes of a reference entity include its name, reference chemical structure or sequence, and its accession numbers in widely used reference databases: UniProt for proteins (The UniProt Consortium 2012), ChEBI for small molecules (de Matos et al. 2010) and Ensembl for nucleic acids (Flicek et al. 2012) (Figure 2).
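The role of the reference entity can be illustrated by grouping physical-entity instances under the canonical record they share. The sketch below is an assumption-laden illustration: the class names and the `group_by_reference` helper are hypothetical, and the accession string is a placeholder rather than a real UniProt identifier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReferenceEntity:
    name: str
    database: str     # e.g. "UniProt", "ChEBI", "Ensembl"
    accession: str


@dataclass(frozen=True)
class EntityForm:
    reference: ReferenceEntity
    location: str
    modifications: Tuple[str, ...] = ()


def group_by_reference(forms):
    """Collect every location/modification variant under its reference entity."""
    grouped = defaultdict(list)
    for form in forms:
        grouped[form.reference].append(form)
    return dict(grouped)


# "ACCESSION_PLACEHOLDER" stands in for the real database accession.
yap1_ref = ReferenceEntity("YAP1", "UniProt", "ACCESSION_PLACEHOLDER")

forms = [
    EntityForm(yap1_ref, "cytosol"),
    EntityForm(yap1_ref, "cytosol", ("phosphorylation",)),
    EntityForm(yap1_ref, "nucleoplasm"),
]

by_reference = group_by_reference(forms)
print(len(by_reference[yap1_ref]))  # 3 alternative forms of one reference protein
```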

These annotations can be visualized as SBGN-like (Le Novère et al. 2009) diagrams (Figure 3A) in which physical entity icons are connected by reaction edges, and displayed in a web environment that allows access to annotation details of the entities and events (Croft et al. 2011). As of March 2013, 6,977 human proteins have been annotated and associated with 6,198 reactions involving 1,347 small molecules, backed by evidence gathered from 13,588 literature references. The reactions span biological processes that include metabolism, core events of molecular biology, signaling and transport processes, and specialized processes such as blood coagulation and neuronal guidance. In these respects Reactome, like HumanCyc, KEGG, NCI-PID, and PANTHER, provides an extensive but incomplete set of data linking many human proteins to qualitative specifications of their functions.

Figure 3.


Visualization of curated components of pathways and inferred interacting proteins and small molecules. A. A diagram of the Hippo/STK signaling pathway as annotated in Reactome, using SBGN-like iconography to display proteins and complexes (rectangles) and small molecules (ovals), connected by reaction edges that distinguish inputs (plain end), outputs (arrowhead end) and catalysis (open-circle end) as well as reaction type (open box reaction node, chemical transformation; closed circle, binding). B. A portion of the manually annotated pathway is enlarged and overlaid with proteins found in high-throughput surveys (Wu et al. 2010) to interact with Hippo pathway components STK3 (MST2) or STK4 (MST1) or both. C. The manually annotated pathway is overlaid with small molecules from the ChEMBL database (Gaulton et al. 2012) that bind STK3 (MST2) or STK4 (MST1) or both.

Inference from annotated physical entities and reactions to functional descriptions of whole proteomes

This description of a manually curated pathway database raises two major issues. First, how can the annotation process used so far be scaled up and extended? The human genome encodes on the order of 22,000 proteins. How can the remaining two-thirds of them be annotated and how can the full range of functions for each be captured? Can this process be extended to other species as their genome sequences and predicted protein sets are generated? Second, these annotations represent an archive of known relationships among proteins and other cellular components. Can they be used to discover novel relationships and thereby convert an archive of well-characterized proteins and their functions into a tool for exploring possible novel functions and relationships of these and other proteins? Recent work suggests promising approaches to both of these issues and the remainder of this perspective will focus on them. Again, discussion will center on Reactome and again it is important to note that this focus is a matter of convenience – the approaches are generalizable.

Manual annotation is labor-intensive: the 7,000 human proteins and 6,200 reactions involving them now annotated in Reactome are the product of several years’ work, and the rate of manual annotation is a nearly linear function of the number of skilled curators at work and the level of accuracy and detail expected. How can a manually-built framework be used to organize larger bodies of information automatically or semi-automatically, within and between species?

Inferences from Annotated Proteins to Protein Families

Several strategies are widely used to group proteins into families based on sequence similarity (e.g., Altschul et al. 1997; Finn et al. 2011). These strategies can be applied to identify unannotated human proteins likely to share a functional property of interest with an annotated one. Specifically, sequence alignment strategies that combine requirements for a moderate level of overall sequence similarity with perfect conservation (identity and spacing) of key amino acid residues show very high specificity when applied to identify members of enzyme families with conserved substrate specificity. In a preliminary analysis of enzymes mediating reactions of intermediary metabolism catalogued in Reactome, 110 instances have been found in which a reaction said to be catalyzed by a well-studied enzyme can be re-annotated to be catalyzed by any one of a set of up to ten candidate enzymes. Work is underway to validate these observations and to extend the analysis to additional enzymes and potentially to transport proteins. An intriguing observation is that in several cases a single enzyme has been identified as a candidate catalyst of more than one reaction, suggesting that this strategy might be useful not only to bring more proteins into an existing reaction network but also to extend the connectivity of that network.
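The two-part criterion described above (moderate overall similarity plus perfect conservation of the identity and spacing of key residues) can be sketched for a pair of pre-aligned sequences. Everything specific here is an assumption for illustration: the function names, the 40% identity threshold, the toy sequences, and the choice of key columns. It is not the procedure used in the Reactome analysis.

```python
def percent_identity(a, b):
    """Identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def ungapped_positions(seq, columns):
    """Index within the ungapped sequence of each alignment column."""
    return [len(seq[:c].replace("-", "")) for c in columns]


def key_residues_conserved(ref, cand, key_columns):
    """Key residues must be identical and equally spaced in both sequences."""
    same_identity = all(ref[c] == cand[c] != "-" for c in key_columns)
    ref_pos = ungapped_positions(ref, key_columns)
    cand_pos = ungapped_positions(cand, key_columns)
    ref_gaps = [j - i for i, j in zip(ref_pos, ref_pos[1:])]
    cand_gaps = [j - i for i, j in zip(cand_pos, cand_pos[1:])]
    return same_identity and ref_gaps == cand_gaps


def is_candidate(ref, cand, key_columns, min_identity=40.0):
    if len(ref) != len(cand):
        raise ValueError("sequences must come from the same alignment")
    return (percent_identity(ref, cand) >= min_identity
            and key_residues_conserved(ref, cand, key_columns))


# Toy aligned sequences; columns 2, 7 and 10 play the role of key residues.
reference = "MKHDLGACSTH"
key_cols = [2, 7, 10]
print(is_candidate(reference, "MRHDLGSCSTH", key_cols))  # True
print(is_candidate(reference, "MKHDLGASSTH", key_cols))  # False: key residue changed
print(is_candidate(reference, "MKHD-GACSTH", key_cols))  # False: spacing changed
```

The second and third candidates are more than 90% identical to the reference overall yet are rejected, which is the source of the high specificity: overall similarity alone is not accepted as evidence of shared substrate specificity.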

Sequence similarity among proteins is also widely used to predict functions of an uncharacterized protein in a species of interest from the functions found for it in a well-studied system. The complete protein sets predicted from whole genome sequencing for many species support this inference process but also complicate it by forcing investigators to sort out complex sub-family structural relationships within and between species in order to infer likely functional relationships (Koonin 2005). A promising recent approach has been developed by a group of investigators associated with the Gene Ontology project to extend annotation in a semi-automated fashion to a collection of reference genomes (Gaudet et al. 2011). Briefly, a sequence alignment is assembled for each protein family from the predicted complete protein sets for all of the species to be annotated. The alignment is organized to display both similarity relationships among the proteins and evolutionary relationships among the species, and each sequence is tagged with any molecular functions that have been established for it experimentally. Where different family members have different present-day functions in well-studied species, these can be traced back through the phylogenetic relationships in the alignment to identify the likely ancestral point at which gene duplication gave rise to the genes that subsequently diverged to take on these distinct current functions, or a mutation in a single gene caused a change in function that has been passed on to present-day species in that branch of the phylogeny. This in turn allows parsimonious inferences of likely functions for present-day members of the protein family for all of the species.

Extending Annotations with Computationally Inferred Protein-Protein Pairs

Many high-throughput strategies have been used to catalog pairwise protein-protein interactions – co-occurrence in immune precipitates, interaction in yeast two-hybrid screens, co-annotation with the same specific GO term, and others. Wu et al. (2010) have developed a strategy to use such pairwise interactions between already-annotated proteins and previously unknown ones to incorporate the latter into pathway databases. They have assembled many pairwise-interaction data sets and designed a naïve Bayes classifier, trained on validated protein-protein functional interaction data, to assign a quality score to each pairwise interaction. Any one pair of proteins might thus have interaction scores from one or several sources. Whenever a pair involves one protein that has been manually curated and associated with pathways and one that has not been curated, the un-curated protein can be associated with the annotated one (Figure 3B). Evidence for the association can be provided as a tabulation of all of the high-throughput data from which the protein-protein interaction was inferred (Figure 4). In aggregate, these pairings identify approximately 5,000 unannotated human proteins as candidates for involvement with one or more Reactome pathways.
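The scoring and association steps can be illustrated with a toy Bernoulli naïve Bayes classifier written from scratch. This is a minimal sketch, not the classifier of Wu et al. (2010): the three binary evidence features, the training examples, the Laplace smoothing, the 0.5 threshold, and the protein names `PROT_A` and `PROT_B` are all assumptions made for the example.

```python
import math

FEATURES = ("yeast_two_hybrid", "co_precipitation", "shared_go_term")


def train(examples):
    """examples: list of (feature tuple of 0/1, label). Laplace-smoothed estimates."""
    n = len(FEATURES)
    totals = {True: 0, False: 0}
    counts = {True: [0] * n, False: [0] * n}
    for feats, label in examples:
        totals[label] += 1
        for i, f in enumerate(feats):
            counts[label][i] += f
    prior = {c: (totals[c] + 1) / (len(examples) + 2) for c in (True, False)}
    cond = {c: [(counts[c][i] + 1) / (totals[c] + 2) for i in range(n)]
            for c in (True, False)}
    return prior, cond


def score(model, feats):
    """Posterior probability that a pair is a true functional interaction."""
    prior, cond = model
    log_p = {}
    for c in (True, False):
        lp = math.log(prior[c])
        for i, f in enumerate(feats):
            lp += math.log(cond[c][i] if f else 1.0 - cond[c][i])
        log_p[c] = lp
    top = max(log_p.values())
    pos = math.exp(log_p[True] - top)
    return pos / (pos + math.exp(log_p[False] - top))


def associate(scored_pairs, curated, threshold=0.5):
    """Attach each un-curated protein to the curated partners it scores well with."""
    links = {}
    for (a, b), s in scored_pairs.items():
        if s < threshold:
            continue
        for x, y in ((a, b), (b, a)):
            if x in curated and y not in curated:
                links.setdefault(y, set()).add(x)
    return links


training = [
    ((1, 1, 1), True), ((1, 0, 1), True), ((0, 1, 1), True), ((1, 1, 0), True),
    ((0, 0, 0), False), ((0, 0, 1), False), ((1, 0, 0), False), ((0, 0, 0), False),
]
model = train(training)

scored = {
    ("STK3", "PROT_A"): score(model, (1, 1, 1)),   # supported by all three sources
    ("STK3", "PROT_B"): score(model, (0, 0, 0)),   # no supporting evidence
}
links = associate(scored, curated={"STK3"})
print(links)  # {'PROT_A': {'STK3'}}
```

Because each pair keeps its per-source evidence alongside the combined score, the same records can back a tabulated evidence display of the kind mentioned above.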

Figure 4.


Clustering genes with functional information from curated and inferred pathway database content. Seventy-nine genes whose variants are associated with clinically important red blood cell phenotypes in humans (circles) (van der Harst et al. 2012), together with fifty-five genes identified as their functional interactors (diamonds) are grouped into ten modules with an edge-betweenness strategy (Wu et al. 2010). Modules are distinguished by color; manually annotated interactions between gene products are shown as solid lines; inferred pairwise interactions are shown as dotted lines; arrowheads indicate activation or catalysis; T-bars indicate inhibition. The largest module (34 total nodes, 16 linkers) is enlarged, and the inset shows the quality scores for the inferred pairwise functional interaction between TAL1 (RBC variant gene) and HDAC2 (linker).

These interactors can be visualized as coronas decorating the annotated proteins in pathway diagrams. The version of this analysis and display tool implemented in the Reactome web site enables decoration of pathways with interactions from pre-computed data sets or with custom pairwise-interaction data sets generated by users. In the example shown in Figure 3B, pre-computed data are used to highlight groups of proteins that interact with each of STK3 (MST2) and STK4 (MST1) proteins in the Hippo signaling pathway and a smaller number that interact with both. The observation of a set of interactors not known to be involved in the Hippo signaling process must be interpreted cautiously, but at a minimum it represents an experimentally testable hypothesis identifying a set of proteins that might be functional links between Hippo signaling and other processes, and in two cases, between different arms of the Hippo signaling process itself.

The same logic can be applied to display interactions between proteins and small molecules. In Figure 3C the interactions of STK3 (MST2) and STK4 (MST1) proteins with small molecules catalogued in the ChEMBL database (Gaulton et al. 2012) are shown. Again, the display suggests the intriguing speculation that it might be possible to find drugs to modulate two arms of this pathway independently.

Discovering Functional Networks in Somatic Mutant and GWAS Gene Sets

Functionally interacting protein-protein pairs can be used not only to extend manual annotation by suggesting related proteins but also to analyze sets of genes identified in large-scale screens, such as those carried out in connection with the Cancer Genome Atlas project. Working with sets of recurrently mutated genes from several tumor types identified in that project, Wu et al. (2010) have identified, for each data set/tumor type they examined, sets of related genes (e.g., all involved in signaling or in cell cycle regulation) composed of recurrently mutated genes linked via functional protein-protein interactions. This was done without applying prior information about gene function.

Here, this strategy for data analysis has been applied to a set of 121 genes whose polymorphism has been linked to clinically important red blood cell phenotypes in a group of genome-wide association studies (GWASs) (van der Harst et al. 2012). Use of the edge-betweenness strategy described by Wu et al. (2010) to cluster these genes based on shared functional properties yields only a few small clusters. When other genes identified as pairwise interactors of the 121 are added to the collection, however, 79 of the 121 are grouped with 55 linker genes into ten modules, the largest of which contains 18 genes identified in the GWAS and 16 linkers from the pairwise interactor set (Figure 4). The well-studied genes in this cluster play related roles in DNA replication and transcription, suggesting similar roles for other genes brought into the cluster by this analysis. While these processes take place in all nucleated cells, it is a reasonable hypothesis that their precise quantitative regulation is especially critical in any developing tissue and perhaps especially so in a system like erythropoiesis where very large numbers of cells containing large but precisely controlled amounts of alpha- and beta-globin proteins must be generated at a high basal rate and with regulated responses to environmental stresses like blood loss or hypoxia. At a less theoretical level, the analysis also identifies, in the set of linker genes, attractive targets for association studies and large-scale resequencing studies aimed at identifying rare genetic variants that might have strong phenotypic effects.
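Edge-betweenness clustering can be sketched in a few lines: repeatedly remove the edge that lies on the most shortest paths until the network falls apart into modules. The version below is a generic Girvan-Newman-style sketch on a toy network, not the implementation used for Figure 4. The gene and linker labels (`G1` to `G4`, `L1`, `L2`), the edges, and the stopping rule (a target number of modules) are assumptions made for illustration.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Brandes-style shortest-path betweenness for every edge (unweighted graph)."""
    bet = defaultdict(float)
    for source in adj:
        dist = {source: 0}
        sigma = defaultdict(float)      # number of shortest paths from source
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    return bet


def components(adj):
    """Connected components, each returned as a set of node names."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def cluster(edges, n_modules):
    """Remove highest-betweenness edges until n_modules components remain."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    while len(components(adj)) < n_modules:
        bet = edge_betweenness(adj)
        if not bet:
            break                       # no edges left to remove
        a, b = max(bet, key=bet.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# G* stand for genes from the association study, L* for added linker genes.
edges = [("G1", "L1"), ("L1", "G2"), ("G1", "G2"),
         ("G3", "L2"), ("L2", "G4"), ("G3", "G4"),
         ("G2", "G3")]
modules = cluster(edges, n_modules=2)
print(sorted(sorted(m) for m in modules))
# [['G1', 'G2', 'L1'], ['G3', 'G4', 'L2']]
```

In the toy network the single edge between `G2` and `G3` carries every shortest path between the two triangles, so it is removed first and the two modules separate.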

Conclusions

This perspective, using a single pathway database as a convenient model, has attempted to show the value of such data structures generally. These databases are becoming widely useful on-line archives that allow experimental data detailing the molecular functions of proteins to be organized in a useful, consistent format that supports computational mining and querying. Several recently developed tools for analysis of protein-protein pairs defined by high-throughput assays, and for the analysis of functional relationships suggested by the organization of sequence-based protein families, appear likely to support efficient extension of manual annotations to cover a large fraction of the human proteome and perhaps the proteomes of other species. Other tools allow the use of a pathway archive as the starting point for identifying possible functional relationships among genes identified because variants of them are disease risk factors. In all of these ways, pathway databases appear ready to grow beyond their role as repositories for well-established data and to take on a central role in experiments to work out the functional architecture of whole proteomes.

Acknowledgments

The Reactome content and software are the product of a collaboration among groups at the Ontario Institute for Cancer Research (Lincoln Stein, Michael Caudy, Marc Gillespie, Robin Haw, Bruce May, Joel Weiser, Guanming Wu), the European Bioinformatics Institute (Henning Hermjakob, Ewan Birney, David Croft, Antonio Fabregat-Mundo, Phani Garapati, Bijay Jassal, Steven Jupe) and the NYU School of Medicine (Peter D’Eustachio, Lisa Matthews, Veronica Shamovsky). We are grateful to the many scientists who collaborated with us as authors and reviewers to build the content of the Knowledgebase. This work is supported by grants from the National Human Genome Research Institute/NIH (U41 HG003751) and the European Union 6th Framework Programme (LSHG-CT-2005-518254 “ENFIN”).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. 1000 Genomes Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  2. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
  4. Chan SW, Lim CJ, Chong YF, Pobbati AV, Huang C, Hong W. Hippo Pathway-independent restriction of TAZ and YAP by Angiomotin. J Biol Chem. 2011;286:7018–7026. doi: 10.1074/jbc.C110.212621.
  5. Croft D, O’Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011;39:D691–697. doi: 10.1093/nar/gkq1018.
  6. de Matos P, Alcántara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C. Chemical entities of biological interest: an update. Nucleic Acids Res. 2010;38:D249–254. doi: 10.1093/nar/gkp886.
  7. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  8. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.
  9. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2012. Nucleic Acids Res. 2012;40:D84–D90. doi: 10.1093/nar/gkr991.
  10. Gaudet P, Livstone MS, Lewis SE, Thomas PD. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief Bioinform. 2011;12:449–462. doi: 10.1093/bib/bbr042.
  11. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40:D1100–D1107. doi: 10.1093/nar/gkr777.
  12. The Gene Ontology Consortium. The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 2010;38:D331–D335. doi: 10.1093/nar/gkp1018.
  13. Kanehisa M. Post-Genome Informatics. New York: Oxford University Press; 2000.
  14. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Res. 2012;40:D109–D114. doi: 10.1093/nar/gkr988.
  15. Karp PD, Krummenacker M, Paley S, Wagg J. Integrated pathway–genome databases and their role in drug discovery. Trends in Biotechnology. 1999;17:275–281. doi: 10.1016/s0167-7799(99)01316-5.
  16. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005;39:309–338. doi: 10.1146/annurev.genet.39.073003.114725.
  17. Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, et al. The Systems Biology Graphical Notation. Nat Biotechnol. 2009;27:735–741. doi: 10.1038/nbt.1558.
  18. Lee KK, Ohyama T, Yajima N, Tsubuki S, Yonehara S. MST, a physiological caspase substrate, highly sensitizes apoptosis both upstream and downstream of caspase activation. J Biol Chem. 2001;276:19276–19285. doi: 10.1074/jbc.M005109200.
  19. Mi H, Muruganujan A, Thomas PD. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 2013;41:D377–386. doi: 10.1093/nar/gks1118.
  20. Milacic M, Haw R, Rothfels K, Wu G, Croft D, Hermjakob H, D’Eustachio P, Stein L. Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome. Cancers. 2012;4:1180–1211. doi: 10.3390/cancers4041180.
  21. Oh H, Irvine KD. Yorkie: the final destination of Hippo signaling. Trends Cell Biol. 2010;20:410–417. doi: 10.1016/j.tcb.2010.04.005.
  22. Oka T, Remue E, Meerschaert K, Vanloo B, Boucherie C, Gfeller D, Bader GD, Sidhu SS, Vandekerckhove J, Gettemans J, Sudol M. Functional complexes between YAP2 and ZO-2 are PDZ domain-dependent, and regulate YAP2 nuclear localization and signalling. Biochem J. 2010;432:461–472. doi: 10.1042/BJ20100870.
  23. Remue E, Meerschaert K, Oka T, Boucherie C, Vandekerckhove J, Sudol M, Gettemans J. TAZ interacts with zonula occludens-1 and -2 proteins in a PDZ-1 dependent manner. FEBS Lett. 2010;584:4175–4180. doi: 10.1016/j.febslet.2010.09.020.
  24. Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2005;6:R2. doi: 10.1186/gb-2004-6-1-r2.
  25. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: The Pathway Interaction Database. Nucleic Acids Res. 2009;37:D674–679. doi: 10.1093/nar/gkn653.
  26. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69. doi: 10.1126/science.1219240.
  27. Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus H, Björling L, Ponten F. Towards a knowledge-based Human Protein Atlas. Nature Biotechnol. 2010;28:1248–1250. doi: 10.1038/nbt1210-1248.
  28. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012;40:D71–D75. doi: 10.1093/nar/gkr981.
  29. van der Harst P, Zhang W, Mateo Leach I, Rendon A, Verweij N, Sehmi J, Paul DS, Elling U, Allayee H, Li X, et al. Seventy-five genetic loci influencing the human red blood cell. Nature. 2012;492:369–375. doi: 10.1038/nature11677.
  30. Vastrik I, D’Eustachio P, Schmidt E, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, Matthews L, Wu G, Birney E, Stein L. Reactome: a knowledgebase of biological pathways and processes. Genome Biology. 2007;8:R39. doi: 10.1186/gb-2007-8-3-r39.
  31. Wu G, Feng X, Stein L. A human functional protein interaction network and its application to cancer data analysis. Genome Biology. 2010;11:R53. doi: 10.1186/gb-2010-11-5-r53.
