Author manuscript; available in PMC: 2006 Apr 28.
Published in final edited form as: Bioinformatics. 2005 Jun 16;21(16):3454–3455. doi: 10.1093/bioinformatics/bti546

Querying and Computing with BioCyc Databases

Markus Krummenacker a, Suzanne Paley a, Lukas Mueller b, Thomas Yan c, Peter D Karp a,*
PMCID: PMC1450015  NIHMSID: NIHMS4518  PMID: 15961440

Abstract

Summary

We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database.

Availability

The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml

1 INTRODUCTION

BioCyc (see http://BioCyc.org/) is a collection of 161 Pathway/Genome DataBases (PGDBs) that represent cellular networks and genome information in a structured manner, to allow powerful computational analysis and manipulation of the data. The highly curated Tier 1 PGDBs at the core of BioCyc are the EcoCyc and MetaCyc DBs (Karp et al., 2002c,b). They contain many experimentally elucidated metabolic pathways from E. coli and other organisms. BioCyc is viewed and edited through Pathway Tools (Karp et al., 2002a), a software environment we have developed to query, display, and edit information about each pathway and its component reactions, compounds, enzymes, protein complexes, genes, operons, and regulation at the substrate and transcriptional level. Additionally, the data objects support literature references, evidence codes, and links to external databases. The BioCyc schema attempts to faithfully capture biological concepts and the cross-links among widely differing types of data. The Tier 2 and Tier 3 PGDBs were computationally predicted by Pathway Tools. Tier 2 has undergone moderate curation, whereas the 139 DBs in Tier 3 have undergone no curation (note also that Tier 3 PGDBs are not yet available for programmatic access, but we expect they will be soon).

This article describes multiple methods that are exposed for querying BioCyc DBs programmatically. The same access mechanisms are available for the many PGDBs now being created by Pathway Tools users outside SRI, such as by TAIR for Arabidopsis thaliana (Mueller et al., 2003), and by SGD for Saccharomyces cerevisiae. These query methods will simplify the investigation of global questions about cellular networks.

2 SCHEMA AND DATA FILES

BioCyc uses an object-oriented database called a Frame Representation System (FRS), the schema for which has been described previously (Karp, 2000); see also Appendix A of (Paley et al., 2005). In short, every biological object (such as a compound or gene) is stored in a frame bearing a unique ID. A frame has slots, in which attributes and connections to other frames can be stored as values. Slots can store single or multiple values, and individual values can be annotated with comments or literature references. The frames are organized in a class hierarchy.
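As a rough illustration of the frame model described above, the following Python sketch stores frames by unique ID, lets a slot hold several annotated values, and walks a class hierarchy. It is a toy model only, not the Pathway Tools implementation: the names `Frame`, `SlotValue`, and `KnowledgeBase`, and the frame IDs and slot names used in the example data, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SlotValue:
    """One value stored in a slot, with optional annotations."""
    value: object
    comment: str = ""
    citations: list = field(default_factory=list)


@dataclass
class Frame:
    """A biological object (or a class of objects) with a unique ID."""
    frame_id: str
    parents: list = field(default_factory=list)  # IDs of parent classes
    is_class: bool = False
    slots: dict = field(default_factory=dict)    # slot name -> [SlotValue]

    def add_value(self, slot, value, comment="", citations=()):
        # A slot may hold one value or many; values are kept in order.
        self.slots.setdefault(slot, []).append(
            SlotValue(value, comment, list(citations)))

    def values(self, slot):
        return [sv.value for sv in self.slots.get(slot, [])]


class KnowledgeBase:
    """Toy frame store keyed by unique frame ID."""

    def __init__(self):
        self.frames = {}

    def add(self, frame):
        if frame.frame_id in self.frames:
            raise ValueError(f"duplicate frame ID: {frame.frame_id}")
        self.frames[frame.frame_id] = frame
        return frame

    def is_a(self, frame_id, class_id):
        """True if frame_id is class_id or descends from it."""
        if frame_id == class_id:
            return True
        return any(self.is_a(parent, class_id)
                   for parent in self.frames[frame_id].parents)

    def instances_of(self, class_id):
        """Sorted IDs of all non-class frames at or below class_id."""
        return sorted(fid for fid, frame in self.frames.items()
                      if not frame.is_class and self.is_a(fid, class_id))


# Illustrative data only; IDs and slot names are hypothetical.
kb = KnowledgeBase()
kb.add(Frame("Compounds", is_class=True))
kb.add(Frame("Nucleotides", parents=["Compounds"], is_class=True))
kb.add(Frame("Genes", is_class=True))
atp = kb.add(Frame("ATP", parents=["Nucleotides"]))
atp.add_value("SYNONYMS", "adenosine triphosphate", comment="toy annotation")
atp.add_value("SYNONYMS", "adenosine 5'-triphosphate")
gene = kb.add(Frame("GENE-1", parents=["Genes"]))
gene.add_value("PRODUCT", "PROTEIN-1")
```

In this sketch a connection to another frame is simply a slot value holding that frame's ID, which mirrors the idea that slots carry both attributes and links.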

Pathway Tools can export BioCyc PGDBs in several formats: (a) column-delimited and attribute-value formats, which are described in detail online (footnote 1) and are attractive for import into spreadsheets or relational DBs, or for parsing by Perl scripts; (b) BioPAX format (footnote 2), an OWL RDF/XML-based format for exchange of pathway data; and (c) SBML format (footnote 3), an XML-based format for capturing models of biochemical reaction networks.
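To make the parsing use case concrete, here is a minimal Python sketch of a reader for an attribute-value file. The layout it assumes (one `ATTRIBUTE - value` pair per line, records terminated by `//`, comment lines starting with `#`, and continuation lines starting with `/`) is an assumption for this example; the online format description is the authoritative source, and the sample data is invented.

```python
def parse_attribute_value(text):
    """Parse attribute-value text into a list of records.

    Each record is a dict mapping an attribute name to a list of values,
    because an attribute may be repeated within one record.
    """
    records = []
    current = {}
    last_attr = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip() == "//":
            # End of record (assumed terminator).
            if current:
                records.append(current)
            current, last_attr = {}, None
            continue
        if line.startswith("/") and last_attr is not None:
            # Assumed continuation of the previous value.
            current[last_attr][-1] += "\n" + line[1:]
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:
            raise ValueError(f"unrecognized line: {line!r}")
        current.setdefault(attr, []).append(value)
        last_attr = attr
    if current:
        records.append(current)  # tolerate a missing final terminator
    return records


# Invented sample data, not copied from a real PGDB export.
SAMPLE = """\
# toy data
UNIQUE-ID - ATP
TYPES - Nucleotides
SYNONYMS - adenosine triphosphate
SYNONYMS - adenosine 5'-triphosphate
//
UNIQUE-ID - WATER
TYPES - Compounds
COMMENT - A first line
/and a continuation line
//
"""

records = parse_attribute_value(SAMPLE)
for record in records:
    print(record["UNIQUE-ID"][0], sorted(record))
```

Keeping every attribute as a list avoids special-casing multi-valued slots, at the cost of indexing `[0]` for attributes that are known to be single-valued.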

3 PROGRAMMATIC QUERYING

APIs in three languages provide direct, programmatic access to BioCyc DBs within Pathway Tools. The shared APIs are based upon the Generic Frame Protocol (GFP). The most commonly used GFP functions have been summarized (footnote 4), and detailed documentation of GFP is available (footnote 5). Additional useful functions (footnote 6) retrieve complex relationships in PGDBs. SQL querying is possible through the BioWarehouse.

Due to space limitations, only a simple example can be given below, which is transliterated to three languages: LISP, Perl, and SQL. The example query finds all enzymes for which ATP is an inhibitor.

3.1 LISP

Common LISP is the native programming language of Pathway Tools and thus provides the richest environment for queries. The API consists of the commonly used GFP functions plus the additional useful relations, as referred to above. Many LISP query examples are available (footnote 7).

(defun atp-inhibits ()
    ;; We check every instance of the class
    (loop for x in (get-class-all-instances
                    '|Enzymatic-Reactions|)
        ;; We test for whether the INHIBITORS-ALL
        ;; slot contains the compound frame ATP
        when (member-slot-value-p
              x 'INHIBITORS-ALL 'ATP)
        ;; Whenever the test is positive, we collect
        ;; the value of the slot ENZYME. The
        ;; collected values are returned as a list,
        ;; once the loop terminates.
        collect (get-slot-value x 'ENZYME) )
  )
;;; invoking the query:
(select-organism :org-id 'ECOLI)
(atp-inhibits)

3.2 PerlCyc

PerlCyc (footnote 8) is a Perl API that allows Perl programmers to query and to update data within a running Pathway Tools server. The communication between Pathway Tools and Perl occurs through a UNIX socket, and so both programs need to execute on the same machine.

use perlcyc;
my $cyc = perlcyc->new("ECOLI");
my @enzrxns = $cyc->get_class_all_instances(
                            "|Enzymatic-Reactions|");
## We check every instance of the class
foreach my $er (@enzrxns) {
    ## We test for whether the INHIBITORS-ALL
    ## slot contains the compound frame ATP
    my $bool = $cyc->member_slot_value_p($er,
                            "Inhibitors-All", "Atp");
    if ($bool) {
        ## Whenever the test is positive, we collect
        ## the value of the slot ENZYME. The results
        ## are printed in the terminal.
        my $enz = $cyc->get_slot_value($er, "Enzyme");
        print STDOUT "$enz\n";
    }
}

3.3 JavaCyc

JavaCyc (footnote 9) is a Java analog of PerlCyc. JavaCyc also communicates with Pathway Tools through a UNIX socket. The example query is available online (footnote 10).

3.4 SQL Access via BioWarehouse

BioWarehouse is a DB integration project (footnote 11) that allows multiple DBs including BioCyc, SWISS-PROT, Genbank, NCBI Taxonomy, and KEGG to be loaded within a relational DBMS server. BioWarehouse supports SQL queries to BioCyc DBs, and it allows cross-DB queries and validations to be performed. A detailed description of the BioWarehouse schema is beyond the scope of this Application Note.

select distinct DBID.xid
from DBID, Protein, EnzymaticReaction,
   EnzReactionInhibitorActivator, Chemical, DataSet
where DataSet.name = 'EcoCyc'
and DataSet.wid = EnzymaticReaction.datasetwid
and EnzymaticReaction.proteinwid = Protein.wid
and EnzymaticReaction.wid =
    EnzReactionInhibitorActivator.enzymaticreactionwid
and EnzReactionInhibitorActivator.compoundwid = Chemical.wid
and EnzReactionInhibitorActivator.inhibitoractivate = 'I'
and Chemical.name = 'ATP'
and DBID.otherwid = Protein.wid
ACKNOWLEDGMENT

We thank Jeremy Zucker for the SBML exporter and Thomas J. Lee for his SQL example.

Footnotes

REFERENCES

  1. Karp P, Paley S, Romero P. The Pathway Tools Software. Bioinformatics. 2002a;18:S225–S232. doi: 10.1093/bioinformatics/18.suppl_1.s225.
  2. Karp P, Riley M, Paley S, Pellegrini-Toole A. The MetaCyc database. Nuc. Acids Res. 2002b;30(1):59–61. doi: 10.1093/nar/30.1.59.
  3. Karp P, Riley M, Saier M, Paulsen I, Paley S, Pellegrini-Toole A. The EcoCyc database. Nuc. Acids Res. 2002c;30(1):56–58. doi: 10.1093/nar/30.1.56.
  4. Karp PD. An ontology for biological function based on molecular interactions. Bioinformatics. 2000;16(3):269–285. doi: 10.1093/bioinformatics/16.3.269.
  5. Mueller L, Zhang P, Rhee S. AraCyc, a biochemical pathway database for Arabidopsis. Plant Physiology. 2003;132:453–460. doi: 10.1104/pp.102.017236.
  6. Paley S, Krummenacker M, Pick J, Green M, Karp P. Pathway Tools User's Guide version 9.0. 2005. Available from SRI International.
