Author manuscript; available in PMC: 2024 Jun 18.
Published in final edited form as: Methods Mol Biol. 2013;1021:273–283. doi: 10.1007/978-1-62703-450-0_14

Building Models Using Reactome Pathways as Templates

David Croft
PMCID: PMC11184635  NIHMSID: NIHMS746384  PMID: 23715990

Abstract

The first steps of building a new model can be very time-consuming, involving consulting many research papers and then assembling a plausible network of reactions. This chapter discusses tools for speeding up this process. Reactome is a database with extensive coverage of pathways in Homo sapiens and numerous reference species. It offers researchers who wish to create new models from scratch various tools for extracting the relevant reactions, complete with layout information. Two use cases are described in which a modeller provides certain essential pieces of information and Reactome automatically constructs the basic models and exports them in SBML format.

Keywords: Reactome, Pathway, Reaction, Model, SBML, Data

1. Introduction

In order to construct a new model, systems biologists often start out by consulting the literature in the field of interest and pulling relevant reactions out of the papers they have read. These reactions then need to be spliced together and laid out, probably using a tool such as CellDesigner [5], which can then export the model as an SBML file or run simulations of the model. The user has to live with the uncertainty of whether they have found all of the relevant literature and whether it is reliable or not. Copying over the reactions by hand to a pathway editing tool is error-prone and time-consuming. This chapter will show how Reactome tools can be used to partially automate this process and increase its reliability.

Reactome is a curated, peer-reviewed pathway database. It focuses on pathways in Homo sapiens but also infers pathways for 20 other reference species, using gene orthologies. Curators record pathways, reactions, proteins, small molecules and subcellular compartments, as well as literature references backing up all of the reactions present in a pathway. Comprehensive literature searches are made for each pathway, and only the literature deemed most reliable is actually used. A sophisticated data model allows all of this information to be stored in a computationally accessible and searchable form. In addition, diagrams are hand-drawn for all pathways, and the layout of these diagrams is stored as coordinates for all reactions and participating molecules.

The scope of Reactome is very broad, encompassing the traditional metabolic pathways but also extending to a wide range of signalling pathways, the cell cycle, apoptosis and mechanisms of bacterial and viral infection, amongst other things. At the time of writing, Reactome covered 187,219 proteins, taking part in 52,818 distinct reactions, of which 5,568 were from H. sapiens.

Reactome has tools that can accept as input lists of molecule identifiers, e.g., UniProt IDs, and then determine which reactions or pathways these molecules take part in. From there, it is possible to export entire pathways, or sets of individual reactions, as models in SBML files. If required, layout data can also be incorporated into these models, based on the hand-drawn diagrams produced by Reactome curators.

SBML files can be imported by a wide range of simulation and model editing tools, where they can then be further enriched with extra information, such as reaction kinetics.

So the general concept of model generation using Reactome works as follows:

  1. The user puts together a list of proteins, genes and possibly small molecules that they expect to see in their model.

  2. The pathways or reactions in which these entities are involved are determined using Reactome tools.

  3. An export to SBML creates the model.

  4. The SBML is imported into an external tool for further enrichment and modelling.

2. Materials

2.1. General

In order to obtain the best performance from Reactome, a computer with a processor speed of at least 1.5 GHz and memory of at least 2 GB is recommended. Reactome will work with Windows, MacOS and Linux. Reactome is a browser-based tool, and a broadband connection will be needed. It is known to work in Internet Explorer 7 and 8, Chrome, Firefox, Safari and Opera (see Note 1). Firefox is the browser that is recommended. To start a Reactome session, the URL http://www.reactome.org should be entered into the browser. This will open the Reactome home page.

In the following sections, a number of Reactome tools relevant to model construction will be described in detail.

2.2. Analyze Expression Data

From the Reactome home page, the button on the left-hand side, labelled Analyze Expression Data, should be clicked (see Note 2). This tool can also be used to analyze simple lists of identifiers, and it will be used in this way in the Methods section.

Identifiers should be stored in a simple text file, with each identifier separated by a newline (see Note 3). The following identifier types are known to Reactome:

  • Reactome

  • KEGG COMPOUND [6]

  • PubChem Substance [7]

  • ChEBI [8]

  • GO [9]

  • UniProt [10]

  • RefSeq [11]

  • Ensembl [12]

  • Affymetrix [13]

  • NCBI gene [14]

  • IPI [15]

  • Illumina [16]

  • OMIM [17]

  • EC [18]

  • MGI [19]

  • PDB [20]

  • EMBL [21]

  • miRBase [22]

Identifiers will be automatically recognized by Reactome in most cases. If one has purely numerical identifiers, they will by default be assumed to be NCBI gene, but the user will be provided with the opportunity to select the identifier type after the analysis is complete.
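
The file format described above is simple enough to prepare with a few lines of script. The following is a minimal sketch in Python, not part of Reactome itself; the function names are hypothetical, and the numeric identifier 7157 is used purely as an illustration of a digits-only entry. It writes one identifier per line and flags the purely numerical ones, since those are the ones whose type may need to be set by hand after the analysis.

```python
def format_identifier_list(identifiers):
    """Return newline-separated text with blanks and duplicates removed.

    Hypothetical helper: order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for raw in identifiers:
        ident = raw.strip()
        if ident and ident not in seen:
            seen.add(ident)
            cleaned.append(ident)
    return "\n".join(cleaned) + "\n"


def purely_numeric(identifiers):
    """Return the identifiers made up of digits only.

    According to the text, Reactome assumes these are NCBI gene
    identifiers unless the user selects another type afterwards.
    """
    return [ident for ident in identifiers if ident.strip().isdigit()]


# P02743 is the UniProt example used in this chapter; 7157 is illustrative.
example_ids = ["P02743", " P02743 ", "", "7157"]
print(format_identifier_list(example_ids), end="")
print("Check identifier type for:", purely_numeric(example_ids))
```

The returned text can be saved to a file for upload or pasted directly into the tool.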

Once an identifier list has been constructed, it can be either copied and pasted or uploaded as a file, as indicated in Fig. 1.

Fig. 1.

Launch page for expression analysis, with functionality for uploading identifier lists

Clicking the Analyze button will, after a short delay, produce a list of the pathways known to Reactome and, for each one, an indication of the number of matching proteins from the supplied identifier list, as seen in Fig. 2 (see also Note 4).

Fig. 2.

Results page for expression analysis, showing the overlap with user data on a per-pathway basis

2.3. BioMart

BioMart [23, 24] is a querying infrastructure that allows very general queries to be specified and delivers the answers in tabular format. From the Reactome home page, it can be accessed by mousing over the Tools menu and clicking on the BioMart: query link [25] (see Fig. 3).

Fig. 3.

BioMart start page

To start using it, a database and a dataset must be selected. Clicking on the dropdown labelled CHOOSE DATABASE will show the available databases. Once a database has been chosen, a new dropdown will appear, labelled CHOOSE DATASET. Selecting one of these datasets then causes the Filters and Attributes to appear on the left-hand side of the page (see Note 5).

Clicking on Attributes reveals the full range of attributes available for the given dataset. Clicking in any of the checkboxes causes the associated attribute to be added as a column to the output table. Clicking once on an already checked attribute will cause it to be deselected.

Clicking on Filters shows the filters relevant to the current dataset; these allow the user to put constraints on the results of the query (see Note 6). Only the first of these filters is significant for this chapter. This contains a text area, which can be used to copy and paste a newline-separated list of identifiers, plus a dropdown that allows the type of the identifier to be specified. The following identifier types are recognized: ChEBI compound, KEGG gene, NCBI gene, UniProt and Ensembl gene. If gene identifiers are used, Reactome will find the corresponding translated proteins and use those. This filter is not able to automatically determine the identifier type, so it must be explicitly specified.

Clicking the Results button causes the query to be performed. Results are returned in tabular form. By default, only the first ten rows are shown. To get all of the results, it is recommended to click the Go button in the Export all results to section. This will allow the results to be deposited into a file in TSV (tab-separated value) format.
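
An exported TSV file can be inspected programmatically before it is used further. The sketch below uses only the Python standard library; the column names and row values are invented for illustration and are not taken from an actual BioMart export.

```python
import csv
import io


def read_tsv_rows(handle):
    """Read tab-separated text whose first line holds the column names.

    Returns a list of dictionaries, one per data row, keyed by column name.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def column_values(rows, column):
    """Collect the values of one named column, skipping empty cells."""
    return [row[column] for row in rows if row.get(column)]


# Hypothetical export content: both the headers and the values are made up.
example_export = (
    "Reaction ID\tReaction name\n"
    "111\tHypothetical reaction A\n"
    "222\tHypothetical reaction B\n"
)

rows = read_tsv_rows(io.StringIO(example_export))
print(column_values(rows, "Reaction ID"))
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`.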

2.4. SBML Exporter: URL Version

The URL version of the exporter requires that a URL be typed into the browser. The URL stem looks like this: http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval?

To this can be added parameters that will determine the content of the SBML produced. These should be separated by & symbols. Parameters are specified as a parameter name, followed by an equals symbol, followed by a comma-separated list of values. The two parameters that are relevant to this chapter are LAYOUT and ID, which are used to specify the type of layout to use and pathway IDs, respectively. For example, if one wished to build a model based on the two pathways 109607 (Extrinsic Pathway for Apoptosis) and 169911 (Regulation of Apoptosis), with layout information embedded as SBGN into the SBML, then the following should be supplied to the browser: http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval?LAYOUT=SBGN&ID=109607,169911
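
When several pathway IDs are involved, it can be less error-prone to assemble this URL in a script than to type it. The following Python sketch builds the URL from the stem and the two parameters described above; the function name is hypothetical, and the sketch only constructs the string without checking that the service responds.

```python
from urllib.parse import urlencode

# URL stem as given in the text.
SBML_STEM = "http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval"


def sbml_export_url(pathway_ids, layout=None):
    """Build an SBML exporter URL from pathway IDs and an optional layout.

    LAYOUT and ID are the parameter names described in the text; values
    for ID are joined with commas, and parameters are separated by '&'.
    """
    if not pathway_ids:
        raise ValueError("at least one pathway ID is required")
    params = []
    if layout:
        params.append(("LAYOUT", layout))
    params.append(("ID", ",".join(str(pid) for pid in pathway_ids)))
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return SBML_STEM + "?" + urlencode(params, safe=",")


url = sbml_export_url([109607, 169911], layout="SBGN")
print(url)
```

With the two apoptosis pathways from the example, this reproduces the URL shown above.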

2.5. Interactive SBML Exporter

The interactive SBML generator is used to turn a list of Reactome reaction IDs into a model in SBML format. To use this tool, the following URL should be entered into the browser: http://www.reactome.org/ReactomeGWT/entrypoint.html#SBMLRetrievalPage

The layout of this page is illustrated in Figs. 4 and 5. The tool needs to be supplied with a list of newline-separated Reactome reaction identifiers (see Note 7).

Fig. 4.

Top of interactive SBML generator page, showing data entry panel

Fig. 5.

Bottom of interactive SBML generator page, showing parameter setting options

The panel below the data entry area provides extensive possibilities for customizing the exported SBML. It is possible to set the desired SBML level and version, choose the format of the layout included with the SBML (if any), and filter according to various criteria.

Once the user is satisfied with the data format and the selected customizations, the Generate SBML button should be clicked to perform the export.
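
After an export, it can be useful to confirm which SBML level and version were actually produced and how many species and reactions the model contains. The sketch below does this with the Python standard library on a tiny hand-written SBML fragment; the fragment is an assumption made for illustration and is not real Reactome output, which will be considerably richer.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_sbml(xml_text):
    """Report SBML level, version, and the IDs of species and reactions."""
    root = ET.fromstring(xml_text)
    species = [el.get("id") for el in root.iter() if local_name(el.tag) == "species"]
    reactions = [el.get("id") for el in root.iter() if local_name(el.tag) == "reaction"]
    return {
        "level": root.get("level"),
        "version": root.get("version"),
        "species": species,
        "reactions": reactions,
    }


# Hand-written toy model (not Reactome output), declared as level 2, version 3.
toy_sbml = """<sbml xmlns="http://www.sbml.org/sbml/level2/version3" level="2" version="3">
  <model id="toy">
    <listOfSpecies>
      <species id="s1" compartment="c"/>
      <species id="s2" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1"/>
    </listOfReactions>
  </model>
</sbml>"""

summary = summarize_sbml(toy_sbml)
print(summary)
```

Matching on the local element name keeps the sketch independent of the exact SBML namespace, so it should apply to other levels and versions as well.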

3. Methods

In this section, two use cases will be described, providing step-by-step instructions that lead users through the process of creating models based on the combination of user-supplied data and Reactome pathways. In the first use case, the pathways that are richest in proteins or genes found in the user’s data will be used to generate a model. In the second use case, all reactions in Reactome that utilize proteins, genes or small molecules from the user’s data will be put together into a model.

3.1. Constructing a Model Based on Pathways Enriched in User Data

  1. The user constructs a list of proteins and/or genes that are relevant to the model. It is not necessary for the list to be complete, since Reactome will try to fill in any gaps, but the more complete the list is, the more likely it is that the correct pathways will be found.

  2. The user translates this list into a list of corresponding identifiers. For example, serum amyloid P-component might map onto the UniProt ID P02743. Reactome recognizes a wide variety of identifier types, which gives the user a significant amount of freedom. If gene identifiers are used, Reactome will find the corresponding translated proteins and use those. Identifiers should be separated by newlines.

  3. The list of identifiers is submitted to the Reactome Analyze Expression Data tool.

  4. Once the analysis has completed and the results table has loaded, the user clicks on the header of the "% in data" column. The rows in the table will be reordered according to the percentage of the proteins in the Reactome pathway that are also found in the user’s data, with those having the highest coverage appearing at the top of the table. Assuming that the dataset was designed to model a fairly specific area of biology, it is likely that one or two pathways will show high coverage and the remainder very low coverage.

  5. For each of the pathways with high coverage, the user clicks on the View button, which opens a new tab. The very last number in the URL of that tab is the Reactome internal identifier for the pathway, and the user should make a note of it.

  6. The URL version of the SBML exporter is used with the pathway identifiers found in the previous step to generate a model for the pathways. This model will contain all of the reactions in all of the chosen pathways.

  7. The model can now be imported into a model editor. It is likely that a significant amount of pruning will be required, since the pathways used will probably contain reactions that are not of interest to the user.
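
Two of the steps above lend themselves to a short illustration: the coverage ranking in step 4 and the extraction of the pathway identifier in step 5. The following Python sketch uses made-up pathway names, placeholder protein labels and a hypothetical URL; it mirrors the description in the text and is not Reactome's own implementation.

```python
import re


def coverage_percent(pathway_proteins, user_ids):
    """Percentage of a pathway's proteins that also occur in the user's data."""
    pathway = set(pathway_proteins)
    if not pathway:
        return 0.0
    return 100.0 * len(pathway & set(user_ids)) / len(pathway)


def rank_by_coverage(pathways, user_ids):
    """Sort (pathway name, coverage) pairs with the highest coverage first."""
    scored = [(name, coverage_percent(members, user_ids))
              for name, members in pathways.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def last_number_in_url(url):
    """Return the last run of digits in a URL as an integer."""
    numbers = re.findall(r"\d+", url)
    if not numbers:
        raise ValueError("no number found in URL")
    return int(numbers[-1])


# Made-up pathways with placeholder protein labels.
pathways = {"pathway B": ["P1", "P9"], "pathway A": ["P1", "P2", "P3", "P4"]}
ranking = rank_by_coverage(pathways, ["P1", "P2", "P3"])
print(ranking)

# Hypothetical URL; only the position of the final number matters here.
pathway_id = last_number_in_url("http://www.example.org/view?DB=db_v40&ID=109607")
print(pathway_id)
```

The identifiers collected in this way are the values to pass to the URL version of the SBML exporter in step 6.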

3.2. Constructing a Model Based on Reactions Operating on Molecules in User Data

  1. The user constructs a list of proteins, genes or small molecules that are relevant to the model.

  2. The user translates this list into a list of corresponding identifiers. For example, serum amyloid P-component might map onto the UniProt ID P02743. Identifiers should be separated by newlines. This list needs to be as complete as possible, since the only reactions that will be found are those which contain identifiers in the user’s data.

  3. From the BioMart page, the REACTOME database is chosen and the reaction dataset selected.

  4. Under Filters, the user pastes the identifier list into the text field of Limit to reactions containing these IDs and chooses the appropriate identifier type.

  5. Under Attributes, the user must deselect Reaction stable ID (see Note 8).

  6. The Results button is clicked.

  7. The query results are exported to a file (see Note 9). This file must be edited and the first line deleted, since it simply contains the column name.

  8. This file is then imported into the interactive SBML generator, using the Browse button (see Note 10). Clicking on Generate SBML produces the model.

  9. The model can now be imported into a model editor. It is likely that a significant amount of pruning will be required, since the reactions found will probably include some that are not of interest to the user.
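
The manual edit in step 7, deleting the first line of the exported file, can also be scripted. The sketch below assumes a single-column export as described in Note 8; the column name and the reaction identifiers shown are invented for illustration.

```python
def reaction_ids_from_export(tsv_text):
    """Drop the header line and return the reaction IDs that remain.

    Blank lines are skipped and duplicates removed, keeping first-seen order.
    """
    lines = [line.strip() for line in tsv_text.splitlines()]
    body = [line for line in lines[1:] if line]
    return list(dict.fromkeys(body))


# Hypothetical single-column export: a header line followed by IDs.
example_export = "Reaction ID\n111\n222\n\n111\n"

reaction_ids = reaction_ids_from_export(example_export)
print("\n".join(reaction_ids))
```

The printed output is a newline-separated list of identifiers, which is the format that the interactive SBML generator expects.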

Acknowledgements

The development of Reactome is supported by a grant from the US National Institutes of Health (P41 HG003751), EU grant LSHG-CT-2005-518254 “ENFIN,” Ontario Research Fund, and the EBI Industry Programme.

4. Notes

  1. The Reactome website is fairly robust, and it is likely that it will also work without problems in other browsers.

  2. The name of this tool is a bit misleading in this context.

  3. Clicking on the Example button will illustrate the required format, but note that the example provided also contains numerical expression values, which are not needed for model generation. Also, the example shows a table with column names. These are not mandatory.

  4. The column labelled Matching proteins in data in this table contains, for each pathway, a count of the number of identifiers from the user’s data that have been found in the pathway. These numbers are also links, and clicking on one of them provides a more detailed analysis of the identifiers hitting that particular pathway.

  5. The interactions dataset is a little bit different from the others, in both the available attributes and filters, but it will not be used in any of the methods presented in this chapter.

  6. Probably the best way to understand filters is as a form of query. The terms that are supplied to the filter are the ones that will be queried against.

  7. Clicking on the Example button will illustrate the required format.

  8. The idea is to produce an output table with only a single column for reaction identifiers.

  9. The default file name suggested by BioMart may be used, but the file name and path should be noted.

  10. At this point, the user can also select the desired level and version for the generated SBML. The defaults are level 2, version 3. If a layouter was selected, then reaction layout will also be incorporated into the SBML file. Note that the CellDesigner layouter currently does not work. Filters can be used for doing things like selecting only those reactions that occur within a given cellular compartment.

References

  1. Matthews L, Gopinath G, Gillespie M et al. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res 37:D619–D622
  2. Matthews L, D’Eustachio P, Gillespie M et al. (2007) An Introduction to the Reactome Knowledgebase of Human Biological Pathways and Processes. Bioinformatics Primer, NCI/Nature Pathway Interaction Database. doi: 10.1038/pid.2007.3
  3. Croft D, O’Kelly G, Wu G et al. (2011) Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 39(Database Issue):D691–D697
  4. Hucka M, Finney A, Sauro HM et al. (2003) The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4):524–531
  5. Funahashi A, Tanimura N, Morohashi M et al. (2003) CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO 1:159–162
  6. Kanehisa M, Goto S, Kawashima S et al. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res 32(Suppl 1):D277–D280
  7. Bolton E, Wang Y, Thiessen PA et al. (2008) PubChem: integrated platform of small molecules and biological activities (Chapter 12). In: Wheeler RA, Spellmeyer DC (eds) Annual reports in computational chemistry, vol 4. American Chemical Society, Washington, DC
  8. de Matos P, Alcántara R, Dekker A et al. (2009) Chemical entities of biological interest: an update. Nucleic Acids Res 38(Database Issue):D249–D254
  9. Ashburner M, Ball CA, Blake JA et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29
  10. The UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40:D71–D75
  11. Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35(Database Issue):D61–D65
  12. Flicek P, Amode MR, Barrell D et al. (2011) Ensembl 2012. Nucleic Acids Res 40(Database Issue):D84–D90
  13. Lockhart DJ et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14(13):1675–1680
  14. Maglott D, Ostell J, Pruitt KD et al. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 33(Database Issue):D54–D58
  15. Kersey PJ, Duarte J, Williams A et al. (2004) The International Protein Index: an integrated database for proteomics experiments. Proteomics 4(7):1985–1988
  16. Wang C, Krishnakumar S, Wilhelmy J et al. (2012) High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci 109(22):8676–8681
  17. Hamosh A, Scott A, Amberger J et al. (2004) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33(Database Issue):D514–D517. doi: 10.1093/nar/gki033
  18. Webb EC (1992) Enzyme nomenclature 1992: recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes. Published for the International Union of Biochemistry and Molecular Biology by Academic Press, San Diego. ISBN 0-12-227164-5
  19. Eppig JT, Blake JA, Bult CJ et al. (2012) The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res 40(1):D881–D886
  20. Berman HM, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10:980
  21. Cochrane G, Akhtar R, Bonfield J et al. (2008) Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res 37(Database Issue):D19–D25. doi: 10.1093/nar/gkn765
  22. Griffiths-Jones S (2004) The microRNA Registry. Nucleic Acids Res 32(Suppl 1):D109–D111
  23. Smedley D, Haider S, Ballester B et al. (2009) BioMart—biological queries made easy. BMC Genomics 10:22
  24. Zhang J, Haider S, Baran J et al. (2011) BioMart: a data federation framework for large collaborative projects. Database (Oxford) 2011:bar038. doi: 10.1093/database/bar038
  25. Haw RA, Croft D, Yung CK et al. (2011) The Reactome BioMart. Database (Oxford) 2011:bar031
