Abstract
Summary: Recent studies have revealed that alternative splicing plays an important role in the observed protein and interaction diversity. Special microarrays allow for measuring gene expression at the exon level and thus for studying alternative transcripts and their corresponding protein domain architecture. We have developed the Cytoscape plugin DomainGraph that enables the visualization and detailed study of domain–domain interactions forming protein interaction networks. In addition, the integration of exon expression data supports the analysis of alternative splicing events and the characterization of their effects on the protein and domain interaction network. Different expression patterns between human tissues or cells can be identified by comparing the generated domain graphs.
Availability: The plugin DomainGraph and the online documentation are available at http://domaingraph.bioinf.mpi-inf.mpg.de.
Contact: mario.albrecht@mpi-inf.mpg.de
1 INTRODUCTION
In recent years, a large number of experimentally derived and computationally predicted protein–protein interactions (PPIs) and domain–domain interactions (DDIs) have been made publicly available (Ramírez et al., 2007; Schlicker et al., 2007). These data often neglect different splice variants produced by the same gene (Matlin et al., 2005), although protein diversity is greatly influenced by alternative splicing (Blencowe, 2006; Stamm et al., 2005). Alternative isoforms of a protein may vary in their domain composition, and certain protein domains tend to be spliced out more frequently than others (Liu and Altman, 2003; Resch et al., 2004). Specific PPIs can thus be suppressed by alternative splicing events that affect protein domains responsible for protein interaction.
Based on user feedback on our previous tool DomainNetwork-Builder (Albrecht et al., 2005), we developed the versatile Java plugin DomainGraph for the established free, open-source software Cytoscape (http://www.cytoscape.org), a platform for the visualization and analysis of molecular interaction networks (Cline et al., 2007). DomainGraph is an entirely new implementation with many novel features and now supports the integration and analysis of exon expression data measured with Affymetrix GeneChip exon microarrays (http://www.affymetrix.com/products/arrays/specific/exon.affx), which contain over one million human probesets (Clark et al., 2007). Our plugin comprises two main functionalities: (1) the decomposition of a user-imported protein interaction network into the underlying DDIs and (2) the highlighting of protein domains and their interactions affected by alternative splicing events.
2 PROGRAM OVERVIEW
2.1 Constructing a domain graph
DomainGraph facilitates the decomposition of a user-imported protein interaction network from human, yeast and 10 other species (see online documentation) into the underlying DDIs. The DDIs in the network are computed by first mapping the proteins to their constituent Pfam domains (Finn et al., 2008) and then inferring all potential DDIs according to a user-selected DDI dataset. In the resultant network (the domain graph), proteins and domains are represented by nodes. Three different edge types exist in the domain graph: pp-edges for PPIs, dd-edges for DDIs and pd-edges for linking proteins and their constituent domains. For pp- and dd-edges, confidence scores are provided according to the user-selected dataset of DDIs. A user-defined threshold can be applied for filtering dd-edges, and the edge width can correlate with the confidence score.
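The decomposition described above can be illustrated with a minimal sketch. It is not the plugin's implementation: the function name `build_domain_graph`, the tuple-based node encoding and the toy identifiers are hypothetical, and the DDI dataset is assumed to be a simple mapping from unordered domain pairs to confidence scores. Confidence scores for pp-edges are omitted here.

```python
from itertools import product

def build_domain_graph(ppis, protein_domains, ddi_scores, min_score=0.0):
    """Decompose a PPI list into an extended-view domain graph (illustrative only).

    ppis            -- iterable of (protein_a, protein_b) pairs
    protein_domains -- dict: protein -> list of domain family names
    ddi_scores      -- dict: frozenset({family_a, family_b}) -> confidence score
    min_score       -- user-defined threshold for keeping dd-edges
    """
    nodes = set()
    edges = []  # (source_node, target_node, edge_type, score)

    # One protein node per protein, one domain node per domain, linked by pd-edges.
    for protein, domains in protein_domains.items():
        nodes.add(("protein", protein))
        for family in domains:
            nodes.add(("domain", protein, family))
            edges.append((("protein", protein), ("domain", protein, family), "pd", None))

    for a, b in ppis:
        nodes.add(("protein", a))
        nodes.add(("protein", b))
        edges.append((("protein", a), ("protein", b), "pp", None))
        # Infer every potential DDI between the domains of the two interacting proteins.
        for fam_a, fam_b in product(protein_domains.get(a, []), protein_domains.get(b, [])):
            score = ddi_scores.get(frozenset((fam_a, fam_b)))
            if score is not None and score >= min_score:
                edges.append((("domain", a, fam_a), ("domain", b, fam_b), "dd", score))
    return nodes, edges


# Toy input with made-up protein and domain names.
toy_ppis = [("P1", "P2")]
toy_domains = {"P1": ["DomA", "DomB"], "P2": ["DomC"]}
toy_ddis = {frozenset(("DomA", "DomC")): 0.9, frozenset(("DomB", "DomC")): 0.2}
toy_nodes, toy_edges = build_domain_graph(toy_ppis, toy_domains, toy_ddis, min_score=0.5)
```

With the threshold of 0.5 used in the toy call, only the higher-scoring DomA–DomC interaction is kept as a dd-edge, which mirrors the filtering of dd-edges by a user-defined threshold.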
The user can choose between three different network views for the domain graph. The extended view is the most detailed view showing separate nodes for each Pfam domain contained in some protein. The compact view decreases the number of domain nodes by collapsing all domains of the same Pfam domain family into a single meta-node. Each meta-node is linked to all proteins containing an instance of the domain that the meta-node represents. The protein-network view shows only the protein interaction network, and the user can select a subset of protein nodes for the display of the underlying domain interactions. Switching between the three views is possible at any time.
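As a rough illustration of how the compact view reduces the number of domain nodes, the following sketch collapses all instances of a domain family into one meta-node. The helper names and the toy data are hypothetical, and the plugin's internal data model may differ.

```python
from collections import defaultdict

def compact_view(protein_domains):
    """Map each domain family (meta-node) to the set of proteins containing it."""
    family_to_proteins = defaultdict(set)
    for protein, domains in protein_domains.items():
        for family in domains:
            family_to_proteins[family].add(protein)
    return dict(family_to_proteins)

def count_domain_nodes(protein_domains):
    """Return (domain nodes in the extended view, meta-nodes in the compact view)."""
    extended = sum(len(domains) for domains in protein_domains.values())
    compact = len(compact_view(protein_domains))
    return extended, compact


# Toy input with made-up protein and domain names.
toy_architecture = {"P1": ["DomA", "DomB"], "P2": ["DomA"], "P3": ["DomA", "DomC"]}
meta_nodes = compact_view(toy_architecture)
node_counts = count_domain_nodes(toy_architecture)
```

In this toy network, five domain nodes in the extended view collapse into three meta-nodes, and the meta-node for DomA is linked to all three proteins.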
Additional information, for example, on protein and domain names, Gene Ontology annotation or OMIM annotation, is provided via the right-click context menu of each node. When the mouse hovers over a node, a tooltip provides a brief summary of the node information, and all direct protein interaction partners and the domains contained in them are highlighted automatically. Double-clicking on a protein node opens a new tab in the Cytoscape Attribute Browser, which shows a graphical representation of the protein domain architecture and, if Ensembl identifiers are used, the underlying exon structure (Fig. 1). The user can customize the colors of this graphic and export it as a PNG file. Tooltips in this browser display the absolute positions of the domains and exons within the protein and indicate which exons belong completely or partly to a domain. In addition, web links in the context menus connect all protein and domain nodes and the exons to additional information in external databases.
Fig. 1.
The extended view of a domain graph consisting of protein (rectangle) and domain (diamond) nodes. The nodes are colored according to the integrated Affymetrix exon data. Domains affected by alternative splicing are colored pink, and domains forming interactions with spliced domains are colored orange. Gray nodes represent proteins belonging to unexpressed genes. Hovering the mouse over a node displays a tooltip and highlights the direct interaction partners of the node in red. In the Attribute Browser at the bottom, the domain architecture (top line, CDS only) and the exon structure (middle line, including the CDS and the 3′- and 5′-UTRs) of the protein selected by the user are shown together with the probesets (bottom line), which are colored according to their respective expression levels.
2.2 Integrating exon expression data
To integrate human Affymetrix exon expression data, the user first needs to import preprocessed expression data and P-value files based on probeset IDs. Both files may be produced using the Affymetrix Power Tools or Expression Console (see online documentation for details). Probeset P-values are determined by the detection-above-background method and indicate which probesets can be regarded as present or absent. Exon expression is then inferred from the probeset P-values according to a percentage threshold (50% by default) of probesets required to be present (default P-value threshold 0.05) within the exon region. Both thresholds can be modified by the user. The imported data are permanently stored in an embedded database and thus can be readily reused in new Cytoscape sessions. The expression and P-value data are internally mapped to the corresponding exon structure and protein domain architecture (see Section 2.3). DomainGraph applies different colors to domain nodes (Fig. 1) that are (partly or completely) missing due to splicing events, form interactions with a spliced domain, or appear absent due to gene suppression. A domain is regarded as missing if some exons covering the domain region are not expressed (user-changeable default threshold of at least 25% unexpressed exons). The default coloring is provided as a Visual Style and is customizable.
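The two threshold rules above can be written down as a small sketch. The function names are hypothetical, and two details are assumptions rather than statements from the text: a probeset counts as present when its P-value is less than or equal to the threshold, and an exon without any probesets is treated as not expressed.

```python
def exon_is_expressed(probeset_pvalues, p_threshold=0.05, min_present_fraction=0.5):
    """Call an exon expressed if enough of its probesets are present.

    Assumption: a probeset is 'present' when its P-value is <= p_threshold.
    Assumption: an exon with no probesets is treated as not expressed.
    """
    if not probeset_pvalues:
        return False
    present = sum(1 for p in probeset_pvalues if p <= p_threshold)
    return present / len(probeset_pvalues) >= min_present_fraction

def domain_is_missing(exon_expressed_flags, min_unexpressed_fraction=0.25):
    """Call a domain missing if at least the given fraction of its exons is unexpressed."""
    if not exon_expressed_flags:
        return False  # assumption: no exon information means no call
    unexpressed = sum(1 for expressed in exon_expressed_flags if not expressed)
    return unexpressed / len(exon_expressed_flags) >= min_unexpressed_fraction


# Toy P-values for the probesets of three exons covering one domain.
exon_calls = [
    exon_is_expressed([0.01, 0.20]),        # 1 of 2 present -> expressed
    exon_is_expressed([0.01, 0.20, 0.30]),  # 1 of 3 present -> not expressed
    exon_is_expressed([0.001, 0.002]),      # 2 of 2 present -> expressed
]
domain_call = domain_is_missing(exon_calls)
```

In the toy example, one of the three exons is unexpressed (about 33%), which exceeds the default 25% threshold, so the domain would be flagged as missing.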
After integrating Affymetrix expression data, the user can visualize the location of Affymetrix probesets relative to both the exon structure and the domain architecture by double-clicking on a protein node (Fig. 1). A color gradient from yellow to red is applied to display the expression level of probesets that are regarded as present according to their P-values, while probesets regarded as absent are colored pink. The default coloring of probesets in addition to that of domains and exons can be changed via the right-click mouse context menu in the data panel. Further information like the expression level and P-value of a certain probeset and its absolute positions within the corresponding protein sequence is provided by tooltips.
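One simple way to realize such a coloring is a linear interpolation in RGB space, sketched below. The text does not specify the interpolation, so the linear scaling between a minimum and maximum expression level, the exact RGB value used for pink and the function name `probeset_color` are all assumptions made for illustration.

```python
PINK = (255, 192, 203)  # assumed RGB value for probesets regarded as absent

def probeset_color(level, p_value, min_level, max_level, p_threshold=0.05):
    """Return an (R, G, B) color: pink if absent, else a yellow-to-red gradient."""
    if p_value > p_threshold:
        return PINK
    if max_level <= min_level:
        return (255, 255, 0)  # degenerate range: fall back to yellow
    # Scale the expression level to [0, 1] and clamp values outside the range.
    t = (level - min_level) / (max_level - min_level)
    t = min(1.0, max(0.0, t))
    # Yellow is (255, 255, 0) and red is (255, 0, 0), so only green changes.
    green = round(255 * (1.0 - t))
    return (255, green, 0)
```

Under these assumptions, a present probeset at the minimum level is drawn in yellow, one at the maximum level in red, and any probeset whose P-value exceeds the threshold in pink regardless of its level.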
DomainGraph provides several options for analyzing the generated networks. The analysis methods consider not only which proteins and domains occur in the respective domain graphs, but also their biological context, as derived, for instance, from the node type (expressed or suppressed protein or domain node). In particular, the user can compute the intersection, union and difference of two domain graphs. For example, if a domain node is contained in both analyzed domain graphs but is expressed in one and spliced out in the other, it is specially highlighted in the resultant network. Such a visual analysis is useful for comparing domain expression patterns between different tissues or cells because protein isoforms with varying domain composition due to alternative splicing can easily be identified (see online tutorial).
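The graph comparison can be sketched with plain set operations. In this hypothetical representation, each domain graph is reduced to a mapping from node identifier to a state label such as "expressed" or "spliced"; the function name, the state labels and the toy tissues are assumptions, and edges are ignored for brevity.

```python
def compare_domain_graphs(graph_a, graph_b):
    """Compare two domain graphs given as dicts: node id -> state label."""
    nodes_a, nodes_b = set(graph_a), set(graph_b)
    shared = nodes_a & nodes_b
    return {
        "intersection": shared,
        "union": nodes_a | nodes_b,
        "difference": nodes_a - nodes_b,  # nodes only in graph_a
        # Shared nodes whose state differs between the two graphs,
        # e.g. expressed in one and spliced out in the other.
        "state_changed": {
            node: (graph_a[node], graph_b[node])
            for node in shared
            if graph_a[node] != graph_b[node]
        },
    }


# Toy graphs for two made-up tissues.
tissue_1 = {"P1:DomA": "expressed", "P1:DomB": "expressed", "P2:DomC": "expressed"}
tissue_2 = {"P1:DomA": "expressed", "P1:DomB": "spliced", "P3:DomD": "expressed"}
comparison = compare_domain_graphs(tissue_1, tissue_2)
```

Here the node `P1:DomB` appears in both graphs but changes from expressed to spliced, so it is the one entry reported under `state_changed`.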
2.3 Integrated datasets
All protein and domain data required for constructing a domain graph are provided in an Apache Derby database (http://db.apache.org/derby/), which is stored locally in the user's Cytoscape directory. The database also contains the mappings between Exon Array probesets and Ensembl transcripts as provided by Affymetrix and comprises the corresponding assignments of domain architectures, exon structures and probesets, which were derived by mapping the genomic coordinates given by Ensembl and Affymetrix.
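Assigning exons to domains by coordinates amounts to an interval-overlap test. The sketch below assumes that exon and domain positions have already been converted to one shared coordinate system and are given as inclusive (start, end) pairs; the function name and the three labels are hypothetical and do not describe the plugin's actual mapping procedure.

```python
def exon_domain_overlap(exon, domain):
    """Classify how an exon relates to a domain region.

    exon, domain -- (start, end) tuples with inclusive coordinates
                    in the same coordinate system (an assumption).
    Returns "complete" if the exon lies entirely within the domain,
    "partial" if the two regions merely overlap, and "none" otherwise.
    """
    exon_start, exon_end = exon
    domain_start, domain_end = domain
    if exon_end < domain_start or exon_start > domain_end:
        return "none"
    if exon_start >= domain_start and exon_end <= domain_end:
        return "complete"
    return "partial"
```

For instance, with a domain spanning positions 5 to 30, an exon at 10 to 20 belongs to it completely, an exon at 1 to 10 only partly, and an exon at 1 to 4 not at all.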
DomainGraph supports UniProtKB accession numbers for all available species and, for human protein interaction networks, also Ensembl transcript and peptide identifiers. The decomposition of proteins into domains is performed using mappings from UniProtKB and Ensembl to Pfam, which were obtained from UniProtKB and BioMart, respectively. In addition, DomainGraph offers more than a dozen structurally derived or predicted datasets of DDIs (see online documentation for details), which can be used to decompose PPIs into their corresponding DDIs.
3 CONCLUSIONS
DomainGraph is a powerful Cytoscape plugin that enables an integrative analysis and visualization of protein and domain interactions in combination with exon expression data. The impact of alternative splicing events on protein and domain networks can be visually studied, and further analysis methods allow the simple detection of similarities and dissimilarities between domain graphs.
Funding: German National Genome Research Network (NGFN); German Research Foundation (DFG contract number KFO 129/1-2).
Conflict of Interest: none declared.
REFERENCES
- Albrecht M, et al. Decomposing protein networks into domain-domain interactions. Bioinformatics. 2005;21(Suppl. 2):ii220–ii221. doi: 10.1093/bioinformatics/bti1135.
- Blencowe BJ. Alternative splicing: new insights from global analyses. Cell. 2006;126:37–47. doi: 10.1016/j.cell.2006.06.023.
- Clark TA, et al. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol. 2007;8:R64. doi: 10.1186/gb-2007-8-4-r64.
- Cline MS, et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2007;2:2366–2382. doi: 10.1038/nprot.2007.324.
- Finn RD, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36:D281–D288. doi: 10.1093/nar/gkm960.
- Liu S, Altman RB. Large scale study of protein domain distribution in the context of alternative splicing. Nucleic Acids Res. 2003;31:4828–4835. doi: 10.1093/nar/gkg668.
- Matlin AJ, et al. Understanding alternative splicing: towards a cellular code. Nat. Rev. Mol. Cell Biol. 2005;6:386–398. doi: 10.1038/nrm1645.
- Ramírez F, et al. Computational analysis of human protein interaction networks. Proteomics. 2007;7:2541–2552. doi: 10.1002/pmic.200600924.
- Resch A, et al. Assessing the impact of alternative splicing on domain interactions in the human proteome. J. Proteome Res. 2004;3:76–83. doi: 10.1021/pr034064v.
- Schlicker A, et al. Functional evaluation of domain-domain interactions and human protein interaction networks. Bioinformatics. 2007;23:859–865. doi: 10.1093/bioinformatics/btm012.
- Stamm S, et al. Function of alternative splicing. Gene. 2005;344:1–20. doi: 10.1016/j.gene.2004.10.022.