PLOS ONE. 2007 Aug 22;2(8):e766. doi: 10.1371/journal.pone.0000766

The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

Sven Heinicke 1,#, Michael S Livstone 1,#, Charles Lu 1,#, Rose Oughtred 1,#, Fan Kang 1, Samuel V Angiuoli 2,3, Owen White 2, David Botstein 1, Kara Dolinski 1,*
Editor: Berend Snel 4
PMCID: PMC1942082  PMID: 17712414

Abstract

Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.

Introduction

With the great explosion of biological data in the last decade, biological databases have become an essential part of today's research. The earliest online databases were the sequence repositories, such as Genbank [1] and EMBL [2], that provided the non-expert public access to the sequence data for genes, chromosomes, and eventually entire genomes, along with highly effective query and comparison tools. Soon after, several model organism databases that store and display the annotated genome sequences of well-studied organisms were developed. These databases now serve as an essential basic information source for all kinds of biological researchers.

For working biologists, some of the most important information concerns the phylogenetic relationships among proteins, which is not necessarily straightforward to recover from the basic sequence databases. Regardless of which organism one works with, much of the functional annotation of gene and protein functions is transferred, based on sequence similarity, from other organisms where more experimental information is available (for example, see the Gene Ontology annotations at http://www.geneontology.org/GO.current.annotations.shtml). It is for this reason that sequence similarity searching has become one of the most popular database tools in current use, perhaps second only to searching the published literature. To make good use of sequence similarity information, it would be very useful to have a simple, user-friendly way to visualize relationships in their phylogenetic context, particularly the relationships among the proteins in the model organisms from which most of the functional annotations are derived. It is of particular value to be able to know which proteins are (or might be) orthologous [i.e. similar to each other in sequence because they originated from a common ancestor, having been separated in evolutionary time only by speciation event(s)]. It is also useful to see these orthologous relationships in the context of the larger paralogous gene families ultimately caused by gene duplications during the course of evolution.

In this paper, we describe P-POD, which provides the user an easy way to find and visualize the orthologs to a query sequence in the eukaryotes of greatest interest to working biologists (i.e. the experimental model organisms and the human) in their evolutionary context, and to link these relationships with the relevant literature. Several databases that specialize in comparative genomics have recently come online. Each of these databases, including P-POD, has both useful features and problems specific to the methods or species chosen in the analysis (Table 1, reviewed in [3]); none is perfect, but each fulfills the needs of particular database users.

Table 1. Comparative genomics web resources.

Name | Description | Ortholog prediction | Larger seq. families | Disease information | Curated literature
Clusters of Orthologous Groups (COGs/KOGs) [22] | Provides groups of orthologous proteins for seven eukaryotic species; the construction protocol involves manual curation | Yes | Yes | No | No
Eukaryotic Gene Orthologs (EGO) [23] | Displays predicted orthologs derived from several eukaryotic genomes based on gene alignments | Yes | No | No | No
Homologene [24] | Provides automated predictions of homologs among the genes of several eukaryotes | No | Yes | Yes | No
Inparanoid [25] | Houses pair-wise groups of orthologous proteins for multiple species | Yes | No | No | No
OrthoDisease [26] | Uses the Inparanoid algorithm to generate pair-wise orthologs between human disease genes and genes from other species | Yes | No | Yes | No
OrthoMCL-DB [4], [27] | Utilizes a Markov Cluster algorithm to predict orthologous groups of proteins for multiple species simultaneously | Yes | No | No | No
Sybil (S. Angiuoli and O. White, in preparation) | Uses Jaccard clustering to group sequences based on pair-wise BLAST analysis | No | Yes | No | No
YOGY [28] | Retrieves orthologous proteins from four different resources: KOGs, Inparanoid, Homologene, and OrthoMCL-DB | Yes | No | No | Yes (only budding and fission yeast)
P-POD (This study) | Orthologs and Jaccard clusters | Yes | Yes | Yes | Yes

P-POD is meant to complement these existing databases by providing a comparative genomics analysis system readily accessible to and readable by experimentalists, containing not just computational comparative analyses of the most common experimental organisms but also literature curation and links to other databases of interest. For example, while the OrthoMCL database contains sequences from over 55 prokaryotic and eukaryotic genomes, we chose to include protein sequences from eight eukaryotic organisms for their medical value or their status as widely-studied model organisms. There are certainly users who would need the more comprehensive species set from OrthoMCL. While P-POD uses the underlying OrthoMCL algorithm, it is meant to complement the OrthoMCL online database by serving another set of users, primarily experimental biologists who wish to query with their gene of interest from a well studied model organism to quickly get the evolutionary context of that gene along with other relevant information about that gene without sorting through a very large list of other sequences.

We designed our comparative genomics analysis system so that different components could be added to and removed from the pipeline in a modular fashion; the initial version of the pipeline described here generates related protein families using two different methods to provide complementary views of phylogenetic relationships. We used OrthoMCL [4] to find the orthologs and a version of Jaccard Clustering [modified to find homologs across multiple genomes (S. Angiuoli and O. White, in preparation)] to provide a larger protein family context. The phylogenetic relationships among family members from each method are determined using CLUSTAL W [5] and PHYLIP and visualized as arbitrarily rooted trees. In addition, we provide relevant gene and disease information from the Online Mendelian Inheritance in Man (OMIM) [6] database, as well as information culled from the literature that indicates when functional conservation between predicted orthologs has been shown experimentally. All the data within the database are freely available through the web and by downloading the entire software and database system via the following URL: http://ortholog.princeton.edu/

Historically, genomic databases have been developed in isolation, with idiosyncratic database schemata and software. Much duplication of effort can be avoided by developing generic modular databases and software that save, especially in the long run, both time and money spent on development, maintenance, and user training. In constructing P-POD we made use of the database schema, installation and loading tools, and various software components from the Generic Model Organism Database (GMOD) project (www.gmod.org). The goal of GMOD is to develop an open and generic genomic database environment, including database schemata and required software tools.

Results

The P-POD Pipeline

In the interests of both simplicity and flexibility, the P-POD pipeline employs a modular architecture. The pipeline takes FASTA-formatted protein sequences as input, performs comparative genomic analyses, and stores the results in a database. In addition, we have created web tools that allow searching and browsing of the results in a user-friendly manner. We built the initial pipeline to identify putative orthologous proteins using OrthoMCL [4]. We chose OrthoMCL over other algorithms mainly because it can be run on multiple species at once and is one of the better-performing algorithms in terms of sensitivity and specificity [7] [3]. We generated larger families of related sequences using Jaccard clustering modified to find homologs across multiple genomes; see the Materials and Methods section for algorithm details. It is important to note that we built the P-POD system so that we can easily add or remove results from different analysis methods. We acknowledge that the first choice is not always the best choice, and as algorithms improve and/or as users request other methods, we plan to modify and expand the system as appropriate. P-POD generates phylogenetic trees from both analyses using CLUSTAL W [5] and PHYLIP; the trees are graphically displayed on the web. The overall pipeline is illustrated in Figure 1. The sources and versions of the pipeline components are listed in Table 2. The data are stored in a Generic Model Organism Database (GMOD) database schema using the freely available PostgreSQL software to make the entire system accessible to as many users as possible, not only through the web but also via download of the entire system.
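As a rough illustration of the pipeline's input step, the sketch below reads FASTA-formatted protein records and counts them per organism. It is not the P-POD loader: the `read_fasta` helper, the `organism|identifier` header layout, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input: headers and sequences are invented for illustration only.
toy = StringIO(">yeast|P1\nMKT\nLLV\n>human|P2\nMSA\n>yeast|P3\nMQQ\n")
records = list(read_fasta(toy))

# Hypothetical header convention: organism name before the first "|".
per_species = Counter(header.split("|")[0] for header, _ in records)
print(records[0])
print(per_species)
```

Keeping the parser as a generator means a downstream step (a BLAST run, a clustering method) can consume records one at a time, which fits the modular design described above.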

Figure 1. Steps in the analysis pipeline.

Table 2. Components of the analysis pipeline.

Program | Version | Source
GMOD::Loader |  | This study
WU-BLAST | 2.0MP-WashU 10-May-2005 | http://blast.wustl.edu/
OrthoMCL [4] | Version 1.2 14-March-2005 | http://sourceforge.net/projects/orthomcl/
MCL [29] | Version 1.005, 05-118 | http://micans.org/mcl/
Jaccard Clustering | NA | S. Angiuoli and O. White (in preparation)
Clustal W [5] | Version 1.83 | ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw
PHYLIP | Version 3.64 | http://evolution.genetics.washington.edu/phylip.html
createTree |  | This study

The P-POD database contains protein sequences from eight eukaryotic organisms with fully sequenced genomes chosen for either their medical value or their status as widely-studied model organisms. They include a yeast (Saccharomyces cerevisiae), a nematode worm (Caenorhabditis elegans), a fruit fly (Drosophila melanogaster), a flowering plant (Arabidopsis thaliana), a fish (Danio rerio), a mouse (Mus musculus), and human (Homo sapiens). These are the leading experimental organisms for modern biologists, and together they span much of the evolutionary tree of the eukaryotes. Also included is the malaria parasite Plasmodium falciparum, an organism that, although it is a eukaryote, has a relatively exotic parasitic lifestyle. Sources for each protein set are listed in Table 3. Also stored in the system are results from each step of the pipeline, gene and disease information from OMIM, and curated information from the literature describing experimental tests of functional conservation (see Figure 2).

Table 3. Sources and numbers of sequences analyzed.

Organism | Proteins | Database | Filename
S. cerevisiae | 6704 | SGD | orf_trans_all.fasta.gz
H. sapiens | 33869 | ENSEMBL | Homo_sapiens.NCBI35.nov.pep.fa.gz
M. musculus | 36471 | ENSEMBL | Mus_musculus.NCBIM34.nov.pep.fa
D. rerio | 32143 | ENSEMBL | Danio_rerio.ZFISH5.nov.pep.fa
D. melanogaster | 19178 | FlyBase | dmel-all-translation-r4.2.1.fa
C. elegans | 22858 | WormBase | wormpep150.fa
A. thaliana | 30690 | TAIR | TAIR6_pep_20051108.fa
P. falciparum | 5363 | PlasmoDB | Pfa3D7_WholeGenome_Annotated_PEP_2005.2.11.fa

Figure 2. Screenshots of the P-POD web interface.

(A) A portion of the results page for the DPM1 OrthoMCL family is shown superimposed on the search form. Results from OrthoMCL are provided, and a link to the larger Jaccard family (B) is also available. Disease information from OMIM is displayed, as well as any relevant disease or cross-complementation literature.

The pipeline generated a total of 25,271 OrthoMCL families and 15,050 Jaccard Clustering families that contain a total of 165,970 proteins (154,736 and 152,799 for each method, respectively) from eight different organisms. There are 984 OrthoMCL families that contain at least one protein from each of the species, with 112 of them containing exactly one protein from each. We used the GO Term Mapper tool available at SGD to determine the distribution of GO annotations for the 112 yeast proteins in these families; we chose the yeast proteins because complete GO annotation is available for the entire yeast genome [8]. Not surprisingly, these proteins are involved in core biological processes that are common across eukaryotes, including translation, transport, cell cycle regulation, and cytoskeleton organization. These genes are also well characterized; only four of the 112 genes were annotated to “biological process unknown.” We also used the GO Term Finder [9] implementation at Princeton (http://go.princeton.edu/) to look for enrichment of GO terms among the 112 genes. Again unsurprisingly, the most significant shared term is “ribosome biogenesis and assembly” (corrected P-value = 5.85e-18) along with other terms related to translation and basic metabolic processes, all processes common among the eukaryotes.

The complete species distribution of each family is available via the web (http://ortholog.princeton.edu/organismdist.html), and the number of proteins found in families and orphan proteins (those not found in an OrthoMCL or Jaccard family) from all the species is found in Table 4.

Table 4. Number of proteins in each organism found in OrthoMCL or Jaccard families.

Organism | OrthoMCL | Jaccard | Orphan (% of total proteome)
S. cerevisiae | 4,333 | 3,660 | 2,176 (32%)
H. sapiens | 27,606 | 29,315 | 3,193 (9%)
M. musculus | 29,214 | 31,388 | 3,902 (11%)
D. rerio | 27,602 | 28,968 | 1,903 (6%)
D. melanogaster | 16,015 | 15,048 | 2,503 (13%)
C. elegans | 18,070 | 16,308 | 4,078 (7%)
A. thaliana | 27,987 | 25,819 | 2,279 (13%)
P. falciparum | 3,909 | 2,293 | 1,284 (33%)

The percentage of orphans is generally low: 13% or lower in each species, with two exceptions, yeast (32%) and Plasmodium (33%). These numbers confirm the high conservation of proteins across eukaryotes, with the notable exception of the Plasmodium outlier. The high percentage of yeast orphans arises because we performed the analysis with the complete protein set, including over 800 ORFs flagged as “Dubious” by SGD; these are not likely to actually encode proteins, and when they are excluded the percentage of orphans in yeast drops to about 20%.

P-POD includes 1,895 human proteins that are associated with human diseases (based on protein-OMIM disease files downloaded from ENSEMBL), 1,852 of which were found in either an OrthoMCL or Jaccard family; in each of these cases, links to the relevant OMIM records are provided online.

Manually Curated Information

P-POD also includes curated literature that contains information relevant to the yeast proteins in the database. The source of the literature is the Saccharomyces Genome Database (SGD). SGD provides a Literature Guide tool that categorizes yeast literature into different topics, two of which, “Cross-species expression” and “Disease-gene related,” are particularly relevant to the data in P-POD; we believe that this set of papers, which is continually updated and curated, contains most, if not quite all, of the experimental data testing functional conservation between yeast and other organisms. All papers associated with these topics were downloaded from the SGD FTP site and loaded into the database (see Materials and Methods). They are then displayed on the web interface, with links to PubMed, so that users can compare experimentally determined functional conservation and computationally predicted orthology. This set of papers does not, of course, address proteins without a yeast ortholog. A way of dealing with this limitation is under study; a likely development will be the inclusion of papers from the literatures of other model organisms. For disease-related genes, we provide OMIM links that at least partially fill this gap for the human.

In addition, we manually curated the “Cross-species expression” papers to indicate explicitly when functional conservation was experimentally determined. These cross-species expression experiments test whether expressing a putative ortholog from one organism will restore wildtype function to the corresponding inactivated gene in another organism (almost always S. cerevisiae). Table 5 summarizes this curated information for only the yeast proteins in the disease-related families to illustrate how this information can be compared to computational results, but P-POD contains experimental results for all yeast proteins for which curated information is available. The orthologs predicted by OrthoMCL often exhibit conserved function. Of the 643 curated complementation experiments between yeast genes and their putative orthologous sequences from other organisms, 395 showed functional conservation and were also identified as orthologs by OrthoMCL; 50 did not complement and were also not predicted to be orthologs by OrthoMCL. Thus, in most cases (445/643), the computational determination of orthology was consistent with experimental results of functional conservation. However, in 153 experiments, complementation was observed, but the proteins were not in the same OrthoMCL family, and in 45 experiments, complementation did not occur, but OrthoMCL predicted an orthologous relationship between the two proteins. These experimental results can be used as a rudimentary assessment of the computational predictions but it must be noted that the definition of orthology does not require functional conservation [10], and there are actual cases (e.g. actin) where in vivo complementation fails for biological reasons, even for true orthologs that can function in vitro [11].
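The agreement figures quoted above follow directly from the four reported counts. The short sketch below recomputes them; the dictionary layout and variable names are illustrative choices, and the "recovered" fraction is an extra summary not stated in the text.

```python
# Counts reported in the text, keyed by
# (OrthoMCL predicts orthology?, complementation observed?).
counts = {
    ("yes", "yes"): 395,  # predicted orthologs that complement
    ("no", "no"): 50,     # not predicted, no complementation
    ("no", "yes"): 153,   # complementation without a shared OrthoMCL family
    ("yes", "no"): 45,    # predicted orthologs that fail to complement
}

total = sum(counts.values())
consistent = counts[("yes", "yes")] + counts[("no", "no")]
agreement = consistent / total

# Extra summary (not in the text): among complementing pairs,
# the share that OrthoMCL also placed in the same family.
complementing = counts[("yes", "yes")] + counts[("no", "yes")]
recovered = counts[("yes", "yes")] / complementing

print(f"{consistent}/{total} consistent ({agreement:.1%})")
print(f"{recovered:.1%} of complementing pairs share an OrthoMCL family")
```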

Table 5. Functional conservation vs. ortholog prediction: comparing experimental results with the OrthoMCL ortholog predictions for disease-related families.

OrthoMCL | Experimental | Yeast gene | Protein(s) tested | Citation
No | No | YJL095W: BCK1 | H. sapiens: ENSP00000306124 | [31]
No | No | YJR040W: GEF1 | M. musculus: ENSMUSP00000035964 | [32]
No | No | YMR190C: SGS1 | H. sapiens: ENSP00000298139 | [33]
No | No | YOL090W: MSH2 | H. sapiens: ENSP00000265081, ENSP00000234420 | [34]
Yes | Yes | YAL016W: TPD3 | A. thaliana: AT1G25490.1 | [35]
Yes | Yes | YBR110W: ALG1 | H. sapiens: ENSP00000262374 | [36] [37]
Yes | Yes | YBR140C: IRA1 | H. sapiens: ENSP00000351015, ENSP00000348498 | [38]
Yes | Yes | YBR140C: IRA1 | H. sapiens: ENSP00000351015, ENSP00000352435, ENSP00000348498 | [39]
Yes | Yes | YBR254C: TRS20 | H. sapiens: ENSP00000310153 | [40]
Yes | Yes | YCR075C: ERS1 | H. sapiens: ENSP00000046640 | [41]
Yes | Yes | YDL120W: YFH1 | H. sapiens: ENSP00000297735 | [42], [43]
Yes | Yes | YDL126C: CDC48 | A. thaliana: AT3G09840.1 | [44]
Yes | Yes | YDR270W: CCC2 | H. sapiens: ENSP00000242839, ENSP00000342559 | [45] [46] [47]
Yes | Yes | YDR270W: CCC2 | C. elegans: Y76A2A.2 | [48]
Yes | Yes | YDR270W: CCC2 | H. sapiens: ENSP00000343026, ENSP00000345728 | [49] [50]
Yes | Yes | YDR363W-A: SEM1 | M. musculus: ENSMUSP00000040741 | [51]
Yes | Yes | YDR363W-A: SEM1 | H. sapiens: ENSP00000248566 | [52]
Yes | Yes | YER095W: RAD51 | M. musculus: ENSMUSP00000028795 | [53]
Yes | Yes | YER120W: SCS2 | H. sapiens: ENSP00000217602, ENSP00000345656 | [54]
Yes | Yes | YER171W: RAD3 | H. sapiens: ENSP00000221481 | [55] [56]
Yes | Yes | YFL018C: LPD1 | H. sapiens: ENSP00000205402 | [57]
Yes | Yes | YFR019W: FAB1 | M. musculus: ENSMUSP00000079926 | [58]
Yes | Yes | YFR053C: HXK1 | H. sapiens: ENSP00000338009, ENSP00000223366, ENSP00000350996 | [59]
Yes | Yes | YGL001C: ERG26 | M. musculus: ENSMUSP00000033715 | [60]
Yes | Yes | YGL006W: PMC1 | A. thaliana: AT2G41560.1 | [61]
Yes | Yes | YGL006W: PMC1 | A. thaliana: AT3G21180.1 | [62]
Yes | Yes | YGL115W: SNF4 | A. thaliana: AT1G09020.1 | [63], [64]
Yes | Yes | YGL125W: MET13 | A. thaliana: AT3G59970.1, AT2G44160.1 | [65]
Yes | Yes | YGL125W: MET13 | H. sapiens: ENSP00000315965 | [66]
Yes | Yes | YGL167C: PMR1 | H. sapiens: ENSP00000306816, ENSP00000329664, ENSP00000352665 | [67], [68]
Yes | Yes | YGL167C: PMR1 | H. sapiens: ENSP00000306816, ENSP00000329664, ENSP00000349901, ENSP00000352580, ENSP00000352665 | [69]
Yes | Yes | YGL253W: HXK2 | H. sapiens: ENSP00000338009, ENSP00000223366, ENSP00000350996 | [59]
Yes | Yes | YGR240C: PFK1 | H. sapiens: ENSP00000345771, ENSP00000352842 | [70], [71]
Yes | Yes | YGR267C: FOL2 | H. sapiens: ENSP00000352686, ENSP00000254299 | [72], [73]
Yes | Yes | YHR037W: PUT2 | H. sapiens: ENSP00000290597, ENSP00000336944 | [74], [75]
Yes | Yes | YIL143C: SSL2 | A. thaliana: AT5G41360.1 | [76]
Yes | Yes | YJL059W: YHC3 | H. sapiens: ENSP00000353116, ENSP00000353116, ENSP00000346650 | [77]
Yes | Yes | YJL101C: GSH1 | D. melanogaster: CG2259-PA, CG2259-PB | [78]
Yes | Yes | YJR104C: SOD1 | H. sapiens: ENSP00000270142 | [79]
Yes | Yes | YJR117W: STE24 | H. sapiens: ENSP00000196805 | [80], [81]
Yes | Yes | YJR135W-A: TIM8 | H. sapiens: ENSP00000247385 | [82], [83]
Yes | Yes | YKL209C: STE6 | M. musculus: ENSMUSP00000041204 | [84]
Yes | Yes | YKL209C: STE6 | M. musculus: ENSMUSP00000041204, ENSMUSP00000088389 | [85]
Yes | Yes | YKR079C: TRZ1 | H. sapiens: ENSP00000337445 | [86]
Yes | Yes | YLR142W: PUT1 | A. thaliana: AT5G38710.1 | [87]
Yes | Yes | YML021C: UNG1 | H. sapiens: ENSP00000242576, ENSP00000337398 | [88]
Yes | Yes | YMR190C: SGS1 | H. sapiens: ENSP00000347232, ENSP00000349859 | [33], [89], [90]
Yes | Yes | YMR205C: PFK2 | H. sapiens: ENSP00000345771, ENSP00000352842 | [70], [71]
Yes | Yes | YNL219C: ALG9 | H. sapiens: ENSP00000316397 | [36]
Yes | Yes | YNR030W: ALG12 | H. sapiens: ENSP00000333813 | [91]
Yes | Yes | YNR041C: COQ2 | H. sapiens: ENSP00000310873 | [92]
Yes | Yes | YNR041C: COQ2 | A. thaliana: AT4G23660.1 | [93]
Yes | Yes | YOL049W: GSH2 | H. sapiens: ENSP00000216951 | [94]
Yes | Yes | YOL081W: IRA2 | H. sapiens: ENSP00000351015, ENSP00000348498 | [38], [95]
Yes | Yes | YOR204W: DED1 | H. sapiens: ENSP00000310870 | [96]
Yes | Yes | YOR204W: DED1 | D. melanogaster: CG9748-PA | [97]
Yes | Yes | YPL022W: RAD1 | A. thaliana: AT5G41150.1 | [98]
Yes | Yes | YPL153C: RAD53 | H. sapiens: ENSP00000329178, ENSP00000329012 | [99]
Yes | Yes | YPL218W: SAR1 | A. thaliana: AT1G56330.1 | [100]
Yes | Yes | YPR183W: DPM1 | S. cerevisiae: DPM1 | [101]
No | Yes | YBR018C: GAL7 | H. sapiens: ENSP00000338703 | [102]
No | Yes | YBR289W: SNF5 | A. thaliana: AT3G17590 | [103]
No | Yes | YDR135C: YCF1 | A. thaliana: AT3G13080.1 | [104], [105]
No | Yes | YGL006W: PMC1 | H. sapiens: ENSP00000306816, ENSP00000329664, ENSP00000352665 | [68]
No | Yes | YGL167C: PMR1 | A. thaliana: AT1G07810.1 | [106]
No | Yes | YGL167C: PMR1 | A. thaliana: AT2G41560.1 | [61]
No | Yes | YGL167C: PMR1 | A. thaliana: AT3G21180.1 | [62]
No | Yes | YHL007C: STE20 | A. thaliana: AT4G08500.1 | [107]
No | Yes | YJR040W: GEF1 | M. musculus: ENSMUSP00000030879 | [32]
No | Yes | YJR104C: SOD1 | H. sapiens: ENSP00000307870 | [108]
No | Yes | YNL098C: RAS2 | H. sapiens: ENSP00000309845 | [109]
No | Yes | YOR101W: RAS1 | H. sapiens: ENSP00000309845 | [109]
No | Yes | YOR130C: ORT1 | A. thaliana: AT1G79900.1 | [110]
No | Yes | YPL111W: CAR1 | A. thaliana: AT4G08900.1 | [111]
Yes | No | YDR529C: QCR7 | H. sapiens: ENSP00000287022 | [112]
Yes | No | YER148W: SPT15 | H. sapiens: ENSP00000230354 | [113]
Yes | No | YNL280C: ERG24 | D. melanogaster: CG17952-PC | [114]
Yes | No | YOL090W: MSH2 | H. sapiens: ENSP00000233146 | [34]
Yes | No | YPR183W: DPM1 | H. sapiens: ENSP00000001585 | [115]

In all but one of these experiments, the yeast gene was mutated and the gene from the other organism was tested for the ability to complement the mutant phenotype. In the one exception, the yeast gene DPM1 was expressed in mouse. In the OrthoMCL column, “Yes” indicates that the OrthoMCL algorithm placed the two proteins in the same ortholog family, while “No” indicates it did not. In the Experimental column, “Yes” indicates functional complementation, while “No” indicates none. Thus, when both columns are the same, the OrthoMCL prediction is consistent with the experimental result: where both are “Yes,” the predicted orthologs are functionally conserved, and where both are “No,” the proteins are neither predicted to be orthologs nor functionally conserved.

The P-POD User Interface: Orthologs, Families and Diseases

We designed a simple web interface that allows users to search and browse the data in several ways (Figure 2). Results can be queried by various peptide identifiers or gene names, choosing from any of eight model organisms for the query protein and a particular analysis method, or they can be searched or browsed by Online Mendelian Inheritance in Man (OMIM) ID.

Searches generate result pages that contain:

  • a hyperlinked phylogenetic tree of predicted orthologs generated by OrthoMCL or of more distantly-related proteins generated by Jaccard clustering,

  • a list of diseases and genes associated with the human ortholog(s) as documented in OMIM,

  • a manually curated list of papers with cross-complementation experiments involving the yeast ortholog(s), and

  • a downloadable ClustalW alignment of family members.

Using P-POD to Compare Methods: Jaccard and OrthoMCL

To illustrate the usefulness of being able to store multiple analyses in a single database, we further compared the results between the OrthoMCL and Jaccard Clustering methods. A query for yeast TUB1 using only OrthoMCL reveals the alpha tubulins from yeast and other organisms (Figure 3), but not the important paralogous relationships to the beta and gamma tubulins [12] [13], which are observed in the TUB1 Jaccard cluster (not shown). These three main classes of tubulins are related to the bacterial FtsZ protein and diverged prior to the divergence of the eukaryotes [12]. Many such examples are found, especially among the ancient gene families that go back to the common ancestors of all eukaryotes. The Jaccard clustering provides this larger evolutionary context.
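To give a feel for how a Jaccard-based method can produce broad families, here is a minimal sketch of single-linkage clustering on a Jaccard coefficient between BLAST-neighbor sets. This is an assumption-laden illustration, not the modified Jaccard Clustering used in P-POD (its details are in Materials and Methods and in the work in preparation): the neighbor sets, the threshold value, and the `jaccard` and `cluster` helpers are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(neighbors, threshold):
    """Single-linkage grouping: link two proteins when the Jaccard
    coefficient of their neighbor sets is at least `threshold`."""
    parent = {p: p for p in neighbors}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(neighbors, 2):
        if jaccard(neighbors[p], neighbors[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for p in neighbors:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Toy neighbor sets (each protein's hypothetical BLAST hits, itself included).
toy_neighbors = {
    "A": {"A", "B", "C"},
    "B": {"A", "B", "C"},
    "C": {"A", "B", "C", "D"},
    "D": {"D", "E"},
    "E": {"D", "E"},
}
toy_clusters = cluster(toy_neighbors, threshold=0.6)
print(toy_clusters)
```

Because linkage depends on shared neighbors rather than on best reciprocal hits, paralogous subfamilies that hit one another can end up in the same cluster, which is the kind of wider context described for the tubulins.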

Figure 3. OrthoMCL family of the alpha tubulins.

This OrthoMCL family contains only the alpha tubulins, while the tubulin family generated by Jaccard clustering (too large to be shown here) contains the alpha, beta, and gamma tubulins.

While OrthoMCL identifies predicted orthologs, the Jaccard clustering algorithm should build broader families of more distantly related sequences. Accordingly, one might initially expect that each OrthoMCL family would be a subset of a corresponding Jaccard cluster. Of course, because each algorithm defines homologs quite differently, in practice it would be reasonable to expect a certain degree of disagreement between the OrthoMCL and Jaccard clustering results. Of the 25,271 OrthoMCL families, 17,340 (69%) are subsets of Jaccard clusters. A certain amount of the “loss” of family members is due to stochastic effects; 72% of the 22,216 OrthoMCL families with ten or fewer members remain intact as subsets of Jaccard clusters, compared to only 49% of the 3,055 larger families. Fully 91% of the peptides assigned to OrthoMCL families also lie in Jaccard clusters. 82% of the OrthoMCL families have 80% or more of their peptides in a single Jaccard cluster; 93% have 50% or more.
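The comparison above can be expressed as a small set computation. The sketch below, using invented toy families, reports for each OrthoMCL family the largest fraction of its members that fall in any single Jaccard cluster, then summarizes how many families reach given thresholds. Function names and data are hypothetical; the real analysis was run against the P-POD database.

```python
def best_cluster_fraction(family, clusters):
    """Largest fraction of `family` members found in any one cluster."""
    if not family:
        return 0.0
    best_overlap = max((len(family & c) for c in clusters), default=0)
    return best_overlap / len(family)


def summarize(families, clusters, thresholds=(1.0, 0.8, 0.5)):
    """Share of families whose best-cluster fraction meets each threshold.
    A fraction of 1.0 means the family is a subset of a single cluster."""
    fractions = [best_cluster_fraction(f, clusters) for f in families]
    return {t: sum(fr >= t for fr in fractions) / len(fractions)
            for t in thresholds}


# Toy data: identifiers are placeholders, not real peptides.
orthomcl_families = [{"a", "b"}, {"c", "d", "e", "f", "g"}, {"h", "i"}]
jaccard_clusters = [{"a", "b", "x"}, {"c", "d", "e", "f"}, {"h"}, {"i", "g"}]

summary = summarize(orthomcl_families, jaccard_clusters)
print(summary)
```

In this toy case one family is fully contained in a cluster, a second has 80% of its members in one cluster, and the third is split evenly, mirroring the three thresholds reported in the text.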

Another possible source of inconsistency between the OrthoMCL and Jaccard results is that these analyses were run with different parameter settings. In particular, an alignment constraint was used for the Jaccard clustering alone because the default and recommended settings for OrthoMCL do not include an alignment constraint (see http://orthomcl.cbil.upenn.edu/ORTHOMCL/). The Jaccard clustering software was configured to ignore BLAST hits that did not align over 50% of the length of both peptides. For example, yeast MET3 and MET14 respectively encode ATP sulfurylase and adenylylsulfate kinase, which catalyze the first two steps of a sulfate assimilation pathway. A. thaliana retains this distinction, but C. elegans, D. melanogaster, D. rerio, human, and mouse have bifunctional proteins containing both activities. The OrthoMCL family contains all of these peptides (Figure 4B), but MET14 and the four Arabidopsis adenylylsulfate kinases form their own Jaccard cluster (Figure 4A). At 202 amino acids, Met14p is less than half the length of the other OrthoMCL family members and therefore fails to satisfy the 50% alignment constraint used in the Jaccard clustering algorithm.
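The effect of the alignment constraint on a short protein such as Met14p can be sketched as a simple filter. This is an illustration under stated assumptions: the function name and argument layout are invented, the threshold is treated as "at least 50%", and the 620-residue partner length is a hypothetical stand-in for a bifunctional protein (only the 202-residue length of Met14p comes from the text).

```python
def passes_alignment_constraint(aligned_query, query_len,
                                aligned_subject, subject_len,
                                min_fraction=0.5):
    """True if the alignment covers at least `min_fraction`
    of BOTH peptides, mirroring the constraint described above."""
    query_cov = aligned_query / query_len
    subject_cov = aligned_subject / subject_len
    return query_cov >= min_fraction and subject_cov >= min_fraction


MET14_LEN = 202          # from the text
BIFUNCTIONAL_LEN = 620   # hypothetical length, for illustration only

# Even if all 202 residues of Met14p align, they cover under half of
# the longer protein, so the hit is discarded.
full_length_hit = passes_alignment_constraint(
    MET14_LEN, MET14_LEN, MET14_LEN, BIFUNCTIONAL_LEN)

# Two peptides of similar length (hypothetical values) pass easily.
similar_length_hit = passes_alignment_constraint(200, 202, 200, 210)

print(full_length_hit, similar_length_hit)
```

A two-sided coverage requirement of this kind keeps single-domain proteins from being pulled into clusters of multi-domain proteins, at the cost of separating them from fused orthologs that OrthoMCL still groups together.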

Figure 4. The MET3/MET14 families.

(A) MET14 Jaccard family, and (B) MET3/MET14 OrthoMCL family.

Again, having both sets of results within the same database made comparison of the two methods and detection of possible issues relatively straightforward. We expect that this will be a useful feature for database developers and/or bioinformaticians who may download the entire P-POD system for local installation to use as a development base for their algorithms of choice.

Other Uses for P-POD

We provide several examples of how P-POD might be used by experimental biologists, and not necessarily those expert in phylogenomics. In addition, we illustrate how providing results from different analysis methods can help to identify issues characteristic of the different methods.

The P-POD system can be used in a simple way to learn something global about the genes and/or proteins of an organism. As an illustration, we studied the conservation of essential genes, i.e. genes that are required for viability, across yeast and mammals. Among the 929 OrthoMCL families with unambiguous orthologs from yeast, mouse, and human (i.e. exactly one member from each of these species), phenotype data were available for the yeast and mouse genes in 107 cases. In 28 cases, the yeast gene was essential, and in 24 of these families (86%), the mouse gene was also essential. The entire analysis can be found at http://ortholog.princeton.edu/essential_analysis.html.

P-POD can be used to estimate whether essential yeast genes are more likely to be conserved and/or related to a human disease gene. There are 1100 essential and 4670 non-essential yeast genes. Of the essential genes, 853 (77.5%) are found in an OrthoMCL family, while 247 (22.5%) are not. Of the non-essential genes, 2968 (63.6%) are found in families, while 1702 (36.4%) are not. These data suggest that essential genes are more conserved than non-essential genes (χ² = 78, p = 1.1e-18). Among the 954 yeast genes found in disease-related families, 191 are essential (20% of the disease-related genes, 17% of all essential genes), while 691 are non-essential (72% of disease-related genes, 14.8% of all non-essential genes); phenotype data are not available for the remaining 72 yeast genes. Thus, there does not appear to be enrichment of essential genes among the disease-related yeast genes (χ² = 4.5, p = 0.03). The lack of enrichment of essential genes among disease-related genes is initially surprising; however, this result can be explained if genes required for viability in yeast are also required for viability of human cells, thus making it impossible for the mammal to fully develop into even a diseased organism.
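The two test statistics can be recomputed from the counts given in the text. The sketch below assumes an uncorrected Pearson chi-square test on a 2×2 table with one degree of freedom; the text does not name the exact test, but this assumption gives values close to those reported. The helper name is illustrative, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Conservation: rows = essential / non-essential,
# columns = in an OrthoMCL family / not in a family.
conservation = chi_square_2x2(853, 247, 2968, 1702)

# Disease: rows = essential / non-essential,
# columns = in a disease-related family / not (1100 - 191, 4670 - 691).
disease = chi_square_2x2(191, 1100 - 191, 691, 4670 - 691)

print("conservation: chi2 = %.1f, p = %.2g" % conservation)
print("disease:      chi2 = %.1f, p = %.2g" % disease)
```

Only genes with phenotype data enter either table, so the 72 disease-related genes without phenotype annotations are left out of the second test.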

P-POD simplifies the study of the relationships among families of proteins with related functions. One example is the DNA-dependent RNA polymerase family (Figure 5A, B, C). Transcription of genes in eukaryotes is generally performed by three RNA polymerases (I, II, and III), each of which is composed of more than 10 subunits [14]. Searching on a selection of individual yeast RNA polymerase subunits (RPO21, RPO31, RPA190, RPB2, RPB4, RPB5, RPA135, and RET1) resulted in separate phylogenetic tree displays for each protein, demonstrating that they had been effectively resolved into distinct ortholog clusters. Within each cluster, there were mainly one-to-one orthologous relationships between the proteins from each species. The exceptions were RPA135 and RET1, whose families include orthologs from every species examined except D. rerio (Figure 5A, B).

Figure 5. OrthoMCL and Jaccard clustering results for the second largest RNA polymerase subunit families of S. cerevisiae.

The second largest subunits of RNA polymerase I, II, and III in yeast are named RPA135, RPB2, and RET1, respectively. (A) Phylogenetic tree display of OrthoMCL results showing individual yeast subunit RPA135 and its predicted orthologs resolved into a distinct family. OrthoMCL results showing yeast RNA polymerase subunits RET1 (B) and RPB2 (C) resolved into separate families of orthologs. (D) Jaccard clustering results showing a “super family” of related RNA polymerase subfamilies. Arrows from each OrthoMCL family on the left point to the separate subfamilies in the Jaccard results. The labels I to IV on the right of each tree indicate the RNA polymerase subfamily. The second largest subunits for a fourth RNA polymerase, Pol IV, unique to plants, were resolved into their own distinct two-member family by the OrthoMCL program (not shown), and were appropriately clustered with this superfamily by the Jaccard clustering method. (Adapted from figure 2 of [15].)

For some subunits, in particular RPO21, RPA190, and RPA135, there appears to be more than one mouse or human paralog; however, further investigation showed that the separate peptides were encoded by a single mouse or human gene (Figure 5A). Therefore, for the most part, each protein from each species appeared to be orthologous to the others, as would be expected for proteins functioning in a core biological process [14].

Interestingly, experimental evidence shows that although all eukaryotes have RNA polymerases I, II, and III, plants are unique in that they have subunits for a fourth polymerase, Pol IV. The closely related genes AT3G18090.1 (NRPD2B) and AT3G23780.1 (NRPD2A) have been found to encode the second largest subunit of plant Pol IV, with most of the NRPD2 transcripts coming from NRPD2A. These atypical second largest subunits, which occur only in plants, are most similar in sequence to the RNA polymerase II second largest subunits in other eukaryotes, such as yeast RPB2 [15], [16]. Despite this sequence similarity, they were effectively resolved away from the OrthoMCL-generated ortholog cluster containing yeast RPB2 into their own distinct two-member family. The Jaccard clustering method, on the other hand, correctly grouped these unique Pol IV plant subunits with the other second largest RNA polymerase subunit families, as shown in Figure 5D.

As another illustration, we examined thirty yeast ER proteins involved in asparagine-linked glycosylation, a pathway which is well-conserved between yeast and humans in its early steps and diverges soon after glycosylated proteins enter the Golgi (Table 6). Of these, 27 are known from the literature to have human homologs. This analysis shows that 26 lie in ortholog families, with the majority having orthologs in Homo sapiens (26), D. melanogaster (24), A. thaliana (24), M. musculus (23), C. elegans (23), and D. rerio (21). The four proteins that do not lie in ortholog families are subunits of the yeast oligosaccharyltransferase complex. Deleterious mutations in ten of the human homologs cause congenital disorders of glycosylation. Interestingly, only nine of the thirty yeast ER proteins have orthologs in P. falciparum. N-linked glycosylation has been detected only at very low levels in P. falciparum [17], and ensuring appropriate glycosylation in heterologously-expressed P. falciparum proteins has been a technical challenge in the development of malaria vaccines [18], [19].

Table 6. Conservation of yeast proteins involved in N-linked glycosylation.

Function Yeast gene Human gene CDG (OMIM) At Ce Dm Dr Mm Pf
Dolichol synthesis and modification RER2 DHDDS x x x x
SEC59 TMEM15 x x x x x
DPM1 DPM1 Ie (608799) x x x x x x
ALG5 ALG5 x x x x x
CAX4 DOLPP1 x x x x
Assembly of core oligo-saccharides ALG7 DPAGT1 Ij (608093) x x x x x x
ALG13 GLT28D1 x x x x x x
ALG14 unnamed x x x x x x
ALG1 ALG1 Ik (608540) x x x x x
ALG2 ALG2 Ii (607906) x x x x
ALG11 unnamed x x x x x
RFT1 RFT1 x x x x
ALG3 ALG3 Id (601110) x x x x
ALG9 ALG9 Il (608776) x x x x x
ALG12 ALG12 Ig (607143) x x x x
ALG6 ALG6 Ic (603147) x x x x
ALG8 ALG8 Ih (608104) x x x x x
DIE2/ALG10 ALG10/KCR1 x x x
Oligo-saccharyl-transferase complex OST1 RPN1 x x x x x
OST2 DAD1 x x x x x
OST3 TUSC3 x x
STT3 ITM1 x x x x x x
WBP1 DDOST x x x x x x
Trimming of outer saccharides CWH41/GLS1 GCS1 IIb (606056) x x x x x
ROT2/GLS2 GANAB x x x x x
MNS1 MAN1B1 x x x x x

Genes are broadly categorized by function. Human genes are identified by name when possible and the corresponding congenital disorders of glycosylation (CDG, with OMIM ID) are shown. For A. thaliana, C. elegans, D. melanogaster, D. rerio, M. musculus, and P. falciparum, boxes marked with “x” indicate that a peptide from this organism was placed in the same OrthoMCL family with the yeast gene. Not shown: SWP1 is homologous to human ribophorin II [30], and SWP1, OST4, OST5, and OST6 do not lie in ortholog families.

Discussion

The database system (P-POD) we constructed shows users predicted orthologs of query proteins alone (using OrthoMCL) and in their broader evolutionary context (using Jaccard clustering). It consists of a comparative genomics analysis pipeline whose results are stored in a generic, modular database schema (GMOD/chado) using a freely available database system (PostgreSQL). P-POD is meant not to replace but rather to complement the currently available comparative genomics databases. To our knowledge, no other comparative genomics database provides experimental evidence of conservation curated from the primary literature.

We envision at least three sets of users of our database system. First, molecular biologists can query the database over the web to browse orthology data, both computational and experimental, for their favorite proteins. Another set of users consists of model organism database developers, who will quickly be able to provide comparative genomics tools for their species of interest by implementing our system. Finally, we expect that computational biologists who are developing novel comparative genomics algorithms will find the curated information and computational data from other methods extremely useful in assessing their approach. In addition, by using our system, they will save time in implementation and will be able to distribute their algorithms more readily.

It is important to emphasize that while computational methods to identify orthologs are extremely useful, they are by no means perfect. While OrthoMCL does reasonably well in creating putative orthologous groups, like all computational methods, in many cases it fails, either leaving out true orthologs or inappropriately including paralogs [7]. If one's main goal is to use such an algorithm solely to identify strict orthologs, then the selection of species is critical, and the inclusion of two mammals along with the distantly related Plasmodium certainly will increase the number of families that contain extraneous paralogs. Our goal, however, is to provide a database that can serve not only computational or evolutionary biologists but also the day-to-day needs of biologists who work on the common model organisms. P-POD provides a way for biologists to query directly for their gene of interest from their species of study, even though in some cases the phylogenetic trees must be manually examined to determine true orthologs because of the occasional inclusion of paralogs. As more refined methods for automatic detection of orthology are developed (for example, [20], [21]) we plan to incorporate them into the P-POD tool, taking advantage of our modular design scheme.

We plan to provide regular updates to the data contained within the database. At the time of writing, we are running the analysis pipeline with the latest versions of the genomes. In addition, we will add new features to the web interface and will expand upon the amount of data stored within the database. We will also continue to provide curated literature describing experimental confirmation of orthology. All the data within the database are freely and publicly available through the web and by downloading the entire database system via the URL http://ortholog.princeton.edu/.

Materials and Methods

The overall analysis pipeline is illustrated in Figure 1. The sources and versions of the pipeline components are listed in Table 2.

WU-BLAST

The same WU-BLAST results were used as input to both OrthoMCL and Jaccard algorithms described below. WU-BLAST (version 2.0MP-WashU) was run with the default BLASTP settings: matrix = BLOSUM62, Expectation Threshold = 10, ctxfactor = 1.0, no filtering.

OrthoMCL and Jaccard Algorithms

OrthoMCL (v. 1.2, 14-March-2005 [4]) compares the all-against-all BLASTP scores from a set of genomes, first identifying putative orthologs as reciprocal best hits between pairs of genomes, then identifying candidate recent paralogs as proteins within the same species that are more similar to each other than to any sequence in the other species. All orthologs and recent paralogs are then converted into a graph where the nodes represent the proteins and the edges represent their relationships. A normalization step is then used to correct for systematic biases when comparing pairs of genomes. Finally, the ortholog families are resolved by application of the Markov Cluster algorithm (MCL v. 1.005, 05-118). Since this procedure maximally includes in a family only those proteins at least as closely related as between-species reciprocal best hits, the resultant OrthoMCL group can be considered a set of putative orthologs in that every protein in the group is likely orthologous to at least one other group member. Some groups, however, consist solely of proteins from a single species; obviously, such groups only contain recent paralogs, but this information is often of great importance to experimental biologists.
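The first step described above, identifying reciprocal best hits between pairs of genomes, can be sketched as follows. This is an illustration only and not the OrthoMCL implementation: it assumes a dictionary of pairwise similarity scores in which higher is better, it omits the recent-paralog, normalization, and MCL steps, and all protein identifiers are hypothetical.

```python
def best_hits(scores, species_of):
    """For each protein, find its best-scoring hit in every other species.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    species_of: dict mapping protein id -> species name.
    Returns a dict mapping (query, subject_species) -> best subject.
    """
    best = {}
    for (query, subject), score in scores.items():
        if species_of[query] == species_of[subject]:
            continue  # within-species hits belong to the paralog step
        key = (query, species_of[subject])
        if key not in best or score > scores[(query, best[key])]:
            best[key] = subject
    return best

def reciprocal_best_hits(scores, species_of):
    """Return the set of cross-species pairs that are each other's best hit."""
    best = best_hits(scores, species_of)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, species_of[query])) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

# Hypothetical toy data: two yeast proteins (yA, yB), three human proteins (hA, hB, hC).
species_of = {"yA": "yeast", "yB": "yeast",
              "hA": "human", "hB": "human", "hC": "human"}
scores = {
    ("yA", "hA"): 900, ("hA", "yA"): 900,
    ("yA", "hB"): 300, ("hB", "yA"): 300,
    ("yA", "hC"): 400, ("hC", "yA"): 400,
    ("yB", "hB"): 500, ("hB", "yB"): 500,
    ("yB", "hA"): 200, ("hA", "yB"): 200,
}
rbh_pairs = reciprocal_best_hits(scores, species_of)
print(sorted(rbh_pairs))
```

In this toy input, hC finds yA as its best yeast hit, but yA's best human hit is hA, so the yA/hC pair is not reported as a putative ortholog pair.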

We used the following OrthoMCL parameters. P-value cutoff: 1e−5, percent identity and percent match cutoffs: 0, maximum weight: 100.

OrthoMCL family size can be adjusted by changing the inflation index (1.5 in this study), but this does not loosen the fundamental restriction that the algorithm begins with a list of putative orthologs and paralogs. To get larger families showing more distant relationships, we wanted to remove this restriction and include proteins that exhibit significant sequence similarity over a large portion of their lengths. We chose to perform Jaccard clustering and to apply a more broadly-defined set of criteria, namely that members of the same family should have significant BLAST scores over at least half of their length. This last point is important to reduce the chance of grouping two sequences together based on the presence of short promiscuous domains.
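The length criterion above (significant similarity over at least half of each sequence) can be illustrated with a short sketch. We assume here that coverage is measured as the union of aligned segments on each sequence, given as 1-based inclusive coordinates; the text does not specify how multiple segments were combined, and the function names are hypothetical.

```python
def covered_fraction(intervals, length):
    """Fraction of a sequence of the given length covered by the union of
    1-based, inclusive (start, end) alignment intervals."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / length

def passes_coverage(query_intervals, query_len,
                    subject_intervals, subject_len, min_fraction=0.5):
    """True if the aligned segments cover at least min_fraction of BOTH sequences."""
    return (covered_fraction(query_intervals, query_len) >= min_fraction
            and covered_fraction(subject_intervals, subject_len) >= min_fraction)

# Two overlapping segments cover residues 1-200 of a 400-residue query.
print(covered_fraction([(1, 100), (50, 200)], 400))
# A single short domain match on the query fails the criterion.
print(passes_coverage([(10, 60)], 400, [(1, 150)], 200))
```

Requiring the threshold on both sequences is what keeps a short shared domain from linking two otherwise unrelated proteins.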

In the Jaccard clustering analysis, two proteins are grouped into the same family if they share a significant number of homologs, calculated as follows. First, a list of homologs is generated for each protein, consisting of those sequences whose relative BLASTP scores are less than 1e−5 over a total of at least 50% of the length of each. Then the Jaccard index for each pair is calculated; this is the ratio of the size of the intersection of their homolog sets to the size of their union, |A∩B| / |A∪B|. Final clusters are generated by linking proteins whose mutual Jaccard index is above a pre-determined cutoff. We evaluated the impact of varying the cutoff over a range of 0.3 to 0.8 for several well-characterized protein families, such as actins, tubulins, RNA polymerases, and several proteins containing RING finger or SH3 domains. We chose a Jaccard index of 0.4 since it most broadly permitted the inclusion of expected members of the families while excluding obvious non-members. For example, at a cutoff of 0.5, the family containing yeast actin (ACT1) inappropriately omitted the human and mouse actin-related proteins ACTR8 and Actr8, while a cutoff of 0.3 was clearly too low and yielded many families with hundreds of extraneous members.
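The procedure described above can be sketched in a few lines. This is a minimal illustration rather than the code used to build P-POD: it assumes that the homolog sets have already been computed, that each protein's set includes the protein itself, and that "linking" means single-linkage clustering (any pair above the cutoff joins two clusters). The protein identifiers are hypothetical.

```python
from itertools import combinations

def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_clusters(homologs, cutoff=0.4):
    """Single-linkage clustering of proteins by the Jaccard index of their homolog sets.

    homologs: dict mapping protein id -> set of homolog ids.
    Returns a sorted list of clusters, each a sorted list of protein ids.
    """
    parent = {p: p for p in homologs}  # union-find forest

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p, q in combinations(homologs, 2):
        if jaccard_index(homologs[p], homologs[q]) > cutoff:
            parent[find(p)] = find(q)

    clusters = {}
    for p in homologs:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical homolog sets: two tight groups bridged weakly by p3 and p4.
homologs = {
    "p1": {"p1", "p2", "p3"},
    "p2": {"p1", "p2", "p3"},
    "p3": {"p1", "p2", "p3", "p4"},
    "p4": {"p3", "p4", "p5", "p6"},
    "p5": {"p4", "p5", "p6"},
    "p6": {"p4", "p5", "p6"},
}
print(jaccard_clusters(homologs, cutoff=0.4))
print(jaccard_clusters(homologs, cutoff=0.3))
```

In this toy input, p3 and p4 have a Jaccard index of 2/6, so they stay in separate families at a cutoff of 0.4 but are merged at 0.3, which mirrors the trade-off between omitting members and admitting extraneous ones.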

Generation of phylogenetic trees

P-POD generates phylogenetic trees of the OrthoMCL and Jaccard families using CLUSTAL W [5] and PHYLIP (Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle), using ProML with global rearrangements turned on. CLUSTAL W was run with the default settings: matrix = BLOSUM, Gapopen = 10, Gapext = 0.2, Gapdist = 8, Max div. = 40, ENDGAPS, NOPGAPS and NOHGAPS off, PWMATRIX = BLOSUM, PWGAPOPEN = 10, PWGAPEXT = 0.1, Distance = Kimura, TOSSGAPS = ON, Output = PHYLIP.

Literature

During literature curation at SGD for its “Literature Guide” resource, papers may be associated with yeast genes and various topics that describe what the paper addresses. A list of all papers associated with the topics “Cross-species expression” or “Disease-related” was downloaded from the SGD FTP site and loaded into the P-POD database, along with links to the yeast genes as made by the SGD curators. These papers are displayed on the P-POD interface whenever a family that contains the relevant yeast genes is viewed; each paper displayed is hyperlinked to the PubMed database. For the papers associated with the “Cross-species expression” topic, we manually read each paper to extract which gene(s) from which organism(s) were tested, and whether functional complementation was demonstrated. These results are stored in the database and displayed on the P-POD interface.

Database schema and software

P-POD uses the Generic Model Organism Database (GMOD) database package running on PostgreSQL. Information and documentation about the GMOD schema (also known as the “chado” schema) can be found on the GMOD web site (www.gmod.org). In addition, Supplemental Table 1 (http://ortholog.princeton.edu/help.html#schema) provides details about our particular implementation of the GMOD schema, including how data from our analysis (FASTA files, OrthoMCL results, etc.) are mapped to the GMOD database tables.

Acknowledgments

We acknowledge John Wiggins and Mark Schroeder for excellent technical support and Mike Cherry (SGD), Shuai Weng (SGD), Eurie Hong (SGD), Laurie Kramer (Princeton) and John Matese (Princeton) for valuable discussions.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was funded by NIH grant 5R01HG003471 awarded to DB (PI) and KD (co-investigator), by NIH grant P50 GM071508 awarded to DB, and by NIH contract NO1-AI-40038 awarded to OW.

References

1. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2007;35:D21–25. doi: 10.1093/nar/gkl986.
2. Kulikova T, Akhtar R, Aldebert P, Althorpe N, Andersson M, et al. EMBL Nucleotide Sequence Database in 2006. Nucleic Acids Res. 2007;35:D16–20. doi: 10.1093/nar/gkl913.
3. Alexeyenko A, Lindberg J, Perez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discov Today. 2006;11:137–143. doi: 10.1016/j.ddtec.2006.06.002.
4. Li L, Stoeckert CJ, Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503.
5. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
6. Lenffer J, Nicholas FW, Castle K, Rao A, Gregory S, et al. OMIA (Online Mendelian Inheritance in Animals): an enhanced platform and integration into the Entrez search interface at NCBI. Nucleic Acids Res. 2006;34:D599–601. doi: 10.1093/nar/gkj152.
7. Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE. 2007;2:e383. doi: 10.1371/journal.pone.0000383.
8. Dwight SS, Harris MA, Dolinski K, Ball CA, Binkley G, et al. Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res. 2002;30:69–72. doi: 10.1093/nar/30.1.69.
9. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, et al. GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004;20:3710–3715. doi: 10.1093/bioinformatics/bth456.
10. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005;39:309–338. doi: 10.1146/annurev.genet.39.073003.114725.
11. Kron SJ, Drubin DG, Botstein D, Spudich JA. Yeast actin filaments display ATP-dependent sliding movement over surfaces coated with rabbit muscle myosin. Proc Natl Acad Sci U S A. 1992;89:4466–4470. doi: 10.1073/pnas.89.10.4466.
12. Keeling PJ, Doolittle WF. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol Biol Evol. 1996;13:1297–1305. doi: 10.1093/oxfordjournals.molbev.a025576.
13. Dutcher SK. Long-lost relatives reappear: identification of new members of the tubulin superfamily. Curr Opin Microbiol. 2003;6:634–640. doi: 10.1016/j.mib.2003.10.016.
14. Archambault J, Friesen JD. Genetics of eukaryotic RNA polymerases I, II, and III. Microbiol Rev. 1993;57:703–724. doi: 10.1128/mr.57.3.703-724.1993.
15. Herr AJ, Jensen MB, Dalmay T, Baulcombe DC. RNA polymerase IV directs silencing of endogenous DNA. Science. 2005;308:118–120. doi: 10.1126/science.1106910.
16. Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, et al. Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell. 2005;120:613–622. doi: 10.1016/j.cell.2005.02.007.
17. Gowda DC, Gupta P, Davidson EA. Glycosylphosphatidylinositol anchors represent the major carbohydrate modification in proteins of intraerythrocytic stage Plasmodium falciparum. J Biol Chem. 1997;272:6428–6439. doi: 10.1074/jbc.272.10.6428.
18. Kedees MH, Azzouz N, Gerold P, Shams-Eldin H, Iqbal J, et al. Plasmodium falciparum: glycosylation status of Plasmodium falciparum circumsporozoite protein expressed in the baculovirus system. Exp Parasitol. 2002;101:64–68. doi: 10.1016/s0014-4894(02)00030-9.
19. Kocken CH, Withers-Martinez C, Dubbeld MA, van der Wel A, Hackett F, et al. High-level expression of the malaria blood-stage vaccine candidate Plasmodium falciparum apical membrane antigen 1 and induction of antibodies that inhibit erythrocyte invasion. Infect Immun. 2002;70:4471–4476. doi: 10.1128/IAI.70.8.4471-4476.2002.
20. Alexeyenko A, Tamas I, Liu G, Sonnhammer EL. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006;22:e9–15. doi: 10.1093/bioinformatics/btl213.
21. Jothi R, Zotenko E, Tasneem A, Przytycka TM. COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations. Bioinformatics. 2006;22:779–788. doi: 10.1093/bioinformatics/btl009.
22. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. doi: 10.1186/1471-2105-4-41.
23. Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, et al. Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. 2002;12:493–502. doi: 10.1101/gr.212002.
24. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2006;34:D173–180. doi: 10.1093/nar/gkj158.
25. O'Brien KP, Remm M, Sonnhammer EL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33:D476–480. doi: 10.1093/nar/gki107.
26. O'Brien KP, Westerlund I, Sonnhammer EL. OrthoDisease: a database of human disease orthologs. Hum Mutat. 2004;24:112–119. doi: 10.1002/humu.20068.
27. Chen F, Mackey AJ, Stoeckert CJ, Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006;34:D363–368. doi: 10.1093/nar/gkj123.
28. Penkett CJ, Morris JA, Wood V, Bahler J. YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms. Nucleic Acids Res. 2006;34:W330–334. doi: 10.1093/nar/gkl311.
29. Samuel Lattimore B, van Dongen S, Crabbe MJ. GeneMCL in microarray analysis. Comput Biol Chem. 2005;29:354–359. doi: 10.1016/j.compbiolchem.2005.07.002.
30. Kelleher DJ, Gilmore R. The Saccharomyces cerevisiae oligosaccharyltransferase is a protein complex composed of Wbp1p, Swp1p, and four additional polypeptides. J Biol Chem. 1994;269:12908–12917.
31. Nomoto S, Watanabe Y, Ninomiya-Tsuji J, Yang LX, Nagai Y, et al. Functional analyses of mammalian protein kinase C isozymes in budding yeast and mammalian fibroblasts. Genes Cells. 1997;2:601–614. doi: 10.1046/j.1365-2443.1997.1470346.x.
32. Kida Y, Uchida S, Miyazaki H, Sasaki S, Marumo F. Localization of mouse CLC-6 and CLC-7 mRNA and their functional complementation of yeast CLC gene mutant. Histochem Cell Biol. 2001;115:189–194. doi: 10.1007/s004180000245.
33. Yamagata K, Kato J, Shimamoto A, Goto M, Furuichi Y, et al. Bloom's and Werner's syndrome genes suppress hyperrecombination in yeast sgs1 mutant: implication for genomic instability in human diseases. Proc Natl Acad Sci U S A. 1998;95:8733–8738. doi: 10.1073/pnas.95.15.8733.
34. Clark AB, Cook ME, Tran HT, Gordenin DA, Resnick MA, et al. Functional analysis of human MutSalpha and MutSbeta complexes in yeast. Nucleic Acids Res. 1999;27:736–742. doi: 10.1093/nar/27.3.736.
35. Garbers C, DeLong A, Deruere J, Bernasconi P, Soll D. A mutation in protein phosphatase 2A regulatory subunit A affects auxin transport in Arabidopsis. Embo J. 1996;15:2115–2124.
36. Frank CG, Grubenmann CE, Eyaid W, Berger EG, Aebi M, et al. Identification and functional analysis of a defect in the human ALG9 gene: definition of congenital disorder of glycosylation type IL. Am J Hum Genet. 2004;75:146–150. doi: 10.1086/422367.
37. Schwarz M, Thiel C, Lubbehusen J, Dorland B, de Koning T, et al. Deficiency of GDP-Man:GlcNAc2-PP-dolichol mannosyltransferase causes congenital disorder of glycosylation type Ik. Am J Hum Genet. 2004;74:472–481. doi: 10.1086/382492.
38. Ballester R, Marchuk D, Boguski M, Saulino A, Letcher R, et al. The NF1 locus encodes a protein functionally related to mammalian GAP and yeast IRA proteins. Cell. 1990;63:851–859. doi: 10.1016/0092-8674(90)90151-4.
39. Poullet P, Lin B, Esson K, Tamanoi F. Functional significance of lysine 1423 of neurofibromin and characterization of a second site suppressor which rescues mutations at this residue and suppresses RAS2Val-19-activated phenotypes. Mol Cell Biol. 1994;14:815–821. doi: 10.1128/mcb.14.1.815.
40. Gecz J, Shaw MA, Bellon JR, de Barros Lopes M. Human wild-type SEDL protein functionally complements yeast Trs20p but some naturally occurring SEDL mutants do not. Gene. 2003;320:137–144. doi: 10.1016/s0378-1119(03)00819-9.
41. Gao XD, Wang J, Keppler-Ross S, Dean N. ERS1 encodes a functional homologue of the human lysosomal cystine transporter. Febs J. 2005;272:2497–2511. doi: 10.1111/j.1742-4658.2005.04670.x.
42. Cavadini P, Gellera C, Patel PI, Isaya G. Human frataxin maintains mitochondrial iron homeostasis in Saccharomyces cerevisiae. Hum Mol Genet. 2000;9:2523–2530. doi: 10.1093/hmg/9.17.2523.
43. Desmyter L, Dewaele S, Reekmans R, Nystrom T, Contreras R, et al. Expression of the human ferritin light chain in a frataxin mutant yeast affects ageing and cell death. Exp Gerontol. 2004;39:707–715. doi: 10.1016/j.exger.2004.01.008.
44. Feiler HS, Desprez T, Santoni V, Kronenberger J, Caboche M, et al. The higher plant Arabidopsis thaliana encodes a functional CDC48 homologue which is highly expressed in dividing and expanding cells. Embo J. 1995;14:5626–5637. doi: 10.1002/j.1460-2075.1995.tb00250.x.
45. Hsi G, Cullen LM, Moira Glerum D, Cox DW. Functional assessment of the carboxy-terminus of the Wilson disease copper-transporting ATPase, ATP7B. Genomics. 2004;83:473–481. doi: 10.1016/j.ygeno.2003.08.022.
46. Bussey H, Storms RK, Ahmed A, Albermann K, Allen E, et al. The nucleotide sequence of Saccharomyces cerevisiae chromosome XVI. Nature. 1997;387:103–105.
47. Portmann R, Solioz M. Purification and functional reconstitution of the human Wilson copper ATPase, ATP7B. FEBS Lett. 2005;579:3589–3595. doi: 10.1016/j.febslet.2005.05.042.
48. Sambongi Y, Wakabayashi T, Yoshimizu T, Omote H, Oka T, et al. Caenorhabditis elegans cDNA for a Menkes/Wilson disease gene homologue and its function in a yeast CCC2 gene deletion mutant. J Biochem (Tokyo) 1997;121:1169–1175. doi: 10.1093/oxfordjournals.jbchem.a021711.
49. Mercer JF, Barnes N, Stevenson J, Strausak D, Llanos RM. Copper-induced trafficking of the Cu-ATPases: a key mechanism for copper homeostasis. Biometals. 2003;16:175–184. doi: 10.1023/a:1020719016675.
50. Payne AS, Gitlin JD. Functional expression of the menkes disease protein reveals common biochemical mechanisms among the copper-transporting P-type ATPases. J Biol Chem. 1998;273:3765–3770. doi: 10.1074/jbc.273.6.3765.
51. Jantti J, Lahdenranta J, Olkkonen VM, Soderlund H, Keranen S. SEM1, a homologue of the split hand/split foot malformation candidate gene Dss1, regulates exocytosis and pseudohyphal differentiation in yeast. Proc Natl Acad Sci U S A. 1999;96:909–914. doi: 10.1073/pnas.96.3.909.
52. Sone T, Saeki Y, Toh-e A, Yokosawa H. Sem1p is a novel subunit of the 26 S proteasome from Saccharomyces cerevisiae. J Biol Chem. 2004;279:28807–28816. doi: 10.1074/jbc.M403165200.
53. Morita T, Yoshimura Y, Yamamoto A, Murata K, Mori M, et al. A mouse homolog of the Escherichia coli recA and Saccharomyces cerevisiae RAD51 genes. Proc Natl Acad Sci U S A. 1993;90:6577–6580. doi: 10.1073/pnas.90.14.6577.
54. Loewen CJ, Levine TP. A highly conserved binding site in vesicle-associated membrane protein-associated protein (VAP) for the FFAT motif of lipid-binding proteins. J Biol Chem. 2005;280:14097–14104. doi: 10.1074/jbc.M500147200.
55. Guzder SN, Sung P, Prakash S, Prakash L. Lethality in yeast of trichothiodystrophy (TTD) mutations in the human xeroderma pigmentosum group D gene. Implications for transcriptional defect in TTD. J Biol Chem. 1995;270:17660–17663. doi: 10.1074/jbc.270.30.17660.
56. Sung P, Bailly V, Weber C, Thompson LH, Prakash L, et al. Human xeroderma pigmentosum group D gene encodes a DNA helicase. Nature. 1993;365:852–855. doi: 10.1038/365852a0.
57. Lanterman MM, Dickinson JR, Danner DJ. Functional analysis in Saccharomyces cerevisiae of naturally occurring amino acid substitutions in human dihydrolipoamide dehydrogenase. Hum Mol Genet. 1996;5:1643–1648. doi: 10.1093/hmg/5.10.1643.
58. McEwen RK, Dove SK, Cooke FT, Painter GF, Holmes AB, et al. Complementation analysis in PtdInsP kinase-deficient yeast mutants demonstrates that Schizosaccharomyces pombe and murine Fab1p homologues are phosphatidylinositol 3-phosphate 5-kinases. J Biol Chem. 1999;274:33905–33912. doi: 10.1074/jbc.274.48.33905.
59. Mayordomo I, Sanz P. Human pancreatic glucokinase (GlkB) complements the glucose signalling defect of Saccharomyces cerevisiae hxk2 mutants. Yeast. 2001;18:1309–1316. doi: 10.1002/yea.780.
60. Lucas ME, Ma Q, Cunningham D, Peters J, Cattanach B, et al. Identification of two novel mutations in the murine Nsdhl sterol dehydrogenase gene and development of a functional complementation assay in yeast. Mol Genet Metab. 2003;80:227–233. doi: 10.1016/s1096-7192(03)00137-9.
61. Geisler M, Frangne N, Gomes E, Martinoia E, Palmgren MG. The ACA4 gene of Arabidopsis encodes a vacuolar membrane calcium pump that improves salt tolerance in yeast. Plant Physiol. 2000;124:1814–1827. doi: 10.1104/pp.124.4.1814.
62. Schiott M, Romanowsky SM, Baekgaard L, Jakobsen MK, Palmgren MG, et al. A plant plasma membrane Ca2+ pump is required for normal pollen tube growth and fertilization. Proc Natl Acad Sci U S A. 2004;101:9502–9507. doi: 10.1073/pnas.0401542101.
63. Kleinow T, Bhalerao R, Breuer F, Umeda M, Salchert K, et al. Functional identification of an Arabidopsis snf4 ortholog by screening for heterologous multicopy suppressors of snf4 deficiency in yeast. Plant J. 2000;23:115–122. doi: 10.1046/j.1365-313x.2000.00809.x.
64. Lumbreras V, Alba MM, Kleinow T, Koncz C, Pages M. Domain fusion between SNF1-related kinase subunits during plant evolution. EMBO Rep. 2001;2:55–60. doi: 10.1093/embo-reports/kve001.
65. Roje S, Wang H, McNeil SD, Raymond RK, Appling DR, et al. Isolation, characterization, and functional expression of cDNAs encoding NADH-dependent methylenetetrahydrofolate reductase from higher plants. J Biol Chem. 1999;274:36089–36096. doi: 10.1074/jbc.274.51.36089.
66. Raymond RK, Kastanos EK, Appling DR. Saccharomyces cerevisiae expresses two genes encoding isozymes of methylenetetrahydrofolate reductase. Arch Biochem Biophys. 1999;372:300–308. doi: 10.1006/abbi.1999.1498.
67. Ton VK, Mandal D, Vahadji C, Rao R. Functional expression in yeast of the human secretory pathway Ca(2+), Mn(2+)-ATPase defective in Hailey-Hailey disease. J Biol Chem. 2002;277:6422–6427. doi: 10.1074/jbc.M110612200.
68. Ton VK, Rao R. Expression of Hailey-Hailey disease mutations in yeast. J Invest Dermatol. 2004;123:1192–1194. doi: 10.1111/j.0022-202X.2004.23437.x.
69. Kellermayer R. Hailey-Hailey disease as an orthodisease of PMR1 deficiency in Saccharomyces cerevisiae. FEBS Lett. 2005;579:2021–2025. doi: 10.1016/j.febslet.2005.03.003.
70. Heinisch JJ. Expression of heterologous phosphofructokinase genes in yeast. FEBS Lett. 1993;328:35–40. doi: 10.1016/0014-5793(93)80960-3.
71. Raben N, Exelbert R, Spiegel R, Sherman JB, Nakajima H, et al. Functional expression of human mutant phosphofructokinase in yeast: genetic defects in French Canadian and Swiss patients with phosphofructokinase deficiency. Am J Hum Genet. 1995;56:131–141.
72. Garavaglia B, Invernizzi F, Carbone ML, Viscardi V, Saracino F, et al. GTP-cyclohydrolase I gene mutations in patients with autosomal dominant and recessive GTP-CH1 deficiency: identification and functional characterization of four novel mutations. J Inherit Metab Dis. 2004;27:455–463. doi: 10.1023/B:BOLI.0000037349.08483.96.
  • 73.Mancini R, Saracino F, Buscemi G, Fischer M, Schramek N, et al. Complementation of the fol2 deletion in Saccharomyces cerevisiae by human and Escherichia coli genes encoding GTP cyclohydrolase I. Biochem Biophys Res Commun. 1999;255:521–527. doi: 10.1006/bbrc.1998.9951. [DOI] [PubMed] [Google Scholar]
  • 74.Geraghty MT, Vaughn D, Nicholson AJ, Lin WW, Jimenez-Sanchez G, et al. Mutations in the Delta1-pyrroline 5-carboxylate dehydrogenase gene cause type II hyperprolinemia. Hum Mol Genet. 1998;7:1411–1415. doi: 10.1093/hmg/7.9.1411. [DOI] [PubMed] [Google Scholar]
  • 75.Hu CA, Lin WW, Valle D. Cloning, characterization, and expression of cDNAs encoding human delta 1-pyrroline-5-carboxylate dehydrogenase. J Biol Chem. 1996;271:9795–9800. doi: 10.1074/jbc.271.16.9795. [DOI] [PubMed] [Google Scholar]
  • 76.Morgante PG, Berra CM, Nakabashi M, Costa RM, Menck CF, et al. Functional XPB/RAD25 redundancy in Arabidopsis genome: characterization of AtXPB2 and expression analysis. Gene. 2005;344:93–103. doi: 10.1016/j.gene.2004.10.006. [DOI] [PubMed] [Google Scholar]
  • 77.Pearce DA, Sherman F. A yeast model for the study of Batten disease. Proc Natl Acad Sci U S A. 1998;95:6915–6918. doi: 10.1073/pnas.95.12.6915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Saunders RD, McLellan LI. Molecular cloning of Drosophila gamma-glutamylcysteine synthetase by functional complementation of a yeast mutant. FEBS Lett. 2000;467:337–340. doi: 10.1016/s0014-5793(00)01148-0. [DOI] [PubMed] [Google Scholar]
  • 79.Srinivasan C, Liba A, Imlay JA, Valentine JS, Gralla EB. Yeast lacking superoxide dismutase(s) show elevated levels of “free iron” as measured by whole cell electron paramagnetic resonance. J Biol Chem. 2000;275:29187–29192. doi: 10.1074/jbc.M004239200. [DOI] [PubMed] [Google Scholar]
  • 80.Agarwal AK, Fryns JP, Auchus RJ, Garg A. Zinc metalloproteinase, ZMPSTE24, is mutated in mandibuloacral dysplasia. Hum Mol Genet. 2003;12:1995–2001. doi: 10.1093/hmg/ddg213. [DOI] [PubMed] [Google Scholar]
  • 81.Schmidt WK, Tam A, Michaelis S. Reconstitution of the Ste24p-dependent N-terminal proteolytic step in yeast a-factor biogenesis. J Biol Chem. 2000;275:6227–6233. doi: 10.1074/jbc.275.9.6227. [DOI] [PubMed] [Google Scholar]
  • 82.Hofmann S, Rothbauer U, Muhlenbein N, Neupert W, Gerbitz KD, et al. The C66W mutation in the deafness dystonia peptide 1 (DDP1) affects the formation of functional DDP1.TIM13 complexes in the mitochondrial intermembrane space. J Biol Chem. 2002;277:23287–23293. doi: 10.1074/jbc.M201154200. [DOI] [PubMed] [Google Scholar]
  • 83.Rothbauer U, Hofmann S, Muhlenbein N, Paschen SA, Gerbitz KD, et al. Role of the deafness dystonia peptide 1 (DDP1) in import of human Tim23 into the inner membrane of mitochondria. J Biol Chem. 2001;276:37327–37334. doi: 10.1074/jbc.M105313200. [DOI] [PubMed] [Google Scholar]
  • 84.Raymond M, Gros P, Whiteway M, Thomas DY. Functional complementation of yeast ste6 by a mammalian multidrug resistance mdr gene. Science. 1992;256:232–234. doi: 10.1126/science.1348873. [DOI] [PubMed] [Google Scholar]
  • 85.Boyum R, Guidotti G. Effect of ATP binding cassette/multidrug resistance proteins on ATP efflux of Saccharomyces cerevisiae. Biochem Biophys Res Commun. 1997;230:22–26. doi: 10.1006/bbrc.1996.5913. [DOI] [PubMed] [Google Scholar]
  • 86.Chen Y, Beck A, Davenport C, Chen Y, Shattuck D, et al. Characterization of TRZ1, a yeast homolog of the human candidate prostate cancer susceptibility gene ELAC2 encoding tRNase Z. BMC Mol Biol. 2005;6:12. doi: 10.1186/1471-2199-6-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Peng Z, Lu Q, Verma DP. Reciprocal regulation of delta 1-pyrroline-5-carboxylate synthetase and proline dehydrogenase genes controls proline levels during and after osmotic stress in plants. Mol Gen Genet. 1996;253:334–341. doi: 10.1007/pl00008600. [DOI] [PubMed] [Google Scholar]
  • 88.Chatterjee A, Singh KK. Uracil-DNA glycosylase-deficient yeast exhibit a mitochondrial mutator phenotype. Nucleic Acids Res. 2001;29:4935–4940. doi: 10.1093/nar/29.24.4935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Lillard-Wetherell K, Combs KA, Groden J. BLM helicase complements disrupted type II telomere lengthening in telomerase-negative sgs1 yeast. Cancer Res. 2005;65:5520–5522. doi: 10.1158/0008-5472.CAN-05-0632. [DOI] [PubMed] [Google Scholar]
  • 90.Neff NF, Ellis NA, Ye TZ, Noonan J, Huang K, et al. The DNA helicase activity of BLM is necessary for the correction of the genomic instability of Bloom syndrome cells. Mol Biol Cell. 1999;10:665–676. doi: 10.1091/mbc.10.3.665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Grubenmann CE, Frank CG, Kjaergaard S, Berger EG, Aebi M, et al. ALG12 mannosyltransferase defect in congenital disorder of glycosylation type Ig. Hum Mol Genet. 2002;11:2331–2339. doi: 10.1093/hmg/11.19.2331. [DOI] [PubMed] [Google Scholar]
  • 92.Forsgren M, Attersand A, Lake S, Grunler J, Swiezewska E, et al. Isolation and functional expression of human COQ2, a gene encoding a polyprenyl transferase involved in the synthesis of CoQ. Biochem J. 2004;382:519–526. doi: 10.1042/BJ20040261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Okada K, Ohara K, Yazaki K, Nozaki K, Uchida N, et al. The AtPPT1 gene encoding 4-hydroxybenzoate polyprenyl diphosphate transferase in ubiquinone biosynthesis is required for embryo development in Arabidopsis thaliana. Plant Mol Biol. 2004;55:567–577. doi: 10.1007/s11103-004-1298-4. [DOI] [PubMed] [Google Scholar]
  • 94.Willingham S, Outeiro TF, DeVit MJ, Lindquist SL, Muchowski PJ. Yeast genes that enhance the toxicity of a mutant huntingtin fragment or alpha-synuclein. Science. 2003;302:1769–1772. doi: 10.1126/science.1090389. [DOI] [PubMed] [Google Scholar]
  • 95.Xu GF, Lin B, Tanaka K, Dunn D, Wood D, et al. The catalytic domain of the neurofibromatosis type 1 gene product stimulates ras GTPase and complements ira mutants of S. cerevisiae. Cell. 1990;63:835–841. doi: 10.1016/0092-8674(90)90149-9. [DOI] [PubMed] [Google Scholar]
  • 96.Mamiya N, Worman HJ. Hepatitis C virus core protein binds to a DEAD box RNA helicase. J Biol Chem. 1999;274:15751–15756. doi: 10.1074/jbc.274.22.15751. [DOI] [PubMed] [Google Scholar]
  • 97.Johnstone O, Deuring R, Bock R, Linder P, Fuller MT, et al. Belle is a Drosophila DEAD-box protein required for viability and in the germ line. Dev Biol. 2005;277:92–101. doi: 10.1016/j.ydbio.2004.09.009. [DOI] [PubMed] [Google Scholar]
  • 98.Vonarx EJ, Howlett NG, Schiestl RH, Kunz BA. Detection of Arabidopsis thaliana AtRAD1 cDNA variants and assessment of function by expression in a yeast rad1 mutant. Gene. 2002;296:1–9. doi: 10.1016/s0378-1119(02)00869-7. [DOI] [PubMed] [Google Scholar]
  • 99.Shaag A, Walsh T, Renbaum P, Kirchhoff T, Nafa K, et al. Functional and genomic approaches reveal an ancient CHEK2 allele associated with breast cancer in the Ashkenazi Jewish population. Hum Mol Genet. 2005;14:555–563. doi: 10.1093/hmg/ddi052. [DOI] [PubMed] [Google Scholar]
  • 100.Takeuchi M, Tada M, Saito C, Yashiroda H, Nakano A. Isolation of a tobacco cDNA encoding Sar1 GTPase and analysis of its dominant mutations in vesicular traffic using a yeast complementation system. Plant Cell Physiol. 1998;39:590–599. doi: 10.1093/oxfordjournals.pcp.a029409. [DOI] [PubMed] [Google Scholar]
  • 101.Tomita S, Inoue N, Maeda Y, Ohishi K, Takeda J, et al. A homologue of Saccharomyces cerevisiae Dpm1p is not sufficient for synthesis of dolichol-phosphate-mannose in mammalian cells. J Biol Chem. 1998;273:9249–9254. doi: 10.1074/jbc.273.15.9249. [DOI] [PubMed] [Google Scholar]
  • 102.Lai K, Elsas LJ. Overexpression of human UDP-glucose pyrophosphorylase rescues galactose-1-phosphate uridyltransferase-deficient yeast. Biochem Biophys Res Commun. 2000;271:392–400. doi: 10.1006/bbrc.2000.2629. [DOI] [PubMed] [Google Scholar]
  • 103.Brzeski J, Podstolski W, Olczak K, Jerzmanowski A. Identification and analysis of the Arabidopsis thaliana BSH gene, a member of the SNF5 gene family. Nucleic Acids Res. 1999;27:2393–2399. doi: 10.1093/nar/27.11.2393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Song WY, Martinoia E, Lee J, Kim D, Kim DY, et al. A novel family of cys-rich membrane proteins mediates cadmium resistance in Arabidopsis. Plant Physiol. 2004;135:1027–1039. doi: 10.1104/pp.103.037739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Tommasini R, Vogt E, Fromenteau M, Hortensteiner S, Matile P, et al. An ABC-transporter of Arabidopsis thaliana has both glutathione-conjugate and chlorophyll catabolite transport activity. Plant J. 1998;13:773–780. doi: 10.1046/j.1365-313x.1998.00076.x. [DOI] [PubMed] [Google Scholar]
  • 106.Liang F, Cunningham KW, Harper JF, Sze H. ECA1 complements yeast mutants defective in Ca2+ pumps and encodes an endoplasmic reticulum-type Ca2+-ATPase in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 1997;94:8579–8584. doi: 10.1073/pnas.94.16.8579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Covic L, Lew RR. Arabidopsis thaliana cDNA isolated by functional complementation shows homology to serine/threonine protein kinases. Biochim Biophys Acta. 1996;1305:125–129. doi: 10.1016/0167-4781(95)00233-2. [DOI] [PubMed] [Google Scholar]
  • 108.Schmidt PJ, Ramos-Gomez M, Culotta VC. A gain of superoxide dismutase (SOD) activity obtained with CCS, the copper metallochaperone for SOD1. J Biol Chem. 1999;274:36952–36956. doi: 10.1074/jbc.274.52.36952. [DOI] [PubMed] [Google Scholar]
  • 109.Kataoka T, Powers S, Cameron S, Fasano O, Goldfarb M, et al. Functional homology of mammalian and yeast RAS genes. Cell. 1985;40:19–26. doi: 10.1016/0092-8674(85)90304-6. [DOI] [PubMed] [Google Scholar]
  • 110.Catoni E, Desimone M, Hilpert M, Wipf D, Kunze R, et al. Expression pattern of a nuclear encoded mitochondrial arginine-ornithine translocator gene from Arabidopsis. BMC Plant Biol. 2003;3:1. doi: 10.1186/1471-2229-3-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Krumpelman PM, Freyermuth SK, Cannon JF, Fink GR, Polacco JC. Nucleotide sequence of Arabidopsis thaliana arginase expressed in yeast. Plant Physiol. 1995;107:1479–1480. doi: 10.1104/pp.107.4.1479. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.van Wilpe S, Boumans H, Lobo-Hajdu G, Grivell LA, Berden JA. Functional complementation analysis of yeast bc1 mutants. A study of the mitochondrial import of heterologous and hybrid proteins. Eur J Biochem. 1999;264:825–832. doi: 10.1046/j.1432-1327.1999.00673.x. [DOI] [PubMed] [Google Scholar]
  • 113.Schaffar G, Breuer P, Boteva R, Behrends C, Tzvetkov N, et al. Cellular toxicity of polyglutamine expansion proteins: mechanism of transcription factor deactivation. Mol Cell. 2004;15:95–105. doi: 10.1016/j.molcel.2004.06.029. [DOI] [PubMed] [Google Scholar]
  • 114.Wagner N, Weber D, Seitz S, Krohne G. The lamin B receptor of Drosophila melanogaster. J Cell Sci. 2004;117:2015–2028. doi: 10.1242/jcs.01052. [DOI] [PubMed] [Google Scholar]
  • 115.Colussi PA, Taron CH, Mack JC, Orlean P. Human and Saccharomyces cerevisiae dolichol phosphate mannose synthases represent two classes of the enzyme, but both function in Schizosaccharomyces pombe. Proc Natl Acad Sci U S A. 1997;94:7873–7878. doi: 10.1073/pnas.94.15.7873. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from PLoS ONE are provided here courtesy of PLOS
