Short abstract
Functional annotation is used to catalog information that would be of value in experimental design and analysis, but annotations in public databases are often incorrect. Here, one such case is discussed.
One of the goals of functional annotation is to catalog information that would be of value in guiding experimental design and analysis. Even in cases for which sequence similarity can be detected reliably, however, functional annotations found in public databases are often incorrect [1,2]. Here, we discuss a case in which a functional assignment was made to RAG1, a protein catalogued as a homeodomain protein in the Online Mendelian Inheritance in Man (OMIM) database [3]. A more in-depth bioinformatic analysis shows this assignment to be incorrect. The known biochemical functions of RAG1 are as an integrase and recombinase [4], functions that are not consistent with those of other homeodomain proteins [5].
Comparison of RAG1 with homeodomains
The homeodomain is a DNA-binding domain found in many eukaryotic transcription factors and is characterized by a highly stringent sequence signature [6,7]. Structural studies on homeodomain family members have revealed that these proteins contain almost superimposable structures, all consisting of a three-helical bundle with an amino-terminal extension and all exhibiting a similar mode of DNA binding [8,9,10]. Several positions within the homeodomain region that are involved in DNA recognition or stabilization of the structure are conserved across species.
The DNA-binding region of RAG1 is also highly conserved (Figure 1a). This region shows 20% sequence identity with the homeodomain of the Engrailed protein. The DNA-binding domain of RAG1 could not, however, be aligned to a dataset of 129 human homeodomain sequences [11], as RAG1 lacks the evolutionarily conserved amino-acid residues that define the homeodomain family. In addition, the sequence motif in the homeodomain DNA-recognition helix (48-WF-x-N-x-R-53, where x is any amino acid) is absent from the RAG1 DNA-binding region. Experimental evidence that RAG1 does not belong to the homeodomain family comes from the observation that a mutant RAG1 protein containing a conserved homeodomain motif failed to bind DNA and was non-functional in in vitro recombination assays [12].
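The two checks described in this section, a search for the recognition-helix signature and a percent-identity calculation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the analysis: the function names are hypothetical, the test strings are toy fragments rather than the real RAG1 or Engrailed sequences, and percent identity is taken here over ungapped aligned columns, which is only one of several conventions in use.

```python
import re

# WF-x-N-x-R: Trp, Phe, any residue, Asn, any residue, Arg.
# Position numbering (48-53) is not modelled; only the pattern is.
HOMEODOMAIN_HELIX3_MOTIF = re.compile(r"WF.N.R")


def find_motif(sequence, pattern=HOMEODOMAIN_HELIX3_MOTIF):
    """Return the 1-based start position of every motif match."""
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns in which neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy inputs for illustration only (hypothetical, not real protein data).
print(find_motif("AAKIWFQNRRAK"))            # [5]
print(find_motif("AAKIWFQARRAK"))            # []  (Asn replaced, motif lost)
print(percent_identity("ACDE-F", "ACQEGF"))  # 80.0
```

A low identity value alone does not settle family membership; in the argument above it is the absence of the defining motif and the failed alignment to the homeodomain dataset that carry the weight.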
Comparison of RAG1 with Tc3 transposase and Hin invertase
Caenorhabditis elegans Tc3 is a member of the Tc1/mariner family of transposable elements found in species ranging from fungi to humans [13]. The X-ray structure of Tc3 transposase revealed that the specific DNA-binding region contains three α helices comprising the helix-turn-helix motif [14]. Sequence alignments between the DNA-binding regions of RAG1 and Tc3 show patches of high sequence conservation, especially in the amino-terminal region. RAG1 lacks the DNA-contacting residues of Tc3, however, which indicates significant divergence in the RAG1 DNA-recognition site. Automated fold prediction using the University of California, Los Angeles and the Department of Energy cooperative (UCLA-DOE) Fold Recognition Server [15] identified the DNA-binding domain of Tc3 transposase of C. elegans as the candidate whose fold is most likely to represent the RAG1 family.
Hin recombinase belongs to a family of bacterial DNA invertases that catalyze a site-specific recombination reaction. The Hin DNA-binding domain shares distinct sequence similarities with RAG1, and there is a striking similarity between the Hin recognition sequence and the RAG1 nonamer site. At the sequence level, the invariant 138-GGRPR-142 motif in the amino-terminal arm of the Hin DNA-binding domain is conserved in RAG1. This motif is positioned in the minor groove of the DNA-recognition sequence and provides critical DNA contacts. The sequence similarity between RAG1 and Hin recombinase extends through the DNA-binding region, with a total of 13 residues absolutely conserved in this region. Structural conservation of the DNA-binding domain of RAG1 and Hin recombinase is illustrated by the observation that a RAG1 hybrid protein containing the homologous DNA-binding region of Hin recombinase is functional in in vitro recombination assays [12].
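Counting "absolutely conserved" residues amounts to finding alignment columns in which every sequence carries the same residue. The Python sketch below shows one way to do this; the function name is hypothetical and the aligned strings are invented toy data built around the GGRPR motif, not the actual RAG1 and Hin alignment.

```python
def conserved_columns(aligned_seqs, gap="-"):
    """Return 0-based indices of columns where all sequences share
    the same non-gap residue.

    Assumption: all sequences come from one multiple alignment and
    therefore have equal length.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    return [
        index
        for index, column in enumerate(zip(*aligned_seqs))
        if column[0] != gap and len(set(column)) == 1
    ]


# Toy alignment (hypothetical): only the GGRPR block is invariant.
toy_alignment = ["GGRPRAIT", "GGRPRKLS", "GGRPR-IT"]
print(conserved_columns(toy_alignment))  # [0, 1, 2, 3, 4]
```

Applied to a real alignment, the length of the returned list would give the number of absolutely conserved positions.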
The helix-turn-helix motif
In the DNA-binding domains of Hin recombinase, Tc3 transposase, and Engrailed, the first and the second helices lie almost anti-parallel to each other, with a turn between the second and the third helices. In all cases, the recognition helix fits into the major groove of the DNA. Although the essential features of the helix-turn-helix motifs are very similar, these proteins do not all dock on the DNA in the same fashion (Figure 1b,1c,1d).
In the X-ray structure of the Engrailed homeodomain-DNA complex, several residues in the exposed hydrophilic face of helix 3 establish specific contacts with the last four base pairs of the recognition sequence, whereas the residues in the amino-terminal arm of the protein contact the first two base pairs of the recognition sequence. Compared to Tc3 transposase and Hin recombinase, the helices and the loops are longer in the homeodomain structure; only helix 3 is inserted in the major groove and the residues in the center of this relatively longer helix provide DNA contacts.
In the Hin-recombinase structure, the α-helical core, along with extensions at both the amino and carboxy termini, participates in DNA recognition. The eight-residue carboxy-terminal tail of Hin recombinase is inserted in the minor groove of the DNA-recognition site. Wrap-around of the DNA-binding site by the carboxy-terminal extension has not been observed in Tc3 or homeodomain structures. In contrast to the other structures, both the second and third helices of Tc3 transposase participate in DNA recognition by binding to the major groove. The six residues preceding the first helix in Tc3 adopt a conformation different from that seen in the longer amino terminus of the Hin recombinase and the Engrailed homeodomain.
Genomic perspective
The ability of the RAG1 proteins to catalyze both the formation of hybrid joints and transposition highlights the similarities between the mechanism of site-specific rearrangement by V(D)J recombination and certain transposition/retroviral integration reactions. The occurrence of RAG proteins in jawed vertebrates and conservation of domain architecture and function from prokaryotes suggest that the RAG1 proteins might have been horizontally transferred into the eukaryotic genome by a transposon.
One may ask what the relevance of the current observation is, and whether the functional mis-assignment is of great importance. During vertebrate lymphocyte development, RAG1 mediates the somatic assembly of antigen receptors, which involves DNA-bond breakage and strand-transfer reactions, reminiscent of transposition reactions in bacteria. Homeodomain proteins, by contrast, play a fundamental role in diverse cellular processes by transcriptional regulation of downstream-target genes. RAG1 has been identified only in jawed vertebrates, whereas homeodomain proteins are highly conserved from yeast to human. The evolution and biological functions of RAG1 and homeodomain proteins are markedly different, and one cannot substitute for the other.
Unfortunately, with the initial mis-classification by Spanopoulou et al. [12] has come experimental interpretation in the context of RAG1 being a homeodomain. Specifically, Villa et al. [16] have interpreted the biochemical effects of mutations leading to Omenn Syndrome as having to do with changes in homeodomain structure, despite statements attributing the observed defects to low degrees of V(D)J recombination. In addition, Aidinis et al. [17] proposed models of interaction of a RAG1 'homeodomain' with the chromatin proteins HMG1 and HMG2. In that study, the experiments were designed under the assumption that RAG1 was a homeodomain, leading to incorrect extension of the interpretation of results to the involvement of a homeodomain structure in V(D)J recombination. The model proposed by this group has therefore been made in the wrong biological context.
The incorrect assignment of RAG1 as a homeodomain has colored the interpretation of experimental results. This is emblematic of the larger role that annotation-error propagation plays in incorrectly guiding experimental discovery. Often, there may be little or no similarity between a sequence of interest and those in the public databases, meaning that it would be very difficult (if not impossible) to determine any degree of relatedness on the basis of sequence alone. Even in cases where homology can be detected reliably, the annotations currently found in the public databases are often incorrect. The considerable effect of processes such as alternative splicing [18] and the ability of proteins to perform markedly different functions depending on their cellular localization and compartmentalization [19], coupled with the number of annotation errors currently in the public databases, all help to re-emphasize the importance of database curation and experimental validation in maintaining the purity and utility of these public resources.
References
- Brenner SE. Errors in genome annotation. Trends Genet. 1999;15:132–133. doi: 10.1016/s0168-9525(99)01706-0.
- Baxevanis AD. Making the best use of publicly-available bioinformatics resources: keeping biology in mind. Nature Genetics. 2002.
- Online Mendelian Inheritance in Man http://www.ncbi.nlm.nih.gov/omim/
- Sadofsky MJ. The RAG proteins in V(D)J recombination: more than just a nuclease. Nucleic Acids Res. 2001;29:1399–1409. doi: 10.1093/nar/29.7.1399.
- Gehring WJ, Affolter M, Burglin T. Homeodomain proteins. Annu Rev Biochem. 1994;63:487–526. doi: 10.1146/annurev.bi.63.070194.002415.
- Gehring WJ, Qian YQ, Billeter M, Furukubo-Tokunaga K, Schier AF, Resendez-Perez D, Affolter M, Otting G, Wuthrich K. Homeodomain-DNA recognition. Cell. 1994;78:211–223. doi: 10.1016/0092-8674(94)90292-5.
- Laughon A. DNA binding specificity of homeodomains. Biochemistry. 1991;30:11357–11367. doi: 10.1021/bi00112a001.
- Wolberger C, Vershon AK, Liu B, Johnson AD, Pabo CO. Crystal structure of a MATα2 homeodomain-operator complex suggests a general model for homeodomain-DNA interactions. Cell. 1991;67:517–528. doi: 10.1016/0092-8674(91)90526-5.
- Kissinger CR, Liu BS, Martin-Blanco E, Kornberg TB, Pabo CO. Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell. 1990;63:579–590. doi: 10.1016/0092-8674(90)90453-l.
- Gruschus JM, Tsao DH, Wang LH, Nirenberg M, Ferretti JA. Interactions of the vnd/NK-2 homeodomain with DNA by nuclear magnetic resonance spectroscopy: basis of binding specificity. Biochemistry. 1997;36:5372–5380. doi: 10.1021/bi9620060.
- Banerjee-Basu S, Baxevanis AD. Molecular evolution of the homeodomain family of transcription factors. Nucleic Acids Res. 2001;29:3258–3269. doi: 10.1093/nar/29.15.3258.
- Spanopoulou E, Zaitseva F, Wang FH, Santagata S, Baltimore D, Panayotou G. The homeodomain region of Rag-1 reveals the parallel mechanisms of bacterial and V(D)J recombination. Cell. 1996;87:263–276. doi: 10.1016/s0092-8674(00)81344-6.
- Doak TG, Doerder FP, Jahn CL, Herrick G. A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common 'D35E' motif. Proc Natl Acad Sci USA. 1994;91:942–946. doi: 10.1073/pnas.91.3.942.
- van Pouderoyen G, Ketting RF, Perrakis A, Plasterk RH, Sixma TK. Crystal structure of the specific DNA-binding domain of Tc3 transposase of C. elegans in complex with transposon DNA. EMBO J. 1997;16:6044–6054. doi: 10.1093/emboj/16.19.6044.
- UCLA-DOE fold server http://fold.doe-mbi.ucla.edu/
- Villa A, Santagata S, Bozzi F, Giliani S, Frattini A, Imberti L, Gatta LB, Ochs HD, Schwarz K, Notarangelo LD, et al. Partial V(D)J recombination activity leads to Omenn syndrome. Cell. 1998;93:885–896. doi: 10.1016/s0092-8674(00)81448-8.
- Aidinis V, Bonaldi T, Beltrame M, Santagata S, Bianchi ME, Spanopoulou E. The RAG1 homeodomain recruits HMG1 and HMG2 to facilitate recombination signal sequence binding and to enhance the intrinsic DNA-bending activity of RAG1-RAG2. Mol Cell Biol. 1999;19:6532–6542. doi: 10.1128/mcb.19.10.6532.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- Jeffery CJ. Moonlighting proteins. Trends Biochem Sci. 1999;24:8–11. doi: 10.1016/s0968-0004(98)01335-8.
- ClustalW http://www.ebi.ac.uk/clustalw/
- Barton GJ. ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng. 1993;6:37–40. doi: 10.1093/protein/6.1.37.
- Visual Molecular Dynamics http://www.ks.uiuc.edu/Research/vmd
- Protein Data Bank (PDB) http://www.rcsb.org/pdb/