Abstract
We present a method for determining the location and extent of protein binding regions in nucleic acids by computer-assisted analysis of sequence data. The program ConsIndex builds a library of consensus descriptions from sequence sets containing known regulatory elements. The program ConsInspector then uses these consensus descriptions to predict binding sites in new sequences. We show that the programs correctly determine the significant regions involved in transcriptional control for seven sequence elements. Within these regions, the profile of relative variability at individual nucleotide positions paralleled experimentally determined profiles of biological significance. Consensus descriptions are derived with an anchored alignment scheme, and the resulting alignments are evaluated by a novel method that is superior to cluster algorithms. The alignment procedure can include several closely related sequences without biasing the consensus description. Moreover, the algorithm detects additional elements on the basis of a moderate distance correlation and can discriminate between real binding sites and false-positive matches. The software is well suited to the frequent case of optional elements present in only a subset of functionally similar sequences, while making maximal use of the existing sequence database. Because it requires as few as seven sequences per element, it is applicable to a wide range of binding sites.
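To make the general idea of profile-based site prediction concrete, here is a minimal Python sketch. It assumes the binding sites have already been aligned to equal length and contain only A, C, G and T. It is not the ConsIndex/ConsInspector algorithm: the anchored alignment and the evaluation method are not reproduced. A log-odds position weight matrix stands in for a consensus description, and per-position Shannon entropy stands in for the relative variability of each nucleotide position. The function names (`build_profile`, `variability`, `score`, `scan`), the pseudocount, the uniform background and the threshold are all hypothetical choices, and the seven sites in the usage example are invented toy data.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_profile(sites, pseudocount=0.5):
    """Per-position base frequencies from equal-length, pre-aligned sites.

    The pseudocount (a hypothetical choice) keeps unseen bases from
    receiving zero probability, which matters when only a few sites exist.
    """
    if not sites:
        raise ValueError("need at least one aligned site")
    width = len(sites[0])
    for s in sites:
        if len(s) != width or not set(s) <= set(BASES):
            raise ValueError("sites must be equal-length strings over ACGT")
    total = len(sites) + len(BASES) * pseudocount
    profile = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def variability(profile):
    """Shannon entropy in bits per position: 0 = invariant, 2 = uniform."""
    return [-sum(p * math.log2(p) for p in col.values() if p > 0) for col in profile]


def score(window, profile, background=0.25):
    """Log-odds score of a window against a uniform background model."""
    return sum(math.log2(col[b] / background) for b, col in zip(window, profile))


def scan(sequence, profile, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):  # skip windows with ambiguous bases
            s = score(window, profile)
            if s >= threshold:
                hits.append((start, round(s, 2)))
    return hits


if __name__ == "__main__":
    # Seven invented toy sites, matching the abstract's minimum set size.
    toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA",
                 "TGACTCA", "TTACTCA", "TGAGTCA"]
    prof = build_profile(toy_sites)
    print([round(h, 2) for h in variability(prof)])
    print(scan("AAAATGACTCAAAAA", prof, threshold=8.0))
```

Raising the threshold trades sensitivity for fewer false-positive matches; a fixed cutoff like the one above is only a placeholder for the kind of calibrated discrimination between real sites and chance matches that the abstract describes.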