Abstract
Using techniques from optimization theory, we have developed a computer program that approximates a desired probability distribution for amino acids by imposing a probability distribution on the four nucleotides in each of the three codon positions. These base probabilities allow for the generation of biased codons for use in mutational studies and in the design of biologically encoded libraries. Because of the dependencies between codons in the genetic code, it is often impossible to generate the desired amino acid distribution exactly, and compromises are often necessary. The program therefore not only solves for the "optimal" approximation to the desired distribution (where the definition of "optimal" is influenced by several types of parameters entered by the user), but also solves for a number of "sub-optimal" solutions, which are classified into families of similar solutions. A representative of each family is presented to the user, who can then choose the type of approximation best suited to the intended application. The Combinatorial Codons program is available for use over the web from http://www.wi.mit.edu/kim/computing.html.
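The forward calculation that underlies this approach can be illustrated with a minimal sketch: given independent base probabilities at each of the three codon positions, the probability of a codon is the product of its three base probabilities, and the probability of an amino acid is the sum over its codons. The sketch below is not the Combinatorial Codons program itself. The function names `amino_acid_distribution` and `squared_error` are hypothetical, and the squared-error score is only an assumed stand-in for the user-parameterized notion of "optimal" described above. The standard genetic code is used, with `*` denoting a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (first position varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(position_probs):
    """Return {amino acid: probability} for independent per-position base
    probabilities. position_probs is a list of three dicts mapping base to
    probability; bases missing from a dict are treated as probability 0."""
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= position_probs[pos].get(base, 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def squared_error(dist, target):
    """Assumed objective (not taken from the paper): sum of squared
    differences between the achieved and the desired distribution."""
    keys = set(dist) | set(target)
    return sum((dist.get(k, 0.0) - target.get(k, 0.0)) ** 2 for k in keys)


# Example: an NNK codon (N = any base, K = G or T at the third position).
uniform = {b: 0.25 for b in BASES}
nnk = [uniform, uniform, {"G": 0.5, "T": 0.5}]
dist = amino_acid_distribution(nnk)

# A uniform target over the 20 amino acids with no stop codons cannot be
# matched exactly by NNK, so the score is strictly positive.
target = {aa: 1 / 20 for aa in set(AMINO) if aa != "*"}
score = squared_error(dist, target)
```

An optimizer in this spirit would search over the twelve base probabilities (four per position, each position summing to one) to minimize such a score. Because all codons share the same three per-position distributions, raising the probability of one amino acid generally changes the probabilities of others, which is why exact solutions are often unavailable.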