Abstract
Natural oligopeptides may regulate nearly all vital processes. To date, the chemical structures of nearly 6000 oligopeptides have been identified from >1000 organisms representing all the biological kingdoms. We have compiled the known physical, chemical and biological properties of these oligopeptides—whether synthesized on ribosomes or by non-ribosomal enzymes—and have constructed an internet-accessible database, EROP-Moscow (Endogenous Regulatory OligoPeptides), which resides at http://erop.inbi.ras.ru. This database enables users to perform rapid searches via many key features of the oligopeptides, and to carry out statistical analysis of all the available information. The database lists only those oligopeptides whose chemical structures have been completely determined (directly or by translation from nucleotide sequences). It provides extensive links with the Swiss-Prot-TrEMBL peptide-protein database, as well as with the PubMed biomedical bibliographic database. EROP-Moscow also contains data on many oligopeptides that are absent from other convenient databases, and is designed for extended use in classifying new natural oligopeptides and for production of novel peptide pharmaceuticals.
INTRODUCTION
For more than a century, natural oligopeptides have attracted scientific attention (1) as biochemical regulators. The very first such oligopeptide, carnosine (β-ala-his), was discovered by Gulevitch and Amiradzhibi in 1900 (2), but its chemical structure was not determined until 1918 (3). Since that time, thousands of oligopeptide regulators have been described, and now ∼500 new natural oligopeptides emerge annually, out of a literature of >20 000 publications each year on oligopeptide chemistry and biology.
Regulatory oligopeptides generally do not exceed ∼50 amino acid residues (4), and they differ substantially from larger polypeptides (proteins) in their physicochemical and biological properties. Specifically, the smaller peptides rarely possess strong enough intramolecular attractions to form stable globules (5), so they are able to shift configurations (6) and to fit themselves into specific receptor molecules, a process which is further aided by high diffusional mobility (7).
The terms ‘natural’ oligopeptide and ‘regulatory’ oligopeptide can be considered synonymous. Few integrated biological processes are known which are not regulated, or at least modulated, by small peptides. Such roles are especially well known in the regulatory organ systems, viz. nervous, endocrine and immune systems (1), but their functions extend well beyond the bounds of single organ systems or even of single biological species. Antimicrobial oligopeptides produced by prokaryotes themselves, for example, regulate competition for ecological niches and simultaneously function as signaling molecules for species-specific intercellular communication (8). And even eukaryotic oligopeptide toxins seem to play important roles in regulating interspecies reactions (9).
It has been clear for >15 years that detailed understanding of the complex regulatory processes involving oligopeptides requires a system for classifying these molecules and for cataloguing their major properties. Our first attempt at such a system, in 1991 (4,10), yielded the MS-DOS version of EROP-Moscow, which contained structures, functions and sources of the oligopeptides then known. That database, however, was not widely accessible to the research community. In the meantime, a number of extensive and highly utilized peptide–protein databases have been created (e.g. Swiss-Prot-TrEMBL), but their data on small natural peptides are far from comprehensive and constitute only a small fraction of their total information, so that retrieving relevant oligopeptide data from them can be laborious and excessively time-consuming.
Here we present an internet version of the compact specialized database EROP-Moscow, recreated to provide a comprehensive description of all presently known natural oligopeptides. Neuropeptides, peptide hormones, antimicrobial agents and toxins represent the largest functional classes. This database should now enable investigators to search easily for oligopeptides via a wide variety of different features, to compare their properties quickly, and to retrieve statistics about all relevant information in the database.
INFORMATION SOURCES
Since the objectives of the internet version of EROP-Moscow are to collect and disseminate all essential information about currently known oligopeptides, authenticity is of paramount importance. Therefore, all information in this database has been extracted directly from the primary sources, the great majority of which are publications in scientific journals. More than 100 journals in biochemistry, biophysics, physiology, genetics and general biology are continuously screened, and descriptions of the structures of newly found natural oligopeptides are retrieved. In many of these articles, the authors compare the novel structures with known ones and provide references to publications not included in our systematic screening. Such publications then become an additional source of primary information for EROP-Moscow. Finally, initial reports on novel oligopeptides are sometimes found in book chapters, patent descriptions and other protein-peptide databases; these sources are also used and are appropriately documented. The total number of useful sources for basic information on natural oligopeptides is now >250.
In addition to the client–server features of EROP-Moscow, a library of publications has been created, containing the actual primary descriptions of oligopeptides, along with their pdf files.
SELECTION OF OLIGOPEPTIDES
Only oligopeptides whose chemical structures have been completely determined (either directly or by translation from nucleotide sequences), and which can be described by the standard single-letter amino acid code, are entered into EROP-Moscow. Although most peptides included in this version of EROP-Moscow are formed by ribosomal synthesis, a small number formed by non-ribosomal enzymes (11), mostly from bacteria and fungi, are also included, provided they comprise residues fitting the standard one-letter code. Oligopeptides with still ambiguous structures, such as asparagine/aspartic acid or glutamine/glutamic acid at single residues, have been deliberately excluded from this database, as have artificially synthesized molecules that are not found in nature.
ORGANIZATION OF EROP-MOSCOW
The EROP-Moscow database presents multilevel bioinformation via an HTML-based interface. This interface includes: Home page, Query page, Peptide page, Results page, Family page and Statistics pages (Figure 1). All of these pages contain internal EROP-Moscow links (including Home page, Site map, Contact us and Help) as well as links to external databases such as Swiss-Prot, Protein Identification Resource (PIR), PDB and PubMed.
The following programming elements, freely available on the basis of the GNU License, have been used as server software, and these are updated as new versions appear.
MySQL database server, version 4.0.14-max;
Apache web server, version 2.0.47, compiled with PHP support;
PHP language, version 4.0;
Remote management server with the HTML interface WebMin, version 1.0.70.
The basic operational unit of EROP-Moscow is an entry (= record). Each individual entry describes the physical, chemical and biological characteristics of one unique natural oligopeptide. Each entry is tagged by a unique accession number, beginning with character ‘E’ (from EROP) followed by five numerical digits.
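The accession format described above (the character 'E' followed by five digits) can be checked mechanically. The following is a minimal sketch; the function names `is_valid_accession` and `format_accession` are hypothetical helpers introduced here for illustration and are not part of EROP-Moscow.

```python
import re

# Pattern taken from the text: the letter 'E' followed by exactly five digits.
ACCESSION_RE = re.compile(r"E[0-9]{5}")


def is_valid_accession(text: str) -> bool:
    """Return True if `text` has the form E + five digits (e.g. E00165)."""
    return ACCESSION_RE.fullmatch(text) is not None


def format_accession(serial: int) -> str:
    """Build an accession string from a serial number (hypothetical helper)."""
    if not 0 <= serial <= 99999:
        raise ValueError("serial must fit in five digits")
    return f"E{serial:05d}"
```

For instance, the record identifiers quoted later in this paper, E00165 and E00166, both match this pattern.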
Individual sequences which are found in multiple organisms, even taxonomically remote ones, are presented only once in EROP-Moscow, with the names of all known organisms possessing that oligopeptide being appended.
On the other hand, oligopeptides existing in two or more distinct chemical modifications (usually accompanied by clear functional differences) are registered as separate entries and are assigned distinct names and accession numbers. Good examples of this are two natural chemical forms of gastrin: one having a simple tyrosine residue and the other, a sulfated tyrosine residue (12).
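The two rules above (one entry per unique sequence regardless of source organism, but separate entries for distinct chemical modifications) can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class names, the free-text modification label, the sequential numbering and the placeholder sequence are all hypothetical and do not reflect the actual MySQL schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    sequence: str
    modification: str  # free-text label, e.g. "" or "sulfated Tyr" (assumed)
    organisms: set = field(default_factory=set)


class Registry:
    """Toy registry keyed by (sequence, modification)."""

    def __init__(self):
        self._by_key = {}

    def register(self, sequence: str, modification: str, organism: str) -> Entry:
        key = (sequence, modification)
        entry = self._by_key.get(key)
        if entry is None:
            # New structure or new modification: assign a fresh accession.
            accession = f"E{len(self._by_key) + 1:05d}"
            entry = Entry(accession, sequence, modification)
            self._by_key[key] = entry
        # Same structure from another organism: extend the organism list only.
        entry.organisms.add(organism)
        return entry


# Usage with a placeholder sequence (not a real database record).
registry = Registry()
plain_1 = registry.register("GGYGG", "", "Homo sapiens")
plain_2 = registry.register("GGYGG", "", "Sus scrofa")
sulfated = registry.register("GGYGG", "sulfated Tyr", "Homo sapiens")
```

Here `plain_1` and `plain_2` refer to the same entry with two organisms, whereas `sulfated` receives its own accession number.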
Home page
Users would normally enter EROP-Moscow via the site address http://erop.inbi.ras.ru. The Home page lists various information about the database itself, including the date of the most recent version, the current number of entries, on-going changes in the content (EROP-news), the list of database authors and some descriptive information. The Home page is linked to the Query page and to Statistics pages for individual peptides. A Contact-us button facilitates ordinary e-mail messages and inquiries to the database manager.
Query page
The Query page, entered from Home page via the ‘Query page’ button, provides a rapid search for the peptide records signaled by specific characteristics. These characteristics are subdivided into the following groups: general information (such as oligopeptide name or accession number), organismic classification (including multiple trivial species names), physicochemical properties (such as partial amino acid sequences), biochemical or biologic functions and literature references. Query examples (single words, phrases or numbers) are provided adjacent to each Query window, and pull-down menus are provided with most query options. An ‘Abbreviations’ button, located near the Query window for amino acid sequence, elicits display of the standard one-letter code for amino acid residues, along with optional abbreviations.
Because some oligopeptides, particularly the smallest ones, are chemically modified at the N- and C-termini, six more symbols augment the standard one-letter code. These are:
‘+’ to denote +H2, which is the open N-terminus,
‘b’ for an acetyl residue or other chemical group at the N-terminus,
‘−’ to denote O−, which is the open C-terminus,
‘z’ for an amide bond at the C-terminus,
‘J’ to denote the pyroglutaminyl linkage, formed by an N-terminal glutamine, owing to side-chain reaction with the terminal amine residue (13,14), and finally,
‘U’ for the (occasional) aminoisobutyric acid residue.
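To illustrate how the six additional symbols extend the one-letter code, the sketch below splits an annotated sequence string into terminal states and residues. The text does not specify where the symbols are placed within a stored sequence, so the positional rules used here are assumptions: '+' and 'b' are taken as a leading character, '−' and 'z' as a trailing character, and 'J' as the first residue symbol. The function name and the returned labels are hypothetical.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")       # the 20 standard one-letter codes
N_MARKS = {"+": "open", "b": "blocked"}      # assumed to be a leading character
C_MARKS = {"-": "open", "\u2212": "open", "z": "amide"}  # assumed trailing


def parse_extended(seq: str) -> dict:
    """Split an annotated sequence into N-terminus, residues and C-terminus."""
    n_state = c_state = None
    if seq[:1] in N_MARKS:
        n_state, seq = N_MARKS[seq[0]], seq[1:]
    if seq[-1:] in C_MARKS:
        c_state, seq = C_MARKS[seq[-1]], seq[:-1]
    for i, ch in enumerate(seq):
        if ch in STANDARD or ch == "U":      # 'U' = aminoisobutyric acid
            continue
        if ch == "J" and i == 0:             # pyroglutaminyl N-terminus (assumed position)
            n_state = "pyroglutaminyl"
            continue
        raise ValueError(f"unexpected symbol {ch!r} at position {i + 1}")
    return {"n_terminus": n_state, "residues": seq, "c_terminus": c_state}
```

Under these assumptions, an N-acetylated, C-amidated peptide such as α-melanotropin would be written `bSYSMEHFRWGKPVz`, and `parse_extended` would report a blocked N-terminus, the 13 residues and an amidated C-terminus.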
Results page
After entering a search word or phrase on the Query page, the user should click on the ‘Submit query’ button, which initiates the search and returns the Results page, containing a list of oligopeptides that meet the specified characteristics. Each item in this list contains the preferred name of the oligopeptide, the trivial and taxonomic names of organisms where the peptide has been identified, and the accession number. The accession number, in turn, links to the appropriate Peptide page (record). When the query returns only a single oligopeptide, its record opens immediately.
Peptide page
This page, reached via accession number, presents the collected data on each oligopeptide, including the number of amino acid residues, primary structure, precursors, known posttranslational modification(s), affinity to any definite structure–function family, taxon(s) of biological sources, tissue/cell localization in each organism, major known biological functions, molecular mass (Da), isoelectric point, pI (calculated and experimentally observed), literature sources and linking accession numbers (if any) in other peptide-protein databases (see above) or PubMed.
Family page
Tentative homologous family assignments, for each oligopeptide in EROP-Moscow, have been developed by sequence alignment, and the entire family can be reached from the Peptide page via a ‘View family’ button. Equally located amino acid residues are highlighted in red and the attached oligopeptide name for each sequence links back to the appropriate Peptide page.
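The highlighting step can be sketched as a search for alignment columns in which every family member carries the same residue. This reading of "equally located" residues is an assumption, as are the function names and the use of '-' as the gap character; the alignment itself is taken as given.

```python
def conserved_columns(aligned: list) -> list:
    """Return indices of columns where all aligned sequences share one residue."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(width):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and "-" not in column:  # identical and not a gap
            columns.append(i)
    return columns


def mark_conserved(seq: str, columns: list) -> str:
    """Upper-case conserved positions and lower-case the rest (stand-in for red)."""
    keep = set(columns)
    return "".join(ch.upper() if i in keep else ch.lower() for i, ch in enumerate(seq))


# Angiotensin I and angiotensin II, padded with gaps to a common length.
family = ["DRVYIHPFHL", "DRVYIHPF--"]
shared = conserved_columns(family)
```

In this small example the first eight columns are shared, so `mark_conserved(family[0], shared)` renders the last two residues of angiotensin I in lower case.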
Statistics pages
A special set of pages is devoted to the overall characteristics of data on oligopeptides listed in EROP-Moscow. The starting Statistics page is reached from Home page via the ‘EROP Statistics’ button, and it contains the list of statistical parameters compiled, each named parameter being a link to one of 15 additional Statistics pages (pp. 1–10.2). These in turn present graphic and tabular information on oligopeptides currently available in EROP-Moscow, information including:
a chronological diagram, by years, for decoding chemical structures of new oligopeptides (Figure 2),
size distribution of oligopeptides (number of amino acid residues; Figure 3),
current numerical yield of oligopeptides per taxonomic group,
total amino acid residue content of all listed oligopeptides,
relative contributions of international scientists, by home country, to the discovery of new oligopeptide structures,
organisms covered (>1000), and tissues and organs (>500),
functional classes of oligopeptides (∼100),
primary literature sources (>300), and
authors of the original publications devoted to decoding of oligopeptide chemical structures (>8000).
This statistical information demonstrates, for example, that the greatest number of natural oligopeptides have been identified in humans, especially in the human brain; to date neuropeptides represent the single largest functional class. The data also show that the most prolific journal for publication of new oligopeptides is the Journal of Biological Chemistry and that an American laboratory, that of J. M. Conlon, leads the discovery of new oligopeptides.
SERVICE MODULES
The EROP-Moscow database accesses a set of special software tools for alignment, and for calculating molecular masses and isoelectric points. These operations are executed outside the EROP-Moscow site, and results are returned to EROP-Moscow (as a part of its update capability) and displayed on the Peptide page and the Family page.
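The external tools themselves are not described here, but the two calculations can be sketched as follows. Molecular mass is taken as the sum of average residue masses plus one water molecule, and pI is estimated as the pH at which the net charge from the Henderson–Hasselbalch equation is zero, located by bisection. The pKa values are one illustrative set and are an assumption (other published sets give somewhat different pI values); terminal modifications such as acetylation or amidation are ignored in this sketch.

```python
WATER = 18.01528  # average mass of H2O, Da

# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# Illustrative pKa values (assumed; not necessarily those used by EROP-Moscow).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def molecular_mass(seq: str) -> float:
    """Average molecular mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH with free N- and C-termini."""
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERM)) - 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH 0-14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

For angiotensin II (DRVYIHPF), `molecular_mass` gives approximately 1046.2 Da. The calculated pI depends on the pKa set chosen, which is one reason a calculated value can differ from an experimentally observed one.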
This system creates dynamic web pages using cgi-scripts written in the PHP language. In response to each appropriate user query, the required HTML-pages are generated interactively. Once the user's web browser has sent the HTTP query to the web server, the required script containing the database query is executed. After the survey of needed records, the PHP script dynamically generates the results, in the form of an HTML-page sent to the user's computer.
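The request cycle described above (HTTP query, database query, dynamically generated HTML) can be sketched in a few lines. The actual site uses PHP scripts and a MySQL server; the sketch below substitutes Python and the standard-library `sqlite3` module purely for illustration. The table layout, the link format and the three demonstration records are hypothetical and are not real EROP-Moscow entries.

```python
import sqlite3
from html import escape


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with placeholder records (not real entries)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE peptide (accession TEXT PRIMARY KEY, name TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO peptide VALUES (?, ?, ?)",
        [
            ("E90001", "demo peptide A", "GGYGG"),
            ("E90002", "demo peptide B", "AAGGYAA"),
            ("E90003", "demo peptide C", "KKKK"),
        ],
    )
    return conn


def results_page(conn: sqlite3.Connection, fragment: str) -> str:
    """Run a partial-sequence query and return the Results page as HTML."""
    # The fragment is passed as a bound parameter, never pasted into the SQL.
    rows = conn.execute(
        "SELECT accession, name FROM peptide "
        "WHERE instr(sequence, ?) > 0 ORDER BY accession",
        (fragment,),
    ).fetchall()
    items = "\n".join(
        f'<li><a href="peptide?acc={escape(acc)}">{escape(acc)}</a> {escape(name)}</li>'
        for acc, name in rows
    )
    return (
        f"<html><body><h1>Results for {escape(fragment)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


page = results_page(build_demo_db(), "GGY")
```

Binding the search fragment as a query parameter and escaping all values written into the page are the two design points worth noting in any script of this kind.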
Graphic information is also generated and plotted dynamically, e.g. in the statistical processing of data, thus permitting online display of statistical summaries for all natural oligopeptides currently available in the EROP-Moscow database.
COMPARISON OF EROP-MOSCOW WITH OTHER PEPTIDE–PROTEIN DATABASES
The internet now provides free access to a large number of both generalized and specialized peptide-protein databases. Best known among the generalized databases are PIR (15) and Swiss-Prot (Swiss Protein), which is linked with TrEMBL (Translated, European Molecular Biology Laboratory), the database of amino acid sequences translated from nucleotide sequences (16). Smaller databases, containing information about selected classes of oligopeptides, include Peptaibol (Peptide aminoisobutyric, for data on peptides possessing at least one residue of aminoisobutyric acid) (17), ANTIMIC (ANTIMICrobial, concerned with antimicrobial peptides) (18) and SCORPION (19), especially created for the peptide-protein toxins from a single order of arachnids, the scorpions.
About half the natural oligopeptide structures now available in EROP-Moscow, however, are absent from the above-named databases, for several reasons, especially (i) that little attention is paid to oligopeptides formed by means of non-ribosomal or pure enzymatic synthesis, and (ii) that precursor-product series are not handled systematically. In particular, very many oligopeptides generated as natural fragments of large precursors are not specifically indexed in the above databases and can be found there only by their amino acid sequences. For example, Swiss-Prot contains one record (P01019) on human angiotensinogen, which includes the amino acid sequences for angiotensins I and II (20) but omits angiotensins V and VI (21). EROP-Moscow contains these (records E00165 and E00166), as well as the more familiar oligopeptides.
In addition, EROP-Moscow lists a considerable number of oligopeptides with unique amino acid sequences that are simply not included in the other databases—owing either to the source journals (e.g. Biological Bulletin) being out of view of most database managers, or to failure in tracing primary sources keyed from secondary publications.
Information on very short oligopeptides is particularly deficient in the other major databases. Table 1 displays a useful comparison of these (di- to hepta-) peptides in EROP-Moscow, versus Swiss-Prot.
Table 1.
Amino acid residue number | Number of oligopeptides in Swiss-Prot-TrEMBL | Number of oligopeptides in EROP-Moscow |
---|---|---|
2 | 1 | 120 |
3 | 6 | 71 |
4 | 25 | 90 |
5 | 28 | 120 |
6 | 19 | 139 |
7 | 106 | 232 |
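The counts in Table 1 can be summed to give an overall comparison for di- to heptapeptides. The short calculation below uses only the values in the table; the variable names are introduced here for illustration.

```python
# (residue number, Swiss-Prot-TrEMBL count, EROP-Moscow count), copied from Table 1
TABLE_1 = [
    (2, 1, 120),
    (3, 6, 71),
    (4, 25, 90),
    (5, 28, 120),
    (6, 19, 139),
    (7, 106, 232),
]

swiss_total = sum(swiss for _, swiss, _ in TABLE_1)
erop_total = sum(erop for _, _, erop in TABLE_1)
ratio = erop_total / swiss_total

print(f"Swiss-Prot-TrEMBL: {swiss_total}, EROP-Moscow: {erop_total}, ratio: {ratio:.1f}")
```

The totals are 185 and 772 oligopeptides, respectively, so for these short peptides EROP-Moscow holds roughly four times as many structures.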
CONCLUDING REMARKS
The database EROP-Moscow has been developed for simple and rapid retrieval of information on natural regulatory oligopeptides and their structurally homologous families. In addition to solving true informational problems, EROP-Moscow can serve as a basis for new research and for elucidating general principles of structural and functional organization for these substances. For example, the current size distribution of oligopeptides, by number of amino acid residues (Figure 3), shows a numerical peak at ∼8–10 residues, but no clear explanation for this peak is available at present. Study of structurally homologous families should facilitate prediction of the functional properties of newly found oligopeptide molecules, should provide bases for classifying newly discovered molecules and in turn should promote creation of novel, highly efficient pharmaceuticals derived from the natural regulatory oligopeptides.
Because the discovery of new regulatory oligopeptides is a vigorous and continuing process (Figure 2), continuous revision and upgrading of EROP-Moscow will be essential. To this end, we ask users of EROP-Moscow to alert us to newly discovered oligopeptides which may not yet have been entered into the database. For this purpose, the Contact-us button on the Home page should be convenient.
CITING EROP-MOSCOW
Users of the EROP-Moscow database are asked to cite this paper in their relevant published research.
Acknowledgments
We thank Prof. Clifford Slayman, of Yale University, for fruitful discussions and comments on the manuscript, two anonymous reviewers for their valuable suggestions, and Margarita Il'ina for technical assistance. This work has been supported by Grant 02-07-90175 from the Russian Foundation for Basic Research (RFBR) and by the Program ‘Molecular and Cellular Biology’, RAS-10P (Russian Academy of Sciences). Funding to pay the Open Access publication charges for this article was provided by A.N. Bach Institute of Biochemistry, Russian Academy of Sciences.
Conflict of interest statement. None declared.
REFERENCES
- 1. Sewald N., Jakubke H.-D. Peptides: Chemistry and Biology. Weinheim: WILEY-VCH Verlag GmbH; 2002.
- 2. Gulevitch V.S., Amiradzhibi S. Ueber das Carnosin, eine neue organische Base des Fleischextrakt. Deutsch. Chem. Ges. 1900;33:1902–1903.
- 3. Baumann L., Ingwaldsen T. Concerning histidine and carnosine. J. Biol. Chem. 1918;35:263–276.
- 4. Zamyatnin A.A. EROP-Moscow specialized data bank for endogenous regulatory oligopeptides. Protein Seq. Data Anal. 1991;4:49–52.
- 5. Privalov P.L. Energy characteristics of the structure of protein molecules. Biophysics (Moscow) 1985;30:722–733.
- 6. Karle I.L. X-ray analysis conformation of peptides in the crystalline state. In: Gross E., Meienhofer J., editors. The Peptides: Analysis, Synthesis. NY: Academic Press; 1981. pp. 1–54.
- 7. Zamyatnin A.A. Biophysical problems of oligopeptide regulation. Biophysics (Moscow) 2003;48:950–958.
- 8. Woo P.C., To A.P., Lau S.K., Yuen K.Y. Facilitation of horizontal transfer of antimicrobial resistance by transformation of antibiotic-induced cell-wall-deficient bacteria. Med. Hypotheses. 2003;61:503–508. doi: 10.1016/s0306-9877(03)00205-6.
- 9. Zamyatnin A.A. Physicochemical and biological features of endogenous oligopeptide toxins. Neirokhimia. 1996;13:243–259. (in Russian)
- 10. Zamyatnin A.A. Structural classification of endogenous regulatory oligopeptides. Protein Seq. Data Anal. 1991;4:53–56.
- 11. Egorov N.S., Silaev A.B., Katrukha G.S. Antibiotics-polypeptides (structure, function, biogenesis). Moscow: Moscow University Publishers; 1987. p. 263.
- 12. Bentley P.H., Kenner G.W., Sheppard R.C. Structures of human gastrins I and II. Nature. 1966;209:583–585. doi: 10.1038/209583b0.
- 13. Kizer J.S., Busby W.H., Cottle C., Youngblood W.W. Glycine-directed peptide amidation: presence in rat brain of two enzymes that convert p-Glu-His-Pro-Gly-OH into p-Glu-His-Pro-NH2 (thyrotropin-releasing hormone). Proc. Natl Acad. Sci. 1984;81:3228–3232. doi: 10.1073/pnas.81.10.3228.
- 14. Bateman R.C., Youngblood W.W., Busby W.H., Kizer J.S. Nonenzymatic peptide alpha-amidation. Implications for a novel enzyme mechanism. J. Biol. Chem. 1985;260:9088–9091.
- 15. Barker W.C., Garavelli J.S., Hou Z., Huang H., Ledley R.S., McGarvey P.B., Mewes H.W., Orcutt B.C., Pfeiffer F., Tsugita A., et al. Protein information resource: a community resource for expert annotation of protein data. Nucleic Acids Res. 2001;29:29–32. doi: 10.1093/nar/29.1.29.
- 16. Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I., et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
- 17. Whitmore L., Wallace B.A. The peptaibol database: a database for sequences and structures of naturally occurring peptaibols. Nucleic Acids Res. 2004;32:D593–D594. doi: 10.1093/nar/gkh077.
- 18. Brahmachary M., Krishnan S.P.T., Koh J.L.Y., Khan A.M., Seah S.H., Tan T.W., Brusic V., Bajic A.V. ANTIMIC: a database of antimicrobial sequences. Nucleic Acids Res. 2004;32:D586–D589. doi: 10.1093/nar/gkh032.
- 19. Srinivasan K.N., Gopalakrishnakone P., Tan P.T., Chew K.C., Cheng B., Kini R.M., Koh J.L., Seah S.H., Brusic V. SCORPION, a molecular database of scorpion toxins. Toxicon. 2002;40:23–31. doi: 10.1016/s0041-0101(01)00182-9.
- 20. Skeggs L.T., Lentz K.L., Kahn J.R., Shumway N.P., Woods K.R. J. Exp. Med. 1956;104:193–197. doi: 10.1084/jem.104.2.193.
- 21. Semple P.F., Boyd A.S., Daves P.M., Morton J.J. Angiotensin II and its heptapeptide (2–8), hexapeptide (3–8), and pentapeptide (4–8) metabolites in arterial and venous blood of man. Circ. Res. 1976;39:671–678. doi: 10.1161/01.res.39.5.671.