Abstract
The cataloguing of marine chemicals is a fundamental aspect for bioprospecting. This has applications in the development of drugs from marine sources. A publicly accessible database that provides comprehensive information about these compounds is therefore helpful. The Seaweed Metabolite Database (SWMD) is designed to provide information about the known compounds and their biological activity described in the literature. Geographical origin of the seaweed, extraction method and the chemical descriptors of each the compounds are recorded to enable effective chemo-informatics analysis. Crosslinks to other databases are also introduced to facilitate the access of information about 3D Structure by X-ray and NMR activity, drug properties and related literature for each compound. This database currently contains entries for 517 compounds encompassing 25 descriptive fields mostly from the Red algae of the genus Laurencia (Ceramiales, Rhodomelaceae). The customized search engine of this database will enable wildcard querying, which includes Accession Number, Compound type, Seaweed Binomial name, IUPAC name, SMILES notation or InChI.
Availability
The database is available for free at http://www.swmd.co.in
Keywords: Seaweed, bioactive compounds, database, Laurencia, chemo-informatics
Background
Marine chemicals have novel structures with pronounced biological activity and pharmacology. The study of such chemicals therefore is promising. High throughput screening of marine metabolites for a given drug target can be achieved only if natural compounds are available as a database. Creating a database of natural products and sharing it with huge scientific community facilitates the understanding of basic mechanism of compounds and can reduce the timeline in drug discovery [1]. A publicly accessible database that provides comprehensive information about these compounds is therefore helpful to the relevant communities.
Seaweeds are among the first marine organisms chemically analyzed, with more than 3,600 articles published describing 3,300 secondary metabolites from marine plants and algae, and they still remain an almost endless source of new bioactive compounds. This database is focused on bioactive compounds that target the pharmaceutical market, along with the spectrum of biological activities (Table 1Table 1). Among macroalgae, significantly more rich in secondary metabolites appear the brown and red algae, with the latter being the top producers of halogenated metabolites. Red algae of the genus Laurencia (Ceramiales, Rhodomelaceae) are some of the most prolific producers of secondary metabolites in the marine environment. Secondary metabolites from these algae are predominantly sesquiterpenes, diterpenes, triterpenes and C15-acetogenins, characterized by the presence of halogen atoms in their chemical structures. Most Laurencia species accumulate a characteristic major metabolite or a class of compounds not widely distributed within the genus [2].
Database Structure
The entries of this database are generated from a text mining of published articles. Our database currently contains 356 entries of compounds found from literature. SWMD is designed in MySQL 5.1.36 and PHP 5.3.0. These compounds cover 37 different species of Laurencia and other genera, which is shown is Table 2A, 2B respectively. Geographical origin and extraction method directed for each of these compounds were searched and included in the database along with the biological activity exhibited.
Compounds in SWMD are annotated by molecular property. These include molecular weight, Monoisotopic Mass, Molar Refractivity, number of rotatable bonds, calculated LogP, number of hydrogen-bond donors, number of hydrogen-bond acceptors, Polar Surface Area and Van der Waals surface area. The chemical descriptors and 3D structure for each compound were calculated using Marvinsketch [3] and Chemsketch [4], respectively. Lipophilicity or calculated LogP is predicted using ALOGPS 2.1 program [5]. For molecular visualization, the user needs the free Chime-Plugin from MDL (available for Windows, SGI, Mac) or the Java2 Runtime Environment.
The SWMD database web interface is shown in Figure 1. This database is searchable by Accession Number, Compound type, Seaweed Binomial name, IUPAC name, SMILES notation or InChI. The search is case insensitive. In a query, a user can specify full name or any part of the name in a text field. Wild characters of ‘%’ and ‘_’ are supported in text field.
Database features
SWMD has a web interface at http://www.swmd.co.in. The database is unique in providing comprehensive information of compounds from seaweeds via 25 descriptive fields. Each entry in the database is categorized into sections such as General information, Structure information, Predicted properties and Bibliographic references. The general information part of the database entry displays the compound’s unique SWMD accession number viz. XY123 where X represents the Macroalgae ‐ Brown, Green and Red by B, G and R respectively and Y represent the first letter of the genus. It also encompasses compound type, and an external links to the compound's PUBCHEM ID and Chemspider ID (if available) are provided. Binomial name is followed by geographical origin and biological activity which was curated from literature sources. The Structure of the compound, its name in IUPAC, SMILES notation and InChI are displayed in structure information along with atomic coordinates in MOL and PDB format which can be downloaded for 3D molecular visualization [6]. The predicted properties display the pre-computed chemical descriptors of the compounds and reference section lists the citations relevant to the respective compounds with external links to PubMed if available.
SWMD currently contains entries for 517 compounds encompassing 25 descriptive fields mostly from the Red algae of the genus Laurencia (Ceramiales, Rhodomelaceae) (Table 2 in Table 2). The number of compounds in SWMD is growing, and the numbers reported here should be considered a representative snapshot; see the Web-page for up-to-date statistics. Of these 517 compounds, 331 are Lipinski compliant [7], with the caveat that we have used ALOGPS 2.1 program [5] as a surrogate for c Log P between -3.5 and 5, MW ≤ 500 g•mol-1, a maximum of 10 H bond acceptors and 5 H bond donors. Of these, 107 are “lead-like” molecules [8, 9] , which have MW = 150-350 g•mol-1, c Log P ≫ 4, H bond donors ≤ 3, and H bond acceptors ≤ 6. A total of 27 molecules are “fragment-like” [10] with c Log P between ‐ 2 and 3, MW ≫ 250 g•mol-1, H bond donors ≫ 3, H bond acceptors ≫ 6 and rotatable bonds ≫ 3 (Figure 2).
Conclusion and Future Perspectives
The data presented in SWMD can be effectively used for chemoinformatics studies like QSAR analysis, pharmacophore search, molecular docking etc. pertaining to drug discovery. It also portrays the span of secondary metabolites available in seaweeds and the need to preserve the perishing marine ecosystem. The database will be extended with more data on molecular interactions, embedded interactive visualization tools and additional chemical descriptors. The users are also welcome to contribute relevant data to the database via email to authors. The dataset and web interface shall be upgraded periodically.
Supplementary material
Footnotes
Citation:Davis & Vasanthi, Bioinformation 5(8): 361-364 (2011)
References
- 1.Lazo JS, et al. Mol Pharmacol. 2007;72:1. doi: 10.1124/mol.107.035113. [DOI] [PubMed] [Google Scholar]
- 2.Fenical W. Phytochem. 1976;15:511. [Google Scholar]
- 3.Csizmadia F. J Chem Inf Comput Sci. 2000;40:323. doi: 10.1021/ci9902696. [DOI] [PubMed] [Google Scholar]
- 4. http://www.acdlabs.com.
- 5.Tetko IV, Tanchuk VY. J Chem Inf Comput Sci. 2002;42:1136. doi: 10.1021/ci025515j. [DOI] [PubMed] [Google Scholar]
- 6.Vetrivel U, et al. Bioinformation. 2009;4(2):71. doi: 10.6026/97320630004071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Lipinski CA. J Pharmacol Toxicol Methods. 2000;44:235. doi: 10.1016/s1056-8719(00)00107-6. [DOI] [PubMed] [Google Scholar]
- 8.Schneider G. Curr Med Chem. 2002;9:2095. doi: 10.2174/0929867023368755. [DOI] [PubMed] [Google Scholar]
- 9.Oprea TI. J Comput-Aided Mol Des. 2002;16:325. doi: 10.1023/a:1020877402759. [DOI] [PubMed] [Google Scholar]
- 10.Verdonk ML, et al. Proteins. 2003;52:609. doi: 10.1002/prot.10465. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.