Abstract
Various species of Ocimum have attracted special attention because of their medicinal properties. Different parts of the plant (root, stem, flower, leaves) have been used for centuries in the treatment of a wide range of disorders. Experimental structures (X-ray and NMR) of proteins from different Ocimum species are not yet available in the Protein Data Bank (PDB). These proteins play a key role in various metabolic pathways in Ocimum, and their 3D structures are essential for determining most of their functions. A homology modeling approach was therefore employed to derive structures for these proteins, using the comparative modeling program Modeller 9v7. The modeled proteins were validated with the PROCHECK, Verify-3D and ERRAT servers. Amino acid composition and polarity of these proteins were determined with the CLC Protein Workbench tool. Expasy's ProtParam server and the CYS_REC tool were used for physico-chemical and functional characterization. Secondary structure was studied with the computational program ProFunc. Swiss-PDB Viewer was used to visualize and analyze the homology-derived structures. The structures were finally submitted to the Protein Model Database (PMDB) so that they are accessible to other users for further studies.
Keywords: Ocimum, Homology modeling, CLC protein work bench, Secondary structure prediction, Swiss-PDB Viewer
Background
Botanically, basil belongs to the genus Ocimum of the family Lamiaceae. More than 160 species of Ocimum are reported from different parts of the world. Different parts of the plant (roots, stem, leaves, seeds and flowers) have been used for the treatment of a variety of diseases such as bronchitis, malaria, diarrhea, dysentery, skin diseases and arthritis. Ocimum sp. contain monoterpene derivatives such as camphor, limonene, thymol, citral, geraniol and linalool. A detailed analysis of protein sequences from Ocimum, their probable structures and their modes of action is yet to be accomplished. Plants synthesize chemicals in their leaves to protect themselves from herbivores. One class of defense compounds that has been used extensively by humans is the phenylpropenoid class, namely eugenol, chavicol and their derivatives. It has been reported that in basil glands two closely related (90% identical) enzymes, chavicol O-methyltransferase (CVOMT) and eugenol O-methyltransferase (EOMT), catalyze the formation of methylchavicol and methyleugenol from chavicol and eugenol, respectively [1]. These enzymes are involved in aroma production in basil. From an evolutionary perspective, plant and microbial PALs (phenylalanine ammonia lyases) are part of a superfamily of enzymes from plants, fungi and bacteria, and are likely derived from a precursor of the widespread histidine ammonia lyase (HAL) family of the histidine degradation pathway [2]. PAL catalyses the non-oxidative deamination of phenylalanine to trans-cinnamate and directs the carbon flow from the shikimate pathway to the various branches of general phenylpropanoid metabolism. Lipoxygenase (fatty acid metabolism) is one of the most widely studied enzymes and is found in more than 60 species of the plant and animal kingdoms. The enzyme catalyses the bio-oxygenation of polyunsaturated fatty acids (PUFA) containing a cis,cis-1,4-pentadiene unit to form conjugated hydroperoxydienoic acids.
Lipoxygenase has considerable application in food-related processes such as bread making. The enzyme also plays a significant role in the formation of secondary metabolites in sweet basil.
Enzymatic browning of fruits and vegetables is caused mainly by the conversion of native phenolic compounds to quinones, which are then polymerized to brown, red or black pigments that impart colour to various plant parts. The enzymes that catalyze this sequence of reactions are termed polyphenol oxidases, but are also known as tyrosinases, catecholases, cresolases and phenolases [3]. Because of the deleterious effect of enzymatic browning on fruits and vegetables, much work has been devoted to retarding or at least delaying the browning process. Polyphenol oxidase, being the agent responsible for browning, is studied for this purpose. The enzyme is involved in isoquinoline alkaloid biosynthesis and in the biosynthesis of other secondary metabolites.
To understand the biochemical function and interaction properties of a protein at the molecular level, its three-dimensional structure is the foremost requirement. However, the number of available protein sequences far exceeds the number of available three-dimensional protein structures. Homology modeling was developed to close this gap. Such computational methods are considered cost-effective and time-effective compared with X-ray crystallography and NMR techniques, and they use the information contained in amino acid sequences to predict protein structure and function. In the present study, in silico analysis and homology modeling were carried out on uncharacterized proteins from different species of Ocimum (O. basilicum, O. tenuiflorum, O. citriodorum, O. seloi, O. gratissimum and O. americanum) whose structures are not yet available in the PDB.
Materials and Methodology
The amino acid sequences of secondary metabolite proteins of Ocimum whose structures are not yet available in the RCSB Protein Data Bank (PDB) were retrieved from SWISS-PROT, a public resource of curated protein sequences [4], and subjected to NCBI BLAST [5]. The best template was selected on the basis of high score, low e-value and maximum sequence identity, and was then used as the reference structure to build a 3D model. The template and target proteins considered in the study are shown in Table 1.
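The template selection rule described above (high score, low e-value, maximum identity) can be sketched as a simple ranking. This is a minimal illustration only: the `BlastHit` record, the `pick_best_template` function, the tie-breaking order and the hit values are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    template_id: str     # hypothetical identifier of a candidate template
    bit_score: float     # BLAST bit score (higher is better)
    e_value: float       # BLAST e-value (lower is better)
    identity_pct: float  # percent sequence identity (higher is better)


def pick_best_template(hits):
    """Rank hits by score, then by lower e-value, then by higher identity."""
    if not hits:
        raise ValueError("no BLAST hits supplied")
    return max(hits, key=lambda h: (h.bit_score, -h.e_value, h.identity_pct))


# Made-up hits, used only to show how the ranking behaves.
hits = [
    BlastHit("tmplA", 210.0, 1e-60, 48.0),
    BlastHit("tmplB", 350.0, 1e-110, 62.5),
    BlastHit("tmplC", 350.0, 1e-105, 60.0),
]
best = pick_best_template(hits)
print(best.template_id)
```

The order in which the three criteria are applied is an assumption of this sketch; in practice, coverage of the target sequence and template resolution are usually weighed as well.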
Model building and evaluation
The three-dimensional structures of the proteins were modeled using Modeller 9v8 [6]. The quality of the generated models was evaluated with PROCHECK [7] by Ramachandran plot analysis [8]. The stereochemical quality and accuracy of the selected models were further improved by energy minimization with the GROMOS 96 43B1 parameter set, as implemented in Swiss-PDB Viewer [9]. The generated models were further validated with the VERIFY 3D [10] and ERRAT [11] programs. ProSA [12] was used for the analysis of Z-scores and energy plots. The three-dimensional structures of the modeled proteins were analyzed using DeepView (Swiss-PDB Viewer). Root Mean Square Deviation (RMSD) values were calculated between each target and its template to see how much the modeled protein deviates from the template structure.
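As an illustration of the RMSD measure mentioned above, the following minimal sketch computes the RMSD between two sets of equivalent atom coordinates. It assumes the two structures have already been superimposed (as Swiss-PDB Viewer does before reporting the value) and that atoms are paired in the same order; the function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that the
    i-th atom in coords_a corresponds to the i-th atom in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates: every atom of the model is displaced by 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, template))
```

A lower value means the model backbone follows the template more closely; the sketch does not perform the superposition step itself.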
Computation of amino acid composition
The amino acid composition (see Table 2) of the Ocimum proteins under study was calculated using the CLC Protein Workbench tool (www.clcbio.com/protein). The tool also estimates the percentage of hydrophobic and hydrophilic residues present in each protein (see Table 3).
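The kind of calculation described above can be sketched in a few lines. This is not the CLC Protein Workbench implementation: the set of residues counted as hydrophobic (`NONPOLAR`) is an assumption made for illustration, the grouping used by the tool may differ, and the sequence is a made-up toy peptide.

```python
from collections import Counter

# Assumed grouping of non-polar (hydrophobic) residues; other tools may
# classify glycine, proline or others differently.
NONPOLAR = set("AVLIMFWPG")


def composition(seq):
    """Return the percentage of each amino acid present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def hydrophobic_percent(seq):
    """Percentage of residues that fall in the assumed non-polar set."""
    seq = seq.upper()
    return 100.0 * sum(1 for aa in seq if aa in NONPOLAR) / len(seq)


toy = "MKTLLVAGGS"  # hypothetical 10-residue peptide
print(composition(toy))
print(hydrophobic_percent(toy))
```

The hydrophilic percentage under the same assumption is simply 100 minus the hydrophobic percentage.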
Physicochemical characterization
For physicochemical characterization, theoretical pI (isoelectric point), molecular weight, -R and +R (total numbers of negatively and positively charged residues), EI (extinction coefficient) [13], II (instability index) [14], AI (aliphatic index) [15] and GRAVY (grand average of hydropathy) [16] were computed for the set of proteins using Expasy's ProtParam server [17] (http://us.expasy.org/tools/protparam.html). The results are shown in Table 4.
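Three of the parameters listed above have simple closed forms and can be sketched directly: GRAVY is the mean Kyte-Doolittle hydropathy per residue [16], the aliphatic index is A + 2.9 V + 3.9 (I + L) with each term in mole percent [15], and the extinction coefficient at 280 nm is estimated from the numbers of Tyr, Trp and cystine residues using the per-residue values documented for ProtParam (1490, 5500 and 125 M⁻¹ cm⁻¹). The function names and the toy sequences are hypothetical, and the sketch assumes standard one-letter residue codes only.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def extinction_coefficient(seq, cystines=0):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines is the assumed number of disulphide-bonded Cys pairs;
    the default of 0 treats all cysteines as reduced.
    """
    seq = seq.upper()
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * cystines


toy = "MKTLLVAGGS"  # hypothetical peptide, not one of the studied proteins
print(gravy(toy), aliphatic_index(toy), extinction_coefficient("AYWY"))
```

A protein with a single Tyr and no Trp gives 1490 M⁻¹ cm⁻¹ under this formula, which matches the lower end of the range reported in the results.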
Functional characterization
CYS_REC (http://sunl.softberry.com/berry.phtml? topic) was used to locate disulphide ("SS") bonds between pairs of cysteine residues, if present. The tool reports the positions of the cysteines, the total number of cysteines and the pairing pattern, if any, in the protein sequence. None of the Ocimum proteins under study showed disulphide bonds. The results are presented in Table 5 (see Supplementary material).
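The first part of such a report, the positions and number of cysteines, is easy to reproduce from the sequence alone. The sketch below does only that; it does not predict the bonding state of any cysteine, which is what CYS_REC adds, and the function name and sequence are hypothetical.

```python
def cysteine_report(seq):
    """List 1-based cysteine positions and the upper bound on SS pairs."""
    positions = [i for i, aa in enumerate(seq.upper(), start=1) if aa == "C"]
    return {
        "count": len(positions),
        "positions": positions,
        # Each disulphide bond needs two cysteines, so this is only a ceiling.
        "max_possible_pairs": len(positions) // 2,
    }


report = cysteine_report("MCKTACLLC")  # hypothetical toy sequence
print(report)
```

A sequence with fewer than two cysteines cannot form an intramolecular disulphide bond at all, which is why a low cysteine content is a useful first check.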
Secondary structure prediction
ProFunc [18] was employed to calculate the secondary structural features of the Ocimum protein sequences. The results are presented in Table 6 (see Supplementary material).
Submission of the modeled proteins in protein model database (PMDB)
The models generated for the various Ocimum proteins were successfully submitted to the Protein Model Database, PMDB [19], without any stereochemical errors. The submitted models can be accessed via their PMDB IDs (see Table 7, Supplementary material).
Results and Discussion
As experimental structures of some of the important secondary metabolite proteins of Ocimum are not available, a homology modeling approach was used to derive their structures.
Model building, refinement and evaluation
PROCHECK analysis
The Ramachandran plot for chavicol O-methyltransferase (D3KYA1) is shown in Figure 1. Altogether, more than 90% of the residues were found in favoured and allowed regions, which supports the quality of the homology models. The overall G-factor for D3KYA1 was 0.19. As this value is greater than the acceptable threshold of -0.50, the modeled structure is acceptable. The modeled structures were also validated with other structure verification servers, Verify 3D and ERRAT. Verify 3D assigned a 3D-1D score of >0.2 for all the modeled proteins, which implies that each model is compatible with its sequence. ERRAT showed an overall quality factor of 49.62 for D3KYA1. The plots generated by Verify 3D and ERRAT for chavicol O-methyltransferase are shown in Figure 2A & 2B.
PROSA analysis
The z-scores of all the modeled proteins were within the range of scores typically found for native proteins of similar size, indicating good model quality. The energy plot for chavicol O-methyltransferase (D3KYA1), with a chain length of 257 AA and a z-score of 7.26, is presented in Figure 3A & 3B.
Swiss-PDB viewer analysis of predicted model
Visualization and analysis of the models using Swiss-PDB Viewer revealed no steric hindrance between residues, suggesting that the modeled structures are stable. Structure-structure superimposition was performed to calculate the Root Mean Square Deviation (RMSD) between the target and template structures. The RMSD for D3KYA1 was 0.94, which implies good quality of the modeled structure. Figure 4 shows the modeled structure of chavicol O-methyltransferase.
Physicochemical characterization
The physicochemical parameters, viz. theoretical isoelectric point (pI), molecular weight, total number of positive and negative residues, extinction coefficient, half-life, instability index, aliphatic index and grand average of hydropathy (GRAVY), were computed using Expasy's ProtParam tool (Table 4). The computed pI values of A8D7D8, B2ZA17, B6VQV5, B6VQV6 and D3KYA1 (pI<7) indicate their acidic nature, whereas those of A8D6D7, B2ZA12 and B2ZA16 (pI>7) reveal their basic behaviour. The computed pI will be useful for developing buffer systems for purification by isoelectric focusing. Extinction coefficient values for the Ocimum proteins at 280 nm ranged from 1490 to 50795 M⁻¹ cm⁻¹ for B6VQV6 and D3KYA1, indicating the presence of a higher concentration of Tyr and Trp. Cys was very low in concentration in all eight Ocimum proteins studied. This indicates that these proteins cannot be analyzed using UV spectral methods. On the basis of the instability index, Expasy's ProtParam classified B2ZA17 (eugenol O-methyltransferase), A8D7D8 (lipoxygenase), B2ZA12 (eugenol O-methyltransferase) and B2ZA16 (eugenol O-methyltransferase) as unstable (instability index>40) and the other Ocimum proteins as stable (instability index<40). The aliphatic index (AI), defined as the relative volume of a protein occupied by aliphatic side chains, is regarded as a positive factor for the thermal stability of globular proteins. The very high aliphatic index of all the Ocimum proteins implies that these proteins may be stable over a wide range of temperatures. The very low GRAVY index of proteins B6VQV6 and D3KYA1 implies that these proteins could interact better with water.
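The two threshold rules used in this paragraph (pI relative to 7, instability index relative to 40) can be written as a small classifier. The function name and the input values below are hypothetical and are not the values of Table 4; how to treat a value exactly on a threshold is an assumption of this sketch.

```python
def classify(pi, instability_index):
    """Label a protein from its theoretical pI and instability index."""
    if pi < 7:
        charge = "acidic"
    elif pi > 7:
        charge = "basic"
    else:
        charge = "neutral"
    # Values below 40 are predicted stable; 40 and above are treated as
    # unstable here (boundary handling is an assumption).
    stability = "stable" if instability_index < 40 else "unstable"
    return charge, stability


# Made-up parameter values, for illustration only.
print(classify(5.6, 35.2))
print(classify(8.9, 45.0))
```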
Functional characterization
The results of primary structure analysis suggest that all the Ocimum proteins under study are hydrophobic in nature because of their high content of non-polar residues (Table 2 & 3). As the percentage of cysteine (C) is very low in all the Ocimum proteins under study (Table 2), none of these proteins has disulphide linkages, as indicated by the CYS_REC result (Table 5). Extensive hydrogen bonding may provide stability to these proteins in the absence of disulphide bonds. Proteins B2ZA12, A8D6D7 and B6VQV5 have a high percentage of methionine (M), alanine (A), leucine (L) and lysine (K). As these amino acids have high helix-forming propensities, alpha helices are dominant in these proteins. This is also evident from the ProFunc result (Table 6). The rest of the Ocimum proteins have mixed secondary structures, i.e. alpha-helices, beta-strands and coils. All the proteins showed a high percentage of glycine and proline (Table 2). As these amino acids are common in turns, other secondary structures such as beta-turns and gamma-turns are dominant in these proteins (Table 6).
Submission of modeled proteins in PMDB
The modeled structures of proteins from various species of Ocimum were successfully deposited in the Protein Model Database (PMDB). The PMDB IDs of the submitted structures are presented in Table 7. These 3D structures may be used further in characterizing the proteins experimentally.
Conclusion
In this study, proteins from various species of Ocimum were modeled using a homology modeling approach. Different parameters such as isoelectric point, molecular weight, total number of positive and negative residues, extinction coefficient, instability index, aliphatic index and grand average of hydropathy (GRAVY) were computed for these proteins to determine their physicochemical characteristics. All the proteins were found to be deficient in the amino acid cysteine and therefore lack disulphide linkages, as also inferred from the CYS_REC result. In the absence of disulphide bonds, extensive hydrogen bonding is believed to be responsible for the stability of these proteins. Polarity studies using the CLC Protein Workbench tool showed all the studied proteins to be hydrophobic in nature, which may be due to the presence of a large number of non-polar residues. Secondary structure studies showed that all the studied proteins contain a high proportion of other secondary structures, i.e. beta-turns and gamma-turns. This is attributed to the higher concentration of proline and glycine residues. The modeled structures can be accessed through the Protein Model Database (PMDB) via their PMDB IDs. Homology-derived models are used extensively in a wide range of applications such as virtual screening, site-directed mutagenesis experiments and rationalizing the effects of sequence variation. These structures will serve as a cornerstone for functional analysis of experimentally derived crystal structures.
Supplementary material
Acknowledgments
Financial support of Department of Biotechnology (DBT) Govt of India; New Delhi under BTISnet program is gratefully acknowledged. Sudeep Roy is thankful to Council of Scientific and Industrial Research for Senior Research Fellowship.
Footnotes
Citation: Roy et al, Bioinformation 6(8): 315-319 (2011)
References
- 1. T Koeduka, et al. Plant Physiol. 2009;149:384. doi: 10.1104/pp.108.128066.
- 2. D Rother, et al. Eur J Biochem. 2002;269:3065. doi: 10.1046/j.1432-1033.2002.02984.x.
- 3. AM Mayer, E Harel. Phytochemistry. 1979;18:193.
- 4. A Bairoch, R Apweiler. Nucleic Acids Res. 2000;28:45. doi: 10.1093/nar/28.1.45.
- 5. SF Altschul, et al. J Mol Biol. 1990;215:403. doi: 10.1016/S0022-2836(05)80360-2.
- 6. A Sali, TL Blundell. J Mol Biol. 1993;234:779. doi: 10.1006/jmbi.1993.1626.
- 7. RA Laskowski, et al. J Appl Cryst. 1993;26:283.
- 8. GN Ramachandran, et al. J Mol Biol. 1963;7:95.
- 9. W Kaplan, TG Littlejohn. Brief Bioinform. 2001;2:195. doi: 10.1093/bib/2.2.195.
- 10. D Eisenberg, et al. Methods Enzymol. 1997;277:396. doi: 10.1016/s0076-6879(97)77022-8.
- 11. VC Colovos, TO Yeates. Protein Sci. 1993;2:1511.
- 12. MJ Sippl. Proteins. 1993;17:355. doi: 10.1002/prot.340170404.
- 13. SC Gill, PH von Hippel. Anal Biochem. 1989;182:319. doi: 10.1016/0003-2697(89)90602-7.
- 14. K Guruprasad, et al. Protein Eng. 1990;4:155. doi: 10.1093/protein/4.2.155.
- 15. A Ikai, et al. J Biochem. 1980;88:1895.
- 16. J Kyte, RF Doolittle. J Mol Biol. 1982;157:105. doi: 10.1016/0022-2836(82)90515-0.
- 17. MR Wilkins, et al. Methods Mol Biol. 1999;112:531. doi: 10.1385/1-59259-584-7:531.
- 18. RA Laskowski, et al. Nucleic Acids Res. 2005;33:W89.
- 19. T Castrignanò, et al. Nucleic Acids Res. 2006;34:D306. doi: 10.1093/nar/gkj105.