Bioinformation. 2011 Jun 23;6(7):250–254. doi: 10.6026/97320630006250

Characterization of Lovastatin biosynthetic cluster proteins in Aspergillus terreus strain ATCC 20542

Thankaswamy Kosalai Subazini*, Gopal Ramesh Kumar
PMCID: PMC3124688  PMID: 21738324

Abstract

Aspergillus terreus is a filamentous ascomycete known for its production of lovastatin, an antihypercholesterolemic drug. The commercial importance of lovastatin, with annual sales of billions of dollars, led us to focus on the lovastatin biosynthetic cluster proteins. These proteins were analyzed from several perspectives, including physicochemical properties, structure-based analysis and functional studies, to determine the role and function of every protein involved in the lovastatin biosynthesis pathway. Several computational tools are used to predict physicochemical properties, secondary structural features, topology, patterns, domains and cellular location. The lovastatin biosynthetic cluster contains 8 unidentified proteins; 6 of these have homologous partners, so their annotation is transferred from closely related homologous genes and their structures are modeled. Using an integrated functional analysis approach, the two remaining proteins, which lack homologous partners, are predicted to be a PQ loop repeat protein that may be involved in the glycosylation machinery and a protein with thiolase acyl activity.

Keywords: Aspergillus terreus, Functional annotation, HMG CoA reductase inhibitor, Lovastatin, Sequence analysis, Structure analysis

Background

Aspergillus terreus is an important fungus and a major source of lovastatin, which treats cardiovascular disease by inhibiting HMG-CoA reductase, the enzyme that catalyzes the rate-limiting step in cholesterol biosynthesis. Lovastatin also has anti-inflammatory and antioxidant activities and can inhibit proliferation and induce apoptosis in a variety of tumor cell lines [1]. Experimental evidence from the literature shows that 18 proteins are involved in lovastatin biosynthesis in A. terreus ATCC 20542. Of these 18 lovastatin biosynthetic cluster proteins, 2 are involved in regulation, 3 in transport, 9 are enzymes, 2 are unknown proteins (carrying a thiolase acyl-enzyme intermediate signature and a PQ loop repeat, respectively), and 2 are megasynthases [2], Lovastatin Nonaketide Synthase (LNKS) and Lovastatin Diketide Synthase (LDKS). The in silico analysis of the physicochemical properties, topology and function of these proteins is performed by integrating standard functional annotation tools available online that are based on different methodologies. The functions of the unknown proteins are finally verified by structure-based analysis.

Materials and Methodology

The sequences of the proteins involved in lovastatin biosynthesis in strain ATCC 20542 are retrieved from the NCBI database (www.ncbi.nlm.nih.gov).
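As a minimal sketch of the retrieval step, the Python below builds an NCBI E-utilities efetch request for protein records in FASTA format and parses the response. It assumes that the legacy GI numbers cited in this article (e.g. 4959953) are still accepted by efetch; if they are not, the corresponding accession.version identifiers can be substituted. The function names are illustrative, and only fetch_fasta() needs network access.

```python
# Sketch: build an NCBI E-utilities efetch URL for protein records and
# parse the returned FASTA text. Only fetch_fasta() touches the network.
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(ids):
    """Return an efetch URL that requests the given protein IDs in FASTA format."""
    query = urlencode({
        "db": "protein",
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_fasta(ids):
    """Download the records from NCBI and parse them (requires network access)."""
    with urlopen(efetch_url(ids)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

NCBI's usage guidelines ask clients to limit their request rate and to identify themselves with the tool and email parameters, which this sketch omits for brevity.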

Sequence analysis

Primary sequence analysis is performed with the Sequence Manipulation Suite [3], which is used to calculate physicochemical properties such as the grand average of hydropathicity (GRAVY), isoelectric point (pI), acidity, basicity, instability index, aliphatic index, extinction coefficient and molecular weight.
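Several of these descriptors are simple functions of amino acid composition. The sketch below computes three of them from their standard published definitions (Kyte-Doolittle GRAVY, Ikai's aliphatic index, and the Pace et al. extinction coefficient at 280 nm). It is an illustration and is not claimed to reproduce the Sequence Manipulation Suite code exactly; the instability index is omitted because it needs a 400-entry dipeptide weight table.

```python
# Sketch of three composition-based descriptors (standard definitions assumed;
# not necessarily identical to the implementation used in the article).

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cysteines_paired=False):
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1).

    Per-chromophore values of Pace et al. (1995): Trp 5500, Tyr 1490,
    and 125 per cystine (a disulfide-bonded Cys pair).
    """
    seq = seq.upper()
    cystines = seq.count("C") // 2 if cysteines_paired else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines
```

Non-standard residue codes such as X raise a KeyError in gravy(), so ambiguous residues have to be removed or handled before calling it.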

Secondary structure prediction

Secondary structure features such as helical content, beta-sheet formation, turns, loops and coil regions are predicted with the GOR method [4], as implemented in Antheprot [5]. The number of transmembrane helices in each sequence is also determined.
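The text does not name the method used to count transmembrane helices. As an illustration only, the classic Kyte-Doolittle hydropathy-window approach (a 19-residue window whose mean hydropathy exceeds 1.6 suggests a membrane-spanning segment) can be sketched as follows; dedicated predictors such as TMHMM are considerably more reliable.

```python
# Illustrative hydropathy-window sketch (Kyte & Doolittle, 1982); not the
# predictor used in the article, which is not named.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window].

    Residue codes outside the table (e.g. X) are scored as 0.0.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into half-open (start, end) ranges."""
    segments = []
    for i, mean in enumerate(window_means(seq, window)):
        if mean <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Window overlaps the current segment: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

For a sequence with one 25-residue hydrophobic stretch flanked by charged residues, the function returns a single merged segment a few residues wider than the stretch, because windows that only partly overlap it can still average above the threshold.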

Functional annotation

Functional annotation is performed with several standard online tools: KOGnitor [6], which uses orthology information for eukaryotic sequences; Pfam [7], a protein family database; ScanProsite [8], for identifying biologically significant sites; ProDom [9], for domain-based analysis; and BLAST [10], for homology-based annotation. SignalP [11] is used to determine whether a protein is secretory or non-secretory, and ProtFun [12] to predict function by integrating post-translational modification and localization features of the protein.
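Of these tools, BLAST underpins the annotation transfer described in the results. A minimal sketch of that step, assuming BLAST+ tabular output produced with -outfmt "6 qseqid sseqid pident evalue qcovs stitle", keeps the best hit per query that passes an E-value and query-coverage cut-off. The cut-off values and function names are hypothetical; the article does not state which thresholds it used.

```python
# Sketch of homology-based annotation transfer from BLAST+ tabular output,
# assuming: -outfmt "6 qseqid sseqid pident evalue qcovs stitle".
# The default cut-offs below are hypothetical, not taken from the article.
from dataclasses import dataclass


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    evalue: float
    qcovs: float
    stitle: str


def parse_hits(text):
    """Parse tab-separated BLAST rows in the column order given above."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, pident, evalue, qcovs, stitle = line.split("\t", 5)
        hits.append(Hit(qseqid, sseqid, float(pident), float(evalue),
                        float(qcovs), stitle))
    return hits


def transfer_annotations(hits, max_evalue=1e-10, min_coverage=70.0):
    """Return {query: subject title} for the best qualifying hit per query.

    "Best" means lowest E-value, with higher percent identity breaking ties.
    Queries with no hit passing both cut-offs are left unannotated.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.qcovs < min_coverage:
            continue
        current = best.get(hit.qseqid)
        if current is None or (hit.evalue, -hit.pident) < (current.evalue, -current.pident):
            best[hit.qseqid] = hit
    return {query: hit.stitle for query, hit in best.items()}
```

Under these illustrative cut-offs, a hit such as the one reported for gi|4959944 (33% query coverage, E-value 0.11) would not be used for annotation transfer, so such proteins need other lines of evidence.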

Tertiary structure prediction

The 3D structures of the unknown proteins are predicted using MODELLER 9v8 [13]. For unknown proteins without structural homologs in the Protein Data Bank (PDB) [14] detectable by blastp, templates are identified using mGenTHREADER [15]. The models are then refined in Swiss-PDB Viewer [16] (refinement of side chains and problematic loops, removal of steric clashes, and energy minimization), and the refined models are validated. Backbone conformations are evaluated with phi/psi Ramachandran plots from SAVS [17], and the Z-score is obtained from the ProSA web server [18].
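A Ramachandran plot places each residue by its backbone torsion angles: phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below computes these signed dihedral angles from coordinates. The per-residue input layout is an assumption for illustration, and validation servers such as SAVS also classify the angles into favored and allowed regions, which this sketch does not do.

```python
# Sketch: backbone phi/psi dihedral angles from atomic coordinates.
# Assumed input: one dict per residue with "N", "CA" and "C" as (x, y, z).
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    cos_term = _dot(v, w)
    sin_term = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(sin_term, cos_term))


def backbone_dihedrals(residues):
    """Return (phi, psi) per residue; phi of the first and psi of the last are None."""
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```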

Results and Discussion

Primary sequence analysis

Primary sequence analysis showed that the GRAVY indices of 13 lovastatin biosynthetic proteins range from -0.5 to -0.03, indicating that they are hydrophilic; most of these are predicted to localize to extracellular or mitochondrial regions. The other 5 proteins (the PQ loop repeat containing protein, HMG-CoA reductase, the 2 permeases of the major facilitator superfamily and the tricarboxylate transport protein) are hydrophobic (GRAVY from 0.05 to 0.6), are predicted to localize to the plasma membrane, and are basic, with isoelectric points (pI) above 7. Two further proteins, cytochrome P450 monooxygenase and the protein containing the thiolase activity region, are also basic; the remaining proteins are acidic. The aliphatic index of the cluster proteins ranges from 60 to 110, suggesting that the proteins are thermostable over a wide temperature range. Eleven proteins have an instability index greater than 40 (Figure 1) and are therefore predicted to be unstable.
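The acidic/basic calls here rest on whether a protein's isoelectric point (pI) exceeds 7, and the stability calls on whether the instability index exceeds 40. The sketch below estimates pI by bisection on the Henderson-Hasselbalch net charge and applies those cut-offs together with the sign of GRAVY. The pKa values are an assumption (one commonly used set); programs differ in their pKa tables, so computed pI values vary slightly between tools.

```python
# Sketch: estimate pI by bisection on net charge, then apply the cut-offs
# used in the text. The pKa set below is an assumption, not the article's.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a linear polypeptide at the given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka))
                    for aa, pka in PKA_POSITIVE.items() if aa != "Nterm")
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph))
                    for aa, pka in PKA_NEGATIVE.items() if aa != "Cterm")
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """pH of zero net charge; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def classify(gravy, pi, instability_index):
    """Labels using the cut-offs stated in the text."""
    return {
        "hydropathy": "hydrophobic" if gravy > 0 else "hydrophilic",
        "charge": "basic" if pi > 7 else "acidic",
        "stability": "unstable" if instability_index > 40 else "stable",
    }
```

For a hypothetical protein with GRAVY 0.3, pI 8.2 and instability index 45, classify() returns hydrophobic, basic and unstable.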

Figure 1.

Physicochemical analysis of the lovastatin biosynthetic cluster proteins. The graph shows the relationship between the aliphatic index, instability index and molecular weight of the lovastatin biosynthetic cluster proteins.

Secondary structure

Secondary structure prediction showed that, in all proteins of the cluster, random coil is the dominant element, exceeding the alpha helix, extended strand and beta turn content.
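A statement such as "random coil dominates" comes from tallying per-residue state assignments. A minimal sketch, assuming the predictor emits one state letter per residue (H, E, T or C; this alphabet is an assumption and should be matched to the actual GOR output):

```python
# Sketch: summarize a per-residue secondary structure prediction string.
# Assumed alphabet: H helix, E extended strand, T turn, C coil.
from collections import Counter

STATES = {"H": "alpha helix", "E": "extended strand", "T": "beta turn", "C": "random coil"}


def composition(prediction):
    """Percentage of residues assigned to each state."""
    prediction = prediction.upper()
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    return {name: 100.0 * counts[code] / len(prediction)
            for code, name in STATES.items()}


def dominant_state(prediction):
    """Name of the most frequent state (ties resolved in STATES order)."""
    percentages = composition(prediction)
    return max(percentages, key=percentages.get)
```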

Topology predictions

Four sequences in the lovastatin biosynthetic cluster contain transmembrane helices. Two of them, the efflux pump (gi|4959951) and the major facilitator superfamily protein (gi|4959957), are involved in membrane transport and each have 12 transmembrane helices. The other two, with 6 transmembrane helices each, are predicted to be the PQ loop repeat membrane protein (gi|4959944) and HMG-CoA reductase (gi|4959949).

Functional analysis

ScanProsite showed that the unknown protein gi|4959953 contains the thiolase acyl-enzyme intermediate signature, with the pattern [LIVM]-[NST]-{T}-x-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM]. BLAST results revealed that, although gi|4959959 is highly homologous to an immunoglobulin I-set domain protein, it is also homologous to the glycosyl hydrolase family 67 N-terminus (98% query coverage, E-value 9e-86). The presence of N-X-S/T sequons supports the prediction that gi|4959959 encodes a glycosyl hydrolase family protein; the sequons were verified with the NetNGlyc server [19] (Figure 2). Gi|4959944, which is distantly homologous to uroporphyrinogen decarboxylase (33% query coverage, E-value 0.11), is predicted to be a PQ-loop repeat family protein. The remaining proteins have close homologs, and their functions are predicted by annotation transfer from homologous sequences. The patterns and signatures of the lovastatin biosynthetic proteins of the ATCC strain, together with their predicted functional roles, are shown in Table 1.
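PROSITE patterns such as the thiolase signature map directly onto regular expressions: x is any residue, [..] lists allowed residues, {..} lists forbidden ones, and (n) or (n,m) gives a repeat count. The sketch below performs that translation for the common cases and scans a sequence; it is not ScanProsite itself, which supports more syntax as well as profiles. The same function handles the N-glycosylation sequon pattern N-{P}-[ST]-{P} (PROSITE PS00001).

```python
# Sketch: translate a PROSITE pattern into a Python regex and report every
# (possibly overlapping) match. Handles x, [..], {..}, (n)/(n,m) repeats and
# the < / > terminal anchors; forms such as [G>] are not supported.
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert e.g. 'N-{P}-[ST]-{P}' to 'N[^P][ST][^P]'."""
    # Drop the optional trailing period and any stray spaces.
    pattern = pattern.strip().rstrip(".").replace(" ", "")
    parts = []
    for element in pattern.split("-"):
        n_anchor, c_anchor = element.startswith("<"), element.endswith(">")
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + regex + ("$" if c_anchor else ""))
    return "".join(parts)


def scan(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
```

For example, the thiolase signature translates to [LIVM][NST][^T].C[SAGLI][ST][SAG][LIVMFYNS].[STAG][LIVM].{6}[LIVM], and scan("N-{P}-[ST]-{P}", "MNGTANPSA") returns [(2, "NGTA")], skipping the NPS site because proline is excluded.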

Figure 2.

Prediction of N-glycosylation sites in the unknown protein gi|4959959 using the NetNGlyc server. Vertical lines mark 4 sequons whose scores cross the threshold (horizontal line at 0.5); 2 sequons that also cross additional thresholds are predicted to be glycosylated.

3-D structure prediction and analysis

Homology-based tertiary structures of the unknown proteins are predicted with Modeller to verify their predicted functions. Five models are generated for each protein, and the best model of each is refined using Swiss-PDB Viewer. The refined structures are validated, and the resulting scores are shown in Table 2. More than 80% of residues in the modeled structures fall in allowed regions of the Ramachandran plot, and the ProSA Z-scores are negative, indicating that the predicted structures are consistent with the structural features of experimentally determined structures in the PDB. Homology modeling of the unknown protein gi|4959944 failed because no homologous structures were available. The 3-D structure of gi|4959959 (Figure 3), modeled on a beta-N-acetylhexosaminidase template, confirms that gi|4959959 belongs to the glycosyl hydrolase family.
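The article does not state how the best of the five models was chosen. One common approach, assumed here, is to rank MODELLER models by DOPE score (lower is better). The sketch mimics the per-model dictionaries that MODELLER's automodel keeps in its outputs list when DOPE assessment is requested (the key names are an assumption), and it encodes the two acceptance checks stated in this paragraph: more than 80% of residues in allowed Ramachandran regions and a negative ProSA Z-score.

```python
# Sketch: choose the best homology model and apply the text's acceptance checks.
# Assumed record layout: {"name": ..., "failure": None or error, "DOPE score": float}.


def best_model(outputs, score_key="DOPE score"):
    """Name of the successfully built model with the lowest (best) score."""
    successful = [m for m in outputs if m.get("failure") is None]
    if not successful:
        raise ValueError("no model was built successfully")
    return min(successful, key=lambda m: m[score_key])["name"]


def passes_validation(fraction_allowed, prosa_z, min_fraction=0.80):
    """More than 80% of residues in allowed Ramachandran regions and a negative ProSA Z-score."""
    return fraction_allowed > min_fraction and prosa_z < 0
```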

Figure 3.

Modeled structure of the unknown protein gi|4959959. The sequence is modeled using Modeller, and its function is predicted as beta-N-acetylhexosaminidase based on structural similarity. N-glycosylation sites are shown as shaded regions in the sequence and as sticks in the 3-D structure.

Conclusion

An in silico analysis of the lovastatin biosynthetic cluster proteins was performed to characterize their physicochemical, structural and functional properties. Analyzing the eight unknown proteins with several function prediction tools based on domains, motifs and profiles improved their functional annotation. The annotation revealed a cis-aconitic acid decarboxylase of the itaconic acid biosynthesis pathway within the lovastatin biosynthetic cluster, indicating that the two metabolites, itaconic acid and lovastatin, are related. Gi|4959953 has a thiolase activity domain, and its regulation may affect the energy level of cells in the metabolic state. The analysis also indicates that gi|4959944 and gi|4959959 may act as tailoring enzymes in the glycosylation machinery and could be used to prepare lovastatin analogues. Introducing these enzymes, together with cytochrome P450s, into heterologous hosts may help in the large-scale production of lovastatin by microbial fermentation.

Supplementary material

Data 1
97320630006250S1.pdf (40.8KB, pdf)

Acknowledgments

Subazini TK is thankful to CSIR, Govt of India, New Delhi for the award of a Senior Research Fellowship (SRF).

Footnotes

Citation: Subazini & Kumar, Bioinformation 6(7): 250-254 (2011)

References
