Skip to main content
Bioinformation logoLink to Bioinformation
. 2015 Aug 31;11(8):407–412. doi: 10.6026/97320630011407

Codon bias and gene expression of mitochondrial ND2 gene in chordates

Arif Uddin 1, Tarikul Huda Mazumder 1, Monisha Nath Choudhury 1, Supriyo Chakraborty 1,*
PMCID: PMC4574124  PMID: 26420922

Abstract

Background: Mitochondrial ND gene, which encodes NADH dehydrogenase, is the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in basal ganglia and subcortical brain regions. Therefore, it is of interest to analyze mitochondrial DNA to glean information for evolutionary relationship. This study highlights on the analysis of compositional dynamics and selection pressure in shaping the codon usage patterns in the coding sequence of MT-ND2 gene across pisces, aves and mammals by using bioinformatics tools like effective number of codons (ENC), codon adaptation index (CAI), relative synonymous codon usage (RSCU) etc. Results: We observed a low codon usage bias as reflected by high ENC values in MT-ND2 gene among pisces, aves and mammals. The most frequently used codons were ending with A/C at the 3rd position of codon and the gene was AT rich in all the three classes. The codons TCA, CTA, CGA and TGA were over represented in all three classes. The F1 correspondence showed significant positive correlation with G, T3 and CAI while the F2 axis showed significant negative correlation with A and T but significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. Conclusions: The codon usage bias in MTND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in MT-ND 2 gene.

Keywords: Codon usage, MT-ND2 gene, natural selection, mutation pressure

Background

The mitochondrial genome is ideal as the molecular marker for species identification as well as systematic phylogenetic studies due to its small size. It is easily amplified and mostly conserved in gene content and characterized by lack of recombination, maternal inheritance and high evolutionary rate [1]. The respiratory chains of mitochondrial genome comprise of four complexes (complex, I-IV) and are encoded by 37 genes consisting of two ribosomal RNA (rRNA), twenty-two transfer RNA (tRNA) and thirteen protein coding genes. The complex-I of mitochondrial respiratory chain includes the first enzyme NADH dehydrogenase and its seven subunits (ND1-6 & ND4L) play a pivotal role in diverse pathological processes [2]. The subunit 2 of NADH dehydrogenase is encoded by ND2 gene and its function is not yet fully understood. However, literature suggests that a mutation in the ND2 (T4681C) gene was found in patients with Leigh syndrome, a neurodegenerative disease characterized by bilateral symmetric lesions in basal ganglia and subcortical brain regions [3].

Urrutia and Hurst (2003) reported that the codon usage in human is positively related to gene expression but is inversely related to the rate of synonymous substitution [4]. Several genomic factors such as gene expression level, protein secondary structure, and translational preferences balancing between the mutational pressure and natural selection contribute to the synonymous codon usage variation in different organisms [5, 6]. Therefore, gaining the information on the synonymous codon usage pattern provides significant insights pertaining to the prediction, classification, and evolution of a gene at molecular level and also helps in designing highly expressed genes. In the present study we have carried out a comparative analysis of the ND2 gene codon usage and codon context patterns among the mitochondrial genomes of three chordate classes (pisces, aves and mammals) in order to understand the molecular mechanism along with functional conservation of gene expression during the period of evolution using several bioinformatics tools.

Methodology

Retrieval of Sequence data:

The coding sequences (cds) of MT-ND2 gene from five species of pisces, aves and mammals each were retrieved from National Center for Biotechnology Information, USA ( http://www.ncbi.nlm.nih.gov/) using the following accession numbers. The accession numbers of different species are AP006806, AP006813, AP006778, AP006825, AP006858, X52392, AF090337, AF090341, AF090338, AF090340, U96639, AJ001562, X14848, Y11832 and AJ001588. A perl programme was used to analyze the compositional features and codon usage bias parameters.

Compositional properties:

The overall composition of A, T, G, C bases and its composition at 3rd position along with GC, GC1, GC2 and GC3 contents were calculated using the perl script.

Codon adaptation Index (CAI):

Codon adaptation index (CAI) is used to estimate gene expression level. The CAI is calculated as

CAI=exp(1LΣκ=1L1nωκ)

Where, ωκ is the relative adaptiveness of the κth codon and L is the number of synonymous codons in the gene [7].

Effective Number of Codons (ENC):

The effective number of codons (ENC) is the most extensively used parameter to measure the usage bias of the synonymous codons [8]. The ENC value ranges from 20 (when only one codon is used for each amino acid) to 61 (when all codons are used randomly). It is calculated as:

ENC=2+9F2+1F3+5F4+3F6

Where Fκ(κ= 2,3,4,6) is the mean of Fκ values for the κ-fold degenerate amino acids.

Relative Synonymous Codon Usage (RSCU):

Relative synonymous codon usage was calculated as the ratio of the observed frequency of a codon to its expected frequency if all the synonymous codons of a particular amino acid are used equally [9]. The RSCU value is calculated using the formula

RSCUij=Xij1niΣj=1niXij

where, Xij is the frequency of occurrence of the jth codon for ith amino acid (any Xij with a value of zero is arbitrarily assigned a value of 0.5) and ni is the number of codons for the ith amino acid (ith codon family).

GRAVY:

GRAVY (Grand Average of Hydropathicity) values are the sum of the hydropathy values of all the amino acids in the encoded protein of the gene divided by the number of residues in the sequence [10].

Aromaticity (Aromo):

Aromo refers to the frequency of aromatic amino acids (Phe, Tyr, Trp) in the translated gene product [11].

Correspondence analysis (COA):

Correspondence analysis is a multivariate statistical method used to study the major trends in synonymous codon usage variation in coding sequences and distributes the codons in axis1 and axis2 with these trends [12].

Software used:

Novel software developed by SC (corresponding author) using Perl script was used to calculate all the codon usage bias parameters and nucleotide composition. The genetic code of vertebrate mitochondria having 60 sense codons available in NCBI database was used for the present analysis. The RSCU values of each codon from different species were clustered by hierarchal clustering method using XLSTAT.

Statistical analysis:

Correlation analysis was used to identify the relationship between overall nucleotide composition and each base at 3rd codon position. All the statistical analyses were done using the SPSS software.

Results & Discussion

The overall nucleotide compositions in the coding sequence of MT-ND2 gene among pisces, aves and mammals were analyzed Table 1 (see supplementary material). Our results showed that the nucleobase C was the highest (%) in pisces and aves but the nucleobase A was the highest in mammals whereas G was the lowest in pisces, aves and mammals. For the 3rd position of codon, A3 was the highest in pisces, aves and mammals but G3 the lowest. This clearly indicates that compositional constraint might influence the codon usage pattern of MT-ND2 gene [13].

The effective number of codon (ENC) values for MT-ND2 gene among pisces, aves and mammals were estimated Table 2 (see supplementary material). The effective number of codon (ENC) is a non directional measure of codon usage bias. Its value ranges from 20-61. ENC value 20 indicates that for each amino acid only one codon is used (extreme bias) and 61 means all codons equally encode the same amino acids (no bias) [8]. We observed that the ENC value in MT-ND2 gene was (Mean±SD) 57±2.91, 59±0.44 and 55±1.58 among pisces, aves and mammals, reflecting a weak codon bias which was similar to the findings of Jia X et al, 2015 in B.mori [14]. It was also found that the overall GC % was less than 50% and the gene was AT rich. This phenomenon was also reported in AT rich species such as Plasmodium falciparum [15].

We calculated the codon adaptation index (CAI) values for MTND2 gene in order to find out the expression level among pisces, aves and mammals (Figure 1). In our analysis, the CAI values were (Mean±SD) 0.7851±0.05, 0.7667±0.05, 0.7635±0.02 in pisces, aves and mammals, respectively. We used unpaired t test between pisces and aves as well as between pisces and mammals but the difference was not statistically significant. Wei et.al 2014, also reported the average value of CAI in mitochondrial protein coding genes ranged from 0.5-0.7 in B.mori [16]. In addition, we performed a correlation analysis between ENC and gene expression level as measured by CAI and found no significant relationship suggesting that the codon usage bias in MT-ND2 gene is not associated with expression level among the three classes.

Figure 1.

Figure 1

Distribution of CAI in MT-ND2 gene among different species

Moreover, we calculated the relative synonymous codon usage (RSCU) values in the coding sequences of MT-ND2 gene among pisces, aves and mammals Table 3 (see supplementary material). In our analysis, the RSCU value > 1 means the codon is more frequently used, RSCU value <1 as less frequently used codon. But RSCU value >1.6 means over represented codon whereas the RSCU value <0.06 as under represented codon. The RSCU values of 60 codons also indicated that the codon usage bias of MT-ND2 gene is low. Approximately half of the codons were frequently used i.e. 30 codons in pisces, 28 in aves and 21 in mammals. In all three classes the most frequent codons ended with A or C at the 3rd codon position (A/T ended: G/C ended = 16:14 in pisces, 13: 15 in aves and 14: 7 in mammals). The RSCU values were analyzed using heat map (Figure 2) which represented the more frequently, less frequently, over-represented and under-represented codons. The results showed that the most preferred codons were TTC, CTA and TAC in all the three classes. In addition, the four codons namely TCA, CTA, CGA and TGA encoding the amino acid serine, leucine, arginine and tryptophan respectively were over represented in MT-ND2 gene among pisces, aves and mammals. This result suggests that compositional features played an important role in codon usage in MT-ND2 gene [13].

Figure 2.

Figure 2

Hierarchal clustering of RSCU value in different species of MT-ND2 gene

The overall percentage of GC contents at different codon positions were calculated (see supplementary material Table S2). In order to find out the role of mutation pressure and natural selection, we constructed a neutrality plot of GC12 against GC3 (Figure 3, a-c) [17]. The linear regression coefficient of GC12 on GC3 indicated that natural selection plays a major role while mutation pressure plays a minor role in shaping the codon usage patterns in MT-ND2 gene. Our result was similar to the findings of Wei et.al (2014) in the mitochondrial DNA codon usage analysis of B.mori [16].

Figure 3.

Figure 3

Neutrality plot of GC12 versus GC3 in (a) Pisces (b) Aves (c) mammals. GC12: average of GC1 and GC2.

We performed correspondence analysis (CoA) based on RSCU values to analyze the codon usage variation in MT-ND2 gene among pisces, aves and mammals. In our analysis, the 1st axis (F1) accounted for 34.50% of the total variation and the 2nd axis accounted for 12.51% of the total variation (Figure 4). Further, correlation analysis was done to determine the interrelationships between the first two principle axes (F1 and F2), nucleotide constraints and indices of natural selection (CAI, Gravy, Aromo) on MT-ND2 gene. The F1 axis showed significant positive correlation with G, T3 and CAI whereas the F2 showed significant positive correlation with G, C, G3, C3, ENC, GC and GC1-3 but significant negative correlation with A and T Table 4 (see supplementary material). These results suggest both compositional constraint under mutation pressure and natural selection affect the codon usage pattern in MTND2. The results were similar to the findings of Butt et.al. [18].

Figure 4.

Figure 4

Correspondence analysis of the synonymous codon usage in MT-ND2 gene. The analysis was based on the RSCU value of the 60 synonymous codons.

Conclusion

The codon usage bias in MT-ND2 gene is weak with high expression level. It is found that natural selection and mutation pressure affect the codon usage pattern in MT-ND 2 gene.

Supplementary material

Data 1
97320630011407S1.pdf (136.1KB, pdf)

Acknowledgments

We are thankful to Assam University, Silchar, Assam, India, for providing the necessary infrastructural facilities for this work.

Footnotes

Citation:Uddin et al, Bioinformation 11(8): 407-412 (2015)

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data 1
97320630011407S1.pdf (136.1KB, pdf)

Articles from Bioinformation are provided here courtesy of Biomedical Informatics Publishing Group

RESOURCES