Skip to main content
Bioinformation logoLink to Bioinformation
. 2013 Mar 19;9(6):299–308. doi: 10.6026/97320630009299

Comparative analysis to identify determinants of changing life style in Thermosynechococcus elongatus BP-1, a thermophilic cyanobacterium

Ratna Prabha 1,5, Dhananjaya P Singh 1,*, Shailendra K Gupta 2, Sávio Torres de Farias 3, Anil Rai 4
PMCID: PMC3607189  PMID: 23559749

Abstract

A comparative genomics analysis among all forty whole genome sequences available for cyanobacteria (3 thermophiles– Thermosynechococcus elongatus BP-1, Synechococcus sp. JA-2-3B'a (2-13), Synechococcus sp. JA-3-3Ab and 37 mesophiles) was performed to identify genomic and proteomic factors responsible for the behaviour of T. elongatus BP-1, a thermophilic unicellular cyanobacterium with optimum growth temperature [OGT] of 55°C. Majority of genomic and proteomic characteristics for this cyanobacterium indicated contrasting features indicating its mesophilic behaviour while the role of mutational biasness and selection pressure is thought to be responsible for high OGT. Contradictory results were obtained for T. elongatus for synonymous codon usage, CvP-bias and amino acid composition with respect to thermophilic behaviour. Calculated J2 index is lowest among all cyanobacterial genomes. Except for proline and termination codons, T. elongatus showed synonymous codon usage pattern which is expected for mesophiles. Results indicated that among cyanobacterial genomes, majority of genomic and proteomic determinants put T. elongatus very close to mesophiles and the whole genome of this organism represents continuous gain of mesophilic rather than thermophilic behavior.

Keywords: Thermosynechococcus elongatus, Thermophily, Genomics, Codon usage, CvP bias, J2 index

Background

Thermophiles provide comprehensive physiological, biochemical and molecular insights into the biology of microbial life at high temperature [1]. Hyperthermophiles grow optimally above 65°C, thermophiles have optimal growth temperature (OGT) between 45 to 65°C while mesophliles grow well below 55°C (OGT 37°C) [2]. Possible relation between OGT of the organisms and nucleotide content of their genomes [3, 4, 5, 6] and specific trends in amino acid composition [7, 8, 9] has widely been worked out as signatures for thermophilicity. Genomic evidences for thermal adaptations suggest a positive correlation between G+C content and OGT because G:C pair is more stable than A:T but, there are contradictions too [10]. The composition of purine/pyrimidine dinucleotides is shown to correlate linearly with the OGT in Archaea [11]. Increase in the purine (A+G) load index in the genomes of thermophilic bacteria [12] represents possible primary factor of adaptation mechanism. Several other identified factors for the thermal adaptations in the organisms include high core hydrophobicity [13], high secondary structure propensity and packing density [14, 15], increase in van der Waals interactions in thermophilic proteins [16], ionic interactions [17], high content of disulphide bonding [18, 19], hydrogen bonding [20], increase in Glu (E) and Lys (K) and decrease in Gln (Q) and His (H) residues [21]. Although genomic information facilitates the study of thermophily in the prokaryotes and determines thermostability at proteome level [22], the central issue lies in finding out the adaptive strategy of nucleic acid molecules towards different OGTs [10].

Despite significant efforts, compositional biases representing most definitive signatures of thermophilic adaptations in genomes and proteomes still remain elusive [9]. Thermophilic adaptations based on nucleotide biases are largely governed at the level of amino acid composition but additional adaptation in DNA sequence at the level of high-order correlation in nucleotides is inferred in prokaryotes using codon usage analysis [3] has also been suggested [9]. While thoroughly analysing Thermosynechococcus elongatus BP-1, a thermophilic unicellular rod-shaped cyanobacterium (OGT 55°C) inhabiting hot springs and comparing the same with 39 other cyanobacteria for whom whole genome sequences are available (NCBI, Genome database, May 2012), we found different levels of correlations among physical OGT, nucleotide and amino acid composition and codon biases. We discussed the relation of the observed pattern at nucleotide, proteome and amino acid level with the physical growth of T. elongatus and trends that appeared after comparative genomic analysis with other cyanobacterial organisms to find out determinants for the thermophilic or mesophilic behavior.

Methodology

Sequences:

Out of 42 complete genome sequences available for cyanobacteria (NCBI Genome, Apr. 2012), 40 sequences were taken for the study. For Arthospira platensis NIES-39 sequencing is incomplete and only one out of two sequences available for Synechocystis sp. PCC6803 was taken. Complete genome, protein and rRNA sequences were downloaded from NCBI for each of the 40 genomes Table 1 (see supplementary material). OGT of the organisms was obtained from the available literature.

DAMBE version 5.2.73 [23] was used for counting number of individual nucleotides (A, T, G, C) in each of the 40 genomes. PERL scripts were used to calculate the composition of different combination of dinucleotides [YR (TA, TG, CA, CG), RY (AT, AC, GT, GC), YY (TT, CC, TC, CT) and RR (AA, GG, AG, GA)]. J2 Index is the subtraction of the frequency of all combinations of YR (TA,TG, CA,CG) and RY (AT, AC, GT, GC) from that of all YY (TT, CC, TC, CT) and RR (AA, GG, AG, GA) combinations [11].

The index was calculated by the following formula: J2 index = Σ (FYY + FRR – FYR –FRY). Total number of particular codons in the genome and relative synonymous codon usage (RSCU) was calculated through CUSP from EMBOSS package (http://www.ebi.ac.uk/Tools/emboss/).

The relationship between G+C content of RNA and OGT (OGTRNA) was expressed by the equation - OGT-RNA= 2.91× (G+C) -103; Where, OGT-RNA is the OGT estimated in degree Celsius (°C) and G+C is the percentage of guanine and cytosine in 16S rRNA [2].

Percent composition and total number of each amino acid in the proteome was calculated with the help of PERL script. The difference between charged (Lys, Arg, Asp, Glu) and polar-noncharged amino acid (Asn, Gln, Ser, Thr) i.e. CvP-bias and E+K/Q+H ratio in the proteome was calculated.

Results & Discussion

Analysis of genomes:

All 40 genomes varied in size from 1.44 Mb (Cyanobacterium UCYN-A) to 9.04 Mb (Nostoc punctiforme PCC 73102). Genomic GC content varied from 30.8% (Prochlorococcus marinus subsp. pastoris CCMP1986 and P. marinus MIT 9515) to 62 % (Gloeobacter violaceous PCC 7421). In cyanobacterial genomes, number of CDS (coding sequences) vary from 1199 (Cyanobacterium UCYN-A) to 6312 (Microcystis aeruginosa NIES-843) (Table 1).

Out of these 40 cyanobacteria, except T. elongatus, two other cyanobacteria Synechococcus sp. JA-2-3B'a(2-13) (OGT 50 to 55°C) [24] and Synechococcus sp. JA-3-3Ab (OGT 50 to 60°C) [24] are also reported as thermophiles while rest of 37 are mesophiles. Thermophiles have shown higher proportion of G+C that varied from 30 to 60% or more, irrespective of their behaviour [2]. The genomic GC content in T. elongatus is 53.9% while over-all GC content of rRNA operon is 55.15% Table 2 (see supplementary material). In comparison to other two thermophilic cyanobacteria that showed higher GC content i.e. 58.5% (Synechococcus sp. JA-2-3B'a(2-13) and 60.2% (Synechococcus sp. JA-3-3Ab) GC content of T. elongatus seems to favour mesophilic behaviour.

Purine load is a preferred index for thermophiles with low GC content like T. elongatus but this is uncommon in nonthermophilic organisms [25]. Purine load index i.e. the concentration of A+G is known to exhibit highest correlation with OGT and represents a primary adaptation mechanism to thermophily [9]. In contrast to other thermophilic cyanobacteria that showed nearly similar purine and pyrimidine nucleotides, T. elongatus showed no biasness towards purine but had higher pyrimidine (51%) than purine (49%) content Table 3 (see supplementary material). This organism, therefore, neither strongly favours GC bias nor nucleotide bias for its thermophilic character.

Combination of purine (R)/pyrimidine (Y) dinucleotide composition is shown to correlate linearly with the OGT among thermophilic Archaea [11]. A higher J2 index is considered as important criteria for hyperthermophiles [26] and a positive J2 value is reported for the sequences of all the thermophiles, while negative value represented mesophiles [11]. A positive J2 index ranging from 0.003599 (Gloeobacter violaceus PCC 7421) to 0.148216 (P. marinus MIT 9301) was calculated for all the cyanobacteria studied except Synechococcus sp. RCC307 (J2 index -0.00142) Table 4 (see supplementary material). Among thermophilic cyanobacteria under study, J2 index calculated to be the least for T. elongatus (0.046) while Synechococcus sp. JA-3- 3Ab and Synechococcus sp. JA-2-3B'a(2-13) showed J2 index value as 0.079 and 0.083, respectively. It is suggested that for the thermophiles, J2 value should be positive whereas for mesophiles, a negative value is recommended [11]. Although, T. elongatus showed a positive value (0.046) for J2 index, this criterion is insufficient to establish this organism as mesophile or thermophile because, a high J2 index value is important but not a sufficient criterion for thermophilc behaviour [26].

rRNA analysis:

Significant correlation between G+C content of structural RNAs (rather than entire genome) and OGT [27] along with an additional preference for purine rich RNA (particularly adenine as compared to guanine in t-RNA and r-RNA but not in m- RNA) is reported in thermophiles [28]. Showing thermophilic tendency, T. elongatus uses purines more frequently than pyrimidine for both tRNA and rRNA but prefer guanine over adenine that further reflects mesophilic nature. While considering distribution of individual nucleotides i.e. A, T, G, C in 16S rRNA across all cyanobacteria, we identified that purine content is higher than pyrimidine in all of them where G occupied the maximum percentage and T the minimum. However, for A and C, there is different trend for thermophiles and mesophiles. Among thermophilic cyanobacteria (including T. elongatus) maximum percent content in rRNA is of C after G, whereas for mesophilic cyanobacteria it is for A. Comparative analysis of T. elongatus rRNA GC content (55.15%) with Bacillus-related mesophilic species (in which it varied from 52.7 to 54.4% in rRNA GC content and 42 to 58°C in growth temperatures) [22] showed behavioural similarity. But, significant difference is observed when we compare the genomic G+C content of T. elongatus (53%) and Bacillus-related mesophilic species (35.3- 43.7%) in relation to the OGT. On the basis of 16S rRNA GC content,calculated OGT for T. elongatus was 57.25 °C and this correlated with the physical growth temperature of this organism. More or less, the same is reflected by the GC content of rRNA (52.7-54.4%) in Bacillus-related mesophilic species, OGT of which lies in the range of 42-58°C and the organism is a mesophile.

Analysis of protein content:

In organisms, enhanced thermostability reflected specific trend of their amino acid composition [7, 8, 29] and that too, especially in the increased fraction of charged residues [6, 30] and/or enhanced content of hydrophobic residues [9]. We examined difference between charged (Lys, Arg, Asp, Glu) and polar non-charged amino acids (Asn, Gln, Ser, Thr) which were considered as a characteristic signature of thermophilic microorganisms [26]. T. elongatus, along-with other 39 cyanobacteria under study, did not show any preference for charged amino acids and there is no significant difference between charged and polar amino acids across all the cyanobacteria. Charged amino acids occupy 20%, polar-noncharged 19.3% and others 60.7% of entire proteome (Figure 1) and therefore, significant difference between charged and noncharged amino acids does not exist in T. elongatus (Figure 2).

Figure 1.

Figure 1

Distribution of charged amino acids (Asp, Glu, Arg, Lys), polar non-charged amino acids (Asn, Gln, Ser, Thr) and non-polar amino acids (Gly, Ala, Val. Leu, Ile, Phe, Trp, Tyr, Pro, Met, Cys, His) in T. elongatus proteome

Figure 2.

Figure 2

Distribution of different amino acids in T. elongatus

CvP bias, the ratio of charged (Lys, Arg, Asp, Glu) and polar non-charged amino acids (Asn, Gln, Ser, Thr) is an important signature and a global characteristic of all thermophiles (OGT>55°C) in which it remains markedly higher than the mesophilic organisms [26]. Among all cyanobacteria, Gloeobacter violaceus PCC 7421 showed highest value for CvP bias i.e. 3.83. Synechococcus sp. JA-3-3Ab (2.713) and Synechococcus sp. JA-2- 3B'a(2-13) (2.06) also have a higher value for CvP along with some other strains of P. marinus but T. elongatus showed a small CvP bias value of 0.68 that establishes a mesophilic life style for this cyanobacteria. The ratio of Glu (E) +Lys (K)/Gln (Q) +His (H) for T. elongatus proteome is calculated to be 1.12 that again indicates mesophilic behaviour. Other two thermophiles also show quite similar but comparatively higher value. Some strains of P. marinus showed higher value. In T. elongatus genome, 30 genes are predicted to be responsible for thermophily. On considering each gene (and their protein products) individually, similar results for CvP bias as of entire proteome except for three proteins namely chaperonin2, chaperonin GroEL and chaperonin GroES (4.2, 5.27 and 9.5 respectively) were found. Chaperonins are potentially thermostable proteins that are expected to favour thermophilic behaviour in organisms [21].

Total fraction of the universal set of amino acids (Ile, Val. Tyr, Trp, Arg, Glu, Leu [IVYWREL]) in the proteome is considered to correlate with OGT [9]. In T. elongatus proteome, the fractional composition of the universal set of amino acids (IVYWREL) is 41.56%. Similar distribution of IVYWREL is observed across all the cyanobacteria. In comparison to mesophiles, thermophiles have significantly higher content of Val and Glu than Gln and Thr [29] along with the reduced frequency of His and Gln. No significant change is observed in the distribution of Val and Glu across meso- and thermophilic cyanobacteria. Leu is in high proportion (11.97%) in the proteome in terms of amino acid composition but the composition of His is only 2.2%. Across all the cyanobacteria under study, Leu occupies the largest proportion whereas Cys occupies the least and there remains no exception. These results indicated a thermophilic pattern for His only but a good proportion of Gln (5.76%) and Thr (5.48%) along with the low fraction of IVYWREL indicated mesophilic behaviour for T. elongatus Table 5 (see supplementary material). Similarly, among the amino acids Glu, Arg, Tyr, Asp and Lys that are reported to be abundant in thermophiles [8, 31], T. elongatus proteome contains high proportion of Glu (5.84%) and Arg (6.61%) but other amino acids are poorly contained (Table 5).

Codon usage pattern:

Synonymous codon usage pattern in thermophilic prokaryotes is different from mesophiles and the difference is a result of natural selection linked to thermophily. Moreover, this phenomenon is not restricted to specific set of genes and affects all of the genes within the genome [32]. Synonymous codon usage pattern in T. elongatus genome does not show any distinguishable thermophilic pattern. Except for proline and termination codons, T. elongatus shows synonymous codon usage pattern which is expected for mesophiles and not for thermophiles.

Thermophilic prokaryotes characteristically include increased usage of AGR codons for arginine, ATA for isoleucine and decreased usage of CGN for arginine [6, 33, 34]. For Arg, T. elongatus preferentially uses CGC out of the 6 codons available along with other two thermophilic cyanobacteria. For isoleucine, ATC is used by both Synechococcus sp. JA-3-3Ab and Synechococcus sp. JA-2-3B'a(2-13) whereas T. elongatus uses ATT which resembles mesophilic cyanobacteria. Biasness for arginine codon family containing the largest number of codons [6] in organisms is a signature of thermophiles and hyperthermophiles that represent important implications to thermostability [21]. T. elongatus preferentially uses CGC not only in arginine codon family but also in the entire genome followed by CGG. This is supported by the relative synonymous codon usage (RSCU) value for CGC which is highest followed by CGG and CGU in the entire genome. These results contradict with the fact that thermophiles and hyperthermophiles tend to employ AGR to encode arginine. Preferential usage of AGR for arginine implies positive error minimization and contributes to avoid mutations that harm protein thermostability and represent alternative mechanism of adaptation to proteins [1]. In T. elongatus, AGA and AGG are least used for Arg in the entire genome reflecting a mesophilic character of the genome because encoding for arginine by CGT and CGC are positive indicators for mesophiles [35]. Similar codon usage pattern was observed when all the 30 genes predicted for thermophily were considered individually (data not shown).

Entire proteome composition of T. elongatus showed that out of the total number of Leu, maximum is encoded by CUG codons. Thermophiles prefer GGR over GGY and vice-versa and this pattern is characteristically established for glycine (Gly) and arginine (Arg) [24]. It is observed that maximum cyanobacteria, including thermophiles preferentially use GGY over GGR. Also among GGY, GGC is more commonly used than GGT by thermophilic cyanobacteria. Like other mesophilic cyanobacteria and previous studies, preferences of T. elongatus genome for GGY and CGY codons for Gly and Arg again indicated mesophilic behaviour of the organism (data not shown).

A preference for specific nucleotides at 3rd position of codons of some amino acids has been shown by some thermophilic organisms [36]. Given a synonymous choice between T and C (Asn, Asp, Cys, His, Phe, Tyr) at 3rd position, thermophiles systematically favour C-ending codons. Similar trend was also followed by the two thermophilic cyanobacteria (with an exception for phenylalanine). However, T. elongatus prefers Tending codons instead of C-ending codons. When the choice at 3rd position is between A and G (Gln, Glu, Lys), G-ending codons are preferentially favoured [36]. Synechococcus sp. JA-3- 3Ab and Synechococcus sp. JA-2-3B'a(2-13), like other thermophiles, show expected pattern but T. elongatus preferentially uses A-ending codons for these amino acids as is evident for mesophilic cyanobacteria. Among codons ending with pyrimidine, T-ending codons are most widely used across all mesophilic cyanobacteria including T. elongatus whereas Synechococcus sp. JA-3-3Ab and Synechococcus sp. JA-2-3B'a prefer C-ending codons as expected for thermophiles. G-ending codons also share a significant proportion among two thermophilic cyanobacteria whereas for T. elongatus and other mesophiles studied, they are the least used codons (data not shown). For termination codons, T. elongatus favours G-ending codons like thermophiles but mesophiles prefer A-ending termination codons. This characteristic reflects thermophilic behaviour of T. elongatus.

Thermal adaptations in T. elongatus:

Potential adaptations to high temperatures in thermophilic organisms lie in their highly efficient and specialized DNA repair systems [37]. A specific DNA repair system largely confined to thermophiles is known to be regulated by a pathway consisting of more than 20 genes [38]. This pathway is missing in T. elongatus genome. Even no gene of T. elongatus showed significant similarity to the genes conferring the DNA repair system. The genes encoding fatty acid desaturases (desA, desB, desC) are missing reflecting the absence of highly unsaturated fatty acids in lipids composition which is characteristic of thermophiles [39]. Genes for replication, recombination and repair (as per COG classification) occupy a major composition (6.7049%) in T. elongatus genome along with a significant composition of genes involved in posttranslational modification, protein turnover, chaperones (4.0158%) showing thermal adaptation mechanisms.

Conclusion

T. elongatus requires high temperature (55°C) for physical growth [39]. To a certain extent, genomic determinants but majority of proteomic determinants indicated mesophilic behavior of this organism with an OGT of (38.41°C) and that too, was supported by various reports [22, 40, 41]. In correspondence analysis performed for the proteomes of 279 prokaryotes, T. elongatus occupies the position under a large group of mesophiles [34]. Phylogenetically, T. elongatus is again found to be closely related to a large group of mesophiles suggesting a recent gain of thermophily by this organism [34]. Thermophilic adaptation mechanisms such as the presence of additional genes for heat-shock proteins [39], 28 copies of group II introns comprising of almost 1.3% of the whole genome [39, 42], and heat- induced groEL2 gene [43] gained by T. elongatus over the time has made it comfortable in mesothermophilic conditions. Additional features like presence of widely temperature compensated endogenous circadian rhythm is known in mesophilic organisms [44] and presumptive counterparts of all the genes involved in circadian clock system have also been identified in T. elongatus [39]. Adaptation to a new environment often necessitates a coordinated change in the genomic organization, rather than independent modifications of individual components [11]. Our results indicated that among the cyanobacteria, majority of genomic and proteomic determinants put T. elongatus very close to the mesophiles and the whole genome of this organism represents continuous gain of mesophilic rather than thermophilic behavior.

Supplementary material

Data 1
97320630009299S1.pdf (54.7KB, pdf)

Acknowledgments

Financial support from Indian Council of Agricultural Research, India in the form of “National Agricultural Innovation Project” entitled “Establishment of National Agricultural Bioinformatics Grid” (NABG) is gratefully acknowledged.

Footnotes

Citation:Prabha et al, Bioinformation 9(6): 299-308 (2013)

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data 1
97320630009299S1.pdf (54.7KB, pdf)

Articles from Bioinformation are provided here courtesy of Biomedical Informatics Publishing Group

RESOURCES